  1. Aug 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides a useful strategy for treating mouse cutaneous squamous cell carcinoma (mCSCC) with serum derived from mCSCC-exposed mice. The exploration of serum-derived antibodies as a potential therapy for curing cancer is particularly promising, but the study provides inadequate evidence for specific effects of mCSCC-binding serum antibodies. This study will be of interest to scientists seeking a novel immunotherapeutic strategy in cancer therapy.

      Joint Public Review:

      Summary:

      This study presents an immunotherapeutic strategy for treating mouse cutaneous squamous cell carcinoma (mCSCC) using serum from mice inoculated with mCSCC. The author hypothesizes that antibodies in the generated serum could aid the immune system in tumor volume reduction. The study results showed a reduction in tumor volume and altered expression of several cancer markers (p53, Bcl-xL, NF-κB, Bax) suggesting the potential effectiveness of this approach.

      Strengths:

      The approach shows potential effect on preventing tumor progression, from both the tumor size and the cancer biomarker expression levels bringing attention to the potential role of antibodies and B cell responses in cancer therapy.

      We greatly appreciate your positive feedback on our study.

      Weaknesses:

      These are some of the specific things that the author could consider to strengthen the evidence supporting the claims in their study.

      (1) The study fails to provide evidence of the specific effect of mCSCC-antibodies on mCSCC. The study utilized serum which also contains many immune response factors like cytokines that could contribute to tumor reduction. There is no information on serum centrifugation conditions, which makes it unclear whether immune components like antigen-specific T cells, activated NK cells, or other immune cells were removed from the serum. The study does not provide evidence of neutralizing antibodies through isolation, analysis of B cell responses, or efficacy testing against specific cancer epitopes. To affirm the specific antibodies' role in the observed immune response, isolating antibodies rather than employing whole serum could provide more conclusive evidence. Purifying the serum to isolate mCSCC-binding antibodies, such as through protein A purification, and ELISA would have been more useful to quantify the immune response. It would be interesting to investigate the types of epitopes targeted following direct tumor cell injection. A more thorough characterization of the antibodies, including B cell isolation and/or hybridoma techniques, would strengthen the claim.

      I am deeply appreciative of the reviewer's highly professional comments. Tumor development involves the coexistence of cancer cells at different developmental stages, each harboring a variety of known and unknown mutated proteins. These mutated proteins expose multiple known and unknown epitopes, each capable of stimulating the production of corresponding antibodies in healthy mice. Identifying all these antibodies presents a significant challenge. Current research methodologies, such as ELISA, WB, and ChIP, can only identify known antibodies based on existing antigens. A prerequisite for using these techniques is that both antigens and antibodies are identified. At present, there is no technology available to identify antibodies produced by an unknown mutated protein and epitope. However, I find the reviewer's comments insightful. Perhaps we can initially identify some known mCSCC-antibodies on mCSCC. However, studying the specific effect of these known mCSCC-antibodies on mCSCC is uncertain because we believe that tumor shrinkage results from the combined action of both known and unknown antibodies.

      We concur with the reviewer's observations regarding the use of serum, which is rich in immune response factors such as cytokines that could potentially contribute to tumor reduction. In our future research, we plan to systematically analyze the individual roles of these antibodies and cytokines in tumor reduction. In 1973, Nature published a report indicating that serum demonstrated promising results in tumor treatment (Immunotherapy of Cancer with Antibody in Rats. Nature 243, 492 (1973). https://doi.org/10.1038/243492b0). Since then, there have been scarcely any reports on serum therapy for tumors. The primary focus of our study is to evaluate the efficacy of serum therapy in treating tumors. We hypothesize that antibodies and cytokines form a complex interactive network, working in synergy to reduce tumors. Consequently, we believe that studying these antibodies and cytokines in isolation may not yield effective results.

      In this study, the methodology section outlines the process of serum preparation. It is important to note that serum is devoid of blood cells. I hypothesized that whole blood might have superior therapeutic effects compared to serum. This is because antibodies could potentially synergize with immune cells (including T cells, B cells, and NK cells), thereby enhancing the effectiveness of the treatment. As previously discussed, these antibodies, cytokines, and immune cells form a complex interactive network aimed at tumor reduction. Consequently, there are numerous factors that could influence the experimental outcomes, which presents a challenge for analyzing the results. Furthermore, the implementation of whole blood transfusion therapy introduces additional considerations, such as potential side effects and reactions associated with blood transfusions.

      We thank the reviewers for their suggestion to purify the serum in order to isolate mCSCC-binding antibodies. As we previously mentioned, separating a large number of both known and unknown serum antibodies presents a significant technical challenge. We are eager to discuss and consider suggestions from the reviewers regarding methods to identify a large variety and number of unknown antibodies on cells. Perhaps, as the reviewer suggested, we could begin with known antibodies and employ Protein A purification to purify these antibodies and subsequently detect immune responses. In further studies, we could also categorize the types of epitopes targeted following direct tumor cell injection. The suggestion to study the response of B cells is valuable, and we plan to conduct comprehensive research on the response and status of B cells in our future studies.

      The purification of antibodies to enhance the specificity of their effectiveness against tumors is a critical aspect of our study. However, we would like to address some concerns raised. (1) The separation of all antibodies and cytokines presents a significant technical challenge. In particular, there is a risk of overlooking antibodies that are present in low concentrations but play crucial roles. (2) Our primary concern is that studying these components in isolation could compromise the holistic understanding of the study. This is akin to current research on traditional medicine, where the separation and individual study of compounds often result in a loss of overall therapeutic efficacy. For instance, consider a scenario where 100 antibodies collectively work to shrink a tumor. These antibodies interact with 20 cytokines, forming a complex network that enhances the cytokines' activity against tumor cells. Furthermore, many important antibodies and cytokines are currently unknown. Studying these antibodies in isolation could potentially result in the loss of this therapeutic effect. Therefore, in the discussion section, we have emphasized that our study considers a tumor mass, including tumor cells at various stages of development, as a single entity. As a practicing clinician, my primary focus is on the therapeutic outcomes in tumor treatments, even though the mechanisms of serum therapy remain largely elusive, like a black box.

      (2) In the study design, the control group does not account for the potential immunostimulatory effects of serum injection itself. A better control would be tumor-bearing mice receiving serum from healthy non-mCSCC-exposed mice. Additionally, employing a completely random process for allocating the treatment groups would be preferable. Also, the study does not explain why intravenous injection of tumor cells would produce superior antibodies compared to those naturally generated in mCSCC-bearing mice.

      I concur with the reviewer's perspective that using serum from healthy, non-mCSCC exposed mice as a control could potentially improve our study. Initially, our primary concern was to minimize harm to the mice and avoid excessive blood reactions, which led us to exclude the use of serum from healthy, non-mCSCC exposed mice in our control group. The main objective of our study was to investigate tumor shrinkage through serum treatment, specifically serum-derived antibodies. We anticipated that tumor-bearing mice receiving serum from healthy, non-mCSCC exposed mice would exhibit a response to the injected serum, which would manifest as a blood reaction. However, we did not expect this to result in a tumor treatment effect. If it turns out that normal serum (from healthy, non-mCSCC-exposed mice) possesses tumor-reducing properties, it would indeed be a novel discovery. We appreciate the reviewer's insightful suggestion and will consider incorporating it into our future research.

      We concur with the reviewer's observation that a completely random process for assigning treatment groups would be more desirable. Indeed, complete randomization of the entire process would further underscore the efficacy and universality of serum therapy. In this study, we utilized paired mice to mitigate the risk of cross-infection and adverse reactions associated with blood transfusions. We deeply value the reviewer's expert feedback.
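
      As an illustration of the completely random allocation discussed here, the following is a minimal sketch and not the procedure used in the study. The animal identifiers, group names, and seed are hypothetical, and the function `allocate_groups` is introduced only for this example.

```python
import random


def allocate_groups(animal_ids, groups=("treatment", "control"), seed=None):
    """Shuffle the animals, then deal them into groups of near-equal size."""
    rng = random.Random(seed)      # a fixed seed keeps the allocation reproducible
    shuffled = list(animal_ids)    # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for index, animal in enumerate(shuffled):
        # Deal round-robin so group sizes differ by at most one animal.
        allocation[groups[index % len(groups)]].append(animal)
    return allocation


# Hypothetical cohort of 12 tumor-bearing mice.
mice = [f"mouse_{i:02d}" for i in range(1, 13)]
allocation = allocate_groups(mice, seed=2024)
for group, members in allocation.items():
    print(group, sorted(members))
```

      Recording the seed alongside the allocation lets the assignment be audited later; a paired design, as used in the study, would instead randomize within each pair.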

      Lastly, we explain why intravenously injected tumor cells produce antibodies superior to those naturally generated in mCSCC-bearing mice. As tumor cells grow, they produce a variety of mutated proteins to adapt to the immune microenvironment and evade the immune system of mCSCC-bearing mice. However, these tumor cells with mutated proteins are readily recognized by the immune system of healthy mice. This recognition triggers an immune response in healthy mice, leading to the production of specific therapeutic antibodies. This simultaneous production of diverse and abundant antibodies is only achievable by living organisms.

      (3) In Figure 2B, it would be more helpful if the author could provide raw data/figures of the tumor than just the bar graph. Similarly in Figure 3, the author should show individual data points in addition to the error bar to visualize the actual distribution.

      Raw data (numerical values) have been incorporated into Figures 2B and 3, but the data is placed in the table below the graph. If placed above the error bar, it requires a small font and may not be clear.

      (4) The author mentioned that different stages of tumor cells have different surface biomarkers. Therefore, experimenting with injecting tumor cells at various stages could reveal the most immunogenic stage. Such an approach would allow for a comparative analysis of immune responses elicited by tumor cells at different stages of development.

      Yes, throughout the course of tumor development, tumor cells at various stages will exhibit distinct markers or possess different mutated proteins. The concept of segregating tumor cells from different stages and independently comparing their immune responses is indeed commendable. Future research could involve isolating cells that express identical biomarkers at each stage for a comparative analysis of the immune responses triggered by the tumor cells. However, this approach diverges from the original intent of this study.

      Most tumor cells in a tumor are at the same developmental stage. However, this does not imply that all tumor cells within the tumor mass are at the same stage. For instance, a stage III liver cancer tumor may contain both stage I and stage IV tumor cells. Moreover, due to the complexity of tumor development, not all tumor cell surface markers are identical, even for tumors at the same stage. For instance, 20 major proteins and 100 minor proteins are implicated in tumor formation. In fact, random mutations in just 5 of these major proteins and 10 minor proteins can instigate the development of tumors. This implies that the protein pattern (tumor cell surface markers) associated with each individual's tumor is unique. While studying tumor cells at different stages separately allows for the observation of the immune response to tumor cells at each stage, it does not give a comprehensive picture for research or treatment. For this reason, the design of this study treats a tumor mass as a whole, encompassing both the primary stage tumor cells and those not in that stage. These tumor cells are then injected to produce corresponding therapeutic antibodies. Furthermore, if tumor cells from only one stage are isolated and specific antibodies are produced against these cells, it could lead to immune escape of tumor cells at other stages, preventing the tumor from shrinking. Therefore, our approach aims to address this issue by considering the tumor mass as a whole.

      (5) In the abstract the author mentioned that using mCSCC is a proof-of-concept for this potential cancer treatment strategy. The discussion session should extend to how this strategy might apply to other cancer types beyond carcinoma.

      We have incorporated an additional paragraph in the discussion section where we delve into the concepts and experimental principles underpinning this study. This, we believe, addresses the reviewer's query regarding the applicability of our study's methodology to other types of tumors. The process for other tumors also involves isolating cells from the tumor, stimulating therapeutic antibody production in healthy mice using these cells, and ultimately reintroducing these antibodies into mice with tumors to facilitate tumor elimination.

      Recommendations For The Authors:

      The author is encouraged to refine the study's design in future studies considering the weaknesses highlighted above, summarize the results more effectively, and seek opportunities to expand on this promising idea and enhance the research's impact and applicability.

      We greatly appreciate the valuable suggestions provided by the editor and reviewers. These insights will certainly be addressed in our future research endeavors.

      Suggestions for title modification:

      Following the scope of the study, the term 'specific homologous neutralizing-antibodies' may be misleading as neutralizing antibodies typically refer to antibodies preventing viral cell entry. In cancer therapy, 'neutralization' is not a relevant concept, as cancer cells do not infect host cells. Using whole tumor cells as immunogens diverges from the specificity of traditional vaccination approaches that utilize well-defined proteins or antigens. Furthermore, the term "homologous" suggests a precision in targeting that is not demonstrated by reintroducing serum without isolating its specific components. Therapeutic effects should not be attributed to "neutralizing antibodies" without isolating or characterizing the antibody response or verifying their efficacy against specific cancer epitopes. Additionally, it is suggested that you indicate the biological system that your study utilised in the title. More so, this approach is not entirely novel, as seen with the use of adjuvants in some flu vaccines, or in Moderna's cancer vaccine mRNA-4157, which encodes up to 34 patient-specific tumor neoantigens. You can consider the title below or a variant of the same.

      Suggested title: Generating serum-based antibodies from tumor-exposed mice: a potential strategy in cutaneous squamous cell carcinoma treatment

      I concur with your suggestion and have modified the title to "Generating serum-based antibodies from tumor-exposed mice: a new potential strategy for cutaneous squamous cell carcinoma treatment". I believe this research remains somewhat new, hence the addition of the word "new". Furthermore, the term "novel" in the paper has been either removed or substituted.

      Moreover, I propose that this study shares similarities with Moderna's cancer vaccine mRNA-4157, albeit with certain differences. Moderna's cancer vaccine mRNA-4157 encodes up to 34 recognized neoantigens to stimulate an immune response by eliciting specific T cell responses. This is similar to the strategy of some companies developing a protein set for diagnosing lung cancer, liver cancer, among others. Without a doubt, these methods have improved the effectiveness of tumor diagnosis and treatment. However, I think that these methods currently face challenges in completely eradicating tumors because they treat tumors as a static process and tumor cells as expressing certain mutated proteins in a fixed manner. I believe that small molecule antibodies, cytokines, and immune cells present in serum that are difficult to detect, have low concentrations, or are unknown are essential for maintaining the expression of important mutant proteins and the escape of tumor cells. This is also the primary reason why tumors are difficult to treat and prone to recurrence at present.

      From my perspective, different tumors, as well as different stages of the same tumor, express varying mutated proteins or surface markers. Targeting some may result in others escaping or even creating a more conducive growth environment for those that do escape. Our study adopts a comprehensive view of a tumor block, encompassing tumor cells at different stages and tumor cells at the same stage but expressing different biomarkers. This approach generates a multitude of known and unknown antibodies that work in concert with cytokines and immune cells. While our method may not be capable of generating all mutated proteins and epitope antibodies due to the weakness of some antigens (epitopes of mutated proteins), it can still be effective. As long as the number of tumor cells is reduced below a certain threshold following multiple rounds of treatment with various antibodies produced at different stages, these cancer cells can be eradicated by the body's immune system. This is a process that is real-time and dynamic. Undoubtedly, if it becomes evident that alterations in a set of proteins can bolster the immune system and eradicate tumor cells, then the implications are significant. The immunotherapy proteins, which have demonstrated positive therapeutic effects, developed by certain companies are also predicated on this very principle.

      Finally, I greatly appreciate your suggestions, which will be considered and gradually addressed in future research.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review): 

      By mapping H3K4me2 in mouse oocytes and pre-implantation embryos, the authors aim to elucidate how this histone modification is erased and re-established during the parental-to-zygotic transition, as well as how the reprogramming of H3K4me2 regulates gene expression and facilitates zygotic genome activation.

      Employing an improved CUT&RUN approach, the authors successfully generated H3K4me2 profiling data from a limited number of embryos. While the profiling experiments are very well executed, several weaknesses, particularly in data analysis, are apparent:

      (1) The study emphasizes H3K4me2, which often serves as a precursor to H3K4me3, a well-studied modification during early development. Analyzing the new H3K4me2 dataset alongside published H3K4me3 data is crucial for comprehensively understanding epigenetic reprogramming post-fertilization and the interplay between histone modifications. However, the current analysis is preliminary and lacks depth.

      Thank you very much for your valuable suggestions. H3K4me3 data for humans and mice have been published, and our previous data revealed the unique pattern of H3K4me3 in early human embryos and oocytes (Xia et al., 2019). This study therefore focuses mainly on the localization of H3K4me2 in mouse oocytes and preimplantation embryos, how it is erased and re-established during the mammalian parental-to-zygotic transition, and its function. The combined analysis of H3K4me2 and H3K4me3 is not our main work, but we do not rule out that it may reveal new findings about the relationship between these two histone modifications. Our data so far tend to show that H3K4me2 not only acts as a precursor of H3K4me3 but also plays an independent role.

      (2) Tranylcypromine (TCP) is known as an irreversible inhibitor of monoamine oxidase and LSD1. While the authors suggest TCP inhibits the expression of LSD2, this assertion is questionable. Given TCP's potential non-specific effects in cells, conclusions related to the experiments using TCP should be made with caution.

      Thank you for pointing this out, and we thank the reviewer again for the important suggestion. Previous studies indicated that TCP is an irreversible inhibitor of both LSD1 and LSD2 (Binda et al., 2010; Fang et al., 2010). According to our data, however, the level of LSD1 is very low in early mouse embryos, so TCP mainly inhibits the function of LSD2 at these stages.

      (3) Some batches of H3K4me2 antibody are known to cross-react with H3K4me3. Has the H3K4me2 antibody used in CUT&RUN been tested for such cross-reactivity? Heatmaps in the figures indeed show similar distribution for H3K4me2 and H3K4me3, further raising concerns about antibody specificity.

      We thank the reviewer for the insightful comments. The H3K4me2 antibody was purchased from Millipore (cat. 07030). Figure 2A shows the specific enrichment of H3K4me2 in promoter and distal regions. Some batches of H3K4me2 antibody are known to cross-react with H3K4me3, but the H3K4me2 antibody we used in our CUT&RUN seems to have low cross-reactivity.

      (4) Certain statements lack supporting references or figures (examples on page 9 can be found on line 245, line 254, and line 258).

      Thank you for pointing this out. We will add references to support these statements in the paper, as suggested.

      (5) Extensive language editing is recommended to clarify ambiguous sentences. Additionally, caution should be taken to avoid overstatement - most analyses in this study only suggest correlation rather than causality.

      Thank you for your kind comments. We will revise the expression in the manuscript later.

      Reviewer #2 (Public Review):

      Chong Wang et al. investigated the role of H3K4me2 during the reprogramming processes in mouse preimplantation embryos. The authors show that H3K4me2 is erased from GV to MII oocytes and re-established in the late 2-cell stage by performing Cut & Run H3K4me2 and immunofluorescence staining. Erasure and re-establishment of H3K4me2 have not been studied well, and profiling of H3K4me2 in germ cells and preimplantation embryos is valuable to understanding the reprogramming process and epigenetic inheritance.

      (1) The authors claim that the Cut & Run worked for MII oocytes, zygotes, and the 2-cell embryos. However, it is unclear if H3K4me2 is erased during the stage or if the Cut & Run did not work for these samples. To support the hypothesis of the erasure of H3K4me2, the authors conducted immunofluorescence staining, and H3K4me2 was undetected in the MII oocyte, PN5, and 2-cell stage. However, published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016; Shao et al., 2014). The authors need to cite these papers and discuss the contradictory findings.

      The authors used 165 MII oocytes and 190 GV oocytes for the Cut & Run. The amount of DNA in MII oocytes is halved because of the emission of the first polar body. Could this be why fewer H3K4me2 peaks are detected in MII oocytes than in GV oocytes?

      First of all, thank you for your valuable advice. The published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage, which is interesting. I think we may have used different confocal imaging parameters (Ancelin et al., 2016). We imaged every stage from the GV stage to the blastocyst stage with the same parameters. If we had imaged only the fertilized egg and the 2-cell stage with different parameters, I think we might also have seen weak fluorescence at the 2-cell stage. We will refer to this reference and discuss it in the resubmitted version.

      Moreover, you suggested that MII oocytes may have fewer H3K4me2 peaks than GV oocytes because the MII oocyte has expelled the first polar body. The logic is sound. However, the first polar body expelled at the MII stage remains within the zona pellucida, and we also collected the polar body in the CUT&RUN experiment; therefore, compared to GV, the DNA content of MII samples is not halved. After further discussion, we believe that the reduction of H3K4me2 peaks at the MII stage compared with the GV stage may be closely related to oocyte maturation. Stage-specific histone modifications shape the changes in chromatin structure that accompany the different stages of meiosis. It has been confirmed that H3K4me3 gradually decreases from the GV to the MII stage during the maturation of human oocytes, whereas H3K27me3 does not change from the GV to the MII stage.

      In Figure 3C, 98% (13,183/13,428) of H3K4me2 marked genes in GV oocytes overlap with those in the 4-cell stage. Furthermore, 92% (14,049/15,112) of H3K4me2 marked genes in sperm overlap with those in the 4-cell stage. Therefore, most regions maintain germ line-derived H3K4me2 in the 4-cell stage. The authors need to clarify which regions of germ line-derived H3K4me2 are maintained or erased in preimplantation embryos. Additionally, it would be interesting to investigate which regions show the parental allele-specific H3K4me2 in preimplantation embryos since the authors used hybrid preimplantation embryos (B6 x DBA).

      Thank you very much for your suggestion. Further analysis of which regions show parental allele-specific H3K4me2 in preimplantation embryos will make the study more interesting. We will discuss this in depth in the resubmitted version.
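
      The overlap percentages quoted above for Figure 3C can be checked directly from the gene counts given by the reviewer. The sketch below only redoes that arithmetic; the function name `overlap_fraction` is hypothetical. Computed this way, the sperm ratio comes out at roughly 0.93, so the quoted 92% appears to be truncated rather than rounded.

```python
def overlap_fraction(shared, total):
    """Return the fraction of `total` marked genes that are also in `shared`."""
    if total <= 0:
        raise ValueError("total must be a positive gene count")
    return shared / total


# Counts as quoted in the review for Figure 3C (overlap with the 4-cell stage).
gv_fraction = overlap_fraction(13_183, 13_428)      # GV oocytes
sperm_fraction = overlap_fraction(14_049, 15_112)   # sperm

print(f"GV oocytes: {gv_fraction:.1%}")
print(f"Sperm:      {sperm_fraction:.1%}")
```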

      (2) The authors claim that Kdm1a is rarely expressed during mouse embryonic development (Figure 4A). However, the published paper showed that KDM1a is present in the zygote and 2-cell stage using immunostaining and western blotting (Ancelin et al., 2016). Additionally, this paper showed that depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage, and therefore, KDM1a is functionally important in early development. The authors should have cited the paper and described the role of KDM1a in early embryos.

      In our analysis of this experiment, we consider the expression of KDM1A in early mouse embryonic development to be low relative to that of KDM1B. Similarly, the transcriptome data we cite also show that KDM1A is expressed at elevated levels during oocyte maturation and fertilization compared to immature oocytes. In addition, the effects of loss of maternal KDM1a on embryonic development were not discussed. We believe that the absence of maternal KDM1b blocks embryonic development, and we will cite and discuss the references later.

      (3) The authors used the published RNA data set and interpreted that KDM1B (LSD2) was highly expressed at the MII stage (Figure S3A). However, the heat map shows that KDM1B expression is high in growing oocytes but not at 8w_oocytes and MII oocytes. The authors need to interpret the data accurately.

      After re-checking the data, we found that there was a problem with the normalization method of our heat map, and we will re-make the heatmap and submit it in the modified version. With reference to Figure 4A, the content of Kdm1b is indeed higher than that of Kdm1a.

      (4) All embryos in the TCP group were arrested at the four-cell stage. Embryos generated from KDM1b KO females can survive until E10.5 (Ciccone et al., 2009); therefore, TCP-treated embryos show a more severe phenotype than oocyte-derived KDM1b deleted embryos. Depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage (Ancelin et al., 2016). The authors need to examine whether TCP treatment affects KDM1a expression. Western blotting would be recommended to quantify the expression of KDM1A and KDM1B in the TCP-treated embryos.

      We will further mine the transcriptome data to confirm the specificity of TCP for KDM1b. In addition, TCP treatment of the whole fertilized egg in this study increased the H3K4me2 level, and its retarding effect on embryo development was more pronounced than that obtained when KDM1B is knocked down in the mother and she is then crossed with normal paternal lines.

      (5) H3K4me2 is increased dramatically in the TCP-treated embryos in Figure 4 (the intensity is 1,000 times more than the control). However, the Cut & Run H3K4me2 shows that the H3K4me2 signal is increased in 251 genes and decreased in 194 genes in the TCP-treated embryos (Fold changes > 2, P < 0.01). The authors need to explain why the gain of H3K4me2 is less evident in the Cut & Run data set than in the immunofluorescence result.

      Thanks a lot for your question. In the experimental group, the fluorescence value of H3K4me2 in IF increased 1,000-fold (Figure 4E), whereas in CUT&RUN a total of 445 H3K4me2-marked genes changed, counting both increases and decreases (Figure 6A). In our opinion, immunofluorescence is a semi-quantitative analysis and cannot be directly compared with the quantitative CUT&RUN analysis, because the two use different analysis models and threshold settings.
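
      To make the thresholding behind the 251 and 194 gene counts concrete, here is a minimal sketch of how per-gene CUT&RUN results might be split into gained and lost sets under the stated cut-offs (fold change > 2, P < 0.01). It is not the authors' pipeline. The gene names and values are invented, and reading "fold change > 2" symmetrically (ratio above 2 for gains, below 1/2 for losses) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    fold_change: float  # treated / control H3K4me2 signal ratio (assumed definition)
    p_value: float


def classify(results, min_fold=2.0, max_p=0.01):
    """Split genes into gained and lost H3K4me2 sets using both cut-offs."""
    gained, lost = [], []
    for result in results:
        if result.p_value >= max_p:
            continue  # not significant at the chosen P threshold
        if result.fold_change > min_fold:
            gained.append(result.gene)
        elif result.fold_change < 1.0 / min_fold:
            lost.append(result.gene)
    return gained, lost


# Hypothetical values for illustration only.
results = [
    GeneResult("geneA", 3.1, 0.001),   # significant gain
    GeneResult("geneB", 0.4, 0.004),   # significant loss
    GeneResult("geneC", 2.5, 0.2),     # large change, not significant
    GeneResult("geneD", 1.3, 0.0005),  # significant, but change too small
]
gained, lost = classify(results)
print("gained:", gained)
print("lost:", lost)

# The totals quoted in the review and the response agree: 251 + 194 = 445.
total_changed = 251 + 194
```

      A gene-count threshold of this kind reports how many loci cross a cut-off, whereas a summed fluorescence intensity reports overall signal, which is one reason the two readouts need not scale together.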

      References

      Ancelin, K., Syx, L., Borensztein, M., Ranisavljevic, N., Vassilev, I., Briseño-Roa, L., Liu, T., Metzger, E., Servant, N., Barillot, E., Chen, C.-J., Schüle, R., & Heard, E. (2016). Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation. https://doi.org/10.7554/eLife.08851.001

      Ciccone, D. N., Su, H., Hevi, S., Gay, F., Lei, H., Bajko, J., Xu, G., Li, E., & Chen, T. (2009). KDM1B is a histone H3K4 demethylase required to establish maternal genomic imprints. Nature, 461(7262), 415-418. https://doi.org/10.1038/nature08315

      Shao, G. B., Chen, J. C., Zhang, L. P., Huang, P., Lu, H. Y., Jin, J., Gong, A. H., & Sang, J. R. (2014). Dynamic patterns of histone H3 lysine 4 methyltransferases and demethylases during mouse preimplantation development. In Vitro Cellular and Developmental Biology - Animal, 50(7), 603-613. https://doi.org/10.1007/s11626-014-9741-6

      References

      Xia W, Xu J, Yu G, Yao G, Xu K, Ma X, Zhang N, Liu B, Li T, Lin Z, Chen X, Li L, Wang Q, Shi D, Shi S, Zhang Y, Song W, Jin H, Hu L, Bu Z, Wang Y, Na J, Xie W, Sun YP. Resetting histone modifications during human parental-to-zygotic transition. Science. 2019 Jul 26;365(6451):353-360. doi: 10.1126/science.aaw5118. Epub 2019 Jul 4. PMID: 31273069.

      Binda C, Valente S, Romanenghi M, Pilotto S, Cirilli R, Karytinos A, Ciossani G, Botrugno OA, Forneris F, Tardugno M, Edmondson DE, Minucci S, Mattevi A, Mai A. Biochemical, structural, and biological evaluation of tranylcypromine derivatives as inhibitors of histone demethylases LSD1 and LSD2. J Am Chem Soc. 2010 May 19;132(19):6827-33.

      Fang R, Barbera AJ, Xu Y, Rutenberg M, Leonor T, Bi Q, Lan F, Mei P, Yuan GC, Lian C, Peng J, Cheng D, Sui G, Kaiser UB, Shi Y, Shi YG. Human LSD2/KDM1b/AOF1 regulates gene transcription by modulating intragenic H3K4me2 methylation. Mol Cell. 2010 Jul 30;39(2):222-33. doi: 10.1016/j.molcel.2010.07.008. PMID: 20670891; PMCID: PMC3518444.

      Ancelin K, Syx L, Borensztein M, Ranisavljevic N, Vassilev I, Briseño-Roa L, Liu T, Metzger E, Servant N, Barillot E, Chen CJ, Schüle R, Heard E. Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation. Elife. 2016 Feb 2;5:e08851. doi: 10.7554/eLife.08851. PMID: 26836306; PMCID: PMC4829419.

      Reviewer #3 (Public Review):

      Summary:

      This study explores the dynamic reprogramming of histone modification H3K4me2 during the early stages of mammalian embryogenesis. Utilizing the advanced CUT&RUN technique coupled with high-throughput sequencing, the authors investigate the erasure and re-establishment of H3K4me2 in mouse germinal vesicle (GV) oocytes, metaphase II (MII) oocytes, and early embryos.

      Strengths:

      The findings provide valuable insights into the temporal and spatial dynamics of H3K4me2 and its potential role in zygotic genome activation (ZGA).

      Weaknesses:

      The study primarily remains descriptive at this point. It would be advantageous to conduct further comprehensive functional validation and mechanistic exploration.

      Key areas for improvement include enhancing the innovation and novelty of the study, providing robust functional validation, establishing a clear model for H3K4me2's role, and addressing technical and presentation issues. The text would benefit from the introduction of a novel conceptual framework or model that provides a clear explanation of the functional consequences and molecular mechanisms underlying H3K4me2 reprogramming in the transition from parental to early embryonic development.

      While the findings are significant, the current manuscript falls short in several critical areas. Addressing major and minor issues will significantly strengthen the study's contribution to the field of epigenetic reprogramming and embryonic development.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Summary of the changes

      Changes in the manuscript were made to clarify some ambiguities raised by the reviewers and to improve the report following their recommendations. A summary of the main changes is listed below:

      - The title was changed to better reflect the results of this study.

      - Re-training the model on log-transformed FACS scores.

      - Testing the specificity of the FEPS to facial expression of pain within this experimental setup by comparing it to the activation maps obtained from the Warm stimulation condition.

      - Testing for sensitization/habituation of the behavioral measures (FACS scores and pain ratings).

      - Adding a section in the discussion to better address the limitations of this study and provide potential directions for future studies.

      Other changes target areas where the original manuscript may have been ambiguous or lacked precision. To address these concerns, additional details have been incorporated, and certain terms have been revised to ensure a more precise and transparent presentation of the information.

      Public Reviews:

      Reviewer #1 (Public Review):

      Picard et al. report a novel neural signature of facial expressions of pain. In other words, they provide evidence that a specific set of brain activations, as measured by means of functional magnetic resonance imaging (fMRI), can tell us when someone is expressing pain via a concerted activation of distinctive facial muscles. They demonstrate that this signature provides a better characterization of this pain behaviour when compared with other signatures of pain reported by past research. The Facial Expression of Pain Signature (FEPS) thus enriches this collection and, if further validated, may allow scientists to identify the neural structures subserving important non-verbal pain behaviour. I have, however, some reservations about the strength of the evidence, relating to insufficient characterization of the underlying processes involved.

      We are thankful for the summary of our work. We are hopeful that the modifications made in the latest version effectively address these concerns. The changes are outlined in the summary above, and detailed in the following point-by-point response.

      Strengths:

      The study relies on a robust machine-learning approach, able to capitalise on the multivariate nature of the fMRI data, an approach pioneered in the field of pain by one of the authors (Dr. Tor Wager). This paper extends Wager's and other colleagues' work attempting to identify specific combinations of brain structures subserving different aspects of the pain experience while examining the extent of similarity/dissimilarity with the other signatures. In doing so, the study provides further methodological insight into fine-grained network characterization that may inspire future work beyond this specific field.

      We are thankful for the positive comments.

      Weaknesses:

      The main weakness concerns the lack of a targeted experimental design aimed to dissect the shared variance explained by activations both specific to facial expressions and to pain reports. In particular, I believe that two elements would have significantly increased the robustness of the findings:

      (1) Control conditions for both the facial expressions and the sensory input. An efficient signature should not be predictive of neutral and emotional facial expressions (e.g., disgust) other than pain expressions, as well as it should not be predictive of sensations originating from innocuous warm stimulation or other unpleasant but non-painful stimulation.

      We do recognize the lack of specificity testing for the FEPS, especially towards negative emotional facial expressions. This would be relevant to test given the behavioural overlap between the facial expressions of pain and disgust, fear, anger, and sadness (Kunz et al., 2013; Williams, 2003). The experimental design used in this study did not include other negative states. However, we fully support the necessity of collecting data across those conditions, and we believe that the present study highlights the importance of such a demonstration. Future research should involve recording facial expressions while exposing participants to stimuli that elicit a range of negative emotions but, to our knowledge, such a combination of fMRI and behavioural data is currently unavailable. As raised by the reviewer, this approach would allow us to assess the specificity of the FEPS to the facial expression evoked by pain compared to different affective states. We would like to emphasise that specificity and generalizability testing is a massive amount of work, requiring multiple studies to address comprehensively. A Limitations paragraph addressing this research direction has been added to the Discussion. A conclusion was added to the abstract as follows: “Future studies should explore other pain-relevant manifestations and assess the specificity of the FEPS against other types of aversive or emotional states.”

      (2) Graded intensity of the sensory stimulation: different intensities of the thermal stimulation would have caused a graded facial expression (from neutral to pain) and graded verbal reports (from no pain to strong pain), thus offering a sensitive characterisation of the signal associated with this condition (and the warm control condition).

      However, these conditions are missing from the current design, and therefore we cannot make a strong conclusion about the generalisability of the signature (regardless of whether it can predict better than other signatures - which may/may not suffer from similar or other methodological issues - another potential interesting scientific question!). The authors seem to work on the assumption that the trials where warm stimulation was delivered are of no use. I beg to disagree. As per my previous comment, warm trials (and associated neutral expressions) could be incorporated into the statistical model to increase the classification sensitivity and precision of the FEPS decoding.

      The experience of pain can fluctuate for a fixed intensity or after controlling statistically for the intensity of the stimulation (Woo et al., 2017). Consistent with this, the current study focused on spontaneous facial expression in response to noxious thermal stimuli delivered at a constant intensity that produced moderate to strong pain in every participant. As the reviewer points out, this does not allow us to characterise and compare the stimulus-response function of facial expression and pain ratings. The advantage of the approach adopted is to maximise the number of trials where facial expression is more likely to occur, while ensuring that changes in facial expression and pain ratings are not confounded with changes in stimulus intensity. The manuscript has been revised to clarify that point. However, we do agree that it would be interesting to conduct more studies focusing on facial expression in response to a range of stimulus intensities. This discussion has been added to the Limitations paragraph.

      Furthermore, following the reviewer’s suggestion, we performed complementary analyses on the warm trials in the proposed revisions. The dot product (FEPS scores) between the FEPS and the activation maps associated with the warm condition was computed. A linear mixed model was conducted to investigate the association between FEPS scores and the experimental condition (warm vs pain). The trials in the pain condition were divided into two conditions: null FACS scores (painful trials with no facial response; FACS scores = 0) and non-null FACS scores (painful trials with a facial response; FACS > 0). The details of this analysis have been added to the manuscript (see Response of the FEPS to pain and warm section in the Methods; lines 427 to 439) as well as the corresponding results (see Results and Discussion; lines 138 to 158). The FEPS scores were larger in the pain condition where a facial response was expressed, compared to both the pain condition without facial expression and the warm condition. These results confirmed the sensitivity of the FEPS to facial expression of pain.
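
      The FEPS scores described above are dot products between the signature weight map and each trial's activation map. A minimal sketch of that computation follows, assuming both maps have been flattened to equal-length lists of voxel values. The function names, condition labels, and toy numbers are hypothetical and are not study data. The sketch only averages scores per condition; it does not reproduce the linear mixed model reported in the manuscript.

```python
def pattern_expression(weights, activation_map):
    """Dot product between a signature weight map and one trial's activation map."""
    if len(weights) != len(activation_map):
        raise ValueError("weight map and activation map must have the same number of voxels")
    return sum(w * a for w, a in zip(weights, activation_map))


def mean_score_by_condition(weights, trials):
    """Average the pattern expression score within each condition label."""
    totals, counts = {}, {}
    for condition, activation_map in trials:
        score = pattern_expression(weights, activation_map)
        totals[condition] = totals.get(condition, 0.0) + score
        counts[condition] = counts.get(condition, 0) + 1
    return {condition: totals[condition] / counts[condition] for condition in totals}


# Toy three-voxel example (made-up values, for illustration only).
weights = [0.5, -1.0, 2.0]
trials = [
    ("warm", [0.0, 1.0, 0.0]),
    ("pain_facs_zero", [1.0, 0.0, 0.0]),      # painful trial, FACS score = 0
    ("pain_facs_positive", [1.0, 0.0, 1.0]),  # painful trial, FACS score > 0
    ("pain_facs_positive", [2.0, 1.0, 1.0]),
]
condition_means = mean_score_by_condition(weights, trials)
```

      In a real analysis the per-trial scores, rather than the condition means, would enter the mixed model so that the repeated measurements within each participant are accounted for.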

      Reviewer #2 (Public Review):

      Summary:

      The objective of this study was to further our understanding of the brain mechanisms associated with facial expressions of pain. To achieve this, participants' facial expressions and brain activity were recorded while they received noxious heat stimulation. The authors then used a decoding approach to predict facial expressions from functional magnetic resonance imaging (fMRI) data. They found a distinctive brain signature for pain facial expressions. This signature had minimal overlap with brain signatures reflecting other components of pain phenomenology, such as signatures reflecting subjective pain intensity or negative effects.

      We appreciate this concise and accurate summary of our study.

      Strength:

      The manuscript is clearly written. The authors used a rigorous approach involving multivariate brain decoding to predict the occurrence and intensity of pain facial expressions during noxious heat stimulation. The analyses seem solid and well-conducted. I think that this is an important study of fundamental and clinical relevance.

      Weaknesses:

      Despite those major strengths, I felt that the authors did not sufficiently explain their own interpretation of the significance of the findings. What does it mean, according to them, that the brain signature associated with facial expressions of pain shows a minimal overlap with other pain-related brain signatures?

      We express our sincere gratitude for the valuable insights and constructive comments on the strengths and weaknesses of the current study. We thank reviewer 2 for the encouragement to reinforce our interpretation of the significance of the findings, while acknowledging the limitations raised by the three reviewers.

      A few questions also arose during my reading.

      Question 1: Is the FEPS really specific to pain expressions? Is it possible that the signature includes a facial expression signal that would be shared with facial expressions of other emotions, especially since it involves socio-affective regulation processes? Perhaps this question should be discussed as a limit of the study?

      We acknowledge this limitation as outlined in response to Reviewer #1. We have incorporated a Limitations paragraph to provide a more in-depth discussion of this limitation and to explore potential future avenues (lines 225 to 268). Again, please note that the demonstration of specificity is an incremental process that requires a systematic comparison with other conditions where facial expressions are produced without pain. A concluding sentence was added to the abstract to encourage specificity testing in future studies, as indicated above.

      Question 2: All AUs are combined together in a composite score for the regression. Given that the authors have other work showing that different AUs may be associated with different components of pain (affective vs. sensory), is it possible that combining all AUs together has decreased the correlation with other pain signatures? Or that the FEPS actually reflects multiple independent signatures?

      The question raised is consistent with the work of Kunz, Lautenbacher, LeBlanc and Rainville (2012), and Kunz, Chen and Rainville (2020). In the current study, the pain-relevant action units were combined in order to increase the number of trials where a facial response to pain was expressed, thus enhancing the robustness of our analyses. Given the limited sample size, our current dataset is unfortunately insufficient to perform such analysis as there would not be enough trials to look at the action units separately or in subgroups. While the approach of combining the different AUs has proven to be valid and useful, we recognize the value of investigating potential independent signatures associated with the different AUs within the FEPS, and examining whether those signatures can lead to more similar patterns compared to previously developed pain signatures. This discussion has been included in the Limitations paragraph in the Discussion (lines 225 to 268).

      Question 3: Is facial expressivity constant throughout the experiment? Is it possible that the expressivity changes between the beginning and the end of the experiment? For instance, if there is a habituation, or if the participant is less surprised by the pain, or in contrast if they get tired by the end of the experiment and do not inhibit their expression as much as they did at the beginning. If facial expressivity changes, this could perhaps affect the correlation with the pain ratings and/or with the brain signatures; perhaps time (trial number) could be added as one of the variables in the model to address this question.

      The concern raised by the reviewer is legitimate. We conducted a mixed-effects model to assess the impact of successive trials and runs on facial expressivity. Results indicate that the FACS scores did not change significantly throughout the experiment, suggesting no notable effect of habituation or sensitization on the facial expressivity in our study. Details about the analysis and the results have been added to the Facial Expression section in the Methods (lines 335 to 346).

      Reviewer #3 (Public Review):

      In this manuscript, Picard et al. propose a Facial Expression Pain Signature (FEPS) as a distinctive marker of pain processing in the brain. Specifically, they attempt to use functional magnetic resonance imaging (fMRI) data to predict facial expressions associated with painful heat stimulation. The main strengths of the manuscript are that it is built on an extensive foundation of work from the research group, and that experience can be observed in the analysis of fMRI data and the development of the machine learning model. Additionally, it provides a comparative account of the similarities of the FEPS with other proposed pain signatures. The main weaknesses of the manuscript are the absence of a proper control condition to assess the specificity of the facial pain expressions, a few relevant omissions in the methodology regarding the original analysis of the data and its purpose, and a biased interpretation of the results.

      I believe that the authors partially succeed in their aims, as described in the introduction, which are to assess the association between pain facial expression and existing pain-relevant brain signatures, and to develop a predictive brain activation model of the facial responses to painful thermal stimulation. However, I believe that there is a clear difference between those aims and the claim of the title, and that the interpretation of the results needs to be more rigorous.

      We wish to express our appreciation for the insightful and constructive critique provided. The limitation pertaining to the absence of specificity testing was addressed in response to Reviewer #1, and it has been incorporated into the manuscript (lines 251 to 258).

      The commentary made by Reviewer #3 has drawn our attention to a critical concern, namely the potential misalignment between the study findings and our original title. Consequently, we have changed the title to “A distributed brain response predicting the facial expression of acute nociceptive pain”. We also revised the interpretation of the results in the discussion section and we have added a section on limitations.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      I hope the following comments will be useful to improve the manuscript.

      Abstract

      I felt the abstract could be more clear in terms of experimental or scientific questions, hypotheses/expectations, and findings. I also feel the abstract should briefly support the conclusive claim ("is better than...": how better? Or according to what criterion? This may be more relevant than the final conclusive general sentence that does not specifically address the significance of the findings).

      The abstract was revised to reinforce the functional perspective adopted to interpret brain activity produced by noxious stimuli and predicting various pain-relevant manifestations. We also mention explicitly the other pain-relevant signatures against which the FEPS is compared in this report, and we added a concluding sentence highlighting the importance of assessing the specificity of the FEPS in future studies.

      Introduction - background and rationale

      I would postpone the discussion around pain signature and anticipate the one about the brain mechanisms of facial expressions of pain. This will allow you to reinforce the logical flow of rationale, literature gap/question, why the problem is important, and study aims. Only then go for a review of relevant literature on signatures before providing a more specific final paragraph about the study-specific questions, expectations, and implementation. At the moment this is limited to a single very descriptive short paragraph at the end of the intro.

      The introduction was structured to guide the readers through a comprehensive understanding of different pain neurosignatures. The introduction aimed to establish a robust rationale for the subsequent analyses detailed in the results section. Indeed, the presentation of that literature ensured that the discussion around pain signatures is contextualised within a broader continuous framework. We acknowledge the reviewer’s comment on the limited description of the brain mechanisms of facial expression of pain. However, this was addressed in several previous reports of our laboratory (Kunz et al. 2011; Vachon-Presseau et al. 2016; Kunz, Chen, and Rainville 2020). We have added some more details about the brain mechanisms of facial expression, and highlighted those references in the first paragraph of the introduction.

      Methods and Results

      (1) Was there any indication of power based on the previous work or the other signature papers? If yes, how that would inform the present analysis?

      The NPS was trained on 20 participants that experienced 12 trials at each of four different intensities. The assessment of the effect sizes was performed on the Neurological Pain Signature in Han et al. (2022). That study revealed a moderate effect size for predicting between-subject pain reports, and a large one for predicting within-subject pain reports. We trained our model on 34 participants that underwent 16 trials. We expected our results to show a smaller effect size as the current experimental design only allowed us to examine spontaneous changes in the facial expression, as noted in the comments made by Reviewer #1. However, the best way to calculate the unbiased effect size of the results presented in the current study would be to test the unchanged model on new independent datasets (see Reddan, Lindquist, and Wager, 2017). Unfortunately, such datasets do not currently exist.

      (2) I would clarify to the reader what is meant by normal range of thermal pain and why is this relevant. Also, I did not find data about this assessment nor about the assessment of facial expressiveness (or reference to where it can be found).

      We changed this formulation to “All participants included in this study had normal thermal pain sensitivity” and we added a few references. By targeting a healthy population with normal thermal pain sensitivity, our study sought to identify a predictive brain pattern related to facial expression evoked by typical responses to pain that could eventually be generalised to other individuals from the same population. Details about the assessment of facial expressiveness have been added in the appropriate section in the Methods.

      (3) That pain ratings are only weakly associated with facial responses is, in its own right, an interesting finding, as a naïve reader would expect the two to be highly positively correlated. I'd suggest discussing this aspect (in reference to previous research) as it is interesting on both theoretical and empirical grounds.

      The likelihood and the strength of pain facial expression generally increase with pain ratings in response to acute noxious stimuli of increasing physical intensities, thereby leading to a positive association between the two responses that is driven by the stimulus. However, the poor correlation, or dissociation, between facial pain expression and pain rating is a very well-known phenomenon that can be demonstrated easily using experimental methods where the stimulus intensity is held constant and spontaneous fluctuations are observed in both facial expression and pain ratings. This result was not discussed in the current manuscript as it was already addressed in the work of Kunz et al. (2011) and Kunz, Karos and Vervoort (2018). We added the references to these studies in the revised manuscript (lines 330 to 334).

      (4) It may be worth having CIs throughout the whole set of analyses.

      Thanks for the suggestions, this was an oversight. The confidence intervals have been added in the manuscript where applicable.

      (5) I would clarify if there are two measures of the brain signature: dot-product and activation map. Relatedly, I cannot find where the authors explained what "FEPS pattern expression scores" are. Can the authors please clarify?

      The clarification has been added in the manuscript (lines 413 to 414).

      (6) There seems to be the assumption that the relationship between pain-relevant brain signatures and facial expressions of pain would be parametric and linear. However, this might not hold true. Did the authors test these assumptions?

      We indeed decided to use a linear regression technique (i.e. LASSO regression) to model the association between the brain activity and the facial expression of pain. The algorithm choice was mainly based on the simplicity and the interpretability of that approach, and our limited number of observations. The choice was also coherent with previous studies in the domain (e.g. Wager et al., 2011; Wager et al., 2013; Krishnan et al. 2016; Woo et al., 2017). Using a linear model, we were able to predict the facial expression evoked by pain from the fMRI activation at above-chance level. However, it is legitimate to think that more complex nonlinear models could better capture the brain patterns predictive of that behavioural manifestation of pain.

      (7) Did the authors assess whether the FACS were better to be transformed/normalised? More generally, I would report any data assessment/transformation that has not been reported.

      Thank you for this highly relevant suggestion. FACS scores were indeed not normally distributed and the analyses were conducted again to predict the log transformed FACS scores. This transformation was effective to normalize the distribution (skewness = 0.75, kurtosis = -0.84). The predictive model was confirmed on transformed data.
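
      The effect of a log transformation on distribution shape can be checked with a few lines of code. The sketch below rests on two assumptions. First, it uses log(1 + x) so that trials with a FACS score of 0 remain defined; the exact transformation is not specified in this response. Second, it uses population (biased) moment estimators for skewness and excess kurtosis. The function names and the toy scores are hypothetical, and the sketch is not expected to reproduce the values reported above.

```python
import math


def log_transform(scores):
    """Apply log(1 + x) so that zero scores stay defined (assumed variant)."""
    return [math.log1p(score) for score in scores]


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; shape statistics are undefined")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Toy right-skewed scores with several zeros (made-up values).
raw_scores = [0, 0, 1, 2, 4, 10, 30]
raw_skew, raw_kurt = skewness_and_excess_kurtosis(raw_scores)
log_skew, log_kurt = skewness_and_excess_kurtosis(log_transform(raw_scores))
```

      For a right-skewed variable such as the toy scores, the skewness of the transformed values is closer to zero than that of the raw values, which is the property the transformation is meant to deliver.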

      (8) Page 12: I am not clear on whether all the signatures are included in the same model (like a multiple regression) or if separate regressions are calculated per signature. The authors seem to imply that several regressions have been computed (possibly one per comparison with each signature?).

      The correlation between the FACS scores and the pain-related signatures was computed separately for each signature. This information has been clarified.

      (9) MVPA: See my main comment about warm trials and experimental/statistical design. For example, the LASSO regression model for the pain trials could be compared with a model using warm trials besides (or instead of) the unfitted model. Otherwise, add the warm trials as another predictor or within the subject level in a dummy fixed factor comprising pain and warm trials.

      The inclusion of warm trials in the model training would be inconsistent with the goal of the main analysis, which is to predict the facial expression of pain when a noxious stimulus is presented. Secondary analyses were conducted to compare the response of the FEPS to the warm trials with its response to the noxious pain trials. The dot product between the FEPS and the activation maps (FEPS scores) associated with the warm condition was computed. A linear mixed model was conducted to investigate the association between FEPS scores and the experimental condition (warm vs pain). Additional contrasts compared the warm trials with the pain trials with and without pain facial expression. The details of this analysis have been added to the manuscript (see Response of the FEPS to pain and warm in the Methods) as well as the corresponding results (see Results and Discussion).

      (10) I would clarify for the reader why the separate M1 analysis has been run. Although obvious, I feel the reader would benefit from the specific hypothesis about this control analysis being spelled out together with the other statistical hypotheses within the statistical design in a more streamlined manner.

      We extended the discussion on the rationale of that analysis and its interpretation taking into account the most recent results using the log transformed FACS scores (lines 125 to 133).

      (11) The mixed model aimed to assess the relationship between pain ratings FEPS scores and facial scores is a crucial finding. I believe it speaks to the importance of a more complete design, which I already highlighted. I have a couple of technical questions: did the authors assess random slopes too? And, what was the strategy used to determine the random effects structure?

      The linear mixed model treated participants as a random effect, with random intercepts, to account for the grouping structure in our data (i.e., each participant completed multiple trials). The results reported in the original manuscript assumed fixed slopes. However, following the reviewer’s comment, we re-computed the linear mixed models allowing the slopes to vary according to the intensity ratings. The results were changed in the manuscript to represent the output of those models.
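
      For readers less familiar with the distinction, a generic random-intercept, random-slope model can be written as follows. The notation is an assumption introduced here for illustration: y_{ij} is the outcome on trial i of participant j and x_{ij} is the intensity rating on that trial. This response does not state which variable serves as the outcome, so the equation should be read as a template rather than as the exact model that was fitted.

```latex
\[
  y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, x_{ij} + \varepsilon_{ij},
  \qquad
  \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \Sigma),
  \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

      The fixed-slope version corresponds to setting u_{1j} = 0 for every participant, so that only the intercept varies across participants.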

      (12) The text from lines 63 to 67 could go in the methods.

      We decided to keep those lines within the Results and Discussion section to give the reader more detail about the FACS scores, as this term is referenced again later in that section. We are concerned that putting this information only in the Methods section would disrupt the reading.

      Reviewer #2 (Recommendations For The Authors):

      p. 4-5. When you report the positive weight clusters, you follow up with a sentence specifying which cognitive processes those brain regions are typically associated with. However, when you report the negative weight clusters, you do not specify the cognitive processes typically associated with those brain areas. I think that providing that information would be helpful to the readers.

      Thanks for noticing this omission. The information has been added in the most recent version of the manuscript (lines 119 to 121).

      p. 9. You specify that the degree of expressiveness of participants was evaluated. How did you evaluate expressiveness? Did you use this variable in your analyses? Were participants excluded based on their degree of expressiveness?

      Details about the assessment of facial expressiveness have been added in the appropriate section in the Methods (lines 285 to 289).

      p. 10. You explain that two certified FACS-coders evaluated the video recordings to rate the frequency of AUs. Could you please provide more details about the frequency measure? I think that there are different ways in which this could have been done. For instance, were the videos decomposed into frames, and then the frequency measured by summing the number of frames in which the AU occurred? Or was it "expression-based", so one occurrence of an AU (frequency of 1) would correspond to the whole period between its activation onset and offset? Both ways have pros and cons. For example, if the frequency represents the number of frames, then it controls for the total duration of the AU activation within a trial (pro); but if there were multiple activations/deactivations of the AU within one trial, this will not be controlled for (con). And vice-versa with the second way of calculating frequency.

      Details about the frequency scores have been added to the manuscript (lines 315 to 319).
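
      The two counting schemes contrasted in the reviewer's question can be made concrete with a short sketch. It assumes that each trial has been coded frame by frame as a sequence of 0/1 flags indicating whether a given AU is active. That coding scheme, the function names, and the toy trial are hypothetical; the sketch does not describe the scoring procedure actually used in the study.

```python
def frame_based_frequency(active_frames):
    """Count the frames in which the AU is active (reflects total duration)."""
    return sum(1 for frame in active_frames if frame)


def expression_based_frequency(active_frames):
    """Count distinct activations, i.e. off-to-on transitions of the AU."""
    count = 0
    previously_active = False
    for frame in active_frames:
        if frame and not previously_active:
            count += 1
        previously_active = bool(frame)
    return count


# Toy trial: the AU switches on for three frames, off for two, then on for one.
toy_trial = [0, 1, 1, 1, 0, 0, 1, 0]
frames_active = frame_based_frequency(toy_trial)        # 4
activations = expression_based_frequency(toy_trial)     # 2
```

      The same trial yields 4 under the frame-based count and 2 under the expression-based count, which illustrates why the definition of "frequency" needs to be stated explicitly.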

      p. 11. When you explained how you calculated the association between the facial expression of pain and pain-related brain signatures, I felt that there was some information missing. Did you use the thresholded maps (available in the published articles), or did you somehow have access to the complete, voxel-by-voxel, raw regression coefficient maps?

      The unthresholded maps were used. The information has been clarified in the latest version of the manuscript, as well as the details about the availability of the maps (see Data Availability section at the end of the manuscript).

      Reviewer #3 (Recommendations For The Authors):

      Format

      The authors will notice that many observations about the manuscript are related to missing information and a lack of graphical representations. I believe the topic and the content of the manuscript are too complex to condense into a short report.

      Title

      The claim of the title is simply not substantiated by the content of the manuscript. Demonstrating that the FEPS is a distinctive (i.e., specific) marker of pain processing requires a substantially different experimental design, with more rigorous controls and a broader set of painful stimulations. The manuscript would benefit from a more accurate title.

      We agree that the title could better align with our findings. We modified the title accordingly: “A distributed brain response predicting the facial expression of acute nociceptive pain”.

      Abstract

      I find it puzzling that the authors claim that there is limited knowledge of the neural correlates of facial expression of pain given what they describe in the first paragraph of the introduction. Besides, they propose to reanalyze a dataset that has been extensively described in Kunz et al. (2011), which is unlikely to provide any new significant information.

      We respectfully disagree with that comment. We consider that three articles (i.e., Kunz et al., 2011; Vachon-Presseau et al., 2016; Kunz, Chen and Rainville, 2020) on the topic do constitute limited knowledge, especially when compared with the very large body of literature on the neural correlates of pain ratings. Except for these three studies, all the other citations pertain to behavioral studies on facial expression of pain, and do not examine the brain activity related to it. Furthermore, we believe that the complementary nature of the analyses performed in Kunz et al. (2011) and in this manuscript offers new insights into our understanding of facial expression in the context of pain. Indeed, the multivariate approach used in this study addresses some limitations of the univariate analyses in Kunz et al. (2011), mainly by providing a quantifiable way to compare the similarity between different predictive patterns (Reddan and Wager, 2017). We submit that the assessment of the FEPS against several other pain-relevant signatures provides new and important information.

      Furthermore, the abstract does not clearly state the aim, and the first line of the results does not match what the authors claim in the preceding line. The take-home message (last sentence) introduces the concept of a biomarker, which, as stated before, cannot be validated with the current data/experimental design. To put it in plain words, a given facial expression (or a composite score derived from a combination of expressions) cannot be a specific biomarker for pain, because a person can always mimic the same expression without feeling pain. Whether a given facial expression can be predicted from brain activity is a different issue, and whether that prediction can differentiate between painful and non-painful origins of the facial expression is another different issue. Unfortunately, neither of those issues can be tested with the current data/experimental design. The abstract would improve if the authors would circumscribe to what they actually tested, which is accurately described in the last sentence of the Introduction.

      The abstract was revised accordingly. The term ‘biomarker’ was used in accordance with preceding studies in the field (see Reddan and Wager, 2017; Lee et al., 2021). Please note that we applied the same reasoning to fluctuations in pain expression as previous studies have applied to pain ratings. Of course, we cannot dismiss the possibility of someone mimicking facial expressions. Similar reasoning applies to subjective reports, as individuals can intentionally overestimate their pain experience in verbal reports. This is another case of specificity testing that cannot be addressed in the present study (see new conclusion of the abstract and discussion of limitations). The challenge of pain assessment is a classical problem within both the scientific and the clinical literature. Here, we suggest that the consideration of multiple manifestations of pain is necessary to address this challenge and will provide a more comprehensive portrait of pain-related brain function.

      Introduction

      I believe that the Introduction would benefit from a strict definition of what is a marker/biomarker/neuromarkers (all those terms are used in the manuscript) and what are its desirable features (validity, reliability, specificity, etc.). I also believe that the Introduction (and the rest of the text) would benefit from a critical assessment of the term "signature". The Introduction describes four existing "signatures", all of them differing in the experimental condition in which acute nociceptive pain is studied, and proposes a fifth one. Keeping with the analogy, I'm wondering whether they should be called (pain) "signatures" if there is a different one for each experimental acute pain condition, and they are so dissimilar between them when they are tested on the same condition (this dataset).

      The last part of that comment raises fundamental methodological questions that go beyond the scope of a research article and should be addressed in more depth in a separate article. Regarding the stability of the signatures, most of them have not been studied extensively, so it is currently difficult to assess their reliability. However, Han et al. (2022) showed high within-individual test-retest reliability for the NPS across eight different studies. Given that pain is a multidimensional experience, it is not surprising to find different patterns of activation predictive of different aspects or dimensions of the pain experience (see Čeko et al., 2022 for a similar discussion applied to negative affect).

      The authors state that "As an automatic behavioral manifestation, pain facial expression might be an indicator of activity in nociceptive systems, perceptual and evaluative processes, or general negative affect." Doesn't it reflect all three of them? (and instead of or?) Why "might"?

      The original sentence has been modified as follows: “As an automatic behavioral manifestation, pain facial expression is considered to be an indicator of activity in nociceptive systems, and to reflect perceptual and affective-evaluative processes” (lines 65 to 67).

      Methods

      The pain scale should be described. Kunz et al. used a 0-100 scale, where 50 was the pain threshold. This is crucial to interpret the 75-80/100 score for the painful thermal intensity.

      The description of the pain scale has been added to the manuscript (lines 299 to 300).

      Ratings for warm and painful temperatures should be reported (ideally plotted with individual-trial/subject data). In the same line of reasoning, FACS scores should be reported as well (ideally plotted with individual-trial/subject data). It would be interesting to explore the across-trial variability of pain ratings and FACS scores. That is, do people keep giving the same ratings and making the same facial expression after 16 trials? How much variability is between trials and between subjects?

      The point raised in that comment was already addressed in response to a comment made by Reviewer #1 (also see the new Figures S2 and S4; see also lines 335 to 346).

      How come only painful trials are analyzed? What if the FEPS signature was the same for warm and painful stimulation, thus reflecting the settings (fMRI experiment, stimulation, etc.) rather than the brain response to the stimuli?

      The point raised in that comment was already addressed in response to a comment made by Reviewer #1. There was no pain expression in the warm trials and the FEPS shows no response to warm trials. This is now illustrated in the new Figure S4B (see also lines 138 to 158).

      The authors propose to predict the trial-by-trial FACS composite score from the pain ratings using a LMM. However, it is interesting that they aim for an almost constant within- and between-subject pain score (75-80/100) as stated in the Methods. This should theoretically render the linear model invalid since its first (and main) assumption would be that FACS should vary linearly with the pain score. Even if patients were not aware that the temperatures were constant across trials, the variation in pain scores should be explained by random noise for a constant stimulation intensity.

      Reviewer #3 raises an important point that we need to clarify. Contrary to the expectation that FACS responses should be strongly correlated to pain ratings, we posited that these response channels depend at least in part on separate brain networks that may be differentially sensitive to a variety of modulatory mechanisms (attention, emotion, expectancy, motor priming, social context, etc.). This implies that part of the variance in FACS is independent from pain ratings. We, therefore, consider what Reviewer #3 refers to as random noise to be relevant and meaningful fluctuations reflecting endogenous processes influencing one’s experience of pain and differentially affecting various output responses.

      I noticed that fMRI data was analyzed with SPM5 in the original paper (Kunz et al., 2011) and with SPM8 in this manuscript. Was fMRI data re-processed for this manuscript? Were there any differences between the original analysis and this one that might induce changes in the interpretation of results?

      The data were indeed re-processed using SPM8, which was the most recent version available when we started the analyses reported here. We used trial-by-trial activation maps for MVPA, which differs from what was used in the previous study (contrast maps at the level of the conditions, not the trials). We have no reason to believe that the change of version alters the message of this manuscript, since the two versions do not differ significantly in terms of the fMRI preprocessing pipeline (see SPM8 release notes; https://www.fil.ion.ucl.ac.uk/spm/software/spm8/). Furthermore, the aim of the present study is not to compare the different analysis parameters implemented in SPM5 vs SPM8.

      What is the rationale for including PVP in the comparison among signatures? The experimental settings in which it was devised are distant from those described here.

      The inclusion of the PVP was aimed at enhancing our comparative analysis with the FEPS, as we sought to investigate the potential functional meaning of the FEPS. The PVP was developed to capture the aversive value of pain, a dimension that is conceptually proximal to the interpretation of the facial expression as a manifestation of the affective response to nociceptive pain.

      The LASSO-PCR approach is, in my opinion, not a procedure for (brain) decoding in this context. It is accurately described in the section title as a method for multivariate pattern analysis, or as a variable selection and regularization method for a prediction model. Here, brain activity in specific areas related to pain processing can hardly be described as "encoded", and the method just helps select those activations relevant for explaining a certain outcome (in this case, facial expressions).

      We understand the point made by Reviewer #3. The term “brain decoding” was replaced with “multivariate pattern analysis” in the latest version of the manuscript.

      Details are missing with regards to the dataset split into training, validation, and testing.

      Details about the training and testing procedure were added in the manuscript (lines 383 to 385).

      This might just be ignorance from me, so I apologize in advance, but what are "contrast" fMRI images? They are mentioned three times in the text but not really described. Are they the "Pain > Warm" contrasts from the original paper?

      We apologize for any confusion caused by the use of the term “contrast images” which suggests a direct comparison between two experimental conditions. We have replaced “contrast images” with “activation maps” to provide a more accurate description of the nature of the data used in the multivariate pattern analysis (lines 388 to 389).

      In the "Facial expression" section, the authors run an LMM to test the association between pain ratings (response variable) and facial responses (explanatory variable). If I understand correctly, in the "Multivariate pattern analysis" section they test the association between facial composite scores (response variable) and pain ratings (explanatory variable), but they obtain different results.

      The analyses were recomputed on the log transformed data, as mentioned previously in the response to reviewers 1-2. The first model (in the “Facial expression” section) used the log transformed FACS scores as a dependent variable, the pain ratings as the fixed effect, and the participants as the random effect. The results of that analysis suggested that the transformed facial expression scores were not significantly associated with the pain ratings (p = .07). The second model uses both the FEPS pattern expression scores and pain ratings as fixed effects to predict facial responses. This analysis showed the significant contribution of the FEPS to the prediction of FACS scores (p < .001) and no significant effect of the pain ratings. However, a significant interaction was found (p = .03) suggesting that the prediction of the pain facial expression by the FEPS may vary with pain ratings (i.e. moderator effect). Those results have been clarified in the “Multivariate pattern analysis” section in the Methods (lines 416 to 426).

      In this same section, what are "FEPS pattern expression scores"? They are used three times in the text, but I could not find their description.

      The FEPS pattern expression scores correspond to the dot product between the trial-by-trial activation maps and the unthresholded FEPS signature. This information has been added to the manuscript (lines 413 to 414).
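
      The dot-product definition given above can be illustrated with a minimal sketch. The array shapes, the function name, and the toy values are assumptions made for illustration; real activation maps and signature weights would first have to be masked and flattened into a common voxel space.

```python
import numpy as np

def pattern_expression(activation_maps, signature):
    """Dot product of each trial's activation map with the signature weights.

    activation_maps: (n_trials, n_voxels) trial-wise activation estimates
    signature:       (n_voxels,) unthresholded signature weights
    """
    maps = np.asarray(activation_maps, dtype=float)
    weights = np.asarray(signature, dtype=float)
    if maps.shape[1] != weights.shape[0]:
        raise ValueError("maps and signature must cover the same voxels")
    return maps @ weights

# Toy data: 3 trials, 4 voxels (values are made up for illustration).
maps = np.array([[1.0, 0.0, 2.0, 0.5],
                 [0.0, 1.0, 0.0, 1.0],
                 [2.0, 2.0, 1.0, 0.0]])
signature = np.array([0.5, -1.0, 0.25, 2.0])
scores = pattern_expression(maps, signature)
print(scores)  # one scalar score per trial: 2.0, 1.0, -0.75
```

      Each trial is thereby reduced to a single scalar that expresses how strongly its activation map matches the signature, which is what allows the scores to enter a regression alongside pain ratings.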

      It would not be far-fetched to hypothesize that FACS scores could be predicted using solely activity from the motor cortex. The authors attempted to do this, but only with information from M1. Why did they not use the entire motor cortex, or better, regions of the motor cortex directly linked with the AUs described in the manuscript?

      The selection of the primary motor area (M1) was based on the results of Kunz et al. (2011). In that study, M1 showed the strongest correlation with facial expression of pain. Many combinations of brain regions could be considered, using a variety of criteria based on the distributed networks involved in motor, affective, or pain-related processes. For reasons of practical feasibility, we limited our exploration to the region supported by the strongest hypothesis.

      Results and Discussion

      As a general recommendation, results should present individual data whenever possible. For example, the association between signatures and facial expression should be plotted using scatterplots.

      We have added figures showing individual data where applicable (Figure S2; Figure S4).

      The authors state that the LASSO-PCR model accounts for the facial responses to pain. I believe this is an overstatement, considering:

      - A Pearson's r of 0.49 is usually considered low/weak correlation (moderate at best). In the same line, an R2 of 0.17 means that only 17% of the variance is explained by the model.

      A more nuanced interpretation of the results has been added to the discussion. A section has been added to highlight the limitations of the study.
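
      The two figures quoted in the comment can be put side by side: r = 0.49 corresponds to r² ≈ 0.24, whereas the reported R² is 0.17. One possible reason for such a gap, offered here as an assumption and not as a description of how the manuscript computed its metrics, is that an R² defined as 1 − SS_res/SS_tot on held-out predictions penalizes shifted or compressed predictions, while Pearson's r does not. A minimal sketch with made-up numbers:

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Squared correlation for the r quoted in the comment.
print(round(0.49 ** 2, 4))  # 0.2401

# Made-up example: predictions are perfectly correlated with the observations
# but compressed (slope 0.5) and shifted (+3).
y = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
y_hat = 0.5 * y + 3.0
print(pearson_r(y, y_hat))  # approximately 1.0
print(r_squared(y, y_hat))  # 0.625
```

      In this toy case the correlation is perfect, yet R² is only 0.625, so an R² below r² is not in itself contradictory.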

      - Figure 1 needs to display individual subject data and the ideal regression line.

      The model was trained using a k-fold cross-validation procedure. The regression lines thus represent the model’s prediction for each one of the 10 folds (i.e. each fold is trained and tested on a different subset of the data). A scatter plot including the ideal regression line computed across all trials and subjects was added in supplementary material to illustrate the relation between the FACS scores and the FEPS pattern expression scores (Figure S4).
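
      The idea of one regression line per fold can be sketched as follows. This is an illustration under stated assumptions: the data are synthetic, ordinary least squares stands in for LASSO-PCR, and trials are assigned to folds at random; how trials or participants were actually assigned to folds is not specified in this response.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Split shuffled sample indices into k roughly equal test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validated_lines(X, y, k=10, seed=0):
    """Return out-of-fold predictions and one (slope, intercept) line per fold."""
    y_pred = np.empty(len(y), dtype=float)
    lines = []
    for test_idx in kfold_indices(len(y), k, seed):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Fit intercept + weights on the training trials only.
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Predict the held-out trials with the model that never saw them.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        y_pred[test_idx] = A_test @ beta
        # Line relating observed to predicted scores within this fold.
        slope, intercept = np.polyfit(y[test_idx], y_pred[test_idx], 1)
        lines.append((float(slope), float(intercept)))
    return y_pred, lines

# Synthetic data: 200 trials, 5 features (hypothetical values).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -0.5, 0.0, 0.0, 2.0]) + rng.normal(size=200)
y_pred, lines = cross_validated_lines(X, y, k=10)
print(len(lines))  # 10 lines, one per fold
```

      Because every prediction comes from a model that did not see the corresponding trial, the ten lines show how stable the predicted-versus-observed relation is across subsets of the data.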

      - Looking at Figure 1, it is clear that the model has an intercept different from zero. This means that when the FACS score was zero (i.e., volunteers did not make any distinguishable facial expression), the model predicted a score larger than zero. This is not discussed in the manuscript, and in simple terms, it means that there are brain activation patterns when no discernible facial expression is being made by the volunteers. In the original paper by Kunz et al., two groups of subjects were categorized, and one of them was a facially low- or non-expressive group (n=13). This fact is not even mentioned in the manuscript.

      The categorization in the previous report (Kunz et al., 2012) was based on a pre-experimental session. All subjects were included in the current analysis. This is now indicated in the Methods (lines 287 to 289).

      - On the other end of the range in Figure 1, differences between the FACS scores near the maximum range (40) are underestimated by 23 to 33 points! I guess that the RMSE is smaller (6-7 points), because many FACS scores are concentrated on the low end of the scale.

      This is a very interesting comment. A section discussing the limits of the model to predict the lower and higher FACS scores has been added in the manuscript (lines 232 to 250).
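
      The reviewer's arithmetic, namely that a modest overall RMSE can coexist with large errors at the top of the scale when most scores sit at the low end, can be checked with a short sketch. All numbers below are hypothetical and chosen only to illustrate the effect; they are not data from the study.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted scores."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical distribution: 90 low-score trials, each overestimated by 3 points,
# and 10 high-score trials, each underestimated by 20 points.
y_true = np.concatenate([np.full(90, 5.0), np.full(10, 40.0)])
y_pred = np.concatenate([np.full(90, 8.0), np.full(10, 20.0)])

print(round(rmse(y_true, y_pred), 2))            # 6.94 overall
print(round(rmse(y_true[90:], y_pred[90:]), 2))  # 20.0 for the high scores alone
```

      Reporting the error separately for low and high score ranges would make this kind of range-dependent bias visible.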

      It is of course acceptable to interpret the low similarity between signatures as a sign that each signature describes a different mechanism related to pain processing. However, I believe that a complete discussion should contemplate other competing hypotheses. Considering that all signatures were developed using a similar painful thermal stimulation protocol, it is reasonable to expect larger similarities between signatures. The fact that they are so dissimilar could be a reflection of model overfit, i.e., all these signatures are just fitted to these particular experimental protocols and data, and do not generalize to brain mechanisms of pain processing.

      We appreciate the pertinent observation. We have included a limitations section in which we discussed, among other considerations, the possible overfitting of models and the necessity of pursuing generalizability studies (lines 225 to 268).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is an important study on the regulation of chlorophyll biosynthesis in rice embryos. It provides insights into the genetic and molecular interactions that underlie chlorophyll accumulation, highlighting the inhibition of OsGLK1 by OsNF-YB7 and the broader implications for understanding chloroplast development and seed maturation in angiosperms. The results presented, including mutation analysis, gene expression profiles, and protein interaction studies, provide convincing evidence for the function of OsNF-YB7 as a repressor in the chlorophyll biosynthesis pathway.

      Thank you very much for your positive assessment of our manuscript. We have carefully revised the manuscript according to the reviewers’ valuable suggestions and comments. For more details, please see the point-to-point response to the reviewers below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript investigates the regulation of chlorophyll biosynthesis in rice embryos, focusing on the role of OsNF-YB7. The rigorous experimental approach, combining genetic, biochemical, and molecular analyses, provides a robust foundation for these findings. The research achieves its objectives, offering new insights into chlorophyll biosynthesis regulation, with the results convincingly supporting the authors' conclusions.

      Strengths:

      The major strengths include the detailed experimental design and the findings regarding OsNF-YB7's inhibitory role.

      Weaknesses:

      However, the manuscript's discussion on the practical implications for agriculture and the evolutionary analysis of regulatory mechanisms could be expanded.

      Thank you for your insightful comments and suggestions. In the revised manuscript, we discussed the potential application of the chlorophyllous embryo (please see lines 270-274). The presence of chlorophyll in the embryo facilitates photosynthesis at early developmental stages, potentially leading to improved seedling growth and vigor (Smolikova and Medvedev, 2016). In crops such as soybean and canola, the green embryo is considered a valuable trait due to its association with enhanced photosynthetic capacity, which consequently promotes fatty acid biosynthesis (Ruuska et al., 2004). However, chlorophyll degradation must be carefully managed during seed maturation to avoid negative effects on seed viability and meal quality (Chung et al., 2006). Interestingly, the green embryo of lotus (Nelumbo nucifera) is widely used as a food ingredient in Asia, Australia, and North America. It is employed in herbal medicine to treat nervous disorders, insomnia, and other conditions (Zhu et al., 2017; Ha et al., 2022), highlighting the significant potential value of the green embryo.

      In many chloroembryophytes, such as Arabidopsis, the embryo occupies a large proportion of the seed. From an evolutionary perspective, the presence of chlorophyll in the embryo may promote adaptation in such chloroembryophytes because more reserves can be accumulated in the seed through active photosynthesis, better supporting embryo development and subsequent seedling growth (Sela et al., 2020). On the other hand, some leucoembryophytes, such as rice, have a persistent endosperm rich in storage reserves to nourish embryo development (Liu et al., 2022). Gaining the ability to accumulate chlorophyll in the embryo is unnecessary for such species. In agreement with this hypothesis, chlorophyllous embryos are more prevalent in non-endospermous seeds (Dahlgren, 1980). However, we would like to emphasize that the evolutionary force driving the divergence of chloroembryophytes and leucoembryophytes is currently almost completely unknown and deserves in-depth investigation in the future. We discussed the possible evolution of the ability to accumulate chlorophyll in the embryo; please find the details in lines 276-295.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to establish the role of the rice LEC1 homolog OsNF-YB7 in embryo development, especially as it pertains to the development of photosynthetic capacity, with chlorophyll production as a primary focus.

      Strengths:

      The results are well-supported and each approach used complements each other. There are no major questions left unanswered and the central hypothesis is addressed in every figure.

      Weaknesses:

      There are a handful of sections that could use clarifying for readers, but overall this is a solidly composed manuscript.

      The authors clearly achieved their aims; the results compellingly establish a disparity between how this system operates in rice and Arabidopsis. Conclusions are thoroughly supported by the provided data and interpretations. This work will force a reconsideration of the value of Arabidopsis as a model organism for embryo chlorophyll biosynthesis and possibly photosynthesis during embryo maturation more broadly, as rice is a major crop organism and it very clearly does not follow the Arabidopsis model. It will thus be useful to carry out similar tests in other organisms rather than relying on Arabidopsis and attempting to more fully establish the regulatory mechanism in rice.

      Thank you very much for your positive comments. We have carefully revised the manuscript according to your and the other reviewers’ comments and suggestions. In particular, we emphasized the necessity of carrying out similar tests in other organisms, rather than relying on Arabidopsis, to better understand the regulatory mechanism in rice.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors set out to understand the mechanisms behind chlorophyll biosynthesis in rice, focusing in particular on the role of OsNF-YB7, an ortholog of Arabidopsis LEC1, which is a positive regulator of chlorophyll (Chl) biosynthesis in Arabidopsis. They showed that OsNF-YB7 loss-of-function mutants in rice have chlorophyll-rich embryos, in contrast to Arabidopsis LEC1 loss-of-function mutants. This contrasting phenotype led the authors to carry out extensive molecular studies on OsNF-YB7, including in vitro and in vivo protein interaction studies, gene expression profiling, and protein-DNA interaction assays. The evidence provided well supported the core arguments of the authors, emphasising that OsNF-YB7 is a negative regulator of Chl biosynthesis in rice embryos by mediating the expression of OsGLK1, a transcription factor that regulates downstream Chl biosynthesis genes. In addition, they showed that OsNF-YB7 interacts with OsGLK1 to negatively regulate the expression of OsGLK1, demonstrating the broad involvement of OsNF-YB7 in rice Chl biosynthetic pathways.

      Strengths:

      This study clearly demonstrated how OsNF-YB7 regulates its downstream pathways using several in vitro and in vivo approaches. For example, gene expression analysis of OsNF-YB7 loss-of-function and gain-of-function mutants revealed the expression of selected downstream chl biosynthetic genes. This was further validated by EMSA on the gel. The authors also confirmed this using luciferase assays in rice protoplasts. These approaches were used again to show how the interaction of OsNF-YB7 and OsGLK1 regulates downstream genes. The main idea of this study is very well supported by the results and data.

      Weaknesses:

      From an evolutionary perspective, it is interesting to see how two similar genes have come to play opposite roles in Arabidopsis and rice. It would have been more interesting if the authors had carried out a cross-species analysis of AtLEC1 and OsNF-YB7. For example, overexpressing AtLEC1 in an osnf-yb7 mutant to see if the phenotype is restored or enhanced. Such an approach would help us understand how two similar proteins can play opposite roles in the same mechanism within their respective plant species.

      We appreciate your insightful comments and suggestions. It is a very interesting question whether AtLEC1 can fully restore osnf-yb7, given the possible functional divergence between the genes in terms of regulation of chlorophyll biosynthesis in the embryo. We have previously expressed OsNF-YB7 in the lec1-1 background in Arabidopsis, driven by the native promoter of LEC1 (Niu et al., 2021). We found that OsNF-YB7 could almost completely rescue the embryo defects in Arabidopsis, indicating that OsNF-YB7 plays a role in rice similar to that of LEC1 in Arabidopsis (Niu et al., 2021). We sought to determine whether AtLEC1 can complement the chlorophyll defect in osnf-yb7. However, osnf-yb7 shows a severe callus induction defect, which is not surprising because many studies have shown that LEC1 is indispensable for somatic embryo development in various plant species; as a result, we are struggling to obtain the genetic materials for analysis. We have to transform OsNF-YB7pro::AtLEC1 into the WT background first, and then cross the transformant with the osnf-yb7 mutant. This is a time-consuming process in rice, but hopefully we will be able to isolate a line expressing OsNF-YB7pro::AtLEC1 in the osnf-yb7 background from the resulting segregating population.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A minor comment regarding the chlorophyll contents quantification in the study. Line 87: "The results showed that WT had an achlorophyllous embryo throughout embryonic development,...." In the TEM result, chloroplast was not observed in the WT embryo sections, indicating a lack of chlorophyll-containing structures, contrary to what was found in the osnf-yb7 embryos where chloroplasts were observed.

      The authors stated that the embryo morphologies and Chl autofluorescence data showed that WT had an achlorophyllous embryo throughout embryonic development. However, the quantification of Chl levels in Figure 1D and Figure 4C showed that WT does produce some chlorophylls, albeit at lower levels than osnf-yb7 or OSGLK-OX embryos (WT values in the two figures are slightly different). This discrepancy warrants clarification to ensure consistency and accuracy in the manuscript's findings.

      We re-evaluated the Chl content in the embryos of WT and OsGLK1-OX mature seeds. The result confirmed our previous finding that WT embryos produce a small amount of chlorophyll (please see the updated Fig. 4C). Notably, we observed that the dark-grown etiolated plants still have measurable chlorophyll content as reported in many studies (for example, Wang et al., 2017; Yoo et al., 2019), suggesting that there is potential bias in measuring chlorophyll content using an absorbance-based approach. We assume this possibly explains the concern you have raised.

      Reviewer #2 (Recommendations For The Authors):

      Mild editing for grammar is needed throughout, e.g. line 73, "It is still a mysterious why plant species".

      We have carefully edited the grammar.

      As a minor point, the placement of figure panels, such as in Figure 1, is not always intuitive.

      Thank you for your suggestion. This figure has been revised as suggested. Please see the updated Fig. 1.

      What is the significance of the two GFP mutants in Figures 2C and 2D? Is one of those the mislabeled Flag mutant?

      The lines shown in Fig. 2C and D were not mislabeled. They were two independent transgenic events, both of which showed that OsNF-YB7 inhibited the expression of OsPORA and OsLHCB4 in rice. Transgenic lines overexpressing OsNF-YB7 tagged with 3× Flag (NF-YB7-Flag) were also used for this experiment. In agreement, OsPORA and OsLHCB4 were significantly downregulated in the three independent NF-YB7-Flag lines (Fig. S4C), confirming the results shown in Fig. 2C and D.

      In Figures 2G and 2H, what is that enormous band at the bottom of the gel?

      The bands at the bottom of the gel were free probes. We indicated this in the revised figure.

      Not until the Materials and Methods section did I realize that any of this study was being done in tobacco; the Introduction implies it's rice vs. Arabidopsis and it might be a good idea to mention the organism of study somewhere before Figure 6.

      We apologize for any confusion caused by our previous writing. While the majority of this study was performed with rice plants or protoplasts, the split complementary LUC assays and BiFC assays were performed with tobacco. We have specified these in the revised manuscript as suggested.

      Reviewer #3 (Recommendations For The Authors):

      It would be nice if the author could show what the phenotype is in AtLEC1 OX in osnf-yb7 and also OsNF-YB7 OX in atlec1 mutants.

      Thank you for your suggestion. We have previously expressed OsNF-YB7 in the lec1-1 background of Arabidopsis, driven by the native promoter of Arabidopsis LEC1 (Niu et al., 2021). Since OsNF-YB7 could rescue the embryo morphogenesis defects in Arabidopsis (Niu et al., 2021), we assumed that OsNF-YB7 plays a role in rice similar to that of LEC1 in Arabidopsis. However, it remains unknown whether expression of LEC1 in osnf-yb7 can restore the wild-type embryo phenotype in rice. As the generation of genetic material is time-consuming, and especially given that osnf-yb7 has a severe callus induction defect, we are struggling to obtain the complementation line for analysis. We have to transform OsNF-YB7pro::AtLEC1 into a WT background first, and then cross the transformant with the osnf-yb7 mutant. Hopefully, we will be able to isolate a line expressing OsNF-YB7pro::AtLEC1 in the osnf-yb7 background from the derived segregating population. We discussed the reviewer’s concern in the revised manuscript; please see lines 369-376.

      Line 46, I think it is vague to mention that 'Like most plant species'. Some species might have different copy numbers, for example, a single GLK in liverwort M. polymorpha.

      The statement has been revised. Please see Line 46.

      Figures 2F and 5B, why was only one promoter region used for OsLHCB4? It would be better to have more regions like OsPORA.

      Thank you for your comments. We have examined more promoter regions (P1, P2 and P3) in the revised manuscript as suggested. Among these, the previously selected promoter region (P3) contains both the G-box and CCAATC motifs that can potentially be recognized by GLK1. Consistent with our previous report, the results showed that OsNF-YB7 (left) and OsGLK1 (right) were associated with the P3 region, but showed no significant differences for the other probes. Please see the results in Fig. 2F and Fig. 5B of the revised manuscript.

      Legend of Figures 2G, H, OsPORA (I), and OsLHCB (J) should be (G) and (H) respectively.

      Corrected.

      References

      Chung, D.W., Pruzinska, A., Hortensteiner, S., and Ort, D.R. (2006). The role of pheophorbide a oxygenase expression and activity in the canola green seed problem. Plant Physiol 142, 88-97.

      Ha, T., Kim, M.S., Kang, B., Kim, K., Hong, S.S., Kang, T., Woo, J., Han, K., Oh, U., Choi, C.W., and Hong, G.S. (2022). Lotus Seed Green Embryo Extract and a Purified Glycosyloxyflavone Constituent, Narcissoside, Activate TRPV1 Channels in Dorsal Root Ganglion Sensory Neurons. J Agric Food Chem 70, 3969-3978.

      Liu, J., Wu, M.W., and Liu, C.M. (2022). Cereal Endosperms: Development and Storage Product Accumulation. Annu Rev Plant Biol 73, 255-291.

      Niu, B., Zhang, Z., Zhang, J., Zhou, Y., and Chen, C. (2021). The rice LEC1-like transcription factor OsNF-YB9 interacts with SPK, an endosperm-specific sucrose synthase protein kinase, and functions in seed development. Plant J 106, 1233-1246.

      Ruuska, S.A., Schwender, J., and Ohlrogge, J.B. (2004). The capacity of green oilseeds to utilize photosynthesis to drive biosynthetic processes. Plant Physiol 136, 2700-2709.

      Sela, A., Piskurewicz, U., Megies, C., Mene-Saffrane, L., Finazzi, G., and Lopez-Molina, L. (2020). Embryonic Photosynthesis Affects Post-Germination Plant Growth. Plant Physiol 182, 2166-2181.

      Smolikova, G.N., and Medvedev, S.S. (2016). Photosynthesis in the seeds of chloroembryophytes. Russ J Plant Physl+ 63, 1-12.

      Wang, Z., Hong, X., Hu, K., Wang, Y., Wang, X., Du, S., Li, Y., Hu, D., Cheng, K., An, B., and Li, Y. (2017). Impaired Magnesium Protoporphyrin IX Methyltransferase (ChlM) Impedes Chlorophyll Synthesis and Plant Growth in Rice. Front Plant Sci 8, 1694.

      Yoo, C.Y., Pasoreck, E.K., Wang, H., Cao, J., Blaha, G.M., Weigel, D., and Chen, M. (2019). Phytochrome activates the plastid-encoded RNA polymerase for chloroplast biogenesis via nucleus-to-plastid signaling. Nat Commun 10, 2629.

      Zhu, M., Liu, T., Zhang, C., and Guo, M. (2017). Flavonoids of Lotus (Nelumbo nucifera) Seed Embryos and Their Antioxidant Potential. J Food Sci 82, 1834-1841.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Reviews):

      Summary: 

      The authors use a combination of biochemistry and cryo-EM studies to explore a complex between the cap-binding complex and an RNA binding protein, ALYREF, that coordinates mRNA processing and export.

      Strengths: 

      The biochemistry and structural biology are supported by mutagenesis which tests the model in vitro. The structure provides new insight into how key events in RNA processing and export are likely to be coordinated.

      Weaknesses: 

      The authors provide biochemical studies to confirm the interactions that they identify; however, they do not perform any studies to test these models in cells or explore the consequences of mRNA export from the nucleus. In fact, several of the amino acids that they identified in ALYREF that are critical for the interaction, as determined by their own biochemical studies, are conserved in budding yeast Yra1 (residues E124/E128 are E/Q in budding yeast and residues Y135/V138/P139 are F/S/P), where the impact on poly(A) RNA export from the nucleus could be readily evaluated. The authors could at least mention this point as part of the implications and the need for future studies. No one seems to have yet targeted any of these conserved residues, so this would be a logical extension of the current work.

      We thank the reviewer for the feedback on our work. ALYREF coordinates pre-mRNA processing and export through interactions with a plethora of mRNA biogenesis factors including the DDX39B subunit of the TREX complex, CBC, EJC, and 3’ processing factors. ALYREF mediates the recruitment of the TREX complex on nascent transcripts which depends on its interactions with both CBC and EJC. Our work and studies by others indicate that ALYREF uses overlapping interfaces including both the N-terminal WxHD motif and the RRM domain to bind CBC and EJC. Thus, ALYREF mutants deficient in CBC interaction will also disrupt the ALYREF-EJC interaction and are not ideal for functional studies. In addition, the CBC plays important roles in multiple steps of mRNA metabolism through interactions with a plethora of factors, which often interact competitively with CBC. Identification of separation-of-function mutations on CBC or ALYREF that specifically disrupt their interaction but not other cellular complexes containing CBC or ALYREF would be an important future area to test the model in cells. 

      We appreciate the reviewer’s insightful comments regarding yeast Yra1. Thus far, the physical and functional connection between Yra1 and CBC in yeast has not been demonstrated. There are major differences between yeast Yra1 and human ALYREF. Given the lack of an EJC in S. cerevisiae, it is unclear whether Yra1 acts in a similar manner as human ALYREF. In addition, Yra1 does not contain a WxHD motif in its N-terminal unstructured region, which is involved in CBC and EJC interactions in ALYREF. Characterization of the Yra1-CBC interaction will be an interesting future direction. We now include a discussion about yeast Yra1 in the newly added “Conclusion and perspectives” section. 

      Specific suggestions:

      The authors could put their work in context by speculating how some of the amino acids that they identify as being critical for the interactions they identify could contribute to cancer. For example, they mention mutations of interacting residues in NCBP2 are associated with human cancers, pointing out that NCBP2 R105C amino acid substitution has been reported in colorectal cancer and the NCBP2 I110M mutation has been found in head and neck cancer. Do the authors speculate that these changes would decrease the interaction between NCBP2 and ALYREF and, if so, how would this contribute to cancer? They also mention that a K330N mutation in NCBP1 in human uterine corpus endometrial carcinoma, where Y135 on the α2 helix of mALYREF2 makes a hydrogen bond with K330 of NCBP1. How do they speculate loss of this interaction would contribute to cancer?

      In the revised manuscript, we include a discussion about these CBC mutants found in human cancers in the “Conclusion and perspectives” section. We think some of these CBC mutants, such as NCBP1 K330N, could reduce interaction with ALYREF. A compromised CBC-ALYREF interaction would affect the recruitment of the TREX complex on nascent transcripts and cause dysregulation of mRNA export. In addition, it could change the partitioning of CBC and ALYREF among different cellular complexes and perturb various steps in mRNA biogenesis that are regulated by CBC and ALYREF. Thus far, it is unclear whether and how loss of the CBC-ALYREF interaction directly contributes to cancer. Our work and that of others provide molecular insights to test in future studies.

      Reviewer #2 (Public Reviews):

      Summary: 

      In this manuscript, Bradley and his colleagues presented the cryo-EM structure of the nuclear cap-binding complex (CBC) in complex with an mRNA export factor, ALYREF, providing a structural basis for understanding how the CBC regulates gene expression.

      Strengths: 

      The authors successfully modeled the N-terminal region and the RRM domain of ALYREF (residues 1-183) within the CBC-ALYREF structure, which revealed that both the NCBP1 and NCBP2 subunits of the CBC interact with the RRM domain of ALYREF. Further mutagenesis and pull-down studies provided additional evidence for the observed CBC-ALYREF interface. Additionally, the authors engaged in a comprehensive discussion regarding other cellular complexes containing CBC and/or ALYREF components. They proposed potential models that elucidated coordinated events during mRNA maturation. This study provided good evidence to show how CBC effectively recruits mRNA export factor machinery, enhancing our understanding of CBC regulating gene expression during mRNA transcription, splicing, and export.

      Weaknesses: 

      No in vivo or in vitro functional data to validate and support the structural observations and the proposed models in this study. Cryo-EM data processing and structural representation need to be strengthened. 

      We appreciate the reviewer’s comments and suggestions. Because ALYREF uses highly overlapping binding interfaces for its CBC and EJC interactions, we cannot at this stage cleanly dissect the function of the ALYREF-CBC interaction using in vitro assays or in cells. Please also see our response to Reviewer 1.

      In this revised manuscript, we have reprocessed the cryo-EM data using a different strategy, which yielded significantly improved maps. We have also improved the presentation of the structural work based on the reviewer’s specific comments.

      Reviewer #3 (Public Reviews):

      Summary: 

      The authors carried out structural and biochemical studies to investigate the multiple functions of CBC and ALYREF in RNA metabolism.

      Strengths: 

      For the structural study part, the authors successfully revealed how NCBP1 and NCBP2 subunits interact with mALYREF (residues 1-155). Their binding interface was then confirmed by biochemical assays (mutagenesis and pull-down assays) presented in this study. 

      Weaknesses: 

      The authors did not provide functional data to support their proposed models. The authors should include more details regarding the workflow of their cryo-EM data processing in the figure. 

      We thank the reviewer for the comments. We completely agree that testing the proposed models in cells would be ideal. However, as noted in our responses to the other reviewers, functional studies are premature at the current stage because both ALYREF and CBC are components of many cellular complexes that regulate mRNA metabolism. Separation-of-function mutations on CBC or ALYREF first need to be identified in future studies for further investigation. Please also see our response to Reviewer 1.

      As suggested by the reviewer, we have included more details of the cryo-EM workflow in this revised manuscript. We have also included various validation measures including 3DFSC analyses, map vs model FSC curves, and representative density maps at various protein-protein binding interfaces. 

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      Major points:

      The authors should take advantage of Figure 1, which shows the domain structures of NCBP1, NCBP2, and ALYREF to indicate for the reader specifically which protein domains are included in the biochemical and structural analyses. In the current version of the manuscript, there is plenty of space to indicate below each domain structure precisely what regions are included.

      In this revised manuscript, we have revised Figure 1A to indicate the protein constructs used in this work. 

      Although it is fine to combine the Results and Discussion, the authors should really offer a concluding paragraph to highlight the novel results from this study and put the results in context.

      We thank the reviewer for the recommendation. We now include a “Conclusion and perspectives” section in this revised manuscript.  

      Minor comments:

      Page 5, last sentence (and others) starts a sentence with the word "Since" when likely "As" which does not imply a time element to the phrase, is the correct word.

      "Since the ALYREF/mALYREF2 interaction with the CBC is conserved and mALYREF2 exhibits better solubility, we focused on mALYREF2 in the cryo-EM investigations."

      Would be more correct as: "As the ALYREF/mALYREF2 interaction with the CBC is conserved and mALYREF2 exhibits better solubility, we focused on mALYREF2 in the cryo-EM investigations."

      We thank the reviewer for the comments. We have made the corrections. 

      The word 'data' is plural so the sentence at the bottom of p.9 that includes the phrase "...in vivo data shows.." should read "..in vivo data show.." 

      Corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the Authors):

      Major points:

      (1) The authors claimed improved solubility of mouse ALYREF2 (mALYREF2, residues 1-155) compared to the previously employed ALYREF construct. However, human ALYREF has already been purified successfully for pull-down assays, indicating that soluble human ALYREF can be obtained. Why not use human ALYREF directly? Please clarify.

      Pull-down studies were performed with GST-tagged ALYREF. For cryo-EM studies, untagged ALYREF is preferred to avoid potential issues that may arise from the expression tag. However, untagged ALYREF is less soluble than GST-tagged ALYREF and is not amenable to structural studies. We have revised the text to clarify this point.

      (2) The authors confirmed CBC-ALYREF interfaces through mutagenesis and pull-down assays in vitro. However, it would be more informative and interesting to include functional assays in vitro or/and in vivo with mutagenesis. 

      We completely concur with the reviewer that testing the proposed models in vitro and in vivo would be important. However, as we pointed out in our response to the public reviews, the highly overlapping binding interfaces on ALYREF for CBC and EJC interactions pose a great challenge for functional studies. Furthermore, both ALYREF and CBC are multifunctional factors that interact with a number of partners. Ideally, separation-of-function mutants that specifically disrupt the CBC-ALYREF interaction but not others need to be identified in future studies before functional studies can be performed.

      (3) About cryo-EM data processing and structural representation:

      (1) In the description of the cryo-EM data processing, the authors claimed they did heterogeneous refinement, homogeneous refinement, and then local refinement. This reviewer is puzzled by this process because the normal procedure should be non-uniform refinement following homogeneous refinement. If the authors did not perform non-uniform refinement, they should do it because it would significantly improve the quality and resolution of cryo-EM maps. In addition, the right local refinement should include mask files and only show the density/map of the local region.

      We thank the reviewer for the suggestions. In response to the reviewer’s comment on the preferred orientation issue (point 5 below), we reprocessed the cryo-EM data and obtained significantly improved cryo-EM maps. In this revised manuscript, the CBC-mALYREF2 map was refined using homogeneous refinement; the CBC map was refined using homogeneous refinement followed by non-uniform refinement. Refinement masks are included in Figure 2-figure supplement 1.

      (2) Further local refinements with signal subtraction should be performed to improve the density and resolution of mALYREF2. 

      We tested local refinement with or without signal subtraction using masks covering mALYREF2 and various regions of CBC. Unfortunately, this approach did not improve the density of mALYREF2. We suspect that the small size of mALYREF2 (77 residues for the RRM domain) and the intrinsic flexibility of CBC are the limiting factors in these attempts. 

      (3) Figures with cryo-EM maps showing the side chains of the residues on the CBC-mALYREF2 interface should be included to strengthen the claims. Authors could add the map to Figure 3b/c or present it as a supplementary figure.

      We include new supplementary figures (Figure 3-figure supplement 1) to show the electron densities corresponding to the views in Figure 3B and 3C. Residues labeled in Figure 3B and 3C are shown in sticks in these supplementary figures.

      (4) For cryo-EM data processing, the authors have omitted lots of important details. Could the authors elaborate on the data processing with more details in the corresponding Figure and Methods Sections? Only one ab initio model from the picked good particles was displayed in the figure. Are there any other different conformations of 3D classes for the dataset? In addition, too few classes have been considered in 3D classification; more classes may give a class with better density and resolution.

      We thank the reviewer for the comments. We have reprocessed the cryo-EM data. A major change is the use of Topaz for particle picking. We now include more details on data processing in Figure 2-figure supplement 1 and the Methods section. The cryo-EM sample is relatively uniform. Ab initio reconstruction and heterogeneous refinement yielded only one good class; the other classes are “junk” classes (omitted in the workflow figure). No major conformational changes were observed throughout the multiple rounds of heterogeneous refinement for either CBC or CBC-mALYREF2. In this revised manuscript, we have obtained significantly improved maps through the new data processing strategy employing Topaz, as illustrated in Figure 2-figure supplements 1 to 5.

      (5) Angular distribution plots should be included to show if there is a preferred orientation issue. Based on the presented maps in validation reports, there may exist a preferred orientation issue for the reported two cryo-EM maps. Detailed 3D-Histogram and directional FSC plots for all the cryo-EM maps using 3DFSC web server should be presented to show the overall qualities (https://www.nature.com/articles/nmeth.4347 and https://3dfsc.salk.edu/).

      We thank the reviewer for the recommendations. In response to the reviewer’s comment on the preferred orientation issue, we reprocessed the cryo-EM data. Topaz was used for particle picking instead of template picking. 3DFSC analyses indicate that the new CBC-mALYREF2 map has a sphericity of 0.946, a significant improvement over the previous map, which had a sphericity of 0.815. Consistently, the maps presented in this revised manuscript show significantly improved densities. We now include angular distribution and 3DFSC analyses of the EM maps (Figure 2-figure supplements 2 and 4).

      (6) Figures of model-to-map FSCs need to be present to demonstrate the quality of the models and the corresponding ones (model resolution when FSC=0.5) should also be included in Table 1. The accuracy of the model is important for structural explanations and description.

      The model-to-map FSCs are now included in Figure 2-figure supplement 3A and 5A. The model resolutions of CBC-mALYREF2 and CBC are estimated to be 3.5 Å and 3.6 Å, respectively, at an FSC of 0.5. These numbers are now included in Table 1.
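
      For readers unfamiliar with how a resolution is read off an FSC curve at a fixed threshold, the idea can be sketched as follows. This is a minimal illustration only: the function name, the linear interpolation between shells, and the numbers in the usage example are assumptions made for illustration, and they are not the authors' pipeline or data.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.5):
    """Estimate resolution (in Å) where an FSC curve first drops below `threshold`.

    freqs: spatial frequencies in 1/Å, in ascending order (hypothetical input).
    fsc:   FSC value for each frequency shell.
    Returns None if the curve never crosses the threshold.
    """
    if len(freqs) != len(fsc) or len(freqs) < 2:
        raise ValueError("freqs and fsc must have the same length (>= 2)")
    for i in range(1, len(freqs)):
        prev, cur = fsc[i - 1], fsc[i]
        if prev >= threshold > cur:
            # Linear interpolation between the two shells around the crossing.
            t = (prev - threshold) / (prev - cur)
            f_cross = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            if f_cross <= 0:
                return None
            return 1.0 / f_cross  # resolution is the reciprocal of frequency
    return None


# Synthetic curve for illustration only (not data from the study).
example_freqs = [0.1, 0.2, 0.3, 0.4]
example_fsc = [1.0, 0.9, 0.6, 0.4]
print(resolution_at_threshold(example_freqs, example_fsc))
```

      In this synthetic case the curve crosses 0.5 midway between the 0.3 and 0.4 1/Å shells, so the estimate is the reciprocal of 0.35 1/Å.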

      (7) In addition, figures of local density maps with different regions of the models, showing side chains, are necessary and important to justify the claimed resolutions. 

      We now include density maps overlaid with residue side chains at various regions. For the CBC-mALYREF2 map, density maps are shown at the mALYREF2-NCBP1 interfaces (Figure 3-figure supplement 1A and 1B), mALYREF2-NCBP2 interface (Figure 3-figure supplement 1C), NCBP1-NCBP2 interface (Figure 2-figure supplement 5B), and the region near m7G (Figure 2-figure supplement 5C). For the CBC map, density maps are shown at the NCBP1-NCBP2 interface (Figure 2-figure supplement 3B) and the region near m7G (Figure 2-figure supplement 3C).

      Minor points:

      (1) A figure superimposing the models from the CBC-mALYREF2 map and mALYREF2 alone map is necessary to show that there are no other CBC binding-induced conformational changes in CBC except those claimed by the authors. In addition, a figure showing the density of m7GpppG should be included as well.

      An overlay of the CBC and CBC-mALYREF2 models is now presented in Figure 2-figure supplement 3D. Comparing CBC and CBC-mALYREF2, NCBP1 and NCBP2 have RMSDs of 0.32 Å and 0.30 Å, respectively. The density maps near the m7G cap analog are shown in Figure 2-figure supplement 3C for CBC and Figure 2-figure supplement 5C for CBC-mALYREF2.
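
      As a reminder of what an RMSD value of this kind measures, the calculation can be sketched as below. The sketch assumes the two models are already superposed and that atoms are matched one-to-one (for example, equivalent Cα atoms); the function name and the coordinates are hypothetical and are not taken from the deposited models.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two matched coordinate lists.

    Each list holds (x, y, z) tuples in Å. The structures are assumed to be
    superposed already; no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical coordinates: every atom is displaced by 0.3 Å along z.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.3), (1.0, 0.0, 0.3)]
print(rmsd(model_1, model_2))
```

      In practice, structure-comparison software first finds the optimal superposition and then reports this quantity over the matched atoms.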

      (2) Authors obtained the two maps from one dataset, so "we first determined" and "we next determined" (page 6) should be replaced with something like "One class of 3D cryo-EM map revealed" and "Another class of 3D cryo-EM map defined".

      We have revised the text as suggested by the reviewer.  

      (3) In 'Abstract', 'a mRNA export factor' should be 'an mRNA export factor'. 

      Corrected in the revised manuscript.

      (4) In 'Abstract', the final sentence 'Comparison of CBC- ALYREF to other CBC and ALYREF containing cellular complexes provides insights into the coordinated events during mRNA transcription, splicing, and export' doesn't read smoothly, I would suggest revising it to 'Comparing CBC-ALYREF with other cellular complexes containing CBC and/or ALYREF components provides insight into the coordinated events during mRNA transcription, splicing, and export.' 

      We thank the reviewer for the recommendation and have revised accordingly. 

      (5) In paragraph 'CBC-ALYREF and viral hijacking of host mRNA export pathway', line 6, the sentences preceding and following the term 'However' indicate a progressive or parallel relationship, rather than a transitional one. To enhance the coherence, I would suggest replacing 'However' with 'Furthermore' or 'In addition'. 

      Corrected in the revised manuscript.

      (6) In both Figure 5 and Figure 6, the depicted models are proposed and constructed exclusively through the comparison of the CBC-partial ALYREF with other cellular complexes containing components of CBC and/or ALYREF, which need to be confirmed by more studies. To prevent potential confusion and misunderstandings, it is recommended to replace the term 'model' with 'proposed model'. 

      Corrected in the revised manuscript.

      Reviewer #3 (Recommendations for the Authors):

      Major points:

      (1) In the Results and Discussion section, the authors mentioned "Recombinant human ALYREF protein was shown to interact with the CBC in RNase-treated nuclear extracts." However, they used mouse ALYREF for cryo-EM investigations. Can the authors include an explanation for this choice during the revision?  

      In our work, we used a mixture of glutamic acid and arginine to increase the solubility of GST-ALYREF. For cryo-EM studies, we used untagged ALYREF to avoid potential issues that may arise from the expression tag. However, untagged ALYREF is less soluble than GST-tagged ALYREF and is not suitable for structural studies in standard buffers. We have further clarified this point in the revised manuscript.

      (2) In the paragraph on "CBC-ALYREF interfaces", the authors stated "For example, E97 forms salt bridges with K330 and K381 of NCBP1. Y135 on the α2 helix of mALYREF2 makes a hydrogen bond with K330 of NCBP1. The importance of this interface between ALYREF and NCBP1 is highlighted by a K330N mutation found in human uterine corpus endometrial carcinoma." I fail to see a strong connection between their structural observations and previous findings regarding the role of a K330N mutation found in human uterine corpus endometrial carcinoma. The authors should add more words to thread these two parts.  

      In response to the reviewer’s comment, we have moved the discussion of these CBC mutants to the newly added “Conclusion and perspectives” section.

      (3) The authors should include side chains of the residues in their figure of Local resolution estimation and FSC curves, especially when they are presenting the binding interface between two components. 

      We have now included density maps that are overlaid with structural models showing side chains of critical residues. These maps include the NCBP1-mALYREF2 interfaces (Figure 3-figure supplement 1A and 1B), NCBP2-mALYREF2 interface (Figure 3-figure supplement 1C), NCBP1-NCBP2 interface (Figure 2-figure supplement 3B and 5B), and the m7G cap region (Figure 2-figure supplement 3C and 5C).

      Minor points: 

      (1) Some grammatical mistakes need to be corrected. For example, it is "an mRNA" instead of "a mRNA".  

      Corrected in the revised manuscript.

      (2) The authors can provide more information for the audience to know better about ALYREF when it first appears in the 5th line in the Abstract section. For example, "It promotes mRNA export through direct interaction with ALYREF, a key mRNA export factor, ...". 

      We have revised the sentence based on the reviewer’s comment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Some of the data is problematic and does not always support the authors' conclusions:

      (1) Fig. 1K and H are identical.

      Thank you for pointing out this problem in the manuscript. We apologize for this unintentional mistake and have replaced Fig. 1K.

      (2) The graph in Figure 2B contradicts the text. It is not obvious how the image was quantified to produce the histological score graph.

      We thank the reviewer for pointing out this problem in the manuscript. As suggested, we have replaced Figure 2B.

      (3) In Figures 2C and D, there is no clear pattern of changes in pro-inflammatory or anti-inflammatory cytokines, despite the authors' claims in the text.

      We appreciate the comment. We think the reason is that cytokine levels in the tissue are low, so the pattern of changes is not obvious.

      (4) It is unclear why the anti-dsDNA antibody does not stain the nucleus in Figure 4B. The staining with anti-dsDNA and DAPI does not match well. Figure 5H shows there is still lots of cytosolic DNA in OGT-/- HCF-1-C, measured by DAPI. These data do not support the authors' conclusion that HCFC600 eliminates cytosolic DNA accumulation (line 229). There is no support for the authors' claim that HCF-1 restrains the cGAS-STING pathway (line 330).

      We thank the reviewer for these insightful comments. The most critical step in staining cytosolic DNA is a mild permeabilization that allows the antibody to cross the plasma membrane but not the nuclear membrane; this is why the anti-dsDNA antibody does not stain the nucleus. In Figure 5H, we think that we used a high concentration of DAPI, so the nuclear DNA was strongly stained and gives the appearance of cytosolic DNA.

      (5) In Figure 5B, there is no increase in HCF-1 cleavage after OGT over-expression.

      We appreciate the reviewer’s comment. We think the reason is that we used a cell line stably overexpressing OGT-GFP and may have missed the time point at which the increase in HCF-1 cleavage occurred, so no large increase is seen. However, there is a significant increase in Figure 5C.

      (6) In Figure 7, the TNF-a staining does not inspire confidence.

      We thank the reviewer for the comment. In both Figure 7K (MC38 tumor model) and Figure 7N (LLC tumor model), we observed a significant increase in TNF-α+ CD8+ T cells in the group treated with the combination of OSMI-1 and anti-PD-L1 compared to the control group, as evidenced by the clear clustering.

      The writing needs significant improvement:

      (1) There are multiple English grammar mistakes throughout the paper. It is recommended that the authors run the manuscript through an editing service.

      We thank the reviewer for his/her suggestion. We apologize for the poor language of our manuscript. We worked on the manuscript for a long time and the repeated addition and removal of sentences and sections obviously led to poor readability. We have now worked on both language and readability and have also involved native English speakers for language corrections. We really hope that the flow and language level have been substantially improved.

      (2) Some passages are misleading -- lines 161-162, line 217, lines 241-242, 263-264, 299-300. They need to be changed substantially.

      We apologize for these mistakes and have revised these passages.

      (3) Figure legends should be rewritten. Currently, they are too abbreviated to be understood.

      We apologize for this and have rewritten the figure legends.

      (4) Discussion should also be thoroughly reworked. Currently, it is merely restating the authors' findings. The authors should put their findings in the broader context of the field.

      We apologize for that. For a better understanding of our study, we have reworked the discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) Previous studies (DOI: 10.1093/nar/gkw663, 10.1016/j.jgg.2015.07.002, 10.1016/j.dnarep.2022.103394) have suggested that OGT deficiency triggers DNA damage, connecting it to DNA repair and maintenance through various mechanisms. This should be acknowledged in the manuscript. Conversely, the role of HCF1 and its cleaved products in maintaining genomic integrity hasn't been previously shown. The authors investigate HCF1's role solely in the context of OGT inhibition. It is unclear whether this is also true under other stimuli that trigger DNA damage, whether fragments of HCF1 specifically reduce DNA damage, or if HCF1 is involved in the basal machinery that would be defective only in the absence of OGT.

      We have now acknowledged the studies mentioned above in the manuscript. In this paper, we focused on the function of OGT as it relates to HCF1. The role of HCF1 and its cleaved products in maintaining genomic integrity is an interesting topic that we may address in a future project.

      (2) In villin-CRE-deficient mice, the authors observe generic inflammation in the intestine unrelated to tumor development. It's unclear if this also occurs in the presence of OGT inhibitors in mice, whether these inhibitors induce a systemic inflammatory (Type I interferon) response, or if certain tissues like the intestine or proliferating tumor cells are more susceptible to such a response.

      We thank the reviewer for the comment. Investigating whether OGT inhibitors induce an inflammatory response, either systemically or tissue-specifically, would indeed be a very interesting project. However, in the current paper, we used a genetic approach to identify the role of OGT deficiency in intestinal inflammation-induced tumor development. This approach provides convincing evidence for our hypothesis. We may test the effect of OGT inhibitors on inflammation and tumor development in our next project.

      (3) Another critical observation is the magnitude of the interferon response triggered by DNA damage in the OGT-deficient models. While it's known that DNA damage can activate cGAS-STING, the response's extent in the absence of OGT prompts the question of whether additional OGT-specific features could explain this phenomenon. For example, Lamin A, essential for nuclear envelope integrity and shown to be O-glycosylated (DOI: 10.3390/cells7050044), and other components of the nuclear envelope or its repair might be affected by OGT. The impact of OGT inhibition on nuclear envelope integrity compared to other DNA-damaging agents could be explored.

      We appreciate the comment. In this project, we identified an OGT-binding protein, HCF1, through an LC–MS/MS assay; it was the top candidate in the binding profiles, so we focused on it. Lamin A and other components of the nuclear envelope are still good targets to check, and we may explore these in our next project.

      (4) The authors also demonstrate a correlation between OGT expression in tumors compared to healthy tissues. However, the reason is unclear, raising questions about whether this is a consequence of proliferation or metabolic deregulation in the cancer. The authors should address this aspect.

      We appreciate the reviewer’s insightful point. This is a very good question and an interesting line of research. However, in this paper we focused on how OGT influences its downstream molecules to promote tumor development; we did not examine why OGT is increased in tumors, which is beyond the scope of the current work. We would love to investigate it in the future.

      Minor points

      Please add the legend to Figures S2, S3 and S5.

      We thank the reviewer for the comment. We have added the legends to Figures S2, S3 and S5.

      The sentence line 137 should be clarified as OGT deficiency seems more related to increased inflammation in this model.

      We thank the reviewer for the comment. We have corrected the sentence at line 137.

      Line 732 has a ( typo before the number 34.

      We thank the reviewer for the comment. We have corrected the typo at line 732.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this important study, the authors manually assessed randomly selected images published in eLife between 2012 and 2020 to determine whether they were accessible for readers with deuteranopia, the most common form of color vision deficiency. They then developed an automated tool designed to classify figures and images as either "friendly" or "unfriendly" for people with deuteranopia. While such a tool could be used by publishers, editors or researchers to monitor accessibility in the research literature, the evidence supporting the tools' utility was incomplete. The tool would benefit from training on an expanded dataset that includes different image and figure types from many journals, and using more rigorous approaches when training the tool and assessing performance. The authors also provide code that readers can download and run to test their own images. This may be of most use for testing the tool, as there are already several free, user-friendly recoloring programs that allow users to see how images would look to a person with different forms of color vision deficiency. Automated classifications are of most use for assessing many images, when the user does not have the time or resources to assess each image individually.

      Thank you for this assessment. We have responded to the comments and suggestions in detail below. One minor correction to the above statement: the randomly selected images published in eLife were from articles published between 2012 and 2022 (not 2020).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors of this study developed a software application, which aims to identify images as either "friendly" or "unfriendly" for readers with deuteranopia, the most common color-vision deficiency. Using previously published algorithms that recolor images to approximate how they would appear to a deuteranope (someone with deuteranopia), authors first manually assessed a set of images from biology-oriented research articles published in eLife between 2012 and 2022. The researchers identified 636 out of 4964 images as difficult to interpret ("unfriendly") for deuteranopes. They claim that there was a decrease in "unfriendly" images over time and that articles from cell-oriented research fields were most likely to contain "unfriendly" images. The researchers used the manually classified images to develop, train, and validate an automated screening tool. They also created a user-friendly web application of the tool, where users can upload images and be informed about the status of each image as "friendly" or "unfriendly" for deuteranopes.

      Strengths:

      The authors have identified an important accessibility issue in the scientific literature: the use of color combinations that make figures difficult to interpret for people with color-vision deficiency. The metrics proposed and evaluated in the study are a valuable theoretical contribution. The automated screening tool they provide is well-documented, open source, and relatively easy to install and use. It has the potential to provide a useful service to the scientists who want to make their figures more accessible. The data are open and freely accessible, well documented, and a valuable resource for further research. The manuscript is well written, logically structured, and easy to follow.

      We thank the reviewer for these comments.

      Weaknesses:

      (1) The authors themselves acknowledge the limitations that arise from the way they defined what constitutes an "unfriendly" image. There is a missed chance here to have engaged deuteranopes as stakeholders earlier in the experimental design. This would have allowed [them] to determine to what extent spatial separation and labelling of problematic color combinations responds to their needs and whether setting the bar at a simulated severity of 80% is inclusive enough. A slightly lowered barrier is still a barrier to accessibility.

      We agree with this point in principle. However, different people experience deuteranopia in different ways, so it would require a large effort to characterize these differences and provide empirical evidence about many individuals' interpretations of problematic images in the "real world." In this study, we aimed to establish a starting point that would emphasize the need for greater accessibility, and we have provided tools to begin accomplishing that. We erred on the side of simulating relatively high severity (but not complete deuteranopia). Thus, our findings and tools should be relevant to some (but not all) people with deuteranopia. Furthermore, as noted in the paper, an advantage of our approach is that "by using simulations, the reviewers were capable of seeing two versions of each image: the original and a simulated version." We believe this step is important in assessing the extent to which deuteranopia could confound image interpretations. Conceivably, this could be done with deuteranopes after recoloration, but it is difficult to know whether deuteranopes would see the recolored images in the same way that non-deuteranopes see the original images. It is also true that images simulating deuteranopia may not perfectly reflect how deuteranopes see those images. It is a tradeoff either way. We have added comments along these lines to the paper.

      (2) The use of images from a single journal strongly limits the generalizability of the empirical findings as well as of the automated screening tool itself. Machine-learning algorithms are highly configurable but also notorious for their lack of transparency and for being easily biased by the training data set. A quick and unsystematic test of the web application shows that the classifier works well for electron microscopy images but fails at recognizing red-green scatter plots and even the classical diagnostic images for color-vision deficiency (Ishihara test images) as "unfriendly". A future iteration of the tool should be trained on a wider variety of images from different journals.

      Thank you for these comments. We have reviewed an additional 2,000 images, which were randomly selected from PubMed Central. We used our original model to make predictions for those images. The corresponding results are now included in the paper.

      We agree that many of the images identified as being "unfriendly" are microscope images, which often use red and green dyes. However, many other image types were identified as unfriendly, including heat maps, line charts, maps, three-dimensional structural representations of proteins, photographs, network diagrams, etc. We have uploaded these figures to our Open Science Framework repository so it's easier for readers to review these examples. We have added a comment along these lines to the paper.

      The reviewer mentioned uploading red/green scatter plots and Ishihara test images to our Web application and that it reported they were friendly. Firstly, it depends on the scatter plot. Even though some such plots include green and red, the image's scientific meaning may be clear. Secondly, although the Ishihara images were created as informal tests for humans, these images (and ones similar to them) are not in eLife journal articles (to our knowledge) and thus are not included in our training set. Thus, it is unsurprising that our machine-learning models would not classify such images correctly as unfriendly.

      (3) Focusing the statistical analyses on individual images rather than articles (e.g. in figures 1 and 2) leads to pseudoreplication. Multiple images from the same article should not be treated as statistically independent measures, because they are produced by the same authors. A simple alternative is to instead use articles as the unit of analysis and score an article as "unfriendly" when it contains at least one "unfriendly" image. In addition, collapsing the counts of "unfriendly" images to proportions loses important information about the sample size. For example, the current analysis presented in Fig. 1 gives undue weight to the three images from 2012, two of which came from the same article. If we perform a logistic regression on articles coded as "friendly" and "unfriendly" (rather than the reported linear regression on the proportion of "unfriendly" images), there is still evidence for a decrease in the frequency of "unfriendly" eLife articles over time.

      Thank you for taking the time to provide these careful insights. We have adjusted these statistical analyses to focus on articles rather than individual images. For Figure 1, we treat an article as "Definitely problematic" if any image in the article was categorized as "Definitely problematic." Additionally, we no longer collapse the counts to proportions, and we use logistic regression to summarize the trend over time. The overall conclusions remain the same.
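      The article-level recoding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the record layout, the toy values, and the function names `article_level_labels` and `counts_by_year` are hypothetical, and the label "Definitely okay" is assumed as the counterpart of "Definitely problematic". The per-year counts (rather than proportions) are what a logistic regression on publication year would then consume.

```python
from collections import defaultdict

# Hypothetical toy records: (article_id, publication_year, image_label).
images = [
    ("a1", 2012, "Definitely okay"),
    ("a1", 2012, "Definitely problematic"),
    ("a2", 2012, "Definitely okay"),
    ("a3", 2013, "Definitely okay"),
    ("a3", 2013, "Definitely okay"),
    ("a4", 2013, "Definitely problematic"),
]


def article_level_labels(records):
    """Collapse image labels to one flag per article.

    An article is flagged True if any of its images is "Definitely problematic".
    """
    flagged = defaultdict(bool)
    year_of = {}
    for article_id, year, label in records:
        flagged[article_id] = flagged[article_id] or (label == "Definitely problematic")
        year_of[article_id] = year
    return {a: (year_of[a], flagged[a]) for a in flagged}


def counts_by_year(article_labels):
    """Return {year: (problematic_articles, total_articles)}, keeping sample sizes."""
    counts = defaultdict(lambda: [0, 0])
    for year, is_problematic in article_labels.values():
        counts[year][0] += int(is_problematic)
        counts[year][1] += 1
    return {year: tuple(c) for year, c in sorted(counts.items())}


articles = article_level_labels(images)
per_year = counts_by_year(articles)
print(per_year)
```

      Using the article as the unit of analysis means that several images from the same article contribute a single observation, which is the point of the reviewer's pseudoreplication concern.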

      Another issue concerns the large number of articles (>40%) that are classified as belonging to two subdisciplines, which further compounds the image pseudoreplication. Two alternatives are to either group articles with two subdisciplines into a "multidisciplinary" group or recode them to include both disciplines in the category name.

      Thank you for this insight. We have modified Figure 2 so that it puts all articles that have been assigned two subdisciplines into a "Multidisciplinary" category. The overall conclusions remain the same.
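      A recoding of this kind might look like the sketch below. The article identifiers, the subdiscipline names, and the function name `recode` are hypothetical and serve only as an illustration; this is not the authors' code.

```python
from collections import Counter

# Hypothetical toy data: each article maps to the subdisciplines assigned to it.
article_subdisciplines = {
    "a1": ["Cell Biology"],
    "a2": ["Cell Biology", "Neuroscience"],
    "a3": ["Neuroscience"],
    "a4": ["Ecology", "Evolutionary Biology"],
}


def recode(subdisciplines):
    """Return one category per article.

    Articles with more than one subdiscipline are grouped as "Multidisciplinary".
    """
    if len(subdisciplines) > 1:
        return "Multidisciplinary"
    return subdisciplines[0]


category_counts = Counter(recode(s) for s in article_subdisciplines.values())
print(category_counts)
```

      Each article is now counted exactly once, so no article contributes to two categories.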

      (4) The low frequency of "unfriendly" images in the data (under 15%) calls for a different performance measure than the AUROC used by the authors. In such imbalanced classification cases the recommended performance measure is precision-recall area under the curve (PR AUC: https://doi.org/10.1371/journal.pone.0118432) that gives more weight to the classification of the rare class ("unfriendly" images).

      We now calculate the area under the precision-recall curve and provide these numbers (and figures) alongside the AUROC values (and figures). We agree that these numbers are informative; both metrics lead to the same overall conclusions.
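      To illustrate why the two metrics can diverge on imbalanced data, the sketch below computes both on a small hypothetical example with two positives among ten items. It uses average precision as one common estimator of the area under the precision-recall curve; the labels, scores, and function names are assumptions for illustration, and the toy scores contain no ties.

```python
def average_precision(y_true, scores):
    """Estimate PR AUC as the mean of precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = "unfriendly" (rare class), 0 = "friendly".
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.03, 0.01]

print("AUROC:", auroc(y_true, scores))
print("Average precision:", average_precision(y_true, scores))
```

      In this toy case the AUROC is 0.875 while the average precision is 0.75, because two negatives ranked above the second positive cost more precision than they cost in the ROC ranking.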

      Reviewer #2 (Public Review):

      Summary:

      An analysis of images in the biology literature that are problematic for people with a color-vision deficiency (CVD) is presented, along with a machine learning-based model to identify such images and a web application that uses the model to flag problematic images. Their analysis reveals that about 13% of the images could be problematic for people with CVD and that the frequency of such images decreased over time. Their model yields an AUC score of 0.89. It is proposed that their approach could help make the biology literature accessible to diverse audiences.

      Strengths:

      The manuscript focuses on an important yet mostly overlooked problem, and makes contributions both in expanding our understanding of the extent of the problem and in developing solutions to mitigate the problem. The paper is generally well-written and clearly organized. Their CVD simulation combines five different metrics. The dataset has been assessed by two researchers and is likely to be of high quality. The machine learning algorithm used (convolutional neural network, CNN) is an appropriate choice for the problem. The evaluation of various hyperparameters for the CNN model is extensive.

      We thank the reviewer for these comments.

      Weaknesses:

      The focus seems to be on one type of CVD (deuteranopia) and it is unclear whether this would generalize to other types.

      We agree that it would be interesting to perform similar analyses for protanopia and other color-vision deficiencies. But we leave that work for future studies.

      The dataset consists of images from eLife articles. While this is a reasonable starting point, whether this can generalize to other biology/biomedical articles is not assessed.

      This is an important point. We have reviewed an additional 2,000 images, which were randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      "Probably problematic" and "probably okay" classes are excluded from the analysis and classification, and the effect of this exclusion is not discussed.

      We now address this in the Discussion section.

      Machine learning aspects can be explained better, in a more standard way.

      Thank you. We address this comment in our responses to your comments below.

      The evaluation metrics used for validating the machine learning models seem lacking (e.g., precision, recall, F1 are not reported).

      We now provide these metrics (in a supplementary file).
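      For reference, precision, recall, and F1 are derived from the true-positive, false-positive, and false-negative counts. The following is a minimal sketch on hypothetical labels and predictions; the function name and toy data are assumptions and do not reproduce the values in the supplementary file.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 1 = "unfriendly", 0 = "friendly".
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```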

      The web application is not discussed in any depth.

      The paper includes a paragraph about how the Web application works and which technologies we used to create it. We are unsure which additional aspects should be addressed.

      Reviewer #3 (Public Review):

      Summary:

      This work focuses on accessibility of scientific images for individuals with color vision deficiencies, particularly deuteranopia. The research involved an analysis of images from eLife published in 2012-2022. The authors manually reviewed nearly 5,000 images, comparing them with simulated versions representing the perspective of individuals with deuteranopia, and also evaluated several methods to automatically detect such images including training a machine-learning algorithm to do so, which performed the best. The authors found that nearly 13% of the images could be challenging for people with deuteranopia to interpret. There was a trend toward a decrease in problematic images over time, which is encouraging.

      Strengths:

      The manuscript is well organized and written. It addresses inclusivity and accessibility in scientific communication, and reinforces that there is a problem and that in part technological solutions have potential to assist with this problem.

      The number of manually assessed images for evaluation and training an algorithm is, to my knowledge, much larger than any existing survey. This is a valuable open source dataset beyond the work herein.

      The sequential steps used to classify articles follow best practices for evaluation and training sets.

      We thank the reviewer for these comments.

      Weaknesses:

      I do not see any major issues with the methods. The authors were transparent with the limitations (the need to rely on simulations instead of what deuteranopes see), only capturing a subset of issues related to color vision deficiency, and the focus on one journal that may not be representative of images in other journals and disciplines.

      We thank the reviewer for these comments. Regarding the last point, we have reviewed an additional 2,000 images, which were randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      N/A

      Thank you.

      Reviewer #2 (Recommendations For The Authors):

      - The web application link can be provided in the Abstract for more visibility.

      We have added the URL to the Abstract.

      - They focus on deuteranopia in this paper. It seems that protanopia is not considered. Why? What are the challenges in considering this type of CVD?

      We agree that it would be interesting to perform similar analyses for protanopia and other color-vision deficiencies. But we leave that work for future studies. Deuteranopia is the most common color-vision deficiency, so we focused on the needs of these individuals as a starting point.

      - The dataset is limited to eLife articles. More discussion of this limitation is needed. Couldn't one also include some papers from PMC open access dataset for comparison?

      We have reviewed an additional 2,000 images, which we randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      - An analysis of the effect of selecting a severity value of 0.8 can be included.

      We agree that this would be interesting, but we leave it for future work.

      - "Probably problematic" and "probably okay" classes are excluded from analysis, which may oversimplify the findings and bias the models. It would have been interesting to study these classes as well.

      We agree that this would be interesting, but we leave it for future work. However, we have added a comment to the Discussion on this point.

      - Some machine learning aspects are discussed in a non-standard way. Class weighting or transfer learning would not typically be considered hyperparameters. "Corpus" is not a model. The description of how fine-tuning was performed could be clearer.

      We have updated this wording to use more appropriate terminology to describe these different "configurations." Additionally, we expanded and clarified our description of fine tuning.

      - Reporting performance on the training set is not very meaningful. Although I understand this is cross-validated, it is unclear what is gained by reporting two results. Maybe there should be more discussion of the difference.

      We used cross validation to compare different machine-learning models and configurations. Providing performance metrics helps to illustrate how we arrived at the final configurations that we used. We have updated the manuscript to clarify this point.

      - True positives, false positives, etc. are described as evaluation metrics. Typically, one would think of these as numbers that are used to calculate evaluation metrics, like precision (PPV), recall (sensitivity), etc. Furthermore, they say they measure precision, recall, precision-recall curves, but I don't see these reported in the manuscript. They should be (especially precision, recall, F1).

      We have clarified this wording in the manuscript.

      - There are many figures in the supplementary material, but not much interpretation/insights provided. What should we learn from these figures?

      We have revised the captions and now provide more explanations about these figures in the manuscript.

      - CVD simulations are mentioned (line 312). It is unclear whether these methods could be used for this work and if so, why they were not used. How do the simulations in this work compare to other simulations?

      This part of the manuscript refers to recolorization techniques, which attempt to make images more friendly to people with color vision deficiencies. For our paper, we used a form of recolorization that simulates how a deuteranope would see a figure in its original form. Therefore, unless we misunderstand the reviewer's question, these two types of simulation have distinct purposes and thus are not comparable.

      - relu -> ReLU

      We have corrected this.

      Reviewer #3 (Recommendations For The Authors):

      The title can be more specific to denote that the survey was done in eLife papers in the years 2012-2022. Similarly, this should be clear in the abstract instead of only "images published in biology-oriented research articles".

      Thank you for this suggestion. Because we have expanded this work to include images from PubMed Central papers, we believe the title is acceptable as it stands. We updated the abstract to say, "images published in biology- and medicine-oriented research articles"

      Two mentions of existing work that I did not see are to Jambor and colleagues' assessment on color accessibility in several fields: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041175/, and whether this work overlaps with the 'JetFighter' tool (https://elifesciences.org/labs/c2292989/jetfighter-towards-figure-accuracy-and-accessibility).

      Thank you for bringing these to our attention. We have added a citation to Jambor, et al.

      We also mention JetFighter and describe its uses.

      Similarly, on Line 301: Significant prior work has been done to address and improve accessibility for individuals with CVD. This work can be generally categorized into three types of studies: simulation methods, recolorization methods, and estimating the frequency of accessible images.

      - One might mention education as prior work as well, which might in part be contributing to a decrease in problematic images (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041175/)

      We now suggest that there are four categories and include education as one of these.

      Line 361, when discussing resources to make figures suitable, the authors may consider citing this paper about an R package for single-cell data: https://elifesciences.org/articles/82128

      Thank you. We now cite this paper.

      The web application is a good demonstration of how this can be applied, and all code is open so others can apply the CNN in their own use cases. Still, by itself, it is tedious to upload individual image files to screen them. Future work can implement this into a workflow more typical to researchers, but I understand that this will take additional resources beyond the scope of this project. The demonstration that these algorithms can be run with minimal resources in the browser with TensorFlow.js is novel.

      Thank you.

      General:

      It is encouraging that 'definitely problematic' images have been decreasing over time in eLife. Might this have to do with eLife policies? I could not quickly find if eLife has checks in place for this, but given that JetFighter was developed in association with eLife, I wonder if there is an enhanced awareness of this issue here vs. other journals.

      This is possible. We are not aware of a way to test this formally.

    1. Reviewer #1 (Public Review):

      Summary:

      A nice study trying to identify the relationship between E. coli O157 from cattle and humans in Alberta, Canada.

      Strengths:

      (1) The combined human and animal sampling is a great foundation for this kind of study.

      (2) Phylogenetic analyses seem to have been carried out in a high-quality fashion.

      Weaknesses:

      I think there may be a problem with the selection of the isolates for the primary analysis. This is what I'm thinking:

      (1) Transmission analyses are strongly influenced by the sampling frame.

      (2) While the authors have randomly selected from their isolate collections, which is fine, the collections themselves are not random.

      (3) The animal isolates are likely to represent a broad swathe of diversity, because of the structured sampling of animal reservoirs undertaken (as I understand it).

      (4) The human isolates are all from clinical cases. Clinical cases of the disease are likely to be closely related to other clinical cases, because of outbreaks (either detected, or undetected), and the high ascertainment rate for serious infections.

      (5) Therefore, taking an equivalent number of animal and clinical isolates, will underestimate the total diversity in the clinical isolates because the sampling of the clinical isolates is less "independent" (in the statistical sense) than sampling from the animal isolates.

      (6) This could lead to over-estimating of transmission from cattle to humans.

      (7) "We hypothesize that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence" - this seems a bit tautological. There is a lot of O157 because there's a lot of transmission. What part of the fact it is local means that it is a principal cause of high incidence? It seems that they've observed a high rate of local transmission, but the reasons for this are not apparent, and hence the cause of Alberta's incidence is not apparent. Would a better conclusion not be that "X% of STEC in Alberta is the result of transmission of local variants"? And then, this poses a question for future epi studies of what the transmission pathway is.

    2. Author response:

      Reviewer #1 (Public Review):

      Summary:

      A nice study trying to identify the relationship between E. coli O157 from cattle and humans in Alberta, Canada.

      Strengths:

      (1) The combined human and animal sampling is a great foundation for this kind of study.

      (2) Phylogenetic analyses seem to have been carried out in a high-quality fashion.

      Weaknesses:

      I think there may be a problem with the selection of the isolates for the primary analysis. This is what I'm thinking:

      (1) Transmission analyses are strongly influenced by the sampling frame.

      (2) While the authors have randomly selected from their isolate collections, which is fine, the collections themselves are not random.

      (3) The animal isolates are likely to represent a broad swathe of diversity, because of the structured sampling of animal reservoirs undertaken (as I understand it).

      (4) The human isolates are all from clinical cases. Clinical cases of the disease are likely to be closely related to other clinical cases, because of outbreaks (either detected, or undetected), and the high ascertainment rate for serious infections.

      (5) Therefore, taking an equivalent number of animal and clinical isolates, will underestimate the total diversity in the clinical isolates because the sampling of the clinical isolates is less "independent" (in the statistical sense) than sampling from the animal isolates.

      (6) This could lead to over-estimating of transmission from cattle to humans.

      We appreciate the reviewer’s careful thoughts about our sampling strategy. We agree with points (1) and (2), and we will provide additional details on the animal collections as requested.

      We agree with point (3) in theory but not in fact. As shown in Figure 3a, the cattle isolates were very closely related, despite the temporal and geographic breadth of sampling within Alberta. The median SNP distance between cattle sequences was 45 (IQR 36-56), compared to 54 (IQR 43-229) SNPs between human sequences from cases in Alberta during the same years. Additionally, as shown in Figure 2, only clade A and B isolates – clades that diverge substantially from the rest of the tree – were dominated by human cases in Alberta. We will better highlight this evidence in the revision.
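      The summary statistic quoted here (median pairwise SNP distance with its interquartile range) can be computed as sketched below. The sequences are short hypothetical strings standing in for an aligned SNP matrix, and the function names are assumptions; the sketch does not reproduce the study's pipeline or its values.

```python
from itertools import combinations
from statistics import median, quantiles


def snp_distance(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_summary(seqs):
    """Return (median, (q1, q3)) over all pairwise SNP distances."""
    dists = [snp_distance(a, b) for a, b in combinations(seqs, 2)]
    q1, _, q3 = quantiles(dists, n=4)
    return median(dists), (q1, q3)


# Hypothetical aligned toy sequences for one host group.
cattle = ["ACGTACGT", "ACGTACGA", "ACGTACCA", "ACGAACCA"]

med, (q1, q3) = pairwise_summary(cattle)
print(f"median={med}, IQR={q1}-{q3}")
```

      Running the same summary separately on the cattle and human groups gives the kind of comparison reported in the response.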

      We agree with the reviewer in point (4) that outbreaks can be an important confounder of phylogenetic inference. This is why we down-sampled outbreaks (based on genetic relatedness, not external designation) in our extended analyses (lines 192-194). We did not do this in the primary analysis, because there were no large clusters of identical isolates. Figure 3b shows a limited number of small clusters; however, clustered cattle isolates outnumbered clustered human isolates, suggesting that any bias would be in the opposite direction the reviewer suggests. Regarding severe cases being oversampled among the clinical isolates, this is absolutely true and a limitation of all studies utilizing public health reporting data. We will make this limitation to generalizability clearer in the discussion. However, as noted above, clinical isolates were more variable than cattle isolates, so it does not appear to have heavily biased the analysis.

      We disagree with the reviewer on point (5). While the bias toward severe cases could make the human isolates less independent, the relative sampling proportions are likely to induce greater distance between clinical isolates than cattle isolates, which is exactly what we observe (see response to point (3) above). Cattle are E. coli O157:H7’s primary reservoir, and humans are incidental hosts not able to sustain infection chains long-term. Not only is the bacterium prevalent among cattle, but cattle are also highly prevalent in Alberta. Thus, even with 89 sampling points, we are still capturing a small proportion of the E. coli O157:H7 in the province. Being able to sample only a small proportion of cattle’s E. coli O157:H7 increases the likelihood of only sampling from the center of the distribution, making extreme cases, such as that shown at the very bottom of the tree in Figure 3b, rare and important. In comparison, sampling from human cases constitutes a higher proportion of human infections relative to cattle, and is therefore more representative of the underlying distribution, including extremes. We will add this point to the limitations. As with the clustering above, if anything, this outcome would have biased the study away from identifying cattle as the primary reservoir. Additionally, the relatively small proportion of cattle sampled makes our finding that 15.7% of clinical isolates were within 5 SNPs of a cattle isolate, the distance most commonly used to indicate transmission for E. coli O157:H7, all the more remarkable.
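      The proportion of clinical isolates lying within a SNP threshold of any cattle isolate can be computed as in the sketch below. The toy sequences and function names are hypothetical; only the 5-SNP threshold comes from the response, and the output is not the study's 15.7% figure.

```python
def snp_distance(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def fraction_within_threshold(human_seqs, cattle_seqs, threshold=5):
    """Fraction of human isolates whose nearest cattle isolate is within `threshold` SNPs."""
    near = sum(
        1
        for h in human_seqs
        if min(snp_distance(h, c) for c in cattle_seqs) <= threshold
    )
    return near / len(human_seqs)


# Hypothetical aligned toy sequences.
cattle = ["AAAAAAAAAAAA", "AAAAAACCCCCC"]
human = [
    "AAAAAAAAAAAC",  # 1 SNP from the first cattle sequence
    "CCCCCCAAAAAA",  # 6 SNPs from the nearest cattle sequence
    "AAAAAACCCCAA",  # 2 SNPs from the second cattle sequence
    "GGGGGGGGGGGG",  # 12 SNPs from every cattle sequence
]

share = fraction_within_threshold(human, cattle, threshold=5)
print(share)
```

      Because the result depends on the nearest sampled cattle isolate, sampling fewer cattle isolates can only lower or leave unchanged this fraction, which is consistent with the authors' argument that sparse cattle sampling would tend to underestimate it.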

      Because of the aforementioned points, we disagree with the reviewer’s conclusion in point (6). We believe transmission from cattle to humans is likely underestimated for the reasons given above. Not only do all prior studies indicate ruminants as the primary reservoirs of E. coli O157:H7, and humans as only incidental hosts, but our specific data also do not support the reviewer’s individual contentions. That said, we will conduct a sensitivity analysis as recommended to determine the impact of sampling and inclusion of the small clusters on our primary findings.

      (7) "We hypothesize that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence" - this seems a bit tautological. There is a lot of O157 because there's a lot of transmission. What part of the fact it is local means that it is a principal cause of high incidence? It seems that they've observed a high rate of local transmission, but the reasons for this are not apparent, and hence the cause of Alberta's incidence is not apparent. Would a better conclusion not be that "X% of STEC in Alberta is the result of transmission of local variants"? And then, this poses a question for future epi studies of what the transmission pathway is.

      The reviewer is correct, and the suggestion for the direction of future studies was our intent with this statement. We will revise it.

      Reviewer #2 (Public Review):

      This study identified multiple locally evolving lineages transmitted between cattle and humans persistently associated with E. coli O157:H7 illnesses for up to 13 years. Furthermore, this study mentions a dramatic shift in the local persistent lineages toward strains with the more virulent stx2a-only profile. The authors hypothesized that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence. These opinions more effectively explain the role of the cattle reservoir in the dynamics of E. coli O157:H7 human infections.

      (1) The authors acknowledge the possibility of intermediate hosts or environmental reservoirs playing a role in transmission. Further discussion on the potential roles of other animal species commonly found in Alberta (e.g., sheep, goats, swine) could enhance the understanding of the transmission dynamics. Were isolates from these species available for analysis? If not, the authors should clearly state this limitation.

      We will expand the discussion of other species in Alberta, as suggested, including other livestock, wildlife, and the potential role of birds and flies. Unfortunately, we did not have sequences available from other species, and we will add this to the limitations. Sequences from other species may be available from sequences collected by others, which as we note in the limitations do not have sufficient metadata to assign them to Alberta vs. the rest of Canada. While we have requested this data, we have been unsuccessful in obtaining it. We will continue to pursue it.

      (2) The focus on E. coli O157:H7 is understandable given its prominence in Alberta and the availability of historical data. However, a brief discussion on the potential applicability of the findings to non-O157 STEC serogroups, and the limitations therein, would be beneficial. Are there reasons to believe the transmission dynamics would be similar or different for other serogroups?

      We appreciate this comment and will expand our discussion of relevance to non-O157 STEC. Other authors have proposed that transmission dynamics differ, and studies of STEC risk factors, including our own, support this. However, there has been very little direct study of non-O157 transmission dynamics and there is even less cross-species genomic and metadata available for non-O157 isolates of concern.

      (3) The authors briefly mention the need for elucidating local transmission systems to inform management strategies. A more detailed discussion on specific public health interventions that could be targeted at the identified LPLs and their potential reservoirs would strengthen the paper's impact.

      We agree with the reviewer that this would be a good addition to the manuscript. The public health implications for control are several and extend to non-STEC reportable zoonotic enteric infections, such as Campylobacter and Salmonella. We will add a discussion of these.

      (4) Understanding the relationship between specific risk factors and E. coli O157:H7 infections is essential for developing effective prevention strategies. Have case-control or cohort studies been conducted to assess the correlation between identified risk factors and the incidence of E. coli O157:H7 infections? What methodologies were employed to control for potential confounders in these studies?

      Yes, there have been several case-control studies of reported cases. Many of these are referenced in the discussion in terms of the contribution of different sources to infection. However, we will add a more explicit discussion of risk factors.

      (5) The study's findings are noteworthy, particularly in the context of E. coli O157:H7 epidemiology. However, the extent to which these results can be replicated across different temporal and geographical settings remains an open question. It would be constructive for the authors to provide additional data that demonstrate the replication of their sampling and sequencing experiments under varied conditions. This would address concerns regarding the specificity of the observed patterns to the initial study's parameters.

      We appreciate the reviewer’s comment, as we are currently building on this analysis with an American dataset with different types of data available than were used in this study. We will add a discussion of this. We will also be adding a sensitivity analysis to the manuscript simulating a different sampling approach, which should also be informative to this question.

    1. Reviewer #2 (Public Review):

      Summary:

      The goal is to ask if common species when studied across their range tend to have larger ranges in total. To do this the authors examined a very large citizen science database which gives estimates of numbers, and correlated that with the total range size, available from Birdlife. The average correlation is positive but close to zero, and the distribution around zero is also narrow, leading to the conclusion that, even if applicable in some cases, there is no evidence for consistent trends in one or other direction.

      Strengths:

      The study raises a dormant question, with a large dataset.

      Weaknesses:

      This study combines information from across the whole world, with many different habitats, taxa, and observations, which surely leads to a quite heterogeneous collection.

      First, scale. Many of the earlier analyses were within smaller areas, and for example, ranges are not obviously bounded by a physical barrier. I assume this study is only looking at breeding ranges; that should be stated, as 40% of all bird species migrate, and winter limitation of populations is important. Also are abundances only breeding abundances or are they measured through the year? Are alien distributions removed?

      Second, consider various reasons why abundance and range size may be correlated (sometimes positively and sometimes negatively) at large scales. Combining studies across such a large diversity of ecological situations seems to create many possibilities to miss interesting patterns. For example:

      (1) Islands are small and often show density release.

      (2) North temperate regions have large ranges (Rapoport's rule) and higher population sizes than the tropics.

      (3) Body size correlates with global range size (I am unsure if this has recently been tested but is present in older papers) and with density. For example, cosmopolitan species (barn owl, osprey, peregrine) are relatively large and relatively rare.

      (4) In the consideration of alien species, it certainly looks to me as if the law is followed, with pigeon, starling, and sparrow both common and widely distributed. I guess one needs to make some sort of statement about anthropogenic influences, given the dramatic changes in both populations and environments over the past 50 years.

      (5) Wing shape correlates with ecological niche and range size (e.g. White, American Naturalist). Aerial foraging species with pointed wings are likely to be easily detected, and several have large ranges reflecting dispersal (e.g. barn swallow).

      Third, biases. I am not conversant with eBird methodology, but the number appearing on checklists seems a very poor estimate of local abundance. As noted in the paper, common species may be underestimated in their abundance. Flocking species must generate large numbers, skulking species few. The survey is often likely to be in areas favorable to some species and not others. The alternative approach in the paper comes from an earlier study, based on eBird but then creating densities within grids, and surely comes with similar issues.

      Biases are present in range as well. Notably, tropical mountain-occupying species have range sizes over-estimated because holes in the range are not generally accounted for (Ocampo-Peñuela et al. Nature Communications). These species are often quite rare too.

      Fourth, random error. Random error in eBird assessments is likely to be large, with differences among observers, seasons, days, and weather (e.g. Callaghan et al. 2021, PNAS). Range sizes also come with many errors, which is why occupancy is usually seen as the more appropriate measure.

      If we consider both range and abundance measurements to be subject to random error in any one species list, then the removal of all these errors will surely increase the correlation for that list (the covariance shouldn't change but the variances will decrease). I think (but am not sure) that this will affect the mean correlation because more of the positive correlations appear 'real' given the overall mean is positive. It will definitely affect the variance of the correlations; the low variance is one of the main points in the paper. A high variance would point to the operation of multiple mechanisms, some perhaps producing negative correlations (Blackburn et al. 2006).
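
      The attenuation argument in the paragraph above can be checked with a small simulation. This is a sketch under stated assumptions: the "true" range and abundance values are simulated, the measurement errors are independent Gaussian noise, and all variable names are hypothetical. Under those assumptions the noise leaves the expected covariance unchanged but inflates both variances, so the observed correlation shrinks toward zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Simulated "true" values with a positive underlying relationship.
true_range = [rng.gauss(0, 1) for _ in range(n)]
true_abund = [0.5 * r + rng.gauss(0, 1) for r in true_range]

# Observed values: true values plus independent measurement error.
noisy_range = [r + rng.gauss(0, 1) for r in true_range]
noisy_abund = [a + rng.gauss(0, 1) for a in true_abund]

r_true = pearson(true_range, true_abund)
r_noisy = pearson(noisy_range, noisy_abund)
print(f"without error: {r_true:.3f}, with error: {r_noisy:.3f}")
```

      The sketch covers only one species list; it says nothing about how removing error would change the spread of correlations across lists, which is the reviewer's second point.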

      On P.80 it is stated: "Specifically, we can quantify how AOR will change in relation to increases in species richness and sampling duration, both of which are predicted to reduce the magnitude of AORs" I haven't checked the references that make this statement, but intuitively the opposite is expected? More species and longer durations should both increase the accuracy of the estimate, so removing them introduces more error? Perhaps dividing by an uncertain estimate introduces more error anyway. At any rate, the authors should explain the quoted statement in this paper.

      It would be of considerable interest to look at the extreme negative and extreme positive correlations: do they make any biological sense?

      Discussion:

      I can see how publication bias can affect meta-analyses (addressed in the Gaston et al. 2006 paper) but less easily see how confirmation bias can. It seems to me that some of the points made above must explain the difference between this study and Blackburn et al. 2006's strong result.

      Certainly, AOR really does seem to be present in at least some cases (e.g. British breeding birds) and a discussion of individual cases would be valuable. Previous studies have also noted that there are at least some negative and some non-significant associations, and understanding the underlying causes is of great interest (e.g. Kotiaho et al. Biology Letters).

    1. Textbook authors also never invite students to critique their own work. Again, our Mississippi textbook shows this can be done. For example, we noted that only four of our twenty-five mini-biographies were of women. “Has the book therefore been guilty of discrimination against women?” we then asked. Such a question implies that students can think for themselves, which then helps them learn to do so. When students are not asked to assess, but only to remember, they do not learn how to assess or how to think for themselves.

      It is not easy to critique your own work in the way these authors did. However, stating in their book that "only four of our twenty-five mini-biographies were of women" shows that it is okay to admit your faults. Nobody is perfect and it is foolish to portray yourself as such. Another benefit of this particular group making these statements is that it draws the student to look closer at these types of things and to ask questions such as, "Out of these authors, how many are women? How many are of a different race?" While these questions may cause some backlash for "discrimination", they are valid questions in this instance. As long as you are not using gender or race in a hateful way, it is okay to observe these things. It is common sense that people of different genders and races might have different opinions, lifestyles, experiences, and so much more.

    1. Late work may be accepted with a request for extension which was submitted up to 48 hours before the due date.

      It is good to know that we are able to receive extensions. I know with papers I tend to reach a writer's block, or sometimes need extra time to re-read my paper and make sure it is up to my standards. I think that this is very beneficial to the student, and to you as a professor. I think having that policy allows writers to be more comfortable with turning in work that is completed, and it also doesn't waste your time reading/grading a paper that could've used a little more work.

    2. In this course I need you to be brave. You will read things that may make you uncomfortable. You will discuss difficult topics. This will stretch the boundaries of what you may think you are capable of to new levels.

      I am looking forward to writing about topics that are more uncomfortable. I feel like, as students, we focus a lot on writing reports and more analytical projects. I hope that this class allows us to have a more vulnerable perspective on writing.

    1. Author response:

      The following is the authors’ response to the current reviews.

      The concerns raised during the review have been incorporated into the discussion of the results, and the need for further research is acknowledged in the paper. This is not possible in the present study, as the clinical project has been completed and further patients cannot be enrolled without starting a new project. We are confident that the results are scientifically valid and that the methodology was scientifically sound and up to date. They were obtained on a dataset that was obviously large enough to allow 20% of it to be set aside and a machine-learned classifier to be trained on the remaining 80%, which then assigned samples to neuropathy with an accuracy better than guessing.

      Furthermore, our results are at least tentatively replicated in a completely independent data set from another patient cohort. The strengths and limitations of the study design, in particular the latter, are discussed in the necessary depth. In summary, the machine-learned results provided major hits on one side and probably unimportant lipids on the other side of the variable importance scale. Both could be verified in vitro. We are therefore confident that we have contributed to the advancement of knowledge about cancer therapy-associated neuropathy and look forward to further developments in this area.


      The following is the authors’ response to the original reviews.

      Weaknesses Reviewer 1: 

      There are a number of weaknesses in the study. The small sample size is a significant limitation of the study. Out of 31 patients, only 17 patients were reported to develop neuropathy, with significant neuropathy (grade 2/3) in only 5 patients. The authors acknowledge this limitation in the results and discussion sections of the manuscript, but it limits the interpretation of the results. Also acknowledged is the limited method used to assess neuropathy. 

      We agree with the reviewer that the cohort size and assessment of neuropathy are limitations of our study, as we already described in the corresponding section of the manuscript. However, occurrence and grade of the neuropathy are in line with results reported from previous studies. From these studies, the expected occurrence of neuropathy with our therapeutic regimen is around 50-70% (54.9% in our cohort), and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3). In these studies, neuropathy is assessed by using questionnaires or by grading via NCI-CTCAE as in our study. In summary, assessment and occurrence of neuropathy of our reported cohort are in line with previous reports.

      Potentially due to this small number of patients with neuropathy, the machine learning algorithms could not distinguish between samples with and without neuropathy. Only selected univariate analyses identified differences in lipid profiles potentially related to neuropathy.  

      The data analysis consistently followed a "mixture of experts" approach, as this seems to be the most successful way to deal with omics data. We have elaborated on this in the Methods section, including several supporting references. Regarding the quoted sentence from the results section, after rereading it, we realized that it was somewhat awkwardly worded. What we mean is now better worded in the results section, namely: “Although the three algorithms detected neuropathy in new cases, unseen during training, at balanced accuracy of up to 0.75, while only the guess level of 0.5 was achieved when using permuted data for training, the 95% CI of the performance measures was not separated from guess level. Therefore, multivariate feature selection was not considered a valid approach, since it requires that the algorithms from which the feature importance is read can successfully perform their task of class assignment (4). Therefore, univariate methods (Cohen's d, FPR, FWE) were preferred, as well as a direct hypothesis transfer of the top hits from the abovementioned day1/2 assessments to neuropathy. Classical statistics consisting of direct group comparisons using Kruskal-Wallis tests (5) were performed.”
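
      The comparison of balanced accuracy against the guess level of 0.5 can be sketched as follows. This is an illustration only: the labels are hypothetical, and the percentile bootstrap used for the 95% CI is an assumption, since the response does not state how the authors computed their interval.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels coded 0/1."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for balanced accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    while len(scores) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # skip resamples that contain only one class
            yp = [y_pred[i] for i in idx]
            scores.append(balanced_accuracy(yt, yp))
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical held-out labels (1 = neuropathy) and classifier predictions.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

ba = balanced_accuracy(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred)
print(f"balanced accuracy {ba:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
print("separated from guess level" if lo > 0.5 else "not separated from guess level")
```

      With a small held-out set the interval tends to be wide, which is one way a point estimate above 0.5 can still fail to be separated from the guess level.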

      It was our approach to investigate the data set in an unbiased manner by different machine learning algorithms and select those lipids that the majority of the algorithms considered important for distinguishing the patient groups (majority voting). This way, the inconsistencies and limitations of a single evaluation method, such as regression analysis, that occur in some datasets, can be mitigated. 
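
      The majority-voting idea described above can be sketched in a few lines. The algorithm labels and lipid names below are placeholders, not the study's actual selections, and the rule "selected by more than half of the algorithms" is an assumed reading of "majority".

```python
from collections import Counter


def majority_vote(selections):
    """Return the features chosen by more than half of the selectors, sorted."""
    votes = Counter(
        feature for chosen in selections.values() for feature in set(chosen)
    )
    needed = len(selections) / 2
    return sorted(f for f, v in votes.items() if v > needed)


# Placeholder output of three hypothetical feature-selection runs.
selections = {
    "algorithm_a": ["lipid_1", "lipid_2", "lipid_3"],
    "algorithm_b": ["lipid_1", "lipid_3"],
    "algorithm_c": ["lipid_1", "lipid_4"],
}

print(majority_vote(selections))
```

      A feature picked by a single algorithm is dropped, which is how the vote dampens the idiosyncrasies of any one method.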

      Three sphingolipid mediators including SA1P differed between patients with and without neuropathy at the end of treatment. These sphingolipids were elevated at the end of treatment in the cohort with neuropathy, relative to those without neuropathy. However, across all samples from pre to post-paclitaxel treatment, there was a significant reduction in SA1P levels. It is unclear from the data presented what the underlying mechanism for this result would be. 

      We agree with the reviewer that our study does not identify the mechanism by which paclitaxel treatment alters sphingolipid concentrations in the plasma of patients. It has been reported before that paclitaxel may increase expression and activity of serine palmitoyltransferase (SPT), which is the crucial enzyme and rate-limiting step in the de novo synthesis of sphingolipids. This may be associated with a shift towards increased synthesis of 1-deoxysphingolipids and a decrease of “classical” sphingolipids (6) and may explain the general reduction of SA1P and other sphingolipid levels after paclitaxel treatment in our study.

      It is also conceivable that paclitaxel reduces the release of sphingolipids into the plasma. Paclitaxel is a microtubule stabilizing agent (7) that may interfere with intracellular transport processes and release of paracrine mediators. 

      The mechanistic details of paclitaxel involvement in sphingolipid metabolism or transport are highly interesting but identifying them is beyond the scope of our manuscript.

      If elevated SA1P is associated with neuropathy development, it would be expected to increase in those who develop neuropathy from pre to post-treatment time points. 

      There is a general trend of reduced plasma SA1P concentrations following paclitaxel treatment. Nevertheless, patients experiencing neuropathy exhibit significantly elevated SA1P levels post-treatment. 

      A preclinical study has shown that paclitaxel-induced neuropathic pain requires activation of the S1P1 receptor (8). Moreover, a meta-analysis of genome-wide association studies (GWAS) from two clinical cohorts identified multiple regulatory elements and increased activity of S1PR1 associated with paclitaxel-induced neuropathy (9). These data imply that enhanced S1P receptor activity and signaling are key drivers of paclitaxel-induced neuropathy. It seems that increased levels of the sphingolipid ligands, in combination with enhanced expression and activity of S1P receptors, can potentiate paclitaxel-induced neuropathy in patients. This explains why even decreased SA1P concentrations after paclitaxel treatment can still enhance neuropathy via the S1PR-TRPV1 axis in sensory neurons.

      We added this paragraph to the discussions section of our manuscript.

      Primary sensory neuron cultures were used to examine the effects of SA1P application.

      SA1P application produced calcium transients in a small proportion of sensory neurons. It is not clear how this experimental model assists in validating the role of SA1P in neuropathy development as there is no assessment of sensory neuron damage or other hallmarks of peripheral neuropathy. These results demonstrate that some sensory neurons respond to SA1P and that this activity is linked to TRPV1 receptors. However, further studies will be required to determine if this is mechanistically related to neuropathy.

      As we detected elevated levels of SA1P in the plasma of PIPN patients, we can assume higher concentrations in the vicinity of sensory neurons. These neurons are the main drivers for neuropathy and neuropathic pain and are strongly affected by paclitaxel in their activity (10-15). Also, TRPV1 shows altered activity patterns in response to paclitaxel treatment (16). Because of its relevance for nociception and pathological pain, TRPV1 activity is a suitable and representative readout for pathological pain states in peripheral sensory neurons (17, 18), which is why we investigated them.

      We would like to point out the potency of SA1P to increase capsaicin-induced calcium transients in sensory neurons at submicromolar concentrations.

      We also agree with the reviewer that further studies need to investigate the underlying mechanisms in more detail. We added this sentence to the final paragraph in the discussion section of our manuscript.

      Weaknesses Reviewer 2: 

      The article is poorly written, hindering a clear understanding of core results. While the study's goals are apparent, the interpretation of sphingolipids, particularly SA1P, as key mediators of paclitaxel-induced neuropathy lacks robust evidence. 

      We agree that the relevance of SA1P as key mediator of paclitaxel-induced neuropathy might be overstated and changed the wording throughout the manuscript accordingly. However, we would like to point out the potency of this lipid to increase capsaicin-induced calcium-transients in sensory neurons at submicromolar concentrations. 

      Also, the lipid signature in the plasma of PIPN patients shows a unique pattern and sphingolipids are the group that showed the strongest alterations when comparing the patient groups. We also measured eicosanoids, such as prostaglandins, linoleic acid metabolites, endocannabinoids and other lipid groups that have previously been associated with influences on pain perception or nociceptor sensitization. However, none of these lipids showed significant differences in their concentrations in patient plasma. This is why we consider sphingolipids as contributors to or markers of paclitaxel-induced neuropathy in patients.

      We also revised the entire article to improve its clarity.

      The introduction fails to establish the significance of general neuropathy or peripheral neuropathy in anticancer drug-treated patients, and crucial details, such as the percentage of patients developing general neuropathy or peripheral neuropathy, are omitted. This omission is particularly relevant given that only around 50% of patients developed neuropathy in this study, primarily of mild Grade 1 severity with negligible symptoms, contradicting the study's assertion of CIPN as a significant side effect. 

      As we already described in the introduction, CIPN is a serious dose- and therapy-limiting side effect, which affects up to 80% of treated patients. This depends on dose and combination of chemotherapeutic agents. For paclitaxel, therapeutic doses range from 80 – 225 mg/m². As CIPN symptoms are dose-dependent, the number of PIPN patients that receive a high paclitaxel dose is higher than the number of PIPN patients receiving a low dose.

      In our study, we mainly used low-dose paclitaxel, because this therapeutic regimen is the most widely used paclitaxel monotherapy. From previous studies, the expected occurrence of neuropathy with this therapeutic regimen is around 50-70%, and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3).

      Our results are within the range reported by these studies (54.9% patients with neuropathy). Also, as we highlight in Table S1, the neuropathy symptoms persist in most cases for several years after chemotherapy, affecting quality of life of these patients which makes it far from being a negligible symptom.

      We added some more information concerning PIPN in the introduction section in which we emphasize the clinical problem.

      The lack of clarity in distinguishing results obtained by lipidomics using machine learning methods and conventional methods adds to the confusion. The poorly written results section fails to specify SA1P's downregulation or upregulation, and the process of narrowing down to sphingolipids and SA1P is inadequately explained. 

      We have tried to keep the machine learning part in the main manuscript short and moved major parts of it to a supplement. However, as this has been claimed to have led to a lack of clarity, we have expanded the description of the data analysis and added extensive explanations and supporting references for the mixture-of-experts approach that was used throughout the analysis. We hope this is now clear.

      Integrating a significant portion of the discussion section into the results section could enhance clarity. An explanation of the utility of machine learning in classifying patient groups over conventional methods and the citation of original research articles, rather than relying on review articles, may also add clarity to the usefulness of the study. 

      As suggested by the reviewer, we moved the relevant parts from the discussion to the results section in the revised version of our manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      Figure 2 should be better explained or removed. In its current form, it does not add to the interpretation of the manuscript.  

      As mentioned above, we have expanded the description of the ESOM/U-matrix method in the Methods section and rewritten the figure legend. In addition, we have annotated the U-matrix in the figure. The method has been reported extensively in the computer science and biomedical literature, and a more detailed description in the referenced papers would go beyond the current focus on lipidomics. However, we believe that this discussion is sufficiently detailed for the readers of this report: "… a second unsupervised approach was used to verify the agreement between the lipidomics data structure and the prior classification, implemented as self-organizing maps (SOM) of artificial neurons (19). In the special form of an “emergent” SOM (ESOM (20)), the present map consisted of 4,000 neurons arranged on a two-dimensional toroidal grid with 50 rows and 80 columns (21, 22). ESOM was used because it has been repeatedly shown to correctly detect subgroup structures in biomedical data sets comparable to the present one (20, 22, 23). The core principle of SOM learning is to adjust the weights of neurons based on their proximity to input data points. In this process, the best matching unit (BMU) is identified as the neuron closest to a given data point. The adaptation of the weights is determined by a learning rate (η) and a neighborhood function (h), both of which gradually decrease during the learning process. Finally, the groups are projected onto separate regions of the map. On top of the trained ESOM, the distance structure in the high-dimensional feature space was visualized in the form of a so-called U-matrix (24) which is the canonical tool for displaying the distance structures of input data on ESOM (21). 

      The visual presentation facilitates data group separation by displaying the distances between BMUs in high-dimensional space in a color-coding that uses a geographical map analogy, where large "heights" represent large distances in feature space, while low "valleys" represent data subsets that are similar. "Mountain ranges" with "snow-covered" heights visually separate the clusters in the data. Further details about ESOM can be found in (24)."
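
      The SOM learning principle quoted above (find the best matching unit, then pull neuron weights toward the data point according to a learning rate η and a neighborhood function h on a toroidal grid) can be sketched as a single update step. This is a generic illustration, not the authors' ESOM implementation: the Gaussian form of h, the dictionary layout of the weights, and all function names are assumptions, and the demonstration uses a small 5 × 8 grid rather than the 50 × 80 grid of the paper.

```python
import math
import random

ROWS, COLS = 50, 80  # grid size quoted in the text (4,000 neurons)


def toroidal_distance(a, b, rows=ROWS, cols=COLS):
    """Grid distance between neurons a=(row, col) and b, wrapping at the edges."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr = min(dr, rows - dr)
    dc = min(dc, cols - dc)
    return math.hypot(dr, dc)


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda pos: math.dist(weights[pos], x))


def som_step(weights, x, eta, radius, rows, cols):
    """One update: move every neuron toward x, scaled by eta and neighborhood h."""
    bmu = best_matching_unit(weights, x)
    for pos, w in weights.items():
        d = toroidal_distance(pos, bmu, rows, cols)
        h = math.exp(-(d * d) / (2 * radius * radius))  # assumed Gaussian h
        weights[pos] = [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu


# Small demonstration grid with random three-dimensional weight vectors.
rng = random.Random(0)
rows, cols, dim = 5, 8, 3
weights = {
    (r, c): [rng.random() for _ in range(dim)]
    for r in range(rows)
    for c in range(cols)
}
x = [0.2, 0.9, 0.4]  # hypothetical input data point

before = min(math.dist(w, x) for w in weights.values())
bmu = som_step(weights, x, eta=0.5, radius=2.0, rows=rows, cols=cols)
after = math.dist(weights[bmu], x)
print(f"BMU {bmu}: distance to input {before:.3f} -> {after:.3f}")
```

      In full training, this step would be repeated over all data points for many epochs while η and the radius are gradually decreased, as the quoted passage describes.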

      The second patient cohort is only included in the discussion - with cohort details in the supplementary material and figures included in the main text. Perhaps these data should be removed entirely. The findings are described as trends and not statistically significant and multiple issues with this second cohort are mentioned in the discussion. 

      We agree with the reviewer that including the second patient cohort in the discussion is inadequate. Of course, there are differences between the patient cohorts that do not allow direct comparison and that are highlighted in the section on limitations of the study. However, we still think it is interesting and relevant to show these data, because we used our algorithms trained on the first patient cohort to analyze the second cohort. And these data support the main results. 

      We therefore moved the entire paragraph to the results section of our manuscript to improve coherence. The passage was introduced with the subheading: “Support of the main results in an independent second patient cohort”.

      The title does not reflect the content of the paper and should be changed to better reflect the content and its significance. 

      We changed the title to “Machine learning and biological validation identify sphingolipids as potential mediators of paclitaxel-induced neuropathy in cancer patients” to avoid overstating the results as suggested by the Reviewer.

      Further, the discussion should be modified to avoid overstating the results. 

      As the reviewer suggests, we changed the wording to avoid overstating the results. 

      Reviewer #2 (Recommendations For The Authors): 

      Please address the absence of clear neuropathy in the majority of patients after treatment with paclitaxel in your discussion. 

      As stated above, occurrence and grade of the neuropathy are in line with the results from previous studies. From these studies, the expected occurrence of neuropathy with our therapeutic regimen is around 50-70%, (the variability is due to differences in the assessment methods) and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3). 

      We added this information in the discussion section of the revised manuscript.

      Line 65: Kindly replace review articles with original research articles for proper citation. 

      We replaced the review articles with original publications, focusing on clinical observations. We added the following publications: Jensen et al., Front Neurosci 2020; Chen et al., Neurobiol Aging 2018; Igarashi et al., J Alzheimers Dis. 2011; Kim et al., Oncotarget 2017 as references 17-20 in the revised version of our manuscript.

      Line 260: The mention of SA1P is introduced here without prior reference (do not use words like "again", or "see above", if it is not previously mentioned). Adjust the text for coherence.

      We agree with the reviewer that the introduction of SA1P in this passage is incoherent. We replaced the sentence in line 260 with:

      “The small set of lipid mediators emerging from all three methods as informative for neuropathy included the sphingolipid sphinganine-1-phosphate (SA1P), also known as dihydrosphingosine-1-phosphate (DH-S1P)…”

      Lines 301-315: Consider relocating several lines from this section to the results section for improved clarity. 

      We moved lines 309-312, which explain the algorithm selection and validation success, to the corresponding results section (Lipid mediators informative for assigning post-paclitaxel therapy samples to neuropathy).

      Lines 382-396: Move this content to the results section to enhance the organization and coherence of the manuscript. 

      We moved the entire paragraph to the results section of our manuscript to improve coherence. The passage is introduced with the subheading: “Support of the main results in an independent second patient cohort”.

      References

      (1) Barginear M, Dueck AC, Allred JB, Bunnell C, Cohen HJ, Freedman RA, et al. Age and the Risk of Paclitaxel-Induced Neuropathy in Women with Early-Stage Breast Cancer (Alliance A151411): Results from 1,881 Patients from Cancer and Leukemia Group B (CALGB) 40101. Oncologist. 2019;24(5):617-23.

      (2) Mauri D, Kamposioras K, Tsali L, Bristianou M, Valachis A, Karathanasi I, et al. Overall survival benefit for weekly vs. three-weekly taxanes regimens in advanced breast cancer: A meta-analysis. Cancer Treat Rev. 2010;36(1):69-74.

      (3) Budd GT, Barlow WE, Moore HC, Hobday TJ, Stewart JA, Isaacs C, et al. SWOG S0221: a phase III trial comparing chemotherapy schedules in high-risk early-stage breast cancer. J Clin Oncol. 2015;33(1):58-64.

      (4) Lötsch J, and Ultsch A. Pitfalls of Using Multinomial Regression Analysis to Identify Class-Structure-Relevant Variables in Biomedical Data Sets: Why a Mixture of Experts (MOE) Approach Is Better. BioMedInformatics. 2023;3(4):869-84.

      (5) Kruskal WH, and Wallis WA. Use of Ranks in One-Criterion Variance Analysis. J Am Stat Assoc. 1952;47(260):583-621.

      (6) Kramer R, Bielawski J, Kistner-Griffin E, Othman A, Alecu I, Ernst D, et al. Neurotoxic 1-deoxysphingolipids and paclitaxel-induced peripheral neuropathy. FASEB J. 2015;29(11):4461-72.

      (7) Field JJ, Diaz JF, and Miller JH. The binding sites of microtubule-stabilizing agents. Chem Biol. 2013;20(3):301-15.

      (8) Janes K, Little JW, Li C, Bryant L, Chen C, Chen Z, et al. The development and maintenance of paclitaxel-induced neuropathic pain require activation of the sphingosine 1-phosphate receptor subtype 1. J Biol Chem. 2014;289(30):21082-97.

      (9) Chua KC, Xiong C, Ho C, Mushiroda T, Jiang C, Mulkey F, et al. Genomewide Meta-Analysis Validates a Role for S1PR1 in Microtubule Targeting Agent-Induced Sensory Peripheral Neuropathy. Clin Pharmacol Ther. 2020;108(3):625-34.

      (10) Kawakami K, Chiba T, Katagiri N, Saduka M, Abe K, Utsunomiya I, et al. Paclitaxel increases high voltage-dependent calcium channel current in dorsal root ganglion neurons of the rat. J Pharmacol Sci. 2012;120(3):187-95.

      (11) Pittman SK, Gracias NG, Vasko MR, and Fehrenbacher JC. Paclitaxel alters the evoked release of calcitonin gene-related peptide from rat sensory neurons in culture. Exp Neurol. 2013.

      (12) Luo H, Liu HZ, Zhang WW, Matsuda M, Lv N, Chen G, et al. Interleukin-17 Regulates Neuron-Glial Communications, Synaptic Transmission, and Neuropathic Pain after Chemotherapy. Cell Rep. 2019;29(8):2384-97 e5.

      (13) Pease-Raissi SE, Pazyra-Murphy MF, Li Y, Wachter F, Fukuda Y, Fenstermacher SJ, et al. Paclitaxel Reduces Axonal Bclw to Initiate IP3R1-Dependent Axon Degeneration. Neuron. 2017;96(2):373-86 e6.

      (14) Duggett NA, Griffiths LA, and Flatters SJL. Paclitaxel-induced painful neuropathy is associated with changes in mitochondrial bioenergetics, glycolysis, and an energy deficit in dorsal root ganglia neurons. Pain. 2017.

      (15) Li Y, Adamek P, Zhang H, Tatsui CE, Rhines LD, Mrozkova P, et al. The Cancer Chemotherapeutic Paclitaxel Increases Human and Rodent Sensory Neuron Responses to TRPV1 by Activation of TLR4. J Neurosci. 2015;35(39):13487-500.

      (16) Hara T, Chiba T, Abe K, Makabe A, Ikeno S, Kawakami K, et al. Effect of paclitaxel on transient receptor potential vanilloid 1 in rat dorsal root ganglion. Pain. 2013;154(6):882-9.

      (17) Jardin I, Lopez JJ, Diez R, Sanchez-Collado J, Cantonero C, Albarran L, et al. TRPs in Pain Sensation. Front Physiol. 2017;8:392.

      (18) Julius D. TRP Channels and Pain. Annu Rev Cell Dev Biol. 2013;29:355-84.

      (19) Kohonen T. Self-Organized Formation of Topologically Correct Feature Maps. Biol Cybern. 1982;43(1):59-69.

      (20) Lötsch J, Lerch F, Djaldetti R, Tegeder I, and Ultsch A. Identification of disease-distinct complex biomarker patterns by means of unsupervised machine-learning using an interactive R toolbox (Umatrix). Big Data Analytics. 2018;3(1):5.

      (21) Ultsch A. 2003.

      (22) Lötsch J, Geisslinger G, Heinemann S, Lerch F, Oertel BG, and Ultsch A. Quantitative sensory testing response patterns to capsaicin- and ultraviolet-B-induced local skin hypersensitization in healthy subjects: a machine-learned analysis. Pain. 2018;159(1):11-24.

      (23) Lötsch J, Thrun M, Lerch F, Brunkhorst R, Schiffmann S, Thomas D, et al. Machine-Learned Data Structures of Lipid Marker Serum Concentrations in Multiple Sclerosis Patients Differ from Those in Healthy Subjects. Int J Mol Sci. 2017;18(6).

      (24) Lötsch J, and Ultsch A. Cham: Springer International Publishing; 2014:249-57.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Wu et al. introduce a novel approach to reactivate the Müller glia cell cycle in the mouse retina by simultaneously reducing p27Kip1 and increasing cyclin D1 using a single AAV vector. The approach effectively promotes Müller glia proliferation and reprogramming without disrupting retinal structure or function. Interestingly, reactivation of the Müller glia cell cycle downregulates the IFN pathway, which may contribute to the induced retinal regeneration. The results presented in this manuscript may offer a promising approach for developing Müller glia cell-mediated regenerative therapies for retinal diseases.

      Strengths:

      The data are convincing and supported by appropriate, validated methodology. These results are both technically and scientifically exciting and are likely to appeal to retinal specialists and neuroscientists in general.

      Weaknesses:

      There are some data gaps that need to be addressed.

      (1) Please label the time points of AAV injection, EdU labeling, and harvest in Figure 1B.

      We thank the reviewer for highlighting the lack of clarity in our experimental design. We will label all experiment timelines in the figures where appropriate in the revised version.

      (2) What fraction of Müller cells were transduced by AAV under the experimental conditions?

      We apologize for not clearly conveying the transduction efficiency. The retinal region adjacent to the injection site, typically near the central retina, exhibits a transduction efficiency of nearly 100%. In contrast, the peripheral retina shows a lower transduction efficiency compared to the central region. We will include the quantification of AAV transduction efficiency in the revised manuscript.

      The quantification of EdU+ MG and other markers was conducted in the area with the highest transduction efficiency.

      (3) It seems unusually rapid for MG proliferation to begin as early as the third day after CCA injection. Can the authors provide evidence for cyclin D1 overexpression and p27 Kip1 knockdown three days after CCA injection?

      In our pilot study, we tested the onset time of GFP expression from AAV-GFAP-GFP following intravitreal injection. We observed GFP expression in MG as early as two days post-infection. These findings will be included in the revised manuscript. Additionally, we plan to perform qPCR or Western blot analysis to confirm cyclin D1 overexpression and p27kip1 knockdown at the onset of Müller glia proliferation, which will also be included in the revised manuscript.

      (4) The authors reported that MG proliferation largely ceased two weeks after CCA treatment. While this is an interesting finding, the explanation that it might be due to the dilution of AAV episomal genome copies in the dividing cells seems far-fetched.

      We believe that the lack of durability in high Cyclin D1 and low p27kip1 levels in MG contributes to the cessation of their proliferation. A potential reason for the loss of high Cyclin D1 overexpression and p27kip1 knockdown during MG proliferation could be the dilution of the AAV episomal genome. However, testing this hypothesis is challenging. Instead, we plan to provide direct evidence in the revised manuscript by examining the levels of Cyclin D1 and p27kip1 in the retina treated with CCA before and after the peak of MG proliferation.

      Reviewer #2 (Public Review):

      This manuscript by Wu, Liao et al. reports that simultaneous knockdown of P27Kip1 with overexpression of Cyclin D can stimulate Müller glia to re-enter the cell cycle in the mouse retina. There is intense interest in reprogramming mammalian Müller glia into a source for neurogenic progenitors, in the hopes that these cells could be a source for neuronal replacement in neurodegenerative diseases. Previous work in the field has shown ways in which mouse Müller glia can be neurogenically reprogrammed and these studies have shown cell cycle re-entry prior to neurogenesis. In other works, typically, the extent of glial proliferation is limited, and the authors of this study highlight the importance of stimulating large numbers of Müller glia to re-enter the cell cycle with the hopes they will differentiate into neurons. While the evidence for stimulating proliferation in this study is convincing, the evidence for neurogenesis in this study is not convincing or robust, suggesting that stimulating cell cycle re-entry may not be associated with increasing regeneration without another proneural stimulus.

      Below are concerns and suggestions.

      Intro:

      (1) The authors cite past studies showing "direct conversion" of MG into neurons. However, these studies (PMID: 34686336; 36417510) show EdU+ MG-derived neurons suggesting cell cycle re-entry does occur in these strategies of proneural TF overexpression.

      We thank the reviewer for pointing this out. We will revise the statement to "MG neurogenesis," which encompasses both direct conversion and Müller glia proliferation followed by neuronal differentiation.

      (2) Multiple citations are incorrectly listed, using the authors' first names only (i.e., Yumi et al.; Levi et al.). Studies are also incompletely referenced in the references.

      We apologize for the mistakes in the references. We will fix them in the revised version.

      Figure 1:

      (3) When are these experiments ending? On Figure 1B it says "analysis" on the end of the paradigm without an actual day associated with this. This is the case for many later figures too. The authors should update the paradigms to accurately reflect experimental end points.

      We thank the reviewer for highlighting the lack of clarity in our experimental design. We will label all experiment timelines in the figures where appropriate in the revised version.

      (4) Are there better representative pictures between P27kd and CyclinD OE, the EdU+ counts say there is a 3 fold increase between Figure 1D&E, however the pictures do not reflect this. In fact, most of the Edu+ cells in Figure 1E don't seem to be Sox9+ MG but rather horizontally oriented nuclei in the OPL that are likely microglia.

      We thank the reviewer for pointing this out. We will replace the Cyclin D1 image with a more representative one.

      (5) Is the infection efficacy of these viruses different between different combinations (i.e. CyclinD OE vs. P27kd vs. control vs. CCA combo)? As the counts are shown in Figure 1G only Sox9+/Edu+ cells are shown not divided by virus efficacy. If these are absolute counts blind to where the virus is and how many cells the virus hits, if the virus efficacy varies in efficiency this could drive absolute differences that aren't actually biological.

      Because the AAV-GFAP-Cyclin D1 and AAV-GFAP-Cyclin D1-p27kip1 shRNA viruses do not carry a fluorescent reporter gene, we cannot easily measure viral efficacy in the same experiment. We believe that variations in viral efficacy cannot account for the significant differences in MG proliferation for two reasons: 1) We injected the same titer for all three viruses, and 2) Viral infection efficacy is very high, approaching 100% in the central retina. Nonetheless, to rule out the possibility that the differences in MG proliferation among the Cyclin D overexpression, p27kip1 knockdown, and CCA groups are due to variations in viral efficacy, we will include the p27kip1 knockdown and Cyclin D1 overexpression efficiencies for all four groups using qPCR and/or Western blot analysis in the revised manuscript.

      (6) According to the Jax laboratories, mice aren't considered aged until they are over 18months old. While it is interesting that CCA treatment does not seem to lose efficacy over maturation I would rephrase the findings as the experiment does not test this virus in aged retinas.

      We thank the reviewer for bringing this to our attention. We will avoid using “aged mice” in our revised manuscript.

      (7) Supplemental Figure 2c-d. These viruses do not hit 100% of MG, however 100% of the P27Kip staining is gone in the P27sh1 treatment, even the P27+ cell in the GCL that is likely an astrocyte has no staining in the shRNA 1 picture. Why is this?

      For Supplementary Figure 2c-d, we focused on the central area where knockdown efficiency was high, approaching 100%. We will replace this image with one that includes both high and low Müller glia transduction efficiency regions, clearly demonstrating the complete loss of p27kip1 staining in the area of high transduction efficiency.

      Figure 2

      (8) Would you expect cells to go through two rounds of cell cycle in such a short time? The treatment of giving Edu then BrdU 24 hours later would have to catch a cell going through two rounds of division in a very short amount of time. Again the end point should be added graphically to this figure.

      We thank the reviewer for raising this important point. While the typical cell cycle time for human cells is approximately 24 hours, we hypothesized that 24 hours would be the most likely timepoint to capture cells continuously progressing through the cell cycle. However, we acknowledge that we cannot exclude the possibility of some cells entering a second cell cycle at much later timepoints.

      In the revised manuscript, we will carefully qualify our conclusion to state that the majority of MG do not immediately undergo another cell division, rather than making a definitive statement. This more cautious phrasing will better reflect the limitations of the 24-hour timepoint and allow for the potential of a small subset of cells proceeding through additional rounds of division at later stages.

      Figure 3

      (9) I am confused by the mixing of ratios of viruses to indicate infection success. I know mixtures of viruses containing CCA or control GFP or a control LacZ was injected. Was the idea to probe for GFP or LacZ in the single cell data to see which cells were infected but not treated? This is not shown anywhere?

      The virus infection was not uniform across the entire retina. To mark the infection hotspots, we added 10% GFP virus to the mixture. Regions of the retina with low infection efficiency were removed by dissection and excluded from the scRNA-seq analysis. We apologize for not clearly explaining this methodological detail in the original text, and will update the Methods section accordingly.

      (10) The majority of glia sorted from TdTomato are probably not infected with virus. Can you subset cells that were infected only for analysis? Otherwise it makes it very hard to make population judgements like Figure 3E-H if a large portion are basically WT glia.

      This question is related to the last one. Since the regions with high virus infection efficiency were selectively dissected and isolated for analysis, the percentage of CCA-infected MG should constitute the majority in the scRNA-seq data.

      (11) Figure 3C you can see Rho is expressed everywhere which is common in studies like this because the ambient RNA is so high. This makes it very hard to talk about "Rod-like" MG as this is probably an artifact from the technique. Most all scRNA-seq studies from MG-reprogramming have shown clusters of "rods" with MG hybrid gene expression and these had in the past just been considered an artifact.

      We agree that the low levels of Rho in other MG clusters (such as quiescent, reactivated, and proliferating MG) are likely due to RNA contamination. However, the level of Rho in the rod-like MG is significantly higher than in the other clusters, indicating that this is unlikely to be solely due to contamination.

      As shown in Supplementary Figure 7A-C, a cluster of MG-rod hybrid cells (cluster C4) was present in all three experimental groups at similar ratios, and this hybrid cluster was excluded from further analysis. In contrast, the rod-like Müller glia (cluster C3) were predominantly found in the CCA and CCANT groups, suggesting a genuine response to CCA treatment.

      Furthermore, we will conduct Rho and Gnat1 RNA in situ hybridization on the dissociated retinal cells to further support the conclusion that rod-specific genes are upregulated in a subset of MG in the revised manuscript.

      (12) It is mentioned the "glial" signature is downregulated in response to CCA treatment. Where is this shown convincingly? Figure H has a feature plot of Glul , which is not clear it is changed between treatments. Otherwise MG genes are shown as a function of cluster not treatment.

      We will add box plots of several MG-specific genes to better illustrate the downregulation of the glial signature in the relevant cell cluster in the revised manuscript.

      Figure 4

      (13) The authors should be commended for being very careful in their interpretations. They employ the proper controls (Er-Cre lineage tracing/EdU-pulse chasing/scRNA-seq omics) and were very careful to attempt to see MG-derived rods. This makes the conclusion from the FISH perplexing. The few puncta dots of Rho and GNAT in MG are not convincing to this reviewer, Rho and GNAT dots are dense everywhere throughout the ONL and if you drew any random circle in the ONL it would be full of dots. The rigor of these counts also comes into question because some dots are picked up in MG in the INL even in the control case. This is confusing because baseline healthy MG do not express RNA-transcripts of these Rod genes so what is this picking up? Taken together, the conclusion that there are Rod-like MG are based off scRNA-seq data (which is likely ambient contamination) and these FISH images. I don't think this data warrants the conclusion that MG upregulate Rod genes in response to CCA.

      We performed RNA in situ hybridization on retinal sections because we aimed to correlate cell localization with rod gene expression. We understand the reviewer’s concern that the punctate signals of Rho and GNAT1 in the ONL MG may actually originate from neighboring rods. In the revised manuscript, we will conduct RNAscope on dissociated retinal cells to avoid this issue.

      Figure 5

      (14) Similar point to above but this Glul probe seems odd, why is it throughout the ONL but completely dark through the IPL, this should also be in astrocytes can you see it in the GCL? These retinas look cropped at the INL where below is completely black. The whole retinal section should be shown. Antibodies exist to GS that work in mouse along with many other MG genes, IHC or western blots could be done to better serve this point.

      Indeed, the GCL was cropped out in Figure 5A-B. We have other images showing all retinal layers, which we will use in the revised manuscript. Additionally, we will perform GS antibody staining to demonstrate partial MG dedifferentiation following CCA treatment.

      Figure 6

      (15) Figure 6D is not a co-labeled OTX2+/ TdTomato+ cell, Otx2 will fill out the whole nucleus as can be seen with examples from other MG-reprogramming papers in the field (Hoang, et al. 2020; Todd, et al. 2020; Palazzo, et al. 2022). You can clearly see in the example in Figure 6D the nucleus extending way beyond Otx2 expression as it is probably overlapping in space. Other examples should be shown, however, considering less than 1% of cells were putatively Otx2+, the safer interpretation is that these cells are not differentiating into neurons. At least 99.5% are not.

      We have additional examples of Otx2+ Tdt+ Edu+ cells, which suggest that MG neurogenesis to Otx2+ cells does occur, despite the low efficiency. We will include these images in the revised manuscript.

      (16) Same as above Figure 6I is not convincingly co-labeled HuC/D is an RNA-binding protein and unfortunately is not always the clearest stain but this looks like background haze in the INL overlapping. Other amacrine markers could be tested, but again due to the very low numbers, I think no neurogenesis is occurring.

      We have additional examples of HuC/D+ Tdt+ Edu+ cells, which we will show in the revised manuscript.

      (17) In the text the authors are accidentally referring to Figure 6 as Figure 7.

      We thank the reviewer for pointing out the mistake. We will correct the mistake in the revised manuscript.

      Figure 7

      (18) I like this figure and the concept that you can have additional MG proliferating without destroying the retina or compromising vision. This is reminiscent of the chick MG reprogramming studies in which MG proliferate in large numbers and often do not differentiate into neurons yet still persist de-laminated for long time points.

      General:

      (19) The title should be changed, as I don't believe there is any convincing evidence of regeneration of neurons. Understanding the barriers to MG cell-cycle re-entry are important and I believe the authors did a good job in that respect, however it is an oversell to report regeneration of neurons from this data.

      We thank the reviewer for the suggestion. We will consider changing the title in the revised manuscript.

      (20) This paper uses multiple mouse lines and it is often confusing when the text and figures switch between models. I think it would be helpful to readers if the mouse strain was added to graphical paradigms in each figure when a different mouse line is employed.

      We will label the mouse lines used in each experiment in the figures where appropriate.

    1. Reviewer #1 (Public Review):

      Summary:

      In this paper the researchers aimed to address whether bees causally understand string-pulling through a series of experiments. I first briefly summarize what they did:

      - In experiment 1, the researchers trained bees without string and then presented them with flowers in the test phase that either had connected or disconnected strings, to determine what their preference was without any training. Bees did not show any preference.

      - In experiment 2, bees were trained to have experience with string and then tested on their choice between connected vs. disconnected string.

      - Experiment 3 was similar except that instead of having one option which was an attached string broken in the middle, the string was completely disconnected from the flower.

      - In experiment 4, bees were trained on green strings and tested on white strings to determine if they generalize across color.

      - In experiment 5, bees were trained on blue strings and tested on white strings.

      - In experiment 6, bees were trained where black tape covered the area between the string and the flower (i.e. so they would not be able to see/ learn whether it was connected or disconnected).

      - In experiments 2-6, bees chose the connected string in the test phase.

      - In experiment 7, bees were trained as in expt 3 and then tested where string was either disconnected or coiled i.e. still being 'functional' but appearing different.

      - In experiment 8, bees were trained as before and then tested on string that was in a different coiled orientation, either connected or disconnected.

      - In experiments 7 and 8 the bees showed no preference.

      Strengths:

      I appreciate the amount of work that has gone into these experiments and think they are a nice, thorough set of experiments. I enjoyed reading the paper and felt that it was overall well-written and clear. I think experiment 1 shows that bees do not have an untrained understanding of the function of the string in this context. The rest of the experiments indicate that with training, bees have a preference for unbroken over broken string and likely use visual cues learned during training to make this choice. They also show that as in other contexts, bees readily generalize across different colors.

      The 'weaknesses' that I previously listed were dealt with by the authors in the revised version of the manuscript. I think the only point that we disagreed on was relating to the ecological relevance of the task to the bees.

      Here is my previous comment:

      I think the paper would be made stronger by considering the natural context in which the bee performs this behavior. Bees manipulate flowers in all kinds of contexts, and scrabble with their legs to achieve nectar rewards. Rather than thinking that it is pulling a string, my guess would be that the bee learns that a particular motor pattern within their usual foraging repertoire (scrabbling with legs), leads to a reward. I don't think this makes the behavior any less interesting - in fact, I think considering the behavior through an ecological lens can help make better sense of it.

      The authors disagreed, writing the following:

      "Here we respectfully disagree. The solving of Rubik's cube by humans could be said to be version of finger movements naturally required to open nuts or remove ticks from fur, but this is somewhat beside the point: it's not the motor sequences that are of interest, but the cognition involved. A general approach in work on animal intelligence and cognition is to deliberately choose paradigms that are outside the animals' daily routines; this is what we have done here, in asking whether there is means-end comprehension in bee problem solving. Like comparable studies on this question in other animals, the experiments are designed to probe this question, not one of ecological validity."

      I think the difference would be that humans know that they are doing a Rubik's cube whereas I do not think that the bee knows that it is pulling string; I think the bee thinks that it is foraging on a flower. Therefore, I stand by my statement that I think it's worth considering what the bee is experiencing in this task and how it relates to what it would be doing while foraging. I think that as animal cognition researchers we can design tasks that are distinct from what the animal would naturally encounter to ask specific questions about what they are thinking, but that we can never remove the ecological context since the animal will always be viewing the task through that lens. However, I think this may be a philosophical difference in opinion and I am happy with the manuscript as it stands.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the researchers aimed to address whether bees causally understand string-pulling through a series of experiments. I first briefly summarize what they did:

      - In experiment 1, the researchers trained bees without string and then presented them with flowers in the test phase that either had connected or disconnected strings, to determine what their preference was without any training. Bees did not show any preference.

      - In experiment 2, bees were trained to have experience with string and then tested on their choice between connected vs. disconnected string.

      - Experiment 3 was similar except that instead of having one option which was an attached string broken in the middle, the string was completely disconnected from the flower.

      - In experiment 4, bees were trained on green strings and tested on white strings to determine if they generalize across color.

      - In experiment 5, bees were trained on blue strings and tested on white strings.

      - In experiment 6, bees were trained where black tape covered the area between the string and the flower (i.e. so they would not be able to see/ learn whether it was connected or disconnected).

      - In experiments 2-6, bees chose the connected string in the test phase.

      - In experiment 7, bees were trained as in experiment 3 and then tested where the string was either disconnected or coiled i.e. still being 'functional' but appearing different.

      - In experiment 8, bees were trained as before and then tested on a string that was in a different coiled orientation, either connected or disconnected.

      - In experiments 7 and 8 the bees showed no preference.

      Strengths:

      I appreciate the amount of work that has gone into this study and think it contains a nice, thorough set of experiments. I enjoyed reading the paper and felt that overall it was well-written and clear. I think experiment 1 shows that bees do not have an untrained understanding of the function of the string in this context. The rest of the experiments indicate that with training, bees have a preference for unbroken over broken string and likely use visual cues learned during training to make this choice. They also show that as in other contexts, bees readily generalize across different colors.

      Weaknesses:

      (1) I think there are 2 key pieces of information that can be taken from the test phase - the bees' first choice and then their behavior across the whole test. I think the first choice is critical in terms of what the bee has learned from the training phase - then their behavior from this point is informed by the feedback they obtain during the test phase. I think both pieces of information are worth considering, but their behavior across the entire test phase is giving different information than their first choice, and this distinction could be made more explicit. In addition, while the bees' first choice is reported, no statistics are presented for their preferences.

      We agree with the reviewer that the first choice is critical in terms of what the bumblebees learned during the training phase. We report the bees’ first choices in Table 1, and we added videos of the tests. Both the connected and disconnected strings were glued to the floor, so the bees were unable to move either string, which prevented learning during the tests. We added the data for each bee’s individual choices in the Supplementary table.
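
      One way to attach a statistic to a first-choice preference of this kind is an exact two-sided binomial test against chance (50%). The following is only a minimal sketch, not the analysis reported in the manuscript: the counts (16 of 20 bees choosing the connected string first) are hypothetical, and the function name `binomial_two_sided_p` is an assumption introduced for illustration.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the null probabilities of every outcome that is no more likely
    than the observed one (the usual 'small p-values' definition).
    """
    # Probability of each possible number of successes under the null.
    pmf = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small relative tolerance so ties are not lost to rounding error.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical data: 16 of 20 bees chose the connected string first.
p_value = binomial_two_sided_p(successes=16, trials=20)
print(f"two-sided p = {p_value:.4f}")
```

      With small samples, an exact test of this kind is generally preferable to a normal approximation, since it makes no large-sample assumption.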

      (2) It seemed to me that the bees might not only be using visual feedback but also motor feedback. This would not explain their behavior in the first test choice, but could explain some of their subsequent behavior. For example, bees might learn during training that there is some friction/weight associated with pulling the string, but in cases where the string is separated from the flower, this would presumably feel different to the bee in terms of the physical feedback it is receiving. I'd be interested to see some of these test videos (perhaps these could be shared as supplementary material, in addition to the training videos already uploaded), to see what the bees' behavior looks like after they attempt to pull a disconnected string.

      We added supplementary videos of the testing phase. As noted in General Methods, both connected and disconnected strings were glued to the floor to prevent the air flow generated by flying bumblebees’ wings from changing the position of the string during the testing phase. The bees were unable to move either the connected or disconnected strings during the tests, and only attempted to pull them. Therefore, a difference in the friction/weight felt when pulling the two strings cannot be a factor in the test.

      (3) I think the statistics section needs to be made clearer (more in private comments).

      We changed the statistical analysis section as suggested by the reviewer.

      (4) I think the paper would be made stronger by considering the natural context in which the bee performs this behavior. Bees manipulate flowers in all kinds of contexts and scrabble with their legs to achieve nectar rewards. Rather than thinking that it is pulling a string, my guess would be that the bee learns that a particular motor pattern within their usual foraging repertoire (scrabbling with legs), leads to a reward. I don't think this makes the behavior any less interesting - in fact, I think considering the behavior through an ecological lens can help make better sense of it.

      Here we respectfully disagree. The solving of a Rubik’s cube by humans could be said to be a version of the finger movements naturally required to open nuts or remove ticks from fur, but this is somewhat beside the point: it’s not the motor sequences that are of interest, but the cognition involved. A general approach in work on animal intelligence and cognition is to deliberately choose paradigms that are outside the animals’ daily routines; this is what we have done here, in asking whether there is means-end comprehension in bee problem solving. Like comparable studies on this question in other animals, the experiments are designed to probe this question, not one of ecological validity.

      Reviewer #2 (Public Review):

      Summary:

      The authors wanted to see if bumblebees could succeed in the string-pulling paradigm with broken strings. They found that bumblebees can learn to pull strings and that they have a preference to pull on intact strings vs broken ones. The authors conclude that bumblebees use image matching to complete the string-pulling task.

      Strengths:

      The study has an excellent experimental design and contributes to our understanding of what information bumblebees use to solve a string-pulling task.

      Weaknesses:

      Overall, I think the manuscript is good, but it is missing some context. Why do bumblebees rely on image matching rather than causal reasoning? Could it have something to do with their ecology? And how is the task relevant for bumblebees in the wild? Does the test translate to any real-life situations? Is pulling a natural behaviour that bees do? Does image matching have adaptive significance?

      We appreciate the valuable comment from the reviewer. Our explanation, which we have now added to the manuscript, is as follows:

      “Different flower species offer varying profitability in terms of nectar and pollen to bumblebees; they need to make careful choices and learn to use floral cues to predict rewards (Chittka, 2017). Bumblebees can easily learn visual patterns and shapes of flowers (Meyer-Rochow, 2019); they can detect stimuli and discriminate between differently coloured stimuli when presented as briefly as 25 ms (Nityananda et al., 2014). In contrast, causal reasoning involves understanding and responding to causal relationships. Bumblebees might favor, or be limited to, a visual approach, likely due to the efficiency and simplicity of processing visual cues to solve the string-pulling task.”

      As above, it is worth noting that our work is not designed as an ecological study, but as one about the question of whether causal reasoning can explain how bees solve a string-pulling puzzle. We have a cognitive focus, in line with comparable studies on other animals. We deliberately chose a paradigm that is to some extent outside of the daily challenges of the animal.

      Reviewer #3 (Public Review):

      Summary:

      This paper presents bees with varying levels of experience with a choice task where bees have to choose to pull either a connected or unconnected string, each attached to a yellow flower containing sugar water. Bees without experience of string pulling did not choose the connected string above chance (experiment 1), but with experience of horizontal string pulling (as in the right-hand panel of Figure 4) bees did choose the connected string above chance (experiments 2-3), even when the string colour changed between training and test (experiments 4-5). Bees that were not provided with perceptual-motor feedback (i.e they could not observe that each pull of the string moved the flower) during training still learned to string pull and then chose the connected string option above chance (experiment 6). Bees with normal experience of string pulling then failed to discriminate between connected and unconnected strings when the strings were coiled or looped, rather than presented straight (experiments 7-8).

      Weaknesses:

      The authors have only provided video of some of the conditions where the bees succeeded. In general, I think a video explaining each condition and then showing a clip of a typical performance would make it much easier to follow the study designs for scholars. Videos of the conditions bees failed at would be highly useful in order to compare different hypotheses for how the bees are solving this problem. I also think it is highly important to code the videos for switching behaviours. When solving the connected vs unconnected string tasks, when bees were observed pulling the unconnected string, did they quickly switch to the other string? Or did they continue to pull the wrong string? This would help discriminate the use of perceptual-motor feedback from other hypotheses.

      We added the test videos as suggested by the reviewer, and we added the data for each bee's choice. However, both connected and disconnected strings were glued to the floor, so perceptual-motor feedback was the same for both choices during the test and therefore irrelevant.

      The experiments are also not described well, for my below comments I have assumed that different groups of bees were tested for experiments 1-8, and that experiment 6 was run as described in line 331, where bees were given string-pulling training without perceptual feedback rather than how it is described in Figure 4B, which describes bees as receiving string pulling training with feedback.

      We have now added illustrations of Experiments 6 and 7 to Figure 1B, and we state that different groups of bees were tested in Experiments 1-9.

      The authors suggest the bees' performance is best explained by what they term 'image matching'. However, experiment 6 does not seem to support this without assuming retroactive image matching after the problem is solved. The logic of experiment 6 is described as "This was to ensure that the bees could not see the familiar "lollipop shape" while pulling strings....If the bees prefer to pull the connected strings, this would indicate that bees memorize the arrangement of strings-connected flowers in this task." I disagree with this second sentence, removing perceptual feedback during training would prevent bees memorising the lollipop shape, because, while solving the task, they don't actually see a string connected to a yellow flower, due to the black barrier. At the end of the task, the string is now behind the bee, so unless the bee is turning around and encoding this object retrospectively as the image to match, it seems hard to imagine how the bee learns the lollipop shape.

      We agree with the reviewer that while solving the task in the last step of training, the bees don't actually see a string connected to a yellow flower, due to the black barrier. The full shape is only visible after the pulling is completed, so this explanation requires the bee to “check back” on the entire display after feeding and, in effect, conclude “this is the shape that I need to be looking for later”.

      Another possibility is that bumblebees remember the image of the “lollipop shape” from the first step of training, in which the “lollipop shape” was directly presented to the bumblebee.

      We added the experiment suggested by the reviewer. The result showed that when a green table was placed behind the string so that the “lollipop shape” was obscured at every point during the training phase, the bees were unable to identify the connected string. This result further supports the view that bumblebees learn to choose the connected string through image matching.

      Despite this, the authors go on to describe image matching as one of their main findings. For this claim, I would suggest the authors run another experiment, identical to experiment 6 but with a black panel behind the bee, such that the string the bee pulls behind itself disappears from view. There is now no image to match at any point from the bee's perspective so it should now fail the connectivity task.

      Strengths:

      Despite these issues, this is a fascinating dataset. Experiments 1 and 2 show that the bees are not learning to discriminate between connected and unconnected stimuli rapidly in the first trials of the test. Instead, it is clear that experience in string pulling is needed to discriminate between connected and unconnected strings. What aspect of this experience is important? Experiment 6 suggests it is not image matching (when no image is provided during problem-solving, but only afterward, bees still attend to string connectivity) and casts doubt on perceptual-motor feedback (unless from the bee's perspective, they do actually get feedback that pulling the string moves the flower, video is needed here). Experiments 7 and 8 rule out means-end understanding because if the bees are capable of imagining the effect of their actions on the string and then planning out their actions (as hypotheses such as insight, means-end understanding and string connectivity suggest), they should solve these tasks. If the authors can compare the bees' performance in a more detailed way to other species, and run the experiment suggested, this will be a highly exciting paper.

      We appreciate the valuable comment from the reviewer. We compared the bees' performance to other species, and conducted the experiment as suggested by the reviewer.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Smaller comments:

      Line 64: is the word 'simple' needed here? It could also be explained by more complex forms of associative learning, no?

      We deleted “simple”.

      Methods:

      Line 230: was it checked that this was high-contrast for the bees?

      We added the relevant reference in the revised manuscript.

      Line 240: how much sucrose solution was present in the flowers?

      Each flower contained 25 microliters of sucrose solution. We added this information to the revised manuscript.

      Line 266: check grammar.

      We corrected the grammar as follows: “During tests, both strings were glued to the floor of the arena to prevent the air flow generated by flying bumblebees’ wings from changing the position of the string.”

      Statistical analysis:

      - What does it mean that "Bees identity and colony were analyzed with likelihood ratio tests"?

      Bee identity and colony were set as random variables. We changed the analysis methods in the revised manuscript, and the results of all the experiments did not change.

      - Line 359: do you mean proportion rather than percentage?

      We mean the percentage.

      - "the number of total choices as weights" - this should be explained further. This is the number of choices that each bee made? What was the variation and mean of this number? If bees varied a lot in this metric, it might make more sense to analyze their first choice (as I see you've done) and their first 10 choices or something like that - for consistency.

      This refers to the total number of choices made by each bumblebee. We added the mean and standard error of each bee’s number of choices in Table 1. Some bees pulled the string fewer than 10 times; we chose to include all choices made by each bee.

      - More generally I think the first test is more informative than the subsequent choices, since every choice after their first could be affected by feedback they are getting in that test phase. Or rather, they are telling you different things.

      All the bees were tested only once; however, you might be referring to the first choice. We used a chi-square test to analyze the bumblebees’ first choices in the test. It is worth noting that both connected and disconnected strings were glued to the floor. The bees were unable to move either the connected or disconnected strings during the tests, and only attempted to pull them. Therefore, the feedback from pulling either the connected or disconnected strings is the same.

      - Line 362: I think I know what you mean, but this should be re-phrased because the "number of" sounds more appropriate for a Poisson distribution. I think what you are testing is whether each individual bee chose the connected or the disconnected string - i.e. a 0 or 1 response for each bee?

      We agree with the reviewer that the response is whether each bee chose the connected or the disconnected string (i.e. a 0 or 1 response for each bee), not a count. We clarify this as: “The total number of the choices made by each bee was set as weights.”
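
      To illustrate what setting the number of choices as weights does, here is a minimal sketch. It assumes the simplest case, in which each bee contributes one proportion of connected-string choices and that proportion is weighted by the bee's total number of choices. The counts are hypothetical and are not taken from the study, and the function names are illustrative only.

```python
def unweighted_mean(proportions):
    """Average the per-bee proportions, treating every bee equally."""
    return sum(proportions) / len(proportions)


def weighted_mean(proportions, weights):
    """Average the per-bee proportions, weighting each by its number of choices."""
    return sum(p * w for p, w in zip(proportions, weights)) / sum(weights)


# Hypothetical data (not from the manuscript): three bees,
# connected-string choices and total choices for each bee.
connected = [8, 3, 12]
totals = [10, 6, 20]

proportions = [c / t for c, t in zip(connected, totals)]
unweighted = unweighted_mean(proportions)
weighted = weighted_mean(proportions, totals)

print(f"unweighted mean proportion: {unweighted:.3f}")
print(f"weighted mean proportion:   {weighted:.3f}")
```

      With these weights the estimate equals the pooled proportion (all connected choices divided by all choices), so bees that made few choices influence the result less than bees that made many.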

      - Line 364-365: here and elsewhere, every time you mention a model, make it clear what the dependent and independent variables are. i.e. for the mixed model, the 'bee' is the random factor? Or also the colony that the bee came from? Were these nested etc?

      We clarify this in the revised manuscript. Bee identity and colony are the random factors in the mixed model.

      - Line 368: "Latency to the first choice of each bee was recorded" - why? What were the hypotheses/ predictions here?

      The latency to the first choice was intended to show whether the bumblebees were familiar with the test pattern. A shorter latency might indicate that the bumblebees were more familiar with the pattern.

      - Line 371: "Multiple comparisons among experiments were.." - do you mean 'within' experiments? It seems that treatments should not be compared between different experiments.

      We mean multiple comparisons among different experiments; we clarify this in the revised manuscript.

      Results

      Experiment 1: From the methods, it sounded like you both analyzed the bees' first choice and their total no. of choices, but in the results section (and Figure 1) I only see the data for all choices combined here.

      In table 1 and in the text you report the number of bees that chose each option on their first choice, but there are no statistical results associated with these results. At the very least, a chi square or binomial test could be run.

      Line 138: "Interestingly, ten out of fifteen bees pulled the connected string in their first choice" - this is presented like it is a significant majority of bees, but a chi-square test of 10 vs 5 has a p-value = 0.1967

      We used the chi-square test to analyze the bees’ first choices, and we added the results to Table 1.
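
      The reviewer's figure for 10 versus 5 first choices can be checked with a chi-square goodness-of-fit test against an equal split. The sketch below assumes a 50:50 null expectation and one degree of freedom; the function name is illustrative, and this is not necessarily the exact procedure used in the manuscript.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Chi-square goodness-of-fit test of two counts against a 50:50 split."""
    expected = (count_a + count_b) / 2
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Ten of fifteen bees chose the connected string first (Line 138 of the manuscript).
statistic, p_value = chi_square_equal_split(10, 5)
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}")
```

      The p-value should come out close to the reviewer's 0.1967, so a 10 to 5 split on its own does not differ significantly from chance.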

      Line 143: "It makes sense because the bees could see the "lollipop shape" once they pulled it out from the table." - this feels more like interpretation (i.e. Discussion) rather than results.

      We moved the sentence to the discussion.

      Line 162: again this feels more like interpretation/ conjecture than results.

      We removed the sentence from the results.

      Line 184: check grammar.

      We checked the grammar. We changed “task” to “tasks”.

      Figures

      I really appreciated the overview in Figure 5 - though I think this should be Figure 1? Even if the methods come later in eLife, I think it would be nice to have that cited earlier on (e.g. at the start of the results) to draw the reader's attention to it quickly, since it's so helpful. It also then makes the images at the bottom of what is currently Figure 1 make more sense. I also think that the authors could make it clearer in Figure 5 which strings are connected vs disconnected in the figure (even if it means exaggerating the distance more than it was in real life). I had to zoom in quite a bit to see which were connected vs. not. Alternatively, you could have an arrow to the string with the words "connected" "disconnected" the first time you draw it - and similar labels for the other string conditions.

      We appreciate the valuable comment from the reviewer. We changed Figure 5 to Figure 2, and Figure 4 to Figure 1. We cited the Figures at the start of the results. We also changed the gap distance between the disconnected strings. Additionally, we added arrows to indicate “connected” and “disconnected” strings in the Figure.

      Figure 1 - I think you could make it clearer that the bars refer to experiments (e.g. have an x-axis with this as a label). Also, check the grammar of the y-axis.

      We added the experiment numbers to the Figures. Additionally, we checked the grammar of the y-axis. We changed “percentages” to “percentage”.

      I also think it's really helpful to see the supplementary videos but I think it would be nice to see some examples of the test phase, and not just the training examples.

      We added Supplementary videos of the testing phase.

      Reviewer #2 (Recommendations For The Authors):

      Below are also some minor comments:

      L40: "approaches".

      We changed “approach” to “approaches”.

      L42: but likely mainly due to sampling bias of mammals and birds.

      We changed the sentence as follows: String pulling is one of the most extensively used approaches in comparative psychology to evaluate the understanding of causal relationships (Jacobs & Osvath, 2015), with most research focused on mammals and birds, where a food item is visible to the animal but accessible only by pulling on a string attached to the reward (Taylor, 2010; Range et al., 2012; Jacobs & Osvath, 2015; Wakonig et al., 2021).

      L64: remove "in this study"

      We removed “in this study”.

      L64: simple associative learning of what? Isn't your image matching associative too?

      We removed “simple”.

      L97: remove "a" before "connected".

      We removed “a” before “connected”.

      L136-138: but maybe they could still feel the weight of the flower when pulling?

      Because both strings were glued to the floor in the test phase, the feedback was the same and therefore irrelevant. This information is noted in the General Methods.

      L161: what are these numbers?

      We removed the latency in the revised manuscript.

      L167/ Table 1: I realise that the authors never tried slanted strings to check if bumblebees used proximity as a cue. Why?

      This was simply because we wanted to focus on whether bumblebees could recognize the connectivity of the string.

      Discussion: Why did you only control for colour of the string? What if you had used strings with different textures or smells? Unclear if the authors controlled for "bumblebee smell" on the strings, i.e., after a bee had used the string, was the string replaced by a new one or was the same one used multiple times?

      We used different colors to investigate featural generalization of the visual display of the string connected to the flower in this task. We controlled for color because it is a feature that bumblebees can easily distinguish.

      Both the flowers and the strings were used only once, to prevent the use of chemosensory cues. We clarify this in the revised manuscript.

      L182: since what?

      We deleted “since” in the revised manuscript.

      L182-188: might be worth mentioning that some crows and parrots known for complex cognition perform poorly on broken strings (e.g., https://doi.org/10.1098/rspb.2012.1998 ; https://doi.org/10.1163/1568539X-00003511 ; https://doi.org/10.1038/s41598-021-94879-x ) and Australian magpies use trial and error (https://doi.org/10.1007/s00265-023-03326-6).

      We added the following sentences as suggested by the reviewer: “It is worth noting that some crows and parrots known for complex cognition perform poorly on the broken string task without perceptual feedback or learning. For example, New Caledonian crows use perceptual feedback strategies to solve the broken string-pulling task, and no individual showed a significant preference for the connected string when perceptual feedback was restricted (Taylor et al., 2012). Some Australian magpies and African grey parrots can solve the broken string task, but they required a high number of trials, indicating that learning plays a crucial role in solving this task (Molina et al., 2019; Johnsson et al., 2023).”

      L193: maybe expand on this to put the task into a natural context?

      We added the following sentences as suggested by the reviewer:

      “Different flower species offer varying profitability in terms of nectar and pollen to bumblebees; they need to make careful choices and learn to use floral cues to predict rewards (Chittka, 2017). Bumblebees can easily learn visual patterns and shapes of flowers (Meyer-Rochow, 2019); they can detect stimuli and discriminate between differently coloured stimuli when presented as briefly as 25 ms (Nityananda et al., 2014). In contrast, causal reasoning involves understanding and responding to causal relationships. Bumblebees might favor, or be limited to, a visual approach, likely due to the efficiency and simplicity of processing visual cues to solve the string-pulling task.”

      L204: is causal understanding the same as means-end understanding?

      Means-end understanding is expressed as goal-directed behavior, which involves the deliberate and planned execution of a sequence of steps to achieve a goal. It includes some understanding of the causal relationship (Jacobs & Osvath, 2015; Ortiz et al., 2019).

      L235: this is a very big span of time. Why not control for motivation? Cognitive performance can vary significantly across the day (at least in humans).

      Bumblebee motivation is understood to be rather consistent, as those that were trained and tested came to the flight arena of their own volition and were foragers looking to fill their crop load each time to return it to the colony.

      L232: what is "(w/w)" ? This occurs throughout the manuscript.

      “w/w” represents the weight-to-weight percentage of sugar.

      L250: this sentence sounds odd. "containing in the central well.." ?? Perhaps rephrase? Unclear what central well refers to? Did the flowers have multiple wells?

      We rephrased the sentence as follows: For each experiment, bumblebees were trained to retrieve a flower with an inverted Eppendorf cap at the center, containing 25 microliters of 50% sucrose solution, from underneath a transparent acrylic table.

      L268: why euthanise?

      The reason for euthanizing the bees is that new foragers typically only become active after the current ones are removed from the hive.

      L270: chemosensory cues answer my concern above. Maybe make it clear earlier.

      We moved this sentence earlier, into the Results.

      L273: did different individuals use different pulling strategies? Do you have the data to analyse this? This has been done on birds and would offer a nice comparison.

      We analyzed the string-pulling strategies among different individuals, and provided Supplementary Table 1 to display the performances of each individual in different string-pulling experiments.

      L365: unclear why both models. Would be nice to see a GLM output table.

      The durations of pulling different kinds of strings were first tested with the Shapiro-Wilk test to assess data normality. Duration data that conformed to a normal distribution were compared using linear mixed-effects models (LMM), while data that deviated from normality were examined with a generalized linear mixed model (GLMM). We added a GLM and GLMM output table in the revised manuscript.

      L377: should be a space between the "." and "This".

      We added a space between the “.” and “This”.

      L383-390: some commas and semicolons are in the wrong places.

      We carefully checked the commas and semicolons in this sentence.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments

      Line 32: seems to be missing a word, suggest "the bumblebees' ability to distinguish".

      We added “the” in the revised manuscript.

      Line 47: it would be good to reference other scholars here, this is the central focus of all work in comparative psychology.

      We added the reference in the revised manuscript.

      Line 50-61: I think the string-pulling literature could be described in more detail here, with mention of perceptual-motor feedback loops as a competing hypothesis to means-end understanding (see Taylor et al 2010, 2012). It seems a stretch to suggest that "String-pulling studies have directly tested means-end comprehension in various species", when perceptual-motor feedback is a competing hypothesis that we have positive evidence for in several species.

      We mentioned perceptual-motor feedback in the introduction as follows:

      “Multiple mechanisms can be involved in the string-pulling task, including the proximity principle, perceptual feedback and means-end understanding (Taylor et al., 2012; Wasserman et al., 2013; Jacobs & Osvath, 2015; Wang et al., 2020). The principle of proximity refers to animals preferring to pull the reward that is closest to them (Jacobs & Osvath, 2015). Taylor et al. (2012) proposed that the success of New Caledonian crows in string-pulling tasks is based on a perceptual-motor feedback loop, where the reward gradually moves closer to the animal as they pull the strings. If the visual signal of the reward approaching is restricted, crows with no prior string-pulling experience are unable to solve the broken string task (Taylor et al., 2012).”

      However, when a green table was placed behind the string to obscure the “lollipop” structure during the training, the bees could not see the “lollipop” during the initial training stage or after pulling the string from under the table. In this situation, the bees were unable to identify the connected string, further proving that bumblebees chose the connected string based on image matching.

      Line 68: suggest remove 'meticulously'.

      We removed “meticulously”.

      Line 99: This is an exciting finding, can the authors please provide a video of a bee solving this task on its first trial?

      We added videos in the supplementary materials.

      Line 133: perceptual-motor feedback loops should be introduced in the introduction.

      We introduced perceptual-motor feedback loops in the revised manuscript.

      Line 136: please clarify the prior experience of these bees, it is not clear from the text.

      We clarified the prior experience of these bees as follows: Bumblebees were initially attracted to feed on yellow artificial flowers, and then trained with transparent tables covered by black tape (S7 video) through a four-step process.

      Line 138: from the video it is not possible to see the bee's perspective of this occlusion. Do the authors have a video or image showing the feedback the bees received? I think this is highly important if they wish to argue that this condition prevents the use of both image matching and a perceptual-motor feedback loop.

      We prevented the use of image matching: the bees were unable to see the flower moving towards them above the table during the training phase in this condition. However, the bees may have received a visual image both after pulling the string out from under the table and in the initial stages of training in this condition.

      Line 147: please clarify what experience these bees had before this test.

      We added the prior experience of the bumblebees before the test as follows: We therefore designed further experiments based on Taylor et al. (2012) to test this hypothesis. Bumblebees were first trained to feed on yellow artificial flowers, and then trained with the same procedure as Experiment 2, but the connected strings were coiled in the test.

      Line 155: This is a highly similar test to that used in Taylor et al 2012, have the authors seen this study?

      We mentioned the reference in the revised manuscript as follows: We therefore designed further experiments based on Taylor et al. (2012) to test this hypothesis.

      Line 183: This sentence needs rewriting "Since the vast majority of animals, including dogs 183 (Osthaus et al., 2005), cats (Whitt et al., 2009), western scrub-jays (Hofmann et al.,2016) and azure-winged magpies (Wang et al., 2019) are failing in such tasks spontaneously".

      We changed the sentence as suggested by the reviewer as follows: Some animals, including dogs (Osthaus et al., 2005), cats (Whitt et al., 2009), western scrub-jays (Hofmann et al., 2016) and azure-winged magpies (Wang et al., 2019) fail in such tasks spontaneously.

      Line 186: "complete comprehension of the functionality of strings is rare" I am not sure the evidence in the current literature supports any animal showing full understanding, can the authors explain how they reach this conclusion?

      We wished to say that few animal species can distinguish between connected and disconnected strings without trial-and-error learning. We revised the sentence as follows:

      It is worth noting that some crows and parrots known for complex cognition perform poorly on the broken string task without perceptual feedback or learning. For example, New Caledonian crows use perceptual feedback strategies to solve the broken string-pulling task, and no individual showed a significant preference for the connected string when perceptual feedback was restricted (Taylor et al., 2012). Some Australian magpies and African grey parrots can solve the broken string task, but they required a high number of trials, indicating that learning plays a crucial role in solving this task (Molina et al., 2019; Johnsson et al., 2023).

      Line 190: the authors need to clarify which part of their study provides positive evidence for this conclusion.

      We added the evidence for this conclusion as follows: Our findings suggest that bumblebees with experience of string pulling prefer the connected strings, but they failed to identify the interrupted strings when the string was coiled in the test.

      Line 265: was the far end of the string glued only?

      The entire string was glued to the floor, not just the far ends of the string.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary: 

      In this paper, the authors used target agnostic MBC sorting and activation methods to identify B cells and antibodies against sexual stages of Plasmodium falciparum. While they isolated some mAbs against Pfs48/45 and Pfs230, two well-known candidates for "transmission blocking" vaccines, these antibodies' efficacies, as measured by TRA, did not perform as well as other known antibodies. They also isolated one cross-reactive mAb to proteins containing glutamic acid-rich repetitive elements, that express at different stages of the parasite life cycle. They then determined the structure of the Fab with the highest protein binder they could determine through protein microarray, RESA, and observed homotypic interactions.

      Strengths: 

      -  Target agnostic B cell isolation (although not a novel methodology). 

      -  New cross-reactive antibody with some "efficacy" (TRA) and mechanism (homotypic interactions) as demonstrated by structural data and other biophysical data. 

      Weaknesses: 

      The paper lacks clarity at times and could benefit from more transparency (showing all the data) and explanations. 

      We have added the oocyst count data from the SMFA experiments as Supplementary Table 2, and ELISA binding curves underlying Figure 4B as Supplementary Figure 5.

      In particular: 

      - define SIFA 

      - define TRAbs 

      We have carefully gone through the manuscript and have introduced abbreviations at first use, removed unnecessary abbreviations and removed unnecessary jargon to increase readability.

      - it is not possible to read the Figure 6B and C panels. 

      We regret that the labels in Supplementary Figures 6 and 7 were of poor quality and have now included higher resolution images to solve this issue.

      Reviewer #2 (Public Review): 

      This manuscript by Amen, Yoo, Fabra-Garcia et al describes a human monoclonal antibody, B1E11K, targeting EENV repeats which are present in parasite antigens such as Pfs230, RESAs, and 11.1. The authors isolated B1E11K using an initial target-agnostic approach for antibodies that would bind gamete/gametocyte lysate, from which they made 14 mAbs. Following a suite of highly appropriate characterization methods, from Western blotting of recombinant proteins to native parasite material, use of knockout lines to validate specificity, ITC, peptide mapping, SEC-MALS, negative stain EM, and crystallography, the authors have built a compelling case that B1E11K does indeed bind EENV repeats. In addition, using X-ray crystallography they show that two B1E11K Fabs bind to a 16 aa RESA repeat in a head-to-head conformation using homotypic interactions and provide a separate example from CSP, of affinity-matured homotypic interactions.

      There are some minor comments and considerations identified by this reviewer. One of the main conclusions in the paper is the binding of B1E11K to RESAs, which are blood stage antigens that are exported to the infected parasite surface. It would have been interesting if immunofluorescence assays with B1E11K mAb were performed with blood-stage parasites to understand its cellular localization in those stages.

      In the current manuscript, we provide multiple lines of evidence that B1E11K binds (with high affinity) to repeats that are present in RESAs, i.e. through micro-array studies, in vitro binding experiments such as Western blot, ELISA and BLI, and through X-ray crystallography studies on B1E11K – repeat peptide complexes. Taken together, we think we provide compelling evidence that B1E11K binds to repeats present in RESA proteins. We do agree that studies on the function of this mAb against other stages of the parasite could be of interest, but as our manuscript focuses on the sexual stage of the parasite, we feel that this is beyond the scope of the current work. However, this line of inquiry will be strongly considered in follow-up studies.

      Reviewer #3 (Public Review): 

      The manuscript from Amen et al reports the isolation and characterization of human antibodies that recognize proteins expressed at different sexual stages of Plasmodium falciparum. The isolation approach was antigen agnostic and based on the sorting, activation, and screening of memory B cells from a donor whose serum displays high transmission-reducing activity. From this effort, 14 antibodies were produced and further characterized. The antibodies displayed a range of transmission-reducing activities and recognized different Pf sexual stage proteins. However, none of these antibodies had substantially higher TRA than previously described antibodies.

      The authors then performed further characterization of antibody B1E11K, which was unique in that it recognized multiple proteins expressed during sexual and asexual stages. Using protein microarrays, B1E11K was shown to recognize glutamate-rich repeats, following an EE-XX-EE pattern. An impressive set of biophysical experiments was performed to extensively characterize the interactions of B1E11K with various repeat motifs and lengths. Ultimately, the authors succeeded in determining a 2.6 Å resolution crystal structure of B1E11K bound to a 16AA repeat-containing peptide. Excitingly, the structure revealed that two Fabs bound simultaneously to the peptide and made homotypic antibody-antibody contacts. This had only previously been observed with antibodies directed against CSP repeats.

      Overall I found the manuscript to be very well written, although there are some sections that are heavy on field-specific jargon and abbreviations that make reading unnecessarily difficult. For instance, 'SIFA' is never defined. 

      We have carefully gone through the manuscript and have introduced abbreviations at first use, removed unnecessary abbreviations and removed unnecessary jargon to increase readability.

      Strengths of the manuscript include the target-agnostic screening approach and the thorough characterization of antibodies. The demonstration that B1E11K is cross-reactive to multiple proteins containing glutamate-rich repeats, and that the antibody recognizes the repeats via homotypic interactions, similar to what has been observed for CSP repeat-directed antibodies, should be of interest to many in the field. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1 - why only gametes ELISA and Spz or others?  

      The volumes of the single B cell supernatants were too small to screen against multiple antigens/parasite stages. As we aimed to isolate antibodies against the sexual stages of the parasite, our assay focused on this stage and supernatants were not tested against other stages. Furthermore, we screened for reactivity against gametes as TRA mAbs likely target gametes rather than other forms of sexual stage parasites.

      Figure 2 A 

      (a) Wild type (WT) and Pfs48/45 knock-out (KO) gametes.

      (b) I am a bit confused about what GMT is vs Pfs48/45 

      We have changed the column titles in Figure 2A to “wild-type gametes” and “Pfs48/45 knockout gametes” to improve clarity.  

      (c) Binding is high % why is it red? 

      We chose to present the results in a heatmap format with a graded color scale, from strong binders in red to weak binders in green. It has now been clarified in the legend of the figure. 

      Please state acronyms clearly 

      TRA - transmission reducing activity 

      SMFA - standard membrane feeding assay 

      We have added the full terms to clarify the acronyms.

      1123- VRC01 (not O1)

      We have corrected this.

      Figure 2 C bottom panels, clarify which ones are TRAbs (Assuming the Mabs with over 80% TRA at 500 ug/ml) (right gel) and the ones that are not (left gel)? 

      In the Western blot in Figure 2C, we have marked the antibodies with >80% TRA with an asterisk.

      Furthermore, we have replaced ‘TRAbs’ by ‘mAbs with >80% TRA at 500 µg/mL’ in the figure legend.

      ITC show the same affinity of the Fab to the 2 peptides but not the ELISA, not the BLI/SPR would be more appropriate. Any potential explanation?  

      The way binding affinity is determined across various techniques can result in slight differences in determined values. For instance, ELISAs utilize long incubation times with extensive washing steps and involve a spectroscopic signal, isothermal titration calorimetry (ITC) uses a calorimetric signal at different concentration equilibria to extract a KD, and BLI determines kinetic parameters for KD determination. Discrepancies in binding affinities between orthogonal techniques have indeed been observed previously in the context of peptide-antibody binding (e.g. PMID: 34788599).

      Despite this, regardless of technique, the relative relationships in all three sets of data are the same: higher binding affinity is observed to the longer P2 peptide. This is the main takeaway of the section. As the reviewer suggests, BLI is likely the most appropriate readout here and is the only value explicitly mentioned in the main text. We primarily use ITC to support our proposed binding stoichiometry, which is important to substantiate the SEC-MALS and nsEM data in Figure 4H-I. We added the following sentences to help reinforce these points: “The determined binding affinity from our ITC experiments (Table 1) differed from our BLI experiments (Fig. 4D and 4E), which can occur when measuring antibody-peptide interactions. Regardless, our data across techniques all trend toward the same finding in which a stronger binding affinity is observed toward the longer RESA P2 (16AA) peptide.”
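
      The link between the kinetic and equilibrium readouts discussed above can be sketched with a textbook 1:1 (Langmuir) binding model. This is a minimal illustration only: it is not the fitting procedure used in the manuscript, the function names are hypothetical, and the rate constants in the usage note are made-up values rather than measured data. A 1:1 model also ignores any dependent second binding event, such as two Fabs engaging one peptide.

```python
import math


def kd_from_rates(k_on, k_off):
    """Equilibrium dissociation constant KD (M) from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on


def association_response(t, conc, k_on, k_off, r_max):
    """Signal at time t (s) during association for a 1:1 Langmuir model.

    conc is the analyte concentration (M); r_max is the signal at saturation.
    """
    k_obs = k_on * conc + k_off  # observed association rate (1/s)
    r_eq = r_max * conc / (conc + kd_from_rates(k_on, k_off))  # plateau at this conc
    return r_eq * (1.0 - math.exp(-k_obs * t))
```

      For example, with hypothetical rates `k_on = 1e5` and `k_off = 1e-3`, `kd_from_rates` gives a KD of about 10 nM, and the plateau reached at that analyte concentration is half of `r_max`.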

      Figure 5C - would be helpful to have the peptide sequence above referring to what is E1, E2 etc... 

      We added two panels (Figure 5C-D) showcasing the binding interface that shows the peptide numbering in the context of the overall complex. We hope that this will help better orient the reader. 

      Figure S4 - maybe highlight in different colors the EENVV, EEIEE, Etc, etc 

      Repeats found in the sequence of the various proteins in Figure S4 have now been highlighted with different colors.

      Line 163 - why 14 mabs if 11 wells? Isn't it 1 B cell per well? The authors should explain right away that some wells have more than 1 B cell and some have 1 HC, 1LC, and 1 KC. 

      We agree that this was somewhat confusing and have modified the text which now reads: “We obtained and cloned heavy and light chain sequences for 11 out of 84 wells. For three wells we obtained a kappa light chain sequence and for five wells a lambda light chain sequence. For three wells we obtained both a lambda and kappa light chain sequence suggesting that either both chains were present in a single B cell or that two B cells were present in the well. For all 11 wells we retrieved a single heavy chain sequence. Following amplification and cloning, 14 mAbs, from 11 wells, were expressed as full human IgG1s (Table S1) (Dataset S1).”
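
      The well and antibody counts in the quoted text can be checked with simple arithmetic. The sketch below assumes, as the text implies, that each well yielded one heavy chain and that a well with both a kappa and a lambda light chain gives rise to two mAbs; the function name is hypothetical.

```python
def count_wells_and_mabs(kappa_only, lambda_only, both):
    """Return (number of wells, number of mAbs) from light-chain outcomes per well.

    A well with a single light chain yields one mAb; a well with both a kappa
    and a lambda light chain is assumed to yield two mAbs sharing a heavy chain.
    """
    wells = kappa_only + lambda_only + both
    mabs = kappa_only + lambda_only + 2 * both
    return wells, mabs


# Counts quoted in the revised text: 3 kappa-only, 5 lambda-only, 3 with both.
wells, mabs = count_wells_and_mabs(kappa_only=3, lambda_only=5, both=3)
print(wells, mabs)  # prints: 11 14
```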

      Line 166-167 - were there multiple HCs (different ones) as well when lambda and kappa were present?

      This is not clear at first. 

      We clarified this point in the text, see also comment above.

      Line 177 - expressed Pfs48/45 and Pfs230, is it lacking both or just Pfs48/45 (as stated on line 172)? 

      Pfs48/45 binds to the gamete surface via a GPI anchor, while Pfs230 is retained on the surface through binding to Pfs48/45. Hence, the Pfs48/45 knockout parasite will also lack surface-bound Pfs230. We have added a sentence to the Results clarifying this: “The mAbs were also tested for binding to Pfs48/45 knock-out female gametes, which lack surface-bound Pfs48/45 and Pfs230”.

      Show the ELISA data used to calculate EC50 in Figure 3. 

      ELISA binding curves are now shown as Figure S5.

      Line 313-315 - what if you reverse, capture the Fab (peptide too small even if biotinylated?) 

      As anticipated by the Reviewer, immobilizing the Fab and dipping into peptide did not yield appreciable signal for kinetic analysis and thus the experiment from this setup is not reported. 

      Line 341 - add crystal structure 

      This has now been added.

      There is a bit too much speculation in the discussion. For e.g. "The B1C5L and B1C5K mAbs were shown to recognize Domain 2 of Pfs48/45 and exhibited moderate potency, as previously described for Abs with such specificity (27). These 2 mAbs were isolated from the same well and shared the same heavy chain; their three similar characteristics thus suggest that their binding is primarily mediated by the heavy chain". Actual data will reinforce this statement. 

      As B1C5L and B1C5K recognized domain 2 of Pfs48/45 with similar affinity, this strongly suggests that binding is mediated through the heavy chain. Structural analysis could confirm this statement, but this is out of the scope of this study.

      Reviewer #2 (Recommendations For The Authors): 

      Figure 1: This figure provides a description of the workflow. To make it more relevant for the paper, the authors could add relevant numbers as the workflow proceeds. 

      (a) For example, how many memory B cells were sorted, how many supernatants were positive, and then how many mAbs were produced? These numbers can be attached to the relevant images in the workflow. 

      We modified the figure to include the numbers. 

      (b) For the "Supernatant screening via gamete extract ELISA", please change to "Supernatant screening via gamete/gametocyte extract ELISA". 

      We modified the statement as suggested. 

      Line 155: The manuscript states that 84 wells reacted with gamete/gametocyte lysate. The following sentence states that "Out of the 21 supernatants that were positive...". Can the authors provide the summary of data for all 84 wells or why focus on only 21 supernatants? 

      We screened all supernatants against gamete lysate, and only a subset against gametocyte lysate. In total, we found 84 positive supernatants that were reactive to at least one of the two lysates. Of those 84 positive supernatants, 21 were screened against both lysates. We have modified the text to clarify the numbers:

      “After activation, single cell culture supernatants potentially containing secreted IgGs were screened in a high-throughput 384-well ELISA for their reactivity against a crude Pf gamete lysate (Fig. S1B). A subset of supernatants was also screened against gametocyte lysate (S1C). In total, supernatants from 84 wells reacted with gamete and/or gametocyte lysate proteins, representing 5.6% of the total memory B cells. Of the 21 supernatants that were screened against both gamete and gametocyte lysates, six recognized both, while nine appeared to recognize exclusively gamete proteins, and six exclusively gametocyte proteins.”

      Please note that all 84 positive wells were taken forward for B cell sequencing and cloning. 

      Line 171: SIFA is introduced for the first time and should be completely spelled out.

      We have corrected this. 

      Figure 2: 

      (a) In Figure 2A, can you change the column title from "% pos KO GMT" to "% pos Pfs48/45 KO GMT"?

      We have changed the column titles.  

      (b) In Figure 2B, the SMFA results have been converted to %TRA. Can the authors please provide the raw data for the oocyst counts and number of mosquitoes infected in Supplementary Materials? 

      We have added oocyst count data in Table S2, to which we refer in the figure legend. 

      (c) For Figure 2F, the authors do have other domains to Pfs230 as described in Inklaar et al, NPJ Vaccines 2023. An ELISA/Western to the other domains could identify the binding site for B2C10L, though we appreciate this is not the central result of this manuscript. 

      We thank the reviewer for this suggestion. We are indeed planning to identify the target domain of B2C10L using the previously described fragments, but agree with the reviewer that this is not the focus of the current manuscript and have therefore decided not to include it in the current report.
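
      The conversion of oocyst counts to %TRA raised for Figure 2B is commonly expressed as the percent reduction in mean oocyst count relative to a control feed. The sketch below assumes that simple definition; the manuscript may use a different statistical model, the function name is hypothetical, and the counts in the usage note are invented for illustration rather than taken from Table S2.

```python
from statistics import mean


def percent_tra(control_oocysts, test_oocysts):
    """Percent transmission-reducing activity from per-mosquito oocyst counts.

    Assumes %TRA = 100 * (1 - mean(test) / mean(control)).
    """
    control_mean = mean(control_oocysts)
    if control_mean == 0:
        raise ValueError("control feed has no oocysts; %TRA is undefined")
    return 100.0 * (1.0 - mean(test_oocysts) / control_mean)
```

      With hypothetical control counts of 10, 20 and 30 oocysts and test counts of 2, 4 and 6, the function returns a reduction of about 80%.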

      Line 116: The word sporozoites appears in subscript and should be corrected to be normal text. 

      We have corrected this.

      Line 216: Typo "B1E11K" 

      We have corrected this.

      Materials and Methods: 

      (a) PBMC sampling: Please add the ethics approval codes in this section. 

      Donor A visited the hospital with a clinical malaria infection and provided informed consent for collection of PBMCs. We have modified the method section to clarify this. 

      “Donor A had lived in Central Africa for approximately 30 years and reported multiple malaria infections during that period. At the time of sampling PBMCs, Donor A had recently returned to the Netherlands and visited the hospital with a clinical malaria infection. After providing informed consent, PBMCs were collected, but gametocyte prevalence and density were not recorded.”

      (b) Gamete/Gametocyte extract ELISA: Can the authors please provide the concentration of antibodies used for the positive and negative controls (TB31F, 2544, and 399) 

      We have added the concentrations for these mAbs in the methods section.

      Recombinant Pfs48/45 and Pfs230 ELISA: Please state the concentration or molarity used for the coating of recombinant Pfs48/45 and Pfs230CMB. 

      We have added the concentrations, i.e. 0.5 µg/mL, to the methods section.

      Western Blotting: The protocol states that DTT was added to gametocyte extracts (Line 594), but Western Blots in Figures 2 and 3 were performed in non-reducing conditions. Please confirm whether DTT was added or not. 

      Thank you for noting this. We did not use DTT for the western blots and have removed this line from the methods section.

      Reviewer #3 (Recommendations For The Authors): 

      Below are a few minor comments to help improve the manuscript. 

      (1) In Figure 4E, are the BLI data fit to a 1:1 binding model? The fits seem a bit off, and from ITC and X-ray studies it is known that 2 Fabs bind 1 peptide. The second Fab should presumably have higher affinity than the first Fab since the second Fab will make interactions with both the peptide and the first Fab. It may be better to fit the BLI data to a 2:1 binding model. 

      The 2:1 (heterogeneous ligand) model assumes that there are two different independent binding sites. However, the second binding event described is dependent on the first binding event, and thus this model also does not accurately reflect the system. Given that there is no ideal model to fit, we are instead careful about the language used in the main text to describe these results. Additionally, we have added a sentence to the Results section to ensure that the proper findings/interpretations are highlighted: “…our data all trend toward the same finding in which a stronger binding affinity is observed toward the longer RESA P2 (16AA) peptide.”

      (2) The sidechain interactions shown in Figures 5C and D could probably be improved. The individual residues are just 'floating' in space, causing them to lack context and orientation. 

      We added two panels (Fig. 5C-D) showcasing the binding interface that shows the peptide numbering in the context of the overall complex. We hope that this will help orient the reader.  

      (3) The percentage of Ramachandran outliers should be listed in Table 2. Presumably, the value is 0.2%, but this is omitted in the current table. 

      Table 2 has been modified to include the requested information explicitly.

    1. In fact, research shows that the way people learn is as unique as their fingerprints

      I think this illustrates why it could be important to, as Nick says, separate our students into a few boxes because it makes it easier to think about, and then we can think of obstacles and solutions that may come up in each group while lesson planning.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This interesting study explores the mechanism behind an increased susceptibility of daf-18/PTEN mutant nematodes to paralyzing drugs that exacerbate cholinergic transmission. The authors use state-of-the-art genetics and neurogenetics coupled with locomotor behavior monitoring and neuroanatomical observations using gene expression reporters to show that the susceptibility occurs due to low levels of DAF-18/PTEN in developing inhibitory GABAergic neurons early during larval development (specifically, during the larval L1 stage). DAF-18/PTEN is convincingly shown to act cell-autonomously in these cells upstream of the PI3K-PDK-1-AKT-DAF-16/FOXO pathway, consistent with its well-known role as an antagonist of this conserved signaling pathway. The authors exclude a role for the TOR pathway in this process and present evidence implicating selectivity towards developing GABAergic neurons. Finally, the authors show that a diet supplemented with a ketogenic body, β-hydroxybutyrate, which also counteracts the PI3K-PDK-1-AKT pathway, promoting DAF-16/FOXO activity, partially rescues the proper development (morphology and function) of GABAergic neurons in daf-18/PTEN mutants, but only if the diet is provided early during larval development. This strongly suggests that the critical function of DAF-18/PTEN in developing inhibitory GABAergic neurons is to prevent excessive PI3K-PDK-1-AKT activity during this critical and particularly sensitive period of their development in juvenile L1 stage worms. Whether or not the sensitivity of GABAergic neurons to DAF-18/PTEN function is a defining and widespread characteristic of this class of neurons in C. elegans and other animals, or rather a particularity of the unique early-stage GABAergic neurons investigated remains to be determined.

      Strengths:

      The study reports interesting and important findings, advancing the knowledge of how daf-18/PTEN and the PI3K-PDK-1-AKT pathway can influence neurodevelopment, and providing a valuable paradigm to study the selectivity of gene activities towards certain neurons. It also defines a solid paradigm to study the potential of dietary interventions (such as ketogenic diets) or other drug treatments to counteract (prevent or revert?) neurodevelopment defects and stimulate DAF-16/FOXO activity.

      Weaknesses:

      (1) Insufficiently detailed methods and some inconsistencies between Figure 4 and the text undermine the full understanding of the work and its implications.

      The incomplete methods presented, the imprecise display of Figure 4, and the inconsistency between this figure and the text make it presently unclear what the precise timings of observations and treatments around the L1 stage are. What exactly do E-L1 and L1-L2 mean in the figure? The timing information is critical for the understanding of the implications of the findings because important changes take place with the whole inhibitory GABAergic neuronal system during the L1 stage into the L2 stage. The precise timings of events such as neuronal births and remodelling are well-described (e.g., Figure 2 in Hallam and Jin, Nature 1998; Fig 7 in Mulcahy et al., Curr Biol, 2022). Likewise, for proper interpretation of the implication of the findings, it is important to describe the nature of the defects observed in L1 larvae reported in Figure 1E - at present, a representative figure is shown of a branched commissure. What other types of defects, if any, are observed in early L1 larvae? The nature of the defects will be informative. Are they similar or not to the defects observed in older larvae?

      We thank the reviewer for highlighting these areas for improvement. We have updated and clarified the timing of observation in the text, figures, and methodology section accordingly.

      All experiments were conducted using age-synchronized animals. Gravid worms were placed on NGM plates and removed after two hours. The assays were then carried out on animals that hatched from the eggs laid during this specific timeframe.

      Regarding the detailed timings outlined in the original Figure 4 (now Figure 5 in the revised version), we provided the following information in the revised version: For experiments involving continuous exposure to βHB throughout development, the gravid worms were placed on NGM plates containing the ketone body and removed after two hours. Therefore, this exposure covered the ex-utero embryonic development period up to the L4-Young adult stage when the experiments were conducted.

      In experiments involving exposure at different developmental stages, such as those depicted in Figure 4 of the original version (now Figure 5, revised version), animals were transferred between plates with and without βHB as required. We exposed daf-18/PTEN mutant animals to βHB-supplemented diets for 18-hour periods at different developmental stages (Figure 5A, revised version). The earliest exposure occurred during the 18 hours following egg laying, covering ex-utero embryonic development and the first 8-9 hours of the L1 stage. The second exposure period encompassed the latter part of the L1 stage, the entire L2 stage, and most of the L3 stage. The third exposure spanned the latter part of the L3 stage (~1-2 hours), the entire L4 stage, and the first 6-7 hours of the adult stage.

      All this information has been included in Figure 5, the text (Page 13, lines 259-276), and the methodology (Page 4, Lines 85-90, Revised Methods and Supplementary information) of the revised manuscript.

      In response to the reviewer's suggestion, we have also included photos of daf-18 worms at the L1 stage (30 min/1h post-hatching). Defects are already present at this early stage, such as handedness and abnormal branching commissures, which are also observed in adult worm neurons (see Supplementary Figure 4, revised version). 

      These defects manifest in DD neurons shortly after larval birth. The prevalence of animals with errors is higher in L4 worms (when both VDs and DDs are formed) compared to early L1s (Figures 3 C-E and Supplementary Figure 4, revised version). This suggests that defects in VD neurons also occur in daf-18 mutants. Indeed, when we analyzed the neuronal morphology of several wild-type and daf-18 mutant animals, we found defects in the commissures corresponding to both DD and VD neurons (Supplementary Figure 3, revised version). 

      These data are now included in the revised version (Results (Page 10, lines 177-196), Discussion (Pages 14-16), Main Figure 3, and Supplementary Figures 3, 4 and 7, revised version).

      (2) The claim of proof of concept for a reversal of neurodevelopment defects is not fully substantiated by data.

      The authors state that the work "constitutes a proof of concept of the ability to revert a neurodevelopmental defect with a dietary intervention" (Abstract, Line 56); however, the authors do not present sufficient evidence to distinguish between "reversal" and prevention of the neurodevelopment defect by the dietary intervention. This clarification is critical for therapeutic purposes and claims of proof-of-concept. To the best of my understanding, reversal formally means the defect was present at the time of therapy, which is then reverted to a "normal" state with the therapy. On the other hand, prevention would imply an intervention that does not allow the defect to develop to begin with, i.e., the altered or defective state never arises. In the context of this study, the authors do not convincingly show reversal. This would require showing "embryonic" GABAergic neuron defects or showing convincing data in newly hatched L1 (0-1h), and it is unclear whether they do so, as I have failed to find this information in the manuscript. Again, the method description needs to be improved and the implications can be very different if the data presented in Figure 2D-E regard newly born L1 animals (0-1h) or L1 animals at say 5-7h after hatching. This is critical because the development of the embryonically-born GABAergic DD neurons, for instance, is not finalized embryonically. Their neurites still undergo outgrowth (albeit limited) upon L1 birth (see DataS2 in Mulcahy et al., Curr Biol 2022), hence they are susceptible both to committing developmental errors and to responding to nutritional interventions to prevent them. In contrast to embryonic GABAergic neurons, embryonic cholinergic neurons (DA/DB) do not undergo neurite outgrowth post-embryonically (Mulcahy et al., Curr Biol 2022), a fact which could provide some mechanistic insight considering the data presented. However, neurites from other post-embryonically-born neurons also undergo outgrowth post-embryonically, but mostly during the second half of the L1 stage following their birth up to mid-L2, with significant growth occurring during the L1-L2 transition. These are the cholinergic (VA/VB and AS neurons) and GABAergic (VD) neurons. The fact that AS neurons undergo a similar amount of outgrowth as VD neurons would be informative as to whether VD neurons are susceptible to daf-18/PTEN activity. Independently, DD neurons are still quite unique in other aspects (see below), which could also bring insight into their selective response.

      Finally, even adjusting the claim to "constitutes a proof-of-concept of the ability to prevent a neurodevelopmental defect with a dietary intervention" would not be completely precise, because it is unclear how much this work "constitutes a proof of concept". This is because, unless I misunderstood something, dietary interventions are already applied to prevent neurodevelopment defects, such as when folic acid supplementation is recommended to pregnant women to prevent neural tube defects in newborns.

      Thank you very much for pointing out this issue and highlighting the need to further investigate the ameliorative capacity of βHB on GABAergic defects in daf-18 mutants. In the revised version, we have included experiments to address this point.

      Our microscopy analyses strongly indicate that the development of DD neurons is affected, with errors observed as early as one hour post-hatching (Main Figure 3, and Supplementary Figures 4 and 7, revised version). Additionally, based on the position of the commissures in L4s, our results strongly suggest that VD neurons are also affected (Supplementary Figure 3, revised version). Both the frequency of animals with errors and the number of errors per animal are higher in L4s compared to L1 larvae (Main Figure 3, and Supplementary Figures 4 and 7, revised version). It is very likely that the errors in VD neurons, which are born in the late L1 stage, are responsible for the higher frequency of defects observed in L4 animals.

      As the reviewer noted, GABAergic DD neurons, which are born embryonically, do not complete their development during the embryonic stages. Some defects in DD neurons may arise during the postembryonic period. Following the reviewer's suggestion, we analyzed L1 larvae at different times before the appearance of VDs (1 hour post-hatching and 6 hours post-hatching). We did not observe an increase in error prevalence, suggesting that DD defects in daf-18 mutants are mostly embryonic (Supplementary Fig 4B, Revised Version). 

      Our findings suggest that βHB's enhancement is not due to preventive effects in DDs, as defects persist in newly hatched larvae regardless of βHB presence (Supplementary Figure 7, revised version), and postembryonic DD growth does not introduce new errors (Supplementary Figure 4, revised version). This lack of preventive effect could be due to βHB's limited penetration into the embryonic environment. Unlike in early L1s, significant improvement occurs in L4s upon early βHB exposure (Supplementary Figure 7, revised version). This could be explained by a reversing effect on malformed DD neurons and/or a protective influence on VD neuron development. While we cannot rule out the first option, even if all errors in DDs in L1 were repaired (which is very unlikely), it would not explain the level of improvement in L4 (Supplementary Figure 7, revised version). Therefore, we speculate that VDs may be targeted by βHB. The notion that exposure to βHB during early L1 can ameliorate defects in neurons primarily emerging in late L1s (VDs) is intriguing. We hypothesize that residual βHB or a metabolite from prior exposure could forestall these defects in VD neurons. Notably, βHB has demonstrated a capacity for long-lasting effects through epigenetic modifications (Reviewed in He et al, 2023, https://doi.org/10.1016/j.heliyon.2023.e21098). More work is needed to elucidate the fundamental mechanisms underlying the ameliorating effects of βHB supplementation. We have now discussed these possibilities in the Discussion (Page 17, lines 369-383, revised version).

      We agree with the reviewer that the term "reversal" is not accurate, and we have avoided using this terminology throughout the text. Furthermore, in the title, we have decided to change the word "rescue" to "ameliorate," as our experiments support the latter term but not the former. Additionally, the reviewer is correct that folic acid administration to pregnant women is already a metabolic intervention to prevent neural tube defects. In light of this, we have avoided claiming this as proof of concept in the revised manuscript.

      (3) The data presented do not warrant the dismissal of DD remodeling as a contributing factor to the daf-18/PTEN defects.

      Inhibitory GABAergic DD neurons are quite unique cells. They are well-known for their very particular property of remodeling their synaptic polarity (DD neurons switch the nature of their pre- and postsynaptic targets without changing their wiring). This process is called DD remodeling. It starts in the second half of the L1 stage and finishes during the L2 stage. Unfortunately, the fact that the authors find a specific defect in early GABAergic neurons (which are very likely these unique DD neurons) is not explored in sufficient detail and depth. The facts that these neurons are not fully developed at L1, that they still undergo limited neurite growth, and that they are poised for striking synaptic plasticity in a few hours set them apart from the other explored neurons, such as early cholinergic neurons, which show a more stable dynamics and connectivity at L1 (see Mulcahy et al., Curr Biol 2022).

      The authors use their observation that daf-18/PTEN mutants present morphological defects in GABAergic neurons prior to DD remodeling to dismiss the possibility that the DAF-18/PTEN-dependent effects are "not a consequence of deficient rearrangement during the early larval stages". However, DD remodeling is just another cell-fate-determined process and as such, its timing, for instance, can be affected by mutations in genes that affect cell fates and developmental decisions, such as daf-18 and daf-16, which affect developmental fates such as those related with the dauer fate. Specifically, the authors do not exclude the possibility that the defects observed in the absence of either gene could be explained by precocious DD remodeling. Precocious DD remodeling can occur when certain pathways, such as the lin-14 heterochronic pathway, are affected. Interestingly, lin-14 has been linked with daf-16/FOXO in at least two ways: during lifespan determination (Boehm and Slack, Science 2005) and in the L1/L2 stages via the direct negative regulation of an insulin-like peptide gene ins-33 (Hristova et al., Mol Cell Bio 2005). It is likely that the prevention of DD dysfunction requires keeping insulin signaling in check (downregulated) in DD neurons in early larval stages, which seems to coincide with the critical timing and function of daf-18/PTEN. Hence, it will be interesting to test the involvement of these genes in the daf-18/daf-16 effects observed by the authors.

      This is another interesting point raised by the reviewer. We have demonstrated that defects manifest in early L1 (30 min-1 hour post-hatching) which corresponds to a pre-remodeling time in wild-type worms.

      We acknowledge the possibility of early remodeling in specific mutants as pointed out by the reviewer.

      However, the following points suggest that the effects of these mutations may extend beyond the particularity of DD remodeling: i) Our experiments also show defects in VD neurons in daf-18 mutants (Supplementary Figure 3, revised version), as discussed in our previous response. These neurons do not undergo significant remodeling during their development. ii) DAF-18 and DAF-16 deficiencies produce neurodevelopmental alterations in other non-remodeling neurons: severe neurite defects in neurons that are nearly fully formed at larval hatching, such as AIY, have been previously reported in daf-18 and daf-16 mutants (Christensen et al., 2011). Additionally, the migration of another neuron, HSN, is severely affected in these mutants (Kennedy et al., 2013). iii) To the best of our knowledge, DD remodeling only alters synaptic polarity without forming new commissures or significantly altering the trajectory of the formed ones. Thus, it is unlikely (though not impossible) for remodeling defects to cause the observed commissural branching and handedness abnormalities in DD neurons. Therefore, we think that the impact of daf-18 mutations on GABAergic neurons is not primarily linked to DD remodeling but extends to various neuron types. The apparent resilience of cholinergic motor neurons to these mutations is intriguing and requires further exploration. This resilience is not limited to daf-18/PTEN animals, since mutations in certain genes expressed in both neuron types (such as the neuronal integrin ina-1 or eel-1, the C. elegans ortholog of HUWE1) alter the function or morphology of GABAergic neurons but not cholinergic motor neurons (Kowalski, J. R. et al. Mol Cell Neurosci 2014; Oliver, D. et al. J Dev Biol 2019; Opperman, K. J. et al. Cell Rep 2017). These points are discussed in the manuscript (Discussion, page 15, lines 311-322, revised version) and reveal the existence of compensatory or redundant mechanisms in these excitatory neurons, rendering them much more resistant to both morphological and functional abnormalities.

      Discussion on the impact of the work on the field and beyond:

      The authors significantly advance the field by bringing insight into how DAF-18/PTEN affects neurodevelopment, but fall short of understanding the mechanism of selectivity towards GABAergic neurons, and most importantly, of properly contextualizing their findings within the state-of-the-art C. elegans biology.

      For instance, the authors do not pinpoint which type of GABAergic neuron is affected, despite the fact that there are two very well-described populations of ventral nerve cord inhibitory GABAergic neurons with clear temporal and cell fate differences: the embryonically-born DD neurons and the postembryonically-born VD neurons. The time point of the critical period apparently defined by the authors (pending clarifications of methods, presentation of all data, and confirmation of inconsistencies between the text and figures in the submitted manuscript) could suggest that DAF-18/PTEN is required in either or both populations, which would have important and different implications. An effect on DD neurons seems more likely because an image is presented (Figure 2D) of a defect in an L1 daf-18/PTEN mutant larva with 6 neurons (which means the larva was processed at a time when VD neurons were not yet born or expressing pUnc-47, so supposedly it is an image of a larva in the first half of the L1 stage (0-~7h?)). DD neurons are also likely the critical cells here because the neurodevelopment errors are partially suppressed when the ketogenic diet is provided at an "early" L1 stage, but not later (e.g., from L2-L3, according to the text, L2-L4 according to the figure? ).

      Thank you for this insightful input. As previously mentioned, we conducted experiments in this revision to clarify the specificity of GABAergic errors in daf-18/PTEN mutants, in particular, whether they affect DDs, VDs, or both. Our results suggest that commissural defects are not limited to DD neurons but also occur in VD neurons (Supplementary Figure 3). Regarding the effect of βHB, our findings suggest that VD neurons are targets of βHB action. As mentioned in the previous response and the discussion section (Page 17, lines 369-383, revised version), we might speculate that lingering βHB or a metabolite from prior exposure could mitigate these defects in VD neurons that are born in Late L1s-Early L2s. Additionally, βHB has been noted for its capacity to induce long-term epigenetic changes. Therefore, it could act on precursor cells of VD neurons, with the resulting changes manifesting during VD development independently of whether exposure has ceased. All these possibilities are now discussed in the manuscript.

      Acknowledging that our work raises several questions that we aim to address in the future, we believe our manuscript provides valuable information regarding how the PI3K pathway modulates neuronal development and how dietary interventions can influence this process.

      This study brings important contributions to the understanding of GABAergic neuron development in C. elegans, but unfortunately, it is justified and contextualized mostly in distantly-related fields - where the study has a dubious impact at this stage - rather than in the central field of the work (post-embryonic development of C. elegans inhibitory circuits) where the study has stronger impact. This study is fundamentally about a cell fate determination event that occurs in a nutritionally-sensitive developmental stage (post-embryonic L1 larval stage) yet the introduction and discussion are focused on more distantly related problems such as excitatory/inhibitory (E/I) balance, pathophysiology of human diseases, and treatments for them. Whereas speculation is warranted in the discussion, the reduced in-depth consideration of the known biology of these neurons and organisms weakens the impact of the study as written. For instance, the critical role of DAF-18/PTEN seems to occur at the early L1 larval stage, a stage that is particularly sensitive to nutritional conditions. The developmental progression of L1 larvae is well-known to be sensitive to nutrition - e.g., L1 larvae arrest development in the absence of food, something that is exploited in nematode labs to synchronize animals at the L1 stage by allowing embryos to hatch into starvation conditions (water). Development resumes when they are exposed to food. Hence, the extensive postembryonic developmental trajectory that GABAergic neurons need to complete is expected to be highly susceptible to nutrition. Is it? The sensitivity towards the ketogenic diet intervention seems to favor this. In this sense, the attribution of the findings to issues with the nutrition-sensitive insulin-like signaling pathway seems quite plausible, yet this possibility seems insufficiently considered and discussed.

      We greatly appreciate the reviewer's emphasis on the sensitivity of the L1 stage to nutritional status. As the reviewer points out, C. elegans adjusts its development based on food availability, potentially arresting development in L1 in the absence of food. It is therefore reasonable that both the completion of DD neuron trajectories and the initial development steps of VD neurons are particularly sensitive to dietary modulation of the insulin pathway, in which both DAF-18 and DAF-16 play roles. This important point has also been included in the discussion (Page 18, lines 384-407, revised version).

      Finally, the fact that imbalances in excitatory/inhibitory (E/I) inputs are linked to Autism Spectrum Disorders (ASD) is used to justify the relevance of the study and its findings. Maybe at this stage, the speculation would be more appropriate if restricted to the discussion. In order to be relevant to ASD, for instance, the selectivity of PTEN towards inhibitory neurons should occur in humans too. However, at present, the E/I balance alteration caused by the absence of daf-18/PTEN in C. elegans could simply be a coincidence due to the uniqueness of the post-embryonic developmental program of GABAergic neurons in C. elegans. To be relevant, human GABAergic neurons should also pass through a unique developmental stage that is critically susceptible to the PI3K-PDK1-AKT pathway in order for DAF-18/PTEN to have any role in determining their function. Is this the case? Hence, even in the discussion, where the authors state that "this study provides universally relevant information on.... the mechanisms underlying the positive effects of ketogenic diets on neuronal disorders characterized by GABA dysfunction and altered E/I ratios", this claim seems unsubstantiated as written particularly without acknowledging/mentioning the criteria that would have to be fulfilled and demonstrated for this claim to be true.

      Our results suggest that defects in GABAergic neurons are not limited to DDs, which, as the reviewer rightly notes, are quite unique in their post-embryonic development primarily due to the synaptic remodeling process they undergo. These defects also extend to VD neurons, which do not exhibit significant developmental peculiarities once they are born. Therefore, we think that the defects are not specific to the developmental program of DD neurons but are more related to all GABAergic motoneurons. Additionally, the observation of defects in non-GABAergic neurons in C. elegans daf-18 mutants supports the hypothesis that the role of daf-18 is not limited to DD neurons (Christensen et al., 2011; Kennedy et al., 2013).

      In mammals, Pten conditional knockout (cKO) animals have been extensively studied for synaptic connectivity and plasticity, revealing an imbalance between synaptic excitation and inhibition (E/I balance) (Reviewed in Rademacher and Eickholt, 2019, Cold Spring Harbor Perspect Med, https://doi.org/10.1101%2Fcshperspect.a036780). This imbalance is now widely accepted as a key pathological mechanism linked to the development of ASD-related behavior (Lee et al, 2017; Biological Psychiatry, https://doi.org/10.1016/j.biopsych.2016.05.011). The importance of PTEN in the development of GABAergic neurons in mammals is well-documented. For instance, embryonic PTEN deletion from inhibitory neurons impacts the establishment of appropriate numbers of parvalbumin and somatostatin-expressing interneurons, indicating a central role for PTEN in inhibitory cell development (Vogt et al, 2015, Cell Rep, https://doi.org/10.1016%2Fj.celrep.2015.04.019). Additionally, conditional PTEN knockout in GABAergic neurons is sufficient to generate mice with seizures and autism-related behavioral phenotypes (Shin et al, 2021, Molecular Brain, https://doi.org/10.1186%2Fs13041-02100731-8). Moreover, while mice in which PV GABAergic neurons lacked both copies of Pten experienced seizures and died, heterozygous animals (PV-Pten+/−) showed impaired formation of perisomatic inhibition (Baohan et al, 2016, Nature Comm, DOI: 10.1038/ncomms12829). Therefore, there is substantial evidence in mammals linking PTEN mutations to neurodevelopmental disorders in general and affecting GABAergic neurons in particular. Hence, we believe that the role of daf-18/PTEN in GABAergic development could be a more widespread phenomenon across the animal kingdom rather than a specific process unique to C. elegans.

      Beyond the points discussed, we have addressed the reviewer's comment regarding the last sentence of the abstract. We have revised it to more cautiously frame the relationship between our findings, ASD, and mammalian neurodevelopmental disorders.

      Reviewer #2 (Public Review):

      Summary:

      Disruption of the excitatory/inhibitory (E/I) balance has been reported in Autism Spectrum Disorders (ASD), with which PTEN mutations have been associated. Giunti et al choose to explore the impact of PTEN mutations on the balance between E/I signaling using as a platform the C. elegans neuromuscular system where both cholinergic (E) and GABAergic (I) motor neurons regulate muscle contraction and relaxation. Mutations in daf-18/PTEN specifically affect morphologically and functionally the GABAergic (I) system, while leaving the cholinergic (E) system unaffected. The study further reveals that the observed defects in the GABAergic system in daf-18/PTEN mutants are attributed to reduced activity of DAF-16/FOXO during development.

      Moreover, ketogenic diets (KGDs), known for their effectiveness in disorders associated with E/I imbalances such as epilepsy and ASD, are found to induce DAF-16/FOXO during early development. Supplementation with β-hydroxybutyrate in the nematode at early developmental stages proves to be both necessary and sufficient to correct the effects on GABAergic signaling in daf-18/PTEN mutants.

      Strengths:

      The authors combined pharmacological, behavioral, and optogenetic experiments to show the GABAergic signaling impairment at the C. elegans neuromuscular junction in DAF-18/PTEN and DAF-16/FOXO mutants. Moreover, by studying the neuron morphology, they point towards neurodevelopmental defects in the GABAergic motoneurons involved in locomotion. Using the same set of experiments, they demonstrate that a ketogenic diet can rescue the inhibitory defect in the daf-18/PTEN mutant at an early stage.

      Weaknesses:

      The morphological experiments hint towards a pre-synaptic defect to explain the GABAergic signaling impairment, but it would have also been interesting to check the post-synaptic part of the inhibitory neuromuscular junctions such as the GABA receptor clusters to assess if the impairment is only presynaptic or both post and presynaptic.

      Moreover, all observations done at the L4 and/or adult stage don't discriminate between the different GABAergic neurons of the ventral nerve cord, i.e., the DDs, which are born embryonically and undergo remodeling at the late L1 stage, and the VDs, which are born post-embryonically at the end of the L1 stage. Those additional elements would provide information on the mechanism of action of the FOXO pathway and the ketone bodies.

      Thank you for your insightful suggestions. 

      This is an initial study that serves as a cornerstone, demonstrating the sensitivity of GABAergic neuron development to alterations in the PI3K pathway and how these alterations can be mitigated by a dietary intervention with a ketone body. While we have determined that the transcription factor DAF-16/FOXO is essential in the neurodevelopmental process and is the target of ketone bodies to alleviate defects, there are still underlying mechanisms to be elucidated. This is only the first step that opens many avenues for further investigation, including the study of post-synaptic partners.

      While our current study primarily focuses on neuronal alterations without delving into potential postsynaptic effects, we do plan to investigate this aspect in future research. This includes examining GABAergic receptors as well as cholinergic receptors, as exacerbation of cholinergic signaling cannot be ruled out. To conduct a comprehensive study of post-synaptic structure and functionality, we would need strains with fluorescent markers for both pre- and post-synaptic components (such as rab-3, unc-49, unc-29, or acr-16 fusions to GFP or mCherry). Unfortunately, most of these strains are not currently available in our laboratory. Unlike in the US or Europe, acquiring these strains from the C. elegans CGC repository is challenging in Argentina due to common customs delays, which require significant time and resources to navigate. Discussions at the Latin American C. elegans conference with CGC administrators, such as Ann Rougvie, have been initiated to address this issue, but a solution has not been reached yet. Additionally, to analyze post-synaptic functionality in depth, studying the response to perfusion with various agonists using electrophysiology would be beneficial. We are in the process of acquiring the capability to conduct electrophysiology experiments in our laboratory, but progress is slow due to limited funding.

      While we believe these experiments are very informative, they will require a considerable amount of time due to our current circumstances. We consider them non-essential to the primary message of the paper, which focuses on neuronal developmental defects leading to functional alterations in daf-18/PTEN mutants and the novel finding that these can be mitigated by supplementing food with hydroxybutyrate. We will study the structure and functionality of the post-synapse in our future projects and also plan to extend this investigation to mutants with deficiencies in genes closely related to neurodevelopmental defects, such as neuroligin, neurexin, or shank-3, which have been implicated in synaptic architecture.

      We also agree that discriminating between DD and VD neurons provides significant insights into the neurodevelopmental phenomena dependent on the FOXO pathway and the action of βHB. In this revised version, we present evidence that not only DD neurons are affected but also VD neurons (see Supplementary Figure 3, revised version). This allows us to suggest that daf-18 affects the development of GABAergic neurons regardless of whether they are born embryonically (DDs) or post-embryonically (VDs) (see also our response to the previous reviewer). We hope to distinguish the defects observed in each type of neuron in future studies. For this, we would need to use strains specifically marked in one neuronal type or another, which, for the same reasons mentioned earlier, would take a considerable amount of time under current conditions.

      Conclusion:

      Giunti et al provide fundamental insights into the connection between PTEN mutations and neurodevelopmental defects through DAF-16/FOXO and shed light on the mechanisms through which ketogenic diets positively impact neuronal disorders characterized by E/I imbalances.  

      Reviewer #3 (Public Review):

      Summary:

      This is a conceptually appealing study by Giunti et al in which the authors identify a role for PTEN/daf-18 and daf-16/FOXO in the development of inhibitory GABA neurons, and then demonstrate that a diet rich in ketone body β-hydroxybutyrate partially suppresses the PTEN mutant phenotypes. The authors use three assays to assess their phenotypes: (1) pharmacological assays (with levamisole and aldicarb); (2) locomotory assays and (3) cell morphological assays. These assays are carefully performed and the article is clearly written. While neurodevelopmental phenotypes had been previously demonstrated for PTEN/daf-18 and daf-16/FOXO (in other neurons), and while KB β-hydroxybutyrate had been previously shown to increase daf-16/FOXO activity (in the context of aging), this study is significant because it demonstrates the importance of KB β-hydroxybutyrate and DAF-16 in the context of neurodevelopment. Conceptually, and to my knowledge, this is the first evidence I have seen of a rescue of a developmental defect with dietary metabolic intervention, linking, in an elegant way, the underpinning genetic mechanisms with novel metabolic pathways that could be used to circumvent the defects.

      Strengths:

      What their data clearly demonstrate, is conceptually appealing, and in my opinion, the biggest contribution of the study is the ability of reverting a neurodevelopmental defect with a dietary intervention that acts upstream or in parallel to DAF-16/FOXO.

      Weaknesses:

      The model shows AKT-1 as an inhibitor of DAF-16, yet their studies show no differences from wildtype in akt-1 and akt-2 mutants. AKT is not a major protein studied in this paper, and it can be removed from the model to avoid confusion, or the result can be discussed in the context of the model to clarify interpretation.

      Thank you very much for the suggestion. We agree with the reviewer's assessment that the study of AKT's action itself is too limited in this study to draw conclusions that would allow its inclusion in the proposed model. Therefore, following the reviewer's suggestion, we have removed this protein from our model.

      When testing additional genes in the DAF-18/FOXO pathway, there were no significant differences from wild-type in most cases. This should be discussed. Could there be an alternate pathway via DAF-18/DAF-16, excluding the PI3K pathway, or are there variations in activity of PI3K genes during a ketogenic diet that are hard to detect with current assays?

      Thank you for bringing up this point. Our pharmacological experiments indeed demonstrate that all mutants associated with an exacerbation of the PI3K pathway, which typically inhibits nuclear translocation and activity of the transcription factor DAF-16, lead to imbalances in E/I (excitation/inhibition) that manifest as hypersensitivity to cholinergic drugs. This includes the gain of function of pdk-1 and the loss of function of daf-18 and daf-16 itself. In our subsequent experiments, we demonstrate that this exacerbation of the PI3K pathway leads to errors in the neurodevelopment of GABAergic neurons, which explains the hypersensitivity to aldicarb and levamisole.

      As the reviewer remarks, it is intriguing that mutants inhibiting this pathway do not show differences in their sensitivity to cholinergic drugs compared to wild-type animals. We can speculate, for instance, that during neurodevelopment, there is a critical period where the PI3K pathway must remain at very low activity (or even deactivated) for proper development of GABAergic neurons. This could explain why there are no differences in sensitivity to cholinergic drugs between mutants that inhibit the PI3K pathway and the wild type. The PI3K pathway depends on insulin-like signals, which are in turn positively modulated by molecules associated with the presence of food. Interestingly, larval stage 1 is particularly sensitive to nutritional status, being able to completely arrest development in the absence of food. Therefore, dietary intervention with βHB may generate a signal of dietary restriction (as seen in mammals) and, as a consequence of this dietary restriction, the PI3K pathway is inhibited, resulting in increased DAF-16 activity. This could restore the proper neurodevelopment of GABAergic neurons. However, this is mere speculation, and more in-depth experiments (beyond the pharmacological ones we performed here) with mutants in different genes within the PI3K pathway may shed light on this point.

      Following the reviewer's suggestion, this point has been discussed in the revised version of the manuscript (Discussion, Page 18, lines 384-407).

      The consequence of SOD-3 expression in the broader context of GABA neurons was not discussed. SOD-3 was also measured in the pharynx but measuring it in neurons would bolster the claims.

      SOD-3 is a known target of DAF-16. Previous studies have shown that βHB induces SOD-3 expression through the induction of DAF-16 (Edwards et al, 2014, Aging, https://doi.org/10.18632%2Faging.100683). The highest levels of SOD-3 expression are typically observed in the pharynx or intestine (DeRosa et al, 2019 https://doi.org/10.1038/s41586-019-1524-5; Zheng et al., 2021, PNAS, https://doi.org/10.1073/pnas.2021063118), and it is often used as a measure of general upregulation of DAF-16. Therefore, we used this parameter as a measure of βHB upregulating systemic DAF-16 activity. While we agree with the reviewer that observing variations in SOD-3 expression in neurons would further support our conclusions, unfortunately, we did not detect measurable signals of SOD-3 in motor neurons in either the control condition or the daf-18 background even upon stress or βHB exposure. This may be because SOD-3 is a minor target of DAF-16 in these neurons, or its modulation may not correspond to the timing of fluorescence measurements (L4-adults).

      Despite this, our genetic experiments and neuron-specific rescue experiments lead us to conclude that DAF-16 must act autonomously in GABAergic neurons to ensure proper neurodevelopment.

      If they want to include AKT-1, seeing its effect on SOD-3 expression could be meaningful to the model.

      Thank you for this suggestion. We believe that even measuring SOD-3 levels in akt mutant backgrounds would still provide limited information to give it a predominant value in our work. Additionally, to have a complete understanding of the total role of AKT, it would be necessary to measure it in a double mutant background of akt-1; akt-2, and these double mutants generate 100% dauers even at 15°C (Oh et al., PNAS 2005, https://doi.org/10.1073/pnas.0500749102; Quevedo et al., Current Biology 2007, http://dx.doi.org/10.1016/j.cub.2006.12.038; Gatzi et al., PLOS ONE 2014, https://doi.org/10.1371/journal.pone.0107671), greatly complicating the execution of these experiments. Therefore, following the first advice of this reviewer, we have decided to modify our model by excluding AKT.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      ⁃ Please include earlier in the main text the rationale for using unc-25 as a control/reference already when mentioning Figure 1A.

      Thank you for pointing out the need to reference this control earlier. We have included the following paragraph in the description of Figure 1 (Page 5, line 71, revised version):

      “Hypersensitivity to cholinergic drugs is typical of animals with an increased E/I ratio in the neuromuscular system, such as mutants in unc-25 (the C. elegans orthologue for glutamic acid decarboxylase, an essential enzyme for synthesizing GABA). While daf-18/PTEN mutants become paralyzed earlier than wild-type animals, their hypersensitivity to cholinergic drugs is not as severe as that observed in animals completely deficient in GABA synthesis, such as unc-25 null mutants (Figures 1B and 1C), indicating a less pronounced imbalance between excitatory and inhibitory signals.”

      ⁃ Please discuss the greater sensitivity of pdk-1(gf) animals to levamisole than to aldicarb.

      Thank you for bringing up this subtle point.  We understand that the reviewer is referring to the paralysis curve in response to aldicarb in pdk-1(gf), which is closer to unc-25 than the curve for levamisole (in both cases, they are more sensitive than the wild type). Therefore, pdk-1(gf) animals seem to be more sensitive to aldicarb than to levamisole. These results are now shown in Figure 1D (revised version).

      The PI3K pathway does not only act in neurons but also in muscles. Gain of function in pdk-1 has been shown to modulate muscle protein degradation (Szewczyk et al, EMBO Journal, 2008. https://doi.org/10.1038/sj.emboj.7601540). In contrast,  no effect on protein degradation has been reported for null mutants in this gene. Several studies have demonstrated that protein degradation levels can differentially affect receptor subunits, particularly acetylcholine receptors (Reviewed in Crespi et al, Br J Pharmacol, 2018). C. elegans is characterized by a wide repertoire of AChR subunits, and there are at least two subtypes of ACh receptors in muscles (one multimeric sensitive to levamisole and one homomeric (ACR-16) insensitive to levamisole) (Richmond et al, 1999 Nature Neuroscience http://dx.doi.org/10.1038/12160; Touroutine D, JBC 2005 https://doi.org/10.1074/jbc.M502818200).

      Interestingly, acr-16 null mutants are hypersensitive to aldicarb (Zeng et al, JCB, 2023, https://doi.org/10.1083/jcb.202301117) while the electrophysiological response to levamisole in this mutant remains similar to that of wild-type (Touroutine et al, 2005). Therefore, it may be that the gain of function in pdk-1 induces a change in the expression of AChR subtypes in muscle that differentially affects sensitivity to levamisole and ACh. This is purely speculative, and there may be many other explanations. While it would be interesting to explore this difference further, it goes far beyond the scope of this study. The cholinergic drug sensitivity assay is purely exploratory and allowed us to delve into the GABAergic and cholinergic signals in daf-18 mutants. In this sense, the hypersensitivity of pdk-1(gf) to both drugs supports the idea that an increase in PI3K signaling leads to an increased E/I ratio.

      ⁃ Please explain the rationale for performing the akt-1 and akt-2 assays separately. Why not test double mutants? Has their lack of redundancy been determined?

      Our pharmacological assays are conducted at the L4 larval stage, making it impossible to analyze the potential redundancy of akt-1 and akt-2 in sensitivity to levamisole and aldicarb. This impossibility arises because the akt-1;akt-2 double mutant exhibits nearly 100% arrest as dauer even at 15°C, as reported in several prior studies (Oh et al., PNAS 2005, https://doi.org/10.1073/pnas.0500749102; Quevedo et al., Current Biology 2007, http://dx.doi.org/10.1016/j.cub.2006.12.038; Gatzi et al., PLOS ONE 2014, https://doi.org/10.1371/journal.pone.0107671). While the increased dauer arrest in the double mutant compared to the single mutants might suggest redundant functions in dauer entry, there are also reports indicating the absence of redundancy in other processes, such as vulval development (Nakdimon et al., PLOS Genetics 2012, https://doi.org/10.1371%2Fjournal.pgen.1002881).

      The complete dauer arrest likely underlies why other studies focusing on the role of the PI3K pathway in neurodevelopment utilize both mutants separately (Christensen et al, Development 2011, https://doi.org/10.1242/dev.069062). While determining the potential redundancy of these genes is not feasible for this assay, we utilized various mutants of the pathway (age-1, pdk-1, daf-18, daf-16 and daf-16;daf-18 in addition to the akt-s) that support the conclusion, which is that exacerbating the PI3K pathway activity makes animals hypersensitive to cholinergic drugs.

      In response to the reviewer's concern, we have added a sentence in the text explaining the impossibility of performing the assay in the akt-1;akt-2 double mutant (Page 6, lines 90-92).

      Figure 1C and D (This applies to all similarly presented bar figures). Please show data points and dispersion (preferably data, median+- 25-75% or average+-SD). 

      Thank you. Done

      ⁃ Line 112 -maybe "and resumes"? 

      Thank you. Done (Line 126, revised version)

      ⁃ Figure 1E and F. Please present mean +-SD (not SEM) of fluctuations. Please change slightly the tones so that the dispersion is easier to distinguish on the "blue light on" box.

      Thank you for the suggestion. We have adjusted the tones as recommended to enhance the visualization of the "blue light on" box. For visualization purposes, we present the shading of the standard error of the mean (SEM), as is usual in these types of optogenetic experiments where traces of animal length variations are measured (Liewald et al, Nature Methods, 2008, doi: 10.1038/nmeth.1252; Schultheis et al, J. Neurophysiology, 2011, doi: 10.1152/jn.00578.2010; Koopman et al, BMC Biology 2021, https://doi.org/10.1186/s12915-021-01085-2; Seidenthal et al, Micro Publication Biology, 2022, https://doi.org/10.17912%2Fmicropub.biology.000607).

      For the revised version, we have also included bar graphs for each optogenetic experiment, representing the mean of the length average of each worm measured from the first second after the blue light was turned on until the second before the light was turned off (in the graph, this corresponds to the period between seconds 6 and 9 of the traces). These graphs include the standard deviation and the corresponding significance levels. All of this has been included in the new legend (Figure 2D, 2E, 4E-J).
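
      A minimal sketch of the per-worm summary described above, for illustration only. It is not the Fiji macro or the analysis script used for the figures: the function names, the one-sample-per-second traces, and the example values are hypothetical, and only the analysis window (seconds 6 to 9) is taken from the description.

```python
from statistics import mean, stdev


def window_mean(trace, times, start=6.0, end=9.0):
    """Mean normalized body length of one worm within [start, end] seconds."""
    values = [v for t, v in zip(times, trace) if start <= t <= end]
    if not values:
        raise ValueError("no samples fall inside the analysis window")
    return mean(values)


def summarize(traces, times, start=6.0, end=9.0):
    """Return (mean, SD) across worms of the per-worm window means."""
    per_worm = [window_mean(tr, times, start, end) for tr in traces]
    return mean(per_worm), stdev(per_worm)


# Hypothetical data: 3 worms, one sample per second from 0 to 10 s,
# lengths normalized to baseline, light on at 5 s and off at 10 s.
times = list(range(11))
traces = [
    [1.0] * 6 + [1.04] * 4 + [1.0],
    [1.0] * 6 + [1.06] * 4 + [1.0],
    [1.0] * 6 + [1.05] * 4 + [1.0],
]

group_mean, group_sd = summarize(traces, times)
print(f"mean = {group_mean:.3f}, SD = {group_sd:.3f}")
```

      Averaging within each worm first, and only then across worms, keeps the animal as the unit of replication, so the SD reflects variation between animals rather than between time points.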

      ⁃ Figure 1A&1B & Supplementary Figure 1D x Supplementary Figure 1E&1F. What is the difference between these experiments? Whereas the unc-25 mutants paralyze in the same amount of time, the WT animals paralyze ~1 h later in Supplementary Figure 1E-1F in response to either drug. Please revise experimental conditions to see if anything can be learned eg, maybe this is a nutritional response from experiments done at different timepoints? Maybe different food recipes affected sensitivity to paralysis?

      Thank you for pointing this out. While the experiments with daf-18 (in both alleles) and daf-16 were conducted at the beginning of this project (2019-2020), the assays with the other mutants in the PI3K and mTOR pathways were performed years later. Changes in the reagents used (agar, peptone, cholesterol, etc.) to grow the worms have occurred, potentially altering the animals' response directly or through the nutritional quality of the bacteria they grow on. In addition, the difference may be attributed to the fact that experiments at the project's outset were conducted by one author, while more recent experiments were carried out by another. The goal is to quantify paralysis in non-responsive worms after touch stimulation. The force of this probing or the thickness of the hair used for touching can be slightly operator-dependent and can lead to variable responses. For this reason, the wild-type and unc-25 strains are included as internal controls in every experiment. Moreover, despite this user-dependent variation, the experiments were always conducted blindly (except for unc-25, whose uncoordinated phenotype is easily identifiable), so we trust the outcomes.

      ⁃ Supplementary Figure 1G - Length and Width appear to be switched in both left and right panels - please revise and include a description of N and of statistics depicted. 

      Unfortunately, we don't see the switching error that the reviewer mentioned. In the left panel, we demonstrate that optogenetic activation of GABAergic neurons leads to an increase in length without modifying the width of the animal. Therefore, we conclude that the increase in area, as observed in our Fiji macro for optogenetic response analysis, is due to an increase in the animal's length. In the cholinergic activation shown in the right panel, the animal shortens (decreasing length) without modifying the width, resulting in the reduction of the total body area. 

      We have included information about N (sample size) and the statistical test used in the legends as suggested. These graphs are now shown as Figures 2F and G, revised version.

      ⁃ Supplementary Figure 1G legend lines 779-780. Please describe the post-hoc test applied following ANOVA to obtain the denoted p values. This applies to all datasets where ANOVA or Kruskal-Wallis tests were applied.

      Following the reviewer's suggestion, all the post-hoc tests applied after ANOVA or Kruskal-Wallis analysis are now included in the legend of each figure and in Materials and Methods (statistical analysis section).

      ⁃ Line 174 maybe "arises *from* the hyperactivation" instead of *for*?.

      Corrected. Thank you. Line 190, revised version.

      ⁃ Supplementary Figure 4. On line 816 it says n=40-90, but please check the n of the daf-18, daf-16 samples, which seem to have less than 40 animals.

      We understand that the reviewer is referring to Supplementary Figure 3 from the original version (now Supplementary Figure 5 in the revised version). We have now included the number of observations below each data point cloud to clearly indicate the sample size for each condition.

      ⁃ Supplementary Figure 4 - please state what are the bars on the graphs. Please state which post-hoc test was performed after Kruskal-Wallis and present at least the p values obtained between treated controls and each genotype. Alternatively, present the whole truth table in supplementary data.

      We understand that the reviewer is referring to Supplementary Figure 3 from the original version (now Supplementary Figure 5 in the revised version). There was an error in the original legend (thank you for bringing this to our attention): the statistics in this case were not performed using Kruskal-Wallis; rather, each treated condition was compared to its own untreated control using the Mann-Whitney test. We have now added the p-values to the graph. All raw data for this figure, as well as for all other figures, are available in Open Science Framework (https://osf.io/mdpgc/?view_only=3edb6edf2298421e94982268d9802050).
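
      The pairwise design described above (each treated group against its own untreated control) can be sketched as follows. This is a simplified illustration and not the software used for the published statistics: the function name and example values are hypothetical, and the p-value uses a normal approximation without tie or continuity correction, so it will differ slightly from exact implementations, especially for small samples.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no corrections).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign 1-based ranks, giving tied values their average rank.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value under the normal approximation
    return u, p


# Hypothetical values: each genotype is compared only with its own untreated control.
data = {
    "wild-type": {"untreated": [10, 12, 11, 13, 12], "treated": [11, 12, 13, 12, 10]},
    "mutant": {"untreated": [5, 6, 4, 7, 5], "treated": [9, 10, 8, 11, 12]},
}
for genotype, groups in data.items():
    u, p = mann_whitney_u(groups["untreated"], groups["treated"])
    print(f"{genotype}: U = {u}, p = {p:.3f}")
```

      Because every comparison is restricted to one genotype, no omnibus test across genotypes is involved, which is why a Kruskal-Wallis label would not describe this analysis.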

      ⁃ Please cite the figure panels in order: eg, Figure 3E is mentioned in the text after panels Figure 3F-K.

      Done. We have rearranged the figures to adapt them to the text order (Figure 4, revised version).

      ⁃ Figure 4 - line 610 please revise "(n=20-30 (n: 20-25 animals per genotype/trial)."

      Thank you. Corrected.

      ⁃ Figure 4 - there appears to be an inconsistency in the figure with the text (lines 223-225). In figures it says E-L1, but in the text, it says "solely in L1". Does E-L1 include the whole L1 stage? If not- E-L1 can be interpreted only as during the embryonic stage, hence, no exposure to betaHB due to the impermeable chitin eggshell. Then there is L1-L2, which should cover the L1 stage and the L2 or something else. Please revise. The text mentions L2-L3 or L3-L4 and these categories are not in the figures. This clarification is key for the interpretation of the results. The precise developmental time of the exposures is not defined either in the methods or in the figures. Please provide precise times relative to hours and/or molts and revise the text/figure for consistency.

      The reviewer is entirely correct in pointing out the lack of relevant data regarding the exposure time to βHB. We have now clarified this information. For the revised version, we have adjusted the nomenclature of each exposure period to precisely reflect the developmental stages involved.

      For the experiments involving continuous exposure to βHB throughout development, the NGM plate contained the ketone body. Therefore, the exposure encompassed, in principle, the ex-utero embryonic development period up to L4-Young adults (E-L4/YA, in Figure 5A) when the experiments were conducted. Since the chitin eggshell may restrict drug penetration (see Supplementary Figure 7), we can only be certain of βHB exposure from hatching onward.

      In experiments involving exposure at different developmental stages as those depicted in Figure 4 of the original version, (now Figure 5), animals were transferred between plates with and without βHB as required. We exposed daf-18/PTEN mutant animals to βHB-supplemented diets for 18-hour periods at different developmental stages (Figure 5A). The earliest exposure occurred during the 18 hours following egg laying, covering ex-utero embryonic development and the first 8-9 hours of the L1 stage (This period is called E-L1, in figure 5 revised version). The second exposure period encompassed the latter part of the L1 stage, the entire L2 stage, and most of the L3 stage (L1-L3). The third exposure spanned the latter part of the L3 stage (~1-2 hours), the entire L4 stage, and the first 6-7 hours of the adult stage (L3-YA).

      All this information has been included in Figure 5 (and its legend), text (Page 13, lines 259-276), and Material and Methods of the revised manuscript.

      ⁃ Some methods are not sufficiently well described. Specifically, how the animals were exposed to treatments and how stages were obtained for each experiment. Was synchronization involved? If so, in which experiments and how exactly was it performed?

      As mentioned in previous responses, all the experiments were performed in age-synchronized animals. We have included the following sentence in Materials and Methods (C. elegans culture and maintenance section): “All experiments were conducted on age-synchronized animals. This was achieved by placing gravid worms on NGM plates and removing them after two hours. The assays were performed on the animals hatched from the eggs laid in these two hours”.

      Reviewer #2 (Recommendations For The Authors):

      Major points

      (1) To complete the study on the GABAergic signaling at the NMJs, it would be interesting to assess the status of the post-synaptic part of the synapse such as the GABAR clustering. It would also tell if the impairment is only presynaptic or both post and presynaptic.

      Thank you for your insightful suggestion. We agree that exploring post-synaptic elements can shed light on whether the impairment is solely presynaptic or involves both pre and post-synaptic components.

      While our current study primarily focuses on neuronal alterations without delving into potential postsynaptic effects, we do plan to investigate this aspect in the future. This includes not only examining GABAergic receptors but also exploring cholinergic receptors, as exacerbation of cholinergic signaling cannot be ruled out. To conduct a comprehensive study of post-synaptic structure and functionality, we would need strains with fluorescent markers for both pre and post-synaptic components (rab-3, unc-49, unc-29, acr-16 driving GFP or mCherry). However, most of these strains are not currently available in our laboratory. Unlike the US or Europe, acquiring these strains from the C. elegans CGC repository in Argentina is challenging due to common customs delays, requiring significant time and resources to navigate. Discussions at the Latin American C. elegans conference with CGC administrators, such as Ann Rougvie, have been initiated to address this issue, but a solution has not been reached yet. 

      Additionally, to analyze post-synaptic functionality in-depth, studying the response to perfusion with various agonists using electrophysiology would be beneficial. We are in the process of acquiring the capability to conduct electrophysiology experiments in our laboratory, but progress is slow due to limited funding.

      While we believe these experiments are very informative, they will require a considerable amount of time due to our current circumstances. We consider them non-essential to the primary message of the paper, which focuses on neuronal morphological defects leading to functional alterations in daf-18/PTEN mutants.

      We will include these experiments in our future projects, also planning to extend this investigation to mutants with deficiencies in genes closely related to neurodevelopmental defects, such as neuroligin, neurexin, or shank-3, which have been implicated in synaptic architecture.

      (2) The author always referred to unc-47 promoter or unc-17 promoter, never specifying where those promoters are driving the expression (and in the Materials & Methods, no information on the corresponding sequence). Depending on the promoters they may not only be expressed in the motoneurons involved in locomotion (VA, VB, DA, DB, VD, and DD), but they could also be expressed in other neurons which could be of importance for the conclusions of the optogenetic assays but also the daf-18 expression in GABAergic neurons.

      We appreciate the reviewer's insight regarding the broader expression patterns of the unc-17 and unc-47 promoters in all cholinergic and GABAergic neurons, respectively. The strains expressing constructs with these promoters were obtained from the CGC or other labs and have been widely used in previous papers (Liewald et al, Nature Methods, https://www.nature.com/articles/nmeth.1252 (2008); Byrne, A. B. et al. Neuron 81, 561-573, doi:10.1016/j.neuron.2013.11.019 (2014)).

      Regarding the optogenetic assays, the readout utilized (body length elongation or contraction) is primarily associated with the activity of cholinergic and GABAergic motor neurons and has been used in numerous studies to measure motor neuron functionality (Liewald et al, Nature Methods, https://www.nature.com/articles/nmeth.1252 (2008); Hwang, H. et al. Sci Rep 6, 19900, doi:10.1038/srep19900 (2016); Schultheis et al, J Neurophysiol 106, 817-827, doi:10.1152/jn.00578.2010 (2011); Koopman, M., Janssen, L. & Nollen, E. A. BMC Biol 19, 170, doi:10.1186/s12915-021-01085-2 (2021)). It has previously been established that the shortening observed after optogenetic activation of the unc-17 promoter, while active in various interneurons, depends on the activity of cholinergic motor neurons (Liewald et al., Nature Methods, https://www.nature.com/articles/nmeth.1252 (2008)). This was demonstrated by examining transgenic worms expressing ChR2-YFP from another cholinergic, motoneuron-specific but weaker promoter, Punc-4. They observed contraction and coiling upon illumination, albeit to a milder degree.

      In terms of GABAergic neurons, only 3 do not directly synapse to body wall muscles (AVL, PDV, and RIS) and are primarily involved in defecation. Of the 23 GABAergic motor neurons, 19 are D-type motoneurons, while the remaining 4 innervate head muscles (Pereira et al, eLife 2015, https://doi.org/10.7554/eLife.12432). It is therefore expected that while there may be some contribution from these latter neurons to the elongation after optogenetic activation in animals containing punc-47::ChR2, the main contribution should be from the D-type neurons. Additionally, while there may be some influence on D-type neuron development due to daf-18 rescue in neurons like RME, DVB or AVL, the most direct explanation for the rescue is that daf-18 acts autonomously in D-type cells. Finally, we have pharmacological and behavioral assays that support the optogenetic findings and enable us to reach final conclusions.

      (3) DD neurons are born during embryogenesis and newborn L1s have neurites even though less than at a later stage. If possible, it would be interesting to take a look at them to see if βHB has an effect or not. It will corroborate the hypothesis that βHB action is prevented by the impermeable eggshell on a system that can respond at a later stage. Moreover, using a specific DD, DA, and DB promoter, it would be possible to check if there is a difference in the morphological defects between embryonic and post-embryonic neurons.

      This is a very interesting point raised by the reviewer. We conducted experiments to analyze the morphology of GABAergic neurons in animals exposed to βHB only during the ex-utero embryonic development (in their laid egg state). We observed that this incubation was not sufficient to rescue the defects in GABAergic neurons (Supplementary Figure 7, revised version). As reported by other authors and discussed in our paper, the chitinous eggshell might act as an impermeable barrier to most drugs. However, we cannot rule out that incubation during this period is necessary but not sufficient to mitigate the defects. We have included these experiments in Supplementary Figure 7 and in the text (Page 13, lines 272-276).

      Additionally, we analyzed confocal images where, based on their position, we could identify and assess errors in DD (embryonic) and VD (Post-embryonic) neurons (Supplementary Figure 3, revised version). These experiments show that the effects are observed in both types of neurons, and we did not observe any differential alterations in neuronal morphology between the two types of neurons.

      Minor points

      (1)   Expression of daf-18/PTEN in muscle or hypodermis, could it ensure a proper development? It could give insights into the action mechanism of βHB.

      The reviewer's observation is indeed very intriguing. Previous studies from the Grishok lab (Kennedy et al, 2013) have demonstrated that the expression of daf-18 or daf-16 in extraneuronal tissues, specifically in the hypodermis, can rescue migratory defects in the serotoninergic neuron HSN in daf-18 or daf-16 null mutants of C. elegans. Clearly, this could also be an option for rescuing the morphological and functional defects of GABAergic motoneurons.

      However, the fact that the expression of daf-18 in GABAergic neurons rescues these defects strongly suggests an autonomous effect. In this regard, autonomous effects of DAF-18 or DAF-16 on neurodevelopmental defects have also been reported in interneurons in C. elegans (Christensen et al, 2011). This is included in the discussion (Page 15, lines 330-335).

      (2) Re-organise the introduction. The paragraph on ketogenic diets (lines 35-38) is not logically linked.

      Following the reviewer's suggestion, we have reorganized the introduction and changed the order of explanation regarding the significance of ketogenic diets, linking it with their proven effectiveness in alleviating symptoms of diseases with E/I imbalance (Lines 23-60, revised version).

      (3) Incorporate titles in the result section to guide the reader.

      Done. Thank you

      (4) Systematically add PTEN or FOXO when daf-18 or daf-16 are mentioned (for example lines 69, 84, 85).

      Done. Thank you  

      (5) Strain lists: lines 646 to 653: some information is missing on the different transgenes used in this study (integrated (Is) or extrachromosomal (Ex) with their numbers).

      Thank you for bringing this to our attention. We have now included all the information regarding the different transgenes used in this study, including whether they are integrated (Is) or extrachromosomal (Ex) and their respective numbers. This information can be found in the revised version of the manuscript (Materials and Methods, C. elegans culture and maintenance section highlighted in yellow).

      Reviewer #3 (Recommendations For The Authors):

      In Figure 1, some experiments were done with the unc-25 control while others, such as the optogenetic experiments, were done without those controls.

      Thank you for pointing this out. In the optogenetic experiments, we waited for the worm to move forward for 5 seconds at a sustained speed before exposing it to blue light to standardize the experiment, as the response can vary if the animal is in reverse, going forward, or stationary. Due to the severity of the uncoordinated movement in unc-25 mutants, achieving this forward movement before exposure is very difficult. Additionally, this lack of coordination prevents these animals from performing the escape response tests, as they barely move. Therefore, we limited the use of this severe GABAergic-deficient control to pharmacological or post-prodding shortening experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      […]

      (1) The authors claim that the negative frequency dependence that maintains polymorphism in their model results from a non-linear relationship between the display trait and sexual success [...] Maybe I missed something, but the authors do not provide support for their claim about the negative frequency-dependence of sexual selection in their simulations. To do so they could (1) extract the relationship between the relative mating success of the two male types from the simulations and (2) demonstrate that polymorphism is not maintained if the relationship between male display trait and mating success is linear.

      We believe that there is a confusion of terminology here. We agree that for the two alleles at a locus impacting male display in our model, the allele conferring inferior display quality will have a fitness that increases as its frequency increases, so this allele displays positive frequency dependent fitness. And, the alternate, display-favoring allele at the locus does display negative frequency dependence. Our use of the terminology ‘negative frequency dependence’ was meant to refer to the negative dependence of the fitness of the display-favoring allele with respect to its own frequency. However, a significant body of literature instead discusses models in which both an allele and its alternate(s) are beneficial when at low frequency and deleterious when at high frequency under the same selective challenge, entailing negative frequency dependence of fitness for all alleles involved. This benefit-when-rare model of a single trait is often described simply as negative frequency dependence, and generates balancing selection at the locus, but is not the model we are presenting here, and does not encompass all models involving negative frequency dependent fitness. This lexical expectation may make the interpretation of our work more difficult, and we have amended the manuscript to make our model clearer (lines 227-231). In this model, we have a negative frequency dependence for the fitness of the display-favoring allele in mate competition, but the net selective disadvantage of this allele at high frequency is due to a cost in another, pleiotropic, fitness challenge: the constant survival effect. So, the alleles are under balancing selection where alternate alleles are favored by selection when rare, but not due solely to selection during mate competition. Instead, our model relies on pleiotropy for an emergent form of frequency-dependent balancing selection (in the sense that each allele is predicted to be beneficial on balance when rare).

      In the reviewer’s model of the success of two alleles at one locus, the ratio of success is vaguely linear with allele frequency for n=3, though it starts quite convex and has an inflection point between convex and concave segments (for the disfavored allele) at p≈0.532. This is visualized easily by plotting the function and its derivatives in Wolfram-Alpha. For n>=4, the fitness function with respect to the display-favoring/disfavoring allele becomes increasingly concave/convex respectively, and this specific nonlinearity is needed to act along with the antagonistic pleiotropy to maintain balancing selection, rather than being maintained by a model that favors any rare allele on the basis of its rarity in some manner. In an attempt to make the importance of the encounter number parameter clearer, we’ve generated new panels for Figure S1 which simulate encounter numbers 2, 3, and 4, and we have updated corresponding text and figure references in lines 335-338.

      For (1-2), it is not clear how to modify the simulation such that the relationship between the trait value and mating success can be perfectly linear - either linear with respect to allele frequency in a one locus model or linear with respect to trait value at a specific population composition, without removing the simulation of mate competition altogether. While it may be of interest to explore a more comprehensive range of biological trade-offs in future studies, we are not able to meaningfully do so within the context of the present manuscript.

      (2) The authors only explore versions of the model where the survival costs are paid by females or by both sexes. We do not know if polymorphism would be maintained or not if the survival cost only affected males, and thus if sexual antagonism is crucial.

      We now present simulations with male costs only as added panels to Figure S1 and mention these results in the main text (lines 334-335). Maintenance of the polymorphism is significantly reduced or completely absent in such simulations.

      (3) The authors assume no cost to aneuploidy, with no justification. Biologically, investment in aneuploid eggs would not be recoverable by Drosophila females and thus would potentially act against inversions when they are rare.

      We did offer some discussion and justification of our decision to model no inherent fitness of the inversion mutation itself, specifically aneuploidy, in lines 36-39 and 78-80 of the original reviewed preprint. Previous research suggests that D. melanogaster females may not actually invest in aneuploid eggs generated from crossover within paracentric inversions. While surprising, and potentially limited to a subset of clades, many ‘r-selected’ taxa or those in which maternal investment is spread out over time may have some degree of reproductive compensation for non-viable offspring, which can reduce the costs of generating aneuploids significantly (for example, t-haplotypes in mice). We have added this example and citation to lines 34ff in the current draft.

      (4) The authors appear to define balanced polymorphism as a situation in which the average allele frequency from multiple simulation runs is intermediate between zero and one (e.g., Figure 3). However, a situation where 50% of simulation runs end up with the fixation of allele A and the rest with the fixation of allele B (average frequency of 0.5) is not a balanced polymorphism. The conditions for balanced polymorphism require that selection favors either variant when it is rare.

      We originally chose mean final frequency for presenting the single locus simulations based on the ease of generating a visual plot that included information on fixation vs loss and equilibrium frequency. Figure 3 and related supplemental images have been changed to now also represent the proportion of simulations retaining polymorphism at the locus in the final generation.
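
      The distinction between the two summaries can be illustrated with a short sketch. The function name and the replicate outcomes below are hypothetical and are not taken from the simulations in the manuscript; the point is only that two sets of runs with the same mean final frequency can differ completely in how many runs retain polymorphism.

```python
from statistics import mean


def summarize_runs(final_freqs):
    """Return (mean final frequency, proportion of runs still polymorphic).

    A run counts as polymorphic if its final allele frequency is strictly
    between 0 (loss) and 1 (fixation).
    """
    polymorphic = [0.0 < f < 1.0 for f in final_freqs]
    return mean(final_freqs), sum(polymorphic) / len(final_freqs)


# Hypothetical outcomes of 10 replicate simulations each.
fixed_or_lost = [0.0] * 5 + [1.0] * 5  # every run ended in loss or fixation
balanced = [0.45, 0.52, 0.50, 0.48, 0.55, 0.50, 0.47, 0.53, 0.49, 0.51]

for label, freqs in (("fixed or lost", fixed_or_lost), ("balanced", balanced)):
    avg, prop = summarize_runs(freqs)
    print(f"{label}: mean frequency = {avg:.2f}, polymorphic runs = {prop:.0%}")
```

      Both sets average to a frequency of about 0.5, but only the second retains the polymorphism in any run, which is the case the reviewer's comment is concerned with.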

      (5) Possibly the most striking result of the experiment is the fact that for 14 out of 16 combinations of inversion x maternal background, the changes in allele frequencies between embryo and adult appear greater in magnitude in females than in males irrespective of the direction of change, being the same in the remaining two combinations. The authors interpret this as consistent with sexually antagonistic pleiotropy in the case of In(3L)Ok and In(3R)K. The frequencies of adult inversion frequencies were, however, measured at the age of 2 months, at which point 80% of flies had died. For all we know, this may have been 90% of females and 70% of males that died at this point. If so, it might well be that the effects of inversion on longevity do not systematically differ between the ages and the difference in Figure 9B results from the fact that the sample includes 30% longest-lived males and 10% longest-lived females.

      This critique deserves some consideration. The aging adults were separated by sex during aging, but while we recorded the number of survivors, we did not record the numbers of eclosed adults and their sexes initially collected out of an interest in maintaining high throughput collection. We therefore cannot directly calculate the associated survival proportions, but we can estimate them. We collected 1960 females and 3156 males, and we can very roughly estimate survival if we assume that equal numbers of each sex eclosed, and that the survivors represent 20% of the original population. That gives 12790 individuals per sex, or 84.7% female mortality and 75.3% male mortality.
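
      The rough estimate above can be reproduced with the following sketch. The function name is hypothetical; the survivor counts are the ones reported above, and the two assumptions (equal numbers of each sex eclosed, and 20% overall survival) are the same ones stated in the text.

```python
def estimate_mortality(survivors_f, survivors_m, overall_survival=0.20):
    """Estimate sex-specific mortality from survivor counts.

    Assumes equal numbers of each sex at eclosion and that all survivors
    together represent `overall_survival` of the starting population.
    """
    total_initial = (survivors_f + survivors_m) / overall_survival
    per_sex = total_initial / 2
    mortality_f = 1 - survivors_f / per_sex
    mortality_m = 1 - survivors_m / per_sex
    return per_sex, mortality_f, mortality_m


per_sex, mort_f, mort_m = estimate_mortality(1960, 3156)
print(f"{per_sex:.0f} per sex; female mortality {mort_f:.1%}, male mortality {mort_m:.1%}")
```

      The estimate is sensitive to both assumptions: a skewed sex ratio at eclosion or an overall survival different from 20% would shift the two mortality values.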

      So, we have added a qualification discussing the possibility of stronger selection on females and its influence on observed sex-specific frequency changes, on lines 602-605.

      (6) Irrespective of the above problem, survival until the age of 2 months is arguably irrelevant from the viewpoint of fitness consequences and thus maintenance of inversion polymorphism in nature. It would seem that trade-offs in egg-to-adult survival (as assumed in the model), female fecundity, and possibly traits such as females resistance to male harm would be much more relevant to the maintenance of inversion polymorphisms.

      Adult Drosophila will continue to reproduce in good conditions until mortality, and the estimated age of a mean reproductive event for a Drosophila melanogaster individual is 24 days (Pool 2015), and likewise for D. simulans (Turelli and Hoffman 1995). Given that reproduction is centered around 24 days, we expect sampling at 2 months of age to still be relevant to fitness. In seasonally varying climates, either temperate or with long dry season, survival through challenging conditions is expected to require several months. In many such cases, females are in reproductive diapause, and so longevity is the main selective pressure. See lines 931-936 in the revised manuscript.

      As we agreed above, it would be of interest to investigate a wider range of trade-offs in future studies. We focused here on the balance between survival and male reproductive success because the latter trait generates negative frequency dependence for display-favoring alleles and a disproportionate skew towards higher quality competitors, whereas many other fitness-relevant traits lack that property.

      (7) The experiment is rather minimalistic in size, with four cages in total; given that each cage contains a different female strain, it essentially means N=1. The lack of replication makes statements like "In(2L)t and In(2R)NS each showed elevated survival with all maternal strains except ZI418N" (l. 493) unsubstantiated because the claimed special effect of ZI418N is based on a single cage subject to genetic drift and sampling error. The same applies to statements on inversion x female background interaction (e.g., l. 550), as this is inseparable from residual variation. It is fortunate that the most interesting effects appear largely consistent across the cages/female backgrounds. Still, I am wondering why more replicates had not been included.

      Our experimental approach might be described as “diversity replication”. Essentially, the four maternal genetic backgrounds are serving dual purposes – both to assess experimental consistency and to ensure that our conclusions are not solely driven by a single non-representative genotype (which, in so many published studies, cannot be ruled out). It would indeed be interesting if we could have quadrupled the size of our experiment by having four replicates per maternal background. However, we suspect the reviewer may not recognize the substantial effort involved in our four existing experiments. Each of these involved collecting 500+ virgin females, hand-picking thousands of embryos during the duration of egg-laying, and repeatedly transferring offspring to maintain conditions during aging, such that cages had to be staggered by more than a month. These four cages took a year of benchwork just to collect frozen samples, before any preparation and quality control of the associated amplicon libraries for sequencing. Adding a further multiplier would take it well beyond the scope of a single PhD thesis. Fortunately, we were able to obtain the key results of interest without that additional effort, even if clearer insights into the role of maternal background would also be of strong interest.

      We do agree that no firm conclusions about maternal background can be reached without further replication, and so we have qualified or removed relevant statements accordingly (lines 568ff, 620-622).

      Reviewer #1 (Recommendations For The Authors):

      The description of the model is confusing and incomplete, e.g., the values of several parameters used to obtain the numerical results are not given. It is first stated (l. 223) that the model is haploid, but text elsewhere talks about homozygotes and heterozygotes. If the model is diploid (this in itself is not clear), what is assumed about dominance?

      We are not presenting results for a mathematical model estimated numerically. We have now clarified our transition from a conceptual depiction of our model, in which we use haploid representations for simplified presentation, to our forward population genetic simulations, which are entirely diploid. More broadly, we have improved our communication of the assumptions and parameters used in our simulations. The scenarios we investigate involve purely additive trait effects within and between loci (except that survival probabilities are multiplicative to avoid negative values). We think that considering other dominance scenarios would be a worthy subject for a follow-up study, whereas the present manuscript is already covering a great deal of ground.   
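
      As an illustration of the additive and multiplicative scheme described above, the following is a minimal sketch and not the simulator's actual code. The allele effect values and the function name `diploid_phenotype` are hypothetical, chosen only to show how display effects could sum across both haplotypes of a diploid while survival probabilities multiply and therefore cannot become negative.

```python
from math import prod

# Hypothetical per-allele effects: (display_effect, survival_multiplier).
# An antagonistic allele raises display but lowers survival probability;
# a neutral allele changes neither.
ANTAGONISTIC = (1.0, 0.9)
NEUTRAL = (0.0, 1.0)


def diploid_phenotype(haplotype_a, haplotype_b):
    """Return (display, survival) for a diploid built from two haplotypes.

    Display effects are summed over every allele on both haplotypes
    (purely additive within and between loci). Survival multipliers are
    multiplied together, so survival always stays between 0 and 1.
    """
    alleles = list(haplotype_a) + list(haplotype_b)
    display = sum(effect for effect, _ in alleles)
    survival = prod(multiplier for _, multiplier in alleles)
    return display, survival


# Two loci: the first is homozygous antagonistic, the second heterozygous.
hap1 = [ANTAGONISTIC, ANTAGONISTIC]
hap2 = [ANTAGONISTIC, NEUTRAL]
display, survival = diploid_phenotype(hap1, hap2)
print(display, survival)
```

      In this toy case the three antagonistic alleles contribute three units of display and a survival probability of 0.9 cubed. No dominance or interaction term appears anywhere in the calculation.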

      Similarly, it is hard to understand the design (l.442ff). I was confused as to whether a population was set up for each inversion or for all of them and what the unit or replication was. I found the description in Methods (l. 763-771) much clearer and only slightly longer; I suggest the authors transfer it to the Results. Also, Figure 8 should contain the entire crossing scheme; the current version is misleading in that it implies males with only two genotypes.

      All four tested inversions were segregating within the same karyotypically diverse population of males, and were assayed from the same experiments. We have attempted to improve the relevant description. For Figure 8, we had trouble conceiving a graphic update that contained a more complete cross scheme without seeming much more confused and cluttered. We have tried to clarify in the relevant text and the figure caption instead.

      There are a number of small issues that should be addressed:

      - No epistasis for viability assumed - what would be the consequence?

      We explored a model in which we intentionally included no terms for epistatic effects on phenotype. All epistasis with regard to fitness is emergent from competition between individuals with phenotypes composed of non-epistatic, non-dominant genetic effects. So, the simplest model of antagonism would have no epistasis for viability whatsoever. One could explore a model that has emergent viability epistasis in a similar way, by implementing stabilizing selection on a quantitative trait with a gaussian or similar non-linear phenotype-to-fitness map, but that might be better served as a topic for a future study. We have, however, tried to make this intent clearer in the text.

      l. 750 implies that aneuploidy generated by the inversion has no cost (aneuploid gametes are resampled)

      Yes, as addressed in public review item (3). Alternately see lines 34ff, 293, 369, 392 for in-text edits.

      l. 24-25: unclear; is this to mean that there is haplotype x sex interaction for survival?

      l. 25: success in what? (I assume this will be explained in the paper, but the abstract should stand on its own).

      l. 193-4: "producing among most competitive males": something missing or a word too much??

      Figure 1B,C: a tiny detail, but the plots would be more intuitive if the blue (average) bars were after (i.e., to the right) of the male and female ones, given that the average is derived from the two sex-specific values.

      Each of the above has been edited or implemented as suggested.

      l. 205. It is a convex function, but I do not understand what the authors mean by "convex distribution".

      Hopefully the updated text is clearer: “yielding a distribution of male reproductive output that follows a relatively convex trend”.

      l. 223ff: some references to Fig 1 panels in this paragraph seem off by one letter (i.e., A should be B, etc.).

      l. 231 "fitness...are equally fit": rephrase 

      l. 260: maybe "thrown out" is not the most fortunate term, maybe "eliminated" would be better?

      Each of the above has been edited or implemented as suggested.

      Figure 3: I do not understand the meaning of "additive" and "multiplicative" in the case of a single locus haploid model

      All presented simulations are diploid, and these refer to the interactions between the two alleles at the locus. Hopefully the language is overall clearer in this draft.

      l. 274: "Mutation of new nucleotide" meaning what? Or is it mutation _to_ a new nucleotide?

      Hopefully the revised text is clearer.

      Figure 5. The right panel of figure 5A implies that, with the inversion, the population evolves to an extreme display trait that is so costly that it kills 95% of all individuals (or of all females? What is assumed about this here?). Apart from the biological realism of this result, what does it say about the accumulation of polymorphism and maintenance of the inversion? The graphs in fig 5B do plot a divergence between haplotypes, but it is not clear how they relate to those in panel A - the parameter values used to generate these plots are again not listed. Furthermore, from the viewpoint of the polymorphism, it would be good to report the frequencies at the steady-state.

      We have now clarified the figure description, including the parameter values used. The distribution of frequencies at the end of the simulation is represented in figure 6. Given that we set up the simulation with assumptions that are otherwise common to population models, what biological process would prevent this extreme? Why isn’t this extreme observed in natural populations? One possible explanation is that they become sex chromosomes, with increasing likelihood as the cost increases. Or other compensatory changes may occur that we don’t simulate, like regulatory evolution giving a complementary phenotype. Maybe genetic constraints in natural populations prevent the origin of the kind of pleiotropic mutations that drive this dynamic. The simulated populations still survive, but only because they are parameterized by relative fitness. What would an absolute fitness function for the population be? Would the population go extinct or not? It would be of interest to explore a wider range of models, but it is the purpose of this paper to establish that this is a viable model for the maintenance of sexually antagonistic polymorphism and association with inversions. We have added a paragraph motivated by this comment to the Discussion starting on line 765.

      l. 401-2: Z-like, W-like : please specify you are talking about patterns resembling sex chromosomes. 

      l. 738: "population calculates"?

      l. 743-4 and 746-7: is this the same thing said twice, or are there two components of noise?

      l. 357: there is no figure 5C.

      Each of the above has been addressed with text edits.

      L. 473-5: Yes, the offspring did not contain inversion homozygotes, but the sire pool did, didn't it? So homozygous inversions may have affected male reproductive success. Anyway, most of this paragraph (from line 473) seems to belong in Discussion rather than Results.

      We have revised this sentence to focus on offspring survival. 

      We can understand the reviewer’s suggestion about Results vs. Discussion text. While this can often be a challenging balance, we find that papers are often clearer if some initial interpretation is offered within the Results text. However, we moved the portion of this paragraph relating our findings to the published literature to the Discussion.

      l. 516: " In(3L)Ok favored male survival": this is misleading/confusing given the data, " In(3L)Ok reduced female survival more strongly than male survival..."

      Hopefully the phrasing is clearer now.

      l. 663ff: I did not have an impression that this section added anything new and could safely be cut.

      We have done some editing to make this more concise and emphasize what we think is essential, but we believe that the model of an autosomal, sexually antagonistic inversion differentiating before contributing to the origin of a sex chromosome is novel and interesting, and that this additional emphasis is worthwhile to encourage consideration of this idea in future research.

      l. 751: "flat probability per locus": do the authors mean a constant probability?

      Edited.

      Reviewer #2 (Public Review):

      The manuscript lacks clarity of writing. It is impossible to fully grasp what the authors did in this study and how they reached their conclusions. Therefore, I will highlight some cases that I found problematic.

      Hopefully the revised manuscript improves writing clarity. 

      Although this is an interesting idea, it clearly cannot explain the apparent influence of seasonal and clinal variation on inversion frequencies.

      We do not believe that our model predicts a non-existence of temporal and spatial dependence of the fitness of inverted haplotypes, nor do we seek to identify the manner in which seasonal and clinal differences affect fitness of inverted haplotypes. Rather, we argued that the influence of seasonal and clinal selection on inversions does not on its own predict the observed maintenance of inversions at low to intermediate frequencies across such a diverse geographic range, along with the higher frequencies of many derived inversions in more ancestral environments. 

      We might imagine that trade-offs between life history traits such as mate competition and survival should be universal across the range of an organism. But in practice, the fitness benefits and costs of a pleiotropic variant (or haplotype) may be heavily dependent on the environment. A harsh environment such as a temperate winter may both reduce the number of females that a male encounters (decreasing the benefit of display-enhancing variants) and also increase the likelihood that survival-costly variants lead to mortality (thus increasing their survival penalty). In light of such dynamics, our model would predict that equilibrium inversion frequencies should be spatially and temporally variable, in agreement with a number of empirical observations regarding D. melanogaster inversions.

      We have edited the introduction to emphasize that inversion frequencies vary spatially as well as seasonally, on lines 144ff. We also note relevant discussion of the potential interplay between the environment and trade-offs such as those we investigate, on lines 153-155.

      The simulations are highly specific and make very strong assumptions, which are not well-justified.

      We respond to all specific concerns expressed in the Recommendations For The Authors section below. We also note that we have made further clarifications throughout the text regarding the assumptions made in our analysis and their justification.  

      Reviewer #2 (Recommendations For The Authors):

      I think that the manuscript would greatly benefit from a major rewrite and probably also a reanalysis of the empirical data.

      In particular, a genome-wide analysis of differences in SNP frequencies between sexes and developmental stages would help the reader to appreciate that inversions are special.

      [moved up within this section for clarity] We are lacking a genomic null model-how often do the authors see similar allele frequency differences when looking at the entire genome? This could be easily done with whole genome Pool-Seq and would tell us whether inversions are really different from the genomic background. I think that this information would be essential given the many uncertainties about the statistical tests performed. 

      We expect that autosome-wide SNP frequencies will be heavily influenced by the frequencies of inversions, which occur on all four major autosomal chromosome arms. These inversions often show moderate disequilibrium with distant variants (e.g. Corbett-Detig & Hartl 2012).

      Furthermore, the limited number of haplotypes present, given that the paternal population was founded from 10 inbred lines, would further enhance associations between inversions and distant variants. Therefore, we do not expect that whole-genome Pool-Seq data would provide an appropriate empirical null distribution for frequency changes. Instead, we have generated appropriate null predictions by accounting for both sampling effects and experimental variance, and we have aimed to make this methodology clearer in the current draft. 

      Some basic questions:

      why start at a frequency of 50% (line 287)?

      Isn't it obvious that in this scenario strong alleles with sexually antagonistic effects can survive?

      The initial goal of the associated Figure 4 was not to show that a strongly antagonistic variant could persist. Instead, we wanted to test the linkage conditions in which a second, relatively weaker antagonistic variant survived – which did not occur in the absence of strong linkage. 

      We have now added simulations with relatively lower initial frequencies, in which the weaker variant and the inversion both start at 0.05 frequency, while the stronger variant is still initialized at 0.5 to reflect the initial presence of one balanced locus with a strongly antagonistic variant. Here, the weaker antagonistic variant is still usually maintained when it is close to the stronger variant. Inversion-mediated maintenance of the weaker variant at greater distance from the stronger variant becomes less frequent than in the originally investigated case, but it still happens often enough to hypothetically allow for such outcomes over evolutionary time-scales.

      Still, we should also emphasize that the goals of this proof-of-concept analysis are to establish and convey some basic elements of our model. Subsequently, analyses such as those presented in Figures 5 and 6 provide clearer evidence that the hypothesized dynamics of inversions facilitating the accumulation of sexual antagonism actually occur in our simulations.

      The experiments seem to be conducted in replicate (which is of course essential), but I could not find a clear statement of how many replicates were done for each maternal line cross.

      How did the authors arrive at 16 binomial trials (line 473)? 4 inversions, 4 maternal genotypes?

      How were replicates dealt with?

      In Figure 9, it would be important to visualize the variation among replicates.

      Unfortunately, we did not have the bandwidth to perform replicates of each maternal line. Instead, we use four maternal backgrounds to simultaneously establish consistency across independent experiments and genetic backgrounds (see our response to Reviewer 1, point 7). We’ve edited the draft to make this clearer and more clearly delineate what is supported and not supported by our data. Replicate variation for the control replicates of the extraction and sequencing process, and the exact read counts of the experiment, are available in Supplemental Tables S5, S6, and S7.

      The statistical analysis of trade-off is not clear: which null model was tested? No frequency change? In my opinion, two significances are needed: a significant difference between parental and embryo and then embryo and adult offspring. The issue with this is, however, that the embryo data are used twice and an error in estimating the frequency of the embryos could be easily mistaken as antagonistic selection.

      Hopefully the description of our null model is clearer in the text, now starting around line 967 in the Methods. We are aware of the positive dependence when performing tests comparing the paternal to embryo and then embryo to offspring frequencies, and this is accounted for by our analysis strategy - see lines 1009-1012.

      It was not clear how the authors adjusted their chi-squared test expectations. Were they reinventing the wheel? There is an improved version of the chi-squared test, which accounts for sampling variation.

      We did not actually perform chi-square tests. Instead, we used the chi statistic from the chi-squared test as a quantitative summary of the differences in read counts between samples. We compared an observed value of chi to values for this statistic obtained from simulated replicates of the experiment. Sampling from this simulation generated our ‘expected’ distribution of read counts, sampled to match sources of variance introduced in the experimental procedure, but without any effect of natural selection, per lines 825ff in the original submission. Hence, we are approximating the likelihood of observing an empirical chi statistic by generating random draws from a model of the experiment and comparing values calculated from each draw to the experimental value: a Monte Carlo method of approximating a p-value for our data. We have attempted to make the structure of these simulations and their use as a null-model clearer in this draft.
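
      The logic of this Monte Carlo approach can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code: the null here uses a single binomial sampling layer around a pooled frequency, whereas the null described above layers several sources of experimental variance. The function names, the default number of simulations, and the add-one p-value correction are all assumptions made for the example.

```python
import random


def chi_statistic(inv_a, total_a, inv_b, total_b):
    """Pearson chi-squared statistic for a 2x2 table of inverted vs. standard
    read counts in two samples, used only as a summary of their difference."""
    observed = [inv_a, total_a - inv_a, inv_b, total_b - inv_b]
    pooled = (inv_a + inv_b) / (total_a + total_b)
    expected = [total_a * pooled, total_a * (1 - pooled),
                total_b * pooled, total_b * (1 - pooled)]
    # Cells with zero expectation also have zero observed counts, so skip them.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def monte_carlo_p_value(inv_a, total_a, inv_b, total_b, n_sims=2000, seed=1):
    """Fraction of selection-free simulated experiments whose chi statistic
    is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = chi_statistic(inv_a, total_a, inv_b, total_b)
    pooled = (inv_a + inv_b) / (total_a + total_b)
    hits = 0
    for _ in range(n_sims):
        sim_a = binomial(total_a, pooled, rng)
        sim_b = binomial(total_b, pooled, rng)
        if chi_statistic(sim_a, total_a, sim_b, total_b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)


p_same = monte_carlo_p_value(50, 100, 50, 100)
p_diff = monte_carlo_p_value(20, 100, 80, 100, n_sims=500)
print(p_same, p_diff)
```

      The statistic is never compared against a chi-squared table. It only ranks the observed difference against differences produced by the simulated null, which is why additional variance sources can be added to the simulation without changing the test logic.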

      It is not sufficiently motivated why the authors model differences in the extraction procedure with a binomial distribution.

      Adding a source of variance here seemed necessary as running control sequencing replicates revealed that there was residual variance not fully recapitulated by sample-size-dependent resampling. Given that we were still sampling a number of draws from a binomial outcome (the read being from the inverted or standard arrangement), a binomial distribution seemed a reasonable model, and we fit the level of this additional noise source to an experiment-wide constant, read-count or genome-count independent parameter that best fit the variance observed in the controls (lines 830ff in the original draft). Clarification is made in this manuscript draft, lines 979-989.

      How many reads were obtained from each amplicon? It looks like the authors tried to mimic differences between technical replicates by a binomial distribution, which matches the noise for a given sample size, but this depends on the sequence coverage of the technical replicates.

      We provide read counts in Supplemental Tables S6 and S7. The relevant paragraph in the methods has been edited for clarity, lines 972ff. Accounting for sampling differences between replicates used a hypergeometric distribution for paternal samples to account for paternal mortality before collection, and the rest were resampled with a binomial distribution. There were two additional binomial samplings, to account for resampling the read counts and to capture further residual variance in the library prep that did not seem to depend on either allele or read counts.
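
      A minimal sketch of one selection-free draw under this kind of layered sampling is given below. It is illustrative only: the sample sizes are hypothetical, everything is counted at the level of chromosomes, and the extra library-preparation noise layer mentioned above is omitted. The hypergeometric step is implemented as sampling without replacement from an explicit paternal pool.

```python
import random


def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_null_read_counts(n_fathers, father_freq, n_fathers_sampled,
                              n_offspring, n_reads, rng):
    """One neutral draw of inverted-arrangement read counts.

    Returns (paternal_reads, offspring_reads), each out of n_reads.
    """
    # Paternal sample: collected without replacement from the paternal
    # pool, which gives a hypergeometric count of inverted chromosomes.
    n_inverted = round(n_fathers * father_freq)
    pool = [1] * n_inverted + [0] * (n_fathers - n_inverted)
    sampled = rng.sample(pool, n_fathers_sampled)
    sampled_freq = sum(sampled) / n_fathers_sampled

    # Offspring sample: binomial draw around the true paternal frequency,
    # with no selection acting between generations.
    offspring_freq = binomial(n_offspring, father_freq, rng) / n_offspring

    # Sequencing: binomial resampling of reads from each sample.
    paternal_reads = binomial(n_reads, sampled_freq, rng)
    offspring_reads = binomial(n_reads, offspring_freq, rng)
    return paternal_reads, offspring_reads


rng = random.Random(0)
paternal_reads, offspring_reads = simulate_null_read_counts(
    n_fathers=400, father_freq=0.25, n_fathers_sampled=300,
    n_offspring=1000, n_reads=500, rng=rng)
print(paternal_reads, offspring_reads)
```

      Repeating such draws many times yields the distribution of read-count differences expected from sampling alone, against which an observed difference can be compared.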

      It would be good to see an estimate for the strength of selection: 10% difference in a single generation appears rather high to me.

      Estimates of selection strength based on solving for a Wright-Fisher selection coefficient for each tested comparison can now be found in Table S8, mentioned in text on lines 589-590. The mean magnitude of selection coefficients for all paternal to embryo comparisons was 0.322, and for embryo to all adult offspring it was 0.648. For In(3L)Ok the mean selection coefficients were 0.479 and -0.53, and for In(3R)K they were -0.189 and 1.28, respectively. Some are of quite large magnitude, but we emphasize that the coefficients for embryo to adult are based on survival to old age, rather than developmental viability. That factor, in addition to the laboratory environment, makes these estimates distinct from selection coefficients that might be experienced in natural populations.
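
      For readers who want a sense of how a frequency change maps onto a coefficient of this size, the sketch below solves a one-generation selection recursion for s. It assumes a simple genic model in which the inverted arrangement has relative fitness 1 + s, so that p' = p(1 + s) / (1 + sp). The exact parameterization behind Table S8 is not stated in this response, so this is an assumption and the numbers are hypothetical.

```python
def next_frequency(p, s):
    """Frequency after one round of selection when the focal arrangement
    has relative fitness 1 + s and the alternative has fitness 1."""
    return p * (1 + s) / (1 + s * p)


def selection_coefficient(p_before, p_after):
    """Solve p_after = next_frequency(p_before, s) for s.

    Rearranging p' (1 + s p) = p (1 + s) gives s = (p' - p) / (p (1 - p')).
    """
    if not (0 < p_before < 1 and 0 < p_after < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    return (p_after - p_before) / (p_before * (1 - p_after))


# Hypothetical example: a rise from 0.50 to 0.60 in a single comparison.
s = selection_coefficient(0.50, 0.60)
print(round(s, 3))
```

      Under this assumed model, a ten-percentage-point shift around intermediate frequency already implies a coefficient near 0.5, which helps put the magnitudes reported above in context.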

      Reviewer #3 (Public Review):

      Strengths:

      (1) …the authors developed and used a new simulator (although it was not 100% clear as to why SLiM could not have been used as SLiM has been used to study inversions).

      Before SLiM 3.7 or so (and including when we did the bulk of our simulation work), we do not think it would have been feasible to use SLiM to model the mutation of inversions with random breakpoints and recombination between them without altering the SLiM internals. Separately, needing to script custom selection, mutation, and recombination functions in Eidos would have slowed SLiM down significantly. Given our greater familiarity with Python and NumPy, and the ability to implement a similarly efficient simulator more quickly than through learning C++ and Eidos, we chose to write our own.

      It should be a fair bit easier to implement comparable simulations in SLiM now, but it will still require scripting custom mutation, selection, and recombination functions and would still result in a similarly slow runtime. The current script recipe recommended by SLiM for simulating inversions uses constants to specify the breakpoints of a single inversion, without the ability to draw multiple inversions from a mutational distribution, or model recombination between more complicated karyotypes. Hence, our simulator still seems to be a more versatile and functional option for the purposes of this study.

      Weaknesses:

      [Comments 1 through 4 on Weaknesses included numerous citation suggestions, and some discussion recommendations as well. In our revised manuscript, we have substantially implemented these suggestions. In particular, we have deepened our introduction of mechanisms of balancing selection and prior work on inversion polymorphism, integrating many suggested references. While especially helpful, these suggestions are too extensive to completely quote and respond to in this already-copious document. Therefore, we focus our response on two select topics from these comments, and then proceed to comment 5 thereafter.]

      (2) The general reduction principle and inversion polymorphism. In Section 1.2., the authors state that "there has not been a proposed mechanism whereby alleles at multiple linked loci would directly benefit from linkage and thereby maintain an associated inversion polymorphism under indirect selection." Perhaps I am misunderstanding something, but in my reading, this statement is factually incorrect. In fact, the simplest version of Dobzhansky's epistatic coadaptation model (see Charlesworth 1974; also see Charlesworth and Charlesworth 1973 and discussion in Charlesworth & Flatt 2021; Berdan et al. 2023) seems to be an example of exactly what the authors seem to have in mind here: two loci experiencing overdominance, with the double heterozygote possessing the highest fitness (i.e., 2 loci under epistatic selection, inducing some degree of LD between these loci), with subsequent capture by an inversion; in such a situation, a new inversion might capture a haplotype that is present in excess of random expectation (and which is thus fitter than average)…

      We agree that the quoted statement could be misleading and have rewritten it. We intended to point out that we are presenting a model in which all loci contribute additively (with respect to display) or multiplicatively (with respect to survival probability), without any dominance relationships or genetic interaction terms. And yet, the model generates epistatic balancing selection in a panmictic population under a constant environment. This represents a novel mechanism by which (the life-history characteristics of) a population would generate epistatic balancing selection as an emergent property, instead of assuming a priori that there is some balancing mechanism and representing frequency dependence, dominance effects, or epistatic interactions directly using model parameters. We have therefore refined the scope of the statement in question (lines 155-158). 

      (4) Hearn et al. 2022 on Littorina saxatilis snails. 

      A good reference. There is considerable work on ecotype-associated inversions in L. saxatilis, but we previously cut some discussion of this and of other populations with high gene flow but identifiable spatial structure for inversion-associated phenotypes (e.g. butterfly mimicry polymorphisms, Mimulus, etc.). Due to the spatially discrete environmental preferences and sampled ranges of the inversions in these populations, we considered these examples to be somewhat distinct from explaining inversion polymorphism in a potentially homogeneous and panmictic environment.

      (4) cont. A very interesting paper that may be worth discussing is Connallon & Chenoweth (2019) about dominance reversals of antagonistically selected alleles (even though C&C do not discuss inversions): AP alleles (with dominance reversals) affecting two or more life-history traits provide one example of such antagonistically selected alleles (also see Rose 1982, 1985; Curtsinger et al. 1994) and sexually antagonistically selected alleles provide another. The two are of course not necessarily mutually exclusive, thus making a conceptual connection to what the authors model here.

      We had removed a previously drafted discussion of dominance reversal for brevity’s sake, but this topic is once again represented in the updated draft of the manuscript with a short reference in the introduction, lines 76-80. We also mention ‘segregation lift’ (Wittmann et al. 2017) involving a similar reversal of dominance for fitness between temporally fluctuating conditions, as opposed to between sexes or life history stages. 

      (5) The model. In general, the description of the model and of the simulation results was somewhat hard to follow and vague. There are several aspects that could be improved:  [5](1) it would help the reader if the terminology and distinction of inverted vs. standard arrangements and of the three karyotypes would be used throughout, wherever appropriate.

      We have attempted to do so, using the suggested heterokaryotypic/homokaryotypic terminology.

      [5](2) The mention of haploid populations/situations and haploid loci (e.g., legend to Figure 1) is somewhat confusing: the mechanism modelled here, of course, requires suppressed recombination in the inversion/standard heterokaryotype; and thus, while it may make sense to speak of haplotypes, we're dealing with an inherently diploid situation. 

      While eukaryotes with haploid-dominant life history may still experience similar dynamics, we do expect that most male display competition is in diploid animals, and we are only simulating diploid fitnesses and experimenting with diploid Drosophila. We have tried to minimize the discussion of haploids in this draft.

      [5](3) The authors have a situation in mind where the 2 karyotypes (INV vs. STD) in the heterokaryotype carry distinct sets of loci in LD with each other, with one karyotype/haplotype carrying antagonistic variants favoring high male display success and with the other karyotype/haplotype carrying non-antagonistic alternative alleles at these loci and which favor survival. Thus, at each of the linked loci, we have antagonistic alleles and non-antagonistic alleles - however, the authors don't mention or discuss the degree of dominance of these alleles. The degree of dominance of the alleles could be an important consideration, and I found it curious that this was not mentioned (or, for that matter, examined). 

      In this study, our goal was to show that the investigated model could produce balanced and increasing antagonism without the need to invoke dominance. We think there would be a strong case for a follow-up study that more thoroughly investigates how dominance and other variables impact the parameter space of balanced antagonism, but this goal is beyond our capacity to pursue in this initial study. We’ve added several lines clarifying the absence of dominance from our investigated models, and pointing out that dominance could modulate the predictions of these models (lines 211-213, 278-282).

      [5](4) In many cases, the authors do not provide sufficient detail (in the main text and the main figures) about which parameter values they used for simulations; the same is true for the Materials & Methods section that describes the simulations. Conversely, when the text does mention specific values (e.g., 20N generations, 0.22-0.25M, etc.), little or no clear context or justification is being provided. 

      We have sought to clarify in this draft that 20N was chosen as an ample time frame to establish equilibrium levels and frequencies of genetic variation under neutrality. We present a time sequence in Figure 5, and these results indicate that antagonism has stabilized in models without inversions or with higher recombination rates, whereas its rate of increase has slowed in a model with inversions and lower levels of crossing over.

      The inversion breakpoints and the position of the locus with stronger antagonistic effects in Figure 4 were chosen arbitrarily for this simple proof of concept demonstration, with the intent that this locus was close to one breakpoint. Hopefully these and other parameters are clearer in the revised manuscript.

      [5](5) The authors sometimes refer to "inversion mutation(s)" - the meaning of this terminology is rather ambiguous.

      Edited, hopefully the wording is clearer now. The quoted phrase had uniformly referred to the origin of new inversions by a mutagenic process. 

      (6) Throughout the manuscript, especially in the description and the discussion of the model and simulations, a clearer conceptual distinction between initial "capture" and subsequent accumulation / "gain" of variants by an inversion should be made. This distinction is important in terms of understanding the initial establishment of an inversion polymorphism and its subsequent short- as well as long-term fate. For example, it is clear from the model/simulations that an inversion accumulates (sexually) antagonistic variants over time - but barely anything is said about the initial capture of such loci by a new inversion.

      We do not have a good method of assessing a transition between these two phases for the simulations in which both antagonistic alleles and inversions arise stochastically by a mutagenic process. However, we have tried to be clearer on the distinction in this draft: we have included simulations in Figure 4 with variants starting at lower frequencies, and we have tried to better contextualize the temporal trajectories in Figure 5 as (in part) modeling the accumulation of variants after such an origin.

      Reviewer #3 (Recommendations For The Authors):

      - In general: the whole paper is quite long, and I felt that many parts could be written more clearly and succinctly - the whole manuscript would benefit from shortening, polishing, and making the wording maximally precise. Especially the Introduction (> 8 pages) and Discussion (7.5 pages) sections are quite long, and the description of the model and model results was quite hard to follow.

      We have attempted to condense some portions of the manuscript, but inevitably added to others based on important reviewer suggestions. Regarding the length of the Introduction and Discussion, we are covering a lot of intellectual territory in this study, and we aim to make it accessible to readers with less prior familiarity. At this point, we have well over 100 citations (far more than a typical primary research paper), in part thanks to the relevant sources provided by this reviewer. We are therefore optimistic that our text will provide a valuable reference point for future studies. We have also made significant efforts to clarify the Results and Methods text in this draft without notably expanding these sections.

      - In general: the conceptual parts of the paper (introduction, discussion) could be better connected to previous work - this concerns e.g. the theoretical mechanisms of balancing selection that might be involved in maintaining inversions; the general, theoretical role of antagonistic pleiotropy (AP) and trade-offs in maintaining polymorphisms; previously made empirical connections between inversions and AP/trade-offs; previously made empirical connections between inversions and sexual antagonism.

      In the revised manuscript, we have improved the connection of these topics to prior work.

      - L3: "accumulate". A clearer distinction could be made, throughout, between initial capture of alleles/haplotypes by an inversion vs. subsequent gain.

      Please see point 6 in the response to the Public Review, above.

      - L29: I basically agree about the enigma, however, there are quite many empirical examples in D. melanogaster / D. pseudoobscura and other species where we do know something about the nature of selection involved, e.g., cases of NFDS, spatially and temporally varying selection, fitness trade-offs, etc.

      At least for our focal species, we have emphasized that geographic (and now temporal) associations have been found for some inversions. For the sake of length and focus, we probably should not go down the road of documenting each phenotypic association that has been reported for these inversions, or say too much about specific inversions found in other species. As indicated in our response to reviewer 2, some previously documented inversion-associated trade-offs may be compatible with the model presented here. However, we did locate and add to our Discussion one report of frequency-dependent selection on a D. melanogaster inversion (Nassar et al. 1973).

      - L43: it is actually rather unlikely, though not impossible, that new inversions are ever completely neutral (see the review by Berdan et al. 2023).

      This line was intended to convey that, in line with Said et al. 2018’s results, the structural alterations involved in common segregating inversions are not expected to contribute significantly to the phenotype and fitness (as indicated by lack of strong regulatory effects), and that their phenotypic consequences are instead due to linked variation. We have rewritten this passage to better communicate this point, now lines 44-52. Interpreting Section 2 and Figure 1 of Berdan et al. 2023, the linked variation may be what is in mind when saying that inversions are almost never neutral. We have also added a line referencing the expected linked variation of a new inversion (lines 49-52).

      - L51-73: I felt this overview should be more comprehensive. The model by Kirkpatrick & Barton (2016) is in many ways less generic than the one of Charlesworth (1974) which essentially represents one way of modeling Dobzhansky's epistatic coadaptation. Also, the AOD mechanism is perhaps given too much weight here as this mechanism is very unlikely to be able to explain the establishment of a balanced inversion polymorphism (see Charlesworth 2023 preprint on bioRxiv). NFDS, spatially varying selection and temporally varying selection (for all of which there is quite good empirical evidence) should all be mentioned here, including the classical study of Wright and Dobzhansky (1946) which found evidence for NFDS (also see Chevin et al. 2021 in Evol. Lett.)

      On reflection, we agree that we put too much emphasis on AOD and have edited the section to be more representative.

      - L57. Two earlier Dobzhansky references, about epistatic coadaptation, would be: Dobzhansky, T. (1949). Observations and experiments on natural selection in Drosophila. Hereditas, 35(S1), 210-224. https://doi.org/10.1111/j.1601-5223.1949.tb03334.x; Dobzhansky, T. (1950). Genetics of natural populations. XIX. Origin of heterosis through natural selection in populations of Drosophila pseudoobscura. Genetics, 35, 288-302. https://doi.org/10.1093/genetics/35.3.288

      - In general, in the introduction, the classical chapter by Lemeunier and Aulard (1992) should be cited as the primary reference and most comprehensive review of D. melanogaster inversion polymorphisms.

      - L101: this is of course true, though there are some exceptions, such as In(3R)Mo.

      - L110: the papers by Knibb, the chapter by Lemeunier and Aulard (1992), and the meta-analysis of INV frequencies by Kapun & Flatt (2019) could be cited here as well.

      Citation suggestions integrated.

      - L123 and elsewhere: the common D. melanogaster inversions are old but perhaps not THAT old - if we take the Corbett-Detig & Hartl (2012) estimates, then most of them do not really exceed an age of Ne generations, or at least not by much. I mean: yes, they are somewhat old but not super-old (cf. discussion in Andolfatto et al. 2001).

      Edited to curb any hyperbole. We agree that there are much more ancient polymorphisms in populations.

      - L133-135. This needs to be rewritten: this claim is incorrect, to my mind (Charlesworth 1974; also see Charlesworth and Charlesworth 1973; discussion in Charlesworth & Flatt 2021).

      Edited. See public review response (2).

      - L154: the example of inversion polymorphism is actually explicitly discussed in Altenberg's and Feldman's (1987) paper on the reduction principle.

      Edited to mention this. Inversions are also mentioned in Feldman et al. 1980, Feldman and Balkau 1973, Feldman 1972, and have been in discussion since the origins of the idea.

      - L162ff: see Connallon & Chenoweth (2019).

      Citation suggestion integrated, along with Cox & Calsbeek 2009 which seems more directly applicable, now line 185ff.

      - L169: why? There is much evidence for other important trade-offs in this system.

      Reworded.

      - L178-179: other studies have found that trade-offs/AP contribute to the maintenance of inversion polymorphisms, e.g. Mérot et al. 2020 and Betrán et al. 1998, etc.

      Added Betrán et al. 1998 - a good reference. Moved up mention of Mérot et al. 2020 from later in the text and directed readers to the Discussion, lines 202-205.

      - L198. "alternate inversion karyotypes" - you mean INV vs. STD? It would be good to adopt a maximally clear, uniform terminology throughout.

      Edited to communicate this better.

      - L215-217: this is a theoretically well-known result due to Hazel (1943); Dickerson (1955); Robertson (1955); e.g., see the discussion in the quantitative genetics book by Roff (1997) or in the review of Flatt (2020).

      Citations integrated, now lines 232ff.

      - L223 and L245: "haploid" - somewhat confusing (see public review). 

      - L259-260: This may need some explanation. 

      - L261-262: simply state that there is no recombination in D. melanogaster males.

      Edited for increased clarity.

      - L274 (and elsewhere): the meaning of "mutation...of new..inversion polymorphisms" is ambiguous - do you mean a polymorphic inversion and hence a new inversion polymorphism or do you mean polymorphisms/variants accumulating in an inversion?

      - L275: maybe better heterokaryotypic instead of heterozygous? (note that INV homokaryotypes or STD homokaryotypes can be homo- or heterozygous, so when referring to chromosomal heterozygotes instead of heterozygous chromosomes it may be best to refer to heterokaryotypes).

      Per (1) and (5) in the public review, we have edited our terminology.

      - L276: referral to M&M - I found the description of the model/simulation details there to be somewhat vague, e.g. in terms of parameter settings, etc.

      Further described.

      - L281-282: would SLiM not have worked?

      See public review response.

      - L286-287: why these parameters?

      Further described.

      - L296ff: it is not immediately clear that the loci under consideration are polymorphic for antagonistic alleles vs. non-antagonistic alternative alleles - maybe this could be made clear very explicitly.

      Edited to be explicit as suggested.

      - L341, 343: "inversion mutation" - meaning ambiguous.

      - L348, 352: "specified rate" - vague.

      - L354-357: initial capture and/or accumulation/gain? 

      - L401, 402, 404: Z-, W- and Y- are brought up here without sufficient context/explanation.

      The above have been addressed by edits in the text.

      - L523, 557, 639, 646, and elsewhere: not the first evidence - see the paper by Mérot et al. (2020) (and e.g. also by Yifan Pei et al. (2023)). 

      Citations integrated in the introduction and discussion. Mérot et al. (2020) was cited (L486 in original) but discussion was curtailed in the previous draft. 

      - L558-559. I agree but it is clear that there are many mechanisms of balancing selection that can achieve this, at least in principle; for some of them (NFDS, etc.) we have pretty good evidence. 

      - L576-577. This is correct but for In(3R)C that study did find a differential hot vs. cold selection response.

      Addressed with text edit. 

      - L584-L586: cf. Betrán et al. (1998), Mérot et al. (2020), Pei et al. (2023), etc.

      - L591. "other forms of balancing selection": yes! This should be stressed throughout. Multiple forms of balancing selection exist and they are not mutually exclusive. 

      - L593: consider adding Dobzhansky (1943), Machado et al. (2021) 

      - L596-597: this is rather unlikely, at least in terms of inversion establishment (see Charlesworth 2023; https://www.biorxiv.org/content/10.1101/2023.10.16.562579v1).

      - L608: consider adding Kapun & Flatt (2019).

      - L611-612: see studies by Mukai & Yamaguchi, 1974; and Watanabe et al., 1976. 

      - L639, 646: AP - see general literature on AP as a factor in maintaining polymorphism (Rose 1982, 1985; Curtsinger et al. 1994; Charlesworth & Hughes 2000 chapter in Lewontin Festschrift; Connallon & Chenoweth 2019 - this latter paper is particularly relevant in terms of AP effects in the context of sexual antagonism)

      Citation suggestions integrated.

      - L657: inversion polymorphism is explicitly discussed in Altenberg's and Feldman's (1987) paper on the reduction principle.

      Hopefully this is better communicated.

      - L724-755: I felt that this section generally lacks sufficient details, especially in terms of parameter choices and settings for the simulations.

      - L732: why not state these rates?

      Parameter values are now given a fuller description in figure legends and in the methods.  

      - L746: but we know that mutational effect sizes are not uniformly distributed (?).

      We made this choice for simplicity and to avoid invoking a seemingly arbitrary distribution, but one could instead simulate trait effects with some gamma distribution. Display values would still have variable fitness effects that fluctuate with population composition, but we agree that a distribution shifted toward small effects would be more realistic.
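      The contrast between the two choices can be illustrated with a minimal sketch. The shape and scale values, the cutoff for a "small" effect, and the function name `fraction_small` are assumptions made only for illustration; they are not parameters from the simulations in the manuscript. Both distributions are given the same mean (0.5) so that only their shape differs.

```python
import random


def fraction_small(effects, cutoff):
    """Return the fraction of effect sizes that fall below `cutoff`."""
    return sum(e < cutoff for e in effects) / len(effects)


# Hypothetical settings, chosen only for illustration.
rng = random.Random(1)
n_mutations = 10_000

# Uniform effect sizes on [0, 1]: every magnitude is equally likely (mean 0.5).
uniform_effects = [rng.uniform(0.0, 1.0) for _ in range(n_mutations)]

# Gamma effect sizes with shape 0.5 and scale 1.0 (mean 0.5):
# most draws are small, with a long tail of rare large effects.
gamma_effects = [rng.gammavariate(0.5, 1.0) for _ in range(n_mutations)]

print("uniform, fraction below 0.1:", fraction_small(uniform_effects, 0.1))
print("gamma,   fraction below 0.1:", fraction_small(gamma_effects, 0.1))
```

      Under these assumed settings, the gamma draws place a much larger share of mutations below the cutoff than the uniform draws do, which is the sense in which such a distribution is shifted toward small effects.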

      - L765: In(3R)P is not mentioned elsewhere - is this really correct?

      That was incorrect, fixed.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the effects of the explicit recognition of statistical structure and sleep consolidation on the transfer of learned structure to novel stimuli. The results show a striking dissociation in transfer ability between explicit and implicit learning of structure, finding that only explicit learners transfer structure immediately. Implicit learners, on the other hand, show an intriguing immediate structural interference effect (better learning of novel structure) followed by successful transfer only after a period of sleep.

      Strengths:

      This paper is very well written and motivated, and the data are presented clearly with a logical flow. There are several replications and control experiments and analyses that make the pattern of results very compelling. The results are novel and intriguing, providing important constraints on theories of consolidation. The discussion of relevant literature is thorough. In summary, this work makes an exciting and important contribution to the literature.

      Weaknesses:

      There have been several recent papers that have identified issues with alternative forced choice (AFC) tests as a method of assessing statistical learning (e.g., Isbilen et al. 2020, Cognitive Science). A key argument is that while statistical learning is typically implicit, AFC involves explicit deliberation and therefore does not match the learning process well. The use of AFC in this study thus leaves open the question of whether the AFC measure benefits the explicit learners in particular, given the congruence between knowledge and testing format, and whether, more generally, the results would have been different had the method of assessing generalization been implicit. Prior work has shown that explicit and implicit measures of statistical learning do not always produce the same results (e.g., Kiai & Melloni, 2021, bioRxiv; Liu et al. 2023, Cognition).

      We agree that numerous papers in the Statistical Learning literature discuss how different test measures can lead to different results and that, in principle, using a different measure could have led to different results in our study. In addition, we believe several further factors are relevant to this issue, including the dichotomous vs. continuous nature of implicit vs. explicit learning and the complexity of the interactions between the (degree of) explicitness of the participants' knowledge and the applied test method. These factors transcend a simple labeling of tests as implicit or explicit and strongly constrain the type of variation that the results of different tests would produce. Therefore, running the same experiments with different learning measures in future studies could provide additional interesting data with potentially different results.

      However, the most important aspect of our reply concerning the reviewer's comment is that although quantitative differences between the learning rate of explicit and implicit learners are reported in our study, they are not of central importance to our interpretations. What is central are the different qualitative patterns of performance shown by the explicit and the implicit learners, i.e., the opposite directions of learning differences for “novel” and “same” structure pairs, which are seen in comparisons within the explicit group vs. within the implicit group and in the reported interaction. Following the reviewer's concern, any advantage an explicit participant might have in responding to 2AFC trials using “novel” structure pairs should also be present in the replies to 2AFC trials using the “same” structure pairs. This effect could, at best, modulate the overall magnitude of the across-group (Expl./Impl.) effect but not the relative magnitudes within one group. Therefore, we see no parsimonious reason to believe that any additional interaction between the explicitness level of participants and the chosen test type would undermine our results and their interpretation. We will make a note of this argument in the revised manuscript.

      Given that the explicit/implicit classification was based on an exit survey, it is unclear when participants who are labeled "explicit" gained that explicit knowledge. This might have occurred during or after either of the sessions, which could impact the interpretation of the effects.

      We agree that this is a shortcoming of the current design, and obtaining the information about participants’ learning immediately after Phase 1 would have been preferred. However, we made this choice deliberately, as the disadvantage of assessing the level of learning at the end of the experiment is far less damaging than the alternative of exposing the participants to the exit survey questions earlier, thereby letting them achieve explicitness or otherwise influencing their mindset through contemplating the survey questions before Phase 2. Our Experiment 5 shows how realistic this danger of unwanted influence is: with a single sentence alluding to pairs in the instructions of Exp 5, we could completely change participants' quantitative performance and qualitative response pattern. Unfortunately, there is no implicit assessment of explicitness we could use in our experimental setup. We also note that, given the cumulative nature of statistical learning, we expect that using an exit survey for this assessment only shifts absolute magnitudes (i.e., the fraction of people who would fall into the explicit vs. implicit groups) but not aspects of the results that would influence our conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Sleep has not only been shown to support the strengthening of memory traces but also their transformation. A special form of such transformation is the abstraction of general rules from the presentation of individual exemplars. The current work used large online experiments with hundreds of participants to shed further light on this question. In the training phase, participants saw composite items (scenes) that were made up of pairs of spatially coupled (i.e., they were next to each other) abstract shapes. In the initial training, they saw scenes made up of six horizontally structured pairs, and in the second training phase, which took place after a retention phase (2 min awake, 12 h incl. sleep, 12 h only wake, 24 h incl. sleep), they saw pairs that were horizontally or vertically coupled. After the second training phase, a two-alternatives-forced-choice (2-AFC) paradigm, where participants had to identify true pairs versus randomly assembled foils, was used to measure the performance of all pairs. Finally, participants were asked five questions to identify if they had insight into the pair structure, and post-hoc groups were assigned based on this. Mainly the authors find that participants in the 2-minute retention experiment without explicit knowledge of the task structure were at chance level performance for the same structure in the second training phase, but had above chance performance for the vertical structure. The opposite was true for both sleep conditions. In the 12 h wake condition these participants showed no ability to discriminate the pairs from the second training phase at all.

      Strengths:

      All in all, the study was performed to a high standard and the sample size in the implicit condition was large enough to draw robust conclusions. The authors make several important statistical comparisons and also report an interesting resampling approach. There is also a lot of supplemental data regarding robustness.

      Weaknesses:

      My main concern regards the small sample size in the explicit group and the lack of experimental control.  

      The sample sizes of the explicit participants in our experiments are, indeed, much smaller than those of the implicit participants because of how we obtain the members of the two groups. However, the sample sizes of the explicit groups are not small compared to typical experiments reported in Visual Statistical Learning studies; rather, they tend to be average to large. It is the sizes of the implicit subgroups that are unusually high due to the aforementioned data collection process. Moreover, the explicit subgroups have significantly larger effect sizes than the implicit subgroups, bolstering the achieved power. This is also confirmed by the reported Bayes Factors, which support the “effect” or the “no effect” conclusions in the various tests and range in value from substantial to very strong. Based on these statistical measures, we think the sample sizes of the explicit participants in our studies are adequate.

      However, we do agree that the unbalanced nature of the sample and effect sizes can be problematic for the between-group comparisons. We aim to replace the Student’s t-tests that directly compare explicit and implicit participants with Welch’s t-tests, which are better suited for unequal sample sizes and variances.
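      For readers less familiar with the distinction, Welch's test differs from Student's in that it does not pool the two group variances and it adjusts the degrees of freedom with the Welch-Satterthwaite approximation. The following is a minimal sketch of that computation; the function name `welch_t` and the two score lists are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Sample variances (n - 1 denominator), kept separate rather than pooled.
    var_a, var_b = variance(group_a), variance(group_b)
    se_sq = var_a / n_a + var_b / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se_sq)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical scores: a small, low-variance group and a larger, noisier one.
small_group = [1, 2, 3, 4, 5]
large_group = [2, 4, 6, 8, 10, 12, 14]
t_stat, dof = welch_t(small_group, large_group)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

      When the two groups have equal sizes and equal variances, the adjusted degrees of freedom reduce to the familiar n_a + n_b - 2 of Student's test; the more unbalanced the groups, the further the two tests diverge.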

      As for the lack of experimental control, indeed, we could not fully randomize consolidation condition assignment. Instead, the assignment was a product of when the study was made available on the online platform Prolific. This method could, in theory, lead to an unobserved covariate, such as morningness, being unbalanced between conditions. We do not have any reasons to believe that such a condition would critically alter the effects reported in our study, but as it follows from the nature of unobserved variables, we obviously cannot state this with certainty. Therefore, we will explicitly discuss these potential pitfalls in the revised version of the manuscript.  

      Reviewer #3 (Public Review):

      In this project, Garber and Fiser examined how the structure of incidentally learned regularities influences subsequent learning of regularities, that either have the same structure or a different one. Over a series of six online experiments, it was found that the structure (spatial arrangement) of the first set of regularities affected the learning of the second set, indicating that it has indeed been abstracted away from the specific items that have been learned. The effect was found to depend on the explicitness of the original learning: Participants who noticed regularities in the stimuli were better at learning subsequent regularities of the same structure than of a different one. On the other hand, participants whose learning was only implicit had an opposite pattern: they were better in learning regularities of a novel structure than of the same one. This opposite effect was reversed and came to match the pattern of the explicit group when an overnight sleep separated the first and second learning phases, suggesting that the abstraction and transfer in the implicit case were aided by memory consolidation.

      These results are interesting and can bridge several open gaps between different areas of study in learning and memory. However, I feel that a few issues in the manuscript need addressing for the results to be completely convincing:

      (1) The reported studies have a wonderful and complex design. The complexity is warranted, as it aims to address several questions at once, and the data is robust enough to support such an endeavor. However, this work would benefit from more statistical rigor. First, the authors base their results on multiple t-tests conducted on different variables in the data. Analysis of a complex design should begin with a large model incorporating all variables of interest. Only then, significant findings would warrant further follow-up investigation into simple effects (e.g., first find an interaction effect between group and novelty, and only then dive into what drives that interaction). Furthermore, regardless of the statistical strategy used, a correction for multiple comparisons is needed here. Otherwise, it is hard to be convinced that none of these effects are spurious. Last, there is considerable variation in sample size between experiments. As the authors have conducted a power analysis, it would be good to report that information per each experiment, so readers know what power to expect in each.

      Answering the questions we were interested in required us to investigate two related but separate types of effects within our data: general above-chance performance in learning, and within- and across-group differences.

      Above-chance performance: As is typical in SL studies, we needed to assess whether learning happened at all and which types of items were learned. For this, a comparison to the chance level is crucial and, therefore, a one-sample t-test is the statistical test of choice. Note that all our t-tests were subject to experiment-wise correction for multiple comparisons using the Holm-Bonferroni procedure, as reported in the Supplementary Materials.
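      As a reminder of how the Holm-Bonferroni step-down procedure operates, here is a minimal sketch. The function name `holm_bonferroni`, the alpha level, and the example p-values are hypothetical and are not taken from the study: p-values are sorted in ascending order, the k-th smallest is compared against alpha / (m - k + 1), and testing stops at the first non-rejection.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as we step down: alpha/m, alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Once one test fails, all larger p-values are also retained.
            break
    return reject


# Hypothetical p-values from four tests within one experiment.
example_p = [0.001, 0.04, 0.03, 0.012]
print(holm_bonferroni(example_p))
```

      With these example values, the two smallest p-values (0.001 and 0.012) pass their thresholds of 0.0125 and about 0.0167, while 0.03 fails against 0.025, so the procedure stops and the remaining two tests are not rejected.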

      Within- and across-group differences: To obtain our results regarding group and pair-type differences and their interactions, we used mixed ANOVAs and appropriate post-hoc tests as the reviewer suggested. These results are reported in the Methods section.

      Concerning power analysis, we will add the requested information on achieved power by experiment to the revised version of the manuscript.  

      (2) Some methodological details in this manuscript I found murky, which makes it hard to interpret results. For example, the secondary results section of Exp1 (under Methods) states that phase 2 foils for one structure were made of items of the other structure. This is an important detail, as it may make testing in phase 2 easier, and tie learning of one structure to the other. As a result, the authors infer a "consistency effect", and only 8 test trials are said to be used in all subsequent analyses of all experiments. I found the details, interpretation, and decision in this paragraph to lack sufficient detail, justification, and visibility. I could not find either of these important design and analysis decisions reflected in the main text of the manuscript or in the design figure. I would also expect to see a report of results when using all the data as originally planned.  

      We thank the reviewer for pointing out these critical open questions in our manuscript that need further clarification. The inferred “consistency effect” is based on patterns found in the data, which show an increase in negative correlation between test types during the test phase. As this is apparently an effect of the design of the test phase and not an effect of the training phase, which we were interested in, we decided to minimize this effect as far as possible by focusing on the early test trials. For the revised version of the manuscript, we will revamp and expand how this issue was handled and also add a short comment in the main text, mentioning the use of only a subset of test trials and pointing the interested reader to the details.

      Similarly, the matched sample analysis is a great addition, but details are missing. Most importantly, it was not clear to me why the same matching method should be used for all experiments instead of choosing the best matching subgroup (regardless of how it was arrived at), and why the nearest-neighbor method with replacement was chosen, as it is not evident from the numbers in Supplementary Table 1 that it was indeed the best-performing method overall. Such omissions hinder interpreting the work.

      Since our approach provided four different balance metrics (see Supp. Tables 1-4) for each matching method, it is not completely straightforward to make a principled decision across the methods. In addition, selecting the best method for each experiment separately carries the suspicion of cherry-picking the most suitable results for our purposes. For the revised version, we will expand on our description of the matching and decision process and add additional descriptive plots showing what our data looks like under each matching method for each experiment. These plots highlight that the matching techniques produce qualitatively similar results and that picking one of them over another does not alter the conclusions of the test. The plots will give the interested reader all the necessary information to assess the extent to which our design decisions influence our results.

      (3) To me, the most surprising result in this work relates to the performance of implicit participants when phase 2 followed phase 1 almost immediately (Experiment 1 and Supplementary Experiment 1). These participants had a deficit in learning the same structure but a benefit in learning the novel one. The first part is easier to reconcile, as primacy effects have been reported in statistical learning literature, and so new learning in this second phase could be expected to be worse. However, a simultaneous benefit in learning pairs of a new structure ("structural novelty effect") is harder to explain, and I could not find a satisfactory explanation in the manuscript.  

      Although we might not have worded it clearly, we do not claim that our "structural novelty effect" comes from a “benefit” in learning pairs of the novel structure. Rather, we framed it in terms of “interference” and the lack thereof. In other words, we believe that one possible explanation is that there is no actual benefit for learning pairs of the novel structure but simply unhindered learning for pairs of the novel structure and simultaneous interference for learning pairs of the same structure. Stronger interference for the same compared to the novel structure items seems a reasonable interpretation, as similarity-based interference is well established in the general (not SL-specific) literature under the label of proactive interference. We will clarify these ideas in the revised manuscript.

      After possible design and statistical confounds (my previous comments) are ruled out, a deeper treatment of this finding would be warranted, both empirically (e.g., do explicit participants collapsed across Experiments 1 and Supplementary Experiment 1 show the same effect?) and theoretically (e.g., why would this phenomenon be unique only to implicit learning, and why would it dissipate after a long awake break?).

      Across all experiments, the explicit participants showed the same pattern of results but no significant difference between pair types, probably due to the insufficient sample sizes available. We already included in the main text the collapsed explicit results across Experiments 1-4 and Supplementary Experiment 1 (p. 16). This analysis confirmed that, indeed, there was a significant generalization for explicit participants across the two learning phases. We could re-run the same analysis for only Experiment 1 and Supplementary Experiment 1, but due to the small sample of N=12 in Suppl. Exp. 1, this test would likely be completely underpowered. Obtaining a sufficient sample size for this one test would require an excessive number (several hundred) of new participants.

      In terms of theoretical treatment, we already presented our interpretation of our results in the discussion section, which we can expand on in the revised manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers


      We would like to thank all the reviewers for their thorough reading and helpful comments. Below, please find our point-by-point response. The reviewer comments received through ReviewCommons have not been altered except for formatting.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors extended the existing recombination-induced tag exchange (RITE) technology to show that they can image a subset of NPCs, improving signal-to-noise ratios for live cell imaging in yeast, and to track the stability or dynamics of specific nuclear pore proteins across multiple cell divisions. Further, the authors use this technology to show that the nuclear basket proteins Mlp1, Mlp2 and Pml39 are stably associated with "old NPCs" through multiple cell cycles. The authors show that the presence of Mlp1 in these "old NPCs" correlates with exclusion of Mlp1-positive NPCs from the nucleolar territory. A surprising result is that basket-less NPCs can be excluded from the non-nucleolar region, an observation that correlates with the presence of Nup2 on the NPC regardless of maturation state of the NPC. In support of the proposal that retention of NPCs via Mlp1 and Nup2 in non-nucleolar regions, simulation data is presented to suggest that basket-less NPCs diffuse faster in the plane of the nuclear envelope.

      However, there are some points that do need addressing:

      Major Points 1. Taking into account that the Nup2 result in Figure 4B forms the basis for one half of the proposed model in Figure 6 regarding the exclusion of NPCs from the nucleolar region of the NE, there is a relatively small amount of data in support of this finding and this proposed model. For example, the only data for Nup2 in the manuscript is a column chart in Figure 4B with no supporting fluorescence microscopy examples for any Nup2 deletion. Further, the Nup60 deletion mutant will have zero basket-containing NPCs, whereas the Nup2 deletion will be a mixture of basket-containing and basket-less NPCs. The only support for the localization of basket-containing NPCs in the Nup2 deletion mutant is through a reference "Since Mlp1-positive NPCs remain excluded from the nucleolar territory in nup2Δ cells (Galy et al., 2004), the homogenous distribution observed in this mutant must be caused predominantly by the redistribution of Mlp-negative NPCs into the nucleolar territory."

      As suggested by the reviewer, we have added fluorescence microscopy examples for the Nup2 deletion to new Figure 4D. In addition, we have added data on Nup1 as suggested by reviewer 3. Since we observed a significant effect on nucleolar NPC density also upon depletion of Nup1 (new Figure 4A), we have overall revised the text and model to now reflect the shared role of Nup1 and Nup2.

      We have also localized Mlp1-GFP in a nup2Δ background as well as in the Nup60ΔC background where Nup2 can no longer bind to the NPC. In both strains, Mlp1-containing NPCs remain excluded from the nucleolus as now shown in the new Figure 4E. Although we also observed partial Mlp1 mislocalization to a nuclear focus in the nup2Δ strain, such mislocalization was only minimal in the strain with the Nup2-binding domain in Nup60 deleted (nup60ΔC), supporting our conclusion that Nup2 contributes to nucleolar exclusion of NPCs independent of Mlp1. Similarly, Mlp1-positive NPCs remained excluded from the nucleolar territory in cells depleted of Nup1 (new Figure 4B).

      2. The authors could consider utilizing this opportunity to discuss their technological innovations in the context of the prior work of Onischenko et al., 2020. This work is referenced for the statement "RITE can be used to distinguish between old and new NPCs" Page 2, Line 43. However, it is not referenced for the statement "We constructed a RITE-cassette that allows the switch from a GFP-labelled protein to a new protein that is not fluorescently labelled (RITE(GFP-to-dark))" despite Onischenko et al., 2020 having already constructed a RITE-cassette for the GFP-to-dark transition. The authors could consider taking this opportunity to instead focus on their innovative approach to apply this technology to decrease the number of fluorescently-tagged NPCs by dilution across multiple cell divisions and to interpret this finding as a measure of the stability of nuclear pore proteins within the broader NPC.

      We apologize for this imprecise citation. We have modified the text to indicate that our RITE cassette was previously used in two publications. It now reads: "We used a RITE-cassette that allows the switch from a GFP-labelled protein to a new protein that is not fluorescently labelled (RITE(GFP-to-dark)) (Onischenko et al., 2020, Kralt et al., 2022)." Together with additional changes to the text throughout, we hope that our new manuscript version more clearly highlights the innovation of our approach relative to previous use cases.

      3. The authors could also consider taking this opportunity to discuss their results in the context of the Saccharomyces cerevisiae nuclear pore complex structures published e.g. in Kim et al., 2018, Akey et al., 2022, Akey et al., 2023 in which the arrangement of proteins in the nuclear basket is presented, and also work from the Kohler lab (Mészáros et al., 2015) on how the basket proteins are anchored to the NPC. There is additional literature that also might help provide some perspective to the findings in the current manuscript, such as the observation that a lesser amount of Mlp2 to Mlp1 observed is consistent with prior work (e.g. Kim et al., 2018) and that intranuclear Mlp1 foci are also formed after Mlp1 overexpression (Strambio-de-Castillia et al., 1999).

      Following the reviewer's suggestion, we extended our discussion of basket Nup stoichiometry and organization in the discussion section, including most of the citations mentioned as well as the recent articles on the nuclear basket structure and organization (Stankunas & Köhler 2024, 10.1038/s41556-024-01484-x; Singh et al. 2024, 10.1016/j.cell.2024.07.020).

      Minor Points 1. What is the "lag time" of the doRITE switching? Do the authors believe that it is comparable to the approximate 1-hour timeframe following beta-estradiol induction as shown previously in Chen et al. Nucleic Acids Research, Volume 28, Issue 24, 15 December 2000, Page e108, https://doi.org/10.1093/nar/28.24.e108

      We thank the reviewer for suggesting we analyze the kinetics of RITE switching. We carried out quantitative real-time PCR on genomic DNA and found that the half-time of switching is below 20 min. The majority of the population is switched after 1 hour, similar to the results in Chen et al. This data is now included in Supplemental Figure 1A.
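
      As a rough illustration of how the two numbers above relate, the sketch below assumes that switching follows simple first-order kinetics. This is an assumption made here for illustration only; it is not a model fitted to the qPCR data, and the function name is hypothetical. Under that assumption, a half-time of 20 min (the upper bound reported above) implies that at least 87.5% of the population has switched after 1 hour.

```python
def fraction_switched(t_min: float, half_time_min: float) -> float:
    """Fraction of the population that has recombined after t_min minutes.

    Assumes first-order kinetics (illustrative assumption, not a fitted model):
    the unswitched fraction halves every half_time_min minutes.
    """
    if half_time_min <= 0:
        raise ValueError("half_time_min must be positive")
    return 1.0 - 0.5 ** (t_min / half_time_min)


if __name__ == "__main__":
    # 20 min is the upper bound on the half-time; a shorter half-time
    # would give a larger switched fraction at every timepoint.
    for t in (0, 20, 40, 60):
        print(f"{t:>3} min: {fraction_switched(t, 20.0):.3f}")
```

      A shorter half-time would only increase the switched fraction at 1 hour, so 87.5% is a lower bound under this simplified model.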

      2. The authors could consider a brief explanation of radial position (um) for the benefit of the reader, in Figures 1E (right panel) and 2B (right panel), perhaps using a diagram to make it easier to understand the X-axis (um).

      To address this, we have now included a diagram and refer to it in the figure legend and the text.

      3. In Figure 1G, would the authors consider changing the vertical axis title and the figure legend wording from "mean number of NPCs per cell" to "mean labeled NPC # per cell" to reflect that what is being characterized are the remaining GFP-bearing NPCs over time?

      Thank you for spotting this inaccuracy. We have changed the label to "mean # of labeled NPCs per cell".

      4. In Figure 2C, the magenta-labeled protein in the micrographs is not described in the figure or the legend.

      A description has been added in figure and legend.

      5. In Figure S2A, there is an arrow indicating a Nup159 focus, but this is not described in the figure legend, as is done in Figure 2C.

      A description has been added to the legend.

      6. In Figure S3C, the figure legend does not match the figure. Was this supposed to be designed like Figure 3C and is missing part of the figure? Or is the legend a typographical error?

      We apologize for this error and thank the reviewer for spotting it. The legend has been corrected (now Figure S4B).

      7. In Figure S4B, the spontaneously recombined RITE (GFP-to-dark) Nup133-V5 appears in the western blot as equally abundant to pre-recombined Nup133-V5-GFP. In the figure legend, this is explained as cells grown in synthetic media without selection to eliminate cells that have lost their resistance marker from the population. In Cheng et al. Nucleic Acids Res. 2000 Dec 15; 28(24): e108, Cre-EBD was not active in the absence of B-estradiol, despite galactose-induced Cre-EBD overexpression. Would the authors be able to comment further on the Cre-Lox RITE system in the manuscript?

      We note that in the cited publication, too, cells are grown in the presence of selection to select (as stated in this publication) "against pre-excision events that occur because of low but measurable basal expression of the recombinase". Although the authors report that spontaneous recombination is reduced with the β-estradiol inducible system (compared to pGAL expression control of the recombinase only), they show negligible spontaneous recombination only within a two-hour time window. Indeed, we also observe low levels of uninduced recombination on a short timeframe, but occasional events can become significant over longer incubation times (e.g. overnight growth) in the absence of selection. It should be noted that in our system, Cre expression is continuously high (TDH3 promoter) and not controlled by an inducible GAL promoter. We have added the information about the promoter controlling Cre expression to the methods section.

      8. In Figure 6, the authors may want to consider inverting the flow of the cartoon model to start from the wild type condition and apply the deletion mutations at each step to "arrive" at the mutant conditions, rather than starting with mutant conditions and "adding back" proteins.

      Following the suggestions of this reviewer as well as reviewer 3, we have modified our model to more clearly represent the contributions of the different basket components.

      Reviewer #1 (Significance (Required)):

      Recent work has drawn attention to the fact that not all NPCs are structurally or functionally the same, even within a single cell. In this light, the work here from Zsok et al. is an important demonstration of the kind of methodologies that can shed light on the stability and functions of different subpopulations of NPCs. Altogether, these data are used to support an interesting and topical model for Nup2 and nuclear-basket driven retention of NPCs in non-nucleolar regions of the nuclear envelope.

      We thank the reviewer for this positive assessment of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, Zsok et al. develop innovative methods to examine the dynamics of individual nuclear pore complexes (NPCs) at the nuclear envelope of budding yeast. The underlying premise is that with the emergence of biochemically distinct NPCs that co-exist in the same cell, there is a need to develop tools to functionally isolate and study them. For example, there is a pool of NPCs that lack the nuclear basket over the nucleolus. Although the nature of this exclusion has been investigated in the past, the authors take advantage of a modification of recombination induced tag exchange (RITE), the slow turnover of scaffold nups, the closed mitosis of budding yeast, and extensive high quality time lapse microscopy to ultimately monitor the dynamics of individual NPCs over the nucleolus. By leveraging genetic knockout approaches and auxin-induced degradation with sophisticated quantitative and rigorous analyses, the authors conclude that there may be two mechanisms dependent on nuclear basket proteins that impact nucleolar exclusion. They also incorporate some computational simulations to help support their conclusions. Overall, the data are of the highest quality and are rigorously quantified, the manuscript is well written, accessible, and scholarly - the conclusions are thus on solid footing.

      We thank the reviewer for this assessment.

      Reviewer #2 (Significance (Required)):

      I have no concerns about the data or the conclusions in this manuscript. However, the significance is not overly clear as there is no major conceptual advance put forward, nor is there any new function suggested for the NPCs over nucleoli. As NPCs are immobile in metazoans, the significance may also be limited to a specialized audience.

      We respectfully disagree with this assessment. First, our work demonstrates the use of a novel approach in the application of RITE that can be useful for other researchers in the field of NPC biology and beyond. For example, doRITE could be applied to study the properties of aged NPCs, an area of considerable interest due to links between the NPC and age-related neurodegenerative diseases.

      Second, we characterize the interaction between conserved nuclear components, the NPC, the nucleolus and chromatin. While the specific architecture of the nucleus varies between species, many of these interactions are conserved. For example, Nup2's homologue Nup50 also interacts with chromatin in other systems, including mammalian cells, and thus may contribute to regulating the interplay between the nuclear basket and adjoining chromatin. This adds to our understanding of the multiple pathways and interactions that contribute to nuclear organization. Therefore, although the depletion of NPCs from the nucleolar territory in budding yeast may not be of direct importance, understanding the relationships between NPCs and their environment provides insight into nuclear organization throughout different eukaryotic lineages.

      In the revised manuscript, we attempt to better highlight and discuss these aspects.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript of Zsok et al. describes the role of nuclear basket proteins in the distribution and mobility of nuclear pore complexes in budding yeast. In particular, the authors showed that the doRITE approach can be used for the analysis of stable and dynamically associated NUPs. Moreover, it can distinguish individual NUPs and follow the inheritance of individual NPCs from mother to daughter cells. The author's findings highlight that Mlp1, Mlp2, and Pml39 are stably associated with the nuclear pore; deletion of Mlp1-Mlp2 and Nup60 leads to the higher NPC density in the nucleolar territory; and NPCs exhibit increased mobility in the absence of the nuclear basket components.

      The manuscript contains most figures supporting the data, and data supports the conclusions. However, authors need to include better explanations for figures in the text and figure legends. Lack of detailed explanation can pose challenges for non-experts. In addition, the authors jump over figures and shuffle them through the manuscript, which disrupts the flow and coherence of the manuscript.

      We thank the reviewer for pointing this out. In response to the detailed comments given below, we have moved some figures and added more explicit explanations to the text to improve the flow and make it easier to follow. In addition, we have modified the figure legends throughout the manuscript to make them more accessible to the reader.

      Major comments: - The nuclear basket contains Nup1, Nup2, Nup60, Mlp1, and Mlp2 in yeast. Nup60 works as a seed for Mlp1/Mlp2 and Nup2 recruitment and plays a key role in the assembly of nuclear pore basket scaffold (PMID: 35148185). Logically, the authors focused primarily on Nup60 in the current manuscript. However, NUP153 has another ortholog of yeast - Nup1, which has not been studied in this work. I recommend adjusting the title of the manuscript to: Nup60 and Mlp1/Mlp2 regulate the distribution and mobility of nuclear pore complexes in budding yeast. I also suggest discussing why work on Nup1 was not included/performed in the manuscript.

      We thank the reviewer for suggesting we should test the role of Nup1. Although we had originally not considered it, since we were focusing on the interactors of Mlp1/2, we found that indeed Nup1 also contributes to nucleolar exclusion. We have therefore changed the title to "Nuclear basket proteins regulate the distribution and mobility of nuclear pore complexes in budding yeast".

      • Figure 2B: I suggest choosing a more representative image for Pml39. It looks not like a stable component but rather dynamic as NUP60 or Gle1 based on figure showed in Figure 2B.

      We thank the reviewer for pointing out this poor choice of panel. We selected a panel for the 14h timepoint that more clearly shows that individual foci can still be seen for Pml39 after this time. Due to its lower copy number, the foci are dimmer for Pml39 than the other stable Nups. Nevertheless, at both the 11 and 14 h timepoint, clear dots can be detected for Pml39, while e.g. Nup116 in the same figure exhibits a more distributed signal and the signal for Nup60 and Gle1 is no longer visible.

      • Depletion of AID-tagged proteins needs to be supported by Western blot analysis with protein-specific antibodies, and PCR results should be included in supplementary data to demonstrate the homozygosity of the strains.

      The correct genomic tagging of the depleted proteins by AID was confirmed by PCR. We include this PCR analysis for the reviewer below. Since we are working with haploid yeast cells, all strains only carry a single copy of the genes. Unfortunately, we do not have protein-specific antibodies against the depleted proteins. However, other phenotypes support the successful depletion of the protein: Mlp1-mislocalization upon Nup60 depletion, reduced transcript production in Pol II depletion (characterized previously: PMID: 31753862, PMID: 36220102), growth defect upon Nup1 depletion.

      • Figure 5B: Snapshots of images from the movie are required. There are no images, only quantifications.

      We have replaced the supplemental movie with a movie showing the detection by Trackmate as well as overlaid tracks. As requested, a snapshot of this movie was inserted in figure 5B. We have also moved the example tracks from the supplement to the main figure. Furthermore, we will deposit the tracking dataset in the ETH Research Collection to make it available to the community.

      Description of figure legends is more technical than supporting/explaining the figure. For example, below my suggestions for Figure 1D. Please, consider more detailed explanation for other figures. (D) Left: Schematic of the RITE cassette. NUP of interest is tagged with V5 tag and eGFP fluorescent protein where LoxP sites flank eGFP. Before the beta-estradiol-induced recombination, the old NPCs are marked with eGFP signal, whereas new NPCs lack an eGFP signal after the recombination. ORF: open reading frame; V5: V5-tag; loxP: loxP recombination site; eGFP: enhanced green fluorescent protein. Right: doRITE assay schematic of stable or dynamic Nup behavior over cell divisions in yeast after the recombination.

      We have modified the figure legends throughout the manuscript to make them more explanatory and helpful for the reader.

      In addition, I recommend highlighting the result in the title of the figures. Please, re-consider titles for Figure S3.

      We have split this figure to better group related results. The new figures S4 and S5 are entitled: "A RITE(dark-to-GFP) cassette to visualize newly assembled NPC." and "Mlp1 truncations localize predominantly to non-nucleolar NPCs."

      Minor: P.1 Line 31. Extra period symbol before the "(Figure 1A)".

      Fixed

      P.2 Line 10. Inconsistent writing of PML39 and MLP1. Both genes are capitalized. The same for P.4 Line 16. In some cases all letters are capitalized in other only the first one.

      We are following the official yeast gene nomenclature by spelling gene names in italicized capitals and protein names with only the first letter capitalized. We are sorry that this can be confusing for readers more familiar with other model systems.

      P.2 Line 18-22. The sentence is too long and hard to read. I recommend splitting it into two sentences.

      We agree and have fixed this.

      P.2-3 Line 46-47. The sentence is unclear. Suggestion: We expected that successive cell divisions would dilute the signal of labelled and stably associated with the NPC nucleoporins. By contrast, ...

      We have modified the sentence to read: "When tagging a Nup that stably associates with the NPC, we expected that successive cell divisions would dilute labelled NPCs by inheritance to both mother and daughter cells leading to a low density of labelled NPCs. By contrast,..."
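
      The dilution argument can be sketched numerically. The example below is an illustration under stated assumptions, not our analysis code: it assumes a perfectly stable Nup, no new labelled NPCs after recombination, and one population doubling per division. The starting value of 100 labelled NPCs per cell and the function name are hypothetical. Because the total number of labelled NPCs is conserved while the cell number doubles, the population mean halves per division regardless of how NPCs are split between mother and daughter.

```python
def mean_labelled_npcs(initial_per_cell: float, divisions: int) -> float:
    """Population-mean number of labelled NPCs per cell after `divisions` doublings.

    Assumes (for illustration) that labelled NPCs are neither made nor lost
    after recombination, so their total is conserved while cell number doubles.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return initial_per_cell / (2 ** divisions)


if __name__ == "__main__":
    start = 100.0  # hypothetical starting number of labelled NPCs per cell
    for n in range(6):
        print(f"after {n} divisions: {mean_labelled_npcs(start, n):.2f} labelled NPCs per cell")
```

      A dynamically associated Nup would deviate from this curve, since exchange with unlabelled protein removes signal from old NPCs faster than dilution alone.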

      P.4 Line 17-21. Please, consider adding extra information and clarifying lines 19-21. For example, in Line 19 Figure 2B you can add that the reader needs to compare row 1 and row 4.

      Thank you, we have fixed this as suggested.

      P. 5 Line 15. When a number begins a sentence, that number should always be spelled out. You can re-phrase the sentence to avoid it. Also, I recommend adding an explanation/hypothesis of why new NPCs are less frequently detected in nucleolar territory.

      We have formatted the text. Interestingly, new NPCs are more frequently detected in the nucleolar territory than old NPCs. We have reformulated this section to make it clearer, also in response to the next comment.

      P.5 Line 17-22. I recommend re-phrasing these two sentences. Logically, it is clear that Mlp1/Mlp2 loss mimics "old NPCs" to look more like "new NPCs", and for that reason, they are more frequently included in the nucleolar territory, but it is not clear when you read these two sentences from the first time.

      We have reformulated this section to make it clearer.

      P6. Line 16. No figure supporting data on graph (Figure 3B).

      We have added fluorescent images of the nup2Δ strain to the figure (new Figure 4D).

      P.7 Line 10-13. The sentence is unclear.

      We have shortened the sentence and moved part of the content to the discussion in the next paragraph.

      P.13,14 etc. If 0h timepoint has been used for normalization, why is it present on the graph?

      The 0h timepoint is shown for comparison and to illustrate the standard deviation in the data.

      P.15. Line 32-33. There is no image here. Potentially wrong description of the figure.

      Thank you for spotting this. This was fixed (new Figure S4B).

      Figures: - Inconsistent labeling of figures. For example, Fig.1, Fig.1S, Figure 2 etc.

      Thank you, this has been corrected.

      • Inconsistent labeling of figures. For example, Fig.1 G "mean number of NPCs per cell" - no capitalization of the first letter. Fig.1S "Fraction in population" is capitalized. In general, titles of axis should be capitalized.

      Thank you for spotting this. This was fixed.

      Suggestions for Figure 1D and Figure 6 are attached as a separate file.

      We thank the reviewer for their suggestions to improve these figures. We have taken their recommendation and revised the figures accordingly (see also response to reviewer 1, minor point 8).

      Reviewer #3 (Significance (Required)):

      Zsok et al. used the recombination-induced tag exchange (RITE) approach, which is an interesting and powerful method to follow individual NUPs over time with respect to their localization and abundance. This approach has been used before in PMID: 36515990 to distinguish pre-existing and newly synthesized Nup2 populations and has been extended to other basket NUPs in this work. Using this method, the authors support the earlier data on basket nucleoporins and highlight new insights on how basket nucleoporins regulate NPCs distribution and mobility. Overall, the manuscript provides new details on the stability of nucleoporins in yeast and how these data align with the mass spectrometry and FRAP data performed earlier in other studies. The limitation of this study is the absence of data on Nup1. It was unclear why these data were not present. Additional data can be included on the dynamics of Pml39, for example, using the FRAP method. The dynamic of Pml39 at the pore was shown only using the doRITE method.

      As suggested, we have tested the role of Nup1 (see above).

      Unfortunately, we are not able to provide orthogonal data for the dynamics of Pml39. As we discuss in the manuscript, FRAP is not suitable for the analysis of the dynamics of most nucleoporins in yeast due to the high lateral mobility of NPCs in the nuclear envelope and has previously generated misleading results for Mlp1. Furthermore, the low expression levels of Pml39 will make it difficult to obtain reliable FRAP curves for this protein. We therefore do not think that adding FRAP experiments with Pml39 will provide valuable insight.

      However, in addition to the Pml39 doRITE result itself, our observation that the Pml39-dependent pool of Mlp1 exhibits stable association with the NPC supports the interpretation of Pml39 as a stable protein as well.

      In general, this study represents a unique research study of basic research on nuclear pore proteins that will be of general interest to the nuclear transport field.

      Field of expertise: nuclear-cytoplasmic transport, nuclear pore, inducible protein degradation. I do not have sufficient expertise in ExTrack.

    1. Poorly supported claims may be true, but without good reasons to accept those claims, a person’s support of them is irrational. In philosophy, we want to understand and evaluate the reasons for a claim. Just as a house that is built without a solid foundation will rapidly deteriorate and eventually fall, the philosopher who accepts claims without good reasons is likely to hold a system of beliefs that will crumble.

      I think this is vital information, as it can be applied to everyday life too. Without evidence, claims and reasoning are weak and therefore lack external validity. Evidence supports the reliability and trustworthiness of the initial source.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility, and clarity

      Singh et al. analyze the expression and putative contribution of TEs in CD4+ T cells in HIV elite controllers. Through re-analysis of existing datasets, the authors describe broad differences in expression of TEs in ECs through analysis of RNA-seq and ATAC-seq data and come up with convincing examples where differentially-expressed innate immune genes correlate with increased accessibility of proximal TEs. Overall, the authors' conclusions are appropriately measured, though the manuscript text should be re-organized for clarity and a few further analyses are needed to support the main message of the paper.

      Major comments

      The manuscript would benefit from a re-organization of the figures to focus on TEs - in particular, Fig 1B, Fig 2, and Fig 3 reproduce known transcriptional differences between ECs and HCs and serve as quality controls for the authors' computational analysis. Conversely, Supplementary Fig 6 contains very interesting data on KZNF expression and should be included in the main figures.

      Authors: Thank you for the suggestion. We agree that Figure S6 should be featured more prominently in the manuscript. Accordingly, we have now incorporated it into the main text as Figure 6. The TE-KZNF correlation plots, previously Figure 5C, have been relocated to this new figure to provide a cohesive presentation of all KZNF-related data within the same figure.

      We’ve chosen to keep Figures 1B, 2, and 3 in their original places. We contend that they provide a foundational view of transcriptional variances in gene expression between patient groups, encompassing both previously identified and novel DEGs, which we believe warrants their placement in the main text. Furthermore, they serve as robust quality control measures for subsequent TE-centric transcriptional analyses. Given that there is no limitation in the number of figures in Genome Biology articles, we think it’s adequate to retain them as main figures.

      It remains unclear whether differences in TE expression described are specific to ECs or to EC-like CD4+ T cell states. As there are plenty of datasets available that compare the transcriptome of naïve, activated, exhausted, and regulatory CD4+ T cells, the authors should compare the TE expression patterns observed in ECs to activated CD4+ T cells, particularly those with a Th1 and cytotoxic phenotype analogous to those observed in ECs, from healthy donors.

      Authors: We thank the reviewer for this constructive suggestion to further study the foundations of HIV-1 elite control. In our initial study, we demonstrate that PBMCs from elite controllers (ECs) exhibit a heightened proportion of activated CD4+ T cells compared to PBMCs of healthy controls (HCs) and a heightened proportion of macrophages, naïve CD4+ T cells, and NK cells compared to PBMCs of treatment-naïve viremic progressors (VPs) (Figure 2D). Additionally, through clustering analysis of deconvoluted CD4+ T cell samples from elite controllers, we ascertain that the clustering pattern is not predicated on the CD4+ T cell subtype (Figure 3B). To further explore the reviewer’s inquiry, we compared the TE expression profile of ECs with that of unstimulated and stimulated CD4+ T cell subsets from HCs (data source: PMID 31570894), integrated into the revised manuscript as Figure S3B.

      “Unsupervised clustering of these samples shows that the TE expression pattern of ECs is most similar to that of Th2 progenitor cells, which are associated with HIV-1-specific adaptive immune responses (61). Still, we observed that, for the majority of families, TE expression was higher on average in all EC CD4+ T cell subsets than in CD4+ T cell subsets from HCs, regardless of stimulation (Figure S3B). While a subset of TE families exhibited an expression pattern in ECs similar to that of activated CD4+ T cells of HCs (e.g., high expression of L1s and THE1B), multiple TE families appear to be upregulated in an EC-specific way (e.g., LTR12C and LTR7). Together, these findings underscore the unique immune cell composition, transcriptome, and retrotranscriptome of ECs.” [pg.13-14, L226-235]

      While these observations are interesting, pursuing this question further falls beyond the scope of our study, as we note in the Discussion of the revised manuscript. We believe the reviewer’s inquiry pertains to a distinct research question, namely whether the potential for elite control of HIV-1 infection manifests as a detectable phenotype pre-infection within healthy CD4 T cell subsets (i.e., EC-like CD4+ T cell states) or is a unique phenotype that emerges solely after HIV-1 infection.

      “Another outstanding question is whether the gene and TE signatures revealed by our analysis of ECs exist in the general population independent of HIV-1 infection or if they are driven by the initial infection. While this inquiry is beyond the scope of this study, we have presented here evidence of common TE signatures between EC CD4+ T cells and Th2 progenitors from HCs (Figure S3B) and established that ECs possess a unique CD4+ T cell retrotranscriptome with potential implications for natural HIV-1 control. Future studies designed to assess elite control prediction should explore whether these TE profiles can serve as predictive variables for whether an individual displays enhanced viral control.” [pg. 38, L663-671]

      Therefore, while we appreciate the reviewer's suggestion and offer the addition of these preliminary findings, we believe that further investigation would be better suited for future studies specifically designed to address that question. Our manuscript aims to provide insight into the retrotranscriptome dynamics in ECs and their potential implications for natural HIV-1 control.

      In Fig 1, the authors demonstrate differential expression of both innate immune genes and TEs, but the link between the two is unclear. Is there any enrichment in differential expression for TEs located proximal to innate immune genes? This type of analysis should be possible using the authors' own software to map TE expression to specific genomic loci.

      Authors: Thank you for this excellent question. To answer this inquiry, we used the paired ATAC-seq and RNA-seq datasets from ECs and HCs (used in Figures 1 and 4) to produce a new list of TE-gene pairs on which we could perform gene set enrichment analysis, the results of which have been integrated into the revised manuscript as Figure 4A.

      “We used paired ATAC-seq – which measures chromatin accessibility – and RNA-seq datasets for ECs (n=4) and HCs (n=4) to create a list of TE-gene pairs where the TE locus and gene show increased accessibility and expression, respectively, in ECs compared to HCs (Table S7, see Methods for details). These loci and genes were paired based on proximity, with a maximum distance of 10kb between the TE locus and the gene’s transcription start site, to increase the likelihood of a direct cis-regulatory influence of the TE over the nearby gene. Subsequent gene set enrichment analysis revealed that these genes were predominantly involved in cellular activation, cytokine production, and immune response regulation (Figure 4A). The enrichment for differential accessibility of TE loci near genes involved in these pathways suggests that the distinct TE landscape observed in ECs may contribute significantly to a unique immune regulome in these individuals.” [pg. 21, L357-368]
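To illustrate the proximity rule described in the passage above, here is a minimal sketch of pairing TE loci with genes whose transcription start site lies within 10 kb. It is not the authors' pipeline: the `Gene` and `TELocus` types and the function names are hypothetical, and the sketch covers only the distance step, not the accessibility or expression filters.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_DISTANCE_BP = 10_000  # the 10 kb window stated in the quoted passage


@dataclass(frozen=True)
class Gene:  # hypothetical record type, for illustration only
    name: str
    chrom: str
    tss: int  # transcription start site position


@dataclass(frozen=True)
class TELocus:  # hypothetical record type, for illustration only
    name: str
    chrom: str
    start: int
    end: int


def distance_to_tss(te: TELocus, gene: Gene) -> Optional[int]:
    """Distance in bp from the TE interval to the TSS; None across chromosomes."""
    if te.chrom != gene.chrom:
        return None
    if te.start <= gene.tss <= te.end:
        return 0  # the TSS falls inside the TE locus
    return min(abs(gene.tss - te.start), abs(gene.tss - te.end))


def pair_tes_with_genes(
    tes: List[TELocus], genes: List[Gene], max_distance: int = MAX_DISTANCE_BP
) -> List[Tuple[str, str, int]]:
    """Return (TE name, gene name, distance) for every pair within max_distance."""
    pairs = []
    for te in tes:
        for gene in genes:
            d = distance_to_tss(te, gene)
            if d is not None and d <= max_distance:
                pairs.append((te.name, gene.name, d))
    return pairs
```

In practice the candidate TE loci and genes would first be restricted to those with increased accessibility and expression, respectively, in ECs; the resulting gene list is what would be passed to gene set enrichment analysis.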

      Thus, we conclude that yes, there is an enrichment for immune-related genes with higher expression in ECs, proximal to differentially accessible TEs. We highlight six of these TE-gene pairs in Figure 4B-C. While we have high confidence in our analyses, future experimental validation is needed to confirm these regulatory relationships.

      Optional: In Fig 3, the authors cluster CD4+ T cells based on transcriptomic profiles. It would be interesting to re-cluster these samples based on TE expression alone, given the differences in TE expression described in Fig 5.

      Authors: Thank you for the suggestion. We agree that it would be valuable to assess how the EC clustering is altered when considering TE expression alone, as opposed to combining gene and TE family expression. To address this, we used the same graph-based k-nearest neighbors method to re-cluster the EC CD4+ T cell RNA-seq samples based only on locus-level TE expression, integrated into the revised manuscript as Figure S7.

      “To further explore locus-level expression patterns, we re-clustered the same EC samples (n=128) using only locus-level TE expression. This again resolved four EC clusters (Figure S7A), which interestingly appeared even more distinct than those identified by gene and TE family expression (Figure 3A). The TE locus-based clusters (TL-Cs) aligned well with the gene and TE family clusters (GT-Cs), with an average 70% overlap in samples between each GT-C and its corresponding TL-C (Figure S7B), indicating high consistency (Table S8). The remaining 30% of samples that shifted between clusters did so consistently within individuals, not cohorts, maintaining heterogeneous TL-C compositions similar to the GT-Cs (Figures S7C & S5A). An exception to this heterogeneity was TL-C4, comprising 22 samples from GT-C1 that were almost entirely from the CD4+ T cell subsets of only four participants in the Jiang cohort (Figure S7C, Table S8). No other samples from the Jiang cohort shifted to this cluster from other GT-Cs, suggesting that these patterns reflect individual variation rather than cohort bias. Like the GT-Cs, each TL-C included samples from all five CD4+ T cell subsets and was largely heterogeneous (Figure S7C). Notably, TL-C2 mirrored corresponding GT-C3 in its overrepresentation of EM and TM cells, while TL-C1 uniquely showed an overrepresentation of naïve CD4+ T cells. Beyond sample composition, each TL-C was characterized by a unique pattern of expressed TE loci (Figure S7D). These signatures were heterogeneous across families, with subsets of variable loci from one TE family marking separate clusters (Figure S7E), some of which did not reach the threshold of significance in earlier analyses when analyzed at the family-level, like SVA-D. Many families maintained their cluster-specific signatures, like THE1B (a marker of GT-C2), for which the majority of variable loci were found in corresponding TL-C1. However, some TE families, like the L1s that marked GT-C1, showed more heterogeneous signatures with variable loci marking multiple TL-Cs. These findings underscore the need for future locus-level investigations with high-depth sequencing to fully capture the complexity of TE expression.” [pg. 27-28, L462-488]

      We believe these findings not only validate the distinct clustering patterns observed but also highlight the potential of locus-level TE analysis to reveal additional layers of retrotranscriptomic diversity in EC CD4+ T cells.
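The sample overlap between the two clusterings (GT-Cs versus TL-Cs) can be tabulated along the following lines. This is a sketch under stated assumptions: the response does not say how each GT-C was matched to its "corresponding" TL-C, so the code simply picks the TL-C that shares the most samples, and the function name and input dictionaries are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple


def cluster_overlap(
    gt_labels: Dict[str, str], tl_labels: Dict[str, str]
) -> Dict[str, Tuple[str, float]]:
    """For each GT-C, return the TL-C sharing the most samples and the shared fraction.

    Both arguments map a sample ID to its cluster label. Matching by largest
    shared sample count is an assumption made for this illustration.
    """
    by_gt = defaultdict(Counter)
    for sample, gt in gt_labels.items():
        if sample in tl_labels:  # only samples present in both clusterings
            by_gt[gt][tl_labels[sample]] += 1

    result = {}
    for gt, counts in by_gt.items():
        best_tl, shared = counts.most_common(1)[0]
        result[gt] = (best_tl, shared / sum(counts.values()))
    return result


def mean_overlap(overlaps: Dict[str, Tuple[str, float]]) -> float:
    """Average the per-cluster overlap fractions across all GT-Cs."""
    return sum(frac for _, frac in overlaps.values()) / len(overlaps)
```

Averaging the per-cluster fractions gives a single summary figure comparable in spirit to the average overlap quoted above.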

      Significance

      The manuscript by Singh et al. describes for the first time the role of TEs in HIV elite controllers, suggesting that TEs may be co-opted for cis-regulatory function. This study builds off prior work demonstrating that HIV-infected CD4+ T cells activate LTR elements that may regulate the expression of interferon-inducible genes, demonstrating that ECs show further upregulation of innate immune genes. While these findings will need to be experimentally validated, this study constitutes a useful resource and adds to the growing body of evidence implicating TEs in cis-regulatory control of immune genes. This study will be of interest to basic scientists interested in genetic mechanisms of HIV control, and if further developed may comprise a useful source of biomarkers to predict viral kinetics in HIV-infected individuals. My expertise is in immunology, TE biology, and viral infection.

      Authors: We greatly appreciate this positive evaluation of our manuscript and recognition of its significance in uncovering novel evidence of TE co-option for immune regulatory function in HIV-1 elite control, as well as the suggestion of promising avenues for future research in this field.

      Reviewer #2

      Evidence, reproducibility and clarity

      The authors have re-analyzed published RNA-Seq data from CD4 T cells isolated from HIV elite controllers and reference cohorts, including HIV negative persons, viremic progressors and ART-treated persons. Their main finding is that in some of their comparisons, EC have higher levels of interferon-stimulated genes (ISG), paired with distinct expression patterns of transposable elements. The authors suggest that expression of transposable elements may induce altered expression of ISG, presumably due to immune recognition of TE. They also suggest that reduced expression of KZNF genes, which encode for transcription factors that can suppress TE, may be responsible for enhanced expression of TE. I have the following comments:

      1. All data included in this manuscript derive from previously published data. A new dataset, specifically designed to focus on a high-resolution analysis of TE expression, would be better suited to address the proposed questions.

      Authors: We agree that a new dataset tailored specifically for high-resolution analysis of TE expression would be optimal for addressing the proposed inquiries, and we emphasize this point in the Discussion of the revised manuscript.

      “We found that distinct sets of innate immunity genes and restriction factors are upregulated in different EC clusters even in the absence of active viremia, suggesting that elevated basal expression of these factors plays a previously underappreciated role in the EC phenotype. Further studies will be necessary to cement this idea and would especially benefit from the integration of single-cell omics to dissect TE regulation and clustering in deconvoluted CD4+ T cells of ECs. We also acknowledge that our study is limited by the small number of EC individuals with available omics data, which likely limited our ability to identify significant relationships between transcriptome clustering and available participant metadata (Figure S5). While the rarity of ECs in the seropositive population makes it challenging to study this phenotype, the transcriptomic heterogeneity revealed by our analyses underscores the need for surveying larger and more diverse EC cohorts.” [pg. 37-38, L651-662]

      Regrettably, we do not have access to elite controller samples (which are exceedingly rare), and as such the addition of a novel dataset was not feasible within the scope of this revision. Nevertheless, we assert that the publicly available sequencing data analyzed here is robust and suitable for locus- and family-level TE analysis. All sequencing runs were paired-end and of high depth, ensuring proper alignment to and high coverage of TEs at a locus-specific resolution. Additionally, we use in-house pipelines curated for TE analysis, to optimize the accuracy and quantity of TE-assigned reads (see Methods and our GitHub Repository for more details).

      2. As the authors acknowledge, the described investigations are exploratory and do not allow firm conclusions to be drawn. Mechanistic experiments are recommended to address the authors' hypotheses.

      Authors: We agree and have duly acknowledged throughout the Discussion the exploratory nature of our investigations and the need for future mechanistic experiments to validate our model. Below are passages from the revised manuscript which we’ve added to emphasize these points.

      “These findings underscore the need for future locus-level investigations with high-depth sequencing to fully capture the complexity of TE expression.” [pg. 28, L486-488]

      “Each step in the model will require experimental work to be validated. First and foremost, it will be important to confirm that the TEs exhibiting increased transcript levels and accessibility in ECs are indeed boosting the innate immune response and control of HIV-1 in these individuals.” [pg. 34, L583-586]

      “CRISPR-Cas9 editing was used in cell lines to demonstrate that a subset of MER41 elements function as enhancers driving the interferon-inducibility of several innate immune genes. However, the specific MER41 loci we identified here as differentially active in ECs have not been tested experimentally for enhancer activity. Thus, further work is warranted to confirm the regulatory function of these loci under the control of STAT1 or other immune TFs, as well as other TE families identified as targets of immune-related TFs (Figure S8).” [pg. 35, L594-600]

      “Overall, our results reinforce the concept that TEs are important players in the human antiviral response (25,93) and uncover specific candidate elements for boosting cellular defenses against HIV-1 in ECs. We acknowledge that these associations are drawn from correlative patterns and manipulative experiments are needed to infer causality between chromatin changes at these TEs and increased expression of nearby immunity genes.” [pg. 36, L618-623]

      “Further work is needed to validate TE-KZNF regulatory interactions in T cells, probe their connection to epigenetic variation at individual TE loci, and explore their repercussions on gene expression variation in CD4+ T cells, with and without HIV-1 infection.” [pg. 40, L715-718]

      Thus, while we appreciate and agree with the suggestion of experimental validation, we contend that these experiments fall beyond the scope of the present study, which is a computational investigation providing insight into the EC retrotranscriptome and its potential implications for natural HIV-1 control.

      3. An important limitation is that virological data of EC are not considered. For example, I believe it is a lot more likely that the upregulation of ISG in EC relates to ongoing low-level viral replication. The authors could analyze cell-associated HIV RNA and DNA levels and determine how they associate with ISG expression.

      Authors: Thank you for bringing up this important consideration. It's worth noting that the public datasets used in our study reported undetectable viremia in the EC volunteers (PMIDs 30964004, 29269040, 32848246, 27453467). Nonetheless, we sought to address this limitation and explore the potential association between ISG expression and viremia as recommended by the reviewer. These analyses were integrated into the revised manuscript as Figure S6.

      “To exclude the possibility that these gene expression signatures in ECs are associated with viremia, we quantified HIV-1 transcript levels in deconvoluted CD4+ T cell RNA-seq samples from ECs and ART-treated PLWH for comparison. In the original studies, all samples were reported to have undetected viremia by blood tests (9,37-39). Consistent with this, we found that the vast majority of the EC and ART samples taken from PBMCs exhibited very low HIV-1 transcript levels, with TPM values generally below 1. However, in samples originating from the lymph nodes of EC individuals (n = 22) (37), we detected HIV-1 expression in some subsets (Figure S6A&B). In agreement with the corresponding study (37), we found elevated HIV-1 transcript levels in germinal center and non-germinal center T follicular helper cells (GC Tfh & nGC Tfh, not included in our clustering analyses) -- and to a lesser extent in T effector memory (EM) cells (Figure S6A, average TPM […]”

      This added analysis confirms that the increased expression of ISGs in ECs is not correlated with viral transcription and is therefore unlikely to be driven by viremia.
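For readers unfamiliar with the unit, the TPM (transcripts per million) values mentioned above follow the standard definition: each feature's read count is divided by its length, and the resulting rates are rescaled to sum to one million. The sketch below shows that calculation only. The function name and the example feature names are hypothetical, and how the authors counted HIV-1 reads is described in their Methods, not here.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_kb: Dict[str, float]) -> Dict[str, float]:
    """Convert raw read counts to TPM using the standard definition.

    counts: reads assigned to each feature (gene, TE, or viral transcript).
    lengths_kb: effective feature length in kilobases.
    """
    # Length-normalised rate per feature (reads per kilobase).
    rates = {f: counts[f] / lengths_kb[f] for f in counts}
    total = sum(rates.values())
    if total == 0:
        return {f: 0.0 for f in counts}
    # Rescale so that all features in the sample sum to one million.
    return {f: rate / total * 1e6 for f, rate in rates.items()}
```

Because TPM values are relative within a sample, a value below 1 for the viral transcript means it accounts for less than one in a million length-normalised transcripts in that sample.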

      4. KZNF genes seem downregulated in EC. Can the authors propose a reason/mechanism for that?

      Authors: There is the possibility that KZNF regulatory loops are the cause of their transcriptional downregulation, which has been documented in embryogenesis (PMID 31006620) and cancer (PMID 33087347). We’ve incorporated this hypothesis into the Discussion as an additional consideration for the reader.

      “These observations suggest that interindividual variation in KZNF expression in CD4+ T cells could explain why certain TEs are variably expressed and accessible across ECs. But what are the mechanisms underlying variation in ZNF expression? It is possible that TE-KZNF regulatory loops are involved, in which a copy of the TE family targeted by a KZNF is inserted near and regulates the KZNF gene, thereby introducing a negative feedback loop. This phenomenon has been documented in prior studies of KZNF activity in embryogenesis (51) and cancer (115).” [pg. 39-40, L705-711]

      While we believe this is a viable hypothesis, it requires further experimentation to confirm the existence of this phenomenon and its impacts in the context of immune cells.

      Significance

      Overall, I think this is an interesting manuscript that proposes distinct and potentially important mechanisms that may contribute to immune control of HIV. My suggestions to improve the manuscript are complex and cannot be easily addressed through experimental work. I believe a possible option would be to publish the present manuscript without my proposed modifications but highlight the weaknesses of the current paper more clearly; mechanistic studies could then be deferred to a future study.

      Authors: We appreciate the reviewer's positive assessment of our manuscript and their recognition of its significance in elucidating novel TE-derived mechanisms that may contribute to natural HIV-1 control. We agree that mechanistic studies are required to test our predictions. As the reviewer suggests, these would be complex experiments that we feel fall beyond the scope of this study. With the additions detailed above in response to the reviewer’s point #2, we believe that we have clearly highlighted the limitations of our work and emphasized the need for future experimentation to validate our findings.

      Reviewer #3

      Evidence, reproducibility, and clarity

      Summary: This manuscript presents an analysis of published gene expression (RNA-seq and ATAC-seq) data from a couple of cohorts of HIV-infected elite controllers (EC), as compared to uninfected controls (HC) and virological progressors (VP). The authors report that HIV elite controllers may exhibit 4 distinct patterns of TE (and gene) expression and suggest that TE expression may drive some form of antiviral gene expression. Further, they show that heterogeneous TE expression may be determined by differential KZNF gene activity among the different clusters of elite controllers. These results are very interesting, even though the conclusions are very preliminary. The manuscript presents intriguing correlations between the expression of certain TE groups (LINEs and HERVs) and the clustering of ECs into 4 gene expression groups, which is a novel finding. That said, correlation is not causation, and the authors need to be more cautious in presenting their highly preliminary model in Figure 6.

      Authors: We are grateful for the reviewer's insightful assessment of our manuscript, acknowledging the novelty and interest of our findings regarding TE expression patterns in HIV-1 elite controllers. We also appreciate their constructive feedback regarding the cautious interpretation of preliminary conclusions. In the revised manuscript, we have underscored the exploratory nature of our investigations and the need for future mechanistic experiments to validate our model.

      “These findings underscore the need for future locus-level investigations with high-depth sequencing to fully capture the complexity of TE expression.” [pg. 28, L486-488]

      “Each step in the model will require experimental work to be validated. First and foremost, it will be important to confirm that the TEs exhibiting increased transcript levels and accessibility in ECs are indeed boosting the innate immune response and control of HIV-1 in these individuals.” [pg. 34, L583-586]

      “CRISPR-Cas9 editing was used in cell lines to demonstrate that a subset of MER41 elements function as enhancers driving the interferon-inducibility of several innate immune genes. However, the specific MER41 loci we identified here as differentially active in ECs have not been tested experimentally for enhancer activity. Thus, further work is warranted to confirm the regulatory function of these loci under the control of STAT1 or other immune TFs, as well as other TE families identified as targets of immune-related TFs (Figure S8).” [pg. 35, L594-600]

      “Overall, our results reinforce the concept that TEs are important players in the human antiviral response (25,93) and uncover specific candidate elements for boosting cellular defenses against HIV-1 in ECs. We acknowledge that these associations are drawn from correlative patterns and manipulative experiments are needed to infer causality between chromatin changes at these TEs and increased expression of nearby immunity genes.” [pg. 36, L618-623]

      “Further work is needed to validate TE-KZNF regulatory interactions in T cells, probe their connection to epigenetic variation at individual TE loci, and explore their repercussions on gene expression variation in CD4+ T cells, with and without HIV-1 infection.” [pg. 40, L715-718]

      We hope these passages provide sufficient caution and clarity in the presentation of our scientific inquiry.

      Major comments:

      Overall, although preliminary, as the authors note, the results are interesting and worthy of follow-up. At this point, however, a number of issues arise that need further clarification and analysis before I would consider this study complete.

      First, the analyses shown in Figures 3-5 based on data from studies on EC of CD4 cells are apparently motivated by the differential TE expression in total PBMCs shown in Fig 1 and 2. Yet, the TE groups (please don't use taxonomic terms like "subfamily") identified in Fig 2 and Fig 4 are completely different, with no overlap. This discrepancy underscores the possibility that the differential expression observed is, at least in part, due to the differences among the groups or clusters in cell type composition, as seen in Fig 2D and 3B which, themselves, could be a consequence of HIV infection and elite control (which has been shown to involve ongoing, albeit low-level, virus replication). This issue must be addressed.

      Authors: Thank you for the suggestion. First, we’d like to clarify that the data used in Figures 1 and 2 were not both derived from PBMCs. Figures 1 and S1 examine the differential expression of TEs in EC CD4+ T cells compared to HCs and ART-treated PLWH, respectively. Figure 2 examines differential expression of TEs in EC PBMCs compared to treatment-naïve VPs. Second, regarding Figure 4B-C, the TE loci that we chose to highlight were not based on our results from the PBMC analysis in Figure 2, which is why there is no overlap in the TE families presented. Instead, we selected those TE-gene pairs based on 1) known function of the genes in immunity and/or HIV-1 restriction, 2) known contribution of the TE families to immunity, and 3) differential accessibility and expression of the TEs and genes respectively in ECs compared to HCs. Thus, Figure 4B-C represents select examples that we deemed particularly relevant to the EC phenotype. We have revised the manuscript to better explain the process of TE-gene pair identification and the rationale behind our selection for Figure 4B-C.

      “We used paired ATAC-seq – which measures chromatin accessibility – and RNA-seq datasets from the CD4+ T cells of ECs (n=4) and HCs (n=4) (39) to create a list of TE-gene pairs where the TE locus and gene show increased accessibility and expression, respectively, in ECs compared to HCs (Table S7, see Methods for details). These loci and genes were paired based on proximity, with a maximum distance of 10kb between the TE locus and the gene’s transcription start site, to increase the likelihood of a direct cis-regulatory influence of the TE over the nearby gene.” [pg. 21, L357-363]

      “In Figure 4B & 4C, we have highlighted six of the TE-gene pairs from Table S7 based on the gene’s function in HIV-1 restriction and the TE family’s known contribution to immune gene regulation.” [pg. 21, L369-371]

      Regarding cell type composition, we acknowledge that the differences observed in the proportion of immune cell subtypes may contribute to the differential expression between ECs, VPs, and HCs (Figures 2D and S3A). However, we provide evidence that cell type composition cannot be the sole driver for the clustering of deconvoluted CD4+ T cell RNA-seq samples (Figure 3B and S5D). Cell subtype alone could not explain the observed clustering of EC samples by gene and TE family expression. Clusters 1 and 2, for example, had nearly identical subtype compositions, but were clearly separated on the UMAP (Figures 3A, 3B, and S5D). We remark on this in the Results of the revised manuscript.

      “[W]e visualized the samples by cellular subtype, as identified in the original studies, to assess whether the clustering could be explained by CD4+ T cell subtype composition (Figure S5D). Clusters 1 and 2 were essentially indistinguishable in cell type composition, whereas Clusters 3 and 4 showed an overrepresentation of TM/EM and naïve/CM cell types, respectively (Figure 3B). Thus, cell subtype composition could only partially explain the clustering.” [pg. 16, L271-276]

      The EC CD4+ T cell clusters also had unique gene ontology, gene & TE expression, and TE accessibility profiles (Figures 3C, 3D, 5). Moreover, while we do not have parallel RNA- and ATAC-seq data from similarly deconvoluted CD4+ T cells of ECs like those used in the clustering analysis (PMIDs 32848246 & 27453467), the original article from which we sourced the parallel RNA- and ATAC-seq data used in Figures 1 and 4 reported that these samples are predominantly effector memory CD4+ T cells (PMID 30964004). If new deconvoluted, multi-omic datasets from ECs become available, we would be interested in further exploring the contribution of cell type composition. However, the current data indicate that it is not a major contributor to the differential TE expression identified in our analyses.

      Regarding the impact of ongoing HIV-1 replication upon the unique expression patterns in the EC participants, it's worth noting that the public datasets used in our study reported undetectable viremia in the EC volunteers (PMIDs 30964004, 29269040, 32848246, 27453467). Nonetheless, we sought to address this by quantifying HIV-1 transcription and exploring its potential association with interferon-stimulated gene (ISG) expression, a group of genes that we know would be reactive to active viremia. These analyses were integrated into the revised manuscript as Figure S6.

      “To exclude the possibility that these gene expression signatures in ECs are associated with viremia, we quantified HIV-1 transcript levels in deconvoluted CD4+ T cell RNA-seq samples from ECs and ART-treated PLWH for comparison. In the original studies, all samples were reported to have undetected viremia by blood tests (9,37-39). Consistent with this, we found that the vast majority of the EC and ART samples taken from PBMCs exhibited very low HIV-1 transcript levels, with TPM values generally below 1. However, in samples originating from the lymph nodes of EC individuals (n = 22) (37), we detected HIV-1 expression in some subsets (Figure S6A&B). In agreement with the corresponding study (37), we found elevated HIV-1 transcript levels in germinal center and non-germinal center T follicular helper cells (GC Tfh & nGC Tfh, not included in our clustering analyses) -- and to a lesser extent in T effector memory (EM) cells (Figure S6A, average TPM […]”

      Based on these results, we have concluded that the differential expression of genes and TEs in the EC clusters is not a consequence of low-level viral transcription in ECs.

      Finally, a remark on TE nomenclature: The reviewer suggests that we use the term “TE groups” as opposed to taxonomic terms such as TE subfamily or TE family. We respectfully disagree. This nomenclature of TEs has been well defined (PMIDs 26612867, 17984973) and is widely used in TE literature. Throughout the manuscript, we have conformed to the nomenclature used to annotate the human genome. One can debate the way TE families and subfamilies have been classified in Dfam (the database through which repetitive elements in the human genome have been annotated), but it is outside the scope of this study to revisit that nomenclature.

      Similarly, of the 12 DE TE groups in EC in Fig 5A, only 3 overlap with the 16 in EC Fig S1.

      Authors: This is correct, but we don’t believe it’s concerning. In Figure 5A, we are comparing the expression of TE families between separate EC clusters. In Figure S1, we are comparing the expression of TE families in ECs compared to ART-treated PLWH. These are fundamentally different comparisons and thus the differences in the identified DE-TEs between the two figures reflect the distinct biological contexts being investigated in each analysis.

      Second, the introduction points out the strongly supported association between elite control and immunogenetic determinants, most notably specific HLA-B types, but also innate immunity factors. This cries out for inclusion of these factors in the analyses of this manuscript, in the format of Figure S4, for example, but none is to be found. The relevant genotypes are likely available in the metadata in the references cited, but, if not, could be inferred from the RNA-seq data.

      Authors: Thank you for the recommendation. While our project’s primary focus is on the transcriptomic and epigenomic signatures, we agree that studying the HLA-B genotypes of all EC participants could provide valuable context for understanding the clustering of elite controllers. To explore this, we inferred the HLA-B alleles in each EC participant whose RNA-seq data was included in the clustering analysis, utilizing the arcasHLA tool (PMID: 31173059) on the total CD4+ T cell samples. We then validated these inferred HLA-B alleles against the available metadata from one of the source studies (PMID 27453467) and found that they matched for all participants. This strengthened our confidence in the accuracy of the HLA-B genotype inferences for the other samples where comprehensive HLA-B data was not provided.

      In order to assess how these protective HLA-B alleles segregated between the four EC clusters derived from gene and TE family expression, we chose to visualize three of the most common alleles associated with HIV-1 elite control: HLA-B*27:03, *57:01, and *57:03 (PMIDs 30964004, 25119688, 21051598) (Figure R1, available in the Response to Reviewers PDF).

      Our analysis revealed that these major protective alleles were not significantly overrepresented in any particular cluster. Consequently, we believe that HLA-B genotype does not have a major impact on the clustering observed in Figure 3.
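The response does not name the statistical test behind the statement that protective alleles were "not significantly overrepresented" in any cluster. One common choice for this kind of question is a one-sided hypergeometric test, sketched below with the standard library only. The function name and the example numbers are hypothetical and do not come from the study.

```python
from math import comb


def hypergeom_upper_tail(k: int, cluster_size: int, carriers: int, population: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, carriers, cluster_size).

    k: allele carriers observed in the cluster.
    cluster_size: participants (or samples) in the cluster.
    carriers: allele carriers across all clusters.
    population: total participants (or samples) across all clusters.
    """
    max_k = min(cluster_size, carriers)
    denom = comb(population, cluster_size)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    numer = sum(
        comb(carriers, i) * comb(population - carriers, cluster_size - i)
        for i in range(k, max_k + 1)
    )
    return numer / denom


# Hypothetical example: 3 of 5 cluster members carry the allele,
# out of 4 carriers among 10 participants in total.
p_value = hypergeom_upper_tail(k=3, cluster_size=5, carriers=4, population=10)
```

A small p-value would indicate that the allele is enriched in the cluster beyond what random assignment would produce; with four clusters and several alleles, a multiple-testing correction would also be appropriate.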

      It would also be very useful to present the KZNF data in Figure 5 the same way, since, looking at Fig 5C, KZNF expression is clearly correlated with that of a few groups of elements, but the correlation of high and low KZNF expression with clustering into specific groups does not appear to be well supported.

      Authors: Thank you for the insightful suggestion. While the KZNF genes are included in the gene set used for the clustering analysis in Figure 3, we agree that clustering based solely on KZNF expression and displaying it as we have in Figures 3A and S5 could provide valuable insights. However, when we attempted to cluster the EC RNA-seq samples using only KZNF expression data, we were limited by the relatively low number of KZNF genes that showed sufficient variability across samples (n = 120). For robust statistical power, we require at least 200 features to reliably cluster the 128 EC CD4+ T cell samples. We believe this limitation does not diminish the relevance of KZNFs in the observed clustering patterns but rather highlights the nuanced role each KZNF plays in the regulation of the transcriptome. Each individual KZNF is responsible for the regulation of hundreds to thousands of TE loci (PMID 37730438). Thus, while a clustering approach based solely on KZNF expression may not be feasible, the integral role of KZNFs in modulating the transcriptome through TE regulation remains evident and supports their inclusion in Figure 6 of the revised manuscript.

      In general, other than the cell type composition differences, there is no presentation of evidence for any biologically important feature associated with the clusters found.

      Authors: We agree that the root cause of the transcriptomic differences between the EC clusters is hard to pin down but we do identify several distinctive features of the clusters that we believe are biologically significant. First, having extracted the lists of genes whose differential expression defined the four EC clusters, gene set enrichment analysis revealed that the clusters were functionally distinct, each characterized by a unique list of top GO terms (Figure 3C). Second, we provide evidence that KZNFs expressed in CD4+ T cells significantly bind to the candidate TE families whose expression defines each of these clusters (Figure 6D) and have significantly decreased expression in ECs compared to VPs (Figure 6C). This is corroborated by pairwise correlation analysis that revealed cluster-specific anticorrelation patterns between these KZNFs and their target TEs (Figure 6A). We present this data in support of our hypothesized KZNF-based mechanism for TE co-option in viral immunity. We do not yet have data indicative of the mechanism by which KZNF expression is in turn regulated. However, we speculate that negative feedback loops may be contributing to changes in KZNF expression.

      “These observations suggest that interindividual variation in KZNF expression in CD4+ T cells could explain why certain TEs are variably expressed and accessible across ECs. But what are the mechanisms underlying variation in ZNF expression? It is possible that TE-KZNF regulatory loops are involved, in which a copy of the TE family targeted by a KZNF is inserted near and regulates the KZNF gene, thereby introducing a negative feedback loop. This phenomenon has been documented in prior studies of KZNF activity in embryogenesis (51) and cancer (115).” [pg. 39-40, L705-711]

      Overall, our study presents preliminary evidence that the four EC clusters derived from gene & TE family expression may be distinguished by complex interplay of activators (Figure S8) and repressors (Figure 6) altering the activity of infection-responsive TE families to co-opt specific elements for immune regulatory function. While not yet validated in an experimental setting, we believe these results are of biological significance.

      Third, the figures present values that have been very heavily analyzed, and it is difficult to impossible to infer what the underlying data look like. For example, with the exception of a few selected examples in Figs 4 and 5, individual provirus data are lacking. Nor can we tell how consistent the distribution of expression values within a TE group is, whether the TEs included solo LTRs (which constitute the majority of all ERVs), the possible contribution of other TFs to expression (with the exception of a brief mention of STAT1).

      Authors: We respectfully disagree that the values presented in our figures are heavily analyzed. As this manuscript represents the first investigation of TEs’ role in HIV-1 elite control, we believe the most reasonable initial approach was to compile and visualize the data at the family level, rather than at the level of individual loci, which is harder to interpret due to mapping issues, commonly low transcription, and often idiosyncratic behavior of individual loci. Nonetheless, we did not limit our analysis to full-length HERVs (proviruses) and thus retain all solo LTR data in our analyses. This was added to the Methods of the revised manuscript.

      “To facilitate comprehensive expression quantification, we curated a reference transcriptome by combining gene, TE, and HIV-1 genomic sequences. This was achieved by integrating the locus-level TE classification from RepeatMasker, the hg19 GenCode gene annotation, and the HXB2 reference HIV-1 annotation. For the TEs, we removed simple repeats, SINE elements, and DNA transposons, retaining LINE and HERV loci, including all solo LTRs. We also removed any loci within gene exons/UTRs. The remaining sequences were appended in fasta format, and all sequences were annotated with their respective gene, TE locus, or HIV subunit and modeled in GTF format.” [pg. 55, L869-878]

      For the sake of transparency, all relevant details on sequencing data analysis and the corresponding scripts are available in the Methods and our GitHub Repository.
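The filtering rule quoted from the Methods can be sketched as a simple predicate. This is a hedged illustration rather than the authors' script: it assumes RepeatMasker-style `class/family` labels such as `LINE/L1` or `SINE/Alu`, and the `overlaps_exon_or_utr` flag is a hypothetical input that would come from a separate overlap step against the gene annotation.

```python
# Top-level repeat classes dropped in this sketch (an assumption based on the
# quoted Methods text, not on the authors' code).
EXCLUDED_CLASSES = {"Simple_repeat", "SINE", "DNA"}


def keep_locus(rep_class, overlaps_exon_or_utr):
    """Return True if a repeat locus would be retained for quantification."""
    top_level = rep_class.split("/")[0]
    if top_level in EXCLUDED_CLASSES:
        return False
    return not overlaps_exon_or_utr


if __name__ == "__main__":
    # Hypothetical loci: (class label, overlaps an exon or UTR)
    loci = [
        ("LINE/L1", False),
        ("LTR/ERV1", False),
        ("SINE/Alu", False),
        ("DNA/hAT-Charlie", False),
        ("LTR/ERVL-MaLR", True),
    ]
    for rep_class, in_exon in loci:
        print(rep_class, in_exon, keep_locus(rep_class, in_exon))
```

Solo LTRs need no special handling here, because they carry an `LTR/...` label and pass the class filter like any other LTR locus.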

      Additionally, while most of our figures make comparisons at the family level, we do visualize multiple TE loci (Figure 4C) and provide a list of putative locus-level TE-gene pairs from which those shown in Figure 4C were selected (Table S7). In our revisions, we also re-clustered the 128 EC CD4+ T cell RNA-seq samples based only on locus-level TE expression, using the same graph-based k-nearest neighbors method as in Figure 3. The results of this new analysis have been integrated into the revised manuscript as Figure S7.

      “To further explore locus-level expression patterns, we re-clustered the same EC samples (n=128) using only locus-level TE expression. This again resolved four EC clusters (Figure S7A), which interestingly appeared even more distinct than those identified by gene and TE family expression (Figure 3A). The TE locus-based clusters (TL-Cs) aligned well with the gene and TE family clusters (GT-Cs), with an average 70% overlap in samples between each GT-C and its corresponding TL-C (Figure S7B), indicating high consistency (Table S8). The remaining 30% of samples that shifted between clusters did so consistently within individuals, not cohorts, maintaining heterogeneous TL-C compositions similar to the GT-Cs (Figures S7C & S5A). An exception to this heterogeneity was TL-C4, comprising 22 samples from GT-C1 that were almost entirely from the CD4+ T cell subsets of only four participants in the Jiang cohort (Figure S7C, Table S8). No other samples from the Jiang cohort shifted to this cluster from other GT-Cs, suggesting that these patterns reflect individual variation rather than cohort bias. Like the GT-Cs, each TL-C included samples from all five CD4+ T cell subsets and was largely heterogeneous (Figure S7C). Notably, TL-C2 mirrored corresponding GT-C3 in its overrepresentation of EM and TM cells, while TL-C1 uniquely showed an overrepresentation of naïve CD4+ T cells. Beyond sample composition, each TL-C was characterized by a unique pattern of expressed TE loci (Figure S7D). These signatures were heterogeneous across families, with subsets of variable loci from one TE family marking separate clusters (Figure S7E), some of which did not reach the threshold of significance in earlier analyses when analyzed at the family-level, like SVA-D. Many families maintained their cluster-specific signatures, like THE1B (a marker of GT-C2), for which the majority of variable loci were found in corresponding TL-C1. 
However, some TE families, like the L1s that marked GT-C1, showed more heterogeneous signatures with variable loci marking multiple TL-Cs. These findings underscore the need for future locus-level investigations with high-depth sequencing to fully capture the complexity of TE expression.” [pg. 27-28, L462-488]

      With this addition, we include significantly more data analyzed at the locus level, which we believe not only validate the distinct clustering observed in Figure 3, but also underscore the potential for locus resolution analysis to reveal additional layers of retrotranscriptomic diversity in EC CD4+ T cells.

      Finally, we agree with the reviewer that TFs other than STAT1 may contribute to the observed changes in TE expression. To investigate this, we analyzed several TFs expressed in CD4+ T cells and, for TFs enriched over TEs of interest, subsequently examined the correlation between TF and target TE expression in the deconvoluted EC CD4+ T cell samples used for the clustering. The results of this analysis have been integrated into the revised manuscript at Figure S8.

      “In addition to KZNF repressors, transcriptional activators may also drive the differential expression of specific TE families across ECs (83). To investigate this, we focused on transcription factors (TFs) expressed in CD4+ T cells and mined ChIP-seq data from the ENCODE Consortium (84) to identify TFs with binding enrichment to TE families of interest, selected for their elevated, cluster-specific expression in ECs (highlighted in Figures 4, 5, and S4). We then examined the correlation between TF and target TE expression in the deconvoluted CD4+ T cell samples from ECs used for our clustering analysis (Figure 3) (9,37). We observed several significant positive correlations between TF and TE expression across ECs (Figure S8). Thus, differential expression of immune-related TFs may also contribute to the variation in TE expression and cis-regulatory activity across ECs, in tandem with the repressive activities of KZNFs.” [pg. 30, L517-527]

      This evidence supports the reviewer’s suggestion that other TFs may be contributing to the unique EC retrotranscriptome we profile in this study. These added analyses, mimicking those conducted for KZNFs in Figure 6B & 6D, demonstrate that transcriptional activators may indeed play a crucial role in shaping the TE landscape in ECs.

      Other issues

      Figure 1:

      A) Log2 fold change of what? TPM values? Needs to be specified.

      Authors: Thank you for pointing out this ambiguity. The log2-transformed fold change values plotted in Figure 1A refer to DESeq2-normalized expression. They were extracted from the results of the DESeq2 pipeline, which we applied to the raw count expression matrix (see our Methods for more details). Following your suggestion, we have clarified this point in the figure legend in the revised manuscript.

      “Total detected genes and TE loci are plotted by log2-transformed fold change of DESeq2-normalized counts (EC vs. HC).” [pg. 10, L163-164]

      We have similarly made these changes to any figure legend which was ambiguous in its description of the expression data.

      Why Bonferroni correction? Usually BH q values or other less stringent adjustments are used nowadays.

      Authors: In our analysis, we opted for the Bonferroni correction due to its well-established reliability and stringent control of the family-wise error rate when conducting multiple tests. Given the exploratory nature of our investigation and the desire to minimize the risk of false positive findings, we chose to employ this traditional correction method within our analytical pipelines.
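To make the trade-off between the two adjustments concrete, here is a minimal sketch comparing Bonferroni (family-wise error control) with Benjamini-Hochberg (false discovery rate control). The p-values are invented for illustration and the function names are hypothetical. DESeq2 and R's `p.adjust` provide their own implementations, which this does not reproduce.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Step-up FDR adjustment: p * m / rank, made monotone from the largest p down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
    print("Bonferroni:", bonferroni(raw))
    print("BH:        ", benjamini_hochberg(raw))
```

Bonferroni-adjusted values are never smaller than the BH-adjusted ones, which is why it yields fewer false positives at the cost of more missed true differences.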

      B,C): Z-score of what? Scaled, normalized counts? Scaled TPM values?

      Authors: Thank you again for highlighting this point of uncertainty. We now clarify this in the figure legend in the revised manuscript.

      “Heatmap displaying the expression of the top differentially expressed genes in CD4+ T cells of ECs (n=4; red bar) vs. HCs (n=5; blue bar). Relative expression levels are representative of row-wise scaled, log2-transformed expression in transcripts per million (TPM). Heatmap coloration is based on the z-score distribution from low (gold) to high (purple) expression.” [pg. 11, L167-171]
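The row-wise scaling described in this legend can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the pseudocount of 1 before the log2 transform and the use of the sample standard deviation (as R's `scale` does) are both assumptions, and the function name is hypothetical.

```python
from math import log2
from statistics import mean, stdev


def row_zscores(tpm_row, pseudocount=1.0):
    """Log2-transform one gene's TPM values, then z-score them across samples."""
    logged = [log2(v + pseudocount) for v in tpm_row]
    mu = mean(logged)
    sd = stdev(logged)  # sample standard deviation; needs at least two samples
    if sd == 0:
        # A gene with identical expression in every sample has no spread to scale.
        return [0.0] * len(logged)
    return [(v - mu) / sd for v in logged]


if __name__ == "__main__":
    # Hypothetical TPM values for one gene across three samples.
    print(row_zscores([0.0, 1.0, 3.0]))
```

Because each row is centred and scaled independently, the heatmap colours show where a sample sits relative to that gene's own mean, not absolute expression.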

      Figure 2:

      B) The blue font color is very difficult to see.

      Authors: We have changed the blue font color to make it more easily distinguishable from the black.

      C) This heatmap should demarcate or separate genes versus TE clades. If that's not possible, then the two should be shown separately.

      Authors: We appreciate your suggestion regarding the heatmap presentation. While we understand the rationale for demarcating genes versus TE clades, we have chosen to retain the original figure layout. In this analysis, TEs were analyzed simultaneously with genes. The order in which they are shown was obtained by default clustering of the expression matrix using the hclust function. We chose to present them together and in this order to provide a comprehensive visualization of the differential expression patterns between the two groups and highlight the homogenous nature of gene and TE expression across VPs.

      L191: How many groups (NOT families) and how many total elements were examined?

      Authors: We begin with the RepeatMasker annotation of the hg19 assembly and filter out the SINE elements, DNA transposons, simple repeats, and all loci within gene exons/UTRs. These details are provided in the Methods of the revised manuscript, as was quoted above. In total, our analyses examine 1,104,828 loci from 603 TE groups (which we refer to as families). We apologize if this figure is not accurate to a separate classification of TEs into groups, rather than families. Any such method of grouping TEs is unfamiliar to us and outside of the Dfam annotation.

      L198: 2B, not C

      Authors: Thank you for catching this. The panel labels were swapped in error and have been corrected in Figure 2 to match the in-text references.

      L205: Did the expressed proviruses have STAT1 sites?

      Authors: Thank you for your question. The identification of LTR13’s increased expression in ECs compared to VPs was the result of a family-level analysis which considered expression additively across the LTR13 loci in our annotation. To answer your question, we analyzed STAT1 ChIP-seq data from the ENCODE Consortium to characterize which LTR13 loci were bound by STAT1 (corroborated by motif prediction calls). We then integrated the EC RNA-seq data and found that loci with bound STAT1 sites were significantly overrepresented among the expressed LTR13 proviruses (Figure R2, available in the Response to Reviewers PDF).

      These data suggest that STAT1 binding may play a critical role in the transcriptional regulation of LTR13 in ECs, contributing to their differential expression profile. The contribution of activating, immune-related TFs is explored further in Figure S8 of the revised manuscript.

      L333: 10 kb is very close. Why was it chosen?

      Authors: We chose 10 kb as our cutoff for selection because it allowed for very high confidence in the TE loci’s cis-regulatory capacity over the nearby genes. For transparency, we have made this clearer in the Results text of the revised manuscript.

      “These loci and genes were paired based on proximity, with a maximum distance of 10kb between the TE locus and the gene’s transcription start site, to increase the likelihood of a direct cis-regulatory influence of the TE over the nearby gene.” [pg. 21, L360-363]

      However, if desired, a less stringent cutoff could also be used with relative confidence (e.g., 50 kb).
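The proximity rule can be sketched as a distance check against each gene's transcription start site. The classes, locus names, and coordinates below are hypothetical and serve only to illustrate the 10 kb cutoff and the looser 50 kb alternative. This is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    strand: str

    @property
    def tss(self):
        # The TSS is the 5' end of the gene, which depends on strand.
        return self.start if self.strand == "+" else self.end


@dataclass(frozen=True)
class TELocus:
    name: str
    chrom: str
    start: int
    end: int


def distance_to_tss(te, gene):
    """Distance in bp from a TE locus to a gene's TSS; None if on different chromosomes."""
    if te.chrom != gene.chrom:
        return None
    if te.start <= gene.tss <= te.end:
        return 0
    return min(abs(te.start - gene.tss), abs(te.end - gene.tss))


def pair_te_with_genes(tes, genes, max_dist=10_000):
    """Return (TE, gene, distance) triples within max_dist of the gene's TSS."""
    pairs = []
    for te in tes:
        for gene in genes:
            d = distance_to_tss(te, gene)
            if d is not None and d <= max_dist:
                pairs.append((te.name, gene.name, d))
    return pairs


if __name__ == "__main__":
    genes = [Gene("GENE_A", "chr1", 100_000, 120_000, "+"),
             Gene("GENE_B", "chr1", 100_000, 120_000, "-")]
    tes = [TELocus("TE_1", "chr1", 95_000, 95_500)]
    print(pair_te_with_genes(tes, genes))                   # strict 10 kb cutoff
    print(pair_te_with_genes(tes, genes, max_dist=50_000))  # looser 50 kb cutoff
```

The nested loop is quadratic and only suitable for small examples. For genome-scale annotations an interval-based tool would be the practical choice.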

      L351-352: Again, correlation is not causation. How do the authors know it's not the other way around?

      Authors: The candidates that we chose to display in Figure 4 (the figure to which these lines refer) are from MER41, ERV3-16, and LTR12C. Our lab and others have shown that these specific loci or other loci in these TE families are capable of regulating neighboring genes’ expression, with specific evidence in the context of immunity (PMID Smitha, Ed, APOBEC, etc.). Based on this knowledge, we believe it is most likely that TE-derived regulatory sequences are the cause of the increased restriction factor expression, rather than TE accessibility being a consequence of the transcriptional activation of the neighboring genes. However, we recognize that these results are correlative, as the reviewer notes, and we emphasize this in the revised manuscript. Most notably:

      “We acknowledge that these associations are drawn from correlative patterns and manipulative experiments are needed to infer causality between chromatin changes at these TEs and increased expression of nearby immunity genes.” [pg. 36, L620-623]

      Figure 4

      B) Need to show a scale of the genome region, the orientation of both the gene and the TE, whether it is a solo LTR

      Authors: Thank you for the suggestion. Genomic scale and orientation have been added to Figure 4C. All loci visualized were solo LTRs, save for HCP5, which is a lncRNA derived from a full-length ERV3 element.

      Figure 5

      A) Would benefit from also showing HCs

      Authors: Thank you for the recommendation. The RNA-seq datasets used in this analysis do not include HC samples. Additionally, this analysis is meant to highlight differences in TE expression between the four EC clusters. Thus, we have chosen to keep Figure 5A as it appears in the original manuscript.

      C) Would be helped by showing adjusted p-values, and also should show examples of non-correlating relationships between these KZNF genes and other TEs.

      Authors: Thank you for the suggestion. All correlation analyses had adjusted p-values below 0.01, derived from corr.test in R. We’ve added this to the figure legends of Figure 6B [pg. 32, L539] and S8B [pg. 53, L835]. However, we have chosen not to integrate non-correlating examples into the revised manuscript for the sake of space.

      Figure 6

      Title: should start with "proposed model for.." or some such.

      Authors: Thank you for the suggestion. The title has been changed to “Proposed model for the interplay of KZNFs and TEs regulating proximal antiviral gene expression in elite controllers of HIV-1” in the revised manuscript [pg. 34, L580-581].

      L 537: Again, how do the alleles segregate in the clusters?

      Authors: This question has been addressed in response to an earlier comment from Reviewer #3.

      Generally, in the correlation analyses, I'd like to see adjusted p-values and examples of non-correlated results.

      Authors: Thank you for the suggestion. As mentioned above, all correlation analyses have been annotated with the adjusted p-value threshold. Additionally, below we’ve included examples of non-correlated results from two analyses. First, we show a TE-gene pair whose increased TE accessibility in HCs compared to ECs does not correlate with increased expression of the proximal gene (Figure R3, available in the Response to Reviewers PDF). Notably, this gene does not play a role in HIV-1 infection response. Here, we show that genes with proximal (Second, we show the pairwise correlation and linear regression results of L1PA6 and ZNF2 (Figure R4, available in the Response to Reviewers PDF). ZNF2 is one of the KZNFs highlighted in Figure 6 for its low expression in ECs, anticorrelated to its repressive target LTR12C. On the other hand, L1PA6 is active in ECs, with variably high expression across samples. ZNF2 ChIP-exo revealed that ZNF2 has no capacity to bind to L1PA6 loci (adj. p-value = 1; PMID 37730438). Thus, even though both genes are variable across samples, we observe no significant (anti)correlation between the two variables (rho = 0.051 & p-value = 0.866).

      While we have not integrated these results into the revised manuscript for the sake of space, we hope that the provided examples satisfactorily demonstrate the presence of non-correlated results in our analyses, further reinforcing the specificity and robustness of our significant findings.
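For readers unfamiliar with the statistic reported above, Spearman's rho is the Pearson correlation of rank-transformed values. The sketch below shows that computation on invented expression values. The function names are hypothetical and the code does not reproduce the authors' R analysis with `corr.test`.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical expression values for a KZNF and a TE family across samples.
    kznf = [2.1, 3.4, 1.0, 5.2, 4.8]
    te = [9.0, 7.5, 9.9, 3.1, 4.0]
    print(f"rho = {spearman_rho(kznf, te):.3f}")
```

A rho near zero, as reported for L1PA6 and ZNF2, means the ordering of samples by one variable says little about their ordering by the other.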

      Significance:

      This study presents an in-depth analysis of the reverse transcriptome in Elite controllers. It will be of interest to both HIV researchers and those interested in the regulation of the human retrotranscriptome and its consequences.

      Provides an avenue for future explanation into elite controllers and TE involvement in the phenotype.

      Does a good job of placing the work in the context of existing lit, synthesizing other papers regarding TEs and immune control.

      Potential immune regulatory involvement of specific HERV clades.

      Authors: We’d like to thank the reviewer for their encouraging feedback. We’re pleased that they found our analysis of the EC retrotranscriptome to be of broad interest and appreciate their recognition of our efforts to synthesize existing literature, contextualizing our findings within the broader field. We agree that our study opens new avenues for exploring the role of TEs, particularly specific HERV clades, in not only the EC phenotype but immune regulation as a whole.

    1. Automatic thinking causes us to simplify problems and see them through narrow frames. We fill in missing information based on our assumptions about the world and evaluate situations based on associations that automatically come to mind and belief systems that we take for granted. In so doing, we may form a mistaken picture of a situation, just as looking through a small window overlooking an urban park could mislead someone into thinking he or she was in a more bucolic place. page 12

      I think that the results from the research on how cuing a stigmatized identity affects students' performance made me realize how much of a mental toll stereotypes can take on people. It's disheartening and ironic at the same time to see how both the high and low sides of the caste-system groups collectively performed worse when they were told their respective roles. It influences the way that I think as a student going to an international school because it is interesting to see how these ideas can apply to students around me. Regardless, this passage relates to today's inquiry question because it can be used to argue that poor people's individual economic actions are shaped by neglect and stereotypes that affect their lives subconsciously.

    2. Automatic thinking causes us to simplify problems and see them through narrow frames. We fill in missing information based on our assumptions about the world and evaluate situations based on associations that automatically come to mind and belief systems that we take for granted. In so doing, we may form a mistaken picture of a situation, just as looking through a small window overlooking an urban park could mislead someone into thinking he or she was in a more bucolic place. page 6

      (Question)

      Throughout this passage I noticed that automatic thinking or "thinking fast" is commonly portrayed as being bad due to its associations with irrational decisions, prejudice, and intuition. Given this context, what are the positive benefits of automatic thinking and why did the author fail to shed light on this in the book?

      I would say that this section of the book relates to today's inquiry because it teaches us more about the first system of thinking and how it can be harmful to rely mainly on it for our everyday thinking. For example, statistically, poor people fail to make it out of the bottom income threshold for most of their lives. I hypothesize that this is due to their lack of the choices, power, and education needed to think deliberately.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Thank you very much for your editorial handling of our manuscript entitled 'A conserved fungal Knr4/Smi1 protein is vital for maintaining cell wall integrity and host plant pathogenesis'. We have taken on board the reviewers' comments and thank them for their diligence and time in improving our manuscript.

      Please find our responses to each of the comments below.

      Reviewer(s)' comments

      Reviewer #1


      Major comments:


      1.1. As a more critical comment, I find the presentation of the figures somewhat confusing, especially with the mixing of main figures, supplements to the main figures, and actual supplemental data. On top of that, the figures are not called up in the right order (e.g. Figure 4 follows 2D, while 3 comes after 4; Figure 6 comes before 5...), and some are never called up (I think) (e.g. Figure 1B, Figure 2B).


      Response: The figure order has been revised according to the reviewer's suggestion, while still following eLife's formatting guidelines for naming supplementals. Thank you.

      1.2. I agree that there should be more CWI-related genes in the wheat module linked to the FgKnr4 fungal module, or, vice-versa, CW-manipulating genes in the fungal module. It would at least be good if the authors could comment further on if they find such genes, and if not, how this fits their model.


      Response: Thank you for your insightful suggestion regarding the inclusion of more CWI-related genes in the wheat module linked to the FgKnr4 fungal module F16, or vice versa. We did observe a co-regulated response between the FgKnr4 module F16 and the wheat module W05, which are correlated. Namely, we observed an enrichment of oxidative stress genes including respiratory burst oxidases and two catalases (lines 304 - 313) in the correlated wheat module (W05). Early expression of these oxidative stress inducing genes likely induces the CWI pathway in the fungus, which is regulated by FgKnr4. Knr4 functions as both a regulatory protein in the CWI pathway and as a scaffolding protein across multiple pathways in S. cerevisiae (Martin-Yken et al., 2016, https://onlinelibrary.wiley.com/doi/10.1111/cmi.12618 ). Scaffolding protein-encoding genes are typically expressed earlier than the genes they regulate to enable pre-assembly with their interacting partners, ensuring that signaling pathways are ready to activate when needed. In this context, the CWI MAPKs Bck1 and Mkk1 are part of module F05, which includes two chitin synthases and a glucan synthase. This module is highly expressed during the late symptomless phase. The MAPK Mgv1, found in module F13, is expressed consistently throughout the infection process, which aligns with the expectation that MAPKs are mainly post-transcriptionally regulated. Thank you for bringing our attention to this. It is now included in the discussion (lines 427 - 443), and eigengene expression plots of all modules have been added to the supplementary material (Figure 3 - figure supplement 1).

      To explore potential shared functions of FgKnr4 with other genes in its module, we re-analyzed the high module membership genes within module F16, which includes FgKnr4, using Knetminer (Hassani-Pak et al., 2021; https://onlinelibrary.wiley.com/doi/10.1111/pbi.13583 ). This analysis revealed that 8 out of 15 of these genes are associated with cell division and ATP binding. Four of the candidate genes are also part of a predicted protein-protein interaction subnetwork of genes within module F16, which relate to cell cycle and ATP binding. In S. cerevisiae, the absence of Knr4 results in cell division dysfunction (Martin-Yken et al., 2016, https://onlinelibrary.wiley.com/doi/10.1111/cmi.12618 ). Accordingly, we tested the sensitivity of ΔFgknr4 to the microtubule inhibitor benomyl (a compound commonly used to identify mutants with cell division defects; Hoyt et al., 1991 https://www.cell.com/cell/pdf/0092-8674(81)90014-3.pdf). We found that the ΔFgknr4 mutant was more susceptible to benomyl, both when grown on solid agar and in liquid culture. These data have now been added as Figure 7 and are referred to in lines 338-348.

      Specific issues:


      1.3. In the case of figure 5, I generally find it hard to follow. In the text (line 262/263), the authors state that 5C shows "eye-shaped lesions" caused by ΔFgknr4 and ΔFgtri5, but I can't see neither (5C appears to be a ΔFgknr4 complementation experiment). The figure legend also states nothing in this regard.

      Response: Thank you for your suggestion. We have amended the manuscript to include an additional panel that shows the dissected spikelet without its outer glumes, making the eye-shaped diseased regions more visible in Figure 5.

      1.4. Figure 5D supposedly shows 'visibly reduced fungal burden' in ΔFgknr4-infected plants, but I can't really see the fungal burden in this picture, but the infected section looks a lot thinner and more damaged than the control stem, so in a way more diseased.


      Response: Thank you for your insight. We have revised our conclusions based on this image to state that while ΔFgknr4 can colonise host tissue, it does so less effectively than the wild-type strain. We are unable to quantitatively evaluate fungal burden using image-colour thresholding because the colours of the fungal cells and wheat tissues overlap. Decreased host colonisation is evidenced by (i) reduced fungal hyphae proliferation, particularly in the thicker adaxial cell layer, (ii) collapsed air spaces in wheat cells, and (iii) increased polymer deposition at the wheat cell walls, indicating an enhanced defence response. Figure 5 has been amended to include these observations in the corresponding figure legend and the resin images now include insets with detailed annotation.

      1.5. The authors then go on to state (lines 272-273) that they analyzed the amounts of DON mycotoxin in infected tissues, but don't seem to show any data for this experiment.

      Response: We have amended this to now include the data in Figure 5 - figure supplement 2B, thank you.

      Reviewer #2


      Major issues:


      2.1 If Knf4 is involved in the CWI pathway, what other genes involved in the CWI pathway are in this fungal module? one of the reasons for developing modules or sub-networks is to assign common function and identify new genes contributing to the function. since FgKnr4 is noted to play a role in the CWI pathways, then genes in that module should have similar functions. If WGCN does not do that, what is the purpose of this exercise?


      Response: Thank you for raising this point regarding the role of FgKnr4 in the CWI pathway and the expectations for genes of shared function within the FgKnr4 module F16. We did observe that the module containing FgKnr4 (F16) was also correlated to a wheat module (W05) which was significantly enriched for oxidative stress genes. This pathogen-host correlated pattern led us to study module F16, which otherwise lacks significant gene ontology term enrichment and unique gene set enrichments and contains few characterised genes. This is now highlighted in lines 233-246. This underscores the strength of the WGCNA. By using high-resolution RNA-seq data to map modules to specific infection stages, we identified an important gene that would have otherwise been overlooked. This approach contrasts with other network analyses that often rely on the guilt-by-association principle to identify novel virulence-related genes within modules containing known virulence factors, potentially overlooking significant pathways outside the scope of prior studies. Therefore, our analysis has already benefited from several advantages of WGCNA, including the identification of key genes with high module membership that may be critical for biological processes, as well as generating a high-resolution, stage-specific co-expression map of the F. graminearum infection process in wheat. This point is now emphasised in lines 233-252. As discussed in response to reviewer 1, Knr4 functions as both a regulatory protein in the CWI pathway and as a scaffolding protein across multiple pathways in S. cerevisiae (Martin-Yken et al., 2016, https://onlinelibrary.wiley.com/doi/10.1111/cmi.12618 ), which would explain its clustering separate from the CWI pathway genes.
      The high module membership genes within module F16 containing FgKnr4 were re-analysed using Knetminer (Hassani-Pak et al., 2021; https://onlinelibrary.wiley.com/doi/10.1111/pbi.13583 ), which found that 8/15 of these genes were related to cell division and ATP binding. Four of the candidate genes are also part of a predicted protein-protein interaction subnetwork of genes within module F16, which relate to cell cycle and ATP binding. In S. cerevisiae, the absence of Knr4 leads to dysfunction in cell division. Accordingly, we tested the sensitivity of ΔFgknr4 to the microtubule inhibitor benomyl (a compound commonly used to identify mutants with cell division defects; Hoyt et al., 1991 https://www.cell.com/cell/pdf/0092-8674(81)90014-3.pdf). We found that the ΔFgknr4 mutant was more susceptible to benomyl, both when grown on solid agar and in liquid culture. These data have now been added as Figure 7 and are referred to in lines 338-348.


      2.2. Due to development defects in the Fgknr1 mutant, I would not equate to as virulence factor or an effector gene.


      Response: We are in complete agreement with the reviewer and are not suggesting that FgKnr4 is an effector or virulence factor. We have been careful with our wording to indicate that FgKnr4 is simply necessary for full virulence and that its disruption results in reduced virulence, and we have outlined how we believe FgKnr4 participates in a fungal signaling pathway required for infection of wheat.


      2.3. What new information is provided with WGCN modules compared with other GCN network in Fusarium (examples of GCN in Fusarium is below) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5069591/ https://doi.org/10.1186/s12864-020-6596-y DOI: 10.1371/journal.pone.0013021. The GCN networks from Fusarium have already identified modules necessary/involved in pathogenesis.

      Response: The 2016 New Phytologist gene regulatory network (GRN) by Guo et al. is large and comprehensive. However, only three of the eleven datasets are in planta, with just one dataset focusing on F. graminearum infection on wheat spikes. The other two in planta datasets involve barley infection and Fusarium crown rot. By combining numerous in planta and in vitro datasets, the previous GRNs lack the fine resolution needed to identify genetic relationships under specific conditions, such as the various stages of symptomatic and symptomless F. graminearum infection of mature flowering wheat plants. This limitation is highlighted in the 2016 paper itself. This network is expanded in the Guo et al., 2020 BMC Genomics paper where it includes one additional in planta and nine in vitro datasets. However, the in planta dataset involves juvenile wheat coleoptile infection, which serves as an artificial model for wheat infection but is not on mature flowering wheat plants reminiscent of Fusarium Head Blight of cereals in the field. This model differs significantly in the mode of action of F. graminearum, notably DON mycotoxin is not essential for virulence in this context (Armer et al. 2024, https://pubmed.ncbi.nlm.nih.gov/38877764/ ). The Guo et al., 2020 paper still faces the same issues in terms of resolution and the inability to draw conclusions specific to the different stages of F. graminearum infection. Additionally, these GRNs use Affymetrix data, which miss over 400 genes (~ 3 % of the genome) from newer gene models. In contrast, our study addresses these limitations by analysing a meticulously sampled, stage- and tissue-specific in planta RNA-seq dataset using the latest reference annotation. Our approach provides higher resolution and insights into host transcriptomic responses during the infection process. The importance of our study in the context of these GRNs is now addressed in the introduction (lines 85-92).


      2.4. Ideally, the WGCN should have been used identify plant targets of Fusarium pathogenicity genes. This would have provided credibility and usefulness of the WGCN. Many bioinformatic tools are available to identify virulence factors and the utility of WGCN in this regard is not viable. However, if the authors had overlapped the known virulence factors in a fungal module to a particular wheat module, the impact of the WGCN would be great. The module W12 has genes from numerous traits represented and WGCN could have been used to show novel links between Fg and wheat. For example, does tri5 mutant affect genes in other traits?

      Response: Thank you for your suggestions. In this study we have shown the association between the main fungal virulence factor of F. graminearum, DON mycotoxin, with wheat detoxification responses. Through this we have identified a set of tri5 responsive genes and validated this correlation in two genes belonging to the phenylalanine pathway and one transmembrane detoxification gene. Although we could validate more genes in this tri5 responsive wheat module, our paper aimed to investigate previously unstudied aspects of the F. graminearum infection process and how the fungus responded to changing conditions within the host environment. We accomplished this by characterising a gene within a fungal module that had limited annotation enrichment and few characterised genes. Tri5 on the other hand is the most extensively studied gene in F. graminearum and while the network we generated may offer new insights into tri5 responsive genes, this is beyond the scope of our current study. In addition to the tri5 co-regulated response, we have also demonstrated the coordinated response between the fungal module F16, which contains FgKnr4 that is necessary for tolerance to oxidative stress, and the wheat module W05, which is enriched for oxidative stress genes.


      While our co-expression network approach can be used to explore and validate other early downstream signaling and defense components in wheat cells, several challenges must be considered: (a) the poor quality of wheat gene calls, (b) genetic redundancy due to both homoeologous genes and large gene families, and (c) the presence of DON, which can inhibit translation and prevent many transcriptional changes from being realised within the host responses. Additionally, most plant host receptors are not transcriptionally upregulated in response to pathogen infection (most R gene studies for the NBS-LRR and exLRR-kinase classes), making their discovery through a transcriptomics approach unlikely. These points will be included in our discussion (lines 408-413), thank you.

      Specific issues

      2.5. Since the tri5 mutant was used as a proof of concept to link wheat/Fg modules, it would have been useful to show that TRI14, which is not involved in DON biosynthesis but is involved in virulence (https://doi.org/10.3390/applmicrobiol4020058), impacts the wheat module genes.


      Response: Our goal was to show that wheat genes respond to the whole TRI cluster, not just individual TRI genes. Therefore, the tri5 mutant serves as a solid proof-of-concept, because TRI5 is essential for DON biosynthesis, the primary function of the TRI gene cluster, thereby representing the function of the cluster as a whole. This is now clarified in lines 217-219. Additionally, the uncertainties surrounding other TRI mutants would complicate the question we were addressing, namely whether a wheat module enriched in detoxification genes is responding to DON mycotoxin, as implied by shared co-expression patterns with the TRI cluster. For instance, the referenced TRI14 paper indicates that DON is produced in the same amount in vitro in a single medium. Although the difference is not significant, the average DON produced is lower for the two Δtri14 transformants tested. Therefore, we cannot definitively rule out that TRI14 is involved in DON biosynthesis and extrapolate this to DON production in planta. Despite this, the suggestion is interesting and would make a nice experiment, but we believe it does not contribute to the overall aim of this study.

      2.6. Moreover, prior RNAseq studies with tri5 mutant strain on wheat would have revealed the expression of PAL and other phenylpropanoid pathway genes?

      Response: We agree that this would be an interesting comparison to make but unfortunately no dataset comparing in planta expression of the tri5 mutant within wheat spikes exists.

      2.7. Table S1 lists 15 candidate genes of the F16 module; however, supplementary File 1 indicates 74 genes in the same module. The basis of exclusion should be explained. The author has indicated genes with high MM was used as representative of the module. The 59 remaining genes of this module did not meet this criteria? Give examples.


      Response: The 15 genes with the highest module membership were selected as initial candidates for further shortlisting from the 74 genes within module F16. In WGCNA, genes with high module membership (MM) (i.e. intramodular connectivity) are predicted to be central to the biological functions of the module (Langfelder and Horvath, 2008; https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559 ), and MM continues to be used as a metric to identify biologically significant genes within WGCN analyses (Tominello-Ramirez et al., 2024, https://bmcplantbiol.biomedcentral.com/articles/10.1186/s12870-024-05366-0 ; Zheng et al., 2022, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9151341/ ; Panahi and Hejazi, 2021, https://www.nature.com/articles/s41598-020-80945-3 ). Following methods by Mateus et al. (2019) (https://academic.oup.com/ismej/article/13/5/1226/7475138 ), key genes were defined as those exhibiting elevated MM within the module, which were also strongly correlated (|R| > 0.70) with modules of the partner organism (wheat). We have clarified this point in the manuscript (lines 253-263). Thank you for the suggestion.
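      The selection described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the analysis itself was run with the WGCNA R package); it assumes that module membership is the Pearson correlation between a gene's expression profile and the module eigengene, that candidates are ranked by signed MM, and that the partner-organism filter is applied per gene against a wheat module eigengene. The function names, the `n_top` and `r_min` parameters and the toy data are all hypothetical.

```python
import numpy as np

def pearson_with_profile(expr, profile):
    """Pearson correlation of each row of `expr` (genes x samples) with `profile`.

    Assumes no gene has a constant expression profile (zero variance).
    """
    expr = np.asarray(expr, dtype=float)
    profile = np.asarray(profile, dtype=float)
    xc = expr - expr.mean(axis=1, keepdims=True)   # centre each gene
    pc = profile - profile.mean()                  # centre the reference profile
    return (xc @ pc) / (np.linalg.norm(xc, axis=1) * np.linalg.norm(pc))

def select_key_genes(gene_ids, expr, eigengene, partner_eigengene,
                     n_top=15, r_min=0.70):
    """Rank genes by module membership, keep the top `n_top`, then keep only
    those whose |correlation| with the partner module eigengene exceeds `r_min`.
    """
    mm = pearson_with_profile(expr, eigengene)            # module membership
    partner_r = pearson_with_profile(expr, partner_eigengene)
    top = np.argsort(-mm)[:n_top]                         # highest signed MM first
    return [gene_ids[i] for i in top if abs(partner_r[i]) > r_min]

# Toy example: three hypothetical genes measured across four samples.
genes = ["geneA", "geneB", "geneC"]
expr = [[2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 4]]
fungal_eigengene = [1, 2, 3, 4]
wheat_eigengene = [1, 2, 3, 4]
mm = pearson_with_profile(expr, fungal_eigengene)
key = select_key_genes(genes, expr, fungal_eigengene, wheat_eigengene, n_top=2)
```

      Ranking by signed MM rather than absolute MM is a design choice made here for illustration; in a signed network, strongly negative values indicate genes that behave opposite to the module.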

      2.8. A list from every module that pass this criteria will be useful resource for functional characterization studies.


      Response: A supplementary spreadsheet has been generated which includes full lists of the top 15 genes with the highest module membership within the five fungal modules correlated to wheat modules and a summary of shared attributes among them. Thank you for this suggestion.

      2.9. Figure 3 indicates TRI genes in the module F12; your PHI base in Supp File S2 lists only TRI14. Why other TRI genes such as TRI5 not present in this File?


      Response: For clarity, the TRI genes in module F12 are TRI3, TRI4, TRI11, TRI12, and TRI14 which was stated in Table 1. TRI5 clusters with its neighboring regulatory gene TRI6 in module F11, which exhibits a similar but reduced expression pattern compared to module F12. To improve clarity on this the TRI genes in module F12 are also listed in-text in line 168 and added to Figure 4. The enrichment and correlated relationship of W12 to a cluster's expression still imply a correlated response of the wheat gene to the TRI cluster's biosynthetic product (DON), which is absent in the Δtri5 mutant.

      TRI14 and TRI12 are listed in PHI-base. TRI12 was mistakenly excluded because of an unmapped UniProt ID; entries with unmapped IDs were added separately in the spreadsheet. We will recheck all unmapped ID lists to ensure all PHI-base entries are included in the final output. Thank you for pointing out this error.


      2.10. What is purpose of listing the same gene multiple times? Example, osp24 (a single gene in Fg) is listed 13 times in F01 module.


      Response: This is a consequence of each entry having a separate PHI ID, which represents different interactions, including inoculations on different cultivars. Cultivar and various experimental details were omitted from the spreadsheet to reduce information density; however, the multiple PHI-base IDs will be kept separate to make the data more user friendly when working with the PHI-base database. An explanation for this is now provided in the file's explanatory worksheet, thank you.

      Reviewer #3:


      3.1. Why only use of high confidence transcripts maize to map the reads and not the full genome like Fusarium graminearum? I have never analyzed plant transcriptome.


      Response: In the wheat genome, only high-confidence gene calls are used by the global community (Choulet et al., 2023; https://link.springer.com/chapter/10.1007/978-3-031-38294-9_4 ) until a suitable and stable wheat pan-genome becomes available.

      3.2. The regular output of DESeq are TPMs, how did the authors obtain the FPKM used in the analysis?


      Response: FPKM was calculated using the GenomicFeatures package and made available on GitHub to improve the interoperability of the data for future users. However, the input for WGCNA, and for this study as a whole, was normalised counts rather than FPKM. Information on the FPKM calculation is now included in the methods section of the revised manuscript (line 491).
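      For readers unfamiliar with the unit, the conventional FPKM definition can be sketched in a few lines of Python. This is an illustration of the standard formula only, not the authors' code (which used R packages), and it assumes that the library size is the sum of the counts supplied and that gene length is given in base pairs. The function name and toy numbers are hypothetical.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM_i = count_i * 1e9 / (length_i * total_fragments)
    Assumption: total_fragments is the sum of `counts` for this sample.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no mapped fragments")
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]

# Toy sample with three hypothetical genes.
values = fpkm(counts=[100, 300, 600], lengths_bp=[1000, 2000, 3000])
```

      Because FPKM is rescaled within each sample, it is convenient for sharing data but is not a substitute for the normalised counts used as WGCNA input in the response above.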

      3.3. Do the authors have a Southern blot to prove the location of the insertion and number of insertions in Zymoseptoria tritici mutant and complemented strains?


      Response: No, but the phenotype is attributed to the presence or absence of ZtKnr4, as the mutant was successfully complemented in multiple phenotypic aspects. This satisfies Koch's postulates, which are the gold standard for reverse genetics experimentation (Falkow 1988; https://www.jstor.org/stable/4454582 ).

      3.4. Boxplots and bar graphs should have the same format. In Figures 5 B and F and supplementary figure 6.3 the authors showed the distribution of samples but it is lacking in figure 3 B and all bar graphs.


      Response: Graphs have been modified to display the distribution of all samples, thank you.

      3.5. Line 247 FGRAMPH1_0T23707 should be FGRAMPH1_01T23707


      Response: Thank you, this has now been amended.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors develop a self-returning self-avoiding polymer model of chromosome organization and show that their framework can recapitulate at the same time local density and large-scale contact structural properties observed experimentally by various technologies. The presented theoretical framework and the results are valuable for the community of modelers working on 3D genomics. The work provides solid evidence that such a framework can be used, is reliable in describing chromatin organization at multiple scales, and could represent an interesting alternative to standard molecular dynamics simulations of chromatin polymer models.

      We appreciate the editor for an accurate description of the scope of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Carignano et al propose an extension of the self-returning random walk (SRRW) model for chromatin to include excluded volume aspects and use it to investigate generic local and global properties of the chromosome 3D organization inside eukaryotic nuclei. In particular, they focus on chromatin volumic density, contact probability, and domain size and suggest that their framework can recapitulate several experimental observations and predict the effect of some perturbations.

      We thank the reviewer for the attention paid to the manuscript and all the relevant comments.

      Strengths:

      - The developed methodology is convincing and may offer an alternative - less computationally demanding - framework to investigate the single-cell and population structural properties of 3D genome organization at multiple scales.

      - Compared to the previous SRRW model, it allows for investigation of the role of excluded volume locally.

      Excluded volume is accounted for everywhere, not locally. We emphasized this on page 3, line 182:

      “The method that we employ to remove overlaps is a low-temperature-controlled molecular dynamics simulation using a soft repulsive interaction potential between initially overlapping beads, that is terminated as soon as all overlaps have been resolved, as described in the Appendix 3.”


      - They perform some experiments to compare with model predictions and show consistency between the two.

      Weaknesses:

      - The model is a homopolymer model and currently cannot fully account for specific mechanisms that may shape the heterogeneous, complex organization of chromosomes (TAD at specific positions, A/B compartmentalization, promoter-enhancer loops, etc.).

      The SR-EV model is definitely not a homo-polymer, as it is not a regular concatenation of a single monomeric unit.

      The model includes loops, which may happen in two ways: 1) As in the SRRW, branching structures emerging from the configuration backbone can be interpreted as nested loops and 2) A relatively long forward step followed by a return is a single loop. The model induces the formation of packing domains, which are not TADs, and are quantitatively in agreement with ChromSTEM experiments.

      We considered it useful to add a new figure that further clarifies the structures obtained with the SR-EV model. The following paragraph and figure have been added on page 5:

      “The density heterogeneity displayed by the SR-EV configurations can be analyzed in terms of the accessibility. One way to reveal this accessibility is by calculating the coordination number (CN) for each nucleosome, using a coordination radius of 11.5 nm, along the SR-EV configuration. CN values range from 0 for an isolated nucleosome to 12 for a nucleosome immersed in a packing domain. In Figure 3 we show the SR-EV configuration shown in Figure 2, but colored according to CN. CN can also be considered as a measure to discriminate heterochromatin (red) and euchromatin (blue). Figure 3-A shows how the density inhomogeneity is coupled to different CN, with high CN represented in red and low CN represented in blue. Figure 3-B shows a 50 nm thick slab obtained from the same configuration that clearly shows that the nucleosomes at the center of each packing domain are almost completely inaccessible, while those outside are open and accessible. It is also clear that the surfaces of the packing domains are characterized by nearly white nucleosomes, i.e. coordinated towards the center of the domain and open in the opposite direction.”
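      The coordination number described in the added paragraph can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes nucleosomes are represented by their centre coordinates in nanometres, that a neighbour is any other bead whose centre lies within the 11.5 nm coordination radius (including chain neighbours), and it uses a brute-force all-pairs distance calculation that is only practical for small configurations. The function name and toy coordinates are hypothetical.

```python
import numpy as np

def coordination_numbers(positions_nm, radius_nm=11.5):
    """Count, for each bead, how many other beads lie within `radius_nm`."""
    pos = np.asarray(positions_nm, dtype=float)
    diff = pos[:, None, :] - pos[None, :, :]          # pairwise displacement vectors
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)      # squared pairwise distances
    within = dist2 <= radius_nm ** 2
    np.fill_diagonal(within, False)                   # a bead is not its own neighbour
    return within.sum(axis=1)

# Toy configuration: three beads spaced 10 nm apart plus one isolated bead.
beads = [[0, 0, 0], [10, 0, 0], [20, 0, 0], [100, 0, 0]]
cn = coordination_numbers(beads)
```

      For chains with many thousands of nucleosomes, a spatial index (for example a k-d tree) would avoid the quadratic memory cost of the all-pairs matrix.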

      - By construction of their framework, the effect of excluded volume is only local and larger-scale properties for which excluded volume could be a main actor (formation of chromosome territories [Rosa & Everaers, PLoS CB 2009], bottle-brush effects due to loop extrusion [Polovnikov et al, PRX 2023], etc.) cannot be captured.

      Excluded volume is considered for all nucleosomes, including overlapping beads distant along the polymer chain. Chromosome territories can be treated, but they are not in this case because we look at a single model chromosome.

      - Apart from being a computationally interesting approach to generating realistic 3D chromosome organization, the method offers fewer possibilities than standard polymer models (eg, MD simulations) of chromatin (no dynamics, no specific mechanisms, etc.) with likely the same predictive power under the same hypotheses. In particular, authors often claim the superiority of their approach to describing the local chromatin compaction compared to previous polymer models without showing it or citing any relevant references that would show it.

      We apologize if the text transmits an idea of superiority over other methods that was not intended. SR-EV is an alternative tool that may give a different, even complementary, point of view to standard polymer models.

      - Comparisons with experiments are solid but are not quantified.

      The comparisons that we have presented are quantitative. So far, we do not have a way to characterize alpha or phi a priori for a particular system.

      Impact:

      Building on the presented framework in the future to incorporate TAD and compartments may offer an interesting model to study the single-cell heterogeneity of chromatin organization. But currently, in this reviewer's opinion, standard polymer modeling frameworks may offer more possibilities.

      We thank the reviewer for the positive opinion on the potential of the presented method. The incorporation of TADs and compartments is left for a future evolution of the model as its complexity will make this work extremely long.

      Reviewer #2 (Public Review):

      Summary:

      The authors introduce a simple Self Returning Excluded Volume (SR-EV) model to investigate the 3D organization of chromatin. This is a random walk with a probability to self-return accounting for the excluded volume effects. The authors use this method to study the statistical properties of chromatin organization in 3D. They compute contact probabilities, 3D distances, and packing properties of chromatin and compare them with a set of experimental data.

      We thank the reviewer for the attention paid to our manuscript.

      Strengths:

      (1) Typically, to generate a polymer with excluded volume interactions, one needs to run long simulations with computationally expensive repulsive potentials like the Weeks-Chandler-Andersen potential. However, here, instead of performing long simulations, the authors have devised a method where they can grow the polymer, enabling quick generation of configurations.

      (2) Authors show that the chromatin configurations generated from their models do satisfy many of the experimentally known statistical properties of chromatin. Contact probability scalings and packing properties are comparable with Chromatin Scanning Transmission Electron Microscopy (ChromSTEM) experimental data from some of the cell types.

      Weaknesses:

      This can only generate broad statistical distributions. This method cannot generate sequence-dependent effects, specific TAD structures, or compartments without a prior model for the folding parameter alpha. It cannot generate a 3D distance between specific sets of genes. This is an interesting soft-matter physics study. However, the output is only as good as the alpha value one provides as input.

      We proposed a model to create realistic chromatin configurations, which we have compared with specific single-cell experiments and which also reproduce ensemble-average properties. 3D distances between genes can be calculated after mapping the genome to the SR-EV configuration. The future incorporation of the genome sequence will also allow us to describe TADs and A/B compartments. See the paragraph added to the Discussion section:

      “The incorporation of genomic character to the SR-EV model will allow us to study all individual single chromosomes properties, and also topological associated domains and A/B compartmentalization from ensemble of configurations as in HiC experiments. “

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major:

      - In the introduction and along the text, the authors are often making strong criticisms of previous works (mostly polymer simulation-based) to emphasize the need for an alternative approach or to emphasize the outcomes of their model. Most of these statements (see below) are incomplete if not wrong. I would suggest tuning down or completely removing them unless they are explicitly demonstrated (eg, by explicit quantitative comparisons). There is no need to claim any - fake - superiority over other approaches to demonstrate the usefulness of an approach. Complementarity or redundance in the approaches could also be beneficial.

      We regret if we unintentionally transmitted a claim of superiority. We have made several small edits to change that.

      - Line 42-43: at least there exist many works towards that direction (including polymer modeling, but also statistical modeling). For eg, see the recent review of Franck Alber.

      Line removed. Citation to Franck Alber included below in the text.

      - Line 54-57: Point 1 is correct but is it a fair limitation? These models can predict TADs & compartments while SR-EV no. Point 2 is wrong, it depends on the resolution of the model and computer capacity but it is not an intrinsic limitation. Point 3 is wrong, such models can predict very well single-cell properties, and again it is not an intrinsic limitation of the model. Point 4 is incorrect. The space-filling/fractal organization was an (unfortunate) picture to emphasize the typical organization of chromosomes in the early times (2009), but crumpled polymers which are a more realistic description are not space-filling (see Halverson et al, 2013).

      Text involving points 1 to 4 removed. It was unnecessary and does not change the line of the paper.

      - L400-402 + 409-411: in such a model, the biphasic structure may emerge from loop extrusion but also naturally from the crumpled polymer organization. Simple crumpled polymer without loop extrusion and phase separation would also produce biphasic structures.

      Yes, we agree. Also SR-EV leads to biphasic structures.

      - L 448-449: any data to show that existing polymer modeling would predict a strong dependency of C_p(n) on the volumic fraction (in the range studied here)?

      No, I don’t know a work predicting that.

      - Fig. 4:

      - Large-scale structural properties (R^2(n) and C_p(n)) are not dependent on phi. Is it surprising that by construction, SR-EV only relaxes the system locally after SRRW application?

      Excluded volume is considered at all length scales. However, as the decreasing C_p curves observed in theories and experiments imply, the fraction of overlap (or contacts) is more important at small separations (local) than at large separations. Yet, it was a surprise for us to observe a negligible effect of phi.

      - Why not make a quantitative comparison between predicted and measured C_p(n)? Or at least plotting them on the same panel.

      Panels B and C are in the same scale and show a good agreement between SR-EV and experiments. However, it is not perfectly quantitative agreement. SR-EV represents the generic structure of chromatin and perfect agreement should not be expected.

      - Comparison with an average C_p(n) over all the chromosomes would be better.

      Possibly, but we don’t think it adds anything to the paper.

      - In Figure 5,6,7 (and related text): authors often describe some parameter values that are 'closest to experiment findings'. Can the authors quantify/justify this? The various 'closest' parameters are different. Can the authors comment?

      The folding parameter and average volume fraction are chosen so that the agreement is best with the displayed experimental system, which is a different cell type in each case.

      - Figure 5: why not show the experimental distribution from Ou et al?

      - Figure 6 & 7: experimental results. Can the authors show images from their own experiments? Can they show that cohesion/RAD21 is really depleted after auxin treatment?

      It is currently under review in a different journal.

      - In the Discussion, a fair discussion on the limitations of the methods (dynamics, etc) is missing.

      Minor

      - Line 34-36: the logical relationship between this sentence and the ones before and after is very unclear.

      - Along the text, authors use the term 'connectivity' to describe 3D (Hi-C) contacts between different regions of the same chromosome/polymer. This is misleading as connectivity in polymer physics describes the connection along the polymer and not in the 3D space.

      No. I don’t think we used connectivity in that sense. We agree with your statement on the use of connectivity in polymer physics, and that is what we always had in mind for this model.

      - Line 92: typo.

      - On the SR-EV method: does the relaxation process create local knots in the structure?

      We have not checked for knots.

      - Table 1: the good correspondence with linker length is remarkable but likely 'fortunate', other chosen resolutions would have led to other results. Moreover, the model cannot account for the fine structure of chromatin fiber. Can the authors comment on that?

      Fortunate to the extent that we sample the model parameter to overall catch the structure of chromatin.

      - Line 211: 'without the need of imposing any parameter': alpha is a parameter, no?

      Correct. Phrase deleted.

      - L267-269 & 450-451: actually in Liu & Dekker, they do observe an effect on Hi-C map (C_p(n)), weak but significant and not negligible.

      Our statements read ‘minimal’ and ‘relatively insensitive’. It is observed, but very small.

      - L283-286: This is a perspective statement that should be in the discussion.

      Moved to the Discussion, as suggested.

      - L239-241: The authors seem to emphasize some contradictions with recent results on phase separation. This is unclear and should be relocated to discussion.

      We just pointed out recent experiments, as stated. No intention to generate a discussion with any of them.

      - L311-313: Unclear statement.

      - L316-325: This is not results but discussion/speculation.

      Moved to Discussion

      - Along the text: 'promotor'-> 'promoter'. 

      Corrected.

      - L364: explain more in detail PWS microscopy.

      Reviewer #2 (Recommendations For The Authors):

      Even though there are claims about nucleosome-resolution chromatin polymer, it is not clear that this work can generate structures with known nucleosome-resolution features. Nucleosome-level structure is much beyond a random walk with excluded volume and is driven by specific interactions. The authors should clarify this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Federer et al. tested AAVs designed to target GABAergic cells and parvalbumin-expressing cells in marmoset V1. Several new results were obtained. First, AAV-h56D targeted GABAergic cells with >90% specificity, and this varied with serotype and layer. Second, AAV-PHP.eB.S5E2 targeted parvalbumin-expressing neurons with up to 98% specificity. Third, the immunohistochemical detection of GABA and PV was attenuated near viral injection sites.

      Strengths:

      Vormstein-Schneider et al. (2020) tested their AAV-S5E2 vector in marmosets by intravenous injection. The data presented in this manuscript are valuable in part because they show the transduction pattern produced by intraparenchymal injections, which are more conventional and efficient.

      Our manuscript additionally provides detailed information on the laminar specificity and coverage of these viral vectors, which was not investigated in the original studies.

      Weaknesses:

      The conclusions regarding the effects of serotype are based on data from single injection tracks in a single animal. I understand that ethical and financial constraints preclude high throughput testing, but these limitations do not change what can be inferred from the measurements. The text asserts that "...serotype 9 is a better choice when high specificity and coverage across all layers are required". The data presented are consistent with this idea but do not make a strong case for it.

      We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we have tempered our claims about such differences and use more caution in the interpretation of these data (Results p. 6 and Discussion p.10). Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.

      A related criticism extends to the analysis of Injection volume on viral specificity. Some replication was performed here, but reliability across injections was not reported. My understanding is that individual ROIs were treated as independent observations. These are not biological replicates (arguably, neither are multiple injection tracks in a single animal, but they are certainly closer). Idiosyncrasies between animals or injections (e.g., if one injection happened to hit one layer more than another) could have substantial impacts on the measurements. It remains unclear which results regarding injection volume or serotype would hold up had a large number of injections been made into a large number of marmosets.

      For the AAV-S5E2, we made a total of 7 injections (at least 2 at each volume), all of which, irrespective of volume, resulted in high specificity and efficiency for PV interneurons. Our conclusion is that larger volumes are slightly less specific, but the differences are minimal and do not warrant additional injections. Additionally, we kept all the other parameters across animals constant (see new Supplementary Table 1), all of our injections involved all cortical layers, and the ROIs we selected for counts encompassed reporter protein expression across all layers. To provide a better sense of the reliability of the results across injections, in the revised version of the manuscript we now provide results for each of the AAV-S5E2 injection cases separately in a new Supplementary Table 2. This table shows that the results are indeed rather consistent across cases, with slightly greater specificity for injection volumes in the range of 105-180 nl.
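The distinction raised here, between treating ROIs as independent observations and summarizing each injection separately, can be sketched as follows. This is a minimal illustration, not the authors' analysis: all counts are invented, the injection names are hypothetical, and the definitions of specificity (percent of reporter-expressing cells that are PV+) and coverage (percent of PV+ cells that express the reporter) are assumptions based on common usage.

```python
from statistics import mean, stdev

# Hypothetical counts, invented purely for illustration (not data from the study).
# Each ROI is (tdT+ cells, PV+ cells, double-labelled tdT+/PV+ cells).
rois_by_injection = {
    "injection_A": [(40, 45, 38), (35, 42, 33)],
    "injection_B": [(50, 55, 44), (48, 60, 43)],
    "injection_C": [(30, 36, 29), (33, 38, 31)],
}

def specificity(tdt: int, both: int) -> float:
    """Percent of reporter-expressing cells that are also PV+ (assumed definition)."""
    return 100.0 * both / tdt

def coverage(pv: int, both: int) -> float:
    """Percent of PV+ cells that express the reporter (assumed definition)."""
    return 100.0 * both / pv

def per_injection_summary(rois):
    """Sum the counts over all ROIs of one injection, then compute each metric once."""
    tdt = sum(r[0] for r in rois)
    pv = sum(r[1] for r in rois)
    both = sum(r[2] for r in rois)
    return specificity(tdt, both), coverage(pv, both)

# One (specificity, coverage) pair per injection: the injection is the replicate.
summaries = {name: per_injection_summary(rois) for name, rois in rois_by_injection.items()}

spec_values = [spec for spec, _ in summaries.values()]
for name, (spec, cov) in summaries.items():
    print(f"{name}: specificity {spec:.1f}%, coverage {cov:.1f}%")
print(f"n = {len(spec_values)} injections, "
      f"mean specificity {mean(spec_values):.1f}% (SD {stdev(spec_values):.1f})")
```

Reporting one row per injection, as in a per-case table, makes the spread between injections visible, whereas pooling ROIs inflates the apparent sample size.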

      Reviewer #2 (Public Review):

      This is a straightforward manuscript assessing the specificity and efficiency of transgene expression in marmoset primary visual cortex (V1), for 4 different AAV vectors known to target transgene expression to either inhibitory cortical neurons (3 serotypes of AAV-h56D-tdTomato) or parvalbumin (PV)+ inhibitory cortical neurons in mice. Vectors are injected into the marmoset cortex and then postmortem tissue is analyzed following antibody labeling against GABA and PV. It is reported that: "in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 80% efficiency, depending on viral serotype and cortical layer. AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency."

      These claims are largely supported but slightly exaggerated relative to the actual values in the results presented. In particular, the overall efficiency for the best h56D vectors described in the results is: "Overall, across all layers, AAV9 and AAV1 showed significantly higher coverage (66.1±3.9 and 64.9%±3.7)". The highest coverage observed is just in middle layers and is also less than 80%: "(AAV9: 78.5%±9.1; AAV1: 76.9%±7.4)".

      In the abstract, we indeed summarize the overall data and round up the decimals, and state that these percentages are upper bounds that vary by serotype and layer, while in the Results we report the detailed counts with decimals. To clarify this, in the revised version of the Abstract we have changed 80% to 79% and emphasize even more clearly the dependence on serotype and layer. We have amended this sentence of the Abstract as follows: “We show that in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 79% efficiency, but this depends on viral serotype and cortical layer.”

      For the AAV-PHP.eB-S5E2 the efficiency reported in the abstract (“86-90%”) is also slightly exaggerated relative to the results: “Overall, across all layers coverage ranged from 78%±1.9 for injection volumes >300nl to 81.6%±1.8 for injection volumes of 100nl.”

      Indeed, the numbers in the Abstract are upper bounds; for example, efficiency in L4A/B with S5E2 reaches 90%. To further clarify this important point, in the revised abstract we now state: “AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency, depending on layer”.

      These data will be useful to others who might be interested in targeting transgene expression in these cell types in monkeys. Suggestions for improvement are to include more details about the vectors injected and to delete some comments about results that are not documented based on vectors that are not described (see below).

      Major comments:

      Details provided about the AAV vectors used with the h56D enhancer are not sufficient to allow assessment of their potential utility relative to the results presented. All that is provided is: "The fourth animal received 3 injections, each of a different AAV serotype (1, 7, and 9) of the AAV-h56D-tdTomato (Mehta et al., 2019), obtained from the Zemelman laboratory (UT Austin)." At a minimum, it is necessary to provide the titers of each of the vectors. It would also be helpful to provide more information about viral preparation for both these vectors and the AAVPHP.eB-S5E2.tdTomato. Notably, what purification methods were used, and what specific methods were used to measure the titers?

      We thank the Reviewer for this comment. In the revised version of the manuscript, we now provide a new Supplementary Table 1 with titers and other information for each viral vector injection. We also provide information regarding viral preparation in a new section in the Methods entitled “Viral Preparation” (p. 12).

      The first paragraph of the results includes brief anecdotal claims without any data to support them and without any details about the relevant vectors that would allow any data that might have been collected to be critically assessed. These statements should be deleted. Specifically, delete: “as well as 3 different kinds of PV-specific AAVs, specifically a mixture of AAV1-PaqR4-Flp and AAV1-h56D-mCherry-FRT (Mehta et al., 2019), an AAV1-PV1-ChR2-eYFP (donated by G. Horwitz, University of Washington),” and delete “Here we report results only from those vectors that were deemed to be most promising for use in primate cortex, based on infectivity and specificity. These were the 3 serotypes of the GABA-specific pAAV-h56D-tdTomato, and the PV-specific AAVPHP.eB-S5E2.tdTomato.” These tools might in fact be just as useful or even better than what is actually tested and reported here, but maybe the viral titer was too low to expect any expression.

      These data are indeed anecdotal, but we felt this could be useful information, potentially preventing other primate labs from wasting resources, animals and time, particularly, as some of these vectors have been reported to be selective and efficient in primate cortex, which we have not been able to confirm. We made several injections in several animals of those vectors that failed either to infect a sufficient number of cells or turned out to be poorly specific. Therefore, the negative results have been consistent in our hands. But we agree with the Reviewer that our negative results could have depended on factors such as titer. In the revised version of the manuscript, following the reviewer’s suggestion, we have deleted this information.

      Based on the description in the Methods it seems that no antibody labeling against TdTomato was used to amplify the detection of the transgenes expressed from the AAV vectors. It should be verified that this is the case - a statement could be added to the Methods.

      That is indeed the case. We used no immunohistochemistry to enhance the reporter proteins as this was unnecessary. The native/ non-amplified tdT signal was strong. This is now stated in the methods (p.12).

      Reviewer #3 (Public Review):

      Summary:

      Federer et al. describe the laminar profiles of GABA+ and of PV+ neurons in marmoset V1. They also report on the selectivity and efficiency of expression of a PV-selective enhancer (S5E2). Three further viruses were tested, with a view to characterizing the expression profiles of a GABA-selective enhancer (h56d), but these results are preliminary.

      Strengths:

      The derivation of cell-type specific enhancers is key for translating the types of circuit analyses that can be performed in mice - which rely on germline modifications for access to cell-type specific manipulation - in higher-order mammals. Federer et al. further validate the utility of S5E2 as a PV-selective enhancer in NHPs.

      Additionally, the authors characterize the laminar distribution pattern of GABA+ and PV+ cells in V1. This survey may prove valuable to researchers seeking to understand and manipulate the microcircuitry mediating the excitation-inhibition balance in this region of the marmoset brain.

      Weaknesses:

      Enhancer/promoter specificity and efficiency cannot be directly compared, because they were packaged in different serotypes of AAV.

      The three different serotypes of AAV expressing reporter under the h56D promoter were only tested once each, and all in the same animal. There are many variables that can contribute to the success (or failure) of a viral injection, so observations with an n=1 cannot be considered reliable.

      This is an important point that was also brought up by Reviewer 1, which we have addressed in our reply to Reviewer 1. For clarity and convenience, below we copy our response to Reviewer 1.

      “We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we will temper our claims about such differences and use more caution in the interpretation of these data. Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 would have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.”

      The language used throughout conflates the cell-type specificity conferred by the regulatory elements with that conferred by the serotype of the virus.

      Authors’ reply. In the revised version of the manuscript, we have corrected ambiguous language throughout.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      My Public Review comments can be addressed by dialing down the interpretation of the data or providing appropriate caveats in the presentation of the relevant results and their discussion.

      We have done so. See text additions on p. 6 of the Results and p.10 of the Discussion.

      Minor comments:

      92% of PV+ neurons in the marmoset cortex were GABAergic. Can the authors speculate on the identity of the 8% PV+/GABA- neurons (e.g., on the basis of morphology)? Are they likely excitatory? Are they more likely to represent failures of GABA staining?

      We do not know what the other 8% of PV+/GABA- neurons are because we did not perform any other kind of IHC staining. Our best guess is that at least to some extent these represent failures of GABA staining, which is always challenging to perform in primate cortex. However, in mouse PV expression has been demonstrated in a minority of excitatory neurons.

      "Coverage of the PV-AAV was high, did not depend on injection volume.." The fact that the coverage did not depend on injection volume presumably depends, at least in part, on how ROIs were selected. Surely different volumes of injection transduce different numbers of neurons at different distances from the injection track. This should be clarified.

      The ROIs were selected at the center of the injected site/expression core from sections in which the expression region encompassed all cortical layers. Of course, larger volumes of injection resulted in larger transduced regions and therefore an overall larger number of transduced neurons, but we counted cells only within 100 µm wide ROIs at the center of the injection, and the percent of transduced PV cells in this core region did not vary significantly across volumes. We have clarified the methods of ROI selection (see Methods p. 13).

      Figure 2. What is meant by “absolute” in the legend for Figure 2? (How does “mean absolute density” differ from “mean density?”)

      We meant not relative, but this is obvious from the units, so we have removed the word “absolute” in the legend.

      Some non-significant p-values are indicated by "p>0.05" whereas others are given precisely (e.g., p = 1). Please provide precise p-values throughout. Also, the p-value from a surprisingly large number of comparisons in the first section of the results is "1". Is this due to rounding? Is it possible to get significance in a Bonferroni-corrected Kruskal-Wallis test with only 6 observations per condition?

      We now report exact p values throughout the manuscript, with a couple of exceptions where, to avoid interrupting the flow of the manuscript with a large number of p values, we provide the upper bound value and state that all those comparisons were below that value. The minimum sample size for the Kruskal-Wallis test is 5 for each group being compared, and our sample is 6 per group.
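The reviewer's two statistical questions (why so many p values equal 1, and whether significance is attainable with 6 observations per group) can be explored with a small calculation. The sketch below is an illustration under stated assumptions, not the authors' procedure: the post hoc test is not described in this response, so pairwise exact rank-sum tests without ties are assumed, and the number of groups (6) is hypothetical.

```python
from math import comb

def min_exact_two_sided_p(n1: int, n2: int) -> float:
    """Smallest two-sided p-value of an exact rank-sum test without ties.

    It is reached when the two groups are completely separated.
    """
    return 2 / comb(n1 + n2, n1)

def bonferroni(p: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value; the adjustment is capped at 1."""
    return min(1.0, p * n_comparisons)

n_per_group = 6                      # sample size stated in the response
n_groups = 6                         # hypothetical number of layers compared
n_comparisons = comb(n_groups, 2)    # all pairwise comparisons (assumption)

p_min = min_exact_two_sided_p(n_per_group, n_per_group)
print(f"smallest attainable raw p: {p_min:.5f}")
print(f"after Bonferroni over {n_comparisons} comparisons: "
      f"{bonferroni(p_min, n_comparisons):.4f}")
print(f"a raw p of 0.20 is reported as: {bonferroni(0.20, n_comparisons)}")
```

Under these assumptions, completely separated groups of 6 still give an adjusted p below 0.05, and any raw p above 1 divided by the number of comparisons is reported as exactly 1 because of the cap, which would account for the many values of 1 without any rounding.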

      Figure 3: The density of tdTomato-expressing cells appears to be greater at the AAV9 injection site than at the AAV1 injection site in the example sections shown. Might some of the differences between serotypes be due to this difference? I would imagine that resolving individual cells with certainty becomes more difficult as the amount of tdTomato expression increases.

      There was an error in the scale bar of Fig. 3C, so that the AAV1 injection site was shown at higher magnification than indicated by the wrong scale bar. Hence the density of tdTomato appeared lower than it is. Moreover, the tdT expression region shown in Fig. 3A is a merge of two sections, while it is only from a single section in panels B and C, leading to the impression of higher density of infected cells in panel A. The pipette used for the injection in panel A was not inserted perfectly vertical to the cortical surface, resulting in an injection site that did not span all layers in a single section; thus, to demonstrate that the injection indeed encompassed all layers (and that the virus infected cells in all layers), we collapsed label from two sections. We have now corrected the magnification of panel C so that it matches the scale bar in panel A, and specify in the figure legend that panel A label is from two sections.

      Text regarding Figure 3: The term “injection sizes” is confusing. I think it is intended to mean “the area over which tdTomato-expressing cells were found” but this should be clarified.

      Throughout the manuscript, we have changed the term injection site to “viral-expression region”.

      Figure 3: What were the titers of the three AAV-h56D vectors?

      Titers are now reported in the new Supplementary Table 1.

      Figure 3: The yellow box in Figure 3C is slightly larger than the yellow boxes in 3A and 3B. Is this an error or should the inset of Figure 3 have a scale bar that differs from the 50 µm scale bar in 3A?

      There were indeed errors in scale bars in this figure, which we have now corrected. Now all boxes have the same scale bar.

      Was MM423 one of the animals that received the AAV-h56D injections or one of the three that received AAV-S5E2 injection?

      This is an animal that received a 315nl injection of AAV-PHP.eB-S5E2.tdTomato. This is now specified in the Methods (see p. 12) and in the new Supplementary Table 1.

      Please provide raw cell counts and post-injection survival times for each animal.

      We now provide this information in Supplementary Tables 1 and 2.

      How were the different injection volumes of the AAV-S5E2 virus arranged by animal? Which volume of the AAV-S5E2 virus was injected into the two animals who received single injections?

      We now provide this information in Supplementary Table 1.

      Figure 6A: the point is made in the text that "[the distribution of tdT+ and PV+ neurons] did not differ significantly... peaking in L2/3 and 4C " Is the fact that the number of tdT+ and PV+ peak in layers 2/3 and 4C a consequence of these layers being thicker than the others? If so, this statement seems trivial.

      No, and this is the reason why we measured density in addition to percent of cells across layers in Figure 2. Figure 2B shows that even when measuring density, therefore normalizing by area, GABA+ and PV+ cell density still peaks in L2/3 and 4. Thus, these peaks do not simply reflect the greater thickness of these layers.

      Do the authors have permission to use data from Xu et al. 2010?

      Yes, we do.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      "Viral strategies to restrict gene expression to PV neurons have also been recently developed (Mehta et al., 2019; Vormstein-Schneider et al., 2020)." Mich et al. should also be cited here. Cell Rep. 2021;34(13):108754.

      We thank the reviewer for pointing out this missing reference. This is now cited.

      “GABA density in L4C did not differ from any other layers, but the percent of GABA+ cells in L4C was significantly higher than in L1 (p=0.009) and 4A/B (p=<0.0001).” This and other similar observations depend on calculating the percentage of cells relative to the total number of DAPI-labeled cells in each layer. Since it is apparent that there must be considerable variability between layers, it would be helpful to add a histogram showing the densities of all DAPI-labeled cells for each layer.

      This is not how we calculated density. Density, as now clarified in the Results on p. 4, was defined as the number of cells per unit area. Counts in each layer were divided by that layer’s counting area. This corrects for differences in the total number of labeled cells per layer. Therefore, reporting DAPI density is not necessary (we did not count DAPI cell density per layer).
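The difference between percent of cells per layer and areal density per layer can be made concrete with a short sketch. The layer names, counts and counting areas below are invented for illustration only, and "percent" is assumed here to mean the share of all counted cells that fall in a given layer.

```python
# Hypothetical per-layer cell counts and counting areas (mm^2), invented for illustration.
counts = {"L1": 12, "L2/3": 90, "L4C": 70}
areas_mm2 = {"L1": 0.010, "L2/3": 0.030, "L4C": 0.020}

def density_per_layer(counts, areas):
    """Cells per unit area: each layer's count divided by that layer's counting area."""
    return {layer: counts[layer] / areas[layer] for layer in counts}

def percent_per_layer(counts):
    """Share of all counted cells found in each layer (not normalized by area)."""
    total = sum(counts.values())
    return {layer: 100.0 * n / total for layer, n in counts.items()}

density = density_per_layer(counts, areas_mm2)
percent = percent_per_layer(counts)

for layer in counts:
    print(f"{layer}: {percent[layer]:.1f}% of cells, {density[layer]:.0f} cells/mm^2")
```

In this made-up example the thickest layer holds the largest share of cells, while the highest density is found in a thinner layer, which is why the two measures are reported separately.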

      "Identical injection volumes of each serotype, delivered at 3 different cortical depths (see Methods), resulted in different injection sizes, suggesting the different serotypes have different capacity of infecting cortical neurons. AAV7 produced the smallest injection site, which additionally was biased to the superficial and deep layers, with only few cells expressing tdT in the middle layers (Fig. 3B). AAV9 (Fig. 3A) and AAV1 (Fig. 3C) resulted in larger injection sites and infected all cortical layers." Differences noted here might reflect either differences related to the AAV serotype or to differences in titers. Please add details about titers for each vector and add comments as appropriate. Another interpretation would be that there are differences in viral spread within the tissue.

      We have now added Supplementary Table 1, which reports titers in addition to other information about injections. The titers and volumes used for AAV9 and AAV7 were identical, while the titer for AAV1 was higher. Therefore, the differences in infectivity, particularly the much smaller expression region obtained with AAV7, cannot be attributed to titer. Likely this is due to differences in tropism and/or viral spread among serotypes. This is now discussed (see Results p. 5 bottom and p. 6 top).

      “Recently, several viral vectors have been identified that selectively and efficiently restrict gene expression to GABAergic neurons and their subtypes across several species, but a thorough validation and characterization of these vectors in primate cortex has lacked.” Is this really a fair statement, or is the characterization presented here also lacking? Methods used by others for quantifying specificity and efficiency are essentially the same as used here. See for example Mich et al. (which is not cited).

      The original validation in primates of the vectors examined in our study was based on small tissue samples and did not examine the laminar profile of transgene expression induced by these enhancer-AAVs. For example, the validation of the h56D-AAV in marmoset cortex in the original paper by Mehta et al. (2019) was performed on a tissue biopsy with no knowledge of which cortical layers were included in the tissue sample. The only study that shows laminar expression in primate cortex (Mich et al., which is now cited) shows only qualitative images of viral expression across layers, reporting total specificity and coverage pooled across samples; moreover, the study by Mich et al. deals with different PV-specific enhancers than the ones characterized in our study. Unlike any of the previous studies, here we have quantified specificity and coverage across layers.

      "Specifically, we have shown that the GABA-specific AAV9-h56D (Mehta et al., 2019) induces transgene expression in GABAergic cells with up to 91-94% specificity and 80% coverage, and the PV-specific AAV-PHP.eB-S5E2 (Vormstein-Schneider et al., 2020) induces transgene expression in PV cells with up to 98% specificity and 86-90% coverage." These statements in the discussion repeat the somewhat exaggerated coverage numbers noted above for the Abstract.

      The averages across all layers are reported in the Results. The Abstract and Discussion report upper limits, and this is made clear by stating “up to”, and now we have also added “depending on layer”.

      Reviewer #3 (Recommendations For The Authors):

      Abstract:

      • Ln 2: Can you be more specific about what you mean by the 'various functions of inhibition'? e.g. do you mean 'the various inhibitory influences on the local microcircuit' or similar?

      These are listed in the introduction to the paper but there is no space in the abstract to do so. Now the sentence reads: “various computational functions of…”.

      • Ln 5: 'has' to 'is'/'has been'.

      The grammar here is correct “has derived”.

      • Ln 6: humans are primates! Maybe change this to 'nonhuman primates'?

      We have added “non-human”

      • Ln n-1: 'viral vectors represent' -> 'viral vectors are'.

      We have changed it to “are”

      Intro:

      • Many readers may expect 'VIP' to be listed as the third major sub-class of interneurons. Could you note that the 5HT3a receptor-expressing group includes VIP cells?

      Done (p.3).

      • "Understanding cortical inhibitory neuron function in the primate is critical for understanding cortical function and dysfunction in the model system closest to humans" - this seems close to being circular logic (not quite, but close). Could you modify this sentence to reflect why understanding cortical function and dysfunction in NHP may be of interest?

      This sentence now reads (p. 3): “Understanding cortical inhibitory neuron function in the primate is critical for understanding cortical function and dysfunction in the model system closest to humans, where cortical inhibitory neuron dysfunction has been implicated in many neurological and psychiatric disorders, such as epilepsy, schizophrenia and Alzheimer’s disease (Cheah et al., 2012; Verret et al., 2012; Mukherjee et al., 2019)”. We also note that this was already stated in the previous version of the paper but in the Discussion section, which read (and still reads on p. 9, 2nd paragraph): “It is important to study inhibitory neuron function in the primate, because it is unclear whether findings in mice apply to higher species, and inhibitory neuron dysfunction in humans has been implicated in several neurological and psychiatric disorders (Marin, 2012; Goldberg and Coulter, 2013; Lewis, 2014).”

      • "In particular, two recent studies have developed recombinant adeno-associated viral vectors (AAV) that restrict gene expression to GABAergic neurons". This sentence places the emphasis on the wrong component of the technology. The fact that AAV was used is irrelevant; these constructs could equally have been packaged in a lenti, CAV, HSV, rabies, etc. The emphasis should be on the recently developed regulatory elements (the enhancers/promoters).

      Same problem with the following excerpts; this text implies that the serotype/vector confers cell-type selectivity, but the results presented do not support this assertion (the promoter/enhancer is what confers the selectivity).

      • "specifically, three serotypes of an AAV that restricts gene expression to GABAergic neurons".

      • "one serotype of an AAV that restricts gene expression to PV cells".

      • "GABA- and PV-specific AAVs".

      • "GABA-specific AAV" (in results).

      • "PV-specific AAVs".

      • "In this study, we have characterized several AAV vectors designed to restrict expression to GABAergic cells" (in discussion).

      • "GABA-virus". GABA is a NT, not a virus.

      We have modified the language in all these sections and throughout the manuscript.

      Results:

      • Enhancer specificity and efficiency cannot be directly compared, because they were packaged in different serotypes of AAV.

      We agree, and in fact we are not making comparisons between different enhancers (i.e., S5E2 and h56D).

      The three different serotypes of AAV expressing reporter under the h56D promoter were only tested once each, and all in the same animal. There are many variables that can contribute to the success (or failure) of a viral injection, so observations with an n=1 cannot be considered reliable.

      The authors need to either: (1) replicate the h56D virus injections in (at least) a second animal, or (2) rewrite the paper to focus on the AAV.PhP mDlx virus alone - for which they have adequate data - and mention the h56D data as an anecdotal result, with clear warnings about the preliminary nature of the observations due to lack of replication.

      We agree about the lack of sufficient data to make strong statements about the differences between serotypes for the h56D-AAV. In the revised version of the manuscript, following the Reviewers’ suggestion, we have chosen to temper our claims about differences between serotypes for the h56D enhancer and use more caution in the interpretation of these data. We feel that these data still demonstrate sufficiently high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested, to warrant their use in primates. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species. Our edits in regard to this point can be found in the Results on p. 6 and Discussion on p. 10.

      • Did the authors compare h56D vs mDlx? This would be a useful and interesting comparison.

      We did not.

      • 3 tissue sections were used for analysis. How were these selected? Did the authors use a stereological approach?

      For the analysis in Fig. 2, the 3 sections were randomly selected, and for the positioning of the ROIs we selected a region in dorsal V1 anterior to the posterior pole (to avoid laminar distortions due to the curvature of the brain). This is now specified (see p. 4).

      • "both GABA+ and PV+ cells peak in layers" revise for clarity (e.g., the counts peak).

      It now reads “GABA+ and PV+ cell percent and density” (see p. 4).

      • "we refer to this virus as GABA-AAV" these are 3 different viruses!

      The idea here was to use an abbreviation instead of using the full viral name every single time. Clearly the reviewer does not like this, so we have removed this convention throughout the paper and now specify the entire viral name each time.

      • "Identical injection volumes of each serotype, delivered at 3 different cortical depths (see Methods), resulted in different injection sizes". Do you mean 'resulted in different volumes of expression'?

      Yes. We have now rephrased this as follows: “…resulted in viral expression regions that differed in both size as well as laminar distribution” (p.5).

      • “suggesting the different serotypes have different capacity of infecting cortical neurons”. You can’t draw any firm conclusions from a single injection. The rest of this section of the results, along with the whole of Figure 4, and Figure 7a-d, is in danger of being misleading. Please remove. The best you can do here is to say ‘we injected 3 different viruses that express reporter under the h56D promoter. The results are shown in Figure 3, but these are anecdotal, as only a single injection of each virus was performed’. You could then note in the discussion to what extent these results are consistent with the existing literature (e.g., AAV9 often produces good coverage in NHP – anterograde and retrograde, AAV1 also works well in the CNS, although generally doesn’t infect as aggressively as AAV9. I’m not familiar with any attempts to use AAV7).

      With respect to Fig. 4, our approach in the revised version is detailed above; for convenience, we copy it below. With respect to Fig. 7A-D, we feel the results are more robust because the data from the 3 serotypes were pooled together, as the 3 serotypes similarly downregulated GABA and PV expression at the injection site, and we do not make any statement about differences among serotypes for the data shown in Fig. 7A-D.

      “In the revised version of the manuscript, following the Reviewer’s suggestion, we have chosen to temper our claims about differences between serotypes for the h56D enhancer and use more caution in the interpretation of these data (see revised text in the Results on p. 6 and in the Discussion on p. 10). We feel that these data still demonstrate sufficiently high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested, to warrant their use in primates. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.”

      • Figure 3: why the large variation in tissue quality? Are the 3 upper images taken at the same magnification? If not, they need different scale bars. The cells in A (upper row) look much smaller than those in B and C, and the size of the 'inset' box varies.

      We thank the reviewer for noticing this. We discovered an error in the scale bar of Fig. 3C, so that the AAV1 injection site was shown at higher magnification than indicated by the wrong scale bar. We have now corrected the error in scale bars. We have also fixed the different box sizes.

      • "Overall, across all layers coverage ranged from 78%±1.9 for injection volumes >300nl to 81.6%±1.8 for injection volumes of 100nl." Coverage didn't differ between layers, so revise this to: "Overall, across all layers coverage ranged from 78% to 81.6%." or give an overall mean (~80%).

      We have corrected the sentence as suggested by the Reviewer (see p. 8 first paragraph).

      • "extending farther from the borders" -> "extending beyond the borders".

      We have corrected the sentence as suggested by the Reviewer (see p. 8).

      • "The reduced GABA and PV immunoreactivity caused by the viruses implies that the specificity of the viruses we have validated in this study is likely higher than estimated". Yes, but for balance you should also note that they may harm the physiology of the cell.

      We have added a sentence acknowledging this to the Discussion. Specifically, on p. 10, we now state: “However, this reduced immunoreactivity raises concerns about the virus or high levels of reporter protein possibly harming the cell physiology.”

      Discussion:

      • "but a thorough validation and characterization of these vectors in primate cortex has lacked": better to say "has been limited", because Dimidschstein 2016 (marmoset V1) and Vormstein-Schneider 2020 (macaque S1 and PFC) both reported expression in NHP.

      We have added the following sentence to this paragraph of the Discussion: “In particular, previous studies have not characterized the specificity and coverage of these vectors across cortical layers.” (see p. 8).

      • "whether finding in mice" -> 'whether findings in mice'.

      Corrected, thanks.

      • The discussion re: species differences is missing a reference to Krienen 2020 (10.1038/s41586-020-2781-z).

      This reference has been added. Thanks.

      • “Injections of about 200nl volume resulted in higher specificity (95% across layers) and coverage” – this is misleading. The coverage was not statistically different among injection volumes.

      We have added the following sentence: “although coverage did not differ significantly across volumes.” (see p. 10).

      • "it is possible that subtle alteration of the cortical circuit upon parenchymal injection of viruses (including AAVs) leads to alteration of activity-dependent expression of PV and GABA." Or (and I would argue, more likely) the expression of large quantities of your big reporter protein compromised the function of the cell, leading to reduced expression of native proteins. You don't mention any IHC to amplify the RFP signal, so I'm assuming that your images are of direct expression. If so, you are expressing A LOT of reporter protein.

      We have added a sentence acknowledging this to the Discussion. Specifically, on p. 10, we now state: “However, this reduced immunoreactivity raises concerns about the virus or high levels of reporter protein possibly harming the cell physiology.”

      Methods:

      • It's difficult to piece together which viruses were injected in which monkeys, at what volumes, and at what titer. Please compile this info into a table for ease of reference (including any other relevant parameters).

      We now provide a Supplementary Table 1.

    1. The Gilgamesh Epic is the most notable literary product of Babylonia as yet discovered in the mounds of Mesopotamia. It recounts the exploits and adventures of a favorite hero, and in its final form covers twelve tablets, each tablet consisting of six columns (three on the obverse and three on the reverse) of about 50 lines for each column, or a total of about 3600 lines. Of this total, however, barely more than one-half has been found among the remains of the great collection of cuneiform tablets gathered by King Ashurbanapal (668–626 B.C.) in his palace at Nineveh, and discovered by Layard in 18541 in the course of his excavations of the mound Kouyunjik (opposite Mosul). The fragments of the epic painfully gathered—chiefly by George Smith—from the circa 30,000 tablets and bits of tablets brought to the British Museum were published in model form by Professor Paul Haupt;2 and that edition still remains the primary source for our study of the Epic. [10] For the sake of convenience we may call the form of the Epic in the fragments from the library of Ashurbanapal the Assyrian version, though like most of the literary productions in the library it not only reverts to a Babylonian original, but represents a late copy of a much older original. The absence of any reference to Assyria in the fragments recovered justifies us in assuming that the Assyrian version received its present form in Babylonia, perhaps in Erech; though it is of course possible that some of the late features, particularly the elaboration of the teachings of the theologians or schoolmen in the eleventh and twelfth tablets, may have been produced at least in part under Assyrian influence. A definite indication that the Gilgamesh Epic reverts to a period earlier than Hammurabi (or Hammurawi)3 i.e., beyond 2000 B. C., was furnished by the publication of a text clearly belonging to the first Babylonian dynasty (of which Hammurabi was the sixth member) in CT. 
VI, 5; which text Zimmern4 recognized as a part of the tale of Atra-ḫasis, one of the names given to the survivor of the deluge, recounted on the eleventh tablet of the Gilgamesh Epic.5 This was confirmed by the discovery6 of a [11]fragment of the deluge story dated in the eleventh year of Ammisaduka, i.e., c. 1967 B.C. In this text, likewise, the name of the deluge hero appears as Atra-ḫasis (col. VIII, 4).7 But while these two tablets do not belong to the Gilgamesh Epic and merely introduce an episode which has also been incorporated into the Epic, Dr. Bruno Meissner in 1902 published a tablet, dating, as the writing and the internal evidence showed, from the Hammurabi period, which undoubtedly is a portion of what by way of distinction we may call an old Babylonian version.8 It was picked up by Dr. Meissner at a dealer’s shop in Bagdad and acquired for the Berlin Museum. The tablet consists of four columns (two on the obverse and two on the reverse) and deals with the hero’s wanderings in search of a cure from disease with which he has been smitten after the death of his companion Enkidu. The hero fears that the disease will be fatal and longs to escape death. It corresponds to a portion of Tablet X of the Assyrian version. Unfortunately, only the lower portion of the obverse and the upper of the reverse have been preserved (57 lines in all); and in default of a colophon we do not know the numeration of the tablet in this old Babylonian edition. Its chief value, apart from its furnishing a proof for the existence of the Epic as early as 2000 B. C., lies (a) in the writing Gish instead of Gish-gi(n)-mash in the Assyrian version, for the name of the hero, (b) in the writing En-ki-dũ (abbreviated from dũg), “Enki is good,” for En-ki-dú in the Assyrian version,9 and (c) in the remarkable address of the maiden Sabitum, dwelling at the seaside, to whom Gilgamesh comes in the course of his wanderings. 
From the Assyrian version we know that the hero tells the maiden of his grief for his lost companion, and of his longing to escape the dire fate of Enkidu. In the old Babylonian fragment the answer of Sabitum is given in full, and the sad note that it strikes, showing how hopeless it is for man to try to escape death which is in store for all mankind, is as remarkable as is the philosophy of “eat, drink and be merry” which Sabitum imparts. The address indicates how early the tendency arose to attach to ancient tales the current religious teachings. [12] “Why, O Gish, does thou run about? The life that thou seekest, thou wilt not find. When the gods created mankind, Death they imposed on mankind; Life they kept in their power. Thou, O Gish, fill thy belly, Day and night do thou rejoice, Daily make a rejoicing! Day and night a renewal of jollification! Let thy clothes be clean, Wash thy head and pour water over thee! Care for the little one who takes hold of thy hand! Let the wife rejoice in thy bosom!” Such teachings, reminding us of the leading thought in the Biblical Book of Ecclesiastes,10 indicate the didactic character given to ancient tales that were of popular origin, but which were modified and elaborated under the influence of the schools which arose in connection with the Babylonian temples. The story itself belongs, therefore, to a still earlier period than the form it received in this old Babylonian version. The existence of this tendency at so early a date comes to us as a genuine surprise, and justifies the assumption that the attachment of a lesson to the deluge story in the Assyrian version, to wit, the limitation in attainment of immortality to those singled out by the gods as exceptions, dates likewise from the old Babylonian period. The same would apply to the twelfth tablet, which is almost entirely didactic, intended to illustrate the impossibility of learning anything of the fate of those who have passed out of this world. 
It also emphasizes the necessity of contenting oneself with the comfort that the care of the dead, by providing burial and food and drink offerings for them affords, as the only means of ensuring for them rest and freedom from the pangs of hunger and distress. However, it is of course possible that the twelfth tablet, which impresses one as a supplement to the adventures of Gilgamesh, ending with his return to Uruk (i.e., Erech) at the close of the eleventh tablet, may represent a later elaboration of the tendency to connect religious teachings with the exploits of a favorite hero. [13] We now have further evidence both of the extreme antiquity of the literary form of the Gilgamesh Epic and also of the disposition to make the Epic the medium of illustrating aspects of life and the destiny of mankind. The discovery by Dr. Arno Poebel of a Sumerian form of the tale of the descent of Ishtar to the lower world and her release11—apparently a nature myth to illustrate the change of season from summer to winter and back again to spring—enables us to pass beyond the Akkadian (or Semitic) form of tales current in the Euphrates Valley to the Sumerian form. Furthermore, we are indebted to Dr. Langdon for the identification of two Sumerian fragments in the Nippur Collection which deal with the adventures of Gilgamesh, one in Constantinople,12 the other in the collection of the University of Pennsylvania Museum.13 The former, of which only 25 lines are preserved (19 on the obverse and 6 on the reverse), appears to be a description of the weapons of Gilgamesh with which he arms himself for an encounter—presumably the encounter with Ḫumbaba or Ḫuwawa, the ruler of the cedar forest in the mountain.14 The latter deals with the building operations of Gilgamesh in the city of Erech. A text in Zimmern’s Sumerische Kultlieder aus altbabylonischer Zeit (Leipzig, 1913), No. 
196, appears likewise to be a fragment of the Sumerian version of the Gilgamesh Epic, bearing on the episode of Gilgamesh’s and Enkidu’s relations to the goddess Ishtar, covered in the sixth and seventh tablets of the Assyrian version.15 Until, however, further fragments shall have turned up, it would be hazardous to institute a comparison between the Sumerian and the Akkadian versions. All that can be said for the present is that there is every reason to believe in the existence of a literary form of the Epic in Sumerian which presumably antedated the Akkadian recension, [14]just as we have a Sumerian form of Ishtar’s descent into the nether world, and Sumerian versions of creation myths, as also of the Deluge tale.16 It does not follow, however, that the Akkadian versions of the Gilgamesh Epic are translations of the Sumerian, any more than that the Akkadian creation myths are translations of a Sumerian original. Indeed, in the case of the creation myths, the striking difference between the Sumerian and Akkadian views of creation17 points to the independent production of creation stories on the part of the Semitic settlers of the Euphrates Valley, though no doubt these were worked out in part under Sumerian literary influences. The same is probably true of Deluge tales, which would be given a distinctly Akkadian coloring in being reproduced and steadily elaborated by the Babylonian literati attached to the temples. The presumption is, therefore, in favor of an independent literary origin for the Semitic versions of the Gilgamesh Epic, though naturally with a duplication of the episodes, or at least of some of them, in the Sumerian narrative. Nor does the existence of a Sumerian form of the Epic necessarily prove that it originated with the Sumerians in their earliest home before they came to the Euphrates Valley. 
They may have adopted it after their conquest of southern Babylonia from the Semites who, there are now substantial grounds for believing, were the earlier settlers in the Euphrates Valley.18 We must distinguish, therefore, between the earliest literary form, which was undoubtedly Sumerian, and the origin of the episodes embodied in the Epic, including the chief actors, Gilgamesh and his companion Enkidu. It will be shown that one of the chief episodes, the encounter of the two heroes with a powerful guardian or ruler of a cedar forest, points to a western region, more specifically to Amurru, as the scene. The names of the two chief actors, moreover, appear to have been “Sumerianized” by an artificial process,19 and if this view turns out to be [15]correct, we would have a further ground for assuming the tale to have originated among the Akkadian settlers and to have been taken over from them by the Sumerians. New light on the earliest Babylonian version of the Epic, as well as on the Assyrian version, has been shed by the recovery of two substantial fragments of the form which the Epic had assumed in Babylonia in the Hammurabi period. The study of this important new material also enables us to advance the interpretation of the Epic and to perfect the analysis into its component parts. In the spring of 1914, the Museum of the University of Pennsylvania acquired by purchase a large tablet, the writing of which as well as the style and the manner of spelling verbal forms and substantives pointed distinctly to the time of the first Babylonian dynasty. The tablet was identified by Dr. Arno Poebel as part of the Gilgamesh Epic; and, as the colophon showed, it formed the second tablet of the series. 
He copied it with a view to publication, but the outbreak of the war which found him in Germany—his native country—prevented him from carrying out this intention.20 He, however, utilized some of its contents in his discussion of the historical or semi-historical traditions about Gilgamesh, as revealed by the important list of partly mythical and partly historical dynasties, found among the tablets of the Nippur collection, in which Gilgamesh occurs21 as a King of an Erech dynasty, whose father was Â, a priest of Kulab.22 The publication of the tablet was then undertaken by Dr. Stephen Langdon in monograph form under the title, “The Epic of Gilgamish.”23 In a preliminary article on the tablet in the Museum Journal, Vol. VIII, pages 29–38, Dr. Langdon took the tablet to be of the late [16]Persian period (i.e., between the sixth and third century B. C.), but his attention having been called to this error of some 1500 years, he corrected it in his introduction to his edition of the text, though he neglected to change some of his notes in which he still refers to the text as “late.”24 In addition to a copy of the text, accompanied by a good photograph, Dr. Langdon furnished a transliteration and translation with some notes and a brief introduction. The text is unfortunately badly copied, being full of errors; and the translation is likewise very defective. A careful collation with the original tablet was made with the assistance of Dr. Edward Chiera, and as a consequence we are in a position to offer to scholars a correct text. We beg to acknowledge our obligations to Dr. Gordon, the Director of the Museum of the University of Pennsylvania, for kindly placing the tablet at our disposal. Instead of republishing the text, I content myself with giving a full list of corrections in the appendix to this volume which will enable scholars to control our readings, and which will, I believe, justify the translation in the numerous passages in which it deviates from Dr. 
Langdon’s rendering. While credit should be given to Dr. Langdon for having made this important tablet accessible, the interests of science demand that attention be called to his failure to grasp the many important data furnished by the tablet, which escaped him because of his erroneous readings and faulty translations. The tablet, consisting of six columns (three on the obverse and three on the reverse), comprised, according to the colophon, 240 lines25 and formed the second tablet of the series. Of the total, 204 lines are preserved in full or in part, and of the missing thirty-six quite a number can be restored, so that we have a fairly complete tablet. The most serious break occurs at the top of the reverse, where about eight lines are missing. In consequence of this the connection between the end of the obverse (where about five lines are missing) and the beginning of the reverse is obscured, though not to the extent of our entirely losing the thread of the narrative. [17] About the same time that the University of Pennsylvania Museum purchased this second tablet of the Gilgamesh Series, Yale University obtained a tablet from the same dealer, which turned out to be a continuation of the University of Pennsylvania tablet. That the two belong to the same edition of the Epic is shown by their agreement in the dark brown color of the clay, in the writing as well as in the size of the tablet, though the characters on the Yale tablet are somewhat cramped and in consequence more difficult to read. Both tablets consist of six columns, three on the obverse and three on the reverse. The measurements of both are about the same, the Pennsylvania tablet being estimated at about 7 inches high, as against 7 2/16 inches for the Yale tablet, while the width of both is 6½ inches. The Yale tablet is, however, more closely written and therefore has a larger number of lines than the Pennsylvania tablet. 
The colophon to the Yale tablet is unfortunately missing, but from internal evidence it is quite certain that the Yale tablet follows immediately upon the Pennsylvania tablet and, therefore, may be set down as the third of the series. The obverse is very badly preserved, so that only a general view of its contents can be secured. The reverse contains serious gaps in the first and second columns. The scribe evidently had a copy before him which he tried to follow exactly, but finding that he could not get all of the copy before him in the six columns, he continued the last column on the edge. In this way we obtain for the sixth column 64 lines as against 45 for column IV, and 47 for column V, and a total of 292 lines for the six columns. Subtracting the 16 lines written on the edge leaves us 276 lines for our tablet as against 240 for its companion. The width of each column being the same on both tablets, the difference of 36 lines is made up by the closer writing. Both tablets have peculiar knobs at the sides, the purpose of which is evidently not to facilitate holding the tablet in one’s hand while writing or reading it, as Langdon assumed26 (it would be quite impracticable for this purpose), but simply to protect the tablet in its position on a shelf, where it would naturally be placed on the edge, just as we arrange books on a shelf. Finally be it noted that these two tablets of the old Babylonian version do not belong to the same edition as the Meissner tablet above described, for the latter consists [18]of two columns each on obverse and reverse, as against three columns each in the case of our two tablets. We thus have the interesting proof that as early as 2000 B.C. there were already several editions of the Epic. As to the provenance of our two tablets, there are no definite data, but it is likely that they were found by natives in the mounds at Warka, from which about the year 1913, many tablets came into the hands of dealers. 
It is likely that where two tablets of a series were found, others of the series were also dug up, and we may expect to find some further portions of this old Babylonian version turning up in the hands of other dealers or in museums. Coming to the contents of the two tablets, the Pennsylvania tablet deals with the meeting of the two heroes, Gilgamesh and Enkidu, their conflict, followed by their reconciliation, while the Yale tablet in continuation takes up the preparations for the encounter of the two heroes with the guardian of the cedar forest, Ḫumbaba—but probably pronounced Ḫubaba27—or, as the name appears in the old Babylonian version, Ḫuwawa. The two tablets correspond, therefore, to portions of Tablets I to V of the Assyrian version;28 but, as will be shown in detail further on, the number of completely parallel passages is not large, and the Assyrian version shows an independence of the old Babylonian version that is larger than we had reason to expect. In general, it may be said that the Assyrian version is more elaborate, which points to its having received its present form at a considerably later period than the old Babylonian version.29 On the other hand, we already find in the Babylonian version the tendency towards repetition, which is characteristic of Babylonian-Assyrian tales in general. Through the two Babylonian tablets we are enabled to fill out certain details [19]of the two episodes with which they deal: (1) the meeting of Gilgamesh and Enkidu, and (2) the encounter with Ḫuwawa; while their greatest value consists in the light that they throw on the gradual growth of the Epic until it reached its definite form in the text represented by the fragments in Ashurbanapal’s Library. Let us now take up the detailed analysis, first of the Pennsylvania tablet and then of the Yale tablet. The Pennsylvania tablet begins with two dreams recounted by Gilgamesh to his mother, which the latter interprets as presaging the coming of Enkidu to Erech. 
In the one, something like a heavy meteor falls from heaven upon Gilgamesh and almost crushes him. With the help of the heroes of Erech, Gilgamesh carries the heavy burden to his mother Ninsun. The burden, his mother explains, symbolizes some one who, like Gilgamesh, is born in the mountains, to whom all will pay homage and of whom Gilgamesh will become enamoured with a love as strong as that for a woman. In a second dream, Gilgamesh sees some one who is like him, who brandishes an axe, and with whom he falls in love. This personage, the mother explains, is again Enkidu. Langdon is of the opinion that these dreams are recounted to Enkidu by a woman with whom Enkidu cohabits for six days and seven nights and who weans Enkidu from association with animals. This, however, cannot be correct. The scene between Enkidu and the woman must have been recounted in detail in the first tablet, as in the Assyrian version,30 whereas here in the second tablet we have the continuation of the tale with Gilgamesh recounting his dreams directly to his mother. The story then continues with the description of the coming of Enkidu, conducted by the woman to the outskirts of Erech, where food is given him. The main feature of the incident is the conversion of Enkidu to civilized life. Enkidu, who hitherto had gone about naked, is clothed by the woman. Instead of sucking milk and drinking from a trough like an animal, food and strong drink are placed before him, and he is taught how to eat and drink in human fashion. In human fashion he also becomes drunk, and his “spree” is naïvely described: “His heart became glad and his face shone.”31 [20]Like an animal, Enkidu’s body had hitherto been covered with hair, which is now shaved off. He is anointed with oil, and clothed “like a man.” Enkidu becomes a shepherd, protecting the fold against wild beasts, and his exploit in dispatching lions is briefly told. 
At this point—the end of column 3 (on the obverse), i.e., line 117, and the beginning of column 4 (on the reverse), i.e., line 131—a gap of 13 lines—the tablet is obscure, but apparently the story of Enkidu’s gradual transformation from savagery to civilized life is continued, with stress upon his introduction to domestic ways with the wife chosen or decreed for him, and with work as part of his fate. All this has no connection with Gilgamesh, and it is evident that the tale of Enkidu was originally an independent tale to illustrate the evolution of man’s career and destiny, how through intercourse with a woman he awakens to the sense of human dignity, how he becomes accustomed to the ways of civilization, how he passes through the pastoral stage to higher walks of life, how the family is instituted, and how men come to be engaged in the labors associated with human activities. In order to connect this tale with the Gilgamesh story, the two heroes are brought together; the woman taking on herself, in addition to the rôle of civilizer, that of the medium through which Enkidu is brought to Gilgamesh. The woman leads Enkidu from the outskirts of Erech into the city itself, where the people on seeing him remark upon his likeness to Gilgamesh. He is the very counterpart of the latter, though somewhat smaller in stature. There follows the encounter between the two heroes in the streets of Erech, where they engage in a fierce combat. Gilgamesh is overcome by Enkidu and is enraged at being thrown to the ground. The tablet closes with the endeavor of Enkidu to pacify Gilgamesh. Enkidu declares that the mother of Gilgamesh has exalted her son above the ordinary mortal, and that Enlil himself has singled him out for royal prerogatives. After this, we may assume, the two heroes become friends and together proceed to carry out certain exploits, the first of which is an attack upon the mighty guardian of the cedar forest. 
This is the main episode in the Yale tablet, which, therefore, forms the third tablet of the old Babylonian version. In the first column of the obverse of the Yale tablet, which is badly preserved, it would appear that the elders of Erech (or perhaps the people) are endeavoring to dissuade Gilgamesh from making the [21]attempt to penetrate to the abode of Ḫuwawa. If this is correct, then the close of the first column may represent a conversation between these elders and the woman who accompanies Enkidu. It would be the elders who are represented as “reporting the speech to the woman,” which is presumably the determination of Gilgamesh to fight Ḫuwawa. The elders apparently desire Enkidu to accompany Gilgamesh in this perilous adventure, and with this in view appeal to the woman. In the second column after an obscure reference to the mother of Gilgamesh—perhaps appealing to the sun-god—we find Gilgamesh and Enkidu again face to face. From the reference to Enkidu’s eyes “filled with tears,” we may conclude that he is moved to pity at the thought of what will happen to Gilgamesh if he insists upon carrying out his purpose. Enkidu, also, tries to dissuade Gilgamesh. This appears to be the main purport of the dialogue between the two, which begins about the middle of the second column and extends to the end of the third column. Enkidu pleads that even his strength is insufficient, “My arms are lame, My strength has become weak.” (lines 88–89) Gilgamesh apparently asks for a description of the terrible tyrant who thus arouses the fear of Enkidu, and in reply Enkidu tells him how at one time, when he was roaming about with the cattle, he penetrated into the forest and heard the roar of Ḫuwawa which was like that of a deluge. The mouth of the tyrant emitted fire, and his breath was death. 
It is clear, as Professor Haupt has suggested,32 that Enkidu furnishes the description of a volcano in eruption, with its mighty roar, spitting forth fire and belching out a suffocating smoke. Gilgamesh is, however, undaunted and urges Enkidu to accompany him in the adventure. “I will go down to the forest,” says Gilgamesh, if the conjectural restoration of the line in question (l. 126) is correct. Enkidu replies by again drawing a lurid picture of what will happen “When we go (together) to the forest…….” This speech of Enkidu is continued on the reverse. In reply Gilgamesh emphasizes his reliance upon the good will of Shamash and reproaches Enkidu with cowardice. He declares himself superior to Enkidu’s warning, and in bold terms [22]says that he prefers to perish in the attempt to overcome Ḫuwawa rather than abandon it. “Wherever terror is to be faced, Thou, forsooth, art in fear of death. Thy prowess lacks strength. I will go before thee, Though thy mouth shouts to me: ‘thou art afraid to approach,’ If I fall, I will establish my name.” (lines 143–148) There follows an interesting description of the forging of the weapons for the two heroes in preparation for the encounter.33 The elders of Erech when they see these preparations are stricken with fear. They learn of Ḫuwawa’s threat to annihilate Gilgamesh if he dares to enter the cedar forest, and once more try to dissuade Gilgamesh from the undertaking. “Thou art young, O Gish, and thy heart carries thee away, Thou dost not know what thou proposest to do.” (lines 190–191) They try to frighten Gilgamesh by repeating the description of the terrible Ḫuwawa. Gilgamesh is still undaunted and prays to his patron deity Shamash, who apparently accords him a favorable “oracle” (têrtu). The two heroes arm themselves for the fray, and the elders of Erech, now reconciled to the perilous undertaking, counsel Gilgamesh to take provision along for the undertaking. 
They urge Gilgamesh to allow Enkidu to take the lead, for “He is acquainted with the way, he has trodden the road [to] the entrance of the forest.” (lines 252–253) The elders dismiss Gilgamesh with fervent wishes that Enkidu may track out the “closed path” for Gilgamesh, and commit him to the care of Lugalbanda—here perhaps an epithet of Shamash. They advise Gilgamesh to perform certain rites, to wash his feet in the stream of Ḫuwawa and to pour out a libation of water to Shamash. Enkidu follows in a speech likewise intended to encourage the hero; and with the actual beginning of the expedition against Ḫuwawa the tablet ends. The encounter itself, with the triumph of the two heroes, must have been described in the fourth tablet. [23] Now before taking up the significance of the additions to our knowledge of the Epic gained through these two tablets, it will be well to discuss the forms in which the names of the two heroes and of the ruler of the cedar forest occur in our tablets. As in the Meissner fragment, the chief hero is invariably designated as dGish in both the Pennsylvania and Yale tablets; and we may therefore conclude that this was the common form in the Hammurabi period, as against the writing dGish-gì(n)-mash34 in the Assyrian version. Similarly, as in the Meissner fragment, the second hero’s name is always written En-ki-dũ35 (abbreviated from dúg) as against En-ki-dú in the Assyrian version. Finally, we encounter in the Yale tablet for the first time the writing Ḫu-wa-wa as the name of the guardian of the cedar forest, as against Ḫum-ba-ba in the Assyrian version, though in the latter case, as we may now conclude from the Yale tablet, the name should rather be read Ḫu-ba-ba.36 The variation in the writing of the latter name is interesting as pointing to the aspirate pronunciation of the labial in both instances. 
The name would thus present a complete parallel to the Hebrew name Ḫowawa (or Ḫobab) who appears as the brother-in-law of Moses in the P document, Numbers 10, 29.37 Since the name also occurs, written precisely as in the Yale tablet, among the “Amoritic” names in the important lists published by Dr. Chiera,38 there can be no doubt that [24]Ḫuwawa or Ḫubaba is a West Semitic name. This important fact adds to the probability that the “cedar forest” in which Ḫuwawa dwells is none other than the Lebanon district, famed since early antiquity for its cedars. This explanation of the name Ḫuwawa disposes of suppositions hitherto brought forward for an Elamitic origin. Gressmann39 still favors such an origin, though realizing that the description of the cedar forest points to the Amanus or Lebanon range. In further confirmation of the West Semitic origin of the name, we have in Lucian, De Dea Syria, § 19, the name Kombabos40 (the guardian of Stratonika), which forms a perfect parallel to Ḫu(m)baba. Of the important bearings of this western character of the name Ḫuwawa on the interpretation and origin of the Gilgamesh Epic, suggesting that the episode of the encounter between the tyrant and the two heroes rests upon a tradition of an expedition against the West or Amurru land, we shall have more to say further on. The variation in the writing of the name Enkidu is likewise interesting. It is evident that the form in the old Babylonian version with the sign dũ (i.e., dúg) is the original, for it furnishes us with a suitable etymology “Enki is good.” The writing with dúg, pronounced dū, also shows that the sign dú as the third element in the form which the name has in the Assyrian version is to be read dú, and that former readings like Ea-bani must be definitely abandoned.41 The form with dú is clearly a phonetic writing of the Sumerian name, the sign dú being chosen to indicate the pronunciation (not the ideograph) of the third element dúg. 
This is confirmed by the writing En-gi-dú in the syllabary CT XVIII, 30, 10. The phonetic writing is, therefore, a warning against any endeavor to read the name by an Akkadian transliteration of the signs. This would not of itself prove that Enkidu is of Sumerian origin, for it might well be that the writing En-ki-dú is an endeavor to give a Sumerian aspect to a name that may have been foreign. The element dúg corresponds to the Semitic ṭâbu, “good,” and En-ki being originally a designation of a deity as the “lord of the land,” which would be the Sumerian [25]manner of indicating a Semitic Baal, it is not at all impossible that En-ki-dúg may be the “Sumerianized” form of a Semitic בַּעַל טוֹב “Baal is good.” It will be recalled that in the third column of the Yale tablet, Enkidu speaks of himself in his earlier period while still living with cattle, as wandering into the cedar forest of Ḫuwawa, while in another passage (ll. 252–253) he is described as “acquainted with the way … to the entrance of the forest.” This would clearly point to the West as the original home of Enkidu. We are thus led once more to Amurru—taken as a general designation of the West—as playing an important role in the Gilgamesh Epic.42 If Gilgamesh’s expedition against Ḫuwawa of the Lebanon district recalls a Babylonian campaign against Amurru, Enkidu’s coming from his home, where, as we read repeatedly in the Assyrian version, “He ate herbs with the gazelles, Drank out of a trough with cattle,”43 may rest on a tradition of an Amorite invasion of Babylonia. The fight between Gilgamesh and Enkidu would fit in with this tradition, while the subsequent reconciliation would be the form in which the tradition would represent the enforced union between the invaders and the older settlers. 
Leaving this aside for the present, let us proceed to a consideration of the relationship of the form dGish, for the chief personage in the Epic in the old Babylonian version, to dGish-gi(n)-mash in the Assyrian version. Of the meaning of Gish there is fortunately no doubt. It is clearly the equivalent to the Akkadian zikaru, “man” (Brünnow No. 5707), or possibly rabû, “great” (Brünnow No. 5704). Among various equivalents, the preference is to be given to itlu, “hero.” The determinative for deity stamps the person so designated as deified, or as in part divine, and this is in accord with the express statement in the Assyrian version of the Gilgamesh Epic which describes the hero as “Two-thirds god and one-third human.”44 [26]Gish is, therefore, the hero-god par excellence; and this shows that we are not dealing with a genuine proper name, but rather with a descriptive attribute. Proper names are not formed in this way, either in Sumerian or Akkadian. Now what relation does this form Gish bear to the group of signs with which the name of the hero is invariably written in the Assyrian version, the form which was at first read dIz-tu-bar or dGish-du-bar by scholars, until Pinches found in a neo-Babylonian syllabary45 the equation of it with Gi-il-ga-mesh? Pinches’ discovery pointed conclusively to the popular pronunciation of the hero’s name as Gilgamesh; and since Aelian (De natura Animalium XII, 2) mentions a Babylonian personage Gilgamos (though what he tells us of Gilgamos does not appear in our Epic, but seems to apply to Etana, another figure of Babylonian mythology), there seemed to be no further reason to question that the problem had been solved. Besides, in a later Syriac list of Babylonian kings found in the Scholia of Theodor bar Koni, the name גלמגום with a variant גמיגמוס occurs,46 and it is evident that we have here again the Gi-il-ga-mesh, discovered by Pinches. 
The existence of an old Babylonian hero Gilgamesh who was likewise a king is thus established, as well as his identification with the hero whose name is so written in the Assyrian version. It is evident that we cannot read this name as Iz-tu-bar or Gish-du-bar, but that we must read the first sign as Gish and the third as Mash, while for the second we must assume a reading Gìn or Gi. This would give us Gish-gì(n)-mash which is clearly again (like En-ki-dú) not an etymological writing but a phonetic one, intended to convey an approach to the popular pronunciation. Gi-il-ga-mesh might well be merely a variant for Gish-ga-mesh, or vice versa, and this would come close to Gish-gi-mash. Now, when we have a name the pronunciation of which is not definite but approximate, and which is written in various ways, the probabilities are that the name is foreign. A foreign name might naturally be spelled in various ways. The [27]Epic in the Assyrian version clearly depicts dGish-gì(n)-mash as a conqueror of Erech, who forces the people into subjection, and whose autocratic rule leads the people of Erech to implore the goddess Aruru to create a rival to him who may withstand him. In response to this appeal dEnkidu is formed out of dust by Aruru and eventually brought to Erech.47 Gish-gì(n)-mash or Gilgamesh is therefore in all probability a foreigner; and the simplest solution suggested by the existence of the two forms (1) Gish in the old Babylonian version and (2) Gish-gì(n)-mash in the Assyrian version, is to regard the former as an abbreviation, which seemed appropriate, because the short name conveyed the idea of the “hero” par excellence. 
If Gish-gì(n)-mash is a foreign name, one would think in the first instance of Sumerian; but here we encounter a difficulty in the circumstance that outside of the Epic this conqueror and ruler of Erech appears in quite a different form, namely, as dGish-bil-ga-mesh, with dGish-gibil(or bìl)-ga-mesh and dGish-bil-ge-mesh as variants.48 In the remarkable list of partly mythological and partly historical dynasties, published by Poebel,49 the fifth member of the first dynasty of Erech appears as dGish-bil-ga-mesh; and similarly in an inscription of the days of Sin-gamil, dGish-bil-ga-mesh is mentioned as the builder of the wall of Erech.50 Moreover, in the several fragments of the Sumerian version of the Epic we have invariably the form dGish-bil-ga-mesh. It is evident, therefore, that this is the genuine form of the name in Sumerian and presumably, therefore, the oldest form. By way of further confirmation we have in the syllabary above referred to, CT, XVIII, 30, 6–8, three designations of our hero, viz: dGish-gibil(or bíl)-ga-mesh; muḳ-tab-lu (“warrior”); a-lik pa-na (“leader”). All three designations are set down as the equivalent of the Sumerian Esigga imin, i.e., “the seven-fold hero.” [28] Of the same general character is the equation in another syllabary:51 Esigga-tuk and its equivalent Gish-tuk = “the one who is a hero.” Furthermore, the name occurs frequently in “Temple” documents of the Ur dynasty in the form dGish-bil-ga-mesh52 with dGish-bil-gi(n)-mesh as a variant.53 In a list of deities (CT XXV, 28, K 7659) we likewise encounter dGish-gibil(or bíl)-ga-mesh, and lastly in a syllabary we have the equation54 dGish-gi-mas-[si?] = dGish-bil-[ga-mesh]. The variant Gish-gibil for Gish-bil may be disposed of readily, in view of the frequent confusion or interchange of the two signs Bil (Brünnow No. 4566) and Gibil or Bíl (Brünnow No. 4642) which has also the value Gi (Brünnow 4641), so that we might also read Gish-gi-ga-mesh. 
Both signs convey the idea of “fire,” “renew,” etc.; both revert to the picture of flames of fire, in the one case with a bowl (or some such object) above it, in the other the flames issuing apparently from a torch.55 The meaning of the name is not affected whether we read dGish-bil-ga-mesh or dGish-gibil(or bíl)-ga-mesh, for the middle element in the latter case being identical with the fire-god, written dBil-gi and to be pronounced in the inverted form as Gibil with -ga (or ge) as the phonetic complement; it is equivalent, therefore, to the writing bil-ga in the former case. Now Gish-gibil or Gish-bíl conveys the idea of abu, “father” (Brünnow No. 5713), just as Bil (Brünnow No. 4579) has this meaning, while Pa-gibil-(ga) or Pa-bíl-ga is abu abi, “grandfather.”56 This meaning may be derived from Gibil, as also from Bíl = išatu, “fire,” then eššu, “new,” then abu, “father,” as the renewer or creator. Gish with Bíl or Gibil would, therefore, be “the father-man” or “the father-hero,” [29]i.e., again the hero par excellence, the original hero, just as in Hebrew and Arabic ab is used in this way.57 The syllable ga being a phonetic complement, the element mesh is to be taken by itself and to be explained, as Poebel suggested, as “hero” (itlu, Brünnow No. 5967). We would thus obtain an entirely artificial combination, “man (or hero), father, hero,” which would simply convey in an emphatic manner the idea of the Ur-held, the original hero, the father of heroes as it were—practically the same idea, therefore, as the one conveyed by Gish alone, as the hero par excellence. Our investigation thus leads us to a substantial identity between Gish and the longer form Gish-bil(or bíl)-ga-mesh, and the former might, therefore, well be used as an abbreviation of the latter. 
Both the shorter and the longer forms are descriptive epithets based on naive folk etymology, rather than personal names, just as in the designation of our hero as muḳtablu, the “fighter,” or as âlik pâna, “the leader,” or as Esigga imin, “the seven-fold hero,” or Esigga tuk, “the one who is a hero,” are descriptive epithets, and as Atra-ḫasis, “the very wise one,” is such an epithet for the hero of the deluge story. The case is different with Gi-il-ga-mesh, or Gish-gì(n)-mash, which represent the popular and actual pronunciation of the name, or at least the approach to such pronunciation. Such forms, stripped as they are of all artificiality, impress one as genuine names. The conclusion to which we are thus led is that Gish-bil(or bíl)-ga-mesh is a play upon the genuine name, to convey to those to whom the real name, as that of a foreigner, would suggest no meaning an interpretation fitting in with his character. In other words, Gish-bil-ga-mesh is a “Sumerianized” form of the name, introduced into the Sumerian version of the tale which became a folk-possession in the Euphrates Valley. Such plays upon names to suggest the character of an individual or some incident are familiar to us from the narratives in Genesis.58 They do not constitute genuine etymologies and are rarely of use in leading to a correct etymology. Reuben, e.g., certainly does not mean “Yahweh has seen my affliction,” which the mother is supposed to have exclaimed at [30]the birth (Genesis 29, 32), with a play upon ben and be’onyi, any more than Judah means “I praise Yahweh” (v. 35), though it does contain the divine name (Yehô) as an element. The play on the name may be close or remote, as long as it fulfills its function of suggesting an etymology that is complimentary or appropriate. 
In this way, an artificial division and at the same time a distortion of a foreign name like Gilgamesh into several elements, Gish-bil-ga-mesh, is no more violent than, for example, the explanation of Issachar or rather Issaschar as “God has given my hire” (Genesis 30, 18) with a play upon the element sechar, and as though the name were to be divided into Yah (“God”) and sechar (“hire”); or the popular name of Alexander among the Arabs as Zu’l Karnaini, “the possessor of the two horns,” with a suggestion of his conquest of two hemispheres, or what not.59 The element Gil in Gilgamesh would be regarded as a contraction of Gish-bil or gi-bil, in order to furnish the meaning “father-hero,” or Gil might be looked upon as a variant for Gish, which would give us the “phonetic” form in the Assyrian version dGish-gi-mash,60 as well as such a variant writing dGish-gi-mas-(si). Now a name like Gilgamesh, upon which we may definitely settle as coming closest to the genuine form, certainly impresses one as foreign, i.e., it is neither Sumerian nor Akkadian; and we have already suggested that the circumstance that the hero of the Epic is portrayed as a conqueror of Erech, and a rather ruthless one at that, points to a tradition of an invasion of the Euphrates Valley as the background for the episode in the first tablet of the series. Now it is significant that many of the names in the “mythical” dynasties, as they appear in Poebel’s list,61 are likewise foreign, such as Mes-ki-in-ga-še-ir, son of the god Shamash (and the founder of the “mythical” dynasty of Erech of which dGish-bil-ga-mesh is the fifth member),62 and En-me-ir-kár his son. 
In a still earlier “mythical” dynasty, we encounter names like Ga-lu-mu-um, Zu-ga-gi-ib, Ar-pi, [31]E-ta-na,63 which are distinctly foreign, while such names as En-me(n)-nun-na and Bar-sal-nun-na strike one again as “Sumerianized” names rather than as genuine Sumerian formations.64 Some of these names, as Galumum, Arpi and Etana, are so Amoritic in appearance, that one may hazard the conjecture of their western origin. May Gilgamesh likewise belong to the Amurru65 region, or does he represent a foreigner from the East in contrast to Enkidu, whose name, we have seen, may have been Baal-Ṭôb in the West, with which region he is according to the Epic so familiar? It must be confessed that the second element ga-mesh would fit in well with a Semitic origin for the name, for the element impresses one as the participial form of a Semitic stem g-m-š, just as in the second element of Meskin-gašer we have such a form. Gil might then be the name of a West-Semitic deity. Such conjectures, however, can for the present not be substantiated, and we must content ourselves with the conclusion that Gilgamesh as the real name of the hero, or at least the form which comes closest to the real name, points to a foreign origin for the hero, and that such forms as dGish-bil-ga-mesh and dGish-bíl-gi-mesh and other variants are “Sumerianized” forms for which an artificial etymology was brought forward to convey the [32]idea of the “original hero” or the hero par excellence. By means of this “play” on the name, which reverts to the compilers of the Sumerian version of the Epic, Gilgamesh was converted into a Sumerian figure, just as the name Enkidu may have been introduced as a Sumerian translation of his Amoritic name. dGish at all events is an abbreviated form of the “Sumerianized” name, introduced by the compilers of the earliest Akkadian version, which was produced naturally under the influence of the Sumerian version. 
Later, as the Epic continued to grow, a phonetic writing was introduced, dGish-gi-mash, which is in a measure a compromise between the genuine name and the “Sumerianized” form, but at the same time an approach to the real pronunciation. Next to the new light thrown upon the names and original character of the two main figures of the Epic, one of the chief points of interest in the Pennsylvania fragment is the proof that it furnishes for a striking resemblance of the two heroes, Gish and Enkidu, to one another. In interpreting the dream of Gish, his mother, Ninsun, lays stress upon the fact that the dream portends the coming of someone who is like Gish, “born in the field and reared in the mountain” (lines 18–19). Both, therefore, are shown by this description to have come to Babylonia from a mountainous region, i.e., they are foreigners; and in the case of Enkidu we have seen that the mountain in all probability refers to a region in the West, while the same may also be the case with Gish. The resemblance of the two heroes to one another extends to their personal appearance. When Enkidu appears on the streets of Erech, the people are struck by this resemblance. They remark that he is “like Gish,” though “shorter in stature” (lines 179–180). Enkidu is described as a rival or counterpart.66 This relationship between the two is suggested also by the Assyrian version. In the creation of Enkidu by Aruru, the people urge the goddess to create the “counterpart” (zikru) of Gilgamesh, someone who will be like him (ma-ši-il) (Tablet I, 2, 31). 
Enkidu not only comes from the mountain,67 but the mountain is specifically designated [33]as his birth-place (I, 4, 2), precisely as in the Pennsylvania tablet, while in another passage he is also described, as in our tablet, as “born in the field.”68 Still more significant is the designation of Gilgamesh as the talimu, “younger brother,” of Enkidu.69 In accord with this, we find Gilgamesh in his lament over Enkidu describing him as a “younger brother” (ku-ta-ni);70 and again in the last tablet of the Epic, Gilgamesh is referred to as the “brother” of Enkidu.71 This close relationship reverts to the Sumerian version, for the Constantinople fragment (Langdon, above, p. 13) begins with the designation of Gish-bil-ga-mesh as “his brother.” By “his” no doubt Enkidu is meant. Likewise in the Sumerian text published by Zimmern (above, p. 13) Gilgamesh appears as the brother of Enkidu (rev. 1, 17). Turning to the numerous representations of Gilgamesh and Enkidu on Seal Cylinders,72 we find this resemblance of the two heroes to each other strikingly confirmed. Both are represented as bearded, with the strands arranged in the same fashion. The face in both cases is broad, with curls protruding at the side of the head, though at times these curls are lacking in the case of Enkidu. What is particularly striking is to find Gilgamesh generally a little taller than Enkidu, thus bearing out the statement in the Pennsylvania tablet that Enkidu is “shorter in stature.” There are, to be sure, also some distinguishing marks between the two. Thus Enkidu is generally represented with animal hoofs, but not always.73 Enkidu is commonly portrayed with the horns of a bison, but again this sign is wanting in quite a number of instances.74 The hoofs and the horns mark the period when Enkidu lived with animals and much like an [34]animal. Most remarkable, however, of all are cylinders on which we find the two heroes almost exactly alike as, for example, Ward No. 
199 where two figures, the one a duplicate of the other (except that one is just a shade taller), are in conflict with each other. Dr. Ward was puzzled by this representation and sets it down as a “fantastic” scene in which “each Gilgamesh is stabbing the other.” In the light of the Pennsylvania tablet, this scene is clearly the conflict between the two heroes described in column 6, preliminary to their forming a friendship. Even in the realm of myth the human experience holds good that there is nothing like a good fight as a basis for a subsequent alliance. The fragment describes this conflict as a furious one in which Gilgamesh is worsted, and his wounded pride assuaged by the generous victor, who comforts his vanquished enemy by the assurance that he was destined for something higher than to be a mere “Hercules.” He was singled out for the exercise of royal authority. True to the description of the two heroes in the Pennsylvania tablet as alike, one the counterpart of the other, the seal cylinder portrays them almost exactly alike, as alike as two brothers could possibly be; with just enough distinction to make it clear on close inspection that two figures are intended and not one repeated for the sake of symmetry. There are slight variations in the manner in which the hair is worn, and slightly varying expressions of the face, just enough to make it evident that the one is intended for Gilgamesh and the other for Enkidu. When, therefore, in another specimen, No. 173, we find a Gilgamesh holding his counterpart by the legs, it is merely another aspect of the fight between the two heroes, one of whom is intended to represent Enkidu, and not, as Dr. Ward supposed, a grotesque repetition of Gilgamesh.75 The description of Enkidu in the Pennsylvania tablet as a parallel figure to Gilgamesh leads us to a consideration of the relationship of the two figures to one another. 
Many years ago it was pointed out that the Gilgamesh Epic was a composite tale in which various stories of an independent origin had been combined and brought into more or less artificial connection with the heros eponymos of southern Babylonia.76 We may now go a step further and point out that not [35]only is Enkidu originally an entirely independent figure, having no connection with Gish or Gilgamesh, but that the latter is really depicted in the Epic as the counterpart of Enkidu, a reflection who has been given the traits of extraordinary physical power that belong to Enkidu. This is shown in the first place by the fact that in the encounter it is Enkidu who triumphs over Gilgamesh. The entire analysis of the episode of the meeting between the two heroes as given by Gressmann77 must be revised. It is not Enkidu who is terrified and who is warned against the encounter. It is Gilgamesh who, during the night on his way from the house in which the goddess Ishḫara lies, encounters Enkidu on the highway. Enkidu “blocks the path”78 of Gilgamesh. He prevents Gilgamesh from re-entering the house,79 and the two attack each other “like oxen.”80 They grapple with each other, and Enkidu forces Gilgamesh to the ground. Enkidu is, therefore, the real hero whose traits of physical prowess are afterwards transferred to Gilgamesh. Similarly in the next episode, the struggle against Ḫuwawa, the Yale tablet makes it clear that in the original form of the tale Enkidu is the real hero. All warn Gish against the undertaking—the elders of Erech, Enkidu, and also the workmen. “Why dost thou desire to do this?”81 they say to him. “Thou art young, and thy heart carries thee away. Thou knowest not what thou proposest to do.”82 This part of the incident is now better known to us through the latest fragment of the Assyrian version discovered and published by King.83 The elders say to Gilgamesh: “Do not trust, O Gilgamesh, in thy strength! Be warned(?) against trusting to thy attack! 
The one who goes before will save his companion,84 He who has foresight will save his friend.85 [36] Let Enkidu go before thee. He knows the roads to the cedar forest; He is skilled in battle and has seen fight.” Gilgamesh is sufficiently impressed by this warning to invite Enkidu to accompany him on a visit to his mother, Ninsun, for the purpose of receiving her counsel.86 It is only after Enkidu, who himself hesitates and tries to dissuade Gish, decides to accompany the latter that the elders of Erech are reconciled and encourage Gish for the fray. The two in concert proceed against Ḫuwawa. Gilgamesh alone cannot carry out the plan. Now when a tale thus associates two figures in one deed, one of the two has been added to the original tale. In the present case there can be little doubt that Enkidu, without whom Gish cannot proceed, who is specifically described as “acquainted with the way … to the entrance of the forest”87 in which Ḫuwawa dwells is the original vanquisher. Naturally, the Epic aims to conceal this fact as much as possible ad majorem gloriam of Gilgamesh. It tries to put the one who became the favorite hero into the foreground. Therefore, in both the Babylonian and the Assyrian version Enkidu is represented as hesitating, and Gilgamesh as determined to go ahead. Gilgamesh, in fact, accuses Enkidu of cowardice and boldly declares that he will proceed even though failure stare him in the face.88 Traces of the older view, however, in which Gilgamesh is the one for whom one fears the outcome, crop out; as, for example, in the complaint of Gilgamesh’s mother to Shamash that the latter has stirred the heart of her son to take the distant way to Ḫu(m)baba, “To a fight unknown to him, he advances, An expedition unknown to him he undertakes.”89 Ninsun evidently fears the consequences when her son informs her of his intention and asks her counsel. 
The answer of Shamash is not preserved, but no doubt it was of a reassuring character, as was the answer of the Sun-god to Gish’s appeal and prayer as set forth in the Yale tablet.90 [37] Again, as a further indication that Enkidu is the real conqueror of Ḫuwawa, we find the coming contest revealed to Enkidu no less than three times in dreams, which Gilgamesh interprets.91 Since the person who dreams is always the one to whom the dream applies, we may see in these dreams a further trace of the primary rôle originally assigned to Enkidu. Another exploit which, according to the Assyrian version, the two heroes perform in concert is the killing of a bull, sent by Anu at the instance of Ishtar to avenge an insult offered to the goddess by Gilgamesh, who rejects her offer of marriage. In the fragmentary description of the contest with the bull, we find Enkidu “seizing” the monster by “its tail.”92 That Enkidu originally played the part of the slayer is also shown by the statement that it is he who insults Ishtar by throwing a piece of the carcass into the goddess’ face,93 adding also an insulting speech; and this despite the fact that Ishtar in her rage accuses Gilgamesh of killing the bull.94 It is thus evident that the Epic alters the original character of the episodes in order to find a place for Gilgamesh, with the further desire to assign to the latter the chief rôle. Be it noted also that Enkidu, not Gilgamesh, is punished for the insult to Ishtar. Enkidu must therefore in the original form of the episode have been the guilty party, who is stricken with mortal disease as a punishment to which after twelve days he succumbs.95 In view of this, we may supply the name of Enkidu in the little song introduced at the close of the encounter with the bull, and not Gilgamesh as has hitherto been done. “Who is distinguished among the heroes? Who is glorious among men? 
[Enkidu] is distinguished among heroes, [Enkidu] is glorious among men.”96 [38]Finally, the killing of lions is directly ascribed to Enkidu in the Pennsylvania tablet: “Lions he attacked *     *     *     *     * Lions he overcame”97 whereas Gilgamesh appears to be afraid of lions. On his long search for Utnapishtim he says: “On reaching the entrance of the mountain at night I saw lions and was afraid.”98 He prays to Sin and Ishtar to protect and save him. When, therefore, in another passage some one celebrates Gilgamesh as the one who overcame the “guardian,” who dispatched Ḫu(m)baba in the cedar forest, who killed lions and overthrew the bull,99 we have the completion of the process which transferred to Gilgamesh exploits and powers which originally belonged to Enkidu, though ordinarily the process stops short at making Gilgamesh a sharer in the exploits; with the natural tendency, to be sure, to enlarge the share of the favorite. We can now understand why the two heroes are described in the Pennsylvania tablet as alike, as born in the same place, aye, as brothers. Gilgamesh in the Epic is merely a reflex of Enkidu. The latter is the real hero and presumably, therefore, the older figure.100 Gilgamesh resembles Enkidu, because he is originally Enkidu. The “resemblance” motif is merely the manner in which in the course of the partly popular, partly literary transfer, the recollection is preserved that Enkidu is the original, and Gilgamesh the copy. The artificiality of the process which brings the two heroes together is apparent in the dreams of Gilgamesh which are interpreted by his mother as portending the coming of Enkidu. Not the conflict is foreseen, but the subsequent close association, naïvely described as due to the personal charm which Enkidu exercises, which will lead Gilgamesh to fall in love with the one whom he is to meet. The two will become one, like man and wife. 
[39] On the basis of our investigations, we are now in a position to reconstruct in part the cycle of episodes that once formed part of an Enkidu Epic. The fight between Enkidu and Gilgamesh, in which the former is the victor, is typical of the kind of tales told of Enkidu. He is the real prototype of the Greek Hercules. He slays lions, he overcomes a powerful opponent dwelling in the forests of Lebanon, he kills the bull, and he finally succumbs to disease sent as a punishment by an angry goddess. The death of Enkidu naturally formed the close of the Enkidu Epic, which in its original form may, of course, have included other exploits besides those taken over into the Gilgamesh Epic. There is another aspect of the figure of Enkidu which is brought forward in the Pennsylvania tablet more clearly than had hitherto been the case. Many years ago attention was called to certain striking resemblances between Enkidu and the figure of the first man as described in the early chapters of Genesis.101 At that time we had merely the Assyrian version of the Gilgamesh Epic at our disposal, and the main point of contact was the description of Enkidu living with the animals, drinking and feeding like an animal, until a woman is brought to him with whom he engages in sexual intercourse. This suggested that Enkidu was a picture of primeval man, while the woman reminded one of Eve, who when she is brought to Adam becomes his helpmate and inseparable companion. The Biblical tale stands, of course, on a much higher level, and is introduced, as are other traditions and tales of primitive times, in the style of a parable to convey certain religious teachings. For all that, suggestions of earlier conceptions crop out in the picture of Adam surrounded by animals to which he assigns names. Such a phrase as “there was no helpmate corresponding to him” becomes intelligible on the supposition of an existing tradition or belief, that man once lived and, indeed, cohabited with animals. 
The tales in the early chapters of Genesis must rest on very early popular traditions, which have been cleared of mythological and other objectionable features in order to adapt them to the purpose of the Hebrew compilers, to serve as a medium for illustrating [40]certain religious teachings regarding man’s place in nature and his higher destiny. From the resemblance between Enkidu and Adam it does not, of course, follow that the latter is modelled upon the former, but only that both rest on similar traditions of the condition under which men lived in primeval days prior to the beginnings of human culture. We may now pass beyond these general indications and recognize in the story of Enkidu as revealed by the Pennsylvania tablet an attempt to trace the evolution of primitive man from low beginnings to the regular and orderly family life associated with advanced culture. The new tablet furnishes a further illustration for the surprisingly early tendency among the Babylonian literati to connect with popular tales teachings of a religious or ethical character. Just as the episode between Gilgamesh and the maiden Sabitum is made the occasion for introducing reflections on the inevitable fate of man to encounter death, so the meeting of Enkidu with the woman becomes the medium of impressing the lesson of human progress through the substitution of bread and wine for milk and water, through the institution of the family, and through work and the laying up of resources. This is the significance of the address to Enkidu in column 4 of the Pennsylvania tablet, even though certain expressions in it are somewhat obscure. The connection of the entire episode of Enkidu and the woman with Gilgamesh is very artificial; and it becomes much more intelligible if we disassociate it from its present entanglement in the Epic. In Gilgamesh’s dream, portending the meeting with Enkidu, nothing is said of the woman who is the companion of the latter. 
The passage in which Enkidu is created by Aruru to oppose Gilgamesh102 betrays evidence of having been worked over in order to bring Enkidu into association with the longing of the people of Erech to get rid of a tyrannical character. The people in their distress appeal to Aruru to create a rival to Gilgamesh. In response, “Aruru upon hearing this created a man of Anu in her heart.” Now this “man of Anu” cannot possibly be Enkidu, for the sufficient reason that a few lines further on Enkidu is described as an [41]offspring of Ninib. Moreover, the being created is not a “counterpart” of Gilgamesh, but an animal-man, as the description that follows shows. We must separate lines 30–33 in which the creation of the “Anu man” is described from lines 34–41 in which the creation of Enkidu is narrated. Indeed, these lines strike one as the proper beginning of the original Enkidu story, which would naturally start out with his birth and end with his death. The description is clearly an account of the creation of the first man, in which capacity Enkidu is brought forward. “Aruru washed her hands, broke off clay, threw it on the field103 … created Enkidu, the hero, a lofty offspring of the host of Ninib.”104 The description of Enkidu follows, with his body covered with hair like an animal, and eating and drinking with the animals. There follows an episode105 which has no connection whatsoever with the Gilgamesh Epic, but which is clearly intended to illustrate how Enkidu came to abandon the life with the animals. A hunter sees Enkidu and is amazed at the strange sight—an animal and yet a man. Enkidu, as though resenting his condition, becomes enraged at the sight of the hunter, and the latter goes to his father and tells him of the strange creature whom he is unable to catch. 
In reply, the father advises his son to take a woman with him when next he goes out on his pursuit, and to have the woman remove her dress in the presence of Enkidu, who will then approach her, and after intercourse with her will abandon the animals among whom he lives. By this device he will catch the strange creature. Lines 14–18 of column 3 in the first tablet in which the father of the hunter refers to Gilgamesh must be regarded as a later insertion, a part of the reconstruction of the tale to connect the episode with Gilgamesh. The advice of the father to his son, the hunter, begins, line 19, “Go my hunter, take with thee a woman.” [42]In the reconstructed tale, the father tells his son to go to Gilgamesh to relate to him the strange appearance of the animal-man; but there is clearly no purpose in this, as is shown by the fact that when the hunter does so, Gilgamesh makes precisely the same speech as does the father of the hunter. Lines 40–44 of column 3, in which Gilgamesh is represented as speaking to the hunter form a complete doublet to lines 19–24, beginning “Go, my hunter, take with thee a woman, etc.” and similarly the description of Enkidu appears twice, lines 2–12 in an address of the hunter to his father, and lines 29–39 in the address of the hunter to Gilgamesh. The artificiality of the process of introducing Gilgamesh into the episode is revealed by this awkward and entirely meaningless repetition. We may therefore reconstruct the first two scenes in the Enkidu Epic as follows:106 Tablet I, col. 2, 34–35: Creation of Enkidu by Aruru. 36–41: Description of Enkidu’s hairy body and of his life with the animals. 42–50: The hunter sees Enkidu, who shows his anger, as also his woe, at his condition. 3, 1–12: The hunter tells his father of the strange being who pulls up the traps which the hunter digs, and who tears the nets so that the hunter is unable to catch him or the animals. 
19–24: The father of the hunter advises his son on his next expedition to take a woman with him in order to lure the strange being from his life with the animals. Line 25, beginning “On the advice of his father,” must have set forth, in the original form of the episode, how the hunter procured the woman and took her with him to meet Enkidu. Column 4 gives in detail the meeting between the two, and naïvely describes how the woman exposes her charms to Enkidu, who is captivated by her and stays with her six days and seven nights. The animals see the change in Enkidu and run away from him. [43]He has been transformed through the woman. So far the episode. In the Assyrian version there follows an address of the woman to Enkidu beginning (col. 4, 34): “Beautiful art thou, Enkidu, like a god art thou.” We find her urging him to go with her to Erech, there to meet Gilgamesh and to enjoy the pleasures of city life with plenty of beautiful maidens. Gilgamesh, she adds, will expect Enkidu, for the coming of the latter to Erech has been foretold in a dream. It is evident that here we have again the later transformation of the Enkidu Epic in order to bring the two heroes together. Will it be considered too bold if we assume that in the original form the address of the woman and the construction of the episode were such as we find preserved in part in columns 2 to 4 of the Pennsylvania tablet, which forms part of the new material that can now be added to the Epic? The address of the woman begins in line 51 of the Pennsylvania tablet: “I gaze upon thee, Enkidu, like a god art thou.” This corresponds to the line in the Assyrian version (I, 4, 34) as given above, just as lines 52–53: “Why with the cattle Dost thou roam across the field?” correspond to I, 4, 35, of the Assyrian version. 
There follows in both the old Babylonian and the Assyrian version the appeal of the woman to Enkidu, to allow her to lead him to Erech where Gilgamesh dwells (Pennsylvania tablet lines 54–61 = Assyrian version I, 4, 36–39); but in the Pennsylvania tablet we now have a second speech (lines 62–63) beginning like the first one with al-ka, “come:” “Come, arise from the accursed ground.” Enkidu consents, and now the woman takes off her garments and clothes the naked Enkidu, while putting another garment on herself. She takes hold of his hand and leads him to the sheepfolds (not to Erech!!), where bread and wine are placed before him. Accustomed hitherto to sucking milk with cattle, Enkidu does not know what to do with the strange food until encouraged and instructed by the woman. The entire third column is taken up with this introduction [44]of Enkidu to civilized life in a pastoral community, and the scene ends with Enkidu becoming a guardian of flocks. Now all this has nothing to do with Gilgamesh, and clearly sets forth an entirely different idea from the one embodied in the meeting of the two heroes. In the original Enkidu tale, the animal-man is looked upon as the type of a primitive savage, and the point of the tale is to illustrate in the naïve manner characteristic of folklore the evolution to the higher form of pastoral life. This aspect of the incident is, therefore, to be separated from the other phase which has as its chief motif the bringing of the two heroes together. We now obtain, thanks to the new section revealed by the Pennsylvania tablet, a further analogy107 with the story of Adam and Eve, but with this striking difference, that whereas in the Babylonian tale the woman is the medium leading man to the higher life, in the Biblical story the woman is the tempter who brings misfortune to man. 
This contrast is, however, not inherent in the Biblical story, but due to the point of view of the Biblical writer, who is somewhat pessimistically inclined and looks upon primitive life, when man went naked and lived in a garden, eating of fruits that grew of themselves, as the blessed life in contrast to advanced culture which leads to agriculture and necessitates hard work as the means of securing one’s substance. Hence the woman through whom Adam eats of the tree of knowledge and becomes conscious of being naked is looked upon as an evil tempter, entailing the loss of the primeval life of bliss in a gorgeous Paradise. The Babylonian point of view is optimistic. The change to civilized life—involving the wearing of clothes and the eating of food that is cultivated (bread and wine) is looked upon as an advance. Hence the woman is viewed as the medium of raising man to a higher level. The feature common to the Biblical and Babylonian tales is the attachment of a lesson to early folk-tales. The story of Adam and Eve,108 as the story of Enkidu and the woman, is told with a purpose. Starting with early traditions of men’s primitive life on earth, that may have arisen independently, Hebrew and [45]Babylonian writers diverged, each group going its own way, each reflecting the particular point of view from which the evolution of human society was viewed. Leaving the analogy between the Biblical and Babylonian tales aside, the main point of value for us in the Babylonian story of Enkidu and the woman is the proof furnished by the analysis, made possible through the Pennsylvania tablet, that the tale can be separated from its subsequent connection with Gilgamesh. We can continue this process of separation in the fourth column, where the woman instructs Enkidu in the further duty of living his life with the woman decreed for him, to raise a family, to engage in work, to build cities and to gather resources. 
All this is looked upon in the same optimistic spirit as marking progress, whereas the Biblical writer, consistent with his point of view, looks upon work as a curse, and makes Cain, the murderer, also the founder of cities. The step to the higher forms of life is not an advance according to the J document. It is interesting to note that even the phrase the “cursed ground” occurs in both the Babylonian and Biblical tales; but whereas in the latter (Gen. 3, 17) it is because of the hard work entailed in raising the products of the earth that the ground is cursed, in the former (lines 62–63) it is the place in which Enkidu lives before he advances to the dignity of human life that is “cursed,” and which he is asked to leave. Adam is expelled from Paradise as a punishment, whereas Enkidu is implored to leave it as a necessary step towards progress to a higher form of existence. The contrast between the Babylonian and the Biblical writer extends to the view taken of viniculture. The Biblical writer (again the J document) looks upon Noah’s drunkenness as a disgrace. Noah loses his sense of shame and uncovers himself (Genesis 9, 21), whereas in the Babylonian description Enkidu’s jolly spirit after he has drunk seven jars of wine meets with approval. The Biblical point of view is that he who drinks wine becomes drunk;109 the Babylonian says, if you drink wine you become happy.110 If the thesis here set forth of the original character and import of the episode of Enkidu with the woman is correct, we may again regard lines 149–153 of the Pennsylvania tablet, in which Gilgamesh is introduced, as a later addition to bring the two heroes into association. [46]The episode in its original form ended with the introduction of Enkidu first to pastoral life, and then to the still higher city life with regulated forms of social existence. 
Now, to be sure, this Enkidu has little in common with the Enkidu who is described as a powerful warrior, a Hercules, who kills lions, overcomes the giant Ḫuwawa, and dispatches a great bull, but it is the nature of folklore everywhere to attach to traditions about a favorite hero all kinds of tales with which originally he had nothing to do. Enkidu, as such a favorite, is viewed also as the type of primitive man,111 and so there arose gradually an Epic which began with his birth, pictured him as half-animal half-man, told how he emerged from this state, how he became civilized, was clothed, learned to eat food and drink wine, how he shaved off the hair with which his body was covered,112 anointed himself—in short, “He became manlike.”113 Thereupon he is taught his duties as a husband, is introduced to the work of building, and to laying aside supplies, and the like. The fully-developed and full-fledged hero then engages in various exploits, of which some are now embodied in the Gilgamesh Epic. Who this Enkidu was, we are not in a position to determine, but the suggestion has been thrown out above that he is a personage foreign to Babylonia, that his home appears to be in the undefined Amurru district, and that he conquers that district. The original tale of Enkidu, if this view be correct, must therefore have been carried to the Euphrates Valley, at a very remote period, with one of the migratory waves that brought a western people as invaders into Babylonia. Here the tale was combined with stories current of another hero, Gilgamesh—perhaps also of Western origin—whose conquest of Erech likewise represents an invasion of Babylonia. The center of the Gilgamesh tale was Erech, and in the process of combining the stories of Enkidu and Gilgamesh, Enkidu is brought to Erech and the two perform exploits [47]in common. In such a combination, the aim would be to utilize all the incidents of both tales. 
The woman who accompanies Enkidu, therefore, becomes the medium of bringing the two heroes together. The story of the evolution of primitive man to civilized life is transformed into the tale of Enkidu’s removal to Erech, and elaborated with all kinds of details, among which we have, as perhaps embodying a genuine historical tradition, the encounter of the two heroes. Before passing on, we have merely to note the very large part taken in both the old Babylonian and the Assyrian version by the struggle against Ḫuwawa. The entire Yale tablet—forming, as we have seen, the third of the series—is taken up with the preparation for the struggle, and with the repeated warnings given to Gilgamesh against the dangerous undertaking. The fourth tablet must have recounted the struggle itself, and it is not improbable that this episode extended into the fifth tablet, since in the Assyrian version this is the case. The elaboration of the story is in itself an argument in favor of assuming some historical background for it—the recollection of the conquest of Amurru by some powerful warrior; and we have seen that this conquest must be ascribed to Enkidu and not to Gilgamesh. If, now, Enkidu is not only the older figure but the one who is the real hero of the most notable episode in the Gilgamesh Epic; if, furthermore, Enkidu is the Hercules who kills lions and dispatches the bull sent by an enraged goddess, what becomes of Gilgamesh? What is left for him? In the first place, he is definitely the conqueror of Erech. He builds the wall of Erech,114 and we may assume that the designation of the city as Uruk supûri, “the walled Erech,”115 rests upon this tradition. He is also associated with the great temple Eanna, “the heavenly house,” in Erech. 
To Gilgamesh belongs also the unenviable tradition of having exercised his rule in Erech so harshly that the people are impelled to implore Aruru to create a rival who may rid [48]the district of the cruel tyrant, who is described as snatching sons and daughters from their families, and in other ways terrifying the population—an early example of “Schrecklichkeit.” Tablets II to V inclusive of the Assyrian version being taken up with the Ḫuwawa episode, modified with a view of bringing the two heroes together, we come at once to the sixth tablet, which tells the story of how the goddess Ishtar wooed Gilgamesh, and of the latter’s rejection of her advances. This tale is distinctly a nature myth. The attempt of Gressmann116 to find some historical background to the episode is a failure. The goddess Ishtar symbolizes the earth which woos the sun in the spring, but whose love is fatal, for after a few months the sun’s power begins to wane. Gilgamesh, who in incantation hymns is invoked in terms which show that he was conceived as a sun-god,117 recalls to the goddess how she changed her lovers into animals, like Circe of Greek mythology, and brought them to grief. Enraged at Gilgamesh’s insult to her vanity, she flies to her father Anu and cries for revenge. At this point the episode of the creation of the bull is introduced, but if the analysis above given is correct it is Enkidu who is the hero in dispatching the bull, and we must assume that the sickness with which Gilgamesh is smitten is the punishment sent by Anu to avenge the insult to his daughter. This sickness symbolizes the waning strength of the sun after midsummer is past. The sun recedes from the earth, and this was pictured in the myth as the sun-god’s rejection of Ishtar; Gilgamesh’s fear of death marks the approach of the winter season, when the sun appears to have lost its vigor completely and is near to death. 
The entire episode is, therefore, a nature myth, symbolical of the passing of spring to midsummer and then to the bare season. The myth has been attached to Gilgamesh as a favorite figure, and then woven into a pattern with the episode of Enkidu and the bull. The bull episode can be detached from the nature myth without any loss to the symbolism of the tale of Ishtar and Gilgamesh. As already suggested, with Enkidu’s death after this conquest of the bull the original Enkidu Epic came to an end. In order to connect Gilgamesh with Enkidu, the former is represented as sharing [49]in the struggle against the bull. Enkidu is punished with death, while Gilgamesh is smitten with disease. Since both shared equally in the guilt, the punishment should have been the same for both. The differentiation may be taken as an indication that Gilgamesh’s disease has nothing to do with the bull episode, but is merely part of the nature myth. Gilgamesh now begins a series of wanderings in search of the restoration of his vigor, and this motif is evidently a continuation of the nature myth to symbolize the sun’s wanderings during the dark winter in the hope of renewed vigor with the coming of the spring. Professor Haupt’s view is that the disease from which Gilgamesh is supposed to be suffering is of a venereal character, affecting the organs of reproduction. This would confirm the position here taken that the myth symbolizes the loss of the sun’s vigor. The sun’s rays are no longer strong enough to fertilize the earth. In accord with this, Gilgamesh’s search for healing leads him to the dark regions118 in which the scorpion-men dwell. The terrors of the region symbolize the gloom of the winter season. At last Gilgamesh reaches a region of light again, described as a landscape situated at the sea. The maiden in control of this region bolts the gate against Gilgamesh’s approach, but the latter forces his entrance. 
It is the picture of the sun-god bursting through the darkness, to emerge as the youthful reinvigorated sun-god of the spring. Now with the tendency to attach to popular tales and nature myths lessons illustrative of current beliefs and aspirations, Gilgamesh’s search for renewal of life is viewed as man’s longing for eternal life. The sun-god’s waning power after midsummer is past suggests man’s growing weakness after the meridian of life has been left behind. Winter is death, and man longs to escape it. Gilgamesh’s wanderings are used as illustration of this longing, and accordingly the search for life becomes also the quest for immortality. Can the precious boon of eternal life be achieved? Popular fancy created the figure of a favorite of the gods who had escaped a destructive deluge in which all mankind had perished.119 Gilgamesh hears [50]of this favorite and determines to seek him out and learn from him the secret of eternal life. The deluge story, again a pure nature myth, symbolical of the rainy season which destroys all life in nature, is thus attached to the Epic. Gilgamesh after many adventures finds himself in the presence of the survivor of the Deluge who, although human, enjoys immortal life among the gods. He asks the survivor how he came to escape the common fate of mankind, and in reply Utnapishtim tells the story of the catastrophe that brought about universal destruction. The moral of the tale is obvious. Only those singled out by the special favor of the gods can hope to be removed to the distant “source of the streams” and live forever. The rest of mankind must face death as the end of life. 
That the story of the Deluge is told in the eleventh tablet of the series, corresponding to the eleventh month, known as the month of “rain curse”120 and marking the height of the rainy season, may be intentional, just as it may not be accidental that Gilgamesh’s rejection of Ishtar is recounted in the sixth tablet, corresponding to the sixth month,121 which marks the end of the summer season. The two tales may have formed part of a cycle of myths, distributed among the months of the year. The Gilgamesh Epic, however, does not form such a cycle. Both myths have been artificially attached to the adventures of the hero. For the deluge story we now have the definite proof for its independent existence, through Dr. Poebel’s publication of a Sumerian text which embodies the tale,122 and without any reference [51]to Gilgamesh. Similarly, Scheil and Hilprecht have published fragments of deluge stories written in Akkadian and likewise without any connection with the Gilgamesh Epic.123 In the Epic the story leads to another episode attached to Gilgamesh, namely, the search for a magic plant growing in deep water, which has the power of restoring old age to youth. Utnapishtim, the survivor of the deluge, is moved through pity for Gilgamesh, worn out by his long wanderings. At the request of his wife, Utnapishtim decides to tell Gilgamesh of this plant, and he succeeds in finding it. He plucks it and decides to take it back to Erech so that all may enjoy the benefit, but on his way stops to bathe in a cool cistern. A serpent comes along and snatches the plant from him, and he is forced to return to Erech with his purpose unachieved. Man cannot hope, when old age comes on, to escape death as the end of everything. 
Lastly, the twelfth tablet of the Assyrian version of the Gilgamesh Epic is of a purely didactic character, bearing evidence of having been added as a further illustration of the current belief that there is no escape from the nether world to which all must go after life has come to an end. Proper burial and suitable care of the dead represent all that can be done in order to secure a fairly comfortable rest for those who have passed out of this world. Enkidu is once more introduced into this episode. His shade is invoked by Gilgamesh and rises up out of the lower world to give a discouraging reply to Gilgamesh’s request, “Tell me, my friend, tell me, my friend, The law of the earth which thou hast experienced, tell me,” The mournful message comes back: “I cannot tell thee, my friend, I cannot tell.” Death is a mystery and must always remain such. The historical Gilgamesh has clearly no connection with the figure introduced into [52]this twelfth tablet. Indeed, as already suggested, the Gilgamesh Epic must have ended with the return to Erech, as related at the close of the eleventh tablet. The twelfth tablet was added by some school-men of Babylonia (or perhaps of Assyria), purely for the purpose of conveying a summary of the teachings in regard to the fate of the dead. Whether these six episodes covering the sixth to the twelfth tablets, (1) the nature myth, (2) the killing of the divine bull, (3) the punishment of Gilgamesh and the death of Enkidu, (4) Gilgamesh’s wanderings, (5) the Deluge, (6) the search for immortality, were all included at the time that the old Babylonian version was compiled cannot, of course, be determined until we have that version in a more complete form. Since the two tablets thus far recovered show that as early as 2000 B.C. 
the Enkidu tale had already been amalgamated with the current stories about Gilgamesh, and the endeavor made to transfer the traits of the former to the latter, it is eminently likely that the story of Ishtar’s unhappy love adventure with Gilgamesh was included, as well as Gilgamesh’s punishment and the death of Enkidu. With the evidence furnished by Meissner’s fragment of a version of the old Babylonian revision and by our two tablets, of the early disposition to make popular tales the medium of illustrating current beliefs and the teachings of the temple schools, it may furthermore be concluded that the death of Enkidu and the punishment of Gilgamesh were utilized for didactic purposes in the old Babylonian version. On the other hand, the proof for the existence of the deluge story in the Hammurabi period and some centuries later, independent of any connection with the Gilgamesh Epic, raises the question whether in the old Babylonian version, of which our two tablets form a part, the deluge tale was already woven into the pattern of the Epic. At all events, till proof to the contrary is forthcoming, we may assume that the twelfth tablet of the Assyrian version, though also reverting to a Babylonian original, dates as the latest addition to the Epic from a period subsequent to 2000 B.C.; and that the same is probably the case with the eleventh tablet. 
To sum up, there are four main currents that flow together in the Gilgamesh Epic even in its old Babylonian form: (1) the adventures of a mighty warrior Enkidu, resting perhaps on a faint tradition [53]of the conquest of Amurru by the hero; (2) the more definite recollection of the exploits of a foreign invader of Babylonia by the name of Gilgamesh, whose home appears likewise to have been in the West;124 (3) nature myths and didactic tales transferred to Enkidu and Gilgamesh as popular figures; and (4) the process of weaving the traditions, exploits, myths and didactic tales together, in the course of which process Gilgamesh becomes the main hero, and Enkidu his companion. Furthermore, our investigation has shown that to Enkidu belongs the episode with the woman, used to illustrate the evolution of primitive man to the ways and conditions of civilized life, the conquest of Ḫuwawa in the land of Amurru, the killing of lions and also of the bull, while Gilgamesh is the hero who conquers Erech. Identified with the sun-god, the nature myth of the union of the sun with the earth and the subsequent separation of the two is also transferred to him. The wanderings of the hero, smitten with disease, are a continuation of the nature myth, symbolizing the waning vigor of the sun with the approach of the wintry season. The details of the process which led to making Gilgamesh the favorite figure, to whom the traits and exploits of Enkidu and of the sun-god are transferred, escape us, but of the fact that Enkidu is the older figure, of whom certain adventures were set forth in a tale that once had an independent existence, there can now be little doubt in the face of the evidence furnished by the two tablets of the old Babylonian version; just as the study of these tablets shows that in the combination of the tales of Enkidu and Gilgamesh, the former is the prototype of which Gilgamesh is the copy. 
If the two are regarded as brothers, as born in the same place, even resembling one another in appearance and carrying out their adventures in common, it is because in the process of combination Gilgamesh becomes the reflex of Enkidu. That Enkidu is not the figure created by Aruru to relieve Erech of its tyrannical ruler is also shown by the fact that Gilgamesh remains in control of Erech. It is to Erech that he returns when he fails of his purpose to learn the secret of escape from old age and death. Erech is, therefore, not relieved of the presence of the ruthless ruler through Enkidu. The “Man of Anu” formed by Aruru as a deliverer is confused in the course of the growth of the [54]Epic with Enkidu, the offspring of Ninib, and in this way we obtain the strange contradiction of Enkidu and Gilgamesh appearing first as bitter rivals and then as close and inseparable friends. It is of the nature of Epic compositions everywhere to eliminate unnecessary figures by concentrating on one favorite the traits belonging to another or to several others. The close association of Enkidu and Gilgamesh which becomes one of the striking features in the combination of the tales of these two heroes naturally recalls the “Heavenly Twins” motif, which has been so fully and so suggestively treated by Professor J. Rendel Harris in his Cult of the Heavenly Twins (London, 1906). 
Professor Harris has conclusively shown how widespread the tendency is to associate two divine or semi-divine beings in myths and legends as inseparable companions125 or twins, like Castor and Pollux, Romulus and Remus,126 the Acvins in the Rig-Veda,127 Cain and Abel, Jacob and Esau in the Old Testament, the Kabiri of the Phoenicians,128 Herakles and Iphikles in Greek mythology, Ambrica and Fidelio in Teutonic mythology, Patollo and Potrimpo in old Prussian mythology, Cautes and Cautopates in Mithraism, Jesus and Thomas (according to the Syriac Acts of Thomas), and the various illustrations of “Dioscuri in Christian Legends,” set forth by Dr. Harris in his work under this title, which carries the motif far down into the period of legends about Christian Saints who appear in pairs, including the reference to such a pair in Shakespeare’s Henry V: “And Crispin Crispian shall ne’er go by From that day to the ending of the world.”—(Act, IV, 3, 57–58.) There are indeed certain parallels which suggest that Enkidu-Gilgamesh may represent a Babylonian counterpart to the “Heavenly [55]Twins.” In the Indo-Iranian, Greek and Roman mythology, the twins almost invariably act together. In unison they proceed on expeditions to punish enemies.129 But after all, the parallels are of too general a character to be of much moment; and moreover the parallels stop short at the critical point, for Gilgamesh though worsted is not killed by Enkidu, whereas one of the “Heavenly Twins” is always killed by the brother, as Abel is by Cain, and Iphikles by his twin brother Herakles. 
Even the trait which is frequent in the earliest forms of the “Heavenly Twins,” according to which one is immortal and the other is mortal, though applying in a measure to Enkidu who is killed by Ishtar, while Gilgamesh the offspring of a divine pair is only smitten with disease, is too unsubstantial to warrant more than a general comparison between the Enkidu-Gilgamesh pair and the various forms of the “twin” motif found throughout the ancient world. For all that, the point is of some interest that in the Gilgamesh Epic we should encounter two figures who are portrayed as possessing the same traits and accomplishing feats in common, which suggest a partial parallel to the various forms in which the twin-motif appears in the mythologies, folk-lore and legends of many nations; and it may be that in some of these instances the duplication is due, as in the case of Enkidu and Gilgamesh, to an actual transfer of the traits of one figure to another who usurped his place. In concluding this study of the two recently discovered tablets of the old Babylonian version of the Gilgamesh Epic which has brought us several steps further in the interpretation and in our understanding of the method of composition of the most notable literary production of ancient Babylonia, it will be proper to consider the literary relationship of the old Babylonian to the Assyrian version. We have already referred to the different form in which the names of the chief figures appear in the old Babylonian version, dGish as against dGish-gì(n)-mash, dEn-ki-dũ as against dEn-ki-dú, Ḫu-wa-wa as against Ḫu(m)-ba-ba. 
Erech appears as Uruk ribîtim, “Erech of [56]the Plazas,” as against Uruk supûri, “walled Erech” (or “Erech within the walls”), in the Assyrian version.130 These variations point to an independent recension for the Assyrian revision; and this conclusion is confirmed by a comparison of parallel passages in our two tablets with the Assyrian version, for such parallels rarely extend to verbal agreements in details, and, moreover, show that the Assyrian version has been elaborated. Beginning with the Pennsylvania tablet, column I is covered in the Assyrian version by tablet I, 5, 25, to 6, 33, though, as pointed out above, in the Assyrian version we have the anticipation of the dreams of Gilgamesh and their interpretation through their recital to Enkidu by his female companion, whereas in the old Babylonian version we have the dreams directly given in a conversation between Gilgamesh and his mother. In the anticipation, there would naturally be some omissions. So lines 4–5 and 12–13 of the Pennsylvania tablet do not appear in the Assyrian version, but in their place is a line (I, 5, 35), to be restored to “[I saw him and like] a woman I fell in love with him,” which occurs in the old Babylonian version only in connection with the second dream. The point is of importance as showing that in the Babylonian version the first dream lays stress upon the omen of the falling meteor, as symbolizing the coming of Enkidu, whereas the second dream more specifically reveals Enkidu as a man,131 of whom Gilgamesh is instantly enamored. Strikingly variant lines, though conveying the same idea, are frequent. Thus line 14 of the Babylonian version reads “I bore it and carried it to thee” and appears in the Assyrian version (I, 5, 35b supplied from 6, 26) “I threw it (or him) at thy feet”132 [57]with an additional line in elaboration “Thou didst bring him into contact with me”133 which anticipates the speech of the mother (line 41 = Assyrian version I, 6, 33). 
Line 10 of the Pennsylvania tablet has pa-ḫi-ir as against iz-za-az I, 5, 31. Line 8 has ik-ta-bi-it as against da-an in the Assyrian version I, 5, 29. More significant is the variant to line 9 “I became weak and its weight I could not bear” as against I, 5, 30: “Its strength was overpowering,134 and I could not endure its weight.” The important lines 31–36 are not found in the Assyrian version, with the exception of I, 6, 27, which corresponds to lines 33–34, but this lack of correspondence is probably due to the fact that the Assyrian version represents the anticipation of the dreams which, as already suggested, might well omit some details. As against this we have in the Assyrian version I, 6, 23–25, an elaboration of line 30 in the Pennsylvania tablet and taken over from the recital of the first dream. Through the Assyrian version I, 6, 31–32, we can restore the closing lines of column I of the Pennsylvania tablet, while with line 33 = line 45 of the Pennsylvania tablet, the parallel between the two versions comes to an end. Lines 34–43 of the Assyrian version (bringing tablet I to a close)135 represent an elaboration of the speech of Ninsun, followed by a further address of Gilgamesh to his mother, and by the determination of Gilgamesh to seek out Enkidu.136 Nothing of this sort appears to have been included in the old Babylonian version. [58]Our text proceeds with the scene between Enkidu and the woman, in which the latter by her charms and her appeal endeavors to lead Enkidu away from his life with the animals. From the abrupt manner in which the scene is introduced in line 43 of the Pennsylvania tablet, it is evident that this cannot be the first mention of the woman. The meeting must have been recounted in the first tablet, as is the case in the Assyrian version.137 The second tablet takes up the direct recital of the dreams of Gilgamesh and then continues the narrative. 
Whether in the old Babylonian version the scene between Enkidu and the woman was described with the same naïve details, as in the Assyrian version, of the sexual intercourse between the two for six days and seven nights cannot of course be determined, though presumably the Assyrian version, with the tendency of epics to become more elaborate as they pass from age to age, added some realistic touches. Assuming that lines 44–63 of the Pennsylvania tablet—the cohabitation of Enkidu and the address of the woman—is a repetition of what was already described in the first tablet, the comparison with the Assyrian version I, 4, 16–41, not only points to the elaboration of the later version, but likewise to an independent recension, even where parallel lines can be picked out. Only lines 46–48 of the Pennsylvania tablet form a complete parallel to line 21 of column 4 of the Assyrian version. The description in lines 22–32 of column 4 is missing, though it may, of course, have been included in part in the recital in the first tablet of the old Babylonian version. Lines 49–59 of the Pennsylvania tablet are covered by 33–39, the only slight difference being the specific mention in line 58 of the Pennsylvania tablet of Eanna, the temple in Erech, described as “the dwelling of Anu,” whereas in the Assyrian version Eanna is merely referred to as the “holy house” and described as “the dwelling of Anu and Ishtar,” where Ishtar is clearly a later addition. Leaving aside lines 60–61, which may be merely a variant (though independent) of line 39 of column 4 of the Assyrian version, we now have in the Pennsylvania tablet a second speech of the woman to Enkidu (not represented in the Assyrian version) beginning like the first one with alka, “Come” (lines 62–63), in which she asks Enkidu to leave the “accursed ground” in which he dwells. 
This speech, as the description which follows, extending into columns 3–4, [59] and telling how the woman clothed Enkidu, how she brought him to the sheep folds, how she taught him to eat bread and to drink wine, and how she instructed him in the ways of civilization, must have been included in the second tablet of the Assyrian version which has come down to us in a very imperfect form. Nor is the scene in which Enkidu and Gilgamesh have their encounter found in the preserved portions of the second (or possibly the third) tablet of the Assyrian version, but only a brief reference to it in the fourth tablet,138 in which in Epic style the story is repeated, leading up to the second exploit, the joint campaign of Enkidu and Gilgamesh against Ḫuwawa. This reference, covering only seven lines, corresponds to lines 192–231 of the Pennsylvania tablet; but the former being the repetition and the latter the original recital, the comparison to be instituted merely reveals again the independence of the Assyrian version, as shown in the use of kibsu, “tread” (IV, 2, 46), for šêpu, “foot” (l. 216), and of i-na-uš, “quake” (line 50), as against ir-tu-tu (ll. 221 and 226). Such variants as dGish êribam ûl iddin (l. 217) against dGilgamesh ana šurûbi ûl namdin (IV, 2, 47), and again iṣṣabtûma kima lîm, “they grappled like oxen” (ll. 218 and 224), against iṣṣabtûma ina bâb bît emuti, “they grappled at the gate of the family house” (IV, 2, 48), all point once more to the literary independence of the Assyrian version. The end of the conflict and the reconciliation of the two heroes are likewise missing in the Assyrian version. They may have been referred to at the beginning of column 3 of Tablet IV.139
Coming to the Yale tablet, the few passages in which a comparison [60]may be instituted with the fourth tablet of the Assyrian version, to which in a general way it must correspond, are not sufficient to warrant any conclusions, beyond the confirmation of the literary independence of the Assyrian version. The section comprised within lines 72–89, where Enkidu’s grief at his friend’s decision to fight Ḫuwawa is described140, and he makes confession of his own physical exhaustion, may correspond to Tablet IV, column 4, of the Assyrian version. This would fit in with the beginning of the reverse, the first two lines of which (136–137) correspond to column 5 of the fourth tablet of the Assyrian version, with a variation “seven-fold fear”141 as against “fear of men” in the Assyrian version. If lines 138–139 (in column 4) of the Yale tablet correspond to line 7 of column 5 of Tablet IV of the Assyrian version, we would again have an illustration of the elaboration of the later version by the addition of lines 3–6. But beyond this we have merely the comparison of the description of Ḫuwawa “Whose roar is a flood, whose mouth is fire, and whose breath is death” which occurs twice in the Yale tablet (lines 110–111 and 196–197), with the same phrase in the Assyrian version Tablet IV, 5, 3—but here, as just pointed out, with an elaboration. Practically, therefore, the entire Yale tablet represents an addition to our knowledge of the Ḫuwawa episode, and until we are fortunate enough to discover more fragments of the fourth tablet of the Assyrian version, we must content ourselves with the conclusions reached from a comparison of the Pennsylvania tablet with the parallels in the Assyrian version. 
It may be noted as a general point of resemblance in the exterior form of the old Babylonian and Assyrian versions that both were inscribed on tablets containing six columns, three on the obverse and three on the reverse; and that the length of the tablets (an average of 40 to 50 lines) was about the same, thus revealing in the external form a conventional size for the tablets in the older period, which was carried over into later times. [61]
1 See for further details of this royal library, Jastrow, Civilization of Babylonia and Assyria, p. 21 seq. 2 Das Babylonische Nimrodepos (Leipzig, 1884–1891), supplemented by Haupt’s article Die Zwölfte Tafel des Babylonischen Nimrodepos in BA I, pp. 48–79, containing the fragments of the twelfth tablet. The fragments of the Epic in Ashurbanapal’s library (some sixty) represent portions of several copies. Sin-liḳî-unnini, perhaps from Erech, since this name appears as that of a family in tablets from Erech (see Clay, Legal Documents from Erech, Index, p. 73), is named in a list of texts (K 9717 = Haupt’s edition No. 51, line 18) as the editor of the Epic, though probably he was not the only compiler. Since the publication of Haupt’s edition, a few fragments were added by him as an appendix to Alfred Jeremias’ Izdubar-Nimrod (Leipzig, 1891), Plates II–IV, and two more are embodied in Jensen’s transliteration of all the fragments in the Keilinschriftliche Bibliothek VI, pp. 116–265, with elaborate notes, pp. 421–531. Furthermore a fragment, obtained from supplementary excavations at Kouyunjik, has been published by L. W. King in his Supplement to the Catalogue of the Cuneiform Tablets in the Kouyunjik Collection of the British Museum No. 56 and PSBA Vol. 36, pp. 64–68. Recently a fragment of the 6th tablet from the excavations at Assur has been published by Ebeling, Keilschrifttexte aus Assur Religiösen Inhalts No. 115, and one may expect further portions to turn up.
The designation “Nimrod Epic,” on the supposition that the hero of the Babylonian Epic is identical with Nimrod, the “mighty hunter” of Genesis 10, has now been generally abandoned, in the absence of any evidence that the Babylonian hero bore a name like [10n] Nimrod. For all that, the description of Nimrod as the “mighty hunter” and the occurrence of a “hunter” in the Babylonian Epic (Assyrian version Tablet I), though he is not the hero, point to a confusion in the Hebrew form of the borrowed tradition between Gilgamesh and Nimrod. The latest French translation of the Epic is by Dhorme, Choix de Textes Religieux Assyro-Babyloniens (Paris, 1907), pp. 182–325; the latest German translation by Ungnad-Gressmann, Das Gilgamesch-Epos (Göttingen, 1911), with a valuable analysis and discussion. These two translations now supersede Jensen’s translation in the Keilinschriftliche Bibliothek, which, however, is still valuable because of the detailed notes, containing a wealth of lexicographical material. Ungnad also gave a partial translation in Gressmann-Ranke, Altorientalische Texte und Bilder I, pp. 39–61. In English, we have translations of substantial portions by Muss-Arnolt in Harper’s Assyrian and Babylonian Literature (New York, 1901), pp. 324–368; by Jastrow, Religion of Babylonia and Assyria (Boston, 1898), Chap. XXIII; by Clay in Light on the Old Testament from Babel, pp. 78–84; by Rogers in Cuneiform Parallels to the Old Testament, pp. 80–103; and most recently by Jastrow in Sacred Books and Early Literature of the East (ed. C. F. Horne, New York, 1917), Vol. I, pp. 187–220. 3 See Luckenbill in JAOS, Vol. 37, p. 452 seq. Prof. Clay, it should be added, clings to the older reading, Hammurabi, which is retained in this volume. 4 ZA, Vol. 14, pp. 277–292.
5 The survivor of the Deluge is usually designated as Ut-napishtim in the Epic, but in one passage (Assyrian version, Tablet XI, 196), he is designated as Atra-ḫasis “the very wise one.” Similarly, in a second version of the Deluge story, also found in Ashurbanapal’s library (IV R² additions, p. 9, line 11). The two names clearly point to two versions, which in accordance with the manner of ancient compositions were merged into one. See an article by Jastrow in ZA, Vol. 13, pp. 288–301. 6 Published by Scheil in Recueil des Travaux, etc. Vol. 20, pp. 55–58. 7 The text does not form part of the Gilgamesh Epic, as the colophon, differing from the one attached to the Epic, shows. 8 Ein altbabylonisches Fragment des Gilgamosepos (MVAG 1902, No. 1). 9 On these variant forms of the two names see the discussion below, p. 24. 10 The passage is paralleled by Ecc. 9, 7–9. See Jastrow, A Gentle Cynic, p. 172 seq. 11 Among the Nippur tablets in the collection of the University of Pennsylvania Museum. The fragment was published by Dr. Poebel in his Historical and Grammatical Texts No. 23. See also Poebel in the Museum Journal, Vol. IV, p. 47, and an article by Dr. Langdon in the same Journal, Vol. VII, pp. 178–181, though Langdon fails to credit Dr. Poebel with the discovery and publication of the important tablet. 12 No. 55 in Langdon’s Historical and Religious Texts from the Temple Library of Nippur (Munich, 1914). 13 No. 5 in his Sumerian Liturgical Texts. (Philadelphia, 1917) 14 See on this name below, p. 23. 15 See further below, p. 37 seq. 16 See Poebel, Historical and Grammatical Texts, No. 1, and Jastrow in JAOS, Vol. 36, pp. 122–131 and 274–299. 17 See an article by Jastrow, Sumerian and Akkadian Views of Beginnings (JAOS Vol. 36, pp. 274–299). 18 See on this point Eduard Meyer, Sumerier und Semiten in Babylonien (Berlin, 1906), p. 107 seq., whose view is followed in Jastrow, Civilization of Babylonia and Assyria, p. 121. 
See also Clay, Empire of the Amorites (Yale University Press, 1919), p. 23 et seq. 19 See the discussion below, p. 24 seq. 20 Dr. Poebel published an article on the tablet in OLZ, 1914, pp. 4–6, in which he called attention to the correct name for the mother of Gilgamesh, which was settled by the tablet as Ninsun. 21 Historical Texts No. 2, Column 2, 26. See the discussion in Historical and Grammatical Texts, p. 123, seq. 22 See Fostat in OLZ, 1915, p. 367. 23 Publications of the University of Pennsylvania Museum, Babylonian Section, Vol. X, No. 3 (Philadelphia, 1917). It is to be regretted that Dr. Langdon should not have given full credit to Dr. Poebel for his discovery of the tablet. He merely refers in an obscure footnote to Dr. Poebel’s having made a copy. 24 E.g., in the very first note on page 211, and again in a note on page 213. 25 Dr. Langdon neglected to copy the signs 4 šú-si = 240 which appear on the edge of the tablet. He also misunderstood the word šú-tu-ur in the colophon which he translated “written,” taking the word from a stem šaṭâru, “write.” The form šú-tu-ur is III, 1, from atâru, “to be in excess of,” and indicates, presumably, that the text is a copy “enlarged” from an older original. See the Commentary to the colophon, p. 86. 26 Museum Journal, Vol. VIII, p. 29. 27 See below, p. 23. 28 I follow the enumeration of tablets, columns and lines in Jensen’s edition, though some fragments appear to have been placed by him in a wrong position. 29 According to Bezold’s investigation, Verbalsuffixformen als Alterskriterien babylonisch-assyrischer Inschriften (Heidelberg Akad. d. Wiss., Philos.-Histor. Klasse, 1910, 9te Abhandlung), the bulk of the tablets in Ashurbanapal’s library are copies of originals dating from about 1500 B.C. It does not follow, however, that all the copies date from originals of the same period. 
Bezold reaches the conclusion on the basis of various forms for verbal suffixes, that the fragments from the Ashurbanapal Library actually date from three distinct periods ranging from before c. 1450 to c. 700 B.C. 30 “Before thou comest from the mountain, Gilgamesh in Erech will see thy dreams,” after which the dreams are recounted by the woman to Enkidu. The expression “thy dreams” means here “dreams about thee.” (Tablet I, 5, 23–24). 31 Lines 100–101. 32 In a paper read before the American Oriental Society at New Haven, April 4, 1918. 33 See the commentary to col. 4 of the Yale tablet for further details. 34 This is no doubt the correct reading of the three signs which used to be read Iz-tu-bar or Gish-du-bar. The first sign has commonly the value Gish, the second can be read Gin or Gi (Brünnow No. 11900) and the third Mash as well as Bar. See Ungnad in Ungnad-Gressmann, Das Gilgamesch-Epos, p. 76, and Poebel, Historical and Grammatical Texts, p. 123. 35 So also in Sumerian (Zimmern, Sumerische Kultlieder aus altbabylonischer Zeit, No. 196, rev. 14 and 16.) 36 The sign used, LUM (Brünnow No. 11183), could have the value ḫu as well as ḫum. 37 The addition “father-in-law of Moses” to the name Ḫobab b. Re’uel in this passage must refer to Re’uel, and not to Ḫobab. In Judges 4, 11, the gloss “of the Bene Ḫobab, the father-in-law of Moses” must be separated into two: (1) “Bene Ḫobab,” and (2) “father-in-law of Moses.” The latter addition rests on an erroneous tradition, or is intended as a brief reminder that Ḫobab is identical with the son of Re’uel. 38 See his List of Personal Names from the Temple School of Nippur, p. 122. Ḫu-um-ba-bi-tu and ši-kin ḫu-wa-wa also occur in Omen Texts (CT XXVII, 4, 8–9 = Pl. 3, 17 = Pl. 6, 3–4 = CT XXVIII, 14, 12). The contrast to ḫuwawa is ligru, “dwarf” (CT XXVII, 4, 12 and 14 = Pl. 6, 7.9 = Pl. 3, 19). See Jastrow, Religion Babyloniens und Assyriens, II, p. 913, Note 7. 
Ḫuwawa, therefore, has the force of “monster.” 39 Ungnad-Gressmann, Das Gilgamesch-Epos, p. 111 seq. 40 Ungnad, 1. c. p. 77, called attention to this name, but failed to draw the conclusion that Ḫu(m)baba therefore belongs to the West and not to the East. 41 First pointed out by Ungnad in OLZ 1910, p. 306, on the basis of CT XVIII, 30, 10, where En-gi-dú appears in the column furnishing phonetic readings. 42 See Clay Amurru, pp. 74, 129, etc. 43 Tablet I, 2, 39–40; 3, 6–7 and 33–34; 4, 3–4. 44 Tablet I, 2, 1 and IX, 2, 16. Note also the statement about Gilgamesh that “his body is flesh of the gods” (Tablet IX, 2, 14; X, 1, 7). 45 BOR IV, p. 264. 46 Lewin, Die Scholien des Theodor bar Koni zur Patriarchengeschichte (Berlin, 1905), p. 2. See Gressmann in Ungnad-Gressmann, Das Gilgamesch-Epos, p. 83, who points out that the first element of גלמגוס compared with the second of גמיגמוס gives the exact form that we require, namely, Gilgamos. 47 Tablet I, col. 2, is taken up with this episode. 48 See Poebel, Historical and Grammatical Texts, p. 123. 49 See Poebel, Historical Texts No. 2, col. 2, 26. 50 Hilprecht, Old Babylonian Inscriptions I, 1 No. 26. 51 Delitzsch, Assyrische Lesestücke, p. 88, VI, 2–3. Cf. also CT XXV, 28(K 7659) 3, where we must evidently supply [Esigga]-tuk, for which in the following line we have again Gish-bil-ga-mesh as an equivalent. See Meissner, OLZ 1910, 99. 52 See, e.g., Barton, Haverford Collection II No. 27, Col. I, 14, etc. 53 Deimel, Pantheon Babylonicum, p. 95. 54 CT XII, 50 (K 4359) obv. 17. 55 See Barton, Origin and Development of Babylonian Writing, II, p. 99 seq., for various explanations, though all centering around the same idea of the picture of fire in some form. 56 See the passages quoted by Poebel, Historical and Grammatical Texts, p. 126. 57 E.g., Genesis 4, 20, Jabal, “the father of tent-dwelling and cattle holding;” Jubal (4, 21), “the father of harp and pipe striking.” 58 See particularly the plays (in the J. 
Document) upon the names of the twelve sons of Jacob, which are brought forward either as tribal characteristics, or as suggested by some incident or utterance by the mother at the birth of each son. 59 The designation is variously explained by Arabic writers. See Beidhawi’s Commentary (ed. Fleischer), to Súra 18, 82. 60 The writing Gish-gi-mash as an approach to the pronunciation Gilgamesh would thus represent the beginning of the artificial process which seeks to interpret the first syllable as “hero.” 61 See above, p. 27. 62 Poebel, Historical Texts, p. 115 seq. 63 Many years ago (BA III, p. 376) I equated Etana with Ethan in the Old Testament—therefore a West Semitic name. 64 See Clay, The Empire of the Amorites, p. 80. 65 Professor Clay strongly favors an Amoritic origin also for Gilgamesh. His explanation of the name is set forth in his recent work on The Empire of the Amorites, page 89, and is also referred to in his work on Amurru, page 79, and in his volume of Miscellaneous Inscriptions in the Yale Babylonian Collection, page 3, note. According to Professor Clay the original form of the hero’s name was West Semitic, and was something like Bilga-Mash, the meaning of which was perhaps “the offspring of Mash.” For the first element in this division of the name cf. Piliḳam, the name of a ruler of an early dynasty, and Balaḳ of the Old Testament. In view of the fact that the axe figures so prominently in the Epic as an instrument wielded by Gilgamesh, Professor Clay furthermore thinks it reasonable to assume that the name was interpreted by the Babylonian scribe as “the axe of Mash.” In this way he would account for the use of the determinative for weapons, which is also the sign Gish, in the name. It is certainly noteworthy that the ideogram Gish-Tún in the later form of Gish-Tún-mash = pašu, “axe,” CT XVI, 38:14b, etc. Tun also = pilaḳu “axe,” CT xii, 10:34b. 
Names with similar element (besides Piliḳam) are Belaḳu of the Hammurabi period, Bilaḳḳu of the Cassite period, etc. It is only proper to add that Professor Jastrow assumes the responsibility for the explanation of the form and etymology of the name Gilgamesh proposed in this volume. The question is one in regard to which legitimate differences of opinion will prevail among scholars until through some chance a definite decision, one way or the other, can be reached. 66 me-iḫ-rù (line 191). 67 Tablet I, 5, 23. Cf. I, 3, 2 and 29. 68 Tablet IV, 4, 7 and I, 5, 3. 69 Assyrian version, Tablet II, 3b 34, in an address of Shamash to Enkidu. 70 So Assyrian version, Tablet VIII, 3, 11. Also supplied VIII, 5, 20 and 21; and X, 1, 46–47 and 5, 6–7. 71 Tablet XII, 3, 25. 72 Ward, Seal Cylinders of Western Asia, Chap. X, and the same author’s Cylinders and other Ancient Oriental Seals—Morgan collection Nos. 19–50. 73 E.g., Ward No. 192, Enkidu has human legs like Gilgamesh; also No. 189, where it is difficult to say which is Gilgamesh, and which is Enkidu. The clothed one is probably Gilgamesh, though not infrequently Gilgamesh is also represented as nude, or merely with a girdle around his waist. 74 E.g., Ward, Nos. 173, 174, 190, 191, 195 as well as 189 and 192. 75 On the other hand, in Ward Nos. 459 and 461, the conflict between the two heroes is depicted with the heroes distinguished in more conventional fashion, Enkidu having the hoofs of an animal, and also with a varying arrangement of beard and hair. 76 See Jastrow, Religion of Babylonia and Assyria (Boston, 1898), p. 468 seq. 77 Ungnad-Gressmann, Das Gilgamesch-Epos, p. 90 seq. 78 Pennsylvania tablet, l. 198 = Assyrian version, Tablet IV, 2, 37. 79 “Enkidu blocked the gate” (Pennsylvania tablet, line 215) = Assyrian version Tablet IV, 2, 46: “Enkidu interposed his foot at the gate of the family house.” 80 Pennsylvania tablet, lines 218 and 224. 81 Yale tablet, line 198; also to be supplied lines 13–14. 
82 Yale tablet, lines 190 and 191. 83 PSBA 1914, 65 seq. = Jensen III, 1a, 4–11, which can now be completed and supplemented by the new fragment. 84 I.e., Enkidu will save Gilgamesh. 85 These two lines impress one as popular sayings—here applied to Enkidu. 86 King’s fragment, col. I, 13–27, which now enables us to complete Jensen III, 1a, 12–21. 87 Yale tablet, lines 252–253. 88 Yale tablet, lines 143–148 = Assyrian version, Tablet IV, 6, 26 seq. 89 Assyrian version, Tablet III, 2a, 13–14. 90 Lines 215–222. 91 Assyrian version, Tablet V, Columns 3–4. We have to assume that in line 13 of column 4 (Jensen, p. 164), Enkidu takes up the thread of conversation, as is shown by line 22: “Enkidu brought his dream to him and spoke to Gilgamesh.” 92 Assyrian version, Tablet VI, lines 146–147. 93 Lines 178–183. 94 Lines 176–177. 95 Tablet VII, Column 6. 96 Assyrian version, Tablet VI, 200–203. These words are put into the mouth of Gilgamesh (lines 198–199). It is, therefore, unlikely that he would sing his own praise. Both Jensen and Ungnad admit that Enkidu is to be supplied in at least one of the lines. 97 Lines 109 and 112. 98 Assyrian version, Tablet IX, 1, 8–9. 99 Tablet VIII, 5, 2–6. 100 So also Gressmann in Ungnad-Gressmann, Das Gilgamesch-Epos, p. 97, regards Enkidu as the older figure. 101 See Jastrow, Adam and Eve in Babylonian Literature, AJSL, Vol. 15, pp. 193–214. 102 Assyrian version, Tablet I, 2, 31–36. 103 It will be recalled that Enkidu is always spoken of as “born in the field.” 104 Note the repetition ibtani “created” in line 33 of the “man of Anu” and in line 35 of the offspring of Ninib. The creation of the former is by the “heart,” i.e., by the will of Aruru, the creation of the latter is an act of moulding out of clay. 105 Tablet I, Column 3. 106 Following as usual the enumeration of lines in Jensen’s edition. 
107 An analogy does not involve a dependence of one tale upon the other, but merely that both rest on similar traditions, which may have arisen independently. 108 Note that the name of Eve is not mentioned till after the fall (Genesis 3, 20). Before that she is merely ishsha, i.e., “woman,” just as in the Babylonian tale the woman who guides Enkidu is ḫarimtu, “woman.” 109 “And he drank and became drunk” (Genesis 9, 21). 110 “His heart became glad and his face shone” (Pennsylvania Tablet, lines 100–101). 111 That in the combination of this Enkidu with tales of primitive man, inconsistent features should have been introduced, such as the union of Enkidu with the woman as the beginning of a higher life, whereas the presence of a hunter and his father shows that human society was already in existence, is characteristic of folk-tales, which are indifferent to details that may be contradictory to the general setting of the story. 112 Pennsylvania tablet, lines 102–104. 113 Line 105. 114 Tablet I, 1, 9. See also the reference to the wall of Erech as an “old construction” of Gilgamesh, in the inscription of An-Am in the days of Sin-gamil (Hilprecht, Old Babylonian Inscriptions, I, No. 26.) Cf IV R² 52, 3, 53. 115 The invariable designation in the Assyrian version as against Uruk ribîtim, “Erech of the plazas,” in the old Babylonian version. 116 In Ungnad-Gressmann, Das Gilgamesch-Epos, p. 123 seq. 117 See Jensen, p. 266. Gilgamesh is addressed as “judge,” as the one who inspects the divisions of the earth, precisely as Shamash is celebrated. In line 8 of the hymn in question, Gilgamesh is in fact addressed as Shamash. 118 The darkness is emphasized with each advance in the hero’s wanderings (Tablet IX, col. 5). 119 This tale is again a nature myth, marking the change from the dry to the rainy season. The Deluge is an annual occurrence in the Euphrates Valley through the overflow [50n]of the two rivers. 
Only the canal system, directing the overflow into the fields, changed the curse into a blessing. In contrast to the Deluge, we have in the Assyrian creation story the drying up of the primeval waters so that the earth makes its appearance with the change from the rainy to the dry season. The world is created in the spring, according to the Akkadian view which is reflected in the Biblical creation story, as related in the P. document. See Jastrow, Sumerian and Akkadian Views of Beginnings (JAOS, Vol 36, p. 295 seq.). 120 Aš-am in Sumerian corresponding to the Akkadian Šabaṭu, which conveys the idea of destruction. 121 The month is known as the “Mission of Ishtar” in Sumerian, in allusion to another nature myth which describes Ishtar’s disappearance from earth and her mission to the lower world. 122 Historical Texts No. 1. The Sumerian name of the survivor is Zi-ū-gíd-du or perhaps Zi-ū-sū-du (cf. King, Legends of Babylon and Egypt, p. 65, note 4), signifying “He who lengthened the day of life,” i.e., the one of long life, of which Ut-napishtim (“Day of Life”) in the Assyrian version seems to be an abbreviated Akkadian rendering, [n]with the omission of the verb. So King’s view, which is here followed. See also CT XVIII, 30, 9, and Langdon, Sumerian Epic of Paradise, p. 90, who, however, enters upon further speculations that are fanciful. 123 See the translation in Ungnad-Gressmann, Das Gilgamesch-Epos, pp. 69, seq. and 73. 124 According to Professor Clay, quite certainly Amurru, just as in the case of Enkidu. 125 Gressmann in Ungnad-Gressmann, Das Gilgamesch-Epos, p. 100 seq. touches upon this motif, but fails to see the main point that the companions are also twins or at least brothers. Hence such examples as Abraham and Lot, David and Jonathan, Achilles and Patroclus, Eteokles and Polyneikes, are not parallels to Gilgamesh-Enkidu, but belong to the enlargement of the motif so as to include companions who are not regarded as brothers. 126 Or Romus. 
See Rendell Harris, l. c., p. 59, note 2. 127 One might also include the primeval pair Yama-Yami with their equivalents in Iranian mythology (Carnoy, Iranian Mythology, p. 294 seq.). 128 Becoming, however, a triad and later increased to seven. Cf. Rendell Harris, l. c., p. 32. 129 I am indebted to my friend, Professor A. J. Carnoy, of the University of Louvain, for having kindly gathered and placed at my disposal material on the “twin-brother” motif from Indo-European sources, supplemental to Rendell Harris’ work. 130 On the other hand, Uruk mâtum for the district of Erech, i.e., the territory over which the city holds sway, appears in both versions (Pennsylvania tablet, 1. 10 = Assyrian version I, 5, 36). 131 “My likeness” (line 27). It should be noted, however, that lines 32–44 of I, 5, in Jensen’s edition are part of a fragment K 9245 (not published, but merely copied by Bezold and Johns, and placed at Jensen’s disposal), which may represent a duplicate to I, 6, 23–34, with which it agrees entirely except for one line, viz., line 34 of K 9245 which is not found in column 6, 23–34. If this be correct, then there is lacking after line 31 of column 5, the interpretation of the dream given in the Pennsylvania tablet in lines 17–23. 132 ina šap-li-ki, literally, “below thee,” whereas in the old Babylonian version we have ana ṣi-ri-ka, “towards thee.” 133 Repeated I, 6, 28. 134 ul-tap-rid ki-is-su-šú-ma. The verb is from parâdu, “violent.” For kissu, “strong,” see CT XVI, 25, 48–49. Langdon (Gilgamesh Epic, p. 211, note 5) renders the phrase: “he shook his murderous weapon!!”—another illustration of his haphazard way of translating texts. 135 Shown by the colophon (Jeremias, Izdubar-Nimrod, Plate IV.) 136 Lines 42–43 must be taken as part of the narrative of the compiler, who tells us that after the woman had informed Enkidu that Gilgamesh already knew of Enkidu’s coming through dreams interpreted by Ninsun, Gilgamesh actually set out and encountered Enkidu. 
137 Tablet I, col. 4. See also above, p. 19. 138 IV, 2, 44–50. The word ullanum, (l.43) “once” or “since,” points to the following being a reference to a former recital, and not an original recital. 139 Only the lower half (Haupt’s edition, p. 82) is preserved. 140 “The eyes of Enkidu were filled with tears,” corresponding to IV, 4, 10. 141 Unless indeed the number “seven” is a slip for the sign ša. See the commentary to the line. Pennsylvania Tablet The 240 lines of the six columns of the text are enumerated in succession, with an indication on the margin where a new column begins. This method, followed also in the case of the Yale tablet, seems preferable to Langdon’s breaking up of the text into Obverse and Reverse, with a separate enumeration for each of the six columns. In order, however, to facilitate a comparison with Langdon’s edition, a table is added: Obverse Col. I, 1 = Line 1 of our text. ,, I, 5 = ,, 5 ,, ,, ,, ,, I, 10 = ,, 10 ,, ,, ,, ,, I, 15 = ,, 15 ,, ,, ,, ,, I, 20 = ,, 20 ,, ,, ,, ,, I, 25 = ,, 25 ,, ,, ,, ,, I, 30 = ,, 30 ,, ,, ,, ,, I, 35 = ,, 35 ,, ,, ,, Col. II, 1 = Line 41 ,, ,, ,, ,, II, 5 = ,, 45 ,, ,, ,, ,, II, 10 = ,, 50 ,, ,, ,, ,, II, 15 = ,, 55 ,, ,, ,, ,, II, 20 = ,, 60 ,, ,, ,, ,, II, 25 = ,, 65 ,, ,, ,, ,, II, 30 = ,, 70 ,, ,, ,, ,, II, 35 = ,, 75 ,, ,, ,, Col. III, 1 = Line 81 ,, ,, ,, ,, III, 5 = ,, 85 ,, ,, ,, ,, III, 10 = ,, 90 ,, ,, ,, ,, III, 15 = ,, 95 ,, ,, ,, ,, III, 26 = ,, 100 ,, ,, ,, ,, III, 25 = ,, 105 ,, ,, ,, ,, III, 30 = ,, 110 ,, ,, ,, ,, III, 35 = ,, 115 ,, ,, ,, Reverse Col. I, 1 (= Col. IV) = Line 131 of our text. ,, I, 5 = ,, 135 ,, ,, ,, ,, I, 10 = ,, 140 ,, ,, ,, ,, I, 15 = ,, 145 ,, ,, ,, ,, I, 20 = ,, 150 ,, ,, ,, ,, I, 25 = ,, 155 ,, ,, ,, ,, I, 30 = ,, 160 ,, ,, ,, ,, II, 1 (= Col. V) = Line 171 ,, ,, ,, ,, II, 5 = ,, 175 ,, ,, ,, ,, II, 10 = ,, 180 ,, ,, ,, ,, II, 15 = ,, 185 ,, ,, ,, ,, II, 20 = ,, 190 ,, ,, ,, ,, II, 25 = ,, 195 ,, ,, ,, ,, II, 30 = ,, 200 ,, ,, ,, ,, III, 1 (= Col. 
VI) = Line 208 ,, ,, ,, ,, III, 5 = ,, 212 ,, ,, ,, ,, III, 10 = ,, 217 ,, ,, ,, ,, III, 15 = ,, 222 ,, ,, ,, ,, III, 20 = ,, 227 ,, ,, ,, ,, III, 25 = ,, 232 ,, ,, ,, ,, III, 30 = ,, 237 ,, ,, ,, ,, III, 33 = ,, 240 ,, ,, ,, [62] Pennsylvania Tablet. Transliteration. Col. I. 1it-bi-e-ma dGiš šú-na-tam i-pa-áš-šar 2iz-za-kàr-am a-na um-mi-šú 3um-mi i-na šá-at mu-ši-ti-ia 4šá-am-ḫa-ku-ma at-ta-na-al-la-ak 5i-na bi-ri-it it-lu-tim 6ib-ba-šú-nim-ma ka-ka-bu šá-ma-i 7[ki]-iṣ-rù šá A-nim im-ḳu-ut a-na ṣi-ri-ia 8áš-ši-šú-ma ik-ta-bi-it e-li-ia 9ú-ni-iš-šú-ma nu-uš-šá-šú ú-ul il-ti-’i 10Urukki ma-tum pa-ḫi-ir e-li-šú 11it-lu-tum ú-na-šá-ku ši-pi-šú 12ú-um-mi-id-ma pu-ti 13i-mi-du ia-ti 14áš-ši-a-šú-ma ab-ba-la-áš-šú a-na ṣi-ri-ki 15um-mi dGiš mu-di-a-at ka-la-ma 16iz-za-kàr-am a-na dGiš 17mi-in-di dGiš šá ki-ma ka-ti 18i-na ṣi-ri i-wa-li-id-ma 19ú-ra-ab-bi-šú šá-du-ú 20ta-mar-šú-ma [kima Sal(?)] ta-ḫa-du at-ta 21it-lu-tum ú-na-šá-ku ši-pi-šú 22tí-iṭ-ṭi-ra-áš-[šú tu-ut]-tu-ú-ma 23ta-tar-ra-[as-su] a-na ṣi-[ri]-ia 24[uš]-ti-nim-ma i-ta-mar šá-ni-tam[63] 25[šú-na]-ta i-ta-wa-a-am a-na um-mi-šú 26[um-mi] a-ta-mar šá-ni-tam 27[šú-na-tu a-ta]-mar e-mi-a i-na su-ḳi-im 28[šá Uruk]ki ri-bi-tim 29ḫa-aṣ-ṣi-nu na-di-i-ma 30e-li-šú pa-aḫ-ru 31ḫa-aṣ-ṣi-nu-um-ma šá-ni bu-nu-šú 32a-mur-šú-ma aḫ-ta-du a-na-ku 33a-ra-am-šú-ma ki-ma áš-šá-tim 34a-ḫa-ab-bu-ub el-šú 35el-ki-šú-ma áš-ta-ka-an-šú 36a-na a-ḫi-ia 37um-mi dGiš mu-da-at [ka]-la-ma 38[iz-za-kàr-am a-na dGiš] 39[dGiš šá ta-mu-ru amêlu] 40[ta-ḫa-ab-bu-ub ki-ma áš-šá-tim el-šú] Col. II. 41áš-šum uš-[ta]-ma-ḫa-ru it-ti-ka 42dGiš šú-na-tam i-pa-šar 43dEn-ki-[dũ wa]-ši-ib ma-ḫar ḫa-ri-im-tim 44ur-[šá ir]-ḫa-mu di-da-šá(?) 
ip-tí-[e] 45[dEn-ki]-dũ im-ta-ši a-šar i-wa-al-du 46ûm, 6 ù 7 mu-ši-a-tim 47dEn-[ki-dũ] ti-bi-i-ma 48šá-[am-ka-ta] ir-ḫi 49ḫa-[ri-im-tum pa-a]-šá i-pu-šá-am-ma 50iz-za-[kàr-am] a-na dEn-ki-dũ 51a-na-tal-ka dEn-ki-dũ ki-ma ili ta-ba-áš-ši 52am-mi-nim it-ti na-ma-áš-te-e 53ta-at-ta-[na-al]-ak ṣi-ra-am[64] 54al-kam lu-úr-di-ka 55a-na libbi [Urukki] ri-bi-tim 56a-na bît [el]-lim mu-šá-bi šá A-nim 57dEn-ki-dũ ti-bi lu-ru-ka 58a-na Ê-[an]-na mu-šá-bi šá A-nim 59a-šar [dGiš gi]-it-ma-[lu] ne-pi-ši-tim 60ù at-[ta] ki-[ma Sal ta-ḫa]-bu-[ub]-šú 61ta-[ra-am-šú ki-ma] ra-ma-an-ka 62al-ka ti-ba i-[na] ga-ag-ga-ri 63ma-a-ag-ri-i-im 64iš-me a-wa-as-sa im-ta-ḫar ga-ba-šá 65mi-il-[kum] šá aššatim 66im-ta-ḳu-ut a-na libbi-šú 67iš-ḫu-ut li-ib-šá-am 68iš-ti-nam ú-la-ab-bi-iš-sú 69li-ib-[šá-am] šá-ni-a-am 70ši-i it-ta-al-ba-áš 71ṣa-ab-tat ga-as-su 72ki-ma [ili] i-ri-id-di-šú 73a-na gu-up-ri šá-ri-i-im 74a-šar tar-ba-ṣi-im 75i-na [áš]-ri-šú [im]-ḫu-ru ri-ia-ú 76[ù šú-u dEn-ki-dũ i-lit-ta-šú šá-du-um-ma] 77[it-ti ṣabâti-ma ik-ka-la šam-ma] 78[it-ti bu-lim maš-ḳa-a i-šat-ti] 79[it-ti na-ma-áš-te-e mê i-ṭab lib-ba-šú] (Perhaps one additional line missing.) Col. III. 
81ši-iz-ba šá na-ma-áš-te-e 82i-te-en-ni-ik 83a-ka-lam iš-ku-nu ma-ḫar-šú 84ib-tí-ik-ma i-na-at-tal 85ù ip-pa-al-la-as[65] 86ú-ul i-di dEn-ki-dũ 87aklam a-na a-ka-lim 88šikaram a-na šá-te-e-im 89la-a lum-mu-ud 90ḫa-ri-im-tum pi-šá i-pu-šá-am-ma 91iz-za-kàr-am a-na dEn-ki-dũ 92a-ku-ul ak-lam dEn-ki-dũ 93zi-ma-at ba-la-ṭi-im 94šikaram ši-ti ši-im-ti ma-ti 95i-ku-ul a-ak-lam dEn-ki-dũ 96a-di ši-bi-e-šú 97šikaram iš-ti-a-am 987 aṣ-ṣa-am-mi-im 99it-tap-šar kab-ta-tum i-na-an-gu 100i-li-iṣ libba-šú-ma 101pa-nu-šú [it]-tam-ru 102ul-tap-pi-it [lùŠÚ]-I 103šú-ḫu-ra-am pa-ga-ar-šú 104šá-am-nam ip-ta-šá-áš-ma 105a-we-li-iš i-we 106il-ba-áš li-ib-šá-am 107ki-ma mu-ti i-ba-áš-ši 108il-ki ka-ak-ka-šú 109la-bi ú-gi-ir-ri 110uš-sa-ak-pu re’ûti mu-ši-a-tim 111ut-tap-pi-iš šib-ba-ri 112la-bi uk-ta-ši-id 113it-ti-[lu] na-ki-[di-e] ra-bu-tum 114dEn-ki-dũ ma-aṣ-ṣa-ar-šú-nu 115a-we-lum giš-ru-um 116iš-te-en it-lum 117a-na [na-ki-di-e(?) i]-za-ak-ki-ir (About five lines missing.) Col. IV. (About eight lines missing.) 131i-ip-pu-uš ul-ṣa-am 132iš-ši-ma i-ni-i-šú 133i-ta-mar a-we-lam[66] 134iz-za-kàr-am a-na ḫarimtim 135šá-am-ka-at uk-ki-ši a-we-lam 136a-na mi-nim il-li-kam 137zi-ki-ir-šú lu-uš-šú 138ḫa-ri-im-tum iš-ta-si a-we-lam 139i-ba-uš-su-um-ma i-ta-mar-šú 140e-di-il e-eš ta-ḫi-[il-la]-am 141lim-nu a-la-ku ma-na-aḫ-[ti]-ka 142e-pi-šú i-pu-šá-am-ma 143iz-za-kàr-am a-na dEn-[ki-dũ] 144bi-ti-iš e-mu-tim ik …… 145ši-ma-a-at ni-ši-i-ma 146tu-a-(?)-ar e-lu-tim 147a-na âli(?) dup-šak-ki-i e-ṣi-en 148uk-la-at âli(?) e-mi-sa a-a-ḫa-tim 149a-na šarri šá Urukki ri-bi-tim 150pi-ti pu-uk epiši(-ši) a-na ḫa-a-a-ri 151a-na dGiš šarri šá Urukki ri-bi-tim 152pi-ti pu-uk epiši(-ši) 153a-na ḫa-a-a-ri 154áš-ša-at ši-ma-tim i-ra-aḫ-ḫi 155šú-ú pa-na-nu-um-ma 156mu-uk wa-ar-ka-nu 157i-na mi-il-ki šá ili ga-bi-ma 158i-na bi-ti-iḳ a-bu-un-na-ti-šú 159ši-ma-as-su 160a-na zi-ik-ri it-li-im 161i-ri-ku pa-nu-šú (About three lines missing.) [67] Col. V. (About six lines missing.) 
171i-il-la-ak [dEn-ki-dũ i-na pa-ni] 172u-šá-am-ka-at [wa]-ar-ki-šú 173i-ru-ub-ma a-na libbi Urukki ri-bi-tim 174ip-ḫur um-ma-nu-um i-na ṣi-ri-šú 175iz-zi-za-am-ma i-na su-ḳi-im 176šá Urukki ri-bi-tim 177pa-aḫ-ra-a-ma ni-šú 178i-ta-wa-a i-na ṣi-ri-šú 179a-na ṣalam dGiš ma-ši-il pi-it-tam 180la-nam šá-pi-il 181si-ma …. [šá-ki-i pu]-uk-ku-ul 182............. i-pa-ka-du 183i-[na mâti da-an e-mu]-ki i-wa 184ši-iz-ba šá na-ma-aš-te-e 185i-te-en-ni-ik 186ka-a-a-na i-na [libbi] Urukki kak-ki-a-tum 187it-lu-tum ú-te-el-li-lu 188šá-ki-in ur-šá-nu 189a-na itli šá i-šá-ru zi-mu-šú 190a-na dGiš ki-ma i-li-im 191šá-ki-iš-šum me-iḫ-rù 192a-na dIš-ḫa-ra ma-a-a-lum 193na-di-i-ma 194dGiš it-[ti-il-ma wa-ar-ka-tim] 195i-na mu-ši in-ni-[ib-bi]-it 196i-na-ag-šá-am-ma 197it-ta-[zi-iz dEn-ki-dũ] i-na sûḳim 198ip-ta-ra-[aṣ a-la]-ak-tam 199šá dGiš 200[a-na e-pi-iš] da-na-ni-iš-šú (About three lines missing.) [68] Col. VI. (About four lines missing.) 208šar(?)-ḫa 209dGiš … 210i-na ṣi-ri-[šú il-li-ka-am dEn-ki-dũ] 211i-ḫa-an-ni-ib [pi-ir-ta-šú] 212it-bi-ma [il-li-ik] 213a-na pa-ni-šú 214it-tam-ḫa-ru i-na ri-bi-tum ma-ti 215dEn-ki-dũ ba-ba-am ip-ta-ri-ik 216i-na ši-pi-šú 217dGiš e-ri-ba-am ú-ul id-di-in 218iṣ-ṣa-ab-tu-ma ki-ma li-i-im 219i-lu-du 220zi-ip-pa-am ’i-bu-tu 221i-ga-rum ir-tu-tu 222dGiš ù dEn-ki-dũ 223iṣ-ṣa-ab-tu-ú-ma 224ki-ma li-i-im i-lu-du 225zi-ip-pa-am ’i-bu-tu 226i-ga-rum ir-tu-tú 227ik-mi-is-ma dGiš 228i-na ga-ag-ga-ri ši-ip-šú 229ip-ši-iḫ uz-za-šú-ma 230i-ni-iḫ i-ra-as-su 231iš-tu i-ra-su i-ni-ḫu 232dEn-ki-dũ a-na šá-ši-im 233iz-za-kàr-am a-na dGiš 234ki-ma iš-te-en-ma um-ma-ka 235ú-li-id-ka 236ri-im-tum šá su-pu-ri 237dNin-sun-na 238ul-lu e-li mu-ti ri-eš-ka 239šar-ru-tú šá ni-ši 240i-ši-im-kum dEn-lil 241 duppu 2 kam-ma 242šú-tu-ur e-li ………………… 243 4 šú-ši [62] Translation. Col. I. 1Gish sought to interpret the dream; 2Spoke to his mother: 3“My mother, during my night 4I became strong and moved about 5among the heroes; 6And from the starry heaven 7A meteor(?) 
of Anu fell upon me: 8I bore it and it grew heavy upon me, 9I became weak and its weight I could not endure. 10The land of Erech gathered about it. 11The heroes kissed its feet.1 12It was raised up before me. 13They stood me up.2 14I bore it and carried it to thee.” 15The mother of Gish, who knows all things, 16Spoke to Gish: 17“Some one, O Gish, who like thee 18In the field was born and 19Whom the mountain has reared, 20Thou wilt see (him) and [like a woman(?)] thou wilt rejoice. 21Heroes will kiss his feet. 22Thou wilt spare [him and wilt endeavor] 23To lead him to me.” 24He slept and saw another[63] 25Dream, which he reported to his mother: 26[“My mother,] I have seen another 27[Dream.] My likeness I have seen in the streets 28[Of Erech] of the plazas. 29An axe was brandished, and 30They gathered about him; 31And the axe made him angry. 32I saw him and I rejoiced, 33I loved him as a woman, 34I embraced him. 35I took him and regarded him 36As my brother.” 37The mother of Gish, who knows all things, 38[Spoke to Gish]: 39[“O Gish, the man whom thou sawest,] 40[Whom thou didst embrace like a woman]. Col. II. 41(means) that he is to be associated with thee.” 42Gish understood the dream. 43[As] Enki[du] was sitting before the woman, 44[Her] loins(?) he embraced, her vagina(?) he opened. 45[Enkidu] forgot the place where he was born. 46Six days and seven nights 47Enkidu continued 48To cohabit with [the courtesan]. 49[The woman] opened her [mouth] and 50Spoke to Enkidu: 51“I gaze upon thee, O Enkidu, like a god art thou! 52Why with the cattle 53Dost thou [roam] across the field?[64] 54Come, let me lead thee 55into [Erech] of the plazas, 56to the holy house, the dwelling of Anu, 57O, Enkidu arise, let me conduct thee 58To Eanna, the dwelling of Anu, 59The place [where Gish is, perfect] in vitality. 60And thou [like a wife wilt embrace] him. 61Thou [wilt love him like] thyself. 62Come, arise from the ground 63(that is) cursed.” 64He heard her word and accepted her speech. 
65The counsel of the woman 66Entered his heart. 67She stripped off a garment, 68Clothed him with one. 69Another garment 70She kept on herself. 71She took hold of his hand. 72Like [a god(?)] she brought him 73To the fertile meadow, 74The place of the sheepfolds. 75In that place they received food; 76[For he, Enkidu, whose birthplace was the mountain,] 77[With the gazelles he was accustomed to eat herbs,] 78[With the cattle to drink water,] 79[With the water beings he was happy.] (Perhaps one additional line missing.) Col. III. 81Milk of the cattle 82He was accustomed to suck. 83Food they placed before him, 84He broke (it) off and looked 85And gazed.[65] 86Enkidu had not known 87To eat food. 88To drink wine 89He had not been taught. 90The woman opened her mouth and 91Spoke to Enkidu: 92“Eat food, O Enkidu, 93The provender of life! 94Drink wine, the custom of the land!” 95Enkidu ate food 96Till he was satiated. 97Wine he drank, 98Seven goblets. 99His spirit was loosened, he became hilarious. 100His heart became glad and 101His face shone. 102[The barber(?)] removed 103The hair on his body. 104He was anointed with oil. 105He became manlike. 106He put on a garment, 107He was like a man. 108He took his weapon; 109Lions he attacked, 110(so that) the night shepherds could rest. 111He plunged the dagger; 112Lions he overcame. 113The great [shepherds] lay down; 114Enkidu was their protector. 115The strong man, 116The unique hero, 117To [the shepherds(?)] he speaks: (About five lines missing.) Col. IV. (About eight lines missing.) 131Making merry. 132He lifted up his eyes, 133He sees the man.[66] 134He spoke to the woman: 135“O, courtesan, lure on the man. 136Why has he come to me? 137His name I will destroy.” 138The woman called to the man 139Who approaches to him3 and he beholds him. 140“Away! 
why dost thou [quake(?)] 141Evil is the course of thy activity.”4 142Then he5 opened his mouth and 143Spoke to Enkidu: 144“[To have (?)] a family home 145Is the destiny of men, and 146The prerogative(?) of the nobles. 147For the city(?) load the workbaskets! 148Food supply for the city lay to one side! 149For the King of Erech of the plazas, 150Open the hymen(?), perform the marriage act! 151For Gish, the King of Erech of the plazas, 152Open the hymen(?), 153Perform the marriage act! 154With the legitimate wife one should cohabit. 155So before, 156As well as in the future.6 157By the decree pronounced by a god, 158From the cutting of his umbilical cord 159(Such) is his fate.” 160At the speech of the hero 161His face grew pale. (About three lines missing.) [67] Col. V. (About six lines missing.) 171[Enkidu] went [in front], 172And the courtesan behind him. 173He entered into Erech of the plazas. 174The people gathered about him. 175As he stood in the streets 176Of Erech of the plazas, 177The men gathered, 178Saying in regard to him: 179“Like the form of Gish he has suddenly become; 180shorter in stature. 181[In his structure high(?)], powerful, 182.......... overseeing(?) 183In the land strong of power has he become. 184Milk of cattle 185He was accustomed to suck.” 186Steadily(?) in Erech ..... 187The heroes rejoiced. 188He became a leader. 189To the hero of fine appearance, 190To Gish, like a god, 191He became a rival to him.7 192For Ishḫara a couch 193Was stretched, and 194Gish [lay down, and afterwards(?)] 195In the night he fled. 196He approaches and 197[Enkidu stood] in the streets. 198He blocked the path 199of Gish. 200At the exhibit of his power, (About three lines missing.) [68] Col. VI. (About four lines missing.) 208Strong(?) … 209Gish 210Against him [Enkidu proceeded], 211[His hair] luxuriant. 212He started [to go] 213Towards him. 214They met in the plaza of the district. 215Enkidu blocked the gate 216With his foot, 217Not permitting Gish to enter. 
218They seized (each other), like oxen, 219They fought. 220The threshold they demolished; 221The wall they impaired. 222Gish and Enkidu 223Seized (each other). 224Like oxen they fought. 225The threshold they demolished; 226The wall they impaired. 227Gish bent 228His foot to the ground,8 229His wrath was appeased, 230His breast was quieted. 231When his breast was quieted, 232Enkidu to him 233Spoke, to Gish: 234“As a unique one, thy mother 235bore thee. 236The wild cow of the stall,9 237Ninsun, 238Has exalted thy head above men. 239Kingship over men 240Enlil has decreed for thee. 241Second tablet, 242enlarged beyond [the original(?)]. 243240 lines. [69] 1 I.e., paid homage to the meteor. 2 I.e., the heroes of Erech raised me to my feet, or perhaps in the sense of “supported me.” 3 I.e., Enkidu. 4 I.e., “thy way of life.” 5 I.e., the man. 6 I.e., an idiomatic phrase meaning “for all times.” 7 I.e., Enkidu became like Gish, godlike. Cf. col. 2, 11. 8 He was thrown and therefore vanquished. 9 Epithet given to Ninsun. See the commentary to the line. Commentary on the Pennsylvania Tablet. Line 1. The verb tibû with pašâru expresses the aim of Gish to secure an interpretation for his dream. This disposes of Langdon’s note 1 on page 211 of his edition, in which he also erroneously speaks of our text as “late.” Pašâru is not a variant of zakâru. Both verbs occur just as here in the Assyrian version I, 5, 25. Line 3. ina šât mušitia, “in this my night,” i.e., in the course of this night of mine. A curious way of putting it, but the expression occurs also in the Assyrian version, e.g., I, 5, 26 (parallel passage to ours) and II, 4a, 14. In the Yale tablet we find, similarly, mu-ši-it-ka (l. 262), “thy night,” i.e., “at night to thee.” Line 5. Before Langdon put down the strange statement of Gish “wandering about in the midst of omens” (misreading id-da-tim for it-lu-tim), he might have asked himself the question, what it could possibly mean. How can one walk among omens? 
Line 6. ka-ka-bu šá-ma-i must be taken as a compound term for “starry heaven.” The parallel passage in the Assyrian version (Tablet I, 5, 27) has the ideograph for star, with the plural sign as a variant. Literally, therefore, “The starry heaven (or “the stars in heaven”) was there,” etc. Langdon’s note 2 on page 211 rests on an erroneous reading. Line 7. kiṣru šá Anim, “mass of Anu,” appears to be the designation of a meteor, which might well be described as a “mass” coming from Anu, i.e., from the god of heaven who becomes the personification of the heavens in general. In the Assyrian version (I, 5, 28) we have kima ki-iṣ-rù, i.e., “something like a mass of heaven.” Note also I, 3, 16, where in a description of Gilgamesh, his strength is said to be “strong like a mass (i.e., a meteor) of heaven.” Line 9. For nuššašu ûl iltê we have a parallel in the Hebrew phrase נִלְאֵיתִי נְשֹׂא (Isaiah 1, 14). Line 10. Uruk mâtum, as the designation for the district of Erech, occurs in the Assyrian version, e.g., I, 5, 31, and IV, 2, 38; also to be supplied, I, 6, 23. For paḫir the parallel in the Assyrian version has iz-za-az (I, 5, 31), but VI, 197, we find paḫ-ru and paḫ-ra. Line 17. mi-in-di does not mean “truly” as Langdon translates, but “some one.” It occurs also in the Assyrian version X, 1, 13, mi-in-di-e ma-an-nu-ṵ, “this is some one who,” etc. [70] Line 18. Cf. Assyrian version I, 5, 3, and IV, 4, 7, ina ṣiri âlid—both passages referring to Enkidu. Line 21. Cf. Assyrian version II, 3b, 38, with malkê, “kings,” as a synonym of itlutum. Line 23. ta-tar-ra-as-sú from tarâṣu, “direct,” “guide,” etc. Line 24. I take uš-ti-nim-ma as III, 2, from išênu (יָשֵׁן), the verb underlying šittu, “sleep,” and šuttu, “dream.” Line 26. Cf. Assyrian version I, 6, 21—a complete parallel. Line 28. Uruk ri-bi-tim, the standing phrase in both tablets of the old Babylonian version, for which in the Assyrian version we have Uruk su-pu-ri. 
The former term suggests the “broad space” outside of the city or the “common” in a village community, while supûri, “enclosed,” would refer to the city within the walls. Dr. W. F. Albright (in a private communication) suggests “Erech of the plazas” as a suitable translation for Uruk ribîtim. A third term, Uruk mâtum (see above, note to line 10), though designating rather the district of which Erech was the capital, appears to be used as a synonym to Uruk ribîtim, as may be concluded from the phrase i-na ri-bi-tum ma-ti (l. 214 of the Pennsylvania tablet), which clearly means the “plaza” of the city. One naturally thinks of רְחֹבֹת עִיר in Genesis 10, 11—the equivalent of Babylonian ri-bi-tu âli—which can hardly be the name of a city. It appears to be a gloss, as is הִוא הָעִיר הַגְּדֹלָה at the end of v. 12. The latter gloss is misplaced, since it clearly describes “Nineveh,” mentioned in v. 11. Inasmuch as רְחֹבֹת עִיר immediately follows the mention of Nineveh, it seems simplest to take the phrase as designating the “outside” or “suburbs” of the city, a complete parallel, therefore, to ri-bi-tu mâti in our text. Nineveh, together with the “suburbs,” forms the “great city.” Uruk ribîtim is, therefore, a designation for “greater Erech,” proper to a capital city, which by its gradual growth would take in more than its original confines. “Erech of the plazas” must have come to be used as an honorific designation of this important center as early as 2000 B. C., whereas later, perhaps because of its decline, the epithet no longer seemed appropriate and was replaced by the more modest designation of “walled Erech,” with an allusion to the tradition which ascribed the building of the wall of the city to Gilgamesh. At all [71]events, all three expressions, “Erech of the plazas,” “Erech walled” and “Erech land,” are to be regarded as synonymous. The position once held by Erech follows also from its ideographic designation (Brünnow No. 
4796) by the sign “house” with a “gunufied” extension, which conveys the idea of Unu = šubtu, or “dwelling” par excellence. The pronunciation Unug or Unuk (see the gloss u-nu-uk, VR 23, 8a), composed of unu, “dwelling,” and ki, “place,” is hardly to be regarded as older than Uruk, which is to be resolved into uru, “city,” and ki, “place,” but rather as a play upon the name, both Unu + ki and Uru + ki conveying the same idea of the city or the dwelling place par excellence. As the seat of the second oldest dynasty according to Babylonian traditions (see Poebel’s list in Historical and Grammatical Texts No. 2), Erech no doubt was regarded as having been at one time “the city,” i.e., the capital of the entire Euphrates Valley. Line 31. A difficult line for which Langdon proposes the translation: “Another axe seemed his visage”!!—which may be picturesque, but hardly a description befitting a hero. How can a man’s face seem to be an axe? Langdon attaches šá-ni in the sense of “second” to the preceding word “axe,” whereas šanî bunušu, “change of his countenance” or “his countenance being changed,” is to be taken as a phrase to convey the idea of “being disturbed,” “displeased” or “angry.” The phrase is of the same kind as the well-known šunnu ṭêmu, “changing of reason,” to denote “insanity.” See the passages in Muss-Arnolt, Assyrian Dictionary, pp. 355 and 1068. In Hebrew, too, we have the same two phrases, e.g., וַיְשַׁנּוֹ אֶת־טַעְמוֹ (I Sam. 21, 14 = Ps. 34, 1), “and he changed his reason,” i.e., feigned insanity and מְשַׁנֶּה פָּנָיו (Job 14, 20), “changing his face,” to indicate a radical alteration in the frame of mind. There is a still closer parallel in Biblical Aramaic: Dan. 3, 19, “The form of his visage was changed,” meaning “he was enraged.” Fortunately, the same phrase occurs also in the Yale tablet (l. 
192), šá-nu-ú bu-nu-šú, in a connection which leaves no doubt that the aroused fury of the tyrant Ḫuwawa is described by it: “Ḫuwawa heard and his face was changed” precisely, therefore, as we should say—following Biblical usage—“his countenance fell.” Cf. also the phrase pânušu arpu, “his countenance [72]was darkened” (Assyrian version I, 2, 48), to express “anger.” The line, therefore, in the Pennsylvania tablet must describe Enkidu’s anger. With the brandishing of the axe the hero’s anger was also stirred up. The touch was added to prepare us for the continuation in which Gish describes how, despite this (or perhaps just because of it), Enkidu seemed so attractive that Gish instantly fell in love with him. May perhaps the emphatic form ḫaṣinumma (line 31) against ḫaṣinu (line 29) have been used to indicate “The axe it was,” or “because of the axe”? It would be worth while to examine other texts of the Hammurabi period with a view of determining the scope in the use and meaning of the emphatic ma when added to a substantive. Line 32. The combination amur ù aḫtadu occurs also in the El-Amarna Letters, No. 18, 12. Line 34. In view of the common Hebrew, Syriac and Arabic חָבַב “to love,” it seems preferable to read here, as in the other passages in the Assyrian versions (I, 4, 15; 4, 35; 6, 27, etc.), a-ḫa-ab-bu-ub, aḫ-bu-ub, iḫ-bu-bu, etc. (instead of with p), and to render “embrace.” Lines 38–40, completing the column, may be supplied from the Assyrian version I, 6, 30–32, in conjunction with lines 33–34 of our text. The beginning of line 32 in Jensen’s version is therefore to be filled out [ta-ra-am-šú ki]-i. Line 43. The restoration at the beginning of this line En-ki-[dũ wa]-ši-ib ma-ḫar ḫa-ri-im-tim enables us to restore also the beginning of the second tablet of the Assyrian version (cf. the colophon of the fragment 81, 7–27, 93, in Jeremias, Izdubar-Nimrod, plate IV = Jensen, p. 134), [dEn-ki-dũ wa-ši-ib] ma-ḫar-šá. Line 44. 
The restoration of this line is largely conjectural, based on the supposition that its contents correspond in a general way to I, 4, 16, of the Assyrian version. The reading di-da is quite certain, as is also ip-ti-[e]; and since both words occur in the line of the Assyrian version in question, it is tempting to supply at the beginning ur-[šá] = “her loins” (cf. Holma, Namen der Körperteile, etc., p. 101), which is likewise found in the same line of the Assyrian version. At all events the line describes the fascination exercised [73]upon Enkidu by the woman’s bodily charms, which make him forget everything else. Lines 46–47 form a parallel to I, 4, 21, of the Assyrian version. The form šamkatu, “courtesan,” is constant in the old Babylonian version (ll. 135 and 172), as against šamḫatu in the Assyrian version (I, 3, 19, 40, 45; 4, 16), which also uses the plural šam-ḫa-a-ti (II, 3b, 40). The interchange between ḫ and k is not without precedent (cf. Meissner, Altbabylonisches Privatrecht, page 107, note 2, and more particularly Chiera, List of Personal Names, page 37). In view of the evidence, set forth in the Introduction, for the assumption that the Enkidu story has been combined with a tale of the evolution of primitive man to civilized life, it is reasonable to suggest that in the original Enkidu story the female companion was called šamkatu, “courtesan,” whereas in the tale of the primitive man, which was transferred to Enkidu, the associate was ḫarimtu, a “woman,” just as in the Genesis tale, the companion of Adam is simply called ishshâ, “woman.” Note that in the Assyrian parallel (Tablet I, 4, 26) we have two readings, ir-ḫi (imperf.) and a variant i-ri-ḫi (present). The former is the better reading, as our tablet shows. Lines 49–59 run parallel to the Assyrian version I, 4, 33–38, with slight variations which have been discussed above, p. 58, and from which we may conclude that the Assyrian version represents an independent redaction. 
Since in our tablet we have presumably the repetition of what may have been in part at least set forth in the first tablet of the old Babylonian version, we must not press the parallelism with the first tablet of the Assyrian version too far; but it is noticeable nevertheless (1) that our tablet contains lines 57–58 which are not represented in the Assyrian version, and (2) that the second speech of the “woman” beginning, line 62, with al-ka, “come” (just as the first speech, line 54), is likewise not found in the first tablet of the Assyrian version; which on the other hand contains a line (39) not in the Babylonian version, besides the detailed answer of Enkidu (I 4, 42–5, 5). Line 6, which reads “Enkidu and the woman went (il-li-ku) to walled Erech,” is also not found in the second tablet of the old Babylonian version. Line 63. For magrû, “accursed,” see the frequent use in Astrological texts (Jastrow, Religion Babyloniens und Assyriens II, page [74]450, note 2). Langdon, by his strange error in separating ma-a-ag-ri-im into two words ma-a-ak and ri-i-im, with a still stranger rendering: “unto the place yonder of the shepherds!!”, naturally misses the point of this important speech. Line 64 corresponds to I, 4, 40, of the Assyrian version, which has an additional line, leading to the answer of Enkidu. From here on, our tablet furnishes material not represented in the Assyrian version, but which was no doubt included in the second tablet of that version of which we have only a few fragments. Line 70 must be interpreted as indicating that the woman kept one garment for herself. Ittalbaš would accordingly mean, “she kept on.” The female dress appears to have consisted of an upper and a lower garment. Line 72. The restoration “like a god” is favored by line 51, where Enkidu is likened to a god, and is further confirmed by l. 190. Line 73. gupru is identical with gu-up-ri (Thompson, Reports of the Magicians and Astrologers, etc., 223 rev. 2 and 223a rev. 
8), and must be correlated to gipâru (Muss-Arnolt, Assyrian Dictionary, p. 229a), “planted field,” “meadow,” and the like. Thompson’s translation “men” (as though a synonym of gabru) is to be corrected accordingly. Line 74. There is nothing missing between a-šar and tar-ba-ṣi-im. Line 75. ri-ia-ú, which Langdon renders “shepherd,” is the equivalent of the Arabic riʿy and Hebrew רְעִי “pasturage,” “fodder.” We have usually the feminine form ri-i-tu (Muss-Arnolt, Assyrian Dictionary, p. 990b). The break at the end of the second column is not serious. Evidently Enkidu, still accustomed to live like an animal, is first led to the sheepfolds, and this suggests a repetition of the description of his former life. Of the four or five lines missing, we may conjecturally restore four, on the basis of the Assyrian version, Tablet I, 4, 2–5, or I, 2, 39–41. This would then join on well to the beginning of column 3. Line 81. Both here and in l. 52 our text has na-ma-áš-te-e, as against nam-maš-ši-i in the Assyrian version, e.g., Tablet I, 2, 41; 4, 5, etc.,—the feminine form, therefore, as against the masculine. Langdon’s note 3 on page 213 is misleading. In astrological texts we also find nam-maš-te; e.g., Thompson, Reports of the Magicians and Astrologers, etc., No. 200, Obv. 2. [75] Line 93. zi-ma-at (for simat) ba-la-ṭi-im is not “conformity of life” as Langdon renders, but that which “belongs to life” like si-mat pag-ri-šá, “belonging to her body,” in the Assyrian version III, 2a, 3 (Jensen, page 146). “Food,” says the woman, “is the staff of life.” Line 94. Langdon’s strange rendering “of the conditions and fate of the land” rests upon an erroneous reading (see the corrections, Appendix I), which is the more inexcusable because in line 97 the same ideogram, Kàš = šikaru, “wine,” occurs, and is correctly rendered by him. Šimti mâti is not the “fate of the land,” but the “fixed custom of the land.” Line 98. 
aṣ-ṣa-mi-im (plural of aṣṣamu), which Langdon takes as an adverb in the sense of “times,” is a well-known word for a large “goblet,” which occurs in Incantation texts, e.g., CT XVI, 24, obv. 1, 19, mê a-ṣa-am-mi-e šú-puk, “pour out goblets of water.” Line 18 of the passage shows that aṣammu is a Sumerian loan word. Line 99. it-tap-šar, I, 2, from pašâru, “loosen.” In combination with kabtatum (from kabitatum, yielding two forms: kabtatum, by elision of i, and kabittu, by elision of a), “liver,” pašâru has the force of becoming cheerful. Cf. ka-bit-ta-ki lip-pa-šir (ZA V., p. 67, line 14). Line 100. Note the customary combination of “liver” (kabtatum) and “heart” (libbu) for “disposition” and “mind,” just as in the standing phrase in penitential prayers: “May thy liver be appeased, thy heart be quieted.” Line 102. The restoration [lùŠÚ]-I = gallabu “barber” (Delitzsch, Sumer. Glossar, p. 267) was suggested to me by Dr. H. F. Lutz. The ideographic writing “raising the hand” is interesting as recalling the gesture of shaving or cutting. Cf. a reference to a barber in Lutz, Early Babylonian Letters from Larsa, No. 109, 6. Line 103. Langdon has correctly rendered šuḫuru as “hair,” and has seen that we have here a loan-word from the Sumerian Suḫur = kimmatu, “hair,” according to the Syllabary Sb 357 (cf. Delitzsch, Sumer. Glossar., p. 253). For kimmatu, “hair,” more specifically hair of the head and face, see Holma, Namen der Körperteile, page 3. The same sign Suḫur or Suḫ (Brünnow No. 8615), with Lal, i.e., “hanging hair,” designates the “beard” (ziḳnu, cf. Brünnow, No. 8620, and Holma, l. c., p. 36), and it is interesting to [76]note that we have šuḫuru (introduced as a loan-word) for the barbershop, according to II R, 21, 27c (= CT XII, 41). Ê suḫur(ra) (i.e., house of the hair) = šú-ḫu-ru. In view of all this, we may regard as assured Holma’s conjecture to read šú-[ḫur-ma-šú] in the list 93074 obv. (MVAG 1904, p. 203; and Holma, Beiträge z. Assyr. Lexikon, p. 
36), as the Akkadian equivalent to Suḫur-Maš-Ḫa and the name of a fish, so called because it appeared to have a double “beard” (cf. Holma, Namen der Körperteile). One is tempted, furthermore, to see in the difficult word שכירה (Isaiah 7, 20) a loan-word from our šuḫuru, and to take the words אֶת־הָרֹאשׁ וְשַׂעַר הָרַגְלַיִם “the head and hair of the feet” (euphemistic for the hair around the privates), as an explanatory gloss to the rare word שכירה for “hair” of the body in general—just as in the passage in the Pennsylvania tablet. The verse in Isaiah would then read, “The Lord on that day will shave with the razor the hair (השכירה), and even the beard will be removed.” The rest of the verse would represent a series of explanatory glosses: (a) “Beyond the river” (i.e., Assyria), a gloss to יְגַלַּח; (b) “with the king of Assyria,” a gloss to בְּתַעַר “with a razor;” and (c) “the hair of the head and hair of the feet,” a gloss to השכירה. For “hair of the feet” we have an interesting equivalent in Babylonian šu-ḫur (and šú-ḫu-ur) šêpi (CT XII, 41, 23–24 c-d). Cf. also Boissier, Documents Assyriens relatifs aux Présages, p. 258, 4–5. The Babylonian phrase is like the Hebrew one to be interpreted as a euphemism for the hair around the male or female organ. To be sure, the change from ה to כ in השכירה constitutes an objection, but not a serious one in the case of a loan-word, which would aim to give the pronunciation of the original word, rather than the correct etymological equivalent. The writing with aspirated כ fulfills this condition. (Cf. šamkatum and šamḫatum, above p. 73). The passage in Isaiah being a reference to Assyria, the prophet might be tempted to use a foreign word to make his point more emphatic. To take השכירה as “hired,” as has hitherto been done, and to translate “with a hired razor,” is not only to suppose a very wooden metaphor, but is grammatically difficult, since השכירה would be a feminine adjective attached to a masculine substantive. 
Coming back to our passage in the Pennsylvania tablet, it is to [77]be noted that Enkidu is described as covered “all over his body with hair” (Assyrian version, Tablet I, 2, 36) like an animal. To convert him into a civilized man, the hair is removed. Line 107. mutu does not mean “husband” here, as Langdon supposes, but must be taken as in l. 238 in the more general sense of “man,” for which there is good evidence. Line 109. la-bi (plural form) are “lions”—not “panthers” as Langdon has it. The verb ú-gi-ir-ri is from gâru, “to attack.” Langdon by separating ú from gi-ir-ri gets a totally wrong and indeed absurd meaning. See the corrections in the Appendix. He takes the sign ú for the copula (!!) which of course is impossible. Line 110. Read uš-sa-ak-pu, III, 1, of sakâpu, which is frequently used for “lying down” and is in fact a synonym of ṣalâlu. See Muss-Arnolt, Assyrian Dictionary, page 758a. The original has very clearly Síb (= rê’u, “shepherd”) with the plural sign. The “shepherds of the night,” who could now rest since Enkidu had killed the lions, are of course the shepherds who were accustomed to watch the flocks during the night. Line 111. ut-tap-pi-iš is II, 2, napâšu, “to make a hole,” hence “to plunge” in connection with a weapon. Šib-ba-ri is, of course, not “mountain goats,” as Langdon renders, but a by-form to šibbiru, “stick,” and designates some special weapon. Since on seal cylinders depicting Enkidu killing lions and other animals the hero is armed with a dagger, this is presumably the weapon šibbaru. Line 113. Langdon’s translation is again out of the question and purely fanciful. The traces favor the restoration na-ki-[di-e], “shepherds,” and since the line appears to be a parallel to line 110, I venture to suggest at the beginning [it-ti]-lu from na’âlu, “lie down”—a synonym, therefore, to sakâpu in line 110. The shepherds can sleep quietly after Enkidu has become the “guardian” of the flocks. 
In the Assyrian version (tablet II, 3a, 4) Enkidu is called a na-kid, “shepherd,” and in the preceding line we likewise have lùNa-Kid with the plural sign, i.e., “shepherds.” This would point to nakidu being a Sumerian loan-word, unless, vice versa, it is a word that has gone over into the Sumerian from Akkadian. Is perhaps the fragment in question (K 8574) in the Assyrian version (Haupt’s ed. No. 25) the parallel to our passage? If in line 4 of this fragment we could read šú for sa, i.e., na-kid-šú-nu, “their shepherd,” we would have a [78]parallel to line 114 of the Pennsylvania tablet, with na-kid as a synonym to maṣṣaru, “protector.” The preceding line would then be completed as follows: [it-ti-lu]-nim-ma na-kidmeš [ra-bu-tum] (or perhaps only it-ti-lu-ma, since the nim is not certain) and would correspond to line 113 of the Pennsylvania tablet. Inasmuch as the writing on the tiny fragment is very much blurred, it is quite possible that in line 2 we must read šib-ba-ri (instead of bar-ba-ri), which would furnish a parallel to line 111 of the Pennsylvania tablet. The difference between Bar and Šib is slight, and the one sign might easily be mistaken for the other in the case of close writing. The continuation of line 2 of the fragment would then correspond to line 112 of the Pennsylvania tablet, while line 1 of the fragment might be completed [re-e]-u-ti(?) šá [mu-ši-a-tim], though this is by no means certain. The break at the close of column 3 (about 5 lines) and the top of column 4 (about 8 lines) is a most serious interruption in the narrative, and makes it difficult to pick up the thread where the tablet again becomes readable. We cannot be certain whether the “strong man, the unique hero” who addresses some one (lines 115–117) is Enkidu or Gish or some other personage, but presumably Gish is meant. In the Assyrian version, Tablet I, 3, 2 and 29, we find Gilgamesh described as the “unique hero” and in l.
234 of the Pennsylvania tablet Gish is called “unique,” while again, in the Assyrian version, Tablet I, 2, 15 and 26, he is designated as gašru as in our text. Assuming this, whom does he address? Perhaps the shepherds? In either case he receives an answer that rejoices him. If the fragment of the Assyrian version (K 8574) above discussed is the equivalent to the close of column 3 of the Pennsylvania tablet, we may go one step further, and with some measure of assurance assume that Gish is told of Enkidu’s exploits and that the latter is approaching Erech. This pleases Gish, but Enkidu when he sees Gish(?) is stirred to anger and wants to annihilate him. At this point, the “man” (who is probably Gish, though the possibility of a third personage must be admitted) intervenes and in a long speech sets forth the destiny and higher aims of mankind. The contrast between Enkidu and Gish (or the third party) is that between the primitive [79]savage and the civilized being. The contrast is put in the form of an opposition between the two. The primitive man is the stronger and wishes to destroy the one whom he regards as a natural foe and rival. On the other hand, the one who stands on a higher plane wants to lift his fellow up. The whole of column 4, therefore, forms part of the lesson attached to the story of Enkidu, who, identified with man in a primitive stage, is made the medium of illustrating how the higher plane is reached through the guiding influences of the woman’s hold on man, an influence exercised, to be sure, with the help of her bodily charms. Line 135. uk-ki-ši (imperative form) does not mean “take away,” as Langdon (who entirely misses the point of the whole passage) renders, but on the contrary, “lure him on,” “entrap him,” and the like. The verb occurs also in the Yale tablet, ll. 183 and 186. Line 137. Langdon’s note to lu-uš-šú had better be passed over in silence. The form is II. 1, from ešû, “destroy.” Line 139. 
Since the man whom the woman calls approaches Enkidu, the subject of both verbs is the man, and the object is Enkidu; i.e., therefore, “The man approaches Enkidu and beholds him.” Line 140. Langdon’s interpretation of this line again is purely fanciful. E-di-il cannot, of course, be a “phonetic variant” of edir; and certainly the line does not describe the state of mind of the woman. Lines 140–141 are to be taken as an expression of amazement at Enkidu’s appearance. The first word appears to be an imperative in the sense of “Be off,” “Away,” from dâlu, “move, roam.” The second word e-eš, “why,” occurs with the same verb dâlu in the Meissner fragment: e-eš ta-da-al (column 3, 1), “why dost thou roam about?” The verb at the end of the line may perhaps be completed to ta-ḫi-il-la-am. The last sign appears to be am, but may be ma, in which case we should have to complete simply ta-ḫi-il-ma. Taḫîl would be the second person present of ḫîlu. Cf. i-ḫi-il, frequently in astrological texts, e.g., Virolleaud, Adad No. 3, lines 21 and 33. Line 141. The reading lim-nu at the beginning, instead of Langdon’s mi-nu, is quite certain, as is also ma-na-aḫ-ti-ka instead of what Langdon proposes, which gives no sense whatever. Manaḫtu in the sense of the “toil” and “activity of life” (like עָמָל throughout the Book of Ecclesiastes) occurs in the introductory lines to [80]the Assyrian version of the Epic I, 1, 8, ka-lu ma-na-aḫ-ti-[šu], “all of his toil,” i.e., all of his career. Line 142. The subject of the verb cannot be the woman, as Langdon supposes, for the text in that case, e.g., line 49, would have said pi-šá (“her mouth”) not pi-šú (“his mouth”). The long speech, detailing the function and destiny of civilized man, is placed in the mouth of the man who meets Enkidu. In the Introduction it has been pointed out that lines 149 and 151 of the speech appear to be due to later modifications of the speech designed to connect the episode with Gish. 
Assuming this to be the case, the speech sets forth the following five distinct aims of human life: (1) establishing a home (line 144), (2) work (line 147), (3) storing up resources (line 148), (4) marriage (line 150), (5) monogamy (line 154); all of which is put down as established for all time by divine decree (lines 155–157), and as man’s fate from his birth (lines 158–159). Line 144. bi-ti-iš e-mu-ti is for bîti šá e-mu-ti, just as ḳab-lu-uš Ti-a-ma-ti (Assyrian Creation Myth, IV, 65) stands for ḳablu šá Tiamti. Cf. bît e-mu-ti (Assyrian version, IV, 2, 46 and 48). The end of the line is lost beyond recovery, but the general sense is clear. Line 146. tu-a-ar is a possible reading. It may be the construct of tu-a-ru, of frequent occurrence in legal texts and having some such meaning as “right,” “claim” or “prerogative.” See the passages given by Muss-Arnolt, Assyrian Dictionary, p. 1139b. Line 148. The reading uk-la-at, “food,” and then in the wider sense “food supply,” “provisions,” is quite certain. The fourth sign looks like the one for “city.” E-mi-sa may stand for e-mid-sa, “place it.” The general sense of the line, at all events, is clear, as giving the advice to gather resources. It fits in with the Babylonian outlook on life to regard work and wealth as the fruits of work and as a proper purpose in life. Line 150 (repeated lines 152–153) is a puzzling line. To render piti pûk epši (or epiši), as Langdon proposes, “open, addressing thy speech,” is philologically and in every other respect inadmissible. The word pu-uk (which Langdon takes for “thy mouth”!!) can, of course, be nothing but the construct form of pukku, which occurs in the Assyrian version in the sense of “net” (pu-uk-ku I, 2, 9 and 21, and also in the colophon to the eleventh tablet furnishing the [81]beginning of the twelfth tablet (Haupt’s edition No. 56), as well as in column 2, 29, and column 3, 6, of this twelfth tablet). 
In the two last named passages pukku is a synonym of mekû, which from the general meaning of “enclosure” comes to be a euphemistic expression for the female organ. So, for example, in the Assyrian Creation Myth, Tablet IV, 66 (synonym of ḳablu, “waist,” etc.). See Holma, Namen der Körperteile, page 158. Our word pukku must be taken in this same sense as a designation of the female organ—perhaps more specifically the “hymen” as the “net,” though the womb in general might also be designated as a “net” or “enclosure.” Kak-(ši) is no doubt to be read epši, as Langdon correctly saw; or perhaps better, epiši. An expression like ip-ši-šú lul-la-a (Assyrian version, I, 4, 13; also line 19, i-pu-us-su-ma lul-la-a), with the explanation šipir zinništi, “the work of woman” (i.e., after the fashion of woman), shows that epêšu is used in connection with the sexual act. The phrase pitî pûk epiši a-na ḫa-a-a-ri, literally “open the net, perform the act for marriage,” therefore designates the fulfillment of the marriage act, and the line is intended to point to marriage with the accompanying sexual intercourse as one of the duties of man. While the general meaning is thus clear, the introduction of Gish is puzzling, except on the supposition that lines 149 and 151 represent later additions to connect the speech, detailing the advance to civilized life, with the hero. See above, p. 45 seq. Line 154. aššat šimâtim is the “legitimate wife,” and the line inculcates monogamy as against promiscuous sexual intercourse. We know that monogamy was the rule in Babylonia, though a man could in addition to the wife recognized as the legalized spouse take a concubine, or his wife could give her husband a slave as a concubine. Even in that case, according to the Hammurabi Code, §§145–146, the wife retained her status. The Code throughout assumes that a man has only one wife—the aššat šimâtim of our text. 
The phrase “so” (or “that”) before “as afterwards” is to be taken as an idiomatic expression (“so it was and so it should be for all times”), somewhat like the phrase maḫriam ù arkiam, “for all times,” in legal documents (CT VIII, 38c, 22–23). For the use of mûk see Behrens, Assyrisch-Babylonische Briefe, p. 3. Line 158. i-na bi-ti-iḳ a-bu-un-na-ti-šú. Another puzzling line, for which Langdon proposes “in the work of his presence,” which [82]is as obscure as the original. In a note he says that apunnâti means “nostrils,” which is certainly wrong. There has been considerable discussion about this term (see Holma, Namen der Körperteile, pages 150 and 157), the meaning of which has been advanced by Christian’s discussion in OLZ 1914, p. 397. From this it appears that it must designate a part of the body which could acquire a wider significance so as to be used as a synonym for “totality,” since it appears in a list of equivalents for Dur = nap-ḫa-ru, “totality,” ka-lu-ma, “all,” a-bu-un-na-tum e-ṣi-im-tum, “bony structure,” and kul-la-tum, “totality” (CT XII, 10, 7–10). Christian shows that it may be the “navel,” which could well acquire a wider significance for the body in general; but we may go a step further and specify the “umbilical cord” (tentatively suggested also by Christian) as the primary meaning, then the “navel,” and from this the “body” in general. The structure of the umbilical cord as a series of strands would account for designating it by a plural form abunnâti, as also for the fact that one could speak of a right and left side of the appunnâti. To distinguish between the “umbilical cord” and the “navel,” the ideograph Dur (the common meaning of which is riksu, “bond” [Delitzsch, Sumer. Glossar., p. 150]), was used for the former, while for the latter Li Dur was employed, though the reading in Akkadian in both cases was the same.
The expression “with (or at) the cutting of his umbilical cord” would mean, therefore, “from his birth”—since the cutting of the cord which united the child with the mother marks the beginning of the separate life. Lines 158–159, therefore, in concluding the address to Enkidu, emphasize in a picturesque way that what has been set forth is man’s fate for which he has been destined from birth. [See now Albright’s remarks on abunnatu in the Revue d’Assyriologie 16, pp. 173–175, with whose conclusion, however, that it means primarily “backbone” and then “stature,” I cannot agree.] In the break of about three lines at the bottom of column 4, and of about six at the beginning of column 5, there must have been set forth the effect of the address on Enkidu and the indication of his readiness to accept the advice; as in a former passage (line 64), Enkidu showed himself willing to follow the woman. At all events the two now proceed to the heart of the city. Enkidu is in front [83]and the woman behind him. The scene up to this point must have taken place outside of Erech—in the suburbs or approaches to the city, where the meadows and the sheepfolds were situated. Line 174. um-ma-nu-um are not the “artisans,” as Langdon supposes, but the “people” of Erech, just as in the Assyrian version, Tablet IV, 1, 40, where the word occurs in connection with i-dip-pi-ir, which is perhaps to be taken as a synonym of paḫâru, “gather;” so also i-dip-pir (Tablet I, 2, 40) “gathers with the flock.” Lines 180–182 must have contained the description of Enkidu’s resemblance to Gish, but the lines are too mutilated to permit of any certain restoration. See the corrections (Appendix) for a suggested reading for the end of line 181. Line 183 can be restored with considerable probability on the basis of the Assyrian version, Tablet I, 3, 3 and 30, where Enkidu is described as one “whose power is strong in the land.” Lines 186–187. 
The puzzling word, to be read apparently kak-ki-a-tum, can hardly mean “weapons,” as Langdon proposes. In that case we should expect kakkê; and, moreover, to so render gives no sense, especially since the verb ú-te-el-li-lu is without much question to be rendered “rejoiced,” and not “purified.” Kakkiatum, if this be the correct reading, may be a designation of Erech like ribîtim. Lines 188–189 are again entirely misunderstood by Langdon, owing to erroneous readings. See the corrections in the Appendix. Line 190. i-li-im in this line is used like Hebrew Elohîm, “God.” Line 191. šakiššum = šakin-šum, as correctly explained by Langdon. Line 192. With this line a new episode begins which, owing to the gap at the beginning of column 6, is somewhat obscure. The episode leads to the hostile encounter between Gish and Enkidu. It is referred to in column 2 of the fourth tablet of the Assyrian version. Lines 35–50, all that is preserved of this column, form in part a parallel to columns 5–6 of the Pennsylvania tablet, but in much briefer form, since what on the Pennsylvania tablet is the incident itself is on the fourth tablet of the Assyrian version merely a repeated summary of the relationship between the two heroes, leading up to the expedition against Ḫu(m)baba. Lines 38–40 of [84]column 2 of the Assyrian version correspond to lines 174–177 of the Pennsylvania tablet, and lines 44–50 to lines 192–221. It would seem that Gish proceeds stealthily at night to go to the goddess Ishḫara, who lies on a couch in the bît êmuti, the “family house” (Assyrian version, Tablet IV, 2, 46–48). He encounters Enkidu in the street, and the latter blocks Gish’s path, puts his foot in the gate leading to the house where the goddess is, and thus prevents Gish from entering. Thereupon the two have a fierce encounter in which Gish is worsted. The meaning of the episode itself is not clear. Does Enkidu propose to deprive Gish, here viewed as a god (cf.
line 190 of the Pennsylvania tablet = Assyrian version, Tablet I, 4, 45, “like a god”), of his spouse, the goddess Ishḫara, another form of Ishtar? Or are the two heroes, the one a counterpart of the other, contesting for the possession of a goddess? Is it in this scene that Enkidu becomes the “rival” (me-iḫ-rù, line 191 of the Pennsylvania tablet) of the divine Gish? We must content ourselves with having obtained through the Pennsylvania tablet a clearer indication of the occasion of the fight between the two heroes, and leave the further explanation of the episode till a fortunate chance may throw additional light upon it. There is perhaps a reference to the episode in the Assyrian version, Tablet II, 3b, 35–36. Line 196. For i-na-ag-šá-am (from nagâšu), Langdon proposes the purely fanciful “embracing her in sleep,” whereas it clearly means “he approaches.” Cf. Muss-Arnolt, Assyrian Dictionary, page 645a. Lines 197–200 appear to correspond to Tablet IV, 2, 35–37, of the Assyrian version, though not forming a complete parallel. We may therefore supply at the beginning of line 35 of the Assyrian version [ittaziz] Enkidu, corresponding to line 197 of the Pennsylvania tablet. Line 36 of IV, 2, certainly appears to correspond to line 200 (dan-nu-ti = da-na-ni-iš-šú). Line 208. The first sign looks more like šar, though ur is possible. Line 211 is clearly a description of Enkidu, as is shown by a comparison with the Assyrian version I, 2, 37: [pi]-ti-ik pi-ir-ti-šú uḫ-tan-na-ba kima dNidaba, “The form of his hair sprouted like wheat.” We must therefore supply Enkidu in the preceding line. Tablet IV, 4, 6, of the Assyrian version also contains a reference to the flowing hair of Enkidu. [85] Line 212. For the completion of the line cf. Harper, Assyrian and Babylonian Letters, No. 214. Line 214. For ribîtu mâti see the note above to line 28 of column 1. Lines 215–217 correspond almost entirely to the Assyrian version IV, 2, 46–48.
The variations ki-ib-su in place of šêpu, and kima lîm, “like oxen,” instead of ina bâb êmuti (repeated from line 46), ana šurûbi for êribam, are slight though interesting. The Assyrian version shows that the “gate” in line 215 is “the gate of the family house” in which the goddess Ishḫara lies. Lines 218–228. The detailed description of the fight between the two heroes is only partially preserved in the Assyrian version. Line 218. li-i-im is evidently to be taken as plural here as in line 224, just as su-ḳi-im (lines 27 and 175), ri-bi-tim (lines 4, 28, etc.), tarbaṣim (line 74), aṣṣamim (line 98) are plural forms. Our text furnishes, as does also the Yale tablet, an interesting illustration of the vacillation in the Hammurabi period in the twofold use of im: (a) as an indication of the plural (as in Hebrew), and (b) as a mere emphatic ending (lines 63, 73, and 232), which becomes predominant in the post-Hammurabi age. Line 227. Gilgamesh is often represented on seal cylinders as kneeling, e.g., Ward Seal Cylinders Nos. 159, 160, 165. Cf. also Assyrian version V, 3, 6, where Gilgamesh is described as kneeling, though here in prayer. See further the commentary to the Yale tablet, line 215. Line 229. We must of course read uz-za-šú, “his anger,” and not uṣ-ṣa-šú, “his javelin,” as Langdon does, which gives no sense. Line 231. Langdon’s note is erroneous. He again misses the point. The stem of the verb here as in line 230 (i-ni-iḫ) is the common nâḫu, used so constantly in connection with pašâḫu, to designate the cessation of anger. Line 234. ištên applied to Gish designates him of course as “unique,” not as “an ordinary man,” as Langdon supposes. Line 236. On this title “wild cow of the stall” for Ninsun, see Poebel in OLZ 1914, page 6, to whom we owe the correct view regarding the name of Gilgamesh’s mother. Line 238. mu-ti here cannot mean “husband,” but “man” in [86]general. See above note to line 107. 
Langdon’s strange misreading ri-eš-su for ri-eš-ka (“thy head”) leads him again to miss the point, namely that Enkidu comforts his rival by telling him that he is destined for a career above that of the ordinary man. He is to be more than a mere prize fighter; he is to be a king, and no doubt in the ancient sense, as the representative of the deity. This is indicated by the statement that the kingship is decreed for him by Enlil. Similarly, Ḫu(m)baba or Ḫuwawa is designated by Enlil to inspire terror among men (Assyrian version, Tablet IV, 5, 2 and 5), i-šim-šú dEnlil = Yale tablet, l. 137, where this is to be supplied. This position accorded to Enlil is an important index for the origin of the Epic, which is thus shown to date from a period when the patron deity of Nippur was acknowledged as the general head of the pantheon. This justifies us in going back several centuries at least before Hammurabi for the beginning of the Gilgamesh story. If it had originated in the Hammurabi period, we should have had Marduk introduced instead of Enlil. Line 242. As has been pointed out in the corrections to the text (Appendix), šú-tu-ur can only be III, 1, from atâru, “to be in excess of.” It is a pity that the balance of the line is broken off, since this is the first instance of a colophon beginning with the term in question. In some way šutûr must indicate that the copy of the text has been “enlarged.” It is tempting to fill out the line šú-tu-ur e-li [duppi labiri], and to render “enlarged from an original,” as an indication of an independent recension of the Epic in the Hammurabi period. All this, however, is purely conjectural, and we must patiently hope for more tablets of the Old Babylonian version to turn up. The chances are that some portions of the same edition as the Yale and Pennsylvania tablets are in the hands of dealers at present or have been sold to European museums. 
The war has seriously interfered with the possibility of tracing the whereabouts of groups of tablets that ought never to have been separated.

[87] Yale Tablet.

Transliteration.

(About ten lines missing.)

Col. I.
11 .................. [ib]-ri(?)
12 [mi-im-ma(?) šá(?)]-kú-tu wa(?)-ak-rum
13 [am-mi-nim] ta-aḫ-ši-iḫ
14 [an-ni]-a-am [e-pi]-šá-am
15 ...... mi-im[-ma šá-kú-tu(?)]ma-
16 di-iš
17 [am-mi]-nim [taḫ]-ši-iḫ
18 [ur(?)]-ta-du-ú [a-na ki-i]š-tim
19 ši-ip-ra-am it-[ta-šú]-ú i-na [nišê]
20 it-ta-áš-šú-ú-ma
21 i-pu-šú ru-ḫu-tam
22 .................. uš-ta-di-nu
23 ............................. bu
24 ...............................
(About 17 lines missing.)
40 .............. nam-........
41 .................... u ib-[ri] .....
42 .............. ú-na-i-du ......
43 [zi-ik]-ra-am ú-[tí-ir]-ru
44 [a-na] ḫa-ri-[im]-tim
45 [i]-pu(?)-šú a-na sa-[ka]-pu-ti

Col. II.
(About eleven lines missing.)
57 ... šú(?)-mu(?) ...............
58 ma-ḫi-ra-am [šá i-ši-šú]
59 šú-uk-ni-šum-[ma] ...............
60 la-al-la-ru-[tu] ..................
61 um-mi d-[Giš mu-di-a-at ka-la-ma]
62 i-na ma-[ḫar dŠamaš i-di-šá iš-ši] [88]
63 šá ú
64 i-na-an(?)-[na am-mi-nim]
65 ta-[aš-kun(?) a-na ma-ri-ia li-ib-bi la]
66 ṣa-[li-la te-mid-su]
67 .............................
(About four lines missing.)
72 i-na [šá dEn-ki-dũ im-la-a] di-[im-tam]
73 il-[pu-ut li]-ib-ba-šú-[ma]
74 [zar-biš(?)] uš-ta-ni-[iḫ]
75 [i-na šá dEn]-ki-dũ im-la-a di-im-tam
76 [il-pu-ut] li-ib-ba-šú-ma
77 [zar-biš(?)] uš-ta-ni-[iḫ]
78 [dGiš ú-ta]-ab-bil pa-ni-šú
79 [iz-za-kar-am] a-na dEn-ki-dũ
80 [ib-ri am-mi-nim] i-na-ka
81 [im-la-a di-im]-tam
82 [il-pu-ut li-ib-bi]-ka
83 [zar-biš tu-uš-ta]-ni-iḫ
84 [dEn-ki-dũ pi-šú i-pu-šá]-am-ma
85 iz-za-[kàr-am] a-na dGiš
86 ta-ab-bi-a-tum ib-ri
87 uš-ta-li-pa da-1da-ni-ia
88 a-ḫa-a-a ir-ma-a-ma
89 e-mu-ki i-ni-iš
90 dGiš pi-šú i-pu-šá-am-ma
91 iz-za-kàr-am a-na dEn-ki-dũ
(About four lines missing.)

Col. III.
96 ..... [a-di dḪu]-wa-wa da-pi-nu
97 .................. ra-[am(?)-ma]
98 ................ [ú-ḫal]-li-ik
99 [lu-ur-ra-du a-na ki-iš-ti šá] iserini [89]
100 ............ lam(?) ḫal-bu
101 ............ [li]-li-is-su
102 .............. lu(?)-up-ti-šú
103 dEn-ki-dũ pi-šú i-pu-šá-am-ma
104 iz-za-kàr-am a-na dGiš
105 i-di-ma ib-ri i-na šadî(-i)
106 i-nu-ma at-ta-la-ku it-ti bu-lim
107 a-na ištên(-en) kas-gíd-ta-a-an nu-ma-at ki-iš-tum
108 [e-di-iš(?)] ur-ra-du a-na libbi-šá
109 d[Ḫu-wa]-wa ri-ig-ma-šú a-bu-bu
110 pi-[šú] dBil-gi-ma
111 na-pi-iš-šú mu-tum
112 am-mi-nim ta-aḫ-ši-iḫ
113 an-ni-a-am e-pi-šá-am
114 ga-[ba]-al-la ma-ḫa-ar
115 [šú]-pa-at dḪu-wa-wa
116 (d)Giš pi-šú i-pu-šá-am-ma
117 [iz-za-k]àr-am a-na dEn-ki-dũ
118 ....... su(?)-lu-li a-šá-ki2-šá
119 ............. [i-na ki-iš]-tim
120 ...............................
121 ik(?) .........................
122 a-na ..........................
123 mu-šá-ab [dḪu-wa-wa] .......
124 ḫa-aṣ-si-nu .................
125 at-ta lu(?) .................
126 a-na-ku lu-[ur-ra-du a-na ki-iš-tim]
127 dEn-ki-dũ pi-šú i-pu-[šá-am-ma]
128 iz-za-kàr-am a-na [dGiš]
129 ki-i ni[il]-la-ak [iš-te-niš(?)]
130 a-na ki-iš-ti [šá iṣerini]
131 na-ṣi-ir-šá dGiš muḳ-[tab-lu]
132 da-a-an la ṣa[-li-lu(?)]
133 dḪu-wa-wa dpi-ir-[ḫu ša (?)] [90]
134 dAdad iš ..........
135 šú-ú ..................

Col. IV.
136 áš-šúm šú-ul-lu-m[u ki-iš-ti šá iṣerini]
137 pu-ul-ḫi-a-tim 7 [šú(?) i-šim-šú dEnlil]
138 dGiš pi-šú i-pu [šá-am-ma]
139 iz-za-kàr-am a-na [dEn-ki-dũ]
140 ma-an-nu ib-ri e-lu-ú šá-[ru-ba(?)]
141 i-ṭib-ma it-ti dŠamaš da-ri-iš ú-[me-šú]
142 a-we-lu-tum ba-ba-nu ú-tam-mu-šá-[ma]
143 mi-im-ma šá i-te-ni-pu-šú šá-ru-ba
144 at-ta an-na-nu-um-ma ta-dar mu-tam
145 ul iš-šú da-na-nu ḳar-ra-du-ti-ka
146 lu-ul-li-ik-ma i-na pa-ni-ka
147 pi-ka li-iš-si-a-am ṭi-ḫi-e ta-du-ur
148 šum-ma am-ta-ḳu-ut šú-mi lu-uš-zi-iz
149 dGiš mi3-it-ti dḪu-wa-wa da-pi-nim
150 il(?)-ḳu-ut iš-tu
151 i-wa-al-dam-ma tar-bi-a i-na šam-mu(?) Il(?)
152 iš-ḫi-it-ka-ma la-bu ka-la-ma ti-di
153 it-ku(?) ..... [il(?)]-pu-tu-(?) ma .....
154 .............. ka-ma
155 .............. ši pi-ti
156 ............ ki-ma re’i(?) na-gi-la sa-rak-ti
157 .... [ta-šá-s]i-a-am tu-lim-mi-in li-ib-bi
158 [ga-ti lu]-uš-ku-un-ma
159 [lu-u-ri]-ba-am iṣerini [91]
160 [šú-ma sá]-ṭa-ru-ú a-na-ku lu-uš-ta-ak-na
161 [pu-tu-ku(?)] ib-ri a-na ki-iš-ka-tim lu-mu-ḫa
162 [be-le-e li-iš-]-pu-ku i-na maḫ-ri-ni
163 [pu-tu]-ku a-na ki-iš-ka-ti-i i-mu-ḫu
164 wa-áš-bu uš-ta-da-nu um-mi-a-nu
165 pa-ši iš-pu-ku ra-bu-tim
166 ḫa-aṣ-si-ni 3 biltu-ta-a-an iš-tap-ku
167 pa-aṭ-ri iš-pu-ku ra-bu-tim
168 me-še-li-tum 2 biltu-ta-a-an
169 ṣi-ip-ru 30 ma-na-ta-a-an šá a-ḫi-ši-na
170 išid(?) pa-aṭ-ri 30 ma-na-ta-a-an ḫuraṣi
171 [d]Giš ù [dEn-ki-]dũ 10 biltu-ta-a-an šá-ak-nu]
172 .... ul-la . .[Uruk]ki 7 i-di-il-šú
173 ...... iš-me-ma um-ma-nu ib-bi-ra
174 [uš-te-(?)]-mi-a i-na sûḳi šá Urukki ri-bi-tim
175 ...... [u-še(?)]-ṣa-šú dGis
176 [ina sûḳi šá(?) Urukki] ri-bi-tim
177 [dEn-ki-dũ(?) ú]-šá-ab i-na maḫ-ri-šú
178 ..... [ki-a-am(?) i-ga]-ab-bi
179 [........ Urukki ri]-bi-tim
180 [ma-ḫa-ar-šú]

Col. V.
181 dGiš šá i-ga-ab-bu-ú lu-mu-ur
182 šá šú-um-šú it-ta-nam-ma-la ma-ta-tum
183 lu-uk-šú-su-ma i-na ki-iš-ti iṣerini
184 ki-ma da-an-nu pi-ir-ḫu-um šá Urukki [92]
185 lu-ši-eš-mi ma-tam
186 ga-ti lu-uš-ku-un-ma lu-uk-[šú]4-su-ma iṣerini
187 šú-ma šá-ṭa-ru-ú a-na-ku lu-uš-tak-nam
188 ši-bu-tum šá Urukki ri-bi-tim
189 zi-ik-ra ú-ti-ir-ru a-na dGiš
190 ṣi-iḫ-ri-ti-ma dGiš libbi-ka na-ši-ka
191 mi-im-ma šá te-te-ni-pu-šú la ti-di
192 ni-ši-im-me-ma dḪu-wa-wa šá-nu-ú bu-nu-šú
193 ma-an-nu-um [uš-tam]-ḫa-ru ka-ak-ki-šú
194 a-na ištên(-en) [kas-gíd-ta-a]-an nu-ma-at kišti
195 ma-an-nu šá [ur-ra]-du a-na libbi-šá
196 dḪu-wa-wa ri-ig-ma-šú a-bu-bu
197 pi-šú dBil-gi-ma na-pi-su mu-tum
198 am-mi-nim taḫ-ši-iḫ an-ni-a-am e-pi-šá
199 ga-ba-al-la ma-ḫa-ar šú-pa-at dḪu-wa-wa
200 iš-me-e-ma dGiš zi-ki-ir ma-li-[ki]-šú
201 ip-pa-al-sa-am-ma i-ṣi-iḫ a-na ib-[ri-šú]
202 i-na-an-na ib-[ri] ki-a-am [a-ga-ab-bi]
203 a-pa-al-aḫ-šú-ma a-[al-la-ak a-na kišti]
204 [lu]ul-[lik it-ti-ka a-na ki-iš-ti iṣerini(?)]
(About five lines missing.)
210 ........................ -ma
211 li ............... -ka [93]
212 ilu-ka li(?) ..............-ka
213 ḫarrana li-šá-[tir-ka a-na šú-ul-mi]
214 a-na kar šá [Urukki ri-bi-tim]
215 ka-mi-is-ma dGiš [ma-ḫa-ar dŠamaš(?)]
216 a-wa-at i-ga-ab-[bu-šú-ma]
217 a-al-la-ak dŠamaš katâ-[ka a-ṣa-bat]
218 ul-la-nu lu-uš-li-ma na-pi-[iš-ti]
219 te-ir-ra-an-ni a-na kar i-[na Urukki]
220 ṣi-il-[la]m šú-ku-un [a-na ia-a-ši(?)]
221 iš-si-ma dGiš ib-[ri.....]
222 te-ir-ta-šú ..........
223 is(?) ..............
224 tam ................
225 ........................
226 i-nu(?)-[ma] ..................
(About two lines missing.)

Col. VI.
229 [a-na-ku] dGiš [i-ik]-ka-di ma-tum
230 ........... ḫarrana šá la al-[kam] ma-ti-ma
231 .... a-ka-lu ..... la(?) i-di
232 [ul-la-nu] lu-uš-li-[mu] a-na-ku
233 [lu-ud-lul]-ka i-na [ḫ]u-ud li-ib-bi
234 ...... [šú]-ḳu-ut-[ti] la-li-ka
235 [lu-še-šib(?)]-ka i-na kussêmeš
236 ....................... ú-nu-su
237 [bêlêmeš(?) ú-ti-ir]-ru ra-bu-tum
238 [ka-aš-tum] ù iš-pa-tum
239 [i-na] ga-ti iš-ku-nu
240 [il-]te-ki pa-ši
241 ....... -ri iš-pa-as-su [94]
242 ..... [a-na] ili šá-ni-tam
243 [it-ti pa(?)]-tar-[šú] i-na ši-ip-pi-šú
244 ........ i-ip-pu-šú a-la-kam
245 [ša]-niš ú-ga-ra-bu dGiš
246 [a-di ma]-ti tu-ut-te-ir a-na libbi Urukki
247 [ši-bu]-tum i-ka-ra-bu-šú
248 [a-na] ḫarrani i-ma-li-ku dGiš
249 [la t]a-at-kal dGiš a-na e-[mu]-ḳi-ka
250 [a-]ka-lu šú-wa-ra-ma ú-ṣur ra-ma-an-ka
251 [li]-il-lik dEn-ki-dũ i-na pa-ni-ka
252 [ur-ḫa]-am a-we-ir a-lik ḫarrana(-na)
253 [a-di] šá kišti ni-ri-bi-tim
254 [šá(?)] [d]Ḫu-wa-wa ka-li-šú-nu ši-ip-pi-iḫ(?)-šú
255 [ša(?) a-lik] maḫ-ra tap-pa-a ú-šá-lim
256 [ḫarrana](-na)-šú šú-wa-ra-[ma ú-ṣur ra-ma-na-ka]
257 [li-šak-šid]-ka ir-[ni-ta]-ka dŠamaš
258 [ta]-ak-bi-a-at pi-ka li-kal-li-ma i-na-ka
259 li-ip-ti-ḳu pa-da-nam pi-ḫi-tam
260 ḫarrana li-iš-ta-zi-ik a-na ki-ib-si-ka
261 šá-di-a li-iš-ta-zi-ik a-na šêpi-ka
262 mu-ši-it-ka aw-a-at ta-ḫa-du-ú
263 li-ib-la-ma dLugal-ban-da li-iz-zi-iz-ka [95]
264 i-na ir-ni-ti-ka
265 ki-ma ṣi-iḫ-ri ir-ni-ta-ka-ma luš-mida(-da)
266 i-na na-ri šá dḪu-wa-wa šá tu-ṣa-ma-ru
267 mi-zi ši-pi-ka
268 i-na bat-ba-ti-ka ḫi-ri bu-ur-tam
269 lu-ka-a-a-nu mê ellu i-na na-di-ka
270 [ka-]su-tim me-e a-na dŠamaš ta-na-di
271 [li-iš]ta-ḫa-sa-as dLugal-ban-da
272 [dEn-ki-]dũ pi-su i-pu-šá-am-ma, iz-za-kàr a-na dGiš
273 [is(?)]-tu(?) ta-áš-dan-nu e-pu-uš a-la-kam
274 [la pa]la-aḫ libbi-ka ia-ti tu-uk-la-ni
275 [šú-ku-]un i-di-a-am šú-pa-as-su
276 [ḫarrana(?)] šá dḪu-wa-wa it-ta-la-ku
277 .......... ki-bi-ma te-[ir]-šú-nu-ti
(Three lines missing.)

L.E.
281 .............. nam-ma-la
282 ............... il-li-ku it-ti-ia
283 ............... ba-ku-nu-ši-im
284 ......... [ul]-la(?)-nu i-na ḫu-ud li-ib-bi
285 [i-na še-me-e] an-ni-a ga-ba-šú
286 e-diš ḫarrana(?) uš-te-[zi-ik]
287 a-lik dGiš lu-[ul-lik a-na pa-ni-ka]
288 li-lik il-ka ..........
289 li-šá-ak-lim-[ka ḫarrana] ......
290 dGiš ù [dEn-ki-dũ] .......
291 mu-di-eš ..........
292 bi-ri-[su-nu] ........

[87] Translation.

(About ten lines missing.)

Col. I.
11 .................. (my friend?)
12 [Something] that is exceedingly difficult,
13 [Why] dost thou desire
14 [to do this?]
15 .... something (?)
that is very [difficult (?)],
16 [Why dost thou] desire
17 [to go down to the forest]?
18 A message [they carried] among [men]
19 They carried about.
20 They made a ....
21 .............. they brought
22 ..............................
23 ..............................
(About 17 lines missing.)
40 .............................
41 ................... my friend
42 ................ they raised .....
43 answer [they returned.]
44 [To] the woman
45 They proceeded to the overthrowing

Col. II.
(About eleven lines missing.)
57 .......... name(?) .............
58 [The one who is] a rival [to him]
59 subdue and ................
60 Wailing ................
61 The mother [of Gish, who knows everything]
62 Before [Shamash raised her hand] [88]
63 Who
64 Now(?) [why]
65 hast thou stirred up the heart for my son,
66 [Restlessness imposed upon him (?)]
67 ............................
(About four lines missing.)
72 The eyes [of Enkidu filled with tears].
73 [He clutched] his heart;
74 [Sadly(?)] he sighed.
75 [The eyes of En]kidu filled with tears.
76 [He clutched] his heart;
77 [Sadly(?)] he sighed.
78 The face [of Gish was grieved].
79 [He spoke] to Enkidu:
80 [“My friend, why are] thy eyes
81 [Filled with tears]?
82 Thy [heart clutched]
83 Dost thou sigh [sadly(?)]?”
84 [Enkidu opened his mouth] and
85 spoke to Gish:
86 “Attacks, my friend,
87 have exhausted my strength(?).
88 My arms are lame,
89 my strength has become weak.”
90 Gish opened his mouth and
91 spoke to Enkidu:
(About four lines missing.)

Col. III.
96 ..... [until] Ḫuwawa, [the terrible],
97 ........................
98 ............ [I destroyed].
99 [I will go down to the] cedar forest, [89]
100 ................... the jungle
101 ............... tambourine (?)
102 ................ I will open it.
103 Enkidu opened his mouth and
104 spoke to Gish:
105 “Know, my friend, in the mountain,
106 when I moved about with the cattle
107 to a distance of one double hour into the heart of the forest,
108 [Alone?] I penetrated within it,
109 [To] Ḫuwawa, whose roar is a flood,
110 whose mouth is fire,
111 whose breath is death.
112 Why dost thou desire
113 To do this?
114 To advance towards
115 the dwelling(?) of Ḫuwawa?”
116 Gish opened his mouth and
117 [spoke to Enkidu:
118 “... [the covering(?)] I will destroy.
119 ....[in the forest]
120 ....................
121 ....................
122 To .................
123 The dwelling [of Ḫuwawa]
124 The axe ..........
125 Thou ..........
126 I will [go down to the forest].”
127 Enkidu opened his mouth and
128 spoke to [Gish:]
129 “When [together(?)] we go down
130 To the [cedar] forest,
131 whose guardian, O warrior Gish,
132 a power(?) without [rest(?)],
133 Ḫuwawa, an offspring(?) of .... [90]
134 Adad ......................
135 He ........................

Col. IV.
136 To keep safe [the cedar forest],
137 [Enlil has decreed for it] seven-fold terror.”
138 Gish [opened] his mouth and
139 spoke to [Enkidu]:
140 “Whoever, my friend, overcomes (?) [terror(?)],
141 it is well (for him) with Shamash for the length of [his days].
142 Mankind will speak of it at the gates.
143 Wherever terror is to be faced,
144 Thou, forsooth, art in fear of death.
145 Thy prowess lacks strength.
146 I will go before thee.
147 Though thy mouth calls to me; “thou art afraid to approach.”
148 If I fall, I will establish my name.
149 Gish, the corpse(?) of Ḫuwawa, the terrible one,
150 has snatched (?) from the time that
151 My offspring was born in ......
152 The lion restrained (?) thee, all of which thou knowest.
153 ........................
154 .............. thee and
155 ................ open (?)
156 ........ like a shepherd(?) .....
157 [When thou callest to me], thou afflictest my heart.
158 I am determined
159 [to enter] the cedar forest. [91]
160 I will, indeed, establish my name.
161 [The work(?)], my friend, to the artisans I will entrust.
162 [Weapons(?)] let them mould before us.”
163 [The work(?)] to the artisans they entrusted.
164 A dwelling(?) they assigned to the workmen.
165 Hatchets the masters moulded:
166 Axes of 3 talents each they moulded.
167 Lances the masters moulded;
168 Blades(?) of 2 talents each,
169 A spear of 30 mina each attached to them.
170 The hilt of the lances of 30 mina in gold
171 Gish and [Enki]du were equipped with 10 talents each
172 .......... in Erech seven its ....
173 ....... the people heard and ....
174 [proclaimed(?)] in the street of Erech of the plazas.
175 ..... Gish [brought him out(?)]
176 [In the street (?)] of Erech of the plazas
177 [Enkidu(?)] sat before him
178 ..... [thus] he spoke:
179 “........ [of Erech] of the plazas
180 ............ [before him]

Col. V.
181 Gish of whom they speak, let me see!
182 whose name fills the lands.
183 I will lure him to the cedar forest,
184 Like a strong offspring of Erech. [92]
185 I will let the land hear (that)
186 I am determined to lure (him) in the cedar (forest)5.
187 A name I will establish.”
188 The elders of Erech of the plazas
189 brought word to Gish:
190 “Thou art young, O Gish, and thy heart carries thee away.
191 Thou dost not know what thou proposest to do.
192 We hear that Ḫuwawa is enraged.
193 Who has ever opposed his weapon?
194 To one [double hour] in the heart of the forest,
195 Who has ever penetrated into it?
196 Ḫuwawa, whose roar is a deluge,
197 whose mouth is fire, whose breath is death.
198 Why dost thou desire to do this?
199 To advance towards the dwelling (?) of Ḫuwawa?”
200 Gish heard the report of his counsellors.
201 He saw and cried out to [his] friend:
202 “Now, my friend, thus [I speak].
203 I fear him, but [I will go to the cedar forest(?)];
204 I will go [with thee to the cedar forest].
(About five lines missing.)
210 ..............................
211 May ................... thee [93]
212 Thy god may (?) ........ thee;
213 On the road may he guide [thee in safety(?)].
214 At the rampart of [Erech of the plazas],
215 Gish kneeled down [before Shamash(?)],
216 A word then he spoke [to him]:
217 “I will go, O Shamash, [thy] hands [I seize hold of].
218When I shall have saved [my life], 219Bring me back to the rampart [in Erech]. 220Grant protection [to me ?]!” 221Gish cried, ”[my friend] ...... 222His oracle .................. 223........................ 224........................ 225........................ 226When (?) (About two lines missing.) Col. VI. 229”[I(?)] Gish, the strong one (?) of the land. 230...... A road which I have never [trodden]; 231........ food ...... do not (?) know. 232[When] I shall have succeeded, 233[I will praise] thee in the joy of my heart, 234[I will extol (?)] the superiority of thy power, 235[I will seat thee] on thrones.” 236.................. his vessel(?) 237The masters [brought the weapons (?)]; 238[bow] and quiver 239They placed in hand. 240[He took] the hatchet. 241................. his quiver.[94] 242..... [to] the god(?) a second time 243[With his lance(?)] in his girdle, 244......... they took the road. 245[Again] they approached Gish! 246”[How long] till thou returnest to Erech?” 247[Again the elders] approached him. 248[For] the road they counselled Gis: 249“Do [not] rely, O Gish, on thy strength! 250Provide food and save thyself! 251Let Enkidu go before thee. 252He is acquainted with the way, he has trodden the road 253[to] the entrance of the forest. 254of Ḫuwawa all of them his ...... 255[He who goes] in advance will save the companion. 256Provide for his [road] and [save thyself]! 257(May) Shamash [carry out] thy endeavor! 258May he make thy eyes see the prophecy of thy mouth. 259May he track out (for thee) the closed path! 260May he level the road for thy treading! 261May he level the mountain for thy foot! 262During thy night6 the word that wilt rejoice 263may Lugal-banda convey, and stand by thee[95] 264in thy endeavor! 265Like a youth may he establish thy endeavor! 266In the river of Ḫuwawa as thou plannest, 267wash thy feet! 268Round about thee dig a well! 269May there be pure water constantly for thy libation 270Goblets of water pour out to Shamash! 
271[May] Lugal-banda take note of it!” 272[Enkidu] opened his mouth and spoke to Gish: 273”[Since thou art resolved] to take the road. 274Thy heart [be not afraid,] trust to me! 275[Confide] to my hand his dwelling(?)!” 276[on the road to] Ḫuwawa they proceeded. 277....... command their return (Three lines missing.) L.E. 281............... were filled. 282.......... they will go with me. 283............................... 284.................. joyfully. 285[Upon hearing] this word of his, 286Alone, the road(?) [he levelled]. 287“Go, O Gish [I will go before thee(?)]. 288May thy god(?) go ......... 289May he show [thee the road !] ..... 290Gish and [Enkidu] 291Knowingly .................... 292Between [them] ................ [96]Lines 13–14 (also line 16). See for the restoration, lines 112–13. Line 62. For the restoration, see Jensen, p. 146 (Tablet III, 2a,9.) Lines 64–66. Restored on the basis of the Assyrian version, ib. line 10. Line 72. Cf. Assyrian version, Tablet IV, 4, 10, and restore at the end of this line di-im-tam as in our text, instead of Jensen’s conjecture. Lines 74, 77 and 83. The restoration zar-biš, suggested by the Assyrian version, Tablet IV, 4, 4. Lines 76 and 82. Cf. Assyrian version, Tablet VIII, 3, 18. Line 78. (ú-ta-ab-bil from abâlu, “grieve” or “darkened.” Cf. uš-ta-kal (Assyrian version, ib. line 9), where, perhaps, we are to restore it-ta-[bil pa-ni-šú]. Line 87. uš-ta-li-pa from elêpu, “exhaust.” See Muss-Arnolt, Assyrian Dictionary, p. 49a. Line 89. Cf. Assyrian version, ib. line 11, and restore the end of the line there to i-ni-iš, as in our text. Line 96. For dapinu as an epithet of Ḫuwawa, see Assyrian version, Tablet III, 2a, 17, and 3a, 12. Dapinu occurs also as a description of an ox (Rm 618, Bezold, Catalogue of the Kouyunjik Tablets, etc., p. 1627). Line 98. The restoration on the basis of ib. III, 2a, 18. Lines 96–98 may possibly form a parallel to ib. 
lines 17–18, which would then read about as follows: “Until I overcome Ḫuwawa, the terrible, and all the evil in the land I shall have destroyed.” At the same time, it is possible that we are to restore [lu-ul]-li-ik at the end of line 98. Line 101. lilissu occurs in the Assyrian version, Tablet IV, 6, 36. Line 100. For ḫalbu, “jungle,” see Assyrian version, Tablet V, 3, 39 (p. 160). Lines 109–111. These lines enable us properly to restore Assyrian version, Tablet IV, 5, 3 = Haupt’s edition, p. 83 (col. 5, 3). No doubt the text read as ours mu-tum (or mu-u-tum) na-pis-su. Line 115. šupatu, which occurs again in line 199 and also line 275.šú-pa-as-su (= šupat-su) must have some such meaning as [97]“dwelling,” demanded by the context. [Dhorme refers me to OLZ 1916, p. 145]. Line 129. Restored on the basis of the Assyrian version, Tablet IV, 6, 38. Line 131. The restoration muḳtablu, tentatively suggested on the basis of CT XVIII, 30, 7b, where muḳtablu, “warrior,” appears as one of the designations of Gilgamesh, followed by a-lik pa-na, “the one who goes in advance,” or “leader”—the phrase so constantly used in the Ḫuwawa episode. Line 132. Cf. Assyrian version, Tablet I, 5, 18–19. Lines 136–137. These two lines restored on the basis of Jensen IV, 5, 2 and 5. The variant in the Assyrian version, šá niše (written Ukumeš in one case and Lumeš in the other), for the numeral 7 in our text to designate a terror of the largest and most widespread character, is interesting. The number 7 is similarly used as a designation of Gilgamesh, who is called Esigga imin, “seven-fold strong,” i.e., supremely strong (CT XVIII, 30, 6–8). Similarly, Enkidu, ib. line 10, is designated a-rá imina, “seven-fold.” Line 149. A difficult line because of the uncertainty of the reading at the beginning of the following line. The most obvious meaning of mi-it-tu is “corpse,” though in the Assyrian version šalamtu is used (Assyrian version, Tablet V, 2, 42). On the other hand, it is possible—as Dr. 
Lutz suggested to me—that mittu, despite the manner of writing, is identical with miṭṭú, the name of a divine weapon, well-known from the Assyrian creation myth (Tablet IV, 130), and other passages. The combination miṭ-ṭu šá-ḳu-ú-, “lofty weapon,” in the Bilingual text IV, R², 18 No. 3, 31–32, would favor the meaning “weapon” in our passage, since [šá]-ḳu-tu is a possible restoration at the beginning of line 150. However, the writing mi-it-ti points too distinctly to a derivative of the stem mâtu, and until a satisfactory explanation of lines 150–152 is forthcoming, we must stick to the meaning “corpse” and read the verb il-ḳu-ut. Line 152. The context suggests “lion” for the puzzling la-bu. Line 156. Another puzzling line. Dr. Clay’s copy is an accurate reproduction of what is distinguishable. At the close of the line there appears to be a sign written over an erasure. Line 158. [ga-ti lu-]uš-kun as in line 186, literally, “I will place my hand,” i.e., I purpose, I am determined. [98] Line 160. The restoration on the basis of the parallel line 187. Note the interesting phrase, “writing a name” in the sense of acquiring “fame.” Line 161. The kiškattê, “artisans,” are introduced also in the Assyrian version, Tablet VI, 187, to look at the enormous size and weight of the horns of the slain divine bull. See for other passages Muss-Arnolt Assyrian Dictionary, p. 450b. At the beginning of this line, we must seek for the same word as in line 163. Line 162. While the restoration belê, “weapon,” is purely conjectural, the context clearly demands some such word. I choose belê in preference to kakkê, in view of the Assyrian version, Tablet VI, 1. Line 163. Putuku (or putukku) from patâku would be an appropriate word for the fabrication of weapons. Line 165. The rabûtim here, as in line 167, I take as the “master mechanics” as contrasted with the ummianu, “common workmen,” or journeymen. 
A parallel to this forging of the weapons for the two heroes is to be found in the Sumerian fragment of the Gilgamesh Epic published by Langdon, Historical and Religious Texts from the Temple Library of Nippur (Munich, 1914), No. 55, 1–15. Lines 168–170 describe the forging of the various parts of the lances for the two heroes. The ṣipru is the spear point Muss-Arnolt, Assyrian Dictionary, p. 886b; the išid paṭri is clearly the “hilt,” and the mešelitum I therefore take as the “blade” proper. The word occurs here for the first time, so far as I can see. For 30 minas, see Assyrian version, Tablet VI, 189, as the weight of the two horns of the divine bull. Each axe weighing 3 biltu, and the lance with point and hilt 3 biltu we would have to assume 4 biltu for each pašu, so as to get a total of 10 biltu as the weight of the weapons for each hero. The lance is depicted on seal cylinders representing Gilgamesh and Enkidu, for example, Ward, Seal Cylinders, No. 199, and also in Nos. 184 and 191 in the field, with the broad hilt; and in an enlarged form in No. 648. Note the clear indication of the hilt. The two figures are Gilgamesh and Enkidu—not two Gilgameshes, as Ward assumed. See above, page 34. A different weapon is the club or mace, as seen in Ward, Nos. 170 and 173. This appears also to be the weapon which Gilgamesh holds in his hand on the colossal figure from the palace of Sargon (Jastrow, Civilization of [99]Babylonia and Assyria, Pl. LVII), though it has been given a somewhat grotesque character by a perhaps intentional approach to the scimitar, associated with Marduk (see Ward, Seal Cylinders, Chap. XXVII). The exact determination of the various weapons depicted on seal-cylinders merits a special study. Line 181. Begins a speech of Ḫuwawa, extending to line 187, reported to Gish by the elders (line 188–189), who add a further warning to the youthful and impetuous hero. Line 183. lu-uk-šú-su (also l. 
186), from akâšu, “drive on” or “lure on,” occurs on the Pennsylvania tablet, line 135, uk-ki-ši, “lure on” or “entrap,” which Langdon erroneously renders “take away” and thereby misses the point completely. See the comment to the line of the Pennsylvania tablet in question. Line 192. On the phrase šanû bunu, “change of countenance,” in the sense of “enraged,” see the note to the Pennsylvania tablet, l.31. Line 194. nu-ma-at occurs in a tablet published by Meissner, Altbabyl. Privatrecht, No. 100, with bît abi, which shows that the total confine of a property is meant; here, therefore, the “interior” of the forest or heart. It is hardly a “by-form” of nuptum as Muss-Arnolt, Assyrian Dictionary, p. 690b, and others have supposed, though nu-um-tum in one passage quoted by Muss-Arnolt, ib. p. 705a, may have arisen from an aspirate pronunciation of the p in nubtum. Line 215. The kneeling attitude of prayer is an interesting touch. It symbolizes submission, as is shown by the description of Gilgamesh’s defeat in the encounter with Enkidu (Pennsylvania tablet, l. 227), where Gilgamesh is represented as forced to “kneel” to the ground. Again in the Assyrian version, Tablet V, 4, 6, Gilgamesh kneels down (though the reading ka-mis is not certain) and has a vision. Line 229. It is much to be regretted that this line is so badly preserved, for it would have enabled us definitely to restore the opening line of the Assyrian version of the Gilgamesh Epic. The fragment published by Jeremias in his appendix to his Izdubar-Nimrod, Plate IV, gives us the end of the colophon line to the Epic, reading ……… di ma-a-ti (cf. ib., Pl. I, 1. … a-ti). Our text evidently reproduces the same phrase and enables us to supply ka, as well as [100]the name of the hero Gišh of which there are distinct traces. The missing word, therefore, describes the hero as the ruler, or controller of the land. But what are the two signs before ka? 
A participial form from pakâdu, which one naturally thinks of, is impossible because of the ka, and for the same reason one cannot supply the word for shepherd (nakidu). One might think of ka-ak-ka-du, except that kakkadu is not used for “head” in the sense of “chief” of the land. I venture to restore [i-ik-]ka-di, “strong one.” Our text at all events disposes of Haupt’s conjecture iš-di ma-a-ti (JAOS 22, p. 11), “Bottom of the earth,” as also of Ungnad’s proposed [a-di pa]-a-ti, “to the ends” (Ungnad-Gressmann, Gilgamesch-Epos, p. 6, note), or a reading di-ma-a-ti, “pillars.” The first line of the Assyrian version would now read šá nak-ba i-mu-ru [dGis-gi(n)-maš i-ik-ka]-di ma-a-ti, i.e., “The one who saw everything, Gilgamesh the strong one (?) of the land.” We may at all events be quite certain that the name of the hero occurred in the first line and that he was described by some epithet indicating his superior position. Lines 229–235 are again an address of Gilgamesh to the sun-god, after having received a favorable “oracle” from the god (line 222). The hero promises to honor and to celebrate the god, by erecting thrones for him. Lines 237–244 describe the arming of the hero by the “master” craftsman. In addition to the pašu and paṭru, the bow (?) and quiver are given to him. Line 249 is paralleled in the new fragment of the Assyrian version published by King in PSBA 1914, page 66 (col. 1, 2), except that this fragment adds gi-mir to e-mu-ḳi-ka. Lines 251–252 correspond to column 1, 6–8, of King’s fragment, with interesting variations “battle” and “fight” instead of “way” and “road,” which show that in the interval between the old Babylonian and the Assyrian version, the real reason why Enkidu should lead the way, namely, because he knows the country in which Ḫuwawa dwells (lines 252–253), was supplemented by describing Enkidu also as being more experienced in battle than Gilgamesh. Line 254. 
I am unable to furnish a satisfactory rendering for this line, owing to the uncertainty of the word at the end. Can it [101]be “his household,” from the stem which in Hebrew gives us מִשְׁפָּחָה “family?” Line 255. Is paralleled by col. 1, 4, of King’s new fragment. The episode of Gišh and Enkidu proceeding to Ninsun, the mother of Gish, to obtain her counsel, which follows in King’s fragment, appears to have been omitted in the old Babylonian version. Such an elaboration of the tale is exactly what we should expect as it passed down the ages. Line 257. Our text shows that irnittu (lines 257, 264, 265) means primarily “endeavor,” and then success in one’s endeavor, or “triumph.” Lines 266–270. Do not appear to refer to rites performed after a victory, as might at a first glance appear, but merely voice the hope that Gišh will completely take possession of Ḫuwawa’s territory, so as to wash up after the fight in Ḫuwawa’s own stream; and the hope is also expressed that he may find pure water in Ḫuwawa’s land in abundance, to offer a libation to Šhamašh. Line 275. On šú-pa-as-su = šupat-su, see above, to l. 115. [Note on Sabitum (above, p. 11) In a communication before the Oriental Club of Philadelphia (Feb. 10, 1920), Prof. Haupt made the suggestion that sa-bi-tum (or tu), hitherto regarded as a proper name, is an epithet describing the woman who dwells at the seashore which Gilgamesh in the course of his wanderings reaches, as an “innkeeper”. It is noticeable that the term always appears without the determinative placed before proper names; and since in the old Babylonian version (so far as preserved) and in the Assyrian version, the determinative is invariably used, its consistent absence in the case of sabitum (Assyrian Version, Tablet X, 1, 1, 10, 15, 20; 2, 15–16 [sa-bit]; Meissner fragment col. 2, 11–12) speaks in favor of Professor Haupt’s suggestion. 
The meaning “innkeeper”, while not as yet found in Babylonian-Assyrian literature is most plausible, since we have sabū as a general name for ’drink’, though originally designating perhaps more specifically sesame wine (Muss-Arnolt, Assyrian Dictionary, p. 745b) or distilled brandy, according to Prof. Haupt. Similarly, in the Aramaic dialects, sebha is used for “to drink” and in the Pael to “furnish drink”. Muss-Arnolt in [102]his Assyrian Dictionary, 746b, has also recognized that sabitum was originally an epithet and compares the Aramaic sebhoyâthâ(p1) “barmaids”. In view of the bad reputation of inns in ancient Babylonia as brothels, it would be natural for an epithet like sabitum to become the equivalent to “public” women, just as the inn was a “public” house. Sabitum would, therefore, have the same force as šamḫatu (the “harlot”), used in the Gilgamesh Epic by the side of ḫarimtu “woman” (see the note to line 46 of Pennsylvania Tablet). The Sumerian term for the female innkeeper is Sal Geštinna “the woman of the wine,” known to us from the Hammurabi Code §§108–111. The bad reputation of inns is confirmed by these statutes, for the house of the Sal Geštinna is a gathering place for outlaws. The punishment of a female devotee who enters the “house of a wine woman” (bît Sal Geštinna §110) is death. It was not “prohibition” that prompted so severe a punishment, but the recognition of the purpose for which a devotee would enter such a house of ill repute. The speech of the sabitum or innkeeper to Gilgamesh (above, p. 12) was, therefore, an invitation to stay with her, instead of seeking for life elsewhere. Viewed as coming from a “public woman” the address becomes significant. The invitation would be parallel to the temptation offered by the ḫarimtu in the first tablet of the Enkidu, and to which Enkidu succumbs. The incident in the tablet would, therefore, form a parallel in the adventures of Gilgamesh to the one that originally belonged to the Enkidu cycle. 
Finally, it is quite possible that sabitum is actually the Akkadian equivalent of the Sumerian Sal Geštinna, though naturally until this equation is confirmed by a syllabary or by other direct evidence, it remains a conjecture. See now also Albright’s remarks on Sabitum in the A. J. S. L. 36, pp. 269 seq.] [103] 1 Scribal error for an. 2 Text apparently di. 3 Hardly ul. 4 Omitted by scribe. 5 Kišti omitted by scribe. 6 I.e., at night to thee, may Lugal-banda, etc. Corrections to the Text of Langdon’s Edition of the Pennsylvania Tablet.1 Column 1. 5. Read it-lu-tim (“heroes”) instead of id-da-tim (“omens”). 6. Read ka-ka-bu instead of ka-ka-’a. This disposes of Langdon’s note 2 on p. 211. 9 Read ú-ni-iš-šú-ma, “I became weak” (from enêšu, “weak”) instead of ilam iš-šú-ma, “He bore a net”(!). This disposes of Langdon’s note 5 on page 211. 10. Read Urukki instead of ad-ki. Langdon’s note 7 is wrong. 12. Langdon’s note 8 is wrong. ú-um-mid-ma pu-ti does not mean “he attained my front.” 14. Read ab-ba-la-áš-šú instead of at-ba-la-áš-šú. 15. Read mu-di-a-at instead of mu-u-da-a-at. 20. Read ta-ḫa-du instead of an impossible [sa]-ah-ḫa-ta—two mistakes in one word. Supply kima Sal before taḫadu. 22. Read áš-šú instead of šú; and at the end of the line read [tu-ut]-tu-ú-ma instead of šú-ú-zu. 23. Read ta-tar-ra-[as-su]. 24. Read [uš]-ti-nim-ma instead of [iš]-ti-lam-ma. 28. Read at the beginning šá instead of ina. 29. Langdon’s text and transliteration of the first word do not tally. Read ḫa-aṣ-ṣi-nu, just as in line 31. 32. Read aḫ-ta-du (“I rejoiced”) instead of aḫ-ta-ta. Column 2. 4. Read at the end of the line di-da-šá(?) ip-tí-[e] instead of Di-?-al-lu-un (!). 5. Supply dEn-ki-dū at the beginning. Traces point to this reading. 19. Read [gi]-it-ma-[lu] after dGiš, as suggested by the Assyrian version, Tablet I, 4, 38, where emûḳu (“strength”) replaces nepištu of our text. 20. Read at-[ta kima Sal ta-ḫa]-bu-[ub]-šú. 21. Read ta-[ra-am-šú ki-ma]. [104] 23. 
Read as one word ma-a-ag-ri-i-im (“accursed”), spelled in characteristic Hammurabi fashion, instead of dividing into two words ma-a-ak and ri-i-im, as Langdon does, who suggests as a translation “unto the place yonder(?) of the shepherd”(!). 24. Read im-ta-ḫar instead of im-ta-gar. 32. Supply ili(?) after ki-ma. 33. Read šá-ri-i-im as one word. 35. Read i-na [áš]-ri-šú [im]-ḫu-ru. 36. Traces at beginning point to either ù or ki (= itti). Restoration of lines 36–39 (perhaps to be distributed into five lines) on the basis of the Assyrian version, Tablet I, 4, 2–5. Column 3. 14. Read Kàš (= šikaram, “wine”) ši-ti, “drink,” as in line 17, instead of bi-iš-ti, which leads Langdon to render this perfectly simple line “of the conditions and the fate of the land”(!). 21. Read it-tam-ru instead of it-ta-bir-ru. 22. Supply [lùŠú]-I. 29. Read ú-gi-ir-ri from garû (“attack), instead of separating into ú and gi-ir-ri, as Langdon does, who translates “and the lion.” The sign used can never stand for the copula! Nor is girru, “lion!” 30. Read Síbmeš, “shepherds,” instead of šab-[ši]-eš! 31. šib-ba-ri is not “mountain goat,” nor can ut-tap-pi-iš mean “capture.” The first word means “dagger,” and the second “he drew out.” 33. Read it-ti-[lu] na-ki-[di-e], instead of itti immer nakie which yields no sense. Langdon’s rendering, even on the basis of his reading of the line, is a grammatical monstrosity. 35. Read giš instead of wa. 37. Read perhaps a-na [na-ki-di-e i]- za-ak-ki-ir. Column 4. 4. The first sign is clearly iz, not ta, as Langdon has it in note 1 on page 216. 9. The fourth sign is su, not šú. 10. Separate e-eš (“why”) from the following. Read ta-ḫi-[il], followed, perhaps, by la. The last sign is not certain; it may be ma. [105] 11. Read lim-nu instead of mi-nu. In the same line read a-la-ku ma-na-aḫ-[ti]-ka instead of a-la-ku-zu(!) na-aḫ … ma, which, naturally, Langdon cannot translate. 16. Read e-lu-tim instead of pa-a-ta-tim. 
The first sign of the line, tu, is not certain, because apparently written over an erasure. The second sign may be a. Some one has scratched the tablet at this point. 18. Read uk-la-at âli (?) instead of ug-ad-ad-lil, which gives no possible sense! Column 5. 2. Read [wa]-ar-ki-šú. 8. Read i-ta-wa-a instead of i-ta-me-a. The word pi-it-tam belongs to line 9! The sign pi is unmistakable. This disposes of note 1 on p. 218. 9. Read Mi = ṣalmu, “image.” This disposes of Langdon’s note 2 on page 218. Of six notes on this page, four are wrong. 11. The first sign appears to be si and the second ma. At the end we are perhaps to supply [šá-ki-i pu]-uk-ku-ul, on the basis of the Assyrian version, Tablet IV, 2, 45, šá-ki-i pu-[uk-ku-ul]. 12. Traces at end of line suggest i-pa(?)-ka-du. 13. Read i-[na mâti da-an e-mu]-ki i-wa. 18. Read ur-šá-nu instead of ip-šá-nu. 19. Read i-šá-ru instead of i-tu-ru. 24. The reading it-ti after dGiš is suggested by the traces. 25. Read in-ni-[ib-bi-it] at the end of the line. 28. Read ip-ta-ra-[aṣ a-la]-ak-tam at the end of the line, as in the Assyrian version, Tablet IV, 2, 37. 30. The conjectural restoration is based on the Assyrian version, Tablet IV, 2, 36. Column 6. 3. Read i-na ṣi-ri-[šú]. 5. Supply [il-li-ik]. 21. Langdon’s text has a superfluous ga. 22. Read uz-za-šú, “his anger,” instead of uṣ-ṣa-šú, “his javelin” (!). 23. Read i-ni-iḫ i-ra-as-su, i.e., “his breast was quieted,” in the sense of “his anger was appeased.” 31. Read ri-eš-ka instead of ri-eš-su. [106] In general, it should be noted that the indications of the number of lines missing at the bottom of columns 1–3 and at the top of columns 4–6 as given by Langdon are misleading. Nor should he have drawn any lines at the bottom of columns 1–3 as though the tablet were complete. Besides in very many cases the space indications of what is missing within a line are inaccurate. Dr. 
Langdon also omitted to copy the statement on the edge: 4 šú-ši, i.e., “240 lines;” and in the colophon he mistranslates šú-tu-ur, “written,” as though from šaṭâru, “write,” whereas the form is the permansive III, 1, of atâru, “to be in excess of.” The sign tu never has the value ṭu! In all, Langdon has misread the text or mistransliterated it in over forty places, and of the 204 preserved lines he has mistranslated about one-half. 1 The enumeration here is according to Langdon’s edition. Plates Plate I. The Yale Tablet. Plate II. The Yale Tablet. Plate III. The Yale Tablet. Plate IV. The Yale Tablet. Plate V. The Yale Tablet. Plate VI. The Yale Tablet. Plate VII. The Yale Tablet.

      Compared to the other versions of the epic of Gilgamesh, this version looks more closely at Gilgamesh's quest for immortality after Enkidu's death. The "us" in this instance would be Gilgamesh and his search for a cure, while the "them" would be the enemies who try to stop him, including the forces he comes across. The text creates this distinction by presenting Gilgamesh as the main character and as the one who is in need of a cure, because he struggles to come to terms with the fact that he will die one day. In addition, Enkidu was able to turn Gilgamesh into a noble figure who used his power for good, making him more likeable, which is why the reader also roots for him to find a cure. Gilgamesh as a figure shows that in his time period, males were the ones seen as leaders with strength, because the females in all versions of the text do not have dynamic roles that showcase their personality or even their endearing qualities. There are more political and nationalistic themes than in the Sumerian versions, which illustrates how linguistics and language can play a role in how a culture might be perceived. By using the strong characteristics of Gilgamesh, the text is ultimately able to portray the civilization of Uruk and create a sense of identity as a result. CC BY Ajey Sasimugunthan (contact)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      I really enjoyed this manuscript from Torsekar et al. on "Contrasting responses to aridity by different-sized decomposers cause similar decomposition rates across a precipitation gradient". The authors aimed to examine how climate interacts with decomposers of different size categories to influence litter decomposition. They proposed a new hypothesis: "The opposing climatic dependencies of macrofauna and that of microorganisms and mesofauna should lead to similar overall decomposition rates across precipitation gradients".

      This study emphasizes the importance and contribution of different groups of organisms (micro, meso, macro, and whole community) across different seasons (a hot summer with no precipitation and a cooler, wetter winter) along a precipitation gradient. The authors made use of 1050 litter baskets with different mesh sizes to capture the decomposers' contributions. They proposed a new hypothesis that aimed to understand the "dryland decomposition conundrum". They combined their decomposition experiment with the sampling of decomposers using pitfall traps across both experimental seasons. This study was carried out in Israel and was based on a single litter species that is native to all seven sites. The authors found that the microorganism contribution dominated in winter, while macrofauna dominated overall decomposition in summer. These seasonal differences, combined with the different ways the decomposer groups varied along the precipitation gradient, resulted in similar overall decomposition rates across sites.

      I believe this manuscript has the potential to advance our knowledge on litter decomposition.

      Strengths:

      Well-designed study with a combination of different approaches (methods) and consideration of seasonality to generalize patterns.

      The study expands the current understanding of litter decomposition and of the interactions between factors affecting the process (here, climate and decomposers).

      Weaknesses:

      The study was only based on a single litter species.

      We now discuss the advantages and limitations of this approach in the methods and devote a completely new paragraph to this important point in the discussion (lines 394-401).

      Reviewer #2 (Public Review):

      Summary: Torsekar et al. use a leaf litter decomposition experiment across seasons, and in an aridity gradient, to provide a careful test of the role of different-sized soil invertebrates in shaping the rates of leaf litter decomposition. The authors found that large-sized invertebrates are more active in the summer and small-sized invertebrates in the winter. The summed effects of all invertebrates then translated into similar levels of decomposition across seasons. The system breaks down in hyper-arid sites.

      Strengths: This is a well-written manuscript that provides a complete statistical analysis of a nice dataset. The authors provide a complete discussion of their results in the current literature.

      Weaknesses:

      I have only three minor comments. Please standardize the color across ALL figures (use the same color always for the same thing, and be friendly to color-blind people).

      Thank you for this important suggestion. We have now changed all figures to standardize all colors and chose a more color-blind-friendly palette.

      Fig 1 may benefit from separating the orange line (micro and meso) into two lines that reflect your experimental setup and results. I would mention the dryland decomposition conundrum earlier in the Introduction.

      We based our novel hypotheses on a thorough literature search. Accordingly, decomposition is expected to be positively associated with moisture, regardless of the decomposer body size. Our contribution to theory was to suggest that macro-detritivores may respond very differently to climatic conditions and dominate litter decomposition in warm arid-lands (we listed the reasons in the text). Consequently, we did not distinguish between microorganisms and mesofauna. We assumed that both groups inhabit the litter substrate and have limited adaptation to dry conditions. Our results provide strong evidence that this presumption is likely wrong and that mesofauna respond to climate very differently from micro-decomposers. Yet, we cannot use hindsight understanding to improve our original hypothesis. We now emphasize this important point in the discussion as an important future direction.

      Although we are very appreciative of and pleased with the reviewer's enthusiasm to highlight the importance of our work as a possible solution to the longstanding dryland decomposition conundrum (DDC), we decided not to move it to the introduction. This is because we think that our work is not centred on resolving the DDC but provides more general principles that may lead to a paradigm shift in the way ecologists study nutrient cycling across ecosystems.

      And the manuscript is full of minor grammatical errors. Some careful reading and fixing of all these minor mistakes here and there would be needed.

      We apologize and have done our best to find and fix those mistakes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I really enjoyed this manuscript from Torsekar et al on "Contrasting responses to aridity by different-sized decomposers cause similar decomposition rates across a precipitation gradient". The authors aimed to examine how climate interacts with decomposers of different size categories to influence litter decomposition. They proposed a new hypothesis: "The opposing climatic dependencies of macrofauna and that of microorganisms and mesofauna should lead to similar overall decomposition rates across precipitation gradients".

      This study emphasizes the importance as well as the contribution of different groups of organisms (micro, meso, macro, and whole community) across different seasons (summer with the following characteristics: hot with no precipitation, and winter with the following characteristics: cooler and wetter winter) along a precipitation gradient. The authors made use of 1050 litter baskets with different mesh sizes to capture decomposers contribution. They proposed a new hypothesis that was aiming to understand the "dryland decomposition conundrum". They combined their decomposition experiment with the sampling of decomposers by using pitfall traps across both experiment seasons. This study was carried out in Israel and based on a single litter species that is native to all seven sites. The authors found that microorganism contribution dominated in winter while macrofauna decomposition dominated the overall decomposition in summer. These seasonality differences combined with the differences in different decomposers groups fluctuation along precipitation resulted in similar overall decomposition rates across sites.

      I believe this manuscript has the potential to advance our knowledge on litter decomposition. Below I provide my general and specific comments.

      General comments:

      (1) Study in general is well designed and well thought beforehand,

      (2) Study aims to expand the current understanding of the dryland decomposition conundrum

      (3) The authors should add a caveat about the fact that they only use one litter species and call for examining litter mixtures in the same gradient.

      (4) Please check the way you reduce the random effects from your initial model, I have provided a better way to do so in my specific comments

      (5) For Figure 1, authors can check my comment on this and see if they could revise the figure.

      Thank you for the positive feedback and your valuable comments. We have tried our best to address all comments and suggestions for improvement and clarification.

      Specific comments

      Line # 57 Please write "Theory suggests" instead of "Theory suggest"

      We changed the text as suggested

      Line # 70, please write "Indeed, handful evidence shows" instead of "Indeed, handful evidence show"

      We changed the text as suggested

      Figure 1: I like this conceptual framework. I have a silly question: why is the slope of the whole community at the beginning (between Hyperarid and Arid) the same as that of the macrofauna? I would think the slope should be higher, as this is adding up, right? The same goes for the decomposition of the whole community later on. For me this should reflect the adding or summing up (if I am right), so the authors should think about how this could be reflected in the figure.

      We agree with your interpretation that the whole community decomposition reflects the addition by constituent decomposers. The slope of the whole community decomposition between hyper-arid and arid is slightly higher than the one of macro decomposition to reflect the additive effect of macro with meso+micro decomposition. We have now changed the figure slightly to make this point more visible (Line 106).

      Line # 111 Please make "Methods" bold as well to be consistent with others headings.

      We changed the formatting as suggested

      Line #125 and in other lines as well please replace "X" by "x" to denote multiplication.

      We changed the formatting as suggested

      Table 1 Please add "*" to climate like this "Climate*" so that the end note of the table could make sense

      Thank you for this suggestion. We have now added the asterisk referring to the note below the Table.

      Figure 2, please consider putting at line # 133, mean annual precipitation (MAP), so that at line # 135 you can directly say "The precipitation map ....".

      We made both changes as suggested.

      Line # 138 I would not use different units for the same values. I do understand that you want to emphasize the accuracy, but I would instead write 3 +- 0.001 g

      We changed the units as suggested.

      Line # 145, how is the litter basket customized to rest at 1 cm above ground level?

      We have now clarified that we cut open windows one centimeter above the cage floor. The cages were positioned on the soil (line 144).

      Lines # 181-183, I like the approach of checking the necessity of having the random effects. However, it has been reported that likelihood ratio tests (LRT) are not really reliable for testing random effects. I suggest you use permutations instead. I think the function is confint(MODEL); you need to specify the number of permutations, the higher the better, but you should start with 99 first and see how the results look. If promising, you can even go to 9999. But it will need computation power and time.

      Thank you for the suggestion. We now used a simulation-based exact test, instead of an LRT, to examine the random effect, as recommended by the authors of the “lme4” package. As recommended, we used 9999 simulations. The simulation test yielded results similar to those originally reported (see lines 181-183).
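The idea behind a simulation-based test of a random effect can be sketched in a few lines. The example below is a simplified illustration only, not the authors' analysis (which was run in R on a mixed model): it assumes a one-way layout with a single grouping factor, normal errors, and the ANOVA F statistic as the test statistic. The function names and the toy data are hypothetical. Under the null hypothesis of zero between-group variance, the F statistic does not depend on the mean or the error variance, so its null distribution can be simulated from standard normal draws.

```python
import random
from statistics import fmean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulated_p_value(groups, n_sim=9999, seed=1):
    """Monte Carlo p-value for H0: the grouping factor adds no variance."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_sim):
        # Simulate data with no group effect; F is scale- and location-free.
        null_groups = [[rng.gauss(0.0, 1.0) for _ in range(s)] for s in sizes]
        if f_statistic(null_groups) >= observed:
            exceed += 1
    # The +1 terms count the observed data set as one draw from the null.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical litter mass-loss values for three blocks (illustration only).
toy_blocks = [[1.0, 1.2, 0.9, 1.1], [2.0, 2.1, 1.9, 2.2], [3.0, 3.1, 2.9, 3.2]]
print(simulated_p_value(toy_blocks, n_sim=999))
```

With 9999 simulations the smallest attainable p-value is 1/10000, which is one reason to prefer a large number of simulations once a pilot run looks reasonable.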

      Line # 187, 188, 188, please do not use capital letter to start mesofauna, macrofauna and whole-community

      We changed the formatting as suggested

      Line # 205 Please add the version number of R in the text.

      We now included the version number as suggested.

      Line # 209-211, could you please check whether "then" is the word you want to use or "than"

      Our bad; we indeed meant “than” and have made the appropriate changes.

      Line # 227 and in other places as well please provide the second degree of freedom of the F test.

      Thank you for this important comment. We have now added the second degree of freedom to the relevant results (lines 229, 232).

      Figure 3 and Figure 4 show some results that are negative, can you please explain what might be the reasons behind this?

      We now explain this important point in the figures’ captions.

      Figure 5 Please add label to the x-axis.

      Thank you. We have now included a label.

      Line # 357, the sentence "... meso-decomposition, like microbial decomposition,...", I don't understand which criteria authors used to classify microbial decomposition as "meso-decomposition"?

      We now remove this potential cause of confusion by using the term ‘meso-decomposition’ to distinguish from microbial decomposition (Line 366).

      Line # 380 Kindly put "per se" in italic.

      We changed the formatting as suggested

      References

      The reference formats are not consistent. For example, for the same journal (say Trends in Ecology and Evolution) the authors sometimes wrote the full name, as at line # 36 (also note that "vol" should not be written as such), but wrote the abbreviation at line # 42.

      Our bad; we apologize and have carefully checked all references to make sure the style is consistent.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths: 

      Overall the work is novel and moves the field of Alzheimer's disease forward in a significant way. The manuscript reports a novel concept of aberrant activity in VIP interneurons during the early stages of AD thus contributing to dysfunctions of the CA1 microcircuit. This results in the enhancement of the inhibitory tone on the primary cells of CA1. Thus, the disinhibition by VIP interneurons of Principal Cells is dampened. The manuscript was skillfully composed, and the study was of strong scientific rigor featuring well-designed experiments. Necessary controls were present. Both sexes were included.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Limitations:

      (1) The authors attributed aberrant circuit activity to the accumulation of "Abeta intracellularly" inside IS-3 cells. That is problematic. 6E10 antibody recognizes amyloid plaques in addition to Amyloid Precursor Protein (APP) as well as the C99 fragment. There are no plaques at the ages 3xTg mice were examined. Thus, the staining shown in Figure 1a is of APP/C99 inside neurons, not abeta accumulations in neurons. At the ages of 3-6 months, 3xTg starts producing abeta oligomers and potentially tau oligomers as well (Takeda et al., 2013 PMID: 23640054; Takeda et al., 2015 PMID: 26458742 and others). Emerging literature suggests that abeta and tau oligomers disrupt circuit function. Thus, a more likely explanation of abeta and tau oligomers disrupting the activity of VIP neurons is plausible.

      The Reviewer correctly points out that 3xTg-AD mice typically do not exhibit plaques before 6 months of age, with limited amounts even up to 12 months, particularly in the hippocampus. To the best of our knowledge, the 6E10 antibody binds to an epitope in APP (682-687) that is also present in the Abeta (3-8) peptide. Consequently, 6E10 detects full-length APP, α-APP (soluble alpha-secretase-cleaved APP), and Abeta (LaFerla et al., 2007). Nonetheless, we concur with the Reviewer's observation that the detected signal includes Abeta oligomers and the C99 fragment, which is currently considered an early marker of AD pathology (Takasugi et al., 2023; Tanuma et al., 2023). Studies have demonstrated intracellular accumulation of C99 in 3-month-old 3xTg mice (Lauritzen et al., 2012), and its binding to the Kv7 potassium channel family, which results in inhibiting their activity (Manville and Abbott, 2021). If a similar mechanism operates in IS-3 cells, it could explain the changes in their firing properties observed in our study. Consequently, we have revised the manuscript to include this crucial information in both the Results and Discussion sections.

      (2) Authors suggest that their animals do not exhibit loss of synaptic connections and show Figure 3d in support of that suggestion. However, imaging with confocal microscopy of 70micron thick sections would not allow the resolution of pre- and post-synaptic terminals. More sensitive measures such as electron microscopy or array tomography are the appropriate techniques to pursue. It is important for the authors to either remove that data from the manuscript or address the limitations of their technique in the discussion section. There is a possibility of loss of synaptic connections in their mouse model at the ages examined.

      We appreciate the Reviewer’s perspective on the techniques used for imaging synaptic connections. While we acknowledge the limitations of confocal microscopy for resolving pre- and post-synaptic structures in thick sections, we respectfully disagree regarding the exclusive suitability of electron microscopy (EM). Our approach involved confocal 3D image acquisition using a 63x objective at 0.2 µm lateral resolution and 0.25 Z-step, providing valuable quantitative insights into synaptic bouton density. Despite the challenges posed by thick sections, this method together with automatic analysis allows for careful quantification. Although EM offers unparalleled resolution, it presents challenges in quantification. We have included the important details regarding image acquisition and analysis in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The submitted manuscript by Michaud and Francavilla et al., is a very interesting study describing early disruptions in the disinhibitory modulation exerted by VIP+ interneurons in CA1, in a triple transgenic model of Alzheimer's disease. They provide a comprehensive analysis at the cellular, synaptic, network, and behavioral level on how these changes correlate and might be related to behavioral impairments during these early stages of the disease.

      Main findings:

      - 3xTg mice show early Aß accumulation in VIP-positive interneurons.

      - 3xTg mice show deficits in a spatially modified version of the novel object recognition test.

      - 3xTg mice VIP cells present slower action potentials and diminished firing frequency upon current injection.

      - 3xTg mice show diminished spontaneous IPSC frequency with slower kinetics in Oriens / Alveus interneurons.

      - 3xTg mice show increased O/A interneuron activity during specific behavioral conditions.

      - 3xTg mice show decreased pyramidal cell activity during specific behavioral conditions.

      Strengths:

      This study is very important for understanding the pathophysiology of Alzheimer's disease and the crucial role of interneurons in the hippocampus in healthy and pathological conditions.

      We are thankful to the reviewer for their insightful recognition of our efforts and their enthusiasm for the results of this research.

      Weaknesses:

      Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality.

      We completely agree with the reviewer's observation regarding the lack of demonstration of causality in our results. Investigating causality in the relationship between deficits in VIP physiological properties and differences in network activity is indeed a crucial aspect of this project. However, achieving this goal will require a significant amount of time and dedicated manipulations in a new mouse model (VIP-Cre-3xTg). We appreciate the importance of this line of investigation and consider it as a priority for our future research endeavors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Limitations:

      (1) The authors should describe their model and state the age at which these mice start depositing amyloid plaques and neurofibrillary tangles. Readers might not be familiar with this model. It is also important to mention that circuit disruptions are assessed prior to plaque and tangle formation.

      We have included a detailed description of the 3xTg-AD mouse model in the Introduction section, including information on the age at which amyloid plaques and neurofibrillary tangles begin to appear. Additionally, we have clarified that circuit disruptions were assessed before the formation of plaques and tangles. These details have been added to both the Introduction and the Results sections to ensure clarity for readers unfamiliar with the model.

      (2) Ns are presented in Supplemental Table 1. Units are presented in a note to Supplementary Table 1. It would be advisable to specify Ns and units as the data is being presented in the results section or figure legends for easy access.

      We have now included the Ns (sample sizes), specifying the number of cells or sections and the number of experimental animals, directly within the Results section and in the figure legends. This ensures that readers have immediate access to this information without needing to refer to the supplementary materials.

      (3) Several typos require correction:

      a. "mamory" - Line 22, page 5.

      b. The term "Interneurons" is abbreviated as both "INs" and "IN" throughout the manuscript. The author should consistently choose one abbreviation.

      We have corrected the typo "mamory" to "memory" on line 22, page 5. Additionally, we have standardized the abbreviation for "Interneurons" to "INs" throughout the manuscript for consistency.

      (4) Note 2 in Supplementary Table 1 states that animals of both sexes with equal distribution were used throughout the study. It would be best for the reader to assess the data distribution based on sex. Thus, it is advisable for the authors to depict male and female data points as distinct symbols throughout the figures.

      Unfortunately, we do not have detailed sex-disaggregated data for all datasets, which limits our ability to depict male and female data points separately across all figures. Therefore, we have opted to pool data from both sexes for a more comprehensive analysis. We believe this approach maintains the robustness of our findings.

      Reviewer #2 (Recommendations for the authors):

      Major Points:

      - To keep the logical line of reasoning and to be able to interpret the results, it would be important to use the same metrics when comparing the population activity of O/A interneurons and principal cells in the different behavioral conditions.

      We have revised Figures 4 and 5 to enhance the coherence in data presentation. This includes using consistent metrics for comparing the population activity of both O/A interneurons and principal cells across different behavioral conditions. These changes ensure a clearer and more logical interpretation of the results.

      - Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality. Would it be possible to test if manipulating VIP neurons one could obtain such specific results? Alternatively, it could be discussed more in detail how the decrease in disinhibition could lead to the changes in network activity demonstrated here.

      We agree with the reviewer that establishing causality between VIP neuron deficits and changes in network activity would be very important. However, demonstrating causality would require a new line of investigation, involving the use of specific mouse models to selectively manipulate VIP neurons. This is an exciting direction that we plan to prioritize in our future research. For this study, we have included a discussion on the potential mechanisms by which decreased disinhibition might lead to the observed changes in network activity. Specifically, we propose that in young adult 3xTg-AD mice, the altered firing of I-S3 cells may lead to enhanced inhibition of principal cells. This could shift the excitation/inhibition balance, input integration and firing output of principal cells thereby impacting overall network activity. These points are discussed in detail in the revised Discussion section.

      - Along the same lines, the correlations shown in the manuscript would be more robust if there was an in vivo demonstration that 3xTg mice indeed show decreased activity in vivo. The same experiments could also clarify if VIP cells in control animals are more active at the time of decision-making and during object exploration as suggested in the manuscript.

      Thank you for your comment. In response to the point raised, we would like to highlight that we have recently documented the increased activity of VIP-INs in the D-zone of the T-maze and during object exploration in a study published in Cell Reports (Tamboli et al., 2024). This publication is now referenced in our manuscript to support our findings. Regarding the in vivo activity of 3xTg mice, our observations indicated no significant differences in major behavioral patterns such as locomotion, rearing, and exploration of the T-maze when comparing Tg and non-Tg mice. These findings are presented in detail in Figure 4c and Supplementary Fig. 5. We believe these data support the robustness of our correlations by demonstrating that the overall behavioral activity of 3xTg mice is comparable to that of non-transgenic controls, thus focusing attention on the specific roles of VIP-INs in the early prodromal state of AD pathology.

      Minor Points:

      - Figure 1c: Heading of VIP-Tg should have capital letters.

      Thank you for pointing that out. We have corrected the heading to "VIP-Tg" with capital letters in Figure 1c.

      - Figure 1d: The finding that no change was observed in the percentage of VIP+/CR+ is based on three animals and 3-4 slices per mouse. However, the result of VIP+CR+ in tg-mice has an outlier that might bias the results. I would suggest increasing the number of animals to confirm these results.

      Thank you for your insightful suggestion. We addressed the potential impact of the outlier in the VIP+/CR+ cell density analysis by recalculating the results after removing the outlier using the interquartile range method. This reanalysis revealed a statistically significant difference in the VIP+/CR+ cell density between non-Tg and Tg mice, which we have now detailed in the Results section. Despite this, we have chosen to retain the outlier in our final presentation to accurately represent the biological variability observed in our sample. We agree that increasing the number of animals would further validate these findings and will consider this in future studies.
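For readers unfamiliar with the interquartile range (IQR) rule mentioned above, a minimal sketch follows. It assumes the conventional fence multiplier of 1.5 and the default quartile definition of Python's `statistics.quantiles`; the response does not state which multiplier or quartile method was used, and the values below are hypothetical, not the study's data.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Split values into (kept, flagged) using Tukey-style IQR fences."""
    q1, _, q3 = quantiles(values, n=4)  # first, second, and third quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged


# Hypothetical co-expression percentages per slice (illustration only).
toy_percentages = [40.0, 42.0, 38.0, 41.0, 39.0, 43.0, 40.0, 41.0, 75.0]
kept, flagged = iqr_outliers(toy_percentages)
print("kept:", kept)
print("flagged:", flagged)
```

Because different quartile definitions can move the fences slightly in small samples, reporting results both with and without the flagged point, as done here, is the more transparent choice.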

      - Figure 3d: Would it be possible to identify the recorded interneurons? Is it expected that most of those are OLM cells?

      Thank you for your question. We were unable to fully recover all recorded cells using biocytin staining. However, for those cells with preserved axonal structures, we identified both OLM and bistratified cells, which are the primary targets of I-S3 cells. We have now included this information in the Results section to clarify the types of interneurons identified.

      - Figure 3: Why quantify VGat terminals instead of quantification of VIP-GFP terminals? Combined with the Calretinin labeling it would be more useful to indicate that no changes were observed at the morphological bouton level specifically in disinhibitory interneurons. Please also describe which ImageJ plugin was used for the quantification.

      Thank you for your question. Our primary objective was to quantify the synaptic terminals of CR+ INs in the CA1 O/A region, which are predominantly formed by I-S3 cells. Therefore, VGaT and CR co-localization was used to guide this analysis. GFP expression in axonal boutons can sometimes be inconsistent and less reliable for precise quantification. For this analysis, we utilized the “Analyze Particles” function in ImageJ, combined with watershed segmentation, which is now specified in the Methods section.

      - Figure 4g: How was the statistical test performed? If data was averaged across mice, please add error bars and data points in the figure.

      Thank you for your question. To compare the alternation percentage between non-Tg and Tg mice, we used Fisher’s Exact test as detailed in Supplementary Table 1. In this analysis, we considered each animal's choice individually, comparing the preference for correct versus incorrect choices between the two groups. Since Fisher’s Exact test is designed for analyzing qualitative data rather than quantitative data, averaging across mice was not applicable, and therefore, we did not include error bars or data points in the figure.
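To make the logic of Fisher's Exact test on per-animal choices concrete, the sketch below computes a two-sided p-value for a 2x2 table from first principles. The counts are an assumption: they are taken from the incorrect-choice numbers quoted elsewhere in this response (4 of 9 non-Tg and 1 of 6 Tg mice), which may not be the exact table behind Figure 4g. The function name is hypothetical, and the two-sided rule used (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Number of ways to obtain x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(n, col1)


# Rows: non-Tg, Tg; columns: correct, incorrect (assumed counts, see above).
p_value = fisher_exact_two_sided(5, 4, 5, 1)
print(round(p_value, 3))
```

Working with integer weights rather than floating-point probabilities avoids rounding problems when deciding which tables count as "at least as extreme".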

      - Figure 4h: To conclude that the increase in activity is larger in the 3xTg mice, there should be a statistical comparison for the magnitude of change between the decision and the stem zone for control and 3xTg mice. To show that there is no significant difference in this measurement in the control mice is insufficient.

      Thank you for your suggestion. We performed a statistical comparison of the magnitude of change in activity between the stem zone and the D-zone for non-Tg and 3xTg mice, as recommended. Our analysis showed no significant difference in this magnitude of change between the two genotypes. These results have now been included in the Results section. However, we would like to highlight an important finding regarding the nature of these changes. In the 3xTg mice, there was a consistent increase in the activity of O/A INs when entering the D-zone. In contrast, non-Tg mice displayed a range of responses, including both increases and decreases in activity. This indicates a higher reliability in the firing of O/A INs in the D-zone of 3xTg mice. Our recent study suggests that VIP-INs are particularly active in the D-zone (Tamboli et al., 2024). Therefore, the absence or reduced input from VIP-INs in 3xTg mice may lead to the observed higher engagement of O/A INs in this zone. We believe this observation is crucial for understanding the differential yet nuanced changes in neural dynamics in these mice.

      - In the methods, it is stated that there was a pre-selection of animals depending on learning performance. Would it be possible to also show the data from animals that did not properly learn? Alternatively, it would be useful to plot the correlation between performance in this test and the difference between activity in the stem and the decision-making zone. The reason to ask for this is that there is a trend for control animals to show reduced alternations (50 vs 80%, although not significant, it is a big difference). Considering that there is also a trend in control animals to show increased activity in the decision-making zone, it would be important to confirm that this is not only due to differences in performance. The current statistical procedure does not allow discarding this.

      In this study, we excluded from the analysis the animals that refused to explore the T-maze and spent all their time in the stem corner, or refused to explore the objects and stayed in the open field maze (OFM) corner. These exclusions applied to both non-Tg (n = 6) and Tg (n = 5) groups, indicating that low exploratory activity is not necessarily linked to AD-related mutations. During the T-maze test, we also observed several animals that made incorrect choices (4 out of 9 non-Tg and 1 out of 6 Tg mice). However, due to the low number of animals making incorrect choices, we were unable to form a separate group for analysis based on incorrect choices. These details are now provided in the Methods section.

      - Figure 4i. It is not clear when exactly cell activity was measured. If it was during the entire recording time, I think it would be interesting to see if the activity of O/A interneurons is different specifically during interaction with the object in 3xTg mice.

      Cell activity was indeed measured throughout the entire recording session and analyzed in relation to animal behavior (immobility to walking; Fig. 4d,e), and periods specifically related to interaction with objects were extracted for analysis (Figure 4i).

      - Why was the object modulation measured during a different task in which both objects were the same? The figure is misleading in that sense, as it suggests the experiment was the same as for the other panels with two different objects. It would be important to correct this if the authors want to correlate the deficits in NOR in 3xTg mice and changes in IN activity.

      The study specifically investigated object-modulated neural activity during the Sampling phase. Therefore, two identical objects were placed in the arena for animal exploration. As mentioned above, due to several animals failing to explore the OFM and objects on the second day, they were excluded from the analysis, preventing the conduct of the novel-object exploration Test Trial. Both non-Tg and Tg mice showed a lack of exploration in the OFM and T-maze, for reasons that remain unclear. Consequently, we opted to present robust data on neural activity during the initial sampling of two identical objects. However, further investigation is needed to understand how this activity relates to deficits observed in the classical NOR test.

      - Figure 5c-f. I would strongly suggest performing the same quantification and displaying similar figures for the fiber photometry experiments in interneurons and principal cells. It would help to interpret the data.

      We have taken the reviewer's suggestion into account and standardized the data analysis and presentation. Figures 4d, e and 5c, d now depict the walk-induced activity in INs and PCs, respectively. Figures 4h and 5f compare activity between the stem and D-zone in the T-maze. Additionally, Figures 4j and 5h illustrate the object modulation of INs and PCs, respectively.

      - Although velocity and mobility were quantified, it would be important to show also that they are not different during those times when activity was dissimilar, as in the decision zone.

      We have analyzed these data and found no significant differences between the two genotypes in terms of velocity and mobility during these periods. This analysis is now presented in Supplementary Figure 5e, f and detailed in the Results section.

      - Figure 5g-h. Similarly, I would suggest using the same metrics in order to correlate the results from interneuron and principal cell activity photometry.

      We have updated this figure to align with the presentation of interneurons (Figure 4j) and included RMS analysis to emphasize lower variance in object modulation of PCs as an indicator of increased network inhibition.

      - Was object modulation variance also different for INs depending on the mouse phenotype?

      We conducted this additional analysis but did not find any significant difference.

      - Figure S4: would it be possible to identify the postsynaptic partners?

      As mentioned above, for those cells with preserved axonal structures, we identified both OLM and bistratified cells. We have now included this information in the Results section to clarify the types of interneurons identified.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      The authors present 16 new well-preserved specimens from the early Cambrian Chengjiang biota. These specimens potentially represent a new taxon which could be useful in sorting out the problematic topology of artiopodan arthropods - a topic of interest to specialists in Cambrian arthropods. Because the anatomic features in the new specimens were neither properly revealed nor correctly interpreted, the evidence for several conclusions is inadequate. 

      We thank the Senior Editor, Reviewing Editor and three reviewers for their work, and for their comments aimed at improving this project and manuscript. We have engaged with all the comments in detail, in order to strengthen our work. This includes adding additional data to support that all Acanthomeridion specimens belong to a single species, running further phylogenetic analyses including more trilobite terminals to test the specific hypothesis and interpretation raised by Reviewer 2, and visualising our results in treespace in order to determine support for the different interpretations of the ventral structures and their implications for the evolution of Artiopoda. We have also greatly expanded the introduction, which we feel adds clarity to areas misunderstood by some reviewers in the previous version of the manuscript.

      Our point-by-point responses to the public reviews are outlined below. We have also made changes resulting from the additional suggestions which are not public, which we have not reproduced below. We submit a new version of the main text, and can provide a tracked changes version if required. The new main text includes 9 figures and is 8624 words including captions and reference list.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Du et al. report 16 new well-preserved specimens of artiopodan arthropods from the Chengjiang biota, which demonstrate both dorsal and ventral anatomies of a potential new taxon of artiopodans that are closely related to trilobites. The authors assigned their specimens to Acanthomeridion serratum and proposed A. anacanthus as a junior subjective synonym of Acanthomeridion serratum. Critically, the presence of ventral plates (interpreted as cephalic librigenae), together with phylogenetic results, leads the authors to conclude that the cephalic sutures originated multiple times within the Artiopoda. 

      We thank Reviewer 1 for their comments on the strengths and weaknesses of the previous version of the manuscript. We hope that the revised version strengthens our conclusions that Acanthomeridion anacanthus is a junior synonym of A. serratum.

      Strengths: 

      New specimens are highly qualified and informative. The morphology of the dorsal exoskeleton, except for the supposed free cheek, was well illustrated and described in detail, which provides a wealth of information for taxonomic and phylogenic analyses. 

      Weaknesses: 

      The weaknesses of this work are obvious in a number of aspects. Technically, ventral morphology is less well revealed and is poorly illustrated. Additional diagrams are necessary to show the trunk appendages and suture lines. Taxonomically, I am not convinced by the authors' placement. The specimens are markedly different from either Acanthomeridion serratum Hou et al. 1989 or A. anacanthus Hou et al. 2017. The ontogenetic description is extremely weak and the morphological continuity is not established. Geometric and morphometric analyses might be helpful to resolve the taxonomic and ontogenetic uncertainties. 

      We appreciate that the reviewer was not convinced by our synonymisation in the first version of the manuscript. The recommendation of the reviewer to provide linear morphometric support for our synonymisation was much appreciated. We have provided measurements of the length and width of the thorax (Figure 6 in the new version), visualising the position of specimens previously assigned to A. anacanthus, to show this morphological continuity. These act as a complement to Figure 5, which shows the fossils in an ontogenetic trend.

      I am confused by the authors' description of the free cheek (librigena) and ventral plate. Are they the same object? How do they connect with other parts of the cephalic shield, e.g. hypostome, and fixigena? Critically, the homology of cephalic slits (eye slits, eye notch, dorsal suture, facial suture) is not extensively discussed either morphologically or functionally.

      We appreciate that the brevity of the introduction in the previous version led to some misunderstandings and some confusion. We have provided a greatly expanded introduction, including a new Figure 1, which outlines the possible homologies of the ventral plates and the three hypotheses considered in this study. The functions of the cephalic and dorsal sutures are now discussed in more detail in both the introduction and the discussion.

      Finally, the authors claimed that phylogenetic results support two separate origins rather than a deep origin. However, the results in Figure 4 can explain a deep homology of the cephalic suture at the molecular level and multiple co-options within the Artiopoda. 

      A deep molecular origin is difficult to demonstrate using solely fossil material from an extinct group such as Artiopoda. Thus our study focuses on morphological origins. The number of losses required for a deep morphological origin means that we favour multiple independent morphological origins.

      Reviewer #2 (Public Review): 

      Overall: This paper describes new material of Acanthomeridion serratum that the authors claim supports its synonymy with Acanthomeridion anacanthus. The material is important and the description is acceptable after some modification. In addition, the paper offers thoughts and some exploration of the possibility of multiple origins of the dorsal facial suture among artiopods, at least once within Trilobita and also among other non-trilobite artiopods. Although this possibility is real and apparently correct, the suggestions presented in this paper are both surprising and, in my opinion, unlikely to be true because the potential homologies proposed with regard to Acanthomeridion and trilobite free cheeks are unconventional and poorly supported. 

      What to do? I can see two possibilities. One, which I recommend, is to concentrate on improving the descriptive part of the paper and omit discussion and phylogenetic analysis of dorsal facial suture distribution, leaving that for more comprehensive consideration elsewhere. The other is to seek to improve both simultaneously. That may be possible but will require extensive effort. 

      We thank the reviewer for their detailed comments and suggestions for multiple ways in which we might revise the manuscript. We have taken the option that is more effort, but we hope more reward, in interrogating the larger question alongside improving the descriptive part of the paper. This has taken a long time and incorporation of new techniques, but has in our opinion greatly strengthened the work.

      Major concerns 

      Concern 1 - Ventral sclerites as free cheek homolog, marginal sutures, and the trilobite doublure 

      Firstly, a couple of observations that bear on the arguments presented - the eyes of A. serratum are almost marginal and it is not clear whether a) there is a circumocular suture in this animal and b) if there was, whether it merged with the marginal suture. These observations are important because this animal is not one in which an impressive dorsal facial suture has been demonstrated - with eyes that near marginal it simply cannot do so. Accordingly, the key argument of this paper is not quite what one would expect. That expectation would be that a non-trilobite artiopod, such as A. serratum, shows a clear dorsal facial suture. But that is not the case, at least with A. serratum, because of its marginal eyes. Rather, the argument made is that the ventral doublure of A. serratum is the homolog of the dorsal free cheeks of trilobites. This opens up a series of issues. 

      We appreciate that the reviewer disagrees with both interpretations we offered for the ventral plates, and has offered a third interpretation for the homology of this feature with the doublure of trilobites. Support for our original interpretation comes from the position of the eye stalks in Acanthomeridion, which fall very close to the suture between the ventral plate and the rest of the cephalon. However, we appreciate that the reviewer has a valid interpretation, that the ventral plates might be homologues of the doublure alone.

      To clarify the (two, now three) hypotheses of homology for the ventral plates considered in this study, we provide a new summary figure (Figure 1). In addition, the introduction has been greatly lengthened with further discussion of the different suture types in trilobites, their importance for trilobite classification schemes, and extensive references to older literature are now included. Further, we add background to the hypotheses around the origins of dorsal ecdysial sutures. 

      We add that the interpretation of A. serratum as having features homologous to the dorsal sutures of trilobites is already present in the literature, and so while the reviewer may disagree with it, it is certainly a hypothesis that requires testing.

      The paper's chief claim in this regard is that the "teardrop" shaped ventral, lateral cephalic plates in Acanthomeridion serratum are potential homologs of the "free cheeks" of those trilobites with a dorsal facial suture. There is no mention of the possibility that these ventral plates in A. serratum could be homologs of the lateral cephalic doublure of olenelloid trilobites, which is bound by an operative marginal suture or, in those trilobites with a dorsal facial suture, that it is a homolog of only the doublure portions of the free cheeks and not with their dorsal components. 

      We include this third possibility in our revised analyses and manuscript. To test this properly required adding an olenelloid trilobite to our matrix, as we needed a terminal that had both a marginal and a circumocular suture that were not fused. We chose Olenellus getzi for this purpose, as it is the only Olenellus with some appendages known (the antennae). We also added further characters to the morphological matrix, and additional trilobites from which soft tissues are known, in order to better resolve this part of the tree. Trilobites in the final analyses were: Anacheirurus adserai, Cryptolithus tesselatus, Eoredlichia intermedia, Olenoides serratus, Olenellus getzi, Triarthrus eatoni.

      However, addition of these trilobites added a further complication. Under unconstrained analysis, Olenellus getzi was resolved with Eoredlichia intermedia as a clade sister to all other trilobites.

      Thus the topology of Paterson et al. 2019 (PNAS) was not recovered, and so the hypothesis of Reviewer 2 could not be robustly tested. In order to achieve a topology comparable to Paterson et al., we ran a further three analyses, where we constrained a clade of all trilobites except for O. getzi. This recovered a topology where the earliest diverging trilobites had unfused sutures, and thus one suitable for considering the role of Acanthomeridion serratum ventral plates as homologues of the doublure of trilobites.

      Unfortunately, for these analyses (both constrained and unconstrained), Acanthomeridion was not resolved as sister to trilobites, but instead elsewhere in the tree (see Table 1 in main text, Fig. 9, and SFig 9). Thus our analyses do not find support for the reviewer’s hypothesis, as multiple origins of this feature are still required.

      It was still an excellent point that we should consider this hypothesis, and we have retained it, and discussion surrounding it, in our manuscript.

      The introduction to the paper does not inform the reader that all olenelloids had a marginal suture - a circumcephalic suture that was operative in their molting and that this is quite different from the situation in, say, "Cedaria" woosteri in which the only operative cephalic exoskeletal suture was circumocular. The conservative position would be that the olenelloid marginal suture is the homolog of the marginal suture in A. serratum: the ventral plates thus being homolog of the trilobite cephalic doublure, not only potential homolog to the entire or dorsal only part of the free cheeks of trilobites with a dorsal facial suture. As the authors of this paper decline to discuss the doublure of trilobites (there is a sole mention of the word in the MS, in a figure caption) and do not mention the olenelloid marginal suture, they give the reader no opportunity to assess support for this alternative. 

      At times the paper reads as if the authors are suggesting that olenelloids, which had a marginal cephalic suture broadly akin to that in Limulus, actually lacked a suture that permitted anterior egression during molting. The authors are right to stress the origin of the dorsal cephalic suture in more derived trilobites as a character seemingly of taxonomic significance but lines such as 56 and 67 may be taken by the non-specialist to imply that olenelloids lacked a forward egression-permitting suture. There is a notable difference between not knowing whether sutures existed (a condition apparently quite common among soft-bodied artiopods) and the well-known marginal suture of olenelloids, but as the MS currently reads most readers will not understand this because it remains unexplained in the MS. 

      As noted in response to a previous point (above) we now have a greatly expanded introduction which should give the reader an opportunity to assess support for this alternative hypothesis. We now include Olenellus getzi in our analyses, and have added characters to the morphological matrix to make this clear.

      A reference to the case of ‘Cedaria’ woosteri is made in the introduction to highlight further the variability of trilobites, as is a reference to Foote’s analysis of cranidial shapes and the support this provides for a single origin of the dorsal suture.

      With that in mind, it is also worth further stressing that the primary function of the dorsal sutures in those which have them is essentially similar to the olenelloid/limulid marginal suture mentioned above. It is notable that the course of this suture migrated dorsally up from the margin onto the dorsal shield and merged with the circumocular suture, but this innovation does not seem to have had an impact on its primary function - to permit molting by forward egression. Other trilobites completely surrendered the ability to molt by forward egression, and there are even examples of this occurring ontogenetically within species, suggesting a significant intraspecific shift in suture functionality and molting pattern. The authors mention some of this when questioning the unique origin of the dorsal facial suture of trilobites, although I don't understand their argument: why should the history of subsequent evolutionary modification of a character bear on whether its origin was unique in the group? 

      We include reference to evolutionary modification and loss of this character as it is important to stress that if a character is known to have been lost multiple times it is possible that it had a deeper root (in an earlier diverging member of Artiopoda than Trilobita) and was lost in olenelloids. This is the question that we seek to address in our manuscript.

      The bottom line here is that for the ventral plates of A. serratum to be strict homologs of only the dorsal portion of the dorsal free cheeks, there would be no homolog of the trilobite doublure in A. serratum. The conventional view, in contrast, would be that the ventral plates are a homolog of the ventral doublure in all trilobites and ventral plates in artiopods. I do not think that this paper provides a convincing basis for preferring their interpretation, nor do I feel that it does an adequate job of explaining issues that are central to the subject. 

      We stress that our interpretations – that the ventral plates are not homologous to any artiopodan feature or that they are homologous to the free cheeks of trilobites – have both been raised in the literature before. In contrast, we could not find mention of the reviewer’s ‘conventional view’ in relation to Acanthomeridion. We appreciate that this view is still valid and worth investigating, which we have done in the further analyses conducted. However, we did not find support for it. Instead we find some support for both ventral plates as homologues of free cheeks, and as unique structures within Artiopoda.

      Concern 2. Varieties of dorsal sutures and the coexistence of dorsal and marginal sutures 

      The authors do not clarify or discuss connections between the circumocular sutures (a form of dorsal suture that separates the visual surface from the rest of the dorsal shield) and the marginal suture that facilitates forward egression upon molting. Both structures can exist independently in the same animal - in olenelloids for example. Olenelloids had both a suture that facilitated forward egression in molting (their marginal suture) and a dorsal suture (their circumocular suture). The condition in trilobites with a dorsal facial suture is that these two independent sutures merged - the formerly marginal suture migrating up the dorsal pleural surface to become confluent with the circumocular suture. (There are also interesting examples of the expansion of the circumocular suture across the pleural fixigena.) The form of the dorsal facial suture has long figured in attempts at higher-level trilobite taxonomy, with a number of character states that commonly relate to the proximity of the eye to the margin of the cephalic shield. The form of the dorsal facial suture that they illustrate in Xanderella, which is barely a strip crossing the dorsal pleural surface linking marginal and circumocular suture, is comparable to that in the trilobites Loganopeltoides and Entomapsis but that is a rare condition in that clade as a whole. The paper would benefit from a clear discussion of these issues at the beginning - the dorsal facial suture that they are referring to is a merged circumcephalic suture and circumocular suture - it is not simply the presence of a molt-related suture on the dorsal side of the cephalon. 

      We have added in an expanded introduction where these points are covered in detail. We appreciate that this was not clear in the earlier version, and this suggestion has greatly improved our work.

      Concern 3. Phylogenetics 

      While I appreciate that the phylogenetic database is a little modified from those of other recent authors, still I was surprised not to find a character matrix in the supplementary information (unless it was included in some way I overlooked), which I would consider a basic requirement of any paper presenting phylogenetic trees - after all, there's no a space limit. It is not possible for a reviewer to understand the details of their arguments without seeing the character states and the matrix of state assignments. 

      A link to a morphobank project was included in the first submission. This project has been updated for the current submission, including an additional matrix to treat the reviewer’s hypothesis for the ventral plates. Morphobank Project #P4290. Email address: P4290, reviewer password: Acanthomeridion2023, accessible at morphobank.org. We have added in additional details for the reviewer and others to help them access the project:

      The project can be accessed at morphobank.org, using the below credentials to log in:  Email address: P4290, Password: Acanthomeridion 2023.

      The section "phylogenetic analyses" provides a description of how tree topology changes depending on whether sutures are considered homologous or not using the now standard application of both parsimony and maximum likelihood approaches but, considering that the broader implications of this paper rest on the phylogenetic interpretation, I also found the absence of detailed discussion of the meaning and implications of these trees to be surprising, because I anticipated that this was the main reason for conducting these analyses. The trees are presented and briefly described but not considered in detail. I am troubled by "Circles indicate presence of cephalic ecdysial sutures" because it seems that in "independent origin of sutures" trilobites are considered to have two origins (brown color dot) of cephalic ecdysial sutures - this may be further evidence that the team does not appreciate that olenelloids have cephalic ecdysial sutures, as the basal condition in all trilobites. Perhaps I'm misunderstanding their views, but from what's presented it's not possible to know that. Similarly, in the "sutures homologous" analyses why would there be two independent green dots for both Acanthomeridion and Trilobita, rather than at the base of the clade containing them both, as cephalic ecdysial sutures are basal to both of them? Here again, we appear to see evidence that the team considers dorsal facial sutures and cephalic ecdysial sutures to be synonymous - which is incorrect.

      We appreciate that the reviewer misunderstood the meaning of the dots, leading to confusion. The dots indicated how features were coded in the phylogenetic analysis. In our revised version of this figure (Figure 8 in the new version), these dots are now clearly labelled as indicating ‘coding in phylogenetic matrix’. Further, with the revised character list, we now can provide additional detail for the types of sutures (relevant as we now include more trilobite terminals).

      This point aside, and at a minimum, the team needs to do a more thorough job of characterizing and considering the variety of conditions of dorsal sutures among artiopods, their relationships to the marginal suture and to the circumocular suture, the number and form of their branches, etc. 

      We thank the reviewer for this summary, and appreciate their concerns and thorough review. Our revised version takes into account all these points raised, and they have greatly improved the clarity, scope and thoroughness of the work.

      Reviewer #3 (Public Review): 

      Summary:

      Well-illustrated new material is documented for Acanthomeridion, a formerly incompletely known Cambrian arthropod. The formerly known facial sutures are shown to be associated with ventral plates that the authors very reasonably homologise with the free cheeks of trilobites. A slight update of a phylogenetic dataset developed by Du et al, then refined slightly by Chen et al, then by Schmidt et al, and again here, permits another attempt to optimise the number of origins of dorsal ecdysial sutures in trilobites and their relatives. 

      Strengths:

      Documentation of an ontogenetic series makes a sound case that the proposed diagnostic characters of a second species of Acanthomeridion are variations within a single species. New microtomographic data shed some light on appendage morphology that was not formerly known. The new data on ventral plates and their association with the ecdysial sutures are valuable in underpinning homologies with trilobites. 

      We thank Reviewer 3 for their positive comments about the manuscript. We appreciate the constructive comments for improvements, and detailed corrections, which we have incorporated into our revised work.

      Weaknesses:

      The main conclusion remains clouded in ambiguity because of a poorly resolved Bayesian consensus and is consistent with work led by the lead author in 2019 (thus compromising the novelty of the findings). The Bayesian trees being majority rules consensus trees, optimising characters onto them (Figure 7b, d) is problematic. Optimising on a consensus tree can produce spurious optimisations that inflate tree length or distort other metrics of fit. Line 264 refers to at least three independent origins of cephalic sutures in artiopodans but the fully resolved Figure 7c requires only two origins. 

      We thank the reviewer for pointing this out. However, now that the analyses have been re-run, we have new results to consider. The results still support multiple origins of sutures. We also note that the dots indicated how terminals were coded. This is now clearer in the revised version of this figure (Figure 8 in the new version).

      We have extended our interrogation of the trees by incorporating treespace analyses. These add support for the nodes of interest (around the base of trilobites), showing that the coding of Acanthomeridion ventral plate homologies impacts its position in the tree, and thus has implications for our understanding of the evolution of sutures in trilobites.

      The question of how many times dorsal ecdysial sutures evolved in Artiopoda was addressed by Hou et al (2017), who first documented the facial sutures of Acanthomeridion and optimised them onto a phylogeny to infer multiple origins, as well as in a paper led by the lead author in Cladistics in 2019. Du et al. (2019) presented a phylogeny based on an earlier version of the current dataset wherein they discussed how many times sutures evolved or were lost based on their presence in Zhiwenia/Protosutura, Acanthomeridion, and Trilobita. To their credit, the authors acknowledge this (lines 62-65). The answer here is slightly different (because some topologies unite Acanthomeridion and trilobites). 

      The following points are not meant to be "Weaknesses" but rather are refinements: 

      I recommend changing the title of the paper from "cephalic sutures" to "dorsal ecdysial sutures" to be more precise about the character that is being tracked evolutionarily. Lots of arthropods have cephalic sutures (e.g., the ventral marginal suture of xiphosurans; the Y-shaped dorsomedian ecdysial line in insects). The text might also be updated to change other instances of "cephalic sutures" to a more precise wording. 

      We appreciate this point and have changed the title as suggested. 

      The authors have provided (but not explicitly identified) support values for nodes in their Bayesian trees but not in their parsimony ones. Please do the jackknife or bootstrap for the parsimony analyses and make it clear that the Bayesian values are posterior probabilities. 

      With the addition of further trilobite terminals to our parsimony analyses, the results became poor. Specifically, the internal relationships of trilobites did not conform to any previous study, and Olenellus getzi was not resolved as an early diverging member of the group. This meant that these analyses could not be used for addressing the hypothesis of Reviewer 2. We decided to exclude reporting parsimony analysis results from this version to avoid confusion.

      We have added a note that the values reported at the nodes are posterior probabilities to figures S8, S9 and S10 where we show the full Bayesian results.

      In line 65 or somewhere else, it might be noted that a single origin of the dorsal facial sutures in trilobites has itself been called into question. Jell (2003) proposed that separate lineages of Eutrilobita evolved their facial sutures independently from separate sister groups within Olenellina. 

      We have added this to the introduction (Line 98). Thank you for raising this point.

      I have provided minor typographic or terminological corrections to the authors in a list of recommendations that may not be publicly available. 

      We appreciate the points made by the reviewer and their detailed corrections, which we have incorporated in the revised version.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper the authors provide a characterisation of auditory responses (tones, noise, and amplitude modulated sounds) and bimodal (somatosensory-auditory) responses and interactions in the higher order lateral cortex (LC) of the inferior colliculus (IC) and compare these characteristics with the higher order dorsal cortex (DC) of the IC - in awake and anaesthetised mice. Dan Llano's group have previously identified GABAergic patches (modules) in the LC distinctly receiving inputs from somatosensory structures, surrounded by matrix regions receiving inputs from auditory cortex. They here use 2P calcium imaging combined with an implanted prism to - for the first time - get functional optical access to these subregions (modules and matrix) in the lateral cortex of IC in vivo, in order to also characterise the functional difference in these subparts of LC. They find that both DC and LC of both awake and anaesthetised mice appear to be more responsive to more complex sounds (amplitude modulated noise) compared to pure tones and that under anaesthesia the matrix of LC is more modulated by specific frequency and temporal content compared to the GABAergic modules in LC. However, while both LC and DC appear to have low frequency preferences, this preference for low frequencies is more pronounced in DC. Furthermore, in both awake and anaesthetised mice somatosensory inputs are capable of driving responses on their own in the modules of LC, but very little in the matrix. The authors now compare bimodal interactions under anaesthesia and awake states and find that effects are different in some cases between awake and anaesthetised states - particularly related to bimodal suppression and enhancement in the modules.

      The paper provides new information about how subregions with different inputs and neurochemical profiles in the higher order auditory midbrain process auditory and multisensory information, and is useful for the auditory and multisensory circuits neuroscience community.

      The manuscript is improved by the response to reviewers. The authors have addressed my comments by adding new figures and panels, streamlining the analysis between awake and anaesthetised data (which has led to a more nuanced, and better supported conclusion), and adding more examples to better understand the underlying data. In streamlining the analyses between anaesthetised and awake data I would probably have opted for bringing these results into merged figures to avoid repetitiveness and aid comparison, but I acknowledge that that may be a matter of style. The added discussions of differences between awake and anaesthesia in the findings and the discussion of possible reasons why these differences are present help broaden the understanding of what the data looks like and how anaesthesia can affect these circuits.

      As mentioned in my previous review, the strength of this study is in its demonstration of using prism 2p imaging to image the lateral shell of IC to gain access to its neurochemically defined subdivisions, and they use this method to provide a basic description of the auditory and multisensory properties of lateral cortex IC subdivisions (and compare it to dorsal cortex of IC). The added analysis, information and figures provide a more convincing foundation for the descriptions and conclusions stated in the paper. The description of the basic functionality of the lateral cortex of the IC is useful for researchers interested in basic multisensory interactions and auditory processing and circuits. The paper provides a technical foundation for future studies (as the authors also mention), exploring how these neurochemically defined subdivisions receiving distinct descending projections from cortex contribute to auditory and multisensory based behaviour.

      Minor comment:

      - The authors have now added statistics and figures to support their claims about tonotopy in DC and LC. This is what I asked for, and I think it allows readers to better understand the tonotopical organisation in these areas. One of the conclusions by the authors is that the quadratic fit is a better fit than a linear fit in DCIC. Given the new plots shown and previous studies this is likely true, though it is worth highlighting that adding parameters to a fitting procedure (as in the case when moving from linear to quadratic fit) will likely lead to a better fit due to the increased flexibility of the fitting procedure.

      Thank you for the suggestion. We have highlighted that the quadratic function allowed the regression model to include the cells tuned to higher frequencies at the rostromedial part of the DC, resulting in a better fit, which is consistent with the previously described tonotopic organization, as shown in the text (lines 208-211).
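
      The reviewer's point about added parameters can be illustrated with a minimal sketch. It is not the authors' analysis: the data are synthetic, the function name `fit_aic` is hypothetical, and the Akaike information criterion (AIC) is used here only as one common way to penalize extra coefficients when comparing a linear and a quadratic fit.

```python
import numpy as np

def fit_aic(x, y, degree):
    """Least-squares polynomial fit; returns (RSS, AIC) assuming Gaussian errors."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals ** 2))
    n = len(x)
    k = degree + 1  # number of fitted polynomial coefficients
    aic = n * np.log(rss / n) + 2 * k  # lower is better; 2*k penalizes flexibility
    return rss, aic

# Synthetic example (an assumption, not data from the paper):
# a U-shaped relationship between position and a tuning measure, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = 4.0 * (x - 0.5) ** 2 + rng.normal(0.0, 0.05, x.size)

rss_lin, aic_lin = fit_aic(x, y, 1)
rss_quad, aic_quad = fit_aic(x, y, 2)

# The quadratic RSS can never exceed the linear RSS (nested models),
# so a penalized criterion such as AIC is the more informative comparison.
print(f"linear:    RSS={rss_lin:.3f}  AIC={aic_lin:.1f}")
print(f"quadratic: RSS={rss_quad:.3f}  AIC={aic_quad:.1f}")
```

      Because the residual sum of squares always decreases or stays equal when a term is added, a lower penalized score for the quadratic model is stronger evidence for curvature than the raw improvement in fit.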

      Reviewer #2 (Public Review):

      Summary:

      The study describes differences in responses to sounds and whisker deflections as well as combinations of these stimuli in different neurochemically defined subsections of the lateral and dorsal cortex of the inferior colliculus in anesthetised and awake mice.

      Strengths:

      A major achievement of the work lies in obtaining the data in the first place as this required establishing and refining a challenging surgical procedure to insert a prism that enabled the authors to visualise the lateral surface of the inferior colliculus. Using this approach, the authors were then able to provide the first functional comparison of neural responses inside and outside of the GABA-rich modules of the lateral cortex. The strongest and most interesting aspects of the results, in my opinion, concern the interactions of auditory and somatosensory stimulation. For instance, the authors find that a) somatosensory-responses are strongest inside the modules and b) somatosensory-auditory suppression is stronger in the matrix than in the modules. This suggests that, while somatosensory inputs preferentially target the GABA-rich modules, they do not exclusively target GABAergic neurons within the modules (given that the authors record exclusively from excitatory neurons we wouldn't expect to see somatosensory responses if they targeted exclusively GABAergic neurons) and that the GABAergic neurons of the modules (consistent with previous work) preferentially impact neurons outside the modules, i.e. via long-range connections.

      Weaknesses:

      While the findings are of interest to the subfield they have only rather limited implications beyond it and the writing is not quite as precise as it could be.

      Reviewer #3 (Public Review):

      The lateral cortex of the inferior colliculus (LC) is a region of the auditory midbrain noted for receiving both auditory and somatosensory input. Anatomical studies have established that somatosensory input primarily impinges on "modular" regions of the LC, which are characterized by high densities of GABAergic neurons, while auditory input is more prominent in the "matrix" regions that surround the modules. However, how auditory and somatosensory stimuli shape activity, both individually and when combined, in the modular and matrix regions of the LC has remained unknown.

      The major obstacle to progress has been the location of the LC on the lateral edge of the inferior colliculus where it cannot be accessed in vivo using conventional imaging approaches. The authors overcame this obstacle by developing methods to implant a microprism adjacent to the LC. By redirecting light from the lateral surface of the LC to the dorsal surface of the microprism, the microprism enabled two-photon imaging of the LC via a dorsal approach in anesthetized and awake mice. Then, by crossing GAD-67-GFP mice with Thy1-jRGECO1a mice, the authors showed that they could identify LC modules in vivo using GFP fluorescence while assessing neural responses to auditory, somatosensory, and multimodal stimuli using Ca2+ imaging. Critically, the authors also validated the accuracy of the microprism technique by directly comparing results obtained with a microprism to data collected using conventional imaging of the dorsal-most LC modules, which are directly visible on the dorsal IC surface, finding good correlations between the approaches.

      Through this innovative combination of techniques, the authors found that matrix neurons were more sensitive to auditory stimuli than modular neurons, modular neurons were more sensitive to somatosensory stimuli than matrix neurons, and bimodal, auditory-somatosensory stimuli were more likely to suppress activity in matrix neurons and enhance activity in modular neurons. Interestingly, despite their higher sensitivity to somatosensory stimuli than matrix neurons, modular neurons in the anesthetized prep were overall more responsive to auditory stimuli than somatosensory stimuli (albeit with a tendency to have offset responses to sounds). This suggests that modular neurons should not be thought of as primarily representing somatosensory input, but rather as being more prone to having their auditory responses modified by somatosensory input. However, this trend was different in the awake prep, where modular neurons became more responsive to somatosensory stimuli. Thus, to this reviewer, one of the most intriguing results of the present study is the extent to which neural responses in the LC changed in the awake preparation. While this is not entirely unexpected, the magnitude and stimulus specificity of the changes caused by anesthesia highlight the extent to which higher-level sensory processing is affected by anesthesia and strongly suggests that future studies of LC function should be conducted in awake animals.

      Together, the results of this study expand our understanding of the functional roles of matrix and module neurons by showing that responses in LC subregions are more complicated than might have been expected based on anatomy alone. The development of the microprism technique for imaging the LC will be a boon to the field, finally enabling much-needed studies of LC function in vivo. The experiments were well-designed and well-controlled, the limitations of two-photon imaging for tracking neural activity are acknowledged, and appropriate statistical tests were used.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Increase font size of scale bars on figure 6.

      Thank you for the suggestion. We have increased the font size of the scale bar.

      Reviewer #2 (Recommendations For The Authors):

      Line 505: typo: 'didtinction'

      Thank you for the suggestion and we do apologize for the typo. We have fixed the word as shown in the text (line 506).

      No further comments.

      Reviewer #3 (Recommendations For The Authors):

      Line 543: Change "contripute" to "contribute"

      Thank you for the suggestion and we do apologize for the typo. We have fixed the word as shown in the text (line 544).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) In the first paragraph of the result section it is not clear why the authors introduce the function of p53ΔAS/ΔAS in thymocyte and then they mention fibroblasts. The authors should clarify this point. The authors should also explain based on what rationale they use doxorubicin and nutlin to analyze p53 activity (Figure 1 and figure S1). 

      We thank the reviewer for this comment. In the revised manuscript, we corrected this by mentioning, at the beginning of the Results section: “We analyzed cellular stress responses in thymocytes, known to undergo a p53-dependent apoptosis upon irradiation (Lowe et al., 1993), and in primary fibroblasts, known to undergo a p53-dependent cell cycle arrest in response to various stresses - e.g. DNA damage caused by irradiation or doxorubicin (Kastan et al., 1992), and the Nutlin-mediated inhibition of Mdm2, a negative regulator of p53 (Vassilev et al., 2004).”

      (2) The authors should provide quantification for the western blot in figure 2D because the reduction of p53 protein level in mutant vs wt tumors is not striking. 

      In the previous version of the manuscript, the quantification of p53 bands had been included, but quantification results were mentioned below the actin bands, rather than the p53 bands, and this was probably confusing. We have corrected this in the revised version of the manuscript. The quantification results are now provided just below the p53 bands in Figs. 1B and 2D, which should clarify this point. For Figure 2D, the quantifications show a strong decrease in p53 levels for 3 out of 4 analyzed mutant tumors. For consistency purposes, in the revised manuscript the quantification results also appear below Myc bands in Fig. 2C.

      (3) In the discussion section, the authors propose that a difference in Ackr4 expression may have prognostic value and that measuring ACKR4 gene expression in male patients with Burkitt lymphoma could be useful to identify the patients at higher risk. However, the authors perform a lot of correlative analysis, both in mice and in patients, but the manuscript lacks functional experiments that could help to functionally characterize Ackr4 and Mt2 in the etiology of B-cell lymphomas in males (both in mouse and in human models).

      In the previous version of the manuscript, we proposed that Ackr4 might act as a suppressor of B-cell lymphomagenesis by attenuating Myc signaling. This hypothesis relied on studies showing that Ackr4 impairs the Ccr7 signaling cascade, which may lead to decreased Myc activity (Ulvmar et al., 2014; Shi et al., 2015; Bastow et al., 2021) and that the loss of Ccr7 may delay Myc-driven lymphomagenesis (Rehm et al., 2011). Furthermore, we proposed that the increased expression of Mt2 in p53ΔAS/ΔAS Eμ-Myc male splenic cells reflected an increase in Myc activity, because Mt2 is known to be regulated by Myc (Qin et al., 2021) and because the Mt2 promoter is bound by Myc in B cells according to experiments reported in the ChIP-Atlas database. However, in the first version of the manuscript this hypothesis might have appeared only partially supported by our data because an increase in Myc activity could be expected to have a more general impact, i.e. an impact not only on the expression of Mt2, but also on the expression of many canonical Myc target genes. In the revised manuscript, we show that this is indeed the case. We performed a gene set enrichment analysis (GSEA) comparing the RNAseq data from p53ΔAS/ΔAS Eμ-Myc and p53+/+ Eμ-Myc male splenic cells and found an enrichment of hallmark Myc targets in p53ΔAS/ΔAS Eμ-Myc cells. These new data, which strengthen our hypothesis of differences in Myc signaling intensity, are presented in Fig. 3K and Table S2.

      Importantly, we now go beyond correlative analyses by providing direct experimental evidence that ACKR4 impacts on the behavior of Burkitt lymphoma cells. We used a CRISPR-Cas9 approach to knock-out ACKR4 in Raji Burkitt lymphoma cells and found that ACKR4 KO cells exhibited a 4-fold increase in chemokine-guided cell migration. These new data are presented in Figure 4F and the supplemental Figures S5-S7.  

      Finally, following a suggestion of Reviewer#2, we now also point out that “Ackr4 regulates B cell differentiation (Kara et al., 2018), which raises the possibility that an altered p53-Ackr4 pathway in p53ΔAS/ΔAS Eμ-Myc male splenic cells might contribute to increase the pools of pre-B and immature B cells that may be prone to lymphomagenesis.”

      In sum, we now mention in the Discussion that a decrease in Ackr4 expression might promote B-cell lymphomagenesis through three non-exclusive mechanisms.

      Reviewer #2 (Recommendations For The Authors): 

      (1) A great addition would be to demonstrate how p53AS specifically contributes to the regulation of Ackr4. In particular, is there evidence that p53AS might be preferentially recruited on p53 RE within that gene as compared to WT? The availability of specific antibodies that distinguish between AS and WT p53 might help to address this (experimentally complex) question. As a note, usage of such antibodies would also strengthen Fig 1B, in which the AS isoform appears as a mere faint shadow under p53, thus making its "disappearance" in trp53ΔAS/ΔAS difficult to evaluate. 

      We agree with the referee that efficient antibodies against p53-AS isoforms would have been useful. In fact, we tried a non-commercial antibody developed for that purpose, but it led to many unspecific bands in western blots and appeared not reliable. Importantly however, our luciferase assays clearly show that both p53-a and p53-AS can transactivate Ackr4, a result that might be expected because these isoforms share the same DNA binding domain. Furthermore, because p53-a isoforms appear more abundant than p53-AS isoforms at the protein and RNA levels (Figs. 1B and S1A), and because the loss of p53-AS isoforms leads to a significant decrease in p53-a protein levels (Figs. 1B and 2D), we think that in p53ΔAS/ΔAS cells the reduction in p53-a levels might be the main reason for a decreased transactivation of Ackr4. This is now more clearly discussed in the revised manuscript.

      (2) A most interesting observation is in Fig 3A and Fig S3, showing that spleen cells of p53ΔAS Eμ-Myc males (but not females) were enriched in pre-B and immature B cells as compared to WT counterparts. This observation points to a possible defect in the B cell maturation process. It would be most interesting to determine whether this particular defect is directly mediated by a p53AS-Ackr4 axis. The hypothesis raised by the authors in the Discussion section is that increased Ackr4 expression may delay lymphomagenesis, but data in Fig 3A and S3 actually suggest that ΔAS increases the pool of immature B cells that may be prone to lymphomagenesis.

      We thank the reviewer for this useful comment, which we integrated in the Discussion of the revised manuscript. Ackr4 was shown to regulate B cell differentiation (Kara et al. (2018) J Exp Med 215, 801–813), so this is indeed one of the possible mechanisms by which a deregulation of the p53-Ackr4 axis might promote lymphomagenesis. We now mention: “Ackr4 regulates B cell differentiation (Kara et al., 2018), which raises the possibility that an altered p53-Ackr4 pathway in p53ΔAS/ΔAS Eμ-Myc male splenic cells might contribute to increase the pools of pre-B and immature B cells that may be prone to lymphomagenesis.” This is presented as one of three possible mechanisms by which decreased Ackr4 levels may promote tumorigenesis, the two others being the impact of Ackr4 on the chemokine-guided migration of lymphoma cells and its apparent effect on Myc signalling.

      (3) The concordance with a male-specific prognostic effect of Ackr4 is most interesting in itself but is only of correlative evidence with respect to the study. Is there any information on whether p53AS expression is also a prognostic factor in BL? And is there evidence that Ackr4 may also be a male-specific prognostic factor in other B-cell malignancies, e.g. Multiple Myeloma?

      We have now performed the CRISPR-mediated knock-out of ACKR4 in Burkitt lymphoma cells and found that it leads to a dramatic increase in chemokine-guided cell migration, which goes beyond correlation. This significant new result is mentioned in the revised abstract and presented in detail in Figures 4F and S5-S7.

      Regarding p53-AS isoforms, they are murine-specific isoforms (Marcel et al. (2011) Cell Death Diff 18, 1815-1824), so there is no information on p53-AS expression in Burkitt lymphoma. Human p53 isoforms with alternative C-terminal domains are p53b and p53g isoforms, but the datasets we analyzed did not provide any information on the relative levels of p53a (the canonical isoform), p53b or p53g isoforms. We agree with the referee that this is an interesting question, but that cannot be answered with currently available datasets.

      Regarding the different types of B-cell malignancies, we had already shown that Ackr4 is a male-specific prognostic factor in Burkitt lymphomas but not in Diffuse Large B cell lymphomas, which indicated that it is not a prognostic factor in all types of B cell lymphomas. For this revision, we also searched for its potential prognostic value in multiple myeloma, and found that, as for DLBCL, it is not a prognostic factor in this cancer type. This new analysis is presented in Figure S4C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: This article explores the role of Ecdysone in regulating female sexual receptivity in Drosophila. The researchers found that PTTH, through its role as a positive regulator of ecdysone production, negatively affects the receptivity of adult virgin females. Indeed, loss of larval PTTH before metamorphosis significantly increases female receptivity right after adult eclosion and also later. However, during metamorphic neurodevelopment, Ecdysone, primarily through its receptor EcR-A, is required to properly develop the P1 neurons since its silencing led to morphological changes associated with a reduction in adult female receptivity. Nonetheless, the results shown in this manuscript shed light on how Ecdysone plays a dual role in female adult receptivity, inhibiting it during larval development and enhancing it during metamorphic development. Unfortunately, this dual and opposite effect in two temporally different developmental stages has not been highlighted or explained.

      Strengths: This paper exhibits multiple strengths in its approach, employing a well-structured experimental methodology that combines genetic manipulations, behavioral assays, and molecular analysis to explore the impact of Ecdysone on regulating virgin female receptivity in Drosophila. The study provides clear and substantial findings, highlighting that removing PTTH, a positive Ecdysone regulator, increases virgin female receptivity. Additionally, the research expands into the temporal necessity of PTTH and Ecdysone function during development. 

      Weaknesses: 

      There are two important caveats with the data that are reflecting a weakness: 

      (1) Contradictory Effects of Ecdysone and PTTH: One notable weakness in the data is the contrasting effects observed between Ecdysone and its positive regulator PTTH. PTTH loss of function increases female receptivity, while ecdysone loss of function reduces it. Given that PTTH positively regulates Ecdysone, one would expect that the loss of function of both would result in a similar phenotype or at least a consistent directional change. 

      A1. As newly formed prepupae, ptth-Gal4>UAS-Grim flies display changes in gene expression similar to those of genetic control flies in response to a high-titer ecdysone pulse. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A. We quantified the EcR-A mRNA level of PTTH -/- and PTTH -/+ flies in the whole body of newly formed prepupae. Indeed, PTTH -/- flies showed increased EcR-A expression in the whole body of newly formed prepupae compared with PTTH -/+ flies. Given the function of EcR-A in gene expression, this suggests that PTTH -/- disturbs the regulation of a series of genes during metamorphosis. However, it is not certain whether EcR-A expression in pC1 neurons is increased compared with genetic controls when PTTH is deleted. Furthermore, PTTH -/- must affect the development of other neurons, not only pC1 neurons. The feedforward relationship between PTTH and EcR-A at the start of the prepupal stage is therefore one possible cause of the contradictory effects of PTTH -/- and EcR-A RNAi in pC1 neurons.

      (2) Discordant Temporal Requirements for Ecdysone and PTTH: Another weakness lies in the different temporal requirements for Ecdysone and PTTH. The data from the manuscript suggest that PTTH is necessary during the larval stage, as shown in Figure 2 E-G, while Ecdysone is required during the pupal stage, as indicated in Figure 5 I-K. Ecdysone is a crucial developmental hormone with precisely regulated expression throughout development, exhibiting several peaks during both larval and pupal stages. PTTH is known to regulate Ecdysone during the larval stage, specifically by stimulating the kinetics of Ecdysone peaking at the wandering stage. However, it remains unclear whether pupal PTTH, expressed at higher levels during metamorphosis, can stimulate Ecdysone production during the pupal stage. Additionally, given the transient nature of the Ecdysone peak produced at wandering time, which disappears shortly before the end of the prepupal stage, it is challenging to infer that larval PTTH will regulate Ecdysone production during the pupal stage based on the current state of knowledge in the neuroendocrine field.  

      Considering these two caveats, the results suggest that the authors are witnessing distinct temporal and directional effects of Ecdysone on virgin female receptivity.  

      A2. First of all, it is necessary to clarify the exact timing of the manipulation of the Ptth gene and PTTH neurons. In Figure 3, activation of PTTH neurons during stage 2 inhibited female receptivity. “Stage 2” runs from six hours before the 3rd-instar larval stage to the end of the wandering larval stage (the start of the prepupal stage). In Figure 5, the “pupal stage” runs from the prepupal stage to the end of the pupal stage. This “pupal stage” includes the formation of prepupae, when the ecdysone peak has not yet disappeared. The periods during which Ptth and EcR-A in pC1 neurons were manipulated are therefore contiguous. In addition, the pC1-Gal4-expressing neurons also appear at the start of the prepupal stage. It is therefore possible that PTTH regulates female receptivity through the function of EcR-A in pC1 neurons.

      Reviewer #1 (Recommendations For The Authors): 

      In light of the significant caveat previously discussed, I will just make a few general suggestions: 

      (1) The paper primarily focuses on robust phenotypes, particularly in PTTH mutants, with a well-detailed execution of several experiments, resulting in thorough and robust outcomes. However, due to the caveat previously presented (opposite effect in larva and pupa), consider splitting the paper into two parts: Figures 1 to 4 deal with the negative effect of PTTH-Ecdysone on early virgin female receptivity, while Figures 5 to 7 focus on the positive metamorphic effect of Ecdysone in P1 metamorphic neurodevelopment. However, in this scenario, the mechanism by which PTTH loss of function increases female receptivity should be addressed.

      A3. Splitting the paper into two parts, dealing separately with PTTH function and with EcR function in pC1 neurons, would be a good suggestion if PTTH could not act on female receptivity through EcR-A in pC1 neurons. However, because of the feedforward relationship between PTTH and EcR-A in newly formed prepupae, and because the periods during which Ptth and EcR-A in pC1 neurons were manipulated are contiguous, it is possible that these two functions are not independent of each other. We have therefore kept the original structure.

      (2) Validate the PTTH mutants by examining homozygous mutant phenotypes and the dose-dependent heterozygous mutant phenotype using existing PTTH mutants. This could also be achieved using RNAi techniques.

      A4. We were unable to obtain other existing PTTH mutants. Instead, we decreased PTTH expression in PTTH neurons and in dsx+ neurons, but did not detect a phenotype similar to that of PTTH -/-. Similarly, overexpression through PTTH-Gal4>UAS-PTTH was not sufficient to change female receptivity. It is possible that neither decreasing nor increasing PTTH expression is sufficient to change female receptivity.

      (3) Clarify if elav-Gal4 is not expressed in PTTH neurons and discuss how the rescue mechanisms work (hormonal, paracrine, etc.) in the text.

      A5. We tested for overlap between the elav-Gal4>GFP signal and PTTH stained with a PTTH antibody, and did not detect any. This suggests that elav-Gal4 is not expressed in PTTH neurons. However, we detected PTTH expression (PTTH antibody) in the CNS when PTTH was overexpressed using elav-Gal4>UAS-PTTH in the PTTH -/- background. Furthermore, this rescued the female receptivity phenotype of PTTH -/-. Insect PTTH isoforms share a similar probable signal peptide for secretion. Indeed, in addition to acting through axonal projections to the PG, PTTH also has an endocrine function, acting on its receptor Torso in light sensors to regulate light avoidance in larvae. PTTH overexpressed in other neurons through elav-Gal4>UAS-PTTH may act on the PG through this endocrine route and then induce ecdysone synthesis and release. Thus, although elav-Gal4 is not expressed in PTTH neurons, ecdysone synthesis triggered by PTTH from the hemolymph may account for the rescue of the PTTH -/- phenotype in female receptivity.

      (4) Consider renaming the new PTTH mutant to avoid confusion with the existing PTTHDelta allele. 

      A6. We have renamed our new PTTH mutant as PtthDelete.

      (5) Include the age of virgin females in each figure legend, especially for Figures 2 to 7, to aid in interpretation. This is essential information since wild-type early virgins (day 1) show no receptivity. In contrast, they reach a typical 80% receptivity later, and the mechanism regulating the first phase might differ from the one occurring later.

      A7. We have included the age of virgin females in each figure legend. 

      (6) Explain the relevance of observing that PTTH adult neurons are dsx-positive, as it's unclear why this observation is significant, considering that these neurons are not responsible for the observed receptivity effect in virgin females. Alternatively, address this in the context of the third instar larva or clarify its relevance.  

      A8. We decreased DsxF expression in PTTH neurons and did not detect a significant change in female receptivity. Almost all neurons regulating female receptivity, including pC1 neurons, express DsxF. We suppose that PTTH neurons have some relationship with other DsxF-positive neurons that regulate female receptivity. Indeed, we detected overlap between dsx-LexA>LexAop-RFP and torso-Gal4>UAS-GFP during the larval stage. Furthermore, decreasing Torso expression in pC1 neurons significantly inhibited female receptivity.

      These results suggest that PTTH regulates female receptivity not only through ecdysone, but possibly also by directly regulating other neurons associated with female receptivity, especially DsxF-positive neurons.

      Reviewer #2 (Public Review): 

      Summary: The authors tried to identify novel adult functions of the classical Drosophila juvenile-adult transition axis (i.e. ptth-ecdysone). Surprisingly, larval ptth-expressing neurons expressed the sex-specific doublesex gene, thus belonging to the sexual dimorphic circuit. Lack of ptth during late larval development caused enhanced female sexual receptivity, an effect rescued by supplying ecdysone in the food. Among many other cellular players, pC1 neurons control receptivity by encoding the mating status of females. Interestingly, during metamorphosis, a subtype of pC1 neurons required Ecdysone Receptor A in order to regulate such female receptivity. A transcriptomic analysis using pC1-specific Ecdysone signaling down-regulation gives some hints of possible downstream mechanisms.

      Strengths: the manuscript showed solid genetic evidence that lack of ptth during development caused enhanced copulation rate in female flies, which includes ptth mutant rescue experiments by overexpressing ptth as well as by adding ecdysone-supplemented food. They also present elegant data dissecting the temporal requirements of ptth-expressing neurons by shifting animals from non-permissive to permissive temperatures, in order to inactivate neuronal function (although not exclusively ptth function). By combining different drivers together with a EcR-A RNAi line authors also identified the Ecdysone receptor requirements of a particular subtype of pC1 neurons during metamorphosis. Convincing live calcium imaging showed no apparent effect of EcR-A in neural activity, although some effect on morphology is uncovered. Finally, bulk RNAseq shows differential gene expression after EcR-A down-regulation. 

      Weaknesses: the paper has three main weaknesses. The first one refers to temporal requirements of ptth and ecdysone signaling. Whereas ptth is necessary during larval development, the ecdysone effect appears during pupal development. ptth induces ecdysone synthesis during larval development but there is no published evidence about a similar role for ptth during pupal stages. Furthermore, larval and pupal ecdysone functions are different (triggering metamorphosis vs tissue remodeling). The second caveat is the fact that ptth and ecdysone loss-of-function experiments render opposite effects (enhancing and decreasing copulation rates, respectively). The most plausible explanation is that both functions are independent of each other, also suggested by differential temporal requirements. Finally, in order to identify the effect in the transcriptional response of down-regulating EcR-A in a very small population of neurons, a scRNAseq study should have been performed instead of bulk RNAseq. 

      In summary, despite the authors providing convincing evidence that ptth and ecdysone signaling pathways are involved in female receptivity, the main claim that ptth regulates this process through ecdysone is not supported by results. More likely, they'd rather be independent processes. 

      B1. Clarification: in Figure 3, activation of PTTH neurons during stage 2 inhibited female receptivity. “Stage 2” runs from six hours before the 3rd-instar larval stage to the end of the wandering larval stage (the start of the prepupal stage). In Figure 5, the “pupal stage” runs from the start of the prepupal stage to the end of the pupal stage. This “pupal stage” includes the formation of prepupae, when the ecdysone peak has not yet disappeared. The periods during which Ptth and EcR-A in pC1 neurons were manipulated are therefore contiguous. In addition, the pC1-Gal4-expressing neurons also appear at the start of the prepupal stage. It is therefore possible that PTTH regulates female receptivity through the function of EcR-A in pC1 neurons.

      B2. During the formation of prepupae, ptth-Gal4>UAS-Grim flies display changes in gene expression similar to those of genetic control flies in response to a high-titer ecdysone pulse. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A. We quantified the EcR-A mRNA level of PTTH -/- and PTTH -/+ flies in the whole body of newly formed prepupae. Indeed, PTTH -/- flies showed increased EcR-A expression compared with PTTH -/+ flies. Given the function of EcR-A in gene expression, this suggests that PTTH -/- disturbs the regulation of a series of genes during metamorphosis. However, it is not certain whether EcR-A expression in pC1 neurons is increased compared with genetic controls when PTTH is deleted. Furthermore, PTTH -/- must affect the development of other neurons, not only pC1 neurons. The feedforward relationship between PTTH and EcR-A at the start of the prepupal stage is therefore one possible cause of the contradictory effects of PTTH -/- and EcR-A RNAi in pC1 neurons.

      B3. We will do single cell sequencing in pC1 neurons for the exploration of detailed molecular mechanism of female receptivity in the future.

      Reviewer #2 (Recommendations For The Authors): 

      Additional experiments and suggestions: 

      - torso LOF in the PG to determine whether or not the ecdysone peak regulated by ptth (there is a 1-day delay in pupation) is responsible for the ptth effect in L3. In the same line, what happens if torso is downregulated in the pC1 neurons? Is there any effect on copulation rates? 

      B4. Because of the loss of phm-Gal4, we could not test female receptivity when decreasing the expression of Torso in the PG. However, decreasing Torso expression in pC1 neurons significantly inhibited female receptivity. This suggests that PTTH regulates female receptivity not only through ecdysone but also by directly regulating dsx+ pC1 neurons.

      - What is the effect of down-regulating ptth in the dsx+ neurons? No ptth RNAi experiments are shown in the paper. 

      B5. We decreased PTTH expression in dsx+ neurons but did not detect a change in female receptivity. We also decreased PTTH expression in PTTH neurons using PTTH-Gal4 and again did not detect a change in female receptivity. Similarly, overexpression through PTTH-Gal4>UAS-PTTH was not sufficient to change female receptivity. It is possible that neither decreasing nor increasing PTTH expression in this way is sufficient to change female receptivity.

      - Why are most copulation rate experiments performed between 4-6 days after eclosion? ptth LOF effect only lasts until day 3 after eclosion (but very weak-fig 1). Again, this supports the idea that ptth and ecdysone effects are unrelated.

      B6. Most behavioral experiments were performed 4-6 days after eclosion, as in most other fly studies, because female receptivity peaks at that time. Ptth LOF enhanced female receptivity from the first day after eclosion. This resembles precocious puberty. Wild-type females reach high receptivity by 2 days after eclosion (about 75% within 10 min). We suppose that the Ptth LOF effect is only detectable until day 3 after eclosion because the receptivity of control flies is by then too high to exceed.

      It is not certain whether the effect of PTTH -/- on female receptivity disappears after the 3rd day of adulthood. It is therefore not certain whether the PTTH and EcR-A effects in pC1 neurons are unrelated.

      - The fact that pC1d neuronal morphology changes (and not pC1b) does not explain the effect of EcR-A LOF. Despite it is highlighted in the discussion, data do not support the hypothesis. How do these pC1 neurons look like in a ptth mutant animal regarding Calcium imaging and/or morphology? 

      B7. We examined the pattern of pC1 neurons when PTTH is deleted. Consistent with the feedforward relationship between PTTH and the expression of EcR-A in newly formed prepupae, PTTH deletion induced less established pC1-d neurons, the opposite of what EcR-A reduction in pC1 neurons induced. However, it is not certain that the expression of EcR-A in pC1 neurons is increased when PTTH is deleted. Furthermore, manipulation of PTTH has a general effect on neurodevelopment and does not only regulate pC1 neurons. In addition, the detailed pattern of pC1-b neurons, the key subtype regulating female receptivity, could not be seen clearly when EcR-A was decreased in pC1 neurons or when PTTH was deleted. So, the abnormal development of pC1-b neurons, if it occurs, is just one of the possible reasons for the effect of PTTH deletion on female receptivity.

      - The discussion is incomplete, especially the link between ptth and ecdysone; discuss why the phenotype is the opposite (ptth as a negative regulator of ecdysone in the pupa, for instance); the difference in size due to ptth LOF might be related to differential copulation rates.  

      B8. We have revised the discussion. We cannot exclude an effect of body size on female receptivity when PTTH was deleted or PTTH neurons were manipulated, although there is not enough evidence for such an effect.

      - scheme of pC neurons may help. 

      B9. We tried to label pC1 neurons with GFP and sort them by flow cytometry, but did not succeed. This may be because the number of pC1 neurons in one brain is too low. We will try single-cell sequencing in the future.

      - Immunofluorescence images are too small.

      B10. We have resized the small images.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript shows that mutations that disable the gene encoding PTTH cause an increase in female receptivity (they mate more quickly), a phenotype that can be reversed by feeding these mutants the molting hormone, 20-hydroxyecdysone (20E). The use of an inducible system reveals that inhibition or activation of PTTH neurons during the larval stages increases and decreases female receptivity, respectively, suggesting that PTTH is required during the larval stages to affect the receptivity of the (adult) female fly. Showing that these neurons express the sex-determining gene dsx leads the authors to show that interfering with 20E actions in pC1 neurons, which are dsx-positive neurons known to regulate female receptivity, reduces female receptivity and increases the arborization pattern of pC1 neurons. The work concludes by showing that targeted knockdown of EcRA in pC1 neurons causes 527 genes to be differentially expressed in the brains of female flies, of which 123 passed a false discovery rate cutoff of 0.01; interestingly, the gene showing the greatest down-regulation was the gene encoding dopamine beta-monooxygenase.

      Strengths 

      This is an interesting piece of work, which may shed light on the basis for the observation noted previously that flies lacking PTTH neurons show reproductive defects ("... females show reduced fecundity"; McBrayer, 2007; DOI 10.1016/j.devcel.2007.11.003). 

      Weaknesses: 

      There are some results whose interpretation seems ambiguous and findings whose causal relationship is implied but not demonstrated.

      (1) At some level, the findings reported here are not at all surprising. Since 20E regulates the profound changes that occur in the central nervous system (CNS) during metamorphosis, it is not surprising that PTTH would play a role in this process. Although animals lacking PTTH (rather paradoxically) live to adulthood, they do show greatly extended larval instars and a corresponding great delay in the 20E rise that signals the start of metamorphosis. For this reason, concluding that PTTH plays a SPECIFIC role in regulating female receptivity seems a little misleading, since the metamorphic remodeling of the entire CNS is likely altered in PTTH mutants. Since these mutants produce overall normal (albeit larger--due to their prolonged larval stages) adults, these alterations are likely to be subtle. Courtship has been reported as one defect expressed by animals lacking PTTH neurons, but this behavior may stand out because reduced fertility and increased male-male courtship (McBrayer, 2007) would be noticeable defects to researchers handling these flies. By contrast, detecting defects in other behaviors (e.g., optomotor responses, learning and memory, sleep, etc) would require closer examination. For this reason, I would ask the authors to temper their statement that PTTH is SPECIFICALLY involved in regulating female receptivity.  

      C1. We agree that it is not surprising that PTTH regulates the profound changes that occur in the CNS during metamorphosis through ecdysone. Also, the behavioral changes in PTTH mutants are not limited to female receptivity. We will temper the statement about the function of PTTH in female receptivity.

      We think there are two new points in our text, although more evidence is needed in the future. First, PTTH deletion and the reduction of EcR-A in pC1 neurons during metamorphosis have opposite effects on female receptivity. Second, the development of pC1-b neurons regulated by EcR-A during metamorphosis is important for female receptivity.

      (2) The link between PTTH and the role of pC1 neurons in regulating female receptivity is not clear. Again, since 20E controls the metamorphic changes that occur in the CNS, it is not surprising that 20E would regulate the arborization of pC1 neurons. And since these neurons have been implicated in female receptivity, it would therefore be expected that altering 20E signaling in pC1 neurons would affect this phenotype. However, this does not mean that the defects in female receptivity expressed by PTTH mutants are due to defects in pC1 arborization. For this, the authors would at least have to show that PTTH mutants show the changes in pC1 arborization shown in Fig. 6. And even then the most that could be said is that the changes observed in these neurons "may contribute" to the observed behavioral changes. Indeed, the changes observed in female receptivity may be caused by PTTH/20E actions on different neurons.

      C2. As newly formed prepupae, ptth-Gal4>UAS-Grim flies display changes in gene expression similar to those of genetic control flies in response to a high-titer ecdysone pulse. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A. We quantified the EcR-A mRNA level of PTTH -/- and PTTH -/+ flies in the whole body of newly formed prepupae. Indeed, EcR-A was upregulated in the whole body of newly formed PTTH -/- prepupae compared with PTTH -/+ flies. We also examined the pattern of pC1 neurons when PTTH is deleted. Consistent with the feedforward relationship between PTTH and the expression of EcR-A in newly formed prepupae, PTTH deletion induced less established pC1-d neurons, the opposite of what EcR-A reduction in pC1 neurons induced.

      However, it is not certain that the expression of EcR-A in pC1 neurons increases compared with genetic controls when PTTH is deleted. Furthermore, manipulation of PTTH has a general effect on neurodevelopment. In addition, the detailed pattern of pC1-b neurons, the key subtype regulating female receptivity through EcR-A function in pC1 neurons, could not be seen clearly. So, the abnormal development of pC1-b neurons, if it occurs, is just one of the possible reasons for the effect of PTTH deletion on female receptivity.

      (3) Some of the results need commenting on, or refining, or revising:  a- For some assays PTTH behaves sometimes like a recessive gene and at other times like a semidominant, and yet at others like a dominant gene. For instance, in Fig. 1D-G, PTTH[-]/+ flies behave like wildtype (D), express an intermediate phenotype (E-F), or behave like the mutant (G). This may all be correct but merits some comment.

      C3. Female receptivity increases with age after eclosion, not only in wild-type flies but also in PTTH mutants. On the first day after eclosion (Figure 1D), the loss of PTTH in PTTH[-]/+ flies may not be enough to cause sexual precocity as in PTTH -/- flies. From the second day after eclosion onward (Figure 1E-G), the loss of PTTH in PTTH[-]/+ flies is sufficient to enhance female receptivity compared with wild-type flies. However, after the 2nd day of adulthood, female receptivity of all genotypes increases sharply. From the 3rd day of adulthood onward, female receptivity of PTTH -/- flies reaches its peak, and the receptivity of PTTH[-]/+ flies approaches that of PTTH -/- flies as the flies get older.

      b - Some of the conclusions are overstated. i) Although Fig. 2E-G does show that silencing the PTTH neurons during the larval stages affects copulation rate (E), the strength of the conclusion is tempered by the behavior of one of the controls (tub-Gal80[ts]/+, UAS-Kir2.1/+) in panels F and G, where it behaves essentially the same as the experimental group (and quite differently from the PTTH-Gal4/+ control; blue line). (Incidentally, the corresponding copulation latency should also be shown for these data.) ii) For Fig. 5I-K, the conclusion stated is that "Knock-down of EcR-A during pupal stage significantly decreased the copulation rate." Although strictly correct, the problem is that panel J is the only one for which the behavior of the control lacking the RNAi is not the same as that of the experimental group. Thus, it could just be that when the experiment was done at the pupal stage is the only situation when the controls were both different from the experimental. Again, the results shown in J are strictly speaking correct but the statement is too definitive given the behavior of one of the controls in panels I and K. Note also that panel F shows that the UAS-RNAi control causes a massive decrease in female fertility, yet no mention is made of this fact.

      C4. i) For all figures in the text, we describe the assay group as significantly different only when it differed significantly from all control groups. In Figure 2E-G, both control groups differed from the assay group only at the larval stage. The difference between the two control groups may be due to genetic background. We have described the statistical analysis in more detail in the legend. In addition, the corresponding copulation latency is now shown. ii) For Figure 5, we have revised the conclusion in the text to “when the experiment was done at the pupal stage is the only situation when the controls were both different from the experimental.” In addition, the massive decrease in female fertility caused by the UAS-RNAi control in panel F is now mentioned.

      Reviewer #3 (Recommendations For The Authors): 

      (1) I am not sure that PTTH neurons should be referred to as "PG neurons". I am aware that this name has been used before but the PG is a gland that does not have neurons; it is not even innervated in all insects. 

      C5. We agree. “PG neurons” has been changed to “PTTH neurons”.

      (2) Fig. 1A warrants some explanation. One can easily imagine what it shows but a description is warranted. 

      C6. An explanation has been added.

      (3) When more than one genotype is compared it would be more useful to use letters to mark the genotypes that are not statistically different from each other rather than simply using asterisks. For instance, in the case of copulation latencies shown in Fig. 1E-G, which result does the comparison refer to? For example, since the comparisons are the result of ANOVAs, which comparison receives "*" in Fig. 1F? Is it PTTH[-]/+ vs PTTH[-]/PTTH[-] or vs. +/+? 

      C7. The genotypes and conditions being compared are now indicated in all figure legends.

      (4) Fig. 1H: Why is copulation latency of PTTH[-]/PTTH[-]+elav-GAL4 significantly different from that of PTTH[-]/PTTH[-]? This merits a comment. Also, why was elav-GAL4 used to effect the rescue and not the PTTH-GAL4 driver? 

      C8. We cannot explain this phenomenon. It may be due to the different genetic backgrounds of the controls. We have mentioned this in the figure legend.

      (5) Fig. 2C, the genotype is written in a confusing order, GAL4+UAS should go together as should LexA+LexAop. 

      C9. We have revised the genotype notation to avoid confusion.

      (6) In Fig. 2, is "larval stage" the same period that is shown in Fig. 3A? Please clarify.

      C10. We have clarified this in the text and legends.

      (7) Fig. 6. The fact that pC1 neurons can be labeled using the pC1-ss2-Gal4 at the start of the pupal stage does not mean that this is when these neurons appear (are born), only when they start expressing this GAL4. Other types of evidence would be needed to make a statement about the birthdate of these neurons. 

      C11. We have revised the description of the appearance of pC1-ss2-Gal4>GFP. The precise birth time of pC1 neurons will be examined in the future.

      (8) The results shown in Fig. 7 are not pursued further and thus appear like a prelude to the next manuscript. Unless the authors have more to add regarding the role of one of the differentially expressed genes (e.g., dopamine beta-monooxygenase, which they single out) I would suggest leaving this result out. 

      C12. We have left this out.

      (9) Female flies lacking PTTH neurons were reported to show lower fecundity by McBrayer et al. (2007) and should be cited. 

      C13. This important study was cited in the original manuscript. In this revision, we have cited it again when mentioning the lower fecundity of female flies lacking PTTH neurons.

      (10) Line 230: when were PTTH neurons activated? Since they are dead by 10h post-eclosion it isn't clear if this experiment even makes sense. 

      C14. Yes, we did this to further confirm that PTTH neurons do not affect female receptivity at the adult stage.

      (11) Line 338: the statements in the figures say that PTTH function is required during the larval stages, not during metamorphosis 

      C15. This has been revised as “The result suggested that EcR-A in pC1 neurons plays a role in virgin female receptivity during metamorphosis. This is consistent with that PTTH regulates virgin female receptivity before the start of metamorphosis.”

      (12) Did the authors notice any abnormal behavior in males? McBrayer et al. (2007) mention that males lacking PTTH neurons show male-male courtship. This may remit to the impact of 20E on other dsx[+] neurons. 

      C16. Yes, we have noticed that males lacking PTTH show male-male courtship. It is possible that PTTH deletion induces male-male courtship through the impact of 20E on other dsx+ or fru+ neurons. We have added the corresponding discussion.

      (13) Line 145: please define CCT at first use 

      C17. CCT has been defined.

      (14) Overall the manuscript is well written; however, it would still benefit from editing by a native English speaker. I have marked a few corrections that are needed, but I probably missed some. 

      + Line 77: "If female is not willing..." should say "If THE female is not willing..." 

      + Line 78 "...she may kick the legs, flick the wings," should say "...she may kick HER legs, flick HER wings," 

      + Lines 93-94 this sentence is unclear: "...while the neurons in that fru P1 promoter or dsx is expressed regulate some aspects..." 

      + Line 108 "...similar as the function of hypothalamic-pituitary-gonadal (HPG).." should say "...similar TO the function of hypothalamic-pituitary-gonadal (HPG).."

      + Line 152 "Due to that 20E functions through its receptor EcR.." should say "BECAUSE 20E ACTS through its receptor EcR.."

      + Lines 155, 354 "unnormal" is not commonly used (although it is an English word); "abnormal" is usually used instead. 

      + Line 273: "....we then asked that whether ecdysone regulates" delete "that"

      + Sentences lines 306-309 need to be revised.

      C18. Thank you for your suggestions. We have revised the text as you advised.

    1. Explain how the procedure benefits the students to build buy-in Model good and bad execution Practice, practice, practice

      I totally agree with these three things because I think that the better you can get students to buy-in to what we are doing the better student outcomes will be. Modelling and explaining not just good but also bad execution is helpful because some students may not even realize what they are doing wrong. As for "practice, practice, practice" I believe that practice makes perfect.

    1. Author Response:

      We would like to thank the editors and reviewers for the careful consideration of our manuscript and their many helpful comments. We would like to provide provisional author responses to address the public reviews.

      Response to Reviewer 1:

      Weaknesses:

      While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.

      It would be difficult to answer from this study whether Drp1 has a role beyond mitochondrial fission in zygotes. However, there are several possible reasons why the Drp1 KO zygotes differ from the somatic cell Drp1 KO models.

      First, the reviewer mentions that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria, but in fact, unlike in somatic cells, the EM images of Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al. Current Biology 2014, Fig. 2C and Fig. S1B-D), as in the case of zygotes in this study.

      These mitochondrial morphologies in Drp1-deficient oocytes/zygotes may be attributed to the unique mitochondrial architecture in these cells. Mitochondria in oocytes have the shape of a small sphere with irregular cristae located peripherally or transversely. These structural features might be the cause of insensitivity or resistance to inner membrane fusion. In addition, in our previous study (Wakai et al., Molecular Human Reproduction 2014, Fig. 2), overexpression of mitochondrial fusion factors in oocytes resulted in mitochondrial aggregation when the outer membrane fusion factors Mfn1/Mfn2 were overexpressed, while overexpression of Opa1 did not cause any morphological changes. Thus, while mitochondria in oocytes/zygotes divide actively, complete fusion, including the inner membrane, as seen in somatic cells, is unlikely to occur.

      As for mitochondrial transport, we do not entirely discard its role. Although mitochondrial intrinsic dynamics such as fission are of primary importance for mitochondrial distribution and partitioning in embryos, the regulation of these dynamics by the cytoskeleton may be important and thus needs further study, as the reviewer pointed out.

      Response to Reviewer 2:

      Weaknesses:

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      We will indicate the time after hCG as the reviewer suggested. The only problem is that in this experiment, there may be a slight deviation from the actual mitochondrial distribution change (Fig. S1A) due to the manipulation time for Trim-Away (since it was performed outside of the incubator). Also, no significant delay in pronuclear formation or embryonic development was observed in Drp1-depleted zygotes.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      As the reviewer pointed out, the depletion of Drp1 is likely to have occurred at an earlier stage. In this study, because various RNAs were injected to visualize organelles such as mitochondria and chromosomes, observations were started after about 5 hours of incubation to allow the fluorescent proteins to be sufficiently expressed. Therefore, samples for the western blotting analysis were collected to reflect the condition at the start of the observation.

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      We would like to add quantitative data on mitochondrial aggregation in Drp1-depleted embryos.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      We would like to add quantitative results for the band intensities in the western blot analysis. The number of embryos analyzed is given in the figure legends; pooled samples of 20 (Fig. 4) to 30 (Fig. 2) embryos were used.

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      We will present quantitative results on the accumulation of ROS.

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      It has been reported that Drp1 regulates the meiotic spindle through the spindle assembly checkpoint (SAC) (Zhou et al., Nature Communications 2022). We would like to mention the possibility raised by the reviewer in the Discussion.

      Response to Reviewer 3:

      Seemingly, there are few apparent shortcomings. The following are specific comments to stimulate further open discussion.

      - Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.

      We would like to add a comment regarding cristae morphology.

      - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.

      ATeam fluorescence is analyzed using a regular fluorescence microscope, not a confocal laser microscope, in order to analyze the intensity in the whole embryo (or the whole blastomere). Therefore, we are currently unable to obtain images of localized areas within the cell (e.g., around the spindle) as expected by the reviewer; as shown in the images in Figure 3-figure supplement 1C, there is a tendency to see high ATP levels at the cell periphery, but further analysis is needed for clear and definitive results.

      - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?

      Aggregated mitochondria are localized toward the cell center, but do not behave in such a way that they are preferentially concentrated near the female pronucleus.

      - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca2+ response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?

      We assume that the reviewer's interpretation is right. However, although we were able to show the bias in Ca2+ store levels between blastomeres of Drp1-depleted embryos, we did not stain mitochondria simultaneously, so we cannot say in detail whether, for example, there are more Ca2+ stores in the blastomere that inherited more mitochondria or fewer Ca2+ stores in the blastomere with more aggregated mitochondria.

      - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?

      The marked accumulation of mitochondria around the spindle is unique to the first cleavage and seems to be coincident with the migration of the pronuclei toward the center. Since the process of assembly of the male and female pronuclei is also an event unique to the first cleavage, abnormalities such as binucleation due to mitochondrial misplacement are thought to be a phenomenon seen only in the first cleavage. Therefore, if Drp1 is depleted at the 2-cell or 4-cell stage, chromosome segregation errors may be less frequent. However, since unequal partitioning of mitochondria is thought to occur, some abnormalities in embryonic development are likely to be observed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Strengths

      We thank the reviewer for recognizing the strengths of our in vivo Ca2+ measurements, super-resolution microscopy and assessment of the secretory dysfunction in the Sjögren's syndrome mouse model.

      Weaknesses

      Point 1: The less restricted Ca2+ signal to the apical region of the acinar cell is not really relevant to the reduced activation of TMEM16a by a local signal at the apical plasma membrane.

      We agree that the spatially averaged Ca2+ signal is not indicative of the local Ca2+ signal that activates TMEM16a. The description of the disordered Ca2+ signal in the disease model was intended to simply convey that the Ca2+ signal is altered in the model. Whether or indeed how the altered spatial characteristics of the signal are deleterious is not known but we speculate in the discussion that this contributes to the ultrastructural damage observed.

      Point 2. Secretion is decreased but the amplitude of the globally averaged Ca2+ signals are increased. No proof is offered that the greater distance between IP3R and TMEM16a is the reason for decreased secretion in the face of this increased peak signal.

      We have now added new data that indicate that the local Ca2+ signal is indeed disrupted in the disease model. We show that in control animals, activation of TMEM16a by application of agonist occurs when the pipette is buffered with the slower buffer EGTA but not with the fast buffer BAPTA. In contrast, in cells isolated from DMXAA-treated animals both EGTA and BAPTA abolish the agonist-induced currents (new Figure 6). These data are consistent with our super-resolution data showing that the distance between IP3R and TMEM16a is greater, and thus presumably sufficient to allow buffering of Ca2+ released from IP3R such that it does not effectively activate TMEM16a. These data would also suggest that the increased amplitude of the spatially averaged Ca2+ signal is not sufficient to overcome this structural change.

      Point 3. Lack of evidence that the mitochondrial changes are associated with the defect in fluid secretion.

      We agree that a causal link between the decreased secretion and altered mitochondrial morphology and function is not established. Nevertheless, we feel it is reasonable to contend that profound changes in mitochondrial morphology observed at the light and EM level, together with changes in mitochondrial membrane potential and oxygen consumption are consistent with contributing to altered fluid secretion given that this is an energetically costly process. We have altered the discussion to reflect these caveats and ideas.

      Reviewer 2:

      We thank the reviewer for their assessment of our work and constructive comments.

      Reviewer 3:

      We thank the reviewer for their careful appraisal of our manuscript and insightful comments. 

      Point 1: Are all the effects of DMXAA mediated through the STING pathway?

      This is an important point because, as noted, DMXAA has been reported to inhibit NAD(P)H quinone oxidoreductase, which could contribute to the phenotype reported here. In future studies we intend to test other STING pathway agonists such as MSA-2 and perhaps antagonists of the STING pathway. We have added text to the discussion indicating that not all of the effects observed may result from activation of the STING pathway.

      Point 2: As noted, and clarified in the text, the driving force for ATP production is the electrochemical H+ gradient which establishes the mitochondrial membrane potential.

      Point 3: The reviewer suggested there was a decrease in mitochondrial membrane potential in the absence of a change in TMRE steady state.

      We apologize for the confusion generated by the presentation of the figure. We normalized TMRE fluorescence against MitoTracker Green fluorescence but, as shown, the figure does not reflect that the absolute TMRE fluorescence was indeed decreased. Supplemental figure 4 now shows the basal TMRE fluorescence.

      Point 4: Indications that the disruption to ER structure seen in Electron Micrographs contributes to the changes in Ca2+ signal and fluid secretion.

      We did not focus on the relative distance between ER and apical PM in the EMs primarily because the ER that projects towards the apical PM is a relatively minor component of the specialized ER expressing IP3R and is difficult to identify. We note that the disruption of the bulk ER as quantitated by altered ER-mitochondrial interfaces and fragmentation is consistent with our super resolution data and thus likely plays a role in the mechanism that results in dysregulated Ca2+ signals and reduced secretion.

      Recommendations to Authors:

      Reviewing Editor:

      (1) The Editor suggests that we should use the activity of TMEM16a to directly measure the [Ca2+] experienced by the channel.

      We now present additional new data. First, we show an extended range of pipette [Ca2+] demonstrating identical Ca2+ sensitivity in DMXAA- vs vehicle-treated cells (Figure 5). Second, and importantly, we now present data evaluating the ability of muscarinic stimulation to activate TMEM16a in the presence of either EGTA (slow Ca2+ buffer) or BAPTA (fast Ca2+ buffer). Notably, currents can be stimulated in control cells when the pipette is buffered with EGTA, but not in DMXAA-treated cells. BAPTA inhibits activation in both situations (new Figure 6). These data are consistent with TMEM16a being activated by Ca2+ in a microdomain and with this microdomain being disrupted in the disease model.
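The logic of the EGTA versus BAPTA comparison can be illustrated with the standard buffer length-constant estimate, λ = sqrt(D_Ca / (k_on · [B])), which approximates how far free Ca2+ diffuses from a release site before a mobile buffer captures it. The sketch below is illustrative only: the diffusion coefficient, the on-rates and the 5 mM buffer concentration are assumed order-of-magnitude values, not numbers reported in this response, and the function name is hypothetical.

```python
import math


def buffer_length_constant_nm(d_ca_um2_per_s, k_on_per_M_per_s, buffer_M):
    """Approximate distance (nm) Ca2+ diffuses before capture by a mobile buffer.

    Uses lambda = sqrt(D_Ca / (k_on * [B])).
    """
    capture_rate_per_s = k_on_per_M_per_s * buffer_M  # pseudo-first-order rate, s^-1
    return 1000.0 * math.sqrt(d_ca_um2_per_s / capture_rate_per_s)  # um -> nm


# ASSUMED illustrative parameters (not taken from the authors' experiments).
D_CA = 220.0                             # um^2/s, free Ca2+ diffusion coefficient
K_ON = {"EGTA": 1.0e7, "BAPTA": 4.0e8}   # M^-1 s^-1, approximate on-rates
BUFFER_M = 5e-3                          # hypothetical pipette buffer concentration

lengths = {
    name: buffer_length_constant_nm(D_CA, k_on, BUFFER_M)
    for name, k_on in K_ON.items()
}

for name, nm in lengths.items():
    print(f"{name}: length constant of roughly {nm:.0f} nm")
```

Under these assumptions the fast buffer confines Ca2+ to a much smaller region than the slow buffer, which is why a current that survives EGTA but not BAPTA is usually read as evidence for close apposition of source and target, and why loss of EGTA resistance is consistent with an increased source-to-target distance.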

      (2) The Editor asks whether a decrease in IP3R3 in a subset of the samples could account for the decreased fluid secretion.

      We think this is unlikely given, as noted by the Editor, that a reduction only occurred in a subset of the samples and statistically there was no significant difference to vehicle-treated animals. Moreover, we would note that there is also no difference in the expression of IP3R2 between experimental groups and in studies of transgenic mice where either IP3R2 or IP3R3 were knocked out individually, there was no effect on salivary fluid secretion, indicating that expression of a single subtype can support stimulus-secretion coupling.

      (3) Absolute values for changes in fluorescence (over time) should be included together with SD images.

      These have been added in Figure 3.

      (4) DMXAA has additional effects to STING activation and thus other STING pathway modulators should be used.

      We agree that additional STING agonists should be explored in the future but believe that this is beyond the scope of the present studies. Additional text has been added to the discussion acknowledging the additional targets of DMXAA and that they could contribute to the phenotype.

      (5) No causal link between the observed Ca2+ changes and mitochondrial dysfunction.

      We agree that no experimental evidence is offered to directly support this contention. Nevertheless, dysregulated Ca2+ signals are well-documented to lead to altered mitochondrial structure and function and thus we feel it not unreasonable to speculate that this is a possibility.

      (6) The paper would be improved by directly assessing mechanistic connections between altered Ca2+ signaling and TMEM16a activation.

      We agree, please refer to point 1 and new figure 6.

      Reviewer 1:

      (1) Standard Deviation images should be explained and the location of ROI identified.

      We contend that Standard Deviation images provide an effective visualization (in a single image) of both the magnitude of the Ca2+ increase and the degree of recruitment of cells in the field of view during the entire period of stimulation.  We have added text to describe the utility of this technique. Nevertheless, we now show kinetic traces of the changes in fluorescence over time in both apical and basal regions in Figure 3. We also clarify that the traces shown in Figure 2 are averaged over the entire cell. 

      (2) The Authors should consider that reduced secretion is because cells are dying.

      We believe this is unlikely given the lack of morphological changes in glandular structure and the minor lymphocyte infiltration observed in this model. Nevertheless, we now add data showing that the mass of SMG is not altered in the DMXAA-treated animals compared with vehicle-treated (Figure 1E).

      (3) The role of mitochondria in the DMXAA phenotype is unclear. What is the effect of acutely de-energizing mitochondria on fluid secretion.

      Since fluid secretion is an energetically expensive undertaking, it is not unreasonable to suggest that compromised mitochondrial function may impact secretion. That said, this could occur at multiple levels, for example production of ATP to fuel the Na/K pump to establish membrane gradients, or to provide energy to sequester Ca2+, among a multitude of targets. This will be a subject of ongoing experiments. We contend that experiments to acutely disrupt salivary mitochondria in vivo while assessing fluid secretion would be difficult to perform and interpret, given that local administration of agents to the SMG would not affect the other major salivary glands and systemic administration would be predicted to have wide-ranging off-target effects.

      (4) Could a subset of cells with low IP3R numbers contribute to reduced fluid secretion?

      Please see the response to the Reviewing Editor's point 2.

      (5) An attempt to estimate the effect of the spatial disruption of IP3R and TMEM16a localization should be made.

      Please see the response to the Reviewing Editor's point 1.

      Minor Points

      We have amended the statement from “Highly expressed” to “increased”.

      Regions of the cell have been labelled for orientation in the line scans.

      The molecular weight markers have been added in Figure 4.

      Reviewer 2:

      (1) Whether mitochondrial dysfunction is the initiator of the phenotype or a result of the dysregulated Ca2+ signal is unclear.

      We agree that our data do not resolve this classic “Chicken vs Egg” conundrum. We plan further experiments to address this issue. Future plans include repeating the mitochondrial and Ca2+ signaling experiments at earlier time points where we know fluid secretion is not yet impacted. This may reveal the temporal sequence of events. Similarly, we plan experiments to mechanistically address why the global Ca2+ signal is augmented: reduced Ca2+ clearance or enhanced Ca2+ release/influx are possibilities. We speculate that reduced Ca2+ clearance, either because mitochondrial Ca2+ uptake is reduced or as a secondary consequence of reduced ATP levels on SERCA and PMCA, is a likely possibility.

      (2) Measurement of ECAR and direct measurements of ATP and Seahorse methods.

      In a separate series of experiments, we monitored ECAR. These data were unfortunately very variable and difficult to interpret, although no obvious compensatory increase was observed. We plan in the future to directly monitor ATP levels in acinar cells using Mg-Green. To normalize for cell numbers in the Seahorse experiments, following centrifugation, cell pellets of equal volume were resuspended in equal volumes of buffer. Acinar cells were seeded onto Cell-Tak-coated dishes. This information has been added to the Methods section.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Summary: To explore the relationship between histone post-translational modifications (H3K4me3 and H3K27me3) and enhancer activation with gene expression during early embryonic development, the authors used a monolayer differentiation approach to convert mouse embryonic stem cells (ESCs) into Anterior Definitive Endoderm (ADE). They monitored differentiation stages using a dual reporter mESC line (B6), which has fluorescent reporters inserted at the Gsc (GFP) and Hhex (Redstar) loci. Their analyses indicate that the differentiating cells advanced through stages similar to those in the embryo, successfully converting into endoderm and ADE with high efficiency. This is elegant and well performed stem cell biology.

      Their subsequent genome-wide and nascent transcription analyses confirmed that the in vitro gene expression changes correlated with developmental stages and confirmed that transcriptional activation precedes mRNA accumulation. They then focussed on linking active enhancers and histone modifications (H3K4me3 and H3K27me3) with gene expression dynamics. Finally, they performed PRC2 inhibition and showed that, while it enhanced differentiation efficiency, it also induced ectopic expression of non-lineage specific genes.

      Major comments: In terms of mechanistic advances, they propose that transcriptional up-regulation does not require prior loss of H3K27me3, which they show appears to lag behind gene activation, but critically, on a likely mixed population level. I am sceptical of their interpretation of their data because they are looking at heterogeneous populations of cells. To explain, one could imagine a particular H3K27me3 coated gene that gets activated during differentiation. In a population of differentiating cells, while the major sub-population of cells could retain H3K27me3 on this particular gene when it is repressed, a minority sub-population of cells could have no H3K27me3 on the gene when it is actively transcribed. The ChIP and RNA-seq results in this mixed cell scenario would give the wrong impression that the gene is active while retaining H3K27me3, when in reality, it's much more likely that the gene is never expressed when its locus is enriched with the repressive H3K27me3 modification. Therefore, to support their claim, they would have to show that a particular gene is active when its locus is coated with H3K27me3. Personally, I don't feel this approach would be worth pursuing.

      They also report that inhibition of PRC2 using EZH2 inhibitor (EPZ6438) enhanced endoderm differentiation efficiency but led to ectopic expression of pluripotency and non-lineage genes. However, this is not surprising considering the established role of Polycomb proteins as repressors of lineage genes.

      Reviewer #1 (Significance (Required)): I feel that this is a solid and well conducted study in which the authors model early development in vitro. It should be of interest to researchers with an interest in more sophisticated in vitro differentiation systems, perhaps to knockout their gene of interest and study the consequences. However, I don't see any major mechanistic advances in this work.

      >Author Response

      We agree with the point regarding the delayed loss of H3K27me3 relative to gene activation, and indeed this same point has been raised by reviewer 3 (see below). Our cell-population based data does not allow us to directly test if gene up-regulation in a small population of cells from TSSs lacking H3K27me3 accounts for the observed result. Furthermore, there are currently no robust methods to determine cell- or allele-specific expression simultaneously with ChIP/Cut and Run for chromatin marks. However, we provide the following additional evidence that strongly supports our conclusions.

      Our FACS isolation strategy used to prepare cell populations for ChIP, microarray expression and 4sU-seq analysis is based on expression (or lack thereof) of a fluorescent GSC-GFP reporter. This means that every cell in the G+ populations expresses the Gsc fluorescent reporter, at least at the protein level, at the point of isolation. This is despite the presence of appreciable and invariant levels of H3K27me3 at the TSS of the Gsc gene in both G+ and G- populations at day 3 of differentiation. Comparable to our meta-analysis of all upregulated genes shown in the original manuscript (Figure 5 and S5), H3K27me3 levels are then subsequently reduced in the G+ relative to the G- populations at day 4. The transcriptional changes which correspond to the GSC-GFP reporter expression and associated ChIP-seq data are shown in the reviewer figure (Fig R1 A shown in revision plan). To further support our observations, we sought to rule out the possibility that the shift in H3K27me3 and transcription were from mutually exclusive gene sets, from nominal transcription levels or from sites with low level H3K27me3. To do this with a gene set of sufficient size to yield a robust result, we selected upregulated TSSs that had a greater than median value for both transcription (4sU-seq) and H3K27me3 (n=49 of 159 genes; Fig R1 B shown in revision plan). Meta-analysis of these genes showed that, as for all upregulated gene TSSs (n=159), transcriptional activation occurred in the presence of substantial and invariant levels of H3K27me3 at day 3 followed by a subsequent reduction by day 4 of differentiation (Fig R1 C shown in revision plan). Importantly, many of these genes yielded high absolute 4sU-seq signal, comparable to that of Gsc, arguing against transcriptional activation being limited to a small subpopulation of cells.
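The "greater than median for both" selection described above can be sketched as a simple two-threshold filter. This is a minimal illustration, not the authors' pipeline: the record layout, the function name and the toy values are all hypothetical, and a real analysis would read per-TSS 4sU-seq and H3K27me3 signal from the processed data.

```python
from statistics import median


def select_high_in_both(tss_records):
    """Return TSS names whose transcription AND H3K27me3 signal both exceed
    the respective medians across the supplied (upregulated) TSS set."""
    tx_median = median(r["tx"] for r in tss_records)
    k27_median = median(r["k27"] for r in tss_records)
    return [
        r["tss"]
        for r in tss_records
        if r["tx"] > tx_median and r["k27"] > k27_median
    ]


# Toy, made-up values purely to exercise the filter (not experimental data).
toy_upregulated = [
    {"tss": "geneA", "tx": 8.0, "k27": 5.0},
    {"tss": "geneB", "tx": 2.0, "k27": 6.0},
    {"tss": "geneC", "tx": 9.0, "k27": 1.0},
    {"tss": "geneD", "tx": 7.0, "k27": 4.0},
    {"tss": "geneE", "tx": 1.0, "k27": 0.5},
]

selected = select_high_in_both(toy_upregulated)
print(selected)
```

Using strict inequalities excludes TSSs that sit exactly at either median, so the selected set is guaranteed to be both robustly transcribed and robustly marked relative to the rest of the upregulated group.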

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): In this paper the authors profile gene expression, including active transcription, and histone modifications (k4 and k27me3) during a complex differentiation protocol from ES cells, which takes advantage of FACS sorting of appropriate fluorescent reporters. The data is of good quality and the experiments are well performed. The main conclusion, that the analyzed histone marks channel differentiation more than they directly allow/block it, is well supported by the data. The paper is interesting and will represent a good addition to an already extensive literature. I have however a few major concerns, described below:

      1/ K4me3 may show more changes than they interpret, at least over the +1 nucl. An alternative quantification to aggregate profiles should be used to more directly address the questions regarding the correlations between histone mods and gene expression.

      >Author Response

      Whilst we state that H3K4me3 levels are somewhat invariant at differentially expressed genes relative to H3K27me3, quantification of individual TSSs (+/- 500 bp) did show a direct correlation with gene expression (Figure 5 and S5). To further explore this in response to the reviewer’s comment we will quantify K4me3 signal at the +1 nucleosome to determine if this yields more substantial differences than that observed more broadly across TSSs.

      2/ Related to the previous point, it appears clear in Fig.4 that the promoters of each gene expression cluster do not belong to a single chromatin configuration. I think it would be important to: 1/ cluster the genes based on promoter histone mods and interrogate gene expression and cluster allocation (basically the reverse to what is presented) 2/ order the genes in the heatmaps identically for K4me3 and K27me3 to more easily understand the respective chromatin composition per cluster

      >Author Response

      We thank the reviewer for these suggestions and will include these analyses in a revised manuscript.

      3/ Also, as it is apparent that not all promoters in every cluster are enriched for the studied marks, could the authors separately analyze these genes? What are they? Do they use alternative promoters?

      >Author Response

      Indeed, this is the case. Whilst there is significant enrichment of H3K27me3 at the TSS of developmentally regulated genes, not all genes whose expression changes during the differentiation will be polycomb targets. We will further stratify these clusters as suggested and determine what distinguishes the subsets. If informative, this data will be included in a revised manuscript.

      4/ The use of 4SU-seq to identify active enhancers is welcome; however, I have doubts it is working very efficiently: for instance, in the snapshots shown in Fig.2A, the very active Oct4 enhancers in ES cells are not apparent at all... More validation of the efficiency of the approach seems required.

      >Author Response

      The 4sU-seq data shown in Figure 2A was generated in samples isolated from day 3 and 4 of the ADE differentiation. It is therefore likely that the enhancers have been partly or wholly decommissioned at this point. Indeed, in a separate study we generated 4sU-seq data using the same protocol and conditions as presented here but in ES cells and differentiated NPCs (day 3 to 7), and we do see transcription at Oct4 enhancers in ESCs (arrowed in the screenshot shown in revision plan), which is extinguished upon differentiation to neural progenitor cells (NPCs; data from PMID: 31494034).

      5/ The effects of the EZH2 inhibitor are quite minor regarding the efficiency of the differentiation as analyzed by FACS, despite significant gene expression changes. To the knowledge of this referee, this is at odds with results obtained with Ezh2 ko ES cells that display defects in mesoderm and endoderm differentiation. I have issues reconciling these results (uncited PMID: 19026780). Either the authors perform more robust assays (inducible KOs) or they more directly explain the limitations of the study and the controversies with published work.

      >Author Response

      We agree that this result appears to be at odds with the findings in PMID: 19026780. This is likely because we are acutely reducing H3K27me3 levels for a short period either during or immediately preceding the differentiation rather than removing PRC2 function genetically. This likely produces a less pronounced defect in the ability to generate endodermal cells. However, we cannot address this without further experimentation, which is beyond the scope of this study. We will more fully discuss the results in the context of this and other studies and discuss the limitations of the study in this regard.

      Minor 1/ please add variance captured to PCA plots 2/ Fig1E add color scales to all heatmaps 3/ Fig4C,D are almost impossible to follow, please find a way to identify better the clusters/samples and make easier to correlate all the variables

      >Author Response

      We will address all of these points in a revised manuscript.

      Reviewer #2 (Significance (Required)):

      The paper is incremental in knowledge, and not by a big margin, as it is known already that histone mods rather channel than drive differentiation. Though, the authors do not clearly address inconsistencies with published work, especially regarding Ezh2 thought to be important to make endoderm. It is however a good addition to current knowledge, provided a better discussion of differences with published work is provided.

      >Author Response

      As outlined above, we will address this with a more complete discussion about the distinction between the studies and what can and can’t be concluded from our approach.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): This study investigates the role of chromatin-based regulation during cell fate specification. The authors use an ESC model of differentiation into anterior primitive streak and subsequently definitive endoderm, which they traced via a dual-reporter system that combines GSC-GFP and HHEX-RedStar. The authors mapped changes in (nascent) gene expression and histone modifications (H3K4me3/H3K27me3) at key timepoints and within different populations over six days of differentiation. Finally, the authors test the functional implications of H3K27me3 landscapes via PRC2 inhibition.

      The majority of data chart the descriptive changes in (epi)genomic and transcriptional dynamics coincident with cell differentiation. The use of nascent transcriptomics improves the temporal resolution of expression dynamics, and is an important strategy. By and large the data reinforce established paradigms. For example, that transcription is the dominant mechanism regulating mRNA levels, or that dynamic chromatin state changes occur and largely correlate with gene activity. They also identify putative enhancers with profiling data, albeit these are not validated, and confirm that PRC2 inhibition impacts cell fate processes - in this case promoting endodermal differentiation efficiency. Overall, the study is relatively well-performed and clearly written, with the omics profiling adding more datasets from in vitro cell types that can be difficult to characterise in vivo. Whilst the majority of the study may be considered incremental, the key finding is the authors' conclusion that H3K27me3 is subordinate to gene activity rather than an instructive repressor. If borne out, this would mark an important observation with broad implications. However, in my view this conclusion is subject to many confounders and alternative interpretations, and the authors have not ruled out other explanations. Given the centrality of this to the novelty of the study, I would encourage further analysis/stratification of existing data, and potentially further experiments to provide more confidence in this key conclusion.

      Primary issue 1.) The authors show that at the earliest timepoint (d3), nascent gene activation of a handful of genes between G+ and G- populations is not associated with a FC loss of H3K27me3. From this the authors extrapolate their key conclusion that H3K27me3 is subordinate. Causality of chromatin modifications in gene regulation is critical to decipher, and therefore this is an important observation to confirm. Below I go through the possible confounders and issues with the conclusion at this point.

      (i) Single-cell penetrance. A possible (likely?) possibility is that gene activation initially occurs in a relatively small subset of cells at d3. Because these genes are expressed lowly prior to this, they will register as a significant upregulation in bulk analysis. However, in this scenario H3K27me3 would only be lost from a small fraction of cells, which would not be detectable against a backdrop of most cells retaining the mark. In short, the authors have not ruled out heterogeneity driving the effect. Given the different dynamic range of mRNA and chromatin marks, and that a small gain from nothing (RNA) is easier to detect than a small loss from a pre-marked state (chromatin), investigating this further is critical to draw the conclusions the authors have.

      (ii) Initial H3K27me3 levels. The plots in Fig 5 show the intersect FC of H3K27me3 and gene expression. Genes that activate at d3 show no loss of H3K27me3. However, it is important to characterise (and quantitate) whether these genes are significantly marked by H3K27me3 in the first place, which I could not find in the manuscript. Many/several of the genes may not be polycomb marked or may have low levels to begin with. This would obviously confound the analysis, since an absence/low K27 cannot be significantly lost and is unlikely to be functional. Thus, the DEG geneset should be further stratified into H3K27me3+ and K27me3- promoter groups/bins, with significance and conclusions based on the former only (e.g. boxplot in 5F).

      (iii) Sample size. The conclusions are based on a relatively small number of genes that upregulate between G+ and G- (n=55 in figure by my count, text mentions n=52). Irrespective of the other confounders above, this is quite a small subset to make the sweeping general conclusion that "loss of the repressive polycomb mark H3K27me3 is delayed relative to transcriptional activation" in the abstract. Indeed, the small number of DEG suggests the cell types being compared are similar and perhaps therefore have specific genomic features (this could be looked at) that drive .

      >Author Response

      These are very good points and are also raised by reviewer 1 (see above). We have one example in our current data where we can definitively interrogate single-cell protein expression. Gsc (as monitored by GSC-GFP FACS and the bulk RNA analysis) meets the criteria of being robustly upregulated in all FACS-sorted cells in the presence of high levels of H3K27me3 in the D3G+ population. We believe that the additional analysis (Figure R1A shown in revision plan) and the discussion above address the reviewer’s concerns about both the levels of expression and the magnitude of H3K27me3. With respect to the third point, the numbers are low (although here I present data from the 4sU analysis with approximately three times more data points); however, the point here is not to say that this happens in every instance of gene activation but rather that it can happen, and not just at a small subset of outlier genes. This is important, as the reviewer notes, in our understanding of how polycomb repression is relieved during development. We will also look to see if there are sequence characteristics/motifs of these genes. In a revised manuscript we would include this data and further analysis as outlined above. The reviewer points out that the numbers vary a little between analyses. This arises due to the annotation of multiple TSSs per gene in some cases. This will be rectified throughout and made clearer in the legends.

      Other comments: 2.) The authors show that promoter H3K4me3 correlates well with gene expression dynamics in their model. They conclude that "transcription itself is required for H3K4me3 deposition", or in other words is subordinate. This may well be the case but from their correlative data this cannot be inferred. Indeed, several recent and past papers have shown that H3K4me3 itself can directly modulate transcription, for example by triggering RNA Pol II pause-release, by preventing epigenetic silencing and/or by recruiting the PIC. The authors could point out or discuss these alternative possibilities to provide a more balanced discourse.

      >Author Response

      We agree and this will be discussed more thoroughly and both possibilities put forward in the revised manuscript.

      3.) The labelling of some figures is unclear. In Fig 4C and 4D (right) it is impossible to tell what sample each of the lines represents. It is also not clear what the blue zone corresponds to in genome view plots (the whole gene?). Moreover, the replicate numbers are not shown in figure legends.


      >Author Response

      We agree that the data presented in 4C and D is unclear. We will, as a minimum, collapse profiles into like populations (ESC / G- / G+ / G+H- / G+H+) which makes sense given the similarity of these populations across all analyses (see e.g. PCA analysis in Figure 1). We will also explore alternative ways of presenting the data to better highlight the dynamics and incorporate this with the changes suggested by reviewer 2. The blue shaded area represents the full extent of the key gene being discussed in the screen shot; this is mentioned in the legend but will be made clearer in a revised manuscript. Replication will also be added to the legend throughout (n=2 for ChIP-seq and n=3 for 4sU-seq).

      4.) It would be nice to provide more discussion to reconcile the conclusions that H3K27me3 in endoderm differentiation is subordinate and the final figure showing inhibiting H3K27me3 has a significant effect on differentiation, since the latter is the functional assessment.

      >Author Response

      We will build on the points already made that suggest that, whilst K27me3 is a passive repressor that serves to act against sub-threshold activating cues, it is nonetheless a critical regulator of developmental fidelity.

      Reviewer #3 (Significance (Required)): Overall, the study's strengths are in that it characterises epigenomic dynamics within a specific and relevant cell fate model. The nascent transcriptomics adds important resolution, and underpins the core conclusions. The weakness is that data is over-interpreted at this point, and other possibilities are not adequately tested. The conclusions should therefore either be scaled back (which reduces novelty) or further analysis and/or experiments should be performed to support the conclusion. If it proves correct, this would be a significant observation for the community.

      >Author Response

      In a revised manuscript, we will address the reviewer’s concerns with additional data and discussion as indicated above.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      My main concern is still in place. It is unclear whether the proposed method can find actual goal states, and as a result it is unclear what states it finds. Table S1 mentions the model BIOMD0000000454, which is a small metabolic pathway with known equations given in "Example One" in "Metabolic Control Analysis: Rereading Reder". In this model the goal states can be calculated analytically.

      Regarding your statements below: I am not concerned that your method will be less efficient than random search (or any other search..) on small models, but I think it is important for the readers to have evidence that your method is able to discover true goal states at least in small networks, used in your study. You do show that your method scales to complex models. So, in my opinion, the missing part is to show that it is able to find true goal states.

      "...For simple models whose true steady-state distribution can be derived numerically and/or analytically, it is very likely that their exploration will be much simpler and this is not where a lot of improvement over random search may be found, which explains our focus on more complex models..."

      We thank you for your response and for your concerns about the lack of evidence that our method is able to re-discover the true goal states of simple models when these are known a priori. We acknowledge that adding these simple cases is useful for completeness. We did not include these simple models in our main study because in most cases a basic random search over the initial conditions will lead to the re-discovery of these goal states. For instance, for the mentioned model BIOMD0000000454 described in "Example One" of the "Metabolic Control Analysis: Rereading Reder" paper, several simplifying assumptions are made such that the system has only one steady state (x1=0.056, x2=0.769, x3=4.231), which can be found analytically as shown in the paper. In that simple case, this goal state is also straightforward to find with numerical simulation, as any valid initial condition will converge to it.

      To address the concerns of the reviewer, we propose to add an additional "sanity check" figure in the supplementary of the revised paper (Figure S4), as well as a “sanity check” subsection in the “Methods”, to present additional experiments made on simple models such as this one. The new figure and subsection can be viewed in the paper’s interactive version available online https://developmentalsystems.org/curious-exploration-of-grn-competencies, and we plan to include them as such in the further revision. We have also included the full code to reproduce this sanity check as a ‘sanity_check.ipynb’ Jupyter notebook in the GitHub repository (https://github.com/flowersteam/curious-exploration-of-grn-competencies/blob/main/notebooks/sanity_check.ipynb).

      In the new Figure S4-b, we show the results of our exploration pipeline on the suggested model BIOMD0000000454 as described in "Example One" of the paper. These results provide evidence that the curiosity search is able to recover the correct unique goal state (x1=0.056, x2=0.769, x3=4.231), as expected.

      We also include a second sanity check on BIOMD0000000341, which models the dynamics of beta-cell mass, insulin and glucose. This model has two stable fixed points representing physiological (B=300, I=10, G=100) and pathological (B=0, I=0, G=600) steady states, which are the known ground truth steady states as described in Figure 3 of the "A Model of b-Cell Mass, Insulin, and Glucose Kinetics: Pathways to Diabetes" paper. Again, as expected, curiosity search is able to recover those two steady states (Figure S4-a).
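
      To make concrete why random search over initial conditions is sufficient for such simple systems, the following is a minimal sketch on a hypothetical one-dimensional bistable ODE, dx/dt = x(1 - x)(x - a). This toy equation is an assumption chosen for illustration; it is not BIOMD0000000341, and the function names are not taken from the notebook. It has stable fixed points at 0 and 1 separated by an unstable one at a, so randomly drawn initial conditions in [0, 1] should settle on one of the two attractors.

```python
import random


def simulate(x0, a=0.5, dt=0.01, t_end=200.0):
    """Integrate the toy ODE dx/dt = x (1 - x) (x - a) with forward Euler."""
    x = x0
    for _ in range(int(t_end / dt)):
        x += dt * x * (1.0 - x) * (x - a)
    return x


def random_search_steady_states(n_samples=50, seed=0):
    """Draw random initial conditions in [0, 1] and collect the rounded end states."""
    rng = random.Random(seed)
    return {round(simulate(rng.random()), 3) for _ in range(n_samples)}


found = random_search_steady_states()
print(sorted(found))
```

      With only two basins of comparable size, a few dozen random draws are enough to land in both, which is the sense in which such models leave little room for improvement over the baseline.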

      As stated in our previous answer, our main study focuses on more complex models that are not limited to one or a few attractors that can easily be discovered with random initial conditions. Regarding the mentioned BIOMD0000000454, what may have been confusing for the reviewer is that we did include it in our main study but, as specified in the caption of Table S4, unlike what is done in "Example One" of the original paper, we let the metabolite concentrations y1,...,y5 evolve in time (instead of holding them constant). When doing so, the resulting dynamics of the system are more complex and exhibit a spectrum of possible steady states (unknown a priori), in contrast with the previous case with a single steady state. In that case, the new attractors are not easy to find analytically and the proposed curiosity search becomes interesting, as it is able to uncover the distribution of possible steady states much more efficiently than a random search baseline, as shown in the new Figures S4-c and S4-d.

      We hope that these new results will address the reviewer’s concerns and provide evidence to the readers on the validity of the approach on simple networks.

      eLife assessment

      This important study develops a machine learning method to reveal hidden unknown functions and behavior in gene regulatory networks by searching parameter space in an efficient way. The evidence for some parts of the paper is still incomplete and needs systematic comparison to other methods and to the ground truth, but the work will be of broad interest to anyone working in biology of all stripes since the ideas reach beyond gene regulatory networks to revealing hidden functions in any complex system with many interacting parts.

      We thank the editors and reviewers for their positive assessment and constructive suggestions. In our response, we acknowledge the importance of systematic comparison to other methods and to the ground truth, when available. However we also emphasize the challenges associated with evaluating such methods in the context of uncovering hidden behaviors in complex biological networks as the ground truth is often unknown. We hope that our explanations will clarify the potential of our approach in advancing the exploration of these systems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: This paper suggests to apply intrinsically-motivated exploration for the discovery of robust goal states in gene regulatory networks.

      Strengths:

      The paper is well written. The biological motivation and the need for such methods are formulated extraordinarily well. The battery of experimental models is impressive.

      We thank the reviewer for sharing interest in the research problem and for recognizing the strengths of our work.

      Weaknesses:

      (1) The proposed method is compared to the random search. That says little about the performance with regard to the true steady-state goal sets. The latter could be calculated at least for a few simple ODE (e.g., BIOMD0000000454, `Metabolic Control Analysis: Rereading Reder'). The experiment with 'oscillator circuits' may not be directly interpolated to the other models.

      The lack of comparison to the ground truth goal set (attractors of ODE) from arbitrary initial conditions makes it hard to evaluate the true performance/contribution of the method. A part of the used models can be analyzed numerically using JAX, while there are models that can be analyzed analytically.

      "...The true versatility of the GRN is unknown and can only be inferred through empirical exploration and proxy metrics....": one could perform a sensitivity analysis of the ODEs, identifying stable equilibria. That could provide a proxy for the ground truth 'versatility'.

      We agree with the reviewer that one primary concern is to properly evaluate the effectiveness of the proposed method. However, as we move toward complex pathways, the “true” steady-state goal sets are often unknown, which is where machine learning methods such as the one we propose are particularly interesting (but challenging to evaluate).

      For simple models whose true steady-state distribution can be derived numerically and/or analytically, it is very likely that their exploration will be much simpler and this is not where a lot of improvement over random search may be found, which explains our focus on more complex models. While we agree that it is still interesting to evaluate exploration methods on these simple models for checking their behavior, it is not clear how to scale this analysis to the targeted more complex systems.

      For systems whose true steady state distribution cannot be derived analytically or numerically, we believe that random search is a pertinent baseline as it is commonly used in the literature to discover the attractors/trajectories of a biological network. For instance, Venkatachalapathy et al. [1] initialize stochastic simulations at multiple randomly sampled starting conditions (which is called a kinetic Monte Carlo-based method) to capture the steady states of a biological system. Similarly, Donzé et al. [29] use a Monte Carlo approach to compute the reachable set of a biological network «when the number of parameters  is large and their uncertain range  is not negligible». For the considered models, the true steady-state goal set is unknown, which is why we chose comparison with random search. We added a “Statistics” subsection in the Methods section providing additional details about the statistical analyses we perform between our method and the random search baseline.

      (2) The proposed method is based on 'Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning', which assumes state action trajectories [s_{t_0:t}, a_{t_0:t}] ('2.1 Notations and Assumptions' in the IMGEP paper). However, the models used in the current work do not include external control actions, but rather only the initial conditions can be set. It is not clear from the methods whether IMGEP was adapted to this setting, and how the exploration policy was designed w/o actual time-dependent actions. What does "...generates candidate intervention parameters to achieve the current goal...." mean, considering that interventions 'Sets the initial state...' as explained in Table 2?

      We thank the reviewer for asking for clarification, as indeed the IMGEP methodology originates from developmental robotics scenarios which generally focus on the problem of robotic sequential decision-making, therefore assuming state action trajectories as presented in Forestier et al. [65]. However, in both cases, note that the IMGEP is responsible for sampling parameters which then govern the exploration of the dynamical system. In Forestier et al. [65], the IMGEP also only sets one vector at the start (denoted ) which was specifying parameters of a movement (like the initial state of the GRN), which was then actually produced with dynamic motion primitives which are dynamical system equations similar to GRN ODEs, so the two systems are mathematically equivalent. More generally, while in our case the “intervention” of the IMGEP (denoted ) only controls the initial state of the GRN, future work could consider more advanced sequential interventions simply by setting parameters of an action policy  at the start which could be called during the GRN’s trajectory to sample control actions  where  would be the state of the GRN. In practice this would also require setting only one vector at the start, so it would remain the same exploration algorithm and only the space of parameters would change, which illustrates the generality of the approach.
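
      The point that the exploration algorithm only needs to set one parameter vector per rollout can be sketched as follows. This is a toy illustration under stated assumptions, not the implementation used in the paper: `outcome` is a hypothetical stand-in for simulating the system from an intervention and reading out its end state, interventions and outcomes are scalars in [0, 1], and the goal-achievement strategy (perturbing the intervention whose outcome was closest to the goal) is one simple choice among many.

```python
import random


def outcome(i):
    """Hypothetical system: maps an intervention i in [0, 1] to an outcome z in [0, 1]."""
    return i ** 8


def random_search(budget, rng):
    """Baseline: sample interventions uniformly and record (intervention, outcome) pairs."""
    return [(i, outcome(i)) for i in (rng.random() for _ in range(budget))]


def goal_exploration(budget, rng, n_bootstrap=10, sigma=0.05):
    """Goal-directed exploration: sample goals in outcome space, reuse the closest past attempt."""
    history = random_search(n_bootstrap, rng)  # bootstrap with a few random interventions
    while len(history) < budget:
        goal = rng.random()  # sample a goal uniformly in outcome space
        i_near, _ = min(history, key=lambda h: abs(h[1] - goal))  # closest past outcome
        i_new = min(1.0, max(0.0, i_near + rng.gauss(0.0, sigma)))  # perturb and clip
        history.append((i_new, outcome(i_new)))  # one vector set at the start of the rollout
    return history


rng = random.Random(0)
history = goal_exploration(budget=200, rng=rng)
baseline = random_search(budget=200, rng=random.Random(1))
```

      Replacing the scalar intervention with the parameters of an action policy would change only the space being sampled, not the loop itself.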

      (3) Fig 2 shows the phase space for (ERK, RKIPP_RP) without mentioning the typical full scale of ERK, RKIPP_RP. It is unclear whether the path from (0, 0) to (~0.575, ~3.75) at t=1000 is significant on the typical scale of this phase space.

      The purpose of Figure 2 is to illustrate an example of GRN trajectory in transcriptional space, and to illustrate what “interventions” and “perturbations” can be in that context. To that end we have used the fixed initial conditions provided in the BIOMD0000000647, replicating Figure 5 of Cho et al. [56].

      While we are not sure what the reviewer means by the “typical” scale of this phase space, we would like to point the reviewer toward Figure 8, which shows examples of paths that reach further points in the same phase space (up to ~10 in RKIPP_RP levels and ~300 in ERK levels). However, while the paths displayed in Figure 8 are possible (and were discovered with the IMGEP), note that they may be “rarer” to occur naturally, in the sense that a large portion of the initial conditions tested with random search tend to converge toward smaller (ERK, RKIPP_RP) steady-state values similar to the ones displayed in Figure 2.

      (4) Table 2:

      a. Where is 'effective intervention' used in the method?

      b. in my opinion 'controllability', 'trainability', and 'versatility' are different terms. If their correspondence is important I would suggest to extend/enhance the column "Proposed Isomorphism". otherwise, it may be confusing.

      a) We thank the reviewer for pointing out that “effective intervention” is not explicitly used in the method. The idea here is that as we are exploring a complex dynamical system (here the GRN), some of the sampled interventions will be particularly effective at revealing novel unseen outcomes whereas others will fail to produce a qualitative change to the distribution of discovered outcomes. What we show in this paper, for instance in Figure 3a and Figure 4, is that the IMGEP method is particularly sample-efficient in finding those “effective interventions”, at least more than a random exploration. However we agree that the term “effective intervention” is ambiguous (does not say effective in what) and we have replaced it with “salient intervention” in the revised version.

      b) We thank the reviewer for highlighting some confusing terms in our chosen vocabulary, and we have clarified those terms in the revised version. We agree that controllability/trainability and versatility are not exactly equivalent concepts, as controllability/trainability typically refers to the amount to which a system is externally controllable/trainable whereas versatility typically refers to the inherent adaptability or diversity of behaviors that a system can exhibit in response to inputs or conditions. However, they are both measuring the extent of states that can be reached by the system under a distribution of stimuli/conditions, whether natural conditions or engineered ones, which is why we believe that their correspondence is relevant.

      I don't see how this table generalizes "concepts from dynamical complex systems and behavioral sciences under a common navigation task perspective".

      We have replaced the verb “generalize” with “investigate” in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Etcheverry et al. present two computational frameworks for exploring the functional capabilities of gene regulatory networks (GRNs). The first is a framework based on intrinsically-motivated exploration, here used to reveal the set of steady states achievable by a given gene regulatory network as a function of initial conditions. The second is a behaviorist framework, here used to assess the robustness of steady states to dynamical perturbations experienced along typical trajectories to those steady states. In Figs. 1-5, the authors convincingly show how these frameworks can explore and quantify the diversity of behaviors that can be displayed by GRNs. In Figs. 6-9, the authors present applications of their framework to the analysis and control of GRNs, but the support presented for their case studies is often incomplete.

      Strengths:

      Overall, the paper presents an important development for exploring and understanding GRNs/dynamical systems broadly, with solid evidence supporting the first half of their paper in a narratively clear way.

      The behaviorist point of view for robustness is potentially of interest to a broad community, and to my knowledge introduces novel considerations for defining robustness in the GRN context.

      We thank the reviewer for recognizing the strengths and novelty of the proposed experimental framework for exploring and understanding GRNs, and complex dynamical systems more generally. We agree that the results presented in the section “Possible Reuses of the Behavioral Catalog and Framework” (Fig 6-9) can be seen as incomplete along certain aspects, which we tried to make as explicit as possible throughout the paper, and why we explicitly state that these are “preliminary experiments”. Despite the discussed limitations, we believe that these experiments are still very useful to illustrate the variety of potential use-cases in which the community could benefit from such computational methods and experimental framework, and build on for future work.

      Some specific weaknesses, mostly concerning incomplete analyses in the second half of the paper:

      (1) The analysis presented in Fig. 6 is exciting but preliminary. Are there other appropriate methods for constructing energy landscapes from dynamical trajectories in gene regulatory networks? How do the results in this particular case study compare to other GRNs studied in the paper?

      We are not aware of other methods than the one proposed by Venkatachalapathy et al. [1] for constructing an energy landscape given an input set of recorded dynamical trajectories, although it might indeed be the case. We want to emphasize that any of such methods would anyway depend on the input set of trajectories, and should therefore benefit from a set that is more representative of the diversity of behaviors that can be achieved by the GRN, which is why we believe the results presented in Figure 6 are interesting. As the IMGEP was able to find a higher diversity of reachable goal states (and corresponding trajectories) for many of the studied GRNs, we believe that similar effects should be observable when constructing the energy landscapes for these GRN models, with the discovery of additional or wider “valleys” of reachable steady states.

      Additionally, it is unclear whether the analysis presented in Fig. 6C is appropriate. In particular, if the pseudopotential landscapes are constructed from statistics of visited states along trajectories to the steady state, then the trajectories derived from dynamical perturbations do not only reflect the underlying pseudo-landscape of the GRN. Instead, they also include contributions from the perturbations themselves.

      We agree that the landscape displayed in Fig. 6C integrates contributions from the perturbations on the GRN’s behavior, and that these can shape the landscape in various ways, for instance affecting the paths that are accessible, the shape/depth of certain valleys, etc. But we believe that qualitatively or quantitatively analyzing the effect of these perturbations on the landscape is precisely what is interesting here: it might help 1) understand how a system responds to a range of perturbations and visualize which behaviors are robust to those perturbations, and 2) design better strategies for manipulating those systems to produce certain behaviors.

      (2) In Fig. 7, I'm not sure how much is possible to take away from the results as given here, as they depend sensitively on the cohort of 432 (GRN, Z) pairs used. The comparison against random networks is well-motivated. However, as the authors note, comparison between organismal categories is more difficult due to low sample size; for instance, the "plant" and "slime mold" categories each only have 1 associated GRN. Additionally, the "n/a" category is difficult to interpret.

      We acknowledge that this part is speculative as stated in the paper: “the surveyed database is relatively small with respect to the wealth of available models and biological pathways, so we can hardly claim that these results represent the true distribution of competencies across these organism categories”. However, when further data is available, the same methodology can be reused and we believe that the resulting statistical analyses could be very informative to compare organismal (or other) categories.

      (3) In Fig. 8, it is unclear whether the behavioral catalog generated is important to the intervention design problem of moving a system from one attractor basin to another. The authors note that evolutionary searches or SGD could also be used to solve the problem. Is the analysis somehow enabled by the behavioral catalog in a way that is complementary to those methods? If not, comparison against those methods (or others e.g. optimal control) would strengthen the paper.

      We thank the reviewer for asking to clarify this point, which might not be clearly explained in the paper. Here the behavioral catalog is indeed used in a complementary way to the optimization method, by identifying a representative set of reachable attractors which are then used to define the optimization problem. For instance here, thanks to the catalog, we 1) were able to identify a “disease” region and several possible reachable states in that region and 2) used several of these states as starting points of our optimization problem, where we want to find a single intervention that can successfully and robustly reset all those points, as illustrated in Figure 8. Please note that given this problem formulation, a simple random search was used as an optimization strategy. When we mention more advanced techniques such as EA or SGD, it is to say that they might be more efficient optimizers than random search. However, we agree that in many cases optimizing directly will not work when starting from a random or poor initial guess, even with EA or SGD. In that case the discovered behavioral catalog can be useful to better initialize this local search and make it more efficient/useful, akin to what is done in Figure 9.

      (4) The analysis presented in Fig. 9 also is preliminary. The authors note that there exist many algorithms for choosing/identifying the parameter values of a dynamical system that give rise to a desired time-series. It would be a stronger result to compare their approach to more sophisticated methods, as opposed to random search and SGD. Other options from the recent literature include Bayesian techniques, sparse nonlinear regression techniques (e.g. SINDy), and evolutionary searches. The authors note that some methods require fine-tuning in order to be successful, but even so, it would be good to know the degree of fine-tuning which is necessary compared to their method.

      We agree that the analysis presented in Figure 9 is preliminary, and thank the reviewer for the suggestion. We would first like to refer to other papers from the ML literature that have more thoroughly analyzed this issue, such as Colas et al. [74] and Pugh et al. [34], and shown the interest of diversity-driven strategies as promising alternatives. Additionally, as suggested by the reviewer, we added a comparison to the CMA-ES algorithm in the revised version in order to complete our analysis. CMA-ES is an evolutionary algorithm that self-adapts its optimization steps and is known to be better suited than SGD to escaping local minima when the number of parameters is not too high (here we only have 15 parameters). However, our results showed that while CMA-ES explores the solution space more than SGD does at the beginning of optimization, it also ultimately converges to a local minimum, similarly to SGD. The best solution converges toward a constant signal (of the target b) but fails to maintain the target oscillations, similar to the solutions discovered by gradient descent. We tried this for a few hyperparameters (init mean and std) but always found similar results. We have updated the Figure 9 image and caption, as well as the descriptive text, to include these new results in the revised version. We also added a reference to the CMA-ES paper in the citations.

      Reviewer #1 (Recommendations For The Authors):

      I would suggest conducting a more rigorous analysis of the performance by estimating/approximating the ground truth robust goal sets in important GRNs.

      Also, the use of terminology from different disciplines can be improved. Please see my comments above. Specifically, the connection between controllability in dynamical control systems and versatility used in this paper is unclear.

      We hope to have addressed the reviewer's concerns in our previous answers.

      Reviewer #2 (Recommendations For The Authors):

      Fig 4b: I'm not sure if DBSCAN is the appropriate method to use here, as the visual focus on the core elements of the clusters downplays the full convex hull of the points that random sampling achieves in Z space. An analysis based on convex hulls or the ball-coverage from Fig. 3b would presumably generate plots that were more similar between random sampling and curiosity search. If the goal is to highlight redundancy/non-linearity in the mapping between Z and I, another approach might be to simply bin Z-space in a grid, or to use a clustering algorithm that is less stringent about core/noise distinctions.

      We thank the reviewer for the suggestion. This plot is intended to give the reader an understanding of why a method that uniformly samples goals in Z (what the IMGEP is doing) is more efficient than a method that uniformly samples parameters in I (what the random search is doing) in systems for which there is high redundancy/non-linearity in the mapping between I and Z. We agree that binning the Z-space in a grid and counting the number of achieved bins is a way to quantitatively measure this, which is in fact very close to what we do in Figure 3 for measuring the achieved diversity. We believe however that the clustering and coloring provide additional intuition on why this is the case: they illustrate that large regions of the intervention space map to small regions in the outcome space and vice versa.
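
      The grid-based measure mentioned above can be sketched in a few lines. This is an illustrative version under assumed conventions (outcomes rescaled to the unit hypercube, a fixed number of bins per dimension); the function name and defaults are hypothetical and the exact diversity measure used for Figure 3 may differ.

```python
def occupied_bins(points, n_bins=10, low=0.0, high=1.0):
    """Count the grid cells of [low, high]^d that contain at least one point."""
    width = (high - low) / n_bins
    cells = set()
    for p in points:
        # Map each coordinate to a bin index, clipping points on or outside the boundary.
        cell = tuple(min(n_bins - 1, max(0, int((x - low) / width))) for x in p)
        cells.add(cell)
    return len(cells)


# Ten outcomes packed into one corner versus ten outcomes spread along the diagonal.
clustered = [(0.01 * k, 0.01 * k) for k in range(10)]
spread = [(0.1 * k + 0.05, 0.1 * k + 0.05) for k in range(10)]

print(occupied_bins(clustered), occupied_bins(spread))
```

      Two sets with the same number of samples can thus receive very different scores, which is what makes the count a measure of diversity rather than of sampling effort.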

      Additional changes in the revised version:

      We added a sentence in the Methods section as well as in the caption of Table S1 providing additional details about the way we simulate the biological models from the BioModels website.

      We fixed an incorrect reference to Figure 4 in the Methods “Sensitivity measure” subsection, which now refers to Figure 5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study details an enrichment of the IL-6 signaling pathway in human tendinopathy and applies transcriptional profiling to an advanced in vitro model to test IL-6 specific phenotypes in tendinopathy. Overall, the strength of evidence is solid yet incomplete, as transcriptomic measurements provide clarity, though functional studies including analysis of proliferation are needed to confirm these findings. This work will be of interest to stem cell biologists and immunologists.

      To functionally assess the effect of IL-6 on Scx+ fibroblast proliferation in an acute injury, we repeated the in vivo studies with an EdU staining and a newly established IL-6 KO x ScxGFP+ mouse line. We found no evidence for this effect in acute injuries and acknowledge this in the revised manuscript.

      We further added data collected by combining fluorescence microscopy with human patient-derived tissue to strengthen the link between IL-6, IL-6R, and proliferation of CD90+ cells in chronic injuries.

      See comment 1.1.

      See comment 2.4.

      Changes:

      - Title

      - Abstract

      - Figure 2 and 3 (new data)

      - Figure 7 (new data)

      - Results

      - Discussion

      Reviewer 1

      (1.1) First, the experimental approach does not directly assess proliferation, as such the conclusions regarding proliferation are not well supported. In the ex-vivo model, the use of cell counting approaches is somewhat acceptable since the system is constrained by the absence of potential influx of new cells. However, given the nearly unlimited supply of extrinsically derived cells in vivo (vs. the explant model), assessment of actual proliferation (e.g. Edu, BrdU, Ki67) is critical to support this conclusion.

      To assess the effect of IL-6 on Scx+ fibroblast proliferation in an acute injury, we repeated the in vivo studies with an EdU staining and a newly established IL-6 KO x ScxGFP+ mouse line to combat the considerable background noise of currently available Scx antibodies.

      Under the improved design of these experiments, we could detect no effect of IL-6 on ScxGFP+ cells in an acute injury in vivo. We have therefore replaced figure 5 with the new results in figure 7 and moved figure 5F to the supplementary materials (Supplementary figure 9).

      We acknowledge and discuss this in the discussion section.

      See comment 2.4.

      See comment 2.11.

      Changes:

      - Title

      - Abstract

      - Figure 7 (new data)

      - Supplementary Figure 9

      - Results

      - Discussion

      (1.2) Second, the justification for the use of Scx-GFP+ cells as a progenitor population is not well supported. Indeed, in the discussion, Scx+ cells are treated as though they are uniformly a progenitor population, when the diversity of this population has been established by the cited studies, which do not suggest that these are progenitor populations. Additional definition/ delineation of these cells to identify the subset of these cells that may actually display other putative progenitor markers would support the conclusions. As it stands, the study currently provides important information on the impact of IL6 on Scx+ cells, but not tendon progenitors.

      We further delineated the extrinsic cell populations isolated from mouse Achilles tendons of ScxGFP+ mice using flow cytometric analysis and RT-qPCR. We used tendon population markers suggested by sc-RNA-seq of mouse Achilles tendons.

      (De Micheli et al., Am. J. Physiol. - Cell Physiol., 2020, 319(5), DOI: 10.1152/ajpcell.00372.2020)

      While a small subpopulation of these cells expressed typical progenitor markers (i.e. CD45 and CD146), we could detect no overlap with Scx+ cells. As suggested by the reviewer, we therefore replaced occurrences of “progenitor” in the manuscript with “fibroblast” and performed additional experiments with human patient-derived tissue sections and the fibroblast marker CD90.

      See comment 2.1.

      Changes:

      - Title

      - Abstract

      - Figure 2 (new data)

      - Figure 3 (new data)

      - Supplementary Figure 6 (new data)

      - Results

      - Discussion

      (1.3) Clarity regarding the relevance of the 'sheath-like' component of the assembloid would provide helpful context regarding which types of tendons are likely to have this type of communication vs. those that do not, and if there are differences in tendinopathy prevalence. Understanding why/how this communication between structures is relevant is important.

      Our assembloid concept is inspired by the structure of unsheathed tendons (i.e. biceps, semitendinosus, gracilis) and not sheathed tendons like the flexor tendons.

      We agree that clarity regarding the tendon type having this type of communication is important, so we sharpened previously blurry text passages in the revised manuscript.

      Text changes:

      - Introduction, page 3

      - Results, page 4

      - Results, page 8

      - Results, page 9

      - Results, page 11

      - Discussion, page 25

      - Discussion, page 26

      - Experimental section, page 28

      - Figure 1

      - Figure 2

      - Figure 3

      - Supplementary Table 1

      - Supplementary Figure 3

      - Supplementary Figure 4

      (1.4) Minor: in the text for Figure 6 (2nd paragraph), the comma in 19,694 is superscripted.

      Corrections were made throughout the manuscript.

      Text changes:

      - Results, page 4

      - Results, page 12

      - Results, page 19

      - Results, page 21

      (1.5) Minor: The inclusion of the Scx-GFP mouse should be included in the schematic Figure 5.

      The results presented in the previous draft did not feature tissues from ScxGFP mice but used a Scx-antibody to visually detect Scx+ cells. In anticipation of the revision process, we bred a new IL-6 KO x ScxGFP+ mouse line and repeated the experiment. As suggested by the reviewer, the new schematic figure 7 as well as the former figure 5 moved to the supplementary material now includes this mouse.

      Figure changes:

      - Supplementary Figure 9 (former figure 5)

      - Figure 7

      Reviewer 2

      (2.1) One question that comes to mind is whether the fibroblast progenitors in the extrinsic sheath of Achilles tendon is similar to those surrounding the tail tendon. The similarity of progenitors between different tendons is assumed with this model. I would consider this to be a minor issue.

      Tail tendon fascicles are thought to have a low number of reparative fibroblasts / progenitor cells because they lack a developed extrinsic compartment. Achilles tendons are supposed to have a higher number of reparative fibroblasts / progenitor cells, as their fascicles are surrounded by an extrinsic compartment.

      To verify this here, we added a better characterization and comparison of the cell populations isolated from the tail tendon fascicles and the Achilles tendons.

      First, we added representative light microscopy images of these cells at different timepoints after being cultured on tissue-culture plastic.

      Second, we performed flow cytometric analysis not only on the freshly digested tail tendon fascicles and Achilles tendons, but also on the cultured cells at the timepoint when they would have been embedded into the assembloids.

      Third, we compared the expression of population-specific markers in cells derived from tail tendon fascicle and Achilles tendons.

      As expected, tail tendon fascicle-derived cell populations appeared to be more elongated than Achilles tendon-derived populations shortly after isolation. Similarly, the “maintenance” fibroblasts in healthy tendons are more elongated than the reparative fibroblasts in diseased ones. After culture and priming in tendinopathic niche conditions, both populations assumed a more roundish, reparative phenotype.

      This was consistent with the flow cytometric analysis, which revealed a large difference between freshly isolated populations, that disappeared after extended culture and priming in tendinopathic niche conditions. Gene expression in tail tendon fascicle-derived and Achilles tendon-derived cells was similar after extended culture and priming in tendinopathic niche conditions.

      See comment 1.2.

      See comment 2.10.

      Changes:

      - Supplementary Figure 6 (new data)

      - Results, page 11

      (2.2) The authors use core tendons from IL-6 knockout mice and progenitors from wild-type mice. The reasoning behind this approach was a little confusing... is IL-6 expressed solely in the tendon core compared to the extrinsic sheath?

      Insights gained from human patient-derived tissues (Figure 2) suggest that in healthy tendons, most of the IL-6 is located in the extrinsic compartment, whereas in tendinopathic tendons it is distributed across compartments.

      Our assembloid design mimics this by embedding wildtype fibroblasts into the extrinsic compartment. Our hypothesis was that a wildtype core in tendinopathic niche conditions attracts reparative fibroblasts through IL-6, while an IL-6 knock-out core does not. Therefore, it was important to establish IL-6 gradients close to what they seem to be in vivo.

      Nevertheless, we have to acknowledge that the amount of IL-6 secreted by extrinsic fibroblasts in isolation is quite small compared to what is secreted by a wildtype core (Supplementary Figure 7). Attributing IL-6 in the supernatant of a WT core // WT fibroblast assembloid to the correct cell population is challenging but could be part of future research.  

      Changes:

      - Figure 2 (new data)

      - Supplementary Figure 7 (new data)

      - Results, page 12

      (2.3) Is a co-culture system for 7 days appropriate to model tendinopathy without the supplementation of exogenous inflammatory compounds? The transcriptomic differences in Figure 3 seem to be subtle, and may perhaps suggest that it could be a model that more closely resembles steady state compared to tendinopathy. If so, is IL-6 still relevant during steady state?

      The collective experience in our lab is that core explants exposed to tendinopathic niche conditions (i.e. serum, 37°C, high oxygen, and high glucose levels) assume a disease-like phenotype (e.g. Wunderli et al., Matrix Biology, 2020, Volume 89 https://doi.org/10.1016/j.matbio.2019.12.003 and Blache et al., Sci. Rep., 2021, 11(1), DOI 10.1038/s41598-021-85331-1).

      Specifically for our core // fibroblast co-culture system, we have reported the emergence of exaggerated tendinopathic hallmarks in a previous publication (Stauber et al., Adv. Healthc. Mater., 2021, 10(20), https://doi.org/10.1002/adhm.202100741).

      We clarified the use of previously validated tendinopathic niche conditions in this manuscript.

      Changes:

      - Introduction, page 3

      - Results, page 12

      (2.4) The results presented in Figures 4 and 5 are impressive, demonstrating a link between IL-6 and fibroblast progenitor numbers and migration. Their experimental design in these figures show strong evidence, using Tocilizumab and recombinant IL-6 to rescue shown phenotypes. I would reduce the claims on proliferation, however, unless a proliferation-specific marker (e.g., Ki67, BrdU, EdU) is included in confocal analyses of Scx+ progenitors.

      As reviewer 1 pointed out as well, it is important to use a proliferation-specific marker “given the nearly unlimited supply of extrinsically derived cells in vivo (vs. the explant model)”.

      To assess the effect of IL-6 on Scx+ fibroblast proliferation in vivo, we repeated those experiments with a proliferation-specific EdU staining and a newly established IL-6 KO x ScxGFP+ mouse line.

      Under this improved design, we could not detect an effect of IL-6 on proliferation in an acute injury in vivo.

      We have therefore replaced figure 5 with the new results in figure 7 and moved figure 5F to the supplementary materials (Supplementary figure 9).

      We acknowledge and discuss this in the discussion section and softened our statements in the title and the abstract.

      See comment 1.1.

      See comment 2.11.

      Changes:

      - Title

      - Abstract

      - Figure 7 (new data)

      - Supplementary Figure 9

      - Results

      - Discussion

      (2.5) I think it would significantly strengthen the study if they could measure tendon healing in IL-6 knockouts or in wild-type mice treated with IL-6 inhibitors, since conventional ablation of IL-6 may lead to the elevation of compensatory IL-6 superfamily ligands that could activate STAT signaling. The authors claim that reducing IL-6 signaling decreases transcriptomic signatures of tendinopathy, but IL-6 may be necessary to promote normal healing of the tendon following injury. It is supposed that a lack of Scx+ progenitor migration would delay tendon healing.

      Indeed, another study using the same IL-6 knock-out strain showed that a lack of IL-6 signaling resulted in slightly inferior mechanical properties in healing patellar tendons (Lin et al., J. Biomech., 39(1), 2006 https://doi.org/10.1016/j.jbiomech.2004.11.009).

      Also, it might be due to the elevation of compensatory IL-6 superfamily ligands that we found no effect of IL-6 on the proliferation of Scx+ cells in an acute injury in vivo.

      Therefore, assessing the effects of IL-6 inhibitors on tendon healing following an acute injury would have been of great interest to us. Unfortunately, getting the necessary permission from the animal experimentation office for a new invasive treatment protocol was outside of our scope due to the severity degree and time limitations.

      We incorporated and acknowledged these important points in the discussion.

      Text changes:

      - Introduction, page 3

      - Discussion, page 26

      (2.6) Do IL-6 knockout mice and/or mice treated with IL-6 inhibitors have delayed healing following Achilles tendon resection? Please provide experimental evidence.

      See comment 2.5.

      (2.7) I would suggest reducing claims on proliferation, or include a proliferation specific marker (e.g., Ki67, BrdU, EdU) in confocal analyses of Scx+ progenitors.

      See comment 1.1.

      See comment 2.4.

      (2.8) Supplementary Figures 1 and 2: the authors removed outliers. Please specify exactly which outliers were removed in the figures, and provide additional information on the criteria used to identify these outliers.

      To address this comment, we sharpened our criteria for identifying outliers and re-did the analysis depicted in figure 1.

      Briefly, we excluded 5 normal and 5 tendinopathic samples from sheathed tendons, which have a different compartmental structure than unsheathed tendons.

      A completely separate analysis of the sheathed tendons would have been beyond the scope of this manuscript, but early screening suggested that IL-6 transcripts are not increased in sheathed tendinopathic tendons.

      We made text changes throughout the manuscript and to the supplementary table 1 and supplementary figure 2 to clearly state our criteria for excluding samples / outliers.

      Changes:

      - Introduction, page 3

      - Results, page 4

      - Results, page 8

      - Results, page 9

      - Results, page 11

      - Discussion, page 25

      - Discussion, page 26

      - Experimental section, page 28

      - Figure 1

      - Figure 2

      - Figure 3

      - Supplementary table 1

      - Supplementary figure 2

      - Supplementary figure 3

      - Supplementary figure 4

      (2.9) Whenever "positive enrichment" is mentioned in the text, please specify in what group. It is presumed that the enrichment, for example, in the first figure is associated with tendinopathy samples compared to controls, though it is a bit unclear.

      The direction of the enrichment was added to the text.

      Text changes:

      - Abstract, page 1

      - Introduction, page 3

      - Results, page 4

      - Results, page 6

      - Results, page 12

      - Results, page 14

      - Results, page 19

      - Results, page 21

      - Discussion, page 25

      - Discussion, page 26

      - Discussion, page 27

      - Figure 1

      - Figure 5

      - Figure 8

      - Figure 9

      - Supplementary figure 3

      - Supplementary figure 4

      - Supplementary figure 6

      - Supplementary figure 8

      - Supplementary figure 11

      - Supplementary figure 12

      - Supplementary figure 14

      (2.10) Are tail tendon progenitors similar to Achilles tendon progenitors? Please provide a statement that shows similarity (in function, transcriptome, etc.) to support the in vitro tendon model.

      See comment 1.2.

      See comment 2.1.

      (2.11) Are the results in Figure 5F significant? It seems that your pictures show a dramatic change in migration, but the quantification does not?

      We repeated the in vivo studies with a newly established IL-6 KO x ScxGFP+ mouse line to combat the considerable background noise of currently available Scx antibodies.

      Under the improved design of these experiments, we could not detect an effect of IL-6 on ScxGFP+ cell migration in an acute injury in vivo.

      We have therefore replaced figure 5 with the new results in figure 7 and moved figure 5F to the supplementary materials (Supplementary figure 9).

      We acknowledge and discuss this in the discussion section.

      See comment 1.1.

      See comment 2.4.

      Changes:

      - Title

      - Abstract

      - Figure 7 (new data)

      - Supplementary Figure 9

      - Results

      - Discussion

      (2.12) Please provide additional discussion points on cis- versus trans-IL6 signaling in your results found in mouse. Do you think researchers/clinicians would want to target trans-IL6 signaling based on your results? Please support these statements with the expression of IL6R on cells found in the tendon core and external sheath progenitors.

      To address this comment, we performed flow cytometric analysis on Achilles tendon-derived fibroblasts expanded in 2D and digested sub-compartments of the assembloids (Supplementary Figure 7).

      These data suggest that IL6R is neither expressed by core nor extrinsic fibroblasts, but mainly comes from core-resident CD45+ tenophages.

      Human samples co-stained for IL6R and CD68 (an established human macrophage marker) confirmed macrophages as a source of IL-6R in vivo. However, human samples co-stained for IL6R and CD90 (an established marker of reparative fibroblasts in humans) also detected IL6R on CD90+ cells, which have not yet been reported to express IL6R themselves.

      Overall, it is likely that trans-IL-6 signaling is more important for the activation of reparative fibroblasts than cis-IL-6 signaling. We added these statements to the manuscript.

      Changes:

      - Results, page 9

      - Results, page 12

      - Discussion, page 25

      - Discussion, page 26

      - Figure 3 (new data)

      - Supplementary figure 7 (new data)

      (2.13) Please provide more detail on collagen isolation from rat tail in the methods section.

      We provided more details on collagen isolation from rat tail in the experimental section (page 29).

      Changes:

      - Experimental section, page 29

      (2.14) Please comment on whether your in vitro system resembles tendinopathy or a steady state tendon. If it models more of a steady state system, would IL-6 still be relevant?

      See comment 2.3.

      Detailed feedback:

      Reviewer 1:

      This work by Stauber et al. is focused on understanding the signaling mechanisms that are associated with tendinopathy development, and by screening a panel of human tendinopathy samples, identified IL-6/JAK/STAT as a potential mediator of this pathology. Using an innovative explant model they delineated the requirement for IL-6 in the main body of the tendon to alter the dynamics of cells in the peritendinous synovial sheath space.

      The use of a publicly available existing dataset is considered a strength since this dataset includes expression data from several different human tendons experiencing tendinopathy. This facilitates the identification of potentially conserved regulators of the tendinopathy phenotype.

      The clear transcriptional shifts between WT and IL6-/- cores demonstrate the utility of the assembloid model, and support the importance of IL6 in potentiating the cell response to these stimuli.

      Reviewer 2:

      The authors of this study describe a goal of elucidating the signaling pathways that are upregulated in tendinopathy in order to target these pathways for effective treatments. Their goal is honorable, as tendinopathy is a common debilitating condition with limited treatments. The authors find that IL-6 signaling is upregulated in human tendinopathy samples with transcriptomic and GSEA analyses. The evidence for their initial findings is strong, providing a clinically-relevant phenotype that can be further studied using animal models.

      Along these lines, the authors continue with an advanced in vitro system using the mouse tail tendon as the core with progenitors isolated from the Achilles tendon as the external sheath embedded in a hydrogel matrix. One question that comes to mind is whether the fibroblast progenitors in the extrinsic sheath of Achilles tendon is similar to those surrounding the tail tendon. The similarity of progenitors between different tendons is assumed with this model. I would consider this to be a minor issue, and would consider the in vitro system to be an additional strength of this study.

      In order to address the IL-6 signaling pathway, the authors use core tendons from IL-6 knockout mice and progenitors from wild-type mice. The reasoning behind this approach was a little confusing... is IL-6 expressed solely in the tendon core compared to the extrinsic sheath? Furthermore, is a co-culture system for 7 days appropriate to model tendinopathy without the supplementation of exogenous inflammatory compounds? The transcriptomic differences in Figure 3 seem to be subtle, and may perhaps suggest that it could be a model that more closely resembles steady state compared to tendinopathy. If so, is IL-6 still relevant during steady state?

      Nevertheless, the results presented in Figures 4 and 5 are impressive, demonstrating a link between IL-6 and fibroblast progenitor numbers and migration. Their experimental design in these figures show strong evidence, using Tocilizumab and recombinant IL-6 to rescue shown phenotypes. I would reduce the claims on proliferation, however, unless a proliferation-specific marker (e.g., Ki67, BrdU, EdU) is included in confocal analyses of Scx+ progenitors. The Achilles tendon injury model provides a nice in vivo confirmation of Scx-progenitor migration to the neotendon.

      Given their goal to elucidate signaling pathways that could be targeted in the clinic, I think it would significantly strengthen the study if they could measure tendon healing in IL-6 knockouts or in wild-type mice treated with IL-6 inhibitors, since conventional ablation of IL-6 may lead to the elevation of compensatory IL-6 superfamily ligands that could activate STAT signaling. The authors claim that reducing IL-6 signaling decreases transcriptomic signatures of tendinopathy, but IL-6 may be necessary to promote normal healing of the tendon following injury. It is supposed that a lack of Scx+ progenitor migration would delay tendon healing.

      Overall, the authors of this study elucidated IL-6 signaling in tendinopathy and provided a strong level of evidence to support their conclusions at the transcriptomic level. However, functional studies are needed to confirm these phenotypes and fully support their aims and conclusions. With these additional studies, this work has the potential to significantly influence treatments for those suffering from tendinopathy.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Lee et al. compared encoding of odor identity and value by calcium signaling from neurons in the ventral pallidum (VP) in comparison to D1 and D2 neurons in the olfactory tubercle (OT).

      Strengths:

      They utilize a strong comparative approach, which allows the comparison of signals in two directly connected regions. First, they demonstrate that both D1 and D2 OT neurons project strongly to the VP, but not the VTA or other examined regions, in contrast to accumbal D1 neurons which project strongly to the VTA as well as the VP. They examine single unit calcium activity in a robust olfactory cue conditioning paradigm that allows them to differentiate encoding of olfactory identity versus value, by incorporating two different sucrose, neutral and air puff cues with different chemical characteristics. They then use multiple analytical approaches to demonstrate strong, low-dimensional encoding of cue value in the VP, and more robust, high-dimensional encoding of odor identity by both D1 and D2 OT neurons, though D1 OT neurons are still somewhat modulated by reward contingency/value. Finally, they utilize a modified conditioning paradigm that dissociates reward probability and lick vigor to demonstrate that VP encoding of cue value is not dependent on encoding of lick vigor during sucrose cues, and that separable populations of VP neurons encode cue value/sucrose probability and lick vigor. Direct comparisons of single unit responses between the two regions now utilize linear mixed effects models with random effects for subject,

      Weaknesses:

      The manuscript still includes mention of differences in effect size or differing "levels" of significance between VP and OT D1 neurons without reports of direct comparisons between the two populations. This is somewhat mitigated by the comprehensive statistical reporting in the supplemental information, but interpretation of some of these results is clouded by the inclusion of OT D2 neurons in these analyses, and the limited description or contextualization in the main text.

      We think the reviewer is mistaken and have clarified the text.  Each pairwise comparison between VP, OTD1 and OTD2, for each odor across days is shown as a heatmap in supplementary figure 8B, with further details in table 37. Absolute diff 3H no statistics

      Reviewer #2 (Public Review):

      We appreciate the authors' revision of this manuscript and toning down some of the statements regarding "contradictory" results. We still have some concerns about the major claims of this paper which lead us to suggest this paper undergo more revision as follows since, in its present form, we fear this paper is misleading for the field in two areas. Here is a brief outline:

      (1) Despite acknowledging that the injections only occurred in the anteromedial aspect of the tubercle, the authors still assert broad conclusions regarding where the tubercle projects and what the tubercle does. For instance, even the abstract states "both D1 and D2 neurons of the OT project primarily to the VP and minimally elsewhere" without mention that this is the "anteromedial OT". Every conclusion needs to specify this is stemming from evidence in just the anteromedial tubercle, as the authors do in some parts of the discussion.

      We have clarified in multiple locations, including the abstract, that we recorded from the anteromedial OT, and further clarified this in the conclusions throughout the results and discussion. We refrain from stating “anteromedial OT” at every mention of the OT, but think we have now made it abundantly clear that our observations are from the anteromedial OT. It is worth noting that retrograde tracing from the VTA did not label any neuron in any part of the OT, suggesting that the conclusion may well extend beyond the anteromedial portion. We acknowledge, though, that further work is needed to comprehensively characterize the OT outputs.

      (2) The authors now frame the 2P imaging data that D1 neuron activity reflects "increased contrast of identity or an intermediate and multiplexed encoding of valence and identity". I struggle to understand what the authors are actually concluding here. Later in discussion, the authors state that they saw that OT D1 and D2 neurons "encode odor valence" (line 510). 

      The point we aim to make is that valence encoding is different between the OT and VP. We do not think the reward modulated activity in OT is valence encoding, at least not as it is in the VP. We do observe some valence encoding at the population level, which is different from individual valence encoding neurons. The ability of classifiers to segregate population activity based on reward might be considered valence encoding, but we contrast it with that in VP where individual neurons signal reward prediction. This is more robust than that in the OT data where few neurons robustly encode valence. The increased response of the OTD1 neurons after reward association is more consistent with contrast enhancement than valence encoding. We believe this distinction is important and reflects a transformation between two reward-related brain areas. For clarification of the sentence in question we have changed it to reflects “increased contrast of identity or an intermediate encoding of valence that also encodes identity.” (line 488)

      We appreciate the authors note that there is "poor standardization" when it comes to defining valence (line 521). We are ok with the authors speculating and think this revision is more forthcoming regarding the results and better caveats the conclusions. I suggest in the abstract the authors adjust line 14/15 to conclude that, "While D1 OT neurons showed larger responses to rewarded odors, in line with prior work, we propose this might be interpreted as identity encoding with enhanced contrast." [eliminating "rather than valence encoding" since that is a speculation best reserved for discussion as the authors nicely do].

      We accept this suggestion and have modified the abstract sentence to say, “Though D1 OT neurons showed larger responses to rewarded odors than other odors, consistent with prior findings, we interpret this as identity encoding with enhanced contrast.” We believe this is appropriately qualified as an interpretation, and should not be confusing.

      The above items stated, one issue comes to mind, and that is, why of all reasons would the authors find that the anteromedial aspect of the tubercle is not greatly reflecting valence. The anteromedial aspect of the tubercle, over all other aspects of the tubercle, is thought by many to more greatly partake in valence and other hedonic-driven behaviors given its dense reception of VTA DAergic fibers (as shown by Ikemoto, Kelsch, Zhang, and others). So this finding is paradoxical in contrast to if the authors had studied the anterolateral tubercle or posterior lateral tubercle, which get less DA input.

      We agree that this seems surprising. This is why we focused on the anteromedial OT, expecting to find valence encoding. It remains possible that other parts of the OT, or more dorsal aspects of the anteromedial OT, encode valence, as has been reported by Murthy and colleagues. However, it remains unclear if their recordings are in the OT or VP. Nonetheless, our findings indicate that more work is required to understand the contribution of the OT to valence encoding. It is also important to note that our conclusions are drawn in comparison to the VP, which has more robust valence encoding than the OT. Thus, in comparison, the OT sample in our recordings lacks robust valence signaling. We think this comparison is important, due to the lack of a clear framework for defining valence that may create misleading statements in past OT work.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript describes a study of the olfactory tubercle in the context of reward representation in the brain. The authors do so by studying the responses of OT neurons to odors with various reward contingencies and compare systematically to the ventral pallidum. Through careful tracing, they present convincing anatomical evidence that the projection from the olfactory tubercle is restricted to the lateral portion of the ventral pallidum.

      Using a clever behavioral paradigm, the authors then investigate how D1 receptor- vs. D2 receptor-expressing neurons of the OT respond to odors as mice learn different contingencies. The authors find that, while the D1-expressing OT neurons are modulated marginally more by the rewarded odor than the D2-expressing OT neurons as mice learn the contingencies, this modulation is significantly less than is observed for the ventral pallidum. In addition, neither of the OT neuron classes shows conspicuous amount of modulation by the reward itself. In contrast, the OT neurons contained information that could distinguish odor identities. These observations have led the authors to conclude that the primary feature represented in the OT may not be reward.

      Strengths:

      The highly localized projection pattern from olfactory tubercle to ventral pallidum is a valuable finding and suggests that studying this connection may give unique insights into the transformation of odor by reward association.

      Comparison of olfactory tubercle vs. ventral pallidum is a good strategy to further clarify the olfactory tubercle's position in value representation in the brain.

      Weaknesses:

      The study comes to a different conclusion about the olfactory tubercle regarding reward representations from several other prior works. Whether this stems from a difference in the experimental configurations such as behavioral paradigms used or indeed points to a conceptually different role for the olfactory tubercle remains to be seen.

      We acknowledge that our results lead us to conclusions that are different from those of prior work. But we note that our results are not directly at odds, as we see similar reward modulation of D1 OT neurons as has been reported previously. Our conclusion is different because we contrast our OT responses with those in the VP, where valence is more robustly encoded at the single neuron level. We also note that many of the past studies do not define valence as stringently as we do. Thus, increased activity with reward, as observed in our data and past studies, seems more like reward modulation than valence.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work explored intra and interspecific niche partitioning along spatial, temporal, and dietary niche partitioning between apex carnivores and mesocarnivores in the Qilian Mountain National Park of China, using camera trapping data and DNA metabarcoding sequencing data. They conclude that spatial niche partitioning plays a key role in facilitating the coexistence of apex carnivore species, spatial and temporal niche partitioning facilitate the coexistence of mesocarnivore species, and spatial and dietary niche partitioning facilitate the coexistence between apex and mesocarnivore species. The information presented in this study is important for wildlife conservation and will contribute substantially to the current understanding of carnivore guilds and effective conservation management in fragile alpine ecosystems.

      Strengths:

      Extensive fieldwork is evident in the study. Aiming to cover a large percentage of the Qilian Mountain National Park, the study area was subdivided into squares, as a geographical reference to distribute the sampling points where the camera traps were placed and the excreta samples were collected.

      They were able to obtain many records in their camera traps and collected many samples of excreta. This diversity of data allowed them to conduct robust analyses. The data analyses carried out were adequate to obtain clear and meaningful results that enabled them to answer the research questions posed. The conclusions of this paper are mostly well supported by data.

      The study has demonstrated the coexistence of carnivore species in the landscapes of the Qilian Mountains National Park, complementing the findings of previous studies. The information presented in this study is important for wildlife conservation and will contribute substantially to the current understanding of carnivore guilds and effective conservation management in fragile alpine ecosystems.

      Weaknesses:

      It is necessary to better explain the methodology because it is not clear what is the total sampling effort. In methodology, they only claim to have used 280 camera traps, and in the results, they mention that there are 319 sampling sites. However, the total sampling effort (e.g. total time of active camera traps) carried out in the study and at each site is not specified.

      Thanks a lot for this detailed review! We apologize for not offering a distinct description of the overall sampling effort. In this study, we deployed 280 camera traps, and these cameras were active for approximately 4 to 6 months. We visited each camera 2 to 3 times annually to download photos and check the batteries. In case some cameras failed to capture the targeted carnivore, we would relocate the positions of those cameras. Eventually, we collected 322 camera trapping sites, among which 3 cameras malfunctioned due to loss. As a result, we analyzed data from 319 camera sites and obtained 14,316 independent detections over 37,192 trap-days.

      We have added this information as follows in lines 132 to 143: “Taking into account the fact that mammalian communities are sensitive to seasonality, we used camera traps to monitor animals with an extensive survey effort from December 2016 to February 2022, covering the activity of animal species in different seasons, which can reflect the overall distribution of carnivores. We placed a total of 280 infrared cameras at the study site, set them to be active for 4 to 6 months, and considered possible relocation to another position based on animal detection in an effort to improve estimates of the occupancy and detection rates for both common and rare species (Figure 1) (Kays et al., 2020). The camera trap was set to record the time and date on a 24 hr clock when triggered, and to record a 15s video and 1 photo with an interval of 2 minutes between any two consecutive triggers. The sum of camera trap effective days was defined by the total amount of trapping effort during the sampling period, which was calculated from the time the camera was placed in operation to the time the last video or photograph was taken. We visited each camera 2 to 3 times a year to download photos and check batteries.” and in lines 228 to 232: “A total of 322 camera trap sites were surveyed after relocating infrared cameras that did not capture any target carnivore species. A total of 3 cameras were considered to have failed due to loss. We analyzed data from 319 camera sites and obtained 14,316 independent detections during a total effort of 37,192 effective camera trap days. We recorded wolf in 26 sites, snow leopard in 109 sites, Eurasian lynx in 36 sites, red fox in 92 sites, and Tibetan fox in 34 sites.”

      Reviewer #2 (Public Review):

      Summary:

      The study entitled "Different coexistence patterns between apex carnivores and mesocarnivores based on temporal, spatial, and dietary niche partitioning analysis in Qilian Mountain National Park, China" by Cong et al. addresses the compelling topic of carnivores' coexistence in a biodiversity hotspot in China. The study is interesting given it considers all three components affecting sympatric carnivores' distribution and co-occurrence, namely the temporal, the spatial, and the dietary partition within the carnivore guild. The authors have found that spatial co-occurrence is generally low, which represents the major strategy for coexistence, while there is temporal and dietary overlap. I also appreciated the huge sampling effort carried out for this study by the authors: they were able to deploy 280 camera trapping sites (which became 322 in the result section?) and collect a total of 480 scat samples. However, I have some concerns about the study on the non-consideration of the human dimension and potential anthropogenic disturbance that could affect the spatial and temporal distribution of carnivores, the choice of the statistical model to test co-occurrence, and the lack of clearly stated ecological hypotheses.

      Strengths:

      The strengths of the study are the investigation of all three major strategies that can mitigate carnivores' coexistence, therefore, the use of multiple monitoring techniques (both camera trapping and DNA metabarcoding) and the big dataset produced that consists of a very large sampled area with a noteworthy number of camera trap stations and many scat samples for each species.

      Weaknesses:

      I think that some parts of the manuscript should be written better and more clearly. A clear statement of the ecological hypotheses that could affect the partitioning among the carnivore guild is lacking. I think that the human component (thus anthropogenic disturbance) should have been considered more in the spatial analyses given it can influence the use of the environment by some carnivores. Additionally, a multi-species co-occurrence model would have been a more robust approach to test for spatial co-occurrence given it also considers imperfect detection.

      Thank you very much for your valuable comments and suggestions. We have checked and edited the manuscript, and we believe the English has improved.

      (1) Following your suggestion, we added the competitive exclusion and niche differentiation hypotheses, along the spatial, temporal, and dietary axes, to the Introduction to explain co-occurrence relationships among species, as follows: “The competitive exclusion principle dictates that species with similar ecological requirements are unable to successfully coexist (Hardin, 1960; Gause, 1934). Thus, carnivores within a guild occupy different ecological niches based on a combination of three niche dimensions, i.e. spatial, temporal, and trophic (Schoener, 1974). Spatially, carnivore species within the same geographic area exhibit distinct distributions that minimize overlap in resource use and competition. For example, carnivores can partition habitats based on habitat feature preferences and availability of prey (De Satgé et al., 2017; Garrote and Pérez De Ayala, 2019; Gołdyn et al., 2003; Strampelli et al., 2023). Temporally, differences in seasonal or daily activity patterns among sympatric carnivores can reduce competitive interactions and facilitate coexistence. For example, carnivores can exhibit temporal segregation in their foraging behaviors, such as diurnal versus nocturnal activity, to avoid direct competition (Finnegan et al., 2021; Nasanbat et al., 2021; Searle et al., 2021). Trophically, carnivore species can diversify their diets to exploit different prey species or sizes, thereby reducing competition for food resources. For example, carnivores can exhibit dietary specialization to optimize their foraging efficiency and minimize competitive pressures (Steinmetz et al., 2021).”

      (2) In addition to distance from roads, we included a human dimension as a covariate influencing occupancy, based on the number of independent photos or videos of herders and livestock detected by infrared cameras (termed human disturbance and abbreviated hdis). According to the occupancy models, red fox occupancy probability showed a significant positive relationship with hdis. Moreover, the detection probabilities of snow leopard and Eurasian lynx decreased with increasing hdis.

      We have incorporated these findings into the Results section as follows: “According to the findings derived from single-season, single-species occupancy models, the snow leopard demonstrated a notably higher probability of occupancy compared to other carnivore species, estimated at 0.437 (Table 1). Conversely, the Eurasian lynx exhibited a lower occupancy probability, estimated at 0.161. Further analysis revealed that the occupancy probabilities of the wolf and Eurasian lynx declined with increasing Normalized Difference Vegetation Index (NDVI) (Table 2, Figure 2). Additionally, wolf occupancy probability displayed a negative relationship with roughness index and a positive relationship with prey availability. Snow leopard occupancy probabilities exhibited a negative relationship with distance to roads and NDVI. In contrast, both red fox and Tibetan fox demonstrated a positive relationship with distance to roads. Moreover, red fox occupancy probability increased with higher human disturbance and greater prey availability. The detection probabilities of wolf, snow leopard, red fox, and Tibetan fox exhibited an increase with elevation (Table 2). Moreover, there was a positive relationship between the detection probability of Tibetan fox and prey availability. The detection probabilities of snow leopard and Eurasian lynx declined as human disturbance increased.”

      (3) We appreciate the suggestion to use a multi-species co-occurrence model to test spatial co-occurrence. We attempted multispecies occupancy modeling of the five species in our study, following the method of Rota et al. (2016). We first simplified the candidate set using single-season, single-species occupancy models, then took the occupancy covariates from the best model for each species and used them to build the multispecies occupancy models. Unfortunately, the final models did not converge. We are investigating potential solutions to this problem.

      Rota CT, Ferreira MAR, Kays RW, Forrester TD, Kalies EL, McShea WJ, Parsons AW, Millspaugh JJ. 2016. A multispecies occupancy model for two or more interacting species. Methods Ecol Evol 7:1164–1173. doi:10.1111/2041-210X.12587

      Temporal and dietary results are solid and this latter in particular highlights a big predation pressure on some prey species such as the pika. This implies important conservation and management implications for this species, and therefore for the trophic chain, given that i) the pika population should be conserved and ii) a potential poisoning campaign against small mammals could be incredibly dangerous also for mesocarnivores feeding on them due to secondary poisoning.

      Thank you for your thoughtful comments. We appreciate your recognition of the temporal and dietary findings, particularly the highlighted predation pressure on prey species like the pika. These observations indeed underscore critical implications for conservation and management. The necessity to conserve the pika population is paramount for its role in maintaining the stability of the trophic chain within its ecosystem. As you rightly pointed out, any disruption to this delicate balance, including through predation or indirect threats like poisoning campaigns, could have far-reaching consequences. Regarding the potential risks associated with poisoning campaigns targeting small mammals, we acknowledge the significant concerns raised about secondary poisoning affecting mesocarnivores. This underscores the need for careful consideration in pest control strategies and the adoption of measures that minimize unintended ecological impacts. Our findings suggest several practical implications for conservation and management. Conservation efforts should focus on vulnerable prey populations such as the pika, while management strategies could include regulatory frameworks and community education to mitigate risks associated with pest control methods. We believe our study contributes valuable insights into the complexities of predator-prey dynamics and the broader implications for ecosystem health. By integrating these findings into conservation practices, we can work towards ensuring the sustainability of natural systems and the species that depend on them.

      Reviewer #1 (Recommendations For The Authors):

      To better explain the methodology and the sampling effort I recommend reviewing e.g. Kays et al. 2020. An empirical evaluation of camera trap study design: How many, how long, and when?. Methods in Ecology and Evolution, 11(6), 700-713. https://besjournals.onlinelibrary.wiley.com/doi/epdf/10.1111/2041-210X.13370.

      Thank you for this valuable suggestion! Following this reference, we have added information explaining the methodology and the sampling effort, as follows: “Taking into account the fact that mammalian communities are sensitive to seasonality, we used camera traps to monitor animals with an extensive survey effort from December 2016 to February 2022, covering the activity of animal species in different seasons, which can reflect the overall distribution of carnivores. We placed a total of 280 infrared cameras at the study site, set them to be active for 4 to 6 months, and considered possible relocation to another position based on animal detection in an effort to improve estimates of the occupancy and detection rates for both common and rare species (Figure 1) (Kays et al., 2020). The camera trap was set to record the time and date on a 24 hr clock when triggered, and to record a 15s video and 1 photo with an interval of 2 minutes between any two consecutive triggers. The sum of camera trap effective days was defined by the total amount of trapping effort during the sampling period, which was calculated from the time the camera was placed in operation to the time the last video or photograph was taken. We visited each camera 2 to 3 times a year to download photos and check batteries.”
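
The definition of effective camera trap days quoted above can be illustrated with a minimal sketch. The function names and the example dates are hypothetical, and the sketch assumes that effort is counted in whole elapsed days between deployment and the last record; the response does not say whether the authors counted the end day inclusively.

```python
from datetime import date


def effective_days(deployed, last_record):
    """Whole days from deployment to the last photo or video.

    Assumption: elapsed days, not an inclusive count of calendar days.
    """
    return max((last_record - deployed).days, 0)


def total_effort(cameras):
    """Sum effective days over (deployed, last_record) pairs, one per camera."""
    return sum(effective_days(deployed, last) for deployed, last in cameras)


# Hypothetical deployments, for illustration only.
example_cameras = [
    (date(2017, 1, 1), date(2017, 5, 1)),
    (date(2017, 1, 1), date(2017, 1, 31)),
]
print(total_effort(example_cameras))
```

Summing this quantity over all functioning cameras would give a total of the kind reported in the response (37,192 effective camera trap days), under the stated assumption about day counting.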

      Reviewer #2 (Recommendations For The Authors):

      I have some concerns about the manuscript.

      I find that the manuscript should be written more clearly: some sentences are not straightforward to understand given the presence of structural errors that make the text hard to read; the paragraphs should be written in a more harmonic way (without logical leaps) with a smoother change of topic between paragraphs, especially in the introduction.

      We appreciate your constructive comments, which have helped us improve the clarity and coherence of the manuscript. We have revised the introduction to provide a clearer outline of the paper's structure and objectives. Specifically, we have rephrased complex sentences and removed ambiguities so that each idea is communicated more directly. We have also provided clearer links between ideas and avoided abrupt shifts in topic to ensure smoother transitions between paragraphs.

      I feel like the strength of merging the two techniques (camera trapping and DNA metabarcoding) is not brought up enough, while the disadvantage of this approach is not even mentioned (e.g., the increasing costs).

      Thanks a lot for this valuable comment! We have added this information to the Discussion (L356-L363) as follows: “Our study highlights the effectiveness of combining camera trapping with DNA metabarcoding for detecting and identifying both cryptic and rare species within a sympatric carnivore guild. This integrated approach allowed us to capture a more comprehensive view of species presence and interactions compared to traditional visual surveys. However, it is important to acknowledge the challenges associated with this technique, including the high costs of equipment and the need for specialized training and computational resources to manage and analyze the large volumes of sequence data. Despite these challenges, the benefits of this combined method in improving biodiversity assessments and understanding species coexistence outweigh the drawbacks.”

      The structure of the manuscript does not follow the structure of the journal (Intro, Material and Method, Results, Discussion); instead, it reports the methods at the end of the main manuscript. Most critically, I found that a clear explanation of the research hypothesis is missing: authors should clearly state their ecological hypotheses. What are your hypotheses on the co-occurrence relationship among species? What would specifically affect and change the sympatric relationships among carnivores?

      Thank you for this valuable suggestion! We have revised the manuscript by moving the methods section into the main body so that it follows the standard order of sections (Introduction, Materials and Methods, Results, Discussion).

      We now state our main ecological hypotheses concerning the co-occurrence relationships among carnivore species, which are based on the niche differentiation hypothesis. We hypothesize that differentiation along one or more niche axes is beneficial for the coexistence of the carnivore guild in the Qilian Mountains. We expected that spatial niche differentiation promotes the coexistence of large carnivores in the Qilian Mountain region, as they are more likely than small carnivores to spatially avoid interspecific competition (Davis et al., 2018). Mesocarnivores may coexist either spatially or temporally due to increased interspecific competition for similar prey (Di Bitetti et al., 2010; Donadio and Buskirk, 2006). Nutritional niche differentiation may be a significant factor promoting coexistence between large carnivores and mesocarnivores due to differences in body size (Gómez-Ortiz et al., 2015; Lanszki et al., 2019). We have added these ecological hypotheses in lines 101 to 110.

      Another concern is that all pictures with people have been removed from the dataset, but I think that this could be a bit biased as human presence (or also the presence of livestock) could affect the spatial or temporal presence of carnivores, changing their co-occurrence dynamics. On one side, humans can be perceived as a source of disturbance by carnivores and, therefore, can cause a shift in distribution towards locations with lower human presence (or lower anthropogenic disturbance) that could further concentrate the presence of carnivores, increasing competitive interactions. Conversely, mesocarnivores could take advantage of an increasing human presence, following the human shield hypothesis, finding a refugium from larger-bodied carnivores. From this perspective, important information on the potential anthropogenic pressure is lacking in the description of the study area: how effective is the protection effort of the park? How intense is the potential human disturbance in and around the park? Is there poaching? Intensive livestock grazing? Resource extraction? These are all factors that could affect the interactions among carnivores. Do not forget the possibility and risk of retaliatory killing by humans due to the presence of livestock in the area. I think that incorporating the human dimension is important because it could strongly affect how carnivores perceive and use the environment. Here only the distance to the closest road has been considered. However, for example, recent research (Gorczynski et al 2022, Global Change Biology) has indeed found that co-occurrence of ecologically similar species differed in relation to increasing human density. Therefore, I think that anthropogenic disturbance is an aspect to be reckoned with and more variables as proxies of human disturbance should be considered.

      Thanks a lot for this valuable comment! We acknowledge that humans can act both as a disturbance factor, potentially driving carnivores away from highly populated areas, and as a source of indirect refuge for mesocarnivores, thereby affecting competitive interactions among carnivores. To our understanding, poaching and resource extraction are prohibited within the study area, while livestock grazing is a significant human activity there. Therefore, we added a human dimension as a covariate influencing occupancy, based on the number of independent photos or videos of herders and livestock detected by infrared cameras (termed human disturbance and abbreviated hdis). According to the occupancy models, red fox occupancy probability showed a significant positive relationship with hdis. Moreover, the detection probabilities of snow leopard and Eurasian lynx decreased with increasing hdis.

      In the statistical analyses section, I don't find that the statistical procedure is well described: it is not clear which occupancy model has been used (probably a single-species single-season occupancy model for each target species?), which covariates have been tested for each species and following which hypotheses. Additionally, I think that when modelling the spatial distribution of subordinate species, it should be important to include information on the spatial distribution of apex species given this could affect their occurrence on the territory. This could have been done by using the Relative Abundance Index of the apex predators as a covariate when modelling the distribution of subordinate species. Additionally, why haven't the authors used prey as a covariate for occupancy? I think that prey distribution should affect the occupancy probability more than the detection rate. Also, the authors used the Sørensen similarity index to measure associations between species. However, this association metric has been criticized (see the recent paper of Mainali et al 2022, Science Advances). I am therefore wondering: given the authors are using the occupancy framework, why don't they use a multi-species co-occurrence model that allows them to directly estimate both single-species occupancy and the co-occurrence parameter as a function of covariates (examples are Rota et al. 2016, Methods Ecol. Evol. Or Tobler et al. 2019, Ecology)? For the temporal overlap, I think that adding Figure S2 (pairwise temporal overlap) in the main text would help deliver the results of the temporal analyses more straightforwardly.

      Thanks a lot for this valuable comment!

      (1) The current manuscript uses a single-species, single-season occupancy model for each target species. Additionally, we have added prey and human disturbance as occupancy covariates. We have revised the statistical analyses section to explicitly state this model choice and to clarify the covariates tested for each species in lines 153 to 170. The details are as follows: “To investigate the spatial distribution of carnivores, as well as the influence of environmental factors on the site occupancy of species in the study area, we performed single-season, single-species occupancy models to estimate carnivores’ occupancy (ψ) and detection (Pr) probability (Li et al., 2022b; MacKenzie, 2018; Moreno-Sosa et al., 2022). To ensure capture independence, only photo or video records at intervals of 30 min were included in the data analysis (Li et al., 2020). We created a matrix recording whether each carnivore species was detected (1) or not (0) across several 30-day intervals (that is 0-30, 31-60, 61-90, 91-120, 121-150, >150 days) for each camera location. Based on the previous studies of habitat use of carnivores (Greenspan and Giordano, 2021; Alexander et al., 2016; Gorczynski et al., 2022), we selected terrain, vegetation, biological factors and disturbance to construct the model. Terrain is a fundamental element of wildlife habitat and closely linked to other environmental factors (Chen et al., 2024). Terrain variables include elevation (ele) and roughness index (rix). Vegetation variables include normalized difference vegetation index (ndvi), and provide information on the level of habitat concealment. Biological variables include prey abundance (the number of independent photos of their preferred prey based on dietary analysis in this study, wolf and snow leopard: artiodactyla including livestock; Eurasian lynx and Pallas’s cat: lagomorpha; red fox and Tibetan fox: lagomorpha and rodentia) and reflect habitat preference and distribution patterns of carnivores. Disturbance variables include distance to roads (disrd) and human disturbances (hdis, the number of independent photos of herdsman and livestock) and can provide insight into the habitat selection and behavior patterns of carnivores.”
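
The data preparation described in the quoted passage (a 30-minute independence filter followed by a 0/1 detection matrix over 30-day occasions) might be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, independence is assumed to mean at least 30 minutes since the last retained record, and occasions in which a camera was not operating are recorded as 0 here although an occupancy analysis would normally treat them as missing.

```python
from datetime import datetime, timedelta

# Upper bounds (days since deployment) of the first five occasions:
# 0-30, 31-60, 61-90, 91-120, 121-150. Later records fall in ">150 days".
OCCASION_EDGES = [30, 60, 90, 120, 150]


def independent_records(timestamps, gap=timedelta(minutes=30)):
    """Keep a record only if it is at least `gap` after the last kept one.

    Assumption: this is one common reading of a 30-minute independence rule.
    """
    kept = []
    for t in sorted(timestamps):
        if not kept or t - kept[-1] >= gap:
            kept.append(t)
    return kept


def occasion_index(days_since_deploy):
    """Map days since deployment to the index of its 30-day occasion."""
    for i, upper in enumerate(OCCASION_EDGES):
        if days_since_deploy <= upper:
            return i
    return len(OCCASION_EDGES)  # the final ">150 days" occasion


def detection_history(deployed, timestamps):
    """Return a 0/1 list with one entry per occasion for one camera site."""
    history = [0] * (len(OCCASION_EDGES) + 1)
    for t in independent_records(timestamps):
        history[occasion_index((t - deployed).days)] = 1
    return history


# Hypothetical records of one species at one camera, for illustration only.
deployed = datetime(2017, 1, 1)
stamps = [
    datetime(2017, 1, 5, 10, 0),
    datetime(2017, 1, 5, 10, 10),  # within 30 min of the previous record
    datetime(2017, 1, 5, 10, 45),
    datetime(2017, 3, 15, 8, 0),
    datetime(2017, 7, 1, 0, 0),
]
print(detection_history(deployed, stamps))
```

Stacking one such row per camera site yields the site-by-occasion matrix that a single-season occupancy model takes as input.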

      (2) Thank you for your valuable suggestions. We acknowledge the importance of considering apex species in models of subordinate species' spatial distributions.

      Nonetheless, to keep the covariates consistent across species, and because single-species occupancy models do not account for interspecies interactions, we did not include the Relative Abundance Index of the apex predators as a covariate affecting the occupancy of mesopredators. As you recommended, multi-species occupancy models that account for interspecies interactions are a robust approach. However, when we attempted to use the multi-species occupancy method of Rota et al. (2016), the final models did not converge. Specifically, we took the occupancy covariates from the best single-species model for each species and used them to build the multispecies occupancy models. We are investigating potential solutions to this problem.
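
For readers unfamiliar with the single-species framework discussed above, the following is a minimal sketch of the standard single-season occupancy likelihood with constant occupancy (psi) and detection (p) probabilities. It is not the authors' model, which includes covariates on both parameters; the function names and the coarse grid search are illustrative assumptions, and missing occasions are represented as `None` and skipped.

```python
import math


def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history given psi and p."""
    # Probability of the observed history if the site is occupied.
    occupied = psi
    for h in history:
        if h is None:  # occasion not surveyed: contributes nothing
            continue
        occupied *= p if h == 1 else (1.0 - p)
    # A site with no detections may instead be truly unoccupied.
    unoccupied = (1.0 - psi) if not any(history) else 0.0
    return occupied + unoccupied


def log_likelihood(histories, psi, p):
    """Sum of log-likelihoods over all sites."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)


def grid_mle(histories, steps=99):
    """Coarse grid search for (psi, p); real analyses use an optimizer."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    candidates = ((psi, p) for psi in grid for p in grid)
    return max(candidates, key=lambda t: log_likelihood(histories, *t))
```

The second term in `site_likelihood` is what separates true absence from non-detection, which is the sense in which occupancy models account for imperfect detection. Multi-species extensions such as the one cited above add interaction terms to the occupancy component and are correspondingly harder to fit.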

      (3) We used the Sørensen similarity index to measure associations between species based on support from previous literature. As counted by Mainali et al., the Sørensen index has been used in more than 700 papers across journals such as Science, Nature, and PNAS. We believe this index holds broad applicability in describing relationships between species.
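
As a brief illustration of the index named above, the sketch below computes the Sørensen similarity between two species from the sets of camera sites where each was detected, using the standard form 2a / (2a + b + c), where a is the number of shared sites and b and c are the numbers of sites unique to each species. The site identifiers are hypothetical, and the response does not state exactly how the authors computed the index.

```python
def sorensen(sites_a, sites_b):
    """Sørensen similarity between two collections of site identifiers."""
    a, b = set(sites_a), set(sites_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    # len(a) + len(b) equals 2 * shared + unique_a + unique_b.
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical site sets, for illustration only.
species_one_sites = {"s01", "s02", "s03"}
species_two_sites = {"s02", "s03", "s04", "s05"}
print(round(sorensen(species_one_sites, species_two_sites), 3))
```

Because the index uses raw detections only, it does not correct for imperfect detection, which is part of the reviewer's motivation for suggesting an occupancy-based co-occurrence model.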

      (4) We agree that presenting pairwise temporal overlap in the main text would enhance clarity. We revised the manuscript to include Figure S2 in the main text and ensure that the temporal analyses are more straightforwardly presented.

      Regarding the sampling collection of the scats, I'm just curious to know why you decided to use silica desiccant instead of keeping the samples frozen. I'm not familiar with this method and I guess it works fine because the environment is generally freezing cold. Yet, I would like to know more. How fresh do scat samples need to be in order to be suitable for DNA metabarcoding analyses? Additionally, what do you mean by "scats were collected within camera trapping area", could you be more specific? Have you specified a buffer around camera stations?

      Thanks a lot for this specific inquiry! We followed the scat collection method of Janečka et al. (2008; 2011). Silica desiccant is used to dry the scats and thereby minimize DNA degradation. Because no equipment for freezing samples was available under field conditions, the collected scat samples were kept dry and cool in the shade and transferred to the laboratory as soon as possible after sampling. We selected relatively fresh samples based on the color of the scat, and broke off small pieces from the outer part of the scat, including pieces not directly exposed to the sun. A piece of scat material about the size of a pinkie nail is placed in each tube; if the tube is overfilled, the sample will likely not dry, leading to DNA degradation.

      The study area was subdivided into sample squares of 25 km² (5×5 km) as a geographical reference for placing camera survey sites and collecting scat samples. Camera traps were set in areas believed to be important to and heavily used by wildlife, such as the bottoms of cliffs, sides of boulders, valleys and ridges along movement corridors. We also focused on sites with known or suspected carnivore activity to maximize the probability of detecting scat samples. Therefore, transects were set around the infrared cameras to collect scat samples. The length of each transect was determined by terrain, amount of scat, and available time. On each transect, we aimed to collect about 18 samples or cover 5 km of terrain, to avoid uneven representation among transects and to ensure that the team had sufficient time to return to base camp (Janečka et al., 2011).

      Janecka J, Jackson R, Yuquang Z, Li D, Munkhtsog B, Buckley-Beason V, Murphy W. 2008. Population monitoring of snow leopards using noninvasive collection of scat samples: A pilot study. Animal Conservation 11:401–411. doi:10.1111/j.1469-1795.2008.00195.x

      Janečka JE, Munkhtsog B, Jackson RM, Naranbaatar G, Mallon DP, Murphy WJ. 2011. Comparison of noninvasive genetic and camera-trapping techniques for surveying snow leopards. J Mammal 92:771–783. doi:10.1644/10-MAMM-A-036.1

      Kays R, Arbogast BS, Baker‐Whatton M, Beirne C, Boone HM, Bowler M, Burneo SF, Cove MV, Ding P, Espinosa S, Gonçalves ALS, Hansen CP, Jansen PA, Kolowski JM, Knowles TW, Lima MGM, Millspaugh J, McShea WJ, Pacifici K, Parsons AW, Pease BS, Rovero F, Santos F, Schuttler SG, Sheil D, Si X, Snider M, Spironello WR. 2020. An empirical evaluation of camera trap study design: How many, how long and when? Methods Ecol Evol 11:700–713. doi:10.1111/2041-210X.13370

      Regarding the discussion, the authors have information for 1) spatial distribution, 2) temporal overlap, 3) dietary requirement, they should use this information to support the discussion. Instead, sometimes it feels that authors go by exclusion or make a suggestion. For example: the authors have found dietary and temporal overlap between two apex predators (i.e., wolf and snow leopard), and they said that this suggests that spatial partitioning is responsible for their successful coexistence in this area (lines 195-196). But why "suggesting", what the co-occurrence metric says? Another example: "Apex carnivores and mesocarnivores showed substantial overlap in time overall, indicating that spatial and dietary partitioning may play a large role in facilitating their coexistence" (lines 241 - 242). However, this should not be a suggestion: your Sørensen similarity index is low proving spatial divergence. So, when data supports the hypotheses, the authors should be firmer in their discussion. Generally, when reading the discussion, it felt that a figure summarizing the partitioning would be much needed to digest which type of partitioning strategy the species are using.

      Thank you for your thoughtful comments and suggestions.

      (1) We appreciate your insights on the discussion section, particularly concerning the interpretation of our findings on spatial distribution and on temporal and dietary overlap. We acknowledge the need for a clearer interpretation of our findings and have revised the discussion section to state them more directly. For example, in lines 294-295, we modified the text to “We found dietary and temporal overlap among apex carnivores, showing that spatial partitioning is responsible for their successful coexistence in this area.” In lines 341-342, we modified it to “Apex carnivores and mesocarnivores exhibited considerable overlap in time overall, showing that spatial and dietary partitioning may play a large role in facilitating their coexistence.”

      (2) We appreciate your suggestion regarding the inclusion of a figure summarizing partitioning strategies among species discussed. In our study, we organized the overlap index of space, time, and diet among carnivores in Table 3, which directly reflects the overlap of carnivore species in these three dimensions by summarizing them in a single table. Additionally, Figure 3 illustrates the activity patterns and overlap among species, while Figure 4 displays the primary prey of carnivores and the frequency of food utilization.

      About lines 228 - 229, just as a side note, the Pallas's cat, as the red fox, selects the environment according to a greater distribution of prey species, while also selecting primarily meadows and natural environment (Greco et al. 2022, Journal of wildlife management) additionally it is not strictly diurnal (Anile et al. 2020, Wildlife Research; Greco et al. 2022, Journal of wildlife management). Regarding the Pallas's cat and its exclusion from the temporal and spatial analyses, can you specify how many independent detection events you had?

      Thanks a lot for this valuable comment!

      (1) We appreciate the references to recent studies highlighting the habitat preferences and activity patterns of Pallas's cat. We have revised the manuscript to acknowledge these points and provide context regarding its habitat selection strategies. Specifically, we modified the text as follows: “Pallas’s cat hunts during crepuscular and diurnal periods, inhabits meadow with greater prey abundance (Anile et al., 2021; Greco et al., 2022; Ross et al., 2019).”

      (2) The low detection rate of Pallas's cat (0.072) identified by single-species occupancy model raised concerns regarding the reliability of the results. The estimated high standard errors for each environmental variable and the wide confidence intervals around the detection rate further indicated potential bias or randomness. Consequently, we made the decision to exclude the Pallas's cat data from further analysis. Upon closer examination of the Pallas's cat data, it became evident that out of 319 camera sites surveyed, only 27 sites detected the presence of Pallas's cat. Notably, only 3 out of 193 sites in Gansu Province recorded detections, while Qinghai Province had 24 detections out of 126 sites. This skewed distribution of data likely contributed to the unsatisfactory outcomes observed in our models.

      About the diet and results of scat analyses, have you found any sign of intra-guild predation (i.e., apex predators that kill and sometimes consume subordinate carnivores to reduce competition), this could actually represent proof of competition and spatial overlap.

      Thanks a lot for your thoughtful comments!

      We observed intraguild predation in the diet of wolves and snow leopards. Specifically, we found Pallas’s cat, red fox, and Tibetan fox in the diet of wolves, and Pallas’s cat, Eurasian badger, and Tibetan fox in the diet of snow leopards. However, these intraguild predation events accounted for only 1.89% of the diet composition of apex carnivores. We suggest that the rarity of these observations may be influenced by various factors and does not necessarily provide sufficient evidence of competition and spatial overlap. Therefore, further data collection and in-depth research are needed to better understand this phenomenon.

      Some minor comments: Figure 2 is really nice, while some abbreviations are missing in the caption of Table 2.

      Thank you for your feedback and positive comments on Figure 2. Unfortunately, we have removed Figure 2 from the manuscript. Because prey abundance and human disturbance are now included as occupancy covariates, and these variables were derived solely from infrared camera trap data rather than from a dataset covering the entire national park, we were unable to accurately project carnivore occupancy probability across the national park.

      We apologize for the missing abbreviations in the caption of Table 2. We have added them to the caption as follows: “Abbreviations: Disrd-distance to roads, Ele-elevation, NDVI-normalized difference vegetation index, Rix- roughness index, hdis-human disturbance.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Satoshi Yamashita et al., investigate the physical mechanisms driving tissue bending using the cellular Potts Model, starting from a planar cellular monolayer. They argue that apical length-independent tension control alone cannot explain bending phenomena in the cellular Potts Model, contrasting with the vertex model. However, the evidence supporting this claim is incomplete. They conclude that an apical elastic term, with zero rest value (due to endocytosis/exocytosis), is necessary in constricting cells and that tissue bending can be enhanced by adding a supracellular myosin cable. Notably, a very high apical elastic constant promotes planar tissue configurations, opposing bending.

      Strengths:

      - The finding of the required mechanisms for tissue bending in the cellular Potts Model provides a more natural alternative for studying bending processes in situations with highly curved cells.

      - Despite viewing cellular delamination as an undesired outcome in this particular manuscript, the model's capability to naturally allow T1 events might prove useful for studying cell mechanics during out-of-plane extrusion.

      We thank the reviewer for the careful comments and insightful suggestions.

      Weaknesses:

      - The authors claim that the cellular Potts Model is unable to obtain the vertex model simulation results, but the lack of a substantial comparison undermines this assertion. No references are provided with vertex model simulations, employing similar setups and rules, and explaining tissue bending solely through an increase in a length-independent apical tension.

      The studies cited in a preceding paragraph included simulations that employed increased length-independent apical tension. For clarity, we added citations to them as below.

      P4L174: “In contrast to the simulations in the preceding studies (Sherrard et al., 2010; Conte et al., 2012; Perez-Mockus et al., 2017; Pérez-González et al., 2021), our simulations could not reproduce the apical constriction”.

      We did not copy the parameters of the vertex models in the preceding studies because we also found that the apical, lateral, and basal surface tensions must be balanced; otherwise the epithelial cells could not maintain their integrity (Figure 1—figure supplement 1). The ratio used in the preceding studies was outside the suitable range.

      - The apparent disparity between the two models is attributed to straight versus curved cellular junctions, with cells with a curved lateral junction achieving lower minimum energies at steady-state. However, a critical discussion on the impact of T1 events, allowing cellular delamination, is absent. Note that some of the cited vertex model works do not allow T1 events while allowing curvature.

      We appreciate the comment and added it to the discussion as suggested.

      P12L301: “Even when the vertex model allowed the curved lateral surface, the model did not assume the cells to be rearranged and change neighbors, limiting the cell delamination (Pérez-González et al., 2021).”

      P12L311: “Note that the vertex model could also be extended to incorporate the curved edges and rearrangement of the cells by specifically programming them, and would reproduce the cell delamination. That is, we could find the importance of the balanced pressure because the cellular Potts model intrinsically included a high degree of freedom for the cell shape, the cell rearrangement, and the fluctuation.”

      - The suggested mechanism for inducing tissue bending in the cellular Potts Model, involving an apical elastic term, has been utilized in earlier studies, including a cited vertex model paper (Polyakov 2014). Consequently, the physical concept behind this implementation is not novel and warrants discussion.

      The reviewer is correct, but Polyakov et al. assumed “that the cytoskeletal components lining the inside membrane surfaces of the cells provide these surfaces with springlike elastic properties” without justification. Based on Labouesse et al. (2015), we assumed that the myosin activity generated contractility rather than elasticity, and expected that the surface elasticity corresponded to the membrane elasticity. Regarding the physical concept, we clarified how contractility and elasticity deform the cells and tissue differently, and demonstrated why the elasticity was important for the apical constriction. We added it to the discussion as below.

      P12L316: “In the preceding studies, the apically localized myosin was assumed to generate either the contractile force (Sherrard et al., 2010; Conte et al., 2012; Perez-Mockus et al., 2017; Pérez-González et al., 2021) or the elastic force (Polyakov et al., 2014; Inoue et al., 2016; Nematbakhsh et al., 2020). However, the limited cell shape in the vertex model made them similar in terms of the energy change during the apical constriction, i.e., the effective force to decrease the apical surface. In this study, we showed that the contractile force and the elastic force differently deformed the cells and tissue, and demonstrated why and how the elasticity was important for the apical constriction.”

      - The absence of information on parameter values, initial condition creation, and boundary conditions in the manuscript hinders reproducibility. Additionally, the explanation for the chosen values and their unit conversion is lacking.

      We agree with the comment.

      For the initial configuration, we added an explanation to the Tissue deformation by increased apical contractility with cellular Potts model section in the Results as below.

      P4L170: “A simulation started from a flat monolayer of cells beneath the apical ECM, and was continued until resulting deformation of cells and tissue could be evaluated for success or failure of reproducing the apical constriction.”

      For the parameter values we added a section “Parameters for the simulations” in the Methods.

      For the unit conversion of the parameters, we did not measure the surface tension and cell pressure in an actual tissue and thus could not compare the parameters to the actual forces. Instead, we varied the parameters and demonstrated that the apical constriction was reproduced over a wide range of parameter values. We added it to the discussion as below.

      P12L310: “It succeeded with a wide range of parameter values, indicating a robustness of the model.”

      Reviewer #2 (Public Review):

      Summary:

      In their work, the authors study local mechanics in an invaginating epithelial tissue. The mostly computational work relies on the Cellular Potts model. The main result shows that an increased apical "contractility" is not sufficient to properly drive apical constriction and subsequent tissue invagination. The authors propose an alternative model, where they consider an alternative driver, namely the "apical surface elasticity".

      Strengths:

      It is surprising that despite the fact that apical constriction and tissue invagination are probably most studied processes in tissue morphogenesis, the underlying physical mechanisms are still not entirely understood. This work supports this notion by showing that simply increasing apical tension is perhaps not sufficient to locally constrict and invaginate a tissue.

      We thank the reviewer for recognizing the importance and novelty of our work.

      Weaknesses:

      The findings and claims in the manuscript are only partially supported. With the computational methodology for studying tissue mechanics being so well developed in the field, the authors could probably have done a more thorough job of supporting the main findings of their work.

      We thank the reviewer for the careful assessment and suggestions. However, our simulations were computationally expensive, and modeling the epithelium in an analytically calculable form would require substantial work that is beyond the scope of the present study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Reference line 648: Correct the author's name (Pérez-González).

      We thank the reviewer and corrected the reference.

      (2) "Pale" colors are challenging to discern.

      We updated the figures.

      (3) Figure 1j: What does the yellow color in the cellular junction represent?

      We used the apico-lateral sites, colored yellow in Fig. 1e-f’, to simulate the effect of the adherens junction. We updated the figure legend.

      (4) Figure 2c - left: Why is there a red apical junction?

      Our simulation model marked the apical junction in the initial configuration and updated the marking based on connectedness to other surrounding sites marked as apical in the same cell. However, once a cell was delaminated and lost its apical junction, any of its surface sites not adjacent to other epithelial cells were marked as basal, because they were not adjacent to the apical junction.

      We added it to the Cellular Potts model with partial surface elasticity section in the Methods as below.

      P17L430: “To simulate the differential physical properties of the apical, lateral, and basal surfaces, the subcellular locations are marked automatically, and the marking is updated during the simulation. In each cell, sites adjacent to different cells but not to the medium are marked as lateral.

      At the initial configuration, sites adjacent to the apical ECM are marked as apical, and during the simulation, sites adjacent to medium and other apical sites in the same cell are marked as apical.

      The remaining sites, which are adjacent to medium but not marked as apical, are marked as basal.

      Therefore, once a cell is delaminated and loses its apical surface, afterwards all sites in the cell adjacent to the medium are marked as basal even if they are adjacent to the apical ECM or the outer body fluid.”
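
      As an illustration only, and not the simulation code used in the study, the marking rule quoted above can be sketched on a small 2D lattice. The sketch assumes a 4-neighbor lattice, treats the apical ECM and the body fluid as a single medium with ID 0, and reads "adjacent to other apical sites in the same cell" as "was apical, or touched an apical site of the same cell, in the previous marking". The names `update_marks`, `MEDIUM`, and the mark labels are hypothetical.

```python
MEDIUM = 0  # assumption: ECM and body fluid share one medium ID
NONE, APICAL, LATERAL, BASAL = "none", "apical", "lateral", "basal"


def neighbors(r, c, n_rows, n_cols):
    """Yield the in-bounds 4-neighbors of lattice site (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n_rows and 0 <= cc < n_cols:
            yield rr, cc


def update_marks(cells, prev):
    """Return new surface marks from cell IDs and the previous marks."""
    n_rows, n_cols = len(cells), len(cells[0])
    new = [[NONE] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            cid = cells[r][c]
            if cid == MEDIUM:
                continue
            nbrs = list(neighbors(r, c, n_rows, n_cols))
            touches_medium = any(cells[rr][cc] == MEDIUM for rr, cc in nbrs)
            touches_other = any(
                cells[rr][cc] not in (MEDIUM, cid) for rr, cc in nbrs
            )
            if touches_medium:
                # Apical marking can only be inherited within the same cell.
                near_apical = prev[r][c] == APICAL or any(
                    cells[rr][cc] == cid and prev[rr][cc] == APICAL
                    for rr, cc in nbrs
                )
                new[r][c] = APICAL if near_apical else BASAL
            elif touches_other:
                new[r][c] = LATERAL
    return new
```

      Under these assumptions, a cell whose previous marking contains no apical site can never regain one, so all of its medium-facing sites become basal, which corresponds to the behavior described for delaminated cells.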

      (5) Figure 4a: The snapshots are not in a steady state but in the middle of deformation. Is the time the same for all snapshots? The motivation to change P_0a is related to endocytosis. However, this could be achieved by decreasing P_0a to a non-zero value. Here, in the more drastic limit, the depth (a measure of bending) is very slight, approximately half of a cell size. What physically limits further invagination? Is it the number of cells or the range of parameters under study?

      The time length was the same for the simulations in each figure, and we added this to the Parameters for the simulations section in the Methods as below.

      P18L466: “In each figure, snapshots of the simulations show deformation by the same time length unless specified.”

      For P_0a, the reviewer is correct: iterated ratcheting may decrease P_0a step by step instead of setting it to 0 immediately. Still, with P_0a > 0, the energy function and its derivative both increase with the apical width as long as P_a > P_0a, and thus the apical shrinkage would be synchronized, even though the deformation would be smaller. We also ran simulations in which P_0a was decreased to 0.6 times the initial P_a, and observed smaller deformation, as expected. On the other hand, a non-zero P_0a made the invagination deeper when it was combined with the effect of the surrounding supracellular myosin cable, possibly due to a resistance of the apical surface against compression. One of the novel and important findings of this study is the synergistic effect of the elasticity-based apical constriction and the surrounding supracellular myosin cable. To demonstrate that the deep invagination was not due to the resistance of the apical surface against compression, we showed the simulations with P_0a = 0.

      The conditions for further invagination may include the number of cells, the ratio between the cell height and width (Figure 5—figure supplement 1), the interaction with the ECM (Figure 5—figure supplement 2), etc. For the parameters, there might be an upper limit (Figure 4). We did not test the number of cells because of its computational cost. Among the conditions we tested, we found that the planar compression by the surrounding supracellular myosin cable was more influential than the mechanical properties of the apically constricting cells themselves.

      How each condition and parameter contributes to the invagination should be studied in the future. We added it to the conclusion as below.
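
      The difference between the two kinds of apical terms can be sketched numerically. This is a minimal illustration that assumes a linear contractile term, J * P_a, and a quadratic elastic term, E_s * (P_a - P_0a)^2; the function names and parameter values are hypothetical and are not taken from the manuscript.

```python
def contractile_energy(width, j):
    """Assumed linear term: energy proportional to the apical width."""
    return j * width


def contractile_force(width, j):
    """Derivative of the linear term: the same for every width."""
    return j


def elastic_energy(width, e_s, rest_width=0.0):
    """Assumed quadratic term with rest width P_0a."""
    return e_s * (width - rest_width) ** 2


def elastic_force(width, e_s, rest_width=0.0):
    """Derivative of the quadratic term: grows with the apical width."""
    return 2.0 * e_s * (width - rest_width)


# Two cells with different apical widths (arbitrary lattice units).
wide, narrow = 10.0, 5.0
same_pull = contractile_force(wide, j=3.0) == contractile_force(narrow, j=3.0)
stronger_pull_on_wide = (
    elastic_force(wide, e_s=1.0, rest_width=2.0)
    > elastic_force(narrow, e_s=1.0, rest_width=2.0)
)
```

      With the quadratic form, a wider apical surface is pulled more strongly than a narrower one as long as the width exceeds the rest width, whereas the linear form pulls every cell equally regardless of how far it has already shrunk.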

      P15L395: “The depth, curvature, and speed of the invagination might be influenced by the cell shape, configuration, and parameters, and how each condition contributes to the invagination shall be studied in future.”

      (6) Figure 6b: What does the cell-surface color represent? If the idea was to represent junction tension, it would be clearer to color the junctions only.

      The junction tension may vary differently in different situations. For example, T1 transition is accompanied by enriched myosin along a shrinking cell-cell junction, and the junction bears higher tension, but other junctions of the same cell do not and thus the cell does not decrease its apical surface. In chick embryo neural tube closure, the junction tension is also polarized, and the cells shrink the apical surface along the medial-lateral axis, driving the apical constriction (Nishimura et al., 2012, doi:10.1016/j.cell.2012.04.021). In the case of Drosophila embryo tracheal invagination, the cells shrank their apical surface isotropically (Figure 6a). If the junction tension was responsible for the shrinkage, all junctions of the cell must bear higher tension. Based on this assumption, the junction tension was averaged in each cell to check whether the tracheal cells bore higher average tension than the surrounding cells.

      We also plotted the stress tensor and calculated the nematic order to check whether there was radial or encircling tension alignment in the tracheal pit, but found none.

      (7) Figure 6c: What does the junction color represent here?

      The junction color represents the relative junctional tension. We updated the figure legend.

      (8) Figure 6d-e: It is challenging to understand which error bar corresponds to each dataset.

      We updated the figure.

      (9) What is the definition of relative pressure?

      The geometrical tension inference method assumes that the tissue is in mechanical equilibrium and that the sum of the junctional tensions and cell pressures pulling/pushing a vertex (tricellular junction) is 0. Therefore the calculated tensions and pressures are determined only relative to each other, not as absolute values. We added it to the 3D Bayesian tension inference section of the Methods as below.

      P24L567: “Since Equation 13 and Equation 14 only evaluate the balance among the forces, they cannot estimate an absolute value but a relative value of the tension and pressure.”
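
      Why a force balance yields only relative values can be seen in a toy case. The sketch below is not the inference method of the study: it assumes a single vertex where three straight junctions meet, omits the cell pressures, and takes the junction angles in counterclockwise order with every gap smaller than pi. The function names are hypothetical.

```python
import math


def net_force(angles, tensions):
    """Sum of the tension vectors pulling on one vertex."""
    fx = sum(t * math.cos(a) for a, t in zip(angles, tensions))
    fy = sum(t * math.sin(a) for a, t in zip(angles, tensions))
    return fx, fy


def relative_tensions(angles):
    """Tensions that balance at the vertex, normalized to mean 1.

    Each tension is proportional to the sine of the angle between the
    other two junctions (Lami's theorem).
    """
    a1, a2, a3 = angles
    raw = (math.sin(a3 - a2), math.sin(a1 - a3), math.sin(a2 - a1))
    mean = sum(raw) / 3.0
    return tuple(t / mean for t in raw)


angles = (0.0, 2.0, 4.2)  # junction directions in radians
tensions = relative_tensions(angles)
scaled = tuple(7.0 * t for t in tensions)  # any common factor also balances
```

      Both `tensions` and `scaled` give a vanishing net force, so the balance condition alone cannot distinguish between them; only the ratios among the tensions are fixed, and a normalization such as the mean of 1 used here has to be chosen separately.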

      (10) In the main text, it is mentioned that a large Es (apical elastic constant) leads to flat surfaces, avoiding bending, but the abstract says "strong apical surface tension," which, according to the rest of the text, would seem to be J_apical. Clarification is needed.

      The surface tension includes both the surface contractility and the surface elasticity.

      We added it to the Extended cellular Potts model to simulate epithelial deformations section in the Results as below.

      P3L122: “Note that in some studies the tension and the contractility are considered as equivalent, but they are distinguished in this study.”

      and

      P4L151: “The energy H included only the terms of the contact energy (Equation 1) and the area constraint (Equation 5), but neither the surface elasticity (Equation 2) nor (Equation 3) was included, and thus the surface tension was determined by the contact energy.”

      Reviewer #2 (Recommendations For The Authors):

      (1) The model used is rather specific and it is rather confusing whether the issue is in the methodology or fundamental biophysics of apical constriction. For instance, one of the main narratives of the manuscript is that the Cellular Potts model better predicts apical constriction and tissue invagination than the vertex model. As I understand it, and as the authors state in p7 (line 210), "the difference between the vertex model and the cellular Potts model results was due to the straight lateral surface...". I assume that if apical constriction and tissue invagination were modelled with a vertex model with curved edges, while also allowing for cell rearrangements out of the tissue plane (some sort of epithelium-to-mesenchyme transition), the vertex model would yield exactly the same results as in the authors' cellular Potts model. If my understanding is correct, the authors should change the narrative of their manuscript and focus more on the comparison of a model with flat vs. curved edges, with "contractility" vs. "surface elasticity", with patterned apical contractility vs. non-patterned contractility (see my comment in point 2 below)... and not on comparison between CPM and VM.

      We appreciate the comments. The reviewer is correct that the vertex model can include the curved edges and the cell rearrangement, and it would reproduce the result of our cellular Potts model simulations. For the cellular Potts model, there was no need to specifically design how much the cell surface could be curved in a large arc, zigzag, or other shape, and that enabled us to find the conditions of delamination and bending.

      We added it to the discussion as below.

      P12L311: “Note that the vertex model could also be extended to incorporate the curved edges and rearrangement of the cells by specifically programming them, and would reproduce the cell delamination. That is, we could find the importance of the balanced pressure because the cellular Potts model intrinsically included a high degree of freedom for the cell shape, the cell rearrangement, and the fluctuation.”

      (2) About physics... and I think this is a really important point: one of the observations in the model was that in the "contractility" model, only "edge cells" shrank their apical surface, while inner cells remained quadrilateral. Related to this, the authors say that one of the requirements for proper apical constriction is a mechanism that "simultaneously shrinks the apical surface among cells in a cluster". What would happen if the authors assumed patterned contractility, meaning that cells in the center of the cluster would be most apically-contractile, while those further away from the center, would not be contractile? Features like this were investigated in studies of ventral-furrow invagination [see, for instance, Spahn and Reuter PLOS ONE (2013) and Rauzi et al. Nat Commun (2015)-Fig. S13d].

      We thank the reviewer for the critical comment, and ran simulations with the patterned apical contractility. The apical contractility following a parabola-shaped gradient succeeded in the simultaneous apical shrinkage. However, it was weak against fluctuations and the cells were delaminated by chance.

      We added it to the Apical constriction by modified apical elasticity section in the Results as below.

      P9L252: “We also tested another model for the simultaneous apical shrinkage, a gradient contractility model (Spahn and Reuter, 2013; Rauzi et al., 2015). If the inner cells bear higher apical surface contractility than the edge cells, the inner cells may shrink their apical surface. To synchronize the apical shrinkage, the apical contractility must follow a parabola shape gradient. Even though the gradient contractility enabled the cells to shrink the apical surface simultaneously, often some of the cells shrank faster than neighbors and were delaminated by chance (Figure 4—figure supplement 1).”
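
      For readers who want a concrete picture of such a profile, the following is a minimal sketch of a parabola-shaped contractility gradient across a row of cells. The function name, the parameterization by a center value and an edge value, and the numbers are assumptions made for illustration and are not taken from the manuscript.

```python
def parabolic_contractility(n_cells, j_center, j_edge):
    """Apical contractility per cell, highest at the center of the row.

    The profile is a parabola in the cell index that equals j_edge at
    the two outermost cells and j_center at the middle of the row.
    """
    if n_cells == 1:
        return [j_center]
    half = (n_cells - 1) / 2.0
    return [
        j_edge + (j_center - j_edge) * (1.0 - ((i - half) / half) ** 2)
        for i in range(n_cells)
    ]


profile = parabolic_contractility(5, j_center=2.0, j_edge=1.0)
```

      In this sketch the profile is symmetric about the middle cell, so the inner cells bear higher contractility than the edge cells, in contrast to a step-like pattern in which all constricting cells share one value.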

      (3) The quality of the figures should be improved. Especially, Figure 3 and the related explanation in lines 183-192. This explanation is way too complicated and it is not clear what Figure 3c shows. For instance: if the arrows are indeed showing contractile forces (as written in the caption) then they are not illustrated correctly, but should be tangential to the cell membrane.

      We updated the figure.

      (4) The figures mostly show steady-state cross-sections from simulations. I miss a more dedicated study with model parameters being varied through wider ranges and some phase diagrams being shown etc. Also, some results could probably be supported by analytic calculations. For instance, the condition for stability (discussed in p4 lines 145-151), cells' preferred aspect ratio, cells' preferred "wedgeness" i.e., local curvature etc... I am sure some of these, if not all, could be calculated analytically and then these analytic results could help to interpret the phase diagrams.

      For the simulation results shown in the figures, we were not sure whether the simulation results were in a steady state or not. We added it to the Tissue deformation by increased apical contractility simulated with cellular Potts model section in the Results as below.

      P4L170: “A simulation started from a flat monolayer of cells beneath the apical ECM, and was continued until resulting deformation of cells and tissue could be evaluated for success or failure of reproducing the apical constriction.”

      For the ranges of parameters, we ran the simulations over a wider range and showed results from a sub-range. We added it to the Parameters for the simulations section in the Methods as below.

      P18L464: “The parameters were varied in a range, and the figures showed simulations with parameter values within a sub-range so that the results showed both success and failure in a development of interest.”

      For the analytical calculations, Figure 3f shows a kind of phase diagram for the shapes of a single cell. To clarify this, we rephrased “map of cell shapes” to “Phase diagram of cell shapes” in the figure legend, and added an explanation to the Results section as below.

      P6L207: “For the analysis of the cell shape in motion, we plotted a phase diagram for shapes of a single cell (Figure 3f).”

      For the analytical evaluation of the cellular Potts model simulations, a previous study performed a similar analysis, but it concerned a cell of isotropic shape in a steady state (Magno et al., 2015, doi:10.1186/s13628-015-0022-x). Also, our simulation framework is computationally expensive and we could not vary the parameters at fine resolution. Therefore we could not include it in this study.

      (5) I am not sure about the terminology "contractility" vs. "elasticity". In Farhadifar et al. (2007) "contractility" is described by a squared apical-perimeter energy term, while in this work, the authors describe it by a surface-energy-like term.

      In general, elasticity is the ability of a material to resist deformation and to return to its original shape/size. In Farhadifar et al. (2007), the cell apical area was assigned an area elasticity in this sense. Contractility is the ability to decrease the size/length, and thus it can be expressed as either a linear or a quadratic term, depending on the model. In this study, we assumed that cell-cell/cell-ECM adhesion and myosin activity generate the surface contractility, and thus employed the linear expression. In Farhadifar et al. (2007), this was described as a line tension.

      We used the terms surface ‘elasticity’ and ‘contractility’ as distinctive elements composing the surface ‘tension’. We added this to the Extended cellular Potts model to simulate epithelial deformations section in the Results as below.

      P3L122: “Note that in some studies the tension and the contractility are considered as equivalent, but they are distinguished in this study.”

      (6) It is not entirely clear what are apical, basal, lateral, and cell "perimeters". This is a 2D model, so I assume all P-s are in fact interface lengths. In either case, this needs to be explained more clearly.

      We updated the explanation in the Extended cellular Potts model to simulate epithelial deformations section in the Results as below.

      P3L111: “The cell's perimeter was partitioned automatically based on adjacency with other cells, and it was marked as apical, lateral, basal. Also, apico-lateral sites were marked as a location for the adherens junction. This cell representation also cast the vertical section of the cell. Therefore an area of the cell corresponded with a body of the cell, and a perimeter of the cell corresponded with the cell surface. Likewise the apical, lateral, and basal parts of the perimeter corresponded with the apical surface, cell-cell interface, and the basal surface of the cell respectively.”

      (7) The term H_{mc} is not clear at all. Why is this term called potential energy? What is U(i)? What is the exact biophysical interpretation of this term in 2D vs 3D?

      In 3D, the supracellular myosin cable is formed encircling the cells deformed by the apical constriction. Shrinking of the supracellular myosin cable makes the circle small, and it moves the cable toward the center of the circle. To simulate this motion of the supracellular myosin cable in the 2D cross section, we assigned the force exerted on the adherens junction of the boundary cells pulling toward the center, and because the force is relative to the position of the adherens junction and the center, it was expressed by the potential energy in the simulation.

      We updated the Extended cellular Potts model to simulate epithelial deformation section in the Results and the Cellular Potts model with potential energy section in the Methods as below.

      P4L140: “The potential energy was defined by a scalar field which made a horizontal gradient decreasing toward the center,”

      and

      P17L449: “In 3D, tension on a circular actomyosin cable would shrink the circle, and the shrinkage would pull the cable toward the center of the circle. In 2D cross section, the cable is pulled horizontally toward the middle line.”
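
      How a potential energy of this kind acts in the 2D cross section can be sketched as follows. The linear dependence on the horizontal distance from the middle line, the function names, and the parameter `strength` are assumptions made for illustration; the manuscript only states that the scalar field forms a horizontal gradient decreasing toward the center.

```python
def cable_potential(x, x_center, strength):
    """Assumed potential: grows linearly with distance from the middle line."""
    return strength * abs(x - x_center)


def energy_change(x_old, x_new, x_center, strength):
    """Change in potential energy when an adherens junction site moves."""
    return cable_potential(x_new, x_center, strength) - cable_potential(
        x_old, x_center, strength
    )


# A junction site 10 lattice units to the right of the middle line.
toward = energy_change(10.0, 9.0, x_center=0.0, strength=2.0)
away = energy_change(10.0, 11.0, x_center=0.0, strength=2.0)
```

      A move toward the middle line lowers the energy and a move away from it raises the energy by the same amount, so in a Metropolis-type update, which is the usual scheme in cellular Potts models, inward moves of the boundary cells' adherens junctions are favored from both sides.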

      (8) Highten->increased

      We updated the text.

      (9) "It seems natural to consider that the myosin generates a force proportional to its density but not to the surface width nor the strain". This sentence should be supported by a reference. Also, if the force is proportional to myosin density, then it must depend on surface width, since density, I assume, is the number of motors per area.

      For the myosin density and generated force, in all preceding studies cited in this manuscript and in others to the best of our knowledge, the density of myosin and actin filaments visualized by staining or labeling has been assumed to be relevant to the generated contractility without references. Therefore it might be a well-established and widely shared assumption.

      For the independence from the surface width and strain, the reviewer's comment is correct, but the results would be the same. If we presumed that the number of motors on the apical surface was constant in a cell during the apical constriction, then the density would increase when the apical surface was contracted, and thus it would make the apical contractility more unbalanced and promote the delamination. We added it to the results and discussion as below.

      P4L166: “For the sake of simplicity, we ignored an effect of the constriction on the apical myosin density, and discussed it later.”

      P14L328: “In our model, for the sake of simplicity, we ignored an effect of the constriction on the apical myosin density. If we presumed that the apical myosin would be condensed by the shrinkage of the apical surface, it would increase the apical tension in the shrinking cell and is expected to promote the cell delamination further. Therefore it would not change the results.”

      Reviewing Editor (Recommendations For The Authors):

      Please note also the following excerpts from discussions amongst the reviewers and the Reviewing Editor:

      Regarding Reviewer #2's Point 2:

      I believe the authors have assumed patterned contractility in their simulations, and this is shown by the "pale blue" cell color (see also lines 162-163). However, as Reviewer #2 points out in their point 2), the pale colors are very hard to see and therefore easy to miss.

      We updated the figure coloring and also added the gradient pattern of contractility.

      Regarding Reviewer #2's point 5:

      It is indeed unconventional to call the "J" terms contractility, they are usually called contact energy or adhesive energy.

      In this study, we included both the contact energy of cell-cell/cell-ECM adhesion and the actomyosin activity in the surface contractility, and used the “J” term as is conventional in the cellular Potts model.

      On the other hand, due to the parameters chosen for J_apical and J_basal in the pale blue cells, the apical membrane area will tend to shrink and the basal membrane will tend to enlarge. Because the lateral membrane energy J_lateral is constant among all cells (I think?), this will effectively drive cells to apically contract in the center.

      That expectation was an initial motivation of our study, but we found that the differential J alone could not drive the cells to apically contract in the center.

      I agree that extra clarification by the authors would be very helpful here.

      Reviewer #2:

      Regarding the patterned contractility: indeed, I missed this point (the pale blue region is really poorly visible).

      Nevertheless, it seems that contractility in the authors' model changes in a step-like fashion.

      [...] There may be important differences between furrowing under step-like patterning profile versus smooth "bell-like" patterning (see Supplementary Figure 13 in Rauzi et al. Nat Commun 2015). In particular, in the case of a step-like patterning, [there are] constrictions of side cells (similar to what the authors in this manuscript report), whereas in the bell-like patterning, [...] such side constrictions [do not occur].

      As stated in our reply to Reviewer #2's comment (2), we added the simulations with gradient-pattern contractility.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Main points:

      (1) We have added data for fructose in Fig. 1

      (2) We have added statistics (red stars and NS) comparing Tp between fed and refed flies.

      (3) We have modified the figures so that each data point is shown as a small open circle.

      (4) We have moved the data from Fig. S3 to Fig. 2 and 3.

      (5) We have added the schematic diagrams depicting the behavioral assay in Fig. S1.

      (6) We have added heatmaps for WT and Gr64f-Gal4>UAS-CsChrimson flies in Fig. S2.

      (7) We have added Orco1 mutant data in Fig. S4.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper presents valuable findings that gustation and feeding state influence the environmental temperature preference in flies. Interestingly, the authors showed that by refeeding starved animals with the non-nutritive sugar sucralose, they are able to tune their preference towards a higher temperature in addition to nutrient-dependent warm preference. The authors show that temperature-sensing and sweet-sensing gustatory neurons (SGNs) are involved in the former but not the latter. In addition, their data indicate that peptidergic signals involved in internal state and clock genes are required for taste-dependent warm preference behavior.

      The authors made an analogy of their results to the cephalic phase response (CPR) in mammals where the thought, sight, and taste of food prepare the animal for the consumption of food and nutrients. They further linked this behavior to core regulatory genes and peptides controlling hunger and sleep in flies having homologues in mammals. These valuable behavioral results can be further investigated in flies with the advantage of being able to dissect the neural circuitry underlying CPR and nutrient homeostasis.

      Strengths: 

      (1) The authors convincingly showed that tasting is sufficient to drive warm temperature preference behavior in starved flies and that it is independent of nutrient-driven warm preference. 

      (2) By using the genetic manipulation of key internal sensors and genes controlling internal feeding and sleep states such as DH44 neurons and the per genes for example, the authors linked gustation and temperature preference behavior control to the internal state of the animal. 

      Weaknesses: 

      (1) The title is somewhat misleading, as the term homeostatic temperature control linked to gustation only applies to starved flies. 

      We agree with the reviewer's suggestion and have changed the title to "Taste triggers a homeostatic temperature control in hungry flies".

      (2) The authors used a temperature preference assay and refeeding for 5 minutes, 10 minutes, and 1 hour.

      Experimentally, it makes a difference if the flies are tested immediately after 10 minutes or at the same 3me point as flies allowed to feed for 1 hour. Is 10 minutes enough to change the internal state in a nutrition-dependent manner? Some of the authors' data hint at it (e.g. refeeding with fly food for 10 minutes), but it might be relevant to feed for 5/10 minutes and wait for 55/50min to do the assays at comparable time points. 

      Thank you for your suggestions. The temperature preference behavioral test itself takes 30 minutes from the time the flies are placed in the apparatus until the final choice is made. This means that after the hungry flies have been refed for 5 minutes, they will determine their preferred temperature within 35 minutes. It has been shown that insulin levels peak at 10 minutes and gradually decline (Tsao, et al., PLoS Genetics 2023). However, it is unclear how subtle changes in insulin levels affect behavior and how quickly the flies are able to consume food. These factors may contribute to temperature preference in flies. Therefore, to minimize "extraneous" effects, we decided to run the behavioral assay immediately after the flies had eaten the food. We have noted in the Materials and Methods section why we chose this condition, based on the duration of the behavior and the insulin effect. 

      (3) A figure depicting the temperature preference assay in Figure 1 would help illustrate the experimental approach. It is also not clear why Figure 1E is shown instead of full statistics on the individual panels shown above (the data is the same). 

      We have revised Figure 1A and added statistics in Figure 1BCD. We also added a figure depicting the temperature preference assay (Fig. S1).

      (4) The authors state that feeding rate and amount were not changed with sucralose and glucose. However, the FLIC assay they employed does not measure consumption, so this statement is not correct, and it is unclear if the intake of sucralose and glucose is indeed comparable. This limits some of the conclusions. 

      We agree. We have removed “amount” and revised the MS. 

      (5) The authors make a distinction between taste-induced and nutrient-induced warm preference. Yet the statistics in most figures only show the significance between the starved and refed flies, not the fed controls. As the recovery is in many cases incomplete and used as a distinction of nutritive vs nonnutritive signals (see Figure 1E) it will be important to also show these additional statistics to allow conclusions about how complete the recovery is. 

      We agree with the comments and have revised the MS and figures. 

      (6) The starvation period used is ranging from 1 to 3 days, as in some cases no effect was seen upon 1 day of starvation (e.g. with clock genes or temperature sensing neurons). While the authors do provide a comparison between 18-21 and 26-29 hours old flies in Figure S1, a comparison for 42-49 and 66-69 hours of starvation is missing. This also limits the conclusion as the "state" of the animal is likely quite different after 1 day vs. 3 days of starvation and, as stated by the authors, many flies die under these conditions.  

      We mainly used two overnights of starvation. However, some flies (e.g. Ilp6 mutants) remained completely healthy even after two overnights of starvation, so we had to starve them for three overnights; Ilp6 mutants needed three overnights of starvation to show a significant difference in Tp between fed and starved flies. On the other hand, some flies (e.g. w1118 control flies) were very sick after two overnights of starvation, so we had to starve them for one overnight. Therefore, the starvation conditions used in this manuscript range from one to three overnights.

      First, we determined the starvation time by focusing on Tp, choosing the duration that resulted in a statistically significant Tp difference between fed and starved flies; as mentioned above, flies prefer lower temperatures when starvation is prolonged (Umezaki et al., Current Biology 2018). Therefore, if Tp was not statistically different between fed and starved flies, we extended the starvation time from one up to three overnights. Importantly, we show in Fig. S3 that the duration of starvation did not affect the recovery effect. Furthermore, since control flies do not survive 42-49 or 66-69 hours of starvation, we cannot test the reviewer's suggestion. We have carefully documented the conditions in the Materials and Methods and figure legends.

      (7) In Figure 2, glucose-induced refeeding was not tested in Gr mutants or silenced animals, which would hint at post-ingestive recovery mechanisms related to nutritional intake. This is only shown later (in Figure S3) but I think it would be more fitting to address this point here. The data presented in Figure S3 regarding the taste-evoked vs nutrient-dependent warm preference is quite important while in some parts preliminary. It would nonetheless be justified to put this data in the main figures. However, some of the conclusions here are not fully supported, in part due to different and low n numbers, which due to the inherent variability of the behavior do not allow statistically sound conclusions. The authors claim that sweet GRNs are only involved in taste-induced warm preference, however, glucose is also nutritive but, in several cases, does not rescue warm preference at all upon removal of GRN function (see Figures S3A-C). This indicates that the Gal4 lines and also the involved GRs are potentially expressed in tissues/neurons required for internal nutrient sensing. 

      Thank you for your suggestion. We have added Figure S3ABC (glucose refeeding using Gr mutants and silenced animals) to Figure 2. The N numbers are not low: each condition was tested more than 5 times, i.e. >100 flies were tested. The variation in Tp is probably due to the effect of starvation on temperature preference. 

      We did not state that "The authors claim that sweet GRNs are only involved in taste-induced warm preference...". However, our writing may not have been clear enough. We agree that "...GRs may be expressed in tissues/neurons required for internal nutrient sensing. ..."  We have rewritten and revised the section.  

      (8) In Figure 4, fly food and glucose refeeding do not fully recover temperature preference after refeeding. With the statistical comparison to the fed control missing, this result is not consistent with the statement made in line 252. I feel this is an important point to distinguish between state-dependent and taste/nutrition-dependent changes.  

      We have added the statistics comparing the Fed condition with the other conditions. 

      (9) The conclusion that clock genes are required for taste-evoked warm preference is limited by the observation that they ingest less sucralose. In addition, the FLIC assay does not allow conclusions about the feeding amount, only the number of food interactions. Therefore, I think these results do not allow clear-cut conclusions about the impact of clock genes in this assay.  

      We agree. We have removed “amount” and revised the MS. The per01 mutants ate (touched) sucralose more often than glucose. On the other hand, tim01 mutants ate glucose more often than sucralose (Figure S6BC). However, these mutants still showed a similar Tp pattern for sucralose and glucose refeeding (Fig. 5CD). The results suggest that tim01 flies eat enough sucralose relative to glucose that their food intake does not affect the Tp behavioral phenotype. We have rewritten and revised the section.

      (10) CPR is known to be influenced by taste, thought, smell, and sight of food. As the discussion focused extensively on the CPR link to flies it would be interesting to find out whether the smell and sight of food also influence temperature preference behavior in animals with different feeding states.  

      We have added data using the Olfactory receptor co-receptor (Orco1) mutant, which lacks olfaction, in Fig. S4. These flies failed to show the taste-evoked warm preference, but exhibited the nutrient-induced warm preference. Therefore, the data suggest that olfactory detection is also involved in taste-evoked warm preference. On the other hand, "seeing food" is probably more complicated, since light dramatically affects temperature preference behavior and the circadian clock that regulates temperature preference rhythms. Therefore, it is unlikely that a solid conclusion can be drawn from a short set of experiments. We will address this issue in the next study.

      (11) In the discussion in line 410ff the authors claim that "internal state is more likely to be associated with taste-evoked warm preference than nutrient-induced warm preference." This statement is not clear to me, as neuropeptides are involved in mediating internal state signals, both in the brain itself as well as from gut to brain. Thus, neuropeptidergic signals are also involved in nutrient-dependent state changes, the authors might just not have identified the peptides involved here. The global and developmental removal of these signals also limits the conclusions that can be drawn from the experiments, as many of these signals affect different states, circuits, and developmental progression.  

      We agree with the comments. We have removed the sentences and revised the MS.  

      Reviewer #2 (Public Review): 

      Animals constantly adjust their behavior and physiology based on internal states. Hungry animals, desperate for food, exhibit physiological changes immediately upon sensing, smelling, or chewing food, known as the cephalic phase response (CPR), involving processes like increased saliva and gastrointestinal secretions. While starvation lowers body temperature, the mechanisms underlying how the sensation of food without nutrients induces behavioral responses remain unclear. Hunger stress induces changes in both behavior and physiological responses, which in flies (or at least in Drosophila melanogaster) leads to a preference for lower temperatures, analogous to the hunger-driven lower body temperature observed in mammals. In this manuscript, the authors have used Drosophila melanogaster to investigate the issue of whether taste cues can robustly trigger behavioral recovery of temperature preference in starving animals. The authors find that food detection triggers a warm preference in flies. Starved flies recover their temperature preference after food intake, with a distinction between partial and full recovery based on the duration of refeeding. Sucralose, an artificial sweetener, induces a warm preference, suggesting the importance of food-sensing cues. The paper compares the effects of sucralose and glucose refeeding, indicating that both taste cues and nutrients contribute to temperature preference recovery. The authors show that sweet gustatory receptors (Grs) and sweet GRNs (Gustatory Receptor Neurons) play a crucial role in taste-evoked warm preference. Optogenetic experiments with CsChrimson support the idea that the excitation of sweet GRNs leads to a warm preference. The authors then examine the internal state's influence on taste-evoked warm preference, focusing on neuropeptide F (NPF) and small neuropeptide F (sNPF), analogous to mammalian neuropeptide Y. 
      Mutations in NPF and sNPF result in a failure to exhibit taste-evoked warm preference, emphasizing their role in this process. However, these neuropeptides appear not to be critical for nutrient-induced warm preference, as indicated by increased temperature preference during glucose and fly food refeeding in mutant flies. The authors also explore the role of hunger-related factors in regulating taste-evoked warm preference. Hunger signals, including diuretic hormone (DH44) and adipokinetic hormone (AKH) neurons, are found to be essential for taste-evoked warm preference but not for nutrient-induced warm preference. Additionally, insulin-like peptides 6 (Ilp6) and Unpaired3 (Upd3), related to nutritional stress, are identified as crucial for taste-evoked warm preference. The investigation then extends into circadian rhythms, revealing that taste-evoked warm preference does not align with the feeding rhythm. While flies exhibit a rhythmic feeding pattern, taste-evoked warm preference occurs consistently, suggesting a lack of parallel coordination. Clock genes, crucial for circadian rhythms, are found to be necessary for taste-evoked warm preference but not for nutrient-induced warm preference. 

      Strengths: 

      A well-written and interesting study, investigating an intriguing issue. The claims, none of which to the best of my knowledge controversial, are backed by a substantial number of experiments. 

      Weakness: 

      The experimental setup used and the procedures for assessing the temperature preferences of flies are rather sparingly described. Additional details and data presentation would enhance the clarity and replicability of the study. I kindly request the authors to consider the following points: 

      i) A schematic drawing or diagram illustrating the experimental setup for the temperature preference assay would greatly aid readers in understanding the spatial arrangement of the apparatus, temperature points, and the positioning of flies during the assay. The drawing should also be accompanied by specific details about the setup (dimensions, material, etc). 

      Thank you for your suggestions. We have added the schematic drawing in Fig. S1.

      ii) It would be beneficial to include a visual representation of the distribution of flies within the temperature gradient on the apparatus. A graphical representation, such as heatmaps or histograms, showing the percentage of flies within each one-degree temperature bin, would offer insights into the preferences and behaviors of the flies during the assay. In addition to the detailed description of the assay and data analysis, the inclusion of actual data plots, especially for key findings or representative trials, would provide readers with a more direct visualization of the experimental outcomes. These additions will not only enhance the clarity of the presented information but also provide the reader with a more comprehensive understanding of the experimental setup and results. I appreciate the authors' attention to these points and look forward to the potential inclusion of these elements in the revised manuscript. 

      Thank you for the advice. We have added the heat map for WT and Gr64fGal4>CsChrimson data in Fig. S2. 
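
      The per-bin distribution requested above can be illustrated with a minimal sketch. It assumes each fly's final position has already been converted to a temperature in °C. The function name, the 17-33 °C gradient range, and the example values are hypothetical and are not taken from the manuscript.

```python
import math
from collections import Counter


def percent_per_degree_bin(temps_c, t_min=17, t_max=33):
    """Return {bin_start: percent of flies} for one-degree bins [t, t+1).

    t_min and t_max are assumed gradient limits, not values from the study.
    Flies outside [t_min, t_max) are ignored.
    """
    counts = Counter(math.floor(t) for t in temps_c if t_min <= t < t_max)
    total = sum(counts.values())
    if total == 0:
        return {b: 0.0 for b in range(t_min, t_max)}
    return {b: 100.0 * counts.get(b, 0) / total for b in range(t_min, t_max)}


# Hypothetical positions (°C) for one trial, invented purely for illustration.
example_trial = [21.2, 21.8, 22.4, 22.9, 23.1, 24.6, 25.0, 25.3]
dist = percent_per_degree_bin(example_trial)

# Print only the occupied bins as a simple text histogram.
for bin_start, pct in dist.items():
    if pct > 0:
        print(f"{bin_start}-{bin_start + 1} °C: {pct:.1f}%")
```

      Stacking one such row per replicate would give the kind of heatmap the reviewer describes; a plotting library would be needed for the figure itself.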

      Reviewer #3 (Public Review): 

      Summary: 

      The manuscript by Yujiro Umezaki and colleagues aims to describe how taste stimuli influence temperature preference in Drosophila. Under starvation flies display a strong preference for cooler temperatures than under fed conditions that can be reversed by refeeding, demonstrating the strong impact of metabolism on temperature preference. In their present study, Umezaki and colleagues observed that such changes in temperature preference are not solely triggered by the metabolic state of the animal but that gustatory circuits and peptidergic signalling play a pivotal role in gustation-evoked alteration in temperature preference. 

      The study of Umezaki is definitively interesting and the findings in this manuscript will be of interest to a broad readership. 

      Strengths: 

      The authors demonstrate interesting new data on how taste input can influence temperature preference during starvation. They propose how gustatory pathways may work together with thermosensitive neurons, peptidergic neurons and finally try to bridge the gap between these neurons and clock genes. The study is very interesting and the data for each experiment alone are very convincing. 

      Weaknesses: 

      In my opinion, the authors have opened many new questions but did not fully answer the initial question - how do taste-sensing neurons influence temperature preferences? What are the mechanisms underlying this observation? Instead of jumping from gustatory neurons to thermosensitive neurons to peptidergic neurons to clock genes, the authors should have stayed within the one question they were asking at the beginning. How does sugar sensing influence the physiology of thermosensation in order to change temperature preference? Before addressing all the following questions of the manuscript, the authors should first directly decipher the neuronal interplay between these two types of neurons. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Figure S3D is cited before S2, so please rearrange the numbering.

      Thank you. We have changed the numbering.

      I would also suggest a different color to visualize the data points in Figure S3, as some are barely visible on the dark bars (e.g. on a dark green background). 

      We have revised the figures. The data points were changed to smaller open circles. 

      Reviewer #2 (Recommendations For The Authors): 

      *Please, expand on the experimental procedure, and describe the assay in detail. 

      We have added a scheme for the assay in Fig. S1 and also have revised the manuscript and figures.

      *Show the distribution of the gradient data that the preference values are based upon. Not necessarily for all, but for select key experiments. Heatmaps for each replicate (stacked on top of each other) would be a nice way of showing this. Simple histograms would of course work as well. 

      We have added heatmaps of selected key experiments in Fig. S2. We have revised the manuscript and figures correspondingly.

      Reviewer #3 (Recommendations For The Authors): 

      The manuscript by Yujiro Umezaki and colleagues aims to describe how taste stimuli influence temperature preference in Drosophila. Under starvation, flies display a strong preference for cooler temperatures than under fed conditions that can be reversed by refeeding, demonstrating the strong impact of metabolism on temperature preference. In their present study, Umezaki and colleagues observed that such changes in temperature preference are not solely triggered by the metabolic state of the animal but that gustatory circuits play a pivotal role in temperature preference. The study of Umezaki is definitively interesting and the findings in this manuscript will be of interest to a broad readership. However, I would like to draw the authors' attention to some points of concern: 

      The title to me sounds somehow inadequate. The definition of homeostasis (Cambridge Dictionary) is as follows: "the ability or tendency of a living organism, cell, or group to keep the conditions INSIDE it the same despite any changes in the conditions around it, or this state of internal balance". What do the authors mean by homeostatic temperature control? Reading the title not knowing much about poikilotherm insects I would understand that the authors claim that Drosophila can indeed keep a temperature homeostasis as mammals do. As Drosophila is not a homoiotherm animal and thus cannot keep its body temperature stable the title should be amended.  

      Homeostasis means a state of balance between all the body systems necessary for the body to survive and function properly. Drosophila are ectotherms, so the source of temperature comes from the environment, and their body temperature is very similar to that of their environment. However, the flies' temperature regulation is not simply a passive response to temperature. Instead, they actively seek a temperature based on their internal state. We have shown that the preferred temperature increases during the day and decreases during the night, showing a circadian rhythm of temperature preference (TPR). Because their environmental temperature is very close to their body temperature, TPR gives rise to body temperature rhythms (BTR). We have shown that TPR is similar to BTR in mammals (Kaneko et al., Current Biology 2012 and Goda et al., JBR 2023). Similarly, we showed that hungry flies choose a lower temperature so that the body temperature is also lower. Therefore, our data suggest that the fly maintains its homeostasis by using the environmental temperature to adjust its body temperature to an appropriate temperature depending on its internal state. We would therefore like to keep the title as "Taste triggers a homeostatic temperature control in hungry flies". We have added more explanation in the Introduction and Discussion.

      Accordingly, the authors compare the preference of flies to cooler temperatures to the reduced body temperature of mammals (Lines 64 - 65). However, according to the cited literature the reduced body temperature in starved rats is discussed to reduce metabolic heat production (Sakurada et al., 2000). The authors should more rigorously give a short summary of the findings in the cited papers and the original interpretation to help the reader not get confused.

      In flies, it has been shown that a lower temperature means a lower metabolic rate, and a higher temperature means a higher metabolic rate. Therefore, hungry flies choose a lower temperature where their metabolic rate is lower and they do not need as much heat.

      Similarly, in mammals, starvation causes a lower body temperature, hypothermia. Body temperature is controlled by the balance between heat loss and heat production. The starved mammals showed lower heat production. We have added this information to the introduction. 

      The authors show that 5 min fly food refeeding causes a partial recovery of the naïve temperature preference of the flies (Figure 1B) and that feeding of sucralose partially rescues the preference whereas glucose rescues the preference similar to refeeding with fly food would do. As glucose is both sweet and metabolically valuable it would be clearer for the reader if the authors start with the fly food experiment and then show the glucose experiment to show that the altered temperature preference depends on the food component glucose. From there they can further argue that glucose is both sweet (hedonic value) and metabolically valuable. And to disentangle sweetness from metabolism one needs a sugar that is sweet but cannot be metabolized - sucralose. 

      Thank you for your advice. Since the data with sucralose is the one we want to highlight the most, we decided to present it in the order of sucralose, glucose, and fly food.

      In the sucralose experiment the authors omit the 5 min data point and only show the 10 min time point. As Figure 1F indicates that both Glucose and Sucralose elicit the same attractiveness in the flies and that sweetness influences the temperature preference, it is important that the authors show the 5 min temperature preference too to underline the effect of the sweet taste stimulus on the fly behavior independent from the caloric value. Further, the authors should demonstrate not only the cumulative touches but how much sucralose or glucose may already be consumed by the fly in the depicted time frames. 

      It would be interesting to see how much sucralose or glucose the flies consume over the time frames shown. Although the cumulative exposure to sugar is ideally equivalent to the amount of sugar consumed, we need a different method to actually measure that amount. We now emphasize "cumulative touches" rather than "amount of sugar" in the text. In the next study, we will look at how much sucralose or glucose the fly has already consumed.

      Sucralose and Glucose have a similar molecular structure - it would be interesting to see how the sweet taste of a sugar with a different molecular structure like fructose and its receptor Gr43b (Miyamoto & Amrein 2014) may contribute to temperature preferences.  

      Sucralose and Glucose are not structurally similar. That said, we tested fructose refeeding anyway. The hungry flies showed a taste-evoked warm preference after fructose refeeding. We have added data in Figure 1E and F. The data suggest that sweet taste is more important than sugar structure. We also tested Gr43b>CsChrimson. However, the flies do not show the taste-evoked warm preference (data not shown). The data suggest that Gr43b is not the major receptor controlling taste-evoked warm preference. We have revised the manuscript.

      Both sugars appear similarly attractive to the flies (Figure 1F) - are water, sucralose, and glucose presented in a choice assay or are these individually in separate experiments? 

      Water, sucralose, and glucose were individually presented in separate experiments. We clarified it in the figure legend.

      Subsequently, the authors address the question of how sweet taste may influence temperature preferences in flies. To this end, the authors first employ gustatory receptor mutants for Gr5a, Gr64a, and Gr61a and demonstrate that sucralose feeding does not rescue temperature preference in the absence of sweet taste receptors. In an alternative approach, the authors do not use mutants but an expression of UAS:Kir in Gr64F neurons. Taking a closer look at the graph it appears that the Kir expressing flies have an increased (nearly 1 °C) temperature preference than the starved mutant flies. Is this preference change related to the mutation directly and what would be the result if Kir would be conditionally only expressed after development is completed, or is the observed temperature preference related to the Gr64f-Gal4 line? If the latter would be the case perhaps the authors may want to bring the flies to the same genetic background to allow for a more direct comparison of the temperature preferences. 

      The Gr64fGal4>Kir flies show a ~one degree higher preferred temperature under starvation compared to the mutants. However, the phenotype is similar to the controls, Gr64fGal4/+ flies, under starvation. Therefore, this phenotype is not due to either the mutation or the Kir effect. Most importantly, the Gr64fGal4>Kir flies failed to show a taste-evoked warm preference. Together with other mutant data, we concluded that sweet GRNs are required for taste-evoked warm preference.

      Overall, the figure legend for Figure 2 is very cryptic and should be more detailed.

      We have revised the figure legend for Figure 2. 

      To shed light on the mechanisms underlying the changes in temperature preferences through gustatory stimuli the authors next blocked heat and cold sensing neurons in fed and starved flies and found out that TrpA1 expressing anterior cells and R11F02-Gal4 expressing neurons both participate in sweetness-induced alteration of temperature preference in starved animals. At this point, it should be explicitly indicated in the figure that the flies need more than one overnight starvation to display the behavior (Figure 3A). 

      We have revised the manuscript.

      The data provided by the authors indicate a kind of push-and-pull mechanism between heat and cold-sensing neurons under starvation that is somehow influenced by sweet taste sensing. Further, the authors demonstrate that TrpA1- as well as R11F02-Gal4 driven Chrimson activation is sufficient to partially rescue temperature preference under starvation. At this point it is unclear why the authors use a tubGal80ts expression system but not for the TrpA1SH-Gal4 driven Chrimson. As the development itself and the conditions under which the animals were raised may have influence on the temperature preference it is important that both groups are equally raised if the authors want to directly compare with each other. 

      As we wrote in the Materials and Methods, the R11F02-Gal4>uas-CsChrimson flies died during development. Therefore, we had to use tubGal80ts. On the other hand, the TrpA1-Gal4>CsChrimson flies survive to adulthood. As we mentioned in the MS, all flies were treated with ATR after they had fully developed into adults. This means that both TrpA1-Gal4 and R11F02-Gal4 expressing cells are activated by red light via CsChrimson only in adult stages. We have carefully revised the MS.

      It is a pity that the authors at this point have decided to not deepen the understanding of the circuitry between thermo-sensation and metabolic homeostasis but subsequently change the focus of their study to investigate how internal state influences taste-evoked warm preference in hungry flies. Using mutants for NPF and sNPF the authors demonstrate that both peptides play a pivotal role in taste-evoked warm preference after sucrose feeding but not for nutrient-induced warm preference. Similarly, they found that DH44, AKH and dILP6, Upd2 and Upd3 neurons are also required for taste-evoked warm preference but not for nutrient-induced warm preference. Here again, the authors do not keep the systems stable and change between inhibition of neurons through Kir and mutants for peptides. For a better comparison, it would be preferable to use always exactly the same technique to inhibit neuron signalling.

      It would be interesting to find the neural circuitry linking thermosensation and metabolic homeostasis, but we have not had any luck so far. We will continue to look into the neural circuits which control taste-evoked warm preference and nutrient-induced warm preference. Since UAS-Kir is such a strong effector, it sometimes kills the flies. We therefore could not use UAS-Kir for all Gal4 lines. 

      DH44 is expressed in the brain and in the abdominal ganglion where they share the expression pattern with 4 Lk neurons per hemisphere. Seeing the impact of Lk signalling in metabolism (AlAnzi et al., 2010) the authors should provide evidence that the observed effect is indeed because of DH44 and not Lk.

      It would be interesting to see if Lk may play a role in taste-evoked warm preference and/or nutrient-induced warm preference. We would like to systematically screen which neuropeptides and receptors are involved in the behavior in the next study. 

      Seeing the results on dILP6 it is interesting that Li and Gong (2015) could show in larvae that cold-sensing neurons directly interact with dILP neurons in the brain. It would be interesting to see whether similar circuitry may exist in adult flies to regulate temperature preferences and these peptidergic neurons. Further, it appears interesting that again these animals need much longer time to display the observed shift in temperature (which again should be clearly indicated in the figure legend too). These observations should be more carefully considered in the discussion part too.

      We have revised the manuscript.

      In the last part of the study, the authors investigate how sensory input from temperature-sensitive cells may transmit information to central clock neurons and how these in turn may influence temperature preference under starvation. The experiments assume that DH44-expressing neurons play a role in the output pathway of the central clock. Using the clock gene null mutants per and tim the authors show that even though the animals display a significant starvation response neither per nor tim mutants exhibited taste-evoked warm preference, indicating a taste but not nutrient-evoked temperature preference regulation. 

      The authors demonstrate interesting new data on how taste input can influence temperature preference during starvation. They propose how gustatory pathways may work together with thermosensitive neurons, peptidergic neurons and finally try to bridge the gap between these neurons and clock genes. The study is very interesting and the data for each experiment alone are very convincing. However, in my opinion, the authors have opened many new questions but did not fully answer the initial question - how do taste-sensing neurons influence temperature preferences? What are the mechanisms underlying this observation? Instead of jumping from gustatory neurons to thermosensitive neurons to peptidergic neurons to clock genes, the authors should have stayed within the one question they were asking at the beginning. How does sugar sensing influence the physiology of thermos-sensation? Before addressing all the following questions of the manuscript the authors should first directly decipher the neuronal interplay between these two types of neurons. 

      Thank you for your suggestion. It would be interesting to find the neural circuity of thermo-sensation and metabolic homeostasis. We have tried but there is no luck so far. 

      The authors could, e.g., employ Ca or cAMP imaging in anterior or cold-sensitive cells and see how the responsiveness of these cells may be altered after sugar feeding. Or at least follow the idea of Li and Gong about the thermoregulation of dILP-expressing neurons.

      Thank you for your suggestion. Since we do not know how dILP-expressing neurons are involved in the temperature response in adult flies, we will focus on these cells using calcium imaging in the next study.

      Anatomical analysis using the GRASP technique may further help to understand the interplay of these neurons and give new insights into the circuitry underlying food preference alteration under starvation. 

      Thank you for your suggestion. It would be interesting to find the neural circuitry linking thermosensation and metabolic homeostasis. We have tried, but without success so far.

      Minor comments: 

      Line 51: Hungry animals are desperate for food - I think the authors should not anthropomorphize at this point too much but rather strictly describe how the animals change their behavior without any interpretation of the mental state of the animal.

      We have modified the manuscript.

      Line 80: Hunger and satiety dramatically affect animal behavior and physiology and control feeding - please not only cite the papers but also give a short overview of the cited papers on which behaviors are altered and how. 

      We have revised the manuscript. 

      Overall statistics: The authors always do comparative statistics against starved animals but often state in the text a comparison against fed animals (Line 111: "but did not reach that of the fed flies"). I think the authors should describe the data according to their statistics and keep this constant throughout the paper.

      Sorry for the confusion. We originally included these comparisons but removed them. We have now added the additional statistical analyses.

      Figure legends: Overall the figure legends could be more developed and more detailed.

      We have revised the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      As adult-born granule neurons have been shown to play diverse roles, both positive and negative, to modulate hippocampal circuitry and function in epilepsy, understanding the mechanisms by which altered neurogenesis contributes to seizures is important for future therapeutic strategies. The work by Jain et al. demonstrates that increasing adult neurogenesis before status epilepticus (SE) leads to a suppression of chronic seizures in the pilocarpine model of temporal lobe epilepsy. This work is potentially interesting because previous studies showed suppressing neurogenesis led to reduced chronic seizures.

      To increase neurogenesis, the authors conditionally delete the pro-apoptotic gene Bax using a tamoxifen-inducible Nestin-CreERT2 which has been previously published to increase proliferation and survival of adult-born neurons by Sahay et al. After 6 weeks of tamoxifen injection, the authors subjected male and female mice to pilocarpine-induced SE. In the first study, at 2 hours after pilocarpine, the authors examine latency to the first seizure, severity and total number of acute seizures, and power during SE. In the second study in a separate group of mice, at 3 weeks after pilocarpine, the authors examine chronic seizure number and frequency, seizure duration, postictal depression, and seizure distribution/cluster seizures. Overall, the study concludes that increasing adult neurogenesis in the normal adult brain can reduce epilepsy in females specifically. However, important BrdU birthdating experiments in both male and female mice need to be included to support the conclusions made by the authors. Furthermore, speculative mechanisms lacking direct evidence reduce enthusiasm for the findings.

      There are two suggestions. First, BrdU birthdating of newborn neurons is important to add to the paper so that there is support for the conclusions. Second, speculative text reduced enthusiasm. In response, we clarified the conclusions. We do not think that the clarified conclusions require BrdU birthdating (discussed further below). We also removed two schematics (and associated text) that we think the reviewer was referring to when speculation was mentioned.

      We also want to point out something minor: the times listed above are not correct.

      a. Seizures were not measured 2 hrs after pilocarpine; that is when the anticonvulsant diazepam was administered to males. 

      b. Seizures were not measured 3 weeks after pilocarpine; the duration of recording was 3 weeks.  

      (1) BrdU birthdating is required for conclusions.

      We think that the Reviewer was suggesting birthdating because we were not clear about our conclusions, and we apologize for the confusion. The Reviewer stated that we concluded: “conditionally deleting Bax in Nestin-Cre+ cells leads to increased neurogenesis and hilar ectopic granule cells, thereby reducing chronic seizures.”  (Note this is a quote from the review).

      However, we did not intend to conclude that. We intended to conclude that conditionally deleting Bax in Nestin-Cre+ mice reduced chronic seizures in the mouse model of epilepsy that we used. Also, that conclusion only pertained to females. Please note we did not conclude that hilar ectopic granule cells led to reduced seizures. We also concluded that Bax deletion increased neurogenesis in female mice. We have revised the text to make the conclusions clear.

      Abstract, starting on line 67:

      The results suggest that selective Bax deletion to increase adult neurogenesis can reduce experimental epilepsy, and the effect shows a striking sex difference.

      Results, starting on line 448:

      Because Cre+ epileptic females had increased numbers of immature neurons relative to Cre- females at the time of SE, and prior studies show that Cre+ females had less neuronal damage after SE (Jain et al., 2019), female Cre+ mice might have had reduced chronic seizures because of high numbers of immature neurons. However, the data do not prove a causal role.

      Starting on line 477:

      ...we hypothesized that female Cre+ mice would have fewer hilar ectopic GCs than female Cre- mice. However, female Cre+ mice did not have fewer hilar ectopic GCs.

      Discussion, starting on line 563:

      The chronic seizures, measured 4-7 weeks after pilocarpine, were reduced in frequency by about 50% in females. Therefore, increasing young adult-born neurons before the epileptogenic insult can protect against epilepsy. However, we do not know if the protective effect was due to the greater number of new neurons before SE or other effects. Past data would suggest that increased numbers of newborn neurons before SE leads to a reduced SE duration and less neuronal damage in the days after SE. That would be likely to lessen the epilepsy after SE. However, there may have been additional effects of larger numbers of newborn neurons prior to SE.

      Conclusions, starting on line 745:

      In the past, suppressing adult neurogenesis before SE was followed by fewer hilar ectopic GCs and reduced chronic seizures. Here, we show that the opposite - enhancing adult neurogenesis before SE and increased hilar ectopic GCs - do not necessarily reduce seizures. We suggest instead that protection of the hilar neurons from SE-induced excitotoxicity was critical to reducing seizures. The reason for the suggestion is that the survival of hilar neurons would lead to persistence of the normal inhibitory functions of hilar neurons, protecting against seizures. However, this is only a suggestion at the present time because we do not have data to prove it. Additionally, because protection was in females, sex differences are likely to have played an important role. Regardless, the results show that enhancing neurogenesis of young adult-born neurons in Nestin-Cre+ mice had a striking effect in the pilocarpine model, reducing chronic seizures in female mice.

      The Reviewer is correct that it would be interesting to know when the increase in adult neurogenesis that was critical to the effect occurred. For example, was it the initial increase following Bax deletion but before pilocarpine-induced SE, the increase in neurogenesis following SE, or increased adult neurogenesis in the chronic stage of epilepsy? It also might be that related aspects of neurogenesis played a role, such as the degree to which maturation was normal in adult-born neurons. We have not pursued the experiments to identify these aspects of neurogenesis because of how much work it would entail. Also, approaches to establish cause-effect relationships are going to be difficult.

      (2) Speculation.

      We removed the text and supplemental figures with schematics that we think were the overly speculative parts of the paper the Reviewer mentioned.

      Strengths:

      (1) The study is sex-matched and reveals differences in response to increasing adult neurogenesis in chronic seizures between males and females.

      (2) The EEG recording parameters are stringent, and the analysis of chronic seizures is comprehensive. In two separate experiments, the electrodes were implanted to record EEG from the cortex as well as the hippocampus. The recording was done for 10 hours post pilocarpine to analyze acute seizures, and for 3 weeks continuous video EEG recording was done to analyze chronic seizures.

      Weaknesses:

      (1) Cells generated during acute seizures have different properties to cells generated in chronic seizures. In this study, the authors employ two bouts of neurogenesis stimuli (Bax deletion dependent and SE dependent), with two phases of epilepsy (acute and chronic). There are multiple confounding variables to effectively conclude that conditionally deleting Bax in Nestin-Cre+ cells leads to increased neurogenesis and hilar ectopic granule cells, thereby reducing chronic seizures.

      As mentioned above, with a clarification of our conclusions we think we have addressed the concern. We believe that we conditionally deleted Bax in Nestin-expressing cells. We believe we found that female mice had reduced loss of hilar mossy cells and somatostatin-expressing neurons after SE, and fewer chronic seizures after SE. While it makes sense that increased neurogenesis caused the reduced seizures, we acknowledge it was not proved.

      We do not make conclusions about the role of hilar ectopic granule cells. However, we note that they appear to have been similar in number across groups, which suggests they played no role in the results. This is very surprising and therefore adds novelty.

      (2) Related to this is the degree of neurogenesis between Cre+ and Cre- mice and the nature of the sex differences. It is crucial to know the rate/fold change of increased neurogenesis before pilocarpine treatment and whether it is different between male and female mice.

      We agree that sex differences in adult neurogenesis could be shown by a sex difference in rate, fold change, maturation, and other characteristics. However, sex differences can also be shown by a change in doublecortin (DCX), which is what we did. We respectfully submit that we do not think an exhaustive study is critical.

      As a result, we have clarified DCX was studied either before SE or in the period of chronic seizures:

      Results, starting on line 406:

      III. Before and after epileptogenesis, Cre+ female mice exhibited more immature neurons than Cre- female mice but that was not true for male mice.

      Starting on line 446:

      Therefore, elevated DCX occurred after chronic seizures had developed in Cre+ mice but the effect was limited to females.

      Discussion, starting on line 592:

      This study showed that conditional deletion of Bax from Nestin-expressing progenitors increased young adult-born neurons in the DG when studied 6 weeks after deletion and using DCX as a marker of immature neurons.

      (3) The authors observe more hilar Prox1 cells in Cre+ mice compared to Cre- mice. The authors should confirm the source of the hilar Prox1+ cells.

      This is an excellent question but it is unclear that it is critical to the seizures since both sexes showed more hilar Prox1 cells in Cre+ mice but only the females had fewer seizures than Cre- mice. This is the additional text to describe the results (starting on Line 493):

      In past studies, hilar ectopic GCs have been suggested to promote seizures (Scharfman et al., 2000; Jung et al., 2006; Cho et al., 2015). Therefore, we asked if the numbers of hilar ectopic GCs correlated with the numbers of chronic seizures. When Cre- and Cre+ mice were compared (both sexes pooled), there was a correlation with numbers of chronic seizures (Fig. 6D1) but it suggested that more hilar ectopic GCs improved rather than worsened seizures. However, the correlation was only in Cre- mice, and when sexes were separated there was no correlation (Fig. 6D3).

      When seizure-free interval was examined with sexes pooled, there was a correlation for Cre+ mice (Fig. 6D2) but not Cre- mice. Strangely, the correlations of Cre+ mice with seizure-free interval (Fig. 6D2, D4) suggest ectopic GCs shorten the seizure-free interval and therefore worsen epilepsy, opposite of the correlative data for numbers of chronic seizures. In light of these inconsistent results it seems that hilar ectopic granule cells had no consistent effect on chronic seizures.

      (4) The biggest weakness is the lack of mechanism. The authors postulate a hypothetical mechanism to reconcile how increasing and decreasing adult-born neurons in GCL and hilus and loss of hilar mossy and SOM cells would lead to opposite effects - more or fewer seizures. The authors suggest the reason could be due to rewiring or no rewiring of hilar ectopic GCs, respectively, but do not provide clear-cut evidence.

      As we mention above, we removed the supplemental figures with schematics because they probably were what seemed overly speculative.

      We acknowledge that mechanism is not proven by our study. However, we would like to mention that in our view, showing preservation of hilar mossy cells and SOM cells, but not PV cells, does add mechanistic data to the paper. We understand more experiments are necessary.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Jain et al explore whether increasing adult neurogenesis is protective against status epilepticus (SE) and the development of spontaneous recurrent seizures (chronic epilepsy) in a mouse pilocarpine model of TLE. The authors increase adult neurogenesis via conditional deletion of Bax, a pro-apoptotic gene, in Nestin-CreERT2Baxfl/fl mice. Cre- littermates are used as controls for comparisons. In addition to characterizing seizure phenotypes, the authors also compare the abundance of hilar ectopic granule cells, mossy cells, hilar SOM interneurons, and the degree of neuronal damage between mice with increased neurogenesis (Cre+) vs Cre- controls. The authors find less severe SE and a reduction in chronic seizures in female mice with pre-insult increased adult-born neurons. Immunolabeling experiments show these females also have preservation of hilar mossy cells and somatostatin interneurons, suggesting the pre-insult increase in adult neurogenesis is protective.

      Strengths:

      (1) The finding that female mice with increased neurogenesis at the time of pilocarpine exposure have fewer seizures despite having increased hilar ectopic granule cells is very interesting.

      (2) The work builds nicely on the group's prior studies.

      (3) Apparent sex differences are a potentially important finding.

      (4) The immunohistochemistry data are compelling.

      (5) Good controls for EEG electrode implantation effects.

      (6) Nice analysis of most of the SE EEG data.

      Weaknesses:

      (1) In addition to the Cre- littermate controls, a no Tamoxifen treatment group is necessary to control for both insertional effects and leaky expression of the Nestin-CreERT2 transgene.

      About “leaky” expression, we have not found expression to be leaky. We checked by injecting a Cre-dependent virus so that mCherry would be expressed in those cells that had Cre.  The results were published as Supplemental Figure 9 in Jain et al. (2019).

      In the revised manuscript we also mention a study that examined three Nestin-CreERT2 mouse lines (Sun et al., 2014). One of the mouse lines was ours. The leaky expression was not in the mouse line we use. We have added these points to the revised manuscript:

      Methods, section II starting on line 791:

      Although Nestin-Cre-ERT2 mouse lines have been criticized because they can have leaky expression, the mouse line used in the present study did not (Sun et al., 2014), which we confirmed (Jain et al., 2019).

      (2) The authors suggest sex differences; however, experimental procedures differed between male and female mice (as the authors note). Female mice received diazepam 40 minutes after the first pilocarpine-induced seizure onset, whereas male mice did not receive diazepam until 2 hours post-onset. The former would likely lessen the effects of SE on the female mice. Therefore, sex differences cannot be accurately assessed by comparing these two groups, and instead, should be compared between mice with matching diazepam time courses.

      We agree that a shorter delay between pilocarpine and diazepam would be likely to lead to less damage. However, the latency from pilocarpine to SE varied, making the time from the onset of SE to diazepam variable. Most of the variability was in females. By timing the diazepam injection differently in males and females, we could make the time from the onset of SE to diazepam similar between females and males. We have added a supplemental figure to show that our approach led to no significant differences between females and males in the latency to SE, time between SE and diazepam injection, and time between pilocarpine and diazepam injection. We also show that Cre+ females and Cre- females were not different in these times, so the timing could not account for the neuroprotection of Cre+ females.

      Additionally, the authors state that female mice that received diazepam 2 hours post-onset had severe brain damage. This is concerning as it would suggest that SE is more severe in the female than in the male mice.

      We regret that our language was misleading. We intended to say females had more morbidity and mortality than males (lack of appetite and grooming, death in the days after SE) when we gave DZP 2 hrs after Pilo. We actually don’t know why because there were no differences in severity of SE. We think the females had a worse outcome when they had a short latency to SE. These females had a longer period of SE before DZP than males, probably leading to the worse outcome. To correct this we gave DZP to females sooner. Morbidity and mortality then improved in females.

      Interestingly, after we did this we saw that females did not always have a short latency to SE. However, we maintained the same regimen to be consistent. As the new supplemental figure (above) shows, there were no significant sex differences in the latency to SE, time between SE and DZP, and time between pilocarpine and DZP.

      (3) Some sample sizes are low, particularly when sex and genotypes are split (n=3-5), which could cause a type II statistical error.

      We agree and have noted this limitation in the Discussion:

      Additional considerations, starting on line 739:

      This study is limited by the possibilities of type II statistical errors in those instances where we divided groups by genotype and sex, leading to comparisons of 3-5 mice/group.
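      The sample-size limitation can be illustrated with a minimal sketch. It is not the authors' analysis: the exact permutation test, the effect size, and the function names (`perm_pvalue`, `smallest_pvalue`, `simulated_power`) are assumptions chosen for illustration, and the data are simulated. It shows that with 3 animals per group an exact two-sided permutation test cannot reach p < 0.05 at all, so a real effect would always be missed (a type II error), whereas 5 per group can detect a large effect some of the time.

```python
import itertools
import random
from math import comb


def perm_pvalue(a, b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)

    def abs_diff(sum_a):
        # Absolute difference between the two group means for a given split.
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_diff(sum(a))
    as_extreme = 0
    n_splits = 0
    # Enumerate every way of assigning n_a of the pooled values to group "a".
    for idx in itertools.combinations(range(len(pooled)), n_a):
        n_splits += 1
        if abs_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_splits


def smallest_pvalue(n_per_group):
    """Smallest attainable two-sided p-value for equal groups without ties."""
    return 2 / comb(2 * n_per_group, n_per_group)


def simulated_power(n_per_group, effect_sd, alpha=0.05, n_sim=200, seed=0):
    """Fraction of simulated experiments (hypothetical data) with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect_sd, 1.0) for _ in range(n_per_group)]
        if perm_pvalue(a, b) < alpha:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, smallest_pvalue(n), simulated_power(n, effect_sd=2.0))
```

      The floor on the p-value follows from counting: with n animals per group there are C(2n, n) possible splits, and the observed split and its mirror image are always equally extreme. Other tests have different floors, so the sketch only conveys why comparisons of 3-5 mice/group warrant caution.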

      (4) Several figures show a datapoint in the sex and genotype-separated graphs that is missing from the corresponding male and female pooled graphs (Figs. 2C, 2D, 4B).

      We are very grateful to the Reviewer for pointing out the errors. They are corrected.

      (5) In Suppl Figs. 1B & 1C, subsections 1c and 2c, the EEG trace recording is described as the end of SE; however, SE appears to still be ongoing in these traces in the form of periodic discharges in the EEG.

      The Reviewer is correct.  It is a misconception that SE actually ends completely. The most intense seizure activity may, but what remains is abnormal activity that can last for days. Other investigators observe the same and have suggested that it argues against the concept of a silent period between SE and chronic epilepsy. We had discussed this in our prior papers and had referenced how we define SE.  In the revised manuscript we add the information to the Methods section instead of referencing a prior study:

      Methods, starting on line 899:

      SE duration was defined in light of the fact that the EEG did not return to normal after the initial period of intense activity. Instead, intermittent spiking occurred for at least 24 hrs, as we previously described (Jain et al., 2019) and as has been described by others (Mazzuferi et al., 2012; Bumanglag and Sloviter, 2018; Smith et al., 2018). We therefore chose a definition that captured the initial, intense activity. We defined the end of this time as the point when the amplitude of the EEG deflections was reduced to 50% or less of the peak deflections during the initial hour of SE. Specifically, we selected the time after the onset of SE when the EEG amplitude in at least 3 channels had dropped to approximately 2 times the amplitude of the EEG during the first hour of SE, and remained depressed for at least 10 min (Fig. S2 in Jain et al., 2019). Thus, the duration of SE was defined as the time between the onset and this definition of the "end" of SE.
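      The definition above can be sketched as a simple threshold rule. This is a minimal illustration, not the authors' analysis code: the function name `find_se_end`, the use of a per-channel amplitude envelope (for example, peak-to-peak amplitude per 1-min bin), and all parameter names are assumptions. The sketch implements the "50% or less of the peak deflections during the initial hour" criterion; the accompanying "approximately 2 times the amplitude" wording is not modeled here.

```python
from typing import Optional, Sequence


def find_se_end(
    envelopes: Sequence[Sequence[float]],
    sample_period_s: float,
    onset_idx: int,
    frac: float = 0.5,            # 50% of the first-hour peak (from the text)
    min_channels: int = 3,        # at least 3 channels (from the text)
    hold_s: float = 600.0,        # must stay depressed for 10 min (from the text)
    ref_window_s: float = 3600.0  # "initial hour of SE" reference window
) -> Optional[int]:
    """Return the first sample index at which SE is considered to have ended.

    envelopes holds one amplitude-envelope series per EEG channel, all of the
    same length. Returns None if the criterion is never met.
    """
    n = len(envelopes[0])
    ref_stop = min(n, onset_idx + int(round(ref_window_s / sample_period_s)))
    hold = max(1, int(round(hold_s / sample_period_s)))

    # Per-channel threshold: a fraction of that channel's peak in the first hour.
    thresholds = [frac * max(ch[onset_idx:ref_stop]) for ch in envelopes]

    run = 0  # consecutive samples in which enough channels are below threshold
    for i in range(onset_idx, n):
        n_low = sum(1 for ch, th in zip(envelopes, thresholds) if ch[i] <= th)
        if n_low >= min_channels:
            run += 1
            if run >= hold:
                return i - hold + 1  # start of the sustained depressed period
        else:
            run = 0
    return None


if __name__ == "__main__":
    # Hypothetical 1-min bins: intense activity, a brief dip, then a sustained drop.
    low_ch = [10.0] * 90 + [4.0] * 5 + [10.0] * 5 + [3.0] * 30
    high_ch = [10.0] * 130
    end_bin = find_se_end([low_ch, low_ch, low_ch, high_ch], 60.0, onset_idx=0)
    print("SE duration (min):", end_bin)
```

      Requiring the drop to persist for the full hold period means that brief dips in amplitude are not counted as the end of SE, which matches the stated intent of capturing only the initial, intense activity.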

      (6) In Results section II.D and associated Fig.3, what the authors refer to as "postictal EEG depression" is more appropriately termed "postictal EEG suppression". Also, postictal EEG suppression has established criteria to define it that should be used.

      We find that "suppression" is typical in studies of ECT or humans (Esmaeili et al., 2023; Gascoigne et al., 2023; Hahn et al., 2023; Kavakbasi et al., 2023; Langroudi et al., 2023; Karl et al., 2024; Vilan et al., 2024; Zhao et al., 2024), whereas animal research uses the term "postictal depression" (Kanner et al., 2010; Krishnan and Bazhenov, 2011; Riljak et al., 2012; Singh et al., 2012; Carballosa-Gonzalez et al., 2013; Kommajosyula et al., 2016; Smith et al., 2018; Uva and de Curtis, 2020; Medvedeva et al., 2023). Therefore we think depression is a more suitable term.

      The example traces in Fig. 3A and B should also be expanded to better show this potential phenomenon.

      We expanded traces in Fig. 3 as suggested. They are in Fig 3A.

      (7) In Fig.5D, the area fraction of DCX in Cre+ female mice is comparable to that of Cre- and Cre+ male mice. Is it possible that there is a ceiling effect in DCX expression that may explain why male Cre+ mice do not have a significant increase compared to male Cre- mice?

      We thank the Reviewer for the intriguing possibility. We now mention it in the manuscript:

      Results, starting on line 456:

      It is notable that the Cre+ male mice did not show increased numbers of immature neurons at the time of chronic seizures but Cre+ females did. It is possible that there was a “ceiling” effect in DCX expression that would explain why male Cre+ mice did not have a significant increase in immature neurons relative to male Cre- mice.

      (8) In Suppl. Fig 6, the authors should include DCX immunolabeling quantification from conditional Cre+ male mice used in this study, rather than showing data from a previous publication.

      We have made this revision.

      (9) In Fig 8, please also include Fluorojade-C staining and quantification for male mice.

      The additional data for males have been added to part D.

      (10) Page 13: Please specify in the first paragraph of the discussion that findings were specific to female mice with pre-insult increases in adult-born neurogenesis.

      This has been done.

      Minor:

      (11) In Fig. 1 and suppl. figure 1, please clarify whether traces are from male or female mice.

      We have clarified.

      (12) Please be consistent with indicating whether immunolabeling images are from female or male mice.

      a. Fig 5B images labeled as from "Cre- Females" and "Cre+ Females".

      b. Suppl. Fig 8: Images labeled as "Cre- F" and "Cre+ F".

      c. Fig 6: sex not specified.

      d. Fig. 7: sex only specified in the figure legend.

      e. Fig 8: only female mice were included in these experiments, but this is not clear from the figure title or legend.

      We revised all figures according to the comments.

      (13) Page 4: the last paragraph of the introduction belongs within the discussion section.

      We recognize there is a classic view that any discussion of Results should not be in the Introduction. However, we find that view has faded and more authors make a brief summary statement about the Results at the end of the Introduction. We would like to do so because it allows readers to understand the direction of the study at the outset, which we find is helpful.

      (14) Page 6: The sentence "The data are consistent with prior studies..." is unnecessary.

      We have removed the text.

      (15) Suppl. Fig 6A: Please include representative images of normal condition DCX immunolabeling.

      We have added these data. There is an image of a Cre- female, Cre+ female, Cre- male and Cre+ male in the new figure, Supplemental Figure 6. All mice had tamoxifen at 6 weeks of age and were perfused 6 weeks later. None of the mice had pilocarpine.

      (16) In Suppl. Fig 7C, I believe the authors mean "no loss of hilar mossy and SOM cells" instead of "loss of hilar mossy and SOM cells".

      This Figure was removed because of the input from Reviewer 1 suggesting it was too speculative.

      Reviewer #1 (Recommendations For The Authors):

      (1) The main claim of the study is that increasing adult neurogenesis decreases chronic seizures. However, to quantify adult-born neurons, DCX immunoreactivity is used as the sole metric to determine neurogenesis. This is insufficient as changes in DCX-expressing cells could also be an indicator of altered maturation, survival, and/or migration, not proliferation per se. To claim that increasing adult neurogenesis is associated with a reduction of chronic seizures, the authors should perform a pulse/chase (birth dating) experiment with BrdU and co-labeling with DCX.

      We think that increased DCX does reflect increased adult neurogenesis. However, we agree that one does not know if it was due to increased proliferation, survival, etc. We also note that this mouse line has been studied thoroughly to show there was increased neurogenesis with BrdU, Ki67 and DCX. We mention that paper in the revised text:

      Methods, starting on line 786:

      It was shown that after tamoxifen injection in adult mice there is an increase in dentate gyrus neurogenesis based on studies of bromo-deoxyuridine, Ki67, and doublecortin (Sahay et al., 2011).

      (2) As mentioned above, analysis of DCX staining alone months after TAM injections is limited. Instead, the cells could be labelled by BrdU prior to TAM injection, following which quantification of BrdU+/Prox1+ cells at 6 weeks post TAM injection should be performed in Cre+ and Cre- mice (males and females) to yield the rate of neurogenesis increase.

      We respectfully disagree that birthdating cells is critical. Using DCX staining just before SE, we know the size of the population of cells that are immature at the time of SE. This is what we think is most important because these immature neurons are those that appear to affect SE, as we have already shown.

      (3) To confirm the source of the hilar Prox1+ cells, a dual BrdU/EdU labeling approach would be beneficial. BrdU injection could be given before TAM injection and EdU injection before pilocarpine to label different cohorts of neural stem cells. Co-staining with Prox1 at different time points will help in identifying the origin of hilar ectopic cells.

      We are grateful for the ideas of the Reviewer. We hesitate to do these experiments now because it seems like a new study to find out where hilar granule cells come from.

      REFERENCES

      Bumanglag AV, Sloviter RS (2018) No latency to dentate granule cell epileptogenesis in experimental temporal lobe epilepsy with hippocampal sclerosis. Epilepsia 59:2019-2034.

      Carballosa-Gonzalez MM, Munoz LJ, Lopez-Alburquerque T, Pardal-Fernandez JM, Nava E, de Cabo C, Sancho C, Lopez DE (2013) EEG characterization of audiogenic seizures in the hamster strain GASH:Sal. Epilepsy Res 106:318-325.

      Cho KO, Lybrand ZR, Ito N, Brulet R, Tafacory F, Zhang L, Good L, Ure K, Kernie SG, Birnbaum SG, Scharfman HE, Eisch AJ, Hsieh J (2015) Aberrant hippocampal neurogenesis contributes to epilepsy and associated cognitive decline. Nat Commun 6:6606.

      Esmaeili B, Weisholtz D, Tobochnik S, Dworetzky B, Friedman D, Kaffashi F, Cash S, Cha B, Laze J, Reich D, Farooque P, Gholipour T, Singleton M, Loparo K, Koubeissi M, Devinsky O, Lee JW (2023) Association between postictal EEG suppression, postictal autonomic dysfunction, and sudden unexpected death in epilepsy: Evidence from intracranial EEG. Clin Neurophysiol 146:109-117.

      Gascoigne SJ, Waldmann L, Schroeder GM, Panagiotopoulou M, Blickwedel J, Chowdhury F, Cronie A, Diehl B, Duncan JS, Falconer J, Faulder R, Guan Y, Leach V, Livingstone S, Papasavvas C, Thomas RH, Wilson K, Taylor PN, Wang Y (2023) A library of quantitative markers of seizure severity. Epilepsia 64:1074-1086.

      Hahn T et al. (2023) Towards a network control theory of electroconvulsive therapy response. PNAS Nexus 2:pgad032.

      Jain S, LaFrancois JJ, Botterill JJ, Alcantara-Gonzalez D, Scharfman HE (2019) Adult neurogenesis in the mouse dentate gyrus protects the hippocampus from neuronal injury following severe seizures. Hippocampus 29:683-709.

      Jung KH, Chu K, Lee ST, Kim J, Sinn DI, Kim JM, Park DK, Lee JJ, Kim SU, Kim M, Lee SK, Roh JK (2006) Cyclooxygenase-2 inhibitor, celecoxib, inhibits the altered hippocampal neurogenesis with attenuation of spontaneous recurrent seizures following pilocarpine-induced status epilepticus. Neurobiol Dis 23:237-246.

      Kanner AM, Trimble M, Schmitz B (2010) Postictal affective episodes. Epilepsy Behav 19:156-158.

      Karl S, Sartorius A, Aksay SS (2024) No effect of serum electrolyte levels on electroconvulsive therapy seizure quality parameters. J ECT 40:47-50.

      Kavakbasi E, Stoelck A, Wagner NM, Baune BT (2023) Differences in cognitive adverse effects and seizure parameters between thiopental and propofol anesthesia for electroconvulsive therapy. J ECT 39:97-101.

      Kommajosyula SP, Randall ME, Tupal S, Faingold CL (2016) Alcohol withdrawal in epileptic rats - effects on postictal depression, respiration, and death. Epilepsy Behav 64:9-14.

      Krishnan GP, Bazhenov M (2011) Ionic dynamics mediate spontaneous termination of seizures and postictal depression state. J Neurosci 31:8870-8882.

      Langroudi ME, Shams-Alizadeh N, Maroufi A, Rahmani K, Rahchamani M (2023) Association between postictal suppression and the therapeutic effects of electroconvulsive therapy: A systematic review. Asia Pac Psychiatry 15:e12544.

      Mazzuferi M, Kumar G, Rospo C, Kaminski RM (2012) Rapid epileptogenesis in the mouse pilocarpine model: Video-EEG, pharmacokinetic and histopathological characterization. Exp Neurol 238:156-167.

      Medvedeva TM, Sysoeva MV, Sysoev IV, Vinogradova LV (2023) Intracortical functional connectivity dynamics induced by reflex seizures. Exp Neurol 368:114480.

      Riljak V, Maresova D, Jandova K, Bortelova J, Pokorny J (2012) Impact of chronic ethanol intake of rat mothers on the seizure susceptibility of their immature male offspring. Gen Physiol Biophys 31:173-177.

      Sahay A, Scobie KN, Hill AS, O'Carroll CM, Kheirbek MA, Burghardt NS, Fenton AA, Dranovsky A, Hen R (2011) Increasing adult hippocampal neurogenesis is sufficient to improve pattern separation. Nature 472:466-470.

      Scharfman HE, Goodman JH, Sollas AL (2000) Granule-like neurons at the hilar/CA3 border after status epilepticus and their synchrony with area CA3 pyramidal cells: Functional implications of seizure-induced neurogenesis. J Neurosci 20:6144-6158.

      Singh B, Singh D, Goel RK (2012) Dual protective effect of passiflora incarnata in epilepsy and associated post-ictal depression. J Ethnopharmacol 139:273-279.

      Smith ZZ, Benison AM, Bercum FM, Dudek FE, Barth DS (2018) Progression of convulsive and nonconvulsive seizures during epileptogenesis after pilocarpine-induced status epilepticus. J Neurophysiol 119:1818-1835.

      Sun MY, Yetman MJ, Lee TC, Chen Y, Jankowsky JL (2014) Specificity and efficiency of reporter expression in adult neural progenitors vary substantially among nestin-creer(t2) lines. J Comp Neurol 522:1191-1208.

      Uva L, de Curtis M (2020) Activity- and ph-dependent adenosine shifts at the end of a focal seizure in the entorhinal cortex. Epilepsy Res 165:106401.

      Vilan A, Grangeia A, Ribeiro JM, Cilio MR, de Vries LS (2024) Distinctive amplitude-integrated EEG ictal pattern and targeted therapy with carbamazepine in kcnq2 and kcnq3 neonatal epilepsy: A case series. Neuropediatrics 55:32-41.

      Zhao C, Tang Y, Xiao Y, Jiang P, Zhang Z, Gong Q, Zhou D (2024) Asymmetrical cortical surface area decrease in epilepsy patients with postictal generalized electroencephalography suppression. Cereb Cortex 34.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      First, all the experiments are performed in Jurkat T cells that may not recapitulate the regulation of polarization in primary T cells.

      To extend our results in Jurkat cells forming IS to primary cells, we have now performed experiments using synapses established by Raji cells and either primary T cells (TCR-mediated) or primary CAR T cells (CAR-mediated) (new Suppl. Fig. S7). These experiments clearly show the presence of FMNL1 at these two different IS classes (new Suppl. Fig. S7), similar to what was found in Jurkat-Raji synapses. In addition, since most of the experiments were performed in Jurkat cells, we have changed the title of our manuscript to be faithful to the main body of our results. New sentences dealing with this important issue have been included in the Results and Discussion sections.

      Moreover, all the experiments analyzing the role of PKCdelta are performed in one clone of wt or PKCdelta KO Jurkat cells. This is problematic since clonal variation has been reported in Jurkat T cells.

      The referee is right, and this is the reason why we have studied three different control clones (C3, C9, C7) and three PKCdelta-interfered clones (P5, P6 and S4), all derived from the JE6.1 clone; these results have been previously published (Herranz et al 2019)(Bello-Gamboa et al 2020). All these clones expressed similar levels of the relevant cell surface molecules and formed synaptic conjugates with similar efficiency (Herranz et al 2019). The P5, P6 and S4 clones exhibited a similar defect in MVB/MTOC polarization when compared with the control clones (Herranz et al 2019)(Bello-Gamboa et al 2020). Experiments developed by other researchers using a different clone of Jurkat (JE6.1) and primary CD4+ and CD8+ lymphocytes interfered in FMNL1 (Gomez et al. 2007) showed a defect in MTOC polarization comparable to that found in our control clones when they were transiently interfered in FMNL1 (Bello-Gamboa et al 2020, this manuscript). In this manuscript we have studied, instead of the canonical JE6.1 clone, the C3 and C9 control clones derived from JE6.1, since the puromycin-resistant control clones (containing a scramble shRNA) were isolated by limiting dilution together with the PKCdelta-interfered clones (Herranz et al. 2019); thus, C3 and C9 are the best possible controls to compare with the P5 and P6 clones. Please note that microsatellite analyses, available upon request, support the identity of our C3 clone with JE6.1. Moreover, when GFP-PKCdelta was transiently expressed in the three PKCdelta-interfered clones, MTOC/MVB polarization was recovered to control levels (Herranz et al. 2019). Therefore, the deficient MTOC/MVB polarization in all these clones is exclusively due to the reduction in PKCdelta expression (Herranz et al 2019), and thus clonal variation cannot underlie our results in stable clones.
      We have now included new sentences to address this important point and to mention the inability of FMNL1betaS1086D to revert the deficient MTOC polarization occurring in the P6 PKCdelta-interfered clone, as occurred in the P5 clone. Because we have now included more figures and panels to satisfy the editor's and referees' comments, we have not included the dot plot data corresponding to the C9 and P6 clones, to avoid an overly long and repetitive manuscript. Since all the FMNL1 interference and FMNL1 variant re-expression experiments were performed in transient assays (2-4 days after transfection), there was no chance for any clonal variation in these short-time experiments. Moreover, internal controls using untransfected cells or Raji cells unpulsed with SEE were carried out in all these transient experiments.

      Finally, although convincing, the defect in the secretion of vesicles by T cells lacking phosphorylation of FMNL1beta on S1086 is preliminary. It would be interesting to analyze more precisely this defect. The expression of the CD63‑GFP in mutants by WB is not completely convincing. Are other markers of extracellular vesicles affected, e.g. CD3 positive?

      We acknowledge this comment. It is true that the mentioned results do not directly demonstrate the presence of exosomes at the synaptic cleft, since the nanovesicles were harvested from the culture supernatants of synaptic conjugates and could have been produced by multi-directional degranulation of MVBs. To address this important issue, we have performed STED super-resolution imaging of the immune synapses made by control and FMNL1-interfered cells. Nanosized (100-150 nm) CD63+ vesicles can be found in the synaptic cleft between APC and control cells with polarized MVBs, whereas we could not detect these vesicles in the synaptic cleft from FMNL1-interfered cells that maintain unpolarized MVBs (New Fig. 10). New sentences have been included in the Results and Discussion dealing with this important point. Regarding the use of CD3 as a marker of extracellular vesicles, please note that CD3 is neither an enriched nor a specific marker of exosomes, since it is also present in plasma membrane shedding vesicles, molting vesicles from microvilli, apoptotic bodies and small cell fragments, apart from exosomes. We have therefore preferred to use the canonical exosome marker CD63 as a general exosome reporter readout for WB and immunofluorescence (MVBs, exosomes), time-lapse of MVBs (Suppl. Video 8) and super-resolution experiments (Fig. 10).

      Reviewer #2 (Public Review):

      Summary:

      The authors have addressed the role of S1086 in the FMNL1beta DAD domain in F-actin dynamics, MVB polarization, and exosome secretion, and investigated the potential implication of PKCdelta, which they had previously shown to regulate these processes, in FMNL1beta S1086 phosphorylation. This is based on:

      (1) the documented role of FMNL1 proteins in IS formation

      (2) their ability to regulate F-actin dynamics

      (3) the implication of PKCdelta in MVB polarization to the IS and FMNL1beta phosphorylation

      (4) the homology of the C-terminal DAD domain of FMNL1beta with FMNL2, where a phosphorylatable serine residue regulating its auto-inhibitory function had been previously identified.

      They demonstrate that FMNL1beta is indeed phosphorylated on S1086 in a PKCdelta-dependent manner and that S1086-phosphorylated FMNL1beta acts downstream of PKCdelta to regulate centrosome and MVB polarization to the IS and exosome release. They provide evidence that FMNL1beta accumulates at the IS where it promotes F-actin clearance from the IS center, thus allowing for MVB secretion.

      Strengths

      The work is based on a solid rationale, which includes previous findings by the authors establishing a link between PKCdelta, FMNL1beta phosphorylation, synaptic F-actin clearance, and MVB polarization to the IS. The authors have thoroughly addressed the working hypotheses using robust tools. Among these, of particular value is an expression vector that allows for simultaneous RNAi-based knockdown of the endogenous protein of interest (here all FMNL1 isoforms) and expression of wild-type or mutated versions of the protein as YFP-tagged proteins to facilitate imaging studies. The imaging analyses, which are the core of the manuscript, have been complemented by immunoblot and immunoprecipitation studies, as well as by the measurement of exosome release (using a transfected MVB/exosome reporter to discriminate exosomes secreted by T cells).

      Weaknesses

      The data on F-actin clearance in Jurkat T cells knocked down for FMNL1 and expressing wild-type FMNL1 or the non-phosphorylatable or phosphomimetic mutants thereof would need to be further strengthened, as this is a key message of the manuscript. Also, the entire work has been carried out on Jurkat cells. Although this is an excellent model easily amenable to genetic manipulation and biochemical studies, the key finding should be validated on primary T cells.

      Referee’s global assessment is right. To extend our results in Jurkat cells forming IS, we have now performed experiments using synapses established by Raji cells and either primary T cells (TCR-mediated) or primary CAR T cells (CAR-mediated) (new Suppl. Fig. S7). These experiments clearly show the presence of FMNL1 at these two different IS classes (new Suppl. Fig. S7), similar to what was found in Jurkat-Raji synapses. In addition, since most of the experiments were performed in Jurkat cells, we have changed the title of our manuscript, to be faithful to the main body of our results. New sentences have been included in Results and Discussion to address these important points.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study shows the role of the phosphorylation of FMNL1b on S1086 in the polarity of T lymphocytes, which is a new and interesting finding. It would be important to confirm some of the key results in primary T cells and to analyze in depth the defect in actin remodeling (quantification of the images, analysis of some key actors of actin remodeling). The description of the defect in the secretion of extracellular vesicles would also benefit from a more accurate analysis of the content of vesicles.

      The referee is right. We have now performed experiments using synapses containing Raji cells and either primary T cells (TCR-mediated) or primary CAR T cells (CAR-mediated) (new Suppl. Fig. S7). These experiments clearly show the presence of FMNL1 at these two different IS classes, similar to what was found in Jurkat-Raji synapses. Moreover, since most of the experiments were performed in Jurkat cells, we have changed the title of our manuscript to be faithful to the main body of our results. Regarding the use of CD63 instead of other markers such as CD3 (as stated by the other referee), please note that CD3 is neither an enriched nor a specific marker of exosomes, since it is also present in plasma membrane shedding vesicles, molting vesicles from microvilli, apoptotic bodies and small cell fragments, apart from exosomes. We have therefore preferred to use the accepted consensus, canonical extracellular vesicle marker CD63 (International Society for Extracellular Vesicles positioning, Thery et al 2018, doi: 10.1080/20013078.2018.1535750; Alonso et al. 2011) as a general exosome reporter readout for WB, immunofluorescence (MVBs, exosomes) and super-resolution experiments. Accordingly, the GFP-CD63 reporter plasmid was used for exosome secretion in transient expression studies and living-cell time-lapse experiments (Suppl. Video 8). Any other exosome marker would also be present in Raji cells and would not allow us to analyse exclusively the secretion of exosomes by the effector Jurkat cells, since B lymphocytes produce a large quantity of exosomes upon MHC-II stimulation by Th lymphocytes (Calvo et al, 2020, doi:10.3390/ijms21072631). To reinforce the exosome data in the context of the immune synapse, STED super-resolution imaging of the immune synapses made by control and FMNL1-interfered cells was performed. Nanosized (100-150 nm) CD63+ vesicles can be found in the synaptic cleft of control cells with polarized MVBs, whereas we could not detect these vesicles in the synaptic cleft from FMNL1-interfered cells that maintain unpolarized MVBs (new Fig. 10).

      Moreover, not all the videos are fully illustrative. For example, in video 2 it would be more appropriate to show only the z plane corresponding to the IS, to see more precisely the F-actin remodeling relative to CD63 labeling.

      The referee is right. It is true that the upper rows in some videos may distract the reader from the main message contained in the lower row, which includes the zx plane, generated by a 90º turn, corresponding to the IS interface. Accordingly, we have maintained the still images of the whole synaptic conjugates in the first row of video 2; this will allow the reader to perceive a general view of the fluorochromes on the whole cell conjugates, as a reference, and to compare precisely the F-actin remodeling relative to CD63 labeling only at the zx interface (lower row). We have now processed videos 1 and 5 following similar criteria.

      The quality of videos 3 and 4 is not good enough. For video 7, it seems that the labeling of phospho-Ser is very broad at the IS, which is expected since it should label all the proteins that are phosphorylated by PKCs. The resolution of microscopy (at best 200 to 300 nm) does not allow us to conclude on the co-localization of FMNL1b with phospho-Ser, and the result is thus not conclusive. Finally, the study would benefit from a more careful statistical analysis. The dot plots showing polarity are presented for one experiment. Yet, the distribution of the polarity is broad. Results of the 3 independent experiments should be shown and a statistical analysis performed on the independent experiments.

      Referee is right, we have amended video settings (brightness/contrast) in videos 3 and 4 to improve this issue. In addition, we would like to remark that the translocation of proteins to cellular substructures in living cells is not a trivial issue, since certain protein localizations are too dynamic to be properly imaged with enough spatial resolution. The equilibrium resulting from the association/dissociation of a certain protein to the membrane, in addition to the protein diffusion naturally occurring in living cells, as well as signal intensity fluctuations inherent to the stochastic nature of fluorescence emission often provide barriers for image quality (Shroff et al, 2024). Thus, additional image blurring is expected when compared with that observed in fixed samples. However, we think it is important to provide the potential readers with a dynamic view of FMNL1 localization, which can only be achieved through real-time videos, in addition to the still frames from the same videos provided in Fig. 6A (the referee did not argue against the inclusion of these frames), together with images from fixed cells in Fig 6B, for comparison. This is the reason why we have preferred to maintain the improved videos to complement the results of some spare frames from the videos, together with images from fixed cells in the same figure (Fig. 6).

      Regarding video 7, we agree that colocalization is limited by the spatial resolution of confocal microscopy, and this fact does not allow us to infer that FMNL1beta is phosphorylated at the IS. However, please note that we have never concluded this in our manuscript. Instead, we claimed that “colocalization of endogenous FMNL1 and YFP-FMNL1βWT with anti-phospho-Ser … is compatible with the idea that both endogenous FMNL1 and YFP-FMNL1βWT are specifically phosphorylated at the cIS”. Moreover, we have now performed colocalization in super-resolved STED microscopy images, which reduces the XY resolution down to 30-40 nm (Suppl. Fig. S12), and the results also support colocalization of endogenous FMNL1 with anti-phospho-Ser PKC at the IS within a 30 nm resolution limit. We have now somewhat softened our conclusion: “Although all these data did not allow us to infer that FMNL1β is phosphorylated at the IS due to the resolution limit of confocal and STED microscopes, the results are compatible with the idea that both endogenous FMNL1 and YFP-FMNL1βWT are specifically phosphorylated at the cIS”.

      Regarding statistical analyses, we agree that the dot distribution in the polarity experiments is quite broad, but this is consistent with the end-point strategy used by a myriad of research groups (including ourselves) to image an intrinsically stochastic, rapid and asynchronous process such as immune synapse formation and to score MTOC/MVB polarization (Calvo et al 2018, https://doi.org/10.3389/fimmu.2018.00684). Despite this fact, ANOVA analyses have underscored the statistical significance of all the experiments represented by dot plots. We cannot average or perform meta-statistical analyses by combining the equivalent cohort results from independent experiments, since we have observed that small variations in certain variables (SEE concentration, cell recovery, time after transfection, etc.) affect synapse formation and PI values among experiments without altering the final outcome in each case. Please note that our manuscript now includes 10 multi-panel figures, 12 multi-panel supplementary figures and 8 videos, and it is already quite large. Thus, we feel the inclusion of redundant, triplicate dot plot figures would dilute the main message of our already comprehensive contribution and distract any potential reader from it. We have now included new sentences in the figure legends to remark that ANOVA analyses were executed separately in all 3 independent experiments.
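
      As an illustration only (not the authors' analysis code), the idea of running a one-way ANOVA separately within each independent experiment, rather than pooling cohorts across experiments, can be sketched as follows. The function name `one_way_anova_f`, the cohort labels and every PI value below are hypothetical placeholders; only the F statistic is computed, and a p-value would additionally require the F distribution from a statistics library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of measurements, one list per cohort.
    Assumes at least two cohorts and non-zero within-group variance.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # Variability of the cohort means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual cells around their own cohort mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical polarization index (PI) values, one dict per independent experiment.
experiments = {
    "experiment_1": {"control": [0.6, 0.7, 0.5, 0.8], "interfered": [0.1, 0.0, 0.2, -0.1]},
    "experiment_2": {"control": [0.4, 0.5, 0.6, 0.3], "interfered": [0.0, 0.1, -0.2, 0.1]},
}

# Each experiment is analysed on its own; cohorts are never pooled across experiments.
for name, cohorts in experiments.items():
    f_stat, df_b, df_w = one_way_anova_f(list(cohorts.values()))
    print(f"{name}: F({df_b}, {df_w}) = {f_stat:.2f}")
```

      Keeping the analyses separate avoids mixing experiment-to-experiment shifts in PI into the within-cohort variance, which is the concern the response raises about pooling.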

      Reviewer #2 (Recommendations For The Authors):

      (1) The key findings should be validated on primary CD4+ T cells (of which Jurkat is a transformed model).

      The referee is right. However, as commented by the other referee, the data from activating surfaces clearly show that the synaptic actin architecture of the immune synapse from primary CD8+ T cells is essentially indistinguishable and thus unbiased from that of Jurkat T cells, but different from that of primary CD4+ cells (Murugesan, 2016). Thus, our data in Jurkat T cells are directly applicable to the synaptic architecture of primary CD8+ cells. In addition, to definitively extend our results in Jurkat cells forming IS, we have performed experiments using synapses established by Raji cells and either primary T cells (TCR-mediated) or primary CAR T cells (CAR-mediated) (new Suppl. Fig. S7). We have preferred to work with mixed CD4+ and CD8+ cells in order to maintain potential interactions in trans between these subpopulations that may affect or influence IS formation. These experiments clearly show the presence of FMNL1 at these two different IS classes (new Suppl. Fig. S7), similar to what was found in Jurkat-Raji synapses. Moreover, since most of the experiments were performed in Jurkat cells, as stated by the referee, we have changed the title of our manuscript to circumscribe our results to the model we have used and to be faithful to the main body of our results.

      (2) The image of wt YFP-FMNL1beta in Figure 4A displays a weak CD63 signal and shows an asymmetric polarization of both the centrosome and MVBs. It should be replaced with a more representative one.

      The referee is right. Accordingly, we have modified the CD63 channel settings (brightness/contrast) in this panel to make it comparable to the other panels in the same figure. In addition, thanks to this referee's comment, we have realized that the position of the MTOC (yellow dot) in the diagram on the right side of the YFP-FMNL1betaWT panel row was misplaced, producing the apparent asymmetry with respect to the position of the MVBs' center of mass (green dot). This mistake led to an apparent segregation between the centers of mass of these organelles, which certainly does not correspond to the real image. We have now amended the scheme and we apologize for this mistake.

      (3) The images showing F-actin clearance at the IS (Figure 8, S4, S5) are not very convincing, also when looking at the MFI along the T cell-APC interface in the en-face views. Since the F-actin signal also includes some signal from the APC, transfecting T cells with an actin reporter to selectively image T cell actin could better clarify this key point.

      The referee's point is correct. However, we (83), and other researchers using the proposed actin reporter approach in the same Raji/Jurkat IS model (Fig. 4 in ref 84), have already excluded the possibility that the actin cytoskeleton of Raji cells contributes to the measurements of synaptic F-actin. In Materials and Methods, page 37, lines 1048-1055, we included this related passage: “It is important to remark that MHC-II-antigen triggering on the B cell side of the Th synapse does not induce noticeable F-actin changes along the synapse (i.e. F-actin clearing at the central IS), in contrast to TCR stimulation on T cell side (84) (85) (3). In addition, we have observed that the majority of F-actin changes along the IS belong to the Jurkat cell (83). Thus, the contribution to the analyses of the residual, invariant F-actin from the B cell is negligible using our protocol (83).”

      Thus, we can exclude the possibility that this caveat affects our results.

      (4) A similar consideration applies to the MVB distribution in the en‑face images. For example, in Figure S5 the MVB profile, with some peripheral distribution, does not appear very different in cells expressing wt YFP‑tagged FMNL1beta versus the S1086A‑expressing cells.

      The referee's assessment regarding Supp. Figure S5 is valid. Using only the plot profile, the outcomes obtained with YFP-FMNL1βWT may appear comparable to those derived from YFP-FMNL1βS1086A. Nonetheless, this resemblance is attributed to the plot profile's exclusive consideration of the MVBs signal in the interface from the immune synapse region (white rectangle). The upper images (second row), where the whole cell is displayed, illustrate that in YFP-FMNL1βWT, MVB are specifically accumulated within this specific region, in contrast to the scattered distribution observed in YFP-FMNL1βS1086A, where MVB are dispersed throughout the cell without distinction. While MVBs are evident in both instances within the synapse region, the reason behind this observation is different. The YFP-FMNL1βWT transfected cell (third column) shows a pronounced MVB concentration within the synaptic area (white rectangle), which leads to MVB PI=0.52, whereas the YFP-FMNL1βS1086A transfected cell (fourth column), as it presents a scattered distribution of MVB throughout the cell, also exhibits some MVB (but only a small proportion of the total cellular MVB) in the synaptic area, which yields MVB PI=-0.09. Please realise that the position of the center of mass of the distribution of MVB (MVBC) labelled in this figure (white squares) is an unbiased parameter that mirrors MVB center of mass polarization. A new sentence has been included in the figure legend to clarify this important point.
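
      The difference between a plot profile restricted to the synaptic region and a whole-cell center-of-mass measure can be illustrated with a small sketch. This is not the authors' image-analysis pipeline: the function names, the 2D pixel representation and all coordinates are hypothetical, and the PI definition used here (projection of the MVB center of mass onto the axis from cell center to synapse, divided by the cell center to synapse distance) is an assumption made for the example.

```python
def center_of_mass(pixels):
    """Intensity-weighted center of mass of (x, y, intensity) tuples."""
    total = sum(i for _, _, i in pixels)
    cx = sum(x * i for x, _, i in pixels) / total
    cy = sum(y * i for _, y, i in pixels) / total
    return cx, cy


def polarization_index(pixels, cell_center, synapse):
    """Assumed PI: signed projection of the organelle center of mass onto
    the cell-center-to-synapse axis, normalised by that axis length.
    Values near +1 mean polarized towards the synapse; values near 0
    mean no net polarization."""
    cx, cy = center_of_mass(pixels)
    ax, ay = synapse[0] - cell_center[0], synapse[1] - cell_center[1]
    bx, by = cx - cell_center[0], cy - cell_center[1]
    return (bx * ax + by * ay) / (ax * ax + ay * ay)


cell_center = (0.0, 0.0)
synapse = (10.0, 0.0)

# Hypothetical cell with MVB signal concentrated next to the synapse.
polarized = [(8.0, 0.0, 5.0), (9.0, 1.0, 5.0)]
# Hypothetical cell with MVB signal scattered around the whole cell;
# one spot still lies close to the synapse.
scattered = [(8.0, 0.0, 1.0), (-8.0, 0.0, 1.0), (0.0, 8.0, 1.0), (0.0, -8.0, 1.0)]

print(polarization_index(polarized, cell_center, synapse))
print(polarization_index(scattered, cell_center, synapse))
```

      Both hypothetical cells show signal near the synapse, so a profile drawn only across that region could look similar, yet the whole-cell center of mass separates them clearly.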

      (5) The image in the first row in Figure 6B does not show a clear accumulation of FMNL1beta at the IS, possibly because the T cell is in contact with two APCs. This image should be replaced.

      The referee is right. Therefore, we have replaced the quoted example with a single cell:cell synapse that shows a clearer and more localized accumulation in the cIS, thereby avoiding the mentioned caveat.

      (6) In Figure 2A the last row shows what appears to be a T:T cell conjugate (with one cell expressing the YFP-tagged protein). The image should be replaced with another showing a T cell-APC (blue) conjugate.

      Referee is right, we have accordingly replaced the mentioned image with a T cell:APC conjugate.

      (7) The Discussion is very long and dispersive. It would benefit from shortening it and making it more focused.

      Referee is right, we have shortened and focused it, by eliminating the whole second and third paragraphs of the discussion. Moreover, a whole paragraph in page 24 has been also deleted.

      We have also focussed the discussion towards the new data in primary T lymphocytes.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This paper uses a model of binge alcohol consumption in mice to examine how the behaviour and its control by a pathway between the anterior insular cortex (AIC) to the dorsolateral striatum (DLS) may differ between males and females. Photometry is used to measure the activity of AIC terminals in the DLS when animals are drinking and this activity seems to correspond to drink bouts in males but not females. The effects appear to be lateralized with inputs to the left DLS being of particular interest. 

      Strengths: 

      Increasing alcohol intake in females is of concern and the consequences for substance use disorder and brain health are not fully understood, so this is an area that needs further study. The attempt to link fine-grained drinking behaviour with neural activity has the potential to enrich our understanding of the neural basis of behaviour, beyond what can be gleaned from coarser measures of volumes consumed etc. 

      Weaknesses: 

      The introduction to the drinking in the dark (DID) paradigm is rather narrow in scope (starting line 47). This would be improved if the authors framed this in the context of other common intermittent access paradigms and gave due credit to important studies and authors that were responsible for the innovation in this area (particularly studies by Wise, 1973 and returned to popular use by Simms et al 2010 and related papers; e.g., Wise RA (1973). Voluntary ethanol intake in rats following exposure to ethanol on various schedules. Psychopharmacologia 29: 203-210; Simms, J., Bito-Onon, J., Chatterjee, S. et al. Long-Evans Rats Acquire Operant Self-Administration of 20% Ethanol Without Sucrose Fading. Neuropsychopharmacol 35, 1453-1463 (2010).)

      We appreciate the reviewer’s perspective on the history of the alcohol research field. There are hundreds of papers that could be cited regarding all the numerous different permutations of alcohol drinking paradigms. This study is an eLife “Research Advances” manuscript that is a direct follow-up study to a previously published study in eLife (Haggerty et al., 2022) that focused on the Drinking in the Dark model of binge alcohol drinking. This study must be considered in the context of that previous study (they are linked), and thus we feel that a comprehensive review of the literature is not appropriate for this study.

      The original drinking in the dark demonstrations should also be referenced (Rhodes et al., 2005). Line 154 Theile & Navarro 2014 is a review and not the original demonstration. 

      This is a good recommendation. We have added this citation to Line 33 and changed Line 154.

      When sex differences in alcohol intake are described, more care should be taken to be clear about whether this is in terms of volume (e.g. ml) or blood alcohol levels (BAC, or at least g/kg as a proxy measure). This distinction was often lost when lick responses were being considered. If licking is similar (assuming a single lick from a male and female brings in a similar volume?), this might mean males and females consume similar volumes, but females due to their smaller size would become more intoxicated so the implications of these details need far closer consideration. What is described as identical in one measure, is not in another. 

      As shown in Figure 1, all measures of intake are reported as g/kg for both water and alcohol to assess intakes across fluids that are controlled by body weights. We do not reference changes in fluid volume or BACs to compare differences in measured lickometry or photometric signals, except in one instance where we suggest that the total volume of water (ml) is greater than the total amount of alcohol (ml) consumed in DID sessions, but this applies generally to all animals, regardless of sex, across all the experimental procedures.
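
      To make the body-weight normalisation concrete, here is a minimal sketch of converting a consumed volume into g/kg. It is an illustration under stated assumptions, not the authors' calculation: the function name, the 20% (v/v) ethanol concentration, the body weights and the volumes are hypothetical, water is taken as 1.0 g/ml, and ethanol density is taken as 0.789 g/ml.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789  # density of pure ethanol near room temperature
WATER_DENSITY_G_PER_ML = 1.0      # approximation used for water intake


def intake_g_per_kg(volume_ml, body_weight_g, ethanol_fraction=None):
    """Convert a consumed volume into grams per kilogram of body weight.

    With `ethanol_fraction` set (e.g. 0.20 for a 20% v/v solution), only
    the ethanol mass is counted; with None, the fluid is treated as water.
    """
    if ethanol_fraction is None:
        grams = volume_ml * WATER_DENSITY_G_PER_ML
    else:
        grams = volume_ml * ethanol_fraction * ETHANOL_DENSITY_G_PER_ML
    return grams / (body_weight_g / 1000.0)


# Hypothetical animals drinking the same volume of a 20% (v/v) solution.
heavier = intake_g_per_kg(volume_ml=0.5, body_weight_g=25.0, ethanol_fraction=0.20)
lighter = intake_g_per_kg(volume_ml=0.5, body_weight_g=20.0, ethanol_fraction=0.20)
print(f"25 g animal: {heavier:.2f} g/kg, 20 g animal: {lighter:.2f} g/kg")
```

      The same volume gives a higher g/kg value in the lighter animal, which is why volume and weight-normalised intake should not be used interchangeably when comparing groups that differ in body weight.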

      In Figure 2 – Figure Supplement 1 we show drinking microstructures across single DID sessions, and that males and females drink similarly, but not identically, when assessing drinking measures at the smallest timescale that we have the power to detect with the hardware we used for these experiments. Admittedly, the variability seen in these measures is certainly non-zero, and while we are tempted to assume that there exist at least some singular drinks that occur identically between males and females in the dataset, which would support the idea that females are simply consuming more volume of fluid per singular drink, we don’t have the sampling resolution to support that claim statistically. Further, even if females did consume more volume per singular drink than males, we do not believe that is enough information to make the claim that such behavior leads to more “intoxication” in females compared to males, as we know that alcohol behaviors, metabolism, and uptake/clearance all differ significantly by sex and are contributing factors towards defining an intoxication state. We’ve amended the manuscript to remove any language describing these drinking behaviors as identical.

      No conclusions regarding the photometry results can be drawn based on the histology provided. Localization and quantification of viral expression are required at a minimum to verify the efficacy of the dual virus approach (the panel in Supplementary Figure 1 is very small and doesn't allow terminals to be seen, and there is no quantification). Whether these might differ by sex is also necessary before we can be confident about any sex differences in neural activity. 

      We provide hit maps of our fiber placements and viral injection centers, as we and many other investigators regularly do for publication, based on histological verification. Figure 1A clearly shows the viral strategy taken to label AIC to DLS projections with GCaMP7s, and a representative image shows green GCaMP-positive terminals below the fiber placement. During the experiments, animals without proper viral expression displayed little or no GCaMP signal, which serves as an additional expression-based control on top of the typical histology performed to confirm “hits”. Animals with poor expression or obvious misplacement of the fiber probes were removed as described in the methods. Further, we report our calcium signals as z-scored changes in observed fluorescence, so we are comparing scaled averages of signals across sexes and days, which helps minimize any differences between “low” and “high” viral transduction levels at the terminals directly underneath the tips of the fibers.
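
      The effect of z-scoring on traces with different raw amplitudes can be shown with a minimal sketch. This is an illustration only: the function name and both traces are hypothetical, and the whole trace is used as the reference period, which may differ from the baseline the authors used.

```python
from statistics import mean, pstdev


def zscore_trace(trace, baseline=None):
    """Z-score a fluorescence trace against a reference period.

    If `baseline` is None, the mean and standard deviation of the whole
    trace are used as the reference (an assumption of this sketch).
    """
    reference = trace if baseline is None else baseline
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        raise ValueError("reference period has zero variance")
    return [(value - mu) / sigma for value in trace]


# Hypothetical traces with the same shape but a tenfold amplitude difference,
# standing in for animals with lower and higher indicator expression.
low_expression = [1.0, 2.0, 3.0, 2.0, 1.0]
high_expression = [10.0, 20.0, 30.0, 20.0, 10.0]

z_low = zscore_trace(low_expression)
z_high = zscore_trace(high_expression)
print(z_low)
print(z_high)
```

      Z-scoring removes differences in offset and linear scale between recordings; it does not by itself correct for differences in signal-to-noise ratio, so it complements rather than replaces histological verification.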

      While the authors have some previous data on the AIC to DLS pathway, there are many brain regions and pathways impacted by alcohol and so the focus on this one in particular was not strongly justified. Since photometry is really an observational method, it's important to note that no causal link between activity in the pathway and drinking has been established here. 

      As mentioned above, this article is an eLife Research Advances article that builds on our previous AIC to DLS work published in eLife (Haggerty et al., 2022). Considering that this is a linked article, a justification for why this brain pathway was chosen is superfluous. In addition, an exhaustive review of all the different brain regions and pathways that are affected by binge alcohol consumption to justify this pathway seems more appropriate to a review article than an article such as this.  

      We make no claims that photometric recordings are anything but observational, but we did observe these signals to be different when time-locked to the beginning of drinking behaviors. We describe this link between activity in the pathway and drinking throughout the manuscript. It is indeed correlational, but just because it is not causal does not mean that our findings are invalid or unimportant.

      It would be helpful if the authors could further explain whether their modified lickometers actually measure individual licks. While in some systems contact with the tongue closes a circuit which is recorded, the interruption of a photobeam was used here. It's not clear to me whether the nose close to the spout would be sufficient to interrupt that beam, or whether a tongue protrusion is required. This detail is important for understanding how the photometry data is linked to behaviour. The temporal resolution of the GCaMP signal is likely not good enough to capture individual licks but I think more caution or detail in the discussion of the correspondence of these events is required. 

      The lickometers do not capture individual licks, but a robust quantification of the information they capture is described in Godynyuk et al. 2019 and referenced in multiple other papers (Flanigan et al. 2023, Haggerty et al. 2022, Grecco et al. 2022, Holloway et al. 2023) where these lickometers have been used. However, individual lick tracking is not a requirement for tracking drinking behaviors more generally. The lickometers clearly track when the animals are at the bottles, drinking fluids, and we have used the start of that lickometer signal to time-lock our photometry signals to drinking behaviors. We make no claims about, and have no data on, how photometric signals may be altered on the timescale of single licks. Regarding how AIC to DLS signals change on the timescale of seconds when animals initiate drinking behaviors, we believe we explain these signals with caution and in the context of the behaviors they aim to describe.

      Even if the pattern of drinking differs between males and females, the use of the word "strategy" implies a cognitive process that was never described or measured. 

      We use the word strategy to describe a plan of action that is executed by some chunking of motor sequences that amounts to a behavioral event, in this case drinking a fluid. We do not mean to imply anything further than this by using this specific word.

      Reviewer #2 (Public Review): 

      Summary: 

      This study looks at sex differences in alcohol drinking behaviour in a well-validated model of binge drinking. They provide a comprehensive analysis of drinking behaviour within and between sessions for males and females, as well as looking at the calcium dynamics in neurons projecting from the anterior insula cortex to the dorsolateral striatum. 

      Strengths: 

      Examining specific sex differences in drinking behaviour is important. This research question is currently a major focus for preclinical researchers looking at substance use. Although we have made a lot of progress over the last few years, there is still a lot that is not understood about sex-differences in alcohol consumption and the clinical implications of this. 

      Identifying the lateralisation of activity is novel, and has fundamental importance for researchers investigating functional anatomy underlying alcohol-driven behaviour (and other reward-driven behaviours). 

      Weaknesses: 

      Very small and unequal sample sizes, especially females (9 males, 5 females). This is probably ok for the calcium imaging, especially with the G-power figures provided, however, I would be cautious with the outcomes of the drinking behaviour, which can be quite variable. 

      For female drinking behaviour, rather than this being labelled "more efficient", could this just be that female mice (being substantially smaller than male mice) just don't need to consume as much liquid to reach the same g/kg. In which case, the interpretation might not be so much that females are more efficient, as that mice are very good at titrating their intake to achieve the desired dose of alcohol. 

      We agree that the “more efficient” drinking language could be bolstered by additional discussion in the text, and thus have added this to the manuscript starting at line 440.

      I may be mistaken, but is ANCOVA, with sex as the covariate, the appropriate way to test for sex differences? My understanding was that with an ANCOVA, the covariate is a continuous variable that you are controlling for, not looking for differences in. In that regard, given that sex is not continuous, can it be used as a covariate? I note that in the results, sex is defined as the "grouping variable" rather than the covariate. The analysis strategy should be clarified. 

      In lines 265-267, we explicitly state that the covariate factor was sex, which is mathematically correct based on the analyses we ran. We made an in-text error where we referred to sex as a grouping variable on Line 352, when it should have been the covariate. Thank you for the catch and we have corrected the manuscript.

      But, to reiterate, we are attempting to determine whether the regression fits differ significantly by sex, which would be reported as a significant covariate. Sex is certainly a categorical variable, but the two measures against which we are comparing it are continuous, so we believe an ANCOVA is valid here.
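
      As a rough sketch of what comparing regression fits by sex can look like, the example below contrasts a single pooled line with separate lines per group and computes the corresponding F statistic. It shows one conventional approach and is not necessarily the exact model we ran; the data values and the `fit_line` and `compare_fits` helpers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept, rss)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss

def compare_fits(x_a, y_a, x_b, y_b):
    """F statistic for one pooled line vs. separate lines per group."""
    _, _, rss_pooled = fit_line(x_a + x_b, y_a + y_b)
    _, _, rss_a = fit_line(x_a, y_a)
    _, _, rss_b = fit_line(x_b, y_b)
    rss_separate = rss_a + rss_b
    n = len(x_a) + len(x_b)
    df_num = 2        # extra slope and intercept in the separate-lines model
    df_den = n - 4    # two slopes and two intercepts estimated
    f_stat = ((rss_pooled - rss_separate) / df_num) / (rss_separate / df_den)
    return f_stat, df_num, df_den

# Hypothetical data: drinking events (x) vs. intake (y) for each sex.
males_x = [10, 20, 30, 40, 50]
males_y = [0.5, 1.1, 1.4, 2.1, 2.4]
females_x = [10, 20, 30, 40, 50]
females_y = [1.0, 2.1, 2.9, 4.2, 4.9]

slope_m, _, _ = fit_line(males_x, males_y)
slope_f, _, _ = fit_line(females_x, females_y)
f_stat, df_num, df_den = compare_fits(males_x, males_y, females_x, females_y)
```

      A large F statistic relative to the F distribution with (`df_num`, `df_den`) degrees of freedom would indicate that a single line describes both sexes poorly. Sex enters only as a grouping of the data, while both regressed measures are continuous.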

      Reviewer #3 (Public Review): 

      Summary: 

      In this manuscript by Haggerty and Atwood, the authors use a repeated binge drinking paradigm to assess how water and ethanol intake changes in male and female mice as well as measure changes in anterior insular cortex to dorsolateral striatum terminal activity using fiber photometry. They find that overall, males and females have similar overall water and ethanol intake, but females appear to be more efficient alcohol drinkers. Using fiber photometry, they show that the anterior insular cortex (AIC) to dorsolateral striatum (DLS) projections have sex, fluid, and lateralization differences. The male left circuit was most robust when aligned to ethanol drinking, and water was somewhat less robust. Male right, and female left and right, had essentially no change in photometry activity. To some degree, the changes in terminal activity appear to be related to fluid exposure over time, as well as within-session differences in trial-by-trial intake. Overall, the authors provide an exhaustive analysis of the behavioral and photometric data, thus providing the scientific community with a rich information set to continue to study this interesting circuit. However, although the analysis is impressive, there are a few inconsistencies regarding specific measures (e.g., AUC, duration of licking) that do not quite fit together across analytic domains. This does not reduce the rigor of the work, but it does somewhat limit the interpretability of the data, at least within the scope of this single manuscript. 

      Strengths: 

      - The authors use high-resolution licking data to characterize ingestive behaviors. 

      - The authors account for a variety of important variables, such as fluid type, brain lateralization, and sex. 

      - The authors provide a nice discussion on how this data fits with other data, both from their laboratory and others'. 

      - The lateralization discovery is particularly novel. 

      Weaknesses: 

      - The volume of data and number of variables provided makes it difficult to find a cohesive link between data sets. This limits interpretability.

      We agree there is a lot of data and variables within the study design, but also believe it is important to display the null and positive findings together to describe the changes we measured holistically across water and alcohol drinking.

      - The authors describe a clear sex difference in the photometry circuit activity. However, I am curious about whether female mice that drink more similarly to males (e.g., less efficiently?) also show increased activity in the left circuit, similar to males. Oppositely, do very efficient males show weaker calcium activity in the circuit? Ultimately, I am curious about how the circuit activity maps to the behaviors described in Figures 1 and 2. 

      In Figure 3C, we show that across the time window of drinking behaviors, female mice who drink alcohol do have a higher baseline calcium activity compared to water-drinking female mice, so we believe there are certainly alcohol-induced changes in AIC to DLS within females, but there remains a lack of engagement (as measured by changes in amplitude) compared to males. So, when comparing consummatory patterns that are similar by sex, we still see the lack of calcium signaling near the drinking bouts, along with small shifts in baseline activity that we aren’t truly powered to resolve (using an AUC or similar measurements for quantification) because the shifts are so small. Ultimately, we presume that the AIC to DLS inputs in females aren’t the primary node for encoding this behavior, and some recent work out of David Werner’s group (Towner et al. 2023) suggests that for males who drink, the AIC becomes a primary node of control, whereas in females, the PFC and ACC are more engaged. Thus, the mapping of the circuit activity onto the drinking behaviors more generally represented in Figures 1 and 2 may be sexually dimorphic, and further studies will be needed to resolve how females engage differential circuitry to encode ongoing binge drinking behaviors.

      - What does the change in water-drinking calcium imaging across time in males mean? Especially considering that alcohol-related signals do not seem to change much over time, I am not sure what it means to have water drinking change. 

      The AIC seems to encode many physiologically relevant, interoceptive signals, and the change in water-drinking signals in males was puzzling to us as well. Currently, we think it may reflect both the animals becoming more efficient at drinking out of the lickometers in early weeks and signaling changes due to thirst states or taste associated with the fluid. This is speculation, and more in-depth studies are needed to determine how thirst states or taste may modulate AIC to DLS inputs, but we believe that is beyond the scope of the current study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Line 45 - states alcohol use rates are increasing in females across the past half-decade. I thought this trend was apparent over the past half-century? Please consider revising this. 

      According to NIAAA, the gap in alcohol consumption rates between females and males has been closing for about the past 100 years, but only recently have those trends started to reverse, with females now drinking similar amounts or more than males.

      Placing more of the null findings into supplemental data would make the long paper more accessible to the reader. 

      In reference to Reviewer #3’s point as well, we present a lot of data, and we hope others will use these data, both null and positive findings, in their future work. As formatted on eLife’s website, we think it is important to place these findings in-line as well.

      Reviewer #2 (Recommendations For The Authors): 

      In addition to the points raised about analysis and interpretation in the Public Review, I have a minor concern about the written content. I find the final sentence of the introduction "together these findings represent targets for future pharmacotherapies.." a bit unjustified and meaningless. The findings are important for a basic understanding of alcohol drinking behaviour, but it's unclear how pharmacotherapies could target lateralised AIC inputs into DLS. 

      There are on-going studies (CANON-Pilot Study, BRAVE Lab, Stanford) for targeted therapies that use technologies like TMS and focused ultrasound to activate the AIC to alleviate alcohol cravings and decrease heavy drinking days. The difficulty with these next-generation therapeutics is often targeting, and thus we think this work may be of use to those in the clinic to further develop these treatments. We agree that this data does not support the development of pharmacotherapies in a traditional sense, and thus have removed the word and added text to reference TMS and ultrasound approaches to bolster this statement in lines 101+.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We would like to thank the reviewers for their overall positive assessment of our manuscript. We have used their constructive feedback to substantially improve our manuscript as described below.

      Reviewer #1

      Evidence, reproducibility and clarity

      This study by Reyes et al. is a well conducted analysis of memory B cell dynamics of Plasmodium falciparum (Pf) -specific B cell populations over the course of reducing Pf prevalence in ten Ugandan adults. The data is presented well and the authors provide compelling evidence that 1. There is an overall loss of Ag specific B cells with reduction in exposure and 2. Different antigens (MSP1/AMA-1 vs CIDRa-1) generate different flavors of long lived responses. However, additional clarity to the reader should be provided on certain topics (listed below).

      Major comments: 1. While the premise of the study (reduced Pf transmission due to the use of indoor residual spraying (IRS)) is an important one, I think the authors must take into consideration that 9/10 subjects had at least one Pf positive episode between Time Points 1 and 2 (Figure 1). Also, it looks from Fig 1 that some samples were collected at a time of Pf positive test (green squares), while in Table S1 none of the subjects have a positive parasite status at TP1.

      We recognize that most individuals had detectable parasitemia before and after time point (TP) 1. In our manuscript, we therefore do not report the time between TP1 and TP2, because we agree that the length of this time interval is not relevant in our study methodology. We only mention the time between the last known P. falciparum infection and collection of blood at the second time point. We use the sample collected at TP1 only as a representative sample obtained during a time with high P. falciparum exposure and do not make any claims based on the time between TP1 and TP2. The occurrence of infections after sample collection at TP1 confirms that parasite transmission was still high at this time. We have added a schematic of the relative levels of parasite transmission to Figure 1 to emphasize this.

      With respect to infection status, none of the donors were blood smear positive at TP1. However, as mentioned in Table S1, parasites were detected in three individuals using the more sensitive LAMP assay. These three individuals are therefore marked as parasite positive in Figure 1. Table S1 has been modified to highlight the parasite status of these three individuals.

      2. Figure S1A: What is trBC? Figure S1B: What is Strep? Are the strep positive cells also CIDR-1 positive and were they gated out? Why is APC used for MZ-1 and one of the MSP1-AMA-1 tetramers? Do these stainings come from multiple panels?

      All abbreviations of B cell populations were defined in the figure legend (for example, trBC stands for transitional B cells). To facilitate the interpretation of Figure S1, we have now included the definitions of these abbreviations in the figure.

      Strep stands for streptavidin, which has now also been clarified in the figure. In our gating strategy, we used the term “strep” to denote cells that bound to both CIDRa1 and MSP1/AMA1 tetramers, which we interpreted as non-specific binding to streptavidin or other components of the antigen tetramers. Only the “non-strep” cells were used to gate on antigen-specific cells. We have added this clarification to the figure legend.

      In panel B, we accidentally used the term MZ (for merozoite) to describe tetramers of the merozoite antigens MSP1 / AMA1. These labels are interchangeable, but to avoid confusion, MZ-1 has been changed to MSP1 / AMA1.

      3. Figure 3A: how many cells does the umap plot represent? Were there a total of 3555 Ag specific B cells that were non-naive (Figure 3E)?

      It is correct that there were a total of 3,555 antigen-specific B cells used for the clustering shown in panel A. This information has been added to Figure 3A.

      4. Could the authors comment on why in Figure 3, Ig isotype expression was not considered for clustering? This would allow for characterization of DN sub populations/ clusters in addition to the CD21-CD27- ABCs? It looks like IgD expression was low across the clusters (Figure 3D). Was this the case for the cells considered in this analysis, or was it excluded? If it was truly low expressed, how were the assessments in Figure 2 made?

      From prior experience, we know that Ig isotype information tends to dominate in the clustering, which would result in major clusters based on IgM, IgD, IgG, and IgA expression, not on expression of other markers. This is illustrated in the example below. The UMAP on the left shows clusters in green and red that consist of IgG+ and IgA+ B cells, respectively. The UMAP on the right shows that switched memory (swM) B cells and DN B cells are found in both IgG and IgA clusters. Because we were mainly interested in identifying different subsets of B cells, irrespective of Ig isotype, we did not include Ig isotype in the clustering. We have clarified in the manuscript that Ig isotypes were excluded from the analysis to prevent these from dominating the clustering:

      “Unsupervised clustering was then performed based on expression of all markers, except for Ig isotypes to prevent these from dominating the clustering.”

      IgD expression among cell clusters shown in Figure 3 was low because only non-naïve B cells were included in the analysis. The majority of non-naïve cells are class-switched memory B cells and DN B cells, which by definition do not express IgD (see gating strategy in Figure S1A). Figure 2 shows all B cell populations, including naïve B cells and non-naïve B cell populations (unswitched memory, switched memory, and DN), that were gated based on IgD and CD27 expression.

      5. Are there differences in these designations / phenotypes of DN populations of atBCs vs CD21-CD27- atBCs?

      In the malaria field, atypical B cells are typically defined as CD21-CD27-. The definition of DN2 B cells comes from the autoimmunity field and is stricter: IgD-CD27-CD21-CD11c+ B cells. In our manuscript, we define atypical B cells in a stricter way than typically done in the malaria field, following published guidelines for the identification of B cell subsets (https://doi.org/10.3389/fimmu.2019.02458). Using these guidelines, atypical B cells and DN2 B cells are phenotypically identical. We have added a reference to these published guidelines in the Results section:

      “Following published guidelines for the identification of B cell populations (21), total CD19+ B cells were divided into naïve B cells (IgD+CD27-), unswitched memory B cells (IgD+CD27+), switched memory B cells (IgD-CD27+), and double negative B cells (IgD-CD27-).”

      6. Lines 258-259: In considering only switched MBCs, what clusters from Figure 3a were included? There seem to be 2588 sw MBCs (Table S3, Figure 4). Do the remaining cells (967 cells) come from clusters 2, 5 and 6 (and excludes the atBC clusters)?

      This analysis did not use the clusters presented in Figure 3, but instead used switched memory B cells gated as shown in Figure S1A. The reason for this is that the clusters in Figure 3 were generated using antigen-specific B cells and cannot be reproduced using non-antigen-specific B cells. Thus, it is not possible to separate all other B cells into the same six clusters. The only way to compare expression of certain markers between antigen-specific and non-antigen specific switched memory B cells is to gate on these populations manually. We have now tried to clarify this in the manuscript as follows:

      “we determined the percentages of CD95+ cells and CD11c+ cells among antigen-specific switched memory B cells and the total population of switched memory B cells (gated manually as shown in Figure S1A).”

      Minor comments: 1. Line 178- 179: Was there a specific measure of rate of decline made for these cells?

      We did not calculate a rate of decline of antigen-specific B cells for several reasons: 1) the time between TP1 and TP2 is not the same for all people in the study, 2) the time between last exposure and TP2 is not the same for all people, and 3) the rate of decline is most likely not linear and cannot accurately be estimated with only two data points. We have changed the wording of this sentence such that we do not use the word “rate”:

      “we did not observe a difference in the percentage of B cells with specificity for merozoite antigens or variant surface antigens that were lost.”

      In addition, we included the percentage of reduction in size in the paragraph before this section:

      “we observed that both populations decreased in size by about 50%, although these differences were not statistically significant.”

      Significance

      General assessment: Strengths: The authors provide evidence that the dynamics of antigen specific cells in humans can vary with exposure and with the nature of the antigen. They have nicely discussed the potential causes for these differences (Discussion), although they should include the findings of Ambegaonkar et al that ABCs in malaria may be restricted to responding specifically to membrane bound antigens (PMCID: PMC7380957)

      As suggested by the reviewer, we have added a paragraph to the Discussion section to discuss the results reported by Ambegaonkar et al. and how the difference between soluble vs. membrane-bound antigens may have an effect on how these antigens are perceived by B cells:

      The difference between soluble and membrane-bound antigens may also have a direct effect on how these antigens are perceived by B cells. Atypical B cells have been shown to be restricted to recognition of membrane-bound antigens (41). The interaction of a B cell with membrane-associated antigen allows the formation of an immunological synapse. Inhibitory receptors expressed by atypical B cells are excluded from this synapse, resulting in B cell receptor signaling and differentiation towards antibody-secreting cells (41). This could explain why atypical B cell subset 1 that expresses the highest levels of the inhibitory receptor FcRL5 is enriched for recognition of the CIDRα1 domain of membrane-bound protein PfEMP1. It should however be noted that soluble antigen can also be presented effectively in membrane-context by conventional dendritic cells, follicular dendritic cells, and subcapsular macrophages in secondary lymphoid organs, especially when it is part of an immune complex (reviewed in (42)). This would provide a route for atypical B cells to also respond to soluble merozoite antigens, such as MSP1 and AMA1.

      Limitations: 1. Outlined above, and as the authors also mention, a small sample size and homogenous population. 2. The evidence for reduced transmission is not clear, and the negative parasite tests for donors shown in Table S1 do not match with Figure 1 data. 3. Lack of IgD expression across clusters (Figure 3D- the authors will need to clarify this point) would require re-analysis of Figure 2 data

      1. We have provided clarification in response to the points raised by the reviewer.

      2. We believe there is clear evidence for reduced transmission, from a median of almost 2 infections per person per year prior to the implementation of IRS to a median parasite-free period of 1.7 years prior to sample collection at TP2. To further emphasize this, we have summarized the number of P. falciparum infections among the ten individuals included in this study (now included in Table S3):

      year    Pf infections    comment
      2012    20
      2013    19               TP1
      2014    20               TP1
      2015    8                Start IRS
      2016    0                TP2

      This reduced parasite exposure is reflected in a decrease in immune activation as presented in Figure 2. We have clarified that the data in Table S1 did indeed match those shown in Figure 1.

      3. We have clarified that IgD expression is low in the clusters presented in Figure 3 because naïve B cells were excluded from this analysis.

      Advances: This study highlights the importance of studying antigen specific B cells in humans in the context of natural infection and the use of high-parameter tools such as spectral flow cytometry in assessing a large quantity of data from limited clinical samples. These data are important to inform better vaccine design. Studies in inbred animals can be quite limited or different from human B cell responses.

      Audience: This study will be of interest to malariologists and B cell immunologists. Atypical B cells are relevant to many infectious diseases and autoimmunity, while the dynamics of memory B cells in malaria will be relevant to those interested in vaccine design against blood stage antigens.


      Reviewer #2

      Evidence, reproducibility and clarity

      Summary: In this study, the authors compared long-lived total and antigen (ag)-specific B-cell levels in a cohort of 10 Ugandan malaria patient samples that were collected before and after local reduction of P. falciparum transmission (pre/post-IRS). The focus is on the novel comparison of the two most common malaria antigens: merozoite antigens (MSP1/AMA1) and variant surface antigens (CIDRα1). Using high-parameter spectral flow cytometry, they also characterized the phenotype of the different populations of cells. Their main findings include 1) a decrease in activated but maintenance of resting ag-specific B-cells in the post-IRS sample and 2) CD95 and CD11c, as the only differentially expressed markers between MSP1/AMA1-specific and CIDRα1-specific long-lived memory B cells. Their further phenotypic characterization suggests functional consequences with MSP1/AMA1-specific B-cells being poised for rapid antibody-secreting cell differentiation while CIDRα1-specific B cells were enriched among a subset of atypical B cells that seem poised for antigen presentation (CD86+CD11chi/ AtBC1). Their findings consolidate and further expand our knowledge of long-lived B-cell levels during P. falciparum malaria and report/compare (for the first time to my knowledge) a differential selection of long-lived B-cell levels between these 2 antigen specificities. Overall, the manuscript is straightforward and well-written and the authors did a good job explaining their methodology, findings, and interpretations. I believe the major gap missing in this study is the reconciliation of long-lived antigen-specific B-cell levels with the serum antigen-specific antibody levels of these patients against the same 2 antigens (MSP1/AMA1 and CIDRα1) in the experiments and the discussion. The antibody data would strengthen their main argument and is the main missing piece for characterizing more completely the long-lived antigen-specific humoral responses. 
Below are my suggestions that would help improve the manuscript:

      Major comments: 1. Serum Anti-Pf antibodies: Do the authors have access to the serum/plasma of these patients? It would be important to correlate the total and ag-specific B-cell populations with levels of serum IgG antibodies against those specific Pf antigens (MSP1/AMA1 and CIDRα1) and total IgG levels to strengthen their point about long-lived humoral responses.

      To our understanding, the rationale for such an analysis would be that if IgG levels correlated with the size of a certain B cell population, it would suggest that this B cell population is implicated in the production of IgG against a particular antigen. While a correlation between the percentage of memory B cells and IgG titers has been observed for antigens from several viruses and bacteria (1-4), other studies have reported the absence of such a correlation (4-7). Similarly, for P. falciparum antigens, a moderate correlation between memory B cell abundance and IgG titers has been observed for some merozoite antigens, but not for others (8, 9). The lack of a correlation between the magnitude of the memory B cell and the antibody response fits with the prevailing model that memory B cells and plasma cells are two independently controlled arms of the humoral immune system (10, 11). Given the lack of strong evidence that the levels of IgG titers and memory B cells are interconnected, we do not think this analysis will be informative.

      An alternative analysis would be to study the contribution of B cell subsets to the production of IgG after re-exposure, similar to a recent study that identified T-bet+ memory B cells as the main contributors to antibody responses following influenza virus vaccination (12). Unfortunately, we are unable to perform this analysis in this study population, because only four of the individuals included in this study (spanning calendar years 2012 – 2016) were recruited into a follow up cohort (calendar years 2017 – 2019), and none of these four people were infected during this later time frame.

      We have however added this future direction to the Discussion section:

      To determine the contribution of different memory B cell subsets to the recall response against P. falciparum, it would be interesting to analyze IgG responses upon re-infection. However, none of the individuals included in this study experienced a recorded P. falciparum infection post-IRS, preventing us from performing such an analysis.

      References

      1. Crotty et al., J Immunol (2003), https://doi.org/10.4049/jimmunol.171.10.4969
      2. Quinn et al., J Infect Dis (2004), https://doi.org/10.1086/423937
      3. Cohen et al., Cell Rep Med (2021), https://doi.org/10.1016/j.xcrm.2021.100354
      4. Amanna et al., New England J Med (2007), https://doi.org/10.1056/nejmoa066092
      5. Leyendeckers et al., Eur J Immunol (1999), https://doi.org/10.1002/(sici)1521-4141(199904)29:04%3C1406::aid-immu1406%3E3.0.co;2-p
      6. Nanan et al., Vaccine (2001), https://doi.org/10.1016/s0264-410x(01)00328-0
      7. Goel et al., Science Immunol (2021), https://doi.org/10.1126/sciimmunol.abi6950
      8. Rivera-Correa et al., eLife (2019), https://doi.org/10.7554/elife.48309
      9. Jahnmatz et al., Front Immunol (2021), https://doi.org/10.3389/fimmu.2020.619398
      10. Weisel et al., Immunity (2016), https://doi.org/10.1016/j.immuni.2015.12.004
      11. Shinnakasu et al., Nat Immunol (2016), https://doi.org/10.1038/ni.3460
      12. Nellore et al., Immunity (2023), https://doi.org/10.1016/j.immuni.2023.03.00
        1. Correlation between populations and initial parasite load: Are the levels between any of the populations at any time point correlated significantly in any way? If the statistical power/N allows it, please perform a correlation array between all populations using all samples both total and ag-specific and initial parasite load.

      We agree that this analysis could be very interesting. However, in most recorded infection cases, parasitemia was submicroscopic and parasite load was not reported. Information about parasite density in the blood prior to TP1 is available for only half of the individuals in this study. In these people, the last known parasite density was recorded between three months to two years prior to TP1. Given the small number of individuals for whom these data are available and the large variation in time between parasitemia and sampling, we do not have sufficient data to perform this analysis.

      1. Figure 2: Why were total and ag-specific plasmablasts/plasma cells not included in this figure? Please include to compare levels in these two time points.

      We did not include the levels of total and antigen-specific plasmablasts (PBs) in Figure 2 because the percentages of PBs are relatively low, and very few antigen-specific PBs were detected. We have now included the levels of total PBs in Figure 2A and the percentages of antigen-specific PBs in Supplementary Figure 2. The percentage of PBs among total B cells decreased by about 50% between TP1 and TP2, in line with a decrease in immune activation.

      1. Healthy baseline: The study is missing "healthy" controls as a reference. I presume this is because each patient is their own uninfected control in the post-IRS sample. In the Methods, they mention that they used B cells from two naïve US donors as technical controls. It would be important to include, and maybe expand (to match age and gender), the data from those controls as supplementary figures to support their findings:
      2. Show negative Tetramer staining for these samples (to understand the background).
      3. Levels of all the USA controls total B cell populations and compared to the pre/post-IRS samples to understand "baseline" or "non-endemic" control levels.
      1. We have included flow cytometry plots of tetramer staining for the non-P. falciparum exposed donors (pooled B cells from two US donors) to show the level of background for these probes. These plots are shown in Figure S1B.

      2. We have used data from P. falciparum-naive US donors (n = 7) that we generated for a prior study to show the average level of total B cell populations in Figure 2, and the percentage of switched memory B cells that express CD95, CD11c, T-bet, and FcRL5 in Figure 4.

      Minor comments: 1. In the gating strategy (S1), please include the percentage of each population of that representative example.

      We have added the percentages for all gated populations to Figure S1.

      1. For Figure 2, since not every panel has the same N, please include the N for each panel in the figure or a supplementary table.

      All panels in Figure 2 show data for all 10 individuals. However, since some data points are overlapping, it may appear that some panels show data from fewer individuals. Specifically, no antigen-specific DN1 cells were detected pre- and post-IRS for four individuals. These data points therefore overlap and are not visible. To avoid confusion, we had mentioned this in the legend to Figure 2 (see text in orange). We have tried to further clarify this by emphasizing in the figure legend that data from all 10 individuals are shown (see text in red):

      Figure 2: Abundance of total and antigen-specific B cell subsets in the circulation during high parasite transmission and in the absence of P. falciparum exposure. The percentage of B cell subsets among circulating B cells is shown for total B cells (A), MSP1/AMA1-specific B cells (B), and CIDRα1-specific B cells (C). For MSP1/AMA1-specific B cells and CIDRα1-specific B cells, the total percentage among all circulating B cells is also shown (right most graphs in each panel). All panels show data for all 10 individuals. In panels B and C, no antigen-specific DN1 cells were detected pre- and post-IRS for four individuals. These data points therefore overlap and are not clearly visible. Differences between groups were evaluated using a Wilcoxon matched-pairs signed-rank test. P values

      1. Please mention the history of past and chronic co-infections of these 10 patients. Particularly if they had any other active or recent infection when the sample was taken.

      Four individuals had active or recent infections in the three months prior to sample collection, with upper respiratory tract infections being the most common. This information has been included in Table S3, with a reference to these data in the Methods section. We have also included a link to ClinEpiDB where additional information about the cohort participants, including medical history, can be found.

      1. Discussion: further discussion with relevant literature on the following points is needed to consolidate cellular and antibody studies: a. Whether the presence of long-lived ag-specific B-cell responses correlates with sustained levels of IgG against Pf antigens. b. The different types of antibodies (protective/pathogenic) that these different B-cell populations have been reported to produce during malaria.

      a. We have added the following paragraph to the Discussion section:

      To determine how these different long-lived B cell subsets contribute to protection against P. falciparum infection, it would be important to analyze the connection between the cellular repertoire and plasma IgG. For P. falciparum antigens, a moderate correlation between memory B cell abundance and IgG titers has been observed for some merozoite antigens, but not for others (28, 44). This is in line with studies for other pathogens, that showed a correlation between the percentage of memory B cells and IgG titers for antigens from several viruses and bacteria (48-51), while other studies have reported the absence of such a correlation (51-54). The lack of a correlation between the magnitude of the memory B cell and the antibody response fits with the prevailing model that memory B cells and plasma cells are two independently controlled arms of the humoral immune system (55, 56). To determine the contribution of different memory B cell subsets to the recall response against P. falciparum, it would be interesting to analyze IgG responses upon re-infection. However, none of the individuals included in this study experienced a recorded P. falciparum infection post-IRS, preventing us from performing such an analysis.

      b. We have added additional discussion about the types of antigens recognized by atypical B cells to the Discussion section:

      Prior studies have shown that while atypical B cells harbor reactivity against P. falciparum antigens (9,18), they are also enriched for autoreactivity (43). Specifically, atypical B cells produce antibodies against the membrane lipid phosphatidylserine, which can induce the destruction of uninfected erythrocytes and contribute to anemia (44).

      Significance

      General assessment:

      Strengths: - Novelty in contrasting two different types of P. falciparum antigen responses at the B-cell level. - The use of tetramers is a cutting-edge technique to assess this question. - Analyses were thorough and found contrasting differences in antigen-specific B-cell populations (atypical vs classical) between these 2 antigens for the first time (to my knowledge). - Well-written manuscript with clear data, methodology, and conclusions

      Limitations: - Missing serum/plasma antibody data to support their claim about long-lived humoral responses and reconciliation of ag-specific B-cell levels and ag-specific antibody levels in experiments and discussion. - Limited N of 10 patients of the same gender (female), some population analyses had even fewer samples. - Missing baseline levels for non-endemic uninfected control for B-cell populations for comparison.

      • We have included a discussion about the correlation between plasma antibody and memory B cell responses in the Discussion section.

      • We have clarified that some data points overlap in Figure 2, giving the impression that data from fewer than 10 individuals were shown.

      • We have included baseline levels of 1) tetramer reactivity (Figure S1), 2) the size of B cell populations (Figure 2), and 3) expression of select markers (Figure 4).

      Advance: The study consolidates antigen-specific responses with the discovery of recently characterized populations (ex. atypical) and finds novel differences between two types of malaria antigen responses at the B-cell level and between specific populations responding differentially to these antigens. The findings are incremental (role of B-cell population in malaria-specific responses), conceptual (contrasting two types of B-cell antigen responses in the same infection), and clinical (finding significant differences in patients).

      Audience: This study will attract basic B-cell immunology scientists, infectious disease clinicians/scientists, vaccinologists, and both basic malaria immunology and clinical audiences.

      Reviewer expertise: Malaria, immunology, antibodies.

      Reviewer #3

      Evidence, reproducibility and clarity: The authors analysed the antigen specificity and phenotypes of B cells during high P falciparum transmission and after a period of successful malaria control with IRS in Uganda. The gap between the two sampling time points is close to two years.

      They use antigen probes for MSP1/AMA1 and CIDRalpha1, two antigens expressed at different stages of the P. falciparum life cycle: merozoites and infected red cells, respectively. While MSP1/AMA1 are involved in the parasite's invasion of red blood cells, CIDRalpha1 is a domain of PfEMP1, a large family of antigenically variant proteins that mediates the sequestration of infected red cells in small blood vessels.

      They found that the percentage of activated antigen-specific memory B cells declined with malaria control. However, detectable frequencies of antigen-specific memory B cells were retained after malaria control, which confirms earlier reports.

      However, they also demonstrate that the phenotypic characteristics of memory B cells are associated with antigen specificity. The retained MSP1/AMA1-specific B cells were mostly CD95+CD11c+ memory B cells and FcRL5-Tbet- atypical B cells. In contrast, the retained CIDRalpha1-specific B cells were enriched among a subpopulation of atypical B cells.

      These findings suggest that differences exist in how MSP1/AMA1 and CIDRalpha1 are recognised and processed by the human immune system and in how the immune system responds to them upon re-infection with P falciparum.

      Major issues affecting the conclusion: The findings and conclusions of this study, whilst positively exciting and informative, are based on the analyses of very few cells (at times). Even the authors themselves acknowledge this. I expect the authors to address this issue by toning down their reporting and conclusions (where appropriate). Ultimately, we need to have the confidence that these results are reproducible.

      We appreciate the reviewer’s concern about the numbers of antigen-specific cells included in our analyses, which is an inherent limitation of this approach. However, we would like to point out that most analyses included a substantial number of antigen-specific B cells:

      Figure 3D: 158 to 2,038 cells per group

      Figure 4: an average of 26 to 184 cells per donor

      Figure 5B: 55 to 508 cells per group

      Figure 5C: 10 to 334 cells per group*

      * The group with 10 cells is an outlier here. All other groups contain at least 104 cells. Because this one condition had such a small number of cells, we specifically mentioned this number in the text.

      The numbers of cells for analyses shown in Figures 3D and 5B were already included in the figures. All the other numbers were mentioned in Table S3. To further clarify the number of cells included in each analysis, we have added the number of cells to Figures 4 and 5C.

      To tone down our reporting, we have rephrased some of our conclusions, and now present our section headers in past tense to make these statements reflect our observation instead of a definitive conclusion. For example:

      Conclusion: “The loss of MSP1/AMA1-specific and CIDRα1-specific B cells in the circulation was similar, but the phenotype of long-lived MSP1/AMA1-specific and CIDRα1-specific B cells appeared to differ.”

      Section header: “Long-lived MSP1/AMA1-specific and CIDRα1-specific B cells differed in phenotype”

      Finally, in the Discussion section, we have added a statement to our paragraph describing the limitations of our study to stress the importance of reproducing our findings:

      All in all, it will be important to perform additional studies of the phenotype and functionality of long-lived B cells with specificity for P. falciparum antigens to reproduce and extend our findings.

      Minor comments: Figure 2D: I found this figure and its presentation unclear. Notably, using contour plots doesn't allow the reader to appreciate the density of the cells being presented.

      To facilitate the interpretation of this figure, we have changed the plot type to a contour plot with density color gradient, and added the number of cells shown in each plot. (Please note that this panel has been renumbered to C.)

      Figure 4 - label the y-axis.

      The y-axis was labeled with “%”, which we have expanded to “% of B cells expressing marker of interest”.

      Significance: The study design, as outlined, allowed for the analyses of the specificity and phenotypic characteristics of residual P falciparum-specific memory B cells after 1.7 years of little to no P falciparum exposure. The cell phenotyping methods presented are also appropriate. However, antigen-specific cells are rare in blood circulation, and as the authors themselves acknowledge in the discussion, some of the results are based on very few cells. This means we cannot be sure all the results presented are reproducible.

      Previous studies demonstrated that P falciparum memory B cells are maintained long after cessation of antigen exposure. However, few (if any) detailed antigen-specific and phenotypic analyses of the characteristics of P falciparum-specific memory B cells following a long period of no exposure exist. Thus, this study presents an incremental advance in our knowledge. In addition, the association of antigen specificity with cell phenotypes is a new concept in malaria immunology. The research presented will greatly interest infectious disease immunologists and vaccinologists.

      I am an infectious disease immunologist with substantial experience in malaria immunology.

    1. Recognize the difference between casual, formal, and urgent registers. Learn how to use each in the classroom and make your shifts between the registers obvious.

      I think that this is a very important point. Being able to understand the difference between formal and informal lessons and tones, as well as posture and facial expressions, is an important skill for teachers, just as it is for anyone. As teachers and educators, we are role models to our students, and we are meant to exemplify what it is to be a positive, contributing member of society. To do that, we need to be able to model both formal and informal ways of communicating and show when each is appropriate. For instance, if we are teaching a lesson on business attire and resumes, the instructor may want to be more formal, but if the instructor is teaching a topic such as fun or games, the lesson may be less formal. It is important for an educator to model both forms of communication, as it allows students to understand that there is more to life than just being formal or informal.

    1. Reviewer #1 (Public Review):

      In this paper, Tompary & Davachi present work looking at how memories become integrated over time in the brain, and relating those mechanisms to responses on a priming task as a behavioral measure of memory linkage. They find that remotely but not recently formed memories are behaviorally linked and that this is associated with a change in the neural representation in mPFC. They also find that the same behavioral outcomes are associated with the increased coupling of the posterior hippocampus with category-sensitive parts of the neocortex (LOC) during a post-learning rest period, again only for remotely learned information. There was also correspondence in rest connectivity (posterior hippocampus-LOC) and representational change (mPFC) such that for remote memories specifically, the initial post-learning connectivity enhancement during rest related to longer-term mPFC representational change.

      This work has many strengths. The topic of this paper is very interesting, and the data provide a really nice package in terms of providing a mechanistic account of how memories become integrated over a delay. The paper is also exceptionally well-written and a pleasure to read. There are two studies, including one large behavioral study, and the findings replicate in the smaller fMRI sample. I do however have two fairly substantive concerns about the analytic approach, where more data will be required before we can know whether the interpretations are an appropriate reflection of the findings. These and other concerns are described below.

      (1) One major concern relates to the lack of a pre-encoding baseline scan prior to recent learning.

      a) First, I think it would be helpful if the authors could clarify why there was no pre-learning rest scan dedicated to the recent condition. Was this simply a feasibility consideration, or were there theoretical reasons why this would be less "clean"? Including this information in the paper would be helpful for context. Apologies if I missed this detail in the paper.

      b) Second, I was hoping the authors could speak to what they think is reflected in the post-encoding "recent" scan. Is it possible that these data could also reflect the processing of the remote memories? I think, though am not positive, that the authors may be alluding to this in the penultimate paragraph of the discussion (p. 33) when noting the LOC-mPFC connectivity findings. Could there be reinstatement of the old memories due to being back in the same experimental context and so forth? I wonder to what extent the authors think the data from this scan can be interpreted as strictly reflecting recent memories, particularly given that it is relative to the pre-encoding baseline from before the remote memories as well (and therefore in theory could reflect both remote and recent memories). (I should also acknowledge that, if the authors think there might be some remote memory processing during the recent learning session in general, a pre-learning rest scan might not have been "clean" either, in that it could have reflected some processing of the remote memories; i.e., perhaps a clean pre-learning scan for the recent learning session, as raised in point 1a, is simply not possible.)

      c) Third, I am thinking about how both of the above issues might relate to the authors' findings, and would love to see more added to the paper to address this point. Specifically, I assume there are fluctuations in baseline connectivity profile across days within a person, such that the pre-learning connectivity on day 1 might be different from that on day 2. Given that, and the lack of a pre-learning connectivity measure on day 2, it would logically follow that the measure of connectivity change from pre- to post-learning is going to be cleaner for the remote memories. In other words, could the lack of connectivity change observed for the recent scan simply be due to the lack of a within-day baseline? Given that otherwise the post-learning rest should be the same, in that it is an immediate reflection of how connectivity changes as a function of learning (depending on whether the authors think that the "recent" scan is actually reflecting "recent + remote"), it seems odd that both scans don't show the same corresponding increase in connectivity, which makes me think it may be a baseline difference. I am not sure if this is what the authors are implying when they talk about how day 1 is most similar to prior investigations on p. 20, but if so it might be helpful to state that directly.

      d) Fourth, and closely related to my point 1c, I wonder if the lack of correlations with behavior for the recent scan is interpretable, or if it might just be that this is a noisy measure due to imperfect baseline correction. Do the authors have any data or logic they might be able to provide that could speak to these points? One thing that comes to mind is seeing whether the raw post-learning connectivity values (separately for both recent and remote) show the same pattern as the difference scores. However, the authors may come up with other clever ways to address this point. If not, it might be worth acknowledging this interpretive challenge in the Discussion.

      (2) My second major concern is how the authors have operationalized integration and differentiation. The pattern similarity analysis uses an overall correspondence between the neural similarity and a predicted model as the main metric. In the predicted model, C items that are indirectly associated are more similar to one another than they are to C items that are entirely unrelated. The authors are then looking at a change in correspondence (correlation) between the neural data and that prediction model from pre- to post-learning. However, a change in the degree of correspondence with the predicted matrix could be driven by either the unrelated items becoming less similar or the related ones becoming more similar (or both!). Since the interpretation in the paper focuses on change to indirectly related C items, it would be important to report those values directly. For instance, as evidence of differentiation, it would be important to show that there is a greater decrease in similarity for indirectly associated C items than there is for unrelated C items (or even a smaller increase) from pre to post, or that C items that are indirectly related are less similar than are unrelated C items post- but not pre-learning. Performing this analysis would confirm that the pattern of results matches the authors' interpretation. This would also impact the interpretation of the subsequent analyses that involve the neural integration measures (e.g., correlation analyses like those on p. 16, which may or may not be driven by increased similarity among overlapping C pairs). I should add that given the specificity to the remote learning in mPFC versus recent in LOC and anterior hippocampus, it is clearly the case that something interesting is going on. However, I think we need more data to understand fully what that "something" is.

      (3) The priming task occurred before the post-learning exposure phase and could have impacted the representations. More consideration of this in the paper would be useful. Most critically, since the priming task involves seeing the related C items back-to-back, it would be important to consider whether this experience could have conceivably impacted the neural integration indices. I believe it never would have been the case that unrelated C items were presented sequentially during the priming task, i.e., that related C items always appeared together in this task. I think again the specificity of the remote condition is key and perhaps the authors can leverage this to support their interpretation. Can the authors consider this possibility in the Discussion?

      (4) For the priming task, based on the Figure 2A caption it seems as though every sequence contributes to both the control and primed conditions, but (I believe) this means that the control transition always happens first (and they are always back-to-back). Is this a concern? If RTs are changing over time (getting faster), it would be helpful to know whether the priming effects hold after controlling for trial numbers. I do not think this is a big issue because if it were, you would not expect to see the specificity of the remotely learned information. However, it would be helpful to know given the order of these conditions has to be fixed in their design.

      (5) The authors should be cautious about the general conclusion that memories with overlapping temporal regularities become neurally integrated, given that their findings in mPFC are more consistent with overall differentiation (though as noted above, I think we need more data on this to know for sure what is going on).

      (6) It would be worth stating a few more details, and perhaps providing additional logic or justification, in the main text about how the pre- and post-exposure phases were set up and why. How many times was each object presented pre and post, and how was the sequencing determined (were any constraints put in place, e.g., such that C1 and C2 did not appear close in time)? What was the cover task (I think this is important to the interpretation and so belongs in the main paper)? Were there considerations involving the fact that this is a different sequence of the same objects the participants would later be learning, e.g., interference?

    1. We further identified HC-HA/PTX3 as the primary bioactive component responsible for pain inhibition.

      This is such an exciting overall result. I'm wondering if you've tested/identified any other bioactive compounds from the same material in addition to HC-HA/PTX3, and/or whether you think there may be other significant contributors to pain inhibition from human birth tissues.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      Arpin is a negative regulator of Arp2/3 activity. Here the authors investigated the role of arpin in vascular permeability using appropriate cultured human and murine endothelial monolayers and successfully developed arpin KO mice. The results clearly show arpin is expressed in blood vessels (not clear about lymphatics but given leaky vessels, one wonders). The data show that arpin is important for vessel barrier function, yet its genetic loss still leads to viable animals in the C57Blk strain, albeit with leaky blood vessels. The data are well presented and controls are included. However, the evidence that arpin loss/knockdown causes increased actin functions independent of Arp2/3 is based on pharmacological data and is indirect. The authors conclude that ROCK1 activity is elevated and is the cause of the barrier function lost upon arpin reduction. I do have one suggestion for the authors that involves a new study in these animals, which could strengthen their proposed mechanism that the vascular defects are independent of Arp2/3 activity and rather involve ROCK1 but not ZIPK.

      (1) If arpin is working via ROCK1, as the authors infer, perhaps treatment of arpin-/- mice with ROCK1 inhibitor(s) would attenuate vessel permeability while HS38 treatment would not. This type of study would strengthen the conclusion that ROCK1, but not ZIPK, was involved. CK666, if active in mouse cells, could also be tested.

      To analyze vascular permeability in vivo, we performed Miles assays in arpin+/+ and arpin-/- mice using the inhibitors of ROCK1 (Y27632) and ZIPK (HS38). Both Y27632 and HS38 reduced the permeability caused by absence of arpin (new Figure 8E), thus confirming what we observed before in HUVEC (shown in old Figure 7). CK666 did not change the permeability in arpin-/- mice, thus confirming the conclusion that arpin does not regulate vascular permeability via Arp2/3 but rather via ROCK1/ZIPK-mediated stress fiber formation (page 13).

      (2) Fig 5. The data demonstrate that arpin regulates actin filament formation and permeability in HUVEC, but they do not demonstrate that this occurs in an Arp2/3-independent manner. If I understand your data, this is indirect evidence. One needs more information to reach this conclusion. Can the authors measure Arp2/3 activity directly and then test whether arpin knockdown and CK666 have the same capacity to reduce Arp2/3 activity in vitro?

      Arp2/3 activity cannot be measured directly. The commonly used approach is therefore Arp2/3 inhibition via CK666. Our new in vivo permeability assays (see answer above) together with our HUVEC data in Figure 5 clearly show that CK666 does not have the same effect as arpin knock-down, and neither does CK666 rescue the effects of arpin deficiency in vitro and in vivo. Together, these findings clearly suggest that arpin does not regulate endothelial permeability via Arp2/3.

      Minor issues:

      Fig 2, 3 or other Figs: In presented western blots, all proteins should include appropriate mw labels.

      Thank you. Molecular weights have been added to all Western blots.

      Fig 2. Suggest that like your arpin analysis, amounts of AP1AP and PICK1 at baseline and TNF-treatment by blotting should be included. A minor point is yellow color for labels does not stand out and should be changed to another color - as the authors used in Fig 2C.

      We have included Western blots and quantifications for PICK1 in Figure S1A and S1C. An antibody against AP1AP was unfortunately not available.

      The yellow color has been changed to purple for better visibility.

      Fig 2C. The arpin loss at junctions and actin filaments (Figure 2C) is very minor even though it reached statistical significance. It really is not an obvious loss from your 3 color overlay.

      Thank you. It is indeed hard to see. We have now included magnifications in Figure 2C that better show the loss of arpin at junctions.

      Fig 8, text 303-310 shows in vivo evidence of lung congestion and edema. There also appear to be inflammatory cells present in the images. If these are inflammatory cells, it raises the question of whether these mice have an abnormal complete blood cell count (CBC). Suggest adding CBC data for arpin-/- vs control arpin+/+ mice in Fig 8.

      The pathologist observed the presence of lymphocytes and macrophages, indicating the possibility of a (low level) chronic inflammation in arpin-deficient lungs. However, we now also performed hemograms of the mice (new Table S2) that showed no significant difference in the blood cell count of arpin-/- and arpin+/+ mice. Thus, the presence of lymphocytes and macrophages cannot be explained simply by higher leukocyte counts (page 14).

      Line 289, pg 13, Fig 8: Lung levels of arpin are not shown in Fig 8B. Authors must mean another fig?

      Sorry. Arpin protein levels in lungs are shown in figure 8C. This has been corrected on page 13.

      Reviewer #2 (Recommendations For The Authors):

      This is a solid piece of work that adds a small amount of additional factual information to our understanding of cell-cell junctions. The experimental work is of good quality and is sufficient to support the aims of the paper. I think the value of the work is to add this small amount of new knowledge to the archive. I do not believe that further experimental work would add to the paper - it's done. But this doesn't have the impact or completeness for this journal. It belongs in a for-the-record journal.

      We appreciate your overall positive evaluation and your comments that our study represents a solid piece of work with good quality experimental work. However, we are not sure what you mean by “it belongs in a for-the-record journal”. Anyway, we agree that our study does not reveal a complete mechanism of how arpin regulates actin stress fibers, but we respectfully disagree that our study only adds a “small amount of additional factual information”. We may not have been very clear about it, but we present in this study several new discoveries and although some are descriptive in nature that does not make them trivial or less important. We provide for the first time experimental evidence that: 1) arpin is expressed in endothelial cells in vitro and in vivo, and downregulated during inflammation; 2) presence of arpin is required for proper endothelial permeability regulation and junction architecture; 3) arpin exerts these functions in an Arp2/3-independent manner; 4) arpin controls actomyosin contractility in a ROCK1- and ZIPK-dependent fashion; 5) arpin knock-out mice are viable and breed and develop normally but show histological characteristics of a vascular phenotype and increased vascular permeability that can be rescued by inhibition of ROCK1 and ZIPK. The fact that arpin fulfills its functions in endothelial cells independently of the Arp2/3 complex is of special relevance as previously the only known function of arpin was the inhibition of the Arp2/3 complex. Thus, we believe that our study adds a significant amount of new information to the literature. Thank you very much.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Summary Responses: Besides the WT allele, equivalent to the mouse TMEM173 gene, the human TMEM173 gene has two common alleles: the HAQ and AQ alleles carried by billions of people. The main conclusions and interpretation, summarized in the Title and Abstract, are i) Different from the WT TMEM173 allele, the HAQ or AQ alleles are resistant to STING activation-induced cell death; ii) STING residue 293 is critical for cell death; iii) HAQ, AQ alleles are dominant to the SAVI allele; iv) One copy of the AQ allele rescues the SAVI disease in mice. We propose that STING research and STING-targeting immunotherapy should consider human TMEM173 heterogeneity. These interpretations and conclusions were based on Data and Logic. We welcome alternative, logical interpretations and collaborations to advance the human TMEM173 research.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Aybar-Torres et al investigated the effect of common human STING1 variants on STING-mediated T cell phenotypes in mice. The authors previously made knock-in mice expressing human STING1 alleles HAQ or AQ, and here they established a new knock-in line Q293. The authors stimulated cells isolated from these mice with STING agonists and found that all three human mutant alleles resist cell death, leading to the conclusion that R293 residue is essential for STING-mediated cell death (there are several caveats with this conclusion, more below). The authors also bred HAQ and AQ alleles to the mouse Sting1-N153S SAVI mouse and observed varying levels of rescue of disease phenotypes with the AQ allele showing more complete rescue than the HAQ allele. The Q293 allele was not tested in the SAVI model. They conclude that the human common variants such as HAQ and AQ have a dominant negative effect over the gain-of-function SAVI mutants.

      Strengths:

      The authors and Dr. Jin's group previously made important observations of common human STING1 variants, and these knock-in mouse models are essential for understanding the physiological function of these alleles.

      Weaknesses:

      However, although some of the observations reported here are interesting, the data collectively does not support a unified model. The authors seem to be drawing two sets of conclusions from in vitro and in vivo experiments, and neither mechanism is clear. Several experiments need better controls, and these knock-in mice need more comprehensive functional characterization.

      (1) In Figure 1, the authors are trying to show that STING agonist-induced splenocytes cell death is blocked by HAQ, AQ and Q alleles. The conclusion at line 134 should be splenocytes, not lymphocytes. Most experiments in this figure were done with mixed population that may involve cell-to-cell communication. Although TBK1-dependence is likely, a single inhibitor treatment of a mixed population is not sufficient to reach this conclusion.

      We greatly appreciate Reviewer 1's insights. We changed “lymphocytes” to “splenocytes” (line 133) as suggested. We respectfully disagree with Reviewer 1’s comments on TBK1. First, we used two different TBK1 inhibitors: BX795 and GSK8612. Second, because BX795 also inhibits PDK1, we used a PDK1 inhibitor, GSK2334470. Third, both BX795 and GSK8612 completely inhibited diABZI-induced splenocyte cell death (Figure 1B) (lines 128 – 133). The logical conclusion is that “TBK1 activation is required for STING-mediated mouse spleen cell death ex vivo” (line 117).

      Our discovery that the common human TMEM173 alleles are resistant to STING activation-induced cell death is a substantial finding. It further strengthens the argument that the HAQ and AQ alleles are functionally distinct from the WT allele 1-3. We wish to underscore the crucial message of this study, that 'STING research and STING-targeting immunotherapy should consider TMEM173 heterogeneity in humans' (line 37), which has been largely overlooked in current STING clinical trials 4.

      Regarding STING-cell death, as we stated in the Introduction (lines 65-77): i) STING-mediated cell death is cell type-dependent 5-7 and type I IFN-independent 5,7,8; ii) the in vivo biological significance of STING-mediated cell death is not clear 7,8; iii) the mechanisms of STING-cell death remain controversial. Multiple cell death pathways, i.e., apoptosis, necroptosis, pyroptosis, ferroptosis, and PANoptosis, have been proposed 7,9,10. SAVI/HAQ and SAVI/AQ prevented lymphopenia and alleviated SAVI disease in mice. Thus, the manuscript provides some answers regarding the biological significance of STING-cell death in vivo, which is new. Regarding the molecular mechanism, splenocytes from Q293/Q293 mice are resistant to STING cell death. The logical conclusion is that amino acid 293 is critical for STING cell death (line 29).

      Extensive studies, beyond the scope of this manuscript, are needed on how aa293 and TBK1 mediate STING-cell death to resolve the controversies in the STING-cell death field (e.g. apoptosis, necroptosis, pyroptosis, ferroptosis, and PANoptosis).

      (2) Q293 knock-in mouse needs to be characterized and compared to HAQ and AQ. Is this mutant expressed in tissues? Does this mutant still produce IFN and other STING activities? Is the protein expression level altered on Western blot? Is the mutant protein trafficking affected? In the authors' previous publications and some of the Western blots here, expression levels of each of these human STING1 proteins in mice are drastically different. HAQ and AQ also have different effects on metabolism (pmid: 36261171), which could complicate interpretation of the T cell phenotypes.

      These are very important questions that require rigorous investigations that are beyond the scope of this manuscript. This manuscript, titled “The common TMEM173 HAQ, AQ alleles rescue CD4 T cellpenia, restore T-regs, and prevent SAVI (N153S) inflammatory disease in mice”, does not focus on Q293 mice. We have been investigating the common human TMEM173 alleles since 2011, from the discovery 11, mouse models 1,3, and a human clinical trial 2 to human genetics studies 3. This manuscript is another step towards understanding these common human TMEM173 alleles, with the new discovery that the HAQ and AQ alleles are resistant to STING cell death.

      (3) HAQ/WT and AQ/WT splenocytes are protected from STING agonist-induced cell death equally well (Figure 1G). HAQ/SAVI shows less rescue compared to AQ/SAVI. These are interesting observations, but mechanism is unclear and not clearly discussed. E.g., how does AQ protect disease pathology better than HAQ (that contains AQ)? Does Q293 allele also fully rescue SAVI?

      In this manuscript, Figure 6 shows AQ/SAVI had more T-regs than HAQ/SAVI (lines 251 – 261). In our previous publication on HAQ, AQ knockin mice, we showed that AQ T-regs have more IL-10 than HAQ T-regs 3. Thus, increased IL-10+ Tregs in AQ mice may contribute to an improved phenotype in AQ/SAVI compared to HAQ/SAVI. However, we are not excluding other contributions (e.g. metabolic difference) (lines 332-335). We are exploring these possibilities.  

      (4) Figure 2 feels out of place. First of all, why are the authors using human explant lung tissues? PBMCs should be a better source for lymphocytes. In untreated conditions, both CD4 and B cells show ~30% dying cells, but CD8 cells show 0% dying cells. This calls for technical concerns on the CD8 T cell property or gating strategy because in the mouse experiment (Figure 1A) all primary lymphocytes show ~30% cell death at steady-state. Second, Figure 2C, this type of partial effect needs multiple human donors to confirm. Third, the reconstitution of THP1 cells seems out of place. STING-mediated cell death mechanisms in myeloid and lymphoid cells are likely different. If the authors want to demonstrate cell death in myeloid cells using THP1, then these reconstituted cell lines need to be better validated. Expression, IFN signaling, etc. The parental THP1 cells are HAQ/HAQ, how does that compare to the reconstitutions? There are published studies showing THP1-STING-KO cells reconstituted with human variants do not respond to STING agonists as expected. The authors need to be scientifically rigorous on validation and cautious in their interpretations.

      Figure 2 is necessary because it reveals the difference between mouse and human STING cell death, which is critical to understanding STING in human health and diseases (lines 160-161). Figure 2A-2B showed that STING activation killed human CD4 T cells, but not human CD8 T cells or B cells. This observation is different from Figure 1A, where STING activation killed mouse CD4, CD8 T cells, and CD19 B cells, revealing the species-specific STING cell death responses. Regarding human CD8 T cells, as we stated in the Discussion (lines 323-325), human CD8 T cells (PBMC) are not as susceptible as the CD4 T cells to STING-induced cell death 8. We used lung lymphocytes that showed similar observations (Figure 2A). For Figure 2C, we used 2 WT/HAQ and 3 WT/WT individuals (lines 738-739). We generated HAQ, AQ THP-1 cells in STING-KO THP-1 cells (Invivogen, cat no. thpd-kostg) (lines 380-387).

      A recent study found that a new STING agonist SHR1032 induces cell death in STING-KO THP-1 cells expressing WT(R232) human STING 10 (line 182). SHR1032 suppressed THP1-STING-WT(R232) cell growth at GI50: 23 nM while in the parental THP1-STING-HAQ cells, the GI50 of SHR1032 was >103 nM 10. Cytarabine was used as an internal control where SHR1032 killed more robustly than cytarabine in the THP1-STING-WT(R232) cells but much less efficiently than cytarabine in the THP-1-STING-HAQ cells 10. 

      Our manuscript rigorously uses mouse splenocytes, human lung lymphocytes, THP-1 reconstituted with HAQ, AQ, and HAQ/SAVI, AQ/SAVI mice, to demonstrate that the common human HAQ, AQ alleles are resistant to STING cell death in vitro and in vivo.

      We agree with Reviewer 1 that STING-mediated cell death mechanisms in myeloid and lymphoid cells may be different and likely contribute to the different mechanisms proposed in STING cell death research 7,9,10. Our study focuses on the in vivo STING-mediated T cellpenia.

      (5) Figure 2G, H, I are confusing. AQ is more active in producing IFN signaling than HAQ and Q is the least active. How to explain this?

      We stated in the Introduction that “AQ responds to CDNs and produce type I IFNs in vivo and in vitro 3,12,13” (lines 92-93). We reported that the AQ knock-in mice responded to STING activation 3. We previously showed that there was negative natural selection on the AQ allele in individuals outside of Africa 3. 28% of Africans are WT/AQ but only 0.6% of East Asians are WT/AQ 3. In contrast, the HAQ allele was positively selected in non-Africans 3. Investigation to understand the mechanisms and biological significance of these naturally selected human TMEM173 alleles has been ongoing in the lab.

      (6) The overall model is unclear. If HAQ, AQ and Q are loss-of-function alleles and Q is the key residue for STING-mediated cell death, then why AQ is the most active in producing IFN signaling and AQ/SAVI rescues disease most completely? If these human variants act as dominant negatives, which would be consistent with the WT/het data, then how do you explain AQ is more dominant negative than HAQ?

      In this manuscript, Figure 6 shows AQ/SAVI had more T-regs than HAQ/SAVI (lines 251 – 261). In our previous publication on HAQ, AQ knockin mice, we showed that AQ T-regs have more IL-10 and mitochondria activity than HAQ T-regs 3. Nevertheless, we are not excluding other contributions (e.g. metabolic difference) by the AQ allele (lines 332-335). Last, we used modern human evolution to discover the dominance of these common human STING alleles. In modern humans outside Africa, HAQ was positively selected while AQ was negatively selected 3. However, AQ is likely dominant to HAQ because there are no HAQ/AQ individuals outside Africa. The genetic dominance of common human TMEM173 alleles is a new concept. More investigation is ongoing.

      (7) As a general note, SAVI disease phenotypes involve multiple cell types. Lymphocyte cell death is only one of them. The authors' characterization of SAVI pathology is limited and did not analyze immunopathology of the lung.

      Both radioresistant parenchymal and/or stromal cells and hematopoietic cells influence SAVI pathology in mice 14,15. Nevertheless, the lack of CD4 T cells, including the anti-inflammatory T-regs, likely contributes to the inflammation in SAVI mice and patients 16. We characterized lung function, lung inflammation (Figure 4), lung neutrophils, and inflammatory monocyte infiltration (Figure S5) (lines 232-235).

      (8) Line 281, the discussion on HIV T cell death mechanism is not relevant and over-stretching. This study did not evaluate viral infection in T cells at all. The original finding of HAQ/HAQ enrichment in HIV/AIDS was 2/11 in LTNP vs 0/11 in control, arguably not the strongest statistics.

      Several publications have linked STING to HIV pathogenesis 17-22 (line 271). CD4 T cellpenia is a hallmark of AIDS. The manuscript studies STING activation-induced T cellpenia in vivo. It is not a stretch to ask, for example, whether preventing STING T cell death (e.g. HAQ, AQ alleles) can restore CD4 T cell counts and improve care for AIDS patients.

      Reviewer #2 (Public Review):

      Aybar-Torres and colleagues utilize common human STING alleles to dissect the mechanism of SAVI inflammatory disease. The authors demonstrate that these common alleles alleviate SAVI pathology in mice, and perhaps more importantly use the differing functionality of these alleles to provide insight into requirements of SAVI disease induction. Their findings suggest that it is residue A230 and/or Q293 that are required for SAVI induction, while the ability to induce an interferon-dependent inflammatory response is not. This is nicely exemplified by the AQ/SAVI mice that have an intact inflammatory response to STING activation, yet minimal disease progression. As both mutants seem to be resistant to STING-dependent cell death, this manuscript also alludes to the importance of STING-dependent cell death, rather than STING-dependent inflammation, in the progression of SAVI pathology. While I have some concerns, I believe this manuscript makes some important connections between STING pathology mouse models and human genetics that would contribute to the field.

      Some points to consider:

      (1) While the CD4+ T cell counts from HAQ/SAVI and AQ/SAVI mice suggest that these T cells are protected from STING-dependent cell death, an assay that explores this more directly would strengthen the manuscript. This is also supported by Fig 2C, but I believe a strength of this manuscript is the comparison between the two alleles. Therefore, if possible, I would recommend the isolation of T cells from these mice and direct stimulation with diABZI or other STING agonist with a cell death readout.

      Please see the new Figure S3 for cell death by diABZI, DMXAA in Splenocytes from WT/WT, WT/HAQ, HAQ/SAVI, AQ/SAVI mice. The HAQ/SAVI and AQ/SAVI splenocytes showed similar partial resistance to STING activation-induced cell death (lines 214-216).

      (2) Related to the above point - further exemplifying that the Q293 locus is essential to disease, even in human cells, would also strengthen the paper. It seems that CD4 T cell loss is a major component of human SAVI. While not completely necessary, repeating the THP1 cell death experiments from Fig 2 with a human T cell line would round out the study nicely.

      We examined HAQ, AQ mouse splenocytes, HAQ human lung lymphocytes, THP-1 reconstituted with HAQ, AQ, and HAQ/SAVI, AQ/SAVI mice, to demonstrate that the common human HAQ, AQ alleles are resistant to STING cell death in vitro and in vivo. Additional human T cell line work does not add too much. We hope to conduct more human PBMC or lung lymphocytes STING cell death experiments from HAQ, AQ individuals as we continue the human STING alleles investigation.

      (3) While I found the myeloid cell counts and BMDM data interesting, I think some more context is needed to fully loop this data into the story. Is myeloid cell expansion exemplified by SAVI patients? Do we know if myeloid cells are the major contributors to the inflammation these patients experience? Why should the SAVI community care about the Q293 locus in myeloid cells?

      This is likely a misunderstanding. We used BMDM for the purpose of comparing STING signaling (TBK1, IRF3, NFkB, STING activation) in WT/SAVI, HAQ/SAVI, and AQ/SAVI mice. Ideally, we would like to compare STING signaling in CD4 T cells from WT/SAVI to HAQ/SAVI, AQ/SAVI mice. However, WT/SAVI has no CD4 T cells. In doing so, we are making the assumption that the basic STING signaling (TBK1, IRF3, NFkB, STING activation) is conserved between T cells and macrophages.

      (4) The functional assays in Figure 4 are exciting and really connect the alleles to disease progression. To strengthen the manuscript and connect all the data, I would recommend additional readouts from these mice that address the inflammatory phenotype shown in vitro in Figure 5. For example, measuring cytokines from these mice via ELISA or perhaps even Western blots looking for NFkB or STING activation would be supportive of the story. This would also allow for some tissue specificity. I believe looking for evidence of inflammation and STING activation in the lungs of these mice, for example, would further connect the data to human SAVI pathology.

      Reviewer 2 suggests looking for evidence of inflammation and STING activation in the lungs of HAQ/SAVI, AQ/SAVI. We would like to elaborate further. First, anti-inflammatory treatments, e.g. steroids, DMARDs, IVIG, Etanercept (TNF), rituximab, Nifedipine, amlodipine, et al., all failed in SAVI patients 23. JAK inhibitors on SAVI had mixed outcomes (lines 55-58). Second, Figure S5 examined lung neutrophils and inflammatory monocyte infiltration. Interestingly, while AQ/SAVI mice had a better lung function than HAQ/SAVI mice (Figure 4D, 4E vs 4H, 4I), HAQ/SAVI and AQ/SAVI lungs had comparable neutrophils and inflammatory monocyte infiltration (Figure S5). Last, SAVI is classified as type I interferonopathy 23, but the lung diseases of SAVI are mainly independent of type I IFNs 24-27. The AQ allele suppresses SAVI in vivo.  Understanding the mechanisms by which AQ rescues SAVI may lead to curative care for SAVI patients.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      One suggestion is to streamline this study by focusing on STING-mediated cell death only in CD4 T cells. The authors can use in vitro PBMC isolated human T cells, ex vivo T cells from the knock-in mice, and in vivo T cells from the SAVI breeding. The current manuscript includes myeloid cell death, Tregs, complex SAVI disease pathology, which is too confusing and too complex to explain with the varying effect from the three human STING1 variants.

      We sincerely appreciate Reviewer 1’s suggestion. The goal of our human STING alleles research has always been translational, i.e. improving human health. Even as a monogenetic disease, SAVI pathology is still complex. For example, although thought of as a type I interferonopathy, SAVI is largely independent of type I IFNs. Similarly, STING activation-induced cell death, while contributing to SAVI, is not the whole story, as the Reviewer pointed out in Comments 3, 6, and 7. HAQ/SAVI mice still died early and had lung dysfunction (Figure 4). In contrast, AQ/SAVI mice had restored lifespan and lung function. Figure 6 shows different T-regs between AQ/SAVI and HAQ/SAVI mice. In addition, AQ mice had more IL-10+ T-regs than HAQ mice 3. Therefore, we are excited about developing AQ-based curative therapy for SAVI patients (preventing cell death and inducing immune tolerance). Again, we thank the Reviewer for the suggestion. Additional research is ongoing.

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      (1) Generation of THP1 cells with the human STING alleles is missing from methods.

      We added the protocol in the methods (lines 380-387). A THP-1 KO line stably expressing WT STING was first described by Weikang Tao’s group 10.

      (2) Some abbreviations are not expanded (CDA).

      CDA is expanded as cyclic di-AMP (e.g. line 375).

      References.

      (1) Patel, S. et al. The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele. J Immunol 198, 776-787 (2017).

      (2) Sebastian, M. et al. Obesity and STING1 genotype associate with 23-valent pneumococcal vaccination efficacy. JCI Insight 5 (2020).

      (3) Mansouri, S. et al. MPYS Modulates Fatty Acid Metabolism and Immune Tolerance at Homeostasis Independent of Type I IFNs. J Immunol 209, 2114-2132 (2022).

      (4) Sivick, K. E. et al. Comment on "The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele". J Immunol 198, 4183-4185 (2017).

      (5) Gulen, M. F. et al. Signalling strength determines proapoptotic functions of STING. Nat Commun 8, 427 (2017).

      (6) Kabelitz, D. et al. Signal strength of STING activation determines cytokine plasticity and cell death in human monocytes. Sci Rep 12, 17827 (2022).

      (7) Murthy, A. M. V., Robinson, N. & Kumar, S. Crosstalk between cGAS-STING signaling and cell death. Cell Death Differ 27, 2989-3003 (2020).

      (8) Kuhl, N. et al. STING agonism turns human T cells into interferon-producing cells but impedes their functionality. EMBO Rep 24, e55536 (2023).

      (9) Li, C., Liu, J., Hou, W., Kang, R. & Tang, D. STING1 Promotes Ferroptosis Through MFN1/2-Dependent Mitochondrial Fusion. Front Cell Dev Biol 9, 698679 (2021).

      (10) Song, C. et al. SHR1032, a novel STING agonist, stimulates anti-tumor immunity and directly induces AML apoptosis. Sci Rep 12, 8579 (2022).

      (11) Jin, L. et al. Identification and characterization of a loss-of-function human MPYS variant. Genes Immun 12, 263-269 (2011).

      (12) Yi, G. et al. Single nucleotide polymorphisms of human STING can affect innate immune response to cyclic dinucleotides. PLoS One 8, e77846 (2013).

      (13) Patel, S. et al. Response to Comment on "The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele". J Immunol 198, 4185-4188 (2017).

      (14) Gao, K. M. et al. Endothelial cell expression of a STING gain-of-function mutation initiates pulmonary lymphocytic infiltration. Cell Rep 43, 114114 (2024).

      (15) Gao, K. M., Motwani, M., Tedder, T., Marshak-Rothstein, A. & Fitzgerald, K. A. Radioresistant cells initiate lymphocyte-dependent lung inflammation and IFNgamma-dependent mortality in STING gain-of-function mice. Proc Natl Acad Sci U S A 119, e2202327119 (2022).

      (16) Hu, W. et al. Regulatory T cells function in established systemic inflammation and reverse fatal autoimmunity. Nat Immunol 22, 1163-1174 (2021).

      (17) Monroe, K. M. et al. IFI16 DNA sensor is required for death of lymphoid CD4 T cells abortively infected with HIV. Science 343, 428-432 (2014).

      (18) Doitsh, G. et al. Cell death by pyroptosis drives CD4 T-cell depletion in HIV-1 infection. Nature 505, 509-514 (2014).

      (19) Jakobsen, M. R., Olagnier, D. & Hiscott, J. Innate immune sensing of HIV-1 infection. Curr Opin HIV AIDS 10, 96-102 (2015).

      (20) Silvin, A. & Manel, N. Innate immune sensing of HIV infection. Curr Opin Immunol 32, 54-60 (2015).

      (21) Altfeld, M. & Gale, M., Jr. Innate immunity against HIV-1 infection. Nat Immunol 16, 554-562 (2015).

      (22) Krapp, C., Jonsson, K. & Jakobsen, M. R. STING dependent sensing - Does HIV actually care? Cytokine Growth Factor Rev 40, 68-76 (2018).

      (23) Liu, Y. et al. Activated STING in a vascular and pulmonary syndrome. N Engl J Med 371, 507-518 (2014).

      (24) Luksch, H. et al. STING-associated lung disease in mice relies on T cells but not type I interferon. J Allergy Clin Immunol 144, 254-266 e258 (2019).

      (25) Stinson, W. A. et al. The IFN-gamma receptor promotes immune dysregulation and disease in STING gain-of-function mice. JCI Insight 7 (2022).

      (26) Warner, J. D. et al. STING-associated vasculopathy develops independently of IRF3 in mice. J Exp Med 214, 3279-3292 (2017).

      (27) Fremond, M. L. et al. Overview of STING-Associated Vasculopathy with Onset in Infancy (SAVI) Among 21 Patients. J Allergy Clin Immunol Pract 9, 803-818 e811 (2021).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, the authors provide a comprehensive description of transcriptional regulation in Pseudomonas syringae by investigating the binding characteristics of various transcription factors. They uncover the hierarchical network structure of the transcriptome by identifying top-, middle-, and bottom-level transcription factors that govern the flow of information in the network. Additionally, they assess the functional variability and conservation of transcription factors across different strains of P. syringae by studying DNA-binding characteristics. These findings notably expand our current knowledge of the P. syringae transcriptome.

      The findings associated with crosstalk between transcription factors and pathways, and the diversity of transcription factor functions across strains provide valuable insights into the transcriptional regulatory network of P. syringae. However, these results are at times underwhelming as their significance is unclear. This study would benefit from a discussion of the implications of transcription factor crosstalk on the functioning of the organism as a whole. Additionally, the implications of variability in transcription factor functions on the phenotype of the strains studied would further this analysis.

      Overall, this manuscript serves as a key resource for researchers studying the transcriptional regulatory network of P. syringae.

      Thank you for your positive comments.

      Reviewer #2 (Public Review):

      Summary:

      The phytopathogenic bacterium Pseudomonas syringae is comprised of many pathovars with different host plant species and has been used as a model organism to study bacterial pathogenesis in plants. Transcriptional regulation is key to plant infection and adaptation to host environments by this bacterium. However, researchers have focused on a limited number of transcription factors (TFs) that regulate virulence-related pathways. Thus, a comprehensive, systems-level understanding of regulatory interactions between transcription factors in P. syringae has not been achieved.

      This study by Sun et al performed ChIP-seq analysis of 170 out of 301 TFs in P. syringae pv. syringae 1448A and used this unique dataset to infer transcriptional regulatory networks in this bacterium. The network analyses revealed hierarchical interactions between TFs, various network motifs, and co-regulation of target genes by TF pairs, which collectively mediate information flow. As discussed, the structure and properties of the P. syringae transcriptional regulatory networks are somewhat different from those identified in humans, yeast, and E. coli, highlighting the significance of this study. Further, the authors made use of the P. syringae transcriptional regulatory networks to find TFs of unknown functions to be involved in virulence-related pathways. For some of these TFs, their target specificity and biological functions, such as motility and biofilm formation, were experimentally validated. Of particular interest is the finding that despite conservation of TFs between P. syringae pv. syringae 1448A, P. syringae pv. tomato DC3000, P. syringae pv. syringae B728a, and P. syringae pv. actinidiae C48, some of the conserved TFs show different repertoires of target genes in these four P. syringae strains.

      Thank you for your positive comments.

      Strengths:

      This study presents a systems-level analysis of transcriptional regulatory networks in relation to P. syringae virulence and metabolism, and highlights differences in transcriptional regulatory landscapes of conserved TFs between different P. syringae strains, and develops a user-friendly database for mining the ChIP-seq data generated in this study. These findings and resources will be valuable to researchers in the fields of systems biology, bacteriology, and plant-microbe interactions.

      Thank you for your positive comments.

      Weaknesses:

      No major weaknesses were found, but some of the results may need to be interpreted with caution. ChIP-seq was performed with bacterial strains overexpressing TFs. This may cause artificial binding of TFs to promoters which may not occur when TFs are expressed at physiological levels. Another caution is applied to the interpretation of the biological functions of TFs. The biological roles of the tested TFs are based on in vitro experiments. Thus, functional relevance of the tested TFs during plant infection and/or survival under natural environmental conditions remains to be demonstrated.

      Thank you for your comments, and we agree with the reviewer. To address possible artificial binding of TFs, we performed EMSAs to verify the analyzed targets. Our EMSA results confirmed the analyzed binding peaks.

      To verify the biological functions of TFs, we also performed in vivo motility assays and biofilm production assays (Figures 3b-d). To further probe the biological functions of TFs, we performed a plant infection assay for TF PSPPH2193 under natural environmental conditions (bean leaves). As shown in Figures S6c and g, both the motility and the virulence of the P. syringae ∆PSPPH2193 strain were significantly reduced compared with the WT strain. These results showed that TF PSPPH2193 positively regulated the pathogenicity of P. syringae by modulating bacterial motility.

      Reviewer #3 (Public Review):

      Summary:

      This study aims to understand gene regulation of the plant bacterial pathogen Pseudomonas syringae. Although the function of some TFs has been characterized in this strain, a global picture of the gene regulatory network remains elusive. The authors conducted a large-scale ChIP-seq analysis, covering 170 out of 301 TFs of this strain, and revealed gene regulatory hierarchy with functional validation of some previously uncharacterized TFs.

      Thank you for your positive comments.

      Strengths:

      - This study provides one of the largest ChIP-seq datasets for a single bacterial strain, covering more than half of its TFs. This impressive resource enabled comprehensive systems-level analysis of the TF hierarchy.

      - This study identified novel gene regulation and function with validations through biochemical and genetic experiments.

      - The authors attempted broad analyses, including comparisons between different bacterial strains, providing further insights into the diversity and conservation of gene regulatory mechanisms.

      Thank you for your positive comments.

      Weaknesses:

      (1) Some conclusions are not backed by quantitative or statistical analyses, and they are sometimes overinterpreted.

      Thank you for your comments. We used a hypergeometric test in this analysis. Although only one gene was enriched in some pathways, the adjusted p-value was less than 0.05. We added the details to the revised manuscript.

      (2) Some figures and analyses are not well explained, and I was not able to understand them.

      Thank you for your comments, and we are sorry for the confusion. We defined ‘indirect interaction’ as ‘co-association’ and ‘cooperativity’ as ‘if the common target of two TFs is from a TF’. We added the definition of "indirect interaction" and "cooperativity" in the revised legend.

      For Figure S3a, the low co-association scores and large peak numbers of these top-level TFs indicated that top-level TFs preferred to solely regulate target genes, but not to co-regulate with other top-level TFs. PSPPH4700 was an example to show that top-level TFs with low co-association scores and large peak numbers tend to solely regulate target genes, but not to co-regulate with other top-level TFs. We revised the sentence to ‘For example, the top-level TF PSPPH4700 yielded over 1,700 peaks but cooperated with only 24 top-level TFs with low co-association scores about 0.05 (Supplementary Table 2b).’.

      We analyzed high co-association scores of 125 TFs in three levels and further determined the co-association patterns. To identify the tendency of co-association of all these 125 TFs, the co-association patterns were classified into 4 clusters. Bottom-level TFs tend to co-regulate target genes with other TFs. We revised the sentence in the revised manuscript.

      For Figure 2b, in C1, C2 and C4, many bottom-level TFs showed a co-association pattern with other TFs, especially bottom-level TFs (shown in C4). To explore the regulatory pattern in C3, the peak locations in target genes of MexT were analyzed with those of TFs in C3. Seven top-level TFs (PSPPH1435, PSPPH1758, PSPPH2193, PSPPH2454, PSPPH4638, PSPPH4998 and PSPPH3411), three middle-level TFs (PSPPH1100, PSPPH5132 and PSPPH5144) and four bottom-level TFs (PSPPH0700, PSPPH2300, PSPPH2444 and PSPPH2580) were compared with MexT. MexT showed higher co-association scores (more than 60) with more top-level TFs. Therefore, we demonstrated that MexT had closer co-association relationships with top-level TFs. We added the statement in the revised manuscript.

      For Figure 1a, the hierarchical network showed different numbers of TFs at the three levels (54 top-level TFs, 62 middle-level TFs and 147 bottom-level TFs), which indicated that more than half of the TFs (bottom-level TFs) tend to be regulated by other TFs and then bind directly to target genes. This finding showed a downward regulatory direction of transcription regulation in P. syringae. We revised the statement in the revised manuscript.

      (3) The Method section lacks depth, especially in data analyses. It is strongly recommended that the authors share their analysis codes so that others can reproduce the analyses.

      Thank you for your comments. We defined the intergenic region before each TF sequence as the promoter region. As the pHM1 plasmid carries its own constitutive promoter (lacZ promoter), we amplified the TF-coding sequence and cloned it into the site following the promoter. TF protein expression was driven by the plasmid promoter. Psph 1448A was used for our main ChIP-seq. We added the details in the revised manuscript.

      For Figure S3, we performed GO analysis on genes that were co-bound by TF pairs. We added the details in the revised manuscript.

      We shared our analysis codes on the website (https://github.com/dengxinb2315/PS-PATRnet-code) in the Data Availability.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      (1) The specific strain of Pseudomonas syringae used in the study outside of the evolutionary analysis should be specified in the abstract and main text.

      Thank you for your suggestion. We revised the statements in the abstract and main text to specify the strains.

      (2) The language used throughout the manuscript should be revised for clarity, conciseness, and readability.

      Thank you for your suggestion. The language throughout the manuscript has been revised by a scientific editor who is a native speaker of English.

      (2) Line 688: Replace "80C" with "-80C".

      Thank you for your correction. We revised ‘80℃’ to ‘-80℃’. Please see Line 713.

      (3) Line 172 - 173: The abbreviations TT, MM, BB, TM, TB, and MB need to be expanded in the main text before their use.

      Thank you for your suggestion. We added the abbreviations TT, MM, BB, TM, TB, and MB in the manuscript. Please see Lines 172-174.

      Reviewer #2 (Recommendations For The Authors):

      Major points

      (1) The name of the P. syringae strains used in each experiment/analysis should be explicitly stated (most experiments were carried out with P. syringae strain 1448A). This should also be applied to the introduction where many papers on P. syringae are cited without clear indication of strain names. I think this amendment is essential because target genes and thus biological functions of TFs could be different between P. syringae strains, as shown in the present study.

      Thank you for your suggestion. We revised the P. syringae strains in the citations throughout the manuscript.

      (2) How many TFs were analyzed throughout the study? Most sentences including line 22 in the abstract say 170, but I also found some say 270 (for example, line 106 and line 149). The legend of Figure 1 says 262. More detailed information is required regarding the datasets used for each analysis.

      Thank you for your suggestion. The number of TFs analyzed by ChIP-seq in this research is 170, and the number of TFs analyzed by HT-SELEX in our previous research is 100. The hierarchical analysis integrated data from ChIP-seq and HT-SELEX, which together included 270 TFs. As 8 TFs did not show hierarchical characteristics, the legend of Figure 1 said 262 TFs. We added the data source in the revised manuscript. Please see Lines 104, 147, 160 and 1082.

      (3) Figure 1b: Please define "indirect interaction" and "cooperativity" in the legend as well as in the text. I only found the definition of "direct interaction".

      Sorry for the missing information. We defined ‘indirect interaction’ and ‘cooperativity’ as ‘co-association’ and ‘if the common target of two TFs is from a TF’, respectively. We added the definition of "indirect interaction" and "cooperativity" in the revised legend. Please see Lines 174-176, 1084-1086.

      (4) I found it very interesting that conserved TFs show different repertoires of target genes in different P. syringae strains. This suggests the rewiring of transcriptional regulatory networks in P. syringae strains, but the underlying mechanism is not explored in the current manuscript. It can be easily tested whether these conserved TFs bind to similar or different motifs by motif enrichment analysis. If they bind to similar motifs, it is possible that the promoter sequences of their target genes have diversified. Addressing or at least discussing these points would provide molecular insights into the diversification of the transcriptional regulatory networks in P. syringae. Similarly, functional enrichment analysis of target genes can be used to test whether the conserved TFs regulate different biological processes.

      Thank you for your suggestion. We added the motif analysis and functional enrichment analysis of target genes of TFs (PSPPH3122 and PSPPH4127) in different P. syringae strains. We found two different motifs (AGACN4GATCAA and CGGACGN3GATCA) in 1448A and DC3000 strains, respectively. We also performed the GO analysis and found the specific functions of PSPPH3122 in Psph 1448A compared with Pst DC3000 and Pss B728a strains, including recombinase activity and DNA recombination. For PSPPH4127, we found four different motifs in four P. syringae strains. GO analysis showed its relationship with recombinase activity in Psph 1448A strain, and RNA binding, structural constituent of ribosome, translation and ribosome in Pss B728a strain. These results indicated the high functional diversity of TFs in P. syringae. We added these points in the Results part, and Figures S9-S10 in the revised manuscript. Please see Lines 497-509.
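
      As an illustration of the motif notation used above, the following minimal sketch converts a motif string into a regular expression and scans a sequence for matches. It assumes that "N" followed by a number k stands for k arbitrary nucleotides, and it searches the forward strand only. The function names and the test sequence are hypothetical and are not taken from the authors' analysis code.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Convert a motif such as 'AGACN4GATCAA' into a regular expression.

    Assumption: 'N' followed by a number k means k arbitrary nucleotides,
    and a bare 'N' means one arbitrary nucleotide.
    """
    def expand(match):
        count = int(match.group(1)) if match.group(1) else 1
        return "[ACGT]{%d}" % count

    return re.sub(r"N(\d*)", expand, motif)


def find_motif(sequence: str, motif: str) -> list:
    """Return 0-based start positions of matches on the forward strand.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence: two flanking bases, the motif with a 4-nt spacer,
# and two more flanking bases.
example = "TT" + "AGAC" + "CCGG" + "GATCAA" + "TT"
print(motif_to_regex("AGACN4GATCAA"))
print(find_motif(example, "AGACN4GATCAA"))
```

      A fuller comparison between strains would also scan the reverse complement and would typically use position weight matrices rather than exact consensus matching.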

      (5) Related to point 4, it would be quite useful if a list of orthologous genes of 1448A TFs in the other tested P. syringae strains were provided. Such information may also enhance the utility of the database developed in this study.

      Thank you for your suggestion. We added the list of orthologous genes of 301 Psph 1448A TFs in the other tested P. syringae strains in Supplementary Table 5. Please see Line 467 and Supplementary Table 5.

      (6) Lines 243-246: It is unclear how these functional enrichment analyses were performed. Did you use target genes regulated by individual TFs or those coregulated by pairs of TFs? Please add more information for the sake of readers.

      Thank you for your suggestion. We performed the functional enrichment analyses by hypergeometric test (BH-adjusted p < 0.05) via using target genes regulated by individual TFs. We added the details in the Results part. Please see Lines 248-252, 270, 1194-1195, 1199-1200 and 1205-1206.
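
      The test named above can be sketched as follows: a one-sided hypergeometric p-value for each functional category, followed by Benjamini-Hochberg (BH) adjustment. This is a standard-library sketch under stated assumptions, not the authors' code. The background size, category size and target counts in the example are hypothetical, chosen only to show that a single annotated gene can yield a small raw p-value when the category itself is very small.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided p-value P(X >= k).

    k: target genes annotated with the category
    n: all target genes of the TF
    K: genes annotated with the category in the background
    N: all genes in the background
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical numbers: 1 of 20 targets falls in a category that contains
# only 2 of 5000 background genes.
p_single = hypergeom_enrichment_p(k=1, n=20, K=2, N=5000)
print(p_single)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      Whether such a category remains significant after adjustment depends on how many categories are tested together, so the adjusted values should be reported alongside the gene counts.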

      Minor points

      (1) Lines 167-168: I may not understand correctly, but you might want to say "downward-pointing edges" instead of "upward-pointing edges".

      Thank you for the correction. We revised ‘upward-pointing edges’ to ‘downward-pointing edges’. Please see Line 166.

      (2) Line 174: "physical interactions" should be amended to "direct interactions".

      Thank you for the correction. We revised ‘physical interactions’ to ‘direct interactions’. Please see Line 177.

      (3) Line 224: Could you please explain why bacterial growth in plant tissues is considered an example of "multi-stability"?

      Thank you for your suggestion. We are sorry for the incorrect statement. We showed ‘plant intercellular spaces’ as ‘multi-stability’. We revised the sentence to ‘These auto-regulators are important and always act as repressors in scenarios of multi-stability, such as plant intercellular spaces’. Please see Lines 224-226.

      (4) Line 254-257: Here, the definition of "tether binding" is introduced, but it is not very clear to me. In my understanding, tethered binding is an indirect binding of a TF to a target gene through protein-protein interaction with other TF that directly binds to the promoter of the target gene.

      Thank you for your suggestion, and we agree with you. We referred to the paper published in 2012 (Wang et al., 2012) and revised the statement of ‘tether binding’ to ‘This finding suggested that these TFs indirectly regulated target genes through protein-protein interaction with other TFs that directly binds to the promoters of target genes, a phenomenon defined as tethered binding’. Please see Lines 259-262.

      (5) Lines 341-343: Figure 3b shows qRT-PCR of hopAE1, not hrpR.

      Thank you for your correction. We revised ‘hrpR’ to ‘hopAE1’. Please see Line 349.

      (6) Lines 500 and Figure 6b: It is hard to see edges from module 12 to others. So, it would be better to provide numeric information (number of TFs and target genes) in the text.

      Thank you for your suggestion. Module 12 includes 22 TFs and 318 target genes. We added the statement of numeric information about Module 12 in the revised manuscript. Please see Lines 536-537.

      (7) Line 519: Figure S4b is not the EMSA data for PSPPH3798. Should it be Figure S4e?

      Thank you for your correction. We revised to ‘Figure S4e’. Please see Line 545.

      (8) Line 522: Figure S6b is not relevant to the statement here.

      Thank you for your correction. We deleted the ‘Figure S6b’ here. Please see Line 547.

      (9) Line 593: prokaryotic transcriptional regulatory networks -> eukaryotic transcriptional regulatory networks?

      Thank you for your correction. We revised ‘prokaryotic transcriptional regulatory networks’ to ‘eukaryotic transcriptional regulatory networks’. Please see Line 618.

      (10) Figure S3 requires images of higher resolution. Especially, values for the color codes are not readable or very hard to see.

      Thank you for your suggestion. To make the images clearer, we enlarged the images, changed the color codes, and divided them into three figures. Please see the revised Figures S3-S5 and corresponding Figure legends at Lines 1191-1206.

      Reviewer #3 (Recommendations For The Authors):

      (1) Some conclusions are not backed by quantitative or statistical analyses, and they are sometimes overinterpreted.

      L221: "Taken together, the simplest and most effective submodule M1 and the coregulatory submodule M13 played crucial roles in the transcriptional regulation of TFs in P. syringae."

      The authors did not provide any evidence supporting the functional importance of any of these submodules. M13 is most enriched within the locked loop, but its size is much smaller than simple loops. What evidence supports the importance of this particular submodule?

      Thank you for your suggestion. In the eukaryote (Saccharomyces cerevisiae) and the prokaryote (Escherichia coli) that have the best-characterized transcriptional regulation networks, the feed-forward loop (called M13 in this article) appears numerous times in the networks and performs different biological functions. M1 appeared more frequently than the other modules by an order of magnitude. We revised the sentence to ‘Taken together, the most numerous but simplest submodule M1 played a crucial role in the transcriptional regulation of TFs in P. syringae.’ Please see Lines 222-224.

      L223: "...we found 92 auto-regulators...These auto-regulators are important and always act as repressors in scenarios of multi-stability, such as in plant intercellular spaces where bacteria grow (Figure 1d) (Alon, 2007). These regulators are regarded as bistable switches that further influence the expression of downstream genes."

      Are these claims supported by any evidence?

      Thank you for your suggestion. We referred to the following articles:

      (1) Alon. Nature Reviews Genetics. 2007(Alon, 2007).

      Repression by transcription factors of the transcription of their target genes is considered negative regulation. These negative autoregulators account for half of the repressors in E. coli and occur in many eukaryotes. The repressors control the concentration of the target product by suppressing its expression, which accelerates the return of cells to the steady state.

      (2) Becskei. et al. Nature. 2000; Rosenfeld et al. Journal of Molecular Biology. 2002 (Becskei & Serrano, 2000; Rosenfeld, Elowitz, & Alon, 2002).

      Fluorescence assays confirmed that the negative autoregulatory module (negative autoregulator TetR) took less time to reach the log phase than the unregulated group, which reduced cell-to-cell fluctuations in the steady-state level of the transcription factor. Some negative autoregulators were shown here, such as LexA, CysB and SrlA-D.

      In our research, we also identified many autoregulators including CysB and LexA2 (annotated as LexA repressor). We revised the sentence to ‘In addition, we found 92 auto-regulators in our hierarchy network. These auto-regulators are important and always act as repressors in scenarios of multi-stability, such as plant intercellular spaces (Figure 1d) (Alon, 2007). For example, LexA and CysB as negative autoregulators were indicated to reduce cell-to-cell fluctuations in the steady-state level of the transcription factor (Becskei & Serrano, 2000; Rosenfeld et al. 2002).’. Please see Lines 224-229.
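
      The response-time argument in the cited references can be illustrated with a toy simulation. The sketch below compares a constitutively produced protein with a negatively autoregulated one and reports the time each takes to reach half of its own steady-state level. The Hill-type repression term and every parameter value are textbook-style assumptions chosen for illustration. They are not measurements or models from this study.

```python
import math


def time_to_half_steady_state(production, alpha=1.0, dt=1e-3, t_max=20.0):
    """Euler-integrate dx/dt = production(x) - alpha * x starting from x = 0.

    Returns the first time at which x reaches half of its final value.
    """
    steps = int(t_max / dt)
    x = 0.0
    trajectory = []
    for _ in range(steps):
        x += dt * (production(x) - alpha * x)
        trajectory.append(x)
    half = trajectory[-1] / 2.0
    for step, value in enumerate(trajectory):
        if value >= half:
            return (step + 1) * dt
    return t_max


# Simple regulation: constant production rate (assumed value).
t_simple = time_to_half_steady_state(lambda x: 1.0)

# Negative autoregulation: a strong promoter repressed by its own product
# (assumed Hill coefficient 2, repression threshold 1, maximal rate 100).
t_nar = time_to_half_steady_state(lambda x: 100.0 / (1.0 + x ** 2))

print("simple regulation:", t_simple, "analytical ln(2)/alpha:", math.log(2))
print("negative autoregulation:", t_nar)
```

      With these assumed parameters the autoregulated circuit reaches half of its steady state well before the unregulated one, which is the qualitative behavior described by Rosenfeld et al. (2002).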

      L265: "This finding indicated that the bottom-level TFs, which were more easily regulated, tended to cooperate with downstream genes and other intra-level TFs."<br /> Could the authors provide more explanation to reach this conclusion from the data? Analyzing the number of highly co-accessing TFs does not sufficiently support this conclusion. The clustering of TFs (C1-C4) is incomplete, and each TF level (Top/Middle/Bottom) contains different numbers of TFs. Since the authors calculated all-by-all co-association scores for these 125 TFs, they can group these scores into 6 possible combinations (TT, TM, TB, MM, MB, BB) and show the distribution of co-association scores.

      Thank you for your suggestion. We indicated that the bottom-level TFs preferred to regulate the target genes through the cooperation with other TFs. To further support the claim, we analyzed the proportion of the bottom TF interaction in all the TF pairs interactions and direct interaction based on results in Figure 1B. The interactions of bottom TFs were 43% and 49%, respectively. However, the interactions of top TFs and middle TFs were only 20% and 28%, respectively. We revised the statement ‘Based on the analysis in Figure 1B, we found that the proportions of bottom-level TF interaction in all the TF pair interactions and direct interaction were 43% and 49%. These results indicated that the bottom-level TFs tended to regulate downstream genes through cooperating with other level TFs.’ in the revised manuscript. Please see Lines 269-272.

      As not every TF showed co-association with other TFs, we only collected 125 TFs with co-association scores. For the numbers of TFs in each level, we divided the TFs into three levels according to hierarchy height. Hierarchy height from -1 to -0.3 represented the bottom level; hierarchy height from -0.3 to 0.3 represented the middle level; hierarchy height from 0.3 to 1 represented the top level. Each level was equally divided by height scores. We suggested that the different numbers of TFs in the three levels indicated the characteristics of transcriptional regulation in P. syringae.
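
      The height cut-offs stated above can be written as a small helper. This is only a sketch: the function name is hypothetical, and assigning heights of exactly -0.3 or 0.3 to the middle level is an assumption, since the response does not say which level the boundary values belong to.

```python
def hierarchy_level(height: float) -> str:
    """Map a hierarchy height in [-1, 1] to 'bottom', 'middle' or 'top'.

    Assumption: the boundary values -0.3 and 0.3 are counted as 'middle'.
    """
    if not -1.0 <= height <= 1.0:
        raise ValueError("hierarchy height must lie between -1 and 1")
    if height < -0.3:
        return "bottom"
    if height <= 0.3:
        return "middle"
    return "top"


# Hypothetical heights, for illustration only.
for h in (-0.9, -0.3, 0.0, 0.3, 0.75):
    print(h, hierarchy_level(h))
```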

      Thank you for your suggestion. As the co-association patterns were determined by co-association scores of the same TFs, we first grouped the co-association scores into 3 possible TF pairs (TT, MM, and BB, in Figures S3a, S4a and S5a). Our results indicated that higher co-association scores preferred to occur in bottom-level TFs. We revised the statement in the revised manuscript. Please see Lines 244-252.
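
      The grouping requested by the reviewer (TT, TM, TB, MM, MB, BB) could be sketched as below. The TF names and scores are placeholders rather than data from the study, and the input format (a level per TF plus a score per TF pair) is an assumption about how such a table might be organized.

```python
from collections import defaultdict

LEVEL_CODE = {"top": "T", "middle": "M", "bottom": "B"}
CODE_ORDER = "TMB"  # fixes label order so that B-T and T-B both become 'TB'


def pair_label(level_a: str, level_b: str) -> str:
    """Return one of TT, TM, TB, MM, MB, BB for a pair of TF levels."""
    codes = sorted((LEVEL_CODE[level_a], LEVEL_CODE[level_b]),
                   key=CODE_ORDER.index)
    return "".join(codes)


def group_scores(levels: dict, scores: dict) -> dict:
    """Group pairwise co-association scores by level combination.

    levels: {tf_name: 'top' | 'middle' | 'bottom'}
    scores: {(tf_a, tf_b): co_association_score}
    """
    grouped = defaultdict(list)
    for (tf_a, tf_b), score in scores.items():
        grouped[pair_label(levels[tf_a], levels[tf_b])].append(score)
    return dict(grouped)


# Placeholder TFs and scores, for illustration only.
levels = {"tfA": "top", "tfB": "bottom", "tfC": "middle", "tfD": "bottom"}
scores = {("tfA", "tfB"): 0.1, ("tfB", "tfD"): 0.8,
          ("tfC", "tfB"): 0.4, ("tfD", "tfA"): 0.2}
print(group_scores(levels, scores))
```

      The resulting lists could then be summarized or plotted as one distribution per level combination.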

      (2) Some figures and analyses are not well explained, and I was not able to understand them.

      Figure 1b: The terms "direct," "indirect," and "cooperativity" require further clarification as their definitions in the text (L169-183) are unclear to me. This ambiguity hampers the evaluation of the authors' discussion regarding TF-TF interactions (L561-584), an important theme of this study. The figure includes concepts discussed in later sections (e.g., cooperativity), making it difficult to understand. A diagram explaining these concepts would be highly helpful for readers to understand.

      Sorry for the missing information. We defined ‘indirect interaction’ as ‘co-association’, ‘cooperativity’ as ‘if the common target of two TFs is from a TF’. We added the definition of "indirect interaction" and "cooperativity" in the revised manuscript and legend. Please see Lines 174-176 and 1085-1087.

      L253: "Notably, we found that TFs at the top level, without cooperating TFs, exhibited a large number of binding peaks (Figure S3a)."

      I could not understand this sentence. Did the authors mean that top-level TFs with a large number of peaks showed a low level of co-association? If so, does this data suggest that these TFs do not tend to cooperate with other TFs? I was confused by the discussion in L253-L261.

      Thank you for your comment, and we agree with you. The low co-association scores and large peak numbers of these top-level TFs indicated that top-level TFs preferred to solely regulate target genes, but not to co-regulate with other top-level TFs.

      Thank you for your comment. From L253-256, PSPPH4700 was an example to show that top-level TFs with low co-association scores and large peak numbers tend to solely regulate target genes, but not to co-regulate with other top-level TFs. We revised the sentence to ‘For example, the top-level TF PSPPH4700 yielded over 1,700 peaks, but cooperated with only 24 top-level TFs with low co-association scores about 0.05 (Supplementary Table 2b).’.

      From L257-261, we analyzed high co-association scores of 125 TFs in three levels and further determined the co-association patterns. To identify the tendency of co-association of all these 125 TFs, the co-association patterns were classified into 4 clusters. Bottom-level TFs tend to co-regulate target genes with other TFs. We revised the sentence. Please see Lines 262-264, 265-266 and 269-272.

      L287: "The analysis of the peak locations of MexT demonstrated that MexT showed closer co-association relationships with top-level TFs (Figure 2b)."

      I could not reach this conclusion by seeing Figure 2b. Additional explanation and/or data visualization would be appreciated.

      Thank you for your suggestion. In C1, C2 and C4, many bottom-level TFs showed a co-association pattern with other TFs, especially bottom-level TFs (shown in C4). To explore the regulatory pattern in C3, the peak locations in target genes of MexT were analyzed with those of TFs in C3. Seven top-level TFs (PSPPH1435, PSPPH1758, PSPPH2193, PSPPH2454, PSPPH4638, PSPPH4998 and PSPPH3411), three middle-level TFs (PSPPH1100, PSPPH5132 and PSPPH5144) and four bottom-level TFs (PSPPH0700, PSPPH2300, PSPPH2444 and PSPPH2580) were compared with MexT. MexT showed higher co-association scores (more than 60) with more top-level TFs. Therefore, we demonstrated that MexT had closer co-association relationships with top-level TFs. We added the statement in the revised manuscript. Please see Lines 291-296.

      Figure 6cd: What kind of enrichment analysis did the authors perform? Was any statistical test used? The figure only shows the number of genes, and sometimes the number is only 1 for a functional category. Can it be considered as significant enrichment?

      Thank you for your comment. We used a hypergeometric test in this analysis. Although only one gene was enriched in some pathways, the adjusted p-value was less than 0.05. We added the details to the revised manuscript. Please see Lines 533-534.

      L169: "The hierarchical network revealed a downward information flow, suggesting the prioritization of collaboration between different hierarchy levels."

      Can the authors please explain the logic behind this statement in more detail?

      Thank you for your comment. The hierarchical network showed different numbers of TFs at the three levels (54 top-level TFs, 62 middle-level TFs and 147 bottom-level TFs), which indicated that more than half of the TFs (bottom-level TFs) tend to be regulated by other TFs and then bind directly to target genes. This finding showed a downward regulatory direction of transcription regulation in P. syringae. We revised the statement in the revised manuscript. Please see Lines 167-170.

      (3) The Method section lacks depth, especially on data analyses.

      How did the authors define promoter regions of each gene? How were operons treated in their analyses? Was P. syringae 1448A used for their main ChIP-seq?

      Thank you for your comment. We defined the intergenic region before each TF sequence as the promoter region.

      As the pHM1 plasmid carries its own constitutive promoter (lacZ promoter), we amplified the TF-coding sequence and cloned it into the site following the promoter. TF protein expression was driven by the plasmid promoter.

      P. syringae 1448A was used for our main ChIP-seq. We added the details in the revised manuscript. Please see Lines 705 and 727-730.

      Figure S3: I am not sure how the GO analyses were done. For example, in the case of the top-level TF PSPPH4700, did the authors perform GO analysis on genes that are co-bound by PSPPH4700 and any other top-level TFs?

      Thank you for your comment and we agree with you. We performed GO analysis on genes that were co-bound by TF pairs in the same level. We added the details in the revised manuscript. Please see Lines 248-252.

      The analysis presented in Figure 6a needs more explanation of the methodology employed by the authors.

      Thank you for your comment. We added more details for the analysis in Figure 6a. Please see Lines 514-522.

      It is strongly recommended that the authors share their analysis codes so that others can reproduce the analyses.

      Thank you for your comment. We shared our analysis codes on the website (https://github.com/dengxinb2315/PS-PATRnet-code) in the Data Availability. Please see Lines 800-801.

      (4) Other:

      Figure 3: I suggest putting additional panel labels to facilitate the interpretation of the figure.

      Thank you for your suggestion. We added detailed labels in the revised Figures 3 and 4. Please see the revised Figures 3 and 4.

      I spotted several potential errors:

      L106: 170 TFs?

      Thank you for your comment, and we are sorry for the missing details. For the hierarchical network, we integrated the DNA-binding data of 170 TFs in this study and 100 TFs in our previous SELEX research. We added the details in the revised manuscript. Please see Lines 104, 147 and 159-160.

      L592: P. syringae not E. coli?

      Thank you for your comment. Here we discussed the hierarchical characteristics in E. coli. We revised the statement in the revised manuscript. Please see Line 618.

      L593: eukaryotic not prokaryotic?

      Thank you for your correction. Here we discussed the feedforward loops in our study. We revised the statement in the revised manuscript. Please see Line 618.

      References

      Alon, U. (2007). Network motifs: theory and experimental approaches. Nature Reviews Genetics, 8(6), 450-461.

      Becskei, A., & Serrano, L. (2000). Engineering stability in gene networks by autoregulation. Nature, 405(6786), 590-593.

      Rosenfeld, N., Elowitz, M. B., & Alon, U. (2002). Negative autoregulation speeds the response times of transcription networks. Journal of Molecular Biology, 323(5), 785-793.

      Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T. W., Greven, M. C., . . . Cheng, Y. (2012). Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Research, 22(9), 1798-1812.

    1. Author response:

      eLife assessment

      The authors present a potentially useful approach of broad interest arguing that anterior cingulate cortex (ACC) tracks option values in decisions involving delayed rewards. The authors introduce the idea of a resource-based cognitive effort signal in ACC ensembles and link ACC theta oscillations to a resistance-based strategy. The evidence supporting these new ideas is incomplete and would benefit from additional detail and more rigorous analyses and computational methods.

      The reviewers have provided several excellent suggestions and pointed out important shortcomings of our manuscript. We are grateful for their efforts. To address these concerns, we are planning a major revision to the manuscript. In the revision, our goal is to address each of the reviewers’ concerns and codify the evidence for resistance- and resource-based control signals in the rat anterior cingulate cortex. We have provided a non-exhaustive list of the items we plan to address in the point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Young (2.5 mo [adolescent]) rats were tasked to either press one lever for immediate reward or another for delayed reward.

      Please note that at the time of training and testing the rats were > 4 months old.

      The task had a complex structure in which (1) the number of pellets provided on the immediate reward lever changed as a function of the decisions made, (2) rats were prevented from pressing the same lever three times in a row. Importantly, this task is very different from most intertemporal choice tasks which adjust delay (to the delayed lever), whereas this task held the delay constant and adjusted the number of 20 mg sucrose pellets provided on the immediate value lever.

      Several studies parametrically vary the immediate lever (PMID: 39119916, 31654652, 28000083, 26779747, 12270518, 19389183). While most versions of the task will yield qualitatively similar estimates of discounting, the adjusting-amount version is preferred as it provides the most consistent estimates (PMID: 22445576). More specifically, this version of the task avoids the contrast effects that result from changing the delay during the session (PMID: 23963529, 24780379, 19730365, 35661751), which complicate value estimates.

      Analyses are based on separating sessions into groups, but group membership includes arbitrary requirements and many sessions have been dropped from the analyses.

      We are in discussions about how to address this valid concern. This includes simply splitting the data by delay. This approach, however, has conceptual problems that we will also lay out in a full revision.

      Computational modeling is based on an overly simple reinforcement learning model, as evidenced by fit parameters pegging to the extremes.

      We apologize for not doing a better job of explaining the advantages of this type of model for the present purposes. Nevertheless, given the clear lack of enthusiasm, we felt it was better to simply update the model as suggested by the Reviewers. The straightforward modifications have now been implemented and we are currently in discussion about how the new results fit into the larger narrative.

      The neural analysis is overly complex and does not contain the necessary statistics to assess the validity of their claims.

      We plan to streamline the existing analysis and add statistics, where required, to address this concern.

      Strengths:

      The task is interesting.

      Thank you for the positive comment.

      Weaknesses:

      Behavior:

      The basic behavioral results from this task are not presented. For example, "each recording session consisted of 40 choice trials or 45 minutes". What was the distribution of choices over sessions? Did that change between rats? Did that change between delays? Were there any sequence effects? (I recommend looking at reaction times.) Were there any effects of pressing a lever twice vs after a forced trial?

      Animals tend to make more immediate choices as the delay is extended, which is reflected in Figure 1. We will add more detail and additional statistics to address these questions.

      This task has a very complicated sequential structure that I think I would be hard pressed to follow if I were performing this task.

      Human tasks implement a similar task structure (PMID: 26779747). Please note the response above that outlines the benefits of using this task.

      Before diving into the complex analyses assuming reinforcement learning paradigms or cognitive control, I would have liked to have understood the basic behaviors the rats were taking. For example, what was the typical rate of lever pressing? If the rats are pressing 40 times in 45 minutes, does waiting 8s make a large difference?

      This is a good suggestion. However, rats do not like waiting for rewards, even for small delays. Going from the 4 sec to the 8 sec delay results in more immediate choices, indicating that the rats will forgo waiting in favor of a smaller reinforcer at the 8 sec delay as compared to the 4 sec.

      For that matter, the reaction time from lever appearance to lever pressing would be very interesting (and important). Are they making a choice as soon as the levers appear? Are they leaning towards the delay side, but then give in and choose the immediate lever? What are the reaction time hazard distributions?

      These are excellent suggestions. We are looking into implementing them.

      It is not clear that the animals on this task were actually using cognitive control strategies on this task. One cannot assume from the task that cognitive control is key. The authors only consider a very limited number of potential behaviors (an overly simple RL model). On this task, there are a lot of potential behavioral strategies: "win-stay/lose-shift", "perseveration", "alternation", even "random choices" should be considered.

      The strategies the Reviewer mentioned are descriptors of the actual choices the rats made. For example, perseveration means the rat is choosing one of the levers at an excessively high rate whereas alternation means it is choosing the two levers more or less equally, independent of payouts. But the question we are interested in is why? We are arguing that the type of cognitive control determines the choice behavior but cognitive control is an internal variable that guides behavior, rather than simply a descriptor of the behavior. For example, the animal opts to perseverate on the delayed lever because the cognitive control required to track ival is too high. We then searched the neural data for signatures of the two types of cognitive control.

      The delay lever was assigned to the "non-preferred side". How did side bias affect the decisions made?

      The side bias clearly does not impact performance as the animals prefer the delay lever at shorter delays, which works against this bias.

      The analyses based on "group" are unjustified. The authors compare the proportion of delayed to immediate lever press choices on the non-forced trials and then did k-means clustering on this distribution. But the distribution itself was not shown, so it is unclear whether the "groups" were actually different. They used k=3, but do not describe how this arbitrary number was chosen. (Is 3 the optimal number of clusters to describe this distribution?) Moreover, they removed three group 1 sessions with an 8s delay and two group 2 sessions with a 4s delay, making all the group 1 sessions 4s delay sessions and all group 2 sessions 8s delay sessions. They then ignore group 3 completely. These analyses seem arbitrary and unnecessarily complex. I think they need to analyze the data by delay. (How do rats handle 4s delay sessions? How do rats handle 6s delay sessions? How do rats handle 8s delay sessions?). If they decide to analyze the data by strategy, then they should identify specific strategies, model those strategies, and do model comparison to identify the best explanatory strategy. Importantly, the groups were session-based, not rat based, suggesting that rats used different strategies based on the delay to the delayed lever.

      These are excellent points and, as stated above, we are in the process of revisiting the group assignments in an effort to allay these criticisms.

      The reinforcement learning model used was overly simple. In particular, the RL model assumes that the subjects understand the task structure, but we know that even humans have trouble following complex task structures. Moreover, we know that rodent decision-making depends on much more complex strategies (model-based decisions, multi-state decisions, rate-based decisions, etc). There are lots of other ways to encode these decision variables, such as softmax with an inverse temperature rather than epsilon-greedy. The RL model was stated as a given and not justified. As one critical example, the RL model fit to the data assumed a constant exponential discounting function, but it is well-established that all animals, including rodents, use hyperbolic discounting in intertemporal choice tasks. Presumably this changes dramatically the effect of 4s and 8s. As evidence that the RL model is incomplete, the parameters found for the two groups were extreme. (Alpha=1 implies no history and only reacting to the most recent event. Epsilon=0.4 in an epsilon-greedy algorithm is a 40% chance of responding randomly.)

      Please see our response above. We agree that the approach was not justified, but we do not agree that it is invalid. Simply stated, a softmax approach gives the best fit to the choice behavior, whereas our epsilon-greedy approach attempted to reproduce the choice behavior using a naïve agent that progressively learns the values of the two levers on a choice-by-choice basis. The epsilon-greedy approach can therefore tell us whether it is possible to reproduce the choice behavior by an agent that is only tracking ival. Given our discovery of an ival-tracking signal in ACC, we believed that this was a critical point (although admittedly we did a poor job of communicating it). However, we also appreciate that important insights can be gained by fitting a model to the data as suggested. In fact, we had implemented this approach initially and are currently reconsidering what it can tell us in light of the Reviewers' comments.
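      The distinction between the two decision rules, and between the two discounting functions raised by the Reviewer, can be illustrated with a minimal sketch. This is not the model used in the manuscript: the function names, lever values, and parameter values below are hypothetical and chosen only for illustration.

```python
import math

def epsilon_greedy_probs(values, epsilon):
    """Choice probabilities under epsilon-greedy.

    The highest-valued option is chosen with probability 1 - epsilon,
    and a uniformly random option is chosen otherwise. Ties go to the
    first option with the maximal value.
    """
    n = len(values)
    best = max(range(n), key=lambda i: values[i])
    return [(1.0 - epsilon if i == best else 0.0) + epsilon / n
            for i in range(n)]

def softmax_probs(values, beta):
    """Choice probabilities under softmax with inverse temperature beta."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def exponential_discount(reward, delay, gamma):
    """Exponentially discounted value: reward * gamma ** delay."""
    return reward * gamma ** delay

def hyperbolic_discount(reward, delay, k):
    """Hyperbolically discounted value: reward / (1 + k * delay)."""
    return reward / (1.0 + k * delay)

# Hypothetical lever values: same ordering, different value gaps.
small_gap = [1.0, 2.0]
large_gap = [1.0, 10.0]
eg_small = epsilon_greedy_probs(small_gap, epsilon=0.4)
eg_large = epsilon_greedy_probs(large_gap, epsilon=0.4)
sm_small = softmax_probs(small_gap, beta=1.0)
sm_large = softmax_probs(large_gap, beta=1.0)
```

      Under epsilon-greedy the choice probabilities are identical for the small and large value gaps, whereas under softmax the preference for the better option grows with the gap. This is the graded relationship the Reviewers refer to.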

      The authors do add a "dbias" (which is a preference for the delayed lever) term to the RL model, but note that it has to be maximal in the 4s condition to reproduce group 2 behavior, which means they are not doing reinforcement learning anymore, just choosing the delayed lever.

      Exactly. The model results indicated that a naïve agent that relied only on ival tracking would not behave in this manner. It was therefore unlikely that the G1 animals were using an ival-tracking strategy, even though a strong ival-tracking signal was present in ACC.

      Neurophysiology:

      The neurophysiology figures are unclear and mostly uninterpretable; they do not show variability, statistics or conclusive results.

      While the reviewer is justified in criticizing the clarity of the figures, the statement that “they do not show variability, statistics or conclusive results” is demonstrably false. Each of the figures presented in the manuscript, except Figure 3, is accompanied by statistics and measures of variability. This comment is hyperbolic and not justified.

      Figure 3 was an attempt to show raw neural data to better demonstrate how robust the ivalue tracking signal is.

      As with the behavior, I would have liked to have seen more traditional neurophysiological analyses first. What do the cells respond to? How do the manifolds change aligned to the lever presses? Are those different between lever presses?

      We provide several figures describing how neurons change firing rates in response to varying reward. We are unsure what the reviewer means by “traditional analysis”, especially since this is immediately followed by a request for an assessment of neural manifolds. That said, we are developing ways to make the analysis more intuitive and, hopefully, more “traditional”.

      Are there changes in cellular information (both at the individual and ensemble level) over time in the session?

      We provide several analyses of how firing rate changes over trials in relation to ival over time in the session.

      How do cellular responses differ during that delay while both levers are out, but the rats are not choosing the immediate lever?

      It is not clear to us how this analysis addresses our hypothesis regarding control signals in ACC.

      Figure 3, for example, claims that some of the principal components tracked the number of pellets on the immediate lever ("ival"), but they are just two curves. No statistics, controls, or justification for this is shown. BTW, on Figure 3, what is the event at 200s?

      Figure 3 will be folded into one of the other figures that contains the summary statistics.

      I'm confused. On Figure 4, the number of trials seems to go up to 50, but in the methods, they say that rats received 40 trials or 45 minutes of experience.

      This analysis included forced trials. The maximum per session is 40 choice trials. We will clarify this in the revised manuscript.

      At the end of page 14, the authors state that the strength of the correlation did not differ by group and that this was "predicted" by the RL modeling, but this statement is nonsensical, given that the RL modeling did not fit the data well and depended on extreme values. Moreover, this claim is dependent on "not statistically detectable", which is, of course, not interpretable as "not different".

      We plan to revisit this analysis and the RL model.

      There is an interesting result on page 16 that the increases in theta power were observed before a delayed lever press but not an immediate lever press, and then that the theta power declined after an immediate lever press.

      Thank you for the positive comment.

      These data are separated by session group (again group 1 is a subset of the 4s sessions, group 2 is a subset of the 8s sessions, and group 3 is ignored). I would much rather see these data analyzed by delay itself or by some sort of strategy fit across delays.

      Provisional analysis indicates that the results hold up over delays, rather than the groupings in the paper. We will address this in a full revision of the manuscript.

      That being said, I don't see how this description shows up in Figure 6. What does Figure 6 look like if you just separate the sessions by delay?

      We are unclear what the reviewer means by “this description”.

      Discussion:

      Finally, it is unclear to what extent this task actually gets at the questions originally laid out in the goals and returned to in the discussion. The idea of cognitive effort is interesting, but there is no data presented that this task is cognitive at all. The idea of a resourced cognitive effort and a resistance cognitive effort is interesting, but presumably the way one overcomes resistance is through resource-limited components, so it is unclear that these two cognitive effort strategies are different.

      We view the strong evidence for ival tracking presented herein as a potentially critical component of resource based cognitive effort. We hope to clarify how this task engaged cognitive effort more clearly.  

      The authors state that "ival-tracking" (neurons and ensembles that presumably track the number of pellets being delivered on the immediate lever - a fancy name for "expectations") "taps into a resourced-based form of cognitive effort", but no evidence is actually provided that keeping track of the expectation of reward on the immediate lever depends on attention or mnemonic resources. They also state that a "dLP-biased strategy" (waiting out the delay) is a "resistance-based form of cognitive effort" but no evidence is provided that going to the delayed side takes effort.

      There is a well-developed literature that rats and mice do not like waiting for delayed reinforcers. We contend that enduring something you don’t like takes effort.

      The authors talk about theta synchrony, but never actually measure theta synchrony, particularly across structures such as amygdala or ventral hippocampus. The authors try to connect this to "the unpleasantness of the delay", but provide no measures of pleasantness or unpleasantness. They have no evidence that waiting out an 8s delay is unpleasant.

      We will better clarify how our measure of Theta power relates to synchrony. There is a well-developed literature that rats and mice do not like waiting for delayed reinforcers.

      The authors hypothesize that the "ival-tracking signal" (the expectation of number of pellets on the immediate lever) "could simply reflect the emotional or autonomic response". Aside from the fact that no evidence for this is provided, if this were to be true, then, in what sense would any of these signals be related to cognitive control?

      This is proposed as an alternative explanation to the ivalue signal. We provide this as a possibility, never a conclusion. We will clarify this in the revised text. 

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores the neuronal signals that underlie resistance vs resource-based models of cognitive effort. The authors use a delayed discounting task and computational models to explore these ideas. The authors find that the ACC strongly tracks value and time, which is consistent with prior work. Novel contributions include quantification of a resource-based control signal among ACC ensembles, and linking ACC theta oscillations to a resistance-based strategy.

      Strengths:

      The experiments and analyses are well done and have the potential to generate an elegant explanatory framework for ACC neuronal activity. The inclusion of local-field potential / spike-field analyses is particularly important because these can be measured in humans.

      Thank you for the endorsement of our work.

      Weaknesses:

      I had questions that might help me understand the task and details of neuronal analyses.

      (1) The abstract, discussion, and introduction set up an opposition between resource and resistance based forms of cognitive effort. It's clear that the authors find evidence for each (ACC ensembles = resource, theta=resistance?) but I'm not sure where the data fall on this dichotomy.

      a. An overall very simple schematic early in the paper (prior to the MCML model? or even the behavior) may help illustrate the main point.

      b. In the intro, results, and discussion, it may help to relate each point to this dichotomy.

      c. What would resource-based signals look like? What would resistance based signals look like? Is the main point that resistance-based strategies dominate when delays are short, but resource-based strategies dominate when delays are long?

      d. I wonder if these strategies can be illustrated? Could these two measures (dLP vs ival tracking) be plotted on separate axes or extremes, and behavior, neuronal data, LFP, and spectral relationships be shown on these axes? I think Figure 2 is working towards this. Could these be shown for each delay length? This way, as the evidence from behavior, model, single neurons, ensembles, and theta is presented, it can be related to this framework, and the reader can organize the findings.

      These are excellent suggestions, and we intend to implement each of them, where possible.

      (2) The task is not clear to me.

      a. I wonder if a task schematic and a flow chart of training would help readers.

      Yes, excellent idea, we intend to include this.

      b. This task appears to be relatively new. Has it been used before in rats (Oberlin and Grahame is a mouse study)? Some history / context might help orient readers.

      Indeed, this task has been used in rats in several prior studies. Please see the following references (PMID: 39119916, 31654652, 28000083, 26779747, 12270518, 19389183).

      c. How many total sessions were completed with ascending delays? Were there criteria for surgeries? How many total recording sessions per animal (of the 54?)

      Please note that the delay does not change within a session. There were no criteria for surgery. In addition, we will update Table 1 to make the number of recording sessions clearer.

      d. How many trials completed per session (40 trials OR 45 minutes)? Where are there errors? These details are important for interpreting Figure 1.

      Every animal in this data set completed 40 trials. We will update the task description to clarify this issue. There are no errors in this task; rather, the task is designed to measure the tendency to make an impulsive choice (smaller reward now). We will clarify this issue in the revision of the manuscript.

      (3) Figure 1 is unclear to me.

      a. Delayed vs immediate lever presses are being plotted - but I am not sure what is red, and what is blue. I might suggest plotting each animal.

      We will clarify the colors and look into schemes to graph the data set.

      b. How many animals and sessions go into each data point?

      This information is in Table 1, but this could be clearer, and we will update the manuscript.

      c. Table 1 (which might be better referenced in the paper) refers to rats by session. Is it true that some rats (2 and 8) were not analyzed for the bulk of the paper? Some rats appear to switch strategies, and some stay in one strategy. How many neurons come from each rat?

      Table 1 is accurate, and we can add the number of neurons from each animal.

      d. Task basics - RT, choice, accuracy, video stills - might help readers understand what is going into these plots

      e. Does the animal move differently (i.e., RTs) in G1 vs. G2?

      We will look into ways to incorporate this information.

      (4) I wasn't sure how clustered G1 vs. G2 vs G3 are. To make this argument, the raw data (or some axis of it) might help.

      a. This is particularly important because G3 appears to be a mix of G1 and G2, although upon inspection, I'm not sure how different they really are

      b. Were there some objective clustering criteria that defined the clusters?

      c. Why discuss G3 at all? Can these sessions be removed from analysis?

      These are all excellent suggestions and points. We plan to revisit the strategy to assign sessions to groups, which we hope will address each of these points.

      (5) The same applies to neuronal analyses in Fig 3 and 4

      a. What does a single neuron peri-event raster look like? I would include several of these.

      b. What does PC1, 2 and 3 look like for G1, G2, and G3?

      c. Certain PCs are selected, but I'm not sure how they were selected - was a criterion used? How was the correlation between PCA and ival selected? What about PCs that don't correlate with ival?

      d. If the authors are using PCA, then scree plots and PETHs might be useful, as well as comparisons to PCs from time-shuffled / randomized data.

      We will make several updates to enhance the clarity of the neural data analysis, including adding more representative examples. We feel the need to balance the inclusion of representative examples with group statistics given the concerns raised by R1.

      (6) I had questions about the spectral analysis

      a. Theta has many definitions - why did the authors use 6-12 Hz? Does it come from the hippocampal literature, and is this the best definition of theta? What about other bands: delta (1-4 Hz), theta (4-7 Hz), and beta (13-30 Hz)? These bands are of particular importance because they have been associated with errors, dopamine, and are abnormal in schizophrenia and Parkinson's disease.

      This designation comes mainly from the hippocampal and ACC literature in rodents. In addition, this range best captured the peak in the power spectrum in our data. Note that we focus our analysis on theta given the literature regarding theta in the ACC as a correlate of cognitive control (references in manuscript). We did interrogate other bands as a sanity check and the results were mostly limited to theta. Given the scope of our manuscript and the concerns raised regarding complexity, we are concerned that adding frequency analyses beyond theta would obfuscate the take-home message. However, we think this is worthy, and we will determine if this can be done in a brief, clear, and effective manner.

      b. Power spectra and time-frequency analyses may justify the authors' focus. I would show these (y-axis - frequency, x-axis - time, z-axis - power).

      This is an excellent suggestion that we look forward to incorporating. 

      (7) PC3 as an autocorrelation doesn't seem to be the right way to infer theta entrainment or spike-field relationships, as PCA can be vulnerable to phantom oscillations, and coherence can be transient. It is also difficult to compare to traditional measures of phase-locking. Why not simply use spike-field coherence? This is particularly important with reference to the human literature, which the authors invoke.

      Excellent suggestion. We will look into the phantom oscillation issue. Note that PCA provided a way to classify neurons that exhibited peaks in the autocorrelation at theta frequencies. While spike-field coherence is a rigorous tool, it addresses a slightly different question (LFP entrainment). Notwithstanding, we plan to address this issue.  

      Reviewer #3 (Public Review):

      Summary:

      The study investigated decision making in rats choosing between small immediate rewards and larger delayed rewards, in a task design where the size of the immediate rewards decreased when this option was chosen and increased when it was not chosen. The authors conceptualise this task as involving two different types of cognitive effort; 'resistance-based' effort putatively needed to resist the smaller immediate reward, and 'resource-based' effort needed to track the changing value of the immediate reward option. They argue based on analyses of the behaviour, and computational modelling, that rats use different strategies in different sessions, with one strategy in which they consistently choose the delayed reward option irrespective of the current immediate reward size, and another strategy in which they preferentially choose the immediate reward option when the immediate reward size is large, and the delayed reward option when the immediate reward size is small. The authors recorded neural activity in anterior cingulate cortex (ACC) and argue that ACC neurons track the value of the immediate reward option irrespective of the strategy the rats are using. They further argue that the strategy the rats are using modulates their estimated value of the immediate reward option, and that oscillatory activity in the 6-12Hz theta band occurs when subjects use the 'resistance-based' strategy of choosing the delayed option irrespective of the current value of the immediate reward option. If solid, these findings will be of interest to researchers working on cognitive control and ACC's involvement in decision making. However, there are some issues with the experiment design, reporting, modelling and analysis which currently preclude high confidence in the validity of the conclusions.

      Strengths:

      The behavioural task used is interesting and the recording methods should enable the collection of good quality single unit and LFP electrophysiology data. The authors recorded from a sizable sample of subjects for this type of study. The approach of splitting the data into sessions where subjects used different strategies and then examining the neural correlates of each is in principle interesting, though I have some reservations about the strength of evidence for the existence of multiple strategies.

      Thank you for the positive comments.

      Weaknesses:

      The dataset is very unbalanced in terms of both the number of sessions contributed by each subject, and their distribution across the different putative behavioural strategies (see table 1), with some subjects contributing 9 or 10 sessions and others only one session, and it is not clear from the text why this is the case. Further, only 3 subjects contribute any sessions to one of the behavioural strategies, while 7 contribute data to the other such that apparent differences in brain activity between the two strategies could in fact reflect differences between subjects, which could arise due to e.g. differences in electrode placement. To firm up the conclusion that neural activity is different in sessions where different strategies are thought to be employed, it would be important to account for potential cross-subject variation in the data. The current statistical methods don't do this as they all assume fixed effects (e.g. using trials or neurons as the experimental unit and ignoring which subject the neuron/trial came from).

      This is an important issue that we plan to address with additional analysis in the manuscript update.

      It is not obvious that the differences in behaviour between the sessions characterised as using the 'G1' and 'G2' strategies actually imply the use of different strategies, because the behavioural task was different in these sessions, with a shorter wait (4 seconds vs 8 seconds) for the delayed reward in the G1 strategy sessions where the subjects consistently preferred the delayed reward irrespective of the current immediate reward size. Therefore the differences in behaviour could be driven by difference in the task (i.e. external world) rather than a difference in strategy (internal to the subject). It seems plausible that the higher value of the delayed reward option when the delay is shorter could account for the high probability of choosing this option irrespective of the current value of the immediate reward option, without appealing to the subjects using a different strategy.

      Further, even if the differences in behaviour do reflect different behavioural strategies, it is not obvious that these correspond to allocation of different types of cognitive effort. For example, subjects' failure to modify their choice probabilities to track the changing value of the immediate reward option might be due simply to valuing the delayed reward option higher, rather than not allocating cognitive effort to tracking immediate option value (indeed this is suggested by the neural data). Conversely, if the rats assign higher value to the delayed reward option in the G1 sessions, it is not obvious that choosing it requires overcoming 'resistance' through cognitive effort.

      The RL modelling used to characterise the subject's behavioural strategies made some unusual and arguably implausible assumptions:

      i) The goal of the agent was to maximise the value of the immediate reward option (ival), rather than the standard assumption in RL modelling that the goal is to maximise long-run (e.g. temporally discounted) reward. It is not obvious why the rats should be expected to care about maximising the value of only one of their two choice options rather than distributing their choices to try and maximise long run reward.

      ii) The modelling assumed that the subject's choice could occur in 7 different states, defined by the history of their recent choices, such that every successive choice was made in a different state from the previous choice. This is a highly unusual assumption (most modelling of 2AFC tasks assumes all choices occur in the same state), as it causes learning on one trial not to generalise to the next trial, but only to other future trials where the recent choice history is the same.

      iii) The value update was non-standard in that rather than using the trial outcome (i.e. the amount of reward obtained) as the update target, it instead appeared to use some function of the value of the immediate reward option (it was not clear to me from the methods exactly how the fival and fqmax terms in the equation are calculated) irrespective of whether the immediate reward option was actually chosen.

      iv) The model used an e-greedy decision rule such that the probability of choosing the highest value option did not depend on the magnitude of the value difference between the two options. Typically, behavioural modelling uses a softmax decision rule to capture a graded relationship between choice probability and value difference.

      v) Unlike typical RL modelling where the learned value differences drive changes in subjects' choice preferences from trial to trial, to capture sensitivity to the value of the immediately rewarding option the authors had to add in a bias term which depended directly on this value (not mediated by any trial-to-trial learning). It is not clear how the rat is supposed to know the current trial ival if not by learning over previous trials, nor what purpose the learning component of the model serves if not to track the value of the immediate reward option.

      Given the task design, a more standard modelling approach would be to treat each choice as occurring in the same state, with the (temporally discounted) value of the outcomes obtained on each trial updating the value of the chosen option, and choice probabilities driven in a graded way (e.g. softmax) by the estimated value difference between the options. It would be useful to explicitly perform model comparison (e.g. using cross-validated log-likelihood with fitted parameters) of the authors' proposed model against more standard modelling approaches to test whether their assumptions are justified. It would also be useful to use logistic regression to evaluate how the history of choices and outcomes on recent trials affects the current trial choice, and compare these granular aspects of the choice data with simulated data from the model.

      Each of the issues outlined above with the RL model is very important. We are currently re-evaluating the RL modeling approach in light of these comments. Please see comments to R1 regarding the model as they are relevant for this as well.
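      As a point of reference for the "more standard" approach described by the Reviewer, the following is a minimal sketch of a single-state Q-learning model with a softmax decision rule, written as a log-likelihood function that could be used for model comparison. It is an illustration under stated assumptions, not the model from the manuscript: the function name, the two-option coding (0 = immediate, 1 = delayed), and the use of already-discounted outcomes as the update target are all hypothetical choices.

```python
import math

def softmax_q_loglik(choices, outcomes, alpha, beta):
    """Log-likelihood of a choice sequence under single-state Q-learning.

    choices  : sequence of 0/1 (hypothetical coding: 0 = immediate, 1 = delayed)
    outcomes : (temporally discounted) reward obtained on each trial
    alpha    : learning rate in [0, 1]
    beta     : softmax inverse temperature (beta = 0 gives random choice)
    """
    q = [0.0, 0.0]  # one value per option; all choices share a single state
    loglik = 0.0
    for choice, outcome in zip(choices, outcomes):
        # Two-option softmax: graded function of the value difference.
        p_delayed = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        p_choice = p_delayed if choice == 1 else 1.0 - p_delayed
        loglik += math.log(p_choice)
        # Only the chosen option is updated, toward the obtained outcome.
        q[choice] += alpha * (outcome - q[choice])
    return loglik
```

      For a cross-validated comparison, alpha and beta would be fitted on a subset of sessions and the log-likelihood evaluated on the held-out sessions, with the same procedure applied to each competing model.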

      There were also some issues with the analyses of neural data which preclude strong confidence in their conclusions:

      Figure 4I makes the striking claim that ACC neurons track the value of the immediately rewarding option equally accurately in sessions where two putative behavioural strategies were used, despite the behaviour being insensitive to this variable in the G1 strategy sessions. The analysis quantifies the strength of correlation between a component of the activity extracted using a decoding analysis and the value of the immediate reward option. However, as far as I could see this analysis was not done in a cross-validated manner (i.e. evaluating the correlation strength on test data that was not used for either training the MCML model or selecting which component to use for the correlation). As such, the chance level correlation will certainly be greater than 0, and it is not clear whether the observed correlations are greater than expected by chance.

      This is an astute observation and we plan to address this concern. We agree that cross-validation may provide an appropriate tool here.
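      The Reviewer's concern about the chance level can be illustrated with a small simulation. The sketch below uses pure noise in place of neural components and an unrelated random target in place of ival. All names, sizes, and the random seed are hypothetical. It selects the component that correlates best with the target on training trials and then evaluates the same component on held-out trials.

```python
import numpy as np

def selected_component_correlation(components, target, train_idx, test_idx):
    """Select the component most correlated with target on training trials.

    Returns the absolute in-sample correlation of the selected component and
    its sign-aligned correlation on held-out trials.
    """
    train_r = np.array([np.corrcoef(c[train_idx], target[train_idx])[0, 1]
                        for c in components])
    best = int(np.argmax(np.abs(train_r)))
    sign = np.sign(train_r[best])
    test_r = np.corrcoef(components[best][test_idx], target[test_idx])[0, 1]
    return abs(train_r[best]), sign * test_r

rng = np.random.default_rng(0)
n_trials, n_components = 80, 30
in_sample, held_out = [], []
for _ in range(200):
    components = rng.standard_normal((n_components, n_trials))  # pure noise
    target = rng.standard_normal(n_trials)                      # unrelated to components
    order = rng.permutation(n_trials)
    r_in, r_out = selected_component_correlation(
        components, target, order[:40], order[40:])
    in_sample.append(r_in)
    held_out.append(r_out)

mean_in = float(np.mean(in_sample))    # inflated by the selection step
mean_out = float(np.mean(held_out))    # close to zero on held-out trials
```

      Because the selection step picks the largest of many noisy correlations, the in-sample value is well above zero even though no relationship exists, while the held-out value averages near zero.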

      An additional caveat with the claim that ACC is tracking the value of the immediate reward option is that this value likely correlates with other behavioural variables, notably the current choice and recent choice history, that may be encoded in ACC. Encoding analyses (e.g. using linear regression to predict neural activity from behavioural variables) could allow quantification of the variance in ACC activity uniquely explained by option values after controlling for possible influence of other variables such as choice history (e.g. using a coefficient of partial determination).

      This is also an excellent point that we plan to address in the manuscript update.
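      One way to compute the coefficient of partial determination mentioned by the Reviewer is sketched below, assuming ordinary least squares with an intercept. The function names and the choice of regressors are hypothetical and for illustration only.

```python
import numpy as np

def residual_ss(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def partial_r2(y, nuisance, regressor_of_interest):
    """Coefficient of partial determination.

    Fraction of the variance left unexplained by the nuisance regressors
    (e.g. choice history) that is explained by adding the regressor of
    interest (e.g. ival). An intercept is included in both models.
    """
    y = np.asarray(y, dtype=float)
    ones = np.ones((y.size, 1))
    reduced = np.column_stack([ones, nuisance])
    full = np.column_stack([reduced, regressor_of_interest])
    sse_reduced = residual_ss(reduced, y)
    sse_full = residual_ss(full, y)
    return (sse_reduced - sse_full) / sse_reduced
```

      Applied per neuron or per component, a value near zero would indicate that ival explains little activity beyond what the nuisance variables already capture.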

      Figure 5 argues that there are systematic differences in how ACC neurons represent the value of the immediate option (ival) in the G1 and G2 strategy sessions. This is interesting if true, but it appears possible that the effect is an artefact of the different distribution of option values between the two session types. Specifically, due to the way that ival is updated based on the subjects' choices, in G1 sessions where the subjects are mostly choosing the delayed option, ival will on average be higher than in G2 sessions where they are choosing the immediate option more often. The relative number of high, medium and low ival trials in the G1 and G2 sessions will therefore be different, which could drive systematic differences in the regression fit in the absence of real differences in the activity-value relationship. I have created an ipython notebook illustrating this, available at: https://notebooksharing.space/view/a3c4504aebe7ad3f075aafaabaf93102f2a28f8c189ab9176d4807cf1565f4e3. To verify that this is not driving the effect it would be important to balance the number of trials at each ival level across sessions (e.g. by subsampling trials) before running the regression.

      Excellent point and thank you for the notebook. We explored a similar approach previously but did not pursue it to completion. We will re-investigate this issue.
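
      The balancing step proposed by the reviewer could look roughly like the sketch below. It assumes that ival has already been binned into discrete levels and that trials are pooled by session type (G1 or G2); both choices, and all names in the code, are assumptions for illustration rather than details taken from the manuscript or the reviewer's notebook.

```python
import random
from collections import defaultdict

def balance_trials(trials_by_session, seed=0):
    """Subsample trials so each ival level has the same count in every session type.

    trials_by_session maps a session type (e.g. "G1") to a list of
    (ival_level, trial_id) pairs. Levels missing from any session type are dropped.
    """
    rng = random.Random(seed)
    grouped = {session: defaultdict(list) for session in trials_by_session}
    for session, trials in trials_by_session.items():
        for level, trial_id in trials:
            grouped[session][level].append(trial_id)

    levels = set().union(*(g.keys() for g in grouped.values()))
    balanced = {session: [] for session in trials_by_session}
    for level in sorted(levels):
        # The scarcest session type sets the number of trials kept at this level.
        n_keep = min(len(g.get(level, [])) for g in grouped.values())
        for session, g in grouped.items():
            chosen = rng.sample(g.get(level, []), n_keep)
            balanced[session].extend((level, t) for t in chosen)
    return balanced

# Hypothetical trial counts: G1 sessions are dominated by high ival, G2 by low ival.
example = {
    "G1": [("high", i) for i in range(30)] + [("low", i) for i in range(30, 35)],
    "G2": [("high", i) for i in range(5)] + [("low", i) for i in range(5, 35)],
}
balanced = balance_trials(example)
print({session: len(trials) for session, trials in balanced.items()})
```

      Repeating the subsampling with different seeds and averaging the regression fits would reduce the dependence on any single random draw.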

    1. Author response:

      Reviewer #3 (Public Review):

      (1) Conditions on growth and interaction rates for feasibility and stability. The authors approach this using a mean field approximation, and it is important to note that there is no particular temperature dependence assumed here: as far as it goes, this analysis is completely general for arbitrary Lotka-Volterra interactions.

      However, the starting point for the authors' mean field analysis is the statement that "it is not possible to meaningfully link the structure of species interactions to the exact closed-form analytical solution for [equilibria] 𝑥^*_𝑖 in the Lotka-Volterra model."

      I may be misunderstanding, but I don't agree with this statement. The time-independent equilibrium solution with all species present (i.e. at non-zero abundances) takes the form

      x^* = A^{-1}r

      where A is the inverse of the community matrix, and r is the vector of growth rates. The exceptions to this would be when one or more species has abundance = 0, or A is not invertible. I don't think the authors intended to tackle either of these cases, but maybe I am misunderstanding that.

      So to me, the difficulty here is not in writing a closed-form solution for the equilibrium x^*, it is in writing the inverse matrix as a nice function of the entries of the matrix A itself, which is where the authors want to get to. In this light, it looks to me like the condition for feasibility (i.e. that all x^* are positive, which is necessary for an ecologically-interpretable solution) is maybe an approximation for the inverse of A---perhaps valid when off-diagonal entries are small. A weakness then for me was in understanding the range of validity of this approximation, and whether it still holds when off-diagonal entries of A (i.e. inter-specific interactions) are arbitrarily large. I could not tell from the simulation runs whether this full range of off-diagonal values was tested.

      We thank the reviewer for pointing this out and we agree that the language used is imprecise. The GLV model is solvable using the matrix inversion method but as they note, this does not give an interpretable expression in terms of the system parameters. This is important as we aim to build understanding of how these parameters (which in turn depend on temperature) affect the richness in communities. We have made this clearer in lines 372-379.

      In regards to the validity of the approximation we have significantly increased the detail of the method in the manuscript, including the assumptions it makes (lines 384-393). In general the method assumes that any individual interaction has a weak effect on abundance. This will fail when the variation in interactions becomes too strong but should be robust to changes in the average interaction strength across the community.
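
      To make the discussion of closed-form versus approximate equilibria concrete, the sketch below solves for the all-species equilibrium by matrix inversion and compares it with a generic first-order expansion in the off-diagonal entries. It assumes the sign convention dx_i/dt = x_i (r_i - Σ_j A_ij x_j), so that x^* = A^{-1} r as written by the reviewer. The first-order expansion is a textbook Neumann-series truncation used only to illustrate the reviewer's concern; it is not the mean field method of the manuscript, and the function names are hypothetical.

```python
import numpy as np

def glv_equilibrium(A, r):
    """Exact equilibrium with all species present: solve A x* = r."""
    return np.linalg.solve(A, r)

def first_order_equilibrium(A, r):
    """First-order approximation in the off-diagonal entries of A.

    Writing A = D + E (D diagonal, E off-diagonal),
    A^{-1} r ~ D^{-1} r - D^{-1} E D^{-1} r when E is small relative to D.
    """
    d = np.diag(A)
    E = A - np.diag(d)
    x0 = r / d
    return x0 - (E @ x0) / d

def is_feasible(x_star):
    """Feasible means every equilibrium abundance is strictly positive."""
    return bool(np.all(x_star > 0))

n_species = 4
r = np.ones(n_species)
for strength in (0.05, 0.5):
    # Identical interspecific interaction of the given strength between all pairs.
    A = np.eye(n_species) + strength * (np.ones((n_species, n_species)) - np.eye(n_species))
    exact = glv_equilibrium(A, r)
    approx = first_order_equilibrium(A, r)
    print(strength, is_feasible(exact), is_feasible(approx),
          float(np.max(np.abs(exact - approx))))
```

      In this toy setting the truncation error grows with the square of the interaction strength, so a naive expansion of this kind can misreport feasibility once interspecific interactions are comparable to self-regulation.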

      As a secondary issue here, it would have been helpful to understand whether the authors' feasible solutions are always stable to small perturbations. In general, I would expect this to be an additional criterion needed to understand diversity, though as the authors point out there are certain broad classes of solutions where feasibility implies stability.

      As the reviewer notes, previous work using the GLV model by ? has shown that feasibility almost surely implies stability. Thus we expect that our richness estimates derived from feasibility will closely resemble those derived from stability. We have amended the main text to make this argument clear in lines 321-335.
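
      The distinction between feasibility and local stability can be checked numerically as in the sketch below. It assumes the convention dx_i/dt = x_i (r_i - Σ_j A_ij x_j), for which the Jacobian at the equilibrium is J = -diag(x^*) A; the two-species matrices are hypothetical examples chosen for illustration and say nothing about the large random communities discussed in the response.

```python
import numpy as np

def equilibrium(A, r):
    """All-species equilibrium of dx_i/dt = x_i (r_i - sum_j A_ij x_j)."""
    return np.linalg.solve(A, r)

def is_locally_stable(A, x_star):
    """Stable if every eigenvalue of J = -diag(x*) A has a negative real part."""
    jacobian = -np.diag(x_star) @ A
    return bool(np.all(np.linalg.eigvals(jacobian).real < 0))

r = np.ones(2)
cases = {
    "weak competition": np.array([[1.0, 0.5], [0.5, 1.0]]),
    "strong competition": np.array([[1.0, 2.0], [2.0, 1.0]]),
}
for name, A in cases.items():
    x_star = equilibrium(A, r)
    print(name, "feasible:", bool(np.all(x_star > 0)),
          "stable:", is_locally_stable(A, x_star))
```

      Both toy communities have positive equilibria, but only the weakly competing pair is stable, which is why feasibility and stability are separate criteria in small systems.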

      (2) I did not follow the precise rationale for selecting the temperature dependence of growth rate and interaction rates, or how the latter could be tested with empirical data, though I do think that in principle this could be a valuable way to understand the role of temperature dependence in the Lotka-Volterra equations.

      First, as the authors note, "the temperature dependence of resource supply will undoubtedly be an important factor in microbial communities"

      Even though resources aren't explicitly modeled here, this suggests to me that at some temperatures, resource supply will be sufficiently low for some species that their growth rates will become negative. For example, if temperature dependence is such that the limiting resource for a given species becomes too low to balance its maintenance costs (and hence mortality rate), it seems that the net growth rate will be negative. The alternative would be that temperature affects resource availability, but never such that a limiting resource leads to a negative growth rate when a taxon is rare.

      On the other hand, the functional form for the distribution of growth rates (eq 3) seems to imply that growth rates are always positive. I could imagine that this is a good description of microbial populations in a setting where the resource supply rate is controlled independently of temperature, but it wasn't clear how generally this would hold.

      We thank the reviewer for their comment. The assumption of positive growth rates is indeed a feature of the Boltzmann-Arrhenius model of temperature dependence. We use the Boltzmann-Arrhenius model because growth depends on metabolic rate. As metabolic rate is ultimately determined by biochemical kinetics, its temperature dependence is well described by the Boltzmann-Arrhenius model. In addition, there is a wealth of empirical evidence supporting the use of the Boltzmann-Arrhenius model to describe the temperature dependence of growth rate in microbes.

      Ultimately the temperature dependence of resource supply is not something we can directly consider in our model. As such, we have to assume that resource supply is sufficient to maintain positive growth rates in the community. Note that this assumption only requires that resource supply is sufficient to maintain positive growth rates (i.e. the maximal growth rate of species in isolation), not that resource supply is sufficient to maintain growth in the presence of intra- and interspecific competition. We have updated the manuscript in lines 156-159 to make these assumptions clearer.
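
      The positivity of growth rates under the Boltzmann-Arrhenius model follows directly from its exponential form, as the short sketch below shows. The normalisation to a reference temperature and the parameter values (`b0`, `e_act`, `t_ref`) are assumptions chosen for illustration and are not the values fitted in the manuscript.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def boltzmann_arrhenius(temp_k, b0=1.0, e_act=0.65, t_ref=293.15):
    """Rate at temperature temp_k (kelvin), normalised to b0 at t_ref.

    B(T) = b0 * exp(-(E / k) * (1/T - 1/T_ref)), which is positive for any
    positive b0 because the exponential never reaches zero.
    """
    exponent = -(e_act / BOLTZMANN_EV_PER_K) * (1.0 / temp_k - 1.0 / t_ref)
    return b0 * math.exp(exponent)

for celsius in (0, 10, 20, 30, 40):
    print(celsius, boltzmann_arrhenius(celsius + 273.15))
```

      A negative net growth rate at some temperatures, as raised by the reviewer, would therefore require an extra term (for example a temperature-dependent mortality or resource limitation) that this functional form does not contain.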

      Secondly, while I understand that the growth rate in the exponential phase for a single population can be measured to high precision in the lab as a function of temperature, the assumption for the form of the interaction rates' dependence on temperature seems very hard to test using empirical data. In the section starting L193, the authors seem to fit the model parameters using growth rate dependence on temperature, but then assume that it is reasonable to "use the same thermal response for growth rates and interactions". I did not follow this, and I think a weakness here is in not providing clear evidence that the functional form assumed in Equation (4) actually holds.

      The reviewer is correct: it is very difficult to measure interaction coefficients experimentally, and to our knowledge there is little to no data available on their empirical temperature responses. As a best guess, we use the observed variation in thermal physiology parameters for growth rate as a proxy, assuming that interactions must also depend on the metabolic rates of the interacting species (see also response to comment 8).

    1. Author response:

      Reviewer #1 (Public Review):

      The authors conducted cross-species comparisons between the human brain and the macaque brain to disentangle the specific characteristics of structural development of the human brain. Although previous studies had revealed similarities and differences in brain anatomy between the two species by spatially aligning the brains, the authors made the comparison along the chronological axis by establishing models for predicting the chronological ages with the inputting brain structural features. The rationale is actually clear given that brain development occurs over time in both. More interestingly, the model trained on macaque data was better able to predict the age of humans than the human-trained model was at predicting macaque age. This revealed a brain cross-species age gap (BCAP) that quantified the discrepancy in brain development between the two species, and the authors even found this BCAP measure was associated with performance on behavioral tests in humans. Overall, this study provides important and novel insights into the unique characteristics of human brain development. The authors have employed a rigorous scientific approach, reflecting diligent efforts to scrutinize the patterns of brain age models across species. The clarity of the rationale, the interpretability of the methods, and the quality of the presentation all contribute to the strength of this work.

      We are grateful for your helpful and thorough review and for being so positive about our manuscript. Following your recommendations, we have added more analytic details that have strengthened our paper. We would like to thank you for your input.

      Reviewer #2 (Public Review):

      In the current study, Li et al. developed a novel approach that aligns chronological age to a cross-species brain age prediction model to investigate the evolutionary effect. This method revealed some interesting findings, like the brain-age gap of the macaque model in predicting human age will increase as chronological age increases, suggesting an evolutionary alignment between the macaque brain and the human brain in the early stage of development. This study exhibits ample novelty and research significance. However, I still have some concerns regarding the reliability of the current findings.

      We thank you for the positive and appreciative feedback on our work and the insightful comments, which we have addressed below.

      Question 1: Although the authors named their new method a "cross-species" model, the current study only focused on the prediction between humans and macaques. It would be better to discuss whether their method can also generalize to cross-species examination of other species (e.g., C. elegans), which may provide more comprehensive evolutionary insights. Also, other future directions with their new method are worth discussing.

      We appreciate your insightful comment regarding the generalizability of our model to other species. As you note, we performed only a human-macaque cross-species study and did not include other species. We focused on humans and macaques because the macaque is considered one of the closest primates to humans apart from chimpanzees and is thus considered the best model for studying human brain evolution. However, our proposed method has limitations that restrict its generalizability to other species, e.g., C. elegans. First, our model was trained using MRI data, which limits its applicability to species for which such data are unavailable. This technological requirement poses a barrier to broader cross-species application. Second, our current model is based on homologous brain atlases that are available for both humans and macaques. The lack of comparable atlases for other species further restricts the model's generalizability. We have discussed this limitation in the revised manuscript and outlined potential future directions to overcome these challenges, including the need to develop comparable imaging techniques and standardized brain atlases across a wider range of species to enhance the model's applicability and broaden our understanding of cross-species neurodevelopmental patterns.

      On page 15, lines 11-18

      “However, the existing limitation should be noted regarding the generalizability of our proposed approach for cross-species brain comparison. Our current model relies on homologous brain atlases, and the lack of comparable atlases for other species restricts its broader applicability. To address this limitation, future research should focus on developing prediction models that do not depend on atlases. For instance, 3D convolutional neural networks could be trained directly on raw MRI data for age prediction. These deep learning models may offer greater flexibility for cross-species applications once the training within species is complete. Such advancements would significantly enhance the model's adaptability and expand its potential for comparative neuroscience studies across a wider range of species.”

      Question 2: Algorithm of prediction model. In the method section, the authors only described how they chose features, but provided no description of the algorithm (e.g., support vector regression) they used. Please add relevant descriptions to the methods.

      Thank you for your comment. We apologize for not providing sufficient details about the model training process in our initial submission. In our study, we used a linear regression model for prediction. We have provided more details regarding the algorithm of prediction model in our response to Reviewer #1. For your convenience, we have attached them below.

      For details on the algorithm of prediction model:

      “A linear regression model was adopted for intra- and inter-species age prediction. The linear regression model was built including the following three main steps: 1) Feature selection: a total of two steps are required to extract the final features. The first step is preliminary extraction. First, all the human or macaque participants were divided into 10-fold and 9-fold was used for model training and 1-fold for model test. The preliminary features were chosen by identifying the significantly age-associated features with p < 0.01 during calculating Pearson’s correlation coefficients between all the 260 features and actual ages of the 9-fold subjects. This process was repeated 100 times. Since we obtained not exactly the same preliminary features each time, we thus further analyzed the preliminary features using two methods to determine the final features: common features and minimum mean absolute error (min MAE). Common features are the preliminary features that were selected in all the 100 times during preliminary model training. The min MAE features were the preliminary features that with the smallest MAE value during the 100 times model test for predicting age. After the above feature selections, we obtained two sets of features: 62 macaque features and 225 human features (common features) and 117 macaque features and 239 human features (min MAE). In addition, to further exclude the influences of unequal number of features in human and macaque, we also selected the first 62 features in human and macaque to test the model prediction performances. 2) Model construction: we conducted age prediction linear model using 10-fold cross-validation based on the selected features for human and macaque separately. The linear model parameters are obtained using the training set data and applied to the test set for prediction. The above process is also repeated 100 times. 
3) Prediction: with the above results, we obtained the optimal linear prediction models for human and macaque. Next, we performed intra-species and inter-species brain age prediction, i.e., human model predicted human age, human model predicted macaque age, macaque model predicted macaque age and macaque model predicted human age. Three sets of features (62 macaque features and 225 human features; 117 macaque features and 239 human features; 62 macaque features and 62 human features) were used to test the prediction models for cross-validation and to exclude effects of different number of features in human and macaque. In the main text, we showed the results of brain age prediction, brain developmental and evolutional analyses based on common features and the results obtained using other two types of features were shown in supplementary materials. The prediction performances were evaluated by calculating the Pearson’s correlation and MAE between actual ages and predicted ages.”
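
      The three steps quoted above can be sketched in simplified form as follows. This is an illustration on simulated data under several assumptions: a single round of 10-fold cross-validation replaces the 100 repetitions, feature selection is repeated inside each training split, and the data, feature counts, and function names are hypothetical. It is not the authors' code.

```python
import numpy as np
from scipy import stats

def select_features(X, age, alpha=0.01):
    """Indices of features whose Pearson correlation with age has p < alpha."""
    pvals = []
    for j in range(X.shape[1]):
        _, p = stats.pearsonr(X[:, j], age)
        pvals.append(p)
    return np.flatnonzero(np.array(pvals) < alpha)

def fit_linear(X, y):
    """Least-squares coefficients; the first entry is the intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict_linear(beta, X):
    return beta[0] + X @ beta[1:]

def cross_validated_age_prediction(X, age, n_folds=10, alpha=0.01, seed=0):
    """Out-of-fold age predictions with feature selection inside each training split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(age))
    predicted = np.empty(len(age), dtype=float)
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        cols = select_features(X[train_idx], age[train_idx], alpha)
        beta = fit_linear(X[train_idx][:, cols], age[train_idx])
        predicted[test_idx] = predict_linear(beta, X[test_idx][:, cols])
    mae = float(np.mean(np.abs(predicted - age)))
    r, _ = stats.pearsonr(age, predicted)
    return predicted, mae, float(r)

# Simulated cohort: 200 subjects, 20 features, the first 5 of which track age.
rng = np.random.default_rng(42)
age = rng.uniform(8, 30, size=200)
X = rng.normal(size=(200, 20))
X[:, :5] += 0.1 * age[:, None]
_, mae, r = cross_validated_age_prediction(X, age)
print("MAE:", mae, "Pearson r:", r)
```

      Under the same assumptions, a cross-species prediction would apply the coefficients fitted on one species to the homologous features of the other, and the resulting predictions would be evaluated with the same MAE and Pearson correlation.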

      Question 3: Sex difference. The sex difference results are strange to me. For example, in the second row of Figure Supplement 3A, different models show different correlation patterns, but why are their Pearson's r values all equal to 0.3939? If these are only typos, please correct them. The authors claimed that they found no sex difference. However, the results in Figure Supplement 3 show that females seem to have poorer performance in predicting macaque age from the human model. Moreover, accumulating studies have reported sex differences in developing brains (Hines, 2011; Kurth et al., 2021). I think it is also worth discussing why sex differences cannot be found in the evolutionary effect.

      Reference:

      Hines, M. (2011). Gender development and the human brain. Annual review of neuroscience, 34, 69-88.

      Kurth, F., Gaser, C., & Luders, E. (2021). Development of sex differences in the human brain. Cognitive Neuroscience, 12(3-4), 155-162.

      It is recommended that the authors explore different prediction models for different species. Maybe macaques are suitable for linear prediction models, and humans are suitable for nonlinear prediction models.

      Thank you for pointing out the typos and for your comments on sex differences. In Figure Supplement 3A, the Pearson’s r values contained typos, which we have corrected in the updated Figure 2-figure supplement 3. For details, please see the updated Figure 2-figure supplement 3 and the following figure.

      Regarding gender effects, we acknowledge your point about the importance of gender differences in understanding brain evolution and development. In our study, however, our primary goal was to develop a robust age prediction model by maximizing the number of training samples. To mitigate gender-related effects in our main results, we incorporated gender information as a covariate in the ComBat harmonization process. We conducted the supplementary analysis, in which the data were separated by gender, only to demonstrate the stability of our proposed cross-species age prediction model, not to investigate gender differences. Although our results demonstrated that gender-specific models could still significantly predict chronological age, we refrained from emphasizing these models' performance in gender-specific species comparisons because the predicted gender differences are difficult to interpret. For cross-species prediction, it is not convincing that a higher Pearson’s r value between actual age and predicted age reflects more conserved evolution in males or females. In addition, we adopted the same, rather than different, prediction models for humans and macaques in order to establish a comparable model between species. Generally speaking, a nonlinear model can achieve better prediction accuracy than a linear model. If different species used different models, cross-species prediction would be unfair. Importantly, our study aimed to develop a new index based on the same prediction models to quantify differences in brain evolution, i.e., the brain cross-species age gap (BCAP), instead of relying on traditional statistical analyses. Different prediction models for different species may introduce bias caused by the prediction methods and thus affect the accuracy of BCAP. Thus, for cross-species prediction in this study, we adopted the linear model with the best intra-species prediction performance. Our main goal in this study was to set up a stable cross-species prediction model, and the models built using either male or female subjects performed well during cross-species prediction. However, as you point out, how to characterize evolutionary gender differences without bias using machine learning approaches needs further investigation, since there are many reports of gender differences in the developing human brain. In fact, whether macaque brains show the same gender differences as human brains is an interesting scientific question worth studying. Thus, we have included a discussion of how machine learning methods could be used to study evolutionary gender differences in our revised manuscript.

      On page 15, lines 18-23 and page 16, lines 1-4

      “Many studies have reported sex differences in developing human brains (Hines, 2011; Kurth, Gaser, & Luders, 2021), however, whether macaque brains have similar sex differences as humans is still unknown. We used a machine learning method for cross-species prediction to quantify brain evolution and the established prediction models are stable even when only using male or female data, which may indicate that the proposed cross-species prediction model has no evolutionary sex difference. Although the stable prediction model can be established in either male or female participants for cross-species prediction, this indeed does not mean that there are no evolutionary sex differences due to lack of quantitative comparative analysis. In the future, we need to develop more objective, quantifiable and stable index for studying sex differences using machine learning methods to further identify sex differences in the evolved brain.”

      Reviewer #3 (Public Review):

      The authors identified a series of WM and GM features that correlated with age in human and macaque structural imaging data. The data was gathered from the HCP and WA studies, which was parcellated in order to yield a set of features. Features that correlated with age were used to train predictive intra and inter-species models of human and macaque age. Interestingly, while each model accurately predicted the corresponding species age, using the macaque model to predict human age was more accurate than the inverse (using the human model to predict macaque age). In addition, the prediction error of the macaque model in predicting human age increased with age, whereas the prediction error of the human model predicting macaque age decreased with age.

      After elaboration of the predictive models, the authors classified the features for prediction into human-specific, macaque-specific and common to human and macaque, where they most notably found that macaque-only and common human-macaque areas were located mainly in gray matter, with only a few human-specific features found in gray matter. Furthermore, the authors found significant correlations between BCAP and picture vocabulary (positive correlation) test and visual sensitivity (negative correlation) test. Several white matter tracts (AF, OR, SLFII) were also identified showing a correlation with BCAP.

      Thank you for providing this excellent summary. We appreciate your thorough review and concise overview of our work.

      STRENGTHS AND WEAKNESSES

      The paper brings an interesting perspective on the evolutionary trajectories of human and non-human primate brain structure, and its relation to behavior and cognition. Overall, the methods are robust and support the theoretical background of the paper. However, the overall clarity of the paper could be improved. There are many convoluted sentences and there seems to be both repetition across the different sections and unclear or missing information. For example, the Introduction does not clearly state the research questions, rather just briefly mentions research gaps existing in the literature and follows by describing the experimental method. It would be desirable to clearly state the theoretical background and research questions and leave out details on methodology. In addition, the results section repeats a lot of what is already stated in the methods. This could be further simplified and make the paper much easier to read.

      In the discussion, authors mention that "findings about cortex expansion are inconsistent and even contradictory", a more convincing argument could be made by elaborating on why the cortex expansion index is inadequate and how BCAP is more accurate.

      Thank you for highlighting the interesting aspects of our work. We apologize for the lack of clarity in certain parts of our manuscript. Following your valuable suggestions, we have revised the manuscript to reduce unnecessary repetition and to state our research question more clearly in the Introduction. Specifically, unlike previous comparative neuroscience analyses of human and macaque evolution, this study embeds the chronological axis into the cross-species evolutionary analysis. We constructed linear prediction models of brain age for humans and macaques and quantitatively described the degree of evolution. The brain-structure-based cross-species age prediction model and the cross-species brain age differences proposed in this study further eliminate the inherent developmental effects of humans and macaques on cross-species evolutionary comparisons, providing new perspectives and approaches for studying cross-species development. Regarding the repetition in the Results section, we have simplified the text for clarity. Regarding the comparison between the cortex expansion index and BCAP, we would like to emphasize that the cortex expansion index was derived without fully considering cross-species alignment along the chronological axis. Specifically, this index does not correspond to a specific developmental stage, but rather focuses on a direct comparison between the two species. In contrast, BCAP addresses this limitation by utilizing a prediction model to establish alignment (or misalignment) between species at the individual level. Therefore, BCAP may serve as a more flexible and nuanced tool for cross-species brain comparison.

      STUDY AIMS AND STRENGTH OF CONCLUSIONS

      Overall, the methods are robust and support the theoretical background of the paper, but it would be good to state the specific research questions -even if exploratory in nature- more specifically. Nevertheless, the results provide support for the research aims.

      Thank you for the excellent suggestion. We have revised our Introduction to state the specific research question, as mentioned above.

      IMPACT OF THE WORK AND UTILITY OF METHODS AND DATA TO THE COMMUNITY

      This study is a good first step in providing a new insight into the neurodevelopmental trajectories of humans and non-human primates besides the existing cortical expansion theories.

      Thank you for your encouraging comment.

      ADDITIONAL CONTEXT:

      It should be clearly stated both in the abstract and methods that the data used for the experiment came from public databases.

      Thank you for your suggestion. We have added this information to both the Abstract and the Methods. For details, please see page 2, line 9 in the Abstract section; page 16, lines 10-11 and page 17, lines 6-10 in the Materials and Method section.

    1. Author response:

      Reviewer #1 - Public Review

      This report describes work aiming to delineate multi-modal MRI correlates of psychopathology from a large cohort of children of 9-11 years from the ABCD cohort. While uni-modal characterisations have been made, the authors rightly argue that multi-modal approaches in imaging are vital to comprehensively and robustly capture modes of large-scale brain variation that may be associated with pathology. The primary analysis integrates structural and resting-state functional data, while post-hoc analyses on subsamples incorporate task and diffusion data. Five latent components (LCs) are identified, with the first three, corresponding to p-factor, internal/externalising, and neurodevelopmental Michelini Factors, described in detail. In addition, associations of these components with primary and secondary RSFC functional gradients were identified, and LCs were validated in a replication sample via assessment of correlations of loadings.

      1.1) This work is clearly novel and a comprehensive study of associations within this dataset. Multi-modal analyses are challenging to perform, but this work is methodologically rigorous, with careful implementation of discovery and replication assessments, and primary and exploratory analyses. The ABCD dataset is large, and behavioural and MRI protocols seem appropriate and extensive enough for this study. The study lays out comprehensive associations between MRI brain measures and behaviour that appear to recapitulate the established hierarchical structure of psychopathology.

      We thank Reviewer 1 for appreciating our methods and findings, and we address their suggestions below:

      1.2) The work does have weaknesses, some of them acknowledged. There is limited focus on the strength of observed associations. While the latent component loadings seem reliably reproducible in the behavioural domain, this is considerably less the case in the imaging modalities. A considerable proportion of statistical results focuses on spatial associations in loadings between modalities - it seems likely that these reflect intrinsic correlations between modalities, rather than associations specific to any latent component.

      We appreciate the Reviewer’s comment, and have minimized the reporting of correlations between the loadings from the different modalities in the revised Results (specifically subsections on LC1, LC2, and LC3). We now refer to Table S4 in each subsection for this information: “Spatial correlations between modality-specific loadings are reported in Supplementary file 1c.”

      For completeness, we report the intrinsic correlations between the different modalities in Supplementary file 1c (P.19):

      “Lastly, although the current work aimed to reduce intrinsic correlations between variables within a given modality through running a PCA before the PLS approach, intrinsic correlations between measures and modalities may potentially be a remaining factor influencing the PLS solution. We, thus, provided an additional overview of the intrinsic correlations between the different neuroimaging data modalities in the supporting results (Supplementary file 1c).”

      1.3) Assessment of associations with functional gradients is similarly a little hard to interpret. Thus, it is hard to judge the implications for our understanding of the neurophysiological basis of psychopathology and the ability of MRI to provide clinical tools for, say, stratification.

      We now provide additional context, including a rising body of theoretical and empirical work, that outlines the value of functional gradients and cortical hierarchies in the understanding of brain development and psychopathology. Please see P.26.

      “Initially demonstrated at the level of intrinsic functional connectivity (Margulies et al., 2016), follow up work confirmed a similar cortical patterning using microarchitectural in-vivo MRI indices related to cortical myelination (Burt et al., 2018; Huntenburg et al., 2017; Paquola et al., 2019), post-mortem cytoarchitecture (Goulas et al., 2018; Paquola et al., 2020, 2019), or post-mortem microarray gene expression (Burt et al., 2018). Spatiotemporal patterns in the formation and maturation of large-scale networks have been found to follow a similar sensory-to-association axis; moreover, there is the emerging view that this framework may offer key insights into brain plasticity and susceptibility to psychopathology (Sydnor et al., 2021). In particular, the increased vulnerability of transmodal association cortices in late childhood and early adolescence has been suggested to relate to prolonged maturation and potential for plastic reconfigurations of these systems (Paquola et al., 2019; Park et al., 2022b). Between mid-childhood and early adolescence, heteromodal association systems such as the default network become progressively more integrated among distant regions, while being more differentiated from spatially adjacent systems, paralleling the development of cognitive control, as well as increasingly abstract and logical thinking. [...] This suggests that neurodevelopmental difficulties might be related to alterations in various processes underpinned by sensory and association regions, as well as the macroscale balance and hierarchy of these systems, in line with previous findings in several neurodevelopmental conditions, including autism, schizophrenia, as well as epilepsy, showing a decreased differentiation between the two anchors of this gradient (Hong et al., 2019). In future work, it will be important to evaluate these tools for diagnostics and population stratification. 
In particular, the compact and low-dimensional perspective of gradients may prove beneficial in terms of biomarker reliability as well as phenotypic prediction, as previously demonstrated using typically developing cohorts (Hong et al., 2020). On the other hand, it will be of interest to explore to what extent alterations in connectivity along sensory-to-transmodal hierarchies provide sufficient graduality to differentiate between specific psychopathologies, or whether they, as the current work suggests, mainly reflect risk for general psychopathology and atypical development.”

      1.4) The observation of a recapitulation of psychopathology hierarchy may be somewhat undermined by the relatively modest strength of the components in the imaging domain.

      We thank the Reviewer for this comment, and have now expressed this limitation in the revised Discussion, P.23.

      “The p factor, internalizing, externalizing, and neurodevelopmental dimensions were each associated with distinct morphological and intrinsic functional connectivity signatures, although these relationships varied in strength.”

      1.5) The task fMRI was assessed with a fairly basic functional connectivity approach, not using task timings to more specifically extract network responses.

      In the revised Discussion on P.24, we acknowledge that more in-depth analyses of task-based fMRI may have offered additional insights into state-dependent changes in functional architecture.

      “While the current work derived main imaging signatures from resting-state fMRI as well as grey matter morphometry, we could nevertheless demonstrate associations to white matter architecture (derived from diffusion MRI tractography) and recover similar dimensions when using task-based fMRI connectivity. Despite subtle variations in the strength of observed associations, the latter finding provided additional support that the different behavioral dimensions of psychopathology more generally relate to alterations in functional connectivity. Given that task-based fMRI data offers numerous avenues for analytical exploration, our findings may motivate follow-up work assessing associations to network- and gradient-based response strength and timing with respect to external stimuli across different functional states.”

      1.6) Overall, the authors achieve their aim to provide a detailed multimodal characterisation of MRI correlations of psychopathology. Code and data are available and well organised and should provide a valuable resource for researchers wanting to understand MRI-based neural correlates of psycho-pathology-related behavioural traits in this important age group. It is largely a descriptive study, with comparisons to previous uni-modal work, but without particularly strong testing of neuroscience hypotheses.

      We thank the Reviewer for recognizing the detail and rigor of our data-driven study and its extensive code and data documentation.

      Reviewer #2 - Public Review

      In "Multi-modal Neural Correlates of Childhood Psychopathology" Kebets et al. integrate multi-modal neuroimaging data using machine learning to delineate dissociable links to diverse dimensions of psychopathology in the ABCD sample. This paper had numerous strengths including a superb use of a large resource dataset, appropriate analyses, beautiful visualizations, clear writing, and highly interpretable results from a data-driven analysis. Overall, I think it would certainly be of interest to a general readership. That being said, I do have several comments for the authors to consider.

      We thank Dr Satterthwaite for the positive evaluation and helpful comments.

      2.1) Out-of-sample testing: while the permutation testing procedure for the PLS is entirely appropriate, without out-of-sample testing the reported effect sizes are likely inflated.

      As discussed in the editorial summary of essential revisions, we agree that out-of-sample prediction indeed provides stronger estimates of generalizability. We assess this by applying the PCA coefficients derived from the discovery cohort imaging data to the replication cohort imaging data. The resulting PCA scores and behavioral data were then z-scored using the mean and standard deviation of the replication cohort. The SVD weights derived from the discovery cohort were applied to the normalized replication cohort data to derive imaging and behavioral composite scores, which were used to recover the contribution of each imaging and behavioral variable to the LCs (i.e., loadings). Out-of-sample replicability of imaging (mean r=0.681, S.D.=0.131) and behavioral (mean r=0.948, S.D.=0.022) loadings was generally high across LCs 1-5. This analysis is reported in the revised manuscript (P.18).

      “Generalizability of reported findings was also assessed by directly applying PCA coefficients and latent component weights from the PLS analysis performed in the discovery cohort to the replication sample data. Out-of-sample prediction was overall high across LCs 1-5 for both imaging (mean r=0.681, S.D.=0.131) and behavioral (mean r=0.948, S.D.=0.022) loadings.”
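
      The projection described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' pipeline: the function names and array shapes are hypothetical, and loadings are assumed here to be Pearson correlations between the original variables and the composite scores.

```python
import numpy as np

def zscore(a):
    """Column-wise z-score using the sample's own mean and SD."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def column_corr(a, b):
    """Pearson correlation of every column of a with every column of b."""
    return zscore(a).T @ zscore(b) / (len(a) - 1)

def out_of_sample_replicability(img_disc, beh_disc, img_rep, beh_rep, n_pcs, n_lcs):
    # PCA coefficients are estimated on the discovery imaging data only.
    mu = img_disc.mean(axis=0)
    _, _, vt = np.linalg.svd(img_disc - mu, full_matrices=False)
    coeff = vt[:n_pcs].T

    # PLS weights: SVD of the discovery cross-covariance matrix.
    x_disc = zscore((img_disc - mu) @ coeff)
    y_disc = zscore(beh_disc)
    cross_cov = x_disc.T @ y_disc / (len(x_disc) - 1)
    u, _, vt_beh = np.linalg.svd(cross_cov, full_matrices=False)
    u, v = u[:, :n_lcs], vt_beh.T[:, :n_lcs]

    # Replication data: discovery PCA coefficients, then z-scoring with
    # the replication cohort's own mean and SD.
    x_rep = zscore((img_rep - mu) @ coeff)
    y_rep = zscore(beh_rep)

    # Composite scores, then loadings (assumed: variable-score correlations).
    img_load_disc = column_corr(img_disc, x_disc @ u)
    img_load_rep = column_corr(img_rep, x_rep @ u)
    beh_load_disc = column_corr(beh_disc, y_disc @ v)
    beh_load_rep = column_corr(beh_rep, y_rep @ v)

    def per_lc(a, b):
        # Correlation between discovery and replication loadings, per LC.
        return np.array([np.corrcoef(a[:, k], b[:, k])[0, 1] for k in range(n_lcs)])

    return per_lc(img_load_disc, img_load_rep), per_lc(beh_load_disc, beh_load_rep)
```

      The two returned arrays hold one correlation per latent component; averaging them would give summary values of the kind reported above, assuming the same definition of loadings.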

      2.2) Site/family structure: it was unclear how site/family structure were handled as covariates.

      Only unrelated participants were included in discovery and replication samples (see P.6). The site variable was regressed out of the imaging and behavioral data prior to the PLS analysis using the residuals from a multiple linear model which also included age, age², sex, and ethnicity. This is now clarified on P.29:

      “Prior to the PLS analysis, effects of age, age², sex, site, and ethnicity were regressed out from the behavioral and imaging data using a multiple linear regression to ensure that the LCs would not be driven by possible confounders (Kebets et al., 2021, 2019; Xia et al., 2018). The imaging and behavioral residuals of this procedure were input to the PLS analysis.”
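
      As an illustration of this confound-regression step, the sketch below residualizes a data matrix against age, age², sex, site, and ethnicity with ordinary least squares. It is a hedged example rather than the authors' code: the helper names (`one_hot`, `build_confounds`, `regress_out`) are hypothetical, categorical covariates are assumed to be dummy-coded, and age is mean-centered before squaring purely for numerical stability (this leaves the residuals unchanged because the model includes an intercept).

```python
import numpy as np

def one_hot(labels):
    """Dummy-code a categorical vector, dropping the first level."""
    levels, idx = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    return np.eye(len(levels))[idx][:, 1:]

def build_confounds(age, sex, site, ethnicity):
    """Stack age, age squared, and dummy-coded categorical covariates."""
    age_c = np.asarray(age, dtype=float) - np.mean(age)
    return np.column_stack(
        [age_c, age_c ** 2, one_hot(sex), one_hot(site), one_hot(ethnicity)]
    )

def regress_out(data, confounds):
    """Return the residuals of data after OLS on confounds plus an intercept."""
    design = np.column_stack([np.ones(len(confounds)), confounds])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta
```

      The residual matrices for imaging and behavior would then be the inputs to the PLS step.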

      2.3) Anatomical features: I was a bit surprised to see volume, surface area, and thickness all evaluated - and that there were several comments on the correspondence between the SA and volume in the results section. Given that cortical volume is simply a product of SA and CT (and mainly driven by SA), this result may be pre-required.

      As suggested, we reduced the reporting of correlations between the loadings from the different modalities in the revised Results (specifically subsections on LC1, LC2, and LC3). Instead, we now refer to Table S4 in each subsection for this information: “Spatial correlations between modality-specific loadings are reported in Supplementary file 1c.”

      We also reran the PLS analysis while only including thickness and surface area as our structural metrics, to account for potential redundancy of these measures with volume. This analysis and associated findings are reported on P.36 and P.19:

      “As cortical volume is a result of both thickness and surface area, we repeated our main PLS analysis while excluding cortical volume from our imaging metrics and report the consistency of these findings with our main model.”

      “Third, to account for redundancy within structural imaging metrics included in our main PLS model (i.e., cortical volume is a result of both thickness and surface area), we also repeated our main analysis while excluding cortical volume from our imaging metrics. Findings were very similar to those in our main analysis, with an average absolute correlation of 0.898±0.114 across imaging composite scores of LCs 1-5.”
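
      A consistency summary of this kind (mean ± SD of absolute correlations across LCs) can be computed as in the sketch below. The function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used. Absolute values are taken because the sign of an SVD-derived component is arbitrary.

```python
import numpy as np

def composite_score_consistency(scores_main, scores_alt):
    """Absolute Pearson r per latent component between two models' composite scores.

    Both inputs are (participants x latent components) arrays computed on the
    same participants, one from the main model and one from the alternative.
    """
    n_lcs = scores_main.shape[1]
    r = np.array([
        abs(np.corrcoef(scores_main[:, k], scores_alt[:, k])[0, 1])
        for k in range(n_lcs)
    ])
    return r.mean(), r.std(ddof=1), r
```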

      2.4) Ethnicity: the rationale for regressing ethnicity from the data was unclear and may conflict with current best practices.

      We thank the Reviewer for this comment. In light of recent discussions on including this covariate in large datasets such as ABCD (e.g., Saragosa-Harris et al., 2022), we elaborate on our rationale for including this variable in our model in the revised manuscript on P.30:

      “Of note, the inclusion of ethnicity as a covariate in imaging studies has been recently called into question. In the present study, we included this variable in our main model as a proxy for social inequalities relating to race and ethnicity alongside biological factors (age, sex) with documented effects on brain organization and neurodevelopmental symptomatology queried in the CBCL.”

      We also assess the replicability of our analyses when removing race and ethnicity covariates prior to computing the PLS analysis and correlating imaging and behavioral composite scores across both models. We report resulting correlations in the revised manuscript (P.37, 19, and 27):

      “We also assessed the replicability of our findings when removing race and ethnicity covariates prior to computing the PLS analysis and correlating imaging and behavioral composite scores across both models.”

      “Moreover, repeating the PLS analysis while excluding this variable as a model covariate yielded overall similar imaging and behavioral composite scores across LCs to our original analysis. Across LCs 1-5, the average absolute correlations reached r=0.636±0.248 for imaging composite scores, and r=0.715±0.269 for behavioral composite scores. Removing these covariates seemed to exert stronger effects on LC3 and LC4 for both imaging and behavior, as lower correlations across models were specifically observed for these components.”

      “Although we could consider some socio-demographic variables and proxies of social inequalities relating to race and ethnicity as covariates in our main model, the relationship of these social factors to structural and functional brain phenotypes remains to be established with more targeted analyses.”

      2.5) Data quality: the authors did an admirable job in controlling for data quality in the analyses of functional connectivity data. However, it is unclear if a comparable measure of data quality was used for the T1/dMRI analyses. This likely will result in inflated effect sizes in some cases; it has the potential to reduce sensitivity to real effects.

      We agree that data quality was not accounted for in our analysis of T1w- and diffusion-derived metrics. We have now accounted for T1w image quality by adding manual quality control ratings to the regressors applied to all structural imaging metrics prior to performing the PLS analysis, and report the consistency of this new model with the original findings. See P.36, P.19:

      “We also considered manual quality control ratings as a measure of T1w scan quality. This metric was included as a covariate in a multiple linear regression model accounting for potential confounds in the structural imaging data, in addition to age, age², sex, site, ethnicity, ICV, and total surface area. Downstream PLS results were then benchmarked against those obtained from our main model.”

      “Considering scan quality in T1w-derived metrics (from manual quality control ratings) yielded similar results to our main analysis, with an average correlation of 0.986±0.014 across imaging composite scores.”

      As for diffusion imaging, we also regressed out effects of head motion in addition to age, age², sex, site, and ethnicity from FA and MD measures and reported the consistency with our original results (P.36, P.19):

      “We tested another model which additionally included head motion parameters as regressors in our analyses of FA and MD measures, and assessed the consistency of findings from both models.”

      “Additionally considering head motion parameters from diffusion imaging metrics in our model yielded results consistent with those in our main analyses (mean r=0.891, S.D.=0.103; r=0.733-0.998).”

      Reviewer #3 - Public Review

      In this study, the authors utilized the Adolescent Brain Cognitive Development dataset to investigate the relationship between structural and functional brain network patterns and dimensions of psychopathology. They identified multiple components, including a general psychopathology (p) factor that exhibited a strong association with multimodal imaging features. The connectivity signatures associated with the p factor and neurodevelopmental dimensions aligned with the sensory-to-transmodal axis of cortical organization, which is linked to complex cognition and psychopathology risk. The findings were consistent across two separate subsamples and remained robust when accounting for variations in analytical parameters, thus contributing to a better understanding of the biological mechanisms underlying psychopathology dimensions and offering potential brain-based vulnerability markers.

      3.1) An intriguing aspect of this study is the integration of multiple neuroimaging modalities, combining structural and functional measures, to comprehensively assess the covariance with various symptom combinations. This approach provides a multidimensional understanding of the risk patterns associated with mental illness development.

      We thank the Reviewer for acknowledging the multimodal approach, and for the constructive suggestions.

      3.2) The paper delves deeper into established behavioral latent variables such as the p factor, internalizing, externalizing, and neurodevelopmental dimensions, revealing their distinct associations with morphological and intrinsic functional connectivity signatures. This sheds light on the neurobiological underpinnings of these dimensions.

      We are happy to hear the Reviewer appreciates the gain in understanding neural underpinnings of dimensions of psychopathology resulting from the current work.

      3.3) The robustness of the findings is a notable strength, as they were validated in a separate replication sample and remained consistent even when accounting for different parameter variations in the analysis methodology. This reinforces the generalizability and reliability of the results.

      We appreciate that the Reviewer found our robustness and generalizability assessment convincing.

      3.4) Based on their findings, the authors suggest that the observed variations in resting-state functional connectivity may indicate shared neurobiological substrates specific to certain symptoms. However, it should be noted that differences in resting-state connectivity between groups can stem from various factors, as highlighted in the existing literature. For instance, discrepancies in the interpretation of instructions during the resting state scan can influence the results. Hence, while their findings may indicate biological distinctions, they could also reflect differences in behavior.

      For the ABCD dataset, resting-state fMRI scans were based on eyes open and passive viewing of a crosshair, and are thus homogenized. We acknowledge, however, that there may still be state-to-state fluctuations contributing to the findings, and this is now discussed in the revised Discussion, on P.28. Note, however, that prior literature has generally also suggested rather modest impacts of cognitive and daily variation on resting-state functional networks, compared to much more dominating inter-individual and inter-group factors.

      “Finally, while prior research has shown that resting-state fMRI networks may be affected by differences in instructions and study paradigm (e.g., with respect to eyes open vs closed) (Agcaoglu et al., 2019), the resting-state fMRI paradigm is homogenized in the ABCD study to be passive viewing of a centrally presented fixation cross. It is nevertheless possible that there were slight variations in compliance and instructions that contributed to differences in associated functional architecture. Notably, however, there is a mounting literature based on high-definition fMRI acquisitions suggesting that functional networks are mainly dominated by common organizational principles and stable individual features, with substantially more modest contributions from task-state variability (Gratton et al. 2018). These findings, thus, suggest that resting-state fMRI markers can serve as powerful phenotypes of psychiatric conditions, and potential biomarkers (Abraham et al., 2017; Gratton et al., 2020; Parkes et al., 2020).”

      3.5) The authors conducted several analyses to investigate the relationship between imaging loadings associated with latent components and the principal functional gradient. They found several associations between principal gradient scores and both within- and between-network resting-state functional connectivity (RSFC) loadings. Assessing the analysis presented here proves challenging due to the nature of relating loadings, which are partly based on the RSFC, to gradients derived from RSFC. Consequently, a certain level of correlation between these two variables would be expected, making it difficult to determine the significance of the authors' findings. It would be more intriguing if a direct correlation between the composite scores reflecting behavior and the gradients were to yield statistically significant results.

      We thank the Reviewer for the comment, and agree that investigating gradient-behavior relationships could offer additional insights into the neural basis of psychiatric symptomatology. However, the current analysis pipeline precludes this direct comparison which is performed on a region-by-region basis across the span of the cortical gradient. Indeed, the behavioral loadings are provided for each CBCL item, and not cortical regions.

      The Reviewer also raises concerns of potential circularity in our analysis, as we compared imaging loadings, which are partially based on RSFC, and gradient values generated from the same RSFC data. In response to this comment, we cross-validated our findings using an RSFC gradient derived from an independent dataset (HCP), which yielded findings highly consistent with those presented in the manuscript. This correlation is now reported in the Results section, P.15.

      “A similar pattern of findings was observed when cross-validating between- and within-network RSFC loadings to a RSFC gradient derived from an independent dataset (HCP), with strongest correlations seen for between-network RSFC loadings for LC1 and LC3 (LC1: r=0.50, pspin<0.001; LC3: r=0.37, pspin<0.001).”

      We furthermore note similar correlations between imaging loadings and T1w/T2w ratio in the same participants, a proxy of intracortical microstructure and hierarchy (Glasser et al., 2011). These findings are now detailed in the revised Results, P.15-16:

      “Of note, we obtain similar correlations when using T1w/T2w ratio in the same participants, a proxy of intracortical microstructure and hierarchy (Glasser et al., 2011). Specifically, we observed the strongest association between this microstructural marker of the cortical hierarchy and between-network RSFC loadings related to LC1 (r=-0.43, pspin<0.001).”

      3.6) Lastly, regarding the interpretation of the first identified latent component, I have some reservations. Upon examining the loadings, it appears that LC1 primarily reflects impulse control issues rather than representing a comprehensive p-factor. Furthermore, it is worth noting that within the field, there is an ongoing debate concerning the interpretation and utilization of the p-factor. An insightful publication on this topic is "The p factor is the sum of its parts, for now" (Fried et al, 2021), which explains that the p-factor emerges as a result of a positive manifold, but it does not necessarily provide insights into the underlying mechanisms that generated the data.

      We thank the Reviewer for this comment, and have added greater nuance to the discussion of the association to the p factor. We furthermore discuss some of the ongoing debate about the use of the p factor, and cite the recommended publication on P.27.

      “Other factors have also been suggested to impact the development of psychopathology, such as executive functioning deficits, earlier pubertal timing, negative life events (Brieant et al., 2021), maternal depression, or psychological factors (e.g., low effortful control, high neuroticism, negative affectivity). Inclusion of such data could also help to add further mechanistic insights into the rather synoptic proxy measure of the p factor itself (Fried et al., 2021), and to potentially assess shared and unique effects of the p factor vis-à-vis highly correlated measures of impulse control.”

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for going through our manuscript and providing valuable feedback. We are grateful to all 3 reviewers for describing our findings as important and valuable, well-designed and robust, and of value to the Parkinson's and Crohn's disease communities studying LRRK2. Below we detail a point-by-point response to the reviewers.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The paper by Dikovskaya and collaborators investigated the activity and expression of LRRK2 in different subtypes of splenic and intestinal immune cells, taking advantage of a novel GFP-Lrrk2 knockin mouse. Interestingly, they found that T-cell-released IL-4 stimulates Lrrk2 expression in B cells. I have a few comments and suggestions for the authors.

      1) Figure 1C. LRRK2 KO cells display residual Rab10 phosphorylation. Do the authors have any idea of which kinase other than LRRK2 could be involved in this phosphorylation?

      As far as we are aware, no other kinase is known to phosphorylate Rab10 at T73 in vivo. In vitro, recombinant Rab10 can be phosphorylated by MST3 at this site (Knebel A. et al, protocols.io https://dx.doi.org/10.17504/protocols.io.bvjxn4pn), but its relevance in vivo or in cells has not been shown. It is possible that the residual band recognised by the anti-pT73 Rab10 antibody in splenocytes is nonspecific background, as it is mainly seen in LRRK2 KO spleen cells and not in other tissues. To be certain that our assay assesses LRRK2-dependent Rab10 phosphorylation, we have always compared with the MLi-2 control.

      2) Since there are no good antibodies for IF/IHC as pointed by the authors, the GFP-Lrrk2 mouse gives the opportunity to check endogenous LRRK2 localization, i.e. in cells untreated or treated with IL-4 or other cytokines. Also, does endogenous GFP-LRRK2 accumulate into filaments/puncta upon MLi2 inhibition? The relocalization into filaments of inhibited LRRK2 has been observed in overexpression but not under endogenous expression. This analysis would be interesting also in light of the observed side effect of type-I inhibitors.

      We thank the reviewer for this suggestion. We will attempt super-resolution microscopy using Airyscan with isolated B-cells treated with cytokine and/or LRRK2 inhibitor to address this question.

      3) Figure 5. The authors need to label more clearly the graphs referring to wt mice versus GFP-Lrrk2 KI mice.

      We have now labelled the panels referring to the WT mice only with "WT mice", to distinguish them from the other panels that incorporate data from both EGFP-Lrrk2 mice and their WT littermates used as a background.

      They should also replace GFP-LRRK2 with GFP-Lrrk2 since they edited the endogenous murine gene.

      Thank you, we have corrected it, and also the other mouse genotypes.

      4) In the material and methods MLi-2 administration in mice is indicated at 60 mg/kg for 2 hr whereas in suppl. figure 5 the indicated dose is 30 mg/kg. Please correct with the actual dose used.

      Thank you, we have corrected the mistake.

      5) The discovery of IL-4 as a Lrrk2 activator in B cells is a very interesting and novel finding. The authors could take advantage of the GFP tag to investigate LRRK2 interactome upon IL-4 stimulation (optional). Also, is the signaling downstream of IL-4 attenuated in Lrrk2 KO cells?

      We thank the reviewer for these interesting suggestions. The role of LRRK2 in IL-4 activated B-cells is currently under active research in the lab.

      Reviewer #1 (Significance (Required)):

      The manuscript is well designed and organized, and the experimental approaches are robust. These results are significant for the field as they add additional layers in the complex regulation and regulatory roles of LRRK2 in immunity, with implication for inflammatory disorders and Parkinson's disease.

      We thank the reviewer for their positive comments and for recognising our efforts to provide some clarity to a complex field.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors present a flow cytometry methodology to assess LRRK2 expression and pathway markers in mouse models and explore LRRK2 in splenic and intestinal immune cells. This is a highly valuable study given the emerging understanding that LRRK2 pathway activity in peripheral tissues may be of crucial importance to Parkinson's disease and Crohn's disease. P8 : the authors state that their results indicate 'that the effects of LRRK2-R1441C mutation and inflammation on LRRK2 activity represent two different parallel pathways'. This seems like an overinterpretation as pathway suggests the presence of additional partners in the pathway while R1441C is a LRRK2 intrinsic modification. The results can equally be explained by synergistic effects between both activation mechanisms (mutant and inflammation).

      We agree with the reviewer, and have added this into the text. The sentence now reads "suggesting that the LRRK2-R1441C mutation and inflammation have different impacts on LRRK2 activity, either in parallel or in synergy."

      Methods and experiment descriptions in results : the authors appear to use the terms anti-CD3 stimulation and CD3 stimulation interchangeably, although it is not always clear in the text that these are synonymous. This should be clarified.

      We thank the reviewer for pointing out this error on our part. We have made the necessary changes to always refer to the stimulation as anti-CD3.

      One major observation in this paper is that LRRK2 is not detected in gut epithelial cells as previously has been reported. It would be useful to comment on any differences between the presented protocol and the previous reports, in particular relating to the antigen retrieval step. In order to reinforce the finding, it would be useful to include in situ hybridization data that could further strengthen the observations of which cellular subtypes express LRRK2 and which do not. Indeed, while the KO control shows that there is an unacceptable high non-specific staining, it does not prove absence of expression. Also, can any conclusions be made about expression of LRRK2 in neural cells of the gut? This important information on LRRK2 detection in gut should be mentioned in the abstract and highlighted in the discussion.

      We thank the reviewer for pointing this out. In fact, we think the observation that LRRK2 is not detected in epithelial cells is so important that we have a separate manuscript exploring this point. Please see Tasegian, A. et al., https://doi.org/10.1101/2024.03.07.582590 (2024). In that manuscript we have explored the expression of LRRK2 in human and murine intestinal epithelial cells using qPCR. Although we do not have in situ hybridization data, we believe that using both the EGFP-LRRK2 and the pRab10 flow cytometry, as well as qPCR and proteomics on selected cell types, corroborates our findings on the cell types that express LRRK2. We did not analyse LRRK2 expression in the neural cells of the gut, as the focus was on the immune cells; however, we hope that others will use the tools developed here to explore this further.

      The authors mention in the discussion that they 'show for the first time that eosinophils also express active LRRK2 at levels comparable to B-cells and DCs.' The relevance of this finding should be further developed. Why is this important?

      We thank the reviewer for this point. We don't know how LRRK2 is important in these cells. However, as the role of LRRK2 in eosinophils and neutrophils has not yet been explored and both cell types play important roles in IBD, we think it is important to point out. We have now added a sentence to the discussion highlighting the importance of eosinophils in IBD: "Since eosinophils have recently been implicated as a key player in intestinal defense and colitis (Gurtner et al, 2022), it will be interesting to evaluate LRRK2 functions in these cells."

      In the isolation of lamina propria cells, what efforts were made to characterize the degree of purification of the lamina propria cells compared to cells of other gut wall layers such as epithelium, muscularis mucosa, or deeper layers? Please specify.

      Isolation of lamina propria cells is a very well-established process (LeFrancois and Lycke, 'Isolation of Mouse Small Intestinal Intraepithelial Lymphocytes, Peyer's Patch, and Lamina Propria Cells.' Curr. Protocols in Immunology 2001), where we extensively wash off the epithelial layer before digesting the tissue for the LP. After the digestion the muscle and wall of the gut are still intact, so we do not get any contamination with other deeper layers. The subsets of cells we find in the LP are in line with isolations from other labs.

      Minor comments: Figure 5G, for the graphs indicating LRRK2 activity and LRRK2 phosphorylation, the specific measures should be specified in the graph titles to avoid any ambiguity (pT73-Rab10, pS935-LRRK2).

      We have added the specifications to the new version of the figure.

      Suppl figure 1 : please specify the figure label and abbreviation AF568 in the legend. Suppl figure 2 : please specify the figure label and abbreviation anti-rb in the legend

      Thank you, we have added the abbreviations to the legends. The figure labels for both figures were already included at the top of the figure legends.

      Reviewer #2 (Significance (Required)):

      The authors present a flow cytometry methodology to assess LRRK2 expression and pathway markers in mouse models and explore LRRK2 in splenic and intestinal immune cells. This is a highly valuable study given the emerging understanding that LRRK2 pathway activity in peripheral tissues may be of crucial importance to Parkinson's disease and Crohn's disease.

      We thank the reviewer for recognising the value of this study.

      Reviewer #3

      Evidence, reproducibility and clarity

      The paper describes a set of experiments to analyse LRRK2 activity in tissues and, although it has very important findings and technical developments, is largely descriptive. It looks more like a collection of experiments than a defined hypothesis with experiments to address it.

      We thank the reviewer for recognising the importance of our findings and the technical developments. We agree that the paper's focus is to describe where LRRK2 is expressed in immune cells, and in which cells it is active or activated after inflammation, in a hypothesis-free, unbiased manner. We believe this is important data to share as a resource for the wider LRRK2 community and we will submit the manuscript as a Resource.

      The flow cytometry assay of the first part is a great technical challenge and represents the establishment of a potentially very useful tool for the field. It would have been important to test other organs, either as controls or for example because of their relevance e.g. lungs. This first part is disconnected from the second part below.

      We thank the reviewer for pointing out that the pRab10 assay would be useful to apply to other organs too. Since we are interested in the role of LRRK2 in IBD, we focused on applying the pRab10 assay to intestinal tissue, with spleens also analysed as a major lymphoid organ and a source of immune cells that can translocate to the gut in inflammation. We hope that the publication of this method will allow other researchers to analyse other tissues in the future.

      The authors generated a new KI mouse expressing EGFP-LRRK2, show data that the levels of LRRK2 expression are reduced in tissues to different degrees, and established a flow cytometry assay to measure LRRK2 expression by monitoring the GFP signal. Interestingly, they found that expression does not correlate with activity (as measured by phospho-Rabs). I suggest taking this part out as it breaks the flow of the paper. If data using this mouse are included, then microscopy should be included to complement the flow cytometry data. I understand the mice were used later with the anti-CD3 treatment, but it is very confusing that some experiments are done with EGFP-LRRK2 mice and others not. It does look in general like the mice do not behave as wild types and this is an important caveat. Without microscopy of the tissues or even cells (Figure 4), it is hard to conclude much about these experiments.

      We thank the reviewer for this point and would like to explain. It is true that in Suppl Figure 5, we show reduction of LRRK2 signal in the EGFP-Lrrk2-KI mice. However, based on immunoblotting, a significant reduction in EGFP-LRRK2 expression levels was seen only in the brain, but not in the tissues we analysed, that is the spleen and the intestine. Further, we have shown clearly using proteomics (Fig. 3D and 5E), that the GFP signal in immune cells correlates very well with the WT LRRK2 expression. Therefore, we think that the GFP signal in these mice reflects WT LRRK2 expression pattern. Further, despite the limitations of reduced kinase activity that we thoroughly describe, we think this model is very useful since no antibodies work to stain for LRRK2 in mice. We therefore respectfully disagree with this reviewer that the EGFP-LRRK2 data should be taken out, as it has proven to be an invaluable tool to measure and track changes in endogenous LRRK2 expression. Moreover, we think the fact that LRRK2 expression does not correlate with levels of activity, that is, LRRK2 is more active in some immune cells than in others, is a very important finding that evidences the cell-specific regulation of LRRK2 activity beyond its expression level.

      We tried but failed to visualize the EGFP-LRRK2 signal in the tissue using fluorescence microscopy. This is most likely due to the low expression of LRRK2 (proteomics data suggest that even neutrophils express fewer than 9000 copies), confounded further by the high background autofluorescence in tissues, especially in the gut. We now explain the lack of tissue images from the EGFP-LRRK2 mice in the text. However, we can visualize EGFP-LRRK2 in B cells, and we will provide these images in a revised version of the manuscript.

      We have also added the following paragraph to the discussion:

      "We complemented the pRab10 assay with the development of the EGFP-Lrrk2-KI reporter mouse. Although the reporter was initially designed as a fluorescent tracker for imaging LRRK2 localisation in cells and tissues, the low expression of LRRK2, combined with high and variable autofluorescence in tissues, precluded its use for microscopy. Even in neutrophils, which express the highest level of LRRK2 among immune cells, there are fewer than 9000 copies of LRRK2 per cell (Sollberger et al, 2024), making it difficult to identify localization. However, the EGFP signal was sufficient for flow cytometry-based measurements, where background autofluorescence of each cell type was taken into account and subtracted."

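      The per-cell-type background subtraction mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our analysis pipeline: it assumes that autofluorescence is estimated from wild-type cells of the same type and that medians are compared, and the function name and all intensity values are hypothetical.

```python
from statistics import median


def background_subtracted_signal(reporter, wild_type):
    """Median EGFP-channel intensity of reporter cells minus the median
    intensity of wild-type cells of the same type (assumed to represent
    autofluorescence only)."""
    return median(reporter) - median(wild_type)


# Hypothetical EGFP-channel intensities (arbitrary units) per cell type.
reporter_cells = {
    "neutrophil": [520, 540, 510, 560],
    "B cell": [300, 310, 290, 305],
}
wild_type_cells = {
    "neutrophil": [400, 410, 395, 405],
    "B cell": [250, 255, 245, 260],
}

# Subtract the background separately for each cell type.
signal = {
    cell_type: background_subtracted_signal(
        reporter_cells[cell_type], wild_type_cells[cell_type]
    )
    for cell_type in reporter_cells
}
```

      Subtracting the background within each cell type, rather than using one global value, matters because autofluorescence differs between cell types.
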
      Then the authors show that LRRK2 expression and activity is different in different cell types and depends on inflammation. The anti-CD3 strategy to induce inflammation is very different from physiological inflammation such as sepsis and LPS stimulation, so experiments with other stimuli could be important here to contribute to the message of inflammatory trigger of LRRK2 activation and decoupling of cell type.

      We thank the reviewer for this suggestion. We used the anti-CD3 model as it also causes intestinal inflammation and mimics the T-cell cytokine storms that happen in many diseases. However, for the revisions we will also test another model of inflammation as suggested, such as LPS stimulation, to measure how inflammation affects LRRK2 expression and activity.

      The IL-4 data is intriguing but too preliminary. The lack of a strong effect of IFN-gamma is expected, as the promoter of LRRK2 in mice and humans is different and human cells respond much better with regard to LRRK2 expression after IFN-gamma stimulation.

      We are confused by what the reviewer means by saying the IL-4 data is preliminary. We have shown by flow cytometry, immunoblotting, qPCR and proteomics that IL-4 induced LRRK2 expression in B-cells. So we are uncertain as to how else this can be shown. As to the effect of IFNγ on LRRK2 expression, it may indeed be that human cells respond better than murine cells. Importantly, the IL-4 ability to induce LRRK2 in B-cells is a novel and important finding, regardless of the effects of IFNγ.

      Reviewer #3 (Significance (Required))

      The paper describes a set of experiments to analyse LRRK2 activity in tissues and, although it has very important findings and technical developments, it is largely descriptive. It looks more like a collection of experiments than a defined hypothesis with experiments to address it.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Revisions Round 1

      Reviewer #1

      We thank the reviewer for their careful reading of our manuscript and have taken all of their grammatical corrections into account.

      Reviewer #2 (Public Review): 

      Weaknesses: 

      The paper contains multiple instances of non-scientific language, as indicated below. It would also benefit from additional details on the cryo-EM structure determination in the Methods and inclusion of commonly accepted requirements for cryo-EM structures, like examples of 2D class averages, raw micrographs, and FSC curves (between half-maps as well as between rigid-body fitted (or refined) atomic models of the different polymorphs and their corresponding maps). In addition, cryo-EM maps for the control experiments F1 and F2 should be presented in Figure 9.

      We tried to correct the non-scientific language and have included the suggested data on the cryo-EM analyses, including new Figures 11-17. We did not collect data on the sample used for the seeds in the cross-seeding experiments because we had already confirmed in multiple datasets that the conditions in F1 and F2 reproducibly produce fibrils of Type 1 and Type 3, respectively. We have now analyzed cryo-EM data for 6 more samples at pH 7.0 and found that several kinds of polymorphs (Types 1A, 1M, 2A, 2B and 5) are accessible at this pH; however, the Type 3 polymorphs are not formed at pH 7.0 under the conditions that we used for aggregation.

      Reviewer #2 (Recommendations For The Authors): 

      - remove unscientific language: "it seems that there are about as many unique atomic-resolution structures of these aggregates as there are publications describing them"   

      We have rephrased this sentence.

      - for same reason, remove "Obviously, " 

      Done

      - What does this mean? “polymorph-unspecific” 

      Rephrased as non-polymorph-specific

      - What does this mean? "shallow amyloid energy hypersurface"  

      By “shallow hypersurface” we mean that the minimum of the multi-dimensional function that describes the energy of the amyloid is not so deep that subtle changes to the environment will not favor another fold/energy minimum. We have left the sentence because while it may not be perfect, it is concise and seems to get the point across.

      - "The results also confirm the possibility of producing disease-relevant structure in vitro." -> This is incorrect as no disease-relevant structure was replicated in this work. Use another word like “suggest”.

      We have changed to “suggest” as suggested.

      - Remove "historically" 

      Done

      - Rephrase “It has long been understood that all amyloids contain a common structural scaffold” 

      Changed to “It has long been established that all amyloids contain a common structural scaffold..” 

      - "Amyloid polymorphs whose differences lie in both their tertiary structure (the arrangement of the beta-strands) and the quaternary structure (protofilament-protofilament assembly) have been found to display distinct biological activities [8]" -> I don't think this is true, different biological activities of amyloids have never been linked to their distinct structures.  

      We have added 5 new references (8-12) to support this sentence.

      - Reference 10 is a comment on reference 9; it should be removed. Instead, as for alpha-synuclein, all papers describing the tau structures should be included.  

      We have removed the reference, but feel that the addition of all Tau structure references is not merited in this manuscript since we are not comparing them.

      - Rephrase: "is not always 100% faithful"

      Removed “100%”

      - What is pseudo-C2 symmetry? Do the authors mean pseudo 2_1 symmetry (ie a 2-start helical symmetry)?

      Thank you for pointing this out. We did indeed mean pseudo-2_1 helical symmetry.

      - Re-phrase: "alpha-Syn's chameleon-like behavior" 

      We have removed this phrase.

      - "In the case of alpha-Syn, the secondary nucleation mechanism is based on the interaction of the positively charged N-terminal region of monomeric alpha-Syn and the disordered, negatively charged C-terminal region of the alpha-Syn amyloid fibrils [54]" -> I would say the mechanisms of secondary nucleation are not that well understood yet, so one may want to tune this down a bit. 

      We have changed this to “mechanism has been proposed to be”

      - The paragraphs describing experiments by others are better suited for a Discussion rather than a Results section. Perhaps re-organize this part? 

      We have left the text intact as we are using a Results and Discussion format.

      - A lot of information about Image processing seems to be missing: what steps were performed after initial model generation? 

      We have added more details in the methods section on the EM data processing and model analysis.

      - Figure 1: Where is Type 4 on the pH scale?

      We have adjusted the Fig 1 legend to clarify that pH scale is only applicable to the structures presented in this manuscript. 

      - Figure 2: This might be better incorporated as a subpanel of Figure 1.

      We agree that this figure is somewhat of a loner on its own, and we only added it in order to avoid confusion with the somewhat inconsistent naming scheme used for the Type 1B structure. However, we prefer to leave it as a separate figure so that it does not dilute the impact of Figure 1.

      - Figure 3: What is the extra density at the bottom of Type 3B from pH 5.8 samples 1 and 2. pH 5.8 + 50mM NaCl (but not pH 5.8 + 100 mM NaCl)? Could this be an indication of a local minimum and the pH 5.8 + 100 mM NaCl structure is correct? Or is this a real difference between 0/50mM NaCl and 100 mM NaCl? 

      We did not see the extra density to which the reviewer is referring; however, the images used in this panel are based on the output of 3D classification, which is more likely to produce artifacts than a 3D refinement. With this in mind, we did not see any significant differences in the refined structures and therefore only deposited the better-quality map and model for each of the polymorph types.

      - Figure 3: To what extent is Type 3B of pH 6.5 still a mixture of different types? The density looks poor. In general, in the absence of more details about the cryo-EM maps, it is hard to assess the quality of the structures presented.  

      In order to improve the quality of the images in this panel, a more complete separation of the particles from each polymorph was achieved via the filament subset selection tool in RELION 5. In each case, an unbiased initial model could be created from the 2D classes via the relion_helix_inimodel2D program, further supporting the coexistence of 4 polymorphs in the pH 6.5 sample. The particles were individually refined to produce the respective maps that are now used in this figure.

      - Many references are incorrect, containing "Preprint at (20xx)" statements.

      This has been corrected.  

      Reviewer #3 (Public Review): 

      Weaknesses: 

      (1) The authors reveal that the Type 1 monofilament fibril polymorph (reminiscent of the JOS-like polymorph) and the Type 5 polymorph (akin to the tissue-amplified-like polymorph) can both form under the same condition. Additionally, this condition also fosters the formation of flat ribbon-like fibrils across different batches. Notably, at pH 5.8, variations in experimental groups yield disparate abundance ratios between polymorphs 3B and 3C, indicating a degree of instability in fibrillar formation. The variability would potentially pose challenges for replicability in subsequent research. In light of these situations, I propose the following recommendations: 

      (a) An explicit elucidation of the factors contributing to these divergent outcomes under similar experimental conditions is warranted. This should include an exploration of whether variations in purified protein batches are contributing factors to the observed heterogeneity.

      We are in complete agreement that understanding the factors that lead to polymorph variability is of utmost importance (and was the impetus for the manuscript itself). However, the number of variables to explore is overwhelming and we will continue to investigate this in our future research. Regarding the variability between batches of purified protein, we also think that this could be a factor in the polymorph variability observed for otherwise “identical” aggregation conditions, particularly at pH 7 where the largest variety of polymorphs has been observed. However, even variation between identical replicates (samples created from the same protein solution and simply aggregated simultaneously in separate tubes) can lead to different outcomes (see datasets 15 and 16 in the revised Table 1), suggesting that there are stochastic processes that can determine the outcome of an individual aggregation experiment. While our data still indicate that Type 1, 2 and 3 polymorphs are strongly selected by pH, the selection between interface variants 3B vs. 3C and 2A vs. 2B might also be affected by protein purity. Our standard purification protocol produces a single band by Coomassie-stained SDS-PAGE; however, minor truncations and other impurities below a few percent would go undetected and, given the proposed roles of the N- and C-termini in secondary nucleation, could have a large effect on polymorph selection and seeding. In line with the reviewer’s comments, we now include a batch number for each EM dataset. While no new conclusions can be drawn from the inclusion of this additional data, we feel that it is important to acknowledge the possible role of batch-to-batch variability. 

      (b) To enhance the robustness of the conclusions, additional replicates of the experiments under the same condition should be conducted, ideally a minimum of three times.  

      The pH 5.8 conditions that yield Type 3 fibrils have already been repeated several times in the original manuscript. Since the pH 7.4 conditions produce the most common alpha-Syn polymorph (Type 1A) and were produced twice in this manuscript (once as an unseeded and once as a cross-seeded fibrilization), we decided to focus on the intermediate condition where the most variability had been seen (pH 7.0). The revised Table 1 now has 6 new datasets (11-16) representing 6 independent aggregations at pH 7.0 starting from two different protein purification batches. The result is that we now produce the Type 2A/B polymorphs in three samples, and in two of these samples we once again observed the Type 1M polymorph. The other samples produced Type 1A or non-twisted fibrils.

      (c) Further investigation into whether different polymorphs formed under the same buffer condition could lead to distinct toxicological and pathology effects would be a valuable addition to the study.  

      The correlation of toxicity with structure would in principle be interesting. However the Type 1 and Type 3 polymorphs formed at pH 5.8 and 7.4 are not likely to be biologically relevant. The pH 7 polymorphs (Type 5 and 1M) would be more interesting because they form under the same conditions and might be related to some disease relevant structures. Still, it is rare that a single polymorph appears at 7.0 (the Type 5 represented only 10-20% of the fibrils in the sample and the Type 1M also had unidentified double-filament fibrils in the sample). We plan to pursue this line of research and hope to include it in a future publication.

      (2) The cross-seeding study presented in the manuscript demonstrates the pivotal role of pH conditions in dictating conformation. However, an intriguing aspect that emerges is the potential role of seed concentration in determining the resultant product structure. This raises a critical question: at what specific seed concentration does the determining factor for polymorph selection shift from pH condition to seed concentration? A methodological robust approach to address this should be conducted through a series of experiments across a range of seed concentrations. Such an approach could delineate a clear boundary at which seed concentration begins to predominantly dictate the conformation, as opposed to pH conditions. Incorporating this aspect into the study would not only clarify the interplay between seed concentration and pH conditions, but also add a fascinating dimension to the understanding of polymorph selection mechanisms.

      A more complete analysis of the mechanisms of aggregation, including the effect of seed concentration and the resulting polymorph specificity of the process, is very important for our understanding of the aggregation pathways of alpha-synuclein and is currently the topic of ongoing investigations in our lab.

      Furthermore, the study prompts additional queries regarding the behavior of cross-seeding production under the same pH conditions when employing seeds of distinct conformation. Evidence from various studies, such as those involving E46K and G51D cross-seeding, suggests that seed structure plays a crucial role in dictating polymorph selection. A key question is whether these products consistently mirror the structure of their respective seeds. 

      We thank the reviewer for reminding us to cite these studies as a clear example of polymorph selection by cross-seeding. Unfortunately, it is not entirely clear from the G51D cross-seeding manuscript (https://doi.org/10.1038/s41467-021-26433-2) what conditions were used in the cross-seeding, since different conditions were used for the seedless wild-type and mutant aggregations. However, it appears that the wild-type without seeds was in Tris pH 7.5 (although at 37 °C the pH could have dropped to about 7) and the cross-seeded wild-type was in phosphate buffer at pH 7.0. In the E46K cross-seeding manuscript, it appears that pH 7.5 Tris was used for all fibrilizations (https://doi.org/10.1073/pnas.2012435118). In any event, both results point to the fact that at pH 7.0-7.5 under low-seed conditions (0.5%) the Type 4 polymorph can propagate in a seed-specific manner.

      (3) In the Results section of "The buffer environment can dictate polymorph during seeded nucleation", the authors reference previous cell biological and biochemical assays to support the polymorph-specific seeding of MSA and PD patients under the same buffer conditions. This discussion is juxtaposed with recent research that compares the in vivo biological activities of hPFF, ampLB as well as LB, particularly in terms of seeding activity and pathology. Notably, this research suggests that ampLB, rather than hPFF, can accurately model the key aspects of Lewy Body Diseases (LBD) (refer to: https://doi.org/10.1038/s41467-023-42705-5). The critical issue here is the need to reconcile the phenomena observed in vitro with those in in-vivo or in-cell models. Given the low seed concentration reported in these studies, it is imperative for the authors to provide a more detailed explanation as to why the possible similar conformation could lead to divergent pathologies, including differences in cell-type preference and seeding capability.  

      We thank the reviewer for bringing this recent report to our attention. The findings that ampLB and hPFF have different PK digestion patterns and that only the former is able to model key aspects of Lewy Body disease support the seed-specific nature of some types of alpha-synuclein aggregation. We have added this to the discussion regarding the significant role that seed type and seed conditions likely play in polymorph selection.

      (4) In the Method section of "Image processing", the authors describe the helical reconstruction procedure, without mentioning much detail about the 3D reconstruction and refinement process. For the benefit of reproducibility and to facilitate a deeper understanding among readers, the authors should enrich this part to include more comprehensive information, akin to the level of detail found in similar studies (refer to: https://doi.org/10.1038/nature23002).

      As also suggested by reviewer #2, we have now added more comprehensive information on the 3D reconstruction and refinement process.

      (5) The abbreviation of amino acids should be unified. In the Results section "On the structural heterogeneity of Type 1 polymorphs", the amino acids are denoted using three-letter abbreviation. Conversely, in the same section under "On the structural heterogeneity of Type 2 and 3 structures", amino acids are abbreviated using the one-letter format. For clarity and consistency, it is essential that a standardized format for amino acid abbreviations be adopted throughout the manuscript.

      That makes perfect sense and has been corrected.

      Reviewing Editor: 

      After discussion among the reviewers, it was decided that point 2 in Reviewer #3's Public Review (about the experiments with different concentrations of seeds) would probably lie outside the scope of a reasonable revision for this work. 

      We agree as stated above and will continue to work on this important point.

      Revisions Round 2

      Reviewer #2 (Public Review): 

      I do worry that the FSC values of model-vs-map appear to be higher than expected from the corresponding FSCs between the half-maps (e.g. see Fig 13). The implication of this observation is that the atomic models may have been overfitted in the maps, which would have led to a deterioration of their geometry. A table with rmsd on bond lengths, angles, etc would probably show this. In addition, to check for overfitting, the atomic model for each data set could be refined in one of the half-maps, and then that same model could be used to calculate 2 FSC model-vs-map curves: one against the half-map it was refined in and one against the other half-map. Deviations between these two curves are an indication of overfitting. 

      Thank you for the recommendations for model validation. We have added the suggested statistics to Table 2, performed the suggested model fitting to one of the half-maps, and plotted 3 FSC model-vs-map curves: one for each half-map versus the model fit against only one half-map, and one for the model fit against the full map. We feel that the degree of overfitting is reasonable and does not significantly impact the quality of the models.
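
      The half-map cross-validation suggested by the reviewer can also be summarised numerically. The following is a minimal Python sketch, not the procedure used for the manuscript: it assumes that the two model-vs-map FSC curves have already been computed on the same resolution shells, and the function name `max_fsc_gap` and all FSC values are hypothetical.

```python
def max_fsc_gap(fsc_work, fsc_free):
    """Return the largest absolute difference between two model-vs-map FSC
    curves and the index of the shell where it occurs.

    fsc_work: model vs the half-map it was refined against.
    fsc_free: the same model vs the other half-map.
    """
    if len(fsc_work) != len(fsc_free):
        raise ValueError("curves must be sampled on the same resolution shells")
    gaps = [abs(w - f) for w, f in zip(fsc_work, fsc_free)]
    worst_shell = max(range(len(gaps)), key=gaps.__getitem__)
    return gaps[worst_shell], worst_shell


# Hypothetical FSC values per shell, ordered from low to high resolution.
fsc_work = [0.99, 0.97, 0.90, 0.70, 0.40]
fsc_free = [0.99, 0.96, 0.87, 0.62, 0.30]

gap, shell = max_fsc_gap(fsc_work, fsc_free)
```

      A small gap across all shells is consistent with limited overfitting. The sketch does not imply any particular threshold, and judging what is acceptable still requires inspecting the curves themselves.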

      In addition, the sudden drop in the FSC curves in Figure 16 shows that something unexpected has happened to this refinement. Are the authors sure that only the procedures outlined in the Methods were used to create these curves? The unexpected nature of the FSC curve for this type (2A) raises doubts about the correctness of the reconstruction. 

      We thank the reviewer for the attention to detail. We should have caught this mistake. It turns out that in the last round of 3D refinement, the two half-maps became shifted with respect to each other in the z direction. We realigned the two maps using Chimera and then re-ran the postprocessing. The new maps have been deposited in EMD-50850. This mistake motivated us to inspect all of the maps, and we found that the same problem had occurred in the Type 3B maps. This was not noticed by the reviewer because we accidentally plotted the FSC curves from postprocessing from one refinement round before the one deposited in the EMD. We performed the same half-map shifting procedure for the Type 3B data and performed a final round of real-space refinement to produce new maps and models that have been deposited as EMD-50888 and 9FYP (superseding the previous entries).

      Reviewer #3 (Public Review): 

      There are two minor points I recommend the authors to address: 

      (1) In the response to Weakness 1, point (3), the authors state that "the Type 5 represented only 10-20% of the fibrils in the sample." However, this information is not labeled in the corresponding Figure 4. I suggest the authors verify and label all relevant percentages in the figures to prevent misunderstandings. 

      We aim to be as transparent as possible, and this information was included in the main text. However, we did not label the percentage of Type 5 fibrils in Figure 4 because that would make the other percentages ambiguous. The percentages in Figure 4 represent the ratio of helical segments used for each type of refined structure in the dataset (always adding up to 100%), not the percentage of all fibrils in the dataset. That is, there are sometimes untwisted or unidentifiable fibrils in datasets, and these were not accounted for in the listed percentages. We have added a sentence to the Figure 4 legend to explain what the percentages refer to.
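
      To illustrate why the two ways of counting give different numbers, here is a minimal Python sketch. The segment labels, the counts and the function name are hypothetical and are not taken from our datasets.

```python
from collections import Counter


def polymorph_percentages(segment_labels, excluded=("untwisted", "unidentified")):
    """Percentage of helical segments per refined polymorph type, ignoring
    segments that were not assigned to any refined structure."""
    counts = Counter(label for label in segment_labels if label not in excluded)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no segments assigned to a refined structure")
    return {ptype: 100.0 * n / total for ptype, n in counts.items()}


# Hypothetical dataset: half of the segments come from untwisted fibrils.
labels = (
    ["Type 1M"] * 60 + ["Type 2A"] * 30 + ["Type 2B"] * 10 + ["untwisted"] * 100
)

# Figure-4-style percentages: refined types only, summing to 100%.
pct = polymorph_percentages(labels)

# Share of Type 1M among all segments, including the untwisted ones.
share_of_all = 100.0 * labels.count("Type 1M") / len(labels)
```

      In this made-up example Type 1M accounts for 60% of the refined segments but only 30% of all segments, which is the ambiguity we wished to avoid by keeping a single convention in the figure.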

      (2) While the authors have detailed the helical reconstruction procedure in the Methods section, it is necessary to indicate the scale bar or box size in the figure legend of the 2D representative classes to ensure clarity and reproducibility. 

      Thank you for reminding us to add the scale bars. This is now done for the 2D classes in Figures 11-17.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors): 

      A critical look at the maps and models of the various structures at this stage may prevent the authors from entering suboptimal structures into the databases.  

      We agree. Thank you for suggesting this.

      Reviewer #3 (Recommendations For The Authors): 

      The authors have responded adequately to these critiques in the revised version of the manuscript. There are two minor points. 

      (1) The authors state that "the Type 5 represented only 10-20% of the fibrils in the sample." However, this information is not labeled in the corresponding Figure 4. I suggest the authors verify and label all relevant percentages in the figures to prevent misunderstandings. 

      (2) While the authors have detailed the helical reconstruction procedure in the Methods section, it is necessary to indicate the scale bar or box size in the figure legend of the 2D representative classes to ensure clarity and reproducibility. 

      Answered in public comments

    1. Author response:

      eLife assessment

      Cav2 voltage-gated calcium channels play key roles in regulating synaptic strength and plasticity. In contrast to mammals, invertebrates like Drosophila encode a single Cav2 channel, raising questions on how diversity in Cav2 is achieved from a single gene. Here, the authors present convincing evidence that two alternatively spliced isoforms of the Cac gene (cacophony, also known as Dmca1A and nightblindA) enable diverse changes in Cav2 expression, localization, and function in synaptic transmission and plasticity. These valuable findings will be of interest to a variety of researchers.

      We suggest replacing “two alternatively spliced isoforms of the Cac gene” by “two alternatively spliced mutually exclusive exon pairs of the Cac gene”. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Bell et al. describes an analysis of the effects of removing one of two mutually exclusive splice exons at two distinct sites in the Drosophila CaV2 calcium channel Cacophony (Cac). The authors perform imaging and electrophysiology, along with some behavioral analysis of larval locomotion, to determine whether these alternatively spliced variants have the potential to diversify Cac function in presynaptic output at larval neuromuscular junctions. The authors provide valuable insights into how alternative splicing at two sites in the calcium channel alters its function.

      Strengths:

      The authors find that both of the second alternatively spliced exons (I-IIA and I-IIB) that are found in the intracellular loop between the 1st and 2nd set of transmembrane domains can support Cac function. However, loss of the I-IIB isoform (predicted to alter potential beta subunit interactions) results in 50% fewer channels at active zones, a decrease in neurotransmitter release, and a reduced ability to support presynaptic homeostatic potentiation. Overall, the study provides new insights into Cac diversity at two alternatively spliced sites within the protein, adding to our understanding of how presynaptic calcium channel function can be regulated by splicing.

      Weaknesses:

      The authors find that one splice isoform (IS4B) in the first S4 voltage sensor is essential for the protein's function in promoting neurotransmitter release, while the other isoform (IS4A) is dispensable. The authors conclude that IS4B is required to localize Cac channels to active zones. However, I find it more likely that IS4B is required for channel stability and that its loss leads to the protein being degraded, rather than any effect on active zone localization. More analysis would be required to establish that as the mechanism for the unique requirement for IS4B.

      We agree that we need to explain more clearly why IS4B is unlikely to be required for channel stability but instead likely has a unique function at the presynaptic active zone of fast synapses. We will address this by revising text and by providing additional data. If IS4B were required for evoked release because it supported channel protein stability, then the removal of IS4B should cause protein degradation throughout all sub-neuronal compartments and throughout the CNS, but this is not the case. First, upon removal of IS4B in adult motoneurons (which use cac channels at the presynapse and somatodendritically, Ryglewski et al., 2012), evoked release from axon terminals is abolished (as at the larval NMJ), but somatodendritic cac inward current is present. If IS4B were required for cac channel stability, somatodendritic current should also be abolished. We will add these data to the ms. Second, immunohistochemistry for tagged IS4B channels reveals that these are present not only at presynaptic active zones at the NMJ but also throughout the VNC motor neuropils. Excision of IS4B causes the absence of cac channels from the presynaptic active zones at the NMJ and throughout the VNC neuropils (and accordingly this is lethal). By contrast, tagged IS4A channels (with IS4B excised) are not found at the presynaptic terminals of fast synapses, but instead in other distinct parts of the CNS. We will also provide data to show this. Together, these data are in line with a unique requirement of IS4B at presynaptic active zones (not excluding additional functions of IS4B), whereas IS4A-containing cac isoforms mediate different functions.

      We appreciate the additional reviewer suggestions to the authors that we will address point by point when revising the ms. 

      Reviewer #2 (Public Review):

      This study by Bell et al. focuses on understanding the roles of two alternatively spliced exons in the single Drosophila Cav2 gene cac. The authors generate a series of cac alleles in which one or the other mutually exclusive exons are deleted to determine the functional consequences at the neuromuscular junction. They find alternative splicing at one exon encoding part of the voltage sensor impacts the activation voltage as well as localization to the active zone. In contrast, splicing at the second exon pair does not impact Cav2 channel localization, but it appears to determine the abundance of the channel at active zones. Together, the authors propose that alternative splicing at the Cac locus enables diversity in Cav2 function generated through isoform diversity generated at the single Cav2 alpha subunit gene encoded in Drosophila.

      Overall this is an excellent, rigorously validated study that defines unanticipated functions for alternative splicing in Cav2 channels. The authors have generated an important toolkit of mutually exclusive Cac splice isoforms that will be of broad utility for the field, and show convincing evidence for distinct consequences of alternative splicing of this single Cav2 channel at synapses. Importantly, the authors use electrophysiology and quantitative live sptPALM imaging to determine the impacts of Cac alternative splicing on synaptic function. There are some outstanding questions regarding the mechanisms underlying the changes in Cac localization and function, and some additional suggestions are listed below for the authors to consider in strengthening this study. Nonetheless, this is a compelling investigation of alternative splicing in Cav2 channels that should be of interest to many researchers.

      We agree that some additional information on cac isoform localization (in particular for splicing at the IS4 site) will strengthen the manuscript. We will address this by providing additional data and revising text (see responses to reviewers 1 and 3). We are also grateful for the additional reviewer suggestions which we will address point by point when revising the ms.  

      Reviewer #3 (Public Review):

      Summary:

      Bell and colleagues studied how different splice isoforms of voltage-gated CaV2 calcium channels affect channel expression, localization, function, synaptic transmission, and locomotor behavior at the larval Drosophila neuromuscular junction. They reveal that one mutually exclusive exon located in the fourth transmembrane domain encoding the voltage sensor is essential for calcium channel expression, function, active zone localization, and synaptic transmission. Furthermore, a second mutually exclusive exon residing in an intracellular loop containing the binding sites for Caβ and G-protein βγ subunits promotes the expression and synaptic localization of ~50% of CaV2 channels, thereby contributing to ~50% of synaptic transmission. This isoform enhances release probability, as evident from increased short-term depression, is vital for homeostatic potentiation of neurotransmitter release induced by glutamate receptor impairment, and promotes locomotion. The roles of the two other tested isoforms remain less clear.

      Strengths:

      The study is based on solid data that was obtained with a diverse set of approaches. Moreover, it generated valuable transgenic flies that will facilitate future research on the role of calcium channel splice isoforms in neural function.

      Weaknesses:

      (1) Based on the data shown in Figures 2A-C, and 2H, it is difficult to judge the localization of the cac isoforms. Could they analyze cac localization with regard to Brp localization (similar to Figure 3; the term "co-localization" should be avoided for confocal data), as well as cac and Brp fluorescence intensity in the different genotypes for the experiments shown in Figure 2 and 3 (Brp intensity appears lower in the dI-IIA example shown in Figure 3G)? Furthermore, heterozygous dIS4B imaging data (Figure 2C) should be quantified and compared to heterozygous cacsfGFP/+.

      We understand the reviewer’s comment and will do the following to convincingly demonstrate absence of cac from presynaptic active zones upon IS4B excision. First, we will show selective enlargements of IS4A and IS4B with Brp in presynaptic active zones to show distinct cac label in active zones following excision of IS4A but not following excision of IS4B. Second, we will provide Pearson’s co-localization coefficients of Brp with IS4B and with IS4A, respectively. Third, we will reduce the intensity of the green channels in figures 2C and 2H to the same levels as in 2A and B, and H control to allow a fair comparison of cac intensities following excision of IS4B versus excision of IS4A and control. We had increased intensity to show that following excision of IS4B, no distinct cac label is found in active zones, even at high exaggerated image brightness. However, we agree with the reviewer that the bright background hampers interpretation and thus will show the same intensity in all images that need to be compared.
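
      The Pearson's coefficient mentioned in the response above can be sketched as follows. This is a minimal illustration, assuming the two channels (for example Brp and cac label) have been flattened into equally long lists of pixel intensities from the same region of interest; the function name and the example values are hypothetical and are not taken from the manuscript.

```python
import math


def pearson_coefficient(channel_a, channel_b):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    mean_a = sum(channel_a) / len(channel_a)
    mean_b = sum(channel_b) / len(channel_b)
    # Covariance and variances (un-normalized; the 1/n factors cancel in the ratio).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical pixel intensities (arbitrary units) from one region of interest.
brp = [10, 12, 80, 95, 11, 9]
cac_punctate = [8, 9, 70, 90, 10, 7]  # bright where Brp is bright

r = pearson_coefficient(brp, cac_punctate)
```

      A coefficient near 1 indicates that the two labels vary together across pixels; it does not by itself demonstrate molecular co-localization, which is limited by the optical resolution of the images.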

      (2) They conclude that I-II splicing is not required for cac localization (p. 13). However, cac channel number is reduced in dI-IIB. Could the channels be mis-localized (e.g., in the soma/axon)? What is their definition of localization? Could cac be also mis-localized in dIS4B? Furthermore, the Western Blots indicate a prominent decrease in cac levels in dIS4B/+ and dI-IIB (Figure 1D). How do the decreased protein levels seen in both genotypes fit to a "localization" defect? Could decreased cac expression levels explain the phenotypes alone?

      We will precisely define channel localization, and we will explain why it is highly unlikely that the absence of IS4B channels as well as the lower number of I-IIA channels are simply a consequence of reduced expression, but instead of splice variant specific channel function and localization. For example, upon excision of IS4B no cac channels are found at the presynaptic active zones and these synapses are thus non-functional. The isoforms containing the mutually exclusive IS4A exon are expressed and mediate other functions (see also response to reviewer 1) but cannot substitute IS4B containing isoforms at the presynapse. In fact, our Western blots are in line with reduced cac expression if all isoforms that mediate evoked release are missing, again indicating that the presynapse specific cac isoforms cannot be replaced by other cac isoforms (see also below, response to (3)). Feedback mechanisms that regulate cac expression in the absence of presynapse specific cac isoforms are beyond the scope of this study.

      (3) Cac-IS4B is required for Cav2 expression, active zone localization, and synaptic transmission. Similarly, loss of cac-I-IIB reduces calcium channel expression and number. Hence, the major phenotype of the tested splice isoforms is the loss of/a reduction in Cav2 channel number. What is the physiological role of these isoforms? Is the idea that channel numbers can be regulated by splicing? Is there any data from other systems relating channel number regulation to splicing (vs. transcription or post-transcriptional regulation)?

      We will provide additional evidence that mutually exclusive splicing at the IS4 site results in cac channels that localize to the presynaptic active zone (IS4B) versus cac channels that localize to other brain parts and/or other subneuronal compartments (see response to reviewer 1). In addition, we already show in figure 2J that IS4B is required for normal cac HVA current, and we can add data showing that IS4A is not essential for cac HVA current. Similarly, for I-II we find it unlikely that differential splicing regulates channel numbers, but rather splice variant specific functions in different brain parts and different sub-neuronal compartments. To substantiate this interpretation, we will add data from developing adult motoneurons showing that excision of I-IIA causes reduced activity induced calcium influx into dendrites (new data), but it does not reduce channel number at the larval NMJ (figure 4). In our opinion these data are not in line with the idea that splicing regulates cac expression levels, and this in turn, results in specific defects in distinct neuronal compartments. However, we agree that the lack of isoforms with specific functions results in altered overall cac expression levels as indicated by our Western data. If isoforms normally abundantly expressed throughout most neuropils are missing due to exon excision, we indeed find less cac protein in Westerns. By contrast, the lack of isoforms with little abundance has little effect on cac expression levels. This may be the result of unknown feedback mechanisms which are beyond the scope of this study.

      (4) Although not supported by statistics, and as appreciated by the authors (p. 14), there is a slight increase in PSC amplitude in dIS4A mutants (Figure 2). Similarly, PSC amplitudes appear slightly larger (Figure 3J), and cac fluorescence intensity is slightly higher (Figure 3H) in dI-IIA mutants. Furthermore, cac intensity and PSC amplitude distributions appear larger in dI-IIA mutants (Figures 3H, J), suggesting a correlation between cac levels and release. Can they exclude that IS4A and/or I-IIA negatively regulate release? I suggest increasing the sample size for Canton S to assess whether dIS4A mutant PSCs differ from controls (Figure 2E). Experiments at lower extracellular calcium may help reveal potential increases in PSC amplitude in the two genotypes (but are not required). A potential increase in PSC amplitude in either isoform would be very interesting because it would suggest that cac splicing could negatively regulate release.

      There are several possibilities to explain this, but as none of the effects are statistically significant, we prefer to not investigate this in depth. However, given that we cannot find IS4A at the presynaptic active zone, IS4A is unlikely to have a direct negative effect on release probability. Nonetheless, given that IS4A containing cac isoforms mediate functions in other neuronal compartments it may regulate release indirectly by affecting action potential shape. We will provide data in response to the more detailed suggestions to authors that will provide additional insight.

      (5) They provide compelling evidence that IS4A is required for the amplitude of somatic sustained HVA calcium currents. However, the evidence for effects on biophysical properties and activation voltage (p. 13) is less convincing. Is the phenotype confined to the sustained phase, or are other aspects of the current also affected (Figure 2J)? Could they also show the quantification of further parameters, such as CaV2 peak current density, charge density, as well as inactivation kinetics for the two genotypes? I also suggest plotting peak-normalized HVA current density and conductance (G/Gmax) as a function of Vm. Could a decrease in current density due to decreased channel expression be the only phenotype? How would changes in the sustained phase translate into altered synaptic transmission in response to AP stimulation?

      Most importantly, HVA current is mostly abolished upon excision of IS4B (not IS4A, we think the reviewer accidentally mixed up the genotype). This indicates that the cac isoforms that mediate evoked release encode HVA channels. However, the somatodendritic current shown in figure 2J that remains upon excision of IS4B is mediated by IS4A containing cac isoforms. Please note that these never localize to the presynaptic active zone, thus the small inactivating HVA that remains in figure 2J does normally not mediate evoked release. Therefore, the interpretation is that specifically HVA current encoded by IS4B cac isoforms is required for synaptic transmission. Reduced cac current density is not the cause for this phenotype because a specific current component is absent. 

      We agree with the reviewer that a deeper electrophysiological analysis of cac currents mediated by IS4B containing isoforms will be instructive. However, a precise analysis of activation and inactivation voltages and kinetics suffers from space clamp issues in recordings from the soma of such complex neurons (DLM motoneurons of the adult fly). Therefore, we will analyze the currents in a heterologous expression system and present these data to the scientific community as a separate study at a later time point.

      (6) Why was the STED data analysis confined to the same optical section, and not to max. intensity z-projections? How many and which optical sections were considered for each active zone? What were the criteria for choosing the optical sections? Was synapse orientation considered for the nearest neighbor Cac - Brp cluster distance analysis? How do the nearest-neighbor distances compare between "planar" and "side-view" Brp puncta?

      Max. z-projections would be imprecise because they can artificially suggest close proximity of label that is close in x and y but far away in z. Therefore, the analysis was executed in xy-direction of various planes of entire 3D image stacks. We considered active zones of different orientations (Fig. 4C, D). In fact, we searched the entire z-stacks until we found active zones of all orientations shown in figures 4C1-C6 within the same boutons. The same active zone orientations were analyzed for all exon-out mutants with cac localization in active zones. The distance between cac and brp did not change if viewed from the side.
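
      The concern about maximum-intensity projections can be illustrated with a small sketch. It assumes puncta are represented as (x, y, z) centroids in micrometers and compares nearest-neighbor distances computed in 3D with those computed after discarding z, as a projection effectively does. The function name and coordinates are hypothetical and do not reproduce the authors' analysis pipeline, which was carried out in xy within individual planes.

```python
import math


def nearest_neighbor_distances(points_a, points_b, project_z=False):
    """For each point in points_a, return the distance to its nearest point in points_b.

    Points are (x, y, z) tuples. With project_z=True the z coordinate is ignored,
    which mimics measuring distances on a z-projection.
    """
    dims = 2 if project_z else 3
    return [
        min(math.dist(a[:dims], b[:dims]) for b in points_b)
        for a in points_a
    ]


# Hypothetical centroids (micrometers): close in x and y, far apart in z.
cac_puncta = [(0.00, 0.00, 0.0)]
brp_puncta = [(0.05, 0.00, 1.0)]

d_3d = nearest_neighbor_distances(cac_puncta, brp_puncta)
d_projected = nearest_neighbor_distances(cac_puncta, brp_puncta, project_z=True)
```

      In this example the projected distance is 0.05 µm although the puncta are about 1 µm apart in 3D, which is the kind of artificial proximity the response refers to.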

      (7) Cac clusters localize to the Brp center (e.g., Liu et al., 2011). They conclude that Cav2 localization within Brp is not affected in the cac variants (p. 8). However, their analysis is not informative regarding a potential offset between the central cac cluster and the Brp "ring". Did they/could they analyze cac localization with regard to Brp ring center localization of planar synapses, as well as Brp-ring dimensions?

      In the top views (planar) we did not find any clear offset in cac orientation to brp between genotypes. This study focuses on cac splice isoform specific localization and function. Possible effects of different cac isoforms on Brp-ring dimensions or other aspects of scaffold structure are not central to our study, in particular given that Brp puncta are clearly present even if cac is absent from the synapse (Fig. 2H), indicating that cac is not instructive for the formation of the Brp scaffold.  

      (8) Given the accelerated PSC decay/ decreased half width in dI-IIA (Fig. 5Q), I recommend reporting PSC charge in Figure 3, and PPR charge in Figures 5A-D. The charge-based PPRs of dI-IIA mutants likely resemble WT more closely than the amplitude-based PPR. In addition, miniature PSC decay kinetics should be reported, as they may contribute to altered decay kinetics. How could faster cac inactivation kinetics in response to single AP stimulation result in a decreased PSC half-width? Is there any evidence for an effect of calcium current inactivation on PSC kinetics? On a similar note, is there any evidence that AP waveform changes accelerate PSC kinetics? PSC decay kinetics are mainly determined by GluR decay kinetics/desensitization. The arguments supporting the role of cac splice isoforms in PSC kinetics outlined in the discussion section are not convincing and should be revised.

      We agree that reporting charge in figure 3 will be informative and will do so. We also understand the reviewer’s concern attributing altered PSC kinetics to presynaptic cac channel properties. We will tone down our interpretation in the discussion and list possible alterations in presynaptic AP shape or Cav2 channel kinetics as alternative explanations (not conclusions). Moreover, we will quantify postsynaptic GluRIIA abundance to test whether altered PSC kinetics are caused by altered GluRIIA expression. In our opinion, the latter is more instructive than mini decay kinetic analysis because this depends strongly on the distance of the recording electrode to the actual site of transmission in these large muscle cells.

      (9) Paired-pulse ratios (PPRs): On how many sweeps are the PPRs based? In which sequence were the intervals applied? Are PPR values based on the average of the second over the first PSC amplitudes of all sweeps, or on the PPRs of each sweep and then averaged? The latter calculation may result in spurious facilitation, and thus to the large PPRs seen in dI-IIB mutants (Kim & Alger, 2001; doi: 10.1523/JNEUROSCI.21-24-09608.2001).

      We agree that the PP protocol and analyses have to be described more precisely in the methods, and we will do so. PPR values are based on the PPRs of each sweep and then averaged. We are aware of the study of Kim and Alger 2001, but it does not affect our data interpretation because all genotypes were analyzed identically, but only the I-IIB excision resulted in the large data spread shown in figure 5.
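
      The two ways of computing a paired-pulse ratio that the reviewer distinguishes can be sketched as follows. This is a minimal illustration with made-up amplitudes; the function names are hypothetical. Averaging per-sweep ratios weights sweeps with small first responses heavily, which is the source of the spurious facilitation described by Kim and Alger (2001).

```python
def ppr_mean_of_ratios(first_amps, second_amps):
    """Compute the PPR of each sweep, then average the ratios."""
    ratios = [second / first for first, second in zip(first_amps, second_amps)]
    return sum(ratios) / len(ratios)


def ppr_ratio_of_means(first_amps, second_amps):
    """Average the amplitudes across sweeps first, then take one ratio."""
    mean_first = sum(first_amps) / len(first_amps)
    mean_second = sum(second_amps) / len(second_amps)
    return mean_second / mean_first


# Hypothetical PSC amplitudes (arbitrary units) from three sweeps.
# The first response fluctuates; the second response is constant.
first = [1.0, 2.0, 4.0]
second = [2.0, 2.0, 2.0]

per_sweep = ppr_mean_of_ratios(first, second)   # (2 + 1 + 0.5) / 3, about 1.17
from_means = ppr_ratio_of_means(first, second)  # 2 / (7 / 3), about 0.86
```

      With identical data, the per-sweep calculation suggests facilitation (ratio above 1) while the ratio of mean amplitudes suggests depression, so the choice of method matters most for genotypes with variable first-pulse amplitudes.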

      (10) Could the dI-IIB phenotype be simply explained by a decrease in channel number/ release probability? To test this, I propose investigating PPRs and short-term dynamics during train stimulation at lower extracellular Ca2+ concentration in WT. The Ca2+ concentration could be titrated such that the first PSC amplitude is similar between WT and dI-IIB mutants. This experiment would test if the increased PPR/depression variability is a secondary consequence of a decrease in Ca2+ influx, or specific to the splice isoform.

      In fact, the interpretation that decreased PSC amplitude upon I-IIB excision is caused mainly by reduced channel number is precisely our interpretation (see discussion page 14, last paragraph to page 15, first paragraph). In addition, we are grateful for the reviewer’s suggestion to titrate the external calcium such that the first PSC amplitude matches the one in ΔI-IIB to test whether altered short term plasticity is solely a function of altered channel number or whether additional causes, such as altered channel properties, also play into this. We will conduct these experiments and include them in the revised manuscript.

      (11) How were the depression kinetics analyzed? How many trains were used for each cell, and how do the tau values depend on the first PSC amplitude? Time constants in the range of a few (5-10) milliseconds are not informative for train stimulations with a frequency of 1 or 10 Hz (the unit is missing in Figure 5H). Also, the data shown in Figures 5E-K suggest slower time constants than 5-10 ms. Together, are the data indeed consistent with the idea that dI-IIB does not only affect cac channel number, but also PPR/depression variability (p. 9)?

      For each animal, the amplitudes of each PSC were plotted over time and fitted with a single exponential. For depression at 1 and 10 Hz, we used one train per animal, and 5-6 animals per genotype (as reflected in the data points in Figs 5H and 5L). Given that the tau values are highly similar between control and excision of I-IIA, but ΔI-IIA tends to have larger single PSC amplitudes, differences in first PSC amplitude do not seem to skew the data (but see also response to comment 10 above). We thank the reviewer for pointing out that tau values in the range of ms are not informative at 1 and 10 Hz stimulations (Figs 5H and 5L). We mis-labeled (or did not label) the axes. The label should read seconds, not milliseconds. We apologize, and this will be corrected accordingly.

      In sum, pending the outcome of additional important control experiments for GluRIIA abundance (see response to comment 8) and titration of control PSC amplitude for the first pulse of paired pulses in ΔI-IIB (see response to comment 10) we will either modify or further support that interpretation.
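
      The single-exponential fit of PSC amplitudes during a train, described in the response to comment (11), can be sketched as follows. The response does not state how the fit was performed, so the log-linear regression below, the supplied steady-state plateau, and all names and values are assumptions made for illustration only. Times are given in seconds, matching the corrected axis label.

```python
import math


def fit_single_exponential(times_s, amplitudes, plateau=0.0):
    """Fit A(t) = plateau + a0 * exp(-t / tau) by linear regression on ln(A - plateau).

    Returns (a0, tau) with tau in the same unit as times_s (seconds here).
    Points at or below the plateau are skipped because their logarithm is undefined.
    """
    xs, ys = [], []
    for t, amp in zip(times_s, amplitudes):
        if amp - plateau > 0:
            xs.append(t)
            ys.append(math.log(amp - plateau))
    if len(xs) < 2:
        raise ValueError("need at least two points above the plateau")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("amplitudes do not decay; no depression time constant")
    tau = -1.0 / slope
    a0 = math.exp(mean_y - slope * mean_x)
    return a0, tau


# Synthetic 10 Hz train (one stimulus every 0.1 s) with a known tau of 2 s.
times = [i * 0.1 for i in range(50)]
amps = [20.0 + 80.0 * math.exp(-t / 2.0) for t in times]

a0_fit, tau_fit = fit_single_exponential(times, amps, plateau=20.0)
```

      On noise-free synthetic data the fit recovers the time constant used to generate it; with recorded data, a nonlinear least-squares fit that also estimates the plateau would usually be preferable.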

      (12) The GFP-tagged I-IIA and mEOS4b-tagged I-IIB cac puncta shown in Figure 6N appear larger than the Brp puncta. Endogenously tagged cac puncta are typically smaller than Brp puncta (Gratz et al., 2019). Also, the I-IIA and I-IIB fluorescence sometimes appear to be partially non-overlapping. First, I suggest adding panels that show all three channels merged. Second, could they analyze the area and area overlap of I-IIA and I-IIB with regard to each other and to Brp, and compare it to cac-GFP? Any speculation as to how the different tags could affect localization? Finally, I recommend moving the dI-IIA and dI-IIB localization data shown in Figure 6N to an earlier figure (Figure 1 or Figure 3).

      We will show panels with all three labels merged as suggested by the reviewer. For the size of the puncta: this could be due to different numbers and types of fluorophores on the different antibodies used and thus different point spread, chromatic aberration, different laser and detector intensities etc. We will re-analyze the data to test whether there are systematic differences in size. We do not want to speculate whether the different tags have any effect on localization precision because of the abovementioned reasons as well as artificial differences in localization precision that can be suggested by different antibodies. We prefer to not move the figure because we believe it is informative to show our finding that active zones usually contain both splice variants together with the finding that only one splice variant is required for PHP.

    1. Author response:

      The following is the authors’ response to the original reviews.

      (1) Please provide more background about Rpgrip1l in the introduction, particularly the past studies of the mammalian homolog of Rpgrip1l, if any. Is there any human disease associated with Rpgrip1l? Do these patients have a scoliosis phenotype?

      • We have added more background on the human ciliopathies caused by RPGRIP1L mutations and on their occasional association with early onset scoliosis (lines 45-54 page 2 in the introduction, see cited references). 

      (2) The allele is a large deficiency of most of the coding region of rpgrip1l, can you give details in the Supplementary data of how you show this by genotyping? It would be good to explain that this mutation is most likely behaving as a null, if you have RNAseq data that supports this please note that. Otherwise, it may be incorrect to assume it is a null allele as your shorthand nomenclature states. If you do not have stronger evidence that the deficiency allele is behaving as a null allele, then please think about using an allele nomenclature as outlined at ZFIN:  

      • We now describe in the results section (Lines 72-76, page 3) the extent of the deletion in rpgrip1l ∆/∆ (22 exons out of 26), which creates an early stop at position 88 of 1256 amino acids. We have submitted our two novel mutant lines to ZFIN: rpgrip1l∆ is recorded as rpgrip1l bps1 and rpgrip1l ex4 as rpgrip1l bps2, and we provide this information in the text. Transcriptomics data confirmed that this allele behaves as a null, as the most down-regulated transcript found in the brain of rpgrip1l ∆/∆ is the rpgrip1l transcript itself (volcano plot in Fig 5A, described in the results, Lines 270-271, page 9).

      • We have also provided in Supplementary Figure 1 A’ a picture of a typical genotyping gel for the rpgrip1l∆ allele. Sequences of both CRISPR guide RNAs and genotyping primers are provided in the Materials and Methods section.

      (3) Throughout the manuscript, the authors refer to zebrafish mutant phenotypes as "juvenile scoliosis". However, scoliosis may not appear until 11 weeks post-fertilization in some animals. After 6-8 weeks of age, it would be more appropriate to describe the phenotype as "late-onset or adult scoliosis" to differentiate between other reported scoliosis mutants (such as hypomorphic or dominant negative alleles of scospondin) that start body curvatures at 3-5 dpf.

      • We think that rpgrip1l-/- scoliosis can be qualified as a “juvenile scoliosis”, as shown by the time course displayed in Fig 1B: rpgrip1l-/- scoliosis develops asynchronously between 4 weeks and 9 weeks (from 0.8 cm/1 cm to 1.6 cm, corresponding to juvenile stages according to Parichy et al, 2009 PMID: 19891001), after which it reaches a plateau. Half of the mutants are already scoliotic by 5 weeks and no scoliosis develops at adult stage, i.e., from 10 weeks on. We have acknowledged the late onset scoliosis in page 3 line 93.

      (4) A more careful demonstration of the individual vertebrae, using magnified high-resolution pictures in Figures 1D-G, should be made to more clearly show no obvious vertebral malformations are present. 

      • We now provide a movie in Sup Data that presents 3D views of controls and mutant spines, which show the intervertebral spaces as well as vertebral shape and size. With these images we could exclude vertebral fusion and the presence of dysmorphic vertebrae.

      (5) On page 5: the authors comment on transgenic expression of RPGRIP1L in foxj1a-lineages as "rescuing" scoliosis. This terminology is confusing, as rescuing a condition could be interpreted as inducing it where it was once absent. "Suppressing" scoliosis may be a more appropriate term. 

      • We agree with the reviewers, the “rescue” term is confusing, we changed it for “suppress” in the title of the paragraph (line 95 page 3) and within the text (line 115 page 3).

      (6) On page 5, lines 155-156: the authors state that "Indeed, no tissue-specific rescue has been performed yet in zebrafish ciliary gene mutants". This is misleading, as ptk7a and katnb1 mutations both disrupt cilia, and transgenic reintroduction of both ptk7a and katnb1 in foxj1a- expressing lineages has previously been shown to suppress cilia defects as well as scoliosis in these models. The statement should be removed for accuracy. 

      • We agree that we were not precise enough in our sentence: when we mentioned “ciliary gene” mutants, we were referring to genes whose products are enriched within cilia and directly affecting ciliogenesis, cilia content and maintenance such as TZ or BBS genes, without encompassing genes like ptk7 and katnb1 whose products perform multiple functions on top of cilia maintenance such as Wnt signalling and remodelling of the whole microtubule network respectively. We have therefore modified our sentence by adding zebrafish ciliary “TZ and BBS” genes (line 104, page 4).

      (7) Figure 2: panels A-B: In the text (line 196) you state that cilia length was increased and that Arl13b content was severely reduced. However, Panel B shows no significant length difference between scoliotic mutants and controls. This statement and graph should be corrected for accuracy. Also, the Arl13b staining is difficult to see in panel A - can channels be split, and/or quantified? 

      • We have now split the Arl13b and glutamylated tubulin channels (Fig 2 A-C”). We think that the reduction of Arl13b staining intensity is now obvious in both straight and scoliotic mutants (Compare 2A” with 2B” and 2C”). We were not able to quantify Arl13b staining using ciliary masks from glutamylated tubulin staining since the two stainings only partially overlap along the length of the cilium, Arl13b being more distal than glutamylated tubulin (Fig 2A’).

      • Ciliary length was significantly increased (from 3.4 to 5.3 µm) in straight rpgrip1l-/-, while the values for scoliotic rpgrip1l-/- were heterogeneous (mean 4.1 µm) and therefore not significantly different when compared to controls. This heterogeneity stems from the combined presence of both shorter and longer cilia in scoliotic fish, a finding we interpret as the potential breakage over time of the extra-long and thin cilia observed in scoliotic fish (as in Sup figure 1 H’’’, Sup Fig 2M’ and 2O’).

      • We changed the text to be more accurate: we now state that cilia length increased in straight mutants, and became more heterogeneous than controls in scoliotic mutants (line 143-144, page 5).

      (8) Figure 3: Page 7, line 206: authors state that SCO-spondin secreting cells varied in number along SCO length. What is the evidence that these cells secrete SCO-spondin? The staining shown in Figure 3L-O appears to demonstrate extracellular accumulation of sspo:GFP. What is the evidence that this staining originated from cells in proximity to it? 

      The claim of SCO-secreting cells in Figure 2E-J is confusing. I assume you are using anatomy to infer the SCO is captured in these sections. This should be done in sspo-GFP animals (as in Figure 3) and/or dual anti-body labeling can be done to show SCO-secreting cells and cilia. 

      • We now show in Supplementary Figure 2 A-D a double staining for Sco-spondin-GFP and cilia (Ac-tub, Glu-Tub). Analyzing GFP staining along SCO length on successive sections, we identified the SCO producing cells on the diencephalic dorsal midline by their position under the posterior commissure (PC, which forms an Acetylated Tubulin positive arch), and counted the nuclei surrounded by cytoplasmic GFP from the most anterior region (24 cells wide, Sup Fig 2A-A’) to the most posterior region (4-8 cells wide, Sup Fig 2 C).

      • Furthermore, the close-ups presented on Fig 2A’ and 2B’ allow detection of the cytoplasmic Sspo-GFP staining around SCO nuclei, above the region presenting primary cilia pointing towards the diencephalic ventricle, both in controls and mutants at scoliosis onset (tail-up mutants), showing that the extracellular staining in B’ very likely originates from these cells. In these tail-up mutants, extracellular Sspo aggregates have not yet filled the whole diencephalic ventricle as in Fig 3 N and Q.

      (9) Figure 5: Is the transcriptome data and proteomic data consistent for any transcripts and encoded protein products? Please highlight those consistent targets in both analyses. 

      • We would like to emphasize that the transcriptomic study was performed at scoliosis onset, at 5 weeks, while the proteomics analysis was performed at adult stage (3 months) so they cannot be directly compared.

      Moreover, low abundance proteins (such as centrosomal proteins and transcription factors like Foxj1a) are not detected by label-free proteomics without a prior subcellular fractionation procedure (Lindemann et al, 2017 PMID: 28282288). The extraction protocol also does not allow purification of short neuropeptides such as Urp1-2.

      Nevertheless, we found four targets in common, now highlighted in red in Fig 5, Panel E: Anxa2, complement proteins C4 and C7a, and Stat3, all related to immune response, a GO term enriched in both studies as explained in the text (Lines 308-311, page 10).

      The absence of many inflammation markers or immune response proteins at adult stage in scoliotic mutants most probably indicates a transient inflammatory episode at scoliosis onset, while astrogliosis, as detected by GFAP staining, increases with scoliosis severity. Along the same lines, the two-fold increase of Lcp1 cells within the tectum is present before axis curvature (in straight mutants) and disappears in scoliotic fish (Graph G in Sup Figure S5) as explained in the text, Lines 378-381, page 12.

      (10) Supplementary Figure 1 F-H: What stage/age samples were used for SEM? It is only stated that they were 'adults'. It is also stated that cilia tufts in straight rpgrip1l-/- fish were morphologically normal but 'less dense'- this was not obvious from the figure. Can density be quantified? (otherwise, data does not support the statement). Similarly, can the statement that "cilia of mono-ciliated ependymal cells showed abnormal irregular structures compared to controls, with either bulged or thinner parts" be supported with measurements/quantification? 

      • The SEM study was performed on 3-month-old fish, 3 controls and 5 mutants. We added this information in the figure legend. We could not quantify the number of ciliary tufts in the brain ventricle of the sole straight mutant that was analyzed. We therefore removed the statement that cilia were less dense in the straight mutant. Along the same lines, we mentioned that we could find mutant cilia of irregular shape, as shown in Supplementary Figure S1, F’’, G’’, H’’ and H’’’ (page 4, lines 124-129).

      (11) Supplementary Figure 1D-E is never mentioned in the text. The Supplemental Figure legend also refers to a graph of cilia length that is not in the figure itself. As a result, many of the subsequent panel references are out of register. 

      • We now provide the correct version of the legend and refer to Sup Fig 1D-E in the text (page 3, lines 79-81) and its legend, page 53, lines 1616-1620.

      (12) Supplementary Figure 2A-F: Of interest, in panels C and F, it looks as though sspo:GFP is accumulating on cilia within the ventricles of rpgrip1l mutants. Can this be explored? Is it possible that abnormal aggregation of SSPO on cilia is ultimately leading to cilia loss, as you report for multi-ciliated cells surrounding the subcommissural organ? This could be a very interesting finding and possible mechanism for cilia loss.

      • Our observation of all brain sections led us to conclude that the majority of Sspo-GFP aggregates were floating within the brain ventricles of rpgrip1l-/- fish, while a portion of the aggregates were stuck on ventricle walls, in close contact with cilia, as now shown in Supplementary Figure S2 B’ and outlined in the legend (page 54, lines 1634-1637). We agree that the contact between Sspo aggregates and cilia might have damaging consequences, either on cilia maintenance or on the induction of an immune reaction, and we now mention these possibilities in the discussion (page 16, lines 524-526). These research lines will be explored in the near future.

      (13) Supplementary Figure 5A-F is not mentioned in the manuscript. Please clarify the role of Anxa2 in neuroinflammation. Is increased Anxa2 expression in rpgrip1l mutant zebrafish reduced after anti-inflammatory drug treatment? What is the expression level of anxa2 in cep290 mutant zebrafish? 

      • We have now added a mention of Supplementary Figure 5A-F in the text (page 10, lines 328-331).

      • After performing GFAP and Lcp1 staining, we unfortunately did not have enough histological material left to test Anxa2 staining on NACET-treated fish, nor to perform dilatation measurements or multiciliated cell quantification. We agree this would have helped to better define which defect might be an indirect consequence of an inflammatory environment.

      • We tested the expression level of Anxa2 in cep290-/- fish. No labelling above control level was detected on cep290-/- brain sections that were positive for GFAP (N = 5). As GFAP staining in 3-4 weeks cep290-/- was not as intense and widespread as in adult rpgrip1l-/- (50% of GFAP + cells compared to 100% in the SCO for example), we concluded that Anxa2 expression may be upregulated after widespread or long-term astrogliosis/inflammation. Alternatively, Anxa2 overexpression could be specific to rpgrip1l-/- fish. 

      (14) A summary diagram at the end would be helpful for understanding the main findings. 

      We added a Graphical Abstract summarizing the main conclusions and hypotheses of this study. It is mentioned and explained in the Discussion section, p. 16 lines 504-508 and 516-529. 

      (15) The sspo-GFP zebrafish line should be listed in the STAR methods section: 

      The sspo-GFP line is now listed in the STAR methods, Scospondin-GFPut24, (Troutwine et al., 2020 PMID: 32386529), p.43, last line.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the role of chirping in a species of weakly electric fish. They subject the fish to various scenarios and correlate the production of chirps with many different factors. They find major correlations between the background beat signals (continuously present during any social interactions) or some aspects of social and environmental conditions with the propensity to produce different types of chirps. By analyzing more specifically different aspects of these correlations they conclude that chirping patterns are related to navigation purposes and the need to localize the source of the beat signal (i.e. the location of the conspecific).

      The study provides a wealth of interesting observations of behavior and much of this data constitutes a useful dataset to document the patterns of social interactions in these fish. Some data, in particular the high propensity to chirp in cluttered environments, raises interesting questions. Their main hypothesis is a useful addition to the debate on the function of these chirps and is worth considering and exploring further.

      After the initial reviewers' comments, the authors performed a welcome revision of the way the results are presented. Overall the study has been improved by the revision. However, one piece of new data is perplexing to me. The new Figure 7 presents the results of a model analysis of the strength of the EI caused by a second fish to localize when the focal fish is chirping. From my understanding of this type of model, EOD frequency is not a parameter in the model since it evaluates the strength of the field at a given point in time. Therefore the only thing that matters is the phase relationship and strength of the EOD. Assuming that the second fish's EOD is kept constant and the phases relationship is also the same, the only difference during a chirp that could affect the result of the calculation is the potential decrease in EOD amplitude during the chirp. It is indeed logical that if the focal fish decreased its EOD amplitude the target fish's EOD becomes relatively stronger. Where things are harder to understand is why the different types of chirps (e.g. type 1 vs type 2) lead to the same increase in signal even though they are typically associated with different levels of amplitude modulations. Also, it is hard to imagine that a type 2 chirps that is barely associated with any decrease in EOD amplitude (0-10% maybe), would cause doubling of the EI strength. There might be something I don't understand but the authors should provide a lot more details on how this result is obtained and convince us that it makes sense.

      We thank the Reviewer for the comments and we agree that the approach could have been better detailed. As anticipated by the Reviewer, the Boundary Element Method (BEM) model can be used simply to calculate the electric field and electric image at a specific point in time (instantaneously), regardless of EOD frequency. However, our model allows for the concatenation of consecutive instants and thus is able to render an entire sequence of electric fields - and resulting electric images - incorporating realistic EOD characteristics such as shape, duration, and frequencies (see Pedraja et al., 2014).

      Chirp-triggered EIs were modeled using real chirps produced by interacting fish. Each chirp was thus associated with its duration and peak parameters, as well as the fish positional information (distance and angle).

      However, since we did not know the beat phase at which chirps were produced, we computed electric images for each fish position and chirp scenario by simulating four equally spaced phases. These are intended as phases of the sender EOD and simply refer to the initial offset between the two interacting EODs. Moreover, since our simulations were run over a time window of 500 ms, all phases are likely to be covered, with a different temporal order relative to the chirp (always centered within the 500 ms window).

      The simulation was run maintaining consistent timing for both chirp and non-chirp conditions, across approximately 800 body nodes. At each node, the current flow was calculated from the peak-to-peak of the EOD sum (i.e. the point-to-point of the difference between the beat positive and negative envelopes). Analyzing the EIs over this fixed time window enables us to assess the unitary changes of current flow induced by chirps over units of time (ΔI/Δt). From this, we can calculate a cumulative sum of current flow changes - expressed as delta(EI) and use it to show the effect of the chirps on the spatiotemporal EI (Figure 7C).

      One can express this cumulative change mapped onto the fish body (keeping the 800 points separated, as in Figure 7C) or further sum the current changes to obtain a single total (as shown in Figure 7D).

      One can check this with a rough calculation: summing over a set of, for example, 500 of the 800 points (judging from the size of the blue areas in C, not all 800 points show a detectable change), each valued at 0.1 to 0.3 mA/s, gives circa 100 mA/s, which is what is shown in D. (is this what is happening?)
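
      The order-of-magnitude check above can be written out as a short calculation. This is only a sketch of the arithmetic, assuming (as the response suggests) that roughly 500 of the approximately 800 body nodes carry a detectable change of 0.1 to 0.3 mA/s each. The function name and the node count of 500 are illustrative and are not taken from the model code.

```python
def summed_change(n_nodes, per_node_change):
    """Total change (mA/s) if n_nodes nodes each contribute per_node_change."""
    return n_nodes * per_node_change


# Assumed number of nodes with a detectable change (out of ~800 body nodes).
n_active = 500

low = summed_change(n_active, 0.1)   # every node at the lower bound
mid = summed_change(n_active, 0.2)   # every node at the midpoint
high = summed_change(n_active, 0.3)  # every node at the upper bound

print(f"{low:.0f} to {high:.0f} mA/s, midpoint {mid:.0f} mA/s")
```

      Under these assumptions the summed change falls between 50 and 150 mA/s, with a midpoint of 100 mA/s, which matches the order of magnitude quoted for panel D.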

      We do not know why chirps of different types triggered similar effects. It is possible that, since EI measurements are pooled over several chirps produced at different angles and distances, in case of a lower amount of chirps considered for a given type (as in the case of rises, very low) these measurements may not highlight more marked differences among types. In a publication we are currently working on, we are considering a larger dataset to better assess these results.

      The methods section has been edited to clarify the approach (not yet).

      Reviewer #2 (Public Review):

      Studying Apteronotus leptorhynchus (the weakly electric brown ghost knifefish), the authors provide evidence that 'chirps' (brief modulations in the frequency and amplitude of the ongoing electric signal) function in active sensing (specifically homeoactive sensing) rather than communication. Chirping is a behavior that has been well studied, including numerous studies on the sensory coding of chirps and the neural mechanisms for chirp generation.

      Chirps are largely thought to function in communication behavior, so this alternative function is a very exciting possibility that could have a great impact on the field.

      We thank the Reviewer for the extensive and constructive comments. We would like to add that, while it is true that many detailed studies have been published on the anatomy and physiology of the circuits implicated in the production and modulation of “electric chirps”, most of this research assumed, and focused exclusively on, their possible role in communication. In addition, most behavioral studies did the same, and a meta-analysis of the existing literature on chirping allows one to trace the communication idea back mainly to two studies: Hagedorn and Heiligenberg, 1985 (“Court and spark: electric signals in the courtship and mating of gymnotoid fish”) and Hopkins, 1974 (“Electric Communication: Functions in the Social Behavior of Eigenmannia Virescens”). Importantly, in these studies only contextual observations were made (no playback experiments or other attempts to analyze more quantitatively the correlation of chirping with other behaviors).

      The authors do provide convincing evidence that chirps may function in homeoactive sensing. However, their evidence arguing against a role for chirps in communication is not as strong, and fails to sufficiently consider the evidence from a large body of existing research. Ultimately, the manuscript presents very interesting data that is sure to stimulate discussion and follow-up studies, but it suffers from dismissing evidence in support of, or consistent with, a communicative function for chirps.

      Although the tone of some statements present in our earlier draft may suggest otherwise, through our revisions we have made an effort to clarify that we do not intend to dismiss a function of chirps in communication. We only intend to debate and discuss valid alternative hypotheses, advanced from reasonable considerations.

      Before writing this manuscript, we attempted to survey all the existing literature on chirps (including studies focused on behavior, peripheral sensory physiology as well as brain physiology). Although it is not unlikely that some studies have eluded our attention, an effort at a comprehensive review was made. Based on this survey we realized that none of the studies provided a clear and unambiguous piece of evidence to support the communication hypothesis (we refer here to the weak points highlighted in the discussion and mentioned in the previous comment), which in fact does not come without its weak points and contradictions (see later comments).

      What follows is a summary of the mentions of the communication theory in the different sections of the manuscript, including several edits we have applied in response to the Reviewer’s concern:

      In the abstract we clearly state that we are considering an alternative that is hypothetically, not certainly, complementary. Nonetheless, we have identified a couple of instances that could sound dismissive of the “communication hypothesis” in the following section.

      In the introduction we write in fact about the possibility of interference between communication signals and conspecific electrolocation cues, as they are both detected as beat perturbations. We did not mean to use “Interference” here as “reciprocal canceling”, rather we intended it as “partial or more or less conspicuous overlap” in the responses triggered in electroreceptors.

      Hoping to convey a clearer message, we have edited the related statement and changed it to “both types of information are likely to overlap and interact in highly variable ways”.

      We have also removed the statement: “According to this idea, beats and chirps are not only detected through the same input channel, but also used for the same purpose.” as at this point in the manuscript it may be too strong.

      In the results section we do not include statements that might be seen as dismissive of the communication hypothesis but only statements in support of the “probing with chirps” idea (which is the central hypothesis of the study).

      In the discussion paragraphs we elaborate on why the current functional view is either flawed or incomplete (first paragraph, “existing functional hypotheses”). Namely: 1) multiple triggering factors implied in chirp responses covary and need to be disentangled (for example DF/sex); 2) findings on brown ghosts and a few other gymnotiforms have been used to advance the hypothesis of “communication through chirps” in all weakly electric fish (including pulse species); 3) social encounters, in which chirps are recorded, also involve other behaviors (such as probing) which have not been considered so far (this point is related to the first one on covariates); 4) most studies referring to big chirps as courtship chirps were not done in reproductive animals (added now); and 5) no causal evidence has been provided so far to justify a role of chirps in social communication.

      We are discussing these points as challenges to the communication hypothesis, not to dismiss the hypothesis, but rather to motivate future studies addressing these challenges.

      We do not want to appear dismissive of the communication hypothesis and had therefore previously edited the manuscript to avoid the impression of exclusivity of the probing hypothesis. We have now gone over the manuscript once more and edited several sentences. Nevertheless, we want to point out again that - despite the large consensus - the communication hypothesis has, until now, never been investigated with the kind of rigor applied here.

      The authors do acknowledge that chirps could function as both a communication and homeoactive sensing signal, but it seems clear they wish to argue against the former and for the latter, and the evidence is not yet there to support this.

      In both rounds of revision we have made an effort to convey a more inclusive interpretation of our findings. We tried our best to express our ideas as hypothetical, not as proof that communication through chirps does not exist. The aim of this study is to propose an alternative view, and this cannot be done without underlining the weak points of an existing hypothesis while providing and supporting reasonable arguments in favor of the alternative we advance. The actual evidence for a role of chirping in communication is much less strong than appears from the pure number of articles that have discussed chirps in this context.

      Regarding the weak evidence against communication, here we can list a few additional important points related to the proposed interpretations of chirp function (more specific than those made earlier):

      (1) A formally sound assessment of signal value/meaning - as typically done in animal communication studies should involve: 

      a) the isolation of a naturally occurring signal and determination of the context in which it is produced 

      b) the artificial replication of the signal

      c) the observation that such mimic is capable of triggering reliable and stereotyped responses in a group of individuals (identified by sex and/or species) under the same conditions (conditioned, unconditioned, state-dependent, etc.). As discussed for instance in Bradbury and Vehrencamp, 2011; Laidre and Johnstone, 2013; Wyatt, 2015; Rutz et al., 2023.

      This approach has so far not been applied to weakly electric fish. The initial purpose of the present study was in fact to conduct this type of validation.

      (2) The hypothesis of chirps used for DF-sign discrimination - for “social purposes” - although plausible in the face of theoretical considerations, does not seem to be reasonable in practice, when one considers emission rates of 150 chirps per minute. We do find a strong correlation of chirp type with DF, which is often very abrupt and sudden (as if the fish were tracking beat frequency to guess its value) but the consideration made above on chirp rates seems to discourage this interpretation.

      (3) The hypothesis of chirp-patterning (i.e. chirping may have meaning based on the sequence of chirps of different types, a bit like syllables in birdsongs) - assessed by only one study conducted in our group - has not been sufficiently substantiated by replication. We have surveyed all possible combinations of chirps produced by interacting pairs in different behavioral conditions using different values for chirp sequence size: 2, 3,... ,8 chirps (both considering the sender alone as well as sender+receiver together). In all cases we found no evidence for a context-dependent “modulation” of chirp types (i.e. no specific chirp type sequence in specific contexts).

      (4) The hypothesized role of “large chirps” as courtship signals could easily be criticized by noting the symmetrical distribution of these events around a DF of 0 Hz. One could invoke a failure to discriminate DF-sign to explain this well-known pattern. However, we know from Walter Heiligenberg’s work and physiological considerations that such a task can be solved easily through t-units and … in principle even just by motion (which would change the EOD phase in frequency-dependent ways, thus potentially revealing the DF sign).

      Overall, these considerations made us think that chirping certainly occurs in a social context, but it is the meaning of this behavior that remains elusive. We noticed that environmental factors are also strongly implicated … we then formulate an alternative hypothesis to explain chirping, but we do so without dismissing the communication idea.

      All this seems to us just a careful way to critically discuss our results and those of other studies, without considering the issue resolved.

      In the introduction, the authors state, "Since both chirps and positional parameters (such as size, orientation or motion) can only be detected as perturbations of the beat, and via the same electroreceptors, the inputs relaying both types of information are inevitably interfering." I disagree with this statement, which seems to be a key assumption. Both of these features certainly modulate the activity of electroreceptors, but that does not mean those modulations are ambiguous as to their source. You do not know whether the two types of modulations can be unambiguously decoded from electroreceptor afferent population activity.

      We thank the Reviewer for noting this imprecision. We have addressed the Reviewer’s concern in another reply (see above).

      My biggest issue with this manuscript is that it is much too strong in dismissing evidence that chirping correlates with context. In your behavioral observations, you found sex differences in chirping as well as differences between freely interacting and physically separated fish. Chirps tended to occur in close proximity to another fish. Your model of chirp variability found that environmental experience, social experience, and beat frequency (DF) are the most important factors explaining chirp variability. Are these not all considered behavioral or social context? Beat frequency (DF) in particular is heavily downplayed as being a part of "context" but it is a crucial part of the context, as it provides information about the identity of the fish you're interacting with. The authors show quite convincingly that the types of chirps produced do not vary with these contexts, but chirp rates do.

      We believe the “perceived claim” may be an issue of unclear writing. We have now tried to better clarify that “context” affects chirp rates, but it does not affect chirp types as much (except when beat frequency is high).  

      We have edited two statements possibly susceptible to misinterpretation: 

      (1) In the results: “It also indicates that chirp parameters such as duration and FM do not seem to be associated with any particular context in a meaningful way, other than being affected by beat frequency.”

      (2) In the discussion: the statement

      “Recordings from interacting fish pairs confirmed the absence of any significant correlation between chirp type choice and behavioral context (Figure S2) although the variance of chirp parameters appears to be significantly affected by this factor (Figure 2). This may suggest that the effect of behavioral context is mainly detectable in the number of chirps produced (Figure S1), rather than the type (Figure S2).”

      has been changed to:

      “Recordings from interacting fish pairs confirmed the absence of any significant correlation between chirp type choice and behavioral context, except for those cases characterized by higher beat frequencies  (Figure S2). This suggests that the effect of behavioral context highlighted in our factor analysis (Figure 2) is mainly due to the number of chirps produced (Figure S1), rather than their type (Figure S2).”

      Eventually, in the results we emphasize the relatively higher impact of previously unexplored factors on chirp variance: “The plot of individual chirps (Figure 2C) shows the presence of clustering around different categorical variables and it reveals that experience levels or swimming conditions are important factors affecting chirp distribution (note for instance the large central “breeding” cluster in which fish are divided and the smaller ones in which fish are free). Sender or receiver identity does not individuate any clear clustering relative to either sex (see the overlap of male_s/male_r and female_s/female_r) or social status (dominant/subordinate). Chirps labeled based on tank experience (i.e. resident vs intruder) are instead clearly separated.”.

      Further, in your playback experiments, fish responded differently to small vs. large DFs, males chirped more than females, type 2 chirps became more frequent throughout a playback, and rises tended to occur at the end of a playback. These are all examples of context-dependent behavior.

      We do note that male brown ghosts chirp more than females. But we also say - and show in Figure 8 - that males move more in proximity to and around conspecifics. We do acknowledge that chirp time-course may be different during playbacks in a type-dependent manner. But how this can support the communication hypothesis - or other alternatives - is unclear. This result could equally imply the use of different chirp types for different probing needs. Since we cannot be sure about either, we do not want to put too much emphasis on it. Ultimately, the fact that “context” (here meant broadly to define different experimental situations in which social but also physical and environmental parameters are altered) affects chirping is undeniable: cluttered and non-cluttered environments do represent different contexts which differently affect chirping in conspicuous ways.

      In the results, the authors state, "Overall, the majority of chirps were produced by male subjects, in comparable amounts regardless of environmental experience (resident, intruder or equal; Figure S1A,C), social status (dominant or subordinate; Figure S1B) or social experience (novel or experienced; Figure S1D)." This is not what is shown in Figure S1. S1A shows clear differences between resident vs. intruder males, S1B shows clear differences between dominant vs. subordinate males, and S1D shows clear differences between naïve and experienced males. The analysis shown in Figure 2 would seem to support this. Indeed, the authors state, "Overall, this analysis indicated that environmental and social experience, together with beat frequency (DF) are the most important factors explaining chirp variability."

      The Reviewer is right in pointing out this imprecise reference and we are grateful to the Reviewer for spotting this incongruence. The writing probably refers to an earlier version of the figure in which data were grouped and analyzed differently. We have now edited the text and changed it to: “Overall, the majority of chirps were produced by male subjects, at rates that seemed affected by environmental experience (resident, intruder or equal; Figure S1A,C), social status (dominant or subordinate; Figure S1B) and social experience (novel or experienced; Figure S1D).”

      The choice of chirp type varied widely between individuals but was relatively consistent within individuals across trials of the same experiment. The authors interpret this to mean that chirping does not vary with internal state, but is it not likely that the internal states of individuals are stable under stable conditions, and that individuals may differ in these internal states across the same conditions? Stable differences in communication signals between individuals are frequently interpreted as reflecting differences between those individuals in certain characteristics, which are being communicated by these signals.

      It seems here we have been unclear in the writing: while it is true that behavioral states are stable and can imply stable chirp patterning (if the two are related), since chirp types vary abruptly and in a reliable DF-dependent manner, different types of chirps are unlikely to be matched to different internal states following the same temporal order in such a reliable way (similarly repeated through consecutive trials).

      This would imply the occurrence of different internal states in rapid sequence, reliably triggered by repeated EOD ramps, regardless of whether the playback is 20 sec long or 180 sec long.

      We have edited this paragraph to better explain this: “The reliability by which the chirping response adapts to both the rate and direction of beat frequency is variable across individuals but rather stable across trials (relative to a given subject), further suggesting that chirp type variations may not reflect changes in internal states or in the animal motivation to specific behavioral displays (which are presumably subject to less abrupt variations and stereotypical patterning based on DF).”

      I am not convinced of the conclusion drawn by the analysis of chirp transitions. The transition matrices show plenty of 1-2 and 2-1 transitions occurring.

      The only groups in which 1-2 and 2-1 transitions are as frequent as 1-1 and 2-2 (1 and 2 being the numerical IDs of the two interacting fish) are F-F pairs. This is a result of the fact that in females chirp rates are so low that within-fish correlations end up being as low as between-fish correlations. We believe the impression of the Reviewer could be due to the fact that these are normalized maps (see legend of Figure 5A-B).

      Further, the cross-correlation analysis only shows that chirp timing between individuals is not phase-locked at these small timescales. It is entirely possible that chirp rates are correlated between interacting individuals, even if their precise timing is not.

      We agree with the Reviewer, this is a possibility. To address this point, we did edit the results section to acknowledge that what we see may be related to the time window chosen (i.e. 4 sec):

      “More importantly, they show that - at least in the social conditions analyzed here and within small-sized time windows - chirp time series produced by different fish during paired interactions are consistently independent of each other.”
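
      The distinction discussed here, between precise chirp timing and slower co-variation of chirp rates, can be illustrated by comparing a short-lag histogram of chirp times with a correlation of binned chirp counts. The sketch below is illustrative only: the function names, the default 4 s window and the 10 s rate bins are assumptions, and it is not the analysis code used in the study.

```python
from bisect import bisect_left, bisect_right
from math import ceil, sqrt


def lag_histogram(times_a, times_b, window=4.0, bin_width=0.5):
    """Histogram of lags (b - a) for all chirp pairs within +/- window seconds."""
    times_b = sorted(times_b)
    n_bins = int(round(2 * window / bin_width))
    counts = [0] * n_bins
    for t in times_a:
        lo = bisect_left(times_b, t - window)
        hi = bisect_right(times_b, t + window)
        for tb in times_b[lo:hi]:
            idx = int((tb - t + window) / bin_width)
            counts[min(idx, n_bins - 1)] += 1  # a lag of exactly +window goes in the last bin
    return counts


def binned_counts(times, duration, bin_width=10.0):
    """Number of chirps per time bin, used as a coarse chirp-rate estimate."""
    n_bins = ceil(duration / bin_width)
    counts = [0] * n_bins
    for t in times:
        counts[min(int(t // bin_width), n_bins - 1)] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


# Hypothetical chirp times (s) for two fish in a 60 s recording.
fish_a = [1.0, 2.5, 4.0, 31.0, 33.0, 58.0]
fish_b = [7.5, 8.0, 9.5, 37.0, 39.5, 52.0]

short_lags = lag_histogram(fish_a, fish_b)
rate_r = pearson(binned_counts(fish_a, 60.0), binned_counts(fish_b, 60.0))
print(short_lags, rate_r)
```

      A flat short-lag histogram together with a clearly positive rate correlation would correspond to the scenario raised by the Reviewer, in which chirp rates co-vary even though individual chirps are not time-locked.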

      Further, it is not clear to me how "transitions" were defined. The methods do not make this clear, and it is not clear to me how you can have zero chirp transitions between two individuals when those two individuals are both generating chirps throughout an interaction.

      We thank the Reviewer for bringing up this unclear point. We have now clarified how transitions were calculated in the method section: “The number of chirp transitions present in each recording (dataset used for Figures 1, 2, 5) was measured by searching in a string array containing the 4 chirp types per fish pair, all their possible pairwise permutations (i.e. all possible permutations of 4+4=8 elements are: 1-1, 1-2, 1-3 … 7-6, 7-7, 7-8; considering the following legend 1 = fish1 type 1, 2 = fish 1 type 2, 3 = fish1 type 3 … 6 = fish2 type 2, 7 = fish2 type 3 and 8 = fish2 rise).”.

      Zero transitions are possible if two fish (or groups of fish) do not produce chirps of all types. Only transitions of produced types can be counted.
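
      The counting described above can be sketched as follows. This is a minimal illustration that assumes a transition is a pair of consecutive chirps in the chronological sequence pooled across both fish, with labels 1 to 8 following the quoted legend (label 4 is taken to be the fish 1 rise, by analogy with label 8). The function names are hypothetical and do not come from the authors' code.

```python
from collections import Counter

N_LABELS = 8  # 4 chirp types x 2 fish, following the legend quoted above


def label(fish, chirp_type):
    """Map fish (1 or 2) and chirp type (1-3, or 4 for a rise) to a label in 1..8."""
    return (fish - 1) * 4 + chirp_type


def count_transitions(labels):
    """Count ordered pairs of consecutive labels in a chronological chirp sequence."""
    return Counter(zip(labels, labels[1:]))


def transition_matrix(labels):
    """8 x 8 matrix of counts; row = first chirp label, column = following chirp label."""
    counts = count_transitions(labels)
    return [
        [counts.get((i, j), 0) for j in range(1, N_LABELS + 1)]
        for i in range(1, N_LABELS + 1)
    ]


# Hypothetical recording in which both fish produce only type 2 chirps.
sequence = [label(1, 2), label(1, 2), label(2, 2), label(1, 2)]
matrix = transition_matrix(sequence)
for row in matrix:
    print(row)
```

      In this example only the cells involving labels 2 and 6 are non-zero. Every cell involving a chirp type that was never produced stays at zero, which is how zero-count transitions arise even when both fish chirp throughout the interaction.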

      In the results, "Although all chirp types were used during aggressive interactions, these seemed to be rather less frequent in the immediate surround of the chirps (Figure 6A)." A lack of precise temporal correlation on short timescales does not mean there is no association between the two behaviors. An increased rate of chirping during aggression is still a correlation between the two behaviors, even if chirps and specific aggressive behaviors are not tightly time-locked.

      The Reviewer is right in pointing out the limited temporal scaling of our observations/analysis. We have now edited the last paragraph of the results related to figure 6 to include the possibility mentioned by the Reviewer: “The significantly higher extent of chirping during swimming and locomotion, consistently confirmed by 4 different approaches (PSTH, TM, CN, MDS), suggests that - although chirp-behavior correlations may exist at time-scales larger than those here considered - chirping may be linked more strongly with scanning and environmental exploration than with a particular motivational state, thus confirming findings from our playback experiments.”

      The Reviewer here raises an important point; however, due to space limitations, we have considered only a sub-second scale. Most playback experiments in weakly electric fish involved the use of EOD mimics for a few tens of seconds - to avoid habituation in the fish behavioral responses - while inter-chirp intervals usually range from a few hundred milliseconds to seconds (depending on how often a fish would chirp). This suggested to us that a 4 second time window may not be a bad choice to start with.

      In summary, it is simply too strong to say that chirping does not correlate with context, or to claim that there is convincing evidence arguing against a communication function of chirps. Importantly, however, this does not detract from your exciting and well-supported hypothesis that chirping functions in homeoactive sensing. A given EOD behavior could serve both communication and homeoactive sensing. I actually suspect this is quite common in electric fish (both gymnotiforms and mormyrids), and perhaps in other actively sensing species such as echolocating animals. The two are not mutually exclusive.

      We agree with the Reviewer that context - broadly speaking - does affect chirping (as we mentioned above). We hope we have improved the writing and clarified that we do not dismiss communication functions of chirping, but we do lean towards electrolocation based on the considerations made above and our results.

      We do conclude the manuscript remarking that communication and electrolocation are not mutually exclusive: “probing cues could function simultaneously as proximity signals to signal presence, deter approaches, or coordinate behaviors like spawning, if properly timed (Henninger et al., 2018).” (see the conclusion paragraph of the discussion).

      Therein, we further add “These findings aim to stir the pot and initiate a discussion on possible alternative functions of chirps beyond their presumed communication role.”.

      With this, we hope we’ve made it clear how we intend our manuscript to be read.

      Reviewer #3 (Public Review):

      Summary:

      This important paper provides the best-to-date characterization of chirping in weakly electric fish using a large number of variables. These include environment (free vs divided fish, with or without clutter), breeding state, gender, intruder vs resident, social status, locomotion state and social and environmental experience, without and with playback experiments. It applies state-of-the-art methods for reducing the dimensionality of the data and finding patterns of correlation between different kinds of variables (factor analysis, K-means). The strength of the evidence, collated from a large number of trials with many controls, leads to the conclusion that the traditionally assumed communication function of chirps may be secondary to its role in environmental assessment and exploration that takes social context into account. Based on their extensive analyses, the authors suggest that chirps are mainly used as probes that help detect beats caused by other fish as well as objects.

      Strengths:

      The work is based on completely novel recordings using interaction chambers. The amount of new data and associated analyses is simply staggering, and yet, well organized in presentation. The study further evaluates the electric field strength around a fish (via modelling with the boundary element method) and how its decay parallels the chirp rate, thereby relating the above variables to electric field geometry.

      The main conclusions are that the lack of any significant behavioural correlates for chirping, and the lack of temporal patterning in chirp time series, cast doubt on a primary communication goal for most chirps. Rather, the key determinants of chirping are the difference frequency between two interacting conspecifics as well as individual subjects' environmental and social experience. The paper concludes that there is a lack of evidence for stereotyped temporal patterning of chirp time series, as well as of sender-receiver chirp transitions beyond the known increase in chirp frequency during an interaction.

      These conclusions by themselves will be very useful to the field. They will also allow scientists working on other "communication" systems to perhaps reconsider and expand the goals of the probes used in those senses. A lot of data are summarized in this paper, with thorough referencing to past work.

      The alternative hypotheses that arise from the work are that chirps are mainly used as environmental probes for better beat detection and processing and object localization, and in this sense are self-directed signals. This led to their prediction that environmental complexity ("clutter") should increase chirp rate, which in fact was revealed by their new experiments. The authors also argue that waveform EODs have less power across high spatial frequencies compared to pulse-type fish, with a resulting relatively impoverished power of resolution. Chirping in wave-type fish could temporarily compensate for the lower frequency resolution while still being able to resolve EOD perturbations with a good temporal definition (which pulse-type fish lack due to low pulse rates).

      The authors also advance the interesting idea that the sinusoidal frequency modulations caused by chirps are the electric fish's solution to the minute (and undetectable by neural wetware) echo-delays available to it, due to the propagation of electric fields at the speed of light in water. The paper provides a number of experimental avenues to pursue in order to validate the non-communication role of chirps.

      We thank the reviewer for the kind assessment.

      Weaknesses:

      My main criticism is that the alternative putative role for chirps as probe signals that optimize beat detection could be better developed. The paper could be clearer as to what that means precisely, especially since beating - and therefore detection of some aspects of beating due to the proximity of a conspecific - most often precedes chirping. One meaning the authors suggest, tentatively, is that the chirps could enhance electrosensory responses to the beat, for example by causing beat phase shifts that remediate blind spots in the electric field of view.

      We agree with the Reviewer that a better and more detailed explanation of how beat processing for conspecific electrolocation may be positively affected by chirps would be important to provide. We are currently working on a follow-up manuscript in which we intend to include these aspects. For space limitations and readability we had to discard from the current manuscript a lot of results that could further clarify these issues.

      A second criticism is that the study links the beat detection to underwater object localization. The paper does not significantly develop that line of thought given their data - the authors tread carefully here given the speculative aspect of this link. It is certainly possible that the image on the fish's body of an object in the environment will be slightly modified by introducing a chirp on the waveform, as this may enhance certain heterogeneities of the object in relation to its environment. The thrust of this argument derives mainly from the notion of Fourier analysis with pulse type fish EOD waveforms (see above, and radar theory more generally), where higher temporal frequencies in the beat waveform induced by the chirp will enable a better spatial resolution of objects. It remains to be seen whether experiments can show this to be significant.

      Perhaps the Reviewer refers to the last discussion paragraph before the conclusions in which we mention the performance of pulse or wave-type EODs in electrolocation (referring here to ideas illustrated in a recent review by Crampton, 2019). We added to this paragraph a statement which could better clarify that we do not propose that chirping could enhance object electrolocation. What we mean is that, in a context in which object electrolocation occurs through wave-type EODs - given the generally lower performance of such narrow-band signals in resolving the spatial features of any object, even a 3D electric field  - chirping could improve beat detection during social encounters by increasing the amount of information obtained by the fish.

      The edited paragraph now reads: “While broadband pulse signals may be useful to capture highly complex environments rich in foliage, roots and other structures common in vegetation featuring the more superficial habitats in which pulse-type fish live, wave-type EODs may be a better choice in the relatively simpler river-bed environments in which many wave-type fish live (e.g., the benthic zone of deep river channels; Crampton, 2019). In this case, achieving a good spatial resolution is critical during social encounters, especially considering the limited utility of visual cues in these low-light conditions. In such habitats, social encounters may “electrically” be less “abrupt”, but spatially less “conspicuous” or blurred (as a 3D electric field may be). In such a scenario, chirps could serve as a means to supplement the spatial information acquired via the beat, accentuating these cues during periods of reduced resolution.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      None, my points in the original review have been properly addressed in this resubmission.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The present study provides a phylogenetic analysis of the size of prefrontal areas in primates, aiming to investigate whether the relative size of the rostral prefrontal cortex (frontal pole) and dorsolateral prefrontal cortex volume vary according to known ecological or social variables.

      I am very much in favor of the general approach taken in this study. Neuroimaging now allows us to obtain more detailed anatomical data in a much larger range of species than ever before and this study shows the questions that can be asked using these types of data. In general, the study is conducted with care, focusing on anatomical precision in definition of the cortical areas and using appropriate statistical techniques, such as PGLS.

      I have read the revised version of the manuscript with interest. I agree with the authors that a focus on ecological vs laboratory variables is a good one, although it might have been useful to reflect that in the title.

      I am happy to see that the authors included additional analyses using different definitions of FP and DLPFC in the supplementary material. As I said in my earlier review, the precise delineation of the areas will always be an issue of debate in studies like this, so showing the effects of different decisions is vital.

      We thank the reviewer for these positive remarks and for these very useful suggestions on the previous version of this article.

      I am sorry the authors are so dismissive of the idea of looking at the models where brain size and area size are directly compared in the model, rather preferring to run separate models on brain size and area size. This seems to me a sensible suggestion.

      We agree with reviewer 1, and the response of reviewer 3 also made it clear to us why this was an important issue. We have therefore addressed it more thoroughly this time.

      First, we have added a new analysis, with whole brain volume included as covariate in the model accounting for regional volumes, together with the socio-ecological variables of interest. As expected given the very strong correlation across all brain measures (>90%), the effects of all socio-ecological factors disappear for both FP and DLPFC volumes when ‘whole brain’ is included as covariate. This is coherent with our previous analysis showing that the same combination of socio-ecological variables could account for the volume of FP, DLPFC and the whole brain. Nevertheless, the interpretation of these results remains difficult, because of the hidden assumptions underlying the analysis (see below).

      Second, we have clarified the theoretical reasons that made us choose absolute vs relative measures of brain volumes. In short, we understand the notion of specificity associated with relative measures, but 1) the interpretation of relative measures is confusing and 2) we have alternative ways to evaluate the specificity of the effects (which are complementary to the idea of adding whole brain volume as covariate). 

      Our goal here was to evaluate the influence of socio-ecological factors on specific brain regions, based on their known cognitive functions in laboratory conditions (working memory for the DLPFC and metacognition for the frontal pole). Thus, the null hypothesis is that socio-ecological challenges supposed to mobilize working memory and metacognition do not affect the size of the brain regions associated with these functions (respectively DLPFC and FP). This is what our analysis is testing, and from that perspective, it seems to us that direct measures are better, because within regions (across species), volumes provide a good index of neural counts (since densities are conserved), which are indicative of the amount of computational resources available for the region. It is not the case when using relative measures, or when using the whole brain as covariate, since densities are heterogeneous across brain regions (e.g. Herculano-Houzel, 2011; 2017, but see below for further details on this).

      Quantitatively, the theoretical level of specificity of the relation between brain regions and socio-ecological factors is difficult to evaluate, given that our predictions are based on the cognitive functions associated with DLPFC and FP, namely working memory and metacognition, and that each of these cognitive functions also involves other brain regions. We would actually predict that other brain regions associated with the same cognitive functions as DLPFC or FP also show a positive influence of the same socioecological variables. Given that the functional mapping of cognitive functions in the brain remains debated, it is extremely difficult to evaluate quantitatively how specific the influence of the socio-ecological factors should be on DLPFC and FP compared to the rest of the brain, in the frame of our hypothesis.

      Critically, given that FP and DLPFC show a differential sensitivity to population density, a proxy for social complexity, and that this difference is in line with laboratory studies showing a stronger implication of the FP in social cognition, we believe that there is indeed some specificity in the relation between specific regions of the PFC and socioecological variables. Thus, our results as a whole seem to indicate that the relation between prefrontal cortex regions and socio-ecological variables shows a small but significant level of specificity. We hope that the addition of the new analysis and the corresponding modifications of the introduction and discussion section will clarify this point.

      Similarly, the debate about whether area volume and number of neurons can be equated across the regions is an important one, of which they are a bit dismissive.

      We are sorry that the reviewer found us a bit dismissive on this issue, and there may have been a misunderstanding.

      Based on the literature, it is clearly established that for a given brain region, area volume provides a good proxy for the number of neurons, and it is legitimate to generalize this relation across species if neuronal densities are conserved for the region of interest (see for example Herculano-Houzel 2011, 2017 for review). It seems to be the case across primates because cytoarchitectonic maps are conserved for FP and DLPFC, at least in humans and laboratory primates (Petrides et al, 2012; Sallet et al, 2013; Gabi et al, 2016; Amiez et al, 2019). But we make no claim about the difference in number of neurons between FP and DLPFC, and we never compared regional volumes across regions (we only compared the influence of socio-ecological factors on each regional volume), so their difference in cellular density is not relevant here. As long as the neuronal density is conserved across species but within a region (DLPFC or FP), the difference in volume for that region, across species, does provide a reliable proxy for the influence of the socioecological regressor of interest (across species) on the number of neurons in that region.

      Our claims are based on the strength of the relation between 1) cross-species variability in a set of socio-ecological variables and 2) cross-species variability in neural counts in each region of interest (FP or DLPFC). Since the effects of interest relate to inter-specific differences, within a region, our only assumption is that the neural densities are conserved across distinct species for a given brain region. Again (see previous paragraph), there is reasonable evidence for that in the literature. Given that assumption, regional volumes (across species, for a given brain region) provide a good proxy for the number of neurons. Thus, the influence of a given socio-ecological variable on the interspecific differences in the volume of a single brain region provides a reliable estimate of the influence of that socio-ecological variable on the number of neurons in that region (across species), and potentially of the importance of the cognitive function associated with that region in laboratory conditions. None of our conclusions are based on direct comparison of volumes across regions, and we only compared the influence of socioecological factors (beta weights, after normalization of the variables).
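
      To illustrate what comparing beta weights after normalization of the variables means, here is a minimal sketch. It is an assumption-laden illustration only: it uses ordinary least squares with two predictors rather than the PGLS models of the study, ignores phylogeny entirely, and the helper names `zscore`, `pearson` and `standardized_betas` as well as the toy numbers are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale values to mean 0 and standard deviation 1 (fails if all values are equal)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson(x, y):
    """Pearson correlation, computed as the mean product of z-scores."""
    return mean(a * b for a, b in zip(zscore(x), zscore(y)))

def standardized_betas(y, x1, x2):
    """Standardized coefficients of the ordinary least squares fit y ~ x1 + x2."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1 - r_12 ** 2  # zero if the two predictors are perfectly collinear
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

# Toy, made-up values: y depends on two uncorrelated predictors.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, -1.0, -1.0, 1.0]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]
beta1, beta2 = standardized_betas(y, x1, x2)
```

      Because every variable is rescaled before fitting, changing the unit of a predictor (for example multiplying `x1` by 1000) leaves its beta weight unchanged, which is what makes the weights comparable across predictors and across regional models.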

      Note that this is yet another reason for not using relative measures and not including whole brain as covariate in the regression model: Given that whole brain and any specific region have a clear difference in density, and that this difference is probably not conserved across species, relative measures (or covariate analysis) cannot be used as proxies for neuronal counts (e.g. Herculano-Houzel, 2011). In other words, using the whole brain to rescale individual brain regions relies upon the assumption that the ratios of volumes (specific region/whole brain) are equivalent to the ratios of neural counts, which is not valid given the differences in densities.
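
      The arithmetic behind this argument can be sketched as follows. The volumes and densities below are made-up numbers chosen only to show the effect, and the function names `neuron_count` and `volume_and_count_ratios` are hypothetical.

```python
def neuron_count(volume, density):
    """Number of neurons in a structure: volume multiplied by neuronal density."""
    return volume * density

def volume_and_count_ratios(region_volume, brain_volume, region_density, brain_density):
    """Return (region/brain volume ratio, region/brain neuron-count ratio)."""
    volume_ratio = region_volume / brain_volume
    count_ratio = (neuron_count(region_volume, region_density)
                   / neuron_count(brain_volume, brain_density))
    return volume_ratio, count_ratio

# Hypothetical case 1: the region and the whole brain share the same density.
same_density = volume_and_count_ratios(10.0, 1000.0, 50.0, 50.0)

# Hypothetical case 2: the region is twice as dense as the brain average.
different_density = volume_and_count_ratios(10.0, 1000.0, 100.0, 50.0)
```

      In the first case the two ratios coincide; in the second the volume ratio is unchanged while the count ratio doubles, so a relative volume no longer tracks the relative number of neurons.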

      Nevertheless, I think this is an important study. I am happy that we are using imaging data to answer wider phylogenetic questions. Combining detailed anatomy, big data, and phylogenetic statistical frameworks is an important approach.

      We really thank the reviewer for these positive remarks, and we hope that this study will indeed stimulate others using a similar approach.

      Reviewer #2 (Public Review):

      In the manuscript entitled "Linking the evolution of two prefrontal brain regions to social and foraging challenges in primates" the authors measure the volume of the frontal pole (FP, related to metacognition) and the dorsolateral prefrontal cortex (DLPFC, related to working memory) in 16 primate species to evaluate the influence of socio-ecological factors on the size of these cortical regions. The authors select 11 socio-ecological variables and use a phylogenetic generalized least squares (PGLS) approach to evaluate the joint influence of these socio-ecological variables on the neuro-anatomical variability of FP and DLPFC across the 16 selected primate species; in this way, the authors take into account the phylogenetic relations across primate species in their attempt to discover the influence of socio-ecological variables on FP and DLPFC evolution.

      The authors run their studies on brains collected from 1920 to 1970 and preserved in formalin solution. Also, they obtained data from the Muséum National d'Histoire Naturelle in Paris and from the Allen Brain Institute in California. The main findings consist in showing that the volume of the FP, the DLPFC, and the Rest of the Brain (ROB) across the 16 selected primate species is related to three socio-ecological variables: body mass, daily traveled distance, and population density. The authors conclude that metacognition and working memory are critical for foraging in primates and that FP volume is more sensitive to social constraints than DLPFC volume.

      The topic addressed in the present manuscript is relevant for understanding human brain evolution from the point of view of primate research, which, unfortunately, is a shrinking field in neuroscience. But the experimental design has two major weak points: the absence of lissencephalic primates among the selected species and the delimitation of FP and DLPFC. Also, a general theoretical and experimental frame linking evolution (phylogeny) and development (ontogeny) is lacking.

      We are sorry that the reviewer still believes that these two points are major weaknesses.

      - We have added a point on lissencephalic species in the discussion. In short, we acknowledge that our work may not be applied to lissencephalic species because they cannot be studied with our method, but on the other hand, based on laboratory data there is no evidence showing that the functional organization of the DLPFC and FP in lissencephalic primates is radically different from that of other primates (Dias et al, 1996; Roberts et al, 2007; Dureux et al, 2023; Wong et al, 2023). Therefore, there is no a priori reason to believe that not including lissencephalic primates prevents us from drawing conclusions that are valid for primates in general. Moreover, as explained in the discussion, including lissencephalic primates would require using invasive functional studies, only possible in laboratory conditions, which would not be compatible with the number of species (>15) necessary for phylogenetic studies (in particular PGLS approaches). Finally, as pointed out by the reviewer, our study is also relevant for understanding human brain evolution, and as such, including lissencephalic species should not be critical to this understanding.

      - In response to the remarks of reviewer 1 on the first version of the manuscript, we had included a new analysis in the previous version of the manuscript, to evaluate the validity of our functional maps given another set of boundaries between FP and DLPFC. But one should keep in mind that our objective here is not to provide a definitive definition of what the regions usually referred to as DLPFC and FP should be from an anatomical point of view. Rather, as our study aims at taking into account the phylogenetic relations across primate species, we chose landmarks that enable a comparison of the volume of cortex involved in metacognition (FP) and working memory (DLPFC) across species. We have also updated the discussion accordingly.

      We agree that this is a difficult point and we have always acknowledged that this was a clear limitation in our study. In the light of the functional imaging literature in humans and non-human primates, as well as the neurophysiological data in macaques, defining the functional boundary between FP and DLPFC remains a challenging issue even in very well controlled laboratory conditions. As mentioned by reviewer 1, “the precise delineation of the areas will always be an issue of debate in studies like this, so showing the effects of different decisions is vital”. Again, an additional analysis using different boundaries for FP and DLPFC was included in the supplementary material to address that issue. Now, we are not aware of solid evidence showing that the boundaries that we chose for DLPFC vs FP were wrong, and we believe that the comparison between 2 sets of measures as well as the discussion on this topic should be sufficient for the reader to assess both the strength and the limits of our conclusion. That being said, if the reviewer has any reference in mind showing better ways to delineate the functional boundary between FP and DLPFC in primates, we would be happy to include it in our manuscript.

      - The question of development, which is an important question per se,  is neither part of the hypothesis nor central for the field of comparative cognition in primates. Indeed, major studies in the field do not mention development (e.g. Byrne, 2000; Kaas, 2012; Barton, 2012). De Casien et al (2022) even showed that developmental constraints are largely irrelevant (see Claim 4 of their article): [« The functional constraints hypothesis […] predicts more complex, ‘mosaic’ patterns of change at the network level, since brain structure should evolve adaptively and in response to changing environments. It also suggests that ‘concerted’ patterns of brain evolution do not represent conclusive evidence for developmental constraints, since allometric relationships between developmentally linked or unlinked brain areas may result from selection to maintain functional connectivity. This is supported by recent computational modeling work [81], which also suggests that the value of mosaic or concerted patterns may fluctuate through time in a variable environment and that developmental coupling may not be a strong evolutionary constraint. Hence, the concept of concerted evolution can be decoupled from that of developmental constraints »].

      Finally, when studies on brain evolution and cognition mention development, it is generally to discuss energetic constraints rather than developmental mechanisms per se (Heldstab et al, 2022; Smaers et al, 2021; Preuss & Wise, 2021; Dunbar & Schutz, 2017; MacLean et al, 2012; Mars et al, 2018; 2021). Therefore, development does not seem to be a critical issue, either for our article or for the field.

      Reviewer #3 (Public Review):

      This is an interesting manuscript that addresses a longstanding debate in evolutionary biology - whether social or ecological factors are primarily responsible for the evolution of the large human brain. To address this, the authors examine the relationship between the size of two prefrontal regions involved in metacognition and working memory (DLPFC and FP) and socioecological variables across 16 primate species. I recommend major revisions to this manuscript due to: 1) a lack of clarity surrounding model construction; and 2) an inappropriate treatment of the relative importance of different predictors (due to a lack of scaling/normalization of predictor variables prior to analysis).

      We thank the reviewer for his/her remarks, and for the clarification of his/her criticism regarding the use of relative measures. We are sorry to have missed the importance of this point in the first place. We also thank the reviewer for the cited references, which were very interesting and which we have included in the discussion. As reviewer 1 also shared these concerns, we wrote a detailed response above to explain how we addressed the issue.

      First, we did run a supplementary analysis where whole brain volume was added as covariate, together with socio-ecological variables, to account for the volume of FP or DLPFC. As expected given the very high correlation across all 3 brain measures, none of the socio-ecological variables remained significant. We have added a long paragraph in the discussion to tackle that issue. In short, we agree with the reviewer that the specificity of the effects (on a given brain region vs the rest of the brain) is a critical issue, and we acknowledge that since this is a standard in the field, it was necessary to address the issue and run this extra analysis. But we also believe that specificity could be assessed by other means: given the differential influence of ‘population density’ on FP and DLPFC, in line with laboratory data, we believe that some of the effects that we describe do show specificity. Also, we prefer absolute measures to relative measures because they provide a better estimate of the corresponding cognitive operation, and because standard allometric rules (i.e., body size or whole brain scaling) may not apply to the scaling and evolution of FP and DLPFC in primates. Indeed, given that we use these measures as proxies of functions (metacognition for FP and working memory for DLPFC), it is clear that other parts of the brain should show the same effect since these functions are supported by entire networks that include not only our regions of interest but also other cortical areas in the parietal lobe. Thus, the extent to which the relation with socio-ecological variables should be stronger in regions of interest vs the whole brain depends upon the extent to which other regions are involved in the same cognitive function as our regions of interest, and this is clearly beyond the scope of this study.

      More importantly, volumetric measures are taken as proxies for the number of neurons, but this is only valid when comparing data from the same brain region (across species), and not across brain regions, since neural densities are not conserved. Thus, using relative measures (scaling with the whole brain volume) would only work if densities were conserved across brain regions, but it is not the case. From that perspective, the interpretation of absolute measures seems more straightforward, and we hope that the specificity of the effects could be evaluated using the comparison between the 3 measures (FP, DLPFC and whole brain) as well as the analysis suggested by the reviewer. We hope that the additional analysis and the updated discussion will be sufficient to cover that question, and that the reader will have all the information necessary to evaluate the level of specificity and the extent to which our findings can be interpreted.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In my previous review of the present manuscript, I pointed out the fact that defining parts, modules, or regions of the primate cerebral cortex based on macroscopic landmarks across primate species is problematic because it prevents comparisons between gyrencephalic and lissencephalic primate species. The authors have rephrased several paragraphs in their manuscript to acknowledge that their findings do apply to gyrencephalic primates.

      I also said that "Contemporary developmental biology has shown that the selection of morphological brain features happens within severe developmental constraints. Thus, the authors need a hypothesis linking the evolutionary expansion of FP and DLPFC during development. Otherwise, the claims from the mosaic brain and modularity lack fundamental support". I insisted that the authors should clarify their concept of homology of cerebral cortex parts, modules, or regions across species (in the present manuscript, the frontal pole and the dorsolateral prefrontal cortex). Those are not trivial questions because any phylogenetic explanation of brain region expansion in contemporary phylogenetic and evolutionary biology must be rooted in evolutionary developmental biology. In this regard, the authors could have discussed their findings in the frame of contemporary studies of cerebral cortex evolution and development, but, instead, they have rejected my criticism just saying that they are "not relevant here" or "clearly beyond the scope of this paper".

      The question of development, which is an important question per se, is neither part of the hypothesis nor central for the field of comparative cognition in primates. Indeed, the major studies in the field do not mention development and some even showed that developmental constraints were not relevant (see De Casien et al., 2022 and details in our response to the public review). When studies on brain evolution and cognition mention development, it is generally to discuss energetic constraints rather than developmental mechanisms per se (Heldstab et al, 2022; Smaers et al, 2021; Preuss & Wise, 2021; Dunbar & Schutz, 2017; MacLean et al, 2012; Mars et al, 2018; 2021).

      If the other reviewers agree, the authors are free to publish in eLife their correlations in a vacuum of evolutionary developmental biology interpretation. I just disagree. Explanations of neural circuit evolution in primates and other mammalian species should tend to standards like the review in this link: https://royalsocietypublishing.org/doi/full/10.1098/rstb.2020.0522

      In this article, Paul Cisek (a brilliant neurophysiologist) speculates on potential evolutionary mechanisms for some primate brain functions, but there is surprisingly very little reference to the existing literature on primate evolution and cognition. There is virtually no mention of studies that involve a large enough number of species to address evolutionary processes and/or a comparison with fossils and/or an evaluation of specific socio-ecological evolutionary constraints. Most of the cited literature refers to laboratory studies on brain anatomy of a handful of species, and their relevance for evolution remains to be evaluated. These ideas are very interesting and they could definitely provide an original perspective on evolution, but they are mostly based on speculations from laboratory studies, rather than from extensive comparative studies. This paper is interesting for understanding developmental mechanisms and their constraints on neurophysiological processes in laboratory conditions, but we do not think that it would fit in the framework of our paper as it goes far beyond our main topic.

      Reviewer #3 (Recommendations For The Authors):

      Yes, I am suggesting that the authors also include analyses with brain size (rather than body size) as a covariate to evaluate the effects of other variables in the model over and above the effect on brain size. In a very simplified theoretical scenario: two species have the same body sizes, but species A has a larger brain and therefore a larger FP. In this case, species A has a larger FP because of brain allometric patterns, and models including body size as a covariate would link FP size and socioecological variables characteristic of species A (and others like it). However, perhaps the FP of species A is actually smaller than expected for its brain size, while the FP of species B is larger than expected for its brain size.

      As explained in our response to the public review, we did run this analysis and we agree with the reviewer’s point from a practical point of view: it is important to know the extent to which the relation with a set of socio-ecological variables is specific to the region of interest, vs less specific and present for other brain regions. Again, we are sorry not to have understood that earlier, and we acknowledge that since it is a standard in the field, it needs to be addressed thoroughly.

      We understand the scaling intuition and the need to get a reference point for volumetric measures, but here the volume of each brain region is taken as a proxy for the number of neurons and therefore for the region’s computational capacities. Since, for a given brain region (FP or DLPFC), the neural densities seem to be well conserved across species, comparing regional volumes across species provides a good proxy for the contrast (across species) in neural counts for that region. All we predicted was that for a given brain region, associated with a given cognitive operation, the volume (number of neurons) would be greater in species for which socio-ecological constraints potentially involving that specific cognitive operation were greater. We do not understand how or why the rest of the brain would change this interpretation (of course, as discussed just above, beyond the question of specificity). And using whole brain volume as a scaling measure is problematic because the whole brain density is very different from the density of these regions of the prefrontal cortex (see above for further details). Again, we acknowledge that allometric patterns exist, and we understand how they can be interpreted, but we do not understand how they could prove or disprove our hypothesis (brain regions involved in specific cognitive operations are influenced by a specific set of socio-ecological variables). When using volumes as a proxy for computational capacities, the theoretical implications of scaling procedures might be problematic. For example, scaling implies that the computational capacities of a given brain region are scaled by the rest of the brain: all other things being equal, the computational capacities of a given brain region, taken as the number of neurons, should decrease when the size of the rest of the brain increases. But to our knowledge there is no evidence for that in the literature. Clearly these are very challenging issues, and our position was to take absolute measures because they do not rely upon hidden assumptions regarding allometric relations and their consequences for cognition.

      But since we definitely understand that scaling is a reference in the field, we have not only completed the corresponding analysis (including the whole brain as a covariate, together with socio-ecological variables) but also expanded the discussion to address this issue in detail. We hope that between this new analysis and the comparison of effects between non-scaled measures of FP, DLPFC and the whole brain, the reader will be able to judge the specificity of the effect.

      Models including brain (instead of body) size would instead link FP size and socioecological variables characteristic of species B (and others like it). This approach is supported by a large body of literature linking comparative variation in the relative size of specific brain regions (i.e., relative to brain size) to behavioral variation across species - e.g., relative size of visual/olfactory brain areas and diurnality/nocturnality in primates (Barton et al. 1995), relative size of the hippocampus and food caching in birds (Krebs et al. 1989).
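      The two-species scenario described by the reviewer can be made concrete with a toy calculation. The sketch below is only an illustration under loud assumptions: all log10 values are invented, the species labels S1 to S4 are hypothetical background species, and a plain least-squares fit stands in for whatever regression framework the actual analysis uses. It shows how species A can have a larger FP than expected for its body size while having a smaller FP than expected for its brain size, with the opposite pattern in species B.

```python
from statistics import mean


def ols_residuals(x, y):
    """Residuals of y after a simple least-squares fit y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical log10 values, invented purely for illustration.
# A and B share the same body size; A has the larger brain and the larger FP.
species = ["S1", "S2", "S3", "S4", "A", "B"]
log_body = [3.00, 3.50, 4.00, 4.50, 3.75, 3.75]
log_brain = [1.00, 1.30, 1.60, 1.90, 1.75, 1.30]
log_fp = [0.00, 0.30, 0.60, 0.90, 0.70, 0.40]

# FP size relative to body size versus FP size relative to brain size.
resid_body = dict(zip(species, ols_residuals(log_body, log_fp)))
resid_brain = dict(zip(species, ols_residuals(log_brain, log_fp)))

for name in ("A", "B"):
    print(f"{name}: vs body {resid_body[name]:+.3f}, vs brain {resid_brain[name]:+.3f}")
```

      With these invented numbers the sign of the residual flips between the two reference measures, which is the specificity question at stake; it says nothing about which reference is the more appropriate one on theoretical grounds.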

      Barton, R., Purvis, A., & Harvey, P. H. (1995). Evolutionary radiation of visual and olfactory brain systems in primates, bats and insectivores. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 348(1326), 381-392.

      Krebs, J. R., Sherry, D. F., Healy, S. D., Perry, V. H., & Vaccarino, A. L. (1989). Hippocampal specialization of food-storing birds. Proceedings of the National Academy of Sciences, 86(4), 1388-1392. 

      We are grateful to the reviewer for mentioning these very interesting articles, and more generally for helping us to understand this issue and clarify the related discussion. Again, we understand the scaling principle but the fact that these methods provide interesting results does not make other approaches (such as ours) wrong or irrelevant. Since we have used both our original approach and the standard version as requested by the reviewer, the reader should be able to get a clear picture of the measures and of their theoretical implications. We sincerely hope that the present version of the paper will be satisfactory, not only because it is clearer, but also because it might stimulate further discussion on this complex question.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity:

      In this work, Anandi et al. propose an ex vivo model that can be used to recapitulate the in vivo structure of the tumor microenvironment, which allows the observation of morphological and functional changes in tumor cells in a 3D context. Because cancer cells can induce hypoxic conditions within the TME, the authors propose this model to study metastasis initiation in vitro. The proposed system successfully displays an ischemic gradient with cells accessing nutrients at different rates, similarly to what happens in solid tumors in vivo. Moreover, in line with the literature, tumor cell migration and invasiveness were promoted by hypoxic conditions. The authors also show that the system could be used to study cell-cell interaction, as co-cultures of macrophages and cancer cells were successfully maintained in the system and studied in the context of tumor hypoxia.

      The study proposed is interesting and timely, as cancer cell invasion remains an important area of tumor biology that needs further exploration. The methodology is well explained and proposed in a linear flow. However, the work could benefit from some improvement and changes, as well as from additional experiments. On an important note, authors do not properly refer to the current literature, as several studies on 3D culture systems/chambers have already been studied and developed to investigate the tumor microenvironment, but they are not cited nor referred to in the manuscript. Authors should refer to such literature and explain how this system is different and adds to it.

      Major comments:

      1. Authors propose this method to study the TME in 3D. When culturing cells with different ECM (collagen vs. matrigel + collagen I), authors should take into consideration the effect of these materials on different cell types. Collagen and matrigel are known to differently influence the polarization and phenotype of stromal cells (particularly fibroblasts, which are major components of solid cancers; e.g., PMID 21029367), therefore these points should be addressed at least in the discussion.

      We completely agree with the reviewer so we added this point (and reference) to our manuscript's introduction (lines 45-46) and discussion (lines 442-445).

      1. In addition to the previous comment, matrigel and collagen are also known to alter cancer cell phenotype (e.g., PMID 21029367) and this point should be taken into account.

      We completely agree with the reviewer so we added this point (and reference) to our discussion in the main text (lines 442-445).

      1. Novel 3D systems to study different aspects of the TME in vitro/ex vivo are certainly needed; however, such systems are not nonexistent. Authors should address this in the text, as the current literature has already started to propose 3D models (including models involving matrigel/collagen in combination with other materials). 3D chambers (of different materials, and with different aims) are being used and designed and can be found in the literature. These works are not cited in the current study at all. For instance, Anguiano 2017; Cavo et al. 2018; Anguiano et al., 2020; Sodek et al. 2008, etc.

      We agree so we have now added those references to the main text (line 56-57).

      1. Even though the focus is on hypoxia and the achievement of an ischemic gradient in the chamber to allow resemblance of an in vivo tumor, the authors write in line 123 (and also in other parts of the text) that: "these results show that consumer cells in the 3MIC form ischemic gradients that can influence the local metabolic microenvironment experienced by neighboring tumor spheroids". The use of the PDMS membrane partly supports the claim; however, it would be interesting to check whether this is indeed true, by measuring for example the levels of certain metabolites (e.g., glucose, glutamine, glutamate, lactate, aspartate) reached with the system, or pH levels, etc., in the presence or absence of the hypoxic gradient/consumer cells.

      This is an insightful question and defining the exact composition of this complex ischemic microenvironment is a major ambition of our lab, so we completely agree with the reviewer's comment. However, as the 3MIC was designed specifically for microscopy, measuring specific metabolites is unfortunately outside its capabilities.

      Having said that, and following the spirit of the reviewer's comments, we used microscopy to measure additional signs of metabolic stress. Specifically, we used fluorescent probes to detect changes in intracellular pH (pHrodo, Molecular Probes) and in Redox status (CellROX, Molecular Probes) and glucose (2-NBDG - a fluorescent D-glucose analog). As we explain below, we found exciting results from our pH measurements which led us to additional functional experiments. We are very excited about these new results, and we thank the reviewer for encouraging these experiments. These new results also provide evidence that other parameters in ischemia - and not just hypoxia - change along the 3MIC and can have an impact on tumor cells.

      1. When looking at the references presented in the manuscript, authors quote too many review articles, rather than scientific articles. Given the extremely wide literature on cancer metastasis, more of these works should be quoted in this context. For example: in the introduction - text lines 27-38 - only 4 references are research articles, out of 14 references presented in that paragraph.

      The reviewer is correct in pointing this out. Our intention was to use reviews on topics that are well established where citing primary research could be unfair to other contributions. But again, we agree with the reviewer, so we replaced reviews with primary research articles in multiple locations along the manuscript.

      1. As authors showed successfully how macrophages and cancer cells can interact in the chamber, recapitulating cell interactions in an in vivo context, it would be very interesting to see whether different consumer cells would induce similar or different changes to the spheroids and the ischemic gradient (for instance using stromal cells or non-tumor cell lines as consumers, instead of cancer cells only), as we know how tumors are a multitude of cell subsets, each contributing to nutrient production, oxygen consumption, etc.

      This is a great point. We thought about that very same point and conducted several experiments to test the combinatorial effects of different consumer cells. In broad terms, we did not observe major differences when using different consumer cells. However, we agree that this system may provide compelling opportunities to test the effect of different cell types on each other. Still, for consistency and ease, we conducted most of our experiments using the same cells in both consumers and in spheroids.

      In the resubmitted version, we added an experiment where we looked at the sprouting of SVEC endothelial cells using the same cells or Lung KPs as consumers (Fig. S6A).

      Minor comments:

      1. Studying the early metastatic development/seeding remains a timely quest, however authors should refer to several new studies in which various mouse models are used to study metastasis from different points of view (e.g., PMID 25822788; PMID 36991128; PMID 25171411; PMID 25633981; PMID 34632412; PMID 35921474; etc). Or, on line 41, three reviews are quoted (refs 27-29), whilst there are several works that could be quoted on metabolism in solid tumors also in the context of metastasis (e.g., PMID 36522548; PMID 26719539; PMID 34303764). This comment applies to the rest of the text.

      We thank the reviewer for their help in processing this vast literature. We were aware of most of those works but some were new to us so thanks again! We have now added these references.

      1. The order of the references is not properly presented. In the introduction, the first reference is n. 4 (text line 22), instead of it being reference 1. Moreover, the subsequent literature ref. is number 12 and not number 2. Please revise the order of the references, and position them within the bibliography from first cited to last cited in the text.

      We apologize for this confusion. We have now revised all the references and we hope they are correctly formatted and numbered. The origin of this confusion may have been that we had references in the abstract, so their numbering started there rather than in the introduction. To avoid further confusion, we removed all references from the abstract.

      1. Lines 98-104. It would be helpful to the reader to define here what these consumer cells are. Even though it is explained in the methods that the consumer cells are cancer cells, it is important to make it clear in the text, as it could be misleading at times.

      We agree with the reviewer, although we did not mean to be misleading. As mentioned above, we chose to use the same cells for both consumers and spheroids, and we have now added a new figure to illustrate this point (Fig S6A). Following the advice, we are also including additional text to make the message clearer (lines 107-109).

      1. The English grammar and spelling should be revised in some parts, as well as typos and missing words throughout the text (e.g., Line 38, the word "interraction" is misspelled and should be corrected with "interaction". Line 49, the first sentence seems incomplete. Lines 68-69 should be revised as the sentences do not flow well together, probably due to a missing word. In line 77 it should be "presents". Line 341 should be "cannot be explained").

      We apologize for these typos and mistakes. We have tried our best to avoid these types of errors in the new manuscript version.

      Referees cross-commenting

      I find the comments from the other reviewers to be in line with one another as well as with my general assessment. The major comments of all reviewers should be addressed. The minor comments should be taken into account as well, as they would render the text and the figures more precise. I suggest that 3-6 months to complete the revision process is an appropriate time frame for the authors.

      Finally, I strongly encourage the authors to add in the discussion the points and questions raised by all reviewers, as well as to improve the bibliography in terms of organisation, linearity, and state of the art.

      Significance:

      General assessment:

      The work by Anandi et al. offers an additional tool to tackle the issue of studying the tumor microenvironment, in a 3D culture system.

      The authors show a model that can be used to study tumor hypoxia in 3D, offering the possibility to study the TME in a more in vivo-like manner without turning to mouse models. The development of new tools to study the TME avoiding the excessive use of animals is definitely a timely quest. In addition, the system has the potential to be applied to tackle different biological questions, as the methodology is well explained and could be suitable to many other fields of cancer biology (e.g., drug resistance or uptake). The work is overall presented in a clear way, the methodology is explained thoroughly, and it has the potential to be a useful tool for the study of cancer hypoxia.

      However, authors should address how their method could differently impact other cells when applied to other systems. As one major claim is the potential use of this methodology to study the TME, it should be taken into consideration how stromal cells are strongly affected by the ECM, and how certain settings or features of the system may impact such cell populations. In addition, the work does not properly refer to the current state of the art. As other studies started to propose 3D systems for the study of TME and cell-cell interactions - besides organoids - the authors should cite these works and frame their own study in a more appropriate context, pointing out differences with the current 3D chambers available, the advantages of one vs the other, and so on.

      Advance: the study adds to the current literature as the study of tumor hypoxia in 3D remains a complicated issue. The interesting co-culture settings with macrophages suggest potential uses of this model to study cell-cell interactions.

      Audience: the study is very methodological and offers a tool that could be used by cancer biologists - and maybe by other biology fields.

      Reviewer #2

      Evidence, reproducibility and clarity:

      Summary

      Anandi and colleagues present a manuscript describing a nice assay for exploring the progressive effect of metabolic depletion of nutrients and oxygen on the invasion of cancer cells. This builds upon and extends a device that they previously described - MEMIC - and now enables 3D analysis of small numbers of cells. The key to their method is the inclusion of a layer of consumer cells that deplete oxygen and nutrients. Using this tool, they demonstrate that depleted environments promote invasive behavior and lower cell-cell adhesion. This is related to the nutrient-deprived and hypoxic environments found in the center of many tumors. Cellular Potts Modelling is used to explore ideas around the cooperation between reduced cell-cell adhesion and increased ECM adhesion in promoting invasion. Overall, this is a well-constructed manuscript that will be of interest to cell biologists and cancer biologists.

      Major comments

      I realize this work is submitted to review commons and this complicates the recommendation regarding publication. My view is that the 'more prestigious' journals would require greater mechanistic insight, but that the work could find a suitable place in other members of the review commons stable. My comments are divided into those essential for any journal and those that might be journal dependent.

      We hope that the mechanistic experiments added to our new manuscript version will appeal to the reviewer and merit publication in any of the review commons journals.

      Essential regardless of journal

      1. Many of the figures lack information about the number of spheroids analyzed and from how many biological repeats they are derived.

      We have now added this information to all our experiments. This information can be found in the figures and in the figure legends.

      1. The authors need to provide citations for their assertion that only gases can cross the PDMS, but not other small metabolites. They should also comment on whether the build-up of CO2 might be relevant.

      We have now added the original reference where they describe PDMS's properties (Cox and Dunn, 1986).

      The point raised about CO2 is very interesting, but we do not expect a buildup of this gas. When using PDMS, CO2 would not accumulate as PDMS membranes are permeable to gases - including CO2. When using glass covers, the lack of oxygen should minimize CO2 production, as hypoxic cells will not be able to conduct oxidative phosphorylation and will produce lactic acid instead.

      1. The data on the directionality of migration when consumers are present are not significant and do not warrant the speculation in lines 186-189.

      Following the reviewer's advice we have removed this speculation.

      1. The ECM degradation in Figure 3 should be quantified.

      We agree. We added additional quantifications for the gelatin degradation assay. We also highlight the quantification we already had of the ECM degradation assessed via DQ collagen. Those data can be found in the new figures 4 and S4, respectively.

      1. Do the authors have evidence that the hypoxia-exposed cells are more adhesive to ECM? This is central to their Potts model and I could not locate the supporting experimental data. If not, then the Potts model should include matrix proteolysis, which they do have data about.

      Again, this is a very insightful observation, and we completely understand this confusion. We think that this may be part of the inherent challenge of trying to condense biological problems into analogies or "metaphors" when using physical/mathematical models.

      The algorithm in a Cellular Potts model (CPM) tries to minimize the energy of the system (the entire group of cells/ECM that we are modelling). This global energy reduction is achieved by minimizing local energies in the cell-cell and cell-ECM interactions. The algorithm executes this minimization by always (probability p = 1) accepting a configuration that decreases the energy, while restricting the configurations that lead to higher energies (accepted with probability p = e^(-ΔH/T)), where ΔH is the difference between the current and previous energy and T is the model's temperature parameter.

      So, the only thing the model is really doing is to increase the likelihood that cells are in a more "comfortable" environment - i.e. that the energy from the interactions with their neighboring cells and ECM is as low as possible. For example, if cell 1 and cell 2 adhere strongly to each other but not to cell 3, in a CPM this is modelled as a low ΔH between cell 1 and 2 and a higher ΔH with cell 3. Conversely, when people model cells that are better at "invading" into a new "territory", they choose a lower energy between that cell type and that type of substratum.

      In other words, our CPM does not "care" whether ischemic cells invade the ECM because they create space through increased proteolysis or because they are more adherent to the ECM. These two scenarios are the same in a CPM and it is consistent with previous CPM models of similar scenarios (e.g.: PMID: 18835895, 33933478, 26436883, 23596570).
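      The acceptance rule described above can be sketched in a few lines. This is a minimal illustration and not the code used for the manuscript: the function names, the temperature value and the contact energies are all hypothetical, and the example considers a single attempted move in which a cell-cell contact is replaced by a cell-ECM contact. It assumes the standard Metropolis form, in which moves that do not raise the energy are always accepted.

```python
import math
import random


def acceptance_probability(delta_h: float, temperature: float) -> float:
    """Metropolis rule: accept for certain if the energy does not increase,
    otherwise accept with probability exp(-delta_h / temperature)."""
    if delta_h <= 0:
        return 1.0
    return math.exp(-delta_h / temperature)


def accept_move(delta_h: float, temperature: float, rng: random.Random) -> bool:
    """Draw one accept/reject decision for an attempted configuration change."""
    return rng.random() < acceptance_probability(delta_h, temperature)


# Hypothetical contact energies (arbitrary units), invented for illustration.
# A low energy means a "comfortable" contact, whatever its biological cause
# (stronger adhesion or more proteolysis are indistinguishable at this level).
TEMPERATURE = 4.0
contact_energy = {
    "control": {"cell_cell": 2.0, "cell_ecm": 8.0},
    "ischemic": {"cell_cell": 6.0, "cell_ecm": 4.0},
}

rng = random.Random(0)
n_trials = 1000
accepted = {}
for condition, energy in contact_energy.items():
    # Energy change when one cell-cell contact becomes one cell-ECM contact.
    delta_h = energy["cell_ecm"] - energy["cell_cell"]
    accepted[condition] = sum(
        accept_move(delta_h, TEMPERATURE, rng) for _ in range(n_trials)
    )

accepted_control = accepted["control"]
accepted_ischemic = accepted["ischemic"]
print(f"control: {accepted_control}/{n_trials}, ischemic: {accepted_ischemic}/{n_trials}")
```

      Under these assumed energies the invasive move lowers the energy for ischemic cells and is always accepted, whereas for control cells it is accepted only occasionally. The rule itself never needs to know why the cell-ECM energy is low.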

      We have now reworded the description of the model on the main text, and we added an illustration hoping to make this aspect of the model clearer (Fig. S4F).

      1. Is the down-regulation of E-cadherin transcriptional - i.e. is the mRNA level reduced?

      This is a great question. After the reviewer posed this question, we looked at our data and we concluded that E-cad's downregulation is transcriptional. Assessing local mRNA levels in the 3MIC is challenging. However, our E-Cad reporter (pHAGE-E-cadherin-RFP, addgene #79603) is a red fluorescent protein driven by the CDH1 (E-Cad) promoter. RFP levels decrease with ischemia, indicating that this regulation occurs at the promoter/transcriptional level. We now added this point to the revised manuscript (lines 259-261). We thank the reviewer for this insight!

      1. The title of figure 6 is misleading. The authors do not demonstrate chemoresistance in terms of cell survival or cell proliferation, which is how the term is normally used. The authors should measure cell number, proliferation, and cell viability. The data presented in the Supplementary Figure are inadequate with no quantification. The FUCCI reporter cells would be a good tool for this. Also, why use 150nM paclitaxel when the IC50 is 817nM? This seems bizarre. Lastly, there is a typo in the figure that suggests 150mM drug was used.

      We apologize if these experiments caused confusion. Our intention was to look at the anti-migratory effects of Taxol-related drugs. As such, we first determined the concentrations at which the drug was lethal to our cells (this is the LD50 of ~800nM). Then, we tested if lower concentrations - which we knew were not lethal - would affect cell migration, protrusions, etc. Hence the 30-150nM range we used in our experiments.

      We have now completely rewritten this section hoping that our approach is now clearer. We have also changed the title of the section and the figure legend to clarify that we are studying the effects of Taxol as anti-motility drug rather than its effects on cell survival and proliferation (now Fig. 7). Finally, we have now fixed the 150mM/150nM typo in the figure legend.

      Journal dependent

      1. The authors have not excluded that either changes in nutrients, or even a pro-invasive factor, produced by consumer cells are necessary for the increased invasion. They have only shown that they are not sufficient. The authors should perform a series of experiments comparing hypoxic conditions with normal media and normoxic conditions with nutrient depleted/condition media by prior culturing of KP cancer cells.

      This is a great point. We actually do not want or propose to exclude this possibility. So, we have now added text to clarify this issue (lines 431-435).

      In fact, we would be thrilled if there is a pro-invasive factor. If that were the case, our results indicate that this factor is only effective under ischemia, because the same consumer cells do not have an effect on the same type of tumor spheroids under well-nurtured environments. In addition, our new pH measurements and perturbation experiments agree with this reviewer's intuition about additional factors being key in the increased invasion (see new Figure 2). We are very excited about these new results, and we hope this reviewer will be excited too.

      1. What is the oxygen sensor for increased invasion? PHD1-3 would be a good place to start looking. Is the PHD2-HIF axis important? Do VHL mutant cells still show responses to the consumer cells?

      Following the reviewer's feedback, we generated isogenic HIF1A KO cell lines to study whether HIF1A was directly needed in the invasion of tumor spheroids within the 3MIC. We complemented these loss-of-function experiments with HIF1A gain-of-function experiments using pharmacological interventions that stabilize HIF1A under normal oxygen levels (CoCl2 and DMOG).

      As shown in the new figure 2, these experiments mirrored our hypoxia experiments: HIF1A activity was not sufficient but it was required to drive the invasion of ischemic spheroids. We think that these new results are particularly interesting when taken together with our new pH-perturbation experiments. Briefly, our new experimental results show that in addition to the requirement of hypoxia/HIF1A, media acidity also has a strong effect on spheroid invasion. More excitingly, a drop in pH is sufficient to dramatically increase invasion - even in control well-nurtured spheroids. We think that the effects of pH and hypoxia are linked. HIF1A activation and hypoxia increase glycolysis and thus lactic acid secretion. We speculate that this glycolytic switch is where hypoxia is important, but it is not sufficient because under well-perfused conditions (e.g. healthy tissue or large culture media volume) lactic acid levels may not build up enough to significantly lower the extracellular pH. In contrast, under poorly perfused conditions (3MIC and solid tumors) or if we flood cell cultures with lactic acid, the media's pH drops dramatically (Fig. 2).

      1. If they include both spheroids of endothelial cells and cancer cells, will the resulting protrusions in hypoxia grow towards each other? Would macrophages enhance this process?

      We agree with the reviewer that this is an interesting question and we have anecdotally observed this effect. In the manuscript, we used chimeric endothelial/tumor spheroids rather than separate ones (Fig. 6E). We did not find strong evidence that their protrusions grew towards each other, but this is something that we would like to explore in more detail in the future.

      Significance:

      The main advance is technical, as many previous studies have related hypoxia to increased cancer cell invasion, which the authors correctly acknowledge and cite. It is a scholarly study, which will be of interest to many readers, and the method reported is likely to be adopted by several groups.

      Reviewer #3

      Evidence, reproducibility and clarity:

      In this work, Anandi et al., developed a cell culture system to live image the initial transformation of cells in deprivation of oxygen and nutrients in a 3D context. Using this system, 3MIC, they were able to create oxygen and nutrient gradients to simulate ischemic conditions that arise deep within tumors and that typically precede metastasis. With the 3MIC system they validated that ischemia triggers cell migration and invasion of tumor cells. In addition, 3MIC also allowed them to study the interaction of tumor spheroids with stromal cells such as macrophages and endothelial cells. Interestingly, the authors showed that co-culturing tumor spheroids with stromal cells increased the pro-metastatic features induced by ischemia conditions. Lastly, using 3MIC allowed the authors to discern that a poor paclitaxel response in ischemic-like cells is driven by intrinsic cellular resistance rather than due to lower drug concentration.

      Overall, the work is very well-written, and the results are consistent, convincing and support the conclusions. The methods are clear and complete and allow the reproducibility of the experiments. The experiments are adequately replicated and statistical analyses are well described. However, I have a few suggestions to improve the impact of the manuscript:

      1. The authors conclude that 3MIC results in the accumulation of lactic acid and nutrient deprivation in an increasing manner when moving far from the opening site. Is there a way to actually show this? So far, the authors employ a hypoxia sensor only. A sensor for internal pH or other method for nutrient deprivation would help to support the conclusion and further validate the model.

      This is an excellent point. Following the reviewer's feedback, we tested additional sensors including for extra- and intra-cellular pH. As mentioned above, we observed dramatic changes in extracellular pH levels. We followed up these observations with a series of experiments that showed a key functional role for media acidification in driving invasion (Figure 2).

      1. According to figure S3E, the main cell line used by the authors is already quite mesenchymal. It would be good to know if the results showed here are consistent in cells with a more basal epithelial phenotype. Do epithelial cells need stronger ischemic conditions to undergo phenotypic changes?

      This is a great catch. To explore this further, we ran a western blot analysis comparing the epithelial and mesenchymal markers expressed by the main cells used here (Lung KPs) with their levels in a stereotypical epithelial (MCF-7) and a stereotypical mesenchymal (MDA-MB-231) cell line (new Fig. S4D). As the reviewer correctly points out, we do see that E-Cad and Vimentin are co-expressed in KP cells.

      So far, across a range of cell lines, we have consistently observed a decrease in E-Cad levels with no significant effect on vimentin levels, regardless of the basal levels of this protein.

      Interestingly, a recent study[1] demonstrated in triple-negative breast cancer models, that an EMT hybrid phenotype - including the presence of Vimentin - is required for metastasis. A compelling hypothesis then is that ischemia in the tumor microenvironment may favor these hybrid phenotypes. We briefly discuss this topic in the revised version of this manuscript.

      1. The number of replicates should be included in each figure legend and not only in the methods section. From the data presented, it is not clear what the points in the boxplots represent (e.g., Fig1H, 2B,G...). How many cells/spheroids did the authors count in each experiment?

      We have now added this information for all our experiments. It can be found in the figures and in the figure legends.

      1. Figure 3B is not mentioned in the main text.

      We apologize for this error, and we thank the reviewer for catching this issue, which we have now corrected.

      1. Line 295: "In the absence of macrophages, clusters of endothelial cells remained mostly rounded, even in the presence of consumer cells and regardless of their location along the ischemic gradient (Fig. 5A; Video S6)." However, in Video S6, both images show endothelial cells co-cultured with macrophages. I consider that Video S6 should not be referenced here.

      The reviewer is correct, so we have removed that reference.

      1. References style should be homogeneous (e.g, in Ref 13 appears "Nature Reviews Cancer" whereas in Ref 14 "Nat Rev Cancer"). Also, in Ref 25, the journal is missing.

      We apologize for this oversight, and we have now tried to be more consistent in our references.

      1. In plots where the distance to the open chamber site is not specified (e.g. 6B), at what distance were the data recorded? Please indicate this in the figure legend.

      We have now added this information to our figures.

      1. In the experiment shown in Fig 4, the sorting strategy would include stromal cells such as fibroblasts and endothelial cells in the GFP- population (as only CD45+ cells are removed). These cells will likely also grow in the 3MIC system and have an effect on migration. Can the authors rule out this confounding effect?

      The reviewer is correct. We still think that the possibility of fibroblast contamination is low. First, the fluorescence of HRE-GFP cells under normoxic conditions is still higher than the autofluorescence of cells not expressing this construct (such as fibroblasts). This is quite normal, as most sensors/reporters have some leakage and thus there is a small amount of transcription. Second, intradermal and subcutaneous tumors are quite poor in fibroblasts. In fact, to study the role of fibroblasts in these tumors, they are usually co-injected with tumor cells (PMID: 20138012). Third, in the process of tumor dissociation and in vitro establishment, non-transformed cells tend to die more. Since these are more technical points, we moved the cell sorting details to the materials and methods section.

      1. In Fig 5C the panel of proximal + macrophages is missing

      We apologize for this mistake, and we have corrected it in the new version of the manuscript.

      1. In Fig. 5, Linifanib is used to study the effect of blocking VEGF. Linifanib can also interact with RTKs and PDGF. This fact should be acknowledged.

      We agree with this point. Following the reviewer's advice, we now acknowledge the potential off-target effects of these inhibitors (lines 354-355).

      Significance

      This is very interesting work that develops a simple and cost-effective system for continuously monitoring biological processes in 3D cultures under nutrient-modified conditions. These data would be broadly interesting to the cancer community, as 3MIC is a very versatile system in which several aspects can be studied and precisely discerned.

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We would first like to thank the reviewers for their careful reading and thoughtful feedback.

      We have substantially revised the manuscript and included additional experimental evidence on O-GlcNAc and OGT/OGA protein levels in the placenta of embryos bearing the OGT-Y851A hypomorphic mutation.

      Overall, we believe our improved manuscript provides compelling evidence that the glycosyltransferase activity of OGT, and thus the O-GlcNAc modification itself, plays a sexually dimorphic function in placental development and the developmental repression of retrotransposons in the developing embryo.

      We have addressed each of the reviewers' comments below. The original comments (C) are in italic, our responses (R) in Roman font.

      Reviewer #1

      Evidence, reproducibility and clarity

      C1: Formichetti et al. developed mice with OGT catalytic dead mutations and then studied their function during early embryogenesis. Not surprisingly, dramatic reduction in OGT activity failed to produce embryos; however, mild reduction in OGT did produce animals. The authors then use the T931 animals that have a mild reduction in activity to further characterize the function in the early embryo. Not surprisingly, male mice showed changes in gene expression, implantation sub-lethality, and an uptick in loss of retrotransposon silencing. The authors also show that an even milder reduction in OGT activity (Y851A) affects male placenta function and chromatin remodeling. Finally, the authors make a less stable OGT transgene within the mouse and again found embryogenesis issues in the males and alterations in numerous gene families including mTOR signaling and p53 function. All in all, this is an interesting study that tracks functions of OGT in early embryonic development. The studies are well-controlled and rigorous.

      R1: We thank the reviewer for their clear understanding and their appreciation of the rigor and impact of this work.

      Significance

      C2: This is a good study and novel. Not only is it of interest to reproductive biologists, but it echoes themes found in O-GlcNAc biology.

      R2: We are pleased that the reviewer underlined the novelty of the study and its impact across fields.

      Reviewer #2

      Evidence, reproducibility and clarity

      Comments to authors

      C3: To investigate the function of OGT at specific developmental stages, the authors perturbed OGT's function in vivo by creating a murine allelic series featuring four single amino acid substitutions that variably reduced OGT's catalytic activity. The goal was to identify the direct effect of O-GlcNAcylation, using a sophisticated collection of genetic mutants to evaluate in vivo the role of this modification at early stages of development. Overall, the severity of embryonic lethality correlated with the extent of catalytic impairment of OGT, demonstrating that the O-GlcNAc modification is essential for early development. The study represents a substantial advance in our understanding of OGT and O-GlcNAcylation in mammalian development. The creation of novel murine models and inducible systems is an important contribution, providing powerful tools for future research in this field. The insights into the role of OGT's catalytic activity and its involvement in epigenetic regulation during embryonic development are noteworthy, opening new avenues for research.

      R3: We thank the reviewer for their insightful comments. We are grateful for the supporting statements. Please find below our detailed responses to all your comments.

      However, there are a few considerations and concerns:

      Major:

      C4: 1. An assumption of the study is that different mutations cause different levels of O-GlcNAcylation rather than alterations in substrate specificity. It might be important to test, at least in cultured cells, that the different mutations do not change the preference of OGT to modify certain proteins rather than others, which can provide alternative explanations for their findings.

      R4: Thank you for asking this question; it helped us to better explain the rationale behind the choice of the Ogt amino-acid substitutions.

      This is a critical point that we carefully considered in the design of the single amino-acid substitutions. Two lines of evidence support that the precise mutations created impact the catalytic rate without modifying the substrate specificity:

      First, as explained in the text, the choice of the single amino-acid substitutions was driven by previous structural and enzymology knowledge. The impact of the four selected point mutations on OGT protein stability and on the Michaelis-Menten kinetic values had previously been determined experimentally (Fig. 1A legend and Martinez-Fleites, C. et al. Nature Structural & Molecular Biology 2008; https://doi.org/10.1038/nsmb.1443).

      There is a second important rationale that we added in the revised manuscript: the four selected point mutations are all located in the catalytic domain (specifically, H568A in the N-Cat domain and Y851A, T931A and Q849A in the C-Cat domain), while substrate recognition is mediated by two other domains, namely the intervening domain (Int-D; https://doi.org/10.1038/s41589-023-01422-2) and the tetratricopeptide repeat (TPR) superhelix (10.1021/jacs.7b13546; https://doi.org/10.1073/pnas.2303690120). Therefore, for both these reasons, it is extremely unlikely that these mutations could influence the substrate specificity.

      C5.1: 2. In Fig 1D and 1H, the thresholds to define a gene or TE as differentially expressed are not strong. According to the figure legends, "any" change in terms of log2FC was considered as DE and colored. I think the figures should illustrate better that the changes are subtle, by for example adding a dotted line (at least) in the value 0.5 of the y-axis. These subtle transcriptional changes should be reflected better in certain paragraphs where the expression of TEs is presented and discussed as a hallmark of the absence of O-GlcNAcylation in the OGT-mutants. The same happens with Suppl Fig 3C (changes are very minor). Applying a stronger threshold, among the upregulated genes, only Xist will be significantly overexpressed. If a gentle threshold needs to be applied to this data, authors should at least justify the reasons behind doing so. Same for Fig2D.

      R5.1: The reviewer means Figure 2D for the MA plot of gene expression and Figure 2H for retrotransposon expression. These figures now include a dashed line to indicate log2FC = 0.5 (as do all MA plots).

      The text is explicit on the subtle changes in transcription, it reads "with 2/3 of the genes downregulated and 90% of the significant changes below 1 log2FC"; "most of the OgtT931del/Y embryos showed a low magnitude upregulation of retrotransposons".

      The revised text states "Notably, most of the OgtT931del/Y embryos showed a low magnitude (log2FC < 1) upregulation of retrotransposons".

      We expand on this topic in the next response (R5.2) noting that changes in gene expression upon O-GlcNAc perturbation in different systems were previously characterized as subtle and widespread. We suggest that this phenotype may arise from the scarcely understood pleiotropic function of O-GlcNAc in fine-tuning gene expression; this phenotype could have a biological significance.

      C5.2: If a gentle threshold needs to be applied to this data, authors should at least justify the reasons behind doing so. Same for Fig2D.

      R5.2: Previous studies in different systems reported that O-GlcNAc perturbation causes a widespread change in gene expression of low magnitude (https://doi.org/10.1101/2024.01.22.576677, https://www.pnas.org/doi/10.1073/pnas.2218332120). We use the same thresholds as a recent functional Ogt study in ES cells to call differentially expressed genes, specifically: p<0.05 (Wald test), any FC (Li et al. PNAS 2023, https://www.pnas.org/doi/10.1073/pnas.2218332120). The p value threshold is standard; the absence of FC threshold is dictated by the insufficient knowledge of the significance of the low magnitude changes observed across many transcripts.
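
      The effect of the two thresholding choices discussed in C5.1 and R5.2 can be illustrated with a minimal Python sketch. The records, gene names and the helper `call_de` are hypothetical and are not taken from the study's pipeline; the sketch only contrasts a p-value cutoff with no fold-change filter against one that additionally requires |log2FC| >= 0.5.

```python
# Hypothetical differential-expression records; values are illustrative only
# and do not come from the study.
records = [
    {"gene": "geneA", "log2fc": 0.30, "pvalue": 0.010},
    {"gene": "geneB", "log2fc": -0.20, "pvalue": 0.040},
    {"gene": "geneC", "log2fc": 1.40, "pvalue": 0.001},
    {"gene": "geneD", "log2fc": 0.90, "pvalue": 0.200},
    {"gene": "geneE", "log2fc": -0.60, "pvalue": 0.030},
]


def call_de(rows, alpha=0.05, min_abs_log2fc=0.0):
    """Return names of genes with p < alpha and |log2FC| >= min_abs_log2fc."""
    return [
        r["gene"]
        for r in rows
        if r["pvalue"] < alpha and abs(r["log2fc"]) >= min_abs_log2fc
    ]


any_fc = call_de(records)                      # p < 0.05, any fold change
strict = call_de(records, min_abs_log2fc=0.5)  # additionally |log2FC| >= 0.5

print(any_fc)
print(strict)
```

      With no fold-change filter, every gene passing the p-value cutoff is called, including those with small changes; adding the 0.5 cutoff keeps only the larger-magnitude changes, which is the distinction the dashed line in the MA plots is meant to make visible.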

      C6: 3. In Figure 2B, the T931del allele was recovered in the blastocyst population with a very high frequency, even higher than the male WT group (T931del: 10; WT: 3). This observation suggests that the T931del allele did not significantly affect blastocyst survival. Further clarification or additional experiments might be necessary to understand the implications of this finding on early developmental stages.

      R6: This is only a hint, as the numbers of blastocysts recovered were too small to perform statistics on Mendelian distribution. Thus, more experiments are needed to perform these statistical tests. These experiments are onerous because the low frequency of germline transmission is incompatible with maintaining this mutation by breeding heterozygous animals. Because of this, a new mouse line needs to be created by CRISPR-HDR targeting in the zygote in order to compute statistics on Mendelian ratios. Importantly, this question (does T931del affect blastocyst survival?) is peripheral, and the results of these experiments would not affect our conclusions in any way.
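
      To make the sample-size argument concrete, the following Python sketch computes an exact two-sided binomial p-value for the male blastocyst counts quoted in C6 (10 T931del versus 3 WT). The 1:1 expected ratio among males and the helper `binomial_two_sided_p` are assumptions made for illustration; this is not an analysis from the manuscript.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p_expected=0.5):
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes that are no more likely than the observed one."""

    def pmf(k):
        return comb(trials, k) * p_expected**k * (1 - p_expected) ** (trials - k)

    observed = pmf(successes)
    # The small tolerance guards against floating-point ties.
    return sum(
        pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9)
    )


# Counts quoted in C6; the 1:1 expectation is an assumption of this sketch.
p_value = binomial_two_sided_p(10, 13)
print(round(p_value, 3))
```

      Under these assumptions the p-value is about 0.09, so with 13 embryos even a 10:3 split does not reach the conventional 0.05 level, in line with describing the observation as only a hint.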

      C7: 4. Similarly, in Figure 2G, there is an apparent higher expression of TE expression in the T931A/Y embryos group than in the T931del/Y group, which combined with the higher frequency of blastocyst generated in this latest group it may indicate a deeper molecular consequence after the deletion of the T931. A comparison of the transcriptome between these two cell lines help to address this possibility. Also, the authors should compare the O-GlcNAc levels of WT, T931A, and T931del mutant blastocysts by immunostaining, similar to what was done in Figure S5F.

      R7: We agree that a direct comparison between the two mutations of the T931 residue would be interesting; however, this comment is very difficult to address experimentally for the reasons outlined below:

      Firstly, it is not possible to perform a statistical comparison of the T931A/Y versus T931del/Y transcriptomes with the data generated because the number of hemizygous T931A/Y embryos (n=2) is too small. Hence, it cannot be ruled out that the seemingly milder retrotransposon reactivation in one of the T931A/Y embryos could have occurred by chance.

      Secondly, considering the low magnitude effect on gene expression changes upon O-GlcNAc genetic perturbation, to statistically assess the penetrance of the molecular phenotype and perform the differential expression analysis, numerous (>>3) hemizygous blastocysts of each genotype would be needed. Because females heterozygous for the T931 mutations transmit the mutant allele at very low frequency, these experiments require numerous de novo CRISPR injection sessions.

      Thirdly, for the immunostaining of O-GlcNAc to be semi-quantitative, a large number of hemizygous blastocysts for each genotype would be required (note that in Figure S5F, 29 morulae per condition were imaged), thus requiring numerous CRISPR injection experiments as discussed above. Moreover, O-GlcNAc changes could be subtler than expected based on the strong reduction of OGT activity, since Ogt expression is upregulated as a compensatory mechanism in the OgtT931A/del blastocysts (Fig. S2D), making quantification even more challenging despite a high number of stained embryos.

      In sum, these in vivo experiments are difficult and require sacrificing many animals (about 20 females per CRISPR injection experiment). Because the results would refine the study but would not change our conclusions, we suggest that the cost outweighs the benefit.

      C8: 5. In Boulard et al. 2019 O-GlcNAcylation was shown to be sufficient to modulate expression of DNA methylation-dependent TEs. It would be interesting to know (or at least discuss) if the changes in TE expression observed in OGT-mutant embryos in this study involve changes in DNA methylation. Ideally, some DNA methylation measurement optimized for low input numbers of cells would be useful.

      R8: Thank you for making the link with our previous study. In the PNAS paper, we report that targeted removal of O-GlcNAc at proteins bound to specific TEs (e.g. IAPez) causes their full-blown reactivation without detectable changes in DNA methylation, thus suggesting a role of the O-GlcNAc modification for the silencing of methylated TEs downstream or independent of DNA methylation. We agree that it would be informative to quantify DNA methylation in the T931-mutant blastocysts to test if the in vitro result is the same in vivo, but this would require performing onerous microinjection sessions as explained above.

      C9: 6. The data related to the OGT-degron system in MEFs seem disconnected from the rest of the manuscript. While the developmental models (blastocyst, etc) elegantly assess the contribution of O-GlcNAcylation to the control of cell survival and gene expression through the use of different OGT mutants, the degron system is a system of graded depletion that unfortunately was only possible to be used in MEFs (instead of embryos). Thus, the results obtained with the degron system in MEFs are difficult to intersect with the data from the use of OGT-mutants in embryos. Even though there are obvious interesting questions that one may want to know about this OGT degron MEF system, none of them would demonstrate a direct role for O-GlcNAcylation in cellular function, the major point addressed in the developmental system. Using the degron system in embryonic stem cells might have provided a more parallel comparison. The authors should discuss this point in more detail and either use ESC instead of MEFs or provide a stronger justification for the use of MEFs over ESC.

      R9: We thank the reviewer for their clear understanding of the system. The choice of primary MEFs as an in vitro model was imposed by technical limitations we encountered during the study. We fully agree that ES cells are the model of choice for preimplantation embryos; thus we initially derived ES cells and obtained only one male clone bearing the AID degron system. Upon auxin addition to the culture media, OGT levels remained unchanged in ES cells. Thus, the ES cell model was not usable. To test the AID degron in a different cell type, we then derived MEFs and showed its effectiveness (Figures 4C and S4C-E), which also allowed us to collect functional data on OGT's cellular function (Figures 4D-F). We took the comment on board and clarified the rationale for studying MEFs in the revised manuscript. We agree that it remains to be verified that the OGT-dependent pathways uncovered in MEFs are relevant in the preimplantation embryo. Despite this caveat, we feel the mouse model for endogenous OGT-degron, as well as the negative results in vivo and conclusions in MEFs, should be shared with the community, which could take advantage of our results to refine the system.

      Minor:

      C10: 7. In Fig 2C the color and shape codes are confusing to understand - there are some colors/shapes that are not represented in the PCA plot. The same in Fig 3H, where in the PCA plot there are pink triangles that do not match with the code legends.

      R10: We apologize for the confusion with the legends of Figures 2C and 3H, which we have made unambiguous in the revised version (as well as Figures S2B,C and S3C).

      C11: 8. In the figure legends of Figures 2D, 2E, 2F, and 2H, the notation should be corrected from "OgtT931A/Y" to "OgtT931del/Y".

      R11: This has been corrected; many thanks for bringing it to our attention.

      Significance

      C12: To investigate the function of OGT at specific developmental stages, the authors perturbed OGT's function in vivo by creating a murine allelic series featuring four single amino acid substitutions that variably reduced OGT's catalytic activity. The goal was to identify the direct effect of O-GlcNAcylation, using a sophisticated collection of genetic mutants to evaluate in vivo the role of this modification at early stages of development. Overall, the severity of embryonic lethality correlated with the extent of catalytic impairment of OGT, demonstrating that the O-GlcNAc modification is essential for early development.

      R12: We thank the reviewer for their clear understanding of our work and their appreciation of the biological importance of the findings.

      Reviewer #3

      Evidence, reproducibility and clarity

      C13: This is a conceptually interesting paper that attempts to leverage the knowledge of OGT catalysis to begin to dissect OGT function. The evidence is presented in a straightforward fashion and is in general well documented. The breeding strategies are well informed and the paper draws heavily on previous work carried out in the mouse.

      R13: We greatly appreciate the overall supportive review. However, we fail to understand what the reviewer means by "the paper draws heavily on previous work carried out in the mouse". This comment may stem from a misunderstanding because this work is not based on any previously published study. Specifically, neither the seven murine alleles presented and analyzed nor the single embryo-transcriptomic data sets on which our conclusions are based have been published elsewhere.

      To put this work into context, before our study there were two seminal studies published two decades ago that reported the essential role of Ogt for mouse development, but no molecular profiling was performed (10.1073/pnas.100471497, 10.1128/mcb.24.4.1680-1690.2004). The two Ogt loss-of-function alleles studied in these papers were deemed unsuitable for interrogating molecular phenotypes because they caused cell death that confounds molecular profiling and embryonic lethality at implantation, thus preventing study of the sexually-dimorphic role of Ogt in the placenta. To overcome this long-standing problem, we created seven new murine alleles, which allowed us to tease apart molecular phenotypes at key stages of mouse embryonic development, focusing on the blastocyst and the placenta.

      Significance

      C14: The paper describes tools which will help dissect the many potential roles of O-GlcNAc addition in early development. As it stands, this is a descriptive manuscript that will lead to hypothesis generation and testing and this should not be undervalued. The biological reagents produced and characterized will be of general interest to the field. Most of the findings presented represented a verification of existing ideas in the field but this is not meant as a criticism since part of the motivation for the approach was to generate a reproducible system for analyzing the biological phenomena.

      R14: We thank the reviewer for their appreciation of the importance of experimentally testing ideas shared in the field without direct evidence.

      However, we must respectfully disagree with the qualification of "descriptive manuscript". This qualification may stem from the particularly difficult challenge of accessing the molecular details of how the O-GlcNAc modification exerts the biological functions we report. We are fully cognizant of the limitations of the study, which we address in the discussion section and in R20.2. However, we feel that the adjective "descriptive" is not a fair qualification because we provide numerous novel lines of functional evidence. Specifically, we introduce two novel orthogonal in vivo perturbations for endogenous Ogt that allowed us to interrogate for the first time its function in the developing mouse embryo. These perturbations allow us to draw causative conclusions (not descriptive) on the essential role of the O-GlcNAc modification itself for preimplantation development, its sexually-dimorphic role in the placenta and its requirement in vivo for the stable repression of retrotransposons.

      C15: There are perhaps some bioinformatic shortcuts taken that may need to be corrected upon thorough review. These do not lessen the overall impact of the contribution.

      R15: All the code written for the bioinformatic analyses performed in this study is publicly available: https://github.com/boulardlab/Ogt_mouse_models_Formichetti2024. The reviewer needs to specify which bioinformatic analysis they suggest could be improved.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Summary

      C16: O-GlcNAcylation is the fundamental post-translational modification of numerous nuclear and cytosolic proteins. OGT is the sole enzyme catalyzing O-GlcNAc addition onto the proteins. The essentiality of OGT for early development and cellular viability has been established by using OGT-KO mice and cell lines. However, it remains to be elucidated whether the catalytic activity of OGT is required for the early development, and if the catalytic activity of OGT is required what are the functions of OGT or O-GlcNAcylation in early development due to a lack of appropriate mouse models. To overcome the technical difficulty of manipulating the levels of O-GlcNAcylation in early embryos, Formichetti et al. created the series of four mouse models (OgtY851A, OgtT931A, OgtQ849N, and OgtH568A) with different OGT activity by introducing single amino acid substitution in the catalytic domain. By analyzing the inheritance of the hypomorphic OGT alleles and the lethality of mouse embryos, they discovered OGT activity is a critical factor for early development. Subsequently, RNA-seq analyses with two mouse models showing the maternal inheritance of the hypomorphic OGT alleles indicated that severe hypo-OGT activity altered transcription and silencing of retrotransposons in preimplantation development while mild reduction of OGT's activity affected placental development in a sexually dimorphic manner rather than preimplantation development. Furthermore, to study the function of OGT at specific developmental stages, they developed a mouse model bearing endogenously AID-tagged OGT for acute degradation of OGT. Although the degron system wasn't efficient in preimplantation embryos, they discovered quick transcriptional changes upon OGT deletion in MEFs.

      The quality of the manuscript is good because the question to be solved was appropriately set, the approach was well designed, and their findings were interesting, although their writing was sometimes hard to understand as I raised in my following comments. Nevertheless, there are several points to be fixed before being published.

      R16: We thank the reviewer for their clear understanding of our work and their appreciation of the biological importance of the findings. Your comprehensive review of the manuscript and the questions you raised were extremely helpful in improving the manuscript and fully addressing its limitations. Below, we respond to each comment in full; we have also revised the manuscript to improve clarity and included novel results.

      Major Comments

      C17: 1. Although the authors showed in vitro activity of each mutant of OGT used in this manuscript by referencing the previous literature, they never showed the levels of global O-GlcNAcylation (and OGT itself) in their established mouse embryos. Although it could be impossible to determine O-GlcNAc levels in OgtQ849N and OgtH568A embryos because of the lack of germline transmission and founder line, respectively, they could do that in OgtY851A and OgtT931A embryos. Given that Y851A and T931A mutants had similar VMAX/KM with different VMAX, it is possible that their activity is comparable or Y851A has even lower activity in vivo depending on the concentration of UDP-GlcNAc in embryos. Therefore, it is critical to assess whether in vivo OGT activity is correlated with that in vitro as expected to conclude that severity of sub-Mendelian inheritance is proportional to the reduction of activity of OGT in vivo. Moreover, since the authors developed the elegant system to deplete OGT, the activity of Q849N and H568A mutant OGT can be examined at least in cells by expressing them in MEFs with OGT-degron system. Thus, I propose determination of global O-GlcNAc levels compensated by OGT levels by western blotting in OgtY851A, and OgtT931A embryos or MEFs with the OGT degron system re-expressing the individual four mutant OGTs. If the protein amount is insufficient for western blotting in the embryos because of the sizes of the earlier stages of embryos, I believe the author could address this by utilizing immunofluorescence as shown in Figure S5.
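
      The kinetic point raised in C17 can be sketched numerically. The parameters below are hypothetical (they are not the measured values for Y851A or T931A) and were chosen only so that two enzymes share the same Vmax/Km while differing in Vmax; `mm_rate` is an illustrative helper implementing the standard Michaelis-Menten equation.

```python
def mm_rate(substrate, vmax, km):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


# Hypothetical enzymes with the same Vmax/Km (0.1) but different Vmax.
enzyme_a = {"vmax": 1.0, "km": 10.0}
enzyme_b = {"vmax": 0.2, "km": 2.0}

for s in (0.1, 100.0):  # low and high substrate, arbitrary units
    va = mm_rate(s, **enzyme_a)
    vb = mm_rate(s, **enzyme_b)
    print(f"[S]={s}: v_a={va:.4f}, v_b={vb:.4f}, ratio b/a={vb / va:.2f}")
```

      At substrate concentrations well below both Km values the two rates are nearly identical, because v is then approximately (Vmax/Km)[S]; at saturating substrate the rates approach the respective Vmax values and diverge. This is why the relative in vivo activity of two such mutants would depend on the UDP-GlcNAc concentration.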

      R17: We fully agree that this is an important point that requires revision. The only mutation for which the level of O-GlcNAc and OGT can be assessed by western blot in vivo is Y851A, the other mutations resulting in embryonic lethality before the blastocyst stage.

      We have included in the revised manuscript western blot analyses of protein expression for OGT, OGA and O-GlcNAc levels in the placenta of the OgtY851A mutants (new Figures 3C,D). The new data show that OGT is upregulated at the protein level in homozygous females, in good agreement with our transcriptomic analysis. Furthermore, O-GlcNAc levels were slightly reduced in homozygous and hemizygous placentae thus showing the impact of the point mutation on global O-GlcNAc levels in the placentae. Moreover, the analysis of OGA protein level unexpectedly revealed the enrichment of a previously uncharacterized OGA fast migrating isoform in hemizygous and homozygous placentae.

      We agree that it would be informative to compare O-GlcNAc levels in OgtT931A versus OgtY851A embryos. A comparison implies performing the experiment at the same developmental stage, which has to be the blastocyst stage or prior because T931A/Y embryos die around implantation. Because the blastocyst is made of approximately 140 cells, many single blastocysts would have to be pooled to obtain the necessary protein input for western blot. We are not aware of another study performing western blot with pooled blastocysts. An additional major challenge for this experiment is the necessity to genotype and sex the blastocysts before pooling. Thus, the feasibility of this experiment is uncertain.

      As an alternative, the reviewer suggests measuring O-GlcNAc levels in the degron MEFs after introduction of OGT transgenes bearing the mutation studied. This experiment would not be conclusive because of residual O-GlcNAc after OGT degradation (Figure S4E). Furthermore, the O-GlcNAc proteome is dynamic during development (as shown in the developing brain by Liu et al. https://doi.org/10.1371/journal.pone.0043724), therefore the MEFs results would have limited value to explain our results in the early embryo.

      In sum, available technologies to quantify O-GlcNAc (e.g. western blot, mass spectrometry) are inadequate for low-input samples such as the early embryo. However, our series of hypomorphic alleles, backed up by in vitro enzymology measurements, brings indirect evidence to this question. Specifically, the qualitative correlation between the OGT activity measured in vitro and the developmental phenotype indicates that the resulting relative levels of O-GlcNAc are consistent with the in vitro measurements.
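
      The reviewer's argument in C17, that two mutants with similar VMAX/KM but different VMAX may behave alike or differently in vivo depending on the UDP-GlcNAc concentration, can be illustrated with a minimal Michaelis-Menten sketch. The kinetic parameters and substrate concentrations below are hypothetical, chosen only to share the same VMAX/KM ratio; they are not the measured values for Y851A, T931A, or any other OGT variant.

```python
def michaelis_menten_rate(vmax, km, substrate):
    """Initial reaction rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)

# Hypothetical parameters in arbitrary units, NOT measured values for any OGT mutant.
# Both enzymes share Vmax/Km = 0.1 but differ tenfold in Vmax.
enzyme_high_vmax = {"vmax": 10.0, "km": 100.0}
enzyme_low_vmax = {"vmax": 1.0, "km": 10.0}

def rate_ratio(substrate):
    """Rate of the high-Vmax enzyme divided by the rate of the low-Vmax enzyme."""
    high = michaelis_menten_rate(substrate=substrate, **enzyme_high_vmax)
    low = michaelis_menten_rate(substrate=substrate, **enzyme_low_vmax)
    return high / low

# Hypothetical substrate concentrations spanning well below to well above both Km values.
for s in (1.0, 10.0, 100.0, 1000.0):
    print(f"[S] = {s:7.1f}  rate ratio (high-Vmax / low-Vmax) = {rate_ratio(s):.2f}")
```

      In this toy setting the two enzymes are nearly indistinguishable when the substrate is far below both KM values, and diverge toward the tenfold VMAX difference as the substrate becomes saturating, which is why the in vivo donor concentration matters for ranking the alleles.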

      C18.1 : 2. I didn't understand why the authors couldn't find any founder lines of the OgtH568A mutant. Was that because mosaic mice with OgtH568A mutation are lethal?

      R18.1: To answer this question, it is important to recall two key features of the biological system:

      1) The mutation H568A was reported to disrupt the glycosyltransferase activity completely (10.1038/nsmb.1443). Hence, OGT-H568A is catalytically dead.

      2) We performed the CRISPR-HDR targeting in the 1-cell embryo.

      Based on these premises, the absence of F0 with the OgtH568A mutation (0/31) suggests that introducing this mutation causes embryonic lethality in both males and females. This hypothesis is consistent with the previously reported lethality around implantation of Ogt-null alleles (10.1128/mcb.24.4.1680-1690.2004). It is possible that the sgRNA is very efficient and results in homozygous mutations in all female zygotes injected (as we have not obtained heterozygous females bearing these mutations). High efficiency of the targeted mutagenesis in the zygote results in mutants where all or the majority of cells bear the mutation (no or low mosaicism). The high number of microinjections performed (416 embryos over the 3 injection sessions) allows us to make these claims.
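
      As a rough illustration of why 0 carriers among 31 F0 animals is informative, the sketch below computes how likely such an outcome would be if the mutation were viable. It assumes that each F0 animal independently carries the allele with a fixed probability; the 10% carrier rate used in the example is hypothetical and is not a measured editing efficiency from this study.

```python
def prob_zero_carriers(n_animals, carrier_rate):
    """Probability of observing no carriers among n independent animals."""
    return (1.0 - carrier_rate) ** n_animals

def upper_bound_carrier_rate(n_animals, alpha=0.05):
    """Largest carrier rate still compatible, at level alpha, with zero carriers."""
    return 1.0 - alpha ** (1.0 / n_animals)

n_f0 = 31                       # number of F0 animals genotyped (from the text)
hypothetical_rate = 0.10        # assumed carrier rate if the allele were viable

print(f"P(0/{n_f0} carriers | rate = {hypothetical_rate:.0%}) = "
      f"{prob_zero_carriers(n_f0, hypothetical_rate):.3f}")
print(f"One-sided 95% upper bound on the carrier rate: "
      f"{upper_bound_carrier_rate(n_f0):.3f}")
```

      Under this simplified independence assumption, even a modest viable carrier rate would make a complete absence of carriers unlikely, which is in line with the lethality interpretation given above.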

      C18.2 : Also, I believe there was no explanation why the OgtQ849N allele showed no maternal inheritance. Was that because Q849N possesses enough activity for sustaining mosaic embryos, but not oocytes? The authors should better explain these points in the manuscript text.

      R18.2: Thanks for this comment, we agree that this maternal effect phenotype demands further explanation.

      The phenotype observed suggests two possibilities: either the oocyte cannot mature, or the cleavage-stage embryo cannot develop with the resulting lower levels of O-GlcNAc. The cleavage-stage embryo does not transcribe a catalytically active OGT before the 8-cell stage and thus relies on the OGT protein inherited from the oocyte until this stage (https://doi.org/10.1101/2024.01.22.576677).

      Thank you for this comment; we added this interpretation of the result in the text:
      "The lack of maternal transmission of the Q849N allele from seemingly mosaic founder females is likely explained by the reliance of the cleavage stage embryo onto the oocyte payload of OGT and O-GlcNAc modified proteins. Specifically, Ogt's exons encoding for the catalytic domains are not detectable before the 8-cell stage, while OGT full-length protein is present and thus maternally inherited (Formichetti et al, 2024)."

      C19: 3. The authors serendipitously found a T931del-allele in the "WT" allele of the OgtT931A line, and suggested that T931del had milder activity loss, although the lethality of embryos was greatly mitigated. Nevertheless, transcriptome analyses in male blastocysts revealed that 120 genes' expression was changed in T931del/Y males. This raised the question about which mutant OGT has higher activity, Y851A or T931del. I think comparing the activity of Y851A and T931del mutants in MEFs with OGT-degron system is important to confirm the proportional relationship between activity and phenotypic severity.

      R19: We agree that it is a limitation that the effect of the T931del mutation on OGT activity has not been biochemically characterized. However, the important point here is that our assessment of phenotypic severity based on maternal inheritance of the mutant allele and embryonic lethality is based on the point mutations for which the catalytic activity has been determined, namely Y851A, T931A, Q849N and H568A, but not T931del.

      We studied the serendipitously discovered T931del mutation to obtain transcriptional insights in the blastocyst. Because the deleted residue T931 is key for the binding to the donor substrate, we can reasonably assume that this mutation affects the catalytic activity, albeit to an undetermined level.

      Hence, our conclusions regarding the requirement of O-GlcNAcylation for development are unaffected by the lack of biochemical knowledge on T931del.

      C20.1: 4. Regarding transcriptomes of T931del/Y, the authors found the upregulation of proteasomal activity and stress granules along with the downregulation of amino acid metabolism, mitochondrial respiration, and so on. To validate the results, the authors should perform qPCR on several up- or down-regulated genes.

      R20.1 : We agree that, in principle, qPCR validation is suitable. However, this validation experiment is particularly expensive in this case because of the requirement of numerous CRISPR zygote pronuclear injection sessions.

      The conclusions of the RNA-seq analysis are strongly supported by a high number of biological replicates (n=10). This high number of biological replicates was essential to obtain sufficient statistical power to quantify with a high level of confidence transcriptional changes of low magnitudes (below 2-fold change, see R5.1 and R5.2).

      Therefore, the qPCR validation experiment would require repeating the CRISPR zygote pronuclear injection sessions with the same high number of animals. This represents a major investment in experimental work and the sacrifice of about 40 animals. Importantly, the RNA-seq results presented are authoritative because of the high number of biological replicates and the high number of sequencing reads per sample. Thus, we argue that qPCR validation is not essential and that the high cost of this experiment is difficult to justify.
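
      The relationship between replicate number and the ability to detect changes below 2-fold can be sketched with a simple normal-approximation power calculation for a single gene. This is only an illustration: the effect size (1.5-fold, i.e. about 0.585 on the log2 scale) and the between-replicate standard deviation are hypothetical, and the calculation ignores multiple-testing correction and the count-based models used by RNA-seq analysis tools.

```python
from statistics import NormalDist

def approx_power(log2_fold_change, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test on log2 expression.

    Uses the normal approximation and ignores the negligible opposite tail.
    """
    std_normal = NormalDist()
    standard_error = sd * (2.0 / n_per_group) ** 0.5
    z_effect = abs(log2_fold_change) / standard_error
    z_critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(z_effect - z_critical)

# Hypothetical values: a 1.5-fold change and a log2-scale standard deviation of 0.4.
hypothetical_log2fc = 0.585
hypothetical_sd = 0.4

for n in (3, 5, 10):
    power = approx_power(hypothetical_log2fc, hypothetical_sd, n)
    print(f"n = {n:2d} per group -> approximate power = {power:.2f}")
```

      Under these assumed values, power rises substantially between a typical triplicate design and ten replicates per group, which is the qualitative point made in the response.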

      C20.2: In addition, according to Figure S2E, the authors pointed out that at least for genes upregulated in OgtT931A embryos, the changes were not explained by a developmentally delayed transcriptome, suggesting that upregulation of these genes was the cause of developmental delay. Therefore, I strongly encourage them to discuss in the manuscript text how up-regulated genes could contribute to developmental delay.

      R20.2: Throughout the manuscript, we have been cautious to avoid establishing causal relationships between the differentially expressed genes uncovered and the developmental phenotypes (e.g. delayed development). There are two main obstacles which we believe prevent us from establishing causality with the data available. Firstly, it is not possible to disentangle differentially expressed genes and developmental delay (in other words, we have no way to tell which is the cause and which is the consequence). Secondly, O-GlcNAc modifies over 5000 proteins and the developing embryo is a particularly dynamic system; thus we cannot know whether the differentially expressed promoters are direct targets of O-GlcNAc modified proteins (or alternatively secondary effect of another molecular alteration, for example of the proteome). We discuss this limitation of the study in the discussion section.

      C21: 5. Regarding the transcriptome in OgtY851A mice, Y851A/Y male mice had huge transcriptomic differences, while Y851A/Y851A female mice barely had any. Although it seems to agree with the number of Ogt alleles, I wonder whether other X-linked genes expressed higher in female placenta as shown in Figure 3C could attenuate the effects of decreased OGT activity. I don't think this possibility can be excluded, unless the authors further decrease OGT activity in Y851A/Y851A female placenta and obtain the similar results as for male placenta. Or if they compared the levels of global O-GlcNAcylation between Y851A/Y and Y851A/Y851A mouse placentas and discovered they had similar levels of O-GlcNAcylation, then the authors could conclude that the number of Ogt alleles was not the reason of sexual-dimorphism. The authors should determine the levels of O-GlcNAcylation in Y851A/Y and Y851A/Y851A mouse placentas and/or at least discuss the above possibilities in the manuscript text.

      R21: Thank you for the thoughtful feedback. We agree that the most likely explanation for the higher sensitivity of male placentae, compared with female placentae, to reduced OGT activity is the difference in Ogt copy number, especially because Ogt escapes X-chromosome inactivation in the placenta (new Figure S3A).

      Western blot quantification of global O-GlcNAc levels has now been performed (new Figures 3C,D). We measured similar levels of O-GlcNAc in Y851A/Y and Y851A/Y851A placentas (lower than WT males in both cases), but we cannot exclude that the western blot lacks the dynamic range required to detect a subtle difference. In fact, homozygous females were expected to have an intermediate level between WT males and hemizygous males, and the difference between the two male genotypes (also considering sample-to-sample variability) is already small when quantified from the blot (new Figure 3D). If O-GlcNAc levels are indeed identical in homozygous females and hemizygous males, it is possible that an X-linked modifier attenuates the impact of hypo-O-GlcNAcylation in the female mutant placenta. Thank you for the idea, which we included in the revised manuscript:

      "Of note, the lower sensitivity of the homozygous females' transcriptome to Ogt disruption (Fig. 3F,I and S3B) seems difficult to reconcile with their lower O-GlcNAc level comparable (lower) O-GlcNAc level to the hemizygous males (Fig. 3C). It is possible that the western blot technique is not sensitive enough to detect subtle differences in O-GlcNAcylation. An alternative hypothesis, if O-GlcNAc levels were truly identical between Y851A/Y and Y851A/Y851A, could be the existence of a modifier in female that could be a XCI-escapee."

      C22: 6. In terms of the transcriptome in OgtY851A mice, similar to comment 4, the authors should confirm their transcriptomics data shown as Figure 3D by qPCR. In addition, the authors should describe the potential mechanisms by which the differentiation of precursor cells of LaTPs and JZPs were disrupted. Were master regulators of the differentiation known to be O-GlcNAcylated and loss of O-GlcNAcylation perturbed the function?

      R22: As for the whole embryo discussed in R20.2, we also interpret cautiously the gene expression phenotype observed in the placenta. Specifically, we state in the manuscript that it could be caused either by an impact of lower O-GlcNAcylation on placental differentiation or by a general delay in placentation or in the development of the embryo as a whole. The hypothesis of a general delay (of the whole embryo and/or of placental formation specifically) is supported by the downregulation of essentially all markers of more differentiated cell types and the upregulation of the precursor marker. We favor this hypothesis because it is consistent with what was observed with the T931 mutants and also with the enzymatic removal of O-GlcNAc in the zygote (Formichetti et al., 2024 BioRxiv). Because of the thousands of O-GlcNAcylated proteins present in the cell, it is impossible to know which molecular mechanism is responsible, and it could even start at much earlier stages.

      Minor Comments

      C23: 1. Regarding DFP461-463 mutant, I couldn't understand the point of this figure because the results had no difference, and the meaning of the mutation was quite different from the others. Thus, the figure was awkward and a little confusing to me. If the authors still want to include the figures, I would suggest that they should reorganize the position of the figure (maybe after figure 3 is better to show you had tried to investigate the effects of nuclear localization of OGT on the changes of transcriptomes) and add some results. Since WT OGT seems to be localized mainly in the cytosol at steady state (Figure S1B and S1C), the effect of mutation on its nuclear localization should not be obvious. Therefore, it is difficult to conclude the mutation had no effect on the nuclear localization unless the ratio of nuclear and cytosol localization is quantified. Also, I wonder whether the O-GlcNAc levels of nuclear and cytosolic proteins in the mutant cells were comparable to those in WT cells. If this is the case, the results would also support the authors' conclusion.

      R23: We took the comments on board and made it clearer that the rationale for the DFP461-463 mutant was an attempt to separate OGT's nuclear and cytosolic functions. We fully agree that these results are peripheral, and thus we presented these results in Supplementary Figure 1 (not in the main figure).

      The biochemical evidence presented in Fig S1C shows that the genetic substitution of DFP to AAA on endogenous OGT has no detectable impact on its nuclear localization in primary MEFs. This result is far more authoritative than the evidence provided by Seo et al. 2016 (doi: 10.1038/srep34614), which is based on the overexpression of OGT transgenes in HeLa cells. Importantly, Seo et al. 2016 did not assess the impact of their mutations on endogenous OGT.

      We believe that the negative results we obtained with the DFP461-463 mouse model will be extremely valuable for the field. Firstly, science can move forward only if both negative and positive results are shared. In this specific case, we found that mutation of endogenous OGT in MEFs yielded a different result from the previously reported overexpression of the same mutant construct in HeLa cells. Secondly, we want to make the Ogt-NLS- mouse model available for further investigations.

      C24: 2. Since OGT or O-GlcNAcylation regulates chromatin status, the authors analyzed the gene expression profiles of retrotransposons in T931del/Y or T931A/Y mice. Is it possible to investigate if the release of gene silencing is also seen in non-retrotransposon genes? I assumed retrotransposons might be a well-established system to analyze gene silencing status, however, if the authors could find similar effects on genes other than retrotransposons, that would be highly valuable.

      R24: This is an interesting idea. This notion refers to the activation of promoters that are normally epigenetically repressed (i.e. silent despite the presence of all trans-acting factors required for their expression). Epigenetically repressed promoters include retrotransposons, imprinted genes and germline-specific genes that are normally expressed in germ cells and maintained in a repressed state in somatic cells (10.1038/s41580-019-0159-6). Testing mono-allelic expression of imprinted genes requires F1 hybrids. Thus, we assessed whether well-studied germline-specific genes could be released from silencing in T931del/Y or T931A/Y blastocysts and found no evidence for it (see dot plot below). The unbiased transcriptomic analysis presented in the manuscript shows that the products of upregulated genes are enriched in mRNA processing (Figure 2E), but these genes are not normally epigenetically repressed. Thus, contrary to retrotransposons, the role of O-GlcNAc at cellular gene promoters appears not to be linked to epigenetic silencing. This could be explained by the many different protein substrates for O-GlcNAc.

      C25: 3. OgtY851A mice with milder OGT activity loss didn't exhibit impaired preimplantation development, but did display postimplantation development such as placental development, suggesting that O-GlcNAcylation of proteins required for preimplantation and postimplantation development relies on different degrees of OGT activity. I wonder whether global O-GlcNAc levels in embryos in preimplantation and postimplantation developmental stages are different or not. This might include both the pattern of blotting and intensities. The results would give the authors an explanation why the dependency on OGT activity was different in two developmental stages. Can the authors provide data? If not, then the authors should at least describe hypotheses in the manuscript to address these questions.

      R25: We recently reported that the subcellular patterns of O-GlcNAc are highly dynamic during preimplantation development (Formichetti et al. 2024, BioRxiv). The most striking O-GlcNAc remodeling we observed is the enrichment of nuclear O-GlcNAc as compared to cytoplasmic O-GlcNAc that is concomitant to embryonic genome activation (Formichetti et al. 2024, BioRxiv). We quantified the ratio of the nuclear/cytoplasmic signal by immunofluorescence, but absolute quantification is not possible with this method. Due to the limited number of cells of the preimplantation embryo, this analysis cannot be performed by western blot. Hence, there is no appropriate method to quantitatively compare O-GlcNAc levels between preimplantation and postimplantation embryos.

      C26: 4. The authors' AID-degron system elegantly worked in MEFs but was inefficient in preimplantation embryos. I wonder if this was because of the high expression of the shorter isoform of OGT detected as OGTp78 in the author's western blot. Is it possible to examine this possibility in the embryos? Either way, the authors should describe a potential explanation for why the efficiency in the embryos was low. In addition, the authors should describe why they inserted the AID tag only into the longest OGT isoform.

      R26: This is a good point. The smallest isoform OGTp78 bears the catalytic domain and thus can partially compensate for the degradation of OGTp110. Note that the level of OGTp78 is low and does not increase upon OGTp110 degradation; thus a compensation can only be partial (Figures S4A and S4D). Alternative hypotheses for the ineffectiveness of the degron system in ex vivo grown embryos include: i) the expression level of OsTIR that may be too low in the early embryo (Rosa26 promoter not being activated at EGA), ii) a possible steric hindrance of the N-ter AID tag in these cells, iii) the lower concentration of Auxin imposed by toxicity on the embryo is likely suboptimal. Testing these possibilities is very difficult in preimplantation embryos.

      It is unclear how the OGTp78 isoform is produced; it was hypothesized to originate from an alternative transcription start site (https://doi.org/10.1007/s00335-001-2108-9). We initially attempted to target both isoforms by inserting the AID tag at the C-terminus, but we were unsuccessful in producing this mouse model. It is possible that the C-terminus that is near the catalytic site cannot tolerate the AID knock-in.

      C27: 5. In Figure S1C, is the band detected right below OGTp78 in nuclei fractions non-specific or do both bands correspond to OGTp78 ?

      R27: To answer this question, a knockout control would be needed. Because OGTp78 is not targeted by our AID-degron, we cannot test the specificity of these bands using our perturbation toolkit.

      C28: 6. Figure 1D top row third column: hemizgous -> hemizygous

      R28: Many thanks; the embarrassing typo has been corrected.

      C29: 7. Figure 1D second row third column: hemyzygous -> hemizygous

      R29: Thanks for bringing this other typo to our attention, it is now corrected.

      Reviewer #4 (Significance (Required)):

      General assessment: strengths and limitations

      C30: Strength: This manuscript elegantly revealed the requirement of OGT in mammalian development by taking advantage of knock-in mouse models with different OGT activity. In addition, the manuscript provided interesting and important transcriptomics data in both pre- and post-implantation embryos of OGT mutant mice. These data sets could explain detailed mechanisms of how OGT or O-GlcNAcylation regulates mammalian development in the future. Furthermore, development of the AID-tagged OGT system would be a useful tool for other researchers studying OGT function.

      Limitation: Although they found interesting changes in terms of transcriptomes in developing mice with different OGT activity, they lack the data showing how these changes caused the observed phenotypes. In other words, there are fewer mechanistic insights behind the developmental problems seen in mice with different OGT activity.

      In addition, although I agree the question about whether OGT activity itself is crucial for the early development of mammals has not been completely solved for a long time, I assume people thought OGT activity is actually important for mammalian development through the observation of OGT-linked congenital disorders of glycosylation.

      Therefore, I would say the novelty of the manuscript is a little less impactful. Furthermore, although AID-tagged OGT system revealed fundamental questions regarding the transcriptional changes upon acute depletion of OGT in cellular levels, the system was inefficient in mouse embryos. So, they showed nothing about developmental-stage specific requirements of OGT.

      Advance: The manuscript can fill a current gap regarding requirement of OGT in mammalian development. Also, the manuscript developed a series of mutant mice with different OGT activity and an AID-tagged OGT mouse line. These mice provide technical advances.

      Audience: The manuscript will be of interest to researchers in specific fields such as glycobiology, developmental biology, and clinical fields.

      Describe your expertise: Biochemistry, Glycobiology, Cell biology

      R30: We are thankful for the constructive and supportive review.

      We fully agree with the limitations of the study and discussed them in the manuscript. Our in vivo approach revealed the most phenotypically relevant transcriptional phenotypes resulting from OGT catalytic impairment during embryonic development. We make the mouse models created for this study available to the community to facilitate follow-up studies aiming at exploring the underlying molecular details.

      As pointed out in the comments, the requirement of OGT glycosyltransferase activity for mammalian development was widely assumed by the field, but this belief was without direct experimental evidence. This study provides the first in vivo evidence for this important conclusion.

      Conclusion: The reviewers' comments were tremendously useful to improving the clarity of the manuscript and adding important new in vivo evidence. We note that none of the reviewers provided any reason to doubt our important conclusions:

      • The demonstration that the enzymatic activity of Ogt, thus the O-GlcNAc modification itself, is essential for preimplantation development.
      • The finding that a mild reduction of OGT's activity is sufficient to perturb the silencing of multiple families of retrotransposons in the growing embryo.
      • The indication, from transcriptomes of hypo-O-GlcNAcylated embryos, of a developmental retardation upon a mild O-GlcNAc perturbation.

      • The discovery that OGT's rapid depletion in vitro downregulates basal cellular function, including translation. This result provides mechanistic support to the embryonic growth delay resulting from decreasing O-GlcNAc in vivo.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      Thank you for your review and pointing out multiple things to be discussed and clarified! Below, we go through the various limitations you pointed out and refer to the places where we have tried to address them.

      (1) It's important to keep in mind that this work involves simplified models of the motor system, and often the terminology for 'motor cortex' and 'models of motor cortex' are used interchangeably, which may mislead some readers. Similarly, the introduction fails in many cases to state what model system is being discussed (e.g. line 14, line 29, line 31), even though these span humans, monkeys, mice, and simulations, which all differ in crucial ways that cannot always be lumped together.

      That is a good point. We have clarified this in the text (Introduction and Discussion), to highlight the fact that our model isn’t necessarily meant to capture M1 alone. We have also updated the Introduction to make it clearer in which species the experiments that motivate our investigation were performed.

      (2) At multiple points in the manuscript thalamic inputs during movement (in mice) is used as a motivation for examining the role of preparation. However, there are other more salient motivations, such as delayed sensory feedback from the limb and vision arriving in the motor cortex, as well as ongoing control signals from other areas such as the premotor cortex.

      Yes – the motivation for thalamic inputs came from the fact that those have specifically been shown to be necessary for accurate movement generation in mice. However, it is true that the inputs in our model are meant to capture any signals external to the dynamical system modeled, and as such are likely to represent a mixture of sensory signals, and feedback from other areas. We have clarified this in the Discussion, and have added this additional motivation in the Introduction.

      (3) Describing the main task in this work as a delayed reaching task is not justified without caveats (by the authors' own admission: line 687), since each network is optimized with a fixed delay period length. Although this is mentioned to the reader, it's not clear enough that the dynamics observed during the delay period will not resemble those in the motor cortex for typical delayed reaching tasks.

      Yes, we completely agree that the terminology might be confusing. While the task we are modeling is a delayed reaching task, it does differ from the usual setting since the network has knowledge of the delay period, and that is indeed a caveat of the model. We have added a brief paragraph just after the description of the optimal control objective to highlight this limitation.

      We have also performed additional simulations using two different variants of a model-predictive control approach that allow us to relax the assumption that the go-cue time is known in advance. We show that these modifications of the optimal controller yield results that remain consistent with our main conclusions, and can in fact in some settings lead to preparatory activity plateaus during the preparation epoch as often found in monkey M1 (e.g. in Elsayed et al. 2016). We have modified the Discussion to explain these results and their limitations, which are summarized in a new Supplementary Figure (S9).

      (4) A number of simplifications in the model may have crucial consequences for interpretation.

      a) Even following the toy examples in Figure 4, all the models in Figure 5 are linear, which may limit the generalisability of the findings.

      While we agree that linear models may be too simplistic, many prior analyses of M1 data suggest that they are often good enough to capture key aspects of M1 dynamics; for example, the generative model underlying jPCA is linear, and Sussillo et al. (2015) showed that the internal activity of nonlinear RNN models trained to reproduce EMG data aligned best with M1 activity when heavily regularized; in this regime, the RNN dynamics were close to linear. Nevertheless, this linearity assumption is indeed convenient from a modeling viewpoint: the optimal control problem is more easily solved for linear network dynamics and the optimal trajectories are more consistent across networks. Indeed, we had originally attempted to perform the analyses of Figure 5 in the nonlinear setting, but found that while the results were overall similar to what we report in the linear regime, iLQR was occasionally trapped in local minima, resulting in more variable results, especially for inhibition-stabilized networks at the strongly connected end of the spectrum. Finally, Figure 5 is primarily meant to explore to what extent motor preparation can be predicted from basic linear control-theoretic properties of the Jacobian of the dynamics; in this regard, it made sense to work with linear RNNs (for which the Jacobian is constant).
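
      The remark that the Jacobian is constant for linear networks can be checked numerically with a small sketch. The 2-unit connectivity matrix and the tanh nonlinearity below are hypothetical choices for illustration and are not the networks used in the paper: for the linear system the finite-difference Jacobian equals the connectivity matrix at every state, whereas for the nonlinear system it changes with the state at which it is evaluated.

```python
import math

# Hypothetical 2-unit connectivity matrix, for illustration only.
A = [[-1.0, 2.0],
     [-2.0, -1.0]]

def linear_dynamics(x):
    """dx/dt = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def nonlinear_dynamics(x):
    """dx/dt = A tanh(x), with tanh applied element-wise."""
    return [sum(a * math.tanh(xj) for a, xj in zip(row, x)) for row in A]

def numerical_jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f evaluated at state x."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac

for state in ([0.0, 0.0], [1.0, -0.5]):
    print("state", state)
    print("  linear Jacobian   ", numerical_jacobian(linear_dynamics, state))
    print("  nonlinear Jacobian", numerical_jacobian(nonlinear_dynamics, state))
```

      Because the linear Jacobian does not depend on the state, control-theoretic quantities derived from it are properties of the network as a whole rather than of a particular trajectory, which is the convenience referred to above.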

      b) Crucially, there is no delayed sensory feedback in the model from the plant. Although this simplification is in some ways a strength, this decision allows networks to avoid having to deal with delayed feedback, which is a known component of closed-loop motor control and of motor cortex inputs and will have a large impact on the control policy.

      This comment resonates well with Reviewer 3's remark regarding the autonomous nature (or not) of M1 during movement. Rather than thinking of our RNN models as anatomically confined models of M1 alone, we think of them as models of the dynamics which M1 implements possibly as part of a broader network involving “inter-area loops and (at some latency) sensory feedback”, and whose state appears to be near-fully decodable from M1 activity alone. We have added a paragraph of Discussion on this important point.

      (5) A key feature determining the usefulness of preparation is the direction of the readout dimension. However, all readouts had a similar structure (random Gaussian initialization). Therefore, it would be useful to have more discussion regarding how the structure of the output connectivity would affect preparation, since the motor cortex certainly does not follow this output scheme.

      We agree with this limitation of our model — indeed one key message of Figure 4 is that the degree of reliance on preparatory inputs depends strongly on how the dynamics align with the readout. However, this strong dependence is somewhat specific to low-dimensional models; in higher-dimensional models (most of our paper), one expects that any random readout matrix C will pick out activity dimensions in the RNN that are sufficiently aligned with the most controllable directions of the dynamics to encourage preparation.

      We did consider optimizing C away (which required differentiating through the iLQR optimizer, which is possible but very costly), but the question inevitably arises what exactly C should be optimized for, and under what constraints (e.g. fixed norm or not). One possibility is to optimize C with respect to the same control objective that the control inputs are optimized for, and constrain its norm (otherwise, inputs to the M1 model, and its internal activity, could become arbitrarily small as C can grow to compensate). We performed this experiment (new Supplementary Figure S7) and obtained a similar preparation index; there was one notable difference, namely that the optimized readout modes led to greater observability compared to a random readout; thus, the same amount of “muscle energy” required for a given movement could now be produced by a smaller initial condition. In turn, this led to smaller control inputs, consistent with a lower control cost overall.

      Whilst we could have systematically optimized C away, we reasoned that (i) it is computationally expensive, and (ii) the way M1 affects downstream effectors is presumably “optimized” for much richer motor tasks than simple 2D reaching, such that optimizing C for a fixed set of simple reaches could lead to misleading conclusions. We therefore decided to stick with random readouts.

      Additional comments:

      (1) The choice of cost function seems very important. Is it? For example, penalising the square of u(t) may produce very different results than penalising the absolute value.

      Yes, the choice of cost function does affect the results, at least qualitatively. The absolute value of the inputs is a challenging cost to use, as iLQR relies on a local quadratic approximation of the cost function. However, we have included additional experiments in which we penalized the squared derivative of the inputs (Supplementary Figure S8; see also our response to Reviewer 3's suggestion on this topic), and we do see differences in the qualitative behavior of the model (though the main takeaway, i.e. the reliance on preparation, continues to hold). This is now referred to and discussed in the Discussion section.
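
      To make the difficulty with the absolute value concrete, compare the local expansions of the two penalties around a nominal input. This is a scalar illustration only; the symbols \bar{u} (nominal input) and \delta (perturbation) are introduced here and do not appear in the manuscript.

```latex
\begin{align}
\ell_2(u) = u^2 &: \quad \ell_2(\bar{u} + \delta) = \bar{u}^2 + 2\bar{u}\,\delta + \delta^2
  && \text{(exact, curvature } 2\text{)} \\
\ell_1(u) = |u| &: \quad \ell_1(\bar{u} + \delta) = |\bar{u}| + \operatorname{sign}(\bar{u})\,\delta
  && \text{for } \bar{u} \neq 0,\ |\delta| < |\bar{u}|
\end{align}
```

      The quadratic penalty is its own second-order expansion, whereas the absolute value has zero curvature away from the origin and no derivative at the origin, so a local quadratic model of it carries no curvature information.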

      (2) In future work it would be useful to consider the role of spinal networks, which are known to contribute to preparation in some cases (e.g. Prut and Fetz, 1999).

      (3) The control signal magnitude is penalised, but not the output torque magnitude, which highlights the fact that control in the model is quite different from muscle control, where co-contraction would be a possibility and therefore a penalty of muscle activation would be necessary. Future work should consider the role of these differences in control policy.

      Thank you for pointing us to this reference! Regarding both of these concerns, we agree that the model could be greatly improved and made more realistic in future work (another avenue for this would be to consider a more realistic biophysical model, e.g. using the MotorNet library). We hope that the current Discussion, which highlights the various limitations of our modeling choices, makes it clear that a lot of these choices could easily be modified depending on the specific assumptions/investigation being performed.

      Reviewer 2:

      Thank you for your positive review! We very much agree with the limitations you pointed out, some of which overlapped with the comments of the other reviewers. We have done our best to address them through additional discussion and new supplementary figures. We briefly highlight below where those changes can be found.

      (1) Though the optimal control theory framework is ideal to determine inputs that minimize output error while regularizing the input norm, it however cannot easily account for some other varied types of objectives especially those that may lead to a complex optimization landscape. For instance, the reusability of parts of the circuit, sparse use of additional neurons when learning many movements, and ease of planning (especially under uncertainty about when to start the movement), may be alternative or additional reasons that could help explain the preparatory activity observed in the brain. It is interesting to note that inputs that optimize the objective chosen by the authors arguably lead to a trade-off in terms of other desirable objectives. Specifically, the inputs the authors derive are time-dependent, so a recurrent network would be needed to produce them and it may not be easy to interpolate between them to drive new movement variants. In addition, these inputs depend on the desired time of output and therefore make it difficult to plan, e.g. in circumstances when timing should be decided depending on sensory signals. Finally, these inputs are specific to the full movement chain that will unfold, so they do not permit reuse of the inputs e.g. in movement sequences of different orders.

      Yes, that is a good point! We have incorporated further Discussion related to this point. We have additionally included a new example in which we regularize the temporal complexity of the inputs (see also our response to Reviewer 3's suggestion on this topic), which leads to more slowly varying inputs, and may indeed represent a more realistic constraint and lead to simpler inputs that can more easily be interpolated between. We also agree that uncertainty about the upcoming go cue may play an important role in the strategy adopted by the animals. While we have not performed an extensive investigation of the topic, we have included a Supplementary Figure (S9) in which we used Model Predictive Control to investigate the effect of planning under uncertainty about the go cue arrival time. We hope that this will give the reader a better sense of what sort of model extensions are possible within our framework.

      (2) Relatedly, if the motor circuits were to balance different types of objectives, the activity and inputs occurring before each movement may be broken down into different categories that may each specialize into one objective. For instance, previous work (Kaufman et al., eNeuro 2016; Inagaki et al., Cell 2022; Zimnik and Churchland, Nature Neuroscience 2021) has suggested that inputs occurring before the movement could be broken down into preparatory inputs 'stricto sensu' - relating to the planned characteristics of the movement - and a trigger signal, relating to the transition from planning to execution - irrespective of whether the movement is internally timed or triggered by an external event. The current work does not address which type(s) of early input may be labeled as 'preparatory' or may be thought of as a part of 'planning' computations.

      Yes, our model does indeed treat inputs in a very general way, and does not distinguish between the different types of processes they may be composed of. This is partly because we do not explicitly model where the inputs come from, such that our inputs likely encompass multiple processes. We have added discussion related to this point.

      (3) While the authors rightly point out some similarities between the inputs that they derive and observed preparatory activity in the brain, notably during motor sequences, there are also some differences. For instance, while both the derived inputs and the data show two peaks during sequences, the data reproduced from Zimnik and Churchland show preparatory inputs that have a very asymmetric shape that really plummets before the start of the next movement, whereas the derived inputs have larger amplitude during the movement period - especially for the second movement of the sequence. In addition, the data show trigger-like signals before each of the two reaches. Finally, while the data show a very high correlation between the pattern of preparatory activity of the second reach in the double reach and compound reach conditions, the derived inputs appear to be more different between the two conditions. Note that the data would be consistent with separate planning of the two reaches even in the compound reach condition, as well as the re-use of the preparatory input between the compound and double reach conditions. Therefore, different motor sequence datasets - notably, those that would show even more coarticulation between submovements - may be more promising to find a tight match between the data and the authors' inputs. Further analyses in these datasets could help determine whether the coarticulation could be due to simple filtering by the circuits and muscles downstream of M1, planning of movements with adjusted curvature to mitigate the work performed by the muscles while permitting some amount of re-use across different sequences, or - as suggested by the authors - inputs fully tailored to one specific movement sequence that maximize accuracy and minimize the M1 input magnitude.

      Regarding the exact shape of the occupancy plots, it is important to note that some of the more qualitative aspects (e.g. the relative height of the two peaks) will change if we change the parameters of the cost function. Right now, we have chosen the parameters to ensure that both reaches would be performed at roughly the same speed (as a way to very loosely constrain the parameters based on the observed behavior). However, small changes to the hyperparameters can lead to changes in the model output (e.g. one of the two consecutive reaches being performed using greater acceleration than the other), and since our biophysical model is fairly simple, changes in the behavior are directly reflected in the network activity. Essentially, what this means is that while the double occupancy is a consistent feature of the model, the exact shape of the peaks is more sensitive to hyperparameters, and we do not wish to draw any strong conclusions from them, given the simplicity of the biophysical model. However, we do agree that our model exhibits some differences with the data. As discussed above, we have included additional discussion regarding the potential existence of separate inputs for planning vs triggering the movement in the context of single reaches.

      Overall, we are excited about the suggestions made by the Reviewer here about using our approach to analyze other motor sequence datasets, but we think that in order to do this properly, one would need to adopt a more realistic musculo-skeletal model (such as one provided by MotorNet).

      (4) Though iLQR is a powerful optimization method to find inputs optimizing the authors' cost function, it also has some limitations. First, given that it relies on a linearization of the dynamics at each timestep, it has a limited ability to leverage potential advantages of nonlinearities in the dynamics. Second, the iLQR algorithm is not a biologically plausible learning rule and therefore it might be difficult for the brain to learn to produce the inputs that it finds. It remains unclear whether using alternative algorithms with different limitations - for instance, using variants of BPTT to train a separate RNN to produce the inputs in question - could impact some of the results.

      We agree that our choice of iLQR has limitations: while it offers the advantage of convergence guarantees, it does indeed restrict the choice of cost function and dynamics that we can use. We have now included extensive discussion of how the modeling choices affect our results.

      We do not view the lack of biological plausibility of iLQR as an issue, as the results are agnostic to the algorithm used for optimization. However, we agree that any structure imposed on the inputs (e.g. by enforcing them to be the output of a self-contained dynamical system) would likely alter the results. A potentially interesting extension of our model would be to do just what the reviewer suggested, and try to learn a network that can generate the optimal inputs. However, this is outside the scope of our investigation, as it would then lead to new questions (e.g. what brain region would that other RNN represent?).

      (5) Under the objective considered by the authors, the amount of input occurring before the movement might be impacted by the presence of online sensory signals for closed-loop control. It is therefore an open question whether the objective and network characteristics suggested by the authors could also explain the presence of preparatory activity before e.g. grasping movements that are thought to be more sensory-driven (Meirhaeghe et al., Cell Reports 2023).

      It is true that we aren’t currently modeling sensory signals explicitly. However, some of the optimal inputs we infer may be capturing upstream information, which could encompass some sensory information. This is currently unclear, and would likely depend on how exactly the model is specified. We have added new discussion to emphasize that our dynamics should not be understood as just representing M1, but more general circuits whose state can be decoded from M1.

      Reviewer #2 (Recommendations For The Authors):

      Additionally, thank you for pointing out various typos in the manuscript; we have fixed those!

      Reviewer 3:

      Thank you very much for your review, which makes a lot of very insightful points, and raises several interesting questions. In summary, we very much agree with the limitations you pointed out. In particular, the choice of input cost is something we had previously discussed, but we had found it challenging to decide on what a reasonable cost for “complexity” could be. Following your comment, we have however added a first attempt at penalizing “temporal complexity”, which shows promising behavior. We have only included those additional analyses as supplementary figures, and we have included new discussion, which hopefully highlights what we meant by the different model components, and how the model behavior may change as we vary some of our choices. We hope this can be informative for future models that may use a similar approach. Below, we highlight the changes that we have made to address your comments.

      The main limitation of the study is that it focuses exclusively on one specific constraint - magnitude - that could limit motor-cortex inputs. This isn't unreasonable, but other constraints are at least as likely, if less mathematically tractable. The basic results of this study will probably be robust with regard to such issues - generally speaking, any constraint on what can be delivered during execution will favor the strategy of preparing - but this robustness cuts both ways. It isn't clear that the constraint used in the present study - minimizing upstream energy costs - is the one that really matters. Upstream areas are likely to be limited in a variety of ways, including the complexity of inputs they can deliver. Indeed, one generally assumes that there are things that motor cortex can do that upstream areas can't do, which is where the real limitations should come from. Yet in the interest of a tractable cost function, the authors have built a system where motor cortex actually doesn't do anything that couldn't be done equally well by its inputs. The system might actually be better off if motor cortex were removed. About the only thing that motor cortex appears to contribute is some amplification, which is 'good' from the standpoint of the cost function (inputs can be smaller) but hardly satisfying from a scientific standpoint.

      The use of a term that punishes the squared magnitude of control signals has a long history, both because it creates mathematical tractability and because it (somewhat) maps onto the idea that one should minimize the energy expended by muscles and the possibility of damaging them with large inputs. One could make a case that those things apply to neural activity as well, and while that isn't unreasonable, it is far from clear whether this is actually true (and if it were, why punish the square if you are concerned about ATP expenditure?). Even if neural activity magnitude is an important cost, any costs should pertain not just to inputs but to motor cortex activity itself. I don't think the authors really wish to propose that squared input magnitude is the key thing to be regularized. Instead, this is simply an easily imposed constraint that is tractable and acts as a stand-in for other forms of regularization / other types of constraints. Put differently, if one could write down the 'true' cost function, it might contain a term related to squared magnitude, but other regularizing terms would be very likely to dominate. Using only squared magnitude is a reasonable way to get started, but there are also ways in which it appears to be limiting the results (see below).

      I would suggest that the study explore this topic a bit. Is it possible to use other forms of regularization? One appealing option is to constrain the complexity of inputs; a long-standing idea is that the role of motor cortex is to take relatively simple inputs and convert them to complex time-evolving inputs suitable for driving outputs. I realize that exploring this idea is not necessarily trivial. The right cost-function term is not clear (should it relate to low-dimensionality across conditions, or to smoothness across time?) and even if it were, it might not produce a convex cost function. Yet while exploring this possibility might be difficult, I think it is important for two reasons.

      First, this study is an elegant exploration of how preparation emerges due to constraints on inputs, but at present that exploration focuses exclusively on one constraint. Second, at present there are a variety of aspects of the model responses that appear somewhat unrealistic. I suspect most of these flow from the fact that while the magnitude of inputs is constrained, their complexity is not (they can control every motor cortex neuron at both low and high frequencies). Because inputs are not complexity-constrained, preparatory activity appears overly complex and never 'settles' into the plateaus that one often sees in data. To be fair, even in data these plateaus are often imperfect, but they are still a very noticeable feature in the response of many neurons. Furthermore, the top PCs usually contain a nice plateau. Yet we never get to see this in the present study. In part this is because the authors never simulate the situation of an unpredictable delay (more on this below) but it also seems to be because preparatory inputs are themselves strongly time-varying. More realistic forms of regularization would likely remedy this.

      That is a very good point, and it mirrors several concerns that we had in the past. While we did focus on the input norm for the sake of simplicity, and because it represents a very natural way to regularize our control solutions, we agree that a “complexity cost” may be better suited to models of brain circuits. We have addressed this in a supplementary investigation. We chose to focus on a cost that penalizes the temporal complexity of the inputs, as ||u(t+1) - u(t)||^2. Note that this required augmenting the state of the model, making the computations quite a bit slower; while it is doable if we only penalize the first temporal derivative, it would not scale well to higher orders.

      Interestingly, we did find that the activity in that setting was somewhat more realistic (see new Supplementary Figure S8), with more sustained inputs and plateauing activity. While we have kept the original model for most of the investigations, the somewhat more realistic nature of the results under that setting suggests that further exploration of penalties of that sort could represent a promising avenue to improve the model.
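
      The state augmentation mentioned above can be sketched as follows. This is one common construction, shown under stated assumptions rather than as the implementation used for Figure S8: the dynamics are taken to be linear and discrete-time, the previous input u_{t-1} is appended to the state, and the new control is the increment v_t = u_t - u_{t-1}, so that the smoothness penalty becomes an ordinary quadratic cost on v_t. The function `augment`, the random matrices, and the convention u_{-1} = 0 are hypothetical choices made for illustration.

```python
import numpy as np

def augment(A, B):
    """Build the augmented system for state z_t = [x_t; u_{t-1}] and
    control v_t = u_t - u_{t-1}, so that z_{t+1} = A_aug z_t + B_aug v_t."""
    n, m = B.shape
    A_aug = np.block([[A, B],
                      [np.zeros((m, n)), np.eye(m)]])
    B_aug = np.vstack([B, np.eye(m)])
    return A_aug, B_aug

# Toy system and input sequence (assumed, not taken from the paper)
rng = np.random.default_rng(1)
n, m, T = 5, 2, 30
A = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
B = rng.standard_normal((n, m))
u = rng.standard_normal((T, m))
x0 = rng.standard_normal(n)

# Original system driven by u
x_final = x0.copy()
for t in range(T):
    x_final = A @ x_final + B @ u[t]

# Augmented system driven by the increments v, with u_{-1} = 0 by convention
A_aug, B_aug = augment(A, B)
v = np.diff(u, axis=0, prepend=np.zeros((1, m)))
z = np.concatenate([x0, np.zeros(m)])
smoothness_cost = 0.0
for t in range(T):
    z = A_aug @ z + B_aug @ v[t]
    smoothness_cost += float(v[t] @ v[t])   # sum_t ||u_t - u_{t-1}||^2

# The first n entries of z track x; the last m entries store the latest input
print(np.allclose(z[:n], x_final), np.allclose(z[n:], u[-1]))
```

      Penalizing higher-order temporal derivatives in the same way would require storing more past inputs in the state, which is consistent with the remark that the approach does not scale well to higher orders.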

      We also found the idea of a cost that would ensure low-dimensionality of the inputs across conditions very interesting. However, it is challenging to investigate with iLQR as we perform the optimization separately for each condition; nevertheless, it could be investigated using a different optimizer.

      At present, it is also not clear whether preparation always occurs even with no delay. Given only magnitude-based regularization, it wouldn't necessarily have to be. The authors should perform a subspace-based analysis like that in Figure 6, but for different delay durations. I think it is critical to explore whether the model, like monkeys, uses preparation even for zero-delay trials. At present it might or might not. If not, it may be because of the lack of more realistic constraints on inputs. One might then either need to include more realistic constraints to induce zero-delay preparation, or propose that the brain basically never uses a zero delay (it always delays the internal go cue after the preparatory inputs) and that this is a mechanism separate from that being modeled.

      I agree with the authors that the present version of the model, where optimization knows the exact time of movement onset, produces a reasonably realistic timecourse of preparation when compared to data from self-paced movements. At the same time, most readers will want to see that the model can produce realistic looking preparatory activity when presented with an unpredictable delay. I realize this may be an optimization nightmare, but there are probably ways to trick the model into optimizing to move soon, but then forcing it to wait (which is actually what monkeys are probably doing). Doing so would allow the model to produce preparation under the circumstances where most studies have examined it. In some ways this is just window-dressing (showing people something in a format they are used to and can digest) but it is actually more than that, because it would show that the model can produce a reasonable plateau of sustained preparation. At present it isn't clear it can do this, for the reasons noted above. If it can't, regularizing complexity might help (and even if this can't be shown, it could be discussed).

      In summary, I found this to be a very strong study overall, with a conceptually timely message that was well-explained and nicely documented by thorough simulations. I think it is critical to perform the test, noted above, of examining preparatory subspace activity across a range of delay durations (including zero) to see whether preparation endures as it does empirically. I think the issue of a more realistic cost function is also important, both in terms of the conceptual message and in terms of inducing the model to produce more realistic activity. Conceptually it matters because I don't think the central message should be 'preparation reduces upstream ATP usage by allowing motor cortex to be an amplifier'. I think the central message the authors wish to convey is that constraints on inputs make preparation a good strategy. Many of those constraints likely relate to the fact that upstream areas can't do things that motor cortex can do (else you wouldn't need a motor cortex) and it would be good if regularization reflected that assumption. Furthermore, additional forms of regularization would likely improve the realism of model responses, in ways that matter both aesthetically and conceptually. Yet while I think this is an important issue, it is also a deep and tricky one, and I think the authors need considerable leeway in how they address it. Many of the cost-function terms one might want to use may be intractable. The authors may have to do what makes sense given technical limitations. If some things can't be done technically, they may need to be addressed in words or via some other sort of non-optimization-based simulation.

      Specific comments

      As noted above, it would be good to show that preparatory subspace activity occurs similarly across delay durations. It actually might not, at present. For a zero ms delay, the simple magnitude-based regularization may be insufficient to induce preparation. If so, then the authors would either have to argue that a zero delay is actually never used internally (which is a reasonable argument) or show that other forms of regularization can induce zero-delay preparation.

      Yes, that is a very interesting analysis to perform, which we had not considered before! When investigating this, we found that the zero-delay strategy does not rely on preparation in the same way as is seen in the monkeys. This seems to be a reflection of the fact that our “Go cue” corresponds to an “internal” go cue which would likely come after the true, “external go cue” – such that we would indeed never actually be in the zero delay setting. This is not something we had addressed (or really considered) before, although we had tried to ensure we referred to “delta prep” as the duration of the preparatory period but not necessarily the delay period. We have now included more discussion on this topic, as well as a new Supplementary Figure S10.

      I agree with the authors that prior modeling work was limited by assuming the inputs to M1, which meant that prior work couldn't address the deep issue (tackled here) of why there should be any preparatory inputs at all. At the same time, the ability to hand-select inputs did provide some advantages. A strong assumption of prior work is that the inputs are 'simple', such that motor cortex must perform meaningful computations to convert them to outputs. This matters because if inputs can be anything, then they can just be the final outputs themselves, and motor cortex would have no job to do. Thus, prior work tried to assume the simplest inputs possible to motor cortex that could still explain the data. Most likely this went too far in the 'simple' direction, yet aspects of the simplicity were important for endowing responses with realistic properties. One such property is a large condition-invariant response just before movement onset. This is a very robust aspect of the data, and is explained by the assumption of a simple trigger signal that conveys information about when to move but is otherwise invariant to condition. Note that this is an implicit form of regularization, and one very different from that used in the present study: the input is allowed to be large, but constrained to be simple. Preparatory inputs are similarly constrained to be simple in the sense that they carry only information about which condition should be executed, but otherwise have little temporal structure. Arguably this produces slightly too simple preparatory-period responses, but the present study appears to go too far in the opposite direction. I would suggest that the authors do what they can to address these issues via simulations and/or discussion. I think it is fine if the conclusion is that there exist many constraints that tend to favor preparation, and that regularizing magnitude is just one easy way of demonstrating that. Ideally, other constraints would be explored. But even if they can't be, there should be some discussion of what is missing - preparatory plateaus, a realistic condition-invariant signal tied to movement onset - under the present modeling assumptions.

      As described above, we have now included two additional figures. In the first one (S8, already discussed above), we used a temporal smoothness prior, and we indeed get slightly more realistic activity plateaus. In a second supplementary figure (S9), we have also considered using model predictive control (MPC) to optimize the inputs under an uncertain go cue arrival time. There, we found that removing the assumption that the delay period is known came with new challenges: in particular, it requires the specification of a “mental model” of when the Go cue will arrive. While it is reasonable to expect that monkeys will have a prior over the go cue arrival time that will be shaped by the design of the experiment, some assumptions must be made about the utility functions that should be used to weigh this prior. For instance, if we imagine that monkeys carry a model of the possible arrival time of the go cue that is updated online, they could nonetheless act differently based on this information, for instance by either preparing so as to be ready for the earliest go cue possible or alternatively to be ready for the average go cue. This will likely depend on the exact task design and reward/penalty structure. Here, we added simulations with those two cases (making simplifying assumptions to make the problem tractable/solvable using model predictive control), and found that the “earliest preparation” strategy gives rise to more realistic plateauing activity, while the model where planning is done for the “most likely go time” does not. We suspect that more realistic activity patterns could be obtained by e.g. combining this framework with the temporal smoothness cost. However, the main point we wished to make with this new supplementary figure is that it is possible to model the task in a slightly more realistic way (although here it comes at the cost of additional model assumptions). We have now added more discussion related to those points.

      Note that we have kept our analyses on these new models to a minimum, as the main takeaway we wish to convey from them is that most components of the model could be modified/made more realistic. This would impact the qualitative behavior of the system and match to data but, in the examples we have so far considered, does not appear to modify the general strategy of networks relying on preparation.

      On line 161, and in a few other places, the authors cite prior work as arguing for "autonomous internal dynamics in M1". I think it is worth being careful here because most of that work specifically stated that the dynamics are likely not internal to M1, and presumably involve inter-area loops and (at some latency) sensory feedback. The real claim of such work is that one can observe most of the key state variables in M1, such that there are periods of time where the dynamics are reasonably approximated as autonomous from a mathematical standpoint. This means that you can estimate the state from M1, and then there is some function that predicts the future state. This formal definition of autonomous shouldn't be conflated with an anatomical definition.

      Yes, that is a good point, thank you for making it so clearly! Indeed, as in previous work, we do not think of our “M1 dynamics” as being internal to M1; they may instead include sensory feedback and inter-area loops, which we summarize in the connectivity, chosen so that the resulting dynamics qualitatively resemble data. We have now incorporated more discussion regarding what exactly the dynamics in our model represent.

      Round 2 of reviews

      Reviewer 3:

      My remaining comments largely pertain to some subtle (but to me important) nuances at a few locations in the text. These should be easy for the authors to address, in whatever way they see fit.

      Specific comments:

      (1) The authors state the following on line 56: "For preparatory processes to avoid triggering premature movement, any pre-movement activity in the motor and dorsal pre-motor (PMd) cortices must carefully exclude those pyramidal tract neurons."

      This constraint is overly restrictive. PT neurons absolutely can change their activity during preparation in principle (and appear to do so in practice). The key constraint is looser: those changes should have no net effect on the muscles. E.g., if d is the vector of changes in PT neuron firing rates, and b is the vector of weights, then the constraint is that b'd = 0. d = 0 is one good way of doing this, but only one. Half the d's could go up and half could go down. Or they all go up, but half the b's are negative. Put differently, there is no reason the null space has to be upstream of the PT neurons. It could be partly, or entirely, downstream. In the end, this doesn't change the point the authors are making. It is still the case that d has to be structured to avoid causing muscle activity, which raises exactly the point the authors care about: why risk this unless preparation brings benefits? However, this point can be made with a more accurate motivation. This matters, because people often think that a null-space is a tricky thing to engineer, when really it is quite natural. With enough neurons, preparing in the null space is quite simple.

      That is a good point – we have now reformulated this sentence to instead say “to avoid triggering premature movement, any pre-movement activity in the motor and dorsal premotor (PMd) cortices must engage the pyramidal tract neurons in a way that ensures their activity patterns will not lead to any movement”.
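
      The reviewer's constraint b'd = 0 can be illustrated numerically. The sketch below assumes a single muscle driven by a linear readout of PT neuron firing rates; the weights `b`, the rate change `d_raw`, and the population size are hypothetical and randomly drawn, not taken from any dataset.

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons = 100

b = rng.standard_normal(n_neurons)      # hypothetical PT-to-muscle weights
d_raw = rng.standard_normal(n_neurons)  # arbitrary change in PT firing rates

# Project d_raw onto the null space of b by removing its component along b
d = d_raw - (b @ d_raw) / (b @ b) * b

net_muscle_effect = abs(b @ d)          # numerically zero after projection
fraction_retained = np.linalg.norm(d) / np.linalg.norm(d_raw)

print(net_muscle_effect, fraction_retained)
```

      With many neurons and a single output constraint, the projection removes only a small part of an arbitrary activity change, in line with the remark that preparing in the null space is quite simple when enough neurons are available.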

      (2) Line 167: 'near-autonomous internal dynamics in M1'.

      It would be good if such statements, early in the paper, could be modified to reflect the fact that the dynamics observed in M1 may depend on recurrence that is NOT purely internal to M1. A better phrase might be 'near-autonomous dynamics that can be observed in M1'. A similar point applies on line 13. This issue is handled very thoughtfully in the Discussion, starting on line 713. Obviously it is not sensible to also add multiple sentences making the same point early on. However, it is still worth phrasing things carefully, otherwise the reader may have the wrong impression up until the Discussion (i.e. they may think that both the authors, and prior studies, believe that all the relevant dynamics are internal to M1). If possible, it might also be worth adding one sentence, somewhere early, to keep readers from falling into this hole (and then being stuck there till the Discussion digs them out).

      That is a good point: we have now edited the text after line 170 to make it clear that the underlying dynamics may not be confined to M1, and have referenced the later discussion there.

      (3) The authors make the point, starting on line 815, that transient (but strong) preparatory activity empirically occurs without a delay. They note that their model will do this but only if 'no delay' means 'no external delay'. For their model to prepare, there still needs to be an internal delay between when the first inputs arrive and when movement generating inputs arrive.

      This is not only a reasonable assumption, but is something that does indeed occur empirically. This can be seen in Figure 8c of Lara et al. Similarly, Kaufman et al. 2016 noted that "the sudden change in the CIS [the movement triggering event] occurred well after (~150 ms) the visual go cue... (~60 ms latency)" Behavioral experiments have also argued that internal movement-triggering events tend to be quite sluggish relative to the earliest they could be, causing RTs to be longer than they should be (Haith et al. Independence of Movement Preparation and Movement Initiation). Given this empirical support, the authors might wish to add a sentence indicating that the data tend to justify their assumption that the internal delay (separating the earliest response to sensory events from the events that actually cause movement to begin) never shrinks to zero.

      While on this topic, the Haith and Krakauer paper mentioned above is good to cite because it does ponder the question of whether preparation is really necessary. By showing that they could get RTs to shrink considerably before behavior became inaccurate, they showed that people normally (when not pressured) use more preparation time than they really need. Given Lara et al., we know that preparation does always occur, but Haith and Krakauer were quite right that it can be very brief. This helped -- along with neural results -- change our view of preparation from something more cognitive that had to occur, to something more mechanical that was simply a good network strategy, which is indeed the authors' current point. Working a discussion of this into the current paper may or may not make sense, but if there is a place where it is easy to cite, it would be appropriate.

      This is a nice suggestion, and we thank the reviewer for pointing us to the Haith and Krakauer paper. We have now added this reference and extended the paragraph following line 815 to briefly discuss the possible decoupling between preparation and movement initiation that is shown in the Haith paper, emphasizing how this may affect the interpretation of the internal delay and comparisons with behavioral experiments.

    1. Abstract

      Background: Visualization is an indispensable facet of genomic data analysis. Despite the abundance of specialized visualization tools, there remains a distinct need for tailored solutions. However, their implementation typically requires extensive programming expertise from bioinformaticians and software developers, especially when building interactive applications. Toolkits based on visualization grammars offer a more accessible, declarative way to author new visualizations. Nevertheless, current grammar-based solutions fall short in adequately supporting the interactive analysis of large data sets with extensive sample collections, a pivotal task often encountered in cancer research.

      Results: We present GenomeSpy, a grammar-based toolkit for authoring tailored, interactive visualizations for genomic data analysis. Users can implement new visualization designs with little effort by using combinatorial building blocks that are put together with a declarative language. These fully customizable visualizations can be embedded in web pages or end-user-oriented applications. The toolkit also includes a fully customizable but user-friendly application for analyzing sample collections, which may comprise genomic and clinical data. Findings can be bookmarked and shared as links that incorporate provenance information. A distinctive element of GenomeSpy’s architecture is its effective use of the graphics processing unit (GPU) in all rendering. GPU usage enables a high frame rate and smoothly animated interactions, such as navigation within a genome. We demonstrate the utility of GenomeSpy by characterizing the genomic landscape of 753 ovarian cancer samples from patients in the DECIDER clinical trial. Our results expand the understanding of the genomic architecture in ovarian cancer, particularly the diversity of chromosomal instability. We also show how GenomeSpy enabled the discovery of clinically actionable genomic aberrations.

      Conclusions: GenomeSpy is a visualization toolkit applicable to a wide range of tasks pertinent to genome analysis. It offers high flexibility and exceptional performance in interactive analysis. The toolkit is open source with an MIT license, implemented in JavaScript, and available at https://genomespy.app/.

      A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giae040), where the paper and peer reviews are published openly under a CC-BY 4.0 license. These peer reviews were as follows:

      Reviewer 1: Andrea Sboner

      In this manuscript, the authors present GenomeSpy, a visualization toolkit geared toward the rapid and interactive exploration of genomic features. They demonstrate how this tool can help investigators explore a large cohort of 753 ovarian cancers sequenced by whole-genome sequencing (WGS). By using the tool, they were able to identify outliers in the dataset and refine their diagnosis. The tool is inspired by Vega-Lite, a high-level grammar for interactive graphics, and extends it for genomic applications.

      The manuscript is clearly written, and the authors provide links to the application itself, tutorials and examples. I want to commend them for doing this. This is a tool that would nicely complement others and has a specific advantage of using high-performance GPUs that are now common in modern computers.

      The only concern that I have is about a couple of claims that may not be fully supported by the data provided:

      1. Claim: users can implement new visualization designs easily. While the grammar certainly enables the users to define new designs, I do not think that this is necessarily easy, as the authors themselves recognize in the discussion section when they suggest providing templates to reduce the learning curve. Indeed, the example in Figure 2 is still quite verbose and would need some time for anyone to understand the syntax and the style. The playground web application facilitates testing it, though.

      2. Claim: the grammar-based approach allows to be mixed and matched. I did not find any specific example of how to do this. It would have been quite interesting to see the intersection between the DNA representation of structural variants and RNA-seq data (if this is what it means as "mix and match").

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, by using simulation, in vitro and in vivo electrophysiology, and behavioral tests, Peng et al. nicely showed a new approach for the treatment of neuropathic pain in mice. They found that terahertz (THz) waves increased Kv conductance and decreased the frequency of action potentials in pyramidal neurons in the ACC region. Behaviorally, terahertz (THz) waves alleviated neuropathic pain in the mouse model. Overall, this is an interesting study. The experimental design is clear, the data is presented well, and the paper is well-written. I have a few suggestions.

      (1) The authors provide strong theoretical and experimental evidence for the impact of terahertz wave frequency on voltage-gated potassium channels. However, the modulation of action potentials also relies on non-voltage-dependent ion channels. For example, I noticed that the RMP was affected by THz application (Figure 3F) as well. As the RMP is largely regulated by the leak potassium channels (tandem-pore potassium channels), I would suggest testing whether terahertz wave photons also have any impact on the Kleak channels.

      Thank you for your positive comment and for providing us with this valuable suggestion. After testing the leak K+ current with and without HFTS on the SNI model, we observed a notable increase in the leak K+ current with HFTS when the holding potential surpassed -40 mV (please see the revised Figs. 2m and n). This finding prompted us to delve deeper into the shifts in the resting membrane potential (RMP). The data, along with statistical analysis, are detailed in Tables S1-3.

      (2) The activation curves of the Kv currents in Figure 2h seem to be not well-fitted. I would suggest testing a higher voltage (>100 mV) to collect more data to achieve a better fitting.

      Thanks for your advice. We repeated the experiment while maintaining the voltage of patched neurons at a higher level (>100 mV) to collect ample data for better fitting. The outcomes are illustrated in the revised Figs. 2g-j. Clearly, the data reveal a significant increase in K+ conductance in the HFTS group as compared to the SNI group. We have integrated these findings into the revised manuscript, replacing the earlier results.
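
      To make the fitting issue concrete, here is a minimal sketch of fitting an activation curve. It assumes a single Boltzmann function, G/Gmax = 1 / (1 + exp((V_half - V) / k)), which is a common choice but is not confirmed by the text above as the authors' model. The data are synthetic, and the function names and parameter values are hypothetical. The fit uses the logit linearization of the Boltzmann function, so it needs a reliable Gmax; that is one reason why sampling voltages high enough to reach the plateau matters.

```python
import math

def boltzmann(v, v_half, k):
    """Normalized conductance G/Gmax at voltage v (mV)."""
    return 1.0 / (1.0 + math.exp((v_half - v) / k))

def fit_boltzmann(voltages, g_norm):
    """Estimate (v_half, k) by linear regression on logit(G/Gmax).

    logit(g) = (V - v_half) / k, so slope = 1/k and intercept = -v_half/k.
    Points with g <= 0 or g >= 1 are skipped because the logit is undefined.
    """
    pts = [(v, math.log(g / (1.0 - g)))
           for v, g in zip(voltages, g_norm) if 0.0 < g < 1.0]
    n = len(pts)
    mean_v = sum(v for v, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((v - mean_v) ** 2 for v, _ in pts)
    sxy = sum((v - mean_v) * (y - mean_y) for v, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_v
    return -intercept / slope, 1.0 / slope

# Synthetic, noise-free example with hypothetical parameters.
true_v_half, true_k = 10.0, 15.0
voltages = list(range(-60, 121, 20))  # -60 mV to +120 mV in 20 mV steps
g_norm = [boltzmann(v, true_v_half, true_k) for v in voltages]

v_half_fit, k_fit = fit_boltzmann(voltages, g_norm)
print(v_half_fit, k_fit)
```

      With real, noisy recordings a nonlinear least-squares fit would usually be preferred, because the logit transform amplifies noise near 0 and 1.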

      (3) In the part of behavior tests, the pain threshold increased after THz application and lasted within 60 mins. I suggest conducting prolonged tests to determine the end of the analgesic effect of terahertz waves.

      Thank you for your insightful comment. We echo your curiosity about the duration of the HFTS effect. In the process of revising our work, we conducted a comparative analysis of the analgesic duration resulting from 10-minute and 15-minute applications of HFTS. The findings are visualized in the revised Fig. 5c. Our observations indicate that after 160 minutes, the PWMT value for the 15-minute HFTS group decreased to a level comparable to that of the SNI group. Meanwhile, the analgesic effects persisted for 140 minutes in the case of the 10-minute HFTS application. These results imply a direct correlation between the duration of HFTS application and the duration of analgesia.

      (4) Regarding in vivo electrophysiological recordings, the post-HFTS recordings were acquired from a time window of up to 20 min. It seems that the HFTS effect lasted for minutes, but this was not tested in vitro where they looked at potassium currents. This long-lasting effect of HFTS is interesting. Can the authors discuss it and its possible mechanisms, or test it in slice electrophysiological experiments?

      Thank you for your comment. Based on the results from in vivo electrophysiological recordings, it was observed that the effect of HFTS can endure for a minimum of 20 minutes, and this duration was even more extended in behavioral assessments. Taking your advice, we employed slice electrophysiological recording for further testing. Following a 15-minute application of HFTS, we evaluated the K+ current at 5 and 20 minutes after incubation. Our observations clearly indicated a substantial and lasting increase in K+ current, with the effect persisting for at least 20 minutes (refer to Fig. 2l). This provides confirmation of the long-lasting influence of HFTS. The relevant data and statistical analysis are documented in Tables S1-2.

      (5) How did the authors arrange the fiber for HFTS delivery and the electrode for in vivo multi-channel recordings? Providing a schematic illustration in Figure 4 would be useful.

      Thank you for your comment. To enhance the reader's understanding of the HFTS delivery device during multi-channel recording, we have included a schematic illustration in Fig. 4a in the revised manuscript. The top portion of Fig. 4a depicts a quantum cascade laser (QCL) with a center frequency located at approximately 36 THz. This laser is then connected to the recording electrode via a PIR fiber. The left section illustrates the detailed structure of the recording electrode.

      (6) Some grammatical errors should be corrected.

      Thank you for your thorough review. We have carefully checked the entire text and corrected the grammatical errors we found, so that readers can better comprehend the content of the article.

      Reviewer #2 (Public Review):

      In this manuscript, Peng et al. reported that 36 THz high-frequency terahertz stimulation (HFTS) can suppress the activity of pyramidal neurons by enhancing the conductance of voltage-gated potassium channels. The authors also demonstrated the effectiveness of using 36 THz HFTS for treating neuropathic pain.

      Strengths:

      The manuscript is well written and the conclusions are supported by robust results. This study highlighted the potential of using 36 THz HFTS for neuromodulation.

      Weaknesses:

      More characterization of HFTS is needed, so the readers can have a better assessment of the potential usage of HFTS in their own applications.

      Thank you for your suggestion. We have created schematic diagrams illustrating the HFTS delivery (Fig. 4a and Fig. 5a in the revised manuscript). Fig. 4a presents the structure designed for in vivo multi-channel recording. Fig. 5a shows the structure used in the behavioral tests, in which the recording electrode is replaced by a hollow metal tube, allowing the PIR fiber to pass through the tube and target the ACC region of the mice.

      (1) It would be very helpful to estimate the volume of tissue that can be influenced by HFTS. It is not clear how 15 mins HFTS was chosen for this functional study. Does a longer time have a stronger effect? A better characterization of the relationship between the stimulus duration of HFTS and its beneficial effects would be very useful.

      Thank you for your feedback. The degree of tissue influence is directly related to the size of the spot emerging from the fiber outlet. In our experiment, we used a PIR fiber with a 630 nm inner core diameter to propagate high-frequency THz waves. This core features a refractive index of 2.15 and has an effective numerical aperture (NA) of 0.35 ± 0.05.
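
      As a rough geometric sketch of how the spot size follows from the fiber parameters quoted above, the divergence half-angle can be computed from the numerical aperture and used to estimate the beam diameter at a given distance from the fiber tip. The function names, the example distance, and the medium refractive index are hypothetical choices for illustration. The sketch ignores absorption and scattering in tissue, so it is not an estimate of the stimulated tissue volume.

```python
import math

def divergence_half_angle(na, n_medium=1.0):
    """Half-angle (radians) of the output cone: sin(theta) = NA / n_medium."""
    return math.asin(na / n_medium)

def spot_diameter(core_diameter, na, distance, n_medium=1.0):
    """Geometric beam diameter at `distance` from the fiber tip.

    `core_diameter` and `distance` must share the same length unit.
    Absorption and scattering are ignored (purely geometric estimate).
    """
    theta = divergence_half_angle(na, n_medium)
    return core_diameter + 2.0 * distance * math.tan(theta)

NA = 0.35        # effective numerical aperture quoted in the response
CORE = 630.0     # core diameter, in the unit quoted in the response
DISTANCE = 500.0 # hypothetical distance from the tip, same unit as CORE

theta_air = divergence_half_angle(NA)
print(math.degrees(theta_air))             # half-angle in degrees, in air
print(spot_diameter(CORE, NA, DISTANCE))   # geometric spot diameter
```

      Passing a medium index above 1 through `n_medium` narrows the cone; the appropriate value for brain tissue at this wavelength is not given in the text.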

      Our decision to apply HFTS for 15 minutes in the behavioral study was primarily based on observations from in vivo multi-channel recordings. Specifically, we noticed a considerable reduction in the average firing rate of PYR cells after 15 minutes of HFTS exposure. To further investigate the correlation between the duration of HFTS stimulation and its effects, we conducted a comparative study using a 10-minute HFTS session. The results, depicted in revised Fig. 5c, reveal that the PWMT value decreased to the level seen in the SNI group after approximately 160 minutes following 15 minutes of HFTS, and after about 140 minutes with 10 minutes of HFTS. This suggests a direct relationship between the length of HFTS application and its beneficial outcomes.

      (2) How long does the behavioral effect last after 15 minutes of HFTS? Figure 5b only presents the behavioral effect for one hour, but the pain level is still effectively reduced at this time point. The behavioral measurement should last until pain sensitization drops back to pre-stim level.

      Thank you for your feedback. A similar question was also raised by Reviewer 1. As depicted in Fig. 5c, the analgesic effects lasted for 140-160 minutes after a 10-15 minute application of HFTS. Based on these findings, we conclude that in the SNI model, targeting the ACC brain region with HFTS for 10-15 minutes results in an analgesic effect that lasts for roughly 140-160 minutes. This provides valuable insights into the potential clinical applications and duration of relief that can be achieved through HFTS treatment.

      (3) Although the manuscript only tested in ACC, it will also be useful to demonstrate the neural modulation effect on other brain regions. Would 36THz HFTS also robustly modulate activities in other brain regions? Or are different frequencies needed for different brain regions?

      Thank you for your comment. We hypothesize that light waves at a frequency of approximately 36 THz effectively modulate neuronal activities in various brain regions, primarily due to their impact on K channels. Additionally, we speculate that the application of THz waves at different frequencies may influence other channels, such as Na and Ca channels, potentially facilitating or inhibiting neuronal activities. We believe this is a fascinating and significant area of research to explore in the future.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript by Peng et al. presents intriguing data indicating that high-frequency terahertz stimulation (HFTS) of the anterior cingulate cortex (ACC) can alleviate neuropathic pain behaviors in mice. Specifically, the investigators report that terahertz (THz) frequency stimulation widens the selectivity filter of potassium channels thereby increasing potassium conductance and leading to a reduction in the excitability of cortical neurons. In voltage clamp recordings from layer 5 ACC pyramidal neurons in acute brain slice, Peng et al. show that HFTS enhances K current while showing minimal effects on Na current. Current clamp recording analyses show that the spared nerve injury model of neuropathic pain decreases the current threshold for action potential (AP) generation and increases evoked AP frequency in layer 5 ACC pyramidal neurons, which is consistent with previous studies. Data are presented showing that ex-vivo treatment with HFTS in slice reduces these SNI-induced changes to excitability in layer 5 ACC pyramidal neurons. The authors also confirm that HFTS reduces the excitability of layer 5 ACC pyramidal neurons via in vivo multi-channel recordings from SNI mice. Lastly, the authors show that HFTS is effective at reducing mechanical allodynia in SNI using both the von Frey and Catwalk analyses. Overall, there is considerable enthusiasm for the findings presented in this manuscript given the need for non-pharmacological treatments for pain in the clinical setting.

      Strengths:

      The authors use a multifaceted approach that includes modeling, ex-vivo and in-vivo electrophysiological recordings, and behavioral analyses. Interpretation of the findings is consistent with the data presented. This preclinical work in mice provides new insight into the potential use of directed high-frequency stimulation to the cortex as a primary or adjunctive treatment for chronic pain.

      Weaknesses:

      There are a few concerns noted that, if addressed, would significantly increase enthusiasm for the study.

      (1) The left Na current trace for SNI + HFTS in Figure 2B looks to have a significant series resistance error. Time constants (tau) for the rate of activation and inactivation for Na currents would be informative.

      Thank you for your feedback. We have carefully considered your comments and made several adjustments in the revised Figs. 2b-f to improve clarity and accuracy. Firstly, we have conducted a comparison of the time constants (tau) between the SNI group and the SNI+HFTS group. These time constants represent the latency of Na current activation or inactivation relative to the half-activated/inactivated voltage. Our analysis reveals that there is no statistically significant difference in tau between the two groups for both activation and inactivation curves. Secondly, we have updated the sample traces in Fig. 2b of the revised manuscript. These new traces illustrate that tau does not significantly differ between the SNI and SNI+HFTS groups, providing a visual representation of our findings. We believe that these modifications strengthen the presentation of our study's details and results, making the data more accessible and understandable for readers.

      (2) It is unclear why an unpaired t-test was performed for paired data in Figure 2. Also, statistical methods and values for non-significant data should be presented.

      Thank you for your comment. We believe you are referring to the results in Fig. 3. We agree that one-way ANOVA should be used to analyze the data, since more than two groups are compared. We therefore re-analyzed the data using one-way ANOVA in Figs. 3g-k, and have included detailed statistical methods and P values in the revised manuscript.

      (3) It would seem logical to perform HFTS on ACC-Pyr neurons in acute slices from sham mice (i.e. Figure 3 scenario). These experiments would be informative given the data presented in Figure 4.

      Thank you for your valuable advice. During the revision process, we performed HFTS on ACC-PYR neurons in acute slices obtained from sham mice. The findings from this experiment have been integrated into the updated Fig. 3, where the sham group is represented by the green line and histogram (the revised Fig. 3 in the manuscript). It is noteworthy that a significant decrease in spike frequency was observed in the sham mice following HFTS.

      (4) As the data are presented in Figure 4g, it does not seem as if SNI significantly increased the mean firing rate for ACC-Pyr neurons, which is observed in the slice. The data were analyzed using a paired t-test within each group (sham and SNI), but there is no indication that statistical comparisons across groups were performed. If the argument is that HFTS can restore normal activity of ACC-Pyr neurons following SNI, this is a bit concerning if no significant increase in ACC-Pyr activity is observed in in-vivo recordings from SNI mice.

      Thank you for highlighting the inaccuracies in the analysis. After reviewing the data, we re-analyzed it using alternative statistical methods. In the revised version, since the data did not follow a normal distribution, we employed Wilcoxon matched-pairs signed-rank tests within the sham and SNI groups, and Mann-Whitney tests between the sham and SNI groups.

      Upon comparing the statistical outcomes across the groups, we found that the mean firing rate of 130 ACC neurons in SNI mice was significantly higher compared to that of 108 ACC neurons in sham mice (P = 0.0447, Mann-Whitney test). Notably, the mean firing rate of ACC-PYR exhibited a more pronounced increase with a P value of 0.0274 in SNI pre-HFTS versus sham pre-HFTS, while the mean firing rate of ACC-INT did not display a significant change across the groups. These findings align with the observations we made in the slice, reinforcing the validity of our results.

      (5) The authors indicate that the effects of HFTS are due to changes in Kv1.2. However, they do not directly test this. A blocking peptide or dendrotoxin could be used in voltage clamp recordings to eliminate Kv1.2 current and then test if this eliminates the effects of HFTS. If K current is completely blocked in VC recordings then the authors can claim that currents they are recording are Kv1.1 or 1.2.

      Thank you for your kind suggestion. In our research, we employed the Kv1.2 structure as a model to determine the response frequency of terahertz waves. Through both in vitro and in vivo experiments, we were able to demonstrate that the frequency of approximately 36 THz affects the Kv channel and its corresponding spike frequency. Upon analyzing the action potential waveform, we observed a notable variance in the resting membrane potential (RMP). This RMP is predominantly controlled by leak potassium channels, specifically the Tandem-pore potassium channels. In accordance with the recommendation of reviewer 1, we have addressed this particular aspect of our experimentation in the revised manuscript.

      We agree that blocking peptides or dendrotoxin should be used to eliminate the Kv1.2 current. However, we encountered problems with the purchase and delivery of these drugs. We have therefore added an explanation to the Discussion that emphasizes the value of this pharmacological experiment, which we intend to carry out in future work.

      (6) The ACC is implicated in modulating the aversive aspect of pain. It would be interesting to know whether HFTS could induce conditioned place preference in SNI mice via negative reinforcement (i.e. alleviation of spontaneous pain due to the injury). This would strengthen the clinical relevance of using HFTS in treating pain.

      Thank you for this valuable advice. We share your interest in this experiment, and we fully recognize the importance and potential of further exploring this area. At present, however, our equipment and platform limitations prevent us from conducting the necessary tests. We remain committed to pursuing relevant research opportunities in the future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      (1) The study suggests that the effects of their tumor models on mouse behavior are largely non-specific to the tumor, as most behaviors are rescued by analgesic treatment. So, most of the changes were likely due to site-specific pain and not a unique signal from the tumor.

      The tumor generates pain at the site it is implanted, and it is likely amplified by the oral activities tumor-bearing mice have to engage in. As there is no pain in the absence of the tumor, the pain is, by definition, caused by the tumor, not by the site. Concerning the relationship between pain and behavior, the behavioral assays undertaken in our study (nesting, cookie test, wheel running) were very limited in scope. Two of these assays (nesting, cookie test) require use of the oral cavity. Only nesting and wheel running were assessed in the context of treatment for pain. Nesting behavior was completely restored with carprofen and buprenorphine treatment suggesting that in the absence of pain, mice were able to make perfect nests. Consistent with this, carprofen- and buprenorphine-treated animals also gained weight, indicating that eating (another activity dependent on the oral cavity) was also restored. Wheel running, an activity that does not rely on the oral cavity, was only partially restored with drug treatment. While additional behavioral tests are necessary to confirm this finding, the data suggest that there is pain-independent information relayed to the brain which accounts for this decline in wheel running.

      Reviewer #2:

      (1) The main claim is that tumor-infiltrating nerves underlie cancer-induced behavioral alterations, but the experimental interventions are not specific enough to support this. For example, all TRPV1 neurons, including those innervating the skin and internal organs, are ablated to examine sensory innervation of the tumor. Within the context of cancer, behavioral changes may be due to systemic inflammation, which may alter TRPV1 afferents outside the local proximity of tumor cells. A direct test of the claims of this paper would be to selectively inhibit/ablate nerve fibers innervating the tumor or mouth region.

      We agree with the reviewer that a direct test of the hypothesis would require selectively inhibiting the nerve fibers innervating the tumor and assessing the impact on behavior. Studies in the lab are on-going using pharmacological interventions to do this. These studies are beyond the scope of this current manuscript.

      (2) Behavioral results from TRPV1 neuron ablation studies are in part confounded by differing tumor sizes in ablated versus control mice. Are the differences in behavior potentially explained by the ablated animals having significantly smaller tumors? The differences in tumor sizes are not negligible. One way to examine this possibility might be to correlate behavioral outcomes with tumor size.

      As suggested by the reviewer, we have graphed nesting scores and time-to-interact (cookie test) relative to tumor volume. In both cases, we used simple linear regression to fit the data and analyzed the slopes of the lines. In the case of nesting, there was no significant difference between the slopes. This is now included as Supplemental Figure 4A. In the case of the cookie test, there was a significant difference between the slopes. This is now included as Supplemental Figure 4B. Graphing the data in this way allows one to look at any given tumor volume and infer what the nesting score and the time-to-interact are for the two groups of mice. The linear regression model fits the time to interact with the cookie reasonably well; thus, from this graph, we can see that at any given tumor volume the time to interact with the cookie was generally shorter in TRPV1cre::DTAfl/wt animals as compared to C57BL/6 mice. Unfortunately, the linear regression does not fit the nesting data very well, and thus it is more difficult to compare tumor volume and nesting score.

      The following text has been added to the results section.

      Given the impact of nociceptor neuron ablation on tumor growth, we wondered whether differences in tumor volume contributed to the behavioral differences we noted. Thus, the behavior data were graphed as a function of tumor volume (Supplemental Fig 4A, B). A simple linear regression model was used to fit the data. In the case of nesting scores, the linear regression did not fit the data points very well making it difficult to assess nesting scores at a given tumor volume (Supplemental Fig 4A). However, the linear regression model fit the time to interact data better. Here, the graph suggests that tumor volume did not influence behavior as at any given tumor volume the time to interact with the cookie is generally smaller in TRPV1-Cre::Floxed-DTA animals as compared to C57BL/6 animals (Supplemental Fig 4B).
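
      The per-group regression described above can be sketched as follows. The data are entirely hypothetical and chosen only so the arithmetic is easy to follow; the group labels mirror the two mouse lines but carry no real measurements. The sketch fits an ordinary least-squares line to each group, reports R² as a measure of fit quality, and compares the predicted time-to-interact at a matched tumor volume. It does not include a statistical test of the difference between slopes.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def predict(fit, x):
    """Predicted y at x for a (slope, intercept, r_squared) fit."""
    slope, intercept, _ = fit
    return intercept + slope * x

# Hypothetical tumor volumes (mm^3) and time-to-interact values (s).
volume = [10.0, 20.0, 30.0, 40.0]
time_control = [12.0, 22.0, 32.0, 42.0]  # stands in for C57BL/6
time_ablated = [6.0, 11.0, 16.0, 21.0]   # stands in for the ablated line

fit_control = fit_line(volume, time_control)
fit_ablated = fit_line(volume, time_ablated)

# Compare the two groups at the same tumor volume.
matched_volume = 25.0
print(predict(fit_control, matched_volume), predict(fit_ablated, matched_volume))
```

      A low R², as the authors report for the nesting data, signals that predictions read off the fitted line at a given tumor volume should not be trusted.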

      Reviewer #3:

      (1) The authors mention in their Discussion the need for additional experiments. Could they also include / comment on the potential impact on the anti-tumor immune system in their model?

      The following text has been added to the discussion:

      Neuro-immune interactions have been studied in the context of a variety of conditions including, but not limited to, infection 109, inflammation 110,111, homeostasis in the gut 112-114, as well as neurological diseases 115,116. Neuro-immune communications in the context of cancer and behavior have also been studied (e.g., sickness behavior, depression) 117-119; however, these studies did not assess these interactions at the tumor bed. Investigations into neuro-immune interactions occurring within primary malignancies which harbor nerves have shed light on these critical communications. In the context of melanoma, which is innervated by sensory nerves, we identified that release of the neuropeptide calcitonin gene related peptide (CGRP) induces immune suppression. This effect is mediated by CGRP binding to its receptor, RAMP1, which is expressed on CD8+ T cells 49. A study utilizing a different syngeneic model of oral cancer similarly found an immune suppressive role for CGRP 120-122. These studies demonstrate that neuro-immune interactions occur at the tumor bed. Our current findings indicating that tumor-infiltrating nerves connect to a circuit that includes regions within the brain suggest that neuro-immune interactions within the peripheral malignancy may contribute to the behavioral alterations we studied.

      (2) The authors mention the importance of inflammation contributing to pain in cancer but do not clearly highlight how this may play a role in their model. Can this be clarified?

      The following text has been added to the discussion section of the manuscript.

      Moreover, given that carprofen and buprenorphine decrease inflammation 104, their ability to restore normal nesting and cookie test behaviors (which require the use of the oral cavity where the tumor is located) suggests that inflammation at the tumor site contributed to the decline in these behaviors in vehicle-treated animals. Since both drugs were given systemically and each only partially restored wheel running, it suggests that systemic inflammation alone cannot fully account for the decline in wheel running seen in vehicle-treated animals. We posit that the inflammation- and pain-independent component of this behavioral decline is mediated via the transcriptional and functional alterations in the cancer-brain circuit.

      (3) The tumor model apparently requires isoflurane injection prior to tumor growth measurements. This is different from most other transplantable types of tumors used in the literature. Was this treatment also given to control (i.e., non-tumor) mice at the same time points? If not, can the authors comment on the impact of isoflurane (if any) in their model?

      Mice in all groups (tumor and non-tumor) were treated with isoflurane. This important detail has been added to the methods section.

      (4) The authors emphasize in several places that this is a male mouse model. They mention this as a limitation in the Discussion. Was there an original reason why they only tested male mice?

      The following text has been added in the discussion section:

      Head and neck cancer is predominantly a cancer in males; it occurs in males three times more often than in females 123, and this disparity increases in certain parts of the world. While smoking cigarettes and drinking alcohol are risk factors for HPV negative head and neck squamous cell carcinoma, even males that do not smoke and drink have a higher susceptibility for this cancer than females 124,125. Thus, our studies used only male mice. However, we do recognize that females also get this cancer. In fact, female patients with head and neck cancer, particularly oral cancer, report more pain than their male counterparts 126,127. These findings suggest that differences in tumor innervation exist in males and females.

      Therefore, another project in the lab has been to compare disease characteristics (including innervation and behavior) in male and female mice. The findings from this second study are the topic of a separate manuscript.

      Recommendations For The Authors:

      Reviewing editor:

      (1) Tumors can communicate with the brain via blood-borne agents from the tumor itself or immune cells that are activated by the tumor in addition to neurons that invade the tumor. The xia and malaise that accompanies some tumors can be mediated by direct innervation and/or the humoral factors because both can activate the same parabrachial pathway. This paper makes the case for the direct innervation being important but ignores the possibility of both being involved. The interesting observation that innervation supports tumor growth (perhaps via substance P) is troublesome because the slower appearance of behavioral consequences (Figures 4 & 5) could be attributed to the smaller tumor size. A nice control for humoral effects would be to implant the tumor cells someplace in the body where innervation does not occur (if possible) and then examine behavioral outcomes.

      In the course of several projects, we have implanted different tumor cell lines in different locations in mice (oral cavity, hind limb, flank, peritoneal cavity). In each location, tumor innervation occurs. This is not a phenomenon found only in mice as we completed an immunohistological survey of human cancers from different sites and found they are all innervated (PMID: 34944001). These data are consistent with tumor and locally-released factors that recruit nerves to the tumor bed (PMID: 30327461; PMID: 32051587; PMID: 27989802). Thus, an implantation site that does not result in tumor innervation is currently unknown and likely does not exist.

      (2) The authors should address whether there is an inflammatory component in this tumor model.

      MOC2-7 tumors have been characterized as non-inflamed and poorly immunogenic 129-131.

      This information has been added to the methods section.

      (3) The RTX experiment in Figure 5 would be more compelling if the drug was injected directly into the tumor rather than injecting it in the flank, thus ablating all TRPV1-expressing neurons as in the genetic approach.

      While we agree with the reviewer that ablating the TRPV1-expressing neurons at the tumor site directly would be ideal, RTX treatment takes approximately one week for ablation to occur, and a significant amount of inflammation is associated with this. Therefore, we wait a total of 4 weeks for the inflammation to resolve. By this time, tumors have generally reached sacrifice criteria. Thus, this approach would not enable the question to be answered. Moreover, we are not aware of any studies in which RTX has been injected in the oral cavity or face. While RTX is utilized clinically to treat pain, it is typically administered intrathecally, epidurally or intra-ganglionically (PMID: 37894723).

      (4) The authors address affective aspects of pain but do not adequately address the sensory aspects, e.g., sensitivity to touch, heat and/or cold. They attribute the decrease in food disappearance (consumption) and nest building to oral pain, but it could be due to anhedonia and anorexia that can accompany tumor progression.

      Assaying for touch and heat/cold sensitivity in the oral cavity is a critical aspect of studying head and neck cancer that needs to be addressed. However, in rodents these assays are not trivial given that any touch/heat/cold in the area of the tumor (oral cavity) impacts the sensitive whiskers in that region which directly influence these assays. Thus, we have been refining assays (e.g., OPAD, facial von Frey) to address these important questions. The findings from these studies are beyond the scope of this manuscript.

      The reviewer makes a good point about anhedonia and anorexia. The following text has been added to the results section:

      Pain-induced anhedonia is mediated by changes in the reward pathway. Specifically, in the context of pain, dopaminergic neurons in the ventral tegmental area (VTA) become less responsive to pain and release less serotonin. This decreased serotonin results in disinhibition of GABA release; the resulting increased GABA promotes an increased inhibitory drive leading to anhedonia 82 and, when extreme, anorexia. Carprofen and buprenorphine treatments completely reversed nesting behavior and significantly improved eating. Inflammation 83 and opioids 84 directly influence reward processing and though our tracing studies did not indicate that the tumor-brain circuit includes the VTA, this brain region may be indirectly impacted by tumor-induced pain in the oral cavity. Thus, an alternative interpretation of the data is that the effects of carprofen and buprenorphine treatments on nesting and food consumption may be due to inhibition of anhedonia (and anorexia) rather than, or in addition to, relieving oral pain.

      (5) Comment on why only males were used in this study.

      Please see response to public reviews.

      Reviewer #1:

      (1) Please provide a justification for the use of exclusively male mice and expand in the discussion if there is potential for these findings to be directly applicable to female mice as well.

      Please see response to public reviews.

      The following text has been added to the discussion:

      Head and neck cancer is predominantly a cancer in males; it occurs in males three times more often than in females 123, and this disparity increases in certain parts of the world. While smoking cigarettes and drinking alcohol are risk factors for HPV negative head and neck squamous cell carcinoma, even males that do not smoke and drink have a higher susceptibility for this cancer than females 124,125. Thus, our studies used only male mice. However, we do recognize that females also get this cancer. In fact, female patients with head and neck cancer, particularly oral cancer, report more pain than their male counterparts 126,127. These findings suggest that differences in tumor innervation exist in males and females.

      (2) When discussing the results shown in Figure 2, please include some mention of Fus, since it was the highest expressed transcript.

      The following text has been added to the results section regarding Fus.

      The gene demonstrating the highest increase in expression, Fus, was of particular interest; it increases in expression within DRG neurons following nerve injury and contributes to injury-induced pain 51,52. Of note, we purposefully used whole trigeminal ganglia rather than FACS-sorted tracer-positive dissociated neurons to avoid artificially imposing injury and altering the transcript levels of these cells 53,54. Thus, significantly elevated expression of Fus by ipsilateral TGM neurons from tumor-bearing animals suggests the presence of neuronal injury induced by the malignancy. This is consistent with our previous findings 55 and those of others 56 showing that tumor-infiltrating nerves harbor higher expression of nerve-injury transcripts and neuronal sensitization.

      (3) In line 197 please clarify the mice used. Were all mice tumor-bearing and some had nociceptors ablated, or was there a control (no tumor) group as well?

      Line 197 refers to Figure 4D. In this figure, panels B-D show quantification of cFos and ΔFosB in the spinal nucleus of the TGM (SpVc), the parabrachial nucleus (PBN) and the central nucleus of the amygdala (CeA). These data are from C57BL/6 and TRPV1cre::DTAfl/wt animals, all of which were tumor-bearing. Supplementary Figure 3C also shows quantification of cFos and ΔFosB, but these data are from control, non-tumor-bearing animals. The fact that controls are non-tumor-bearing has been added to the supplemental figure legend and the text of the results section has been clarified as follows.

      While Fos expression was similar between non-tumor-bearing mice of the two genotypes (Supplemental Fig. 3C-E), the absence of nociceptor neurons in tumor-bearing animals decreases cFos and ΔFosB in the PBN, and ΔFosB in the SpVc (Fig. 4B, C).

      (4) Overall it would improve the readability of the figures if the colors for the IHC channels were on the image itself and not exclusively in the figure legend.

      The colors for all the staining have been added to each panel.

      (5) It is not a problem that complete cartography was not done, but please include a justification for why the brain regions that were focused on were chosen.

      In order to ensure that our neural tracing technique captured only nerves present within the tumor bed, we restricted the injection of tracer to only 2 µl. We demonstrated that this small volume did not leak out of the tumor (Figure 1) and thus any tracer-labeled neurons we identified were deemed as being connected in a circuit to nerves in the tumor bed. While we acknowledge that this calculated technical approach restricted our ability to tracer-label all neurons in the tumor bed (as well as those they share circuitry with), it ensured no tracer leakage and inadvertent labeling of non-tumoral nerves. In non-tumor animals injected with 10 µl of tracer, labeled regions in the brain included the spinal nucleus of the trigeminal, the parabrachial nucleus, the central amygdala, the facial nucleus and the motor nucleus of the trigeminal. The regions that were tracer-positive when tumor was injected were limited to the spinal nucleus of the trigeminal, the parabrachial nucleus and the central amygdala. Thus, the regions in the brain that we focused on were the areas that became tracer-positive following injection of tracer into the tumor.

      (6) Were the cells that were injected cultured in media with 10% fetal calf serum? If so was any inflammatory response seen? If not please state in the methods section the media that cells for injection were cultured in.

      The cells injected into animals were cultured in media containing 10% fetal calf serum. When cells are harvested for tumor injections, they are first washed two times with PBS and then trypsinized to detach the cells from the plate. Cells are collected, washed again with PBS and resuspended with DMEM without serum; this is what is injected into animals. We harvest cells in this way in order to eliminate any serum being injected into mice. This information has been added to the Methods section.

      (7) Would any of the differences in drug treatment (Carprofen vs Buprenorphine) be due to the differing routes of administration and metabolism of the drugs?

      Since carprofen and buprenorphine each resulted in similar behavioral impacts (nesting and wheel running), their different routes of administration seem to play a minor or no role in the behaviors assessed.

      (8) Please include in the methods section the specific approach and software that was used for processing calcium imaging data and calculating a relative change in fluorescence.

      The specific approach used for processing calcium imaging data and calculating relative change in fluorescence as well as the software used are all included in the methods section. Please see below:

      Ca2+ imaging. TGM neurons from non-tumor and tumor-bearing animals (n=4-6 mice/condition) were imaged on the same day. Neurons were incubated with the calcium indicator, Fluo-4AM, at 37°C for 20 min. After dye loading, the cells were washed, and Live Cell Imaging Solution (Thermo-Fisher) with 20 mM glucose was added. Calcium imaging was conducted at room temperature. Changes in intracellular Ca2+ were measured using a Nikon scanning confocal microscope with a 10x objective. Fluo-4AM was excited at 488 nm using an argon laser with intensity attenuated to 1%. The fluorescence images were acquired in the confocal frame (1024 × 1024 pixels) scan mode. After 1 min of baseline measure, capsaicin (300 nM final concentration) was added. Ca2+ images were recorded before, during and after capsaicin application. Image acquisition and analysis were achieved using NIS-Elements imaging software. Fluo-4AM responses were standardized and shown as percent change from the initial frame. Data are presented as the relative change in fluorescence (ΔF/F0), where F0 is the basal fluorescence and ΔF = F - F0, with F being the measured intensity recorded during the experiment. Calcium responses were analyzed only for neurons responding to ionomycin (10 µM, positive control) to ensure neuronal health. Treatment with the cell permeable Ca2+ chelator, BAPTA (200 µM), served as a negative control.
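
      As an illustration only, the relative change in fluorescence described above can be sketched in a few lines of Python. This is not the authors' pipeline (they report using NIS-Elements); the function names, the choice of averaging the first few frames to estimate F0, the trapezoidal area under the curve, and the example intensities are all assumptions made for this sketch.

```python
def delta_f_over_f0(trace, n_baseline):
    """Return (F - F0) / F0 for every frame of a fluorescence trace.

    Assumption: F0 (basal fluorescence) is estimated as the mean of the
    first n_baseline frames, i.e. frames recorded before the stimulus.
    """
    if n_baseline < 1 or n_baseline > len(trace):
        raise ValueError("n_baseline must be between 1 and len(trace)")
    f0 = sum(trace[:n_baseline]) / n_baseline
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero")
    return [(f - f0) / f0 for f in trace]


def area_under_curve(values, dt=1.0):
    """Trapezoidal area under a uniformly sampled curve (dt = frame interval)."""
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


# Hypothetical intensities (arbitrary units): three baseline frames,
# followed by a transient response to a stimulus.
trace = [100.0, 100.0, 100.0, 150.0, 200.0, 150.0, 100.0]
dff = delta_f_over_f0(trace, n_baseline=3)
auc = area_under_curve(dff, dt=1.0)
```

      Normalizing to each neuron's own baseline makes responses comparable across cells with different dye loading; the area under the resulting curve then summarizes both the amplitude and the duration of a response in a single number.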

      (9) Suggestions for Figure 1:

      - In Figures 1C, D, E, include labels for the days of tumor harvest.

      - Please make the size of the labels the same for 1K and 1L and align them.

      - Microscopy image in Figure 1L for SpVc looks like it may be at a different magnification.

      - If possible, include (either in the figure or the supplement) IHC images staining for Dcx and tau, which would complement the western blot data.

      The requested changes to the figures have been made. Unfortunately, we do not have Dcx and tau IHC staining of the day 4, 10 and 20 tumors.

      (10) Suggestions for Figure 2:

      - Include directly onto the graph in Figure 2a the legend for tumor-bearing (red) and non-tumor bearing (blue).

      - Keep consistent between Figure 2G and 2H/I if the tumor/nontumor will be labeled as T/N or Tumor/Control.

      The requested changes to the figures have been made.

      (11) Suggestions for Figure 3:

      - An example trace of calcium signal would complement Figure 3G, H well.

      Example tracings of calcium signal are already provided in Supplementary Figure 3A and B.

      Reviewer #2:

      (1) While the use of male mice is acknowledged, there is not a rationale for why female mice were not included in the study.

      Please see the response to Reviewer #1 (first question).

      (2) Criteria for euthanasia should be described in the Methods. This is especially needed for interpreting the survival curve in Figure 4H.

      Criteria for euthanasia in our IACUC approved protocol include:

      - maximum tumor volume of 1000 mm³

      - edema

      - extended period of weight loss progressing to emaciation

      - impaired mobility or lesions interfering with eating, drinking or ambulation

      - rapid weight loss (>20% in 1 week)

      - weight loss at or more than 20% of baseline

      In addition to tumor size and weight loss, we use the body condition score to evaluate the state of animals and to determine euthanasia. These details have been added to the Methods section.

      (3) At what stage in cancer progression were the Fos studies conducted for Figure 4A-D?

      The brains used for Fos staining (Fig 4B-D) were harvested at week 5 post-tumor implantation.

      (4) For Fos counts, what are the bregma coordinates for the sections that were quantified?

      SpVc: -7.56 to -8.24 mm

      PBN: -4.96 to -5.52 mm

      CeA: -0.82 to -1.94 mm

      (5) Statistics are needed for the claim in Lines 171-173.

      The statistical analysis of Fos staining from tumor-bearing and non-tumor-bearing brains is included in Figure 3D-F. The statistical analysis of ex vivo Ca2+ imaging of brains from tumor-bearing and non-tumor-bearing animals is included in Figure 3I and J.

      (6) How long was the baseline period for weight and food intake measurements? How long were the animals single-housed before taking the baseline measurements?  

      The baseline period for weight and food intake measurements was 2 weeks, and animals were singly housed for 2 weeks before baseline measurements began (a total of 4 weeks).

      Minor:

      (7) The authors might consider rewording the sentence on lines 59-62, given that it is abundantly clear from rodent studies that both the tumor and chemotherapy are associated with adverse behavioral outcomes.

      We have reworded the sentence as follows: The association of cancer with impaired mental health is directly mediated by the disease, its treatment or both; these findings suggest that the development of a tumor alters brain functions.

      (8) Line 212 needs a space between the two sentences.

      This has been fixed.

      (9) Font size in Figure 2 is not consistent with the other figures.

      This has been fixed.

      (10) "DAPI" is more conventional than "DaPi".

      This has been fixed.

      Editorial Comments and Suggestions:

      (1) The Abstract would be better if it were more concise, e.g. ~175 words.

      The abstract has been shortened as requested and now reads:

      Cancer patients often experience changes in mental health, prompting an exploration into whether nerves infiltrating tumors contribute to these alterations by impacting brain functions. Using a mouse model for head and neck cancer and neuronal tracing we show that tumor-infiltrating nerves connect to distinct brain areas. The activation of this neuronal circuitry altered behaviors (decreased nest-building, increased latency to eat a cookie, and reduced wheel running). Tumor-infiltrating nociceptor neurons exhibited heightened calcium activity and brain regions receiving these neural projections showed elevated cFos and delta FosB as well as increased calcium responses compared to non-tumor-bearing counterparts. The genetic elimination of nociceptor neurons decreased brain Fos expression and mitigated the behavioral alterations induced by the presence of the tumor. While analgesic treatment restored nesting and cookie test behaviors, it did not fully restore voluntary wheel running indicating that pain is not the exclusive driver of such behavioral shifts. Unraveling the interaction between the tumor, infiltrating nerves, and the brain is pivotal to developing targeted interventions to alleviate the mental health burdens associated with cancer.

      (2) Lines 28, 104, 258, 486, 521, and many other places, "utilized" should be "used" because the former refers to an application for which it is not intended, e.g. a hammer was utilized as a doorstop.

      The requested changes have been made.

      (3) Lines 32 and 73, it is not clear whether the basal activity is heightened or whether excitability is increased. "manifest" might be better than "harbor" on line 73.

      We have changed the wording in the abstract to be clearer. Moreover, our finding that TGM neurons from tumor-bearing animals have increased expression of the σ1-receptor and phosphorylated TRPV1 (Fig 2G-I) indicates that these neurons have increased excitability.

      (4) Line 34 and elsewhere, it would be better to refer to Fos because there is no need to distinguish cellular, cFos, from viral, vFos, in this context.

      The requested changes have been made.

      (5) Line 38, It would be better to refer to what was actually measured rather than "oral movements".

      The requested changes have been made. The sentence now reads: “While analgesic treatment restored nesting and cookie test behaviors, it did not fully restore voluntary wheel running.”

      (6) Line 84, CXCR3-null mouse on a C57BL/6 background.

      The requested change has been made.

      (7) Lines 86,129 wild-type, male mice.

      The requested change has been made.

      (8) Lines 114-115, the brackets are not necessary.

      The requested change has been made.

      (9) Lines 118, 384, 409, 527, 589, 971, 974 always leave a space between numbers and units. Use Greek u for micro.

      The requested change has been made.

      (10) Lines 123-124, it is not clear that there is meaningful labeling within the CeA.

      We have replaced this image with a more representative one of the CeA from a tumor-bearing animal with clear tracer labeling.

      (11) Lines 125, 138, and 246 transcription was not measured, only transcript levels were measured.

      The requested changes have been made.

      (12) Line 133, I think >4 fold is meant.

      Thank you for catching that. I have fixed it to >4 fold.

      (13) Line 165, single-time-point assessment (add hyphens).

      The requested change has been made.

      (14) Line 181 and elsewhere including figure, the superscripts refer to alleles of the genes; hence approved gene names should be used in italics (as in Methods), TRPV1-Cre:: Floxed-DTA (without italics) would be acceptable.

      The requested changes have been made.

      (15) Line 182, nociceptor-neuron-ablated mice (add hyphens).

      The requested changes have been made.

      (16) Line 197, It is not clear that the "speed" of food disappearance was measured or that it is due to oral pain vs loss of appetite.

      The reviewer makes a good point. We have changed the sentence to read:

      To evaluate the effects of this disruption on cancer-induced behavioral changes, we assessed the animals’ general well-being through nesting behavior 32 and anhedonia using the cookie test 76,77, as well as body weight and food disappearance as surrogates for oral pain and/or loss of appetite.

      (17) Line 199, The reduced tumor growth after ablation could account for most of the changes in the other parameters that were measured.

      We have graphed the nesting scores and time-to-interact with the cookie as a function of tumor volume. These data are now included as Supplemental Figure 4 and suggest that at the same tumor volume, nesting scores and times-to-interact with the cookie are different between the groups.

      (18) Line 204 TPVP1 spelling. Is the TGN smaller after ablation of half of the neurons?

      The requested change has been made.

      (19) Line 235, "now" is not necessary.

      The requested change has been made.

      (20) Lines 238-239 and elsewhere, a few references as to why the TGN-SpVc-PBN-CeA circuit is relevant would be helpful.

      The following references have been added regarding the relevance of this circuit to behavior:

      Molecular Brain 14: 94 (2021) (PMID 34167570)

      Neuropharmacology 198: 108757 (2021) (PMID 34461068)

      Frontiers in Cellular Neuroscience 16: 997360 (2022) (PMID 36385947)

      Neuropsychopharmacology 49(3): 508-520 (2024) (PMID 37542159)

      (21) Lines 371, 434 and Figures, gm should be g or grams in scientific usage. Include JAX lab stock numbers for these mouse lines.

      The requested changes have been made.

      (22) Line 432, removing food for one hour is not a fast.

      The sentence has been reworded as follows: One hour prior to testing, mouse food is removed and the animals are acclimated to the brightly lit testing room.

      (23) Line 476, 5-um sections (add hyphen).

      The hyphen has been added.

      (24) Lines 988, and 1023, DAPI are usually shown this way.

      The requested change has been made.

      (25) Figure 1K, add Bregma levels to figures.

      SpVc: -8.12 mm

      PBN: -5.34 mm

      CeA: -1.34 mm

      (26) Figure 3 line 1033, "area under the curve" What curve was examined?

      The curve examined was the change in fluorescence over time. This curve has been added as Supplemental Figure 3C.

      (27) Figure 3B, the circled area is the lateral PBN. At first glance, I thought scp was meant as the label for the circled area.

      Scp is noted in the figure legend as a landmark.

    1. Reviewer #3 (Public Review):

      Pipes and Nielsen propose a valuable new computational method for assigning individual Next Generation Sequencing (NGS) reads to their taxonomic group of origin, based on comparison with a dataset of reference metabarcode sequences (i.e. using an existing known marker sequence such as COI or 16S). The underlying problem is an important one, with broad applications such as identifying species of origin of smuggled goods, identifying the composition of metagenomics/ microbiomics samples, or detecting the presence of pathogen variants of concern from wastewater surveillance samples. Pipes and Nielsen propose (and make available with open source software) new computational methods, apply those methods to a series of exemplar data analyses mirroring plausible real-life scenarios, and compare the new method's performance to that of various field-leading alternative methods.

      In terms of methodology, the manuscript presents novel computational analyses inspired by standard existing probabilistic phylogenetic models for the evolution of genome sequences. These form the basis for comparisons of each NGS read with a reference database of known examples spanning the taxonomic range of interest. The evolutionary aspects of the models are used (a) to statistically represent knowledge about the reference organisms (and uncertainty about their common ancestors) and their evolutionary relationships; and (b) to derive inferences about the relationship of the sample NGS reads that may be derived from reference organisms or from related organisms not represented in the reference dataset. This general approach has been considered previously and, while expected to be powerful in principle, the reliance of those methods on likelihood computations over a phylogenetic tree structure means they are slow to the point of useless on modern-sized problems that may have many thousands of reference sequences and many millions of NGS reads. Alternative methods that have been devised to be computationally feasible have had to sacrifice the phylogenetic approach, with a consequent loss of statistical power.

      Pipes and Nielsen's methodology contribution in this manuscript is to make a series of approximations to the 'ideal' phylogenetic likelihood analysis, aimed at saving computational time and keeping computer memory requirements acceptable whilst retaining as much as possible of the expected power of phylogenetic methods. Their description of their novel methods is solid; as they are largely approximations to other existing methods, their value ultimately will rest with the success of the method in application.

      Regarding the application of the new methods, to compare the accuracy of their method with a selection of existing methods the authors use 1) simulated datasets and 2) previously published mock community datasets to query sequencing reads against appropriate reference trees. The authors show that Tronko has a higher success at assigning query reads (at the species/genus/family level) than the existing tools with both datasets. In terms of computational performance, the authors show Tronko outperforms another phylogenetic tool, and is still within reasonable limits when compared with other 'lightweight' tools.

      As a demonstration of the power of phylogeny-based methods for taxonomic assignment, this ms. could gain added importance by refocusing the community towards explicitly phylogenetic methods. We agree with the authors that this would be likely to give rise to the most powerful possible methods.

      Strengths of this ms. are 1) the focus on phylogenetic approaches and 2) the reduction of a consequently difficult computational problem to a practical method (with freely available software); 3) the reminder that these approaches work well and are worthy of continued interest and development; and ultimately most-importantly 4) the creation of a powerful tool for taxonomic assignment that seems to be at least as good as any other and generally better.

      Weaknesses of the manuscript at present are 1) lack of consideration of some other existing methods and approaches, as it would be interesting to know if other ideas had been tried and rejected, or were not compatible with the methods created; 2) some over-simplifications in the description of new methods, with some aspects difficult or impossible to reproduce and some claims unsubstantiated. Further, 3) we are not convinced enough weight has been given to the complexity of 'pre-processing' the reference dataset for each metabarcode (e.g. gene) of interest, which may give the impression that the method is easier to apply to new reference datasets than we think would be the case. Lastly, 4) we encountered some difficulties getting the software installed and running on our computers. It was not possible to resolve every issue in the time available to us to perform our review, and some processing options remain untested.

      Overall, the methods that Pipes and Nielsen propose represent an important contribution that both creates a computational resource that is immediately valuable to the community, and emphasises the benefits of phylogenetic methods and provides encouragement for others to continue to work in this area to create still-better methods.

    1. Abstract: Platalea minor, the black-faced spoonbill (Threskiornithidae) is a wading bird that is confined to coastal areas in East Asia. Due to habitat destruction, it has been classified by The International Union for Conservation of Nature (IUCN) as globally endangered species. Nevertheless, the lack of its genomic resources hinders our understanding of their biology, diversity, as well as carrying out conservation measures based on genetic information or markers. Here, we report the first chromosomal-level genome assembly of P. minor using a combination of PacBio SMRT and Omni-C scaffolding technologies. The assembled genome (1.24 Gb) contains 95.33% of the sequences anchored to 31 pseudomolecules. The genome assembly also has high sequence continuity with scaffold length N50 = 53 Mb. A total of 18,780 protein-coding genes were predicted, and high BUSCO score completeness (93.7% of BUSCO metazoa_odb10 genes) was also revealed. A total of 6,155,417 bi-allelic SNPs were also revealed from 13 P. minor individuals, accounting for ∼5% of the genome. The resource generated in this study offers the new opportunity for studying the black-faced spoonbill, as well as carrying out conservation measures of this ecologically important spoonbill species.

      This work is part of a series of papers presenting outputs of the Hong Kong Biodiversity Genomics (https://doi.org/10.46471/GIGABYTE_SERIES_0006). This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.130), and the reviews have been published under the same license. These are as follows.

      Reviewer 1. Richard Flamio Jr.

      Is the language of sufficient quality?

      No. There are some grammatical errors and spelling mistakes throughout the text.

      Is there sufficient detail in the methods and data-processing steps to allow reproduction?

      Yes. The authors did a phenomenal job at detailing the methods and data-processing steps.

      Additional Comments:

      Very nice job on the paper. The methods are sound and the statistics regarding the genome assembly are thorough. My only two comments are: 1) I think the paper could be improved by the correction of grammatical errors, and 2) I am interested in a discussion about the number of chromosomes expected for this species (or an estimate) based on related species and whether the authors believe all of the chromosomes were identified. For example, is the karyotype known, or can the researchers make any inferences about the number of microchromosomes in the assembly? Please see a recent paper I wrote on microchromosomes in the wood stork assembly (https://doi.org/10.1093/jhered/esad077) for some ideas on defining the chromosome architecture of the spoonbill and/or comparing this architecture to related species.

      Re-review:

      The authors incorporated the revisions nicely and have produced a quality manuscript. Well done.

      Minor revisions:

      Line 46: A comma is needed after (Threskiornithidae).

      Line 47: “The” should not be capitalized.

      Line 48: This should read “as a globally endangered species.”

      Line 49: “However, the lack of genomic resources for the species hinders the understanding of its biology…”

      Line 56: Consider changing “also revealed” to “identified” to avoid repetition from the previous sentence.

      Line 65: Insert “the” before “bird’s.”

      Lines 69-70: Move “locally” higher in the sentence – “and it is protected locally…”

      Line 72: Replace “as of to date” with “prior to this study”.

      Lines 78-79: Pluralize “part.”

      Line 86: Replace “proceeded” with “processed.”

      Line 133: “…are listed in Table 1.”

      Line 158: “accounted”

      Line 159: “Variant calling was performed using…”

      Line 161: “Hard filtering was employed…”

      Lines 200-201: “The heterozygosity levels… from five individuals were comparable to previous reports on spoonbills – black-faced spoonbill … and royal spoonbill … (Li et al. 2022).”

      Line 202: New sentence. “The remaining heterozygosity levels observed…”

      Line 206: “…genetic bottleneck in the black-faced spoonbill…”

      Lines 208-209: “These results highlight the need…”

      Lines 213-214: “…which are useful and precious resources for future population genomic studies aimed at better understanding spoonbill species numbers and conservation.”

      Line 226: Missing a period after “heterozygosity.”

      For references, consider adding DOIs. Some citations have them but most citations would benefit from this addition.

      Reviewer 2. Phred Benham

      Is the language of sufficient quality?

      Generally yes, the language is sufficiently clear. However, a number of places could be refined and extra words removed.

      Are all data available and do they match the descriptions in the paper?

      Additional data is available on figshare.

      I do not see any of the tables that are cited in the manuscript and contain legends. Am I missing something? Also, there is no legend for the GenomeScope profile in Figure 3.

      The assembly appears to be on GenBank as a scaffold-level assembly. Can you list this accession info in the data availability section in addition to the project number?

      Is there sufficient data validation and statistical analyses of data quality?

      Overall fine, but some additional analyses would aid the paper. Comparison of the spoonbill genome to other close relatives using a synteny plot would be helpful.

      It would also be useful to put heterozygosity and inbreeding coefficients into context by comparing to results from other species.

      Additional Comments:

      Hui et al. report a chromosome-level genome for the black-faced spoonbill, an endangered species of coastal wetlands in East Asia. This genome will serve as an important resource for understanding the biology of this species and for conserving it.

      Generally, the methods are sound and appropriate for the generation of genomic sequence.

      Major comments: This is a highly contiguous genome in line with metrics for Vertebrate Genomics Project genomes and other consortia. The authors argue that they have assembled 31 Pseudo-molecules or chromosomes. It would be nice to see a plot showing synteny of these 31 chromosomes and a closely related species with a chromosome level assembly (e.g. Theristicus caerulescens; GCA_020745775.1)

      The tables appear to be missing from the submitted manuscript?

      Minor comments: Line 49: delete its

      Line 49-51: This sentence is a little awkward, please revise.

      Line 64: delete 'the'

      Line 67: replace 'with' with 'the spoonbill as a'

      Line 68: delete 'Interestingly'

      Line 70: can you be more specific about what kind of genetic methods had previously been performed?

      Line 79: can you provide any additional details on the necessary permits and/or institutional approval

      Line 78: what kind of tissue? or were these blood samples?

      Line 110: do you mean movies?

      Line 143: replace data with dataset

      Line 163: it may be worth applying some additional filters in vcftools, e.g. minor allele freq., min depth, max depth, what level of missing data was allowed?, etc.

      Line 171: delete 'resulted in'

      Line 172: do you mean scaffold L50 was 8?

      Line 191-195: some context would be useful here, how does this level of heterozygosity and inbreeding compare to other waterbirds?

      Line 217: why did you use the Metazoan database and not the Aves_odb10 database for Busco?

      Figure 1b: Number refers to what, scaffolds? Be consistent with capitalization for Mb. It seems like the order of scaffold N50 and L50 were reversed.

      Figure 3 is missing a legend.

      Re-review:

      I previously reviewed this manuscript and overall the authors have done a nice job addressing all of my comments.

      I appreciate that the authors include the MCscan analysis that I suggested. However, the alignment of the P. minor assembly and annotations to other genomes suggests rampant mis-assembly or translocations. Birds have fairly high synteny and I would expect Pmin to look more similar to the comparison between T. caerulescens and M. americana in the MCscan plot. For instance, parts of the largest scaffold in the Pmin assembly map to multiple different chromosomes in the Tcae assembly. Similarly, the Z in Tcae maps to 11 different scaffolds in the Pmin assembly and there does not appear to be a single large scaffold in the Pmin assembly that corresponds to the Z chromosome.

      The genome seems to be otherwise of strong quality, so I urge the authors to double-check their MCscan synteny analysis. If this pattern remains, can you please add some comments about it to the end of the Data Validation and Quality Control section? I think other readers will also be surprised at the low levels of synteny apparent between the spoonbill and ibis assemblies.
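      Reviewer 2's earlier question about whether scaffold N50 and L50 were swapped in Figure 1b comes down to the two standard definitions: N50 is a length (the shortest scaffold among the largest scaffolds that together cover at least half of the assembly), while L50 is a count (how many scaffolds that takes). The following is a minimal sketch of those definitions, not the authors' pipeline; the function name `n50_l50` and the toy lengths are illustrative assumptions.

```python
def n50_l50(lengths):
    """Return (N50, L50) for an iterable of scaffold lengths.

    N50: length of the scaffold at which the running total of lengths,
         taken from longest to shortest, first reaches half the assembly size.
    L50: number of scaffolds needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one scaffold length")

    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical toy assembly (arbitrary units), not data from the manuscript.
toy_lengths = [50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

      With these definitions, a value such as 53 Mb can only be an N50 and a value such as 8 can only be an L50, which is the basis of the reviewer's query.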

    1. Popular/Well-known Name

      This is a great addition. I think it would actually be better to use the popular names in the above RA plot as well.

      We may also have some data source internally at SB that has a mapping from NCBI taxonomic virus name to common name.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this work, Odenwald and colleagues show that mutant biotin ligases used to perform proximity-dependent biotin identification (TurboID) can be used to amplify signal in fluorescence microscopy and to label phase-separated compartments that are refractory to many immunofluorescence approaches. Using the parasite Trypanosoma brucei, they show that fluorescent methods such as expansion microscopy and CLEM, which require bright signals for optimal detection, benefit from the elevated signal provided by TurboID fusion proteins when coupled with labeled streptavidin. Moreover, they show that phase-separated compartments, where many antibody epitopes are occluded due to limited diffusion and potential sequestration, are labeled reliably with biotin deposited by a TurboID fusion protein that localizes within the compartment. They show successful labeling of the nucleolus, likely phase-separated portions of the nuclear pore, and stress granules. Lastly, they use a panel of nuclear pore-TurboID fusion proteins to map the regions of the T. brucei nuclear pore that appear to be phase-separated by comparing antibody labeling of the protein, which is susceptible to blocking, to the degree of biotin deposition detected by streptavidin, which is not. 

      Strengths: 

      Overall, this study shows that TurboID labelling and fluorescent streptavidin can be used to boost signal compared to conventional immunofluorescence in a manner similar to tyramide amplification, but without having to use antibodies. TurboID could prove to be a viable general strategy for labeling phase-separated structures in cells, and perhaps as a means of identifying these structures, which could also be useful. 

      Weaknesses: 

      However, I think that this work would benefit from additional controls to address if the improved detection that is being observed is due to the increased affinity and smaller size of streptavidin/biotin compared to IgGs, or if it has to do with the increased amount of binding epitope (biotin) being deposited compared to the number of available antibody epitopes. I also think that using the biotinylation signal produced by the TurboID fusion to track the location of the fusion protein and/or binding partners in cells comes with significant caveats that are not well addressed here, mostly due to the inability to discern which proteins are contributing to the observed biotin signal. 

      To dissect the contributions of the TurboID fusion to elevating signal, anti-biotin antibodies could be used to determine if the abundance of the biotin being deposited by the TurboID is what is increasing detection, or if streptavidin is essential for this.

      We agree with the reviewer that it would be very interesting to distinguish whether the increase in signal comes from the multiple biotinylation sites, from streptavidin being a very good binder, or from both. However, this question is very hard to answer: antibodies differ massively in their affinity for the antigen, which further depends on the respective IF conditions, and are therefore not directly comparable. Even if anti-biotin gave a better signal than anti-HA, this could be caused by the increase in antigen number (more biotin than HA tag), by the higher binding affinity, or by a combination of both, and these would be hard to distinguish. Nevertheless, we have tested monoclonal mouse anti-biotin targeting the (non-phase-separated) NUP158. We found the signal from the anti-biotin antibody to be much weaker than that from anti-HA, indicating that at least this particular anti-biotin antibody is not a very good binder in IF.

      Alternatively, HaloTag or CLIP tagging could be used to see if diffusion of a small molecule tag other than biotin can overcome the labeling issue in phase-separated compartments. There are Halo-biotin substrates available that would allow the conjugation of 1 biotin per fusion protein, which would allow the authors to dissect the relative contributions of the high affinity of streptavidin from the increased amount of biotin that the TurboID introduces. 

      This is a very good idea, as in this case the signals are both from streptavidin and are directly comparable. We expressed NUP158 with HaloTag and added PEG-biotin as a Halo ligand. However, PEG-biotin is poorly cell-permeable and is in general only used on lysates. In trypanosomes, cell permeability is particularly restricted, and even Halo ligands that are considered highly cell-penetrant give only a weak signal. Even after overnight incubation, we could not get any signal with PEG-biotin. Our control, the TMR-ligand 647, gave a weak nuclear pore staining, confirming the correct expression and function of the HaloTag-NUP158.

      The idea of using the biotin signal from the TurboID fusion as a means to track the changing localization of the fusion protein or the location of interacting partners is an attractive idea, but the lack of certainty about what proteins are carrying the biotin signal makes it very difficult to make clear statements. For example, in the case of TurboID-PABP2, the appearance of a biotin signal at the cell posterior is proposed to be ALPH1, part of the mRNA decapping complex. However, because we are tracking biotin localization and biotin is being deposited on a variety of proteins, it is not formally possible to say that the posterior signal is ALPH1 or any other part of the decapping complex. For example, the posterior labeling could represent a localization of PABP2 that is not seen without the additional signal intensity provided by the TurboID fusion. There are also many cytoskeletal components present at the cell posterior that could be being biotinylated, not just the decapping complex. Similar arguments can be made for the localization data pertaining to MLP2 and NUP65/75. I would argue that the TurboID labeling allows you to enhance signal on structures, such as the NUPs, and effectively label compartments, but you lack the capacity to know precisely which proteins are being labeled.  

      We fully agree with the reviewer that tracking proteins by streptavidin imaging alone is problematic, because it cannot distinguish which protein is biotinylated. We therefore used words like “likely” in the description of the data. However, we still think it is a valid method, as long as it is confirmed by an orthogonal method. We have added this paragraph to the end of this chapter:

      “Importantly, tracking of proteins by streptavidin imaging requires orthogonal controls, as the imaging alone does not provide information about the nature of the biotinylated proteins. These can be proximity ligation assay, mass spectrometry or specific tagging visualisation of protein suspects by fluorescent tags. Once these orthogonal controls are established for a specific tracking, streptavidin imaging is an easy and cheap and highly versatile method to monitor protein interactions in a specific setting.”

      Reviewer #2 (Public Review): 

      Summary: 

      The authors noticed that there was an enhanced ability to detect nuclear pore proteins in trypanosomes using a streptavidin-biotin-based detection approach in comparison to conventional antibody-based detection, and this seemed particularly acute for phase-separated proteins. They explored this in detail for both standard imaging but also expansion microscopy and CLEM, testing resolution, signal strength, and sensitivity. An additional innovative approach exploits the proximity element of biotin labelling to identify where interacting proteins have been as well as where they are. 

      Strengths: 

      The data is high quality and convincing and will have obvious application, not just in the trypanosome field but also more broadly where proteins are tricky to detect or inaccessible due to phase separation (or some other steric limitations). It will be of wide utility and value in many cell biological studies and is timely due to the focus of interest on phase separation, CLEM, and expansion microscopy. 

      Thank you! We are glad you liked it.

      Reviewer #3 (Public Review): 

      Summary: 

      The authors aimed to investigate the effectiveness of streptavidin imaging as an alternative to traditional antibody labeling for visualizing proteins within cellular contexts. They sought to address challenges associated with antibody accessibility and inconsistent localization by comparing the performance of streptavidin imaging with a TurboID-HA tandem tag across various protein localization scenarios, including phase-separated regions. They aimed to assess the reliability, signal enhancement, and potential advantages of streptavidin imaging over antibody labeling techniques. 

      Overall, the study provides a convincing argument for the utility of streptavidin imaging in cellular protein visualization. By demonstrating the effectiveness of streptavidin imaging as an alternative to antibody labeling, the study offers a promising solution to issues of accessibility and localization variability. Furthermore, while streptavidin imaging shows significant advantages in signal enhancement and preservation of protein interactions, the authors must consider potential limitations and variations in its application. Factors such as the fact that tagging may sometimes impact protein function, background noise, non-specific binding, and the potential for off-target effects may impact the reliability and interpretation of results. Thus, careful validation and optimization of streptavidin imaging protocols are crucial to ensure reproducibility and accuracy across different experimental setups. 

      Strengths: 

      - Streptavidin imaging utilizes multiple biotinylation sites on both the target protein and adjacent proteins, resulting in a substantial signal boost. This enhancement is particularly beneficial for several applications with diluted antigens, such as expansion microscopy or correlative light and electron microscopy. 

      - This biotinylation process enables the identification and characterization of interacting proteins, allowing for a comprehensive understanding of protein-protein interactions within cellular contexts. 

      Weaknesses: 

      - One of the key advantages of antibodies is that they label native, endogenous proteins, i.e. without introducing any genetic modifications or exogenously expressed proteins. This is a major difference from the approach in this manuscript, and it is surprising that this limitation is not really mentioned, let alone expanded upon, anywhere in the manuscript. Tagging proteins often impacts their function (if not their localization), and this is also not discussed.

      - Given that BioID proximity labeling encompasses not only the protein of interest but also its entire interacting partner history, ensuring accurate localization of the protein of interest poses a challenge. 

      - The title of the publication suggests that this imaging technique is widely applicable. However, the authors did not show the ability to track the localization of several distinct proteins on the same sample, which could be an additional factor demonstrating the outperformance of streptavidin imaging compared with antibody labeling. Similarly, the work focuses only on small 2D samples. It would have been interesting to be able to compare this with 3D samples (e.g. cells encapsulated in an extracellular matrix) or to tissues.  

      Recommendations for the authors:

      To enhance the assessment from 'incomplete' to 'solid', the reviewers recommend that the following major issues be addressed: 

      Major issues: 

      (1) Anti-biotin antibodies in combination with TurboID labeling should be used to compare the signal/labelling penetrance to streptavidin results. That would show if elevated biotin deposition matters, or if it is really the smaller size, more fluors, and higher affinity of streptavidin that's making the difference. 

      We agree with the reviewer that it would be very interesting to distinguish whether the increase in signal comes from the multiple biotinylation sites, from streptavidin being a very good binder, or from both, and whether size matters (IgG versus streptavidin). However, this question is very hard to answer, as antibodies differ massively in their affinity for the antigen. Thus, even if anti-biotin gave a better signal than anti-HA, this could be caused by the increase in antigen number (more biotin than HA tag), by the better binding affinity, or by a combination of both, and it would not truly answer the question. We have now tested anti-biotin antibodies, also in response to reviewer 1, and got a much poorer signal in comparison to anti-HA or streptavidin.

      Please note that we made another attempt using nanobodies to target phase-separated proteins, to see whether size matters (Fig. 2I). The nanobody did not stain Mex67 at the nuclear pores but gave a weak nucleolar signal for NOG1, which may suggest that the nanobody penetrates slightly better than IgG, but it does not rule out that the nanobody simply binds with higher affinity. Reviewer 1 suggested using the HaloTag with PEG-biotin: this would indeed allow a direct comparison of the streptavidin signal caused by the TurboID with that of a single biotin added by the HaloTag. Unfortunately, the PEG-biotin does not penetrate trypanosome cells. In conclusion, we are not aware of a method that would establish why streptavidin, but not IgGs, can penetrate phase-separated areas. We therefore prefer not to overinterpret our data, but to stick to what is supported by the data: “the inability to label phase-separated areas is not restricted to anti-HA but applies to other antibodies”.

      (3) Figure 4 A-B. The validity of claiming the correct localization demonstrated by streptavidin imaging comes into question, especially when endogenous fluorescence, via the fusion protein, remains undetectable (as indicated by the yellow arrow at apex). 

      In this figure, the streptavidin imaging does NOT show the correct localisation of the bait protein, but it does show proteins from historic interactions whose localisation is distinct from that of the bait. We had therefore introduced this chapter with the paragraph below, to make sure the reader is aware of the limitations (which we also see as an opportunity, if properly controlled):

      “We found that in most cases, streptavidin labelling faithfully reflects the steady state localisation of a bait protein, e.g., the localisation resembles those observed with immunofluorescence or direct fluorescence imaging of GFP-fusion proteins. For certain bait proteins, this is not the case, for example, if the bait protein or its interactors have a dynamic localisation to distinct compartments, or if interactions are highly transient. It is thus essential to control streptavidin-based de novo localisation data by either antibody labelling (if possible) or by direct fluorescence of fusion-proteins for each new bait protein.”

      In particular, on lines 450-460, there's a fundamental issue with the argument put forward here. It is not possible to formally know that the posterior labeling is ALPH1 vs. another part of the decapping complex that was associated with PABP2-Turbo, or if the higher detection capacity of the Turbo-biotin label is uncovering a novel localization of the PABP2. While it is likely that it is ALPH1, it is not possible to rule out other possibilities with this approach. These issues should be discussed here and more generally the possibility of off-target labeling with this approach should be addressed in the discussion. 

      We fully agree with the reviewer that tracking proteins by streptavidin imaging alone is problematic, because it cannot distinguish which protein is biotinylated. We therefore used words like “likely” in the description of the data. However, we still think it is a valid method, as long as it is backed up by an orthogonal method. We have added this paragraph to the end of this chapter:

      “Importantly, tracking of proteins by streptavidin imaging requires orthogonal controls, as the imaging alone does not provide information about the nature of the biotinylated proteins. These can be proximity ligation assay, mass spectrometry or specific tagging visualisation of protein suspects by fluorescent tags. Once these orthogonal controls are established for a specific tracking, streptavidin imaging is an easy and cheap and highly versatile method to monitor protein interactions in a specific setting.”

      (4) More discussion and acknowledgment of the general limitations in using tagged proteins are needed to balance the manuscript, especially if the hope is to draw a comparison with antibody labeling, which works on endogenous proteins (not requiring a tag). For example: (a) tagging proteins requires genetic/molecular work ahead of time to engineer the constructs and/or cells if trying to tag endogenous proteins; (b) tagged proteins should technically be validated in rescue experiments to confirm the tag doesn't disrupt function in the cell/tissue/context of interest; and (c) exogenous tagged proteins compete with endogenous untagged proteins, which can complicate the interpretation of data.  

      We have added this paragraph to the first paragraph of the discussion part:

      “Like many methods that are frequently used in cell- and molecular biology, streptavidin imaging is based on the expression of a genetically engineered fusion protein: it is essential to validate both, function and localisation of the TurboID-HA tagged protein by orthogonal methods. If the fusion protein is non-functional or mis-localised, tagging at the other end may help, but if not, this protein cannot be imaged by streptavidin imaging. Likewise, target organisms not amenable to genetic manipulation, or those with restricted genetic tools,  are not or less suitable for this method.”

      Also, we would like to point out that for non-mainstream organisms like trypanosomes, antibodies are often not commercially available, and genetic manipulation is frequently more time-efficient and cheaper than the production of antiserum against the target protein.

      Also, the introduction would ideally be more general in scope and introduce the pros and cons of antibody labeling vs biotin/streptavidin, which are mentioned briefly in the discussion. The fact that the biotin-streptavidin interaction is ~100-fold higher affinity than an IgG binding to its epitope is likely playing a key role in the results here. The difference in size between IgG and streptavidin, the likelihood that the tetrameric streptavidin carries more fluors than a IgG secondary, and the fact that biotin can likely diffuse into phase-separated environments should be clearly stated. The current introduction segues from a previous paper that a more general audience may not be familiar with. 

      We have now included this paragraph to the introduction:

      “It remains unclear, why streptavidin was able to stain biotinylated proteins within these antibody inaccessible regions, but possible reasons are: (i) tetrameric streptavidin is smaller and more compact than IgGs (60 kDa versus a tandem of two IgGs, each with 150 kDa) (ii) the interaction between streptavidin and biotin is ~100 fold stronger than a typical interaction between antibody and antigen and (iii) streptavidin contains four fluorophores, in contrast to only one per secondary IgG.”

      Minor issues: 

      The copy numbers of the HA and Ty1 epitope tags vary depending on the construct being used. For example, Ty1 is found as a single copy tag in the TurboID tag, but on the mNeonGreen tag there are 6 copies of the epitope. It makes it hard to know if differences in detection are due to variations in copies of the epitope tags. Line 372-374: can the authors explain why they chose to use nanobodies in this case? It would be great to show the innate mNeonGreen signal in 2K to compare to the Ty1 labeling. The presence of 6 copies of the Ty1 epitope could be essential to the labeling seen here.

      We agree with the reviewer that these data are a bit confusing. We have now removed Figure 3K, as it is the only construct with 6 Ty1 instead of one and does not add to the conclusions (the mNeonGreen signal is entirely in the nucleolus, as shown by Tryptag). We have also added an explanation of why we used nanobodies (“The absence of a nanobody signal rules out that its simply the size of IgGs that prevents the staining of Mex67 at the nuclear pores, as nanobodies are smaller than (tetrameric) streptavidin”). However, as stated above, we prefer not to overinterpret the data, as signals from different antibody/nanobody–antigen combinations are not comparable. What was important to us was to stress that the absence of signal in phase-separated areas is NOT restricted to the anti-HA antibody, which is clearly supported by the data.

      What does the innate streptavidin background labeling, from the native proteins that are biotinylated, look like in cells that are not carrying a TurboID fusion? That should be discussed.

      We have now included the controls without the TurboID fusions for trypanosomes and HeLa cells: “Wild type cells of both Trypanosomes and human showed only a very low streptavidin signal, indicating that the signal from naturally biotinylated proteins is neglectable (Figure S8 in supplementary material).”

      Line 328-331: This is likely to be dependent on whether or not the protein moves to different localizations within the cell. 

      True, we agree, and we have added this paragraph:

      “The one exception are very motile proteins that produce a “biotinylation trail” distinct to the steady state localisation; these exceptions, and how they can be exploited to understand protein interactions, are discussed in chapter 4 below.”

      Line 304-305: Does biotin supplementation not matter at all? 

      No, we never saw any increase in biotinylation when we added extra biotin to trypanosomes. The 0.8 µM biotin concentration in the medium was sufficient.

      Line 326-327: Was the addition of biotin checked for enhancement in the case of the mammalian NUP98? I would argue that there is a significant number of puncta in Figure 1D that are either green or magenta, not both. The amount of extranuclear puncta in the HA channel is also difficult to explain. Biotin supplementation to 500 µM was used in mammalian TurboID experiments in the original Nature Biotech paper- perhaps nanomolar levels are too low. 

      We have now tested HeLa cells with 500 µM biotin and saw an increase in signal, but also in background; because of the increased background, we conclude that low biotin concentrations are more suitable. We have also repeated the experiment using 4HA tags instead of 1HA, and we found a minor improvement in the antibody signal for NUP88 (while the phase-separated NUP54 was still not detectable). We have replaced the images in Figure 1D (NUP88) and in Figure 2F (NUP54) with improved images using 4HA tags. However, we would like to note that single nuclear pore resolution is beyond what can be expected of light microscopy.

      Line 371: In 2I, I see a signal that looks like the nucleus, similar to the Ty1 labeling in 2G, so I don't think it's accurate to say that that Mex67 was "undetectable". Does the serum work for blotting? 

      Thank you, yes, “undetectable” was not the correct phrase here. Mex67 localises to the nuclear pores, to the nucleoplasm and to the nucleolus (GFP-tagging or streptavidin). Antibodies, either to the tag or to the endogenous proteins, fail to detect Mex67 at the nuclear pores and also don’t show any particular enrichment in the nucleolus. They do, however, detect Mex67 in the (not-phase-separated) area of the nucleoplasm. We have changed the text to make this clearer. The Mex67 antiserum works well on a western blot (see for example: Pozzi, B., Naguleswaran, A., Florini, F., Rezaei, Z. & Roditi, I. The RNA export factor TbMex67 connects transcription and RNA export in Trypanosoma brucei and sets boundaries for RNA polymerase I. Nucleic Acids Res. 51, 5177–5192 (2023)).

      Line 477: "lacked" should be "lagged".

      Thank you, corrected.

      Line 468-481: My previous argument holds here - how do you know that the difference in detection here is just a matter of much higher affinity/quantity of binding partner for the avidin?

      See answer to the second point of (3), above.

      483-491: Same issue - without certainty about what the biotin is on, this argument is difficult to make. 

      See answer to the second point of (3), above.

      Line 530: "bone-fine" should be "bonafide"

      Thank you, corrected.

      Line 602: biotin/streptavidin labeling has been used for expansion microscopy previously (Sun, Nature Biotech 2021; PMID: 33288959). 

      Thank you, we had overlooked this! We have now included this reference and describe the differences from our approach more clearly in the discussion:

      “Fluorescent streptavidin has been previously used in expansion microscopy to detect biotin residues in target proteins produced by click chemistry (Sun et al., 2021). However, to the best of our knowledge, this is the first report that employs fluorescent streptavidin as a signal enhancer in expansion microscopy and CLEM, by combining it with multiple biotinylation sites added by a biotin ligase. Importantly, for both CLEM and expansion, streptavidin imaging is the only alternative approach to immunofluorescence, as denaturing conditions associated with these methods rule out direct imaging of fluorescent tags.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment:

      This study presents valuable framework and findings to our understanding of the brain as a fractal object by observing the stability of its shape property within 11 primate species and by highlighting an application to the effects of aging on the human brain. The evidence provided is solid but the link between brain shape and the underlying anatomy remains unclear. This study will be of interest to neuroscientists interested in brain morphology, whether from an evolutionary, fundamental or pathological point of view, and to physicists and mathematicians interested in modeling the shapes of complex objects.

      We have now clarified the outstanding question of whether our model outputs can be related to actual primate brain anatomy, which we believe was mainly based on comments regarding the validity of our output of apparently thicker cortices than nature can produce.

      We address this point in more detail in the point-by-point response below, but want to address this misunderstanding directly here: Our algorithm does not produce thicker cortices with increasing coarse-graining scales; in fact, the cortical thickness never exceeds the actual cortical thickness in our outputs, but rather thins with each coarse-graining scale. In other words, we believe that our outputs are fully in line with neuroanatomy across species.

      Reviewer #2 (Public Review): 

      In this manuscript, the authors analyze the shapes of cerebral cortices from several primate species, including subgroups of young and old humans, to characterize commonalities in patterns of gyrification, cortical thickness, and cortical surface area. The authors state that the observed scaling law shares properties with fractals, where shape properties are similar across several spatial scales. One way the authors assess this is to perform a "cortical melting" operation that they have devised on surface models obtained from several primate species. The authors also explore differences in shape properties between brains of young (~20 year old) and old (~80) humans. A challenge the authors acknowledge struggling with in reviewing the manuscript is merging "complex mathematical concepts and a perplexing biological phenomenon." This reviewer remains a bit skeptical about whether the complexity of the mathematical concepts being drawn from are justified by the advances made in our ability to infer new things about the shape of the cerebral cortex. 

      To allow scientists from all backgrounds to adopt these complex ideas, we have made our code to “melt” the brains and for further downstream analysis publicly available. We have now also provided a graphical user interface, to allow users without substantial coding experience to run the analysis. We also believe that the algorithmic concepts are easy to understand due to the similarity to the coarse-graining procedures found in long-standing and well-accepted box-counting algorithms.
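
      For readers unfamiliar with box counting, the coarse-graining idea it shares with the "melting" procedure can be illustrated on a toy object. The sketch below is not the authors' published code; the Sierpinski carpet, the function names, and the chosen box sizes are all assumptions made for illustration. It counts occupied boxes at several box sizes and reads the fractal dimension off the log-log slope.

```python
import math


def sierpinski_carpet(level):
    """Return the filled (row, col) cells of a Sierpinski carpet on a 3**level grid."""
    size = 3 ** level
    cells = set()
    for i in range(size):
        for j in range(size):
            a, b, filled = i, j, True
            # A cell is empty if any base-3 digit pair of (row, col) is (1, 1).
            while a or b:
                if a % 3 == 1 and b % 3 == 1:
                    filled = False
                    break
                a //= 3
                b //= 3
            if filled:
                cells.add((i, j))
    return cells


def box_count(cells, box_size):
    """Number of boxes of side box_size containing at least one filled cell."""
    return len({(i // box_size, j // box_size) for i, j in cells})


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


LEVEL = 4  # hypothetical resolution: an 81 x 81 grid
cells = sierpinski_carpet(LEVEL)
box_sizes = [3 ** k for k in range(LEVEL)]  # 1, 3, 9, 27
counts = [box_count(cells, s) for s in box_sizes]

# Coarser boxes erase detail below their size; the slope of
# log(count) against log(1 / box_size) estimates the dimension.
dimension = fit_slope(
    [math.log(1.0 / s) for s in box_sizes],
    [math.log(c) for c in counts],
)
print(counts, round(dimension, 4))  # [4096, 512, 64, 8] 1.8928
```

      Increasing the box size here plays the same role as increasing the voxel size in the coarse-graining procedure: structure smaller than the box no longer contributes to the count.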

      Beyond the theoretical insight of the fractal nature of cortices and providing an explicit and crucial link between vastly different brains that are gyrified and those that are not, we believe that the advance gained by our methods for future applications is clearly demonstrated in our proof-of-principle with a four-fold increase in effect size. For reference, an effect size of 8 would translate to an almost perfect separation of groups, i.e. an ideal biomarker with near 100% sensitivity and specificity.
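
      The link between effect size and group separation can be made concrete under simple assumptions. The response does not state which effect size measure is meant; the sketch below assumes Cohen's d, two normally distributed groups with equal variance, and a decision threshold at the midpoint between the group means. Under those assumptions sensitivity and specificity both equal Φ(d/2). The baseline value of 2 is only inferred from the "four-fold increase" wording and is likewise an assumption.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def midpoint_accuracy(d):
    """Sensitivity (= specificity) for two equal-variance normal groups
    separated by d standard deviations, thresholded at the midpoint."""
    return normal_cdf(d / 2.0)


# Assumed baseline (d = 2) versus the four-fold larger effect size (d = 8).
for d in (2.0, 8.0):
    print(f"d = {d}: sensitivity = specificity = {midpoint_accuracy(d):.5f}")
```

      With these assumptions, d = 2 gives roughly 84% sensitivity and specificity, whereas d = 8 gives more than 99.99%, which is the sense in which an effect size of 8 corresponds to almost perfect separation.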

      (1) The series of operations to coarse-grain the cortex illustrated in Figure 1 produces image segmentations that do not resemble real brains.

      As re-iterated in our Methods and Discussion: “Note, of course, that the coarse-grained brain surfaces are an output of our algorithm alone and are not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains are purely on the level of morphometrics across the whole cortex.”

      Fig. 1 therefore serves as an explanation to the reader on the algorithmic outputs, but each melted brain is not supposed to be directly/visually compared to actual brains. Similar to algorithms measuring the fractal dimension, or the exposed surface area of a given brain, the intermediate outputs of these algorithms are not supposed to represent any biologically observed brain structures, but rather serve as an abstraction to obtain meaningful morphometrics.

      We additionally added a note to the caption of Fig. 1 to clarify this point:

      “Note that the actual size of the brains for analysis are rescaled (see Methods and Fig. 3); we display all brains scaled at an equal size here for the ease of visualisation of the method.”

      Finally, we also edited the entire paper for terminology to clearly distinguish the terms of (1) the cortex as a 3D object, (2) coarse-grained and voxelised versions thereof, and (3) summary morphological measures derived from the former. When we invite comparisons in our paper between real brains and coarse-grained brains, this is always at the level of summary morphological measures, not at the level of the 3D objects/voxelisations themselves.

      The process to assign voxels in downsampled images to cortex and white matter is biased towards the former, as only 4 corners of a given voxel are needed to intersect the original pial surface, but all 8 corners are needed to be assigned a white matter voxel. The reason for introducing this bias (and to the extent that it is present in the authors' implementation) is not provided.

      This detail was in the Supplementary, and we have now added additional clarification on this specific point to our Supplementary:

      “In detail, we assign all voxels in the grid with at least four corners inside the original pial surface to the pial voxelization. This process allows the exposed surface to remain approximately constant with increasing voxel sizes. A constant exposed surface is desirable, as we only want to gradually ‘melt’ and fuse the gyri, but not grow the bounding/exposed surface as well. We want the extrinsic area to remain approximately constant as we decrease the intrinsic area via coarse-graining; it is like generating iterates of a Koch curve in reverse, from more to less detailed, by increasing the length of smallest line segment.

      We then assign voxels with all eight corners inside the original white matter surface to the white matter voxelization. This is to ensure integrity of the white matter, as otherwise white matter voxels in gyri may become detached from the core white matter, and thus artificially increase white matter surface area. Indeed, the main results of the paper are not very sensitive to this decision using all eight corners, vs. e.g. only four corners, as we do not directly use white matter surface area for the scaling law measurements. However, we still maintained this choice in case future work wants to make use of the white matter voxelisations or derivative measures.”
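
      The corner-counting rule quoted above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: two concentric spheres stand in for the pial and white matter surfaces, and the function names, radii, and voxel size are all hypothetical. A real analysis would replace `inside_sphere` with a point-in-mesh test on the actual surfaces.

```python
import itertools
import math


def corners(index, size):
    """The eight corner coordinates of the voxel at integer grid index."""
    ix, iy, iz = index
    return [
        ((ix + dx) * size, (iy + dy) * size, (iz + dz) * size)
        for dx, dy, dz in itertools.product((0, 1), repeat=3)
    ]


def inside_sphere(radius):
    """Stand-in for a point-in-surface test (assumption: spherical surface)."""
    def test(point):
        return sum(c * c for c in point) < radius * radius
    return test


def voxelise(inside, size, half_extent, min_corners):
    """Keep voxels with at least min_corners corners inside the surface."""
    n = math.ceil(half_extent / size)
    voxels = set()
    for index in itertools.product(range(-n, n), repeat=3):
        hits = sum(1 for p in corners(index, size) if inside(p))
        if hits >= min_corners:
            voxels.add(index)
    return voxels


VOXEL = 2.0  # hypothetical voxel size
pial = voxelise(inside_sphere(10.0), VOXEL, 10.0, min_corners=4)   # relaxed rule
white = voxelise(inside_sphere(7.0), VOXEL, 10.0, min_corners=8)   # strict rule
strict_pial = voxelise(inside_sphere(10.0), VOXEL, 10.0, min_corners=8)

grey = pial - white  # voxels assigned to the cortical ribbon
print(len(pial), len(strict_pial), len(white), len(grey))
```

      In this toy, the four-corner rule keeps boundary voxels that the eight-corner rule would drop, so the voxelised outer surface stays closer to the original one as the voxel size grows, which is the behaviour the quoted passage asks for.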

      Note, on the point of white matter integrity, that if both grey and white matter voxelisations require all 8 corners to be inside the respective mesh, there will be voxels at the grey/white matter interface that are not assigned to either, causing potential downstream issues.

      We further acknowledge:

      “Of course, our proposed procedure is not the only conceivable way to erase shape details below a given scale; and we are actively working on related algorithms that are also computationally cheaper. Nevertheless, the current version requires no fine-tuning, is computationally feasible and conceptually simple, thus making it a natural choice for introducing the methodology and approach.”

      The authors provide an intuitive explanation of why thickness relates to folding characteristics, but ultimately an issue for this reviewer is, e.g., for the right-most panel in Figure 2b, the cortex consists of several 4.9-sided voxels and thus a >2 cm thick cortex. A structure with these morphological properties is not consistent with the anatomical organization of typical mammalian neocortex. 

      We assume the reviewer refers to Fig. 1B with the panel on scale=4.9mm. We would like to point out that Fig. 1 serves as an explanation of the voxelisation method. For the actual analysis and Results, we are using re-scaled brains (see Fig. 2 with the ever decreasing brain sizes). The rescaling procedure is now expanded as below:

      “Morphological properties, such as cortical thicknesses measured in our ‘melted’ brains, are to be understood as a thickness relative to the size of the brain. Therefore, to analyse the scaling behaviour of the different coarse-grained realisations of the same brain, we apply an isometric rescaling process that leaves all dimensionless shape properties unaffected (more details in Suppl. S3.1). Conceptually, this process fixes the voxel size, and instead resizes the surfaces relative to the voxel size, which ensures that we can compare the coarse-grained realisations to the original cortices, and test if the former, like the latter, also scale according to Eqn. (1). Resizing, or more precisely, shrinking the cortical surface is mathematically equivalent to increasing the box size in our coarse-graining method. Both achieve an erasure of folding details below a certain threshold. After rescaling, as an example, the cortical thickness also shrinks with increasing levels of coarse-graining, and never exceeds the thickness measured at native scale.”
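
      The statement that isometric rescaling leaves dimensionless shape properties unaffected can be checked with a small numerical sketch. The measurement values, the dictionary keys, and the two ratios chosen below are hypothetical illustrations, not quantities taken from the paper: lengths scale with the factor, areas with its square, so any ratio whose units cancel is unchanged.

```python
import math


def rescale(measures, factor):
    """Isometric rescaling: lengths scale by factor, areas by factor squared."""
    return {
        "thickness": measures["thickness"] * factor,
        "total_area": measures["total_area"] * factor ** 2,
        "exposed_area": measures["exposed_area"] * factor ** 2,
    }


def dimensionless(measures):
    """Two example unit-free shape descriptors (chosen for illustration)."""
    folding_ratio = measures["total_area"] / measures["exposed_area"]
    relative_thickness = measures["thickness"] / math.sqrt(measures["exposed_area"])
    return folding_ratio, relative_thickness


# Hypothetical native-scale values (mm and mm^2).
original = {"thickness": 2.5, "total_area": 100000.0, "exposed_area": 40000.0}
shrunk = rescale(original, 0.25)  # shrink the surface by a factor of four

print(dimensionless(original))
print(dimensionless(shrunk))
print(shrunk["thickness"], "<", original["thickness"])
```

      The absolute thickness of the shrunk object is smaller, while both unit-free descriptors are unchanged, which is why rescaled coarse-grained realisations can be compared with native-scale cortices at the level of summary measures.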

      We additionally added a note to the caption of Fig. 1 to clarify this point:

      “Note that the actual size of the brains for analysis are rescaled (see Methods and Fig. 3); we display all brains scaled at an equal size here for the ease of visualisation of the method.”

      Finally, we also edited the entire paper for terminology to clearly distinguish the terms of (1) the cortex as a 3D object, (2) coarse-grained versions thereof, and (3) summary morphological measures derived from the former. When we invite comparisons in our paper between real brains and coarse-grained brains, this is always at the level of summary morphological measures, not at the level of the 3D objects themselves and their detailed anatomical features.

      (2) For the comparison between 20-year-old and 80-year-old brains, a well-documented difference is that the older age group possesses more cerebrospinal fluid due to tissue atrophy, and the distances between the walls of gyri become greater. This difference is borne out in the left column of Figure 4b. It seems this additional spacing between gyri in 80 year olds requires more extensive down-sampling (larger scale values in Figure 4a) to achieve a similar shape parameter K as for the 20 year olds. The authors assert that K provides a more sensitive measure (associated with a large effect size) than currently used ones for distinguishing brains of young vs. old people. A more explicit, or elaborate, interpretation of the numbers produced in this manuscript, in terms of brain shape, might make this analysis more appealing to researchers in the aging field.

      We have removed the main results relating to K and aging from our last revision already to avoid confusion. This is now only in the supplementary analysis, and our claim of K being a more sensitive measure for age and ageing – whilst still true – will be presented in more detail in a series of upcoming papers.

      (3) In the Discussion, it is stated that self-similarity, operating on all length scales, should be used as a test for existing and future models of gyrification mechanisms. Given the lack of association between the abstract mathematical parameters described in this study and explicit properties of brain tissue and its constituents, it is difficult to envision how the coarse-graining operation can be used to guide development of "models of cortical gyrification."

      We have clarified in more detail what we meant originally in Discussion:

      “Finally, this dual universality is also a more stringent test for existing and future models of cortical gyrification mechanisms at relevant scales, and one that moreover is applicable to individual cortices. For example, any models that explicitly simulate a cortical surface as an output could be directly coarse-grained with our method and the morphological trajectories can be compared with those of actual human and primate cortices. The simulated cortices would only be ‘valid’ in terms of the dual universality if they also produce the same morphological trajectories.”

      However, we agree with the reviewer that our paper could be misread as demanding direct comparisons of each coarse-grained brain with an actual brain, and we have now added the following text to clarify that this is not our intention for the proposed method or outputs.

      “Note, we do not suggest directly comparing coarse-grained brain surfaces with actual biological brain surfaces. As we noted earlier, the coarse-grained brain surfaces are an output of our algorithm alone and not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains are purely on the level of morphometrics across the whole cortex.”

      Indeed, the dual universality imposes restrictive constraints on the possible shapes of real cortices, but does not fully specify them. Presumably, the location of individual folds in different individuals and species will depend on their respective evolutionary histories, so there is no reason to expect a match in fold location between the ‘melted’ cortices of more gyrified species, on one hand, and the cortex of a less-gyrified one, on the other, even if their global morphological parameters and global mechanism of folding coincide.

      (4) There are several who advocate for analyzing cortical mid-thickness surfaces, as the pial surface over-represents gyral tips compared to the bottoms of sulci in the surface area. The authors indicate that analyses of mid-thickness representations will be taken on in future work, but this seems to be a relevant control for accepting the conclusions of this manuscript.

      In the context of some applications and methods, we agree that the mid-surface is a meaningful surface to analyse. However, in our work, the mid-surface is not. The fractal estimation rests on the assumption that the exposed area hugs the object of interest (hence convex hull of the pial surface), as the relationship between the extrinsic and intrinsic areas across scales determine the fractal relationship (Eq. 2). If we used the mid-surface instead of the pial surface for all estimation, this would not represent the actual object of interest, and it is separated from the convex hull. Estimating a new convex hull based on the mid surface would be the equivalent of asking for the fractal dimension of the mid-surface, not of the cortical ribbon. In other words, it would be a different question, bound to yield a different answer.

      Hence, we indicated in our original response that we only have a provisional answer, but more work beyond the scope of this paper is required to answer this question, as it is a separate question. The mid-surface, as a morphological structure in its own right, will have its own scaling properties, and our provisional understanding is that these also yield a scaling law parallel to those of the cortical ribbon with the same or a similar fractal dimension. But more systematic work is required to investigate this question at native scale and across scales.

      Reviewer #3 (Public Review):

      Summary: Through a rigorous methodology, the authors demonstrated that within 11 different primates, the shape of the brain followed a universal scaling law with fractal properties. They enhanced the universality of this result by showing the concordance of their results with a previous study investigating 70 mammalian brains, and the discordance of their results with other folded objects that are not brains. They incidentally illustrated potential applications of this fractal property of the brain by observing a scale-dependant effect of aging on the human brain. 

      Strengths: 

      - New hierarchical way of expressing cortical shapes at different scales derived from previous report through implementation of a coarse-graining procedure 

      - Investigation of 11 primate brains and contextualisation with other mammals based on prior literature 

      - Proposition of tool to analyse cortical morphology requiring no fine tuning and computationally achievable 

      - Positioning of results in comparison to previous works reinforcing the validity of the observation. 

      - Illustration of scale-dependance of effects of brain aging in the human. 

      Weaknesses: 

      - The notion of cortical shape, while being central to the article, is not really defined, leaving some interpretation to the reader 

      - The organization of the manuscript is unconventional, leading to mixed contents in different sections (sections mixing introduction and method, methods and results, results and discussion...). As a result, the reader discovers the content of the article along the way, it is not obvious at what stages the methods are introduced, and the results are sometimes presented and argued in the same section, hindering objectivity. 

      To improve the document, I would suggest a modification and restructuring of the article such that: 1) by the end of the introduction the reader understands clearly what question is addressed and the value it holds for the community, 2) by the end of the methods the reader understands clearly all the tools that will be used to answer that question (not just the new method), 3) by the end of the results the reader holds the objective results obtained by applying these tools on the available data (without subjective interpretations and justifications), and 4) by the end of the discussion the reader understands the interpretation and contextualisation of the study, and clearly grasps the potential of the method depicted for the better understanding of brain folding mechanisms and properties. 

      We thank this reviewer again for their attention to detail and constructive comments. We have followed the detailed suggestions provided to us in the Recommendations For The Authors, and summarise the main changes here:

      - We have restructured all sections to follow Introduction, Methods, Results, and Discussion more clearly; by using subsections, we believe the structure is now more accessible to readers.

      - We have now clarified the concept of “cortical shape”, as we use it in our paper in several places, by distinguishing clearly the object of study, and the morphological properties measured from it.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors): None 

      Reviewer #3 (Recommendations For The Authors): 

      I once again compliment the authors for their elegant work. I am happy with the way they covered my first feedback. My second review takes into account some comments made by other reviewers with which I agree. 

      We thank this reviewer again for their attention to detail and constructive comments.

      Recommendations for clarifications: 

      General comments: The purpose of the article could be made clearer in the introduction. When I differentiate results from discussion, I think of results as objective measures or observations, while discussion will relate to the interpretation of these results (including comparison with previous literature, in most cases). 

      We have restructured all sections to follow Introduction, Methods, Results, and Discussion more clearly; by using subsections, we believe the structure is now more accessible to readers.

      - l.39: define or discuss "cortical shape" 

      We have gone through the entire paper and corrected for any ambiguities. We specifically distinguish between the cortex as a structure overall, shape measures derived from this structure, and coarse-grained versions of the structure.

      - l.48-74: this would match either an introduction or a discussion rather than a methods section. 

      Done

      - l.98-106: this would match a discussion rather than a methods section. 

      Done

      - l.111: here could be a good spot to discuss the 4 vs 8 corners for inclusion of pial vs white matter voxelization 

      We have discussed this in the more detailed Supplementary section now, as after restructuring, this appears to be the more suitable place.

      - l.140-180: it feels that this section mixes methods, results and discussion of the results 

      We agree and we have resolved this by removing sentences and re-arranging sections.

      - l.183-217: mix of results and discussion 

      We agree and we have resolved this by removing sentences and re-arranging sections.

      Small cosmetic suggestions: 

      - l.44: conservation of 'some' quantities: vague 

      Changed to conservation of morphological relationships across evolution

      - l.66: order of citations ([24, 22,23]) 

      Will be fixed at proof stage depending on format of references.

      - l.77: delete space between citation and period 

      Done

      - l.77: I would delete 'say' 

      Done

      - l.86: 'but to also analyse' -> 'to analyse' 

      Done

      - l.105: remove 'we are encouraged that' 

      Done

      - l.111: 'also see' -> 'see also' 

      Done

      - l.164: 'remarkable': subjective 

      Done

      - l.189: define approx. abbreviation 

      Done

      - l.190: 'approx' -> 'approx.' 

      Revised

      - l.195: 'dramatic': subjective 

      removed

      - l.246: 'much' -> vague

      explained

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Answers to reviewers


      Reviewer #1

      Sagia et al. present a manuscript using A. nidulans as model to study different transport routes of membrane proteins from the ER to the plasma membrane. They showed in earlier work that apparently at least two different transport routes exist, one involving the classical ER-ERES-ERGIC-Golgi route, one bypassing the Golgi. Unpolarized membrane proteins use the former, apically sorted membrane proteins the latter route. The study here confirms their earlier findings, uses a better model (co-expression of representatives for both routes in the same cell) and provides additional mechanistic insights about the roles of rabs, SNARES and other important proteins of the secretory pathway. The study is thoroughly done, figures are of high quality, data and methods well described and adequately replicated.

      Thank you for your positive comments

      I do have, however, a number of comments that could help to improve the manuscript.

      -I suggest using the term polarized or apical rather than polar. Polar alone to me refers more to physico-chemical properties like water-solubility.

      Amended in most parts of the revised text.

      -introduction and discussion: I don’t think the literature about unconventional secretion bypassing the Golgi is complete, for example studies about TMED10 like Zhang, M. et al. Cell 181, 637-652 e615 (2020) or Zhang et al. Elife 4 (2015) are missing, there might be others. Is UapA a leader-less cargo that could be inserted via TMED10 translocation?

      Thank you for letting us know; we had missed these articles. More references on UPS are now added, including the Zhang et al. publications. UapA, like all transporters, is a multispan transmembrane protein with no leader peptide. In fact, we have checked the role of p24 family proteins (homologous to TMED10) in UapA trafficking. The knock-out of key p24 proteins does not affect UapA sorting to the PM (please consider these as confidential unpublished results).

      -Fig. 1C. Can these intracellular structures be characterized in more detail?

      As explained briefly to the handling editor above, and following the reviewer’s suggestion, we performed new experiments to better characterize the identity of the cargo-labeled fluorescent puncta. To do so, we used co-expression of a standard ERES marker, Sec16, in cells expressing either UapA or SynA, tagged with different fluorescent tags. More specifically, we constructed and analyzed strains co-expressing UapA-GFP/Sec16-mCherry or GFP-SynA/mCherry-Sec16 in the sec31ts genetic background, which allows synchronization and better analysis of ER exit, as described in our text. The new findings appear as Figure 5C in the revised manuscript. Notice that sec16-mCherry, introduced in the native sec16 locus by standard knock-in reverse genetics of A. nidulans (see Materials and methods), does not affect Aspergillus growth or secretion. Experiments depicted in 5C show that both cargoes, UapA and SynA, co-localize significantly (PCC ≈ 0.6) with Sec16, suggesting that most of these puncta are indeed ERES structures. Given that the puncta marked with UapA or SynA are clearly distinct (see Figures 1C, 2A, 3A, 5B), this new experiment strongly suggests that there are indeed two distinct ERES, one populated mostly by UapA and the other by SynA. Notice that, as we already outlined in our response to the editor above, a three-color approach using Sec16-BFP (or Sec13-BFP) to show directly the existence of these two populations of cargo-specific ERES in the same cell failed, as the BFP signal was problematic for colocalization studies.
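
      For readers less familiar with the colocalization measure quoted here, the Pearson correlation coefficient (PCC) between two fluorescence channels can be computed as sketched below. This is a generic illustration, not the authors' analysis pipeline; the function name and the pixel intensity values are hypothetical, and in practice the two lists would hold intensities from the same pixels of a region of interest in each channel.

```python
import math


def pearson(channel_a, channel_b):
    """Pearson correlation coefficient between paired pixel intensities."""
    n = len(channel_a)
    if n != len(channel_b) or n < 2:
        raise ValueError("channels must be paired and contain at least 2 pixels")
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in channel_a))
    sd_b = math.sqrt(sum((b - mean_b) ** 2 for b in channel_b))
    if sd_a == 0 or sd_b == 0:
        raise ValueError("PCC is undefined for a constant channel")
    return cov / (sd_a * sd_b)


# Hypothetical intensities for the same eight pixels in two channels.
cargo = [12, 40, 35, 8, 90, 77, 15, 60]
marker = [10, 30, 50, 12, 70, 95, 30, 40]
print(round(pearson(cargo, marker), 2))
```

      PCC ranges from -1 to 1; values near 0 indicate no linear relationship between the channels, and values approaching 1 indicate that bright pixels in one channel tend to be bright in the other.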

      Where is the Golgi localized in A. nidulans, is it decentralized like in yeast?

      Yes, as in S. cerevisiae, A. nidulans Golgi cisternae are individually scattered throughout the cytoplasm, also similarly to other filamentous fungi. Notice that in A. nidulans Golgi structures are moderately polarized (Pantazopoulou and Penalva 2009).

      Is the UapA at the time points shown in Fig. 1C in some sub-PM structures? To me the distribution at or near the PM is more punctate than in the steady state image shown in 1B

      The punctate appearance of PM transporters at the periphery of fungal cells is a common theme when these do not reach high, steady-state levels of accumulation. In fact, several transporters mark specific subdomains of the PM, which are more evident before they achieve their steady-state levels. For example, in yeast several amino acid and nucleobase transporters mark punctate structures that colocalize with eisosome markers (caveolin-like PM subdomains), while the proton pump ATPase Pma1 marks distinct punctate domains. Similarly, UapA and other solute transporters mark punctate structures before reaching their steady-state accumulation in the PM. Figure 1C shows the de novo synthesis of cargoes after 100 min of transcription, while Figure 1B depicts the steady-state localization of UapA and SynA after 4h. In the latter case, the PM is ‘saturated’ with UapA molecules and thus the fluorescent signal of distinct puncta ‘fuses’, creating continuous fluorescent labeling. Notice also that in several cases in our work we have also performed UapA transport assays, which provide a direct tool to test and confirm the presence of UapA in the PM (see Figures 4D or 6C).

      -Fig. 3A. To me it looks like there is actually a lot of colocalization of UapA and SynA, especially at or near the PM, where there is quite some white, punctate staining. The green fluorescence is just much stronger, overlaying the violet. Can you show separate channels and explain?

      We think the reviewer means Figure 2A, which compares UapA and SynA (Figure 3A compares UapA with Golgi markers). If so, we have quantified the colocalization and performed statistical analysis (PCC), which indicates that this visually apparent colocalization is not significant (right panel in Figure 2A). Notice also that we cannot totally exclude very minimal colocalization of UapA and SynA signals, as both cargoes mark very proximal early secretory domains (i.e., ERES or ERGIC), especially in fungal cells. In any case, in the revised Figure 2 we have also added a panel depicting separate channels, as the reviewer asks.

      -Fig. 3: In my opinion the statement that UapA "is probably sorted from an early secretory compartment, ultimately bypassing the need for Golgi maturation" is too strong at that point. You say for both UapA and SynA you don’t get significant colocalization with early Golgi/ERGIC marker, then you cannot conclude that one takes the conventional route via early-late Golgi and the other does not. What you can say is that UapA is apparently not going through late Golgi.

      The reviewer is in principle correct. However, significant colocalization with the late Golgi marker, as SynA shows, strongly suggests that this cargo has passed via the early Golgi compartment. The fact that we failed to detect significant colocalization of any cargo tested with early Golgi/ERGIC markers (e.g., SedV) is very probably due to the very rapid passage of cargoes through these compartments, which conventional widefield or confocal microscopy cannot detect. To achieve this, ultra-fast fluorescence microscopy, such as Lattice Light Sheet Microscopy (LLSM), should be used. In fact, we are currently initiating these studies, which will appear elsewhere in the near future.

      -Fig. 4C: UapA does not seem to accumulate in the ER in the Sec24 and 13 mutants but in punctate structures. This for me is unexpected, any explanations? Can you characterize that punctate staining?

      This is an interesting observation. Notice that UapA is a large homodimeric protein (e.g., 28 transmembrane domains) that oligomerizes further upon translocation into the ER membrane. Repression of Sec24, and to a lesser extent of Sec13, leads to inability to exit the ER properly. Consequently, this will lead to UapA overaccumulation in the ER, which might in turn lead to ER stress and turnover, reflected in UapA aggregates. In line with this, we have previously shown that specific mutants of UapA unable to exit the ER are indeed degraded by selective autophagy (Evangelinos et al., 2016). In contrast to UapA, SynA partitions in the entire ER without forming aggregates when sec24 or sec13 are repressed. This might be due to the fact that it is a single-pass membrane protein, much smaller than UapA, and one that is not known to form oligomers. Thus, its overaccumulation in the ER might not lead to aggregation, allowing it to diffuse laterally in the membrane of the ER. A note on this is included in the Figure legend of the revised manuscript.

      -Fig. 6D: You state that BFA "has only a very modest effect on UapA translocation to the PM". To me the PM (or very near PM) staining of UapA looks very different in the BFA treated cells, more uneven/punctate. Is there an explanation for that?

      Our explanation is the following. When BFA is added, conventional secretion is blocked and the Golgi collapses. We believe that this might also have a moderate indirect effect on cargoes bypassing the late Golgi/TGN, such as UapA (i.e., lower levels of UapA present in the PM). This is based on the fact that UapA, like conventional cargoes, requires the Q-SNARE complex SsoA/Sec9 to translocate to the PM. SsoA, being a membrane protein cargo itself, also needs to traffic to the PM. Interestingly, we have previously obtained evidence suggesting that SsoA traffics to the PM by both conventional and Golgi-bypass routes (Dimou et al., 2020). Thus, UapA translocation to the PM might indeed be partially impeded or delayed due to repression of proteins, such as SsoA (and probably Sec9), needed for its final integration into the PM bilayer. Importantly, in line with an indirect effect of BFA on the levels of UapA localized in the PM, notice that, unlike SynA, UapA was never trapped in brefeldin bodies (i.e., Golgi aggregates).

      Reviewer #1 (Significance):

      One strength of the study is the use of a model organism, A. nidulans, not cell cultures. Also, the use of both reporters, UapA and SynA, in the same cell is an advantage over previous studies using different lines and different promoters. A limitation of the study might be that it remains unclear to what extent the basic mechanism (UapA and SynA are transported to the PM in different carriers and via different routes) can be generalized to other polarized (apically?) membrane proteins versus non-polarized membrane proteins in A. nidulans, and whether a similar mechanism exists in other organisms. Some of the basic findings of the study are not new but were published by the same group. However, as the authors point out, the current study uses improved assays and extends their previous studies, advancing our understanding of the mechanisms of transport in the conventional secretory pathway and novel alternative routes. The study will be of interest for basic researchers in the trafficking field. My own expertise is transport through the secretory pathway in mammalian cells, many years ago more post-Golgi, now mostly ER-Golgi and ER itself.

      We thank the reviewer for his positive comments.

      Reviewer #2

      The idea that transmembrane proteins of the plasma membrane move from the ER to the Golgi and then to the cell surface is firmly entrenched, and the mechanisms and components of this secretory pathway have been extensively characterized. Secretory vesicles are often delivered from the Golgi to sites of polarized growth. This paper builds on previous work by the same group to provide evidence that in Aspergillus nidulans, some non-polarly localized plasma membrane proteins follow a very different pathway, which bypasses components of the conventional secretory machinery such as SNAREs that have been implicated in secretion as well as the exocyst. In particular, they systematically compare the trafficking of the SNARE SynA, which follows the conventional secretory pathway, with that of the purine transporter UapA, which apparently does not. The two proteins were co-expressed in the same cells using the same promoter. A variety of genetic and microscopy methods are used to support the conclusion that UapA reaches the plasma membrane by a route distinct from that followed by SynA.

      In my view, the authors present a convincing case. The individual experimental results are sometimes ambiguous, but the combined results favor the conclusion that UapA follows a novel pathway to the plasma membrane. I have only a few relatively minor comments.

      Thank you for your positive comments.

      1. In the Introduction and elsewhere: to my knowledge, there is no clear evidence that AP-1-containing clathrin-coated vesicles carry cargoes from the Golgi to the plasma membrane. On the contrary, as recently reported by Robinson (https://pubmed.ncbi.nlm.nih.gov/38578286/), AP-1-containing vesicles likely mediate retrograde traffic in the late secretory pathway.

      Thank you for this comment and the relevant reference. We are aware that AP-1 is likely to also mediate retrograde traffic in the late secretory pathway and/or intra-Golgi recycling, as also reported by the group of Benjamin Glick. Thus, in the revised version we added a short comment on this, plus the relevant references. Along this line, our previous work has shown that transcriptional repression of AP-1 arrests the polar localization of several apical markers in A. nidulans, and we reported that this might be due to an effect on both anterograde and retrograde trafficking. Please see “Secretory Vesicle Polar Sorting, Endosome Recycling and Cytoskeleton Organization Require the AP-1 Complex in Aspergillus nidulans”. Martzoukou O, Diallinas G, Amillis S. Genetics. 2018 Aug;209(4):1121-1138. Overall, the fact that AP-1 was found to be absolutely dispensable for UapA trafficking further strengthens our conclusion that UapA bypasses the Golgi.

      2. In Figure 2, is there any known significance to the presence of UapA in "cytoplasmic oscillating thread structures decorated by pearl-like foci as well as a very faint vesicular/tubular network"?

      At present we cannot answer this question. In order to understand what these structures represent and what their role is, we will need to employ super-resolution and ultra-fast microscopy and additional markers, which we plan to do. We suspect that they might be tubular networks, but this extends beyond the present work.

      3. SynA is related to S. cerevisiae Snc1/2, which are known to be present in late Golgi compartments due to repeated rounds of endocytosis to the Golgi and exocytosis to the plasma membrane. The SynA shown here to colocalize with PHosbp is probably present in a similar recycling loop rather than being en route to the plasma membrane for the first time. Therefore, the differential colocalization of UapA and SynA with PHosbp does not by itself provide "strong evidence that the two cargoes studied traffic via different routes" as stated in the text but might instead indicate that only SynA undergoes frequent endocytosis. The text should be amended accordingly.

      The reviewer is in principle correct. However, given that colocalization of SynA and PHosbp occurred all over the cytoplasm of hyphae and not only at the apical region, and because we record colocalization of cargoes before their steady-state accumulation to the PM, thus at a stage where recycling must be minimal, the recorded colocalization should reflect anterograde transport rather than recycling. We added this reasoning in the revised text.

      4. A missing piece of the story is a test of whether the puncta visualized for the two cargoes in Figure 5B are indeed distinct populations of COPII-containing ER exit sites. The relevant experiment would involve co-labeling of the cargoes together with a COPII marker. Three-color labeling would presumably be needed.

      This point was also raised by reviewer 1 (and reviewer 3), and we thus performed new experiments to better characterize the identity of the cargo-labeled fluorescent puncta. To do so, we used co-expression of a standard ERES marker, Sec16, in cells expressing either UapA or SynA, tagged with different fluorescent tags. More specifically, we constructed and analyzed strains co-expressing UapA-GFP/Sec16-mCherry or GFP-SynA/Sec16-mCherry in the sec31ts genetic background, which allows synchronization and better analysis of ER exit, as described in our text. The new findings appear as Figure 5C in the revised manuscript. Notice that sec16-mCherry, introduced in the native sec16 locus by standard knock-in reverse genetics of A. nidulans (see Materials and methods), does not affect Aspergillus growth or secretion. Experiments depicted in Figure 5C show that both cargoes, UapA and SynA, co-localize significantly (PCC ≈ 0.6) with Sec16, suggesting that most of these puncta are indeed ERES structures. Given that the puncta marked with UapA or SynA are clearly distinct (see Figures 1C, 2A, 3A, 5B), this new experiment strongly suggests that there are indeed two distinct ERES populations, one populated mostly by UapA and the other by SynA. Notice that, as we already outline in our response to the editor above, a three-color approach using Sec16-BFP (or Sec13-BFP) to show directly the existence of these two populations of cargo-specific ERES in the same cell failed, as the BFP signal was problematic for colocalization studies.

      Reviewer #2 (Significance):

      This study provides compelling evidence that in the fungus Aspergillus nidulans, some transmembrane transporter proteins reach the plasma membrane by a pathway that bypasses much of the conventional machinery associated with the Golgi apparatus and secretory vesicles. Although previous publications pointed toward a similar conclusion, the present work tackles the problem in a more rigorous and systematic way. These findings are important for cell biologists who study membrane traffic; it remains to be determined how prevalent this type of non-canonical secretion might be in other organisms.

      We thank the reviewer for his positive comments.

      Reviewer #3

      The manuscript by Sagia et al compares the trafficking of a polarized (SynA) with a non-polarized (UapA) transmembrane protein. In agreement with previous work of the same lab, they find that UapA reaches the plasma membrane through a Golgi-bypass route, which they characterize to some extent. Overall, the data are of good quality and the story is interesting and timely. Understanding trafficking routes that bypass the Golgi is highly interesting. Nevertheless, there are several points of criticism that I have and below is a list where I combine major and minor points together:

      Thank you for your positive comments.

      Major Comments:

      1- Is it possible that the polarized phenotype of SynA is caused by selective removal, i.e. SynA is delivered to the entire plasma membrane, but endocytosed rapidly from all areas except the tip of the hyphae. This would also result in a polarized distribution.

      This is in principle possible, but here this is not the case. SynA is polarized due to rapid local endocytosis and immediate recycling at the subapical region, known as the subapical collar. Please see:

      Taheri-Talesh N, Horio T, Araujo-Bazán L, Dou X, Espeso EA, Peñalva MA, Osmani SA, Oakley BR. The tip growth apparatus of Aspergillus nidulans. Mol Biol Cell. 2008 Apr;19(4):1439-49. doi: 10.1091/mbc.e07-05-0464.

      Hernández-González M, Bravo-Plaza I, Pinar M, de Los Ríos V, Arst HN Jr, Peñalva MA. Endocytic recycling via the TGN underlies the polarized hyphal mode of life. PLoS Genet. 2018;14(4):e1007291. Published 2018 Apr 2. doi:10.1371/journal.pgen.1007291

      This applies to all apical markers; they remain polarized by continuous local recycling after they diffuse laterally to the subapical collar.

      2- The authors describe the distribution of SynA and UapA in cells deficient of various COPII/ERES proteins. However, these data are not shown, and it is not clear how they were quantified. It would be important to add quantitative data here.

      Quantitative data are included in Figure 4C, displaying the percentages of cells with UapA either retained in the ER or reaching the PM for each background deficient in a COPII protein. Repression of SarA and Sec31 resulted in UapA retention in the ER in all analyzed cells (100%). However, repression of Sec12, Sec24, or Sec13 had a differential effect across the cell population, with UapA reaching the PM in some cells, while remaining trapped in the ER in others. To quantify these data and determine which cargo localization pattern prevails, we measured the number of cells in each category and represented them as percentages. A similar approach was used to examine the role of Golgi proteins in the trafficking of UapA and SynA (Figure 6).
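The cell-counting approach described above can be sketched in a few lines of code. This is only an illustration of the arithmetic, assuming each scored cell has been assigned one localization category; the function name, the category labels and the counts are hypothetical (the counts are chosen so that the output mirrors the 68%/32% split the authors report elsewhere in this response for sec12 repression).

```python
from collections import Counter


def localization_percentages(cell_scores):
    """Convert per-cell localization calls into percentages per category."""
    counts = Counter(cell_scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells were scored")
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical scoring of 100 cells after repression of a COPII component:
# "PM" = cargo reached the plasma membrane, "ER" = cargo retained in the ER
scores = ["PM"] * 68 + ["ER"] * 32
print(localization_percentages(scores))
```

Reporting the number of cells scored alongside each percentage lets readers judge how robust a split such as this is.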

      3- on page 8, the authors discuss the discrepancy regarding the role of Sec13. They offer as an explanation that the previous studies have been performed in strains that separately expressed the two cargoes. However, I am unable to see why and how this would be a valid explanation.

      Given that Sec13 has a variable/partial effect on UapA, we have previously been biased towards images that showed an effect on localization, as expected, and considered that the lack of an effect might have been due to inefficient repression in a fraction of cells. In our new system, we were able to directly compare UapA to SynA and found that while SynA was always affected under our conditions, the effect on UapA was still variable. Thus, the partial effect of Sec13 on UapA is physiologically valid and not a matter of insufficient repression in a fraction of cells. This shows the importance of our new improved system, where we follow the synchronous expression of two cargoes in the same cells.

      4- Why is the effect of Sec24 depletion so much stronger than of Sec12 depletion? Sec12 is the GEF for SarA, without which Sec24 should not be recruited to ERES. The explanation offered is that low amounts of Sec12 are still present and sufficient to carry out the role of this protein. What is the evidence for that?

      Sec24 is the principal receptor of cargoes responsible for their recruitment to ERES. Sec12 is the catalytic effector for SarA required for the initiation of COPII vesicle formation. The question of the reviewer is thus logical.

      However, Sec12 is indeed present at extremely low levels when expressed from its native promoter under the conditions of our experiment (minimal media). This is supported by our recent proteomic analysis, performed under similar conditions, which failed to detect the Sec12 protein, unlike all other COPII components (see Dimou et al., 2021, doi: 10.3390/jof7070560), but also by cellular studies of the group of M.A. Peñalva, who failed to detect Sec12 tagged with GFP (Bravo-Plaza et al., 2019, doi: 10.1016/j.bbamcr.2019.118551). Additionally, in yeast, immunodetection of Sec12 has been possible only in cells harboring sec12 on a multicopy plasmid, suggesting its low abundance in wild-type cells (Nakano et al., 1988, doi: 10.1083/jcb.107.3.851).

      The fact that repression of sec12 transcription via the thiAp promoter still allows 68% of cells to secrete both SynA and UapA normally, while 32% of cells are blocked in the trafficking of both cargoes, suggests that in most cells either SarA can catalyze the exchange of GDP for GTP without Sec12, maybe through a cryptic guanine nucleotide exchange factor (GEF), or that very small amounts of Sec12 remaining after repression are sufficient for significant SarA activation. Whichever scenario is true, Sec12, similarly to SarA, is not critical for distinguishing Golgi-dependent from Golgi-independent routes, as both cargoes are affected similarly. In the revised text we added a note on this issue.

      5- In Figure 5, it would help readers who are not so familiar with Aspergillus organelle morphology to explain the figure a bit better. This might appear trivial for experts, but anyone from outside this field is slightly lost.

      In the revised manuscript we added a figure panel depicting a schematic representation of A. nidulans key secretory compartments.

      6- The authors write that not seeing UapA in Golgi membranes is evidence that it does not pass through this organelle. However, when they write that SynA is never seen in cis-Golgi elements, they do not conclude that SynA bypasses the cis-Golgi.

      The fact that SynA, unlike UapA, colocalized significantly with late-Golgi/TGN and follows conventional secretion in general strongly suggests that SynA also passes through the early-Golgi. Cargo traffic through the Golgi is mediated by cisternal maturation, where an individual cisterna gradually changes its nature from an earlier to a later one, while the cargo remains inside. UapA, unlike SynA, never colocalized with any Golgi marker used and was not affected by BFA. We agree with the reviewer that we did not have direct proof for passage of UapA or SynA through the early-Golgi in the wt background, which allows for the alternative, but rather unlikely, hypothesis that neither of the two cargoes is sorted to the early Golgi and that SynA traffics directly to late-Golgi/TGN. Our inability to detect sorting of any cargo to the early-Golgi is seemingly due to ultra-fast passage of cargoes through very early secretory compartments, such as ERGIC/early-Golgi. In fact, we have obtained evidence of this using Lattice Light Sheet microscopy (results in progress, to appear elsewhere).

      7- Figure 5C: the authors claim that CopA and ArfA affect trafficking of UapA and SynA from ER to plasma membrane and assign CopA and ArfA as regulators for anterograde trafficking. I think this interpretation is not justified by the data. Depletion of CopA and ArfA will affect the Golgi apparatus in structure and function. The more straightforward interpretation is that repression of the COPI machinery results in a defect in Golgi exit and therefore retention in pre-Golgi compartments (including the ER and maybe the ERGIC should it exist in Aspergillus). The same is true for BFA treatment where there are also negative effects on ER export, which are rather indirect consequences of alterations of Golgi function and integrity. Likewise, the interpretation of the papers by Weigel et al and Shomron et al is not correct. It is more likely that COPI is recruited to the growing ERES-derived tubule (or ERGIC) to recycle proteins back to the ER. This is not necessarily a proof that COPI regulates anterograde trafficking.

      This is a highly debatable issue which our work cannot address. However, we amended the text accordingly.

      8- Figure 6: The images look like in Figure 5, yet here you don't call them ER-associated.

      The two images are not alike. In Figure 5, upon activation of Sec31 (permissive temperature) we detect mostly punctate structures resembling ERES, whereas at the nonpermissive temperature we detect a membranous network typical of the ER. Upon repression of CopA we also detect punctate structures similar to ERES. In Figure 6, we mostly detect an effect on SynA. Repression of early secretory steps (SedV, GeaA) leads to collapse of SynA in the entire ER network. Repression at later stages of Golgi maturation and post-Golgi secretion (RabO, HypB, RabE, AP-1) leads to the appearance of punctate structures, most probably Golgi aggregates.

      9- Figure 6D: How long was the BFA treatment? I am surprised that the pool of SynA preexisting at the plasma membrane seems to also be sensitive to BFA.

      Cells were grown overnight under repressed conditions for both UapA and SynA. After 12-14h, cells were shifted to derepressed conditions using fructose as carbon source. BFA was added after 90min of cargo derepression, while both cargoes were still in cytoplasmic structures, so there was no preexisting SynA or UapA at the PM (see also Figure 1C). Subcellular localization of both cargoes was studied for 60min after BFA treatment.

      10- This might be beyond the scope of this study, but as far as I know UapA is not N-glycosylated. Would the introduction of an N-glycosylation site shift it towards the Golgi-based route?

      Thank you for this suggestion. We have performed this experiment, adding a glycosylation site on UapA, based on the glycosylation sites found in its mammalian homologues. We did not detect any effect on the UapA trafficking route or its activity. As the reviewer recognizes, this goes beyond the scope of this study and thus we did not include it in the manuscript. Differential cargo glycosylation is, however, an important issue in respect to different trafficking routes, and we plan to investigate it systematically.

      Minor Comments

      1- This might be just a personal preference, but I think that the term polar is misleading, because it implies something about the polarity of the amino acids. I think "polarized" might be the more common term. Anyway, this is just a minor point and just a suggestion from my side.

      Amended in the revised text.

      2- The paper by the Saraste lab should be mentioned and discussed (PMID: 16421253), which I think is very relevant to the current story.

      We thank the reviewer for pointing out this important publication. In that case, the Rab1 GTPase defined a pathway connecting a pre-Golgi intermediate compartment with the PM in mammalian nerve cells. Thus, the Saraste lab publication is indeed along the lines of findings supporting that Golgi-independent unconventional cargo trafficking routes initiate at very early secretory compartments. Notice, however, that RabO, the A. nidulans homologue of Rab1, which in their case was essential for direct cargo sorting from the ERES/ERGIC to the PM, was dispensable for Golgi bypass in our system. The Saraste lab article is now mentioned and discussed.

      3- Having worked with ERES for over two decades, I find it strange to see it written ERes. I see no reason why ER exit sites in Aspergillus should be abbreviated differently from all other types of cells (yeast, drosophila, worms, mammals). I think that the entire acronym should be capitalized.

      Amended in the text.

      4- When discussing the data about the partial effect of Sec13, it would be good to refer to a previous paper by the Stephens lab that showed that silencing Sec13/31 results in a defect in trafficking of collagen, but not of VSVG (PMID: 18713835).

      We thank the reviewer for also pointing out the publication of the Stephens lab, now mentioned in the revised text. Noticeably, in that case silencing of both Sec13 and Sec31 has no effect on the trafficking of specific cargoes, whereas in our case Sec31 is still absolutely needed for both conventional and Golgi-independent secretion of SynA and UapA, respectively.

      Reviewer #3 (Significance):

      Overall, the data are of good quality and the story is interesting and timely. Understanding trafficking routes that bypass the Golgi is highly interesting. The main weakness is the lack of mechanistic understanding of the Golgi-bypass pathway. In addition, the study is limited to two proteins as representatives of polarized vs. non-polarized proteins. The main target audience for this paper are scientists working in the area of secretion and trafficking in the secretory pathway.

      We thank the reviewer for his positive comments.

      We are aware that the mechanistic details of Golgi bypass are missing. Dissecting them is our next goal, via various genetic and biochemical approaches and the use of super-resolution and ultra-fast microscopy.

      Reviewer #4

      In this study, Sagia et al investigate the trafficking of different secretory cargo in Aspergillus nidulans under conditions that repress expression of transport factors or block stages in membrane trafficking. The primary approach is to conduct dual live-cell imaging of GFP-tagged UapA (plasma membrane localized purine transporter) and SynA (plasma membrane R-SNARE) after their simultaneous derepression to monitor trafficking routes. In germlings, both secretory proteins are detected in non-overlapping intracellular compartments and puncta after 60-90 min of derepression. After 4-6 hrs, SynA localizes to hyphal tips whereas UapA localizes to non-polar regions of the PM. Colocalization studies do not show UapA overlap with Golgi markers (SedV, PH-OSBP) during its biogenesis whereas SynA displays significant co-localization. Repression of COPII and COPI components generally block transport of both cargos to the PM and cause accumulation in ER compartments, although there are some differential effects on UapA and SynA localization. Finally, repression of other transport factors (ER-Golgi SNAREs, Golgi transport factors, and exocytic machinery) had differential effects on UapA and SynA localization over time with UapA reaching the plasma membrane in many instances and SynA accumulating in intracellular compartments.

      Based on these observations, the authors conclude that UapA and SynA follow distinct trafficking routes to the plasma membrane where SynA uses a canonical SNARE-dependent secretory pathway route and UapA follows a non-canonical route that may bypass Golgi compartments. The study is extensive and supports the model that biogenesis of SynA and UapA follow distinct processes. However, there are some complexities that may limit interpretation. First, the cargo studied are targeted to the ER differently. UapA is a multispanning transmembrane protein that is likely dependent on the Sec61 translocon for co-translational membrane insertion and will involve ER chaperones and quality control machinery for its biogenesis. SynA will depend on the tail-anchored machinery (GET/TRC pathway) for insertion into the ER and is processed by cytosolic factors/chaperones. Therefore, the sites of ER insertion and the rates of biogenesis of these cargoes will be different. In addition, the repression of trafficking machinery used in this study appears to be variable and may exert partial blocks on intracellular transport stages. Regardless, the study clearly documents that SynA and UapA follow distinct biogenesis and transport processes when co-expressed in cells under experimentally controlled conditions.

      Thank you for your positive comments.

      To our knowledge there is no evidence suggesting that SynA translocates via a tail-anchored machinery (GET/TRC pathway) and not through the translocase. Despite this, we agree with the reviewer that translocation to the ER, as well as exit from it, might be cargo-dependent, especially when it concerns proteins that differ greatly in size, structure and oligomerization. Thus, the rate of biogenesis of UapA and SynA is probably quite different. However, this still does not dismiss our basic conclusion that the two cargoes follow distinct routes to traffic to the PM. The ‘problem’ of variable transcriptional repression of some trafficking-related proteins is solved by comparing the relative effect on the two cargoes in the same cells, and this is in fact the advantage of our new system. Importantly, notice that we took care to use conditions of repression where SynA trafficking by the conventional path was totally abolished and compared it to UapA.

      1. It was not clear if the translation, ER insertion and folding of UapA and SynA are fully synchronous. Is it possible that the rate of UapA synthesis and transport to the plasma membrane is substantially faster than for SynA? The imposition of transport blocks could trap SynA and not UapA if this cargo was at later transport stages.

      As already discussed above, translation, ER insertion and folding of UapA and SynA might indeed be different. This might somehow affect the trafficking path followed, but this issue is beyond the scope of this work. Notice, however, that the transcription of both cargoes is kept fully repressed during establishment of repression of secretion. Only when repression and blocking of secretion is established (12-14 h germination), as verified by Western blot analysis, do we derepress the transcription of UapA and SynA, expressed from the same promoter, and follow their dynamic subcellular localization. Hence, this system ensures that both cargoes start from the earliest transport stage, the ER, upon imposition of transport blocks.

      2. In repressing transport factors (e.g., SarA, Sec12, Sec24, Sec13, SedV, RabE), it is clear that under thiamine repressing conditions these cells do not grow or have greatly reduced growth rates. However, it was not clear if proteins are depleted to the same extent in cells after repression for 12-14 hr or 16-22 hr, as mentioned in the methods. Indeed, in some cases depleted cells display different cargo localization patterns, for example 67% of cells show normal localization of UapA and SynA after sec12 repression and 33% show ER accumulation of both cargoes. There is differential localization of UapA and SynA in many cases where transport factors are repressed, but this could be due to partial inhibition and not complete blocks. It would be helpful to clearly indicate the time points and conditions in each of the figure legends as in points 3-5 below.

      In the revised manuscript we did our best to clearly indicate the time points and conditions in each of the figure legends. Differential localization of UapA and SynA in many cases where trafficking factors are repressed is indeed an interesting outcome. Inefficient repression was dismissed based on the lack of colony growth (see relevant growth tests of SarA, Sec24, Sec13, Sec31, SedV, GeaA, RabO, RabE, Ykt6, Sft1, SsoA and Sec9), but also by western blots (e.g., Sec24, Sec13, Sec31 or Sec9 shown in the present manuscript, or other trafficking proteins studied previously; Martzoukou et al., 2018; Dimou et al., 2020). Repression of Sec12 and HypB, and to a lower degree AP-1, allowed formation of small and/or compact colonies, but even in these cases the relevant proteins could not be detected in western blots, confirming efficient repression.

      3. In Fig 4A immunoblot, HA-tagged proteins are not detected after thiamine repression. Please state the time of thiamine repression used before protein extraction and blot. Is this for the same length of time as for cells shown in panel 4C? It would also be helpful to state the time of cargo derepression before capturing images in 4C. The methods section mentions 12-14 hr or 16-22 hr of growth, presumably with thiamine in the culture, and then 1-8 hr or 60 min to 4 hr of cargo derepression before imaging. Please specify.

      The time of thiamine repression before protein extraction was 16-18h. The same repression time was used for experiments shown in Figures 4C and 6C (ER/COPII and Golgi/post-Golgi repression respectively). More specifically, for microscopy experiments cells were grown in the presence of glucose and thiamine for 12-14h (repressed UapA/SynA and thiAp expressed gene). After this time, cells were shifted to fructose and thiamine for 4h (derepression of UapA/SynA and repression of thiAp expressed gene). In both cases (protein extraction and microscopy experiments) the total time of thiamine repression was 16-18h.

      1. For the thiA-copA and thiA-arfA repression experiments (Fig 5C), the methods section states that thiamine was not added ab initio in the culture, but after an 8 h time window without thiamine at the start of spore incubation. This is interpreted to mean that repression was for a shorter period of time than the 12-14 hr overnight growth. However, the figure legend states that De novo synthesis of cargos takes place after full repression of CopA and ArfA is achieved (>16 hr). Please clarify.

      We think that the reviewer has conflated repression of cargo synthesis (via alcAp+glucose) with repression of trafficking proteins (via thiAp+thiamine). Please see Materials and methods. We also clarify our protocol here:

      For the thiAp-copA and thiAp-arfA repression experiments, addition of thiamine ab initio in the culture leads to total arrest of spore germination and germling formation. Thus, we added an 8-hour time window without thiamine to allow conidiospores to germinate until the stage of young germlings, under conditions where cargo expression via the alcAp was repressed by glucose. Subsequently, thiamine was added to the media (16-18 h) to repress CopA and ArfA, while cargo expression remained glucose-repressed. The transcriptional repression of the cargoes UapA and SynA was maintained for a longer period (24-26 h) compared to other repression experiments, but longer times of cargo repression do not make any difference, as full repression is already achieved at 12 h. De novo cargo trafficking was followed the next day by eliciting derepression, via a shift to fructose media, while still maintaining thiamine to repress CopA or ArfA.
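
      The timing arithmetic behind the two protocols can be checked with a short sketch. All durations (in hours) are taken from the responses above; the function and variable names are illustrative assumptions, not part of the authors' methods.

```python
# Sketch: total repression times for the two protocols described above.
# Durations are (min, max) ranges in hours, taken from the response text.
# Names are hypothetical and used only for illustration.

def add_ranges(*ranges):
    """Sum (min, max) hour ranges element-wise."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))

# Standard protocol: glucose + thiamine overnight, then fructose + thiamine.
overnight_glucose_thiamine = (12, 14)
fructose_thiamine = (4, 4)
total_thiamine_standard = add_ranges(overnight_glucose_thiamine, fructose_thiamine)

# thiAp-copA / thiAp-arfA protocol: germination on glucose without thiamine,
# then thiamine added while cargo expression stays glucose-repressed.
germination_no_thiamine = (8, 8)
thiamine_repression = (16, 18)
total_cargo_repression = add_ranges(germination_no_thiamine, thiamine_repression)

print("Thiamine repression, standard protocol (h):", total_thiamine_standard)
print("Cargo (glucose) repression, copA/arfA protocol (h):", total_cargo_repression)
```

      Under these assumptions the totals match the 16-18 h of thiamine repression and the 24-26 h of cargo repression quoted in the text.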

      1. In Fig 6D, BFA treatment is shown to trap SynA in Golgi aggregates while UapA still reaches the plasma membrane. Please state the time of BFA treatment before collecting these images. Do longer treatments with BFA before cargo derepression cause accumulation of UapA in intracellular compartments?

      As mentioned above (response to Reviewer #3's comment 9), cells were grown overnight under repressed conditions for both UapA and SynA. After 12-14h cells were shifted to derepressed conditions using fructose as carbon source. BFA was added after 90 min of cargo derepression, while both cargoes were still in cytoplasmic structures, so there was no preexisting SynA or UapA at the PM (see also Figure 1C). We have not noticed any different effect on UapA trafficking after a maximum of 1h of BFA treatment.

      1. A minor point, but on page 21 the methods state that "cells were shifted down to the permissive temperature (25 C), to restore the secretory block...". Suggest changing to "to reverse the secretory block..."

      Modified accordingly

      Reviewer #4 (Significance):

      This manuscript nicely builds on a developing line of investigation in the Aspergillus nidulans model that specific plasma membrane proteins are efficiently delivered to the cell surface in a pathway that is distinct from the canonical secretory pathway. Previous work from this lab has suggested that a subpopulation of COPII carriers can bypass the Golgi for delivery of specific cargo to the plasma membrane. The current study uses dual expression of UapA-GFP and mCherry-SynA to provide further support for this model. Molecular definition of a direct ER to PM transport pathway for secretory cargo would be a significant advance to a broad audience. This study provides additional depth and support that such a pathway exists but does not define how COPII vesicles or related intermediates are transported to the PM.

      Again, thank you for your positive comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to reviewers (minor points):

      We thank all reviewers for their very helpful suggestions and greatly appreciate their positive evaluation of our work.

      Reviewer #1:

      Ad 1) The reviewer states: Fig 5 While the data very nicely show that CPX and Syt1 have interdependent interactions in the chromaffin neurons, this seems to be not the case in neurons, where the loss of complexins and synaptotagmins have additive effects, suggesting independent mechanisms (eg Xue et al., 2010). This would be a good opportunity to discuss some possible differences between secretion in endocrine cells vs neurons.

      We greatly appreciate the insightful suggestion by the reviewer. To accommodate the reviewer’s suggestion, we now discuss this issue on page 21, line 486-491: “In murine hippocampal neurons, loss of CpxI and Syt1 has additive effects on fast synchronous release, suggesting independent mechanisms (Xue et al., 2010). On the other hand, the same study also showed that Syt1 heterozygosity fails to reduce release probability in wild-type neurons, but does so in the absence of Cpx, again suggesting that Cpx and Syt1 may functionally interact in Ca2+-triggered release.”

      Ad 2) The reviewer states: Fig 8 Shows an apparent shift in Ca sensitivity in N-terminal mutants suggesting a modification of Ca sensitivity of Syt1. Could there be also an alternative mechanism, that explains this phenotype which is based on a role of the n-term lowering the energy barrier for fusion, that in turn shifts corresponding fusion rates to take place at lower Ca saturation levels?

      We fully agree with the reviewer. While our data indicate that Cpx and Syt1 act in a dependent manner in accelerating exocytosis, they do not provide decisive evidence that the NTD of CpxII directly modulates the Ca2+ affinity of Syt1, an issue that we discuss on page 23, line 523-529: “The results favor a model wherein the CpxII NTD either directly regulates the biophysical properties of the Ca2+-sensor by increasing the apparent forward rate of Ca2+-binding or indirectly affects SytI-SNARE or SytI-membrane interactions, thereby, lowering the energy barrier of Ca2+-triggered fusion.”

      Reviewer #2:

      Ad 1) The reviewer states: The authors provide a "chromaffin cell-centric" view of the function of mammalian Cplx in vesicle fusion. With the exception of mammalian renal ribbon synapses (and some earlier RNAi knockdown studies that had off-target effects), there is very little evidence for a "fusion-clamp"-like function of Cplxs in mammalian synapses. At conventional mammalian synapses, genetic loss of Cplx (i.e. KO) consistently decreases AP-evoked release, and generally either also decreases spontaneous release rates or does not affect spontaneous release, which is inconsistent with a "fusion-clamp" theory. This is in stark contrast to invertebrate (D. m. and C. e.) synapses where genetic Cplx loss is generally associated with strong upregulation of spontaneous release, providing support for Cplx acting as a "fusion-clamp".

      We agree with the reviewer that it is difficult to reconcile contradictory findings regarding the role of Cpx in membrane fusion in vertebrates and invertebrates or between murine hippocampal neurons and neuroendocrine cells. On the other hand, we respectfully disagree with the statement of providing a "chromaffin cell-centric" view of the function of mammalian Cplx in vesicle fusion. In fact, a large number of model systems (in vitro and in vivo studies) support a scenario where complexin takes center stage in clamping of premature vesicle release. For example, in vitro analyses using a liposome fusion assay (Schaub et al., 2006, Nat Struct Mol Biol 13, 748; Schupp et al., 2016) or HeLa cells that ectopically express “flipped” SNAREs on their cell surface (Giraudo et al., 2008, JBC 283, 21211) showed that complexin can inhibit the SNARE-driven fusion machinery. Likewise, several studies boosting complexin action by either genetic overexpression or peptide supplementation have provided evidence for the complexin clamp function in neuronal and non-neuronal cells (e.g. Itakura et al., 1999, BBRC 265, 691; Liu et al., 2007, Biochemistry 72, 439; Abderrahmani et al., 2004, J Cell Sci 117, 2239; Archer et al., 2002, JBC 277, 18249; Tang et al., 2006, Cell 126, 1175; Vaithianathan et al., 2013, J Neurosci 33, 8216; Roggero et al., 2007, JBC 282, 26335).

      In addition, chromaffin cells enable the investigation of secretion on the background of a well-defined intracellular calcium concentration. Indeed, CplxII knock-out in chromaffin cells demonstrated an enhanced tonic release which is evident at elevated levels of [Ca]i (>100nM), but absent at low resting [Ca]i (Dhara et al., 2014). Given this observation, it is tempting to speculate that variations in [Ca]i among the different preparations may contribute to the deviating expression of the complexin null phenotype in different preparations.

      Ad 2) The reviewer states: The authors use a Semliki Forest virus-based approach to express mutant proteins in chromaffin cells. This strategy leads to a strong protein overexpression (~7-8 fold, Figure 3 Suppl. 1). Therefore, experimental findings under these conditions may not necessarily be identical to findings with normal protein expression levels.

      As shown in Fig. 4, we use the secretion response of wt cells as a control so that we can assess the specificity and quality of the rescue approach in our experiments. In addition, the comparative analysis of the CpxII mutants was performed with respect to the equally overexpressed CpxII wt protein (Fig. 3 Suppl. 1), which we used as a control to determine the standard response under these conditions.

      Ad 3) The reviewer states: Measurements of delta Cm in response to Ca2+ uncaging by ramping [Ca2+] from resting levels up to several µM over a time period of several seconds were used to establish changes in the release rate vs [Ca2+]i relationship. It is not clear to this reviewer if and how concurrently occurring vesicle endocytosis together with a possibly Ca2+-dependent kinetics of endocytosis may affect these measurements.

      By infusing bovine chromaffin cells with 50µM free Ca2+, Smith and Betz have shown that the total capacitance increase is dominated by exocytosis and that significant endocytosis only sets in after 3 minutes (Smith and Betz, 1996, Nature, 380, 531). Along the same lines, we previously showed that mouse chromaffin cells (infused with 19µM free calcium over 2 minutes) responded with a robust increase in membrane capacitance which strongly correlated with the number of simultaneously recorded amperometric events monitoring fusion of single vesicles (Dhara et al., 2014, Fig. 5B). Thus, capacitance alterations recorded under tonic intracellular Ca2+ increase in chromaffin cells are solely due to exocytosis and are not contaminated by significant endocytosis. As our Ca2+ ramp experiments were carried out for 6 seconds and the intracellular free [Ca]i did not exceed 19 µM, the observed phenotypical differences between the experimental groups are most likely due to changes in exocytosis rather than endocytosis.

      Ad 4) The reviewer states: It should be pointed out that an altered "apparent Ca2+ affinity" or "apparent Ca2+ binding rate" does not necessarily reflect changes at Ca2+-binding sites (e.g. Syt1).

      We fully agree with the reviewer’s comment. As pointed out also in the response to reviewer 1, our experiments do not provide decisive evidence that the NTD of CpxII directly modulates the Ca2+ affinity of Syt1, an issue that we discuss on page 23, line 523-529: “The results favor a model wherein the CpxII NTD either directly regulates the biophysical properties of the Ca2+-sensor by increasing the apparent forward rate of Ca2+-binding or indirectly affects SytI-SNARE or SytI-membrane interactions, thereby, lowering the energy barrier of Ca2+-triggered fusion.”

      Ad 5) There are alternative models on how Cplx may "clamp" vesicle fusion (see Bera et al. 2022, eLife) or how Cplx may achieve its regulation of transmitter release without mechanistically "clamping" fusion (Neher 2010, Neuron). Since the data presented here cannot rule out such alternative models (in this reviewer's opinion), the authors may want to mention and briefly discuss such alternative models.

      The study by Bera et al. reiterates the model proposed by the Rothman group, which attributes the clamping function of Cpx to its accessory alpha helix hindering the progressive SNARE complex assembly. We have explicitly stated this issue in the original version of the manuscript (page 19, line 425): “As the accessory helix of Cpx has been found to bind to membrane proximal cytoplasmic regions of SNAP-25 and SybII (Malsam et al., 2012; Bykhovskaia et al., 2013; Vasin et al., 2016), an attractive scenario could be that both domains of CpxII, the CTD and the accessory helix, synergistically cooperate to stall final SNARE assembly”. In this context, we will now also cite the study by Bera et al.

      A related view of the function of complexin suggested that it may act as an allosteric adaptor for sytI (Neher 2010, Neuron). Here, rather than postulating independent "clamp" and "trigger" functions for the dual action of complexin, these were explained as facets of a simple allosteric mechanism by which complexin modulates the Ca2+ dependence of release. Yet, this interpretation appears to be difficult to reconcile with the observation of our and other laboratories, showing that the fusion-promoting and clamping effects are separable (e.g. Dhara et al., 2014; Lai et al., 2014; Makke et al., 2018; Bera et al., 2022).

      Some parts of the Discussion are quite general and not specifically related to the results of the present study. The authors may want to consider shortening those parts.

      Considering the contrary findings in the field of SNARE-regulating proteins, the authors hope that the reviewer will agree that it is necessary to discuss the new observations in a broader context, as also acknowledged by the first reviewer.

      Last but not least, the presentation of the results could be improved to make the data more accessible to non-specialists, this concerns providing necessary background information, choice of colors, and labeling of diagrams.

      Done

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Regarding figures: 

      (1) Please use clearly distinct colors in diagrams. For example, in Figure 2 Suppl. 3, four different shades of red (or reddish) are used to color the traces and the respective bars. These different shades of red are difficult to discriminate. In Figure 5 Suppl. 1, the two greens are nearly indistinguishable.  

      Done

      (2) RRP size and SRP size on the one hand, and SR rate on the other represent different quantities which are measured in different units. Please use a separate y-axis for the SR (a rate measured in fF/s) and do not combine with RRP and SRP (pool sizes measured in fF). This would also automatically alleviate the need for axis breaks in the plots of RRP size and SRP size. In general, please do not use axis breaks which make interpretation of data unnecessarily more complicated.  

      In order to clarify the display, we now define the different units together with the quantified parameter (e.g. RRP [fF], SRP [fF], SR [fF/s]) allowing us to omit a second axis in those subpanels.

      (3) When plotting bar graphs showing mean tau_RRP, mean tau_SRP, and mean delay, please always use the correct y-axis labels, i.e. use "tau_RRP", "tau_SRP" and "delay" as y-axis labels as it was done for example in Figure 4D, and do not use "tau_RRP", "tau_SRP" and "delay" as x-axis labels as it was done for example in Figure 1D and many other figure panels.  

      We have standardized the figure display. Yet, we would prefer to keep our way of subpanel labelling, which states the parameter underneath the bar graph and thereby makes the results more accessible.

      (4) Are the asterisks indicating statistical significance perhaps missing in Figure 4D, middle panel (tau_SRP)?

      There was no statistically significant difference (wt vs cpxIIko+CpxII EA, P=0.0826, Kruskal-Wallis with Dunn’s post hoc test).

      (5) According to the Results section (pages 12 to 13), I assume that in Figures 6 and 7 the labels "+Cplx XYZ" are used by the authors to identify an overexpression of Cplx XYZ in a Cplx WT background. The legend text reads however " ... cells expressing either Cplx2 wt or the mutant ...", which would not be correct. Please check.

      We have changed the formulations to “overexpression” accordingly.

      (6) The x-axis unit in Figure 8C is likely "µM" and not "M".

      Done.

      (7) The abbreviations "CplxII LL-EE" and "CplxII LL-WW", and "CplxII LLEE" and "CplxII LLWW" are very similar but refer to different mutants. Could you please think of a more specific and unambiguous abbreviation? Perhaps "CplxII L124E-L128E"?  

      We have changed the abbreviations, accordingly (i.e. CpxII L124E-L128E).  

      Regarding the manuscript text:  

      Line 65: "prevents" instead of "impairs"? 

      done

      Line 67: why "in vivo"? 

      We changed the formulation to ‘Several’

      Line 83: "in addition to the clamping function ..." This is misleading. Many of the studies listed here did not provide evidence for enhanced spontaneous release following Cplx loss and often observed the opposite, reduced spontaneous release. The enhanced delayed release was observed by Strenzke et al 2009 J.Neurosci. and by Chang et al. 2015 J.Neurosci. (which the authors may want to cite). However, that enhanced delayed release occurred despite reduced spontaneous release indicating that it is not simply the result of a missing "fusion clamp". 

      To accommodate the reviewer’s suggestion, we have changed the formulation to “Independent of the clamping function of Cpx….”

      Line 104: "speeds up exocytosis that is controlled by the forward rate of Ca2+ binding" This is difficult to understand without context.  

      We have now added the corresponding citations (Voets et al., 2001; Sorensen et al., 2003), which showed that exocytosis timing in chromaffin cells is largely determined by the kinetics of Ca2+-binding to SytI.

      Line 116: "Cplx2 knock out ..." Please provide (here or earlier in the manuscript) information to the reader about which Cplx paralogs are expressed in chromaffin cells.  

      We now state on line 111 that “CpxII is the only Cpx isoform expressed in chromaffin cells (Cai et al., 2008)”

      Line 118: "=~" either "=" or "~". 

      done

      Line 120: "instead" seems superfluous.

      done

      Line 272: "calcium binding rates" should perhaps better read "apparent calcium binding rates". 

      done

      Line 290: "enhancing SytI's Ca2+ affinity" should perhaps better be "enhancing the apparent Ca2+ affinity of the release machinery". Ca2+ binding kinetics is never directly assayed here.

      We agree and have phrased the sentence accordingly.

      Line 300: "Expression of Cplx ... in Syt1 R233Q ki cells, ..." Perhaps better "Overexpression of Cplx ... in Syt1 R233Q ki/Cplx2 wt cells, ..." for clarification?

      done

      Lines 313ff: What is assayed here is the apparent Ca2+ binding kinetics and apparent KD values of the release machinery. Ca2+ binding to Syt1 is never directly measured!  

      We agree and have changed the wording accordingly to “CpxII NTD supports the forward rate of calcium binding to SytI in accelerating exocytosis”

      Line 347: "Complexin plays a dual role ..." This is partially misleading. It does so in chromaffin cells and D.m. and C.e. NMJs but not at conventional mammalian synapses. 

      We agree and have changed the formulation to “In many secretory systems, Complexin plays a dual role in the regulation of SNARE-mediated vesicle fusion”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors): 

      The authors should perform experiments to answer this question: does Cav3 transcription increase in the G369i-KI, or is there instead some post-transcriptional modulation that permits surface expression of functional Cav3-containing channels in the absence of typical HVA Ca conductances? Also, the authors should determine whether G369i-KI can mediate Ca2+ release from intracellular stores and whether release from stores is upregulated as Cav3-containing channel expression (or function) is increased. 

      We performed transcriptomic (drop-seq) analysis to test whether a Cav3 subtype is upregulated in cones of G369i KI mice. These experiments show that, consistent with previous studies (PMID 35803735, 26000488), Cacna1h appears to be the primary Cav3 subtype expressed in mouse cones. However, as shown in new Supp.Fig.S3, there was no significant difference in the levels of Cacna1h transcripts in WT and G369i KI cones. Therefore, we propose that there may be some post-transcriptional modification, or alteration in a pathway that regulates channel availability, that enables the contribution of Cav3 channels to the whole-cell Ca2+ current in the absence of functional Cav1.4 channels in cones.

      We also performed Ca2+ imaging experiments in WT vs G369i KI cone terminals to assess whether the diminutive Cav3 current in G369i KI cone terminals may be compensated by upregulation of a Ca2+ signal such as from intracellular stores. Arguing against this possibility, depolarization-evoked Ca2+ signals in G369i KI cones were dramatically reduced compared to WT cones (new Fig.9). 

      Reviewer #2 (Recommendations For The Authors): 

      Major points- 

      (1) It is stated in too many places that cone features in the Cav1.4 knock-in are "intact", preserved, or spared, but this representation is not accurate. There are two instances in this study that qualify as intact when comparing KI to WT: 1) the photopic a-waves in the Cav1.4 knock-in (also demonstrated in Maddox et al 2020) and 2) latency to the platform (current MS, Figure 7f). However, in the numerous instances listed below, the authors compared the Cav1.4 knock-in to the Cav1.4 knock-out, and then referred to the KI as exhibiting intact responses. The reference point for intactness needs to be wildtype, as appropriately done for Figures 2 and 3, and when comparing the KI to the KO the phrasing should be altered; for example: "the KI was spared from the extensive degeneration witnessed in the KO....". 

      In most cases, we clearly note that there are key differences in the WT and the G369i KI cone synapses, which highlight the importance of Cav1.4-specific Ca2+ signals for certain aspects of the cone synapse. We disagree with the reviewer on the point that we did not often use the WT as a reference since most of our experiments involved comparisons of only WT and G369i KI (Figs. 3-6) or WT, G369i KI, and Cav1.4 KO (Figs.1,7—and in these cases comparisons specifically between WT and G369i KI mice were included). We used “intact” as a descriptor for G369i KI cone synapses since these are actually present, albeit abnormal in the G369i KI retina, whereas cone synapses are completely absent in the Cav1.4 KO retina. To avoid confusion, we modified our use of “intact” and “preserved” where appropriate.

      A. Abstract, line 34 to 35: ".......preserved in KI but not in KO.". 

      Abstract was rewritten and this line was removed.

      B. Line 36: "....synaptogenesis remains intact". The MS documents many differences in the morphology of KI and WT cones (immunofluorescence and electron microscopy data), which is counter to an intact phenotype. 

      The sentence was: “In CSNB2, we propose that Cav3 channels maintain cone synaptic output provided that the Ca2+-independent role of Cav1.4 in cone synaptogenesis remains intact.”

      Here the meaning of “intact” refers to the Ca2+ -independent role of Cav1.4, not synapses. Thus, we have left the sentence unchanged.

      C. This strikes the right balance, lines 67 to 68: "....although greatly impaired.....". 

      D. Line 149, "Cone signaling to a postsynaptic partner is intact in G369i KI mice". This description is inaccurate. Here there is only WT and KI, and the text reads as follows in line 162: "terminals (Figure 6b). The ON and OFF components of EPSCs in G369i KI HCs were measurable, although lower in amplitude than in WT (Figure 6a,b)." Neither "measurable" nor "lower in amplitude" meet the definition of "intact", and actual numerical values are lacking in the text. 

      We have added results showing that there are no light responses in the Cav1.4 KO horizontal cells and have modified the sentence to: “Cone synaptic responses are present in horizontal cells of G369i KI but not Cav1.4 KO mice”. 

      We have modified discussion of these results as (line 210-213): “Consistent with the lack of mature ribbons and abnormal cone pedicles (Fig.1), HC light responses were negligible in Cav1.4 KO mice (Fig.8a,b). In contrast, the ON and OFF responses were present in G369i KI HCs although significantly lower in amplitude than in WT HCs (Fig. 8a,b).”

      E. Please add a legend to Figure 6a to indicate the intensities. The shape of the KI responses is different from the control which is worthy of discussion: i) there is no clear cessation of HC EPSCs in the KI during the light ON period (when release stops, Im fluctuations should be minimal), and ii) the "peaked" appearances of the initial 500ms of the On and Off periods are very similar in shape for the KI (hard to interpret in the same fashion as a control response). How were the On and Off amplitudes analyzed? Furthermore, the OFF current is not summarized in Figure 6D, but should not this be when Cav3 should be opening and triggering release: Off response-EPSC? Lastly, Figure 6b,d shows a ~70% reduction in On-current in the KI, and the KI example of 6b an 80% reduction in Off current compared to WT. Yet, the only place asterisks are used to indicate sig diff is the DNQX data within each genotype in Fig 6d. These data cannot be described as showing "intact" KI responses, and the absence of numerical and statistical values needs to be addressed. 

      New Fig.8a depicting the horizontal cell light responses has been modified to include the legend indicating light intensities. The ON and OFF amplitudes were analyzed as the peak current amplitudes. This information has been added to the legend.

      The reviewer is correct in that the OFF response represents the EPSC whereas the ON response represents the decrease in the EPSC with light. To avoid confusion, we changed the y axis label for the averaged data to read ON or OFF “response” rather than “current” in new Fig.8b.

      As the reviewer suggests, the more transient nature of the KI response during the light ON period could result from aberrant continuation of vesicular release during the light-induced hyperpolarization of cones in the KI mice, in contrast to the prolonged suppression of release by light which is evident in the WT responses. We speculated on this difference as follows (lines 237-241):

      “In addition to its smaller amplitude, the transient nature of the ON response in G369i KI HCs suggested inadequate cessation of cone glutamate release by light (Fig.8b). Slow deactivation of Cav3 channels and/or their activation at negative voltages20 could give rise to Ca2+ signals that support release following light-induced hyperpolarization of G369i KI cones.”

      We added asterisks to new Fig.8b,d indicating statistical differences and a description of the tests in the legend.

      F. line 168 the section titled "Light responses of bipolar cells and visual behavior is spared in G369i KI but not Cav1.4 KO mice". 

      Changed to: “Light responses of bipolar cells and visual behavior are present in G369i KI but not Cav1.4 KO mice”

      Last sentence of ERG results, 189-190: "These results suggest that cone-to-CBC signaling is intact in G369i KI mice.". "Spared and intact" are not accurate descriptions. The ERG data presented here shows massive differences between WT and the KI, except in the instance of a-waves.

      This sentence was removed.

      As for Figure 6, the results text related to Figure 7a-d does not present real numbers for ERG responses, and there is no indication of significant differences there or in the Figure panels. For instance, in Figure 7b, b-waves in KI are comparable to KO, except at the two highest-intensity flashes that show KI responses ~20% the amplitude of WT. Presentation of KI and KO data on a 6- to 10-fold expanded scale higher than WT can be misleading: a quick read of these Figure panels might make one incorrectly conclude that the KI is intact while the KO is impaired when compared to WT. The Methods section needs more details on the ERG analysis (e.g. any filtering out of oscillatory potentials when measuring b-wave, and what was the allowable range of time-to-peak for b-wave amplitude, etc.).

      The vertical scaling of the ERG results in new Fig.10c,d has been changed so as to reflect clearly diminished responses of the KO and KI vs the WT. Further details regarding the ERG analysis were added to the Methods section.

      G. Can you point to other studies that have used the "visible platform swim test" used in Figure 7e, f, and specify further how mice were dark/light adapted prior to the recordings? 

      As referenced in the Methods, original line 674, the methods we used for the swim test were described in our previous study (PMID 29875267). Other studies that have used this assay include PMIDs: 28262416, 26402607.

      (2) The Maddox et al 2020 study does not safely address whether rods have a residual T-type Ca2+ current in the Cav 1.4 KO or KI. The study showed that membrane currents measured from rods in the KI and KO retina were distinct from WT, supporting their claim that L-type Ca2+ current is absent in the KI and KO. However, the recordings had shortcomings that challenge the analysis of Ca2+ currents: i) collected at room temp (22-24°C), ii) at an unknown distance from the terminal (uncertain voltage clamp), iii) with a very slow voltage ramp rate that is not suitable for probing T-type currents (Figure 1d Maddox 2020, 140 mV over 1 sec: 7msec/1mV), and iv) at a signal-to-noise that does not allow to resolve a membrane current under 1 pA (avg wt rod Ca2+ current was -3.5 pA, and line noise ~1pA peak-to-peak in Maddox 2020). Suggestion: say T-type currents were not probed in Maddox et al 2020, but Davison et al 2022 did not find PCR signal for Cav3.2 in rods.

      We disagree that recordings in the Maddox 2020 study were not sufficient to uncover a T-type current. The voltage ramps in that study were not much slower than that of the Davison et al. 2022 study (they used 0.19 mV/ms). Moreover, in new Supp. Fig.S1, we show that like the slower voltage ramp (0.15 mV/ms) used in the prior study of G369i KI rods, the voltage ramps we used in the present study (0.5 mV/ms), which clearly evoke currents with T-type properties in G369i KI cones (Fig.2a,b, Fig.3a,b) do not evoke currents in WT or G369i KI rods.  
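
      For readers comparing the ramp speeds quoted in this exchange, the following sketch converts each rate between mV/ms and ms/mV. The numerical values are those stated by the reviewer and the authors above; the dictionary labels and the helper name are illustrative assumptions only.

```python
# Sketch: compare the voltage-ramp rates quoted in the exchange above.
# Values (mV/ms) are taken from the reviewer comment and the response;
# the labels and helper function are hypothetical.

ramps_mv_per_ms = {
    "reviewer's reading of Maddox 2020 (140 mV over 1 s)": 140 / 1000,
    "Maddox 2020 as stated in the response": 0.15,
    "Davison et al. 2022": 0.19,
    "present study": 0.5,
}

def ms_per_mv(rate_mv_per_ms):
    """Invert a ramp rate: milliseconds needed to sweep 1 mV."""
    return 1.0 / rate_mv_per_ms

for label, rate in ramps_mv_per_ms.items():
    print(f"{label}: {rate:.2f} mV/ms = {ms_per_mv(rate):.1f} ms/mV")

# Ratio of the Davison et al. ramp to the ramp stated for Maddox 2020.
speed_ratio = 0.19 / 0.15
print(f"Davison / Maddox ramp-rate ratio: {speed_ratio:.2f}")
```

      On these figures the Davison et al. ramp is roughly 1.3 times faster than the earlier rod recordings, and the ramp in the present study is a little over three times faster.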

      Minor comments. 

      (1) Suggestion: add an overview panel to Figure 1 that shows the rod terminals in the KI. The problem is that cropping out the ribbon and active zone signals from rods, to highlight cones, can give the impression that the cones are partially spared in the KI, and the rods are not spared at all. (yet you nicely clarify this in Figure 4 and in the legend and text, etc.). 

      We chose to modify the legend with this information as in Fig.4 rather than modify the figure.

      (2) Mouse wt cone Ca2+ currents look like L-type currents, as do your monkey and squirrel cone recordings, and also much like those of mouse rods (see Figure S5, Hagiwara et al., 2018 or Grabner and Moser 2021). Your pharm data from mice and squirrels further supports your conclusion, and certainly took much effort. Davison et al 2022 J Neurosci showed PCR results that support their claim that a Cav3 current exists in wt cones. Questions: 1) have you tried PCR? 2) Can you offer more details on what Cav3 KO you tried and what antibodies failed to confirm the KO? As the authors know, one complication is that the deletion of one Cav can be compensated for by the expression of a new Cav. There are 3 types of Cav3s and removal of one type may be compensated for by another Cav3. 

      We have included drop-seq data (new Supp.Fig.S3) implicating Cav3.2 as the main Cav3 subtype in cones and have modified our discussion of these results accordingly. These experiments did not reveal any changes in Cav3 subtype expression in G369i KI vs WT cones.

      (3) Lines 95/96- onward, spend more time telling the story. When working out the biophysical and pharmacological behavior of the Ca2+ currents, you might want to initially refer to the membrane current as a membrane current, and then state how your voltage protocols, intra- and extra-cell solutions, and drugs helped you verify 1) L-type and 2) T-type Ca2+ currents. 

      We have modified the text with more detail.

      (4) If data is in hand, add a ramp I-V to Figure S2, which shows the response of the ground squirrel cone. The steps in S2a are excellent for making your point that a transient current is missing, and the bipolar is a great control to illustrate ML218 works. However, a comparison of a squirrel cone ramp to a bipolar ramp response could complete the figure. 

      See Response to #5 below.

      (5) Consider moving Supplementary Figures S2 and S3 to the main text; these are highly relevant to the story, novel, and well-executed. 

      Fig.S2 and S3 were added as new Figs.4,5. The new Fig.4 includes voltage ramps in ground squirrel cones (panel a) to compare with the bipolar data (panel f).

      (6) The nice electron microscopy reconstructions are not elaborated on in any detail, and there is no mention of ribbon size. Is the resolution sufficient to estimate ribbon size, the number of synaptic vesicles around the ribbon and in the adjacent cytosol? The images indicate major changes in the morphology of the terminals. Is the glial envelope similar in WT and KI? 

      Since ribbons were quantified extensively in the confocal analyses in Fig.6, we felt it unnecessary to add this to the EM analysis which focused mainly on aspects of 3D structure (i.e., arrangement of ribbons, postsynaptic wiring, cone pedicle morphology). We added further discussion of the change in morphology of the G369i KI cone pedicle (lines 200-203): “Compared to WT, ribbons in G369i KI pedicles appeared disorganized and were often parallel rather than perpendicular to the presynaptic membrane (Fig.7a-c). Consistent with our confocal analyses (Fig.1), G369i KI cone pedicles extended telodendria in multiple directions rather than just apically (Fig. 7a).”

      While we did not opt to characterize the glial envelope in WT cones, we did add an analysis of synaptic vesicles around ribbons to Table 2.

      (7) Discussion line 250: "we found no evidence for a functional contribution of Cav3 in our recordings of cones in WT mice (Figures. 2,3), ground squirrels, or macaque (Supplementary Figures S2 and S3).". I would not use "functional" in this context because when comparing your work to Davison et al 2022, they defined functional as a separate response component driven by Cav3. For instance, they examined the influence of their T-type current on exocytosis (by membrane capacitance) and other features like spiking Ca2+ transients. Suggestion: substitute functional with "detectable", and say "we found no detectable Cav currents". Or if you had T-type staining, but not T-type Ca2+ currents, then say "no functional current even though there is staining...".

      We have modified the text as (lines 336-338): “However, in contrast to recordings of WT mouse cone pedicles in a previous study21, we found no evidence for Cav3-mediated currents in somatic recordings of cones in WT mice (Figs.2,3).”

      We propose an alternative interpretation of the results in the Davison et al study concerning the conclusion that Cav3 channels contribute to Ca2+ spikes and exocytosis. That study used 100 µM Ni2+ to block a “T-type” contribution to spike activity in cones. In their Figs.4,5, the spikes are suppressed by 100 µM Ni2+ and 10 µM nifedipine, a Cav1 antagonist, and spared by the T-type selective drug Z944. This is problematic for several reasons. First, as shown by the authors (their Fig.2A1,A2) and others (PMID: 15541900), 100 µM Ni2+ inhibits Cav1-type currents in photoreceptors. Second, Z944 potentiates Cav1 current in their mouse cones (their Fig.2C1,C2). Thus, both reagents are suboptimal for dissecting the contribution of either Cav subtype to spiking activity. With respect to Cav3 channels and exocytosis, these authors interpreted a reduction in exocytosis upon holding at -39 mV compared to at -69 mV as indicating a loss of a T-type driven component of release. However, Cav1 channel inactivation (PMID: 12473074) could lead to the observed reduction in exocytosis at -30 mV.

      (8) Additional literature related to your Intro and Discussion. Regarding CSNB2, related mutations of active zone proteins, and what happens to Ca2+ currents when ribbons are deleted, you might want to consider the following studies that measure Ca2+ currents from rods: conditional KO of RIM1/2 (Grabner et al 2015 JN), KO of ELKS1/2 (Hagiwara et al, 2018 JCB), and KO of Ribeye (Grabner and Moser eLife 2021). In these studies, the Cav currents were absent in rods of the ELKS1/2 DKO, strongly reduced (80%) in the RIM1/2DKO, but altered in more subtle ways (activation-inactivation) without significantly changing steady-state Ca2+ current in the Ribeye KO. This does not seem to support some of the arguments you have made in the Introduction and Discussion regarding ribbon size and Ca2+ currents, yet the suggested literature is related to the topic at hand. 

      A description of these synaptic proteins as potential mediators of the effect of Cav1.4 on ribbon morphogenesis was added to the Discussion, lines 325-327.

      (9) Line 129: "Along with the major constituents of the ribbon, CtBP2, and RIBEYE", for clarity Ribeye has two domains, one that is identical to CtBP2 (B-domain) and the unique Ribeye domain (A-domain) that is only expressed at ribbon synapses. And, Piccolino is also embedded in the ribbon (Brandstaetter lab, Wichmann/Moser labs). In other words, Ribeye and Piccolino are the major constituents of the ribbon. 

      To avoid confusion, we simply mention Ctbp2 and RIBEYE in the context of the corresponding antibodies that were used to label ribbons.

      (10) Abstract: consider rephrasing "Ca2+-independent role of Cav1.4" as "Ca2+-permeation-independent role of Cav1.4" or similar.

      Sentence changed to: “In CSNB2, we propose that Cav3 channels maintain cone synaptic output provided that the nonconducting role of Cav1.4 in cone synaptogenesis remains intact.”

      Reviewer #3 (Recommendations For The Authors): 

      Cav1.4 voltage-gated calcium channels play an important role in neurotransmission at mammalian photoreceptor synapses. Mutations in the CACNA1f gene lead to congenital stationary night blindness that particularly affects the rod pathway. Mouse Cav1.4 knockout and Cav1.4 knockin models suggest that Cav1.4 is also important for the cone pathway. Deletion of Cav1.4 in the knockout models leads to signaling malfunctions and to abundant morphological re-arrangements of the synapse suggesting that the channel not only has a role in the influx of Ca2+ but also in the morphological organization of the photoreceptor synapse. Of note, also additional Cav-channels have been previously detected in cone synapses by different groups, including L-type Cav1.3 (Wu et al., 2007; pmid; Kersten et al., 2020; pmid), and also T-type Cav3.2 (Davison et al., 2021; pmid 35803735). 

      In order to study a conductivity-independent role of Cav1.4 in the morphological organization of photoreceptor synapses, the authors generated the knockin (KI) mouse Cav1.4 G369i in a previous study (Maddox et al., eLife 2020; pmid 32940604). The Cav1.4 G369i KI channel no longer works as a Ca2+-conducting channel due to the insertion of a glycine in the pore-forming unit (Maddox et al., eLife 2020; pmid 32940604). In this previous study (Maddox et al., eLife 2020; pmid 32940604), the authors analyzed Cav1.4 G369i in rod photoreceptor synapses. In the present study, the authors analyzed cone synapses in this KI mouse.

      For this purpose, the authors performed a comprehensive set of experimental methods including immunohistochemistry with antibodies (also with quantitative analyses), electrophysiological measurements of presynaptic Ca2+ currents from cone photoreceptors in the presence/absence of inhibitors of L-type- and T-type- calcium channels, electron microscopy (FIB-SEM), ERG recordings and visual behavior tests of the Cav G369i KI in comparison to the Cav1.4 knockout and wild-type control mice.

      The authors found that the non-conducting Cav channel is properly localized in cone synapses and demonstrated that there are no gross morphological alterations (e.g., sprouting of postsynaptic components that are typically observed in the Cav1.4 knockout). These findings demonstrate that cone synaptogenesis relies on the presence of Cav1.4 protein but not on its Ca2+ conductivity. This result, obtained at cone synapses in the present study, is similar to the previously reported results observed for rod synapses (Maddox et al., eLife 2020, pmid 32940604). No further mechanistic insights or molecular mechanisms were provided that demonstrated how the presence of the Cav channels could orchestrate the building of the cone synapse. 

      We respectfully disagree regarding the mechanistic advance of our study. As indicated by Reviewer 2, a major advance of our study is in providing a mechanism that can explain the longstanding conundrum that congenital stationary night blindness type 2 mutations that would be expected to severely compromise Cav1.4 function do not produce complete blindness. Our study provides an important contrast to the Maddox et al 2020 study in showing that rods and cones respond differentially to loss of Cav1.4 function, which is also relevant to the visual phenotypes of CSNB2. How the presence of Cav1.4 orchestrates cone synaptogenesis is an important topic that is outside the scope of our present study.

      In the present study, the authors also propose a homeostatic switch from L-type to (newly occurring) T-type calcium channels in the Cav1.4 G369i KI mouse as a consequence of the deficient calcium channel conductivity in this mouse. In cones of the Cav1.4 G369i KI, the high-voltage activated, L-type Ca2+-entry was abolished, in agreement with their previous paper (Maddox et al., eLife 2020, pmid 32940604). The authors found a low-voltage activated Ca2+ current instead that they assigned to T-type Ca2+-currents based on pharmacological inhibitor experiments. T-type Ca2+-currents/channels were already previously identified in other studies by independent groups and independent techniques (electrophysiology, RT-PCR, single-cell sequencing) in cones of wild-type mice (Davison et al., 2021, pmid 35803735; Macosko et al., 2015, pmid 26000488; Williams et al., 2022, pmid 35650675). In the present manuscript (Figures 3a/b), the authors also observed a low-voltage activated, T-type like current in cones of wild-type mice, that is isradipine-resistant and affected by the T-type inhibitor ML218. This finding appears compatible with a T-type-like current in wild-type cones and is consistent with the published data mentioned above, although the authors interpret this data in a different way in the discussion.

      Due to the noise inherent in whole cell voltage clamp measurements and some crossover effects in the pharmacology, we cannot completely exclude the presence of a T-type current in WT mouse cones. However, our results very clearly support a conclusion opposite to that stated by the reviewer. Namely, if WT mouse cones have T-type Ca currents, then they are far smaller than those in the Cav1.4 G369i KI and KO cones. In particular, while we identified message for Cav3.2 in WT mouse cones, we were unable to identify a functional T-type current by either voltage clamp measurements or pharmacology. See below for a detailed rebuttal.

      This proposal of a homeostatic switch is not convincingly supported in this reviewer's opinion (for further details, please see below). Furthermore, no data on possible molecular mechanisms were provided that would support such a proposal of a homeostatic switch of calcium channels. No mechanistic/molecular insights were provided for a proposed homeostatic switch from L-type to T-type channels that the authors propose to occur between wild-type and Cav1.4 G369i as a consequence of conduction-deficient Cav1.4 G369i channels. Is this e.g. based on posttranslational modifications that switch on T-type channels or regulation at the transcriptional level inducing expression of T-type calcium channel or on other mechanisms? The authors remain descriptive with their central hypotheses. No molecular mechanisms/signaling pathways were provided that would support the idea of such a homeostatic switch.

      Homeostatic plasticity refers to the maintenance of neuronal function in response to some perturbation in neuronal activity and can result from changes in the expression of ion channel genes (PMID: 36377048, 32747440, 19778903) or regulatory pathways that modulate ion channels (PMID: 15051886, 32492405). We present multiple lines of evidence showing that Cav3 currents appear in cones upon genetically induced Cav1.4 loss of function and can support cone synaptic responses and visual behavior if cone synapse structure is maintained. Our new transcriptomic studies show no difference between levels of Cav3 channel transcripts in WT and G369i KI cones, suggesting that the appearance of the Cav3 currents in G369i KI cones does not result from an increase in Cav3 gene expression. We are currently investigating our transcriptomic dataset to determine if Cav3 regulatory pathways are upregulated in G369i KI cones and will present this in a follow-up study.

      The authors show residual photopic signaling in the non-conducting Cav1.4 G369i KI mouse as judged by the recording of postsynaptic currents, ERG recordings and visual behavior tests though in a reduced manner. The residual cone-based signaling could be based on the non-affected T-type Ca2+ channel conductivity in cone synapses. Given that the L-type current through Cav1.4 is gone in the Cav1.4 G369i KI as previously shown (Maddox et al., 2020, pmid 32940604), the T-type calcium current will remain. However as discussed above, this does not necessarily support the idea of a homeostatic switch.

      A major point which we highlighted with new results is that despite the expression of Cav3 transcripts in WT mouse cones, Cav3 channels do not contribute to the cone Ca2+ current. This is at odds with the Davison et al study (PMID: 35803735, see our response to Reviewer 2, pt 7 for caveats of this study), but our results convincingly show that the Cav3 current appears only when Cav1.4 is genetically inactivated. Pharmacological or electrophysiological methods that should reveal the presence of Cav3 currents do not change the properties of the Ca2+ current in cones of WT mice, ground squirrel, or macaque:

      • Figs.2-4: Voltage steps to -40 mV (Fig 2e) that activate a sizeable T-current in G369i KI mouse cones produce a negligible transient at pulse onset in WT mouse cones. Similarly, transient currents that are obvious in G369i KI mouse cones during the final step to -30 mV are absent in WT cones.  When we block Cav1.4 with isradipine either in cones of WT mice or ground squirrel, the current that remains does not resemble a Cav3 current but rather a scaled down version of the L-type current. ML218, which readily blocks Cav3 channels in HEK293T cells and in G369i KI cones, has only minor effects in cones of WT mice and ground squirrel; these effects of ML218 can be attributed to non-specific actions on Cav1.4 (new Supp.Fig.S2). New Fig.4 (moved from the supplementary data to the main article) clearly shows that the ML218-sensitive current in ground squirrel cones exhibits properties of Cav1.4 not Cav3 channels. 

      • Figs.2,5: Holding voltages that inactivate Cav3 channels have no effect on the Ca2+ current in cones of WT mice or macaque (recordings of macaque cones were moved from the supplement to the main article as new Fig.5).

      In Figure 4 the authors measured an increase in the size of the active zone (as judged by the size of the bassoon cluster) and of the synaptic ribbons in the Cav1.4 G369i. A mechanistic explanation for this phenomenon was not provided and the underlying molecular mechanisms were not unraveled. 

      The FIB-SEM data uncover some ultrastructural alteration/misalignments of the synaptic ribbons and misalignments of the regular arrangement of the postsynaptic dendrites in the G369i KI mice. Also concerning this observation, the study remains descriptive and does not reveal the underlying mechanisms as it would be expected for eLife. 

      We respectfully disagree on the descriptive nature of our study and the need for a full characterization of the molecular mechanism underlying the cone synaptic defects in the G369i KI mouse.   

      An important study in the field (Zanetti et al., Sci. Rep. 2021; pmid 33526839) should be also cited that used a gain-of-function mutation of Cav1.4 to analyze its functional and structural role in the cone pathway. 

      We have added citation of this paper to the Discussion (lines 354-356).

      In conclusion, the study has been expertly performed but remains descriptive without deciphering the underlying molecular mechanisms of the observed phenomena, including the proposed homeostatic switch of synaptic calcium channels. Furthermore, a relevant part of the data in the present paper (presence of T-type calcium channels in cone photoreceptors) has already been identified/presented by previous studies of different groups (Macosko et al., 2015; pmid 26000488; Davison et al., 2021; pmid 35803735; Williams et al., 2022; pmid 35650675). The degree of novelty of the present paper thus appears limited. I think that the study might be better suited in a more specialized journal than eLife. 

      We thank the reviewer for acknowledging the rigor of our study but disagree with their evaluation regarding the novelty of our work as outlined in our responses above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      My comments are largely limited to suggestions to make the manuscript easier to read and digest.

      In the abstract they say RNA sequencing highlights changes in innate...

      Could they be more specific? Innate immune system up or down? They do not indicate actual findings in the abstract.

      We thank the reviewer for the comment and we have revised the abstract accordingly.  

      Their use of non‐intuitive abbreviations is often confusing. Perhaps they can add a table in methods listing all the abbreviations so that the reader can follow the data better. mNGA, vmHT....etc.

      As suggested, we have now included a list of the abbreviations used in the paper.

      There are mis‐spellings in the manuscript.

      We have gone through the manuscript and corrected the mis-spellings.   

      Has the SPR RNAi line been validated?

      The SPR RNAi line that we used has been extensively validated by Yapici et al., 2007 and several subsequent publications. Importantly, the effectiveness of SPR knockdown is evident in female flies as they exhibit dramatically reduced egg laying and, importantly, lack the typical post-mating behaviors (such as rejection of male flies after initial mating) observed in the wild type mated female flies. In fact, female flies with RNAi-mediated SPR knockdown behave identically to females mated with SP-null male flies, confirming the effective disruption of the SP-SPR signaling pathway. We have revised the manuscript and added these statements in the results section concerning SPR RNAi.  

      In the figures showing the Climbing Index vs time, can they abbreviate seconds as sec vs s? At least I think it is seconds. At first, I thought it was Time or Times, and was confused about what they were indicating on those types of graphs (Figures 1D‐F).

      We have revised the figure as suggested by the reviewer.

      In Figure 3F, they have a significance indicated in an unclear manner. It looks like they are comparing neuropil to the cortex, but I think they really mean to compare the cortex of sham to cortex of D31?

      The reviewer was correct. We have revised figure 3F to make this clear.     

      In Figure 4B, what is the y‐axis? Percentage of what? Is that percentage of total flies?

      The reviewer was correct. We have revised the figure to make this clear. 

      In a figure like SF3 B, what is the y‐axis? "Norm. Accum. CI" Can they explain the abbreviation?

      We have revised the Y-axis label to be “Normalized accumulative CI”.  We have also made this clear in the legend.   

      In the methods, what does this mean: "Regions devoid of Hoechst and phalloidin signal in non‐physiologically appropriate areas were considered vacuoles"? What are non‐physiologically appropriate areas? To me, that would mean outside of the brain. I would have thought the areas should be physiologically appropriate (aka neuropil and cortex)? This is confusing.

      We have revised the method section to be more specific.  In the Drosophila brain, there are structures such as esophagus that are devoid of both Hoechst and phalloidin staining, which were excluded from our vacuole quantification.    

      Reviewer #2 (Recommendations For The Authors):

      Since I use mammalian systems, my comment about the confirmation of siRNA should be removed if this is not possible in the Drosophila system.

      We have revised the figures to include total N values when appropriate. Including individual n values for each experimental assay and condition will inevitably crowd the figure legends, so specific values are available upon request. 

      Regarding RNAi knockdown of sex peptide receptors (SPRs), we agree that confirmation of the knockdown by IHC or qRT-PCR will further strengthen our findings. It should be noted, however, that the RNAi line we used has been extensively validated by Yapici et al., 2007 and several subsequent publications. Importantly, the effectiveness of SPR knockdown is evident in female flies as they exhibit dramatically reduced egg laying and, importantly, lack the typical post-mating behaviors (such as rejection of male flies after initial mating) observed in the wild type mated female flies. In fact, female flies with RNAi-mediated SPR knockdown behave identically to females mated with SP-null male flies, confirming the effective disruption of the SP-SPR signaling pathway. We have revised the manuscript to include these statements in the results concerning the SPR RNAi knockdown.    

      Reviewer #3 (Recommendations For The Authors):

      (1) In Figures 1 and 2, the authors found that females have a lower climbing index in the acute phase in D17 injury, not due to neurodegeneration as shown no significant changes of brain vacuolation and other markers. However, in Figure 3, the authors found that female flies have a lower climbing index, more brain vacuolation, and neurodegeneration in the late phase. It's not very convincing that having a lower climbing index at the late phase is due to neurodegeneration. Is it possible that females suffered from more severe acute effects, at least in D17 injury?

      We thank the reviewer for this point. Female flies injured on D17 displayed acute climbing deficits at 90 minutes post-injury. Since we did not observe significant structural changes in the brain at this time, we believe that this short-term functional deficit is not due to acute neuronal death. Here it is important to note that males did not display any acute climbing deficits when injured on D17, which suggests that the females suffered from more severe acute effects than males. However, these injured female flies recovered fully at 24 hours post-injury and displayed no climbing deficits. At two weeks post-injury, we observe climbing deficits and increased vacuole formation as a direct result of the injuries on D17 (see Supplemental Figure 3). When we assessed sensorimotor behavior and brain vacuolation on D45, we found that the injured females had significantly lower climbing indices and more brain vacuolation than the non-injured females of the same age. In this case, the concurrent observance of decreased climbing ability and increased brain vacuolation suggests chronic neurodegeneration in aged, injured females. This is not to be confused with the acute neuronal death observed by other groups using injury models of stronger severity. Overall, our data are consistent with the current view that in many neurodegenerative diseases, functional deficits often precede observable brain degeneration, which may take years to manifest.

      (2) The authors determined late‐life brain deficits and neurodegeneration purely based on climbing index and vacuole formation. These phenotypes are not really specific to TBI‐related neurodegeneration and the significance and mechanisms of vacuole formation are not clear. Indeed, in Figures 3 A and B, male flies especially D31inj tend to have a much larger variation than any other groups. What could be the reasons? The authors should perform additional analyses on TBI‐related neurodegeneration in flies, which have been shown before, such as retinal degeneration and loss, neuronal degeneration, and loss, neuromuscular junction abnormalities, etc (Genetics. 2015 Oct; 201(2): 377‐402).

      We thank the reviewer for the thorough evaluation of our manuscript. The reviewer raised a very important question: whether the neurodegeneration observed in our model is specific to TBI. As the reviewer rightly pointed out, the neurodegenerative phenotypes are unlikely to be specific to TBI-related neurodegeneration. Throughout the manuscript, we have tried to convey the notion that the mild physical impacts to the head represent one form of environmental insults, which in combination with other risk factors such as aging can lead to the emergence of neurodegenerative conditions. It should be noted that the negative geotaxis assay and vacuolation quantification are two well-established approaches to assess sensorimotor deficits and frank brain degeneration in fly brains. 

      It is important to emphasize that the head-specific impacts delivered to the flies in our study are much milder than those used in previous studies. As we showed in our figure 1, this very mild form of head trauma (referred to as vmHT) did not cause any death, nor affected the lifespan of the injured flies. Our supplemental data also show very minimal structural neuronal damage and no acute and chronic apoptosis induced by vmHT exposure. Consistently, we did not observe any exoskeletal or eye damage immediately following injuries, nor did we observe any retinal degeneration and pseudopupil loss at the chronic stage of these flies. We have incorporated these important points in the revised manuscript.  

      (3) In Figure 4, it would be important to perform the behavior test fly speed and directional movement in the acute phase as well to determine whether the females have reduced performance at the acute phase.

      We thank the reviewer for this suggestion. Please note that our modified NGA has already improved the spatiotemporal resolution over the classic NGA.  The data presented in Fig.3 show that there are no acute deficits for young cohorts.  Therefore, we do not believe that the detailed analysis of the direction and speed of these flies is essential.  

      Unfortunately, the current setup for the AI-based analysis requires manual corrections of tracking errors, which are time-consuming and tedious.  We are building a newly designed AI-based NGA (NGA.ai) that will allow automatic tracking and quantification with minimal manual interventions. Once it is completed, we will perform some of the analyses that the reviewer suggested.  

      (4) In Figure 8, the authors performed an RNA‐seq analysis and identified some dysregulated gene expressions. However, it is really surprising to see so few DEGs even in wild‐type males and mated females, and to see that none of DEGs overlap among groups or related to the SP‐signaling. This raises questions about the validity of the RNAseq analysis. It is critical to independently verify their RNA‐sequencing results and to add some more molecular evidence to support their conclusion.

      We agree that future studies are needed to independently validate our RNA sequencing results. We believe that the small number of DEGs is likely due to two unique features of our study: (1) the very mild nature of our injury paradigm and (2) the chronic examination timepoint that was long after the head injury and SP exposure, which distinguish our study from previous fly TBI studies. As pointed out in the manuscript, our study aimed to understand how early-life exposure to repetitive head traumatic insults could lead to the late-life onset of neurodegenerative conditions. We hope to further validate our results in our next phase of experiments using single-cell RNA sequencing and RT-qPCR.

      (5) The current results raise a series of interesting questions: what implication of female fly mating and its associated Sex Peptide signaling would be to mammalians or humans? Would mammalian female animals mating with wild‐type or sex hormone‐null male animals have different effects on their post‐injury behavior tests or neuropathological changes? What are the mechanisms underlying the sexual dimorphism?

      As the reviewer pointed out, it would be very interesting to explore the possible roles of sex peptide-signaling in other animals and humans. As far as we know, there is no known mammalian ortholog to the insect sex peptide, so it would be difficult to study SP or an SP-like molecule in mammalian models. However, we believe that prolonged post-mating changes associated with reproduction in female fruit flies contribute to their elevated vulnerability to neurodegeneration. In this regard, drastic changes within the biology of female mammals associated with reproduction can potentially lead to vulnerability to neurodegeneration. We agree that this demands further study, which may be done with future collaborators using rodent or large animal models. We have discussed this point in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank you very much for reviewing our manuscript and express our sincere appreciation for the valuable and thoughtful comments that led us to significantly improve the manuscript on Fshr-ZsGreen reporter mice. We have seriously taken your comments to make a major revision of the manuscript, and here is a summary of the revision:

      (1) New data on Fshr expression have been added to the revised manuscript:

      a. Fshr expression in the testis and adipose tissues (WAT and BAT) of B6 mice;

      b. Fshr expression in the testis of B6 by RNA-smFISH;

      c. Comparison of Fshr expression in the testis and ovary between Fshr-ZsGreen and B6 mice by ddRT-PCR, to show that Fshr expression is not interrupted by insertion of the P2A-ZsGreen vector;

      d. Reduction of Fshr expression in osteocytes within the femoral sections from DMP1-CreERT2:Fshrfl/fl mice;

      e. Fshr expression in an established Leydig cell line, TM3, by immunofluorescence and ddRT-PCR, which also shows Fshr located in the nuclei of TM3 cells;

      f. Fshr expression at scRNA-seq level from 5 public single cell portals as Supplementary Data 3 to support our findings of the widespread expression pattern of Fshr, particularly in Leydig cells.

      (2) Re-organization of Figure 2 with a new legend.

      (3) A new paragraph is added to the Discussion section of the revised MS to explain the function of the P2A peptide in the generation of GFP reporter mice and why Fshr expression is not interrupted by the P2A-ZsGreen insertion in the Fshr-ZsGreen reporter.

      (4) Deletion of Figure 1-D-c, as it is not necessary.

      (5) Replacement of Figure 8-A (the left panel) with an image taken at a reduced exposure time.

      (6) Amended parts of the revised MS are labeled in red.

      A point by point response to the Reviewers’ comments:

      Reviewer 1:

      One of the shocking observations in this manuscript is the expression of FSHR in Leydig cells. Other observations are in the osteoblasts and endothelial cells as well as epithelial cells in different organs. The expression of ZsGreen in these tissues seems high and one shall start questioning if there are other mechanisms at play here.

      First, the turnover of fluorescent proteins is long, longer than 48h, which means that they accumulate at a different speed than the endogenous FSHR. This means that ZsGreen will accumulate in time while the FSHR receptor might be degraded almost immediately. This correlated with mRNA expression (by the authors) but does not with the results of other studies in single-cell sequencing (see below).

      The expression of ZsGreen in Leydig cells seems much higher than in Sertoli cells, this is "disturbing" to put it mildly. This is visible in both the ZsGreen expression and the FISH assay (Figure 2 B-D).

      Thank you for these valuable comments. In the revised MS, we added new data on Fshr expression in B6 mice, detected by immunofluorescence staining, RNA-smFISH and ddRT-PCR, as well as in TM3 cells (Leydig cells isolated from a male mouse), to show the presence of Fshr in Leydig cells (Fig. 2E, F and G). These data demonstrate that normal Fshr expression is not interrupted by insertion of the P2A-ZsGreen vector into a locus located between exon 10 and the stop codon. We use ZsGreen as an indicator of active Fshr promoter status, rather than as a method to measure Fshr expression, which is done by ddRT-PCR. These data are shown in Figure 2G of the revised MS.

      In addition, we provide scRNA-seq-based evidence of Fshr expression in human Leydig cells from two single cell portals (DISCO and BioGPS), as shown in Supplementary Data 3 in the revised MS. We also cited a recent report on scRNA-seq analysis of Fshr expression in Hu sheep in the revised MS as Reference 65 (PMID: 37541020) 1, which also clearly showed Fshr expression in Leydig cells at the single-cell level.

      We believe that the lack of Fshr expression in some single cell databases may be due to degradation of the Fshr transcript in cells during the preparation of single-cell populations. In our laboratory, we spent more than 6 months optimizing methods and reagents to preserve mRNA integrity at a value above 8 for RNA-seq.

      The expression in WAT and BAT is also questionable as the expression of ZsGreen is high everywhere. That makes it difficult to believe that the images are truly informative. For example, the stainings of aorta show the ZsGreen expression where elastin and collagen fibres are - these are not "cells" and therefore are not expressing ZsGreen.

      FISH expression (for FSHR) in WT mice is missing.

      Also, the tissue sections were stained with the IgG only (neg control) but in practice both the KI and the WT tissues should be stained with the primary and secondary antibodies. The only control that I could think of to truly get a sense of this would be a tagged receptor (N-terminal) that could then be analysed by immunohistochemistry.

      Reply 2 and 3: Thank you for these comments. New data on Fshr expression in WAT and BAT of B6 mice by immunofluorescence staining, and in the testis of B6 mice by immunofluorescence staining and RNA-smFISH, are added to the revised MS (Fig. 2D and E, and Fig. 4G), showing patterns similar to those of Fshr-ZsGreen mice. Furthermore, we provide more evidence as Supplementary Data 3 on Fshr expression obtained from 4 public single cell portals, showing FSHR expression in a wide range of organs and tissues (including different fractions of adipose cells) of humans, mice and rats at the single-cell level. Please also check the Fshr expression pattern in adipose tissues by immunostaining for Fshr in previous reports (Fig. 3a of PMID: 28538730 and Fig. 2 of PMID: 25754247) 2 3, which showed an expression pattern similar to our finding. These data should address your concerns about Fshr expression in WAT and BAT and other organs/tissues.

      Regarding “For example, the stainings of aorta show the ZsGreen expression where elastin and collagen fibres are - these are not "cells" and therefore are not expressing ZsGreen.” We believe that you referred to the image of the aorta in Supplementary Data 2. However, please take a look at the images of the aorta in Figure 5-C, which show that the layer of ‘elastin and collagen fibres’ stains positively for EMCN and a-SMA, colocalized with Fshr expression and DAPI staining at 1000X magnification. This indicates that endothelial cells and cellular membranes are present in this layer, not just ‘elastin and collagen’.

      The authors also claim:

      To functionally prove the presence of FSHR in osteoblasts/osteocytes, we also deleted FSHR in osteocytes using an inducible model. The conditional knockout of FSHR triggered a much more profound increase in bone mass and decrease in fat mass than blockade by FSHR antibodies (unpublished data).

      This would be a good control for all their images. I think it is necessary to make the large claim of extragonadal expression, as well as intragonadal such as Leydig cells.

      Thank you for this very encouraging comment. As you suggested, we added a result showing reduced Fshr expression in osteocytes from DMP1-CreERT2+:Fshrfl/fl mice treated with tamoxifen to the revised MS (Figure 3D), demonstrating the presence of Fshr in osteocytes and the specificity of the Fshr antibody. Furthermore, we incorporated your advice on making the ‘large claim of extragonadal and intragonadal expression of Fshr’ into the revised MS in red.

      Claiming that the under-developed Leydig cells in FSHR KO animals are due to a direct effect of the FSHR, and not via a cross-talk between Sertoli and Leydig cells, is too much of a claim. It might be speculated to some degree but as written at the moment it suggests this is "proven".

      Thank you for pointing out this incorrect claim, for which we apologize. We have deleted this claim in the revised MS.

      We also do not know if this FSHR expressed is a spliced form that would also result in the expression of ZsGreen but in a non-functional FSHR, or whether the FSHR is immediately degraded after expression. The insertion of the ZsGreen might have disturbed the epigenetics, transcription, or biosynthesis of the mRNA regulation.

      Thanks for this comment. In the revised MS, we added a new section to explain the function of the P2A peptide in the generation of a GFP reporter by sgRNA-guided site-specific knockin of the P2A-ZsGreen vector through CRISPR/Cas9. We also provided a new result comparing Fshr expression in the testes and ovaries of Fshr-ZsGreen and B6 mice, showing equivalent Fshr expression between Fshr-ZsGreen and B6 mice (Figure 2G), which indicates that Fshr expression is not interrupted by the insertion of the P2A vector.

      The authors should go through single-cell data of WT mice to show the existence of the FSHR transcript(s). For example here: https://www.nature.com/articles/sdata2018192

      Thank you so much for the valuable comment. We took your critical advice and checked Fshr expression through 4 single cell portals, including DISCO, GTEx, BioGPS and Human single cell portal. We present the collected data as Supplementary Data 3 in the revised MS, and they strongly support our findings of wider Fshr expression. In particular, Fshr expression in Leydig cells is shown by scRNA-seq studies of human cells from DISCO and BioGPS, as well as by a recent study in Hu sheep (PMID: 37541020) 1, which we cited in the revised MS.

      Reviewer 2:

      Is the FSHR expression pattern affected by the knockin mice (no side-by-side comparison between wt and GSGreen mice, using in situ hybridization and ddRTPCR, at least in the gonads, is provided)?

      Thanks for the comment. In the revised MS, we provided a set of new data on Fshr expression in the testis, ovary, WAT and BAT of B6 mice by immunofluorescence staining and by RNA-smFISH, showing similar expression patterns. Additionally, we performed ddRT-PCR to compare Fshr expression in the testes and ovaries of Fshr-ZsGreen and B6 mice, demonstrating equivalent Fshr expression between Fshr-ZsGreen and B6 mice. Interestingly, we also observed significantly higher Fshr expression in the testis than in the ovary (more than 30-fold).

      Is the splicing pattern of the FSHR affected in the knockin compared to wt mice, at least in the gonads?

      Thanks for the question. Please see our reply to Reviewer 1 for the function of the P2A peptide used for the generation of GFP reporters. Although we didn’t directly assess the splicing pattern, we provide a comparison of Fshr expression in Figure 2F in the revised MS, which indirectly suggests no change in the splicing pattern. In the future, we will assess the splicing pattern of Fshr, which has been neglected in the field.

      “Are there any additional off-target insertions of GSGreen in these mice?” and “Are similar results observed in separate founder mice?”

      Thanks for the questions. As described in detail in the Methods section of the MS, the Fshr-ZsGreen reporter was produced by site-specific long ssDNA recombination of the P2A-ZsGreen targeting vector into the locus between exon 10 and the stop codon by CRISPR/Cas9, guided by a site-specific single guide RNA (sgRNA). We showed the results of Southern blot, DNA sequencing and site-specific PCR, proving the site-specific insertion of P2A-ZsGreen, as shown in Figure 1. Because of the site-specific recombination, and as is standard practice, only one founder line is required for the study and there are no additional off-target insertions.

      How long is GSGreen half-life? Could a very long half-life be a major reason for the extremely large expression pattern observed?

      Thanks for the question. The half-life of ZsGreen, also called ZsGreen1, is at least 26 h in mammalian cells, or slightly longer owing to its tetrameric structure, in contrast with the monomeric configuration of other well-known fluorescent proteins (PMID: 17510373) 4. The rationale for using this GFP protein is that ZsGreen is an exceptionally bright green fluorescent protein, up to 4X brighter than EGFP, and is ideally suited for whole-cell labelling and promoter-reporter studies, given the high turnover and rapid degradation of the Fshr transcript. In this study, we used ZsGreen as an indicator of the active endogenous Fshr promoter, rather than as a means of measuring promoter activity. Therefore, regardless of whether it accumulates, ZsGreen driven by the Fshr promoter indicates the presence of an active Fshr promoter in the defined cells. Instead, we used ddRT-PCR to measure Fshr expression levels in this study. In addition, we provide single-cell sequencing-based evidence from 4 public single cell portals to support our findings of wide Fshr expression. Please see Supplementary Data 3 in the revised MS.
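
      The half-life argument above can be made concrete with a minimal sketch. It assumes constant synthesis and first-order decay, which is an illustrative simplification and not a model taken from the manuscript. Under that assumption, the plateau level of a protein scales linearly with its half-life, which is why reporter brightness alone is not a measure of expression level. The function names and the 2 h comparison half-life are hypothetical.

```python
import math


def steady_state_level(synthesis_rate, half_life_h):
    """Plateau level under constant synthesis and first-order decay (assumed model)."""
    decay_constant = math.log(2) / half_life_h  # per hour
    return synthesis_rate / decay_constant


def level_at(time_h, synthesis_rate, half_life_h):
    """Level at time_h, starting from zero, under the same assumed model."""
    decay_constant = math.log(2) / half_life_h
    plateau = synthesis_rate / decay_constant
    return plateau * (1.0 - math.exp(-decay_constant * time_h))


# Same synthesis rate, different half-lives: 26 h (the lower bound quoted for
# ZsGreen above) versus a hypothetical short-lived protein with a 2 h half-life.
ratio = steady_state_level(1.0, 26.0) / steady_state_level(1.0, 2.0)
print(f"plateau ratio: {ratio:.1f}")
```

      In this sketch the plateau ratio equals the ratio of the two half-lives, and each protein reaches half of its own plateau after one half-life.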

      References:

      (1) Su J, Song Y, Yang Y, et al. Study on the changes of LHR, FSHR and AR with the development of testis cells in Hu sheep. Anim Reprod Sci. Sep 2023;256:107306. doi:10.1016/j.anireprosci.2023.107306

      (2) Liu P, Ji Y, Yuen T, et al. Blocking FSH induces thermogenic adipose tissue and reduces body fat. Nature. Jun 1 2017;546(7656):107-112. doi:10.1038/nature22342

      (3) Liu XM, Chan HC, Ding GL, et al. FSH regulates fat accumulation and redistribution in aging through the Galphai/Ca(2+)/CREB pathway. Aging Cell. Jun 2015;14(3):409-20. doi:10.1111/acel.12331

      (4) Bell P, Vandenberghe LH, Wu D, Johnston J, Limberis M, Wilson JM. A comparative analysis of novel fluorescent proteins as reporters for gene transfer studies. J Histochem Cytochem. Sep 2007;55(9):931-9. doi:10.1369/jhc.7A7180.2007

    1. Author response:

      eLife assessment

      This useful study examines the neural activity in the motor cortex as a monkey reaches to intercept moving targets, focusing on how tuned single neurons contribute to an interesting overall population geometry. The presented results and analyses are solid, though the investigation of this novel task could be strengthened by clarifying the assumptions behind the single neuron analyses, and further analyses of the neural population activity and its relation to different features of behaviour.

      Thanks for recognizing the content of our research, and please stay tuned for our follow-up studies on neural dynamics during interception.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addresses the question of how task-relevant sensory information affects activity in the motor cortex. The authors use various approaches to address this question, looking at single units and population activity. They find that there are three subtypes of modulation by sensory information at the single unit level. Population analyses reveal that sensory information affects the neural activity orthogonally to motor output. The authors then compare both single unit and population activity to computational models to investigate how encoding of sensory information at the single unit level is coordinated in a network. They find that an RNN that displays similar orbital dynamics and sensory modulation to the motor cortex also contains nodes that are modulated similarly to the three subtypes identified by the single unit analysis.

      Strengths:

      The strengths of this study lie in the population analyses and the approach of comparing single-unit encoding to population dynamics. In particular, the analysis in Figure 3 is very elegant and informative about the effect of sensory information on motor cortical activity. The task is also well designed to suit the questions being asked and well controlled.

      We appreciate these kind comments.

      It is commendable that the authors compare single units to population modulation. The addition of the RNN model and perturbations strengthen the conclusion that the subtypes of individual units all contribute to the population dynamics. However, the subtypes (PD shift, gain, and addition) are not sufficiently justified. The authors also do not address that single units exhibit mixed modulation, but RNN units are not treated as such.

      We’re sorry for not providing sufficient grounds to introduce the subtypes. We determined PD shift, gain, and addition to be the pertinent subtypes based on the classical cosine tuning model (Georgopoulos et al., 1982) and with reference to gain modulation studies (e.g. Pesaran et al. 2010, Bremner and Andersen, 2012). Here, we applied this subtype analysis as a criterion to identify modulation in the neuronal population rather than to sort neurons into distinct cell types. We will update the Methods in the revised version of the manuscript.
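
      As an illustration of how the three subtypes relate to cosine tuning, the sketch below lets each parameter of a cosine tuning curve vary with target speed. The linear dependence on speed and all parameter names are assumptions made for this example; they are not the fitting procedure used in the manuscript.

```python
import math


def cosine_tuning(theta, baseline, gain, preferred_dir):
    """Classical cosine tuning: rate = baseline + gain * cos(theta - PD)."""
    return baseline + gain * math.cos(theta - preferred_dir)


def modulated_tuning(theta, speed, baseline, gain, preferred_dir,
                     pd_shift=0.0, gain_slope=0.0, offset_slope=0.0):
    """Cosine tuning with speed-dependent parameters (hypothetical linear form).

    pd_shift     -> 'PD shift': preferred direction moves with target speed
    gain_slope   -> 'gain': modulation depth scales with target speed
    offset_slope -> 'addition': baseline rate shifts with target speed
    """
    shifted_pd = preferred_dir + pd_shift * speed
    scaled_gain = gain * (1.0 + gain_slope * speed)
    shifted_baseline = baseline + offset_slope * speed
    return shifted_baseline + scaled_gain * math.cos(theta - shifted_pd)


# A single unit can carry all three effects at once (mixed modulation).
rate = modulated_tuning(theta=0.0, speed=1.0, baseline=10.0, gain=5.0,
                        preferred_dir=0.0, pd_shift=0.2, gain_slope=0.5,
                        offset_slope=2.0)
print(round(rate, 3))
```

      Setting all three slope parameters to zero recovers the unmodulated cosine model, and nonzero values of more than one slope correspond to the mixed modulation discussed by the reviewer.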

      Weaknesses:

      The main weaknesses of the study lie in the categorization of the single units into PD shift, gain, and addition types. The single units exhibit clear mixed selectivity, as the authors highlight. Therefore, the subsequent analyses looking only at the individual classes in the RNN are a little limited. Another weakness of the paper is that the choice of windows for analyses is not properly justified and the dependence of the results on the time windows chosen for single-unit analyses is not assessed. This is particularly pertinent because tuning curves are known to rotate during movements (Sergio et al. 2005 Journal of Neurophysiology).

      The mixed selectivity, or more precisely the mixed modulation, is indeed a significant feature of the neuronal population in the present study. The purpose of the subtype analysis was to serve as a criterion for identifying potential modulation mechanisms. However, the results appear to form a spectrum rather than clusters. The analysis still provides some insight into the distribution of modulation, and we will refine the description in the next version. In the current version, we observed single-unit tuning and population neural state with sliding windows, focusing on the period around movement onset (MO) because of the emergence of a ring-like structure. We will clarify the choice of windows and the dependence assessment in the next version. It’s a great suggestion to consider the role of rotating tuning curves in neural dynamics during interception.

      This paper shows sensory information can affect motor cortical activity whilst not affecting motor output. However, it is not the first to do so and fails to cite other papers that have investigated sensory modulation of the motor cortex (Stavisky et al. 2017 Neuron, Pruszynski et al. 2011 Nature, Omrani et al. 2016 eLife). These studies should be mentioned in the Introduction to capture better the context around the present study. It would also be beneficial to add a discussion of how the results compare to the findings from these other works.

      Thanks for the reminder. We will introduce the relevant research in the next version of the manuscript.

      This study also uses insights from single-unit analysis to inform mechanistic models of these population dynamics, which is a powerful approach, but is dependent on the validity of the single-cell analysis, which I have expanded on below.

      I have clarified some of the areas that would benefit from further analysis below:

      (1) Task:

      The task is well designed, although it would have benefited from perhaps one more target speed (for each direction). One monkey appears to have experienced one more target speed than the others (seen in Figure 3C). It would have been nice to have this data for all monkeys.

      Great suggestion! However, it’s hard to implement as the implanted arrays have been removed.

      (2) Single unit analyses:

      In some analyses, the effects of target speed look more driven by target movement direction (e.g. Figures 1D and E). To confirm target speed is the main modulator, it would be good to compare how much more variance is explained by models including speed rather than just direction. More target speeds may have been helpful here too.

      Nice suggestion! The goodness of fit of the simple model (motor direction only) is much lower than that of the complex model (including target speed). We will update the results in the next version.

      The choice of the three categories (PD shift, gain, addition) is not completely justified in a satisfactory way. It would be nice to see whether these three main categories are confirmed by unsupervised methods.

      A good point. We will try unsupervised methods.

      The decoder analyses in Figure 2 provide evidence that target speed modulation may change over the trial. Therefore, it is important to see how the window considered for the firing rate in Figure 1 (currently 100ms pre - 100ms post movement onset) affects the results.

      Thanks for the suggestion and close reading. We will test the decoder in other epochs.

      (3) Decoder:

      One feature of the task is that the reach endpoints tile the entire perimeter of the target circle (Figure 1B). However, this feature is not exploited for much of the single-unit analyses. This is most notable in Figure 2, where the use of a SVM limits the decoding to discrete values (the endpoints are divided into 8 categories). Using continuous decoding of hand kinematics would be more appropriate for this task.

      This is a very reasonable suggestion. In this study, we discretized the reach direction as in previous studies (Li et al., 2018, 2022) and considered discrete decoding sufficient to show the interaction of sensory and motor variables. In future studies, we will try continuous decoding of hand kinematics.

      (4) RNN:

      Mixed selectivity is not analysed in the RNN, which would help to compare the model to the real data where mixed selectivity is common. Furthermore, it would be informative to compare the neural data to the RNN activity using canonical correlation or Procrustes analyses. These would help validate the claim of similarity between RNN and neural dynamics, rather than allowing comparisons to be dominated by geometric similarities that may be features of the task. There is also an absence of alternate models to compare the perturbation model results to.

      Thank you for these helpful suggestions. We will perform decoding analysis on RNN units to verify whether sensory and motor variables interact as in the real data, and we will also carry out canonical correlation or Procrustes analysis.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Zhang et al. examine neural activity in the motor cortex as monkeys make reaches in a novel target interception task. Zhang et al. begin by examining the single neuron tuning properties across different moving target conditions, finding several classes of neurons: those that shift their preferred direction, those that change their modulation gain, and those that shift their baseline firing rates. The authors go on to find an interesting, tilted ring structure of the neural population activity, depending on the target speed, and find that (1) the reach direction has consistent positioning around the ring, and (2) the tilt of the ring is highly predictive of the target movement speed. The authors then model the neural activity with a single neuron representational model and a recurrent neural network model, concluding that this population structure requires a mixture of the three types of single neurons described at the beginning of the manuscript.

      Strengths:

      I find the task the authors present here to be novel and exciting. It slots nicely into an overall trend to break away from a simple reach-to-static-target task to better characterize the breadth of how the motor cortex generates movements. I also appreciate the movement from single neuron characterization to population activity exploration, which generally serves to anchor the results and make them concrete. Further, the orbital ring structure of population activity is fascinating, and the modeling work at the end serves as a useful baseline control to see how it might arise.

      Thank you for recognizing our work.

      Weaknesses:

      While I find the behavioral task presented here to be excitingly novel, I find the presented analyses and results to be far less interesting than they could be. Key to this, I think, is that the authors are examining this task and related neural activity primarily with a single-neuron representational lens. This would be fine as an initial analysis since the population activity is of course composed of individual neurons, but the field seems to have largely moved towards a more abstract "computation through dynamics" framework that has, in the last several years, provided much more understanding of motor control than the representational framework has. As the manuscript stands now, I'm not entirely sure what interpretation to take away from the representational conclusions the authors made (i.e. the fact that the orbital population geometry arises from a mixture of different tuning types). As such, by the end of the manuscript, I'm not sure I understand any better how the motor cortex or its neural geometry might be contributing to the execution of this novel task.

      The present study shows the sensory modulation of motor tuning in single units and of the neural state during the motor execution period. It’s a pity that the findings were constrained to certain time windows. We are still working on this topic, and hopefully will address related questions in our follow-up studies.

      Main Comments:

      My main suggestions to the authors revolve around bringing in the computation through dynamics framework to strengthen their population results. The authors cite the Vyas et al. review paper on the subject, so I believe they are aware of this framework. I have three suggestions for improving or adding to the population results:

      (1) Examination of delay period activity: one of the most interesting aspects of the task was the fact that the monkey had a random-length delay period before he could move to intercept the target. Presumably, the monkey had to prepare to intercept at any time between 400 and 800 ms, which means that there may be some interesting preparatory activity dynamics during this period. For example, after 400ms, does the preparatory activity rotate with the target such that once the go cue happens, the correct interception can be executed? There is some analysis of the delay period population activity in the supplement, but it doesn't quite get at the question of how the interception movement is prepared. This is perhaps the most interesting question that can be asked with this experiment, and it's one that I think may be quite novel for the field--it is a shame that it isn't discussed.

      Great idea! We are working on this and are close to completing the puzzle.

      (2) Supervised examination of population structure via potent and null spaces: simply examining the first three principal components revealed an orbital structure, with a seemingly conserved motor output space and a dimension orthogonal to it that relates to the visual input. However, the authors don't push this insight any further. One way to do that would be to find the "potent space" of motor cortical activity by regression to the arm movement and examine how the tilted rings look in that space (this is actually fairly easy to see in the reach direction components of the dPCA plot in the supplement--the rings will be highly aligned in this space). Presumably, then, the null space should contain information about the target movement. dPCA shows that there's not a single dimension that clearly delineates target speed, but the ring tilt is likely evident if the authors look at the highest variance neural dimension orthogonal to the potent space (the "null space")--this is akin to PC3 in the current figures, but it would be nice to see what comes out when you look in the data for it.

      Nice suggestion. Target-speed modulation mainly influences PC3, which is consistent with the ‘null space’ hypothesis. We will try other methods of dimensionality reduction (e.g. dPCA, Manopt) to determine the potent and null spaces.
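
      The potent/null decomposition discussed here can be illustrated with a minimal sketch. It assumes a single, known readout direction (a simplification; in practice the potent space would be multi-dimensional and estimated by regressing hand kinematics on neural activity). The function names and the toy vectors are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def split_potent_null(activity, readout):
    """Split a population vector into output-potent and output-null parts.

    The potent part is the orthogonal projection onto the readout direction;
    the null part is the remainder, which by construction does not change
    the (assumed linear) readout.
    """
    scale = dot(activity, readout) / dot(readout, readout)
    potent = [scale * w for w in readout]
    null = [x - p for x, p in zip(activity, potent)]
    return potent, null


# Toy example: three 'neurons', readout along the first axis only.
activity = [3.0, 4.0, 5.0]
readout = [1.0, 0.0, 0.0]
potent, null = split_potent_null(activity, readout)
print(potent, null)
```

      In this picture, a target-speed signal that only moves the state within the null part would leave the readout unchanged, which is the sense in which the ring tilt could coexist with a preserved motor output.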

      (3) RNN perturbations: as it's currently written, the RNN modeling has promise, but the perturbations performed don't provide me with much insight. I think this is because the authors are trying to use the RNN to interpret the single neuron tuning, but it's unclear to me what was learned from perturbing the connectivity between what seems to me almost arbitrary groups of neurons (especially considering that 43% of nodes were unclassifiable). It seems to me that a better perturbation might be to move the neural state before the movement onset to see how it changes the output. For example, the authors could move the neural state from one tilted ring to another to see if the virtual hand then reaches a completely different (yet predictable) target. Moreover, if the authors can more clearly characterize the preparatory movement, perhaps perturbations in the delay period would provide even more insight into how the interception might be prepared.

      We are sorry that we didn’t clarify the definition of the “none” type, which can be misleading. The 43% unclassified nodes include inactive ones; when only active (task-related) nodes are included, the ratio of unclassified nodes is much lower. By perturbing the connectivity, we intended to explore the interaction between different modulations.

      Thank you for the great advice. We tried moving neural states from one ring to another without changing the directional cluster, but this perturbation didn’t have a significant influence on network performance as expected. We will check this result again and try perturbations in the delay period.

      Reviewer #3 (Public Review):

      Summary:

      This experimental study investigates the influence of sensory information on neural population activity in M1 during a delayed reaching task. In the experiment, monkeys are trained to perform a delayed interception reach task, in which the goal is to intercept a potentially moving target.

      This paradigm allows the authors to investigate how, given a fixed reach endpoint (which is assumed to correspond to a fixed motor output), the sensory information regarding the target motion is encoded in neural activity.

      At the level of single neurons, the authors found that target motion modulates the activity in three main ways: gain modulation (scaling of the neural activity depending on the target direction), shift (shift of the preferred direction of neurons tuned to reach direction), or addition (offset to the neural activity).

      At the level of the neural population, target motion information was largely encoded along the 3rd PC of the neural activity, leading to a tilt of the manifold along which reach direction was encoded that was proportional to the target speed. The tilt of the neural manifold was found to be largely driven by the variation of activity of the population of gain-modulated neurons.

      Finally, the authors studied the behaviour of an RNN trained to generate the correct hand velocity given the sensory input and reach direction. The RNN units were found to similarly exhibit mixed selectivity to the sensory information, and the geometry of the « neural population » resembled that observed in the monkeys.

      Strengths:

      - The experiment is well set up to address the question of how sensory information that is directly relevant to the behaviour but does not lead to a direct change in behavioural output modulates motor cortical activity.

      - The finding that sensory information modulates the neural activity in M1 during motor preparation and execution is non trivial, given that this modulation of the activity must occur in the nullspace of the movement.

      - The paper gives a complete picture of the effect of the target motion on neural activity, by including analyses at the single neuron level as well as at the population level. Additionally, the authors link those two levels of representation by highlighting how gain modulation contributes to shaping the population representation.

      Thanks for your recognition.

      Weaknesses:

      - One of the main premises of the paper is the fact that the motor output for a given reach point is preserved across different target motions. However, as the authors briefly mention in the conclusion, they did not record muscle activity during the task, but only hand velocity, making it impossible to directly verify how preserved muscle patterns were across movements. While the authors highlight that they did not see any difference in their results when resampling the data to control for similar hand velocities across conditions, this seems like an important potential caveat of the paper whose implications should be discussed further or highlighted earlier in the paper.

      Thanks for the suggestion. We will highlight the resampling results as an important control in the next version of the manuscript.

      - The main takeaway of the RNN analysis is not fully clear. The authors find that an RNN trained with a sensory input representing a moving target displays modulation to target motion that resembles what is seen in real data. This is interesting, but the authors do not dissect why this representation arises, and how robust it is to various task design choices. For instance, it appears that the network should be able to solve the task using only the motion intention input, which contains the reach endpoint information. If the target motion input is not used for the task, it is not obvious why the RNN units would be modulated by this input (especially as this modulation must lie in the nullspace of the movement hand velocity if the velocity depends only on the reach endpoint). It would thus be important to see alternative models compared to true neural activity, in addition to the model currently included in the paper. Besides, for the model in the paper, it would be interesting to study further how the details of the network setup (e.g., initial spectral radius of the connectivity, weight regularization, or using only the target position input) affect the modulation by the motion input, as well as the trained population geometry and the relative ratios of modulated cells after training.

      Great suggestions. Regrettably, the current version does not dissect why this representation forms or which factors influence it. We previously tried several combinations of inputs: a network that received only the motor intention and GO inputs produced rings but no tilt related to target speed, whereas a network that received only the target location and GO inputs produced ring-like structures but no clear directional clusters. We will check these results and try alternative models in the next version. In future studies, we will examine the influence of network setup details.

      - Additionally, it is unclear what insights are gained from the perturbations to the network connectivity the authors perform, as it is generally expected that modulating the connectivity will degrade task performance and the geometry of the responses. If the authors wish to make claims about the role of the subpopulations, it could be interesting to test whether similar connectivity patterns develop in networks that are not initialized with an all-to-all random connectivity or to use ablation experiments to investigate whether the presence of multiple types of modulations confers any sort of robustness to the network.

      Thank you for the great suggestions. With the perturbations, we intended to explore the contribution of the interactions between certain subpopulations. We tried ablation experiments, but the results were not significant. This is probably because most units had mixed selectivity, so either the units showing only a single type of modulation were too few for bootstrapping, or random samples drawn from a single subpopulation (bearing mixed selectivity) could be repeated. We will consider these suggestions carefully in the revised version.

      - The results suggest that the observed changes in motor cortical activity with target velocity result from M1 activity receiving an input that encodes the velocity information. This also appears to be the assumption in the RNN model. However, even though the input shown to the animal during preparation is indeed a continuously moving target, it appears that the only relevant quantity to the actual movement is the final endpoint of the reach. While this would have to be a function of the target velocity, one could imagine that the computation of where the monkeys should reach might be performed upstream of the motor cortex, in which case the actual target velocity would become irrelevant to the final motor output. This makes the results of the paper very interesting, but it would be nice if the authors could discuss further when one might expect to see modulation by sensory information that does not directly affect motor output in M1, and where those inputs may come from. It may also be interesting to discuss how the findings relate to previous work that has found behaviourally irrelevant information is being filtered out from M1 (for instance, Russo et al, Neuron 2020 found that in monkeys performing a cycling task, context can be decoded from SMA but not from M1, and Wang et al, Nature Communications 2019 found that perceptual information could not be decoded from PMd)?

      How and where sensory information modulates M1 are very interesting and open questions. We will discuss this topic further in the next version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Semenova et al. have studied a large cross-sectional cohort of people living with HIV on suppressive ART, N=115, and performed high dimensional flow cytometry to then search for associations between immunological and clinical parameters and intact/total HIV DNA levels.

      A number of interesting data science/ML approaches were explored on the data and the project seems a serious undertaking. However, like many other studies that have looked for these kinds of associations, there was not a very strong signal. Of course, the goal of unsupervised learning is to find new hypotheses that aren't obvious to human eyes, but I felt in that context, there were (1) results slightly oversold, (2) some questions about methodology in terms mostly of reservoir levels, and (3) results were not sufficiently translated back into meaning in terms of clinical outcomes.

      We appreciate the reviewer’s perspective.  In our revised version of the manuscript, we have attempted to address these concerns by more adequately explaining the limitations of the study and by more thoroughly discussing the context of the findings.  We are not able to associate the findings with specific clinical outcomes for individual study participants but we speculate about the overall biological meaning of these associations across the cohort.  We cannot disagree with the reviewer, but we find the associations statistically significant, potentially reflecting real biological associations, and forming the basis for future hypothesis testing research. 

      Strengths:

      The study is evidently a large and impressive undertaking and combines many cutting-edge statistical techniques with a comprehensive experimental cohort of people living with HIV, notably inclusive of populations underrepresented in HIV science. A number of intriguing hypotheses are put forward that could be explored further. Sharing the data could create a useful repository for more specific analyses.

      We thank the reviewer for this assessment.

      Weaknesses:

      Despite the detailed experiments and methods, there was not a very strong signal for the variable(s) predicting HIV reservoir size. The Spearman coefficients are ~0.3, (somewhat weak, and acknowledged as such) and predictive models reach 70-80% prediction levels, though sometimes categorical variables are challenging to interpret.

      We agree with the reviewer that individual parameters are only weakly correlated with the HIV reservoir, likely reflecting the complex and multi-factorial nature of reservoir/immune cell interactions.  Nevertheless, these associations are statistically significant and form the basis for functional testing in viral persistence.
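      For readers less familiar with the Spearman coefficients discussed above, the statistic is the Pearson correlation computed on ranks. The following is a minimal standard-library sketch under that definition; it is not the authors' code, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

      Because the coefficient depends only on ranks, it is unchanged by monotonic transformations such as log10. Squaring a coefficient of about 0.3 gives about 0.09, which is one way to see why such associations are described as weak.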

      There are some questions about methodology, as well as some conclusions that are not completely supported by results, or at minimum not sufficiently contextualized in terms of clinical significance.  On associations: the false discovery rate correction was set at 5%, but the data appear underdetermined, with fewer observations than variables (144 variables > 115 participants), and it isn't always clear if/when variables are related (e.g., inverses of one another, for instance, %CD4 and %CD8).

      When deriving a list of cell populations whose frequency would be correlated with the reservoir, we focused on well-defined cell types for which functional validation exists in the literature to consider them as distinct cell types.  For many of the populations, gating based on combinations of multiple markers leads to recovery of very few cells, and so we excluded some potential combinations from the analysis.  We are also making our raw data available for others to examine and find associations not considered by our manuscript.
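      The 5% false discovery rate correction mentioned in the comment above is not specified further in this exchange. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure (an assumption, not a confirmed detail of the study), the correction could look like this; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff (step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected
```

      With many correlated variables (for instance, frequencies that are near-inverses of one another), the tests are not independent, which is part of the reviewer's concern.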

      The modeling of reservoir size was unusual: typically, intact and defective HIV DNA are analyzed on a log10 scale (both for decays and predicting rebound). Also, sometimes in this analysis levels are normalized (presumably to max/min?, e.g. S5), and given the large within-host variation in levels we see in other works, it is not trivial to predict any downstream impact of normalization across population vs within-person.

      We have repeated the analysis using log10 transformed data and the new figures are shown in Figure 1 and S2-S5.

      Also, the qualitative characterization of low/high reservoir is not standard and naturally will split by early/later ART if done as above/below median. Given the continuous nature of these data, it seems throughout that predicting above/below median is a little hard to translate into clinical meaning.

      Our ML models included time before ART as a variable in the analysis, and this was not found to be a significant driver of the reservoir size associations, except for the percentage of intact proviruses (see Figure 2C). Furthermore, we analyzed whether any of the reservoir correlated immune variables were associated with time on ART and found that, although some immune variables are associated with time on therapy, this was not the case for most of them (Table S4). We agree that it is challenging to translate above or below median into clinical meaning for this cohort, but we emphasize that this study is primarily a hypothesis generating approach requiring additional validation for the associations observed.  We attempted to predict reservoir size as a continuous variable using the data and this approach was not successful (Figure S13). We believe that a significantly larger cohort will likely be required to generate a ML model that can accurately predict the reservoir as a continuous variable.  We have added additional discussion of this to the manuscript.

      Lastly, the work is comprehensive and appears solid, but the code was not shared to see how calculations were performed.

      We now provide a link to the code used to perform the analyses in the manuscript, https://github.com/lesiasemenova/ML_HIV_reservoir.

      Reviewer #2 (Public Review):

      Summary:

      Semenova et al. performed a cross-sectional analysis of host immunophenotypes (using flow cytometry) and the peripheral CD4+ T cell HIV reservoir size (using the Intact Proviral DNA Assay, IPDA) from 115 people with HIV (PWH) on ART. The study mostly highlights the machine learning methods applied to these host and viral reservoir datasets but fails to interpret these complex analyses into (clinically, biologically) interpretable findings. For these reasons, the direct translational take-home message from this work is lost amidst a large list of findings (shown as clusters of associated markers) and sentences such as "this study highlights the utility of machine learning approaches to identify otherwise imperceptible global patterns" - lead to overinterpretation of their data.

      We have addressed the reviewer’s concern by modifications to the manuscript that enhance the interpretation of the findings in a clinical and biological context.

      Strengths:

      Measurement of host immunophenotyping measures (multiparameter flow cytometry) and peripheral HIV reservoir size (IPDA) from 115 PWH on ART.

      Major Weaknesses:

      (1) Overall, there is little to no interpretability of their machine learning analyses; findings appear as a "laundry list" of parameters with no interpretation of the estimated effect size and directionality of the observed associations. For example, Figure 2 could give an interpretation of the form: for each X increase in an immunophenotyping parameter, we saw a Y increase/decrease in the HIV reservoir measure.

      We have added additional text to the manuscript in which we attempt to provide more immunological and clinical interpretation of the associations.  We also have emphasized that these associations are still speculative and will require additional validation.  Nevertheless, our data should provide a rich source of new hypotheses regarding immune system/reservoir interaction that could be tested in future work.

      (2) The correlations all appear to be relatively weak, with most Spearman R in the 0.30 range or so.

      We agree with the reviewer that the associations are mostly weak, consistent with previous studies in this area.  This likely is an inherent feature of the underlying biology – the reservoir is likely associated with the immune system in complex ways and involves stochastic processes that will limit the predictability of reservoir size using any single immune parameter. We have added additional text to the manuscript to make this point clearer.

      (3) The Discussion needs further work to help guide the reader. The sentence: "The correlative results from this present study corroborate many of these studies, and provide additional insights" is broad. The authors should spend some time here to clearly describe the prior literature (e.g., describe the strength and direction of the association observed in prior work linking PD-1 and HIV reservoir size, as well as specify which type of HIV reservoir measures were analyzed in these earlier studies, etc.) and how the current findings add to or are in contrast to those prior findings.

      We have added additional text to the manuscript to help guide the readers through the possible biological significance of the findings and the context with respect to prior literature.

      (4) The most interesting finding is buried on page 12 in the Discussion: "Uniquely, however, CD127 expression on CD4 T cells was significantly inversely associated with intact reservoir frequency." The authors should highlight this in the abstract, and title, and move this up in the Discussion. The paper describes a very high dimensional analysis and the key takeaways are not clear; the more the author can point the reader to the take-home points, the better their findings can have translatability to future follow-up mechanistic and/or validation studies.

      We appreciate the reviewer’s comment.  We have increased the emphasis on this finding in the revised version of the manuscript.

      (5) The authors should avoid overinterpretation of these results. For example in the Discussion on page 13 "The existence of two distinct clusters of PWH with different immune features and reservoir characteristics could have important implications for HIV cure strategies - these two groups may respond differently to a given approach, and cluster membership may need to be considered to optimize a given strategy." It is highly unlikely that future studies will be performing the breadth of parameters resulting here and then use these directly for optimizing therapy.

      Our analyses indicate that membership of study participants in cluster 1 or cluster 2 can be fairly accurately determined by a small number of individual parameters (KLRG1, etc.; Figure 4F), and measuring the cells of PWH with the degree of breadth used in this paper would not be necessary to classify PWH into these clusters.  As such, we feel that it is not unrealistic to speculate that this finding could turn out to be clinically useful, if it becomes clear that the clusters are biologically meaningful.

      (6) There are only TWO limitations listed here: cross-sectional study design and the use of peripheral blood samples. (The subsequent paragraph notes an additional weakness which is misclassification of intact sequences by IPDA). This is a very limited discussion and highlights the need to more critically evaluate their study for potential weaknesses.

      We have expanded on the list of limitations discussed in the manuscript. In particular, we now address the size of the cohort, the composition with respect to different genders and demographics, lack of information for the timing of ART and the lack of information regarding intracellular transcriptional pathways.

      (7) A major clinical predictor of HIV reservoir size and decay is the timing of ART initiation. The authors should include these (as well as other clinical covariate data - see #12 below) in their analyses and/or describe as limitations of their study.

      All of the participants that make up our cohort were treated during chronic infection, and the precise timing of ART initiation is unclear in most of these cases.  We have added additional information to explain this in the manuscript and include this in the list of limitations.

      Reviewer #3 (Public Review):

      Summary:

      This valuable study by Semenova and colleagues describes a large cross-sectional cohort of 115 individuals on ART. Participants contributed a single blood sample which underwent IPDA, and 25-color flow with various markers (pre and post-stimulation). The authors then used clustering, decision tree analyses, and machine learning to look for correlations between these immunophenotypic markers and several measures of HIV reservoir volume. They identified two distinct clusters that can be somewhat differentiated based on total HIV DNA level, intact HIV DNA level, and multiple T cell cellular markers of activation and exhaustion.

      The conclusions of the paper are supported by the data but the relationships between independent and dependent variables in the models are correlative with no mechanistic work to determine causality. It is unclear in most cases whether confounding variables could explain these correlations. If there is causality, then the data is not sufficient to infer directionality (i.e., does the immune environment impact the HIV reservoir or vice versa or both?). In addition, even with sophisticated and appropriate machine learning approaches, the models are not terribly predictive or highly correlated. For these reasons, the study is very much hypothesis-generating and will not impact cure strategies or HIV reservoir measurement strategies in the short term.

      We appreciate the reviewer’s comments regarding the value of our study.  We fully acknowledge that the causal nature and directionality of these associations are not yet clear and agree that the study is primarily hypothesis generating in nature.  Nevertheless, we feel that the hypotheses generated will be valuable to the field.  We have added additional text to the manuscript to emphasize the hypothesis generating nature of this paper.

      Strengths:

      The study cohort is large and diverse in terms of key input variables such as age, gender, and duration of ART. Selection of immune assays is appropriate. The authors used a wide array of bioinformatic approaches to examine correlations in the data. The paper was generally well-written and appropriately referenced.

      Weaknesses:

      (1) The major limitation of this work is that it is highly exploratory and not hypothesis-driven. While some interesting correlations are identified, these are clearly hypothesis-generating based on the observational study design.

      We agree that the major goal of this study was hypothesis generating and that our work is exploratory in nature. Performing experiments with mechanism testing goals in human participants with HIV is challenging.  Additionally, before such mechanistic studies can be undertaken, one must have hypotheses to test. As such we feel our study will be useful for the field in helping to identify hypotheses that could potentially be tested.

      (2) The study's cross-sectional nature limits the ability to make mechanistic inferences about reservoir persistence. For instance, it would be very interesting to know whether the reservoir cluster is a feature of an individual throughout ART, or whether this outcome is dynamic over time.

      We agree with the reviewer’s comment. Longitudinal studies are challenging to carry out with a study cohort of this size, and addressing questions such as the one raised by the reviewer would be of great interest. We believe our study nevertheless has value in identifying hypotheses that could be tested in a longitudinal study.

      (3) A fundamental issue is that I am concerned that binarizing the 3 reservoir metrics in a 50/50 fashion is for statistical convenience. First, by converting a continuous outcome into a simple binary outcome, the authors lose significant amounts of quantitative information. Second, the low and high reservoir outcomes are not actually demonstrated to be clinically meaningful: I presume that both contain many (?all) data points above levels where rebound would be expected soon after interruption of ART. Reservoir levels would also have no apparent outcome on the selection of cure approaches. Overall, dividing at the median seems biologically arbitrary to me.

      The reviewer raises a valid point that the clinical significance of above or below median reservoir metrics is unclear, and that the size of the reservoir has potentially little relation to rebound and cure approaches.  In the manuscript, we attempted to generate models that can predict reservoir size as a continuous variable in Figure S13 and find that this approach performs poorly, while a binarized approach was more successful. As such we have included both approaches in the manuscript.  It is possible that future studies with larger sample sizes and more detailed measurements will perform better for continuous variable prediction.  While this is a fairly large study (n=115) by the standards of HIV reservoir analyses, it is a small study by the standards of the machine learning field, and accurate predictive ML models for reservoir size as a continuous variable will likely require a much larger set of samples/participants.  Nevertheless, we feel our work has value as a template for ML approaches that may be informative for understanding HIV/immune interactions and generates novel hypotheses that could be validated by subsequent studies.
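      The information lost by dividing a continuous reservoir measure at the median, as raised in the comment above, can be seen in a toy example. This is a minimal sketch with made-up numbers, not data from the study; the function name and the values are hypothetical.

```python
import math
import statistics

def median_split(values):
    """Label each value 'high' if it is above the cohort median, otherwise 'low'."""
    cutoff = statistics.median(values)
    return ["high" if v > cutoff else "low" for v in values]

# Made-up reservoir measurements (copies per million cells), for illustration only.
copies = [5, 48, 52, 4000]
labels = median_split(copies)

# The same values on the log10 scale typically used for reservoir measures.
log10_copies = [math.log10(c) for c in copies]

for c, lg, lab in zip(copies, log10_copies, labels):
    print(f"{c:>5} copies  log10={lg:.2f}  label={lab}")
```

      Here 48 and 52 receive different labels although they are nearly identical, while 52 and 4000 share a label despite differing by almost two orders of magnitude.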

      (4) The two reservoir clusters are of potential interest as high total and intact with low % intact are discriminated somewhat by immune activation and exhaustion. This was the most interesting finding to me, but it is difficult to know whether this clustering is due to age, time on ART, other co-morbidity, ART adherence, or other possible unmeasured confounding variables.

      We agree that this finding is one of the more interesting outcomes of the study. We examined a number of these variables for association with cluster membership, and these data are reported in Figure S8A-D.  Age, years of ART and CD4 nadir were all clearly different between the clusters.   The striking feature of this clustering, however, is the clear separation between the two groups of participants, as opposed to a continuous gradient of phenotypes.  This could reflect a bifurcation of outcomes for people with HIV, dynamic changes in the reservoir/immune interactions over time, or different levels of untreated infection.  It is certainly possible that some other unmeasured confounding variables contribute to this outcome and we have attempted to make this limitation clearer.

      (5) At the individual level, there is substantial overlap between clusters according to total, intact, and % intact between the clusters. Therefore, the claim in the discussion that these 2 cluster phenotypes may require different therapeutic approaches seems rather speculative. That said, the discussion is very thoughtful about how these 2 clusters may develop with consideration of the initial insult of untreated infection and / or differences in immune recovery.

      We agree with the reviewer that this claim is speculative, and we have attempted to moderate the language of the text in the revised version.

      (6) The authors state that the machine learning algorithms allow for reasonable prediction of reservoir volume. It is subjective, but to me, 70% accuracy is very low. This is not a disappointing finding per se. The authors did their best with the available data. It is informative that the machine learning algorithms cannot reliably discriminate reservoir volume despite substantial amounts of input data. This implies that either key explanatory variables were not included in the models (such as viral genotype, host immune phenotype, and comorbidities) or that the outcome for testing the models is not meaningful (which may be possible with an arbitrary 50/50 split in the data relative to median HIV DNA volumes: see above).

      We acknowledge that the predictive power of the models generated from these data is modest and we have clarified this point in the revised manuscript. As the reviewer indicates, this may result from the influence of unmeasured variables and possible stochastic processes.  The data may thus demonstrate a limit to the predictability of reservoir size which may be inherent to the underlying biology.  As we mention above, this study size (n=115) is fairly small for the application of ML methods, and an increased sample size will likely improve the accuracy of the models. At this stage, the models we describe are not yet useful as predictive clinical tools, but are nonetheless useful as tools to describe the structure of the data and identify reservoir-associated immune cell types.

      (7) The decision tree is innovative and a useful addition, but does not provide enough discriminatory information to imply causality, mechanism, or directionality in terms of whether the immune phenotype is impacting the reservoir or vice versa or both. Tree accuracy of 80% is marginal for a decision tool.

      The reviewer is correct about these points.  In the revised manuscript, we have attempted to make it clear that we are not yet advocating using this approach as a decision tool, but simply a way to visualize the data and understand the structure of the dataset.  As we discuss above, the models will likely need to be trained on a larger dataset and achieve higher accuracy before use as a decision tool.

      (8) Figure 2: this is not a weakness of the analysis but I have a question about interpretation. If total HIV DNA is more predictive of immune phenotype than intact HIV DNA, does this potentially implicate a prior high burden of viral replication (high viral load &/or more prolonged time off ART) rather than ongoing reservoir stimulation as a contributor to immune phenotype? A similar thought could be applied to the fact that clustering could only be detected when applied to total HIV DNA-associated features. Many investigators do not consider defective HIV DNA to be "part of the reservoir" so it is interesting to speculate why these defective viruses appear to have more correlation with immunophenotype than intact viruses.

      We agree with the reviewer that this observation could reflect prior viral burden and we have added additional text to make this clearer.  Even so, we cannot rule out a model in which defective viral DNA is engaged in ongoing stimulation of the immune system during ART, leading to the stronger association between total DNA and the immune cell phenotypes. We hypothesize that the defective proviruses could potentially be triggering innate immune pattern recognition receptors via viral RNA or DNA, and a higher burden of the total reservoir leads to a stronger apparent association with the immune phenotype.  We have included text in the discussion about this hypothesis.

      (9) Overall, the authors need to do an even more careful job of emphasizing that these are all just correlations. For instance, HIV DNA cannot be proven to have a causal effect on the immunophenotype of the host with this study design. Similarly, immunophenotype may be affecting HIV DNA or the correlations between the two variables could be entirely due to a separate confounding variable.

      We have revised the text of the manuscript to emphasize this point, and we acknowledge that any causal relationships are, at this point, simply speculation. 

      (10) In general, in the intro, when the authors refer to the immune system, they do not consistently differentiate whether they are referring to the anti-HIV immune response, the reservoir itself, or both. More specifically, the sentence in the introduction listing various causes of immune activation should have citations. (To my knowledge, there is no study to date that definitively links proviral expression from reservoir cells in vivo to immune activation as it is next to impossible to remove the confounding possible imprint of previous HIV replication.) Similarly, it is worth mentioning that the depletion of intact proviruses is quite slow such that proviral expression can only be stimulating the immune system at a low level. Similarly, the statement "Viral protein expression during therapy likely maintains antigen-specific cells of the adaptive immune system" seems hard to dissociate from the persistence of immune cells that were reactive to viremia.

      We updated the text of the manuscript to address these points and have added additional citations as per the reviewer’s suggestion.

      (11) Given the many limitations of the study design and the inability of the models to discriminate reservoir volume and phenotype, the limitations section of the discussion seems rather brief.

      We have now expanded the limitations section of the discussion and added additional considerations. We now include a discussion of the study cohort size, composition and the detail provided by the assays.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A few specific comments:

      "This pattern is likely indicative of a more profound association of total HIV DNA with host immunophenotype relative to intact HIV DNA."

      Most studies I have seen (e.g. single cell from Lichterfeld/Yu group) show intact proviruses are generally more activated/detectable/susceptible to immune selection, so I have a hard time thinking defective proviruses are actually more affected by immunotype.

      We hypothesize that this association is actually occurring in the opposite direction – that the defective proviruses are having a greater impact on the immune phenotype, due to their greater number and potential ability to engage innate or adaptive immune receptors. We have clarified this point in the manuscript.

      "The existence of two distinct clusters of PWH with different immune features and reservoir characteristics could have important implications for HIV cure strategies - these two groups may respond differently to a given approach, and cluster membership may need to be considered to optimize a given strategy."

      I find this a bit of a reach, given that the definition of 2 categories depended on the total size.

      We have modified the language of this section to reduce the level of speculation.

      "This study is cross-sectional in nature and is primarily observational, so caution should be used interpreting findings associated with time on therapy".

      I found this an interesting statement because ultimately time on ART shows up throughout the analysis as a significant predictor, do you mean something about how time on ART could indicate other confounding variables like ART regimen or something?

      We have rephrased this comment to avoid confusion.  We were simply trying to make the point that we should avoid speculating about longitudinal dynamics from cross-sectional data.

      "As expected, the plots showed no significant correlation for intact HIV DNA versus years of ART (Figure 1B), while total reservoir size was positively correlated with the time of ART (Figure 1A, Spearman r = 0.31)."

      Is this expected? Studies with longitudinal data almost uniformly show intact decay, at least for the first 10 or so years of ART, and defective/total stability (or slight decay). Also probably "time on ART" to not confuse with the duration of infection before ART.

      We have updated the language of this section to address this comment.  We have avoided comparing our data with respect to time on ART to longitudinal studies for reasons given above.
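
      For readers less familiar with the statistic quoted above, the following is a minimal sketch of how a Spearman rank correlation between time on ART and total HIV DNA could be computed. It is not the authors' analysis code: the function names (`average_ranks`, `pearson`, `spearman_r`) and the numeric values are hypothetical and chosen only for illustration. Spearman's r is taken here as the Pearson correlation of the ranked values, with tied values sharing their average rank.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical illustrative values, NOT data from the study.
years_on_art = [1, 2, 4, 6, 9, 12, 15, 20]
total_hiv_dna_per_million_cd4 = [300, 150, 420, 380, 900, 610, 1200, 1100]

rho = spearman_r(years_on_art, total_hiv_dna_per_million_cd4)
print(f"Spearman r = {rho:.2f}")
```

      Because the statistic depends only on ranks, it is unchanged by a log transform of the reservoir measurements, which is one reason it suits cross-sectional reservoir data with skewed distributions.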

      On dimensionality reduction, as this PaCMAP seems a relatively new technique (vs tSNE and UMAP which are more standard, but absolutely have their weaknesses), it does seem important to contextualize. I think it would still be useful to show PCA and assess the % variance of each additional dimension to assess the effective dimensionality. It would be helpful to show a plot of % variance by # components to see if there is a cutoff somewhere, and if PaCMAP is really picking this up to determine the 2 dimensions/2 clusters is ideal. Figure 4B ultimately shows a lot of low/high across those clusters, and since low/high is defined categorically it's hard to know which of those dots are very close to the other categories.

      We have added this analysis to the manuscript – found in Figure S9. The PCA plot indicates that members of the two clusters also separate on PCA although this separation is not as clear as for the PaCMAP plot.
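
      The "% variance by # components" check the reviewer asks for can be sketched as follows. This is an illustrative example under stated assumptions, not the analysis behind Figure S9: the function name `explained_variance_ratio`, the synthetic data, the sample size, and the feature count are all hypothetical. The sketch standardizes each feature, takes the singular values of the standardized matrix, and reports the share of variance carried by each principal component.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component.

    Features are centered and scaled to unit variance first, so that
    variables measured on different scales contribute comparably.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = centered.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant features unscaled to avoid dividing by zero
    standardized = centered / sd
    # Squared singular values are proportional to the component variances.
    singular_values = np.linalg.svd(standardized, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Synthetic stand-in for an immunophenotype matrix (participants x features).
# Hypothetical: two latent factors plus noise, NOT data from the study.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 20))
features = latent @ loadings + 0.3 * rng.normal(size=(100, 20))

ratios = explained_variance_ratio(features)
cumulative = np.cumsum(ratios)
for k, (r, c) in enumerate(zip(ratios[:5], cumulative[:5]), start=1):
    print(f"PC{k}: {100 * r:.1f}% of variance, {100 * c:.1f}% cumulative")
```

      A sharp drop in the per-component share after the first few components (an "elbow") would suggest a low effective dimensionality; a gradual decline would suggest that a two-dimensional embedding discards appreciable structure.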

      Minor comments on writing etc:

      Intro

      -Needs some references on immune activation sequelae paragraph.

      We have added some additional references to this section.

      -"promote the entry of recently infected cells into the reservoir" -- that is only one possible mechanistic explanation, it's not unreasonable but it seems important to keep options open until we have more precise data that can illuminate the mechanism of the overabundance.

      We have modified the text to discuss additional hypotheses.

      -You might also reference Pankau et al Ppath for viral seeding near the time of ART.

      We have added this reference.

      -"Viral protein expression during therapy likely maintains antigen-specific cells of the adaptive immune system" - this was unclear to me, do you mean HIV-specific cells that act against HIV during ART? I think most studies show immunity against HIV (CD8 and CD4) wanes over time during ART.

      The Goonetilleke lab has recently generated data indicating that antiviral T cell responses are remarkably stable over time on ART, but we agree with the reviewer that the idea that ongoing antigen expression in the reservoir maintains these cells is speculative.  We have modified the text to make this point clearer.

      -Overall I think the introduction lacked a little bit of definitional precision: i.e. is the reservoir intact vs replication competent vs all HIV DNA and whether we are talking about PWH on long-term ART and how long we should be imagining? The first years of ART are certainly different than later, in terms of dynamics. The ultimate implications are likely specific for some of these categorizations.

      -"persistent sequelae of the massive disruptions to T cell homeostasis and lymphoid structures that occur during untreated HIV infection" needs a lot more context/referencing. For instance, Peter Hunt showed a decrease in activation after ART a long time ago.

      -Heather Best et al show T cell clonality stays perturbed after ART.

      We have updated the text of the introduction and added references to address the reviewer’s comments.

      Results

      -It would be important to mention the race of participants and any information about expected clades of acquired viruses, this gets mentioned eventually with reference to the Table but the breakdown would be helpful right away.

      We have added this information to the results section.

      -"performed Spearman correlations", maybe calculated or tested?

      We have corrected the language for this sentence.

      Comments on figures:

      -Figure 1 data on linear scale (re discussion above) -- hard to even tell if there is a decay (to match with all we know from various long-term ART studies).

      -Figure 4 data is shown on ln (log_e) scale, which is hard to interpret for most people.

      -Figures 4 C,D, and E should have box plots to visually assess the significance.

      -Figure 4B legend says purple/pink but I think the colors are different in the plot, could be about transparency

      -Figure 5 it is now not clear if log_e(?).

      -Figure 6 "HIV reservoir characteristics" might be better to make this more explicit. Do you mean for instance in the 6B title Total HIV DNA per million CD4+ T cells I think?

      We have made these modifications.

      Reviewer #2 (Recommendations For The Authors):

      Minor Weaknesses:

      (1) The Introduction is too long and much of the text is not directly related to the study's research question and design.

      We have streamlined the introduction in the revised manuscript.

      (2) While no differences were seen by age or race, according to the authors, this is unlikely to be useful since the numbers are so small in some of these subcategories. Results from sensitivity analyses (e.g., excluding these individuals) may be more informative/useful.

      We agree that the lower numbers of participants for some subgroupings make it challenging to know for sure if there are any differences based on these variables.  We have added text to clarify this. We have added age, race and gender to the LOCO analysis and to the variable inflation importance analysis (Table S5).

      (3) For Figure 4, based on what was described in the Results section of the manuscript, the authors should clarify that the figures show results for TOTAL HIV DNA only (not intact DNA): "Dimension reduction machine learning approaches identified two robust clusters of PWH when using total HIV DNA reservoir-associated immune cell frequencies (Figure 4A), but not for intact or percentage intact HIV DNA (Figure 4B and 4C)".

      We have added this information.

      (4) The statement on page 5, first paragraph, "Interestingly, when we examined a plot of percent intact proviruses versus time on therapy (Figure 1C), we observed a biphasic decay pattern," is not new (Peluso JCI Insight 2020, Gandhi JID 2023, McMyn JCI 2023). Prior studies have clearly demonstrated this biphasic pattern and should be cited here, and the sentence should be reworded with something like "consistent with prior work", etc.

      We have added citations to these studies and rephrased this comment.

      (5) The Cohort and sample collection sections are somewhat thin. Further details on the cohort details should include at the very minimum some description of the timing of ART initiation (is this mostly a chronic-treated cohort?) and important covariate data such as nadir CD4+ T cell count, pre-ART viral load, duration of ART suppression, etc.

      The cohort was treated during chronic infection, and we have clarified this in the manuscript.  Information regarding CD4 nadir and years on ART are included in Table 1.  Unfortunately, pre-ART viral load was not available for most members of this cohort, so we did not use it for analyses. The partial pre-ART viral load data is included with the dataset we are making publicly available.

      Reviewer #3 (Recommendations For The Authors):

      Minor points:

      (1) What is meant by CD4 nadir? Is this during primary infection or the time before ART initiation?

      We have clarified this description in the manuscript.  This term refers to the lowest CD4 count recorded during untreated infection.

      (2) The authors claim that determinants of reservoir size are starting to emerge but other than the timing of ART, I am not sure what studies they are referring to.

      We have updated the language of this section.  We intended to refer to studies looking at correlates of reservoir size, and feel that this is a more appropriate term than ‘determinants’.

      (3) The discussion does not tie in the model-generated hypotheses with the known mechanisms that sustain the reservoir: clonal proliferation balanced by death and subset differentiation. It would be interesting to tie in the proposed reservoir clusters with these known mechanisms.

      We have added additional text to the manuscript to address these mechanisms.

      (4) Figure 1: Total should be listed as total HIV DNA.

      We have updated this in the manuscript.

      (5) Figure 1C: Worth mentioning the paper by Reeves et al which raises the possibility that the flattening of intact HIV DNA at 9 years may be spurious due to small levels of misclassification of defective as intact.

      We have added this reference.

      (6) "Total reservoir frequency" should be "total HIV DNA concentration"

      We respectfully feel that “frequency” is a more accurate term than “concentration”, since we are expressing the reservoir as a fraction of the CD4 T cells, while “concentration” suggests a denominator of volume.

      (7) Figure S2-5: label y-axis total HIV DNA.

      We have updated this figure.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Rebuttal_ Preprint- #RC-2023-02144

      First of all, we would like to thank the three reviewers for their constructive and positive comments and suggestions, and for the time spent reviewing our manuscript. Their suggestions and comments have helped to improve our manuscript. We feel the manuscript is much strengthened by this revision.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      __Summary:__ The manuscript by Dabsan et al builds on earlier work of the Igbaria lab, who showed that ER-luminal chaperones can be refluxed into the cytosol (ERCYS) during ER stress, which constitutes a pro-survival pathway potentially used by cancer cells. In the current work, they extend these observations and identify a role for DNAJB12&14 in ERCYS. The work is interesting and the topic is novel and of great relevance for the proteostasis community. I have a number of technical comments:

      We thank the reviewer for his/her positive comments on our manuscript.


      __Major and minor comments:__

      1- In the description of Figure 2, statistics are only shown to compare the untreated condition with those treated with Tg or Tm, but no comparison is made between conditions and different proteins. As such, the statement made by the authors "...DNAJB14-silenced cells were only affected in AGR2 but not in DNAJB11 or HYOU1 cytosolic accumulation" cannot be made.

      Answer: We totally agree with Reviewer #1. The aim of this figure is to show that during ER stress, a subset of ER proteins is refluxed to the cytosol. This happens in cells expressing DNAJB12 and DNAJB14. We are not comparing the identity of the expelled proteins between DNAJB12-KD cells and DNAJB14-KD cells. This is not within the scope of this paper, so the statement was removed.

      2- Figure S2C: D11 seems to increase in the cytosolic fraction after Tm and Tg treatment. However, this is not reflected in the text. The membrane fraction also increases in the DKO. Is the increase of D11 in both cytosol and membrane an indication of a transcriptional induction of this protein by Tm/Tg? Again, the authors are not reflecting on this in their text.

      Answer: We performed qPCR experiments in control, DNAJB12-KD, DNAJB14-KD and DNAJB12/DNAJB14 double knockdown cells (in both A549 and PC3 cells) to follow the mRNA levels of DNAJB11. As shown in Figure S2F-S2N, there is no increase in the mRNA levels of DNAJB11, AGR2 or HYOU1 in the different cells under normal (unstressed) conditions. Upon ER stress with tunicamycin or thapsigargin, there is a slight increase in the mRNA levels of HYOU1 and AGR2 but not in DNAJB11 mRNA levels. In addition, we performed western blot analysis and did not detect any difference between the different knockdown cells when we analyzed the levels of DNAJB11 compared to GAPDH. Those data are now added as Figure S2F-S2N.

      We must note that although AGR2 and HYOU1 are induced at the mRNA level as a result of ER stress, the data with the overexpression of DNAJB12 and DNAJB14 are important as control experiments because overexpressing DNAJB12 does not induce ER stress (Figure S3C-S3D). Under those conditions, there is an increase in the cytosolic accumulation of AGR2, HYOU1 and DNAJB11 even though there was no induction of AGR2, HYOU1 or DNAJB11 (Figure 3C and Figure 3E, Figure S3, Figure 4, and Figure S4). Those results argue against the idea that the reflux is a result of protein induction and an increase in total protein levels.

      3- Figure 2D: Only p21 is quantified. phospho-p53 and p53 levels are not quantified.


      Answer: We added the quantification of phospho-p53 and p53 levels to Figure 2E-G. Additional blots of p21, phospho-p53 and p53 are now added to Figure S2O.

      4- Figure 2D: There appears to be a labelling error

      Answer: Yes, the labelling error was corrected.

      5- Are there conditions where DNAJB12 would be higher?

      Answer: In some cancer types, higher DNAJB12, DNAJB14 and SGTA expression levels are associated with poor prognosis and reduced survival (new Figure S6E-M). The following was added to the manuscript: “Finally, we tested the effect of DNAJB12, DNAJB14, and SGTA expression levels on the survival of cancer patients. A high copy number of DNAJB12 is an unfavorable marker in colorectal cancer and in head and neck cancer because it is associated with poor prognosis in those patients (Figure S6E). A high copy number of DNAJB12, DNAJB14, and SGTA is associated with poor prognosis in many other cancer types, including colon adenocarcinoma (COAD), acute myeloid leukemia (LAML), adrenocortical carcinoma (ACC), mesothelioma (MESO), and pheochromocytoma and paraganglioma (PCPG) (Figure S6F-M). In uveal melanoma (UVM), a high copy number of the three tested genes, DNAJB12, DNAJB14, and SGTA, is associated with poor prognosis and poor survival (Figure S6I, S6J, and S6M). The high copy number of DNAJB12, DNAJB14, and SGTA is also associated with poor prognosis in many other cancer types but with low significance scores. More data are needed to establish significant differences (TCGA database). We suggest that the high expression of DNAJB12/14 and SGTA in those cancer types may account for the poor prognosis by inducing ERCYS and inhibiting pro-apoptotic signaling, increasing cancer cells' fitness.”

      6- What do the authors mean by "just by mass action"?

      Answer: Mass action means increasing the amount of the protein (overexpression). We corrected this in the main text to overexpression.

      7- Figure 3C: Should be labelled to indicate membrane and cytosolic fraction. The AGR2 blot in the left part is not publication quality and should be replaced.

      Answer: We added labelling to indicate the cytosolic and membrane fractions in Figure 3C. We re-blotted AGR2, and a new AGR2 blot was added.

      8- What could be the reason for the fact that DNAJB12 is necessary and sufficient for ERCYS, while DNAJB14 is only necessary?

      Answer: Because of their very high homology, we speculate that the two proteins have partial redundancy. It is partial because we believe that some of the roles of DNAJB12 cannot be carried out by DNAJB14 in its absence. Although they are highly homologous, we expect that they probably have different affinities in recruiting other factors that are necessary for the reflux of proteins.

      We expanded on this point in the discussion and the main text.

      9- Figure 5A: Is the interaction between SGTA and JB12 UPR-independent? HSC70 seems to show only background binding. The interaction of JB12 with SGTA is not convincing. A better blot is needed.

      Answer: In the conditions of Figure 5A, we did not observe any induction of the UPR (Figure S3C-D). Thus, we concluded that under those overexpression conditions, DNAJB12 interacts with SGTA in a UPR-independent manner.

      We repeated this experiment another 3 times with a very high number of cells (2X15cm2 culture dishes for each condition), and instead of coimmunoprecipitating with DNAJB12 antibodies we IP-ed with FLAG-beads. The results are very clear, as shown in the new Figure 5A compared to Figure S5A.

      10- Figure 5B: the expression of DNAJB14 was induced by Tg50, but not by Tg25 or Tm. However, the authors have not commented on this. This should be mentioned in the text and discussed.

      Answer: In most of the experiments we did not see an increase in DNAJB14 upon ER stress, except in this replicate. To be sure, we examined DNAJB14 levels upon ER stress at the protein level and by qPCR, as shown in the input lanes of the new Figure 5 and Figure S5 and in Figure S5H-I. We also added new IP experiments in Figure 5 and Figure S5.

      11- Figure 6A: Why is a double knockdown important at all? DNAJB14 does not seem to do much at all (neither in overexpression nor with single knockdown).

      Answer: The data show that DNAJB12 can compensate for the lack of DNAJB14, while DNAJB14 can only partially compensate for some of the DNAJB12 functions. DNAJB12 could have a higher affinity for recruiting other factors needed for the reflux process, and thus the impact of DNAJB12 is higher. In summary, neither DNAJB12 nor DNAJB14 is essential in the single knockdown, which means that they compensate for each other. In the overexpression experiment, endogenous DNAJB14 is enough for DNAJB12 activity. When DNAJB14 is overexpressed at very high levels, we believe that it binds to some factors that are needed for proper DNAJB12 activity (Figure 4 shows that WT-DNAJB14 inhibits ER-stress-induced ER protein reflux when overexpressed). We believe that DNAJB14 is important because only when we knock down both DNAJB12 and DNAJB14 do we see an effect on ER-protein reflux. DNAJB14 is part of a complex of DNAJB12/HSC70 and SGTA.

      (DNAJB12 is sufficient while DNAJB14 is not; please refer to point #8 above.)

      **Referees cross-commenting**

      I agree with the comments raised by reviewer 1 about the manuscript. I also agree with the points written in this consultation session. In my opinion, the comments of reviewer 2 are phrased in a harsh tone and thus the reviewer reaches the conclusion that there are "serious" problems with this manuscript. However, I think that the authors could address many of the points of this reviewer in a matter of 3 months easily. For instance, it is easy to control for the expression levels of exogenous wild type and mutant D12 and compare it to the endogenous one (point 3). This is a very good point of this reviewer and I agree with this experiment. Likewise, it is easy to provide data about the levels of AGR2 to address the concern whether its synthesis is affected by D12 and D14 overexpression. Again, an excellent suggestion, but no reason for rejecting the story. As for not citing the literature, I think this can also easily be addressed and I am sure that this is just an oversight and no ill intention by the authors. __Overall, I am unable to see why the reviewer reaches such a negative verdict about this work. With proper revisions that might take 3 months, I think the points of all reviewers can be addressed.__

      Reviewer #1 (Significance (Required)):

      Significance: The strength of the work is that it provides further mechanistic insight into a novel cellular phenomenon (ERCYS). The functions for DNAJB12&14 are unprecedented and therefore of great interest for the proteostasis community. Potentially, the work is also of interest for cancer researchers, who might capitalize on ERCYS to establish DNAJB12/14 as novel therapeutic targets. The major weaknesses are as follows: (i) the work is limited to a single cell line. To better probe the cancer relevance, the work should have used at least a panel of cell lines from one (or more) cancer entity. Ideally even data from patient derived samples would have been nice. Having said this, I also appreciate that the work is primarily in the field of cell biology and the cancer-centric work could be done by others. Certainly, the current work could inspire cancer specialists to explore the relevance of ERCYS. (ii) No physiological or pathological condition is shown where DNAJB12 is induced or depleted.

      Answer: We previously showed that ERCYS is conserved in many different cell lines, including A549, MCF7, GL-261, U87, HEK293T, MRC5 and others, and is also conserved in murine models of GBM (GL-261 and U87 derived tumors) and human patients with GBM (Sicari et al. 2021). Here, we tested the reflux process and the IP experiments in many different cell lines, including A549, MCF-7, PC3 and Trex-293 cells. We also added new fractionation experiments in DNAJB12- and DNAJB14-depleted MCF-7, PC3 and A549 cells. We added all those data to the revised version.

      We also added survival curves from the TCGA database showing that high copy numbers of DNAJB12, DNAJB14 and SGTA are associated with poor prognosis compared to conditions where DNAJB12, DNAJB14, and SGTA are at low copy number (Figure S6E-M). Finally, we included an immunofluorescence experiment to show that the interaction between the refluxed AGR2 and the cytosolic SGTA occurs in tumors collected from patients with colorectal cancer (Figure S5F-G) compared to non-cancerous tissue.

      This study is highly significant and is relevant not only to cancer but also to other pathways that may behave in a similar manner. For instance, DNAJB12 and DNAJB14 are part of the mechanism that is used by non-enveloped viruses to escape the ER to the cytosol. Thus, the role of those DNAJB proteins seems to be mainly in the reflux of functional (not misfolded) proteins from the ER to the cytosol. We reported earlier that the UDP-Glucose-Glucosyl Transferase 1 (UGGT1) is also expelled during ER stress. UGGT1 is important because it is redeployed to the cytosol during enterovirus A71 (EA71) infection to help viral RNA synthesis (Huang et al, 2017). This redeployment during EA71 infection is similar to what happens during the reflux process because, on one hand, UGGT1 exits the ER by an ER-stress-mediated process (Sicari et al. 2021) and, on the other, it is functional in the cytosol as a protein that helps viral RNA synthesis (Huang et al, 2017). All those data show that there is more to DNAJB12, DNAJB14, DNAJC14, DNAJC30 and DNAJC18 that still needs to be explored in addition to what is published. Thus, we suggest that viruses hijacked this evolutionarily conserved machinery and succeeded in using it to escape the ER to the cytosol in a manner that depends on all the components needed for ER protein reflux.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors present a study in which they ascribe a role for a complex containing DNAJB12/14-Hsc70-SGTA in facilitating reflux of AGR2 from the ER to cytosol during ER-stress. This function is proposed to inhibit wt-P53 during ER-stress.

      Concerns: 1. The way the manuscript is written gives the impression that this is the first study about mammalian homologs of yeast HLJ1, while there are instead multiple published papers on mammalian orthologs of HLJ1. Section 1 and Figure 1 of the results section is redundant with a collection of previously published manuscripts and reviews. The lack of proper citation and discussion of previous literature prevents the reader from evaluating the results presented here, compared to those in the literature.

      Answer: We highly appreciate the reviewer’s comments. The aim of this paper is not to show that DNAJB12 and DNAJB14 are the orthologues of HLJ-1 but rather to show that DNAJB12 and DNAJB14 are part of a mechanism that we recently discovered and called ERCYS, which causes proteins to be refluxed out of the ER and which is regulated by HLJ-1 in yeast. ERCYS is an adaptive and pro-survival mechanism that results in increased chemoresistance and survival in cancer cells. The papers that reviewer #2 refers to are the ones that report that DNAJB12 can replace some of the ER-Associated Degradation (ERAD) functions of HLJ-1 in the degradation of membrane proteins such as CFTR. These two mechanisms are totally different, and the role of the yeast HLJ-1 in degradation of CFTR is not needed for ERCYS. This is because we previously showed that the role of the yeast HLJ-1, and probably its orthologues, in ERCYS is independent of their activity in ERAD (Igbaria et al. 2019). Surprisingly, the role of HLJ-1 in refluxing the ER proteins is not only independent of the reported ERAD functions of HLJ1 and the mammalian DNAJBs but rather proceeds more vigorously when ERAD is crippled (Igbaria et al. 2019). This role of DNAJBs is unique in cancer cells and is responsible for regulating the activity of p53 during treatment with DNA-damaging agents.

      In our current manuscript we show by similarity, functionality, and topological orientation that DNAJB12 and DNAJB14 may be part of a well-conserved mechanism to reflux proteins from the ER to the cytosol, a mechanism that is independent of DNAJB12/14’s reported activity in ERAD (Grove et al. 2011; Yamamoto et al. 2010; Youker et al. 2004). In addition, DNAJB12 and DNAJB14 facilitate the escape of non-enveloped viruses from the ER to the cytosol in a similar way to the reflux process (Goodwin et al. 2011; Igbaria et al. 2019; Sicari et al. 2021). All those data show that the reported function of HLJ-1 may be only the beginning of our understanding of the roles that those orthologues carry out, roles that differ from what is known about their ERAD function.

      Action: We added the references to the main text and discussed the differences between the reported DNAJB12 and HLJ-1 functions and the function of DNAJB12, DNAJB14 and the other DNAJ proteins in the reflux process. We also expanded on this in the discussion.

      The conditions used to study DNAJB12 and DNAJB14 function in AGR2 reflux from the ER do not appear to be of physiological relevance. As seen below they involve two transfections and treatment with two cytotoxic drugs over a period of 42 hours. The assay for ERCY is accumulation of lumenal ER proteins in a cytosolic fraction. Yet, there is no data or controls that describe the path taken by AGR2 from the ER to cytosol. It seems like pleiotropic damage to the ER due to the experimental conditions and accompanying cell death could account for the reported results?

      Transfection of cells with siRNA for DNAJB12 or DNAJB14 with a subsequent 24-hour growth period.

      Transfection of cells with a p53-luciferase reporter.

      Treatment of cells with etoposide for 2-hours to inhibit DNA synthesis and induce p53.

      Treatment of cells for 16 hours with tunicamycin to inhibit addition of N-linked glycans to secretory proteins and cause ER-stress.

      Subcellular fractionation to determine the localization of AGR2, DNAJB11, and HYOU1

      KD of DNAJB12 or DNAJB14 has modest if any impact on AGR2 accumulation in the cytosol. There is an effect of the double KD of DNAJB12 and DNAJB14 on AGR2 accumulation in the cytosol. Yet there are no western blots showing AGR2 levels in the different cells, so it is possible that AGR2 is not synthesized in cells lacking DNAJB12 and DNAJB14. The lack of controls showing the impact of single and double KD of DNAJB12 and DNAJB14 on cell viability and ER-homeostasis makes it difficult to interpret the results presented. How many control versus siRNA KD cells survive the protocol used in these assays?


      Answer: Despite the long protocol, we see differences between the control cells and the DNAJB-silenced cells in terms of the quantity of the proteins refluxed to the cytosol. The luciferase construct was used to assess the activity of p53, so the second transfection step was used only in experiments where we assayed the p53-luciferase activity. The rest of the experiments, especially those where we tested p53 and p21 levels, were performed with one transfection. Moreover, all the experiments with the subcellular protein fractionation were performed after one transfection, without the second transfection of the p53-luciferase reporter. Finally, the protocol of the subcellular protein fractionation requires first trypsinizing the cells to lift them from the plates; at the time of the experiment, the cells were at almost 70-80% confluency and had the right morphology under the microscope.

      Here, we performed an XTT assay and a Caspase-3 assay to assess cell death at the end of the experiment and before the fractionation assay. We did not observe any differences at this stage between the different cell lines (Figure RV1, for reviewers only). This can be explained by the fact that we use low concentrations of Tm and Tg for a short time of 16 hours after the pulse of etoposide.

      Finally, the claim that ER-membrane damage results in mixing of the ER and cytosolic components does not hold, for the following reasons: (1) in the case of mixing, we would expect GAPDH levels in the membrane fraction to increase, which we do not see; and (2) we used our previously described transmembrane-eroGFP (TM-eroGFP), which harbors a transmembrane domain and is attached to the ER membrane facing the ER lumen. The TM-eroGFP was found to be oxidized in all conditions tested. Those data argue against a rupture of the ER membrane, which would result in mixing of the highly reducing cytosolic environment with the highly oxidizing ER environment through the passage of the tripeptide GSH from the cytosol to the ER. All those data argue against (1) cell death and (2) rupture of the ER membrane (Figure RV1, for reviewers only).

      Moreover, as shown in Figure S2, AGR2 is found in the membrane fraction in all four knockdowns; thus, it is synthesized in all of them. We also assayed the mRNA levels of AGR2 in all the knockdowns and saw that they are at the same levels in all 4 conditions, and that AGR2 mRNA levels still increase upon ER stress in all 4 knockdown cells in different backgrounds (Figure S2F-N).

      In Figure 3 the authors overexpress WT-D12 and H139Q-D12 and examine induction of the p53-reporter. There are no western blots showing the expression levels of WT-D12 and H139Q-D12 relative to endogenous DNAJB12. HLJ1 stands for high-copy lethal DnaJ1 as overexpression of HLJ1 kills yeast. The authors present no controls showing that WT-D12 and H139Q-D12 are not expressed at toxic levels, so the data presented is difficult to evaluate.

      Answer: The expression levels of overexpressed DNAJB12 and DNAJB14 were shown in the initial submission of the manuscript as Figure S3A and S3B. The data showing the relationship between the degree of expression and viability were also included in the initial submission as Figure S3C (now S3H).

      There is no mechanistic data used to help explain the putative role of DNAJB12 and DNAJB14 in ERCY. In Figure 4, why does H139Q JB12 prevent accumulation of AGR2 in the cytosol? There are no westerns showing the level to which DNAJB12 and DNAJB14 are overexpressed.


      Answer: The data showing the levels of DNAJB12 compared to the endogenous protein were present in the initial submission as Figure S3A and S3B.

      We suggest a mechanism by which DNAJB12 and DNAJB14 interact (Figure 5 and Figure S5) and oligomerize to expel those proteins, in a way similar to the expulsion of non-enveloped viruses to the cytosol. Thus, the effect of expressing the DNAJB12 H139Q mutant may indicate that the J-domain dead mutant can still be part of the complex but impairs the J-domain activity of this oligomer and thus inhibits ER-protein reflux. In other words, we showed that H139Q exerts a dominant-negative effect when overexpressed. Moreover, we have added another IP experiment in the D12/D14-DKD cells to show that in the absence of DNAJB12 and DNAJB14, SGTA cannot bind the ER-lumenal proteins because they are not refluxed (Figure 5 and Figure S5). Those data indicate that for SGTA to bind the refluxed proteins, they have to go through DNAJB12 and DNAJB14; in their absence, this interaction does not occur. This explanation was also present in the discussion of the initial submission.

      Mechanistically, we show that AGR2 interacts with DNAJB12/14, which are necessary for its reflux. This mechanism involves cytosolic HSP70 chaperones and their cochaperones (SGTA), which are recruited by DNAJB12 and DNAJB14. This mechanism is conserved from yeast to mammals. Moreover, using the AlphaFold prediction tools, we found that AGR2 is predicted to interact with SGTA in the cytosol through an interaction between the cysteines of SGTA and AGR2 in a redox-dependent manner.

      **Referees cross-commenting**

      I appreciate the comments of the other reviewers. I agree that the authors could revise the manuscript. Yet, based on my concerns about the physiological significance of the process under study and lack of scholarship in the original draft, I would not agree to review a revised version of the paper.

      Answer: Regarding physiological relevance, we showed in our previous study (Sicari et al. 2021) how relevant ERCYS is in human GBM patients and in a murine model of GBM. ERCYS is conserved from yeast to human and is constitutively active in the GL-261 GBM model, the U87 GBM model and human patients with GBM (Sicari et al. 2021). Here, we extended that to other tumors and showed that high levels of DNAJB12, DNAJB14 and SGTA are associated with poor prognosis in many cancer types (Figure S6). We also added data showing the interaction of SGTA with AGR2 in CRC samples obtained from human patients compared to healthy tissue (Figure S5). This study is highly significant and is relevant not only to cancer but also to other pathways that may behave in a similar manner. For instance, DNAJB12 and DNAJB14 are part of the mechanism used by non-enveloped viruses to escape from the ER to the cytosol. Thus, the role of those DNAJB proteins seems to be mainly in the reflux of functional (not misfolded) proteins from the ER to the cytosol. We reported earlier that UDP-Glucose-Glucosyl Transferase 1 (UGGT1) is also expelled during ER stress. UGGT1 is important because it is redeployed to the cytosol during enterovirus A71 (EA71) infection to help viral RNA synthesis (Huang et al, 2017). This redeployment during EA71 infection is similar to what happens during the reflux process because, on the one hand, UGGT1 exits the ER by an ER stress-mediated process (Sicari et al. 2021) and, on the other hand, it is functional in the cytosol as a protein that helps viral RNA synthesis (Huang et al, 2017). All those data show that there is more to DNAJB12, DNAJB14, DNAJC14, DNAJC30 and DNAJC18 that still needs to be explored in addition to what is published. We suggest that viruses hijacked this evolutionarily conserved machinery and succeeded in using it to escape.

      We appreciate the time spent reviewing our paper, and we are sorry that the reviewer reached a verdict that the other reviewers also do not share. Most of the points raised by reviewer #2 were already addressed and explained in the initial submission. Nevertheless, we appreciate the time and the comments of reviewer #2 on our manuscript.

      Reviewer #2 (Significance (Required)):

      Overall, there are serious concerns about the writing of this paper as it gives the impression that it is the first study on higher eukaryotic and mammalian homologs of yeast HLJ1. The reader is not given the ability to compare the presented data to related published work. There are also serious concerns about the quality of the data presented and the physiological significance of the process under study. In its present form, this work does not appear suitable for publication.

      Answer: Again we thank reviewer #2 for giving us the opportunity to explain how significant this manuscript is, especially for readers who are less expert in this field. The significance of this paper lies in: (1) showing the unique role of DNAJB12 and DNAJB14 in the molecular mechanism of the reflux process in mammalian cells (not their role in ERAD), (2) showing the involvement of other cytosolic chaperones in the process, including HSC70 and SGTA, (3) our AlphaFold prediction, which shows that this process may be redox dependent and implicates the cysteines of SGTA in extracting the ER proteins, (4) showing that overexpression of WT DNAJB12 is sufficient to drive this process, (5) showing that mutation of the HPD motif prevents the reflux process, probably by preventing binding to the cytosolic chaperones, and (6) showing that both DNAJB12 and DNAJB14 are needed for the interaction between the refluxed ER proteins and the cytosolic chaperones to occur.

      In summary, this study is highly significant in terms of physiology: we previously reported that ERCYS is conserved in mammalian cells and is constitutively active in human and murine tumors (Sicari et al. 2021). Moreover, DNAJB12 and DNAJB14 are part of the mechanism used by non-enveloped viruses to escape from the ER to the cytosol, a mechanism that is similar to the reflux process (Goodwin et al. 2011; Goodwin et al. 2014). Thus, the role of those DNAJB proteins seems to be mainly in the reflux of functional proteins from the ER to the cytosol, and viruses have used this evolutionarily conserved machinery to escape. This paper does not deal with the functional orthologues of HLJ-1 in ERAD but rather suggests a mechanism by which soluble proteins exit the ER to the cytosol.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: Reflux of ER based proteins to the cytosol during ER stress inhibits wt-p53. This is a pro-survival mechanism during ER stress, but as ER stress is high in many cancers, it also promotes survival of cancer cells. Using A549 cells, Dabsan et al. demonstrate that this mechanism is conserved from yeast to mammalian cells, and identify DNAJB12 and DNAJB14 as putative mammalian orthologues of yeast HLJ1.

      This paper shows that DNAJB12 and 14 are likely orthologues of HLJ1 based on their sequences, and their behaviour. The paper develops the pathway of ER-stress > protein reflux > cytosolic interactions > inhibition of p53. The authors demonstrate this nicely using knock downs of DNAJB12 and/or 14 that partially blocks protein reflux and p53 inhibition. Overexpression of WT DNAJB12, but not the J-domain inactive mutant, blocks etoposide-induced p53 activation (this is not replicated with DNAJB14) and ER-resident protein reflux. The authors then show that DNAJB12/14 interact with refluxed ER-resident proteins and cytosolic SGTA, which importantly, they show interacts with the ER-resident proteins AGR2, PRDX4 and DNAJB11. Finally, the authors show that inducing ER stress in cancer cell lines can increase proliferation (lost by etoposide treatment), and that this is partially dependent on DNAJB12/14.

      This is a very interesting paper that describes a nice mechanism linking ER-stress to inhibition of p53 and thus survival in the face of ER-stress, which is a double edged sword regarding normal v cancerous cells. The data is normally good, but the conclusions drawn oversimplify the data that can be quite complex. The paper opens a lot of questions that the authors may want to develop in more detail (non-experimentally) to work on these areas in the future, or alternatively to develop experimentally and develop the observations further. There are only a few experimental comments that I make that I think should be done to publish this paper, to increase robustness of the work already here, the rest are optional for developing the paper further.

      We thank the reviewer for his/her positive comments, which helped to make our manuscript stronger.

      Major comments:

      1. Number of experimental repeats must be mentioned in the figure legends. Figures and annotations need to be aligned properly

      Answer: All experiments were repeated at least 3 times. We added the number of repeats for each figure in the figure legends.

      Results section 2:

      No intro to the proteins you've looked at for relocalization. Would be useful to have some info on why you chose AGR2. Apart from them being ER-localized, do they all share another common characteristic? Does ability to inhibit p53 vary in potency?

      Answer: We previously showed that AGR2 is refluxed from the ER to the cytosol to bind and inhibit wt-p53 (Sicari et al. 2021). Here, we used AGR2 because (1) we know that AGR2 is refluxed from the ER to the cytosol, and (2) we know which novel functions it gains in the cytosol, so we are able to measure those functions and assess their physiological significance when the levels of DNAJB12 and DNAJB14 are altered. Moreover, we used DNAJB11 (41 kDa) and HYOU1 (150 kDa) to show that altering DNAJB12 or DNAJB14 prevents the reflux of small, medium and large proteins. We added a sentence in the discussion stating that DNAJB12/14 are responsible for the reflux of ER-resident proteins independently of their size. We also added in the results section that we are looking at proteins of different sizes and activities.


      What are the roles of DNAJB12/14 if overexpression can induce reflux? Does it allow increased binding of an already cytosolic protein, causing an overall increase in an interaction that then causes inhibition of p53? What are your suggested mechanisms?

      Answer: It was previously reported that overexpressed DNAJB12 and DNAJB14 tend to form membranous structures within cell nuclei, which were designated DJANGOS, for DNAJ-associated nuclear globular structures (Goodwin et al. 2014). Because those structures, which contain both DNAJB12 and DNAJB14, also form on the ER membrane (Goodwin et al. 2014), we speculate that during stress DNAJB12/14 overexpression may facilitate ERCYS. Interestingly, those structures contain Hsc70 and markers of the ER lumen and of the ER and nuclear membranes (Goodwin et al. 2014).

      The discussion was edited accordingly to further strengthen and clarify this point.

      Fig3: A+B show overexpression of individual DNAJs but not combined. As you go on to discuss the effect of the combination on AGR2 reflux, it would be useful to include this experimentally here.

      Answer: This is a great idea, and we tried to do it for a long time. Unfortunately, when we used cells overexpressing DNAJB12 under the doxycycline promoter and transfected them with a plasmid expressing DNAJB14 under the CMV promoter, most of the cells floated within 24 hours, compared to cells transfected with the empty vector alone or with DNAJB14-H136Q. Overexpression of DNAJB14 in cells with conditional DNAJB12 expression was likewise lethal in Trex293T and A549 cells.

      Fig 3C: Subfractionation of cells shows AGR2 in the cytosol of A549 cells. The quality of the data is good but the bands are very high on the blot. For publication is it possible to show this band more centralized so that we are sure that we are not missing bands cut off in the empty and H139Q lanes?

      Also, you have some nice immunofluorescence in the 2021 EMBO reports paper, is it possible to show this by IF too? It is not essential for the story, but it would enrich the figure and support the biochemistry nicely. Also it is notable that the membrane fraction of the refluxed proteins doesn't appear to have a decrease in parallel (especially for AGR2). Is this because the % of the refluxed protein is very small? Is there a transcriptional increase of any of them (the treatments are 12+24 h so it would be enough time)? This could be a nice opportunity to discuss the amount of protein that is refluxed, whether this response is a huge emptying of the ER or more like a gentle release, and also the potency of the gain of function and effect on p53 vs the amount of protein refluxed. This latter part isn't essential but it would be a nice element to expand upon.

      Answer: We re-blotted AGR2, and the new AGR2 blot was added. More blots were also added in Figure S2, and the text was edited accordingly.

      In new Figure S5 we added an immunofluorescence experiment on tumor and non-tumor tissues obtained from colorectal cancer (CRC) patients, showing that the interaction between SGTA and the refluxed AGR2 also occurs in more physiological settings. This also emphasizes that the suggested mechanism implicating SGTA is valid in CRC tumors.

      We performed qPCR experiments in control, DNAJB12-KD, DNAJB14-KD and DNAJB12/DNAJB14 double-knockdown cells (in both A549 and PC3 cells) to follow the mRNA levels of DNAJB11. As shown in Figure S2F-N, there is no increase in the mRNA levels of DNAJB11, AGR2 or HYOU1 in the different cells under normal (unstressed) conditions. Upon ER stress with tunicamycin or thapsigargin, there is a small increase in the mRNA levels of HYOU1 and AGR2 but not in DNAJB11 mRNA levels. We also performed western blot analysis and did not detect any difference between the different knockdown cells when we analyzed the levels of DNAJB11 compared to GAPDH. Those data are now added to Figure S2F-N. We must note that AGR2 and HYOU1 are induced at the mRNA level as a result of ER stress. The overexpression of DNAJB12 and DNAJB14 is an important control experiment, in which we show reflux when DNAJB12 is overexpressed without inducing ER stress (Figure 3, Figure 4, and Figure S3). In those conditions, no induction of AGR2, HYOU1 or DNAJB11 was observed. Those results argue against reflux being a consequence of protein induction and increased protein levels.

      The overall protein levels at steady state are a function of how much protein is made, degraded and probably secreted outside the cell. We do see in Figure S2 that under ER stress there are some differences in mRNA levels; moreover, our work in yeast showed that the expelled proteins have a very long half-life in the cytosol (Igbaria et al. 2019). Because it is difficult to assay how much of the mRNA is translated, how much of the protein is stable or degraded, and the stability of the cytosolic fraction versus the ER fraction, it is hard to draw conclusions about the stability and the levels of the proteins.

      Those data are now added to the manuscript, and the text is edited accordingly.

      You still mention DNAJB12 and 14 as orthologues, even though DNAJB14 has no effect on p53 activity when overexpressed. Do you think that this piece of data diminishes this statement?

      Answer: The fact that DNAJB12 and DNAJB14 are highly homologous and that only the double knockdown has a strong effect on the reflux process may indicate that they are redundant. Moreover, the fact that only DNAJB12 is sufficient may indicate that some DNAJB12 functions cannot be carried out by DNAJB14. On the one hand, they share common activities, as shown in the double knockdown; on the other hand, DNAJB12 has a unique function that may not be compensated for by DNAJB14 when overexpressed.

      Fig 3D/F: Overexpression of DNAJB14 induces reflux of DNAJB11 at 24h, what does this suggest? Does this indicate having the same role as DNAJB12 but less potently? What's your hypothesis?

      Answer: ERCYS is a new and interesting phenomenon, and the redistribution of proteins to the cytosol has recently been documented by many groups. Despite that, we still do not know the specificity of DNAJB12 and DNAJB14 for the refluxed proteins. DNAJB11 is a glycosylated protein, and we are now testing whether other glycosylated proteins prefer the DNAJB14 pathway. These data are beyond the scope of this paper.

      "This suggests that the two proteins may have different functions when overexpressed, despite their overlapping and redundant functions" What does it suggest about their dependence on each other? If overexpression of WT DNAJB12 inhibits Tg induced reflux, is it also blocking the ability of DNAJB14 to permit flux?

      Answer: We hypothesize that it is all about the stoichiometry and the ratios between proteins. When we overexpress DNAJB14 (the one that is not sufficient to cause reflux), it may hijack common components and factors by binding to them non-specifically. Those factors may be needed for DNAJB12 to function properly (like the dominant-negative effect of the DNAJB12-HPD mutant, for instance). On the other hand, DNAJB12 may have a higher affinity for some cytosolic partners and thus can do the job when overexpressed. Here, we deal with DNAJB12/DNAJB14 as essential components of the reflux process, yet we still need to identify the interactome of each protein during stress and the role of the other DNAJ proteins that share some topological and structural similarity with DNAJB12, DNAJB14 and HLJ-1 (DNAJC30, DNAJC14, and DNAJC18). We edited the text accordingly and integrated this into the discussion.

      Fig 4: PDI shown in blots but not commented on in text. Then included in the schematics. Please comment in the text.

      Answer: We now comment on PDI in the text.

      Fig 4F: Although the quantifications of the blots look fine, the blot shown does not convincingly demonstrate this data for AGR2. The other proteins look fine, but again it could be useful to see the individual means for each experiment, or the full gels for all replicates in a supplementary figure.

      Answer: The other two repeats are shown in Figure S4.

      Results section 3

      Fig 5A, As there is obviously a difference between DNAJB12/14 it would be useful to do the pulldown with DNAJB14 too. Re. HSC70 binding to DNAJB12 and 14, the abstract states that DNAJB12/14 bind HSC70 and SGTA through their cytosolic J domains. Fig 5 shows pulldowns of DNAJB12 with an increased binding of SGTA in FLAG-DNAJB12 induced conditions, but the HSC70 band does not seem to be enriched in any of the conditions, including after DNAJB12 induction. This doesn't support the statement that DNAJB12 binds HSC70. In fact, in the absence of a good negative control, this would suggest that the HSC70 band seen is not specific. There is also no data to show that DNAJB14 binds HSC70. I recommend including a negative condition (ie beads only) and the data for DNAJB14 pulldown.

      Answer: In Figure 5A we used the Flp-In T-REx-293 cells, as it is easier to control and to tune the expression levels of DNAJB12 and DNAJB14 up and down in them. According to new Figure S5A, DNAJB12 binds HSC70 at basal levels all the time. It was also surprising to us not to see differences upon overexpression, and we attribute that to all the HSC70 being saturated with DNAJB12. To assay this better, we repeated the IP in Figure 5A, but instead of immunoprecipitating with DNAJB12 antibodies, we used FLAG antibodies to selectively IP the transfected DNAJB12. As shown in the new Fig 5A, the increase in DNAJB12-FLAG is accompanied by an increase in HSC70 binding.

      We further tested the interaction between DNAJB12, DNAJB14 and HSC70 during ER stress in cancer cells. In those cells we found that DNAJB12 and DNAJB14 bind to HSC70 and recruit SGTA upon stress. We also tested the binding between DNAJB12 and DNAJB14: in unstressed conditions there was basal binding between the two, and this interaction was stronger during ER stress. Those data are now added to Figure 5 and Figure S5, and the discussion was edited accordingly.

      The binding of DNAJB12 to SGTA under stress conditions in Fig5B looks much more convincing than SGTA to DNAJB12 in Fig 5A. Bands in all blots need to be quantified from 3 independent experiments, and repeated if not already n=3. If this is solely a technical difference, please explain in the text.

      The conclusions drawn from this interaction data are important and should be elaborated upon to support the claims made in the paper. The authors may also choose to expand the pulldowns to demonstrate their claims made on oligomerisation of DNAJB12 and 14 here. It is also clear that the interaction data of the SGTA with ER-resident proteins AGR2, PRDX4 and DNAJB11 is strong. The authors may want to draw on this in their hypotheses of the mechanism. I would imagine a complex such as DNAJB14/DNAJB12 - SGTA - AGR2/PRDX4/DNAJB11 would be logical. Have any experiments been performed to prove if complexes like this would form?

      Answer: In Figure 5A we used the Flp-In T-REx-293 cells, as it is easier to control and to tune the expression levels of DNAJB12 and DNAJB14 up and down in them. T-REx-293 cells are highly sensitive to ER stress: they do not die (we did not observe elevated apoptosis markers), but they float and can regrow after the stress is gone. In Figure 5B we used ER stress in the A549 cell line, without the need to express DNAJB12. To further verify those data, we repeated the IP in another cell line to confirm the data in 5B. We also repeated the IP in 5A with an anti-FLAG antibody to improve the IP and to specifically map the interaction with the overexpressed FLAG-DNAJB12 (discussed above). All experiments were done in triplicate and added to Figure 5 and Figure S5.

      We agree with the reviewer regarding the complex between the refluxed proteins and SGTA. We believed that SGTA may form a complex with other refluxed ER proteins, but we were unable to see an interaction between AGR2 and DNAJB11 or between AGR2 and PRDX4 in the cytosolic fraction in the conditions tested. We could not do this in the whole-cell lysate because those proteins bind each other in the ER. Finally, our structural prediction using AlphaFold suggests that the interaction between SGTA and the refluxed AGR2 (and probably others) is redox dependent and that it requires a disulfide bridge between cysteine 81 on AGR2 and cysteine 153 on SGTA. Thus, we hypothesize that SGTA binds one refluxed protein at a time.

      We repeated the figure with two improvements: (1) using more cells in order to increase the amount of IP-ed proteins and to overcome the problem of the faint bands, and (2) performing the IP with the FLAG antibodies instead of the antibodies against endogenous DNAJB12.

      Fig 5B: It is clear that DNAJB12 interacts with SGTA. The authors state that DNAJB14 also interacts with SGTA under normal and stress conditions, but the band in 25/50 Tg is very faint. Why would there be stronger binding at the 2 extremes than during low stress induction? In the input, there is a much higher expression of DNAJB14 in 50 Tg. What does this say about the interaction? Is there an effect of ER stress on DNAJB14 expression? A negative control should be included to show any background binding, such as a "beads only" control.

      Answer: DNAJB14 does not change with ER stress, as shown in the IPs (Input) and in the qPCR experiment in Figure S5I. We added a beads-only control, and we also added new IPs to assess the binding between DNAJB14 and DNAJB12 and between DNAJB14 and SGTA. All the new IPs and controls are now added as Figure 5 and Figure S5.

      Fig 5C data is sound, although a negative control should be included.

      Answer: Negative control was added in Figure S5.

      Results section 4

      Fig 6A-B: Given that there is the complexity of overexpression v KD of DNAJB12 v 14 causing similar effects on p53 activity (Fig 2 v 3), it would be interesting to see whether the effect of overexpression mirrors the results in Fig 6A. Is it known what SGTA overexpression does (optional)?

      Answer: In the overexpression system, cells overexpressing DNAJB12 start to die between 24 and 48 hours, as shown in Figure S3C. Thus, it is difficult to assay the proliferation of these cells in those conditions. On the other hand, overexpression of Myc-tagged SGTA in A549, MCF7 or T-ReX293 cells did not cause any reflux of ER proteins to the cytosol and did not significantly change the proliferation index (Figure Reviewers only RV2).

      Fig 6D: resolution very low

      Answer: Figure 6D was changed

      Fig 6C-D: There is an interesting difference though between the proposed cytosolic actions of the refluxed proteins. You show that AGR2, PRDX4 and DNAJB11 all bind to SGTA in stress conditions, but in the schematics you show: DNAJB11 binding to HSC70 through SGTA (not shown in the paper), then also PDIA1, PDIA3 binding to SGTA and AGR2 binding to SGTA. What role does SGTA have in these varied reactions? Sometimes it is depicted as an intermediate, sometimes a lone binder, what is its role as a binder? It should be clarified which interactions are demonstrated in the paper (or before) and which are hypothesized in a graphical way (e.g. for hypotheses dotted outlines or no solid fill etc). The schematics also suggest that DNAJB14 binding to HSC70 and SGTA is inducible in stress conditions, as is PDIA3, which is not shown in the paper. Discussion "In cancer cells, DNAJB12 and DNAJB14 oligomerize and recruit cytosolic chaperones and cochaperones (HSC70 and SGTA) to reflux AGR2 and other ER-resident proteins and to inhibit wt-p53 and probably different proapoptotic signaling pathways (Figure 5, and Figure 6C-6D)." You haven't shown oligomerisation between DNAJB12/14. Modify the text to make it clear that it is a hypothesis.

      Answer: We removed “oligomerize” from the text and stated that it is a hypothesis. Figure 6C-D was also changed to be consistent with the text.

      Minor comments:

      It would be useful to have page or line numbers to help with document navigation, please include them. Typos and inconsistency in how some proteins are named throughout the manuscript.

      Answer: Page numbers and line numbers have been added. Typos have been corrected.

      Title: Include reference to reflux. Suggest: "chaperone complexes (?proteins) reflux from the ER to cytosol..." I presume it would be more likely that the proteins go separately rather than in complex. Do you have any ideas on the size range of proteins that can undergo this process?

      Answer: This is true: proteins may cross the ER membrane separately and then form a complex with cytosolic chaperones. The title was changed accordingly. As discussed earlier, the proteins we chose were of different sizes to show that they are refluxed independently of their size. Moreover, our previous work showed that the refluxed proteins are of different sizes. Most notably, UGGT1 (around 180 kDa) is reported to be deployed to the cytosol upon viral infection (Huang et al. 2017; Sicari et al. 2020). In this study we used AGR2 (around 19 kDa) and HYOU1 (150 kDa).

      ERCY in abstract, ERCYS in intro. There are typos throughout, could be a formatting problem, please check

      Answer: Checked and corrected

      What about the selection of refluxed proteins? Is this only a certain category of proteins? Could it be anything? Have you looked at other cargo / ER resident proteins?

      Answer: In our previous study (Sicari, Pineau et al. 2020) we looked at many other proteins, especially glycoproteins from the ER. There, we used mass spectrometry to identify new refluxed proteins and found 26 new glycoproteins that are refluxed in cells treated with ER stressors and in human tissues obtained from GBM patients.

      We previously showed that AGR2 is refluxed from the ER to the cytosol to bind and inhibit p53 (Sicari, Pineau et al. 2020). Here, we selected AGR2 because (1) we know that it is refluxed, and (2) we know which novel functions it acquires in the cytosol, so we are able to measure those functions and assess their physiological significance when the levels of DNAJB12 and DNAJB14 are altered. Moreover, we selected DNAJB11 (41 kDa) and HYOU1 (150 kDa) to show that altering DNAJB12 or DNAJB14 prevents the reflux of small, medium and large proteins (independently of their size). We also showed earlier by mass spectrometry analysis that the refluxed proteins range from small to very large proteins such as UGGT1; thus we believe that soluble ER proteins can be substrates of ERCYS independently of their size. In the discussion, we added a note that the reflux by the cytosolic and ER chaperones operates on different proteins independently of their size.

      "Their role in ERCYS and cells' fate determination depends..." Suggest change to "Their role in ERCYS and determination of cell fate..."

      Answer: changed and corrected

      I think that the final sentence of the intro could be made stronger and more concise. There's a repeat of ER and cytosol. Instead could you comment on the reflux permitting new interactions between proteins otherwise spatially separated, then the effect on wt-p53 etc.

      Answer: The sentence was rephrased as suggested to “In this study, we found that HLJ1 is conserved through evolution and that mammalian cells have five putative functional orthologs of the yeast HLJ1. Those five DNAJ proteins (DNAJB12, DNAJB14, DNAJC14, DNAJC18, and DNAJC30) reside within the ER membrane with a J-domain facing the cytosol (Piette et al. 2021; Malinverni et al. 2023). Among those, we found that DNAJB12 and DNAJB14, which are strongly related to the yeast HLJ1 (Grove et al. 2011; Yamamoto et al. 2010), are essential and sufficient for determining cells' fate during ER stress by regulating ERCYS. Their role in ERCYS and determining cells' fate depends on their HPD motif in the J-domain. Downregulation of DNAJB12 and DNAJB14 increases cell toxicity and wt-p53 activity during etoposide treatment. Mechanistically, DNAJB12 and DNAJB14 interact and recruit cytosolic chaperones (HSC70/SGTA) to promote ERCYS. This latter interaction is conserved in human tumors including colorectal cancer.

      In summary, we propose a novel mechanism by which ER-soluble proteins are refluxed from the ER to the cytosol, permitting new inhibitory interactions between spatially separated proteins. This mechanism depends on cytosolic and ER chaperones and cochaperones, namely DNAJB12, DNAJB14, SGTA, and HSC70. As a result, the refluxed proteins gain new functions to inhibit the activity of wt-p53 in cancer cells.”

      Figure legends:

      In some cases the authors state the number of replicates, but this should be stated for all experiments. If experiments don't already include 3 independent repeats, this should be done. Check text for typos, correct letter capitalisation, spaces and random bold text (some of this could be from incompatibility when saving as PDF).

      Answer: All experiments were repeated at least three times. The number of repeats is now indicated in the figure legends of each experiment. Typos and capitalization have been corrected as well.

      Fig2E: scrambled not scramble siRNA

      Answer: corrected

      Fig 3: "to expel" is a term not used in the rest of the paper for reflux. Useful to remain consistent with terminology where possible

      Answer: Rephrased and corrected

      Results section 1:

      "Protein alignment of the yeast HLJ1p showed high amino acids similarity to the mammalian..."

      Answer: Rephrased to “ Comparing the amino acid sequences revealed significant similarity between the yeast protein HLJ1p and the mammalian proteins DNAJB12 and DNAJB14”

      Fig 1C: state in legend which organism this is from (presumably human)

      Answer: in Figure 1C legends it is stated that: “ the HPD motif within the J-domain is conserved in HLJ-1 and its putative human orthologs DNAJB12, DNAJB14, DNAJC14, DNAJC18, and DNAJC30.”

      Results Section 2

      "Test the two strongest hits DNAJB12/14" Add reference to previous paper showing this

      Answer: the references were added.

      "In the WT and J-protein-silenced A549 cells, there were no differences in the cytosolic enrichment of the three ER resident proteins AGR2, DNAJB11, and HYOU1 in normal and unstressed conditions (Figure 2A-C and Figure S2C)." I think that this is an oversimplification, and in your following discussion, you show that it's more subtle than this.

      Answer: We expanded on this both in the discussion and the results section.

      The text here isn't so clear: normal and unstressed conditions? Do you mean stressed? Please be careful in your phrases: "DNAJB12-silenced cells are slightly affected in AGR2 and DNAJB11 cytosolic accumulation but not HYOU1." This is the wrong way around. DNAJB12 silencing affects AGR2, not that AGR2 affects the cells (which is how you have written it). This also occurs again in the next para:

      Answer: Normal cells are non-cancer cells. Unstressed conditions= without ER stress. The sentence was rephrased to: In the absence of ER stress, the cytosolic levels of the three ER-resident proteins (AGR2, DNAJB11, and HYOU1) were similar in wild-type and J-protein-silenced A549 cells.

      "During stress, DNAJB12/DNAJB14 double knockdown was highly affected in the cytosolic..." I think you mean it highly affected the cytosolic accumulation, not that it was affected by the cytosolic accumulation. Please change in the text

      Answer: the sentence is now rephrased to: “During stress, double knockdown of DNAJB12 and DNAJB14 highly affected the cytosolic accumulation of all three tested proteins.”

      "DNAJB12 and DNAJB14 are strong hits of the yeast HLJ1" Not clear, I presume you mean they are likely orthologues? Top candidates for being closest orthologues?

      Answer: this is correct, the sentence is rephrased and corrected

      Fig 2D: typos in WB labelling? I think Tm should be - - +, not - + + as it is now (if it's not a typo, you need more controls, eto alone).

      Answer: the labeling is now corrected

      Fig 2D-E-F typos for DKD? D12/D12 or D12/14?

      Answer: This is correct, thank you for pointing this out. The labeling is corrected.

      "We assayed the phosphorylation state of wt- p53 and p21 protein expression levels (a downstream target of p53 signaling) during etoposide treatment." What are the results of this? Explain what Fig 2D-E shows, then build on this with the +Tm results. Results should be explained didactically to be clear.

      Answer: The paragraph was edited and we explained the results: In these conditions, we saw an increase in the phosphorylation of wt-p53 in the control cells and in cells knocked-down with DNAJB12, DNAJB14 or both. This phosphorylation increased the protein levels of p21 as well (Figure 2D-G). Tm addition to cells treated with etoposide resulted in a reduction in wt-p53 phosphorylation, and as a consequence, the p21 protein levels were also decreased (Figure 2D-G and Figure S2O). Cells lacking DNAJB12 or DNAJB14 have partial protection in wt-p53 phosphorylation and p21 protein levels. Silencing both proteins in A549 and MCF7 cells rescued wt-p53 phosphorylation and p21 levels (Figure 2D-G and Figure S2D). Moreover, similar results were obtained when we assayed the transcriptional activity of wt-p53 in cells transfected with a luciferase reporter under the p53-DNA binding site (Figure 2H). These data confirm that DNAJB12 and DNAJB14 are involved in ER protein reflux and the inhibition of wt-p53 activity during ER stress.


      "(Figure 2D- E). Cells lacking DNAJB12 and or DNAJB14 have partial protection in wt-p53 phosphorylation and p21 protein levels."

      Answer: This sentence is now removed

      You comment on p53 phosphorylation, but you haven't quantified this. This should be done, normalized to p53 levels, if you want to draw these conclusions, especially as total p53 varies between conditions. Does Eto increase p53 txn? Does Tm alone increase p53 activity/phospho-p53? These are shown in the Sicari EMBO reports paper in 2021, you should briefly reference those.

      Answer: The blots are now quantified and a new blot is added to Figure S2D. The paragraph was edited and referenced to our previous paper (Sicari et al. 2021). “We then wanted to examine whether the gain of function of AGR2 and the inhibition of wt-p53 depends on the activity of DNAJB12 and DNAJB14. We assayed the phosphorylation state of wt-p53 and p21 protein expression levels (a downstream target of wt-p53 signaling) during etoposide treatment. In these conditions, there was an increase in the phosphorylation of wt-p53 in the control cells and in cells knocked down with DNAJB12, DNAJB14, or both. This phosphorylation also increases protein levels of p21 (Figure 2D-G and Figure S2O). Tm addition to cells treated with etoposide resulted in a reduction in wt-p53 phosphorylation, and as a consequence, the p21 protein levels were also decreased (Figure 2D-G and Figure S2O). Silencing DNAJB12 and DNAJB14 in A549 and MCF-7 cells rescued wt-p53 phosphorylation and p21 levels (Figure 2D-G and Figure S2O). Moreover, similar results were obtained when we assayed the transcriptional activity of wt-p53 in cells transfected with a luciferase reporter under the p53-DNA binding site (Figure 2H). In the latter experiment, etoposide treatment increased the luciferase activity in all the cells tested. Adding ER stress to those cells decreased the luciferase activity except in cells silenced with DNAJB12 and DNAJB14.

      These data confirm that DNAJB12 and DNAJB14 are involved in the reflux of ER proteins in general and AGR2 in particular. Inhibition of DNAJB12 and DNAJB14 prevented the inhibitory interaction between AGR2 and wt-p53 and thus rescued wt-p53 phosphorylation and its transcriptional activity as a consequence.”

      Fig3A: overexpression of DNAJB12 decreases Eto induced p53 but not at steady state. Is this because at steady state the activity is already basal? Or is there another reason?

      Answer: yes, at steady state the activity is already basal

      Switch Figs S3D and S3C as they are not referred to in order. Also Fig S3C: vary colour (or add pattern) on bars more between conditions

      Answer: The figures are now cited in order in the new version. Colors are now added to Figure S3C.

      Need to define HLJ1 at first mention

      Answer: defined as “HLJ1 (High copy Lethal J-protein), an ER-resident tail-anchored HSP40 cochaperone.”

      Results section 3

      HSC70 cochaperone (SGTA) defined twice

      Answer: the second one was removed

      "These data are important because SGTA and the ER-resident proteins (PRDX4, AGR2, and DNAJB11) are known to be expressed in different compartments, and the interaction occurs only when those ER-resident proteins localize to the cytosol." Is there a reference for this?

      Answer: Peroxiredoxin 4 is the only peroxiredoxin that is expressed in the ER. AGR2 and DNAJB11 are also ER luminal proteins that are known to be solely expressed in the ER. SGTA is part of the cytosolic quality control system and is expressed in the cytosol. The references are added in the main text.

      Results section 4

      "by almost two folds"

      Answer: corrected

      Fig 6A: It seems strange that the difference between purple and blue bars in scrambled, and D14-KD are very significant but D12-KD is only significant. Why is this? The error bars don't look that different. It would be interesting to see the individual means for each different replicate.

      Answer: Thank you for pointing this out. The two asterisks were merged into one in the middle during figure alignment. In D14-KD, the purple bar has a smaller error bar, which changes the significance when compared to the blue bar, while in D12-KD the error bars in both the eto and the eto-Tm treatments are slightly larger. Graphs of the three different replicates are now added in Figure S6. Each of the three biological replicates was performed with three technical repeats (averaged in the graphs).

      Figures: Fig 6A: Scale bars not well placed. Annotation on final set should be D12/D14 DKD?

      Answer: Both were corrected.

      Discussion

      The authors mention that they want to use the DNAJB12/14-HSC70/SGTA axis to impair cancer cell fitness: What effect would this have though in a non cancer model? Would this be a viable approach? Although it is obviously early days, which approach would the authors see as potentially favorable?


      Answer: In our previous study, we targeted AGR2 in the cytosol because the reflux of AGR2 occurs only in cancer cells and not in normal cells. In that study, we used an scFv that targets AGR2 and is expressed in the cytosol; it therefore targets only cytosolic AGR2, which occurs only in cancer. Here, we suggest targeting the interaction between the refluxed proteins and their new partners in the cytosol, or targeting the mechanism that causes their reflux to the cytosol, for instance by inhibiting the interaction between SGTA and DNAJB proteins.


      Second para: Should be "Here we present evidences"

      Answer: we replaced with “Here we present evidences”

      "DNAJB12 overexpression was also sufficient to promote ERCYS by refluxing AGR2 and inhibit wt-p53 signaling in cells treated with etoposide" Suggest:

      Answer: DNAJB12 overexpression is also sufficient to promote ERCYS by refluxing AGR2 and inhibit wt-p53 signaling in cancer cells treated with etoposide (Figure 3). This suggests that it is enough to increase the levels of DNAJB12 without inducing the unfolded protein response in order to activate ERCYS. Moreover, the downregulation of DNAJB12 and DNAJB14 rescued the inhibition of wt-p53 during ER stress (Figure 2). Thus, wt-p53 inhibition is independent of the UPR activation but depends on the inhibitory interaction of AGR2 with wt-p53 in the cytosol.

      DNAJB12 overexpression was also sufficient to promote ERCYS by increasing reflux of AGR2 and inhibition of wt-p53 signaling in cells treated with etoposide

      Answer: This sentence is repeated twice and was removed

      "Moreover, DNAJB12 was sufficient to promote this phenomenon and cause ER protein reflux by mass action without causing ER stress (Figure 3, Figure 4, and Figure S3)." You dont look at induction of ER stress here, please change the text or explain in more depth with refs if suitable

      Answer: In the initial submission and in the revised version we assayed the activation of the UPR by looking at the levels of spliced Xbp1 and Bip in the different conditions when DNAJB12 and DNAJB14 are overexpressed (Figure S3C and S3D). Our data show that although DNAJB12 overexpression induces ERCYS, there was no UPR activation.

      The mention of viruses is sparse in this paper. If it is a main theory, put it more centrally to the concept, and explain in more detail. As it is, its appearance in the final sentence is out of context.

      Answer: DNAJB12 and DNAJB14 were reported to facilitate the escape of nonenveloped viruses from the endoplasmic reticulum to the cytosol. The mechanism of nonenveloped virus penetration is highly similar to the reflux of proteins from the ER to the cytosol. Interestingly, this mechanism takes place when DNAJB12 and DNAJB14 form a complex with chaperones from both the ER and the cytosol, including HSC70, SGTA and BiP (Walczak et al. 2014; Goodwin et al. 2011; Goodwin et al. 2014).

      Moreover, UGGT1, which was independently found in our previous mass spectrometry analysis of the digitonin fraction obtained from HEK293T cells treated with the ER stressor thapsigargin and from isolated human GBM tumors (Sicari et al. 2020), is known to deploy to the cytosol upon viral infection (Huang et al. 2017; Sicari et al. 2020). We therefore hypothesized that the same machinery that is known to allow viruses to escape the ER to penetrate the cytosol may play an important role in the reflux of ER proteins to the cytosol.

      Because ER protein reflux and the penetration of viruses from the ER to the cytosol behave similarly, we speculate that viruses hijacked an evolutionarily conserved machinery, ER protein reflux, to penetrate the cytosol. This is key because it was also reported that during nonenveloped virus penetration, large, intact and glycosylated viral particles are able to penetrate the ER membrane on their way to the cytosol (Inoue and Tsai 2011).

      Action: we developed the discussion around this point and clarified it because we believe it is central to show that viruses hijacked this conserved mechanism.

      **Referees cross-commenting**

      I agree with the comments from Reviewer 1.

      Reviewer 2 is also correct in many ways, but I think that they have somewhat overlooked the relevance of the ER-stress element and treatments. The authors do need to reference past papers more to give a full story. As this includes the group's own papers, I don't think that it is an ethical problem but rather an oversight in the writing. Regarding reviewer 2's concerns about overexpression levels and cell death, the authors do use an inducible cell line and show the levels of DNAJB12 induced (could CRISPR also be considered?). This could be used to further address reviewer 2's concerns. It would also be useful to see data on cell death in the conditions used in the paper. Re concerns about ER integrity, this could be addressed by using IF (or EM) to show a secondary ER marker that remains ER-localised, and this would also be of interest regarding my comment on which categories of proteins can undergo reflux. If everything is relocalised, then reviewer 2's point would be validated.

      Reviewer #3 (Significance (Required)):

      Significance

      General assessment: This paper robustly shows that the yeast system of ER to cytosol reflux of ER-resident proteins is conserved in mammalian cells, and it describes clearly the link between ER stress, protein reflux and inhibition of p53 in mammalian cells. The authors have the tools to delve deeper into this mechanism and robustly explore this pathway; however, the mechanistic elements, where not instantly clear from the results, have been somewhat over-interpreted. The results have been oversimplified in their explanations, and some points and complexities of the study need to be addressed further to make the most of them. These are often some of the more interesting concepts of the paper, for example the differences in DNAJB12/14 and how the proteins orchestrate in the cytosol to play their cytosol-specific effects. I think that many points can be addressed in the text, by the authors being clear and concise with their reporting, while other experiments would turn this paper from an observational one into a very interesting mechanistic one.

      Advance: This paper is based on previous nice papers from the group. It is a nice progression from yeast, to basic mechanism, to physiological model. But as mentioned, without a strong mechanistic improvement, the paper would remain observational.

      Audience: This paper is interesting to cell biologists (homeostasis, quality control and trafficking) as well as cancer cell biologists (fitness of cancer cells and homeostasis) and it is a very interesting demonstration of a process that is a double edged sword, depending on the environment of the cells.

      My expertise: cell biology, trafficking, ER homeostasis

      Answer: We would like to thank the reviewer for his/her positive feedback on our manuscript. All the comments of the three reviewers are now addressed and the manuscript has been strengthened. We put more emphasis on the mechanistic aspect with more IPs and knockdowns. We also added data to show that it is physiologically relevant. We hope that the revised version addresses all the concerns raised by the reviewers.

      Goodwin, E. C., A. Lipovsky, T. Inoue, T. G. Magaldi, A. P. Edwards, K. E. Van Goor, A. W. Paton, J. C. Paton, W. J. Atwood, B. Tsai, and D. DiMaio. 2011. 'BiP and multiple DNAJ molecular chaperones in the endoplasmic reticulum are required for efficient simian virus 40 infection', MBio, 2: e00101-11.

      Goodwin, E. C., N. Motamedi, A. Lipovsky, R. Fernandez-Busnadiego, and D. DiMaio. 2014. 'Expression of DNAJB12 or DNAJB14 causes coordinate invasion of the nucleus by membranes associated with a novel nuclear pore structure', PLoS One, 9: e94322.

      Grove, D. E., C. Y. Fan, H. Y. Ren, and D. M. Cyr. 2011. 'The endoplasmic reticulum-associated Hsp40 DNAJB12 and Hsc70 cooperate to facilitate RMA1 E3-dependent degradation of nascent CFTRDeltaF508', Mol Biol Cell, 22: 301-14.

      Huang, P. N., J. R. Jheng, J. J. Arnold, J. R. Wang, C. E. Cameron, and S. R. Shih. 2017. 'UGGT1 enhances enterovirus 71 pathogenicity by promoting viral RNA synthesis and viral replication', PLoS Pathog, 13: e1006375.

      Igbaria, A., P. I. Merksamer, A. Trusina, F. Tilahun, J. R. Johnson, O. Brandman, N. J. Krogan, J. S. Weissman, and F. R. Papa. 2019. 'Chaperone-mediated reflux of secretory proteins to the cytosol during endoplasmic reticulum stress', Proc Natl Acad Sci U S A, 116: 11291-98.

      Inoue, T., and B. Tsai. 2011. 'A large and intact viral particle penetrates the endoplasmic reticulum membrane to reach the cytosol', PLoS Pathog, 7: e1002037.

      Malinverni, D., S. Zamuner, M. E. Rebeaud, A. Barducci, N. B. Nillegoda, and P. De Los Rios. 2023. 'Data-driven large-scale genomic analysis reveals an intricate phylogenetic and functional landscape in J-domain proteins', Proc Natl Acad Sci U S A, 120: e2218217120.

      Piette, B. L., N. Alerasool, Z. Y. Lin, J. Lacoste, M. H. Y. Lam, W. W. Qian, S. Tran, B. Larsen, E. Campos, J. Peng, A. C. Gingras, and M. Taipale. 2021. 'Comprehensive interactome profiling of the human Hsp70 network highlights functional differentiation of J domains', Mol Cell, 81: 2549-65 e8.

      Sicari, D., F. G. Centonze, R. Pineau, P. J. Le Reste, L. Negroni, S. Chat, M. A. Mohtar, D. Thomas, R. Gillet, T. Hupp, E. Chevet, and A. Igbaria. 2021. 'Reflux of Endoplasmic Reticulum proteins to the cytosol inactivates tumor suppressors', EMBO Rep: e51412.

      Sicari, D., R. Pineau, P. J. Le Reste, L. Negroni, S. Chat, A. Mohtar, D. Thomas, R. Gillet, T. Hupp, E. Chevet, and A. Igbaria. 2020. 'Reflux of Endoplasmic Reticulum proteins to the cytosol yields inactivation of tumor suppressors', bioRxiv.

      Walczak, C. P., M. S. Ravindran, T. Inoue, and B. Tsai. 2014. 'A cytosolic chaperone complexes with dynamic membrane J-proteins and mobilizes a nonenveloped virus out of the endoplasmic reticulum', PLoS Pathog, 10: e1004007.

      Yamamoto, Y. H., T. Kimura, S. Momohara, M. Takeuchi, T. Tani, Y. Kimata, H. Kadokura, and K. Kohno. 2010. 'A novel ER J-protein DNAJB12 accelerates ER-associated degradation of membrane proteins including CFTR', Cell Struct Funct, 35: 107-16.

      Youker, R. T., P. Walsh, T. Beilharz, T. Lithgow, and J. L. Brodsky. 2004. 'Distinct roles for the Hsp40 and Hsp90 molecular chaperones during cystic fibrosis transmembrane conductance regulator degradation in yeast', Mol Biol Cell, 15: 4787-97.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      Reflux of ER based proteins to the cytosol during ER stress inhibits wt-p53. This is a pro-survival mechanism during ER stress, but as ER stress is high in many cancers, it also promotes survival of cancer cells. Using A549 cells, Dabsan et al. demonstrate that this mechanism is conserved from yeast to mammalian cells, and identify DNAJB12 and DNAJB14 as putative mammalian orthologues of yeast HLJ1.

      This paper shows that DNAJB12 and 14 are likely orthologues of HLJ1 based on their sequences, and their behaviour. The paper develops the pathway of ER-stress > protein reflux > cytosolic interactions > inhibition of p53. The authors demonstrate this nicely using knock downs of DNAJB12 and/or 14 that partially blocks protein reflux and p53 inhibition. Overexpression of WT DNAJB12, but not the J-domain inactive mutant, blocks etoposide-induced p53 activation (this is not replicated with DNAJB14) and ER-resident protein reflux. The authors then show that DNAJB12/14 interact with refluxed ER-resident proteins and cytosolic SGTA, which importantly, they show interacts with the ER-resident proteins AGR2, PRDX4 and DNAJB11. Finally, the authors show that inducing ER stress in cancer cell lines can increase proliferation (lost by etoposide treatment), and that this is partially dependent on DNAJB12/14.

      This is a very interesting paper that describes a nice mechanism linking ER-stress to inhibition of p53 and thus survival in the face of ER-stress, which is a double edged sword regarding normal v cancerous cells. The data is normally good, but the conclusions drawn oversimplify the data that can be quite complex. The paper opens a lot of questions that the authors may want to develop in more detail (non-experimentally) to work on these areas in the future, or alternatively to develop experimentally and develop the observations further. There are only a few experimental comments that I make that I think should be done to publish this paper, to increase robustness of the work already here, the rest are optional for developing the paper further.

      Major comments:

      1. Number of experimental repeats must be mentioned in the figure legends. Figures and annotations need to be aligned properly

      Results section 2:

      2. No intro to the proteins you've looked at for relocalisation. Would be useful to have some info on why you chose AGR2. Apart from them being ER-localised, do they all share another common characteristic? Does ability to inhibit p53 vary in potency?
      3. What are the roles of DNAJB12/14 if overexpression can induce reflux? Does it allow increased binding of an already cytosolic protein, causing an overall increase in an interaction that then causes inhibition of p53? What are your suggested mechanisms?
      4. Fig3: A+B show overexpression of individual DNAJs but not combined. As you go on to discuss the effect of the combination on AGR2 reflux, it would be useful to include this experimentally here.
      5. Fig 3C: Subfractionation of cells shows AGR2 in the cytosol of A549 cells. The quality of the data is good but the bands are very high on the blot. For publication is it possible to show this band more centralized so that we are sure that we are not missing bands cut off in the empty and H139Q lanes? Also, you have some nice immunofluorescence in the 2021 EMBO reports paper, is it possible to show this by IF too? It is not essential for the story, but it would enrich the figure and support the biochemistry nicely. Also it is notable that the membrane fraction of the refluxed proteins doesn't appear to have a decrease in parallel (especially for AGR2). Is this because the % of the refluxed protein is very small? Is there a transcriptional increase of any of them (the treatments are 12+24 h so it would be enough time)? This could be a nice opportunity to discuss the amount of protein that is refluxed, whether this response is a huge emptying of the ER or more like a gentle release, and also the potency of the gain of function and effect on p53 vs the amount of protein refluxed. This latter part isn't essential but it would be a nice element to expand upon.
      6. You still mention DNAJB12 and 14 as orthologues, even though DNAJB14 has no effect on p53 activity when overexpressed. Do you think that this piece of data diminishes this statement?
      7. Fig 3D/F: Overexpression of DNAJB14 induces reflux of DNAJB11 at 24h, what does this suggest? Does this indicate having the same role as DNAJB12 but less potently? What's your hypothesis?
      8. "This suggests that the two proteins may have different functions when overexpressed, despite their overlapping and redundant functions" What does it suggest about their dependence on each other? If overexpression of WT DNAJB12 inhibits Tg induced reflux, is it also blocking the ability of DNAJB14 to permit flux?
      9. Fig 4: PDI shown in blots but not commented on in text. Then included in the schematics. Please comment in the text.
      10. Fig 4F: Although the quantifications of the blots look fine, the blot shown does not convincingly demonstrate this data for AGR2. The other proteins look fine, but again it could be useful to see the individual means for each experiment, or the full gels for all replicates in a supplementary figure.

      Results section 3

      11. Fig 5A, As there is obviously a difference between DNAJB12/14 it would be useful to do the pulldown with DNAJB14 too. Re. HSC70 binding to DNAJB12 and 14, the abstract states that DNAJB12/14 bind HSC70 and SGTA through their cytosolic J domains. Fig 5 shows pulldowns of DNAJB12 with an increased binding of SGTA in FLAG-DNAJB12 induced conditions, but the HSC70 band does not seem to be enriched in any of the conditions, including after DNAJB12 induction. This doesn't support the statement that DNAJB12 binds HSC70. In fact, in the absence of a good negative control, this would suggest that the HSC70 band seen is not specific. There is also no data to show that DNAJB14 binds HSC70. I recommend including a negative condition (ie beads only) and the data for DNAJB14 pulldown.
      12. The binding of DNAJB12 to SGTA under stress conditions in Fig5B looks much more convincing than SGTA to DNAJB12 in Fig 5A. Bands in all blots need to be quantified from 3 independent experiments, and repeated if not already n=3. If this is solely a technical difference, please explain in the text. The conclusions drawn from this interaction data are important and should be elaborated upon to support the claims made in the paper. The authors may also choose to expand the pulldowns to demonstrate their claims made on oligomerisation of DNAJB12 and 14 here. It is also clear that the interaction data of the SGTA with ER-resident proteins AGR2, PRDX4 and DNAJB11 is strong. The authors may want to draw on this in their hypotheses of the mechanism. I would imagine a complex such as DNAJB14/DNAJB12 - SGTA - AGR2/PRDX4/DNAJB11 would be logical. Have any experiments been performed to prove if complexes like this would form?
      13. Fig 5B: It is clear that DNAJB12 interacts with SGTA. The authors state that DNAJB14 also interacts with SGTA under normal and stress conditions, but the band in 25/50 Tg is very faint. Why would there be stronger binding at the 2 extremes than during low stress induction? In the input, there is a much higher expression of DNAJB14 in 50 Tg. What does this say about the interaction? Is there an effect of ER stress on DNAJB14 expression? A negative control should be included to show any background binding, such as a "beads only" control.
      14. Fig 5C data is sound, although a negative control should be included.

      Results section 4

      15. Fig 6A-B: Given that there is the complexity of overexpression v KD of DNAJB12 v 14 causing similar effects on p53 activity (Fig 2 v 3), it would be interesting to see whether the effect of overexpression mirrors the results in Fig 6A. Is it known what SGTA overexpression does (optional)?
      16. Fig 6D: resolution very low
      17. Fig 6C-D: There is an interesting difference though between the proposed cytosolic actions of the refluxed proteins. You show that AGR2, PRDX4 and DNAJB11 all bind to SGTA in stress conditions, but in the schematics you show: DNAJB11 binding to HSC70 through SGTA (not shown in the paper), then also PDIA1, PDIA3 binding to SGTA and AGR2 binding to SGTA. What role does SGTA have in these varied reactions? Sometimes it is depicted as an intermediate, sometimes a lone binder, what is its role as a binder? It should be clarified which interactions are demonstrated in the paper (or before) and which are hypothesized in a graphical way (eg. for hypotheses dotted outlines or no solid fill etc). The schematics also suggest that DNAJB14 binding to HSC70 and SGTA is inducible in stress conditions, as is PDIA3, which is not shown in the paper.

      Discussion

      "In cancer cells, DNAJB12 and DNAJB14 oligomerize and recruit cytosolic chaperones and cochaperones (HSC70 and SGTA) to reflux AGR2 and other ER-resident proteins and to inhibit wt-p53 and probably different proapoptotic signaling pathways (Figure 5, and Figure 6C-6D)." You haven't shown oligomerisation between DNAJB12/14. Modify the text to make it clear that it is a hypothesis.

      Minor comments:

      18. It would be useful to have page or line numbers to help with document navigation, please include them. Typos and inconsistency in how some proteins are named throughout the manuscript
      19. Title: Include reference to reflux. Suggest: "chaperone complexes (?proteins) reflux from the ER to cytosol..." I presume it would be more likely that the proteins go separately rather than in complex. Do you have any ideas on the size range of proteins that can undergo this process?
      20. ERCY in abstract, ERCYS in intro. There are typos throughout, could be a formatting problem, please check
      21. What about the selection of refluxed proteins? Is this only a certain category of proteins? Could it be anything? Have you looked at other cargo / ER resident proteins?
      22. "Their role in ERCYS and cells' fate determination depends..." Suggest change to "Their role in ERCYS and determination of cell fate..."
      23. I think that the final sentence of the intro could be made stronger and more concise. There's a repeat of ER and cytosol. Instead could you comment on the reflux permitting new interactions between proteins otherwise spatially separated, then the effect on wt-p53 etc.

      Figure legends:

      1. In some cases the authors state the number of replicates, but this should be stated for all experiments. If experiments don't already include 3 independent repeats, this should be done. Check text for typos, correct letter capitalisation, spaces and random bold text (some of this could be from incompatibility when saving as PDF)
      2. Fig2E: scrambled not scramble siRNA
      3. Fig 3: "to expel" is a term not used in the rest of the paper for reflux. Useful to remain consistent with terminology where possible

      Results section 1:

      1. "Protein alignment of the yeast HLJ1p showed high amino acids similarity to the mammalian..."
      2. Fig 1C: state in legend which organism this is from (presumably human)

      Results Section 2

      3. "Test the two strongest hits DNAJB12/14" Add reference to previous paper showing this
      4. "In the WT and J-protein-silenced A549 cells, there were no differences in the cytosolic enrichment of the three ER resident proteins AGR2, DNAJB11, and HYOU1 in normal and unstressed conditions (Figure 2A-C and Figure S2C)." I think that this is an oversimplification, and in your following discussion, you show that it's more subtle than this.
      5. The text here isn't so clear: normal and unstressed conditions? Do you mean stressed? Please be careful in your phrases: "DNAJB12-silenced cells are slightly affected in AGR2 and DNAJB11 cytosolic accumulation but not HYOU1." This is the wrong way around. DNAJB12 silencing affects AGR2, not that AGR2 affects the cells (which is how you have written it). This also occurs again in the next para:
      6. "During stress, DNAJB12/DNAJB14 double knockdown was highly affected in the cytosolic..." I think you mean it highly affected the cytosolic accumulation, not that it was affected by the cytosolic accumulation. Please change in the text
      7. "DNAJB12 and DNAJB14 are strong hits of the yeast HLJ1" Not clear, I presume you mean they are likely orthologues? Top candidates for being closest orthologues?
      8. Fig 2D: typos in WB labelling? I think Tm should be - - +, not - + + as it is now (if it's not a typo, you need more controls, e.g. Eto alone).
      9. Fig 2D-E-F typos for DKD? D12/D12 or D12/14?
      10. "We assayed the phosphorylation state of wt- p53 and p21 protein expression levels (a downstream target of p53 signaling) during etoposide treatment." What are the results of this? Explain what Fig 2D-E shows, then build on this with the +Tm results. Results should be explained didactically to be clear.
      11. "(Figure 2D- E). Cells lacking DNAJB12 and or DNAJB14 have partial protection in wt-p53 phosphorylation and p21 protein levels."
      12. You comment on p53 phosphorylation, but you haven't quantified this. This should be done, normalized to p53 levels, if you want to draw these conclusions, especially as total p53 varies between conditions. Does Eto increase p53 txn? Does Tm alone increase p53 activity/phospho-p53? These are shown in the Sicari EMBO reports paper in 2021, you should briefly reference those.
      13. Fig3A: overexpression of DNAJB12 decreases Eto induced p53 but not at steady state. Is this because at steady state the activity is already basal? Or is there another reason?
      14. Switch Figs S3D and S3C as they are not referred to in order. Also Fig S3C: vary colour (or add pattern) on bars more between conditions
      15. Need to define HLJ1 at first mention

      Results section 3:

      16. HSC70 cochaperone (SGTA) defined twice
      17. "These data are important because SGTA and the ER-resident proteins (PRDX4, AGR2, and DNAJB11) are known to be expressed in different compartments, and the interaction occurs only when those ER-resident proteins localize to the cytosol." Is there a reference for this?

      Results section 4:

      18. "by almost two folds"
      19. Fig 6A: It seems strange that the difference between purple and blue bars in scrambled, and D14-KD are very significant but D12-KD is only significant. Why is this? The error bars don't look that different. It would be interesting to see the individual means for each different replicate.
      20. Figures: Fig 6A: Scale bars not well placed. Annotation on final set should be D12/D14 DKD?

      Discussion:

      21. The authors mention that they want to use the DNAJB12/14-HSC70/SGTA axis to impair cancer cell fitness: What effect would this have though in a non-cancer model? Would this be a viable approach? Although it is obviously early days, which approach would the authors see as potentially favourable?
      22. Second para: Should be "Here we present evidences"
      23. "DNAJB12 overexpression was also sufficient to promote ERCYS by refluxing AGR2 and inhibit wt-p53 signaling in cells treated with etoposide" Suggest:
      24. DNAJB12 overexpression was also sufficient to promote ERCYS by increasing reflux of AGR2 and inhibition of wt-p53 signaling in cells treated with etoposide
      25. "Moreover, DNAJB12 was sufficient to promote this phenomenon and cause ER protein reflux by mass action without causing ER stress (Figure 3, Figure 4, and Figure S3)." You don't look at induction of ER stress here, please change the text or explain in more depth with refs if suitable
      26. The mention of viruses is sparse in this paper. If it is a main theory, put it more centrally to the concept, and explain in more detail. As it is, its appearance in the final sentence is out of context.

      Referees cross-commenting

      I agree with the comments from Reviewer 1. Reviewer 2 also is correct in many ways, but I think that they have somewhat overlooked the relevance of the ER-stress element and treatments. The authors do need to reference past papers more to give a full story; as this includes the group's own papers, I don't think that it is an ethical problem but rather an oversight in the writing. Regarding reviewer 2's concerns about overexpression levels and cell death, the authors do use an inducible cell line and show the levels of DNAJB12 induced (could CRISPR also be considered?). This could be used to further address reviewer 2's concerns. It would also be useful to see data on cell death in the conditions used in the paper. Re concerns about ER integrity, this could be addressed by using IF (or EM) to show a secondary ER marker that remains ER-localised, and this would also be of interest regarding my comment on which categories of proteins can undergo reflux. If everything is relocalised, then reviewer 2's point would be validated.

      Significance

      General assessment: This paper robustly shows that the yeast system of ER to cytosol reflux of ER-resident proteins is conserved in mammalian cells, and it describes clearly the link between ER stress, protein reflux and inhibition of p53 in mammalian cells. The authors have the tools to delve deeper into this mechanism and robustly explore this pathway, however the mechanistic elements - where not instantly clear from the results - have been over interpreted somewhat. The results have been oversimplified in their explanations and some points and complexities of the study need to be addressed further to make the most of them - these are often some of the more interesting concepts of the paper, for example the differences in DNAJB12/14 and how the proteins orchestrate in the cytosol to play their cytosol-specific effects. I think that many points can be addressed in the text, by the authors being clear and concise with their reporting, while other experiments would turn this paper from an observational one, into a very interesting mechanistic one.

      Advance: This paper is based on previous nice papers from the group. It is a nice progression from yeast, to basic mechanism, to physiological model. But as mentioned, without a strong mechanistic improvement, the paper would remain observational.

      Audience: This paper is interesting to cell biologists (homeostasis, quality control and trafficking) as well as cancer cell biologists (fitness of cancer cells and homeostasis) and it is a very interesting demonstration of a process that is a double edged sword, depending on the environment of the cells.

      My expertise: cell biology, trafficking, ER homeostasis

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2024-02491

      Corresponding author(s): Gilbert, Vassart

      1. General Statements [optional]

      We thank referees 1 and 2 for their in-depth analysis of our manuscript. They see interest in our study, with questions to be answered. Referee 3 is essentially negative, considering that there is nothing new ("novel finding is missing"). We respectfully disagree with him/her, comforted by the opinion of referee 2 that "the authors developed a protocol to reproducibly generate fetal-like spheroids from adult tissue is an important advancement in the field and ... the manuscript should attract a significant amount of attention in the intestinal field" and we provide evidence in our answers that he/she did not read the manuscript with the same attention as referees 1 and 2 (see in particular answer to his/her question 5).

      Here is a summary of the main reason why we consider that our study represents valuable new information in the field of intestinal regeneration.

      It is based on the serendipitous observation that dissociation of adult intestinal tissue by collagenase generates stably replatable spheroids upon culture in matrigel. Surprisingly and contrary to canonical EDTA-generated intestinal organoids and fetal spheroids, these spheroids are not traced in Rosa26Tomato mice harboring a VilCre transgene, despite robustly expressing endogenous Villin. Our interpretation is that adult intestinal spheroids originate from a cell lineage, distinct from the main developmental intestinal lineage, in which the VilCre transgene is unexpectedly not expressed, probably due to the absence of cis regulatory sequences required for expression in this lineage.

      Adult spheroid transcriptome shares a gene signature with the YAP/TAZ signature commonly expressed in models of intestinal regeneration. This led us to look for VilCre negative crypts in the regenerating intestine of Lgr5/DTR mice in which Lgr5-positive stem cells have been ablated by diphtheria toxin. Numerous VilCre negative clones were observed, identifying a novel lineage of stem cells implicated in intestinal regeneration.

      FACS purification and scRNAseq analysis of the rare VilCre negative cells present at homeostasis identified a population of cells with characteristics of quiescent stem cells.

      In sum, we believe that our study demonstrates the existence of a hitherto undescribed stem cell lineage involved in intestinal regeneration. It points to the existence of a hierarchical model of intestinal regeneration in addition to the well-accepted plasticity model.

      2. Description of the planned revisions

      See section 3 below.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Here is a point-by-point reply to the queries of the three referees, with indication of the revisions introduced in the manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      • *In this manuscript, Marefati et al report an Lgr5-independent lineage in the regenerating intestine using in vitro organoids and an in vivo injury-coupled lineage tracing model. In organoids, collagenase/dispase dissociation resulted in "immortal spheroids" that maintain a cystic and undifferentiated phenotype in the absence of standard growth factors (Rspondin/Noggin/EGF). Bulk RNAseq of spheroids demonstrates downregulation of classical CBC signatures and upregulation of fetal spheroid, mesenchymal, inflammation and regenerative signatures. In mice, Villin-Cre lineage tracing revealed some Villin-negative progenies that lack reporter tracing throughout crypt-villus ribbons after injury.*

      *The authors proposed that an Lgr5-independent population supports the regenerative response upon CBC depletion. A major caveat of this study is that the identification of this population is based on absence of VilCre expression.*

      We respectfully disagree. It is precisely this characteristic that makes our study interesting. Whereas mosaicism of transgene expression is widespread and usually of little significance, our study shows that the rare VilCre-negative cells in the intestinal epithelium do not show this phenotype at random: they specifically give rise to what we call adult spheroids and regenerating crypts, which cannot be due to chance. The absence of VilCre expression allows tracing these cells from the zygote stage of the various VilCre/Rosa26 reporter mice. We have modified our text to emphasize this point.

      *It is surprising that there is no characterisation of Lgr5 expression throughout the manuscript whilst claiming an Lgr5-independent lineage.*

      We understand the perplexity of the referee not to see direct Lgr5 expression data in our manuscript, given our title. However, our point is that it is the cells at the origin of adult spheroids and the regenerating crypts we have identified that are Lgr5-negative, not the spheroids or the regenerated crypts themselves. Those are downstream offspring that may, and indeed have, gained some Lgr5 expression (e.g. figure 3F). We believe that our data showing that VilCre-negative spheroids are not traced in Lgr5-CreERT2/Rosa reporter mice convincingly demonstrate absence of Lgr5 expression in the cells at the origin of adult spheroids (figure 4G). We think that this experiment is better evidence than attempts to show absence of two markers (Tom and Lgr5) in the rare "white" cells present in the epithelium. Regarding the Lgr5 status of cells at the origin of the regenerating "white" crypts that we have identified, the early appearance of these crypts following ablation of CBC (i.e. Lgr5+ve) cells is a strong argument that they originate from Lgr5-negative cells. Regarding the scRNAseq experiment, Lgr5 transcripts are notoriously low and difficult to measure reliably in CBCs (Haber et al 2017). However, blowing up the pertinent regions of the merged UMAP allows showing some Lgr5 transcripts in clusters 5,6 and none in cluster 1 of figure 8GH. Given the very low level of detection, we had chosen not to include these data in the manuscript, but we hope they may help answer the point of the referee (see portion of UMAP below, with Olfm4 as a control, together with the corresponding violin plot). Several markers that gave significant signals in the CBC cluster (Smoc2, Axin2, Slc12a2) were virtually undetectable in the Olfm4-low /Tom-negative cluster of our scRNAseq data (figure 8I) supporting our conclusion.

      Although the research question is potentially interesting, the concept of epithelial reprogramming upon injury is well documented in the field. The data generated in this manuscript also seem to be preliminary and lack detailed characterisation. Below are specific comments.

      We do not question the existence of epithelial reprogramming upon injury. We believe our data show, in addition to this well demonstrated phenomenon, the existence of rare cells traced by absence of VilCre expression that are at the origin of a developmental cell lineage distinct from Lgr5+ stem cells and also implicated in regeneration.

      • Expression of Lgr5 should be properly characterised throughout the manuscript in both organoid models and injury-induced regeneration in vivo.

      See above for a detailed answer to this point.

      • An important question is the origin of these "Lgr5-independent" adult spheroids. They look and appear like fetal organoids, which could be induced by injury (e.g. upon collagenase/dispase dissociation). Have the authors tried to culture fetal spheroids in BCM over an extensive period of time? Do they behave the same? This would be a great way to directly compare the collagenase/dispase-derived organoids with those of fetal origin.

      Fetal spheroids require ENR for survival and die in BCM. We have chosen to illustrate this point in Fig2A by showing that, contrary to adult spheroids, they die even when only Rspondin is missing.

      • Fig 1C: Why is the replating spheroid culture time different between mesenchymal cells and conditioned medium?

      We took the earliest time showing convincingly the return to the organoid phenotype. This timing difference does not modify the conclusion that EDTA organoids becoming spheroid-like when exposed to factors originating from mesenchymal cells revert to the organoid phenotype when returned to ENR medium without mesenchymal influence.

      • *It is unclear how the bulk RNA-seq data in Fig. 3 were compared. How long were the adult organoids and spheroids cultured for (how many passages)? Were they cultured in the same condition or were they in ENR vs BCM?*

      Both EDTA organoids and spheroids displaying a stable phenotype were used in this experiment. Organoids were collected at passage 4, day 5; spheroids were collected at passage 9, day 3.

      As stated in the legend to the figure: "...to allow pertinent comparison spheroids and organoids were cultured in the same ENR-containing medium...".

      This is important information to consider when interpreting the results. For instance, is Ptgs1 & Ptgs2 expression in adult spheroids the same in ENR vs BCM? Are the gene signatures (regenerative, fetal and YAP) changed in adult spheroids cultured in ENR vs BCM?

      We did compare bulk RNAseq of EDTA organoids to ENR-cultured spheroids, short term (passage 6, day 6) BCM-cultured spheroids and long term BCM-cultured (passage 26, day 6) spheroids. To avoid overloading the manuscript these data were not shown in the original manuscript. In summary the BCM-cultured spheroids display a similar phenotype as those cultured in ENR, but with further de-differentiation. See in revision plan folder the results for PTGS, some differentiation markers and fetal regenerative markers including YAP induced genes.

      We have included a brief description of these data in the new version of the manuscript and added an additional supplementary file (Suppl table 2) presenting the whole data set.

      • It is stated: "In agreement with their aptitude to grow indefinitely, adult spheroids express a set of upregulated genes overlapping significantly with an "adult tissue stem cell module" [159/721 genes; q value 2.11 e-94) (Fig.S2F)].". What is the definition of "indefinitely"? Are they referring to Fig 1B, where spheroids were passaged to P10? The authors should avoid the term "indefinitely" and use a more specific time scale, e.g. passages, months etc.

      We agree that the term indefinitely should be avoided, as it is vague. We have introduced the maximum number of passages during which we have maintained the stable spheroid phenotype (26 passages). Also worth noting, the spheroids could be frozen and cultured repeatedly over many months.

      SuppFig 3D: Row Z-Score is missing the "e" in Score.

      Corrected

      • Fig 4E: Figure legend says QNRQ instead of CNRQ.

      Corrected

      • Fig 4G: The brightfield image of adult spheroids 5 days after 3x TAM injections doesn't look like a spheroid. It seems to be differentiating.

      True, the choice was not the best as the spheroids started to darken. When further replated, however, the offspring of these spheroids showing a clear phenotype remain negative 30 days after tamoxifen administration as shown on the figure. We are sorry, but for reasons explained in section 4 below, we cannot redo the experiment to get a better picture.

      • Fig 4: Most mouse model data are missing the number of mice & their respective age used for organoid isolation.

      We have introduced these data in the legend.

      • *Fig 4A-D, H-G: How was the fluorescent signal of organoids quantified?*

      The settings of fluo imaging or time of LacZ staining were the same for organoids and spheroid pictures. This has been added to the material and methods of the figure and an example is shown below for Rosa26Tomato.

      *How many images?*

      2 per animal per condition.

      *Were there equal numbers of organoids?*

      No, see number of total elements counted added to the figure

      This all needs to be included in methods/figure legends.

      We have introduced additional pertinent information in the material and methods section.

      • Figure 4B-D, G-H: Which culturing conditions were used for adult spheroids? Original method or sandwich method?

      These data were obtained with the original protocol.

      • Fig 6D-E: Please add the timepoint after DT administration these samples are from. It is not listed in text or figure legend.

      These samples were those obtained from mice sacrificed at the end of the 5 day period as indicated in panel A. This has been emphasized in the legend of the figure.

      • SuppFig 6D: again timepoint is missing.

      In this experiment all samples were untreated as indicated. This has been emphasized in the legend of the figure.

      • SuppFig 6: How were the crypts of these mice (DT WT & DT HE) isolated? Was this via EDTA?

      This was RNA extracted from total uncultured EDTA-released material (crypts). This has been emphasized in the legend of the figure.

      Also, what is the timepoint for isolation for these samples? Even if untreated, the timepoint adds context to the data. Please add more context to describing these different experiments, either in the figure legends or methods section.

      All these experiments were from 2 month old animals. We have indicated this in the legend of the figure.

      • SuppFig 6E: The quality of the heatmap resolution is too poor to read gene names.

      We have improved the resolution of the figure and hope the names of the genes are readable now.

      • Fig 5-7: Are the regenerating crypt-villus units fully differentiated or are they maintained in the developmental state? Immunostaining of markers for stem cells (Lgr5), differentiated lineages (Alpi, Muc2, Lyz, ChgA etc.) and fetal state (Sca1, Trop2 etc) should be analysed in those "white" unrecombined crypt-villus units.

      The differentiation phenotype is shown by the clear presence of morphologically-identified Paneth and Goblet cells. We agree that specific immunostainings could have been performed to further explore this point. Regarding the fetal state, Clu expression was shown during the regeneration period (see figure 7D,E).

      Unfortunately, for reasons explained in section 4 below, we are not in a position to perform these additional experiments.

      • The following text needs clarification: "The kinetics of appearance of newly formed un-recombined ("white") crypts was studied after a single pulse of DT (Fig.7A). This demonstrated an increase at 48 hours, with further increase at day 10 and stable maintenance at day 30. The presence of newly formed white crypts one month after toxin administration indicates that the VilCre-negative lineage is developmentally stable and does not turn on the transgene during differentiation of the various epithelial lineages occurring after regeneration (Fig.7B).

      *Comment: The "newly formed" is an overstatement, the data doesn't conclude that those are "new" crypts.*

      Unless we misunderstand the point, we think we can write that a fraction of "white" crypts must be "newly formed", since they are in excess of those present in untreated animals at the same time point.

      *The end of the sentence states that these "white" crypts form developmentally stable lineages, thus these white crypts at day 30 could originate from the initial injury.*

      As stated above, we consider that crypts found in excess of those present in untreated animals result from the initial injury.

      *There was no characterisation of the various epithelial lineages. Are they fully differentiated?*

      See above the point related to Paneth cells and Goblet cells.

      Is Lgr5 expressed one month after toxin administration? Can the VilCre neg lineage give rise to CBCs?

      We have tried hard to show presence or absence of Lgr5 in white crypts at the various times following DT administration. We tried double RFP / Lgr5-RNA scope labeling and double GFP/RFP immunolabeling. Unfortunately, we could not get these methods to produce convincing specific labeling of CBCs in homeostatic crypts, which explains why we could not reach a conclusion regarding the white crypts.

      However, there is an indirect indication that "chronic" white crypts (i.e. those caused by DTR expression in CBC, plus those observed 30 days after DT administration) do not express Lgr5. Indeed, acute regeneration indicated by Clu expression at day 5 in Fig.7C is lower in white crypts than in red ones, strongly suggesting that white crypts preexisting DT administration (the "chronic" ones) do not express Lgr5DTR.

      The relationship between white crypt generation and appearance of Clu-positive revival cells (Ayyaz et al., 2019) was then explored. In agreement with others and similar to what happens in the irradiation model, (Ayyaz et al., 2019; Yuan et al., 2023) Clu-positive cells were rare in crypts of untreated mice and their number transiently increased forty-eight hours after a single pulse of DT, and more so after three pulses of DT (Fig.7C,D).

      Comment: Comparing 1 pulse at day 2 vs 3 pulses at day 5 makes the data hard to interpret. How is the Clu ISH level for 1 pulse at day 5? Are they equivalent?

      After a single pulse of DT, Clu is only transiently increased. As shown by Ayyaz et al., it is back to the starting point at day 5 (supplementary figure 4 of Ayyaz et al.).

      Clu-positive cells were less frequently observed in white crypts (see "Total" versus "White" in Fig.7C). This fits with the hypothesis that Clu expression marks acutely regenerating crypts and that a proportion of the white crypts are chronically regenerating due to DTR expression in CBCs."

      *Comment: I believe the authors suggested that the discrepancy of less Clu expression in white crypts is due to the ectopic expression of DTR in CBCs causing low grade injury without DT administration. This means that some white crypts could have been formed before the administration of DT, and thus are on a different regenerative timeline compared to the white crypts formed from DT administration.*

      Yes, this is our interpretation. We have clarified it in the text.

      Is there any proof of the chronic regeneration? Immunostaining of chronic regenerative markers such as Sca1, Anxa1 or Yap1 nuclear localization would support the claim. It'd be important to show only the white crypts, but not the RFP+ ones, show regenerative markers.

      We think that the higher steady-state number of white crypts in untreated Lgr5-DTR animals, compared to wild type siblings, indicates chronic low-grade regeneration, which is supported by the RNAseq data (Suppl fig6). It must be noted, however, that this phenotype is mild compared to the well described fetal-like regeneration phenotype described in most injury models. Since these white crypts were made at undetermined earlier stages, the great majority of them are not expected to show markers of acute regeneration like Clu, Sca1, etc.

      Fig 7D-E: What are the timepoints of harvest for HE-WT-HE 1 pulse DT mice and HE-HE-HE PBS injected mice?

      We have added this information in the figure.

      • *Fig 8-9: Regarding the CBC-like Olfm4 low population, what is the status of Lgr5? This should be shown in the figure since the argument is that this is an Lgr5-independent lineage.*

      See response to the second point.

      And what about the regenerative, Yap, mesenchymal and inflammatory signatures? Are they enriched in the white crypts similar to the in vitro spheroids?

      In a portion of white crypts, those we believe are newly formed after CBC ablation (see above), there is a transient increase in Clu, which may be considered a marker of Yap activation. In the CBC-like Olfm4 low cells, as seen by scRNAseq, there is nothing like an actively regenerating phenotype. This is expected, since these cells are coming from homeostatic untreated VilCre/Rosa26Tom animals and are supposed to be quiescent "awaiting to be activated".

      Reviewer #1 (Significance (Required)):

      Strengths: The study employed a range of in vitro and in vivo models to test the hypothesis.


      *Limitations: Unfortunately, the models chosen did not provide sufficient evidence to draw the conclusions. Injury induced reprogramming, both in vivo and in vitro, has been well documented in the field. The new message here is to show that such reprogrammed state is continuous rather than transient; instead of regenerating Lgr5+ stem cells, it can continue to differentiate to all cell lineages in Lgr5-independent manner.*

      We respectfully disagree with this analysis of our results. What we show is not "that such reprogrammed state is continuous rather than transient; instead of regenerating Lgr5+ stem cells, it can continue to differentiate to all cell lineages in Lgr5-independent manner", but that a quiescent stem cell line, not previously identified, is activated to regenerate a portion of crypts following CBC ablation. These cells are not reprogrammed; they correspond to a developmental lineage waiting to be activated and keep their VilCre-negative state for at least 30 days. We believe that their "by default tracing" (VilCre negative from the zygote stage) is as strong evidence for the existence of such a lineage as positive lineage tracing would be. The increase in crypts originating from this lineage after CBC ablation indicates that it is implicated in regeneration. We do not question the well-demonstrated plasticity-associated reprogramming taking place during regeneration; we simply suggest that this would coexist with the involvement of the quiescent VilCre-negative lineage we have identified.

      *However, throughout the manuscript, there was no immunostaining of Lgr5 and other differentiation markers. The conclusion is an overstatement without solid proof.*

      We have provided the best answer we could to this point in our answer to the second question of the referee above.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Marefati et al. developed a novel approach to generate spheroids from adult intestinal epithelium using a collagenase/dispase based protocol. Adult spheroids were found to be distinct from classic budding-type organoids normally generated from EDTA based release of the crypt epithelium. Transcriptional profiling indicated that adult spheroids were undifferentiated and similar to regenerating crypts or fetal spheroids. To identify the cell of origin that generates adult spheroids, the authors labelled epithelial cells with VilCreERT-LSL-Tom, VilCre-LSL-GFP and Lgr5CreERT-LSLTom mice. From these experiments the authors conclude that spheroids are only generated from Vil-Cre negative and Lgr5 negative cells. Next the authors deleted the anti-apoptotic gene Mcl1 using Vil-CreERT mice. This led to a strong apoptotic response throughout the crypt epithelium and tissues processed from knockout mice readily generated spheroids, and in vivo, replenishment of the gut epithelium was mediated by unrecombined cells. In a second model, CBCs were ablated using Lgr5DTR mice and VilCre negative cells were found again to contribute to regeneration of the crypt epithelium. Finally based on the absence of Vil-Cre reporter activity, the authors were able to sort out and perform scRNAseq to profile VilCre negative cells. These cells were found to be quiescent, express the stem cell marker Olfm4 and were also abundant in ribosomal gene expression.


      The fact that the authors developed a protocol to reproducibly generate fetal-like spheroids from adult tissue is an important advancement in the field. Previous reports have shown that treatment with various small molecule inhibitors can revert budding organoids into a spheroid morphology, but this manuscript demonstrates that spheroids can also be generated from otherwise untreated cells. This new methodology will provide new tools to dissect the molecular determinants of fetal/regenerative cells in the gut. Based on this, the manuscript should attract a significant amount of attention in the intestinal field.


      As pointed out by the authors themselves the study has important limitations that diminish enthusiasm. The primary issue relates to the inability of the team to identify markers of VilCre neg cells other than the fact that these cells are Olfm4+ and quiescent. Nonetheless, for the reasons stated above the manuscript should reach the target audience within the research community, if the authors can address the specific points below related to issues with methodology as well as defining more precisely the characteristics and growth requirements of adult spheroid cultures.

      Thank you for this positive analysis of our study.

      Major comments

      The main conclusion of the study is that Vil-Cre neg cells are rare quiescent Olfm4+ crypt cells. If this is the case, then standard EDTA treatment should release these cells as well. Consequently, spheroids should also emerge from isolated crypts grown in the absence of ENR. If this is not the case how do the authors explain this?

      We have tried hard to generate spheroids by culturing EDTA organoids in medium lacking ENR and by treating EDTA organoids with collagenase/dispase, without success. Therefore, we are left with the conclusion that spheroid-generating cells must be more tightly attached to the matrix than those released by EDTA, and that it is their release from this attachment by collagenase that triggers a regeneration-like phenotype. This hypothesis is supported by several models of regeneration in other tissues as indicated in our references (Gilbert et al., 2010; Machado et al., 2021; Montarras et al., 2005).

      From the text the authors appear to suggest that growth of adult spheroids is dependent initially on "material" released by collagenase/dispase treatment. An obvious candidate would be mesenchymal cells, which are known to secrete factors such as Wnts and PGE2 that drive spheroid morphology. To test this, the authors should treat spheroid cultures with Porcupine and/or PGE2 inhibitors.

      We followed similar reasoning, considering that spheroids strongly express Ptgs1 and Ptgs2 (Figure 3A). We thought their phenotype might be maintained by autocrine prostaglandin action. We tested aspirin, a Ptgs inhibitor, which had no effect on the spheroid phenotype. In addition, we explored a wide variety of conditions to test whether they would affect the spheroid phenotype [aspirin (see above), cAMP agonists/antagonists, Yap/Taz inhibitors (verteporfin and CA3), valproic acid, Notch inhibitors (DAPT, DBZ, LY511455), all-trans retinoic acid, NFkB inhibitors (TCPA, BMS), TGFbeta inhibitor (SB431542)]. As these results were negative, we did not include them in the manuscript.

      If these inhibitors block growth, then this would suggest that either stromal cells or autocrine signalling involving these pathways is important. Overall, more in-depth analysis of the growth requirements of adult spheroids is required.

      Figure 1d indicates that adult spheroids can be propagated for at least 10 passages. The abstract mentions they are "immortal". The text itself does not address this issue. More precise information as to how long spheroids can be propagated is required. If these cultures can be propagated for 10 passages or more, it becomes important to determine what nutrients/mitogens in the basal media are driving growth. Alternatively, what is the evidence that spheroid cultures are completely devoid of mesenchymal cells? The text only mentions that "Upon replating, these spheroids could be stably cultured free of mesenchymal cells (Fig.1B)". No validation is shown to support this.

      We agree that "immortal" is not a good way to characterize our spheroids, as also pointed out by referee 1. We have changed this in the text, indicating that the maximal number of replatings we tested was 26 and replacing "immortal" with "stably replatable". Of note, the spheroids could be frozen/thawed and recultured many times.

      Related to the question whether mesenchymal cells could still contaminate the spheroid cultures, we can provide the following answers:

      • No fibroblasts could be seen in replated cultures and multiple spheroids could be repeatedly propagated from a single starting spheroid.
      • The bulk RNAseq experiment comparing organoids to ENR- or BCM-cultured spheroids shows, despite expression of several mesenchymal markers (see matrisome in Fig3), an absence of significant expression of Pdgfra (see in revision plan folder for CP20Millions results from the raw data of new suppl table 2, with Clu, Tacstd2 and Alpi shown as controls).
      • Regarding the nutrients/mitogens in the medium driving spheroid growth, we did not explore the point further than showing that they grow in basal medium (i.e. advanced DMEM), given that the presence of Matrigel makes it difficult to pinpoint what is really needed.

      In Figure 2, the authors describe the growth requirements for adult spheroids and indicate that spheroids grown in ENR or EN became dark and shrink. The representative images showing this are clear, but this analysis should be quantified.

      Added to the manuscript.

      In SF3, the gene expression profile of organoids from the sandwich method only partially overlaps with that of organoids from the old protocol. What are the gene expression differences between the 2 culture systems? Secondly, the sandwich method appears to sustain growth of Tom+ spheroids based on RNAseq and the IF images. This suggests that Vil-Cre negative cells are not necessarily the only source of adult spheroids and thus this experiment seems to indicate that any cell may be converted to grow as a spheroid under the right conditions. These points should be addressed.

      Looking back at our data in order to answer the point raised by the referee, we realized that we had inadvertently compared organoids to ENR-cultured spheroids generated by the first protocol to BCM-cultured spheroids generated by the sandwich method. We have corrected this error in a new version of suppl fig3. This shows increased correspondence between genes up- or downregulated in the spheroids obtained in the two protocols, from 49/48% to 57/57% (Venn diagram on the new figure). We agree that, even after this correction, the spheroids obtained with the two protocols present sizeable differences in their transcriptome. However, considering the very different way these spheroids were obtained and cultured initially, we do not believe this to be unexpected. The important point in our opinion is that the core of the up- and down-regulated genes typical of the de-differentiation phenotype of adult spheroids is very similar, as shown in the heatmap (which was made with the correct samples!). Also, a key observation is that both kinds of spheroids survive and can be replated in basal medium. As already stated, this characteristic is only seen in rare cases [spheroids obtained from rare FACS-purified cells (Smith et al 2018) or helminth-infected intestinal tissue (Nusse et al. 2018)]. Together with the observation that the majority of them are not traced by VilCre, this constitutes what we consider the hallmark of the spheroids described in our study. As shown in figure 4E (old protocol) and Suppl Fig.3 (sandwich protocol), both red and white spheroids were extremely low in VilCre expression. As stated in the text, the fact that some spheroids are nevertheless red is most probably related to the extreme sensitivity of the Rosa26Tom marker to recombination (Liu et al., 2013), but this does not mean that there are two phenotypically different kinds of spheroids. It means that the arbitrary threshold of Rosa26Tom recombination introduces an artificial subdivision of spheroids with no phenotypical significance.

      Regarding the point made by the referee "that any cell may be converted to grow as a spheroid under the right conditions", we agree and have shown, as have others, that organoids indeed acquire a spheroid phenotype when cultured, for instance, in fibroblast-conditioned medium (see suppl fig1B and (Lahar et al., 2011; Roulis et al., 2020) quoted in the manuscript). However, these spheroids cannot be propagated in basal medium, and revert to an organoid phenotype when put back in ENR (Suppl fig1B).

      In Figure 4, the authors conclude that spheroids do not originate from Lgr5 cell derived clones even after 30 days post Tam induction. Does this suggest that in vivo and under homeostatic conditions VilCre neg cells are derived from a distinct stem cell pool or are themselves a quiescent stem cell? Given the rarity of VilCre neg cells, the latter seems unlikely.

      Despite their rarity, we believe VilCre-negative cells observed under homeostatic conditions are themselves quiescent stem cells. Indeed, if they were derived from a larger stem cell pool, this pool should also be VilCre-negative, and we do not see such a larger number of VilCre-neg cells under homeostatic conditions.

      The problem with the original assertion is that Lgr5-CreERT mice are mosaic and therefore not all Lgr5+ cells are labelled in this model. "White" spheroids may thus derive from cells that in turn derive from these unlabelled Lgr5 cells.

      We had considered the possibility that mosaicism [very low for VilCre (Madison et al., 2002); in the 40-50% range for Lgr5CreERT2 (Barker & Clevers. Curr Protoc Stem Cell Biol. 2010 Chapter 5)] could explain our data. We think, however, that we can exclude this possibility on the basis that spheroids do not conform to the expected ratio of unrecombined cells, given the observed level of mosaicism. Indeed, for VilCre, a few percent, at most, of unrecombined cells in the epithelium translates into almost 100% unrecombined spheroids. For Lgr5CreERT2 mice, the mosaicism level is in the range of 40%, which is what we observe for EDTA organoids (Figure 4G), while spheroids were in their vast majority unrecombined.
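
      The mosaicism argument above can be made concrete with a small calculation. The Python sketch below asks how likely it would be to observe almost exclusively unrecombined spheroids if spheroids arose at random from epithelial cells and mosaicism alone accounted for the unrecombined ones. The 5% escape rate and the counts `n_spheroids` and `k_unrecombined` are hypothetical placeholders chosen for illustration, not values from the study.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical illustration only: none of these numbers come from the study.
p_unrecombined = 0.05  # assumed fraction of epithelial cells escaping VilCre recombination
n_spheroids = 20       # assumed number of spheroids scored
k_unrecombined = 19    # assumed number of spheroids found unrecombined

# Probability of seeing at least k unrecombined spheroids if each spheroid
# independently arose from a randomly chosen epithelial cell.
p_value = binomial_tail(n_spheroids, k_unrecombined, p_unrecombined)
print(f"P(>= {k_unrecombined}/{n_spheroids} unrecombined | mosaicism only) = {p_value:.3g}")
```

      With real counts substituted, a vanishingly small tail probability would argue against mosaicism as the sole explanation for unrecombined spheroids.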

      We have included a discussion about the possible role of mosaicism in the new version.

      ATACseq experiments were briefly mentioned in the manuscript but unfortunately little information was extracted from this experiment. What does this experiment reveal about the chromatin landscape of adult spheroids relative to normal organoids?

      We only performed this experiment to search for an explanation to the paradoxical absence of expression of the VilCre transgene in spheroids, despite robust expression of endogenous villin (Suppl Fig.4). We chose to show the chromatin landscape of a gene equally expressed in both organoids and spheroids (Krt19), a gene specifically expressed in spheroids (Tacstd2) and the endogenous Villin gene also expressed in both. We believe that the observation of a clear difference in pattern of the chromatin accessibility around the endogenous villin gene in organoids and spheroids provides an explanation to the observed results. The cis regulatory sequences needed for expression of the endogenous villin gene seem to be different in organoids and spheroids, which may explain why the regulatory sequences present in the transgene (only 12.4kb) might not allow expression of the transgene in spheroids. We have added a sentence in the manuscript clarifying this point. Missing is obviously the chromatin landscape around the VilCre transgene, but this is beyond reach in such kind of experiments.

      Reviewer #2 (Significance (Required)):

      The fact that the authors developed a protocol to reproducibly generate fetal-like spheroids from adult tissue is an important advancement in the field. Previous reports have shown that treatment with various small molecule inhibitors can revert budding organoids into a spheroid morphology, but this manuscript demonstrates that spheroids can also be generated from otherwise untreated cells. This new methodology will provide new tools to dissect the molecular determinants of fetal/regenerative cells in the gut. Based on this, the manuscript should attract a significant amount of attention in the intestinal field.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): CR-2024-02491

      An Lgr5-independent developmental lineage is involved in mouse intestinal regeneration

      Marefati et al.

      Homeostatic maintenance of the intestinal epithelium has long been thought to rely upon Wnt signaling responsive Lgr5-expressing stem cells that reside at the crypt base.

      However, myriad mechanisms or populations have been reported to underlie epithelial regeneration after injury. Many groups have reported that reacquisition of a fetal-like intestinal phenotype is an important part of the regenerative response, however the originating cell type has not been definitively identified. Herein, the authors demonstrate that cells from adult homeostatic intestine can generate immortal spheroids that resemble fetal spheroids and are derived independent of Lgr5+ intestinal stem cells (ISCs). The authors then draw the conclusion that this indicates that a hierarchical stem cell model applies to regeneration of the intestinal epithelium, in addition to the plasticity model.


      Comments:

      1. Please indicate what species is used for studies in Fig 1.

      All experiments were performed in Mus musculus.

      Please clarify if Figure 2 studies utilize Matrigel or not.

      Yes

      RNA-seq analyses of adult intestinal generated spheroids lack the granularity of single cell analyses and thus it is unclear if this is a homogeneous population or if the population has diversity across it (i.e., enteroids/organoids have a high level of diversity). Many of the conclusions from the RNA-seq study are broad and generalized-for example Fig 3F indicates that markers of the +4 ISC populations (Bmi1, tert, lrig1, hopx) were all expressed similarly in adult spheroids as compared to adult organoids. However, while this may be true in the bulk-RNA-seq analyses, clearly scRNA-seq would provide a better foundation to make this statement, as enteroids/organoids are comprised of heterogeneous subpopulations. . .and it might indicate that these +4 markers have only very low expression in the spheroids. Based upon these concerns, misconclusions are likely to be drawn.

      We agree that performing scRNAseq on adult spheroid populations would be worthwhile in future studies to explore their possible heterogeneity. We nevertheless believe that our scRNAseq performed on homeostatic intestinal tissue from VilCre/Rosa26Tom mice identifies Olfm4-low VilCre-neg cells that are likely at the origin of adult spheroids and display a quite homogeneous phenotype.

      The language around Figure 4 results is confusing. Please define "white" and "red". It might be simpler to designate recombined versus not recombined lineage.

      We have clarified this in the figure.

      The hypothesis that collagenase/dispase solution acts as a proxy for injury is not demonstrated and backed by data. Thus, it is difficult to make the conclusion that this approach could represent a "stable avatar" of intestinal regenerating cells. It is clear that subpopulations of crypt-based cells generate spheroids in culture without collagenase/dispase (see the cited reference Smith et al, 2018).

      Smith et al demonstrate clearly the possibility to obtain spheroids with properties probably similar to ours from EDTA derived intestinal crypt cells. However, they need to prepurify them by FACS. Besides, Nusse et al describe spheroids similar to ours after infection of the intestine by helminths (Nusse et al. 2018). In our case, and for most labs preparing enteroids with the EDTA protocol, the result is close to 100% organoids. Even if we treat EDTA organoids with collagenase, we do not obtain spheroids. This brought us to the conclusion that spheroid-generating cells must be more tightly attached to the matrix than CBCs and that it is their release from the matrix that activates the spheroid regeneration-like phenotype. This hypothesis is supported by several models of regeneration in other tissues as indicated in our references (Gilbert et al., 2010; Machado et al., 2021; Montarras et al., 2005).

      A study based on the absence of recombination in a VilCre lineage tracing scenario is not well established as a strong experimental approach, as there are many reasons why cells may not be lineage marked. In order to use this system as the authors intend, they first need to demonstrate that villin is not expressed in the discrete cell population that they are targeting. For the presented observational studies, this would be difficult to do. While they do demonstrate differences in chromatin accessibility between cells from organoids versus spheroids (fig s4), some of these differences could merely be due to the bulk analytical nature of the study and the lack of comparing stem cell populations from spheroids to stem cell populations from organoids, since the spheroids are likely homogeneous versus the organoids that only have a small fraction of stem cells and thus represent a mix of stem cell and differentiated cell populations. The authors do not demonstrate that villin protein expression varies in these cells.

      If it were found that villin is not expressed in their "novel" population, then one would expect that the downstream use of villin-based recombination would demonstrate the same recombination potential (i.e., Mcl1 would not be recombined). Both recombination studies in Fig 6 are difficult to interpret, and thus it is not clear if these studies support the stated conclusions. Quantification of number of crypts that are negative should be reported as a percentage of recombined crypts.

      We are sorry, but there seems to be a complete misunderstanding of our data regarding the point raised by the referee. The important point of our initial observation is that, despite robust expression of villin in spheroids, the VilCre transgene is not expressed (see figure 4E). In our opinion, this makes the absence of VilCre expression (or of Rosa marker recombination) a reliable marker of a new developmental lineage. All the data in figure 4 constitute an answer.

      The reasoning about heterogeneity of cell type in organoids versus probable homogeneity of spheroids is well taken. However, as the endogenous villin gene is expressed in all cells of both organoids and spheroids, it is highly significant that only spheroids do not express the transgene.

      We performed the ATACseq experiment to search for an explanation to the paradoxical absence of expression of the VilCre transgene in spheroids, despite robust expression of endogenous villin (Suppl Fig.4). We chose to show the chromatin landscape of a gene equally expressed in both organoids and spheroids (Krt19), a gene specifically expressed in spheroids (Tacstd2) and the endogenous Villin gene also expressed in both. We believe that the observation of a clear difference in pattern of the chromatin accessibility around the endogenous villin gene in organoids and spheroids provides an explanation to the observed results. The cis regulatory sequences needed for expression of the endogenous villin gene seem to be different in organoids and spheroids, which may explain why the regulatory sequences present in the transgene (only 12.4kb) might not allow expression of the transgene in spheroids. We have added a sentence in the manuscript clarifying this point. Missing is obviously the chromatin landscape around the VilCre transgene, but this is beyond reach in such kind of experiments.

      Figure 8 indicates that the cell population identified by scRNA-seq may be quiescent. Companion IF or IHC should be conducted to confirm this finding, as well as other conclusions from the informatics conducted.

      We agree that additional experiments could be performed to support this point. We are unfortunately not in a position to perform these experiments (see section 4 below).

      Clearly the data is intriguing, however, the conclusion is strong and is an over interpretation of the presented data. There are a number of validation or extension data that would enhance the overall interpretation of the study: 1. validation of scRNA-seq or bulk RNA-seq concepts by protein staining of intestinal tissues in the damage model will serve as a secondary observation. 2. identification of the ISC that they are defining is critical and important. There is already the notion that this cell type exists and it has been shown with various different markers. 3. expand the analyses of the fetal-like expression profiling to injured intestines to demonstrate that the lineage negative cells indeed express fetal-like proteins. 4. expand the discussion of the Clu+ cell type. Is this cell the previously described revival cell? If so, how does this body of work provide unique aspects to the field?

      We agree that all these suggested experiments could be performed and would be of interest. However, we consider that they would not modify the main message of our study and would only constitute an expansion of the present work. As already stated, we are not in the position to perform them (see section 4).

      There is some level of conflicting data, with the stem population being proliferative in culture stimulated by the stromal cells, but quiescent in vivo and also based upon scRNA-seq data in Fig 9.

      We do not see any conflict in our observation regarding this point. The observation that cells that are quiescent in vivo become proliferative when subjected to culture (with or without addition of stromal cells) is routinely made in a multitude of cell culture systems. In particular, it has been shown that intestinal tissue dissociation activates the Yap/Taz pathway, resulting in proliferation (Yu et al. Hippo Pathway Regulation of Gastrointestinal Tissues. Annual Review of Physiology, 2015 Volume 77, 201-227).

      Many of the findings have been previously reported: Population that grows as spheroids (Figure 2), Population that is Wnt independent (Figure 2), Lgr5 independent regenerative growth of the intestine (figure 3F, Figure 4), Clu+ ISCs drive regeneration (Figure 7).

      Whereas these individual findings have indeed been reported, it was in a different context. We strongly disagree with the underlying suggestion that our study would not bring new information. We have identified here a developmental lineage involved in intestinal regeneration that has not been described up to now.

      Minor comments:

        • The statement that spheroids must originate from collagenase/dispase digested material might be an overstatement, as spheroid generation from EDTA treated intestines has been previously reported (Smith et al, 2018).

      See answer to point 4 above.

      Overall, while the study includes an extensive amount of work and different approaches, a foundationally supported novel finding is missing. Many of the statements have already been demonstrated by others in the fields. In addition, one of the most intriguing aspects of the study is that the stromal population impacts this stem cell population, however, interactions and factors stimulating the crosstalk are not addressed.

      Reviewer #3 (Significance (Required)):

      Overall, while the study includes an extensive amount of work and different approaches, a foundationally supported novel finding is missing. Many of the statements have already been demonstrated by others in the fields. In addition, one of the most intriguing aspects of the study is that the stromal population impacts this stem cell population, however, interactions and factors stimulating the crosstalk are not addressed.

      We can only disagree.

      4. Description of analyses that authors prefer not to carry out


      We have answered most questions raised by the referees by explaining our view, by clarifying individual points and, in several cases, by providing additional information that was not included in the original manuscript.

      In a limited number of cases when additional experiments were suggested, we were unfortunately obliged to write that we are not in a position to perform them. This is because my lab is closing after more than fifty years of uninterrupted activity. There will unfortunately be nobody to perform additional experiments.

      Nevertheless, as written by referees 1 and 2, we believe that the revised manuscript, as it stands, contains data that will be of interest to people in the field and may be the basis for future developments. We hope the editors will find interest in publishing it.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2023-02306

      Corresponding author(s): John, Yates

      [Please use this template only if the submitted manuscript should be considered by the affiliate journal as a full revision in response to the points raised by the reviewers.


      If you wish to submit a preliminary revision with a revision plan, please use our "Revision Plan" template. It is important to use the appropriate template to clearly inform the editors of your intentions.]

      1. General Statements [optional]

      We greatly appreciate the reviewers taking time from their busy scientific careers to evaluate our manuscript. We were elated to read all the positive comments, such as “the conclusions are well-supported and convincing”, “should contribute to a more nuanced understanding of SCZ pathogenesis”; “The potential implications for drug development underscore the broader significance of the study in advancing our knowledge of neurobiology and its relevance to neurological disorders like schizophrenia”, and “The study is informative, and has great potential to enrich the specific literature of this field”. We also found the constructive criticism very helpful for improving our manuscript. We performed additional experiments and bioinformatic analyses, as requested. We modified the manuscript to answer the reviewers’ questions. Due to its complexity, it is difficult to describe the different and sometimes conflicting hypotheses of SCZ pathogenesis in a single manuscript. This complexity is reflected in the conflicting requests from the reviewers. One reviewer requested we investigate and highlight the role of non-neuronal cells in SCZ while another reviewer suggested we did not focus enough on synaptic proteins. We believe we have achieved a balance to represent the intricacy of SCZ biology and the different opinions of the reviewers.

      Thanks again.

      2. Point-by-point description of the revisions

      This section is mandatory. Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate). In this manuscript, McClatchy and colleagues used a conventional approach combining immunoprecipitation (IP) of endogenous target proteins (baits) followed by liquid chromatography mass spectrometry (MS) analysis of the co-immunoprecipitating proteins to map protein-protein interaction (PPI). This interaction network is centered around baits that had been annotated as susceptibility factors for schizophrenia (SCZ). A variety of previous studies have identified thousands of such SCZ susceptibility factors. Mostly based on the availability of antibodies, 8 bait proteins were selected in this study. The authors reasoned that immunoprecipitating endogenous proteins from tissues using specific antibodies was a more accurate view of physiological conditions than epitope tagging followed by affinity purification (AP) from cells in culture. The model system from which proteins were extracted was the hippocampus dissected from mice that had been treated or not by phencyclidine (PCP), a drug that has been shown to induce SCZ symptoms in humans and animals. By comparing the proteins identified and quantified from the PCP-treated samples against control IPs and/or saline-injected mouse controls, a large number of PPI were deemed statistically significant. Most of these potential interactors were not present in PPI databases (BioGRID), most likely because such databases are populated with large-scale APMS datasets from cell cultures, with very few studies using brain tissue. Strikingly, many of the co-immunoprecipitated proteins were also known as SCZ susceptibility factors, which lends weight to the hypothesis that these factors form a large protein interaction network, localized at the synapses.

      Major comments: - Are the key conclusions convincing? Overall, the conclusions drawn from the experimental design, data analysis, and corroboration with existing literature are well-supported and convincing. When selecting the SCZ susceptibility factors, the authors clearly state their goal, the databases used for gene selection, and the rationale for choosing proteins with synaptic localization. The inclusion of evidence from genetic studies and previous publications strengthens the credibility of the selected genes. The methodology used to establish the novel SCZ PPI network is mostly well-described (see minor comments below). The use of an 15N internal standard also adds rigor to the quantitation of PPI. The GO enrichment analysis provides valuable insights into the biological functions and cellular components associated with the SCZ PPI network. The annotation of identified proteins using the SynGo synaptic database and the distribution of annotated synaptic proteins among different baits further support the biological relevance of this PPI network. The cross-referencing of the PPI network with published genetic studies on SCZ susceptibility genes adds robustness to the findings. Specifically, the observation that 68% of protein interactors have evidence of being potential SCZ risk factors is a strong corroboration of the prevailing hypothesis in the field. Finally, the significant changes induced by PCP that were identified for all baits except Syt1, along with the comparison of altered proteins with SAINT-identified PPI, add depth to the understanding of PCP modulation.

      - Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? No, but note that APMS/IPMS has been around for more than a decade (Introduction page 3).

      We agree and did not mean to imply that IP-MS is a new technology. Our point was that IP-MS studies of the PPI of endogenous proteins in brain tissue make up a small percentage of all published PPI MS studies.

      We added the following to the Conclusions to clarify this point: “Although IP-LC-MS technology has been employed for more than a decade, quantitation of proteins using this strategy in mammalian tissue is scarce in the literature.”

      - Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. One piece of data that is missing are Western blots using the 8 selected antibodies against the proteins extracted from their experimental samples to validate the antibodies recognize 1 protein of the expected size from these tissue extracts.

      We took your suggestion and performed immunoblots with our 8 IP antibodies using the starting material (i.e. rat brain hippocampus). All antibodies recognized a single band of the approximate molecular weight of the target except for Gsk3b, which produced a doublet instead of a single band. This pattern is similar to what has been observed with the phosphorylation of Gsk3b (Krishnankutty, Kimura et al. 2017, Vainio, Taponen et al. 2021). To provide evidence that the additional band observed for Gsk3b is the phosphorylated target protein, we searched our Gsk3b IP dataset for a differential phosphorylation modification (i.e. +79.9663 Da) on S, T, or Y. Even though we did not perform phosphorylation enrichment, we identified S389 as abundantly phosphorylated in all Sal and PCP samples, consistent with our immunoblot. Images of these immunoblots are now Supplementary Figure 1.
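
      As an illustration of the kind of search described above, the following Python sketch enumerates candidate phosphorylation sites (S, T, Y) in a peptide and applies the 79.9663 mass shift quoted in the response. The peptide sequence, the unmodified mass, and the helper names are hypothetical placeholders; an actual database search engine treats this as a differential (variable) modification and scores it against spectra.

```python
PHOSPHO_DELTA = 79.9663             # mass shift quoted in the response, in Da
PHOSPHO_RESIDUES = {"S", "T", "Y"}  # residues considered for the modification


def candidate_phosphosites(peptide: str) -> list[tuple[int, str]]:
    """Return (1-based position, residue) for every S, T or Y in the peptide."""
    return [
        (position, residue)
        for position, residue in enumerate(peptide, start=1)
        if residue in PHOSPHO_RESIDUES
    ]


def modified_mass(unmodified_mass: float, n_phospho: int) -> float:
    """Add one phospho mass shift per modified residue."""
    return unmodified_mass + n_phospho * PHOSPHO_DELTA


# Hypothetical peptide and mass, for illustration only.
example_peptide = "AGSPTYK"
example_mass = 1000.0

for position, residue in candidate_phosphosites(example_peptide):
    print(f"candidate site {residue}{position}: mass {modified_mass(example_mass, 1):.4f}")
```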

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. Running SDS-PAGE and Western blotting should be straightforward and cheap.

      - Are the data and the methods presented in such a way that they can be reproduced? Yes

      - Are the experiments adequately replicated and statistical analysis adequate? Yes

      Minor comments: - Specific experimental issues that are easily addressable. The rationale for the short duration between PCP injection and animal sacrifice is only explained in the discussion section (page 17). The fact that this short treatment of less than 30 min should prevent any change in transcription or translation should be introduced earlier (in the experimental procedures).

      We agree this is an important aspect of the study and that it suggests that the effect of PCP is independent of changes in transcription and translation as stated in the Discussion.

      We added the following to the Introduction:

      “PCP was administered for less than 30 min, which precluded any changes in transcription or translation and allowed us to focus on PPI.”

      Note that the duration is written as 26 min on page 4 and 25 min on page 9. Please reconcile these numbers.

      We have corrected this typo. It was 26 min.

      Is there any biological significance for this SCZ study that the mice were maintained on a reverse day-night cycle?

      Rats are nocturnal animals, i.e. they are active at night and sleep during the day. In this study, rats were housed on a reverse day-night cycle so that the response to PCP could be evaluated during their active phase. This is not specific to SCZ research and is the routine protocol for behavioral testing in the Powell laboratory.

      It is not clear from reading the Experimental Procedures/Bioinformatic Analysis section (page 6) if normalized N14/N15 protein ratios measured in the bait-IPs and control-IPs were used for the SAINT analysis. Or did the authors use label-free quantitation with spectral counts?

      We apologize for not making the methods clearer. In the results, it is stated that the N14 identifications are used in the SAINT analysis, and we state in the Discussion that SAINT uses spectral counts. We modified the Experimental Procedures/Bioinformatic Analysis section (page 6) to state: The input for SAINT was only the 14N identifications.

      - Are prior studies referenced appropriately? Yes

      • Are the text and figures clear and accurate? Fig1C: The workflow is a little too simple, the authors might want to add more details.

      We revised Fig1C with more details as suggested.

      FigS1C: Please add x-axis title (spectral counts) directly to the figure.

      “Spectral counts” was added to the x-axis. FigS1C is now FigS2C, with the addition of the immunoblots you suggested.

      Fig2B-D: The color scale bar should have number values to denote lower and upper limits in % (as opposed to "lowest" and "highest").

      Numerical values were added to replace the upper and lower limits.

      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions? No

      Reviewer #1 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. In this study, the authors have drastically expanded the protein interaction landscape around 8 known SCZ susceptibility factors by using a conventional IPMS approach. Performing the IPs on protein extracted from hippocampus dissected from mice treated with phencyclidine to model SCZ increases the biological significance of such lists of proteins. Furthermore, the co-immunoprecipitation of many other SCZ susceptibility factors along with the 8 selected baits supports the hypothesis that these proteins of varied functions are part of large interaction networks. Overall, the integration of experimental data with in silico networks, along with the quantification of PPI changes in response to PCP, should contribute to a more nuanced understanding of SCZ pathogenesis. The potential implications for drug development underscore the broader significance of the study in advancing our knowledge of neurobiology and its relevance to neurological disorders like schizophrenia.

      • Place the work in the context of the existing literature (provide references, where appropriate). Overall, this study contributes to the existing literature by providing experimental data on in vivo PPI networks related to SCZ risk factors. Not only do the authors validate 124 known interactions but also they identify many novel PPI, due to a gap in the existing literature regarding the comprehensive mapping of PPI directly from tissue extracts, especially brain tissue. The authors advocate for more IPMS studies in mammalian tissues to generate robust tissue-specific in silico networks, which agrees with the growing understanding of the importance of tissue-specific networks for identifying disease mechanisms and potential drug targets. Furthermore, the SCZ PPI network reported here is enriched in proteins previously associated with SCZ, which aligns with the existing literature emphasizing the involvement of certain proteins and pathways in the pathogenesis of SCZ [References: 78-85]. The authors also investigate the response of the SCZ network to PCP treatment, hence providing insights into the potential effects of post-translational modifications, protein trafficking, and PPI alterations in a model of schizophrenia, which adds to existing knowledge about the impact of PCP on the molecular processes associated with SCZ [References: 88, 89, 92].

      • State what audience might be interested in and influenced by the reported findings. Overall, the findings reported in this manuscript have implications for both basic research in molecular biology and potential translational applications in the development of targeted therapies for neurological disorders, particularly schizophrenia. The study delves into in vivo protein-protein interaction (PPI) networks related to genes implicated in schizophrenia (SCZ) risk factors. Researchers in neuroscience, molecular biology, and psychiatry would find the information valuable for understanding the molecular basis of SCZ. The study highlights the potential for identifying disease "hubs" that could be drug targets. Pharmacologists and drug developers interested in targeting protein complexes for drug development, especially in the context of neurological disorders, may find the study relevant.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Technical Expertise | biochemistry, liquid chromatography mass spectrometry, proteomics, computational biology, protein engineering, protein interaction networks, post-translational modifications, protein crosslinking, proximity labeling, limited proteolysis, thermal shift assay, label-free and isotope-labeled quantitation. Biological Applications | human transcriptional complexes, apicomplexan parasites, viruses, nuclear envelope, ubiquitin ligases, non-model organisms.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: McClatchy, Powell and Yates aimed at identifying a protein interactome associated with schizophrenia. For that, they treated rats (N14 and N15) with PCP, which disturbs glutamatergic transmission, as a model for the disease and co-immunoprecipitated hippocampal proteins, which were further analyzed by standard LC-MS.

      The study is new, considering not much has been done in this direction in the field of schizophrenia. This justifies its publication. On the other hand, a major flaw of the study is the lack of information on the level of interaction of the so-called protein interactome. That is, as the study was performed, we cannot distinguish which proteins are directly interacting with the targets of interest from proteins which are interacting with the targets' interactors. The different shells of interaction are crucial information in protein interactomics.

      Major: most of what I am pointing out below must be at least discussed or better presented in the paper, as it may not be solvable considering how the study has been conducted.

      1) The study fails in defining the level of interaction of the protein interactome with the considered targets. This has been shortly mentioned in the discussion, but must be more explicit to readers, for instance, in the abstract, introduction and in the methods sections.

      We agree this is crucial information that is absent from our dataset. As we explained in the Discussion, we cannot distinguish between PPI that are direct interactors with the target protein and PPI that reside in a multi-protein complex that includes the protein (i.e. indirect). This is an inherent problem with any IP-MS study. We amended the Introduction to highlight the ambiguity of the interaction data produced by the IP-MS approach, as you suggested.

      Text added to the Introduction:

      “Regardless of whether Ab or tagged proteins are employed to identify PPI from a biological sample, it cannot be determined if the identified interactor binds directly to the target protein or reside in a complex of proteins that includes the target protein (i.e. indirect).”

      Since this important information is routinely missing from IP-MS studies, we decided to try to determine the level of interaction by using the artificial intelligence algorithm AlphaFold3 (AF3). We believe it is not yet optimized for PPI, but AF3 is a big leap forward in the field of structural biology. For example, we observed that AF3 did not predict high-confidence structures for our large membrane target proteins and was unable to validate known direct PPI of these targets. In addition, analyzing data with AF3 is currently not automated or streamlined, so with ~1600 PPI identified in our dataset, we chose to look at one target protein, Ppp1ca. AF3 identified many known direct binding proteins in our Ppp1ca PPI dataset, which gives high confidence in the novel PPI predicted to be direct interactors. The AF3 data are presented in an additional figure, Figure 6.

      The following was added to the Results Section:

      “A disadvantage of IP-MS studies is that they cannot distinguish between a PPI that binds directly to the target protein, and a PPI in which the interactor and target protein reside in the same multiprotein complex (i.e. indirect). We sought to predict which PPI may be directly interacting with its target protein by using the artificial intelligence algorithm AlphaFold3 (AF3) (Abramson, Adler et al. 2024). First, we analyzed the predicted AF3 structure of the targets using the pTM score and the fraction of each structure calculated to be disordered (Figure 6A and Supplementary Table 7). Our reasoning was that if targets have poorly resolved structures, it will be difficult to screen them for direct PPI. A pTM score >0.5 suggests that the structure may be correct (the highest confidence score is 1). Undefined or disordered regions hinder the accuracy of the prediction. All targets possessed a pTM score > 0.5 except Syt1. The disordered fraction negatively correlated with the pTM score, as expected. Gsk3b, Ppp1ca, and Map2k1 had the highest pTM scores and were also the smallest of our target proteins (Figure 6B). Ppp1ca had the most confident structure (i.e. pTM 0.9) and the smallest disordered fraction (i.e. 0.07). Next, we determined the AF3 prediction of previously reported direct interactions of the targets. We used the iPTM score to determine interaction confidence. An iPTM score >0.8 is considered a highly confident direct interaction, whereas 0.8. These eight PPI have all previously been reported to form a direct interaction with Ppp1ca, except Phactr3 (Zhang, Zhang et al. 1998, Terrak, Kerff et al. 2004, Hurley, Yang et al. 2007, Marsh, Dancheck et al. 2010, Ragusa, Dancheck et al. 2010, Ferrar, Chamousset et al. 2012, Choy, Srivastava et al. 2024, Xu, Sadleir et al. 2024). Phactr3 is structurally similar to, but less studied than, the reported direct interactor Phactr1.
These interactors are all inhibitors of PP1 except Ppp1r9b, which targets Ppp1ca to specific subcellular compartments. Nine PPI were assigned a score

      The following has been added to the Discussion:

      Our SCZ PPI network consists of two types of PPI: direct physical interactions and “co-complex” or indirect interactions. Typically, the nature of the interaction cannot be distinguished in IP-MS studies. We decided to employ the new AF3 algorithm to screen the PPI of Ppp1ca to provide evidence for direct interactors. We chose to examine the PPI assigned to Ppp1ca, because its structure was the most confident among our target proteins and AF3 correctly predicted a known direct interactor with high confidence. Ppp1ca is a catalytic subunit of the phosphatase PP1, which is required to associate with regulatory subunits to create holoenzymes (Li, Wilmanns et al. 2013). Eighteen PPI were predicted to be directly interacting with Ppp1ca using a 0.6 or higher iPTM filter. This filter may be too conservative and generate false negatives, because another study employed a 0.3 filter followed by additional interrogation to screen for direct PPI (Weeratunga, Gormal et al. 2024). Forty-four percent of these predictions were confirmed by previous publications. Most of the validated direct interactions are inhibitors of the phosphatase, but one, Ppp1r9b (aka spinophilin), is known to target Ppp1ca to dendritic spines to enhance its activity toward specific substrates (Allen, Ouimet et al. 1997, Salek, Claeboe et al. 2023). This high correlation with the literature provides substantial confidence in the novel PPI predicted to be direct Ppp1ca interactors. The AF3 screen predicted that NDRG2 directly interacts with Ppp1ca. This protein is known to regulate many phosphorylation-dependent signaling pathways by directly interacting with other phosphatases including Pp1ma and PP2A (Feng, Zhou et al. 2022, Lee, Lim et al. 2022). The actin-binding protein Capza1 was also predicted to directly interact with Ppp1ca, and Ppp1ca interacts with actin and its binding proteins to maintain optimal localization for efficient activity toward specific substrates (Foley, Ward et al. 2023).
Hsp1e is a heat shock protein predicted to directly interact with Ppp1ca. Although there is no reported direct connection to Ppp1ca, other heat shock proteins have been reported to regulate Ppp1ca (Mivechi, Trainor et al. 1993, Flores-Delgado, Liu et al. 2007, Qian, Vafiadaki et al. 2011). We also observed that many of these direct PPI were altered with PCP treatment. One direct interactor, Ppp1r1b (aka DARPP-32), is phosphorylated at Thr34 by PKA in the brain upon PCP treatment. This phosphorylation event converts Ppp1r1b to a potent inhibitor of Ppp1ca (Svenningsson, Tzavara et al. 2003). Importantly, manipulation of Thr34 attenuated the behavioral effects of PCP. Consistent with this report, the Ppp1r1b-Ppp1ca interaction was only observed with PCP in our study. Further investigation is needed to determine if our novel direct interactors regulate the PCP phenotype. We conclude that AF3 can provide important structural insights into the nature of PPI obtained from large-scale IP-MS studies.
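
      The iPTM screen described above amounts to a threshold filter, which can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the interactor names and scores are hypothetical placeholders, and the helper functions are invented for this example. Only the 0.6 and 0.3 cutoffs come from the text, and the count of 8 confirmed predictions is inferred from the reported 44% of 18.

```python
# Hypothetical iPTM scores for illustration only; these are NOT values from the study.
HYPOTHETICAL_IPTM = {
    "interactor_A": 0.91,
    "interactor_B": 0.72,
    "interactor_C": 0.45,
    "interactor_D": 0.12,
}


def predicted_direct(scores, cutoff=0.6):
    """Return interactors whose iPTM score is at or above the cutoff, sorted by name."""
    return sorted(name for name, score in scores.items() if score >= cutoff)


def percent_confirmed(n_confirmed, n_predicted):
    """Percentage of predicted direct interactors with prior literature support."""
    return round(100 * n_confirmed / n_predicted)


# The 0.6 filter used in the response versus the more permissive 0.3 filter
# attributed to another study.
strict = predicted_direct(HYPOTHETICAL_IPTM, cutoff=0.6)
permissive = predicted_direct(HYPOTHETICAL_IPTM, cutoff=0.3)
```

      Lowering the cutoff from 0.6 to 0.3 admits more candidates, which is why the response notes that the 0.6 filter may generate false negatives.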

      2) Considering the protein extraction protocol, it is fair to mention that only the most soluble proteins are being considered here. I am bringing this up since the importance of membrane receptors is clear in the studied context.

      This is an interesting point. It has been predicted that transmembrane proteins constitute 25-30% of the proteome (Dobson, Remenyi et al. 2015). Thus, we would predict our dataset will have more soluble proteins than membrane proteins. Half of our target proteins were transmembrane proteins, so in designing the protocol for this study we ensured that these membrane proteins could be significantly enriched compared to the control IPs (Supplementary Figure 2C). In addition, compared to soluble proteins, membrane proteins are notoriously difficult to identify by bottom-up proteomics (Savas, Stein et al. 2011). We decided to investigate how many of our protein interactors were transmembrane proteins. Using Uniprot, 199 (20%) of our protein interactors were determined to have a transmembrane domain. Therefore, these data do not support the statement that only the most soluble proteins are being considered in our study. We added this percentage of transmembrane proteins in our network to the text of the Results section.

      3) It is not clear from the methods description if antibodies from all 8 targets were all together in one Co-IP or have been incubated separately in 8 different hippocampi samples. It seems the first, given how results have been presented. If so, this maximizes the major issue raised above (in 1).

      We apologize for not clearly describing our experimental design. All the targets were immunoprecipitated separately and analyzed separately on the mass spectrometer. With all the biological replicates and two conditions (i.e. Saline and PCP), we performed 48 individual, separate IPs. There were an additional 48 individual, separate IPs run in parallel that were the control IPs.

      We modified the schematic of our experimental design in Figure 1C to clarify that the 8 targets IPs were analyzed separately. In addition, we modified the Results to read:

      “In total, 96 (48 bait and 48 control) IPs were performed, and each was analyzed separately by LC-MS analysis.”
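
      The IP count can be checked by enumerating the design. The sketch below is illustrative only: the target labels are placeholders, and the value of three biological replicates per condition is an inference from 48 / (8 targets × 2 conditions), not a number stated in this response.

```python
from itertools import product

N_TARGETS = 8                     # stated in the response
CONDITIONS = ("Saline", "PCP")    # stated in the response
REPLICATES = 3                    # assumption: inferred from 48 / (8 * 2)
IP_TYPES = ("bait", "control")    # each bait IP has a parallel control IP


def enumerate_ips():
    """List every (target, condition, replicate, ip_type) combination."""
    targets = [f"target_{i}" for i in range(1, N_TARGETS + 1)]  # placeholder labels
    replicates = range(1, REPLICATES + 1)
    return list(product(targets, CONDITIONS, replicates, IP_TYPES))


all_ips = enumerate_ips()
bait_ips = [ip for ip in all_ips if ip[3] == "bait"]
control_ips = [ip for ip in all_ips if ip[3] == "control"]
```

      Under these assumptions the enumeration yields 48 bait and 48 control IPs, matching the 96 separate LC-MS analyses described.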

      4) Definitely, results here are not representing a "SCZ PPI network". PCP-treated animals, as any other animal model, are rather limited models to schizophrenia. As a complex multifactorial disease, synaptic deficits, which is the focus of this study, can no longer be considered "the pivot" of the disease. Synaptic dysfunction is only one among many other factors associated to schizophrenia.

      We do agree that synaptic dysfunction is only one factor associated with SCZ and we will discuss this more in our response to your next comment.

      We understand the limitations of PCP as an animal model of SCZ. It is quite difficult to model a specific human complex multifactorial neurological disease in rodents and we would contend that there is no single universal SCZ model that everyone agrees with. We addressed this by adding the following to the Introduction:

      “Since many SCZ symptoms are uniquely human, there is no single animal model that truly replicates all the complex human SCZ phenotypes (Winship, Dursun et al. 2019). In this respect, all SCZ animal models can be considered limited.”

      We respectfully disagree, however, with the objection to the term SCZ PPI network. This study is focused on SCZ by choosing proteins implicated in SCZ, quantitating how the PPI change in a SCZ model, and discussing how our findings are relevant to SCZ pathogenesis. So, it seems logical to call our dataset a SCZ PPI network. We do concede that without further experimentation we do not know if these PPI play a causal role in SCZ. Furthermore, our novel PPI may involve biological pathways that are unrelated to SCZ and have relevance to other biological conditions.

      We added the following statement to the Discussion to address this comment:

      “Even though our network was constructed in the context of SCZ, our dataset has relevance to other neurological diseases where our targets have been implicated in the pathogenesis.”

      5) Authors should look for protein interactions that might be happening also in glial cells. They are not the majority in hippocampus, but are present in the type of tissue analyzed here. Thus, some of the interactions observed might be more abundantly present in those cells. Maybe enriching using bioinformatics tools the PPI network to different cell types.

      As mentioned above, we agree that synaptic dysfunction is just one of the hypotheses of SCZ pathogenesis, and emerging evidence suggests that dysfunction in astrocytes and microglia is also a factor. Since these non-neuronal cells can regulate synapses, these hypotheses are not mutually exclusive, which suggests that at the cellular level SCZ etiology involves multiple cell types.

      We addressed your query by comparing our PPI network to an RNA-seq analysis of different cell types in the rodent brain (Zhang, Chen et al. 2014). First, we analyzed our target proteins and found that they were expressed in all cell types to varying degrees, except Syngap, which was not in the RNA-seq database. These data are now represented in Figure 3E. We then determined the RNA abundance distribution of all the protein interactors, which is represented in Figure 3D as a heatmap. From a bird’s eye view, it suggests that some PPI exist in non-neuronal cells. Next, we determined how many of our protein interactors were enriched in one cell type, which is shown in Figure 3F. We defined an enriched protein as having >50% of the RNA signal in one cell type. We identified 175 proteins that were enriched in one cell type, compared to the entire RNA-seq dataset, which had 4008 enriched proteins. In the entire RNA-seq dataset, 24% of the enriched proteins were in neurons whereas 47% of our protein interactors were enriched in neurons. This is consistent with the enrichment of synaptic proteins in our network. There was also an increased percentage of proteins enriched in astrocytes (19%) and oligodendrocytes (6%) in our network compared to the entire database (i.e. astrocytes 11% and oligodendrocytes 4%). In other cell types, such as microglia, there was less protein enrichment in our network compared to the database. We have added this cell type analysis to our manuscript and concluded that a portion of our PPI network may occur in non-neuronal cells. We also created a supplementary table of our network with its associated RNA-seq data.
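
      The enrichment rule stated above (a protein counts as enriched when more than 50% of its RNA signal falls in one cell type) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name, the input layout, and the example abundance values are all assumptions made for this example.

```python
def enriched_cell_type(rna_by_cell_type, threshold=0.5):
    """Return the cell type holding more than `threshold` of the total RNA signal.

    `rna_by_cell_type` maps a cell-type name to an RNA abundance value.
    Returns None when no single cell type exceeds the threshold or when
    there is no signal at all.
    """
    total = sum(rna_by_cell_type.values())
    if total <= 0:
        return None
    cell_type, value = max(rna_by_cell_type.items(), key=lambda item: item[1])
    return cell_type if value / total > threshold else None


# Hypothetical abundance values, for illustration only.
neuron_biased = {"neuron": 60.0, "astrocyte": 20.0, "microglia": 20.0}
evenly_split = {"neuron": 50.0, "astrocyte": 50.0}
not_detected = {"neuron": 0.0, "astrocyte": 0.0}
```

      Because the rule uses a strict inequality, a protein split exactly 50/50 between two cell types is not counted as enriched in either.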

      Text added to the Results:

      “Non-synaptic proteins represented 59% of our network suggesting that some PPI may occur in non-neuronal cells. To investigate this possibility, we annotated our network with a transcriptome rodent brain database of eight cell types (Zhang, Chen et al. 2014). All the targets were detected in all cell types but there was obvious enrichment in specific cell types for some targets (Figure 3E). Syngap1 was not in the database. We also observed a large variation of cellular distributions for the interactors (Figure 3D). Next, we sought to determine how many interactors are enriched in a particular cell type by defining cell enrichment as a protein having >50% RNA signal in one cell type. We identified 175 protein interactors enriched in one cell type, whereas the entire database had 4008 proteins enriched (Figure 3F). Consistent with our synaptic enrichment, 47% of the enriched protein interactors were in neurons whereas only 24% of the enriched proteins in the entire database were in neurons. We also observed an increase in protein interactors enriched in astrocytes compared to the database. Overall, this analysis provides evidence that our identified PPI may occur in non-neuronal cells.”

      Text added to the Discussion:

      “The exact etiology of SCZ, however, remains unclear and synaptic dysfunction is only one hypothesis (Misir and Akay 2023). There is evidence for the involvement of non-neuronal cell types, including endothelial cells, astrocytes, and microglia (Tarasov, Svistunov et al. 2019, Rodrigues-Neves, Ambrosio et al. 2022, Stanca, Rossetti et al. 2024). Although we observed an enrichment of synaptic proteins in our SCZ network, we provided evidence that a portion of our network may occur in non-neuronal cells. Since non-neuronal cells can regulate synapses (Vilalta and Brown 2018, Bauminger and Gaisler-Salomon 2022), synaptic dysfunction and perturbations in non-neuronal cells in SCZ etiology are not mutually exclusive. Our data correspond with emerging evidence that pathogenesis is multifaceted, involving dysfunction in multiple cell types.”

      Minor: 1) in the abstract, it is not clear if 90% of the PPI are novel to brain tissue in general or specifically schizophrenia.

      We apologize for the confusing sentence. 90% are novel, meaning the PPI have not been reported in any study. We changed the abstract to read:

      “Over 90% of the PPI have not been previously reported.”

      2) authors refer to LC-MS-based proteomics as "MS" all across the text. Who am I to say this to Yates et al, but I think it is rather simplified to use "Mass Spectrometry Analysis", when this is a typical LC-MS type of analysis.

      We agree with you. We have replaced MS analysis with LC-MS analysis in the manuscript.

      3) Several references used to construct the hypothesis of the paper are rather outdated: several from 10-15 years ago. It would be interesting to provide to the reader up to date references, given the rapid pace science has been progressing.

      We agree many of the references are 10-15 years old. Many of the hypotheses and biological mechanisms we discussed are supported by more studies than space allows us to cite. If we could, we would. We also agree that there are many more recent studies that have confirmed and added more details to the original discovery or hypothesis cited. We cite the first study to support our conclusions because it deserves the most credit.

      4) "UniProt rat database". Please, state the version and if reviewed or unreviewed.

      This information was added to the Methods section. UniProt reviewed rat database with isoforms 03-25-2014.

      Reviewer #2 (Significance (Required)):

      The study is informative, and has great potential to enrich the specific literature of this field. But should tone down some arguments, given the experimental limitations of the PPI network (as described above) and should state PCP-treated rats as a limited model to schizophrenia.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary

      It is now widely accepted that schizophrenia is a polygenic disorder in which a large fraction of the genetic risk is in variants affecting the expression of synaptic proteins. Moreover, it is known that these synaptic proteins are found in multiprotein complexes and that many proteins encoded by schizophrenia risk genes interact directly or indirectly in these complexes. It is also known that some drugs including phencyclidine, which binds to NMDA receptors and to Dopamine D2 receptors (not mentioned by the authors) can induce schizophreniform psychosis. The authors have set out to advance on this position by performing proteomic mass spectrometry studies on proteins identified as encoded by schizophrenia risk genes. They target 8 proteins for immunoprecipitation from rat brain and identify coisolated proteins and perform various network analyses. In the most interesting part of the paper they ask if PCP-treatment altered protein interactions and report various changes.

      Major comments:

      1. Choice of target proteins. It was not until the first paragraph of the results section that the authors first name the 8 synaptic proteins that they have chosen to study. This information should be in the abstract.

      This information was added to the abstract as requested.

      The authors then use figure 1A and 1B as evidence that these 8 "baits" are schizophrenia-relevant proteins. Figure 1A does not provide any evidence at all and Figure 1B is about as weak a line of evidence imaginable - a histogram of the number of papers that have the search term "schizophrenia" and the protein name. I tried this search for Grin2B and almost immediately found papers that reported no association between Grin2B and schizophrenia (e.g. PMID: 33237434). Figure 1B should be scrapped.

      The purpose of Figure 1A was not to demonstrate that there is evidence that our proteins are involved in SCZ. The purpose of this figure is to show that these proteins are diverse in function and structure (blue = membrane proteins; yellow = soluble proteins), and that there are published studies reporting physical and functional interactions between these 8 proteins. This suggests that a more extensive network may exist.

      We agree that Figure 1B does not specifically describe how each protein is related to SCZ but demonstrates how many papers investigating their connection to SCZ have been published. We understand how, by itself, this can be considered weak. We still think it is important to show that multiple laboratories have published papers connecting these proteins to SCZ. Instead of scrapping this figure, we have moved it to Supplementary Figure 2A.

      We read PMID: 33237434 and interpret their findings quite differently than you. This report examined whether one single nucleotide variant (SNV) in Grin2b is associated with the cognitive dysfunction in SCZ but did not examine if this variant is associated with the other major SCZ phenotypes (i.e. psychotic and emotional). Specifically, the study selected 117 “patients in whom cognitive dysfunctions are present despite effective antipsychotic treatment of other schizophrenia symptoms.” The study concluded that the Grin2B SNV was not associated with this subset of patients but also concluded that they need to search for other NMDAR variants and study their association with SCZ. We would argue that the only reason this group performed these experiments was the well-known association between Grin2b and SCZ. Many studies have found SNVs in Grin2B that are associated with SCZ, but there are conflicting reports. It is unclear if the discrepancies are connected to different cohorts, complexity of the SCZ phenotype, or small sample sizes. Regardless of whether Grin2B mutations are significantly associated with SCZ, there are several lines of evidence that Grin2B is involved in SCZ. Most importantly, Grin2b is a component of the NMDAR, which is a key player in the SCZ hypo-glutamate hypothesis and the receptor that binds PCP. By immunoprecipitating Grin2b, we are analyzing the PPI network of NMDAR, which is arguably the most studied complex in SCZ research.

      The remaining part of paragraph 1 of the results does not provide an adequate, let alone systematic, justification for the use of the 8 baits. It would be appropriate to construct a table with the 8 proteins and cite relevant papers and identify the basis for why they are implicated in schizophrenia (is it a direct mutation or some other evidence?). What makes these 8 proteins better than many others that are cited as synaptic schizophrenia relevant proteins?

      We apologize for not clearly and thoroughly describing the reasons for choosing our baits. As stated in the first paragraph of the Results, we chose the proteins that had evidence of being a SCZ risk factor in SCZ databases that included a plethora of human genomic studies. This criterion by itself results in ~5000 genes. To further narrow our candidates, we chose targets that were synaptic and were observed to have phosphorylation changes in response to PCP in an SCZ animal model. Since protein-protein interactions (PPI) are often dependent on phosphorylation, we believe this is an important criterion for quantitation of PPI in response to PCP. These requirements still resulted in a list of hundreds of proteins. So, what makes these better than any other SCZ relevant protein? As stated in the manuscript, the major limiting criterion was identifying commercial antibodies that can efficiently immunoprecipitate their target in brain tissue. Since there are many reports associating our targets with SCZ, we directed the reader to SCZ databases that compile large genomic association studies. We understand, however, the request for more specific information regarding the biological connection between these proteins and SCZ. We took your suggestion and constructed a table with our 8 targets, and it is now Figure 1A. In this table, we selected references to indicate if the target has reported changes in expression and/or activity in SCZ samples (i.e. human and animal model) or genetic association with SCZ in human studies.

      The methods of protein extraction are particularly concerning. The postsynaptic density of excitatory synapses (which contains several of the target proteins in this study) has been notoriously difficult to solubilise unless one uses high pH (9) and harsh detergent extraction (1% deoxycholate). The authors use pH 7 and weak detergent conditions, which are likely to be inefficient for solubilising at least several of the target proteins. Nowhere do the authors report how much of the total of their target protein is being solubilised. Indeed, there are no figures showing biochemical conditions at all. What if only a small percentage of the target protein is being immunoprecipitated - what does this mean for the interaction data? How do we know if the fraction being immunoprecipitated is from the synapse? (why did they not use synaptosomes).

      The absence of this kind of data undermines the reader's confidence in the findings.

      We apologize for not clearly explaining our experimental design. We were not interested in identifying the PPI of the PSD. All these proteins have been localized to the synapse, but they are also localized to other neuronal compartments and non-neuronal cell types. Synaptic dysfunction is one hypothesis of SCZ pathogenesis, but there is evidence for the involvement of other cell types, including astrocytes, microglia, and oligodendrocytes (Kerns, Vong et al. 2010, Ma, Abazyan et al. 2013, Goudriaan, de Leeuw et al. 2014, Park, Noh et al. 2020). For these reasons, we chose an unbiased approach to identifying PPI.

      The Results have been amended to read: “All the targets are localized to the synapse, but also localized to non-synaptic compartments and expressed in non-neuronal cells. Thus, since there is also evidence for non-synaptic perturbations contributing to SCZ pathogenesis, we chose to perform an unbiased analysis in unfractionated brain tissue (Tarasov, Svistunov et al. 2019, Rodrigues-Neves, Ambrosio et al. 2022, Stanca, Rossetti et al. 2024).”

      Why did we choose a specific solubilization strategy? Harsh detergents can disrupt PPI and prevent efficient enrichment of the target by disrupting the target-antibody interaction (Pankow, Bamberger et al. 2015). To identify protein interactions, mild detergent conditions are typically employed in PPI studies. We used a combination of “weak” detergents (i.e. 0.5% NP-40, 0.5% Triton, and 0.01% Deoxycholate) to help prevent non-specific PPI while still allowing efficient enrichment of the target proteins. We agree that under our conditions the targets were not completely solubilized. It is a balancing act to find the correct conditions for IP-MS analysis. Since we were unable to immunoprecipitate all the target protein, we did not identify all the PPI for each target, and we did not make this claim. Importantly, we did identify known interactions for all our targets. Our mild detergent protocol is similar to those of other PPI studies, and our results validate results reported in previous studies. It is more important to significantly enrich the target protein over control than to achieve complete solubilization (Supplementary Figure 2D). This allows us to use control IPs to successfully employ the SAINT algorithm to determine which proteins are confident PPI using a 5% FDR.

      How do we know proteins are being immunoprecipitated from the synapse? As we show in Figures 2B and 3A, multiple proteins are annotated to the synapse by different databases, Gene Ontology and SynGO. Well-known synaptic PPI were also observed, such as Grin2B-Dlg4 (i.e. PSD-95), providing further evidence that proteins are being immunoprecipitated from the synapse. Besides validating over a hundred published PPI, we also identified many reciprocal interactions between the target datasets, demonstrating the reproducibility of our protocol. Thus, we respectfully disagree with you and assert that our PPI network is of high confidence.

      The immunoprecipitation protocol is unusual in that the homogenates were incubated overnight (twice), which is a very long period compared to most published protocols. This is a concern because spurious protein interactions could form during this long incubation.

      There are many different immunoprecipitation protocols in the literature. The IP conditions depend upon the target protein and the antibody employed. Specifically, the abundance of the target and the affinity of the antibody to the target will dictate the IP conditions. We routinely perform overnight incubation for our IP-MS studies (Pankow, Bamberger et al. 2016, McClatchy, Yu et al. 2018). In our experience with brain tissue, this results in the highest enrichment of the target protein and the best reproducibility between biological replicates compared to IP protocols with shorter incubation times. Many other laboratories use overnight incubations (Lin and Lai 2017, Iqbal, Akins et al. 2018, Lagundzin, Krieger et al. 2022), so we do not consider our protocol unusual. We do find that IPs with tagged proteins in cell culture are more amenable to short incubation times. We have no evidence that overnight incubation causes spurious protein interactions, nor could we find any in the literature. Non-specific interactions are a concern with IP-MS experiments regardless of the incubation time. We took multiple steps to reduce the effect of non-specific PPI on our dataset. In the first overnight incubation, the brain lysate was incubated with agarose beads linked to IgGs to preclear the lysate of “sticky” non-specific interactors that bind to IgGs and the beads. In addition, control IPs with IgG crosslinked to beads were incubated with brain lysate in parallel to each target IP. We computationally compared the non-specific control IPs with the target IPs using the SAINT algorithm to generate a confident list of PPI with a stringent 5% FDR. Therefore, our pipeline is specifically designed to prevent spurious PPI.
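      The final filtering step described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes the scoring step has already been run and reduced to (target, prey, FDR) rows. The row layout, the function name `confident_ppi`, and all example values (including the placeholder `ProteinX`) are hypothetical.

```python
from typing import NamedTuple


class ScoredInteraction(NamedTuple):
    target: str  # immunoprecipitated target protein
    prey: str    # co-purified protein
    fdr: float   # false discovery rate assigned by the scoring step


def confident_ppi(rows, max_fdr=0.05):
    """Keep interactions whose FDR is at or below the cutoff (5% by default)."""
    return [r for r in rows if r.fdr <= max_fdr]


# Hypothetical example values, for illustration only.
rows = [
    ScoredInteraction("Grin2b", "Dlg4", 0.00),
    ScoredInteraction("Grin2b", "ProteinX", 0.32),
    ScoredInteraction("Ppp1ca", "Ppp1r9b", 0.01),
]
kept = confident_ppi(rows)
```

      Keeping the cutoff as a parameter makes it easy to check how sensitive the size of the network is to the chosen FDR.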

      In the section "Biological interpretation of scz PPI network", the authors surprisingly found that synaptic proteins that are exclusively postsynaptic (Grin2B, SynGAP) or exclusively presynaptic (Syt1) have very high percentages of interacting proteins from the synaptic compartments where the target protein is not expressed. The authors offer no explanation for this paradox. One explanation for this could be that spurious PPIs have formed in the protein extraction/immunoprecipitation protocol. These findings need validation by biochemical fractionation of synapses into pre and post synaptic fractions and immunohistochemistry to demonstrate the subsynaptic localisation of the proteins.

      Grin2b is traditionally described as exclusively post-synaptic, but there is evidence for other localizations, including presynaptic (Berretta and Jones 1996, Sjostrom, Turrigiano et al. 2003, Bouvier, Larsen et al. 2018) and expression in astrocytes (Serrano, Robitaille et al. 2008, Lee, Ting et al. 2010, Lalo, Koh et al. 2021, Kim, Choi et al. 2024). Syngap has been localized to non-synaptic sites and is expressed in glia, in addition to its heavily studied role at the postsynapse (Moon, Sakagami et al. 2008, Araki, Zeng et al. 2015, Birtele, Del Dosso et al. 2023). Syt1 is commonly used as a presynaptic marker, but along with other proteins previously reported to be exclusively presynaptic (such as SNAP-25), it has been localized to the postsynapse (Selak, Paternain et al. 2009, Tomasoni, Repetto et al. 2013, Hussain, Egbenya et al. 2017, Madrigal, Portales et al. 2019, Sumi and Harada 2023). Similarly, the SynGO database assigns both post-synaptic and pre-synaptic localizations to Grin2b, as stated in the manuscript. Thus, our data are not paradoxical, but support the emerging evidence against the canonical exclusivity of the pre- and post-synaptic compartments. Determining subsynaptic localization of a protein is a huge undertaking and requires expertise we do not possess. This is why we relied on synaptic databases and the literature for our interpretation of our data, as other publications have done.

      We added the following to the Discussion to address this issue:

      “Using the SynGo database, 418 proteins (i.e. 41% of our network) were identified as synaptic proteins consistent with the targets having a synaptic localization. Defining the synaptic proteome is inherently difficult because the synapse is an “open organelle”, and many synaptic proteins also have non-synaptic localizations and are expressed in non-neuronal cells. We further attempted to define our synaptic PPI by differentiating between pre- and post-synaptic compartments via SynGo. Half of our targets were annotated to both compartments and all targets had PPI that were annotated to both. This data supports the emerging evidence against the canonical localization exclusivity of the pre- and post-synapse (Bouvier, Larsen et al. 2018, Madrigal, Portales et al. 2019).”

      My concerns about spurious interactions are raised again because the authors say that 92% of their interactions are novel (I note that the authors have not compared their interaction data of the NMDA receptor with published datasets from Dr Seth Grant's laboratory). BioGrid itself is good but not enough for comparison; at this point it may be worth taking String, which accumulates several sources of PPIs, and selecting just the direct PPIs.

      Since the IP-MS experiments in our study have never been performed before, we are not surprised by the extent of novel data we produced. As described above, we took many steps to prevent spurious PPI from entering our final dataset, including the use of detergents, preclearing and stringent bioinformatic filtering. Our entire dataset is very large, so the 8% of PPI that we replicated from other studies represents 124 interactions. We believe this to be an impressive number that supports the confidence of our data. Providing more confidence, we identified many reciprocal PPI where shared protein interactors between target proteins were identified in both target protein datasets.

      The PPI described for our targets in BioGrid encompassed 713 publications. Two of the BioGrid datasets that were compared to our Grin2b PPI data were from the laboratory of Seth Grant. Arbuckle et al (2010) is a low-throughput paper that describes a Grin2b and DLG4 PPI (that we also identified) and Husi et al (2000) is a seminal paper using high-throughput LC-MS to identify PPI in the PSD of mouse brain. There were many differences between Husi et al and our pipeline. Husi et al employed the C-terminal Grin2b peptide to pull down interactors from the PSD fraction whereas we employed a Grin2b antibody to enrich Grin2b and its interactors from unfractionated brain tissue. Despite these differences, our studies found 8 proteins in common.
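      The kind of comparison described here, in which observed interactions are split into previously reported and novel ones, can be sketched as a set intersection. This is an illustrative sketch only: the function name `novelty_summary`, the treatment of interactions as unordered pairs, and the example pairs (including the placeholders `ProteinX` and `ProteinY`) are assumptions and do not reproduce the authors' BioGrid analysis.

```python
def novelty_summary(observed, reported):
    """Split observed interactions into previously reported and novel ones."""
    # Treat each interaction as an unordered pair so that A-B equals B-A.
    obs = {frozenset(pair) for pair in observed}
    rep = {frozenset(pair) for pair in reported}
    replicated = obs & rep
    return {
        "observed": len(obs),
        "replicated": len(replicated),
        "novel": len(obs - rep),
        "fraction_replicated": len(replicated) / len(obs) if obs else 0.0,
    }


# Placeholder pairs for illustration; not taken from the dataset.
observed = [
    ("Grin2b", "Dlg4"),
    ("Grin2b", "ProteinX"),
    ("Ppp1ca", "Ppp1r9b"),
    ("Ppp1ca", "ProteinY"),
]
reported = [("Dlg4", "Grin2b"), ("Ppp1ca", "Ppp1r9b"), ("Syt1", "Snap25")]
summary = novelty_summary(observed, reported)
```

      Reporting the absolute count alongside the fraction matters because a small percentage of a large network can still be a large number of replicated interactions.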
      

      We took your suggestion and compared our data to String, which includes direct PPI and functional PPI. Our input was the high confidence PPI identified by SAINT with 5% FDR as with the BioGrid comparison. The PPI network for each target protein had a more significant enrichment (p

      We think the problem you suggest with SynGO is more of an inherent problem with characterizing the synaptic proteome. The synaptic proteome is difficult to define since it is an “open organelle” with proteins transporting in and out. In addition, most synaptic proteins, such as mitochondrial and translational proteins, also have non-synaptic localizations. It is not possible to isolate a contaminant-free “pure” synaptic preparation by biochemical fractionation. Recently, SynGO was used in a meta-analysis of previously published PSD datasets (Kaizuka, Hirouchi et al. 2024). Kaizuka et al. found 123 proteins identified in 20 PSD datasets. SynGO annotated proteins with post-synaptic localization from this list. To a lesser extent they also identified presynaptic localizations, but it is unclear if the presynaptic proteins are novel localizations. Kaizuka et al. continued the investigation and identified a novel PSD protein, thus demonstrating that our knowledge of pre- and post-synaptic proteomes is incomplete.

      Minor comments

      1. A number of papers have reported protein interactions of native NMDA receptor complexes and their associated proteins isolated from rodent brain, and these are not referenced in this paper. It would be relevant to compare these published datasets with the Grin2B IP datasets.

      We employed BioGrid as a reference of reported PPI for each of our target proteins. For Grin2B, the PPI came from 142 different publications. For eight target proteins, we decided BioGrid was the best resource for determining the novelty of our PPI because it is routinely used for large-scale unbiased PPI analysis. To determine the novelty of our network, we compared our PPI network to 713 publications via BioGrid. We are unsure whether the papers you are referring to are included in the BioGrid database. To make it easier for readers with similar queries, we added an additional supplementary table (Table S4) including all the publications (i.e. PMID numbers) included in the BioGrid comparison for each target protein.

      We amended the Results with the following sentence, so that readers can appreciate the extent of the BioGrid comparison analysis:

      “There were 713 publications in BioGrid that describe at least one interaction with one of our targets (Supplementary Table 4).”

      The use of the term "bait" in purification experiments typically refers to a protein and not an antibody. I suggest removing the word bait to avoid ambiguity and simply use the word target.

      We took your suggestion and used “target” instead of “bait” to avoid ambiguity.

      26 mins of treatment gives completely different set of PPIs between PCP and saline which is very interesting, so both networks should be included in Supplementary. Also, it would be useful to have a list of modulated (phosphorylated in their case, but also ubiquitinated etc) proteins, which is not presented.

      Table S1 lists the PPI for each target, and we designated whether the interactors were for Sal, PCP, or both. Phosphorylated and ubiquitinated proteins are very hard to reproducibly identify without an additional enrichment step. Since we did not perform this enrichment step, we did not search for these modifications and do not have any modified proteins to report.

      As they say their final network is composed of "direct physical" and "co-complex" interactors and they cannot distinguish between them. This is particularly bad for the postsynapse, where all the PSD components can be co-IP-ed in different combinations. It can explain the Figure 5C, where most of the proteins have FDR = 1, which means they do not reproduce.

      Figure 5C represents the intersection of 15N quantification and SAINT analysis. The x-axis is the FDR reported for SAINT analysis, and the y-axis is the significant proteins from the N15 analysis. This figure demonstrates that some proteins that were significantly different with PCP via N15 quantification also were annotated as PPI by SAINT (i.e. 5%. As stated in the Discussion, we concluded that the SAINT analysis and N15 quantitation are complementary in identifying PPI and that the quantification of a biological perturbation may aid the identification of PPI. Figure 5C is not related to whether our PPI are direct physical or "co-complex" interactors. Distinguishing between direct physical and co-complex interactors is an inherent problem for all IP studies. Since another reviewer also highlighted this deficit in our manuscript, we decided to analyze our PPI dataset with the artificial intelligence algorithm AlphaFold 3 (AF3). The AF3 data is encompassed in Figure 6.

      The following AF3 data was added to the Results Section:

      “A disadvantage of IP-MS studies is that they cannot distinguish between a PPI that binds directly to the target protein and a PPI in which the interactor and target protein reside in the same multiprotein complex (i.e. indirect). We sought to predict which PPI may be directly interacting with its target protein by using the artificial intelligence algorithm AlphaFold 3 (AF3) (Abramson, Adler et al. 2024). First, we analyzed the predicted AF3 structure of the targets using the pTM score, and determined the fraction of each structure that was calculated to be disordered (Figure 6A and Supplementary Table 7). Our reasoning was that if our targets have poorly resolved structures, then it will be difficult to screen for direct PPI. A pTM score >0.5 suggests that the structure may be correct, with the highest confidence equaling 1. Undefined or disordered regions hinder the accuracy of the prediction, and all our targets possessed a pTM score > 0.5 except Syt1. The fraction disordered negatively correlated with the pTM score, as expected. Gsk3b, Ppp1ca, and Map2k1 were the target proteins with the highest pTM scores and were also the smallest of our targets (Figure 6B). Ppp1ca had the most confident structure (i.e. pTM 0.9) and the smallest fraction disordered (i.e. 0.07). Next, we determined the AF3 prediction of previously reported direct interactions of the targets. We used the iPTM score to determine an interaction confidence. An iPTM score >0.8 is a highly confident direct interaction, whereas 0.8. These eight PPI have all previously been reported to form a direct interaction with Ppp1ca, except Phactr3 (Zhang, Zhang et al. 1998, Terrak, Kerff et al. 2004, Hurley, Yang et al. 2007, Marsh, Dancheck et al. 2010, Ragusa, Dancheck et al. 2010, Ferrar, Chamousset et al. 2012, Choy, Srivastava et al. 2024, Xu, Sadleir et al. 2024). Phactr3 is structurally similar to, but less studied than, the reported direct interactor, Phactr1. 
These interactors are all inhibitors of PP1 except for Ppp1r9b, which targets Ppp1ca to specific subcellular compartments. Nine PPI were assigned a score

      The following AF3 interpretation was added to the Discussion:

      “Our SCZ PPI network consists of two types of PPI: direct physical interactions and “co-complex” or indirect interactions. Typically, the nature of the interaction cannot be distinguished in IP-MS studies. We decided to employ the new AF3 algorithm to screen the PPI of Ppp1ca to provide evidence for direct interactors. We chose to examine the PPI assigned to Ppp1ca, because its structure was the most confident among our target proteins and AF3 correctly predicted a known direct interactor with high confidence. Ppp1ca is a catalytic subunit of the phosphatase PP1, which is required to associate with regulatory subunits to create holoenzymes (Li, Wilmanns et al. 2013). Eighteen PPI were predicted to be directly interacting with Ppp1ca using a 0.6 or higher iPTM filter. This filter may be too conservative and may generate false negatives, because another study employed a 0.3 filter followed by additional interrogation to screen for direct PPI (Weeratunga, Gormal et al. 2024). Forty-four percent of these predictions were confirmed by previous publications. Most of these validated direct interactions are inhibitors of the phosphatase, but one, Ppp1r9b (aka spinophilin), is known to target Ppp1ca to dendritic spines (Allen, Ouimet et al. 1997, Salek, Claeboe et al. 2023). This high correlation with the literature provides substantial confidence in the novel PPI predicted to be direct Ppp1ca interactors. The AF3 screen predicted that NDRG2 directly interacts with Ppp1ca. This protein is known to regulate many phosphorylation-dependent signaling pathways by directly interacting with other phosphatases including Ppm1a and PP2A (Feng, Zhou et al. 2022, Lee, Lim et al. 2022). The actin binding protein Capza1 was also predicted to directly interact with Ppp1ca, and Ppp1ca interacts with actin and its binding proteins to maintain optimal localization for efficient activity toward specific substrates (Foley, Ward et al. 2023). 
Hsp1e is a heat shock protein predicted to directly interact with Ppp1ca. Although there is no direct connection to Ppp1ca, other heat shock proteins have been reported to regulate Ppp1ca (Mivechi, Trainor et al. 1993, Flores-Delgado, Liu et al. 2007, Qian, Vafiadaki et al. 2011). We also observed that many of the direct PPI were altered with PCP treatment. One direct interactor, Ppp1r1b (aka DARPP-32), is phosphorylated at Thr34 by PKA in the brain upon PCP treatment. This phosphorylation event converts Ppp1r1b to a potent inhibitor of Ppp1ca (Svenningsson, Tzavara et al. 2003). Importantly, the manipulation of Thr34 attenuated the behavioral effects of PCP. Consistent with this report, the Ppp1r1b-Ppp1ca interaction was only observed with PCP in our study. Further investigation is needed to determine if our novel direct interactors regulate the PCP phenotype. We conclude that AF3 can provide important structural insights into the nature of PPI obtained from large scale IP-MS studies.”
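      The effect of the iPTM cutoff on the list of candidate direct interactors can be sketched as below. This is a hedged illustration, not the authors' analysis: it assumes one iPTM score per interactor has already been extracted from the prediction output, and the function name `predicted_direct`, the interactor names, and the scores are all hypothetical. Only the 0.6 and 0.3 cutoffs come from the text above.

```python
def predicted_direct(iptm_by_interactor, min_iptm=0.6):
    """Return interactors whose iPTM score meets the cutoff, best first."""
    hits = [
        (name, score)
        for name, score in iptm_by_interactor.items()
        if score >= min_iptm
    ]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hits]


# Hypothetical scores for illustration only.
scores = {
    "InteractorA": 0.85,
    "InteractorB": 0.62,
    "InteractorC": 0.41,
    "InteractorD": 0.12,
}
strict = predicted_direct(scores)                 # 0.6 cutoff from the text
lenient = predicted_direct(scores, min_iptm=0.3)  # cutoff used by another study
```

      Comparing the two lists shows which candidates would be lost as possible false negatives under the more conservative filter.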

      The way PPI data is reported can be improved so that it does not have to be extracted from Table 1 and 2. It would be good if they provide just two columns PPI list, with names or IDs, plus PCP/saline/both conditions in third column, for ease of comparison with other sources and building the graph. They can add it as another spreadsheet to Table 2.

      We generated this table (Table S2) as you requested.
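      A three-column list of the kind requested can be derived from per-condition interactor sets, as in the sketch below. It is an assumption-laden illustration: the function names `ppi_rows` and `to_tsv`, the column headers, and the example sets (including the placeholder `ProteinX`) are hypothetical and do not describe the layout of the authors' Table S2.

```python
import csv
import io


def ppi_rows(target, sal, pcp):
    """Yield (target, interactor, condition) rows for one target."""
    for interactor in sorted(sal | pcp):
        if interactor in sal and interactor in pcp:
            condition = "both"
        elif interactor in sal:
            condition = "Sal"
        else:
            condition = "PCP"
        yield (target, interactor, condition)


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["target", "interactor", "condition"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical interactor sets for illustration only.
rows = list(
    ppi_rows(
        "Ppp1ca",
        sal={"Ppp1r9b", "ProteinX"},
        pcp={"Ppp1r9b", "Ppp1r1b"},
    )
)
table = to_tsv(rows)
```

      A long format like this loads directly as an edge list in most graph tools, with the condition column available as an edge attribute.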

      Is Figure 2 built for Sal or PCP conditions? As they have only 23% interactions in common (Figure 4A), Figure 2 should be pretty different for the two conditions. Are the 1007 interactors combined from SAL and PCP?

      Figure 2 contains ALL the unique PPI for each target regardless of Sal or PCP conditions. For the 1007 protein interactors shown in Figure 2A, Sal and PCP were combined to generate a non-redundant list of proteins for each target.

      We amended the Results to make this clearer:

      “When the PCP and SAL datasets were combined, there were 1007 unique proteins.”

      This sentence was added to Figure 2A:

      “For this comparison, Sal and PCP PPI were combined into a unique PPI list for each target.”

      Figure 1F is mentioned but no figure is shown.

      We apologize for this oversight, and we have corrected the manuscript.

      8. Overall the paper could be edited and made more concise, especially the introduction and discussion.

      We extensively edited the manuscript to be more concise.

      Reviewer #3 (Significance (Required)):

      General assessment

      Proteomic mass spectrometry of immunoprecipitated complexes from synapses has been extensively studied since the first study of NMDA receptor and AMPA receptor complexes by Husi et al (2000). Since then, a wide variety of methods have been employed to purify synaptic protein complexes including peptide affinity, tandem-affinity purification of endogenous proteins tagged with FLAG and Histidine-affinity tags amongst other methods. Purification of protein complexes and the postsynaptic density from the postsynaptic terminal of mammalian excitatory synapses has been crucial for establishing that schizophrenia is a polygenic disorder affecting synapses (e.g. Fernandez et al, 2009; Kirov et al, 2012; Purcell et al, 2014, Fromer et al, 2014 etc). Network analyses of the postsynaptic proteome have described networks of schizophrenia interacting proteins (e.g. Pocklington et al, 2006; Fernandez et al, 2009) and other neuropsychiatric disorders.

      Hundreds of synaptic protein complexes have been identified (Frank et al, 2016), but very few have been characterised using proteomic mass spectrometry. This paper has chosen 8 protein targets for such analysis and identified many proteins that are putative interactors of the target protein. At this level the current manuscript does not represent a conceptual advance and the value of the data lies in its utility as a resource that may be used in future studies.

      The findings from the 8 target proteins from normal adult rat brain were used for a secondary study that describes the effects that PCP has on the interaction networks. Interestingly, this work shows that 26 minutes of drug treatment leads to considerable changes in the interactomes of the target proteins. These descriptive data could be used in future studies to understand the cell biological mechanisms that mediate these rapid changes in the proteome. PCP and drugs that interact with NMDA receptors are known to induce changes in synaptic proteome phosphorylation including modifications in protein-protein interaction sites, which may explain the PCP effects.

      The study would benefit from validation of experimental protocols for solubilisation and immunoprecipitation and validation of described interactions using orthogonal biochemical or localisation experiments.

      Audience

      Specialists in synapse proteins and mechanisms of schizophrenia.

      Expertise

      The reviewers' expertise is in molecular biology of synapses including synapse proteomics, protein interaction and network analysis, and genetics of schizophrenia and other brain disorders.

      Abramson, J., J. Adler, J. Dunger, R. Evans, T. Green, A. Pritzel, O. Ronneberger, L. Willmore, A. J. Ballard, J. Bambrick, S. W. Bodenstein, D. A. Evans, C. C. Hung, M. O'Neill, D. Reiman, K. Tunyasuvunakool, Z. Wu, A. Zemgulyte, E. Arvaniti, C. Beattie, O. Bertolli, A. Bridgland, A. Cherepanov, M. Congreve, A. I. Cowen-Rivers, A. Cowie, M. Figurnov, F. B. Fuchs, H. Gladman, R. Jain, Y. A. Khan, C. M. R. Low, K. Perlin, A. Potapenko, P. Savy, S. Singh, A. Stecula, A. Thillaisundaram, C. Tong, S. Yakneen, E. D. Zhong, M. Zielinski, A. Zidek, V. Bapst, P. Kohli, M. Jaderberg, D. Hassabis and J. M. Jumper (2024). "Accurate structure prediction of biomolecular interactions with AlphaFold 3." Nature 630(8016): 493-500.

      Allen, P. B., C. C. Ouimet and P. Greengard (1997). "Spinophilin, a novel protein phosphatase 1 binding protein localized to dendritic spines." Proc Natl Acad Sci U S A 94(18): 9956-9961.

      Anschuetz, A., K. Schwab, C. R. Harrington, C. M. Wischik and G. Riedel (2024). "A Meta-Analysis on Presynaptic Changes in Alzheimer's Disease." J Alzheimers Dis 97(1): 145-162.

      Araki, Y., M. Zeng, M. Zhang and R. L. Huganir (2015). "Rapid dispersion of SynGAP from synaptic spines triggers AMPA receptor insertion and spine enlargement during LTP." Neuron 85(1): 173-189.

      Bauminger, H. and I. Gaisler-Salomon (2022). "Beyond NMDA Receptors: Homeostasis at the Glutamate Tripartite Synapse and Its Contributions to Cognitive Dysfunction in Schizophrenia." Int J Mol Sci 23(15).

      Berretta, N. and R. S. Jones (1996). "Tonic facilitation of glutamate release by presynaptic N-methyl-D-aspartate autoreceptors in the entorhinal cortex." Neuroscience 75(2): 339-344.

      Birtele, M., A. Del Dosso, T. Xu, T. Nguyen, B. Wilkinson, N. Hosseini, S. Nguyen, J. P. Urenda, G. Knight, C. Rojas, I. Flores, A. Atamian, R. Moore, R. Sharma, P. Pirrotte, R. S. Ashton, E. J. Huang, G. Rumbaugh, M. P. Coba and G. Quadrato (2023). "Non-synaptic function of the autism spectrum disorder-associated gene SYNGAP1 in cortical neurogenesis." Nat Neurosci 26(12): 2090-2103.

      Bouvier, G., R. S. Larsen, A. Rodriguez-Moreno, O. Paulsen and P. J. Sjostrom (2018). "Towards resolving the presynaptic NMDA receptor debate." Curr Opin Neurobiol 51: 1-7.

      Choy, M. S., G. Srivastava, L. C. Robinson, K. Tatchell, R. Page and W. Peti (2024). "The SDS22:PP1:I3 complex: SDS22 binding to PP1 loosens the active site metal to prime metal exchange." J Biol Chem 300(1): 105515.

      Dobson, L., I. Remenyi and G. E. Tusnady (2015). "The human transmembrane proteome." Biol Direct 10: 31.

      Feng, D., J. Zhou, H. Liu, X. Wu, F. Li, J. Zhao, Y. Zhang, L. Wang, M. Chao, Q. Wang, H. Qin, S. Ge, Q. Liu, J. Zhang and Y. Qu (2022). "Astrocytic NDRG2-PPM1A interaction exacerbates blood-brain barrier disruption after subarachnoid hemorrhage." Sci Adv 8(39): eabq2423.

      Ferrar, T., D. Chamousset, V. De Wever, M. Nimick, J. Andersen, L. Trinkle-Mulcahy and G. B. Moorhead (2012). "Taperin (c9orf75), a mutated gene in nonsyndromic deafness, encodes a vertebrate specific, nuclear localized protein phosphatase one alpha (PP1alpha) docking protein." Biol Open 1(2): 128-139.

      Flores-Delgado, G., C. W. Liu, R. Sposto and N. Berndt (2007). "A limited screen for protein interactions reveals new roles for protein phosphatase 1 in cell cycle control and apoptosis." J Proteome Res 6(3): 1165-1175.

      Foley, K., N. Ward, H. Hou, A. Mayer, C. McKee and H. Xia (2023). "Regulation of PP1 interaction with I-2, neurabin, and F-actin." Mol Cell Neurosci 124: 103796.

      Goudriaan, A., C. de Leeuw, S. Ripke, C. M. Hultman, P. Sklar, P. F. Sullivan, A. B. Smit, D. Posthuma and M. H. Verheijen (2014). "Specific glial functions contribute to schizophrenia susceptibility." Schizophr Bull 40(4): 925-935.

      Hemmings, H. C., Jr., P. Greengard, H. Y. Tung and P. Cohen (1984). "DARPP-32, a dopamine-regulated neuronal phosphoprotein, is a potent inhibitor of protein phosphatase-1." Nature 310(5977): 503-505.

      Hurley, T. D., J. Yang, L. Zhang, K. D. Goodwin, Q. Zou, M. Cortese, A. K. Dunker and A. A. DePaoli-Roach (2007). "Structural basis for regulation of protein phosphatase 1 by inhibitor-2." J Biol Chem 282(39): 28874-28883.

      Hussain, S., D. L. Egbenya, Y. C. Lai, Z. J. Dosa, J. B. Sorensen, A. E. Anderson and S. Davanger (2017). "The calcium sensor synaptotagmin 1 is expressed and regulated in hippocampal postsynaptic spines." Hippocampus 27(11): 1168-1177.

      Iqbal, H., D. R. Akins and M. R. Kenedy (2018). "Co-immunoprecipitation for Identifying Protein-Protein Interactions in Borrelia burgdorferi." Methods Mol Biol 1690: 47-55.

      Kaizuka, T., T. Hirouchi, T. Saneyoshi, T. Shirafuji, M. O. Collins, S. G. N. Grant, Y. Hayashi and T. Takumi (2024). "FAM81A is a postsynaptic protein that regulates the condensation of postsynaptic proteins via liquid-liquid phase separation." PLoS Biol 22(3): e3002006.

      Kaizuka, T., T. Suzuki, N. Kishi, K. Tamada, M. W. Kilimann, T. Ueyama, M. Watanabe, T. Shimogori, H. Okano, N. Dohmae and T. Takumi (2024). "Remodeling of the postsynaptic proteome in male mice and marmosets during synapse development." Nat Commun 15(1): 2496.

      Kerns, D., G. S. Vong, K. Barley, S. Dracheva, P. Katsel, P. Casaccia, V. Haroutunian and W. Byne (2010). "Gene expression abnormalities and oligodendrocyte deficits in the internal capsule in schizophrenia." Schizophr Res 120(1-3): 150-158.

      Kim, H., S. Choi, E. Lee, W. Koh and C. J. Lee (2024). "Tonic NMDAR Currents in the Brain: Regulation and Cognitive Functions." Biol Psychiatry.

      Koopmans, F., P. van Nierop, M. Andres-Alonso, A. Byrnes, T. Cijsouw, M. P. Coba, L. N. Cornelisse, R. J. Farrell, H. L. Goldschmidt, D. P. Howrigan, N. K. Hussain, C. Imig, A. P. H. de Jong, H. Jung, M. Kohansalnodehi, B. Kramarz, N. Lipstein, R. C. Lovering, H. MacGillavry, V. Mariano, H. Mi, M. Ninov, D. Osumi-Sutherland, R. Pielot, K. H. Smalla, H. Tang, K. Tashman, R. F. G. Toonen, C. Verpelli, R. Reig-Viader, K. Watanabe, J. van Weering, T. Achsel, G. Ashrafi, N. Asi, T. C. Brown, P. De Camilli, M. Feuermann, R. E. Foulger, P. Gaudet, A. Joglekar, A. Kanellopoulos, R. Malenka, R. A. Nicoll, C. Pulido, J. de Juan-Sanz, M. Sheng, T. C. Sudhof, H. U. Tilgner, C. Bagni, A. Bayes, T. Biederer, N. Brose, J. J. E. Chua, D. C. Dieterich, E. D. Gundelfinger, C. Hoogenraad, R. L. Huganir, R. Jahn, P. S. Kaeser, E. Kim, M. R. Kreutz, P. S. McPherson, B. M. Neale, V. O'Connor, D. Posthuma, T. A. Ryan, C. Sala, G. Feng, S. E. Hyman, P. D. Thomas, A. B. Smit and M. Verhage (2019). "SynGO: An Evidence-Based, Expert-Curated Knowledge Base for the Synapse." Neuron 103(2): 217-234 e214.

      Krishnankutty, A., T. Kimura, T. Saito, K. Aoyagi, A. Asada, S. I. Takahashi, K. Ando, M. Ohara-Imaizumi, K. Ishiguro and S. I. Hisanaga (2017). "In vivo regulation of glycogen synthase kinase 3beta activity in neurons and brains." Sci Rep 7(1): 8602.

      Lagundzin, D., K. L. Krieger, H. C. Law and N. T. Woods (2022). "An optimized co-immunoprecipitation protocol for the analysis of endogenous protein-protein interactions in cell lines using mass spectrometry." STAR Protoc 3(1): 101234.

      Lalo, U., W. Koh, C. J. Lee and Y. Pankratov (2021). "The tripartite glutamatergic synapse." Neuropharmacology 199: 108758.

      Lee, B. H., F. Schwager, P. Meraldi and M. Gotta (2018). "p37/UBXN2B regulates spindle orientation by limiting cortical NuMA recruitment via PP1/Repo-Man." J Cell Biol 217(2): 483-493.

      Lee, K. W., S. Lim and K. D. Kim (2022). "The Function of N-Myc Downstream-Regulated Gene 2 (NDRG2) as a Negative Regulator in Tumor Cell Metastasis." Int J Mol Sci 23(16).

      Lee, M. C., K. K. Ting, S. Adams, B. J. Brew, R. Chung and G. J. Guillemin (2010). "Characterisation of the expression of NMDA receptors in human astrocytes." PLoS One 5(11): e14123.

      Li, X., M. Wilmanns, J. Thornton and M. Kohn (2013). "Elucidating human phosphatase-substrate networks." Sci Signal 6(275): rs10.

      Lin, J. S. and E. M. Lai (2017). "Protein-Protein Interactions: Co-Immunoprecipitation." Methods Mol Biol 1615: 211-219.

      Ma, T. M., S. Abazyan, B. Abazyan, J. Nomura, C. Yang, S. Seshadri, A. Sawa, S. H. Snyder and M. V. Pletnikov (2013). "Pathogenic disruption of DISC1-serine racemase binding elicits schizophrenia-like behavior via D-serine depletion." Mol Psychiatry 18(5): 557-567.

      Madrigal, M. P., A. Portales, M. P. SanJuan and S. Jurado (2019). "Postsynaptic SNARE Proteins: Role in Synaptic Transmission and Plasticity." Neuroscience 420: 12-21.

      Marsh, J. A., B. Dancheck, M. J. Ragusa, M. Allaire, J. D. Forman-Kay and W. Peti (2010). "Structural diversity in free and bound states of intrinsically disordered protein phosphatase 1 regulators." Structure 18(9): 1094-1103.

      McClatchy, D. B., N. K. Yu, S. Martinez-Bartolome, R. Patel, A. R. Pelletier, M. Lavalle-Adam, S. B. Powell, M. Roberto and J. R. Yates (2018). "Structural Analysis of Hippocampal Kinase Signal Transduction." ACS Chem Neurosci 9(12): 3072-3085.

      Misir, E. and G. G. Akay (2023). "Synaptic dysfunction in schizophrenia." Synapse 77(5): e22276.

      Mivechi, N. F., L. D. Trainor and G. M. Hahn (1993). "Purified mammalian HSP-70 KDA activates phosphoprotein phosphatases in vitro." Biochem Biophys Res Commun 192(2): 954-963.

      Moon, I. S., H. Sakagami, J. Nakayama and T. Suzuki (2008). "Differential distribution of synGAP alpha1 and synGAP beta isoforms in rat neurons." Brain Res 1241: 62-75.

      Pankow, S., C. Bamberger, D. Calzolari, A. Bamberger and J. R. Yates, 3rd (2016). "Deep interactome profiling of membrane proteins by co-interacting protein identification technology." Nat Protoc 11(12): 2515-2528.

      Pankow, S., C. Bamberger, D. Calzolari, S. Martinez-Bartolome, M. Lavallee-Adam, W. E. Balch and J. R. Yates, 3rd (2015). "∆F508 CFTR interactome remodelling promotes rescue of cystic fibrosis." Nature 528(7583): 510-516.

      Park, G. H., H. Noh, Z. Shao, P. Ni, Y. Qin, D. Liu, C. P. Beaudreault, J. S. Park, C. P. Abani, J. M. Park, D. T. Le, S. Z. Gonzalez, Y. Guan, B. M. Cohen, D. L. McPhie, J. T. Coyle, T. A. Lanz, H. S. Xi, C. Yin, W. Huang, H. Y. Kim and S. Chung (2020). "Activated microglia cause metabolic disruptions in developmental cortical interneurons that persist in interneurons from individuals with schizophrenia." Nat Neurosci 23(11): 1352-1364.

      Partiot, E., A. Hirschler, S. Colomb, W. Lutz, T. Claeys, F. Delalande, M. S. Deffieu, Y. Bare, J. R. E. Roels, B. Gorda, J. Bons, D. Callon, L. Andreoletti, M. Labrousse, F. M. J. Jacobs, V. Rigau, B. Charlot, L. Martens, C. Carapito, G. Ganesh and R. Gaudin (2024). "Brain exposure to SARS-CoV-2 virions perturbs synaptic homeostasis." Nat Microbiol.

      Qian, J., E. Vafiadaki, S. M. Florea, V. P. Singh, W. Song, C. K. Lam, Y. Wang, Q. Yuan, T. J. Pritchard, W. Cai, K. Haghighi, P. Rodriguez, H. S. Wang, D. Sanoudou, G. C. Fan and E. G. Kranias (2011). "Small heat shock protein 20 interacts with protein phosphatase-1 and enhances sarcoplasmic reticulum calcium cycling." Circ Res 108(12): 1429-1438.

      Ragusa, M. J., B. Dancheck, D. A. Critton, A. C. Nairn, R. Page and W. Peti (2010). "Spinophilin directs protein phosphatase 1 specificity by blocking substrate binding sites." Nat Struct Mol Biol 17(4): 459-464.

      Rodrigues-Neves, A. C., A. F. Ambrosio and C. A. Gomes (2022). "Microglia sequelae: brain signature of innate immunity in schizophrenia." Transl Psychiatry 12(1): 493.

      Salek, A. B., E. T. Claeboe, R. Bansal, N. F. Berbari and A. J. Baucum, 2nd (2023). "Spinophilin-dependent regulation of GluN2B-containing NMDAR-dependent calcium influx, GluN2B surface expression, and cleaved caspase expression." Synapse 77(3): e22264.

      Savas, J. N., B. D. Stein, C. C. Wu and J. R. Yates, 3rd (2011). "Mass spectrometry accelerates membrane protein analysis." Trends Biochem Sci 36(7): 388-396.

      Selak, S., A. V. Paternain, M. I. Aller, E. Pico, R. Rivera and J. Lerma (2009). "A role for SNAP25 in internalization of kainate receptors and synaptic plasticity." Neuron 63(3): 357-371.

      Serrano, A., R. Robitaille and J. C. Lacaille (2008). "Differential NMDA-dependent activation of glial cells in mouse hippocampus." Glia 56(15): 1648-1663.

      Sjostrom, P. J., G. G. Turrigiano and S. B. Nelson (2003). "Neocortical LTD via coincident activation of presynaptic NMDA and cannabinoid receptors." Neuron 39(4): 641-654.

      Stanca, S., M. Rossetti, L. Bokulic Panichi and P. Bongioanni (2024). "The Cellular Dysfunction of the Brain-Blood Barrier from Endothelial Cells to Astrocytes: The Pathway towards Neurotransmitter Impairment in Schizophrenia." Int J Mol Sci 25(2).

      Sumi, T. and K. Harada (2023). "Muscarinic acetylcholine receptor-dependent and NMDA receptor-dependent LTP and LTD share the common AMPAR trafficking pathway." iScience 26(3): 106133.

      Svenningsson, P., E. T. Tzavara, R. Carruthers, I. Rachleff, S. Wattler, M. Nehls, D. L. McKinzie, A. A. Fienberg, G. G. Nomikos and P. Greengard (2003). "Diverse psychotomimetics act through a common signaling pathway." Science 302(5649): 1412-1415.

      Tarasov, V. V., A. A. Svistunov, V. N. Chubarev, S. S. Sologova, P. Mukhortova, D. Levushkin, S. G. Somasundaram, C. E. Kirkland, S. O. Bachurin and G. Aliev (2019). "Alterations of Astrocytes in the Context of Schizophrenic Dementia." Front Pharmacol 10: 1612.

      Terrak, M., F. Kerff, K. Langsetmo, T. Tao and R. Dominguez (2004). "Structural basis of protein phosphatase 1 regulation." Nature 429(6993): 780-784.

      Tokizane, K., C. S. Brace and S. I. Imai (2024). "DMH(Ppp1r17) neurons regulate aging and lifespan in mice through hypothalamic-adipose inter-tissue communication." Cell Metab 36(2): 377-392 e311.

      Tomasoni, R., D. Repetto, R. Morini, C. Elia, F. Gardoni, M. Di Luca, E. Turco, P. Defilippi and M. Matteoli (2013). "SNAP-25 regulates spine formation through postsynaptic binding to p140Cap." Nat Commun 4: 2136.

      Vainio, L., S. Taponen, S. M. Kinnunen, E. Halmetoja, Z. Szabo, T. Alakoski, J. Ulvila, J. Junttila, P. Lakkisto, J. Magga and R. Kerkela (2021). "GSK3beta Serine 389 Phosphorylation Modulates Cardiomyocyte Hypertrophy and Ischemic Injury." Int J Mol Sci 22(24).

      van Oostrum, M., T. M. Blok, S. L. Giandomenico, S. Tom Dieck, G. Tushev, N. Furst, J. D. Langer and E. M. Schuman (2023). "The proteomic landscape of synaptic diversity across brain regions and cell types." Cell 186(24): 5411-5427 e5423.

      Vilalta, A. and G. C. Brown (2018). "Neurophagy, the phagocytosis of live neurons and synapses by glia, contributes to brain development and disease." FEBS J 285(19): 3566-3575.

      Weeratunga, S., R. S. Gormal, M. Liu, D. Eldershaw, E. K. Livingstone, A. Malapaka, T. P. Wallis, A. T. Bademosi, A. Jiang, M. D. Healy, F. A. Meunier and B. M. Collins (2024). "Interrogation and validation of the interactome of neuronal Munc18-interacting Mint proteins with AlphaFold2." J Biol Chem 300(1): 105541.

      Winship, I. R., S. M. Dursun, G. B. Baker, P. A. Balista, L. Kandratavicius, J. P. Maia-de-Oliveira, J. Hallak and J. G. Howland (2019). "An Overview of Animal Models Related to Schizophrenia." Can J Psychiatry 64(1): 5-17.

      Xu, Z., L. Sadleir, H. Goel, X. Jiao, Y. Niu, Z. Zhou, G. de Valles-Ibanez, G. Poke, M. Hildebrand, N. Lieffering, J. Qin and Z. Yang (2024). "Genotype and phenotype correlation of PHACTR1-related neurological disorders." J Med Genet 61(6): 536-542.

      Zhang, J., L. Zhang, S. Zhao and E. Y. Lee (1998). "Identification and characterization of the human HCG V gene product as a novel inhibitor of protein phosphatase-1." Biochemistry 37(47): 16728-16734.

      Zhang, Y., K. Chen, S. A. Sloan, M. L. Bennett, A. R. Scholze, S. O'Keeffe, H. P. Phatnani, P. Guarnieri, C. Caneda, N. Ruderisch, S. Deng, S. A. Liddelow, C. Zhang, R. Daneman, T. Maniatis, B. A. Barres and J. Q. Wu (2014). "An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex." J Neurosci 34(36): 11929-11947.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary: McClatchy, Powell and Yates aimed at identifying a protein interactome associated with schizophrenia. For that, they treated rats (N14 and N15) with PCP, which disturbs glutamatergic transmission, as a model for the disease and co-immunoprecipitated hippocampal proteins, which were further analyzed by standard LC-MS.

      The study is new, considering not much has been done in this direction in the field of schizophrenia. This justifies its publication. On the other hand, a major flaw of the study is the lack of information on the level of interaction of the so-called protein interactome. That is, we cannot distinguish, as the study was performed, which proteins interact directly with the targets of interest from proteins that interact with the targets' interactors. The different shells of interaction are crucial information in protein interactomics.

      Major: most of what I am pointing out below must be at least discussed or better presented in the paper, as it may not be solvable considering how the study has been conducted.

      1. The study fails in defining the level of interaction of the protein interactome with the considered targets. This has been shortly mentioned in the discussion, but must be more explicit to readers, for instance, in the abstract, introduction and in the methods sections.
      2. Considering the protein extraction protocol, it is fair to mention that only the most soluble proteins are being considered here. I am bringing this up since the importance of membrane receptors is clear in the studied context.
      3. It is not clear from the methods description if antibodies from all 8 targets were all together in one Co-IP or have been incubated separately in 8 different hippocampi samples. It seems the first, given how results have been presented. If so, this maximizes the major issue raised above (in 1).
      4. Definitely, results here are not representing a "SCZ PPI network". PCP-treated animals, as any other animal model, are rather limited models to schizophrenia. As a complex multifactorial disease, synaptic deficits, which is the focus of this study, can no longer be considered "the pivot" of the disease. Synaptic dysfunction is only one among many other factors associated to schizophrenia.
      5. Authors should also look for protein interactions that might be happening in glial cells. They are not the majority in hippocampus, but are present in the type of tissue analyzed here. Thus, some of the interactions observed might be more abundantly present in those cells. Perhaps the PPI network could be tested for enrichment in different cell types using bioinformatics tools.

      Minor:

      1. in the abstract, it is not clear if 90% of the PPI are novel to brain tissue in general or specifically schizophrenia.
      2. authors refer to LC-MS-based proteomics as "MS" all across the text. Who am I to say this to Yates et al, but I think it is rather simplified to use "Mass Spectrometry Analysis", when this is a typical LC-MS type of analysis.
      3. Several references used to construct the hypothesis of the paper are rather outdated: several from 10-15 years ago. It would be interesting to provide to the reader up to date references, given the rapid pace science has been progressing.
      4. "UniProt rat database". Please, state the version and if reviewed or unreviewed.

      Significance

      The study is informative, and has great potential to enrich the specific literature of this field. But should tone down some arguments, given the experimental limitations of the PPI network (as described above) and should state PCP-treated rats as a limited model to schizophrenia.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      The work by Przanowska et al. sought to understand the role of ORC2 in murine development and further to discover its role in liver endo-reduplication. The overall methods used are sufficient to address its role, but the work is not very conclusive based on the overall results and data provided, as elaborated in the comments below.

      Major Comments:

      1. The major issue of the paper is how well ORC2 is depleted in perinatal liver (Fig. 2C); this is not clear from the data, as all the western blots are at very low exposure levels and the bands are very weak (weak bands are still seen). There are good antibodies against ORC2 that can be used for IHC staining to address the extent of ORC2 depletion.

      We have now shown that ORC2 protein is significantly decreased in the hepatocytes of the Orc2 KO and DKO livers (New Fig. 2C and 6D). The decrease is consistent, with 4-5 mice examined, and all showing the depletion. We have been unable to do immunohistochemistry on tissue sections of the mouse livers with the anti-ORC antibodies we have tried, and this could be a reflection of the low level of the proteins. On hepatocytes in culture we have obtained faint signal with the anti-ORC2 antibody in WT cells, and this is clearly absent in 100% of the hepatocytes. See Fig. R1 below.

      Reviewer Fig. R1:


      A) Immunofluorescence of hepatocytes in culture from livers of WT and two DKO mice.

      B) Quantitation of A) from counting 70-100 cells from each specimen.

      However, the calculations in the methods and the discussion are very compelling that at least the last 6-9 cell divisions in normal development start with 2n nuclei in the livers at baseline (Fig. 3B-G and 6I).

      Why, in Fig 2C, is mouse M2 showing an equivalent level of ORC2 protein compared to mouse M1 with no Cre expression (compare lane 1 and lane 5)? As it stands, the results are based on one mouse, which I do not think is sufficient to support the conclusion. The authors need to add more data from different mice for statistical significance. Please use IHC to show the depletion of ORC2 protein in the liver sections.

      We had used total liver and had pointed out that residual ORC2 protein will be seen from stromal cells (endothelia, blood vessels and blood cells). We have therefore removed the figure which measured ORC2 levels in total liver and have now shown that when hepatocytes are isolated from five animals there was a massive depletion of ORC2 in all five animals (new Fig. 3C).

      As nicely demonstrated in the previous paper by Okano-Uchida et al., 2018, ORC1 depletion in the liver shows a DNA ploidy effect from 6 weeks onwards. The authors need to demonstrate in this paper too when the 16N phenotype is observed, from week 1 to 12 months.

      Based on the results from our previous paper (Okano-Uchida et al., 2018) we decided to measure 16N phenotype at 6 weeks of age. The endoreduplication occurs at a stage when ORC2 protein is undetectable during normal development or during regeneration.

      In the double knockout experiments (ORC1 and ORC2) the authors have not demonstrated how much both proteins are actually depleted from the cells, so the results obtained from these mouse experiments are not conclusive or explanatory.

      We have performed immunoblotting of isolated hepatocytes and immunohistochemistry of livers for ORC1 and ORC2. Our data shows that both proteins are depleted in all four mice tested (New Fig. 6D).

      Minor points:

      1. Why are scale bars missing in the right panels of Fig. 2G, Fig. 6D and Supp Fig. 2B (KO studies)? The authors need to confirm, through IHC and H&E staining, that all the large nuclei have no or very little ORC2 protein.

      The scale bars are missing from the right panels to avoid redundancy. We have added “Both panels are at the same scale.” in the figure legend, according to https://doi.org/10.1371/journal.pbio.3001161.

      1. Please explain why EYFP in Fig. 5G is cytoplasmic compared to Fig 4C (nuclear).

      We consistently see this variability and it was there in our previous results (Okano-Uchida et al., 2018), where EYFP was cytoplasmic in tissues, but was nuclear (and some cytoplasmic) in hepatocytes in culture.

      We do not know the reason for this difference but consistently see this difference. We now say in the text: “We did not explore why the EYFP protein is mostly nuclear in hepatocytes in culture (Fig. 4C) and mostly cytoplasmic in hepatocytes in the liver tissue (Fig. 5G, 7G), but speculate that differences in signaling pathways or fixation techniques between the two conditions contribute to this difference.”

      Are the authors using the same genotype of Alb-Cre mice as in Okano-Uchida et al., 2018? I do not find the reference to Schuler et al., 2004 (PMID:15282742).

      We have been using two independent Alb-Cre animals. This is now described in the Methods.


      Significance

      The article closely follows their previously published paper, but instead of ORC1 they were interested in dissecting the role of ORC2. Although they discuss that CDC6 may substitute for ORC1 in the ORC1 KO mice to rescue the extensive DNA replication in endoreduplication, instead of pursuing the role of CDC6 in endoreduplication they checked the effect of ORC2, which actually lowers the overall impact of the paper.

      We studied ORC2 conditional KO mice in a similar manner to the previously published ORC1 conditional KO in order (1) to ensure that the lack of effect in the Orc1 KO was not because ORC1 can theoretically be substituted for by CDC6 and (2) to establish the double KO of Orc1 and Orc2. To the best of our knowledge this is the first description of the removal of two subunits of the ORC complex at once in a mouse model. Moreover, in light of the rising recognition of sex as a biological variable, we report sex-dependent effects which are very intriguing.

      We have not attempted knocking out CDC6 to uncover novel mechanisms of DNA replication, because we first needed to make sure that the mice can truly endo-reduplicate without two of the six subunits of ORC. Note that our published results in cancer cell lines (Shibata, 2016) show that CDC6 is still essential in the ORC KO cell lines, so a future experiment will likely reveal that CDC6 is still essential for endoreduplication in the ORC KO mice in vivo.

      Reviewer #2

      Evidence, reproducibility and clarity

      It has been reported that in the absence of ORC1, liver cells can still endoreduplicate and it has been speculated that this might occur if CDC6 can replace, at least partially, the function of ORC1. Here, authors evaluate if this is also true in the absence of ORC2 and found that ORC2 is required for cell proliferation in mouse hepatocytes but not for endoreduplication. This is also the case after combining the conditional mutations of ORC1 and ORC2. They propose that a mechanism must exist to load sufficient MCM2-7 to support DNA replication in the absence of these two ORC subunits. Some of the conclusions need further experimental support. The rationale for testing the requirement of ORC2, with or without ORC1, for endoreduplication is valid. However, a key point is that the endoreduplication level seems to be higher in the absence of ORC2 or both ORC1 and ORC2, and this is not properly addressed. Also, mechanistic details on how this could be triggered are absent from this study. As indicated below almost every figure in this manuscript contains weak points (see below).

      We now discuss the following: “One possible explanation of the greater endoreduplication in both our papers is that mitosis may be arrested earlier in development by G2 DNA damage checkpoints activated by incomplete licensing and replication of the genome in the absence of ORC. As a result, endoreduplication cycles could begin earlier in development resulting in greater endoreduplication.”

      Major 1. Fig 1G, needs a detailed comment and justification.

      We have added the following to the text: “The proliferation rate of the MEF was measured by MTT assays. Even in the Orc2+/+ MEF, the infection with adeno-Cre decreased proliferation a little (the orange line compared to the blue line in Fig. 1G). However, for Orc2f/f MEF infection with adeno-Cre impairs proliferation even further (yellow line compared to black line in Fig. 1G).”

      Note that Adeno-Cre has been reported to be toxic for cell proliferation (citations 1, 2, 3), and so we included Adeno-Cre expression in ORC2+/+ (WT) as a background control.

      Citation:

      1. Pfeifer A, Brandon EP, Kootstra N, Gage FH, Verma IM: Delivery of the Cre recombinase by a self deleting lentiviral vector: Efficient gene targeting in vivo. Proc Natl Acad Sci USA. 2001, 98: 11450-11455. 10.1073/pnas.201415498.
      2. Loonstra A, Vooijs M, Beverloo HB, Allak BA, Drunen EV, Kanaar R, Berns A, Jonkers J: Growth inhibition and DNA damage induced by Cre recombinase in mammalian cells. Proc Natl Acad Sci USA. 2001, 98: 9209-9214. 10.1073/pnas.161269798.
      3. Schmidt EE, Taylor DS, Prigge JR, Barnet S, Capecchi R: Illegitimate Cre-dependent chromosome rearrangements in transgenic mouse spermatids. Proc Natl Acad Sci USA. 2000, 97: 13702-13707. 10.1073/pnas.240471297.

      Fig 2D-F. Is this conclusion applicable to other endoreplicating tissues? Have the authors considered analyzing body weight and liver weight measurements after normalization with similar data from a non-affected organ?

      The conditional KO was performed specifically in the liver. ORC is intact in other tissues in these animals. As a future direction our lab plans to study cardiac-specific conditional KO of ORC subunits to test whether other endo-reduplicating tissues can also synthesize DNA in the absence of ORC subunits.

      Fig 3 shows inconsistent results or results that lack proper justification in the text. The 2C peak is missing in Fig 3E (yellow line, positive control). However, 2n nuclei appear in Fig 3F-H. Also, the blue and yellow peaks do not coincide in the flow cytometry profiles, in particular for 8C and 16C.

      There was an error in the plotting of the former Fig. 3E. The information is better presented in the former Fig. 3F-H (now Fig. 3E-G), and so we have removed the former Fig. 3E from the paper.

      Fig 4. Shorter EdU pulses could be more informative of the actual amount of S-phase cells. Thus, the use of a 2h EdU pulse needs a clear justification.

      The half-life of EdU incorporation differs slightly between in vivo and in vitro conditions. In vivo, slower cell proliferation requires a longer time, approximately 4 hours. However, in vitro, liver cells grow faster, and a 2-hour EdU pulse with 20 µM is sufficient for detection compared to a 3-hour pulse with 10 µM BrdU (Okano-Uchida et al., 2018). Several publications also use a 2-hour EdU incubation time (https://doi.org/10.1098/rsob.150172).

      Fig 5. EYFP is cytoplasmic, in contrast with results shown in Fig 4C

      We consistently see this variability and it was there in our previous results (Okano-Uchida et al., 2018), where EYFP was cytoplasmic in tissues, but was nuclear (and some cytoplasmic) in hepatocytes in culture.

      We do not know the reason for this difference but consistently see this difference. We now say in the text: “We did not explore why the EYFP protein is mostly nuclear in hepatocytes in culture (Fig. 4C) and mostly cytoplasmic in hepatocytes in the liver tissue (Fig. 5G, 7G), but speculate that differences in signaling pathways or fixation techniques between the two conditions contribute to this difference.”

      Fig 6. Results obtained with the double mutant are poorly described.

      We have split the figure into two figures (New Fig. 6 and 7) and edited the results section to ensure that they are easily comprehended by the readers. We have also included Westerns from hepatocyte cell lysates of four DKO mice to show that ORC1 and ORC2 proteins are reproducibly decreased (New Fig. 6D).

      What are the levels of other pre-RC components in the mutants used in this study? This could be easily evaluated by Western blotting.

      Despite the technical difficulty of not having antibodies that recognize all the mouse initiation proteins, we have now measured mouse ORC1, ORC2, ORC3, ORC5, ORC6, CDC6 and the MCM2 and MCM3 subunits of MCM2-7. The results do not show a consistent decrease or increase of any of these proteins in individual mice of the two genotypes, Orc2-/- or DKO (New Fig. 2D and 6E).

      How do authors justify their claim that a very limited amount of ORC are sufficient to load a vast excess of MCM2-7 hexamers?

      The rationale is stated in the introduction from data from cancer cell lines: “Given that WT cells have about 150,000 molecules of ORC2, even if this truncated protein is functional ORC2, ~150 molecules of the protein would be expected to load MCM2-7 double hexamers on at least 50,000 origins of replication.” Experimentally, we show in Shibata, 2020 (Fig. 7C), that although ORC subunits are undetectable on Westerns, MCM2-7 association with the chromatin is unchanged. By the way, we do not say “vast excess” of MCM2-7, just sufficient MCM2-7 to fire 50,000 origins.
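
      The arithmetic behind the figures quoted above can be sketched as follows. This is a minimal illustration only, using the approximate numbers given in the reply (about 150,000 ORC2 molecules per WT cell, ~150 residual molecules, at least 50,000 origins). The variable and function names are hypothetical, and the simplification of one MCM2-7 double hexamer per licensed origin is an assumption made for the sketch, not a measurement.

```python
# Back-of-the-envelope check of the ORC-to-origin ratio discussed above.
# All inputs are the approximate figures quoted in the text; the assumption
# of one MCM2-7 double hexamer per licensed origin is a simplification.

WT_ORC2_MOLECULES = 150_000      # approx. ORC2 molecules per wild-type cell
RESIDUAL_ORC2_MOLECULES = 150    # approx. residual molecules in the mutant
ORIGINS_TO_LICENSE = 50_000      # minimum number of origins to be licensed
DOUBLE_HEXAMERS_PER_ORIGIN = 1   # assumption for illustration only


def residual_fraction(residual: int, wild_type: int) -> float:
    """Fraction of the wild-type protein level that remains."""
    return residual / wild_type


def loads_per_orc(origins: int, per_origin: int, orc_molecules: int) -> float:
    """Double hexamers each remaining ORC molecule would need to load."""
    return origins * per_origin / orc_molecules


if __name__ == "__main__":
    frac = residual_fraction(RESIDUAL_ORC2_MOLECULES, WT_ORC2_MOLECULES)
    loads = loads_per_orc(
        ORIGINS_TO_LICENSE, DOUBLE_HEXAMERS_PER_ORIGIN, RESIDUAL_ORC2_MOLECULES
    )
    print(f"Residual ORC2: {frac:.1%} of wild type")
    print(f"Double hexamers per ORC molecule: {loads:.0f}")
```

      Under these assumptions each residual ORC molecule would have to load a few hundred double hexamers, which is the scale of catalytic activity the argument requires.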

      Minor 1. The titles of the Results section could be more informative of the main conclusion rather than simply descriptive

      We updated our Results titles to be more informative.

      The Discussion is too long

      We have shortened the discussion by moving our calculations to the Results section and abbreviating some of the discussion on endoreduplication. However, we had to insert new items brought forth by the reviewers. Due to the controversy of this topic in our field, we had to include extensive discussion of current literature and put our results in their proper context.

      Significance

      The topic is relevant and the hypothesis tested is reasonable, although the conceptual advance is limited (see also below). The major limitation is the absence of mechanistic details addressing the occurrence of extra endoreduplication cycles (compared to controls) in the ORC1 and ORC2 mutants.

      Reviewer #3

      Evidence, reproducibility and clarity:

      The origin recognition complex (ORC) is an essential loading factor for the replicative Mcm2-7 helicase complex. Despite ORC's critical role in DNA replication, there have been instances where the loss of specific ORC subunits has still seemingly supported DNA replication in cancer cells, endocycling hepatocytes, and Drosophila polyploid cells. Critically, all tested ORC subunits are essential for development and proliferation in normal cells. This presents a challenge, as conditional knockouts need to be generated, and a skeptic can always claim that there were limiting but sufficient ORC levels for helicase loading and replication in polyploid or transformed cells. That being said, the authors have consistently pushed the system to demonstrate replication in the absence or extreme depletion of ORC subunits.

      Here, the authors generate conditional ORC2 mutants to counter a potential argument with prior conditional ORC1 mutants that Cdc6 may substitute for ORC1 function based on homology. They also generate a double ORC1 and ORC2 mutant, which is still capable of DNA replication in polyploid hepatocytes. While this manuscript provides significantly more support for the ability of select cells to replicate in the absence or near absence of select ORC subunits, it does not shed light on a potential mechanism. While a mechanistic understanding of how these cells proliferate in the absence or extreme depletion of ORC subunits is outside the scope of the current manuscript, it would have been beneficial to see more functional analyses to help guide the field. For example, is there a delay or impairment in Mcm2-7 loading in G1 (FACs-based loading assay from the Cook Lab (Matson et al., eLife. 2017)) in primary hepatocytes with the ORC2 conditional deletion? Is copy number maintained as cells increase polyploidy in the absence of ORC subunits, or are some regions of the genome more sensitive to ORC depletion (CGH arrays or sequencing of the flow-sorted polyploid cells)?

      We thank the reviewer for recognizing the main point of these experiments: to dispel the argument that CDC6 can substitute for ORC1 in the six-subunit ORC (although no one has demonstrated this, the argument is made on the basis of close sequence homology between CDC6 and ORC1). The second point, also appreciated by the reviewer, is to show that it is possible to find cells that replicate in the absence or near absence of two ORC subunits.

      The mechanistic questions raised are important, and we will address them here:

      Is there a delay or impairment of MCM2-7 loading in G1? The hepatocytes in culture are fragile and not immortalized; this issue can therefore be much more easily addressed in the cancer cell lines we have made that are missing several ORC subunits, and we will do that in a later paper. Note, however, the surprising lack of change in MCM2-7 association in cell lines where both ORC2 and ORC5 are deleted (Shibata, 2020, Fig. 7C).

      Are some regions of the genome more sensitive to ORC deletion during the polyploidization? We could not find any paper where people have investigated whether the whole genome is uniformly polyploidized in livers. In other words, the baseline conditions in WT livers have not been established. We therefore have postponed experiments to answer this question for a later paper. Note that in unpublished data from mapping SNS-seq origins in WT and ORC deletion cell lines there does not appear to be selective firing of certain origins over others in the deletion cell lines.

      Additional points: I didn't understand how the numbers were derived in Table 2. Was there really a 20-fold decrease in nuclear density for female ORC1 and ORC2 double-deletion hepatocytes? The differences in Figure S2 are dramatic, but not 20-fold dramatic.

      We measure the relative nuclear density by counting the number of plump nuclei (hepatocytes) per field as described for Fig. 5F and 7F now in the Methods section. The reviewer is correct in that we overestimated the decrease of nuclear density in the female DKO mice by two-fold. The revised calculations suggest that 6 cell divisions occur in the female DKO mice after the ORC proteins have decreased to at least

      Significance:

      The strengths of this manuscript are the mouse genetics and the generation of conditional alleles of Orc2 and the rigorous assessment of phenotypes resulting from limiting amounts of specific ORC subunits. It also builds on prior work with ORC1 to rule out Cdc6 complementing the loss of ORC1. The weakness is that it is a very hard task to resolve the fundamental question of how much ORC is enough for replication in cancer cells or hepatocytes. Clearly, there is a marked reduction in specific ORC subunits that is sufficient to impact replication during development and in fibroblasts, but the devil's advocate can always claim limiting levels of ORC remaining in these specialized cells. The significance of the work is that the authors keep improving their conditional alleles (and combining them), thus making it harder and harder (but not impossible) to invoke limiting but sufficient levels of ORC. At this point, the investigators and the field are well-positioned to attempt future functional CRISPR screens to identify other factors that may modulate the response to the loss of ORC subunits. This work will be of interest to the DNA replication, polyploidy, and genome stability communities.

      We thank the reviewer for getting the important point of this paper: “making it harder and harder (but not impossible) to invoke limiting but sufficient levels of ORC….” In other words, either ORC is completely dispensable for loading MCM2-7 in certain cancer cell lines and hepatocytes or it is highly catalytic and one molecule of ORC can load a few hundred MCM2-7 doublets so that most origins in the genome are licensed and capable of firing. We are trying the CRISPR screens in cancer cell lines that the reviewer envisages.

    1. Author response:

      “Overall, the paper has several strengths, including leveraging large-scale, multi-modal datasets, using computational reasonable tools, and having an in-depth discussion of the significant results.”

      We thank the reviewer for the very supportive comments.

      Based on the comments and questions, we have grouped the concerns and corresponding responses into three categories.

      (1) The scope and data selection

      “The results are somewhat inconclusive or not validated.

      The overall results are carefully designed, but most of the results are descriptive. While the authors are able to find additional evidence either from the literature or explain the results with their existing knowledge, none of the results have been biologically validated. Especially, the last three result sections (signaling pathways, eQTLs, and TF binding) further extended their findings, but the authors did not put the major results into any of the figures in the main text.”

      The goal of this manuscript is to provide a list of putative childhood obesity target genes to yield new insights and help drive further experimentation. Moreover, the outputs from signaling pathways, eQTLs, and TF binding, although noteworthy and supportive of our method, were not particularly novel. In our manuscript we placed our focus on the novel findings from the analyses. We did, however, report the part of the eQTLs analysis concerning ADCY3, which brought new insight to the pathology of obesity, in Figure 4C.

      “The manuscript would benefit from an explanation regarding the rationale behind the selection of the 57 human cell types analyzed. it is essential to clarify whether these cell types have unique functions or relevance to childhood development and obesity.”

      We elected to comprehensively investigate the GWAS-informed cellular underpinnings of childhood development and obesity. By including a diverse range of cell types from different tissues and organs, we sought to capture the multifaceted nature of cellular contributions to obesity-related mechanisms, and open new avenues for targeted therapeutic interventions.

      There are clearly cell types that are already established as being key to the pathogenesis of obesity when dysregulated: adipocytes for energy storage, immune cell types regulating inflammation and metabolic homeostasis, hepatocytes regulating lipid metabolism, pancreatic cell types intricately involved in glucose and lipid metabolism, skeletal muscle for glucose uptake and metabolism, and brain cell types in the regulation of appetite, energy expenditure, and metabolic homeostasis.

      While it is practical to focus on cell types already proven to be associated with or relevant to obesity, this approach has its limitations. It confines our understanding to established knowledge and rules out the potential for discovering novel insights from new cellular mechanisms or pathways that could play significant roles in the pathogenesis of obesity. Therefore, it was essential to reflect known biology against the unexplored cell types to expand our overall understanding and potentially identify innovative targets for treatment or prevention.

      “I wonder whether the used epigenome datasets are all from children. Although the authors use literature to support that body weight and obesity remain stable from infancy to adulthood, it remains uncertain whether epigenomic data from other life stages might overlook significant genetic variants that uniquely contribute to childhood obesity.”

      The datasets utilized in our study were derived from a combination of sources, both pediatric and adult. We recognize that epigenetic profiles can vary across different life stages but our principal effort was to characterize susceptibility BEFORE disease onset.

      “Given that the GTEx tissue samples are derived from adult donors, there appears to be a mismatch with the study's focus on childhood obesity. If possible, identifying alternative validation strategies or datasets more closely related to the pediatric population could strengthen the study's findings.” 

      We thank the reviewer for raising this important point. We acknowledge that the GTEx tissue samples are derived from adult donors, which might not perfectly align with the study's focus on childhood obesity. The ideal strategy would be a longitudinal design that follows individuals from childhood into adulthood to bridge the gap between pediatric and adult data, offering systematic insights into how early-life epigenetic markers influence obesity later in life. In future work, we aim to carry out such efforts, which will represent a substantial time and financial commitment.

      Along the same lines, the Developmental Genotype-Tissue Expression (dGTEx) Project is a new effort to study development-specific genetic effects on gene expression at 4 developmental windows spanning from infant to post-puberty (0-18 years). Donor recruitment began in August 2023 and remains ongoing. Tissue characterization and data production are underway. We hope that with the establishment of this resource, our future research in the field of pediatric health will be further enhanced.

      “Figure 1B: in subplots c and d, the results are either from Hi-C or capture-C. Although the authors use different colors to denote them, I cannot help wondering how much difference between Hi-C and capture-C brings in. Did the authors explore the difference between the Hi-C and capture-C?”

      Thank you for your comment. It is not within the scope of our paper to explore the differences between the Hi-C and Capture-C methods. In the context of our study, both methods serve the same purpose of detecting chromatin loops that bring putative enhancers to sometimes genomically distant gene promoters. Consequently, our focus was on utilizing these methods to identify relevant chromatin interactions rather than comparing their technical differences.

      (2) Details on defining different categories of the regions of interest

      “Some technical details are missing.

      While the authors described all of their analysis steps, a lot of the time, they did not mention the motivation. Sometimes, the details were also omitted.”

      We will add a section to the revision to address the rationale behind the different OCR categories.

      “Line 129: should "-1,500/+500bp" be "-500/+500bp"?”

      A gene promoter was defined as a region 1,500 bases upstream to 500 bases downstream of the TSS. Most transcription factor binding sites are distributed upstream (5’) of the TSS, and the assembly of transcription machinery occurs up to 1000 bases 5’ of the TSS. Given our interest in SNPs that can potentially disrupt transcription factor binding, this defined promoter length allowed us to capture such SNPs in our analyses.
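
      The promoter definition above can be illustrated with a minimal sketch. It assumes 1-based genomic positions and a strand-aware window (upstream means lower coordinates on the "+" strand and higher coordinates on the "-" strand); the function names `promoter_window` and `snp_in_promoter` are hypothetical and are not taken from the manuscript's pipeline.

```python
def promoter_window(tss, strand, upstream=1500, downstream=500):
    """Return (start, end), inclusive, of a promoter around a TSS.

    Assumption: 1-based positions; "upstream" is 5' of the TSS, so the
    window is mirrored for genes on the minus strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp to the first base of the chromosome.
    return max(1, start), end


def snp_in_promoter(snp_pos, tss, strand):
    """True if a SNP position falls inside the -1,500/+500 bp window."""
    start, end = promoter_window(tss, strand)
    return start <= snp_pos <= end
```

      Under these assumptions, a SNP 1,400 bases 5’ of a plus-strand TSS falls inside the window, whereas the same coordinate would lie outside the window of a minus-strand gene with the same TSS.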

      “How did the authors define a contact region?”

      Chromatin contact regions identified by Hi-C or Capture-C assays are always reported as pairs of chromatin regions. The Supplementary eMethods provide details on the method of processing and interaction calling from the Hi-C and Capture-C data.

      “The manuscript would benefit from a detailed explanation of the methods used to define cREs, particularly the process of intersecting OCRs with chromatin conformation data. The current description does not fully clarify how the cREs are defined.”

      “In the result section titled "Consistency and diversity of childhood obesity proxy variants mapped to cREs", the authors introduced the different types of cREs in the context of open chromatin regions and chromatin contact regions, and TSS. Figure 2A is helpful in some way, but more explanation is definitely needed. For example, it seems that the authors introduced three chromatin contacts on purpose, but I did not quite get the overall motivation.”

      We apologize for the confusion. Our definition of cREs is consistent throughout the study. Figure 2A will become Figure 1A in the revision in order to aid the reader.

      The 3 representative chromatin loops illustrate different ways the chromatin contact regions (pairs of blue regions under blue arcs) can overlap with OCRs (yellow regions under yellow triangles – ATAC peaks) and gene promoters.

      [1] The first chromatin loop has one contact region that overlaps with OCRs at one end and with the gene promoter at the other. This satisfies the formation of cREs; thus, the area under the yellow ATAC-peak triangle is green.

      [2] The second loop overlaps with an OCR at only one end, and there is no gene promoter nearby, so it does not qualify as a cRE.

      [3] The third chromatin loop has OCR and promoter overlapping at one end. We defined this as a special cRE formation; thus, the area under the yellow ATAC-peak triangle is green.

      To avoid further confusion for the reader, we will eliminate this variation in the new illustration for the revised manuscript.
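
      The three loop configurations described above can be sketched as a simple interval test. This is an illustrative reading of the response, not the authors' pipeline: regions are assumed to be 0-based, half-open intervals, each loop is a pair of contact regions, and the names `Region`, `overlaps` and `is_cre` are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True if two regions on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_cre(ocr, loops, promoters):
    """Classify an OCR as a putative cRE given loops and promoters.

    loops: list of (anchor_a, anchor_b) contact-region pairs.
    Case 1: OCR at one end of a loop, promoter at the other end.
    Case 2: OCR at one end, no promoter involved: not a cRE.
    Case 3: OCR and promoter overlap at the same end of a loop.
    """
    for anchor_a, anchor_b in loops:
        for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            if not overlaps(ocr, near):
                continue
            if any(overlaps(p, far) for p in promoters):
                return True  # case 1
            if any(overlaps(p, ocr) for p in promoters):
                return True  # case 3
    return False  # case 2, or the OCR is not in any contact region
```

      Only the positions of the regions enter the test, which matches the statement that peak locations, not intensities, were considered.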

      “Figure 2A: The authors used triangles filled differently to denote different types of cREs but I wonder what the height of the triangles implies. Please specify.”

      The triangles are illustrations for ATAC-seq peaks, and the yellow chromatin regions under them are OCRs. The different heights of ATAC-seq peaks are usually quantified as intensity values for OCRs. However, in our study, when an ATAC-seq peak passed the significance threshold from the data pipeline, we only considered their locations, regardless of their intensities. To avoid further confusion for the reader, we will eliminate this variation in the new illustration for the revised manuscript.

      “Figure 1B-c. the title should be "OCRs at putative cREs". Similarly in Figure 1B-d.”

      cREs are a subset of OCRs.

      - In the section "Cell type specific partitioned heritability", the authors used "4 defined sets of input genomic regions". Are you corresponding to the four types of regions in Figure 2A? 

      Figure 2A will become Figure 1A in the revision and will be modified to showcase how we define OCRs and cREs.

      “It seems that the authors described the 771 proxies in "Genetic loci included in variant-to-genes mapping" (ln 154), and then somehow narrowed down from 771 to 94 (according to ln 199) because they are cREs. It would be great if the authors could describe the selection procedure together, rather than isolated, which made it quite difficult to understand.”

      In the Methods section entitled “Genetic loci included in variant-to-genes mapping”, we described the process of LD expansion to include 771 proxies from 19 sentinel signals significantly associated with obesity. Not all of these proxies are located within our defined cREs. Figure 2B, now Figure 2A in the revision, illustrates the proportions of these proxies located within different types of regions, reducing the proxy list to the 94 located within our defined cREs.

      “Figure 2. What's the difference between the 771 and 758 proxies?”

      13 out of 771 proxies did not fall within any defined regions. The remaining 758 were located within contact regions of at least one cell type regardless of chromatin state.

      (3) Typos

      “In the paragraph "Childhood obesity GWAS summary statistics", the authors may want to describe the case/control numbers in two stages differently. "in stage 1" and "921 cases" together made me think "1,921" is one number.”

      This will be amended in the revision.

      “Hi-C technology should be spelled as Hi-C. There are many places, it is miss-spelled as "hi-C". In Figure 1, the author used "hiC" in the legend. Similarly, Capture-C sometime was spelled as "capture-C" in the manuscript.”

      “At the end of the fifth row in the second paragraph of the Introduction section: "exisit" should be "exist".”

      “In Figure 2A: "Within open chromatin contract region" should be "Within open chromatin contact region".”

      These typos and terminology inconsistencies will be amended in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Komarova et al. investigate the clinical prognostic ability of cell-level metabolic heterogeneity quantified via the fluorescence lifetime characteristics of NAD(P)H. Fluorescence lifetime imaging microscopy (FLIM) has been studied as a minimally invasive approach to measure cellular metabolism in live cell cultures, organoids, and animal models. Its clinical translation is spearheaded through macroscopic implementation approaches that are capable of large sampling areas and enable access to otherwise constrained spaces but lack cellular resolution for a one-to-one transition with traditional microscopy approaches, making the interpretation of the results a complicated task. The merit of this study primarily lies in its design by analyzing with the same instrumentation and approach colorectal samples in different research scenarios, namely in vitro cells, in vivo animal xenografts, and tumor tissue from human patients. These conform to a valuable dataset to explore the translational interpretation hurdles with samples of increasing levels of complexity. For human samples, the study specifically investigates the prediction ability of NAD(P)H fluorescence metrics for the binary classification of tumors of low and advanced stage, with and without metastasis, and low and high grade. They find that NAD(P)H fluorescence properties have a strong potential to distinguish between high- and low-grade tumors and a moderate ability to distinguish advanced-stage tumors from low-stage tumors. This study provides valuable results contributing to the deployment of minimally invasive optical imaging techniques to quantify tumor properties and potentially migrate into tools for human tumor characterization and clinical diagnosis.

      Strengths:

      The investigation of colorectal samples under multiple imaging scenarios with the same instrument and approach conforms to a valuable dataset that can facilitate the interpretation of results across the spectrum of sample complexity.

      The manuscript provides a strong discussion reviewing studies that investigated cellular metabolism with FLIM and the metabolic heterogeneity of colorectal cancer in general.

      The authors do a thorough acknowledgement of the experimental limitations of investigating human samples ex vivo, and the analytical limitation of manual segmentation, for which they provide a path forward for higher throughput analysis.

      Weaknesses:

      To substantiate the changes in fluorescence properties at the examined wavelength range (associated with NAD(P)H fluorescence) in relationship to metabolism, the study would strongly benefit from additional quantification of metabolic-associated metrics using currently established standard methods. This is especially interesting when discussing heterogeneity, which is presumably high within and between patients with colorectal cancer, and could help explain the particularities of each sample leading to a more in-depth analysis of the acquired valuable dataset.

      In order to address this issue, we have performed immunohistochemical staining of the available tumor samples for the two standard metabolic markers GLUT3 and LDHA.

      The results are included in the Supplementary Material (Fig. S4). The Discussion has been extended.

      Additionally, NAD(P)H fluorescence does not provide a complete picture of the cell/tissue metabolic characteristics. Including, or discussing the implications of including fluorescence from flavins would comprise a more compelling dataset. These additional data would also enable the quantification of redox metrics, as briefly mentioned, which could positively contribute to the prognosis potential of metabolic heterogeneity.

      We agree with the Reviewer that fluorescence from flavins could be helpful to obtain more complete data on cellular metabolic states. However, we were unable to detect sufficiently intense emission from flavins in colorectal cancer cells and tissues. A paragraph about flavins was added to the Discussion, and representative images to the Supplementary Material (Figure S5).

      In the current form of the manuscript, there is a diluted interpretation and discussion of the results obtained from the random forest and SHAP analysis regarding the ability of the FLIM parameters to predict clinicopathological outcomes. This is, not only the main point the authors are trying to convey given the title and the stated goals, but also a novel result given the scarce availability of these type of data, which could have a remarkable impact on colorectal cancer in situ diagnosis and therapy monitoring. These data merit a more in-depth analysis of the different factors involved. In this context, the authors should clarify how is the "trend of association" quantified (lines 194 and 199).

      We thank the Reviewer for this suggestion. The section has been updated with SHAP analysis using different parameters (dispersion D of t2, a1, tm and bimodality index BI of t2, a1, tm). It is now clearer that D-a1 is more strongly associated with clinicopathological outcomes than the other variables. We have also added some biological interpretation of these results in the Discussion.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Metabolic heterogeneity of colorectal cancer as a prognostic factor: insights gained from fluorescence lifetime imaging" by Komarova et al., the authors used fluorescence lifetime imaging and quantitative analysis to assess the metabolic heterogeneity of colorectal cancer. Generally, this work is logically well-designed, including in vitro and in vivo animal models and ex vivo patient samples. However, since the key parameter presented in this study, the BI index, is already published in a previous paper by this group (Shirshin et al., 2022), and the quantification method of metabolic heterogeneity has already been well (and even better) described in previous studies (such as the one by Heaster et al., 2019), the novelty of this study is doubted. Moreover, I am afraid that the way of data analysis and presentation in this study is not well done, which will be mentioned in detail in the following sections.

      Strengths:

      (1) Solid experiments are performed and well-organized, including in vitro and in vivo animal models and ex vivo patient samples.

      (2) Attempt and efforts to build the association between the metabolic heterogeneity and prognosis for colorectal cancer.

      Weaknesses:

      (1) The human sample number (from 21 patients) is very limited. I wonder how the limited patient number could lead to reliable diagnosis and prognosis.

      An additional 8 samples of patients’ tumors, collected while the manuscript was under review, were added to the present data. We agree that the number is still too limited to draw conclusions about the prognostic value of cell-level metabolic heterogeneity. But at this point we can expect that this parameter will become a metric for prognosis. We will continue this study to collect more samples of colorectal tumors and expand the approach to different cancer types.

      (2) The BI index or similar optical metrics have been well established by this and other groups; therefore, the novelty of this study is doubted.

      The purpose of this research was to quantify and compare the cellular metabolic heterogeneity across the systems of different complexity - commercial cell lines, tumor xenografts and patients’ tumors - using previously established FLIM-based metrics. For the first time, using FLIM, it was shown that heterogeneity of patients’ samples is much higher than of laboratory models and that it has associations with clinical characteristics of the tumors - the stage and the grade. In addition, this study provides evidence that bimodality (BI) in the distribution of metabolic features in the cell population is less important than the width of the spread (the dispersion value D).

      Some corrections have been made in the text on this point.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The following comments should be addressed to strengthen the rigor and clarity of the manuscript.

      (1) The ethical committee that approved the human studies should also be mentioned in the methods section, as was done with the animal studies.

      Information about the ethics committee has been added in the Manuscript.

      The study with the use of patients’ material was approved by the ethics committee of the Privolzhsky Research Medical University (approval № 09 from 30.06.2023).

      (2) The captions in Figures 2 and 3 must be revised. In Figure 2, it seems the last 2 sentences for the description of (C) do not belong there, and instead, the last sentence in the description of (D) may need to be included in (C) instead. Figure 3 is similar.

      The captions were revised.

      (3) From supplement Figure S2 it seems that EpCam and vimentin staining were only done in two of the mouse tumor types. No further mention is made in the results or methods section. Is there any reason this was not performed in the other tumor types? Were the histology and IHC protocols the same for the mouse and human tumors?

      The data on other tumor types and patients’ tumors have been added in Figure S3. The Discussion was extended with the following paragraph:

      One of the possible reasons for metabolic heterogeneity could be the presence of stromal cells or diversity of epithelial and mesenchymal phenotypes of cancer cells within a tumor. Immunohistochemical staining of tumors for EpCam (epithelial marker) and vimentin (mesenchymal marker) showed that the fraction of epithelial, EpCam-positive, cells was more than 90% in tumor xenografts and on average 76±10 % in patients’ tumors (Figure S3). However, the ratio of EpCam- to vimentin-positive cells in patients’ samples neither correlated with D-a1 nor with BI-a1, which means that the presence of cells with mesenchymal phenotype did not contribute to metabolic heterogeneity of tumors identified by NAD(P)H FLIM.

      (4) Clarify the design of the experiments: The results come from 50 - 200 cells in each sample (except 30 in the CaCo2 cell culture) that were counted from 5 - 10 images acquired from each sample. There were 21 independent human samples. How many independent samples were included in the cell culture experiments and the mouse tumor models? Why is there an order of magnitude fewer cells included in the CaCo2 group compared to the other groups (Figure 1)? From the image (Figure 1A - CaCo2), it seems to be a highly populated type of sample, yet only 30 cells were quantified. What prevents the inclusion of the same number of cells to be quantified in each group for a more systematic evaluation?

      We thank the Reviewer for this comment.

      Cell culture experiments included two independent replicates for each cell line, the data from which were then combined. In animal experiments, measurements were made in three mice (numbered 1-3 in Figure 2C) for each tumor type. We have performed calculations for more than 100 additional cells of the CaCo2 cell line. In the revised version the number of CaCo2 cells is 146.

      The text of the Manuscript was revised accordingly.

      (5) Regarding references: Some claims throughout the text would benefit from an additional reference. For example: line 70 "Metabolic heterogeneity [...] is believed to have prognostic value"; line 121 " [...] the uniformity of cell metabolism in a culture, which is consistent with the general view on standard cell lines [...]". The clinical translational aspect (i.e., paragraph in line 255) warrants the inclusion of the efforts already done with FLIM imaging in the clinical setting both in vivo and ex vivo with point-spectroscopy and macroscopy imaging (e.g., Jo Lab, Marcu Lab, French Lab, and earlier work by Mycek and Richards-Kortum in colorectal cancer to name a few).

      Additional references were added.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the Introduction, line 85, the authors mention that "Specifically, the unbound state of NAD(P)H has a short lifetime (~0.4 ns) and is associated with glycolysis, while the protein-bound state has a long lifetime (~1.7-3.0 ns) and is associated with OXPHOS". I do not think this claim is appropriate. One cannot simply say that the unbound state is associated with glycolysis, nor that the bound state is associated with OXPHOS; both unbound and bound state are associated with almost all the metabolic pathways. Instead, the expression of "glycolytic/ OXPHOS shift", as authors used in other sections of this manuscript, is a more appropriate one in this case.

      The text of the Introduction was revised.

      (2) What are the biological implications of the bimodality index (BI)? Please provide specific insights.

      A bimodal distribution indicates that there are two separate and independent peaks in the population data. In the metabolic FLIM data, this indicates that there are two sub-populations of cells with different metabolic phenotypes. Previously, we have observed a bimodal distribution in the population of chemotherapy-treated cancer cells, where one sub-population was responsive (shifted metabolism) and the second non-responsive (unchanged metabolism) [Shirshin et al., PNAS, 2022]. In the naive tumor, a number of factors have an impact on cellular metabolism, including genetic features and microenvironment, so it is difficult to determine which ones resulted in bimodality. Our data on correlation of bimodality (BI) with clinical characteristics of the tumors show that there are no associations between them. What really matters is the width of the parameter spread in the population. The early-stage tumors (T1, T2) were metabolically more heterogeneous than the late-stage ones (T3, T4). The degree of heterogeneity was also associated with differentiation state, a stage-independent prognostic factor in colorectal cancer in which a lower grade correlates with a better prognosis. The early-stage (T1, T2) and high-grade (G3) tumors had significantly higher dispersion of NAD(P)H-a1 than the late-stage (T3, T4) and low-grade ones (G1, G2). From the point of view of the biological significance of heterogeneity, this means that in the stressful and unfavorable conditions to which the tumor cells are exposed, the spread of the parameter distribution in the population, rather than the presence of several distinct clusters (modes), matters for adaptation and survival. The high diversity of cellular metabolic phenotypes provided a survival advantage, and so was observed in the more aggressive (undifferentiated or poorly differentiated) and the least advanced tumors.

      The discussion has been expanded on this account.

      (3) Have you run statistics in Figure 1B? If yes, do you find any significance? The same question also applies to Figures 2C and 3C.

      We performed statistical analysis to compare different cell lines in the in vitro and in vivo models; the results are presented in Table S4.

      (4) Line 119, why is the BI threshold set at 1.1?

      When setting the BI threshold at 1.1, we relied on the work by Wang et al., Cancer Informatics, 2009. The authors recommended the 1.1 cutoff as more reliable for selecting bimodally expressed genes. Further, we validated this BI threshold to identify chemotherapy-responsive and non-responsive sub-populations of cancer cells (Shirshin et al., PNAS, 2022).
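
      For readers unfamiliar with the index, the following is a minimal sketch of how a bimodality index can be computed and compared against the 1.1 cutoff. It assumes the definition of Wang et al. (2009), in which a two-component Gaussian mixture with a common standard deviation is fitted and BI = sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma. The mixture parameters are taken as inputs here (the fitting step is omitted), the function names are hypothetical, and whether the cutoff is inclusive is an assumption.

```python
import math


def bimodality_index(mu1, mu2, sigma, pi):
    """Bimodality index for a two-component, equal-variance mixture.

    mu1, mu2: component means; sigma: common standard deviation;
    pi: proportion of observations in the first component (0..1).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    delta = abs(mu1 - mu2) / sigma  # standardized distance between modes
    return math.sqrt(pi * (1.0 - pi)) * delta


def is_bimodal(bi, threshold=1.1):
    """Apply the cutoff; treating it as inclusive is an assumption."""
    return bi >= threshold
```

      With two equally sized sub-populations whose means are three standard deviations apart, this gives BI = 1.5, above the cutoff; means one standard deviation apart give BI = 0.5, below it.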

      (5) Line 123, what does the high BI of mean lifetime stand for? Please provide biological implications and insights.

      The sentence was removed because inclusion of additional CaCo2 cells (n=146) in the quantification of NAD(P)H FLIM data showed no bimodality in this cell culture.

      (6) In the legend for Figure 2C, the authors mention that "the bimodality index (BI-a1) is shown above each box"; however, I do not see such values. It is also true for Figure 3C.

      The legends for Fig. 2 and 3 were corrected.

      (7) In Figure 2, t1-t3 were not explained and mentioned in the main text. What do they mean? Do they mean different time points or different tumors?

      t1-t3 denote different tumors in a group. The figure has been changed: individual tumors are now indicated by numbers.

      (8) In Figure 3, what do p13, p15 and p16 mean? It is not clearly explained. If they just represent patients numbered 13, 15, and 16, then why are these patients chosen as representatives? Do they represent different stages or are they just chosen randomly?

      Figure 3 was revised. Representative images were changed and a short description for each representative sample was included. In the revised version, representatives have been selected to show different stages and grades.

      (9) In Figure 3, instead of showing the results for each patient, I would suggest that authors show representative results from tumors at different stages; or, at least, clearly indicate the specific information for each patient. I do not think that providing the patient number only without any patient-specific information is helpful.

      Figure 3 was revised.

      (10) The sample number (21 patients) is very limited. I wonder how the limited patient number could lead to reliable diagnosis and prognosis.

      Additional eight samples were added. The text, figures and tables were revised accordingly.

      (11) In Discussion, it would be helpful to compare the BI index used in this study with the previously developed OMI-index (Line 275).

      We believe that the BI index and the OMI index describe different things and, therefore, it is hard to compare them. While the BI index is used to describe the degree of metabolic heterogeneity, the OMI index is an integral parameter that includes the redox ratio and the mean fluorescence lifetimes of NAD(P)H and FAD, and rather indicates the metabolic state of a cell. In this sense it is more relevant to compare it with the conventional redox ratio or the Fluorescence Lifetime Redox Ratio (FLIRR) (H. Wallrabe et al., Segmented cell analyses to measure redox states of autofluorescent NAD(P)H, FAD & Trp in cancer cells by FLIM, Sci. Rep. 2018; 8: 79). The assessment of the heterogeneity of the FLIM parameters has been previously reported using the weighted heterogeneity (wH) index (Amy T. Shah et al., In Vivo Autofluorescence Imaging of Tumor Heterogeneity in Response to Treatment, Neoplasia 17, pp. 862–870 (2015)). To the best of our knowledge, this is the only metric to quantify metabolic heterogeneity on the basis of FLIM data to date. A comparison of BI with the wH-index showed that the wH-index provides results similar to BI in the heterogeneity evaluation, as demonstrated in our earlier paper (E.A. Shirshin et al., Label-free sensing of cells with fluorescence lifetime imaging: The quest for metabolic heterogeneity, PNAS 119 (9) e2118241119 (2022)). Yet, the BI provides a dimensionless estimate of the inherent heterogeneity of a sample, and therefore it can be used to compare heterogeneity assessed by different decay parameters and FLIM data analysis methods. The limitation of using the OMI index for FLIM data analysis is the low intensity of the FAD signal, which was the case in our experiments.

    1. Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markov coalescent model that allows multiple types of polymorphism data to be analyzed simultaneously. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes.

      Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. (See also major comment #1 below about the interpretation of these plots.) A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process.

    2. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markov coalescent model that allows multiple types of polymorphism data to be analyzed simultaneously. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes. Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. (See also major comment #1 below about the interpretation of these plots.) A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

      Major comments:

      - For all of the simulated demographic inference results, only plots are presented. This allows for qualitative but not quantitative comparisons to be made across different methods. It is not easy to tell which result is actually better. For example, in Supp. Fig. 5, eSMC2 seems slightly better in the ancient past, and times the trough more effectively, while SMCm seems a bit better in the very recent past. For a more rigorous approach, it would be useful to have accompanying tables that measure e.g. mean-squared error (along with confidence intervals) for each of the different scenarios, similar to what is already done in Tables 1 and 2 for estimating $r$.

      We believe this comment was addressed in the previous revision (Supp. Tables 6-10) by adding root mean square errors (RMSE) for the demographic estimates (as well as RMSE for the recent versus past portions of the demography).
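
      As an illustration only, the kind of error summary discussed in this exchange can be sketched as follows. This is not the authors' code: the function names, the optional log10 scaling, the `cutoff` that separates "recent" from "past", and the example values are all assumptions made for the sketch.

```python
import math


def rmse(true_values, estimated_values, log10_scale=False):
    """Root mean square error between two equal-length sequences."""
    if len(true_values) != len(estimated_values):
        raise ValueError("sequences must have the same length")
    if len(true_values) == 0:
        raise ValueError("sequences must not be empty")
    if log10_scale:
        # Compare orders of magnitude rather than raw values.
        true_values = [math.log10(v) for v in true_values]
        estimated_values = [math.log10(v) for v in estimated_values]
    squared_errors = [(t - e) ** 2 for t, e in zip(true_values, estimated_values)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_recent_and_past(times, true_sizes, estimated_sizes, cutoff, log10_scale=False):
    """RMSE computed separately for time points before and after `cutoff`.

    `times` are measured backwards from the present, so times < cutoff
    are treated as "recent" and the remaining ones as "past".
    """
    triples = list(zip(times, true_sizes, estimated_sizes))
    recent = [(t, e) for x, t, e in triples if x < cutoff]
    past = [(t, e) for x, t, e in triples if x >= cutoff]
    recent_rmse = rmse([t for t, _ in recent], [e for _, e in recent], log10_scale)
    past_rmse = rmse([t for t, _ in past], [e for _, e in past], log10_scale)
    return recent_rmse, past_rmse


# Hypothetical example values (not taken from the manuscript).
times = [100, 1_000, 10_000, 100_000]
true_sizes = [10_000, 10_000, 1_000, 10_000]
estimated_sizes = [10_000, 100_000, 1_000, 10_000]
print(rmse(true_sizes, estimated_sizes, log10_scale=True))
print(rmse_recent_and_past(times, true_sizes, estimated_sizes, cutoff=5_000, log10_scale=True))
```

      Reporting one number per scenario and per time window in this way makes it possible to compare methods quantitatively rather than by eye from the plots.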

      - 434: The discussion downplays the really odd result that inputting the true value of the mutation rate, in some cases, produces much worse estimates than when they are learned from data (SFig. 6)! I can't think of any reason why this should happen other than some sort of mathematical error or software bug. I strongly encourage the authors to pin down the cause of this puzzling behaviour. (Comment addressed in revision. Still, I find the explanation added at 449ff to be somewhat puzzling -- shouldn't the results of the regional HMM scan only improve if the true mutation rate is given?)

      We do understand that our results and explanation can appear counter-intuitive. As acknowledged by the reviewer, in the previous round of revision we explained at length that this puzzling behaviour arises from a discrepancy: the methylation regions are assessed using an HMM that differs from the HMM used for the SMC inference. We are happy to clarify further in response to the new question of Reviewer #1:

      If Reviewer #1 means the SNP mutations (e.g. A → T), knowing the true mutation rate does not help the HMM to recover the region-level methylation status.

      If Reviewer #1 means the epimutations (whether at the region level, the site level, or both), knowing the true epimutation rates could theoretically help the HMM to recover the region-level methylation status. However, at present, our method does not leverage information from epimutation rates to infer the region-level methylation status. Since inferring the epimutation rates is one of the goals of the SMC inference in this study, and since the region-level methylation status is required to infer those rates, we suspect that using epimutation rates to infer the region-level methylation status could be statistically inappropriate (generating a kind of circular estimation). Instead, our HMM uses only the proportions of methylated and unmethylated sites (estimated from the genome) to determine whether a region is most likely to be methylated or unmethylated. We now state this explicitly in the description of the HMM for methylation regions in the Methods section.

      We acknowledge that our HMM for inferring region-level methylation status could be improved, but this would be a complete project and study on its own (due to the underlying complexity of the finite site and the lack of a consensus model for epimutations at evolutionary time scales). We believe our HMM was the best compromise between what was known about methylation and our goals when the study was conducted, and future work on the estimation of methylation regions is definitely worth conducting.

      - As noted at 580, all of the added power from integrating SMPs/DMRs should come from improved estimation of recent TMRCAs. So, another way to study how much improvement there is would be to look at the true vs. estimated/posterior TMRCAs. Although I agree that demographic inference is ultimately the most relevant task, comparing TMRCA inference would eliminate other sources of differences between the methods (different optimization schemes, algorithmic/numerical quirks, and so forth). This could be a useful addition, and may also give you more insight into why the augmented SMC methods do worse in some cases. (Comment addressed in revision via Supp. Table 7.)

      - A general remark on the derivations in Section 2 of the supplement: I checked these formulas as best I could. But a cleaner, less tedious way of calculating these probabilities would be to express the mutation processes as continuous time Markov chains. Then all that is needed is to specify the rate matrices; computing the emission probabilities needed for the SMC methods reduces to manipulating the results of some matrix exponentials. In fact, because the processes are noninteracting, the rate matrix decomposes into a Kronecker sum of the individual rate matrices for each process, which is very easy to code up. And this structure can be exploited when computing the matrix exponential, if speed is an issue.

      We believe this comment was acknowledged in the previous revision (line 649), and we thank the reviewer for this interesting insight.
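
      The reviewer's suggestion can be illustrated with a small, self-contained sketch. It is not the method used in the manuscript: the two-state chains, the rate values, and all helper names are assumptions chosen for illustration. The sketch builds the joint rate matrix of two non-interacting processes as a Kronecker sum and checks numerically that its matrix exponential equals the Kronecker product of the two individual transition matrices.

```python
import math


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def scale(a, c):
    return [[c * x for x in row] for row in a]


def kron(a, b):
    """Kronecker product; joint state (i, k) maps to index i * len(b) + k."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def kron_sum(a, b):
    """Kronecker sum: A (+) B = A (x) I + I (x) B."""
    return add(kron(a, identity(len(b))), kron(identity(len(a)), b))


def expm(m, terms=20):
    """Matrix exponential via a Taylor series with scaling and squaring."""
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = 0
    while norm > 0.5:
        norm /= 2.0
        squarings += 1
    scaled = scale(m, 1.0 / 2 ** squarings)
    result = identity(len(m))
    term = identity(len(m))
    for k in range(1, terms + 1):
        term = scale(matmul(term, scaled), 1.0 / k)
        result = add(result, term)
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def two_state_rate_matrix(forward, backward):
    """Rate matrix of a two-state chain (state 0 -> 1 at `forward`, 1 -> 0 at `backward`)."""
    return [[-forward, forward], [backward, -backward]]


def two_state_transition(forward, backward, t):
    """Closed-form transition probabilities of the two-state chain after time t."""
    total = forward + backward
    decay = math.exp(-total * t)
    return [[(backward + forward * decay) / total, (forward - forward * decay) / total],
            [(backward - backward * decay) / total, (forward + backward * decay) / total]]


# Hypothetical rates for two independent processes and a branch length t.
t = 0.7
joint_rate = kron_sum(two_state_rate_matrix(0.3, 0.1), two_state_rate_matrix(2.0, 1.5))
joint_transition = expm(scale(joint_rate, t))
product = kron(two_state_transition(0.3, 0.1, t), two_state_transition(2.0, 1.5, t))
max_gap = max(abs(x - y) for row_j, row_p in zip(joint_transition, product)
              for x, y in zip(row_j, row_p))
print(max_gap)
```

      Because the exponential of a Kronecker sum factorizes in this way, the joint transition probabilities never require exponentiating the full joint matrix, which is the speed advantage the reviewer alludes to.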

      - Most (all?) of the SNP-only SMC methods allow for binning together consecutive observations to cut down on computation time. I did not see binning mentioned anywhere, did you consider it? If the method really processes every site, how long does it take to run?

      We believe this comment was addressed in the previous revision and was added to the manuscript in the Methods section (subsection: SMC optimization function).

      - 486: The assumed site and region (de)methylation rates listed here are several OOM different from what your method estimated (Supp. Tables 5-6). Yet, on simulated data your method is usually correct to within an order of magnitude (Supp. Table 4). How are we to interpret this much larger difference between the published estimates and yours? If the published estimates are not reliable, doesn't that call into question your interpretation of the blue line in Fig. 7 at 533? (Comment addressed in revision.)

      Reviewer #2 (Public Review):

      A limitation in using SNPs to understand recent histories of genomes is their low mutation frequency. Tellier et al. explore the possibility of adding hypermutable markers to SNP-based methods for better resolution over short time frames. In particular, they hypothesize that epimutations (CG methylation and demethylation) could provide a useful marker for this purpose. Individual CGs in Arabidopsis tend to be either close to 100% methylated or close to 0%, and are inherited stably enough across generations that they can be treated as genetic markers. Small regions containing multiple CGs can also be treated as genetic markers based on their cumulative methylation level. In this manuscript, Tellier et al. develop computational methods to use CG methylation as a hypermutable genetic marker and test them on theoretical and real data sets. They do this both for individual CGs and small regions. My review is limited to the simple question of whether using CG methylation for this purpose makes sense at a conceptual level, not at the level of evaluating specific details of the methods. I have a small concern in that it is not clear that CG methylation measurements are nearly as binary in other plants and other eukaryotes as they are in Arabidopsis. However, I see no reason why this work is not conceptually sound. Especially in the future, as new sequencing technologies provide both base calling and methylation calling capabilities, using CG methylation in addition to SNPs could become a useful and feasible tool for population genetics in situations where SNPs are insufficient.

      We again thank Reviewer #2 for the positive comments.

      Reviewer #3 (Public Review):

      I very much like this approach and the idea of incorporating hypervariable markers. The method is intriguing, and the ability to e.g. estimate recombination rates, the size of DMRs, etc. is a really nice plus. I am not able to comment on the details of the statistical inference, but from what I can evaluate it seems reasonable and in principle the inclusion of highly mutable sites is a nice advance. This is an exciting new avenue for thinking about inference from genomic data. I remain a bit concerned about how well this will work in systems where much less is understood about methylation.

      The authors include some good caveats about applying this approach to other systems, but I think it would be helpful to empiricists outside of thaliana or perhaps mammalian systems to be given some indication of what to watch out for. In maize, for example, there is a nonbimodal distribution of CG methylation (35% of sites are greater than 10% and less than 90%) but this may well be due to mapping issues. The authors solve many of the issues I had concerns with by using gene body methylation, but this is only briefly mentioned on line 659. I'm assuming the authors' hope is that this method will be widely used, and I think it worth providing some guidance to workers who might do so but who are not as familiar with these kinds of data.

      We thank Reviewer #3 for the positive comments. We agree with Reviewer #3 concerning the application to data: our approach needs to be carefully thought through before it is applied. Our results clearly show that methylation processes are not yet understood well enough to apply our approach as we initially (maybe naively) designed it. Further investigations need to be conducted and appropriate theoretical models need to be developed before reliable results can be obtained, and we hope that our discussion points this out. However, our approach, the theoretical models and the additional tools contained in this study can help researchers investigate whether or not to use different genomic markers to build a common (potentially more reliable) ancestral history. In this second revision we enhanced the discussion, also clarifying the use of methylation from genic regions to avoid confusion (lines 700-731).

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      In added Supp. Table 7, I don't think these are in log10 units as stated in the caption.

      Well spotted! Indeed, the RMSE is not on a log10 scale, and we have corrected the caption. We also added that the TMRCA used for the RMSE calculations is in units of generations, to avoid potential confusion.

      Reviewer #3 (Recommendations for The Authors):

      I very much appreciate the authors' attention to previous questions. I would ask that a bit more is spent in the discussion on concerns/approaches empiricists should keep in mind -- I am wary of this being uncritically applied to data from non-model species. It was not clear to me, for example (only mentioned on line 659 in the discussion) that the thaliana data is only using gene-body methylation. This poses potential issues with background selection that the authors acknowledge appropriately, but also assuages many of my concerns about using genome-wide data. I think text with recommendations for data/filtering/etc or at least cautions of assumptions empiricists should be aware of would help.

      We apologize for the confusion at line 659. As written in the other section of the manuscript we meant CG sites in genic regions (and not only gene body methylated regions).

      Due to the manuscript’s structure, the data from Arabidopsis thaliana are only described at the very end of the manuscript (line 900+). However, a brief description can also be found at lines 291-296. We have also added a sentence in the introduction (line 128) for clarity.

      We do, however, agree with the comment made by Reviewer #3 concerning the application to data. In the discussion we pointed out the risk of applying our approach to ill-understood (or ill-prepared) data and stressed the current need for studies of epimutation processes at evolutionary time scales (i.e. at the Ne time scale) (lines 700-703).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The development of effective computational methods for protein-ligand binding remains an outstanding challenge to the field of drug design. This impressive computational study combines a variety of structure prediction (AlphaFold2) and sampling (RAVE) tools to generate holo-like protein structures of three kinases (DDR1, Abl1, and Src kinases) for binding to type I and type II inhibitors. Of central importance to the work is the conformational state of the Asp-Phe-Gly "DFG motif" where the Asp points inward (DFG-in) in the active state and outward (DFG-out) in the inactive state. The kinases bind to type I or type II inhibitors when in the DFG-in or DFG-out states, respectively.

      It is noted that while AlphaFold2 can be effective in generating ligand-free apo protein structures, it is ineffective at generating holo-structures appropriate for ligand binding. Starting from the native apo structure, structural fluctuations are necessary to access holo-like structures appropriate for ligand binding. A variety of methods, including reduced multiple sequence alignment (rMSA), AF2-cluster, and AlphaFlow may be used to create decoy structures. However, those methods can be limited in the diversity of structures generated and lack a physics-based analysis of Boltzmann weight critical to their relative evaluation.

      To address this need, the authors combine AlphaFold2 with the Reweighted Autoencoded Variational Bayes for Enhanced Sampling (RAVE) method, to explore metastable states and create a Boltzmann ranking. With that variety of structures in hand, grid-based docking methods Glide and Induced-Fit Docking (IFD) were used to generate protein-ligand (kinase-inhibitor) complexes.

      The authors demonstrate that using AlphaFold2 alone, there is a failure to generate DFG-out structures needed for binding to type II inhibitors. By applying the AlphaFold2 with rMSA followed by RAVE (using short MD trajectories, SPIB-based collective variable analysis, and enhanced sampling using umbrella sampling), metastable DFG-out structures with Boltzmann weighting are generated enabling protein-ligand binding. Moreover, the authors found that the successful sampling of DFG-out states for one kinase (DDR1) could be used to model similar states for other proteins (Abl1 and Src kinase). The AF2RAVE approach is shown to result in a set of holo-like protein structures with a 50% rate of docking type II inhibitors.

      Overall, this is excellent work and a valuable contribution to the field that demonstrates the strengths and weaknesses of state-of-the-art computational methods for protein-ligand binding. The authors also suggest promising directions for future study, noting that potential enhancements in the workflow may result from the use of binding site prediction models and free energy perturbation calculations.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores the utility of AlphaFold2 (AF2) and the authors' own AF2-RAVE method for drug discovery. As has been observed elsewhere, the predictive power of docking against AF2 structures is quite limited, particularly for proteins like kinases that have non-trivial conformational dynamics. However, using enhanced sampling methods like RAVE to explore beyond AF2 starting structures leads to a significant improvement.

      Strengths:

      This is a nice demonstration of the utility of the authors' previously published RAVE method.

      Weaknesses:

      My only concern is the authors' discussion of induced fit. I'm quite confident the structures discussed are present in the absence of ligand binding, consistent with conformational selection. It seems the authors' own data also argue for an important role for conformational selection. It would be nice to acknowledge this instead of going along with the common practice in drug discovery of attributing any conformational changes to induced fit without thoughtful consideration of conformational selection.

      The reviewer is correct. We aim to highlight the significant role of conformational selection. To clarify this, we have expanded the discussion on conformational selection in the introduction.

      Reviewer #3 (Public Review):

      In this manuscript, the authors aim to enhance AlphaFold2 for protein conformation-selective drug discovery through the integration of AlphaFold2 and physics-based methods, focusing on improving the accuracy of predicting protein structures ensemble and small molecule binding of metastable protein conformations to facilitate targeted drug design.

      The major strength of the paper lies in the methodology, which includes the innovative integration of AlphaFold2 with all-atom enhanced sampling molecular dynamics and induced fit docking to produce protein ensembles with structural diversity. Moreover, the generated structures can be used as reliable crystal-like decoys to enrich metastable conformations of holo-like structures. The authors demonstrate the effectiveness of the proposed approach in producing metastable structures of three different protein kinases and perform docking with their type I and II inhibitors. The paper provides strong evidence supporting the potential impact of this technology in drug discovery. However, limitations may exist in the generalizability of the approach across other structures, especially complex structures such as protein-protein or DNA-protein complexes.

      Proteins undergo thermodynamic fluctuations and can occasionally reach metastable configurations. It can be assumed that other biomolecules, such as proteins and DNA, stabilize these metastable states when forming protein-protein or protein-DNA complexes. Since our method has the potential to identify these metastable states, it shows promise for designing drugs targeting proteins in allosteric configurations induced by other biomolecules.

      The authors largely achieved their aims by demonstrating that the AF2RAVE-Glide workflow can generate holo-like structure candidates with a 50% successful docking rate for known type II inhibitors. This work is likely to have a significant impact on the field by offering a more precise and efficient method for predicting protein structure ensemble, which is essential for designing targeted drugs. The utility of the integrated AF2RAVE-Glide approach may streamline the drug discovery process, potentially leading to the development of more effective and specific medications for various diseases.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions

      (1) The computational protocol is found to be insufficient to generate precise values of the relative free energies between structures generated. The authors note in the Conclusion that an enhancement in the workflow might result from the addition of free energy calculations. Can the authors comment on the prospects for generating more accurate estimates of the free energy that might be used to qualitatively evaluate poses and the free energy landscape surrounding putative metastable states? What are the principal challenges and what might help overcome them? What would the most effective computational protocol be?

      More accurate estimates of the free energy can theoretically be achieved by increasing the number of umbrella sampling windows and extending the simulation length until the PMF converges. However, there is always a trade-off between PMF accuracy and computational costs, so we have chosen to stick with the current setup. Metadynamics is another method to obtain a more accurate free energy profile, which we have used in previous versions of AlphaFold2-RAVE, but for the specific systems we investigated, it had issues in achieving back and forth movement given the high entropic nature of the activation loop. Research in enhanced sampling methods and dimensionality reduction techniques for reaction coordinates is continually evolving and will play a critical role in alleviating this problem.
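
      To make the link between a converged PMF and the relative populations of metastable states concrete, here is a minimal sketch. It is not the authors' protocol: the function names, the uniformly spaced one-dimensional collective-variable grid, the kcal/mol units, the 300 K temperature, and the example PMF values are all assumptions.

```python
import math

# Molar gas constant in kcal/(mol*K), used as the Boltzmann constant per mole.
KB_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(free_energies, temperature=300.0):
    """Normalized Boltzmann weights for free energies given in kcal/mol."""
    kt = KB_KCAL_PER_MOL_K * temperature
    lowest = min(free_energies)  # shift by the minimum for numerical stability
    unnormalized = [math.exp(-(f - lowest) / kt) for f in free_energies]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def basin_free_energy_difference(cv_grid, pmf, boundary, temperature=300.0):
    """Free energy of the basin with cv >= boundary relative to the basin with cv < boundary.

    Assumes `pmf` is tabulated on a uniformly spaced grid `cv_grid`, so that
    summing the weights of the grid points approximates integrating over each basin.
    """
    weights = boltzmann_weights(pmf, temperature)
    below = sum(w for x, w in zip(cv_grid, weights) if x < boundary)
    above = sum(w for x, w in zip(cv_grid, weights) if x >= boundary)
    if below == 0.0 or above == 0.0:
        raise ValueError("boundary must leave grid points on both sides")
    return -KB_KCAL_PER_MOL_K * temperature * math.log(above / below)


# Hypothetical PMF values (kcal/mol) along a one-dimensional collective variable.
cv_grid = [0.0, 1.0, 2.0, 3.0]
pmf = [0.0, 2.0, 2.0, 1.0]
print(boltzmann_weights(pmf))
print(basin_free_energy_difference(cv_grid, pmf, boundary=1.5))
```

      Any error in the PMF propagates exponentially into these weights, which is one reason why adding windows or extending the sampling until the PMF converges matters for ranking states.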

      (2) I was surprised that there was not more correlation of a funnel-like shape in Figures S16 and S18, showing a stronger correlation between low RMSD and better docking score. This is true for both the ponatinib and imatinib applications in DDR1 and Abl1. That also seems true for the trimmed results for Src kinase in Figure S19. I was also surprised that there are structures with very large RMSD but docking scores comparable to the best structures of the lowest RMSD. Might something be done to make the docking score a more effective discriminator?

      The docking algorithm and docking score are used to filter out highly improbable docking poses. False positives in predicted docking poses are a common issue across all docking methods as described for instance in:

      Fan, Jiyu, Ailing Fu, and Le Zhang. "Progress in molecular docking." Quantitative Biology 7 (2019): 83-89.

      Ferreira, R.S., Simeonov, A., Jadhav, A., Eidam, O., Mott, B.T., Keiser, M.J., McKerrow, J.H., Maloney, D.J., Irwin, J.J. and Shoichet, B.K., 2010. "Complementarity between a docking and a high-throughput screen in discovering new cruzain inhibitors." Journal of medicinal chemistry, 53(13), pp.4891-4905.

      Moreover, there is always a trade-off between docking accuracy and computational cost. While employing more accurate docking methods may decrease false positives, it can also be resource-intensive. In such scenarios, our approach to enriching holo-structures can be impactful by reducing the number of pocket structures in the input ensembles and significantly enhancing docking efficiency.

      (3) I think that it is fine to identify one structure as "IFD winner" but also feel that its significance is overstressed, especially given that it can be identified only in a retrospective analysis rather than through de novo prediction.

      We agree with the reviewer. We did not intend to emphasize the specific structure "IFD winner". Rather, we aimed to demonstrate that our method can enrich promising candidates for holo-structures. We verified this by showing that our holo-structure candidates performed well in retrospective docking using IFD, which we previously referred to as "IFD winner". We have now revised this term to "holo-model".

      Minor Points

      p. 3 "DymanicBind" should be "DynamicBind"

      p. 3 Change "We chosen" to "We have chosen" or "we chose."

      p. 3 In identifying the Schrödinger software Glide and IFD, I recommend removing the subjective modifier "industry-leading."

      Modifications done.

      Reviewer #2 (Recommendations For The Authors):

      In the view of this reviewer, the writing is 'choppy'.

      We have tried to improve the writing.

      Reviewer #3 (Recommendations For The Authors):

      (1) In Figure 1, the workflow labels (i) to (iv) are not shown on the figures, making it difficult for readers to follow. Consider adding these labels to the figures.

      Modifications done.

      (2) Explain how Boltzmann ranks were calculated based on unbiased MD simulations to guide the enrichment of holo-like structures in metastable states.

      The Methods section is now updated for clarification.

      (3) The authors could clarify how the classical DFG-out decoys in the DDR1 rMSA AF2 ensemble are transferred to Abl1 kinase in the Methods section.

      The Methods section is now updated for clarification.

      (4) The authors can clarify the methodology section by providing more detailed explanations about how the unbiased MD simulations are performed, including which MD simulation software was used and whether energy minimization and equilibrium steps were needed as in conventional MD simulations, and other setup details.

      The Methods section is now updated for clarification.

      (5) The validation of the proposed approach in this work used three kinase proteins. The authors can enhance the discussion section by addressing other types of protein structure prediction that can use the proposed approach in drug discovery, beyond the three kinase proteins tested.

      The proposed approach is theoretically applicable to other types of proteins, such as GPCRs, where both conformational selection and the induced-fit effect are crucial. We have expanded the discussion on the generalization of our protocol in the Conclusion section.

      (6) The authors should add appropriate citations for the software and tools used in the manuscript. For example, a reference should be added for the Glide XP docking experiments that utilized the Maestro software. Double-check all related software citations.

      We have now updated the citations for docking experiments based on the instruction of the Maestro Glide User manual and IFD User manual.

      (7) The authors should consider offering a comprehensive list of software tools and databases utilized in the study to assist in replicating the experiments and further validating the results.

      We have now added a summary of tools used in the Methods section.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The authors present evidence suggesting that MDA5 can substitute as a sensor for triphosphate RNA in a species that naturally lacks RIG-I. The key findings are potentially important for our understanding of the evolution of innate immune responses. Compared to an earlier version of the paper, the strength of evidence has improved but it is still partially incomplete due to a few key missing experiments and controls.

      We would like to thank the editorial team for their positive comments and constructive suggestions on improving our manuscript. We have made further improvements based on the valuable suggestions of the reviewers, and we are pleased to send you the revised manuscript now. After revising the manuscript and further supplementing with experiments, we think that our existing data can support our claims.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study offers valuable insights into host-virus interactions, emphasizing the adaptability of the immune system. Readers should recognize the significance of MDA5 in potentially replacing RIG-I and the adversarial strategy employed by 5'ppp-RNA SCRV in degrading MDA5 mediated by m6A modification in different species, further indicating that m6A is a conserved process in the antiviral immune response.

      However, caution is warranted in extrapolating these findings universally, given the dynamic nature of host-virus interactions. The study provides a snapshot into the complexity of these interactions, but further research is needed to validate and extend these insights, considering potential variations across viral species and environmental contexts. Additionally, it is noted that the main claims put forth in the manuscript are only partially supported by the data presented.

      After meticulous revisions of the manuscript, including adjustments to the title, abstract, results, and discussion, the main claim of our study is now the arms race between the MDA5 receptor and the SCRV virus in a lower vertebrate fish, M. miiuy. This comprises two parts. Firstly, the MDA5 of M. miiuy can recognize virus invasion and initiate the host immune response by recognizing the triphosphate structure of SCRV. Secondly, as an adversarial strategy, the 5’ppp-RNA SCRV virus can utilize the m6A mechanism to degrade MDA5 in M. miiuy. Based on the reviewer's suggestions, we have further supplemented the critical experiments (Figure 3F-3G, Figure 4D, Figure 5G) and provided a more detailed and accurate explanation of the experimental conclusions. We believe that our existing manuscript can support our main claims. In addition, because virus-host coevolution complicates the derivation of universal conclusions, we will further expand our insights in future research.

      Reviewer #2 (Public Review):

      This manuscript by Geng et al. aims to demonstrate that MDA5 compensates for the loss of RIG-I in certain species, such as the teleost fish miiuy croaker. The authors use Siniperca chuatsi rhabdovirus (SCRV) and poly(I:C) to demonstrate that these RNA ligands induce an IFN response in an MDA5-dependent manner in M. miiuy-derived cells. Furthermore, they show that MDA5 requires its RD domain to directly bind to SCRV RNA and to induce an IFN response. They use in vitro synthesized RNA with a 5'triphosphate (or lacking a 5'triphosphate as a control) to demonstrate that MDA5 can directly bind to 5'-triphosphorylated RNA. The second part of the paper is devoted to m6A modification of MDA5 transcripts by SCRV as an immune evasion strategy. The authors demonstrate that the modification of MDA5 with m6A is increased upon infection and that this causes increased decay of MDA5 and consequently a decreased IFN response.

      One critical caveat in this study is that it does not address whether ppp-SCRV RNA induces IRF3-dimerization and type I IFN induction in an MDA5 dependent manner. The data demonstrate that mmiMDA5 can bind to triphosphorylated RNA (Fig. 4D). In addition, triphosphorylated RNA can dimerize IRF3 (4C). However, a key experiment that ties these two observations together is missing.

      Specifically, although Fig. 4C demonstrates that 5'ppp-SCRV RNA induces dimerization (unlike its dephosphorylated or capped derivatives), this does not proof that this happens in an MDA5-dependent manner. This experiment should have been done in WT and siMDA5 MKC cells side-by-side to demonstrate that the IRF3 dimerization that is observed here is mediated by MDA5 and not by another (unknown) protein. The same holds true for Fig. 4J.

      We thank the referee for these professional suggestions. We had in fact transfected SCRV RNA into WT and si-MDA5 MKC cells and subsequently assessed the dimerization of IRF3 and the IFN response (Figure 2P-2Q). The results indicated that knockdown of MDA5 prevents immune activation by SCRV RNA. However, because SCRV RNA may activate immunity independently of the triphosphate structure, this observation alone does not establish that the induction of the IRF3 dimer by 5’ppp-RNA is MDA5-dependent. Following the referee's recommendation, we therefore investigated the ability of 5'ppp-SCRV to induce IRF3 dimerization in WT and si-MDA5 MKC cells, which revealed that 5'ppp-SCRV indeed elicits immunity in an MDA5-dependent manner (Figure 4D). Additionally, poly(I:C)-HMW, a known ligand for MDA5, showed a residual, albeit attenuated, activation of IRF3 following MDA5 knockdown, potentially because it can stimulate immunity through alternative pathways such as TLR3.

      - Fig 1C-D: these experiments are not sufficiently convincing, i.e. the difference in IRF3 dimerization between VSV-RNA and VSV-RNA+CIAP transfection is minimal.

      We have reconstituted the necessary materials and repeated the pertinent experiments depicted in Fig 1C-1D. The results demonstrate that SCRV-RNA+CIAP and VSV-RNA+CIAP exhibit a mitigating effect on the induction activity of SCRV-RNA and VSV-RNA on IRF3 dimerization, albeit without complete elimination (Figure 1C and 1D). These findings suggest the presence of receptors within M. miiuy and G. gallus capable of recognizing the viral triphosphate structure; however, it is worth noting that RNA derived from SCRV and VSV viruses does not exclusively depend on the triphosphate structure to activate the host's antiviral response.

      Fig. 2N and 2O: why did the authors decide to use overexpression of MDA5 to assess the impact of STING on MDA5-mediated IFN induction? This should have been done in cells transfected with SCRV or polyIC (as in 2D-G) or in infected cells (as in 2H-K). In addition, it is a pity that the authors did not include an siMAVS condition alongside siSTING, to investigate the relative contribution of MAVS versus STING to the MDA5-mediated IFN response. Panel O suggests that the IFN response is completely dependent on STING, which is hard to envision.

      In our previous laboratory investigations, we have substantiated the induction effect of STING on IFN under SCRV infection or poly(I:C) stimulation, as documented in the relevant literature (10.1007/s11427-020-1789-5), which we have referenced in our manuscript (lines 177-178). While we did assess the impact of STING on MDA5-mediated IFN induction in SCRV-infected cells, as indicated in the figure legends, we have revised Figure 2N-2O for improved clarity, and similarly, Figure 1H-1I has also been updated. Furthermore, considering that RNA virus infection can activate the cGAS/STING axis (10.3389/fcimb.2023.1172739) and the significant role of MAVS in sensing RNA virus invasion in the NLR pathway (10.1038/ni.1782), it is challenging to ascertain the respective contributions of STING and MAVS to the immune signaling cascade mediated by MDA5 during RNA virus infection. We intend to explore this aspect further in future research endeavors.

      Fig. 3F and 3G: where are the mock-transfected/infected conditions? Given that ectopic expression of hMDA5 is known to cause autoactivation of the IFN pathway, the baseline ISG levels should be shown (i.e., in the absence of a stimulus or infection). Normalization of the data does not reveal whether this is the case and is therefore misleading.

      Based on the reviewer's suggestions, we have rerun the experiment. We examined the effects of MDA5 and MDA5-ΔRD on antiviral factors in uninfected, SCRV-infected, and poly(I:C)-HMW-stimulated MKC cells. The results showed that overexpression of either MDA5 or MDA5-ΔRD stimulated the expression of antiviral genes. However, when cells were infected with SCRV or stimulated with poly(I:C)-HMW, only the overexpression of MDA5, not MDA5-ΔRD, significantly increased the expression of antiviral genes (Figure 3F-3I).

      Fig. 4F and 4G: can the authors please indicate in the figure which area of the gel is relevant here? The band that runs halfway down the gel? If so, the effects described in the text are not supported by the data (i.e. the 5'OH-SCRV and 5'pppGG-SCRV appear to compete with Bio-5'ppp-SCRV as well as 5'ppp-SCRV).

      Apologies for any confusion. The relevant areas in the gel pertaining to the experimental findings were denoted with asterisks and elaborated upon in the figure legends (Figure 4G, 4H, and 4M). The findings indicated that 5'ppp-SCRV, in contrast to 5'OH-SCRV and 5'pppGG-SCRV, demonstrated the ability to compete with bio-5'ppp-SCRV.

      My concerns about Fig. 5 remain unaltered. The fact that MDA5 is an ISG explains its increased expression and increased methylation pattern. The authors should at the very least mention in their text that MDA5 is an ISG and that their observations may be partially explained by this fact.

      Because our m6A change analysis pipeline controls for changes in gene expression, these data should represent true changes in m6A modification rather than changes in the expression of m6A-modified transcripts (10.1038/s41598-020-63355-3). Similar studies demonstrated that m6A modification of RIOK3 and CIRBP mRNAs is altered following Flaviviridae infection (10.1016/j.molcel.2019.11.007). The specific calculation method is as follows: the relative m6A level for each transcript was calculated as the percent of input in each condition, normalized to that of the respective positive control spike-in. Fold change of enrichment was calculated with mock samples normalized to 1. Therefore, changes in the expression level of MDA5 can partially explain the increase in total m6A modification across all MDA5 mRNA in cells, but they cannot account for changes in m6A modification per MDA5 transcript. We have added the calculation method to the manuscript and cited the relevant literature (Lines 606-608). In addition, we now state in the experimental results that MDA5 is an ISG (lines 260-261) and emphasize in the discussion section that this is compatible with enhanced m6A modification of MDA5 (lines 405-409).
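
      As a minimal sketch of the normalization described above, the calculation could be written as follows. The function names and all numbers are hypothetical and chosen only for illustration; the sketch also assumes linear-scale signal values rather than raw qPCR Ct values, which the response does not specify.

```python
def percent_of_input(ip_signal: float, input_signal: float) -> float:
    """Percent of the input signal recovered in the m6A immunoprecipitation."""
    return 100.0 * ip_signal / input_signal


def relative_m6a(target_pct: float, spike_in_pct: float) -> float:
    """Target percent-of-input normalized to the positive-control spike-in."""
    return target_pct / spike_in_pct


def fold_change_vs_mock(treated_rel: float, mock_rel: float) -> float:
    """Enrichment fold change with the mock sample normalized to 1."""
    return treated_rel / mock_rel


# Hypothetical signals (not data from the study). The transcript is three
# times more abundant in the infected input (300 vs. 100), yet only the
# change in the methylated fraction enters the final value.
mock = relative_m6a(percent_of_input(2.0, 100.0), percent_of_input(10.0, 100.0))
infected = relative_m6a(percent_of_input(12.0, 300.0), percent_of_input(30.0, 300.0))

fold_change = fold_change_vs_mock(infected, mock)
print(fold_change)
```

      Because each condition is divided by its own input, a rise in transcript abundance alone leaves the ratio unchanged; in this toy example only the doubling of the immunoprecipitated fraction is reported.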

      Reviewer #3 (Public Review):

      In this manuscript, the authors explored the interaction between the pattern recognition receptor MDA5 and 5'ppp-RNA in the Miiuy croaker. They found that MDA5 can serve as a substitute for RIG-I in detecting 5'ppp-RNA of Siniperca chuatsi rhabdovirus (SCRV) when RIG-I is absent in Miiuy croaker. Furthermore, they observed MDA5's recognition of 5'ppp-RNA in chickens (Gallus gallus), a species lacking RIG-I. Additionally, the authors documented that MDA5's functionality can be compromised by m6A-mediated methylation and degradation of MDA5 mRNA, orchestrated by the METTL3/14-YTHDF2/3 regulatory network in Miiuy croaker during SCRV infection. This impairment compromises the innate antiviral immunity of fish, facilitating SCRV's immune evasion. These findings offer valuable insights into the adaptation and functional diversity of innate antiviral mechanisms in vertebrates.

      We extend our sincere appreciation for your professional comments and insightful suggestions on our manuscript, as they have significantly contributed to enhancing its quality.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The interpretation of Figures 1H and I, along with the captions, seems unclear. Particularly, understanding the meaning of the X-axis in Figure I is challenging. Additionally, the designation of "H2O = 1" on the Y-axis in Figure 1E lacks clarity. It would be helpful if the author could revise and clarify these figures for better comprehension.

      We appreciate your reminder and have corrected and clarified these figures and figure legends (lines 768-772). We have relabeled the Y-axis of Figure 1I as "Relative mRNA expression" instead of "Relative IFN-1 expression" (Figure 1I). In addition, we have added an explanation of "H2O=1" in the legend of Figure 1E.

      (2) The interpretation of Figure 5 in section 2.5 seems incomplete. The author mentioned that both m6A levels and MDA5 expression levels are increased (lines 256-257), prompting questions about the relationship between m6A and MDA5 expression. If higher m6A levels typically lead to MDA5 mRNA instability and lower MDA5 expression, observing both increasing simultaneously appears contradictory. Considering the dynamic changes shown in Figure 5, it would be more appropriate to propose an alteration in both m6A levels and MDA5 expression levels. Given the fluctuating nature of these changes, definitively labeling them as solely "increased" is challenging. Therefore, offering a nuanced interpretation of the results and clarifying this aspect would bolster the study's conclusions.

      While changes in m6A modification and the expression of m6A-modified transcripts are biologically relevant, identifying bona fide m6A alterations during viral infection will allow us to understand how m6A modification of cellular mRNA is regulated. Because our m6A change analysis pipeline controls for changes in gene expression, these data should represent true changes in m6A modification rather than changes in the expression of m6A-modified transcripts (10.1038/s41598-020-63355-3). Similar studies demonstrated that m6A modification of RIOK3 and CIRBP mRNAs is altered following Flaviviridae infection (10.1016/j.molcel.2019.11.007). The specific calculation method is as follows: the relative m6A level for each transcript was calculated as the percent of input in each condition, normalized to that of the respective positive control spike-in. Fold change of enrichment was calculated with mock samples normalized to 1. Therefore, the upregulation of MDA5 expression can partially explain the increase in total m6A modification across all MDA5 mRNA in cells, but it cannot account for changes in m6A modification per MDA5 transcript. We have added the calculation method to the manuscript and cited the relevant literature. We hope this addresses the reviewer's concern.

      In addition, although higher m6A levels often lead to unstable MDA5 mRNA and lower MDA5 expression, SCRV can affect MDA5 expression through multiple pathways. For example, since MDA5 is an interferon-stimulated gene, SCRV infection can cause strong expression of interferon and thereby indirectly induce high-level expression of MDA5. Therefore, increased MDA5 expression does not contradict the simultaneous increase in MDA5 m6A modification (24 h). To further support our experimental conclusions, we supplemented the dual fluorescence experiment. The results indicate that SCRV infection can inhibit the fluorescence activity of MDA5-exon1 reporter plasmids that contain m6A sites but lack the promoter sequence of the MDA5 gene, and that this inhibitory effect can be counteracted by cycloleucine (CL, an amino acid analogue that can inhibit m6A modification) (Figure 5G). This further indicates that SCRV can reduce the expression of MDA5 through the m6A pathway.

      Finally, in light of the fluctuations in MDA5 expression levels, we have changed the subheading of Results section 2.5 and provided a more comprehensive and precise description of the experimental outcomes. We are grateful for your valuable feedback.

      (3) In the discussion section, it would indeed be advantageous for the author to explore the novelty of this work more comprehensively, moving beyond merely acknowledging the widespread loss of RIG-I and suggesting MDA5 as a compensatory mechanism. Considering the well-established roles of MDA5 and m6A in host-virus interactions, the findings of this study may seem familiar in light of previous research. To enhance the discussion, it would be valuable for the author to delve into the implications of this evolutionary model. For instance, does the compensation or loss of RIG-I impact a species' susceptibility to specific types of viruses? Exploring such questions would provide insight into the broader significance of this compensation model and its potential effects on host-virus interactions, thus adding depth to the study's contribution.

      We appreciate the expert advice provided by the referee. In response, we have expanded our discussion in the relevant section, addressing the potential influence of RIG-I deficiency and MDA5 compensation on the antiviral immune system in vertebrates (lines 371-376). Furthermore, we underscore the significance of exploring the impact of SCRV infection on MDA5 m6A modification, considering its compatibility with MDA5 as an ISG gene, in elucidating the host response to viral infection (lines 405-409).

      (4) To improve the manuscript, it would be beneficial if the editors could aid the author in refining the language. Many descriptions in the article are overly redundant, and there should be appropriate differentiation between experimental methods and results.

      We appreciate the reviewer’s comment. We have carefully revised the manuscript and removed redundant descriptions in the experimental results and methods.

      Reviewer #3 (Recommendations For The Authors):

      The authors have addressed all of my concerns.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer 1

      R1 Cell profiling is an emerging field with many applications in academia and industry. Finding better representations for heterogeneous cell populations is important and timely. However, unless convinced otherwise after a rebuttal/revision, the contribution of this paper, in our opinion, is mostly conceptual, but in its current form - not yet practical. This manuscript combined two concepts that were previously reported in the context of cell profiling, weakly supervised representations. Our expertise is in computational biology, and specifically applications of machine learning in microscopy.

      In our revised manuscript, we have aimed to better clarify the practical contributions of our work by demonstrating the effectiveness of the proposed concepts on real-world datasets. We hope that these revisions and our detailed responses address your concerns and highlight the potential impact of our approach.

      R1.1a. CytoSummaryNet is evaluated in comparison to aggregate-average profiling, although previous work has already reported representations that capture heterogeneity and self-supervision independently. To argue that both components of contrastive learning and sets representations are contributing to MoA prediction we believe that a separate evaluation for each component is required. Specifically, the authors can benchmark their previous work to directly evaluate a simpler population representation (PMID: 31064985, ref #13) - we are aware that the authors report a 20% improvement, but this was reported on a separate dataset. The authors can also compare to contrastive learning-based representations that rely on the aggregate (average) profile to assess and quantify the contribution of the sets representation.

      We agree that evaluating the individual contributions of the contrastive learning framework and single-cell data usage is important for understanding CytoSummaryNet's performance gains.

      To assess the impact of the contrastive formulation independently, we applied CytoSummaryNet to averaged profiles from the cpg0004 dataset. This isolated the effect of contrastive learning by eliminating single-cell heterogeneity. The experiment yielded a 32% relative improvement in mechanism of action retrieval, compared to the 68% gain achieved with single-cell data. These findings suggest that while the contrastive formulation contributes significantly to CytoSummaryNet's performance, leveraging single-cell information is crucial for maximizing its effectiveness. We have added a discussion of this experiment to the Results section:

      “We conducted an experiment to determine whether the improvements in mechanism of action retrieval were due solely to CytoSummaryNet's contrastive formulation or also influenced by the incorporation of single-cell data. We applied the CytoSummaryNet framework to pre-processed average profiles from the 10 μM dose point data of Batch 1 (cpg0004 dataset). This approach isolated the effect of the contrastive architecture by eliminating single-cell data variability. We adjusted the experimental setup by reducing the learning rate by a factor of 100, acknowledging the reduced task complexity. All other parameters remained as described in earlier experiments.

      This method yielded a less pronounced but still substantial improvement in mechanism of action retrieval, with an increase of 0.010 (32% enhancement - Table 1). However, this improvement was not as high as when the model processed single-cell level data (68% as noted above). These findings suggest that while CytoSummaryNet's contrastive formulation contributes to performance improvements, the integration of single-cell data plays a critical role in maximizing the efficacy of mechanism of action retrieval.”

      We don't believe comparing with PMID: 31064985 is useful: while the study showcased the usefulness of modeling heterogeneity using second-order statistics, its methodology is limited in scalability due to the computational burden of computing pairwise similarities for all perturbations, particularly in large datasets. Additionally, the study's reliance on similarity network fusion, while expedient, introduces complexity and inefficiency. We contend that this comparison does not align with our objective of testing the effectiveness of heterogeneity in isolation, as it primarily focuses on capturing second and first-order information. Thus, we do not consider this study a suitable baseline for comparison.

      R1.1b. The evaluation metric of mAP improvement in percentage is misleading, because a tiny improvement for a MoA prediction can lead to huge improvement in percentage, while a much larger improvement in MoA prediction can lead to a small improvement in percentage. For example, in Fig. 4, MEK inhibitor mAP improvement of ~0.35 is measured as ~50% improvement, while a much smaller mAP improvement can have the same effect near the origins (i.e., very poor MoA prediction).

      We agree that relying solely on percentage improvements can be misleading, especially when small absolute changes result in large percentage differences.

      However, we would like to clarify two key points regarding our reporting of percentage improvements:

      • We calculate the percentage improvement by first computing the average mAP across all compounds for both CytoSummaryNet and average profiling, and then comparing these averages. This approach is less susceptible to the influence of outlier improvements compared to calculating the average of individual compound percentage improvements.
      • We report percentage improvements alongside their corresponding absolute improvements. For example, the mAP improvement for Stain4 (test set) is reported as 0.052 (60%). To further clarify this point, we have updated the caption of Table 1 to explicitly state how the percentage improvements are calculated:

      The improvements are calculated as mAP(CytoSummaryNet)-mAP(average profiling). The percentage improvements are calculated as (mAP(CytoSummaryNet)-mAP(average profiling))/mAP(average profiling).
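
      To illustrate why the order of averaging matters, the following minimal sketch contrasts the calculation described above (compare the mean mAP of each method) with the alternative of averaging per-compound percentage gains. The function names and the two-compound scores are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def improvement_of_means(new_scores, baseline_scores):
    """Absolute and relative improvement computed from the mean mAP of each method."""
    new_mean = mean(new_scores)
    baseline_mean = mean(baseline_scores)
    absolute = new_mean - baseline_mean
    return absolute, absolute / baseline_mean


def mean_of_relative_improvements(new_scores, baseline_scores):
    """Alternative: average the per-compound relative improvements."""
    return mean((n - b) / b for n, b in zip(new_scores, baseline_scores))


# Hypothetical per-compound mAP values: one compound near the origin,
# one that is already well retrieved.
baseline = [0.01, 0.50]
model = [0.03, 0.55]

absolute_gain, relative_gain = improvement_of_means(model, baseline)
per_compound_gain = mean_of_relative_improvements(model, baseline)

print(f"absolute: {absolute_gain:.3f}, relative: {relative_gain:.1%}")
print(f"mean of per-compound relative gains: {per_compound_gain:.1%}")
```

      In this toy case the compound near the origin dominates the mean of per-compound gains, whereas the ratio of means stays close to the overall absolute change.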

      R1.1b. (Subjective) visual assessment of this figure does not show a convincing contribution of CytoSummaryNet representations of the average profiling on the test set (3.33 uM). This issue might also be relevant for the task of replicate retrieval. All in all, the mAP improvement reported in Table 1 and throughout the manuscript (including the Abstract), is not a proper evaluation metric for CytoSummaryNet contribution. We suggest reporting the following evaluations:

      1. Visualizing the results of cpg0001 (Figs. 1-3) similarly to cpg0004 (Fig. 4), i.e., plotting the matched mAP for CytoSummaryNet vs. average profile.

      2. In Table 1, we suggest referring to the change in the number of predictable MoAs (MoAs that pass a mAP threshold) rather than the improvement in percentages. Another option is showing a graph of the predictability, with the X axis representing a threshold and Y-axis showing the number of MoAs passing it. For example see (PMID: 36344834, Fig. 2B) and (PMID: 37031208, Fig. 2A), both papers included contributions from the corresponding author of this manuscript.

      Regarding the suggestion to visualize the results for the cpg0001 dataset similarly to cpg0004, unfortunately, this is not feasible due to the differences in data splitting between the two datasets. In cpg0001, an MoA might have one compound in the training set and another in the test or validation set. Reporting a single value per MoA would require combining these splits, which could be misleading as it would conflate performance across different data subsets.

      However, we appreciate the suggestion to represent the number of predictable MoAs that surpass a certain mAP threshold, as it provides another intuitive measure of performance. To address this, we have created a graph that visualizes the predictability of MoAs across various thresholds, similar to the examples provided in the referenced papers (PMID: 36344834, Figure 2B and PMID: 37031208, Figure 2A). This graph, with the x-axis depicting the threshold and the y-axis showing the number of MoAs meeting the criterion, has been added to Supplementary Material K.
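
      A predictability curve of this kind could be computed along the following lines. This is a sketch only: the MoA labels and scores are hypothetical, and counting scores greater than or equal to the threshold is an assumption about how ties are handled.

```python
def count_above_threshold(map_by_moa: dict[str, float], thresholds: list[float]) -> list[int]:
    """For each threshold, count the MoAs whose mAP reaches it."""
    return [
        sum(score >= threshold for score in map_by_moa.values())
        for threshold in thresholds
    ]


# Hypothetical per-MoA mAP values, for illustration only.
map_by_moa = {"MoA A": 0.05, "MoA B": 0.35, "MoA C": 0.60}
thresholds = [0.1, 0.3, 0.5]

counts = count_above_threshold(map_by_moa, thresholds)
print(list(zip(thresholds, counts)))
```

      Plotting the thresholds on the x-axis against these counts on the y-axis, once per method, gives the kind of comparison the reviewer describes.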

      R1.1c.i. "a subset of 18 compounds were designated as validation compounds" - 5 cross-validations of 18 compounds can make the evaluation complete. This can also enhance statistical power in figures 1-3.

      We appreciate your suggestion and acknowledge the potential benefits of employing cross-validation, particularly in enhancing statistical power. While we understand the merit of cross-validation for evaluating model performance and generalization to unseen data, we believe the results as presented already highlight the generalization characteristics of our methods.

      Specifically, (the new) Figure 3 demonstrates the model's improvement over average profiling in both training and validation plates, supporting its ability to generalize to unseen compounds (but not to unseen plates).

      While cross-validation could potentially enhance our analysis, retraining five new models solely for different validation set results may not substantially alter our conclusions, given the observed trends in Suppl Figure A1 and (the new) Figure 4, both of which show results across multiple stain sets (but a single train-test-validation split).


      R1.1c.ii. Clarify if the MoA results for cpg0001 are drawn from compounds from both the training and the validation datasets. If so, describe how the results differ between the sets in text and graphs.

      We confirm that the Mechanism of Action (MoA) retrieval results for cpg0001 are derived from all available compounds. It's important to note that the training and validation dataset split for the replicate retrieval task is different from the MoA prediction task. For replicate retrieval, we train using all available compounds and validate on a held-out set (see Figure 2). For MoA prediction, we train using the replicate retrieval task as the objective on all available compounds but validate using MoA retrieval, which is a distinct task. We have added a brief clarification in the main text to highlight the distinction between these tasks and how validation is performed for each:

      “We next addressed a more challenging task: predicting the mechanism of action class for each compound at the individual well level, rather than simply matching replicates of the exact same compound (Figure 5). It's also important to note that mechanism of action matching is a downstream task on which CytoSummaryNet is not explicitly trained. Consequently, improvements observed on the training and validation plates are more meaningful in this context, unlike in the previous task where only improvements on the test plate were meaningful. For similar reasons, we calculate the mechanism of action retrieval performance on all available compounds, combining both the training and validation sets. This approach is acceptable because we calculate the score on so-called "sister compounds" only—that is, different compounds that have the same mechanism of action annotation. This ensures there is no overlap between the mechanism of action retrieval task and the training task, maintaining the integrity of our evaluation. ”

      R1.1c.iii. "Mechanism of action retrieval is evaluated by quantifying a profile's ability to retrieve the profile of other compounds with the same annotated mechanism of action.". It was unclear to us if the evaluation of mAP for MoA identification can include finding replicates of the same compound. That is, whether finding a close replicate of the same compound would be included in the AP calculation. This would provide CytoSummaryNet with an inherent advantage as this is the task it is trained to do. We assume that this was not the case (and thus should be more clearly articulated), but if it was - results need to be re-evaluated excluding same-compound replicates.

      The evaluation excludes replicate wells of the same compound and only considers wells of other compounds with the same MoA. This methodology ensures that the model's performance on the MoA prediction task is not inflated by its ability to find replicates of the same compound, which is the objective of the replicate retrieval task. Please see the explanation we have added to the main text in our response to R1.1c.ii. Additionally, we have updated the Methods section to clearly describe this evaluation procedure:

      “Mechanism of action retrieval is evaluated by quantifying a profile’s ability to retrieve the profile of different compounds with the same annotated mechanism of action.”



      R1.2a. The description of Stain2-5 was not clear for us at first (and second) read. The information is there, but more details will greatly enhance the reader's ability to follow. One suggestion is explicitly stating that these "stains" partitioning was already defined in ref 26. Another suggestion is laying out explicitly a concrete example on the differences between two of these stains. We believe highlighting the differences between stains will strengthen the claim of the paper, emphasizing the difficulty of generalizing to the out-of-distribution stain.

      We appreciate your feedback on the clarity of the Stain2-5 dataset descriptions; we certainly struggled to balance detail and concepts in describing these. We have made the following changes:

      • Explicitly mentioned that the partitioning of the Stain experiments was defined in https://pubmed.ncbi.nlm.nih.gov/37344608/: “The partitioning of the Stain experiments has been defined and explained previously [21].”
      • Moved an improved version of (now) Figure 2 from the Methods section to the main text to help visually explain how the stratification is done early on.
      • Added a new section in the Experimental Setup: Diversity of stain sets, which includes a concrete example highlighting the differences between Stain2 and Stain5 to emphasize the diversity in experimental setups within the same dataset: “Stain2-5 comprise a series of experiments which were conducted sequentially to optimize the experimental conditions for image-based cell profiling. These experiments gradually converged on the most optimal set of conditions; however, within each experiment, there were significant variations in the assay across plates. To illustrate the diversity in experimental setups within the dataset, we will highlight the differences between Stain2 and Stain5.

      Stain2 encompasses a wide range of nine different experimental protocols, employing various imaging techniques such as Widefield and Confocal microscopy, as well as specialized conditions like multiplane imaging and specific stains like MitoTracker Orange. This subset also includes plates acquired with strong pixel binning instead of default imaging and plates with varying concentrations of dyes like Hoechst. As a result, Stain2 exhibits greater variance in the experimental conditions across different plates compared to Stain5.

      In contrast, Stain5, the last experiment in the series, follows a more systematic approach, consistently using either confocal or default imaging across three well-defined conditions. Each condition in Stain5 utilizes a lower cell density of 1,000 cells per well compared to Stain2's 2,500 cells per well. Being the final experiment in the series, Stain5 had the least variance in experimental conditions.

      For training the models, we typically select the data containing the most variance to capture the broadest range of experimental variation. Therefore, we chose Stain2-4 for training, as they represented the majority of the data and captured the most experimental variation. We reserved Stain5 for testing to evaluate the model's ability to generalize to new experimental conditions with less variance.

      All StainX experiments were acquired in different passes, which may introduce additional batch effects.”

      These changes aim to provide a clearer understanding of the dataset's complexity and the challenges associated with generalizing to out-of-distribution data.

      R1.2b. What does each data point in Figures 1-3 represent? Is it the average mAP for the 18 validation compounds, using different seeds for model training? Why not visualize the data similarly to Fig. 4 so the improvement per compound can be clearly seen?

      The data points in (the new) Figures 3,4,5 represent the average mAP for each plate, calculated by first computing the mAP for each compound and then averaging across compounds to obtain the average mAP per plate. We have updated the figure captions to clarify this:

      "... (each data point is the average mAP of a plate) ..."
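
      The two-step averaging behind each data point can be sketched as follows. The record layout (plate, compound, average precision per well) and all values are hypothetical and serve only to illustrate the order of aggregation.

```python
from collections import defaultdict
from statistics import mean


def plate_level_map(records):
    """Average AP per compound within a plate, then average those values per plate."""
    per_compound = defaultdict(list)
    for plate, compound, ap in records:
        per_compound[(plate, compound)].append(ap)

    per_plate = defaultdict(list)
    for (plate, _compound), aps in per_compound.items():
        per_plate[plate].append(mean(aps))  # mAP of one compound on one plate

    return {plate: mean(values) for plate, values in per_plate.items()}


# Hypothetical (plate, compound, average precision) records.
records = [
    ("plate_1", "cmpd_a", 1.00),
    ("plate_1", "cmpd_a", 0.50),
    ("plate_1", "cmpd_b", 0.25),
    ("plate_2", "cmpd_a", 0.50),
]

plate_map = plate_level_map(records)
print(plate_map)
```

      Averaging per compound first keeps a compound with many wells from weighing more heavily in the plate-level value than a compound with few.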

      While visualizing the mAP per compound, similar to (the new) Figure 6 for cpg0004, could provide insights into compound-level improvements, it would require creating numerous additional figures or one complex figure to adequately represent all the stratifications we are analyzing (plate, compound, Stain subset). By averaging the data per plate across different stratifications, we aim to provide a clearer and more comprehensible overview of the trends and improvements while allowing us to draw conclusions about generalization.

      Please note: this comment is related to the comment R1.1b (Subjective)

      R1.2.c [On the topic of enhancing clarity and readability:] Justification and interpretation of the evaluation metrics.

      Please refer to our response to comment R1.1b, where we have addressed your concerns regarding the justification and interpretation of the evaluation metrics.

      R1.2d. Explicitly mentioning the number of MoAs for each dataset and statistics of the number of compounds per MoA (e.g., average/median, min, max).

      We have added the following to the Experimental Setup: Data section:

      “A subset of the data was used for evaluating the mechanism of action retrieval task, focusing exclusively on compounds that belong to the same mechanism class. The Stain plates contained 47 unique mechanisms of action, with each compound replicated four times. Four mechanisms had only a single compound; the four mechanisms (and corresponding compounds) were excluded, resulting in 43 unique mechanisms used for evaluation. In the LINCS dataset, there were 1436 different mechanisms, but only 661 were used for evaluation because the remaining had only one compound.”
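
      The exclusion rule in the quoted passage follows from how retrieval works: a mechanism with a single compound has no other compound to retrieve. A minimal sketch of that filter, assuming a simple compound-to-mechanism mapping (the function name and the toy annotations are hypothetical):

```python
from collections import defaultdict


def mechanisms_for_evaluation(compound_to_moa):
    """Keep only mechanisms shared by at least two distinct compounds."""
    compounds_per_moa = defaultdict(set)
    for compound, moa in compound_to_moa.items():
        compounds_per_moa[moa].add(compound)
    return {
        moa: sorted(compounds)
        for moa, compounds in compounds_per_moa.items()
        if len(compounds) >= 2  # single-compound mechanisms cannot be retrieved
    }


# Hypothetical annotations for illustration only
annotations = {
    "cpd1": "MEK inhibitor",
    "cpd2": "MEK inhibitor",
    "cpd3": "HSP inhibitor",
}
kept = mechanisms_for_evaluation(annotations)
```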

      R1.2e. The data split in general is not easily understood. Figure 8 is somewhat helpful, however in our view, it can be improved to enhance understanding of the different splits. Specifically, the training and validation compounds need to be embedded and highlighted within the figure.

      Thank you for highlighting this. We have completely revised the figure (now Figure 2), which we hope conveys the data split strategy more clearly.

      Please note: this comment is related to the comment R1.2a.





      R1.3a. Why was stain 5 used for the test, rather than the other stains?

      Stain2-5 were part of a series of experiments aimed at optimizing the experimental conditions for image-based cell profiling using Cell Painting. These experiments were conducted sequentially, gradually converging on the most optimal set of conditions. However, within each experiment, there were significant variations in the assay across plates, with earlier iterations (Stain2-4) having more variance in the experimental conditions compared to Stain5. As Stain5 was the last experiment in the series and consisted of only three different conditions, it had the least variance. For training the models, we typically select the data containing the most variance to capture the broadest range of experimental variation. Therefore, Stain2-4 were chosen for training, while Stain5 was reserved for testing to evaluate the model's ability to generalize to new experimental conditions with less variance.

      We have now clarified this in the Experimental Setup: Diversity of stain sets section. Please see our response to comment R1.2a. for the full citation.

      R1.3b How were the 18 validation compounds selected?

      20% of the compounds (n=18) were randomly selected and designated as validation compounds, with the remaining compounds assigned to the training set. We have now clarified this in the Results section:

      “Additionally, 20% of the compounds (n=18) were randomly selected and designated as validation compounds, with the remaining compounds assigned to the training set (Supplementary Material H).”
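
      A random compound-level split of this kind could look like the sketch below. The total of 90 compounds is an assumption chosen so that 20% gives the stated 18 validation compounds; the seed, function name, and compound identifiers are also hypothetical.

```python
import random


def split_compounds(compounds, validation_fraction=0.2, seed=0):
    """Randomly assign a fraction of compounds to validation, the rest to training."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_validation = round(len(compounds) * validation_fraction)
    validation = set(rng.sample(sorted(compounds), n_validation))
    training = set(compounds) - validation
    return training, validation


# Assumed total of 90 compounds (not stated explicitly in the response)
compounds = [f"compound_{i:02d}" for i in range(90)]
train_set, val_set = split_compounds(compounds)
```

      Splitting by compound rather than by well keeps all replicates of a validation compound out of the training set.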

      R1.3c. For cpg0004, no justification for the specific doses selected (10uM - train, 3.33 uM - test) for the analysis in Figure 4. Why was the data split for the two dosages? For example, why not perform 5-fold cross validation on the compounds (e.g., of the highest dose)?

      We chose to use the 10 μM dose point as the training set because we expected this higher dosage to consist of stronger profiles with more variance than lower dose points, making it more suitable for training a model. We decided to use a separate test set at a different dose (3.33 μM) to assess the model's ability to generalize to new dosages. While cross-validation on the highest dose could also be informative, our approach aimed to balance the evaluation of the model's generalization capability with its ability to capture biologically relevant patterns across different dosages.

      This explanation has been added to the text:

      “We chose the 10 μM dose point for training because we expected this high dosage to produce stronger profiles with more variance than lower dose points, making it more suitable for model training.”

      “The multiple dose points in this dataset allowed us to create a separate hold-out test set using the 3.33 μM dose point data. This approach aimed to evaluate the model's performance on data with potentially weaker profiles and less variance, providing insights into its robustness and ability to capture biologically relevant patterns across dosages. While cross-validation on the 10 μM dose could also be informative, focusing on lower dose points offers a more challenging test of the model's capacity to generalize beyond its training conditions, although we do note that all compounds’ phenotypes would likely have been present in the 10 μM training dataset, given the compounds tested are the same in both.”

      R1.3d. A more detailed explanation on the logic behind using a training stain to test MoA retrieval will help readers appreciate these results. In our first read of this manuscript we did not grasp that, we did in a second read, but spoon-feeding your readers will help.

      This comment is related to the rationale behind training on one task and testing on another, which is addressed in our responses to comments R1.1.cii and R1.1.ciii.

      R1.4 Assessment of interpretability is always tricky. But in this case, the authors can directly confirm their interpretation that the CytoSummaryNet representation prioritizes large uncrowded cells, by explicitly selecting these cells, and using their average profile re

      We progressively filtered out cells based on a quantile threshold for Cells_AreaShape features (MeanRadius, MaximumRadius, MedianRadius, and Area), which were identified as important in our interpretability analysis, and then computed average profiles using the remaining cells before determining the replicate retrieval mAP. In the exclusion experiment, we gradually left out cells as the threshold increased, while in the inclusion experiment, we progressively included larger cells from left to right.
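
      The filtering step can be illustrated with a short sketch. It assumes that cells are kept when a size feature lies at or above (or at or below) a chosen quantile; the exact direction and threshold handling used in the inclusion and exclusion experiments are not specified here, so the `keep` argument, the helper names, and the `intensity` feature are hypothetical.

```python
from statistics import mean


def quantile(values, q):
    """Empirical quantile (0 <= q <= 1) by linear interpolation of sorted values."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def filtered_average_profile(cells, size_key, q, keep="above"):
    """Average every feature over the cells that pass the size threshold."""
    threshold = quantile([cell[size_key] for cell in cells], q)
    if keep == "above":
        kept = [cell for cell in cells if cell[size_key] >= threshold]
    else:
        kept = [cell for cell in cells if cell[size_key] <= threshold]
    return {name: mean(cell[name] for cell in kept) for name in cells[0]}


# Toy well with four cells; feature values are made up
cells = [
    {"Cells_AreaShape_Area": 100.0, "intensity": 1.0},
    {"Cells_AreaShape_Area": 200.0, "intensity": 2.0},
    {"Cells_AreaShape_Area": 300.0, "intensity": 3.0},
    {"Cells_AreaShape_Area": 400.0, "intensity": 4.0},
]
large_cell_profile = filtered_average_profile(cells, "Cells_AreaShape_Area", q=0.5)
```

      Sweeping `q` from 0 to 1 and recomputing the replicate retrieval mAP at each step would reproduce the shape of the experiment described above.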

      The results show that using only the largest cells does not significantly increase the performance. Instead, it is more important to include the large cells rather than only including small cells. The mAP saturates after a threshold of around 0.4, indicating that larger cells define the profile the most, and once enough cells are included to outweigh the smaller cell features, the profile does not change significantly by including even larger cells.

      These findings support our interpretation that CytoSummaryNet prioritizes large, uncrowded cells. While this approach could potentially be used as a general outlier removal strategy for cell profiling, further investigation is needed to assess its robustness and generalizability across different datasets and experimental conditions.

      We have created Supplementary Material L to report these findings and we additionally highlight them in the Results:

      “To further validate CytoSummaryNet's prioritization of large, uncrowded cells, we progressively filtered cells based on Cells_AreaShape features and observed the impact on replicate retrieval mAP (Supplementary Material L). The results support our interpretation and highlight the key role of larger cells in profile strength.”

      R1.5. Placing this work in context of other weakly supervised representations. Previous papers used weakly supervised labels of proteins / experimental perturbations (e.g., compounds) to improve image-derived representations, but were not discussed in this context. These include PMID: 35879608, https://www.biorxiv.org/content/10.1101/2022.08.12.503783v2 (from the same research groups and can also be benchmarked in this context), https://pubs.rsc.org/en/content/articlelanding/2023/dd/d3dd00060e , and https://www.biorxiv.org/content/10.1101/2023.02.24.529975v1. We believe that a discussion explicitly referencing these papers in this specific context is important.

      While these studies provide valuable insights into improving cell population profiles using representation learning, our work focuses specifically on the question of single-cell aggregation methods. We chose to use classical features for our comparisons because they are the current standard in the field. This approach allows us to directly assess the performance of our method in the context of the most widely used feature extraction pipeline in practice. However, we see the value in incorporating them in future work and have mentioned them in the Discussion:

      “Recent studies exploring image-derived representations using weakly supervised and self-supervised learning [35][36] could inspire future research on using learned embeddings instead of classical features to enhance model-aggregated profiles.”

      R1.minor1. "Because the improved results could stem from prioritizing certain features over others during aggregation, we investigated each cell's importance during CytoSummaryNet aggregation by calculating a relevance score for each" - what is the relevance score? Would be helpful to provide some intuition in the Results.

      We have included more explanation of the relevance score in the Results section, following the explanation of sensitivity analysis (SA) and critical point analysis (CPA):

      “SA evaluates the model's predictions by analyzing the partial derivatives in a localized context, while CPA identifies the input cells with the most significant contribution to the model's output. The relevance scores of SA and CPA are min-max normalized per well and then combined by addition. The combination of the two is again min-max normalized, resulting in the SA and CPA combined relevance score (see Methods for details).”
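
      The normalization and combination steps in the quoted passage can be sketched as below. The sketch takes the per-cell SA and CPA scores of one well as given (how they are computed is described in the Methods); the function names and the toy scores are hypothetical, and mapping a constant score vector to zeros is an assumption for the degenerate case.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant vector is mapped to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def combined_relevance(sa_scores, cpa_scores):
    """Normalize SA and CPA per well, add them, then normalize the sum again."""
    sa = min_max(sa_scores)
    cpa = min_max(cpa_scores)
    return min_max([s + c for s, c in zip(sa, cpa)])


# Toy scores for the three cells of one well
sa_well = [1.0, 2.0, 3.0]
cpa_well = [0.0, 2.0, 1.0]
relevance = combined_relevance(sa_well, cpa_well)
```

      Normalizing each score before adding gives SA and CPA equal weight regardless of their raw scales.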

      R1.minor2. Figure 1:

      1. Colors of the two methods too similar
      2. The dots are too close. It will be more easily interpreted if they were further apart.
      3. What do the dots stand for?
      4. We recommend considering moving this figure to the supp. material (the most important part of it is the results on the test set and it appears in Fig.2).

      1. We chose a lighter and darker version of the same color as a theme to simplify visualization, as this theme is used throughout (the new) Figures 3,4,5.
      2. We agree; we have now redrawn the figure to fix this.
      3. Each data point is the average mAP of a plate. Please see our answer for R1.2b as well.
      4. We believe that (the new) Figures 3,4,5 serve distinct purposes in testing various generalization hypotheses. We have added the following text to emphasize that the first figures are specifically about generalization hypothesis testing: “We first investigated CytoSummaryNet’s capacity to generalize to out-of-distribution data: unseen compounds, unseen experimental protocols, and unseen batches. The results of these investigations are visualized in Figures 3, 4, and 5, respectively.”

      R1.minor3 Figure 4: It is somewhat misleading to look at the training MoAs and validation MoAs embedded together in the same graph. We recommend showing only the test MoAs (train MoAs can move to SI).

      We addressed this comment in R1.1c.ii. To reiterate briefly, there are no training, validation, or test MoAs because these are not used as labels during the training process. There is an option to split them based on training and validation compounds, which is addressed in R1.1c.ii.


      R1.minor4 Figure 5

      1. Why only Stain3? What happens if we look at Stains 2,3 and 4 together? Stain 5?

      2. Should validation compounds and training compounds be analyzed separately?

      3. Subfigure (d): it is expected that the data will be classified by compound labels as it is the training task, but for this to be persuasive I would like to see this separately on the training compounds first and then and more importantly on the validation compounds.

      4. For subfigures (b) and (d): it appears there are not enough colors for d, which makes it partially not understandable. For example, the pink label in (d) shows a single compound which appears to represent two different MoAs. This is probably not the case, and it has two different compounds, but it cannot be inferred when they are represented by the same color.

      5. For the Subfigure (e) - only 1 circle looks justified (in the top left). And for that one, is it not a case of an outlier plate that would perhaps need to be removed from analysis? Is it not good that such a plate will be identified?

      1. We have addressed this point in the text, stating that the results are similar for Stain2 and Stain4. Stain5 represents an out-of-distribution subset because of a very different set of experimental conditions (see Experimental Setup: Diversity of stain sets). To improve clarity, we have revised the figure caption to reiterate this information:

      “... Stain2 and Stain4 yielded similar results (data not shown). …”

      2. For replicate retrieval, analyzing validation and training compounds separately is appropriate. However, this is not the case for MoA retrieval, as discussed in our responses to R1.1c.ii and R1.1c.i.

      3. We have created the requested plot (below) but ultimately decided not to include it in the manuscript because we believe that (the new) Figures 3 and 4 are more effective for making quantitative comparative claims.

      [Please see the full revision document for the figures]

      Top: training compounds (validation compounds grayed out); not all compounds are listed in the legend.

      Bottom: validation compounds (training compounds grayed out).

      Left: average profiling; Right: CytoSummaryNet

      4. We agree with your observation and have addressed this issue by labeling the center mass as a single class (gray) and highlighting only the outstanding pairs in color. Please refer to the updated figure and our response to R3.6 for more details.

      5. In the updated figure, we have revised the figure caption to focus solely on the annotation of same mechanism of action profile clusters, as indicated by the green ellipses. The annotation of isolated plate clusters has been removed (Figures 7e and 7f) to maintain consistency and avoid potential confusion. Despite being an outlier for Stain3, the plate (BR00115134bin1) clusters with Stain4 plates (Supplementary Figure F1, green annotated square inside the yellow annotated square), indicating it is not merely a noisy outlier and can provide insights into the out-of-sample performance of our model.

      R1.minor5a. Discussion: "perhaps in part due to its correction of batch effects" - is this statement based on Fig. 5F - we are not convinced.

      We appreciate the reviewer's scrutiny regarding our statement about batch effect correction. Upon reevaluation, we agree that this claim was not adequately substantiated by empirical data. We quantified the batch effects using comparison mean average precision for both average profiles and CytoSummaryNet profiles, and the statistical analysis revealed no significant difference between these profiles in terms of batch effect correction. Therefore, we have removed this theoretical argument from the manuscript entirely to ensure that all claims are strongly supported by the data presented.

      R1.minor5b. "Overall, these results improve upon the ~20% gains we previously observed using covariance features" - this is not the same dataset so it is hard to reach conclusions - perhaps compare performance directly on the same data?

      We have now explicitly clarified this is a different dataset. Please see our response to R1.1a for why a direct comparison was not performed. The following clarification can be found in the Discussion:

      “These results improve upon the ~20% gains previously observed using covariance features [13] albeit on a different dataset, and importantly, CytoSummaryNet effectively overcomes the challenge of recomputation after training, making it easier to use.”

      Reviewer 2

      R2.1 The authors present a well-developed and useful algorithm. The technical motivation and validation are very carefully and clearly explained, and their work is potentially useful to a varied audience.

      That said, I think the authors could do a better job, especially in the figures, of putting the algorithm in context for an audience that is unfamiliar with the cell painting assay. (a) For example, a figure towards the beginning of the paper with example images might help to set the stage. (b) Similarly a schematic of the algorithm earlier in the paper would provide a graphical overview. (c) For the sake of a biologically inclined audience, I would consider labeling the images in the caption by cell type and label.

      Thank you for your valuable suggestions on improving the accessibility of our figures for readers unfamiliar with the Cell Painting assay. We have made the following changes to address your comments:

      (a) and (b) To provide visual context and a graphical overview of the algorithm, we have moved the original Figure 7 to Figure 1. This figure now includes example images that help readers new to the Cell Painting assay.
      (c) We have added relevant details to the example images in (the new) Figure 1.

      R2.2 The interpretability results were intriguing. The authors might consider further validating these interpretations by removing weakly informative cells from the dataset and retraining. Are the cells so uninformative that the algorithm does better without them, or are they just less informative than other cells?

      Please see our responses to R1.4 and R3.0

      R2.3 As far as I can tell, the authors only obliquely state whether the code associated with the manuscript is openly available. Posting the code is needed for reproducibility. I would provide not only a GitHub link, but a DOI linked to the code, or some other permanent link.

      We have now added Code Availability and Data Availability sections, clearly stating that the code and data associated with the manuscript are openly available.

      R2.4 Incorporating biological heterogeneity into machine-learning driven problems is a critical research question. Replacing means/modes and such with a machine learning framework, the authors have identified a problem with potentially wide significance. The application to cell painting and related assays is of broad enough significance for many journals. However, the authors could further broaden the significance by commenting on other possible cell biology applications. What other applications might the algorithm be particularly suited for? Are there any possible roadblocks to wider use? What sorts of data has the code been tested on so far?

      We have added the following paragraph to discuss the broader applicability of CytoSummaryNet:

      “The architecture of CytoSummaryNet holds significant potential for broader applications beyond image-based cell profiling, accommodating tabular, permutation-invariant data and enhancing downstream task performance when applied to processed population-level profiles. Its versatility makes it valuable for any omics measurements where downstream tasks depend on measuring similarity between profiles. Future research could also explore CytoSummaryNet's applicability to genetic perturbations, expanding its utility in functional genomics.”

      Reviewer 3

      R3.0 The authors have done a commendable job discussing the method, demonstrating its potential to outperform current models in profiling cell-based features. The work is of considerable significance and interest to a wide field of researchers working on the understanding of cell heterogeneity's impact on various biological phenomena and practical studies in pharmacology.

      One aspect that would further enhance the value of this work is an exploration of the method's separation power across different modes of action. For instance, it would be interesting to ascertain if the method's performance varies when dealing with actions that primarily affect size, those that affect marker expression, or compounds that significantly diminish cell numbers.

      Thank you for the encouraging comments!

      We have added the following to Results: Relevance scores reveal CytoSummaryNet's preference for large, isolated cells:

      “Statistical t-tests were conducted to identify the features that most effectively differentiate mechanisms of action from negative controls in average profiles, focusing on the three mechanisms of action where CytoSummaryNet demonstrates the most significant improvement and the three mechanisms where it shows the least. Consistent with our hypothesis that CytoSummaryNet emphasizes larger, more sparse cells, the important features for the CytoSummaryNet-improved mechanisms of action (Supplementary Material I1) often involve the radial distribution for the mitochondria and RNA channels. These metrics capture the fraction of those stains near the edge of the cell versus concentric rings towards the nucleus, which are more readily detectable in larger cells compared to small, rounded cells.

      In contrast, the important features for mechanisms of action not improved by CytoSummaryNet (Supplementary Material I) predominantly include correlation metrics between brightfield and various fluorescent channels, capturing spatial relationships between cellular components. Some of these mechanisms of action included compounds that were not individually distinguishable from negative controls, and CytoSummaryNet did not overcome the lack of phenotype in these cases. This suggests that while CytoSummaryNet excels in identifying certain cellular features, its effectiveness is limited when dealing with mechanisms of action that do not exhibit pronounced phenotypic changes.”

      We have also added supplementary material to support (I. Relevant features for CytoSummaryNet improvement).

      R3.0 Another test on datasets that are not concerned with chemical compounds, but rather genetic perturbations would greatly increase the reach of the method into the functional genomics community and beyond. This additional analysis could provide valuable insights into the versatility and applicability of the proposed method.

      We agree that testing the method's behavior on genetic perturbations would be interesting and could provide insights into its versatility. However, the efficacy of the methodology may vary depending on the specific properties of different genetic perturbation types.

      For example, the penetrance of phenotypes may differ between genetic and chemical perturbations. In some experimental setups, a selection agent ensures that nearly all cells receive a genetic perturbation (though not all may express a phenotype due to heterogeneity or varying levels of the target protein). Other experiments may omit such an agent. Additionally, different patterns might be observed in various classes of reagents, such as overexpression, CRISPR-Cas9 knockout (CRISPRn), CRISPR-interference (CRISPRi), and CRISPR-activation (CRISPRa).

      We believe that selecting a single experiment with one of these technologies would not adequately address the question of versatility. Instead, we propose future studies that may conclusively assess the method's performance across a variety of genetic perturbation types. This would provide a more comprehensive understanding of CytoSummaryNet's applicability in functional genomics and beyond. We have updated the Discussion section to reflect this:

      “Future research could also explore CytoSummaryNet's applicability to genetic perturbations, expanding its utility in functional genomics.”

      R3.1. The datasets were stratified based on plates and compounds. It would be beneficial to clarify the basis for data stratification applied for compounds. Was the data sampled based on structural or functional similarity of compounds? If not, what can be expected from the model if trained and validated using structurally or functionally diverse and non-diverse compounds?

      Thank you for raising the important question of data stratification based on compound similarity. In our study, the data stratification was performed by randomly sampling the compounds, without considering their structural or functional similarity.

      This approach may limit the generalizability of the learned representations to new structural or functional classes not captured in the training set. Consequently, the current methodology may not fully characterize the model’s performance across diverse compound structures.

      In future work, it would be valuable to explore the impact of compound diversity on model performance by stratifying data based on structural or functional similarity and comparing the results to our current random stratification approach to more thoroughly characterize the learned representations.

      R3.2. Is the method prioritizing a particular biological reaction of cells toward common chemical compounds, such as mitotic failure? Could this be oncology-specific, or is there more utility to it in other datasets?

      Our analysis of CytoSummaryNet's performance in (the new) Figure 6 reveals a strong improvement in MoAs targeting cancer-related pathways, such as MEK, HSP, MDM, dehydrogenase, and purine antagonist inhibitors. These MoAs share a common focus on cellular proliferation, survival, and metabolic processes, which are key characteristics of cancer cells.

      Given the composition of the cpg0004 dataset, which contains 1,258 unique MoAs with only 28 annotated as oncology-related, the likelihood of randomly selecting five oncology-related MoAs that show strong improvement is extremely low. This suggests that the observed prioritization is not due to chance.
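
      As a rough back-of-the-envelope check of this argument, the probability can be written as a hypergeometric term. The sketch assumes that the five most improved MoAs are all counted as oncology-related and, under the null hypothesis, were drawn uniformly at random without replacement from the 1,258 MoAs; this simplification is ours, for illustration only, and is not an analysis reported in the manuscript.

```python
from math import comb


def prob_all_from_category(total, in_category, drawn):
    """Probability that `drawn` items sampled without replacement all fall in the category."""
    return comb(in_category, drawn) / comb(total, drawn)


# 1,258 MoAs in total, 28 annotated as oncology-related, 5 strongly improved
p_all_oncology = prob_all_from_category(total=1258, in_category=28, drawn=5)
```

      Under these assumptions the probability is on the order of 10^-9, which matches the qualitative statement that chance alone is an unlikely explanation.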

      Furthermore, the prioritization cannot be solely attributed to the frequency of oncology-related MoAs in the dataset. Other prevalent disease areas, such as neurology/psychiatry, infectious disease, and cardiology, do not exhibit similar improvements despite having higher MoA counts.

      While these findings indicate a potential prioritization of oncology-related MoAs by CytoSummaryNet, further research is necessary to fully understand the extent and implications of this bias. Future work should involve conducting similar analyses across other disease areas and cell types to assess the method's broader utility and identify areas for refinement and application. However, given the speculative nature of these observations, we have chosen not to update the manuscript to discuss this potential bias at this time.

      R3.3 Figures 1 and 2 demonstrate that the CytoSummaryNet profiles outperform average-aggregated profiles. However, the average profiling results seem more consistent when compared to CytoSummaryNet profiling. What further conditions or approaches can help improve CytoSummaryNet profiling results to be more consistent?

      The observed variability in CytoSummaryNet's performance is primarily due to the intentional technical variance in our datasets, where each plate tested different staining protocol variations. It's important to note that this level of technical variance is not typical in standard cell profiling experiments. In practice, the variance across plates would be much lower. We want to emphasize that while a model capable of generalizing across diverse experimental conditions might seem ideal, it may not be as practically useful in real-world scenarios. This is because such non-uniform conditions are uncommon in typical cell profiling experiments. In normal experimental settings, where technical variance is more controlled, we expect CytoSummaryNet's performance to be more consistent.

      R3.4 Can the poor performance on unseen data (in the case of stain 5) be overcome? If yes, how? If no, why not?

      We believe that the poor performance on unseen data, such as Stain 5, can be overcome depending on the nature of the unseen data. As shown in Figure 4 (panel 3), the model improves upon average profiling for unseen data when the experimental conditions are similar to the training set.

      The issue lies in the different experimental conditions. As explained in our response to R3.3, this could be addressed by including these experimental conditions in the training dataset. As long as CytoSummaryNet is trained (seen) and tested (unseen) on data generated under similar experimental conditions, we are confident that it will improve or perform as well as average profiling.

      It's important to note that the issue of generalization to vastly different experimental conditions was considered out of scope for this paper. The main focus is to introduce a new method that improves upon average profiling and can be readily used within a consistent experimental setup.

      R3.5 It needs to be mentioned how the feature data used for CytoSummaryNet profiling was normalized before training the model. What would be the impact of feature data normalization before model training? Would the model still outperform if the skewed feature data is normalized using square or log transformation before model training?

      We have clarified in the manuscript that we standardized the feature data on a plate-by-plate basis to achieve zero mean and unit variance across all cells per feature within each plate. We have added the following statement to improve clarity:

      “The data used to compute the average profiles and train the model were standardized at the plate-level, ensuring that all cell features across the plate had a zero mean and unit variance. The negative control wells were then removed from all plates."
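
      A minimal sketch of the plate-level standardization described in the quoted passage is given below. It assumes that the mean and standard deviation are computed over all cells of a plate, including negative controls, before the controls are removed, and that the population standard deviation is used; the `treatment` key, the `negcon` label, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize_plate(cells, feature_names):
    """Z-score each feature across all cells of one plate (zero mean, unit variance)."""
    stats = {}
    for name in feature_names:
        values = [cell[name] for cell in cells]
        stats[name] = (mean(values), pstdev(values))
    standardized = []
    for cell in cells:
        row = dict(cell)  # copy so the input cells are left unchanged
        for name in feature_names:
            mu, sigma = stats[name]
            row[name] = (cell[name] - mu) / sigma if sigma > 0 else 0.0
        standardized.append(row)
    return standardized


def drop_negative_controls(cells, control_label="negcon"):
    """Remove negative-control cells after standardization."""
    return [cell for cell in cells if cell["treatment"] != control_label]


# Toy plate with one feature; values are made up
plate_cells = [
    {"treatment": "negcon", "area": 1.0},
    {"treatment": "cpd1", "area": 2.0},
    {"treatment": "cpd1", "area": 3.0},
]
scaled = drop_negative_controls(standardize_plate(plate_cells, ["area"]))
```

      Standardizing before dropping the controls means the control cells still contribute to each plate's mean and variance.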

      We chose standardization over transformations like squaring or logging to maintain a balanced scale across features while preserving the biological and morphological information inherent in the data. While transformations can reduce skewness and are useful for data spanning several orders of magnitude, they might distort biological relevance by compressing or expanding data ranges in ways that could obscure important cellular variations.

      Regarding the potential impact of square or log transformations on skewed feature data, these methods could improve the model's learning efficiency by making the feature distribution more symmetrical. However, the suitability and effectiveness of these techniques would depend on the specific data characteristics and the model architecture.

      Although not explored in this study, investigating various normalization techniques could be a valuable direction for future research to assess their impact on the performance and adaptability of CytoSummaryNet across diverse datasets and experimental setups.

      R3.6. In Figure 5 b and c, MoAs often seem to be represented by singular compounds and thus, the test (MoA prediction) is very similar to the training (compound ID). Given this context, a discussion about the extent this presents a circular argument supported by stats on the compound library used for training and testing would be beneficial.

      Clusters in (the new) Figure 7 that contain only replicates of a single compound would not yield an improved performance on the MoA task unless they also include replicates of other compounds sharing the same MoA in close proximity. Please see our response to R1.1c.iii. for details. To improve visual clarity and avoid misinterpretation, we have recomputed the colors for (the new) Figure 7 and grayed out overlapping points.

      R3.7 Can you estimate the minimum amount of supervision (fuzzy/sparse labels, often present in mislabeled compound libraries with dirty compounds and polypharmacology being present) that is needed for it to be efficiently trained?

      It's important to note that the metadata used by the model is only based on identifying replicates of the same compound. Mechanism of action (MoA) annotations, which can be erroneous due to dirty compounds, polypharmacology, and incomplete information, are not used in training at all. MoA annotations are only used in our evaluation, specifically for calculating the mAP for MoA retrieval.

      We have successfully trained CytoSummaryNet on 72 unique compounds with 4 replicates each. This is the current empirical minimum, but it is possible that the model could be trained effectively with even fewer compounds or replicates.

      Determining the absolute minimum amount of supervision required for efficient training would require further experimentation and analysis. Factors such as data quality, feature dimensionality, and model complexity could influence the required level of supervision.
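      To make the evaluation side concrete, the sketch below computes a mean average precision (mAP) for label retrieval. It rests on assumptions not stated in the response: profiles are ranked by cosine similarity, every other profile is a retrieval candidate, and a hit is any candidate sharing the query's label (compound ID or MoA). The study's exact mAP definition may differ, for example in how same-compound replicates are treated in MoA retrieval.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_precision(ranked_labels, query_label):
    """Average of precision@rank taken at each rank where a matching label occurs."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(profiles, labels):
    """Use every profile once as the query and rank all remaining profiles against it."""
    aps = []
    for i, (query, query_label) in enumerate(zip(profiles, labels)):
        candidates = [
            (cosine(query, p), lab)
            for j, (p, lab) in enumerate(zip(profiles, labels))
            if j != i
        ]
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        aps.append(average_precision([lab for _, lab in candidates], query_label))
    return sum(aps) / len(aps)
```

      Because only the `labels` argument changes between replicate retrieval and MoA retrieval, the same routine illustrates why MoA annotations can be kept out of training and used for evaluation alone.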

      R3.minor1 Figure 5: The x-axis and y-axis tick values are too small, and image resolution/size needs to be increased.

      We have made the following changes to address the concerns:

      • Increased the image resolution and size to improve clarity and readability.
      • Removed the x-axis and y-axis tick values, as they do not provide meaningful information in the context of UMAP visualizations.

      We believe these modifications enhance the visual presentation of the data and make it easier for readers to interpret the results.

      R3.minor2 The methods applied to optimize hyperparameters in supplementary data need to be included.

      We added the following to Supplementary Material D:

      “We used the Weights & Biases (WandB) sweep suite in combination with the BOHB (Bayesian Optimization and HyperBand) algorithm for hyperparameter sweeps. The BOHB algorithm [47] combines Bayesian optimization with bandit-based strategies to efficiently find optimal hyperparameters.

      Additionally, Table D1 provides an overview of all tunable hyperparameters and their chosen values based on a BOHB hyperparameter optimization.”

      R3.minor3 Figure 5(c, d): The names of compound 2 and Compound 5 need to be included in the labels.

      These compounds were obtained from external companies and are proprietary, necessitating their anonymization in our study. This has now been added in the caption of (the new) Figure 7:

      “Note that Compound2 and Compound5 are intentionally anonymized.”

      R3.minor4 Table C1: Plate descriptions need to be included.

      *Table C1: The training, validation, and test set stratification for Stain2, Stain3, Stain4, and Stain5. Five training, four validation, and three test plates are used for Stain2, Stain3, and Stain4. Stain5 contains six test set plates only.*

      | Split | Stain2 | Stain3 | Stain4 |
      | --- | --- | --- | --- |
      | Training | BR00113818 | BR00115128 | BR00116627 |
      | Training | BR00113820 | BR00115125highexp | BR00116631 |
      | Training | BR00112202 | BR00115133highexp | BR00116625 |
      | Training | BR00112197binned | BR00115131 | BR00116630highexp |
      | Training | BR00112198 | BR00115134 | 200922_015124-Vhighexp |
      | Validation | BR00112197standard | BR00115129 | BR00116628highexp |
      | Validation | BR00112197repeat | BR00115133 | BR00116629highexp |
      | Validation | BR00112204 | BR00115128highexp | BR00116627highexp |
      | Validation | BR00112201 | BR00115127 | BR00116629 |
      | Test | BR00112199 | BR00115134bin1 | 200922_044247-Vbin1 |
      | Test | BR00113819 | BR00115134multiplane | 200922_015124-V |
      | Test | BR00113821 | BR00115126highexp | BR00116633bin1 |

      Stain5 (test plates only): BR00120532, BR00120270, BR00120536, BR00120530, BR00120526, BR00120274

      We have added a reference to the metadata file in the description of Table C1: https://github.com/carpenter-singh-lab/2023_Cimini_NatureProtocols/blob/main/JUMPExperimentMasterTable.csv

      R3.minor5 Figure F1: Does the green box (stain 3) also involve training on plates from stain 4 (BR00116630highexp) and 5 (BR00120530) mentioned in Table C1? Please check the figure once again for possible errors.

      We have carefully re-examined Figure F1 and Table C1 to ensure their accuracy and consistency. Upon double-checking, we can confirm that the figure is indeed correct. We intentionally omitted the training and validation plates from Figure F1 to maintain clarity and readability, as including them resulted in a cluttered and difficult-to-interpret figure.

      Regarding the specific plates mentioned:

      • BR00116630highexp (Stain4) is used for training, as correctly stated in Table C1. This plate is considered an outlier within the Stain4 dataset and happens to cluster with the Stain3 plates in Figure F1.
      • BR00120530 (Stain5) is part of the test set only and correctly falls within the Stain5 cluster in Figure F1.

      To improve the clarity of the training, validation, and test split in Table C1, we have added a color scheme that visually distinguishes the different data subsets. This should make it easier for readers to understand the distribution of plates across the various splits.
  2. Jul 2024
    1. Cohort studies increasingly collect biosamples for molecular profiling and are observing molecular heterogeneity. High throughput RNA sequencing is providing large datasets capable of reflecting disease mechanisms. Clustering approaches have produced a number of tools to help dissect complex heterogeneous datasets; however, selecting the appropriate method and parameters to perform exploratory clustering analysis of transcriptomic data requires deep understanding of machine learning and extensive computational experimentation. Tools that assist with such decisions without prior field knowledge are nonexistent. To address this, we have developed Omada, a suite of tools aiming to automate these processes and make robust unsupervised clustering of transcriptomic data more accessible through automated machine learning based functions. The efficiency of each tool was tested with five datasets characterised by different expression signal strengths to capture a wide spectrum of RNA expression datasets. Our toolkit’s decisions reflected the real number of stable partitions in datasets where the subgroups are discernible. Within datasets with less clear biological distinctions, our tools either formed stable subgroups with different expression profiles and robust clinical associations or revealed signs of problematic data such as biased measurements.

      This work has been peer reviewed in GigaScience (see paper), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer name: **Pierre Cauchy**

      Kariotis et al present Omada, a tool dedicated to automated partitioning of large-scale, cohort-based RNA-Sequencing data such as TCGA. A great strength of the manuscript is that it clearly shows that Omada is capable of partitioning PanCan into BRCA, COAD and LUAD (Fig 5), as well as datasets with no known groups (PAH and GUSTO), which is impressive and novel. I would like to praise the authors for coming up with such a tool, as the lack of a systematic tool dedicated to partitioning TCGA-like expression data is indeed a shortcoming in the field of medical genomics. Overall, I believe the tool will be very valuable to the scientific community and could potentially contribute to meta-analysis of cohort RNA-Seq data. I only have a few comments regarding the methodology and manuscript. I also think that it should be more clearly stated that Omada is dedicated to large datasets (e.g. TCGA) and not differential expression analysis. I would also suggest benchmarking Omada against comparable tools via ROC curves if possible (see below).

      Methods: This section should be a bit more homogeneous between descriptive text and mathematical description. It should specify which parts are automated and which parts need user input, and refer to the vignette documentation. I also could not find the Omada github repository.

      Sample and gene expression preprocessing: To me, this section lacks methods/guidelines and only loosely describes the steps involved. "numerical data may need to be normalised in order to account for potential misdirecting quantities" - which kind of normalisation? "As for the number of genes, it is advised for larger genesets (>1000 genes) to filter down to the most variable ones before the application of any function as genes that do not vary across samples do not contribute towards identifying heterogeneity" - what filtering is recommended? Top 5% variance? 1%? Based on what metric?

      Determining clustering potential: To me, it was not clear whether this is automatically performed by Omada and how the feasibility score is determined.

      Intra-method Clustering Agreement: Is this from normalised data? The affinity matrix will be greatly affected by whether the data are normalised or non-normalised, as it is the matrix of exponential(-normalised gene distance)^2.

      Spectral clustering step 2: "Define D to be the diagonal matrix whose (i, i)-element is the sum of A's i-th row": please also specify that A(i,j) is 0 in this diagonal matrix. Please also confirm which matrix multiplication method is used, product or Cartesian product? Also, if there are 0 values, NAs will be obtained in this step.

      Hierarchical clustering step 5: "Repeat Step 3 a total of n − 1 times until there is only one cluster left." This is a valuable addition as it merges identical clusters. The methods should emphasise the benefits of this cluster reduction method in helping to partition data, i.e. that it minimises the number of redundant clusters.

      Stability-based assessment of feature sets: "For each dataset we generate the bootstrap stability for every k within range". Here it should be mentioned that this is carried out by clusterboot, and the full arguments should be given for documentation. "The genes that comprise the dataset with the highest stability are the ones that compose the most appropriate set for the downstream analysis" - is this the single highest or a gene list in the top n datasets? Please specify.

      Choosing k number of clusters: "This approach prevents any bias from specific metrics and frees the user from making decisions on any specific metric and assumptions on the optimal number of clusters." For consistency with the cluster reduction method in the "intra-clustering agreement" section, which I believe is a novelty introduced by Omada, and within the context of automated analysis, the package should also ideally have an optimized number of k-clusters. K-means clustering analysis is often hindered by output containing redundant, practically identical clusters, which often require manual merging. While I do understand the rationale described there and in Table 3, in terms of biological information and especially for deregulated genes analysis (e.g. row z-score clustering), should maximum k also not be determined by the number of conditions, i.e. 2^n, e.g. when n=2, kmax=4; n=3, kmax=8?

      Test datasets and Fig 6: Please expand on how the number of features (300) was determined. While this number of genes corresponds to a high stability index, is this number fixed or can it be dynamically estimated from a selection (e.g. from 100 to 1000)?

      Results: Overall, this section is well written and informative. I would just add the following if applicable:

      Figure 3: I think this figure could additionally include benchmarking, i.e. ROC curves of Omada vs e.g. previous TCGA clustering analyses (PMID 31805048).

      Figure 4: I think it would be useful to compare Omada results to previous TCGA clustering analyses, e.g. PMID 35664309.

      Figure 6: swap C and D. Why is cluster 5 missing on D)?
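      The reviewer's points on spectral clustering step 2 can be illustrated with a short sketch. It assumes a common normalised spectral clustering formulation: a Gaussian affinity with A(i,i) = 0, a diagonal degree matrix D holding the row sums of A, and the ordinary matrix product D^(-1/2) A D^(-1/2). Whether Omada follows exactly this formulation is not confirmed by the review; the function names, the `sigma` parameter and the zero-degree guard are choices made for the example.

```python
import math


def affinity_matrix(points, sigma=1.0):
    """Gaussian affinity A[i][j] = exp(-||xi - xj||^2 / (2 sigma^2)), with A[i][i] = 0."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            A[i][j] = A[j][i] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return A


def normalized_affinity(A, eps=1e-12):
    """Return D^(-1/2) A D^(-1/2), where D is diagonal with D[i][i] = sum of row i of A.

    Off-diagonal entries of D are zero, so the product reduces to an
    element-wise scaling of A. Rows whose degree is (numerically) zero are
    scaled by 0 rather than 1/sqrt(0), which avoids inf/NaN entries.
    """
    degrees = [sum(row) for row in A]
    inv_sqrt = [1.0 / math.sqrt(d) if d > eps else 0.0 for d in degrees]
    n = len(A)
    return [[inv_sqrt[i] * A[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
```

      The sketch also shows why the normalisation question matters: the affinity depends on squared distances, so unscaled features with large ranges can push off-diagonal affinities towards zero and leave isolated samples with zero degree.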

    1. Today, in order to bridge an emerging chasm, African-American writers may seek to initiate and sustain a greater dialogue between activists and academics. Analyzing the relationship between commentary and organizing strengthens critical writing, research, and activism. Or, as Cornel West notes: "Local activists must become more and more at the center of how we think about the condition for the possibility of social motion and social movement."52 This seems particularly true in interracial rape cases where racism and sexism violently converge and mythology shapes cultural meanings and social and legal prosecution

      !!!!

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2024-02491

      Corresponding author(s): Gilbert, Vassart

      1. General Statements [optional]

      We thank referees 1 and 2 for their in-depth analysis of our manuscript. They see interest in our study, with questions to be answered. Referee 3 is essentially negative, considering that there is nothing new ("novel finding is missing"). We respectfully disagree with him/her, comforted by the opinion of referee 2 that "the authors developed a protocol to reproducibly generate fetal-like spheroids from adult tissue is an important advancement in the field and ... the manuscript should attract a significant amount of attention in the intestinal field" and we provide evidence in our answers that he/she did not read the manuscript with the same attention as referees 1 and 2 (see in particular answer to his/her question 5).

      Here is a summary of the main reason why we consider that our study represents valuable new information in the field of intestinal regeneration.

      It is based on the serendipitous observation that dissociation of adult intestinal tissue by collagenase generates stably replatable spheroids upon culture in matrigel. Surprisingly and contrary to canonical EDTA-generated intestinal organoids and fetal spheroids, these spheroids are not traced in Rosa26Tomato mice harboring a VilCre transgene, despite robustly expressing endogenous Villin. Our interpretation is that adult intestinal spheroids originate from a cell lineage, distinct from the main developmental intestinal lineage, in which the VilCre transgene is unexpectedly not expressed, probably due to the absence of cis regulatory sequences required for expression in this lineage.

      Adult spheroid transcriptome shares a gene signature with the YAP/TAZ signature commonly expressed in models of intestinal regeneration. This led us to look for VilCre negative crypts in the regenerating intestine of Lgr5/DTR mice in which Lgr5-positive stem cells have been ablated by diphtheria toxin. Numerous VilCre negative clones were observed, identifying a novel lineage of stem cells implicated in intestinal regeneration.

      FACS purification and scRNAseq analysis of the rare VilCre negative cells present at homeostasis identified a population of cells with characteristics of quiescent stem cells.

      In sum, we believe that our study demonstrates the existence of a hitherto undescribed stem cell lineage involved in intestinal regeneration. It points to the existence of a hierarchical model of intestinal regeneration in addition to the well-accepted plasticity model.

      2. Description of the planned revisions

      See section 3 below.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Here is a point-by-point reply to the queries of the three referees, with indication of the revisions introduced in the manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      • *In this manuscript, Marefati et al report an Lgr5-independent lineage in the regenerating intestine using in vitro organoids and an in vivo injury-coupled lineage tracing model. In organoids, collagenase/dispase dissociation resulted in "immortal spheroids" that maintain a cystic and undifferentiated phenotype in the absence of standard growth factors (Rspondin/Noggin/EGF). Bulk RNAseq of spheroids demonstrates downregulation of classical CBC signatures and upregulation of fetal spheroid, mesenchymal, inflammation and regenerative signatures. In mice, Villin-Cre lineage tracing revealed some Villin-negative progenies that lack reporter tracing throughout crypt-villus ribbons after injury.*

      *The authors proposed that an Lgr5-independent population supports the regenerative response upon CBC depletion. A major caveat of this study is that the identification of this population is based on the absence of VilCre expression.*

      We respectfully disagree. It is precisely this characteristic that makes our study interesting. Whereas mosaicism of transgene expression is widespread and usually of little significance, our study shows that the rare VilCre-negative cells in the intestinal epithelium do not show this phenotype at random: they specifically give rise to what we call adult spheroids and regenerating crypts, which cannot be due to chance. The absence of VilCre expression allows tracing these cells from the zygote stage of the various VilCre/Rosa26 reporter mice. We have modified our text to emphasize this point.

      *It is surprising that there is no characterisation of Lgr5 expression throughout the manuscript whilst claiming an Lgr5-independent lineage.*

      We understand the perplexity of the referee not to see direct Lgr5 expression data in our manuscript, given our title. However, our point is that it is the cells at the origin of adult spheroids and the regenerating crypts we have identified that are Lgr5-negative, not the spheroids or the regenerated crypts themselves. Those are downstream offspring that may, and indeed have, gained some Lgr5 expression (e.g. figure 3F). We believe that our data showing that VilCre-negative spheroids are not traced in Lgr5-CreERT2/Rosa reporter mice convincingly demonstrate absence of Lgr5 expression in the cells at the origin of adult spheroids (figure 4G). We think that this experiment is better evidence than attempts to show absence of two markers (Tom and Lgr5) in the rare "white" cells present in the epithelium. Regarding the Lgr5 status of cells at the origin of the regenerating "white" crypts that we have identified, the early appearance of these crypts following ablation of CBC (i.e. Lgr5+ve) cells is a strong argument that they originate from Lgr5-negative cells. Regarding the scRNAseq experiment, Lgr5 transcripts are notoriously low and difficult to measure reliably in CBCs (Haber et al 2017). However, blowing up the pertinent regions of the merged UMAP allows showing some Lgr5 transcripts in clusters 5,6 and none in cluster 1 of figure 8GH. Given the very low level of detection, we had chosen not to include these data in the manuscript, but we hope they may help answer the point of the referee (see portion of UMAP below, with Olfm4 as a control, together with the corresponding violin plot). Several markers that gave significant signals in the CBC cluster (Smoc2, Axin2, Slc12a2) were virtually undetectable in the Olfm4-low /Tom-negative cluster of our scRNAseq data (figure 8I) supporting our conclusion.

      Although the research question is potentially interesting, the concept of epithelial reprogramming upon injury is well documented in the field. The data generated in this manuscript also seem to be preliminary and lack detailed characterisation. Below are specific comments.

      We do not question the existence of epithelial reprogramming upon injury. We believe our data show, in addition to this well demonstrated phenomenon, the existence of rare cells traced by absence of VilCre expression that are at the origin of a developmental cell lineage distinct from Lgr5+ stem cells and also implicated in regeneration.

      • Expression of Lgr5 should be properly characterised throughout the manuscript in both organoid models and injury-induced regeneration in vivo.

      See above for a detailed answer to this point.

      • An important question is the origin of these "Lgr5-independent" adult spheroids. They look and appear like fetal organoids, which could be induced by injury (e.g. upon collagenase/dispase dissociation). Have the authors tried to culture fetal spheroids in BCM over extensive period of time? Do they behave the same? This would be a great way to directly compare the collagenase/dispase-derived organoids with fetal origin.

      Fetal spheroids require ENR for survival and die in BCM. We have chosen to illustrate this point in Fig2A by showing that, contrary to adult spheroids, they die even when only Rspondin is missing.

      • Fig 1C: Why is the replating spheroid culture time different between mesenchymal cells and conditioned medium?

      We took the earliest time showing convincingly the return to the organoid phenotype. This timing difference does not modify the conclusion that EDTA organoids becoming spheroid-like when exposed to factors originating from mesenchymal cells revert to the organoid phenotype when returned to ENR medium without mesenchymal influence.

      • *It is unclear how the bulk RNA-seq data in Fig. 3 were compared. How long were the adult organoids and spheroids cultured for (how many passages)? Were they cultured in the same condition or were they in ENR vs BCM?*

      Both EDTA organoids and spheroids displaying a stable phenotype were used in this experiment. Organoids were collected at passage 4, day 5; spheroids were collected at passage 9, day 3.

      As stated in the legend to the figure: "...to allow pertinent comparison spheroids and organoids were cultured in the same ENR-containing medium...".

      These are important information to consider when interpreting the results. For instance, are Ptgs1 & Ptgs2 expression in adult spheroids the same in ENR vs BCM? Are the gene signatures (regenerative, fetal and YAP) changed in adult spheroids culturing in ENR vs BCM?

      We did compare bulk RNAseq of EDTA organoids to ENR-cultured spheroids, short term (passage 6, day 6) BCM-cultured spheroids and long term BCM-cultured (passage 26, day 6) spheroids. To avoid overloading the manuscript these data were not shown in the original manuscript. In summary the BCM-cultured spheroids display a similar phenotype as those cultured in ENR, but with further de-differentiation. See in revision plan folder the results for PTGS, some differentiation markers and fetal regenerative markers including YAP induced genes.

      We have included a brief description of these data in the new version of the manuscript and added an additional supplementary file (Suppl table 2) presenting the whole data set.

      • It is stated: "In agreement with their aptitude to grow indefinitely, adult spheroids express a set of upregulated genes overlapping significantly with an "adult tissue stem cell module" [159/721 genes; q value 2.11 e-94) (Fig.S2F)].". What is the definition of "indefinitely"? Are they referring to the Fig 1B where spheroid were passaged to P10? The authors should avoid the term "indefinitely" but use a more specific time scale, e.g. passages, months etc.

      We agree that the term indefinitely should be avoided, as it is vague. We have introduced the maximum number of passages during which we have maintained the stable spheroid phenotype (26 passages). Also worth noting, the spheroids could be frozen and cultured repeatedly over many months.

      SuppFig 3D: Row Z-Score is missing the "e" in Score.

      Corrected

      • Fig 4E: Figure legend says QNRQ instead of CNRQ.

      Corrected

      • Fig 4G: The brightfield image of adult spheroids 5 days after 3x TAM injections doesn't look like a spheroid. It seems to be differentiating.

      True, the choice was not the best as the spheroids started to darken. When further replated, however, the offspring of these spheroids showing a clear phenotype remain negative 30 days after tamoxifen administration as shown on the figure. We are sorry, but for reasons explained in section 4 below, we cannot redo the experiment to get a better picture.

      • Fig 4: Most mouse model data are missing the number of mice & their respective age used for organoid isolation.

      We have introduced these data in the legend.

      • *Fig 4A-D, H-G: How was fluorescent signal of organoids quantified?*

      The settings of fluo imaging or time of LacZ staining were the same for organoids and spheroid pictures. This has been added to the material and methods of the figure and an example is shown below for Rosa26Tomato.

      *How many images?*

      2 per animal per condition.

      *Were there equal numbers of organoids?*

      No; see the total number of elements counted, which has been added to the figure.

      This all needs to be included in methods/figure legends.

      We have introduced additional pertinent information in the material and methods section.

      • Figure 4B-D, G-H: Which culturing conditions were used for adult spheroids? Original method or sandwich method?

      These data were obtained with the original protocol.

      • Fig 6D-E: Please add the timepoint after DT administration these samples are from. It is not listed in text or figure legend.

      These samples were those obtained from mice sacrificed at the end of the 5 day period as indicated in panel A. This has been emphasized in the legend of the figure.

      • SuppFig 6D: again timepoint is missing.

      In this experiment all samples were untreated as indicated. This has been emphasized in the legend of the figure.

      • SuppFig 6: How were the crypts of these mice (DT WT & DT HE) isolated? Was this via EDTA?

      This was RNA extracted from total uncultured EDTA-released material (crypts). This has been emphasized in the legend of the figure.

      Also, what is the timepoint for isolation for these samples? Even if untreated, the timepoint adds context to the data. Please add more context to describing these different experiments, either in the figure legends or methods section.

      All these experiments were from 2 month old animals. We have indicated this in the legend of the figure.

      • SuppFig 6E: The quality of the heatmap resolution is too poor to read gene names.

      We have improved the resolution of the figure and hope the gene names are readable now.

      • 5-7, are the regenerating crypt-villus units fully differentiated or are they maintained in the developmental state? Immunostaining of markers for stem cells (Lgr5), differentiated lineages (Alpi, Muc2, Lyz, ChgA etc.) and fetal state (Sca1, Trop2 etc) should be analysed in those "white" unrecombined crypt-villus units.

      The differentiation phenotype is shown by the clear presence of morphologically-identified Paneth and Goblet cells. We agree that specific immunostainings could have been performed to further explore this point. Regarding the fetal state, Clu expression was shown during the regeneration period (see figure 7D,E).

      Unfortunately, for reasons explained in section 4 below, we are not in a position to perform these additional experiments.

      • The following text needs clarification: "The kinetics of appearance of newly formed un-recombined ("white") crypts was studied after a single pulse of DT (Fig.7A). This demonstrated an increase at 48 hours, with further increase at day 10 and stable maintenance at day 30. The presence of newly formed white crypts one month after toxin administration indicates that the VilCre-negative lineage is developmentally stable and does not turn on the transgene during differentiation of the various epithelial lineages occurring after regeneration (Fig.7B).

      *Comment: The "newly formed" is an overstatement, the data doesn't conclude that those are "new" crypts.*

      Unless we misunderstand the point, we think we can write that a fraction of "white" crypts must be "newly formed", since they are in excess of those present in untreated animals at the same time point.

      *The end of the sentence states that these "white" crypts form developmentally stable lineages, thus these white crypts at day 30 could originate from the initial injury.*

      As stated above, we consider that crypts found in excess of those present in untreated animals result from the initial injury.

      *There was no characterisation of the various epithelial lineages. Are they fully differentiated?*

      See above the point related to Paneth cells and Goblet cells.

      Is Lgr5 expressed one month after toxin administration? Can the VilCre neg lineage give rise to CBCs?

      We have tried hard to show presence or absence of Lgr5 in white crypts at the various times following DT administration. We tried double RFP / Lgr5-RNA scope labeling and double GFP/RFP immunolabeling. Unfortunately, we could not get these methods to produce convincing specific labeling of CBCs in homeostatic crypts, which explains why we could not reach a conclusion regarding the white crypts.

      However, there is an indirect indication that "chronic" white crypts (i.e. those caused by DTR expression in CBC, plus those observed 30 days after DT administration) do not express Lgr5. Indeed, acute regeneration indicated by Clu expression at day 5 in Fig.7C is lower in white crypts than in red ones, strongly suggesting that white crypts preexisting DT administration (the "chronic" ones) do not express Lgr5DTR.

      The relationship between white crypt generation and appearance of Clu-positive revival cells (Ayyaz et al., 2019) was then explored. In agreement with others and similar to what happens in the irradiation model, (Ayyaz et al., 2019; Yuan et al., 2023) Clu-positive cells were rare in crypts of untreated mice and their number transiently increased forty-eight hours after a single pulse of DT, and more so after three pulses of DT (Fig.7C,D).

      Comment: Comparing 1 pulse at day 2 vs 3 pulses at day 5 makes the data hard to interpret. How is the Clu ISH level for 1 pulse at day 5? Are they equivalent?

      After a single pulse of DT, Clu is only transiently increased. As shown by Ayyaz et al, it is back to the starting point at day 5 (supplementary figure 4 of Ayyaz et al).

      Clu-positive cells were less frequently observed in white crypts (see "Total" versus "White" in Fig.7C). This fits with the hypothesis that Clu expression marks acutely regenerating crypts and that a proportion of the white crypts are chronically regenerating due to DTR expression in CBCs."

      *Comment: I believe the authors suggested that the discrepancy of less Clu expression in white crypts is due to the ectopic expression of DTR in CBCs causing low grade injury without DT administration. This means that some white crypts could have been formed before the administration of DT, and thus are on a different regenerative timeline compared to the white crypts formed from DT administration.*

      Yes, this is our interpretation. We have clarified it in the text.

      Is there any proof of the chronic regeneration? Immunostaining of chronic regenerative markers such as Sca1, Anxa1 or Yap1 nuclear localization would support the claim. It'd be important to show only the white crypts, but not the RFP+ ones, show regenerative markers.

      We think that the higher steady-state number of white crypts in untreated Lgr5-DTR animals, compared to wild type siblings, indicates chronic low-grade regeneration, which is supported by the RNAseq data (Suppl fig6). It must be noted, however, that this phenotype is mild compared to the well described fetal-like regeneration phenotype described in most injury models. Since these white crypts were made at undetermined earlier stages, the great majority of them are not expected to show markers of acute regeneration like Clu, Sca1...

      Fig 7D-E: What are the timepoints of harvest for HE-WT-HE 1 pulse DT mice and HE-HE-HE PBS injected mice?

      We have added this information in the figure.

      *Fig 8-9: Regarding the CBC-like Olfm4 low population, what is the status of Lgr5? This should be shown in the figure since the argument is that this is an Lgr5-independent lineage.*

      See response to the second point.

      And what about the regenerative, Yap, mesenchymal and inflammatory signatures? Are they enriched in the white crypts similar to the in vitro spheroids?

      In a portion of white crypts, those we believe are newly formed after CBC ablation (see above), there is a transient increase in Clu, which may be considered a marker of Yap activation. In the CBC-like Olfm4 low cells, as seen by scRNAseq, there is nothing like an actively regenerating phenotype. This is expected, since these cells come from homeostatic untreated VilCre/Rosa26Tom animals and are supposed to be quiescent, "waiting to be activated".

      Reviewer #1 (Significance (Required)):

      Strengths: The study employed a range of in vitro and in vivo models to test the hypothesis.

      *Limitations: Unfortunately, the models chosen did not provide sufficient evidence to draw the conclusions. Injury induced reprogramming, both in vivo and in vitro, has been well documented in the field. The new message here is to show that such reprogrammed state is continuous rather than transient; instead of regenerating Lgr5+ stem cells, it can continue to differentiate to all cell lineages in Lgr5-independent manner.*

      We respectfully disagree with this analysis of our results. What we show is not "that such reprogrammed state is continuous rather than transient; instead of regenerating Lgr5+ stem cells, it can continue to differentiate to all cell lineages in Lgr5-independent manner", but that a quiescent stem cell lineage, not previously identified, is activated to regenerate a portion of crypts following CBC ablation. These cells are not reprogrammed; they correspond to a developmental lineage waiting to be activated and keep their VilCre-negative state for at least 30 days. We believe that their "by default tracing" (VilCre negative from the zygote stage) is as strong evidence for the existence of such a lineage as positive lineage tracing would be. The increase in crypts originating from this lineage after CBC ablation indicates that it is implicated in regeneration. We do not question the well-demonstrated plasticity-associated reprogramming taking place during regeneration; we simply suggest that this would coexist with the involvement of the quiescent VilCre-negative lineage we have identified.

      *However, through the manuscript, there was no immunostaining of Lgr5 and other differentiation markers. The conclusion is an overstatement without solid proof.*

      We have provided the best answer we could to this point in our answer to the second question of the referee above.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Marefati et al. developed a novel approach to generate spheroids from adult intestinal epithelium using a collagenase/dispase based protocol. Adult spheroids were found to be distinct from classic budding-type organoids normally generated from EDTA based release of the crypt epithelium. Transcriptional profiling indicated that adult spheroids were undifferentiated and similar to regenerating crypts or fetal spheroids. To identify the cell of origin that generates adult spheroids, the authors labelled epithelial cells with VilCreERT-LSL-Tom, VilCre-LSL-GFP and Lgr5CreERT-LSLTom mice. From these experiments the authors conclude that spheroids are only generated from Vil-Cre negative and Lgr5 negative cells. Next the authors deleted the anti-apoptotic gene Mcl1 using Vil-CreERT mice. This led to a strong apoptotic response throughout the crypt epithelium and tissues processed from knockout mice readily generated spheroids, and in vivo, replenishment of the gut epithelium was mediated by unrecombined cells. In a second model, CBCs were ablated using Lgr5DTR mice and VilCre negative cells were found again to contribute to regeneration of the crypt epithelium. Finally, based on the absence of Vil-Cre reporter activity, the authors were able to sort out and perform scRNAseq to profile VilCre negative cells. These cells were found to be quiescent, express the stem cell marker Olfm4 and were also abundant in ribosomal gene expression.

      The fact that the authors developed a protocol to reproducibly generate fetal-like spheroids from adult tissue is an important advancement in the field. Previous reports have shown that treatment with various small molecule inhibitors can revert budding organoids into a spheroid morphology, but this manuscript demonstrates that spheroids can also be generated from otherwise untreated cells. This new methodology will provide new tools to dissect the molecular determinants of fetal/regenerative cells in the gut. Based on this, the manuscript should attract a significant amount of attention in the intestinal field.

      As pointed out by the authors themselves the study has important limitations that diminish enthusiasm. The primary issue relates to the inability of the team to identify markers of VilCre neg cells other than the fact that these cells are Olfm4+ and quiescent. Nonetheless, for the reasons stated above the manuscript should reach the target audience within the research community, if the authors can address the specific points below related to issues with methodology as well as defining more precisely the characteristics and growth requirements of adult spheroid cultures.

      Thank you for this positive analysis of our study.

      Major comments

      The main conclusion of the study is that Vil-Cre neg cells are rare quiescent Olfm4+ crypt cells. If this is the case, then standard EDTA treatment should release these cells as well. Consequently, spheroids should also emerge from isolated crypts grown in the absence of ENR. If this is not the case how do the authors explain this?

      We have tried hard to generate spheroids by culturing EDTA organoids in medium lacking ENR and by treating EDTA organoids with collagenase/dispase, without success. Therefore, we are left with the conclusion that spheroid-generating cells must be more tightly attached to the matrix than those released by EDTA, and that it is their release from this attachment by collagenase that triggers a regeneration-like phenotype. This hypothesis is supported by several models of regeneration in other tissues as indicated in our references (Gilbert et al., 2010; Machado et al., 2021; Montarras et al., 2005).

      From the text the authors appear to suggest that growth of adult spheroids is dependent initially on "material" released by collagenase/dispase treatment. An obvious candidate would be mesenchymal cells, which are known to secrete factors such as Wnts and PGE2 that drive spheroid morphology. To test this, the authors should treat spheroid cultures with Porcupine and/or PGE2 inhibitors.

      We followed similar reasoning, considering that spheroids strongly express Ptgs1 and Ptgs2 (Figure 3A). We thought their phenotype might be maintained by autocrine prostaglandin action. We tested aspirin, a Ptgs inhibitor, which was without effect on the spheroid phenotype. In addition, we explored a wide variety of conditions to test whether they would affect the spheroid phenotype [aspirin (see above), cAMP agonists/antagonists, YapTaz inhibitors (verteporfin and CA3), valproic acid, Notch inhibitors (DAPT, DBZ, LY511455), all-trans retinoic acid, NFkB inhibitors (TCPA, BMS), TGFbeta inhibitor (SB431542)]. As these results were negative, we did not include them in the manuscript.

      If these inhibitors block growth then this would suggest that either stromal cells or autocrine signalling involving these pathways is important. Overall, more in-depth analysis of the growth requirements of adult spheroids is required.

      Figure 1d indicates that adult spheroids can be propagated for at least 10 passages. The abstract mentions they are "immortal". The text itself does not address this issue. More precise information as to how long spheroids can be propagated is required. If these cultures can be propagated for 10 passages or more it becomes important to determine what nutrients/mitogens in the basal media are driving growth? Alternatively, what is the evidence that spheroid cultures are completely devoid of mesenchymal cells. The text only mentions that "Upon replating, these spheroids could be stably cultured free of mesenchymal cells (Fig.1B)". No validation is shown to support this.

      We agree that "immortal" is not a good way to characterize our spheroids, as also pointed out by referee 1. We have changed that in the text, indicating that the maximum number of replatings we tested was 26 and replacing "immortal" with "stably replatable". Of note, the spheroids could be frozen, thawed and recultured many times.

      Related to the question whether mesenchymal cells could still contaminate the spheroid cultures, we can provide the following answers:

      • No fibroblasts could be seen in replated cultures and multiple spheroids could be repeatedly propagated from a single starting spheroid.
      • The bulk RNAseq experiment comparing organoids to ENR or BCM cultured spheroids show, despite expression of several mesenchymal markers (see matrisome in Fig3), absence of significant expression of Pdgfra (see in revision plan folder for CP20Millions results from the raw data of new suppl table 2, with Clu, Tacstd2 and Alpi shown as controls).
      • Regarding the nutrients/mitogens in the medium driving spheroid growth, we did not explore the point further than showing that they grow in basal medium (i.e. advanced DMEM), given that the presence of Matrigel makes it difficult to pinpoint what is really needed.

      In Figure 2, the authors describe the growth requirements for adult spheroids and indicate that spheroids grown in ENR or EN became dark and shrink. The representative images showing this are clear, but this analysis should be quantified.

      Added to the manuscript.

      In SF3, the gene expression profile of organoids from the sandwich method only partially overlaps with that of organoids from the old protocol. What are the gene expression differences between the 2 culture systems? Secondly, the sandwich method appears to sustain growth of Tom+ spheroids based on RNAseq and the IF images. This suggest that Vil-Cre negative cells are not necessarily the only source of adult spheroids and thus this experiment seems to indicate that any cell may be converted to grow as a spheroid under the right conditions. These points should be addressed.

      Looking back at our data in order to answer the point raised by the referee, we realized that we had inadvertently compared organoids to ENR-cultured spheroids generated by the first protocol to BCM-cultured spheroids generated by the sandwich method. We have corrected this error in a new version of suppl fig3. This shows increased correspondence between genes up- or downregulated in the spheroids obtained in the two protocols (from 49/48% to 57/57%; Venn diagram in the new figure). We agree that, even after this correction, the spheroids obtained with the two protocols present sizeable differences in their transcriptome. However, considering the very different way these spheroids were obtained and cultured initially, we do not believe this to be unexpected. The important point in our opinion is that the core of the up- and down-regulated genes typical of the de-differentiation phenotype of adult spheroids is very similar, as shown in the heatmap (which was made with the correct samples!). Also, a key observation is that both kinds of spheroids survive and can be replated in basal medium. As already stated, this characteristic is only seen in rare cases [spheroids obtained from rare FACS-purified cells (Smith et al 2018) or helminth-infected intestinal tissue (Nusse et al. 2018)]. Together with the observation that the majority of them are not traced by VilCre, this constitutes what we consider the hallmark of the spheroids described in our study. As shown in figure 4E (old protocol) and Suppl Fig.3 (sandwich protocol), both red and white spheroids were extremely low in VilCre expression. As stated in the text, the fact that some spheroids are nevertheless red is most probably related to the extreme sensitivity of the Rosa26Tom marker to recombination (Liu et al., 2013), but this does not mean that there are two phenotypically different kinds of spheroids. It means that the arbitrary threshold of Rosa26Tom recombination introduces an artificial subdivision of spheroids with no phenotypical significance.

      Regarding the point made by the referee that "any cell may be converted to grow as a spheroid under the right conditions", we agree and have shown with others that organoids indeed acquire a spheroid phenotype when cultured, for instance, in fibroblast-conditioned medium (see suppl fig1B and (Lahar et al., 2011; Roulis et al., 2020) quoted in the manuscript). However, these spheroids cannot be propagated in basal medium, and revert to an organoid phenotype when put back in ENR (Suppl fig1B).

      *In Figure 4, the authors conclude that spheroids do not originate from Lgr5 cell derived clones even after 30 days post Tam induction. Does this suggest that in vivo and under homeostatic conditions VilCre neg cells are derived from a distinct stem cell pool or are themselves a quiescent stem cell? Given the rarity of VilCre neg cells, the latter seems unlikely.*

      Despite their rarity, we believe VilCre-negative cells observed under homeostatic conditions are themselves quiescent stem cells. Indeed, if they were derived from a larger stem cell pool, this pool should also be VilCre-negative, and we do not see such a larger number of VilCre-neg cells under homeostatic conditions.

      The problem with the original assertion is that Lgr5-CreERT mice are mosaic and therefore not all Lgr5+ cells are labelled in this model. "White" spheroids may thus derive from cells that in turn derive from these unlabelled Lgr5 cells.

      We had considered the possibility that mosaicism [very low for VilCre (Madison et al., 2002); in the 40-50% range for Lgr5CreERT2 (Barker & Clevers, Curr Protoc Stem Cell Biol. 2010, Chapter 5)] could explain our data. We think, however, that we can exclude this possibility on the basis that spheroids do not conform to the expected ratio of unrecombined cells, given the observed level of mosaicism. Indeed, for VilCre, a few percent, at most, of unrecombined cells in the epithelium translates into almost 100% unrecombined spheroids. For Lgr5CreERT2 mice, the mosaicism level is in the range of 40%, which is what we observe for EDTA organoids (Figure 4G), while spheroids were in their vast majority unrecombined.

      We have included a discussion about the possible role of mosaicism in the new version.

      ATACseq experiments were briefly mentioned in the manuscript but unfortunately little information was extracted from this experiment. What does this experiment reveal about the chromatin landscape of adult spheroids relative to normal organoids?

      We only performed this experiment to search for an explanation for the paradoxical absence of expression of the VilCre transgene in spheroids, despite robust expression of endogenous villin (Suppl Fig.4). We chose to show the chromatin landscape of a gene equally expressed in both organoids and spheroids (Krt19), a gene specifically expressed in spheroids (Tacstd2) and the endogenous Villin gene, also expressed in both. We believe that the observation of a clear difference in the pattern of chromatin accessibility around the endogenous villin gene in organoids and spheroids provides an explanation for the observed results. The cis regulatory sequences needed for expression of the endogenous villin gene seem to be different in organoids and spheroids, which may explain why the regulatory sequences present in the transgene (only 12.4kb) might not allow expression of the transgene in spheroids. We have added a sentence in the manuscript clarifying this point. What is obviously missing is the chromatin landscape around the VilCre transgene, but this is beyond reach in this kind of experiment.

      Reviewer #2 (Significance (Required)):

      The fact that the authors developed a protocol to reproducibly generate fetal-like spheroids from adult tissue is an important advancement in the field. Previous reports have shown that treatment with various small molecule inhibitors can revert budding organoids into a spheroid morphology, but this manuscript demonstrates that spheroids can also be generated from otherwise untreated cells. This new methodology will provide new tools to dissect the molecular determinants of fetal/regenerative cells in the gut. Based on this, the manuscript should attract a significant amount of attention in the intestinal field.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): CR-2024-02491

      An Lgr5-independent developmental lineage is involved in mouse intestinal regeneration

      Marefati et al.

      Homeostatic maintenance of the intestinal epithelium has long been thought to rely upon Wnt signaling responsive Lgr5-expressing stem cells that reside at the crypt base.

      However, myriad mechanisms or populations have been reported to underlie epithelial regeneration after injury. Many groups have reported that reacquisition of a fetal-like intestinal phenotype is an important part of the regenerative response, however the originating cell type has not been definitively identified. Herein, the authors demonstrate that cells from adult homeostatic intestine can generate immortal spheroids that resemble fetal spheroids and are derived independent of Lgr5+ intestinal stem cells (ISCs). The authors then draw the conclusion that this indicates that a hierarchical stem cell model applies to regeneration of the intestinal epithelium, in addition to the plasticity model.


      Comments:

      1. Please indicate what species is used for studies in Fig 1.

      All experiments were performed in Mus musculus.

      Please clarify if Figure 2 studies utilize Matrigel or not.

      Yes

      RNA-seq analyses of adult intestinal generated spheroids lack the granularity of single cell analyses and thus it is unclear if this is a homogeneous population or if the population has diversity across it (i.e., enteroids/organoids have a high level of diversity). Many of the conclusions from the RNA-seq study are broad and generalized; for example, Fig 3F indicates that markers of the +4 ISC populations (Bmi1, tert, lrig1, hopx) were all expressed similarly in adult spheroids as compared to adult organoids. However, while this may be true in the bulk-RNA-seq analyses, clearly scRNA-seq would provide a better foundation to make this statement, as enteroids/organoids are comprised of heterogeneous subpopulations, and it might indicate that these +4 markers have only very low expression in the spheroids. Based upon these concerns, misconclusions are likely to be drawn.

      We agree that performing scRNAseq on adult spheroid populations would be worthwhile in future studies, to explore their possible heterogeneity. We nevertheless believe that our scRNAseq performed on homeostatic intestinal tissue from VilCre/Rosa26Tom mice identifies Olfm4-low VilCre-neg cells that are likely at the origin of adult spheroids and display a quite homogeneous phenotype.

      *The language around Figure 4 results is confusing. Please define "white" and "red". It might be simpler to designate recombined versus not recombined lineage.*

      We have clarified this in the figure.

      The hypothesis that collagenase/dispase solution acts as a proxy for injury is not demonstrated and backed by data. Thus, it is difficult to make the conclusion that this approach could represent a "stable avatar" of intestinal regenerating cells. It is clear that subpopulations of crypt-based cells generate spheroids in culture without collagenase/dispase (see the cited reference Smith et al, 2018).

      Smith et al. clearly demonstrate the possibility of obtaining spheroids with properties probably similar to ours from EDTA-derived intestinal crypt cells. However, they need to prepurify them by FACS. In addition, Nusse et al. describe spheroids similar to ours after infection of the intestine by helminths (Nusse et al. 2018). In our case, and for most labs preparing enteroids with the EDTA protocol, the result is close to 100% organoids. Even if we treat EDTA organoids with collagenase, we do not obtain spheroids. This brought us to the conclusion that spheroid-generating cells must be more tightly attached to the matrix than CBCs and that it is their release from the matrix that activates the spheroid regeneration-like phenotype. This hypothesis is supported by several models of regeneration in other tissues as indicated in our references (Gilbert et al., 2010; Machado et al., 2021; Montarras et al., 2005).

      A study based on the absence of recombination in a VilCre lineage tracing scenario is not well-established to be a strong experimental approach, as there are many reasons why cells may not be lineage marked. In order to use this system as the authors intend, they first need to demonstrate that villin is not expressed in the discrete cell population that they are targeting. For the presented observational studies, this would be difficult to do. While they do demonstrate differences in chromatin accessibility between cells from organoids versus spheroids (fig s4), some of these differences could merely be due to the bulk analytical nature of the study and the lack of comparing stem cell populations from spheroids to stem cell populations from organoids, since the spheroids are likely homogeneous whereas the organoids only have a small fraction of stem cells and thus represent a mix of stem cell and differentiated cell populations. The authors do not demonstrate that villin protein expression varies in these cells.

      If it were found that villin is not expressed in their "novel" population, then one would expect that the downstream use of villin-based recombination would demonstrate the same recombination potential (i.e., Mcl1 would not be recombined). Both recombination studies in Fig 6 are difficult to interpret, and thus it is not clear if these studies support the stated conclusions. Quantification of number of crypts that are negative should be reported as a percentage of recombined crypts.

      We are sorry, but there seems to be a complete misunderstanding of our data regarding the point raised by the referee. The important point of our initial observation is that, despite robust expression of villin in spheroids, the VilCre transgene is not expressed (see figure 4E). This, in our opinion, makes absence of VilCre expression (or of Rosa marker recombination) a reliable marker of a new developmental lineage. All the data in figure 4 constitute an answer.

      *The reasoning about heterogeneity of cell type in organoids versus probable homogeneity of spheroids is well taken. However, as the endogenous villin gene is expressed in all cells of both organoids and spheroids, it is highly significant that only spheroids do not express the transgene. *

      We performed the ATACseq experiment to search for an explanation for the paradoxical absence of expression of the VilCre transgene in spheroids, despite robust expression of endogenous villin (Suppl Fig.4). We chose to show the chromatin landscape of a gene equally expressed in both organoids and spheroids (Krt19), a gene specifically expressed in spheroids (Tacstd2) and the endogenous Villin gene, also expressed in both. We believe that the observation of a clear difference in the pattern of chromatin accessibility around the endogenous villin gene in organoids and spheroids provides an explanation for the observed results. The cis regulatory sequences needed for expression of the endogenous villin gene seem to be different in organoids and spheroids, which may explain why the regulatory sequences present in the transgene (only 12.4kb) might not allow expression of the transgene in spheroids. We have added a sentence in the manuscript clarifying this point. What is obviously missing is the chromatin landscape around the VilCre transgene, but this is beyond reach in this kind of experiment.

      *Figure 8 indicates that the cell population identified by scRNA-seq may be quiescent. Companion IF or IHC should be conducted to confirm this finding, as well as other conclusions from the informatics conducted.*

      We agree that additional experiments could be performed to support this point. We are unfortunately not in a position to perform these experiments (see section 4 below).

      Clearly the data is intriguing, however, the conclusion is strong and is an over interpretation of the presented data. There are a number of validation or extension data that would enhance the overall interpretation of the study: 1. validation of scRNA-seq or bulk RNA-seq concepts by protein staining of intestinal tissues in the damage model will serve as a secondary observation. 2. identification of the ISC that they are defining is critical and important. There is already the notion that this cell type exists and it has been shown with various different markers. 3. expand the analyses of the fetal-like expression profiling to injured intestines to demonstrate that the lineage negative cells indeed express fetal-like proteins. 4. expand the discussion of the Clu+ cell type. Is this cell the previously described revival cell? If so, how does this body of work provide unique aspects to the field?

      We agree that all these suggested experiments could be performed and would be of interest. However, we consider that they would not modify the main message of our study and would only constitute an expansion of the present work. As already stated, we are not in the position to perform them (see section 4).

      *There is some level of conflicting data, with the stem population being proliferative in culture stimulated by the stromal cells, but quiescent in vivo and also based upon scRNA-seq data in Fig 9.*

      We do not see any conflict in our observation regarding this point. The observation that cells that are quiescent in vivo become proliferative when subjected to culture (with or without addition of stromal cells) is routinely made in a multitude of cell culture systems. In particular, it has been shown that intestinal tissue dissociation activates the Yap/Taz pathway, resulting in proliferation (Yu et al. Hippo Pathway Regulation of Gastrointestinal Tissues. Annual Review of Physiology, 2015 Volume 77, 201-227).

      Many of the findings have been previously reported: Population that grows as spheroids (Figure 2), Population that is Wnt independent (Figure 2), Lgr5 independent regenerative growth of the intestine (figure 3F, Figure 4), Clu+ ISCs drive regeneration (Figure 7).

      Whereas these individual findings have indeed been reported, it was in a different context. We strongly disagree with the underlying suggestion that our study would not bring new information. We have identified here a developmental lineage involved in intestinal regeneration that has not been described up to now.

      Minor comments:

        • The statement that spheroids must originate from collagenase/dispase digested material might be an overstatement, as spheroid generation from EDTA-treated intestines has been previously reported (Smith et al, 2018).

      See answer to point 4 above.

      *Overall, while the study includes an extensive amount of work and different approaches, a foundationally supported novel finding is missing. Many of the statements have already been demonstrated by others in the field. In addition, one of the most intriguing aspects of the study is that the stromal population impacts this stem cell population, however, interactions and factors stimulating the crosstalk are not addressed.*

      Reviewer #3 (Significance (Required)):

      Overall, while the study includes an extensive amount of work and different approaches, a foundationally supported novel finding is missing. Many of the statements have already been demonstrated by others in the field. In addition, one of the most intriguing aspects of the study is that the stromal population impacts this stem cell population, however, interactions and factors stimulating the crosstalk are not addressed.

      We can only disagree.

      4. Description of analyses that authors prefer not to carry out


      We have answered most questions raised by the referees by explaining our view, by clarifying individual points and, in several cases, by providing additional information that was not included in the original manuscript.

      In a limited number of cases when additional experiments were suggested, we were unfortunately obliged to write that we are not in a position to perform them. This is because my lab is closing after more than fifty years of uninterrupted activity. There will unfortunately be nobody to perform additional experiments.

      Nevertheless, as written by referees 1 and 2, we believe that the revised manuscript, as it stands, contains data that will be of interest to people in the field and may be the basis for future developments. We hope the editors will find interest in publishing it.

    1. Reviewer #2 (Public Review):

      Summary:

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping.

      Strengths:

      There are multiple strengths to this paper, the first of which is pointing out how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates which are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory).

      Weaknesses:

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one!

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think:

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae). In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly?

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections?

    1. Author response:

      We thank you for the opportunity to provide a concise response. The criticisms are accurately summarized in the eLife assessment:

      the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks.

      The essence of our study is to propose the adoption of the Haldane model of genetic drift, based on the branching process, in lieu of the Wright-Fisher (WF) model, based on sampling, usually binomial.  In addition to some extensions of the Haldane model, we present 4 paradoxes that cannot be resolved by the WF model. The reviews suggest that some of the paradoxes could be resolved by the WF model, if we engage prior literature sufficiently.

      We certainly could not review all the literature on genetic drift as there must be thousands of papers. Nevertheless, the literature we do not cover is based on the WF model, which has the general properties that all modifications of the WF model share.  (We should note that all such modifications share the sampling aspect of the WF model. To model such sampling, N is imposed from outside of the model, rather than self-generating within the model.  Most important, these modifications are mathematically valid but biologically untenable, as will be elaborated below. Thus, in concept, the WF and Haldane models are fundamentally different.)

      In short, our proposal is general with the key point that the WF model cannot resolve these (and many other) paradoxes.  The reviewers disagree (apparently only partially) and we shall be specific in our response below.

      We shall first present the 4th paradox, which is about multi-copy gene systems (such as rRNA genes and viruses, see the companion paper). Viruses evolve both within and between hosts. In both stages, there are severe bottlenecks.  How does one address the genetic drift in viral evolution? How can we model the effective population sizes both within and between hosts?  The inability of the WF model to deal with such multi-copy gene systems may explain the difficulties in accounting for the SARS-CoV-2 evolution. Given the small number of virions transmitted between hosts, drift is strong, as we have shown by using the Haldane model (Ruan, Luo, et al. 2021; Ruan, Wen, et al. 2021; Hou, et al. 2023).

      As the reviewers suggest, it is possible to modify the WF model to account for some of these paradoxes. However, the modifications are often mathematically convenient but biologically dubious. Much of the debate is about the progeny number, K.  (We shall use haploid model for this purpose but diploidy does not pose a problem as stated in the main text.) The modifications relax the constraint of V(k) = E(k) inherent in the WF sampling.  One would then ask how V(k) can be different from E(k) in the WF sampling even though it is mathematically feasible (but biologically dubious)?  Kimura and Crow (1963) may be the first to offer a biological explanation.  If one reads it carefully, Kimura's modification is to make the WF model like the Haldane model. Then, why don't we use the Haldane model in the first place by having two parameters, E(k) and V(k), instead of the one-parameter WF model?
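      The contrast drawn above between WF sampling and the two-parameter Haldane model can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the two-point offspring distribution (m offspring with probability 1/m, otherwise none) are assumptions chosen only because they give E(k) = 1 and V(k) = m - 1 with a single tunable parameter.

```python
import random
from statistics import mean, pvariance


def wf_offspring(n, rng):
    """Wright-Fisher sampling: each of n offspring picks a parent uniformly.

    The population size n is imposed from outside, and each parent's
    offspring number is Binomial(n, 1/n), so V(k) = 1 - 1/n, close to E(k) = 1.
    """
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def haldane_offspring(n, m, rng):
    """Branching process with a hypothetical two-point offspring distribution.

    Each parent independently leaves m offspring with probability 1/m and
    none otherwise, so E(k) = 1 and V(k) = m - 1. The next population size
    is whatever the offspring numbers sum to; it is not fixed in advance.
    """
    return [m if rng.random() < 1.0 / m else 0 for _ in range(n)]


rng = random.Random(0)
n = 10000
wf = wf_offspring(n, rng)
hd = haldane_offspring(n, 5, rng)

print("WF:      E(k) =", mean(wf), " V(k) =", pvariance(wf), " next N =", sum(wf))
print("Haldane: E(k) =", mean(hd), " V(k) =", pvariance(hd), " next N =", sum(hd))
```

      Under these assumptions the WF draw returns exactly n offspring with V(k) near 1, whereas the branching draw has V(k) near m - 1 and a next-generation size that fluctuates around n.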

      The Haldane model is conceptually simpler. It allows the variation in population size, N, to be generated from within the model, rather than artificially imposed from outside of the model.  This brings us to the first paradox, the density-dependent Haldane model. When N is increasing exponentially as in bacterial or yeast cultures, there is almost no drift when N is very low and drift becomes intense as N grows to near the carrying capacity.  We do not see how the WF model can resolve this paradox, which can otherwise be resolved by the Haldane model.

      The second and third paradoxes are about how much mathematical models of population genetics can be detached from biological mechanisms. The second paradox about sex chromosomes is rooted in the realization of V(k) ≠ E(k).  Since E(k) is the same between sexes but V(k) is different, how does the WF sampling give rise to V(k) ≠ E(k)? We are asking a biological question that troubled Kimura and Crow (1963), as alluded to above. The third paradox is acknowledged by two reviewers. Genetic drift manifested in the fixation probability of an advantageous mutation is 2s/V(k).  It is thus strange that the fundamental parameter of drift in the WF model, N (or Ne), is missing.  In the Haldane model, drift is determined by V(k) with N being a scaling factor; hence 2s/V(k) makes perfect biological sense.
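      The 2s/V(k) approximation can be checked numerically in a branching process, with no N anywhere in the calculation. The sketch below is an illustration under stated assumptions, not the authors' derivation: it reuses a hypothetical two-point offspring distribution (m offspring with probability (1 + s)/m, otherwise none), for which the extinction probability is the smallest root of q = f(q), where f is the offspring probability generating function.

```python
def offspring_variance(m, s):
    """V(k) for the hypothetical two-point distribution with E(k) = 1 + s."""
    p = (1.0 + s) / m              # probability of leaving m offspring
    return m * m * p - (1.0 + s) ** 2


def survival_probability(m, s, tol=1e-14, max_iter=1_000_000):
    """Probability that a single mutant lineage never goes extinct.

    Iterates q <- f(q) from q = 0, where f(q) = (1 - p) + p * q**m is the
    offspring generating function; the limit is the extinction probability.
    """
    p = (1.0 + s) / m
    q = 0.0
    q_next = q
    for _ in range(max_iter):
        q_next = (1.0 - p) + p * q ** m
        if abs(q_next - q) < tol:
            break
        q = q_next
    return 1.0 - q_next


s = 0.01
for m in (2, 5, 10):
    exact = survival_probability(m, s)
    approx = 2.0 * s / offspring_variance(m, s)
    print(f"m={m:2d}  V(k)={offspring_variance(m, s):.4f}  "
          f"survival={exact:.6f}  2s/V(k)={approx:.6f}")
```

      For small s the two columns should agree closely, and the survival probability falls as V(k) rises even though E(k) is held at 1 + s.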

      We now answer the obvious question: If the model is fundamentally about the Haldane model, why do we call it the WF-Haldane model? The reason is that most results obtained by the WF model are pretty good approximations and the branching process may not need to constantly re-derive the results.  At least, one can use the WF results to see how well they fit into the Haldane model. In our earlier study (Chen, et al. (2017); Fig. 3), we show that the approximations can be very good in many (or most) settings.

      We would like to use the modern analogy of gas-engine cars vs. electric-motor ones. The Haldane model and the WF model are as fundamentally different concepts as the driving mechanisms of gas-powered vs electric cars.  The old model is now facing many problems and the fixes are often not possible.  Some fixes are so complicated that one starts thinking about simpler solutions. The reservations are that we have invested so much in the old models which might be wasted by the switch. However, we are suggesting the integration of the WF and Haldane models. In this sense, the WF model has had many contributions which the new model gratefully inherits. This is true with the legacy of gas-engine cars inherited by EVs.

      The editors also issue the instruction: while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims. 

      We are thankful to the editors and reviewers for the thoughtful comments and constructive criticisms. We also appreciate the publishing philosophy of eLife that allows exchanges, debates and improvements, which are the true spirits of science publishing.

      References for the provisional author responses

      Chen Y, Tong D, Wu CI. 2017. A New Formulation of Random Genetic Drift and Its Application to the Evolution of Cell Populations. Mol. Biol. Evol. 34:2057-2064.

      Hou M, Shi J, Gong Z, Wen H, Lan Y, Deng X, Fan Q, Li J, Jiang M, Tang X, et al. 2023. Intra- vs. Interhost Evolution of SARS-CoV-2 Driven by Uncorrelated Selection-The Evolution Thwarted. Mol. Biol. Evol. 40.

      Kimura M, Crow JF. 1963. The measurement of effective population number. Evolution:279-288.

      Ruan Y, Luo Z, Tang X, Li G, Wen H, He X, Lu X, Lu J, Wu CI. 2021. On the founder effect in COVID-19 outbreaks: how many infected travelers may have started them all? Natl. Sci. Rev. 8:nwaa246.

      Ruan Y, Wen H, He X, Wu CI. 2021. A theoretical exploration of the origin and early evolution of a pandemic. Sci Bull (Beijing) 66:1022-1029.

      Review comments

      eLife assessment 

      This study presents a useful modification of a standard model of genetic drift by incorporating variance in offspring numbers, claiming to address several paradoxes in molecular evolution.

      It is unfortunate that the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks.

      We do not believe that the paradoxes can be resolved.

      In addition, while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors present a theoretical treatment of what they term the "Wright-Fisher-Haldane" model, a claimed modification of the standard model of genetic drift that accounts for variability in offspring number, and argue that it resolves a number of paradoxes in molecular evolution. Ultimately, I found this manuscript quite strange.

      The notion of effective population size as inversely related to the variance in offspring number is well known in the literature, and not exclusive to Haldane's branching process treatment. However, I found the authors' point about variance in offspring changing over the course of, e.g. exponential growth fairly interesting, and I'm not sure I'd seen that pointed out before.

      Nonetheless, I don't think the authors' modeling, simulations, or empirical data analysis are sufficient to justify their claims. 

      Weaknesses: 

      I have several outstanding issues. First of all, the authors really do not engage with the literature regarding different notions of an effective population. Most strikingly, the authors don't talk about Cannings models at all, which are a broad class of models with non-Poisson offspring distributions that nonetheless converge to the standard Wright-Fisher diffusion under many circumstances, and to "jumpy" diffusions/coalescents otherwise (see e.g. Mohle 1998, Sagitov (2003), Der et al (2011), etc.). Moreover, there is extensive literature on effective population sizes in populations whose sizes vary with time, such as Sano et al (2004) and Sjodin et al (2005).

      Of course in many cases here the discussion is under neutrality, but it seems like the authors really need to engage with this literature more. 

      The most interesting part of the manuscript, I think, is the discussion of the Density Dependent Haldane model (DDH). However, I feel like I did not fully understand some of the derivation presented in this section, which might be my own fault. For instance, I can't tell if Equation 5 is a result or an assumption - when I attempted a naive derivation of Equation 5, I obtained E(K_t) = 1 + r/c*(c-n)*dt. It's unclear where the parameter z comes from, for example. Similarly, is equation 6 a derivation or an assumption? Finally, I'm not 100% sure how to interpret equation 7. Is that a variance effective size at time t? Is it possible to obtain something like a coalescent Ne or an expected number of segregating sites or something from this? 
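      For readers following the naive derivation mentioned above, one route to E(K_t) = 1 + r/c*(c-n)*dt is sketched here. It assumes the standard deterministic logistic equation with growth rate r and carrying capacity c, and takes E(K_t) to be the per-capita ratio of successive population sizes; the manuscript's Equation 5 is not reproduced here and may be defined differently.

```latex
\frac{dN}{dt} = rN\left(1 - \frac{N}{c}\right)
\;\Longrightarrow\;
N_{t+dt} \approx N_t + rN_t\left(1 - \frac{N_t}{c}\right)dt ,
\qquad
E(K_t) = \left.\frac{N_{t+dt}}{N_t}\right|_{N_t = n}
       = 1 + r\left(1 - \frac{n}{c}\right)dt
       = 1 + \frac{r}{c}\,(c - n)\,dt .
```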

      Similarly, I don't understand their simulations. I expected that the authors would do individual-based simulations under a stochastic model of logistic growth, and show that you naturally get variance in offspring number that changes over time. But it seems that they simply used their equations 5 and 6 to fix those values. Moreover, I don't understand how they enforce population regulation in their simulations---is N_t random and determined by the (independent) draws from K_t for each individual? In that case, there's no "interaction" between individuals (except abstractly, since logistic growth arises from a model that assumes interactions between individuals). This seems problematic for their model, which is essentially motivated by the fact that early during logistic growth, there are basically no interactions, and later there are, which increases variance in reproduction. But their simulations assume no interactions throughout! 

      The authors also attempt to show that changing variance in reproductive success occurs naturally during exponential growth using a yeast experiment. However, the authors are not counting the offspring of individual yeast during growth (which I'm sure is quite hard). Instead, they use an equation that estimates the variance in offspring number based on the observed population size, as shown in the section "Estimation of V(K) and E(K) in yeast cells". This is fairly clever, however, I am not sure it is right, because the authors neglect covariance in offspring between individuals. My attempt at this derivation assumes that I_t | I_{t-1} = \sum_{i=1}^{I_{t-1}} K_{i,t-1} where K_{i,t-1} is the number of offspring of individual i at time t-1. Then, for example, E(V(I_t | I_{t-1})) = E(V(\sum_{i=1}^{I_{t-1}} K_{i,t-1})) = E(I_{t-1})V(K_{t-1}) + E(I_{t-1}(I_{t-1}-1))*Cov(K_{i,t-1},K_{j,t-1}). The authors have the first term, but not the second, and I'm not sure the second can be neglected (in fact, I believe it's the second term that's actually important, as early on during growth there is very little covariance because resources aren't constrained, but at carrying capacity, an individual having offspring means that another individual has to have fewer offspring - this is the whole notion of exchangeability, also neglected in this manuscript). As such, I don't believe that their analysis of the empirical data supports their claim. 
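      The role of the covariance term in the derivation above can be made concrete with exact arithmetic. The sketch below conditions on a fixed number of parents n, so the expectations over I_{t-1} drop out, and uses the offspring-number moments of a fixed-size Wright-Fisher (multinomial) population as a stand-in for a population at carrying capacity; the function name is hypothetical.

```python
from fractions import Fraction


def variance_of_total(n, var_k, cov_k):
    """Var(sum of n exchangeable K_i) = n*V(K) + n*(n-1)*Cov(K_i, K_j)."""
    return n * var_k + n * (n - 1) * cov_k


N = 100
# Moments of multinomial offspring numbers (N offspring, N equally likely parents).
var_wf = Fraction(N - 1, N)   # N * (1/N) * (1 - 1/N)
cov_wf = Fraction(-1, N)      # -N * (1/N)**2

# Ignoring covariance, as if individuals reproduced independently.
independent = variance_of_total(N, var_wf, 0)
# Including the negative covariance imposed by a fixed population size.
constrained = variance_of_total(N, var_wf, cov_wf)

print("first term only:       ", independent)
print("with covariance term:  ", constrained)
```

      With the covariance included the variance of the total is exactly zero, as it must be when N is fixed, whereas the first term alone gives N - 1. In this limiting case the second term is therefore as large as the first.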

      Thus, while I think there are some interesting ideas in this manuscript, I believe it has some fundamental issues:

      First, it fails to engage thoroughly with the literature on a very important topic that has been studied extensively. Second, I do not believe their simulations are appropriate to show what they want to show. And finally, I don't think their empirical analysis shows what they want to show. 

      References: 

      Möhle M. Robustness results for the coalescent. Journal of Applied Probability. 1998;35(2):438-447. doi:10.1239/jap/1032192859 

      Sagitov S. Convergence to the coalescent with simultaneous multiple mergers. Journal of Applied Probability. 2003;40(4):839-854. doi:10.1239/jap/1067436085 

      Der, Ricky, Charles L. Epstein, and Joshua B. Plotkin. "Generalized population models and the nature of genetic drift." Theoretical population biology 80.2 (2011): 80-99 

      Sano, Akinori, Akinobu Shimizu, and Masaru Iizuka. "Coalescent process with fluctuating population size and its effective size." Theoretical population biology 65.1 (2004): 39-48 

      Sjodin, P., et al. "On the meaning and existence of an effective population size." Genetics 169.2 (2005): 1061-1070 

      Reviewer #2 (Public Review): 

      Summary: 

      This theoretical paper examines genetic drift in scenarios deviating from the standard Wright-Fisher model. The authors discuss Haldane's branching process model, highlighting that the variance in reproductive success equates to genetic drift. By integrating the Wright-Fisher model with the Haldane model, the authors derive theoretical results that resolve paradoxes related to effective population size. 

      Strengths: 

      The most significant and compelling result from this paper is perhaps that the probability of fixing a new beneficial mutation is 2s/V(K). This is an intriguing and potentially generalizable discovery that could be applied to many different study systems. 

      The authors also made a lot of effort to connect theory with various real-world examples, such as genetic diversity in sex chromosomes and reproductive variance across different species. 

      Weaknesses: 

      One way to define effective population size is by the inverse of the coalescent rate. This is where the geometric mean of Ne comes from. If Ne is defined this way, many of the paradoxes mentioned seem to resolve naturally. If we take this approach, one could easily show that a large N population can still have a low coalescent rate depending on the reproduction model. However, the authors did not discuss Ne in light of the coalescent theory. This is surprising given that Eldon and Wakeley's 2006 paper is cited in the introduction, and the multiple mergers coalescent was introduced to explain the discrepancy between census size and effective population size, superspreaders, and reproduction variance - that said, there is no explicit discussion or introduction of the multiple mergers coalescent. 

      The Wright-Fisher model is often treated as a special case of the Cannings 1974 model, which incorporates the variance in reproductive success. This model should be discussed. It is unclear to me whether the results here have to be explained by the newly introduced WFH model, or could have been explained by the existing Cannings model. 

      The abstract makes it difficult to discern the main focus of the paper. It spends most of the space introducing "paradoxes". 

      The standard Wright-Fisher model makes several assumptions, including hermaphroditism, non-overlapping generations, random mating, and no selection. It will be more helpful to clarify which assumptions are being violated in each tested scenario, as V(K) is often not the only assumption being violated. For example, the logistic growth model assumes no cell death at the exponential growth phase, so it also violates the assumption about non-overlapping generations. 

      The theory and data regarding sex chromosomes do not align. The fact that \hat{alpha'} can be negative does not make sense. The authors claim that a negative \hat{alpha'} is equivalent to infinity, but why is that? It is also unclear how theta is defined. It seems to me that one should take the first principle approach e.g., define theta as pairwise genetic diversity, and start with deriving the expected pair-wise coalescence time under the MMC model, rather than starting with assuming theta = 4Neu. Overall, the theory in this section is not well supported by the data, and the explanation is insufficient. 

      {Alpha and alpha' can both be negative.  X^2 = 0.47 would yield x = -0.7}

      Reviewer #3 (Public Review): 

      Summary: 

      Ruan and colleagues consider a branching process model (in their terminology the "Haldane model") and the most basic Wright-Fisher model. They convincingly show that offspring distributions are usually non-Poissonian (as opposed to what's assumed in the Wright-Fisher model), and can depend on short-term ecological dynamics (e.g., variance in offspring number may be smaller during exponential growth). The authors discuss branching processes and the Wright-Fisher model in the context of 3 "paradoxes": (1) how Ne depends on N might depend on population dynamics; (2) how Ne is different on the X chromosome, the Y chromosome, and the autosomes, and these differences do not match the expectations based on simple counts of the number of chromosomes in the populations; (3) how genetic drift interacts with selection. The authors provide some theoretical explanations for the role of variance in the offspring distribution in each of these three paradoxes. They also perform some experiments to directly measure the variance in offspring number, as well as perform some analyses of published data. 

      Strengths: 

      (1) The theoretical results are well-described and easy to follow. 

      (2) The analyses of different variances in offspring number (both experimentally and analyzing public data) are convincing that non-Poissonian offspring distributions are the norm. 

      (3) The point that this variance can change as the population size (or population dynamics) change is also very interesting and important to keep in mind. 

      (4) I enjoyed the Density-Dependent Haldane model. It was a nice example of the decoupling of census size and effective size. 

      Weaknesses: 

      (1) I am not convinced that these types of effects cannot just be absorbed into some time-varying Ne and still be well-modeled by the Wright-Fisher process. 

      (2) Along these lines, there is well-established literature showing that a broad class of processes (a large subset of Cannings' Exchangeable Models) converge to the Wright-Fisher diffusion, even those with non-Poissonian offspring distributions (e.g., Mohle and Sagitov 2001). E.g., equation (4) in Mohle and Sagitov 2001 shows that in such cases the "coalescent Ne" should be (N-1) / Var(K), essentially matching equation (3) in the present paper. 

      (3) Beyond this, I would imagine that branching processes with heavy-tailed offspring distributions could result in deviations that are not well captured by the authors' WFH model. In this case, the processes are known to converge (backward-in-time) to Lambda or Xi coalescents (e.g., Eldon and Wakeley 2006 or again in Mohle and Sagitov 2001 and subsequent papers), which have well-defined forward-in-time processes. 

      (4) These results that Ne in the Wright-Fisher process might not be related to N in any straightforward (or even one-to-one) way are well-known (e.g., Neher and Hallatschek 2012; Spence, Kamm, and Song 2016; Matuszewski, Hildebrandt, Achaz, and Jensen 2018; Rice, Novembre, and Desai 2018; the work of Lounès Chikhi on how Ne can be affected by population structure; etc...) 

      (5) I was also missing some discussion of the relationship between the branching process and the Wright-Fisher model (or more generally Cannings' Exchangeable Models) when conditioning on the total population size. In particular, if the offspring distribution is Poisson, then conditioned on the total population size, the branching process is identical to the Wright-Fisher model. 

      (6) In the discussion, it is claimed that the last glacial maximum could have caused the bottleneck observed in human populations currently residing outside of Africa. Compelling evidence has been amassed that this bottleneck is due to serial founder events associated with the out-of-Africa migration (see e.g., Henn, Cavalli-Sforza, and Feldman 2012 for an older review - subsequent work has only strengthened this view). For me, a more compelling example of changes in carrying capacity would be the advent of agriculture ~11kya and other more recent technological advances. 
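      Point (5) above, that a Poisson branching process conditioned on the total population size coincides with the Wright-Fisher model, can be verified exactly for a small population. The sketch below is an illustrative check with hypothetical function names: it compares the conditional probability of a vector of offspring counts under independent Poisson reproduction with the multinomial probability given by Wright-Fisher sampling.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


def conditioned_branching_pmf(counts, lam):
    """P(K_1..K_n = counts | sum K_i = total) for i.i.d. Poisson(lam) offspring.

    The sum of n independent Poisson(lam) variables is Poisson(n * lam).
    """
    n, total = len(counts), sum(counts)
    joint = 1.0
    for k in counts:
        joint *= poisson_pmf(k, lam)
    return joint / poisson_pmf(total, n * lam)


def wright_fisher_pmf(counts):
    """Multinomial probability of counts when each offspring picks one of n parents uniformly."""
    n, total = len(counts), sum(counts)
    coef = factorial(total)
    for k in counts:
        coef //= factorial(k)
    return coef / n ** total


counts = (2, 0, 1, 1)   # offspring numbers of 4 parents, total size 4
for lam in (0.5, 1.0, 3.7):
    print(lam, conditioned_branching_pmf(counts, lam), wright_fisher_pmf(counts))
```

      The conditional probability does not depend on the Poisson mean, which is why the equivalence holds whether the unconditioned population is growing or shrinking.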

      Recommendations for the authors: 

      Reviewing Editor Comments: 

      The reviewers recognize the value of this model and some of the findings, particularly results from the density-dependent Haldane model. However, they expressed considerable concerns with the model and overall framing of this manuscript.

      First, all reviewers pointed out that the manuscript does not sufficiently engage with the extensive literature on various models of effective population size and genetic drift, notably lacking discussion on Cannings models and related works.

      Second, there is a disproportionate discussion on the paradoxes, yet some of the paradoxes might already be resolved within current theoretical frameworks. All three reviewers found the modeling and simulation of the yeast growth experiment hard to follow or lacking justification for certain choices. The analysis approach of sex chromosomes is also questioned. 

      The reviewers recommend a more thorough review of relevant prior literature to better contextualize their findings. The authors need to clarify and/or modify their derivations and simulations of the yeast growth experiment to address the identified caveats and ensure robustness. Additionally, the empirical analysis of the sex chromosome should be revisited, considering alternative scenarios rather than relying solely on the MSE, which only provides a superficial solution. Furthermore, the manuscript's overall framing should be adjusted to emphasize the conclusions drawn from the WFH model, rather than focusing on the "unresolved paradoxes", as some of these may be more readily explained by existing frameworks. Please see the reviewers' overall assessment and specific comments. 

      Reviewer #2 (Recommendations For The Authors): 

      In the introduction -- "Genetic drift is simply V(K)" -- this is a very strong statement. You can say it is inversely proportional to V(K), but drift is often defined based on changes in allele frequency. 

      Page 3 line 86. "sexes is a sufficient explanation."--> "sex could be a sufficient explanation" 

      The strongest line of new results is about 2s/V(K). Perhaps, the paper could put more emphasis on this part and demonstrate the generality of this result with a different example. 

      The math notations in the supplement are not intuitive. e.g., using i_k and j_k as probabilities. I also recommend using E[X] and V[X] for expectation and variance rather than \italic{E(X)} to improve the readability of many equations. 

      Eq A6, A7, While I manage to follow, P_{10}(t) and P_{10} are not defined anywhere in the text. 

      Supplement page 7, the term "probability of fixation" is confusing in a branching model. 

      Eq. A28. It is unclear whether Eq. A1 could be used here directly. Some justification would be nice. 

      Supplement page 17. "the biological meaning of negative..". There is no clear justification for this claim. As a reader, I don't have any intuition as to why that is the case.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment:

      Franke et al. explore and characterize the color response properties in the mouse primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The data is solid; however, the evidence supporting some conclusions is incomplete. In its current form, the paper makes a useful contribution to how color is coded in mouse V1. Significance would be enhanced with some additional analyses and a clearer discussion of the limitations of the data presented.

      We thank the reviewers for appreciating our manuscript. We have rewritten the conclusions of the paper to be more conservative and now more explicitly focus on color processing in mouse V1, rather than comparing V1 to the retina. Additionally, we discuss the limitations of our approach in detail in the Discussion section. Finally, we have addressed all comments from the reviewers below.

      Referee 1 (Remarks to the Author):

      In this study, Franke et al. explore and characterize color response properties across primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The authors use awake 2P imaging to define the spectral response properties of visual interneurons in layer 2/3. They find that opponent responses are more pronounced at photopic light levels, and that diversity in color opponent responses exists across the visual field, with green ON/ UV OFF responses more strongly represented in the upper visual field. This is argued to be relevant for the detection of certain features that are more salient when using chromatic space, possibly due to noise reduction. In the revised version, Franke et al. have addressed the potential pitfalls in the discussion, which is an important point for the non-expert reader. Thus, this study provides a solid characterization of the color properties of V1 and is a valuable addition to visual neuroscience research.

      My remaining concerns are based more on the interpretation. I’m still not convinced by the statement "This type of color-opponency in the receptive field center of V1 neurons was not present in the receptive field center of retinal ganglion cells and, therefore, is likely computed by integrating center and surround information downstream of the retina." and I would suggest rewording it in the abstract.

      As discussed previously and now nicely added to the discussion, it is difficult to make a direct comparison given the different stimulus types used to characterize the retina and V1 recordings and the different levels of adaptation in both tissues. I will leave this point to the discussion, which allows for a more nuanced description of the phenomenon. Why do I think this is important? In the introduction, the authors argue that "the discrepancy [of previous studies] may be due to differences in stimulus design or light levels." However, while different light levels can be tested in V1, this cannot be done properly in the retina with 2P experiments. To address this, one would have to examine color-opponency in RGC terminals in vivo, which is beyond the scope of this study. Addressing these latter points directly in the discussion would, in my opinion, only strengthen the study.

      We thank the reviewer for the feedback. We removed the sentence mentioned by the reviewer from the abstract, as well as from the summary of our results in the Introduction. Additionally, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      Minor:

      In the abstract, the second sentence says that we already know the mechanisms in primates.

      Unfortunately, I do not think this is true. First, primates refers to an order with several species, which might have adaptations to their color-processing. Second, I’m aware of several characterizations in "primates" that have led to convincing models (as referenced), but in my opinion, this is far from a true understanding of the mechanisms, especially since very little is known about foveal color processing due to the difficulties of these experiments. Similarly in the introduction. "Primates" is indirectly defined as a species. Perhaps some rewording is needed here as well, since we know how different cone distributions can be in rodents (see Peichl’s work).

      Thanks. We have reworded the Abstract and Introduction towards indicating that many studies have been performed in primate species, without suggesting that the mechanisms are described.

      The legend in Fig. 2 has a "Fig. ???"

      Fixed.

      Referee 2 (Remarks to the Author):

      Franke et al. characterize the representation of color in the primary visual cortex of mice, highlighting how this changes across the visual field. Using calcium imaging in awake, head-fixed mice, they characterize the properties of V1 neurons (layer 2/3) using a large center-surround stimulation where green and ultra-violet colors were presented in random combinations. Clustering of responses revealed a set of functional cell-types based on their preference to different combinations of green and UV in their center and surround. These functional types were demonstrated to have different spatial distributions across V1, including one neuronal type (Green-ON/UV-OFF) that was much more prominent in the posterior V1 (i.e. upper visual field). Modelling work suggests that these neurons likely support the detection of predator-like objects in the sky.

      Strengths: The large-scale single-cell resolution imaging used in this work allows the authors to map the responses of individual neurons across large regions of the visual cortex. Combining this large dataset with clustering analysis enabled the authors to group V1 neurons into distinct functional cell types and demonstrate their relative distribution in the upper and lower visual fields. Modelling work demonstrated the different capacity of each functional type to detect objects in the sky, providing insight into the ethological relevance of color opponent neurons in V1.

      We thank the reviewer for appreciating our study.

      Weaknesses: While the study presents convincing evidence about the asymmetric distribution of color-opponent neurons in V1, the paper would greatly benefit from a more in-depth discussion of the caveats related to the conclusions drawn about their origin. This is particularly relevant regarding the conclusion drawn about the contribution of color opponent neurons in the retina. The mismatch between retinal color opponency and V1 color opponency could imply that this feature is not solely inherited from the retina, however, there are other plausible explanations that are not discussed here. Direct evidence for this statement remains weak.

      Thanks for this comment. We removed the retinal findings from the abstract, as well as from the summary of our results in the Introduction. In addition, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      In addition, the paper would benefit from adding explicit neuron counts or percentages to the quadrants of each of the density plots in Figures 2-5. The variance explained by the principal components does not capture the percentage of color opponent cells. Additionally, there appear to be some remaining errors in the figure legend and labels that have not been addressed (e.g. ’??’ in Fig 2 legend).

      Thank you for this suggestion. We believe that adding the numbers or percentages to the figure panels would make them too crowded. Instead, we have now mentioned in the Results section and the legends that the percentages of variance explained by the color (off-diagonal) and luminance axis (diagonal) correlate with the number of neurons located in the color (top left and bottom right) and luminance contrast quadrants (top right and bottom left), respectively. Together with the number of neurons in each plot stated in the legends and the scale bar indicating the number of neurons per gray level, we hope this approach provides clarity for the reader to interpret the panels. Additionally, we have fixed the broken reference in the legend of Fig. 2.

      Overall, this study will be a valuable resource for researchers studying color vision, cortical processing, and the processing of ethologically relevant information. It provides a useful basis for future work on the origin of color opponency in V1 and its ethological relevance.

      General Suggestions:

      -  Please add possible caveats of using ETA method to the discussion section. For example, it is unclear to what extent ON/OFF cells are being overlooked by using ETA method.

      We now discuss the limitations of the ETA approach in the Discussion section.

      - The caveats of using the percentage of variance explained in the retina as evidence against V1 solely inheriting color-opponency from retinal output neurons are not adequately addressed. For example, could the mismatch in explained variance of the color axis between V1 and RGCs be explained by a subset of non-color opponent RGCs projecting elsewhere (not dLGN-V1) or that color opponent cells project to a larger number of neurons in V1 than non-color opponent cells? We suggest adding a paragraph to the discussion to address this issue.

      We have removed these conclusions from the paper, more carefully interpret the retinal results and mention that comparing ex-vivo retina data with in-vivo cortical data is challenging.

      - Please clarify how the different response types shown in Figure 5e-f lead to differences in noise detection and thereby differences in predator discriminability. For example, why does Gon/UVoff not respond to the noise scene while Goff/UVoff does?

      We added this to the Results section.

      - Please clarify the relationship between ETA amplitude, neural response probability, and neural response amplitude. For example, do color-opponent cells have equal absolute neural response amplitudes to the different colors?

      Thank you for bringing up this point. The ETA is obtained by summing the stimulus sequences that elicit an event (i.e., response), weighted by the amplitude of the response. Consequently, the absolute amplitude of the ETA correlates with the calcium amplitude. Importantly, the ETA amplitudes of different stimulus conditions are comparable because they were estimated on the same normalized calcium trace. Therefore, comparing the absolute amplitudes of ETAs of color-opponent neurons reveals the response magnitude of the cells to different colors. We have now included this information in the Results section.
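
      The description of the ETA above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name, the `window` parameter, and the data layout (one list of stimulus values per frame, events given as frame index plus amplitude) are hypothetical.

```python
def event_triggered_average(stimulus, events, window):
    """Amplitude-weighted sum of the stimulus snippets preceding each event.

    stimulus: list of frames; each frame is a list with one value per
              stimulus condition (e.g. green center, UV center, ...).
    events:   list of (frame_index, amplitude) pairs (hypothetical layout).
    window:   number of frames kept, ending at the event frame.
    """
    n_conditions = len(stimulus[0])
    eta = [[0.0] * n_conditions for _ in range(window)]
    for t, amplitude in events:
        start = t - window + 1
        if start < 0:
            continue  # not enough stimulus history before this event
        for lag in range(window):
            frame = stimulus[start + lag]
            for c in range(n_conditions):
                # Larger events contribute more, so the ETA amplitude
                # scales with the response amplitude.
                eta[lag][c] += amplitude * frame[c]
    return eta


# Toy example: two conditions, four frames, two events.
toy_stimulus = [[1, -1], [1, 1], [-1, 1], [1, -1]]
toy_events = [(1, 2.0), (3, 0.5)]
toy_eta = event_triggered_average(toy_stimulus, toy_events, window=2)
```

      Because every condition is weighted by the same event amplitudes from the same trace, the resulting ETA values can be compared across conditions, which is the property the response relies on.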

      Abstract: - "more than a third of neurons in mouse V1 are color-opponent in their receptive field center". It is unclear what data supports this statement. Can you please provide a statement in the manuscript that supports this directly using the number of neurons?

      We added the following sentence to the Results section: Nevertheless, a substantial fraction of neurons (33.1%) preferred color-opponent stimuli and scattered along the off-diagonal in the upper left and lower right quadrants, especially for the RF center.
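
      As a rough illustration of how such a fraction can be counted, the sketch below assigns each neuron to a quadrant from the sign of its green and UV center response and reports the share in the two color-opponent (off-diagonal) quadrants. The input format and names are assumptions for illustration; the authors' actual criterion may differ.

```python
def color_opponent_fraction(green, uv):
    """Count neurons per quadrant and return the color-opponent fraction.

    green, uv: signed center preferences per neuron (hypothetical inputs,
               positive = ON, negative = OFF). Zeros stay unclassified.
    """
    if len(green) != len(uv):
        raise ValueError("green and uv must have the same length")
    counts = {"on_on": 0, "off_off": 0, "g_on_uv_off": 0, "g_off_uv_on": 0}
    for g, u in zip(green, uv):
        if g > 0 and u > 0:
            counts["on_on"] += 1          # luminance quadrant
        elif g < 0 and u < 0:
            counts["off_off"] += 1        # luminance quadrant
        elif g > 0 and u < 0:
            counts["g_on_uv_off"] += 1    # color-opponent quadrant
        elif g < 0 and u > 0:
            counts["g_off_uv_on"] += 1    # color-opponent quadrant
    n = len(green)
    opponent = counts["g_on_uv_off"] + counts["g_off_uv_on"]
    fraction = opponent / n if n else 0.0
    return counts, fraction


toy_counts, toy_fraction = color_opponent_fraction([1, -1, 1, -1], [1, -1, -1, -1])
```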

      Figure 2: - There is a ?? in the figure legend. Which figure should this refer to? - please provide explicit neuron counts/percentages for each quadrant in b.

      We fixed the figure reference. We believe that adding the numbers or percentages to the figure panels would make them too crowded. Instead, we have now mentioned in the Results section and the legends that the percentages of variance explained by the color (off-diagonal) and luminance axis (diagonal) correlate with the number of neurons located in the color (top left and bottom right) and luminance contrast quadrants (top right and bottom left), respectively. Together with the number of neurons in each plot stated in the legends and the scale bar indicating the number of neurons per gray level, we hope this approach provides clarity for the reader to interpret the panels.

      Figure 3: - Fig 3: Color scheme makes it very difficult to differentiate the different conditions, especially when printed.

      Thanks, we changed the color scheme.

      - Add explicit neuron counts/percentages for each quadrant in b.

      See above.

      Figure 4: - Add explicit neuron counts/percentages for each quadrant in b.

      See above.

      Figure 5: - Add explicit neuron counts/percentages for each quadrant in c.

      See above.

      Methods: - "we modeled each response type to have a square RF with 10 degrees visual angle in diameter". There appears to be a mismatch between this statement and Figure 5e where 18 degrees is reported.

      Thanks, we fixed that.

      Referee 3 (Remarks to the Author):

      This paper studies chromatic coding in mouse primary visual cortex. Calcium responses of a large collection of cells are measured in response to a simple spot stimulus. These responses are used to estimate chromatic tuning properties - specifically sensitivity to UV and green stimuli presented in a large central spot or a larger still surrounding region. Cells are divided based on their responses to these stimuli into luminance or chromatic sensitive groups. The results are interesting and many aspects of the experiments and conclusions are well done; several technical concerns, however, limit the support for several main conclusions.

      Limitations of stimulus choice: The paper relies on responses to a large (37.5 degree diameter) modulated spot and surround region. This spot is considerably larger than the receptive fields of both V1 cells and retinal ganglion cells (it is twice the area of the average V1 receptive field). As a result, the spot itself is very likely to strongly activate both center and surround mechanisms, and responses of cells are likely to depend on where the receptive fields are located within the spot (and, e.g., how much of the true neural surround samples the center spot vs the surround region). Most importantly, the surrounds of most of the recorded cells will be strongly activated by the central spot. This brings into question statements in the paper about selective activation of center and surround (e.g. page 2, right column). This in turn raises questions about several subsequent analyses that rely on selective center and surround activation.

      Thank you for this comment. A similar point was raised by a reviewer in the first round of revision. We agree with the reviewers that it is critical to discuss both the rationale behind our stimulus design and its limitations to facilitate better interpretation by the reader.

      To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degree visual angle in diameter, which is slightly larger than center RFs of single V1 neurons (between 20 - 30 degrees visual angle depending on the stimulus, see here). The disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we used the following steps:

      - For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      - We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      - For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.

      Together, we believe these points strongly suggest that the center spot and the surround annulus of the noise stimulus predominantly drive center (i.e. classical RF) and surround (i.e. extraclassical RF), respectively, of the recorded V1 neurons. This is further supported by the fact that color response types identified using an automated clustering method were robust across mice (Suppl. Fig. 6c), indicating consistent stimulus centering.

      Nevertheless, we cannot exclude the possibility that the stimulus was misaligned for a subset of the recorded neurons used in our analysis. We agree with the reviewer that such misalignment might have caused the center stimulus to partially activate the surround. To further address this issue beyond the controls we have already implemented, one could compare the results of our approach with an approach that centers the stimulus on individual neurons. However, we believe that performing these additional experiments is beyond the scope of the current study.

      To acknowledge the experimental limitations of our study and the concerns brought up by the reviewer, we have added the steps we perform to reduce the effects of stimulus misalignment in the Results section and discuss the problem of stimulus alignment in the Discussion in a separate section. With this, we believe our manuscript explains both the rationale behind our stimulus design as well as important limitations of the approach.

      Comparison with retina: A key conclusion of the paper is that the chromatic tuning in V1 is not inherited from retinal ganglion cells. This conclusion comes from comparing chromatic tuning in a previously-collected data set from retina with the present results. But the retina recordings were made using a considerably smaller spot, and hence it is not clear that the comparison made in the paper is accurate. For example, the stimulus used for the V1 experiments almost certainly strongly stimulates both center and surround of retinal ganglion cells. The text focuses on color opponency in the receptive field centers of retinal ganglion cells, but center-surround opponency seems at least as relevant for such large spots. This issue needs to be described more clearly and earlier in the paper.

      Thanks for this comment. We removed the retinal findings from the abstract, as well as from the summary of our results in the Introduction. In addition, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      Limitations associated with ETA analysis: One of the reviewers in the previous round of reviews raised the concern that the ETA analysis may not accurately capture responses of cells with nonlinear receptive field properties such as On/Off cells. This possibility and whether it is a concern should be discussed.

      Thanks for this comment. We now discuss the limitation of using an ETA analysis in the Discussion section.

      Discrimination performance poor: Discriminability of color or luminance is used as a measure of population coding. The discrimination performance appears to be quite poor - with 500-1000 neurons needed to reliably distinguish light from dark or green from UV. Intuitively I would expect that a single cell would provide such discrimination. Is this intuition wrong? If not, how do we interpret the discrimination analyses?

      Thank you for raising this point. The plots in Fig. 2c (and Figs. 3-5) show discriminability in bits, with the discrimination accuracy in % highlighted by the dotted horizontal lines. For 500 neurons, the discriminability is approx. 0.8 bits, corresponding to 95% accuracy. Even for 50 neurons, the accuracy is significantly above chance level. We now mention in the legends that the dotted lines indicate decoding accuracy in %.

    1. Author response:

      The following is the authors’ response to the current reviews.

      (1) Though we cannot survey all mutants, our observation that 774 genetically diverse adaptive mutants converge at the level of phenotype is important. It adds to growing evidence (see PMID33263280, PMID37437111, PMID22282810, PMID25806684) that the genetic basis of adaptation is not as diverse as the phenotypic basis. This convergence could make evolution more predictable.

      (2) Previous fitness competitions using this specific barcode system have been run for greater than 25 generations (PMID33263280, PMID27594428, PMID37861305, PMID27594428). We measure fitness per cycle, rather than per generation, so our fitness advantages are comparable to those in the aforementioned studies, including Venkataram and Dunn et al. (PMID27594428).

      (3) Our results remain the same upon removing the ~150 lineages with the noisiest fitness inferences, including those the reviewer mentions (see Figure S7).

      (4) We agree that there are likely more than the 6 clusters that we validated with follow-up studies (see Discussion). The important point is that we see a great deal of convergence in the behavior of diverse adaptive mutants.

      (5) The growth curves requested by the reviewer were included in our original manuscript; several more were added in the revision (see Figures 5D, 5E, 7D, S11B, S11C).


      The following is the authors’ response to the original reviews.

      Public Reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In their manuscript, Schmidlin, Apodaca, et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.  

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation. 

      We thank the reviewer for this positive assessment of our work. We are happy that the reviewer noted what we feel is a unique strength of our approach: we scaled up experimental evolution by using DNA barcodes and by exploring 12 related selection pressures. Despite this scaling up, we still see phenotypic convergence among the 774 adaptive mutants we study.

      Weaknesses: 

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21 000 lineages that were combined after experimental evolution for fitness measurements. 

      This is a misunderstanding that we clarified in this revision. Our starting set did not include 21,000 adaptive lineages. The total number of unique adaptive lineages in this starting set is much lower than 21,000 for two reasons. 

      First, ~21,000 represents the number of single colonies we isolated in total from our evolution experiments. Many of these isolates possess the same barcode, meaning they are duplicates. Second, and perhaps more importantly, most evolved lineages do not acquire adaptive mutations, meaning that many of the 21,000 isolates are genetically identical to their ancestor. In our revised manuscript, we explicitly stated that these 21,000 isolated lineages do not all represent unique, adaptive lineages. We changed the word “lineages” to “isolates” where relevant in Figure 2 and the accompanying legend. And we have added the following sentence to the figure 2 legend (line 212), “These ~21,000 isolates do not represent as many unique, adaptive lineages because many either have the same barcode or do not possess adaptive mutations.”

      More broadly speaking, several previous studies have demonstrated that diverse genetic mutations converge at the level of phenotype and have suggested that this convergence makes adaptation more predictable (PMID33263280, PMID37437111, PMID22282810, PMID25806684). Most of these studies survey fewer than 774 mutants. Further, our study captures mutants that are overlooked in previous studies, such as those that emerge across subtly different selection pressures (e.g., 4 𝜇g/ml vs. 8 𝜇g/ml flu) and those that are undetectable in evolutions lacking DNA barcodes. Thus, while our experimental design misses some mutants (see next comment), it captures many others. Thus, we feel that “our work – showing that 774 mutants fall into a much smaller number of groups” is important because it “contributes to growing literature suggesting that the phenotypic basis of adaptation is not as diverse as the genetic basis (lines 176 - 178).”

      As the authors briefly remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance. 

      We now devote 19 lines of text to discussing this bias (on lines 160 - 162, 278-284, and in more detail on 758 - 767).

      We walk through an example of a class of mutants that our study misses. On lines 759 - 763, we say, “our study is underpowered to detect adaptive lineages that have low fitness in any of the 12 environments. This is bound to exclude large numbers of adaptive mutants. For example, previous work has shown some FLU resistant mutants have strong tradeoffs in RAD (Cowen and Lindquist 2005). Perhaps we are unable to detect these mutants because their barcodes are at too low a frequency in RAD environments, thus they are excluded from our collection of 774.”

      In our revised version, we added more text earlier in the manuscript that explicitly discusses this bias. Lines 278 – 283 now read, “The 774 lineages we focus on are biased towards those that are reproducibly adaptive in multiple environments we study. This is because lineages that have low fitness in a particular environment are rarely observed >500 times in that environment (Figure S4). By requiring lineages to have high-coverage fitness measurements in all 12 conditions, we may be excluding adaptive mutants that have severe tradeoffs in one or more environments, consequently blinding ourselves to mutants that act via unique underlying mechanisms.”
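
      The inclusion threshold discussed here can be illustrated with a short sketch. It assumes that the threshold applies to the read count summed over time points within each condition, and that counts are stored as a nested dictionary; both are assumptions made for illustration, and the function name is hypothetical.

```python
def high_coverage_lineages(counts, min_reads=500):
    """Return barcodes whose summed reads reach min_reads in every condition.

    counts: dict mapping barcode -> dict mapping condition -> list of
            read counts, one per time point (hypothetical layout).
    """
    kept = []
    for barcode, by_condition in counts.items():
        # A single low-coverage condition is enough to drop the lineage,
        # which is the source of the bias described in the text.
        if by_condition and all(
            sum(reads) >= min_reads for reads in by_condition.values()
        ):
            kept.append(barcode)
    return kept


toy_counts = {
    "bc1": {"FLU": [300, 300], "RAD": [600, 0]},      # passes in both
    "bc2": {"FLU": [1000, 1000], "RAD": [100, 100]},  # low coverage in RAD
}
kept_500 = high_coverage_lineages(toy_counts, min_reads=500)
kept_5000 = high_coverage_lineages(toy_counts, min_reads=5000)
```

      In the toy data, the second barcode is abundant in one condition but is dropped because of a single low-coverage condition, mirroring the class of mutants with strong tradeoffs that the authors say they may miss.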

      Note that while we “miss” some classes of mutants, we “catch” other classes that may have been missed in previous studies of convergence. For example, we observe a unique class of FLU-resistant mutants that primarily emerged in evolution experiments that lack FLU (Figure 3). Thus, we think that the unique design of our study, surveying 12 environments, allows us to make a novel contribution to the study of phenotypic convergence.

      One of the main observations of the authors is phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs and set up an inclusion threshold that might present only a small fraction of phenotypic space with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications regarding the general conclusions of the authors regarding the evolution of trade-offs. 

      We agree and discussed exactly the reviewer’s point about our inclusion threshold in the 19 lines of text mentioned previously (lines 160 - 162, 278-284, and 758 - 767). To add to this discussion, and avoid the misunderstanding the reviewer mentions, we added the following strongly-worded sentence to the end of the paragraph on lines 749 – 767 in our revised manuscript: “This could complicate (or even make impossible) endeavors to design antimicrobial treatment strategies that thwart resistance”. 

      More generally speaking, we set up our study around Figure 1, which depicts a treatment strategy that works best if there exists but a single type of adaptive mutant. Despite our inclusion threshold, we find there are at least 6 types of mutants. This diminishes hopes of designing simple multidrug strategies like Figure 1. Our goal is to present a tempered and nuanced discussion of whether and how to move forward with designing multidrug strategies, given our observations. On one hand, we point out how the phenotypic convergence we observe is promising. But on the other hand, we also point out how there may be less convergence than meets the eye for various reasons including the inclusion threshold the reviewer mentions (lines 749 - 767).

      We have made several minor edits to the text with the goal of providing a more balanced discussion of both sides. For example, we added the words, “may yet” to the following sentences on lines 32 – 36 of the abstract: “These findings, on one hand, demonstrate the difficulty in relying on consistent or intuitive tradeoffs when designing multidrug treatments. On the other hand, by demonstrating that hundreds of adaptive mutations can be reduced to a few groups with characteristic tradeoffs, our findings may yet empower multidrug strategies that leverage tradeoffs to combat resistance.”

      (2) Most large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations.

      The rate at which new mutations enter a population is driven by various factors such as the mutation rate and population size, so choosing an arbitrary threshold like 25 generations is difficult. 

      We conducted our fitness competition following previous work using the Levy/Blundell yeast barcode system, in which the number of generations reported varies from 32 to 40 (PMID33263280, PMID27594428, PMID37861305, see PMID27594428 for detailed calculation of the fraction of lineages biased by secondary mutations in this system). 

      The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay. 

      Previous work has demonstrated that in this evolution platform, most mutations occur during the transformation that introduces the DNA barcodes (Levy et al. 2015). In other words, these mutations are already present and do not accumulate during the 40 generations of evolution. Therefore, the observation that we collect a genetically diverse pool of adaptive mutants after 40 generations of evolution is not evidence that 40 generations is enough time for secondary mutations to bias abundance values.

      We have added the following sentence to the main text to highlight this issue (lines 247 - 249): “This happens because the barcoding process is slightly mutagenic, thus there is less need to wait for DNA replication errors to introduce mutations (Levy et al. 2015; Venkataram et al. 2016).”

      We also elaborate on this in the method section entitled, “Performing barcoded fitness competition experiments,” where we added a full paragraph to clarify this issue (lines 972 - 980).

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach.  Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages. This might paint a misleading picture where clusters appear well separate and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted. 

      Our noisiest fitness measurements correspond to barcodes that are the least abundant and thus suffer the most from stochastic sampling noise. These are also the barcodes that introduce the nonlinearity the reviewer mentions. We removed these from our dataset by increasing our coverage threshold from 500 reads to 5,000 reads. The clusters did not collapse, which suggests that they were not capturing this noise (Figure S7B).

      More importantly, we devoted 4 figures and 200 lines of text to demonstrating that the clusters we identified capture biologically meaningful differences between mutants (and not noise). We have modified the main text to point readers to figures 5 through 8 earlier, such that it is more apparent that the clustering analysis is just the first piece of our data demonstrating convergence at the level of phenotype.

      (4) The authors make the decision to use UMAP and a gaussian mixed model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the axis does not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by components that separate clusters, as well as more interpretable components. 

      The components derived from PCA are often not interpretable. It’s not obvious that each one, or even the first one, will represent an intuitive phenotype, like resistance to fluconazole.  Moreover, we see many non-linearities in our data. For example, fitness in a double drug environment is not predicted by adding up fitness in the relevant single drug environments. Also, there are mutants that have high fitness when fluconazole is absent or abundant, but low fitness when mild concentrations are present. These types of nonlinearities can make the axes in PCA very difficult to interpret, plus these nonlinearities can be missed by PCA, thus we prefer other clustering methods. 

      Still, we agree that confirming our clusters are robust to different clustering methods is helpful. We have included PCA in the revised manuscript, plotting PC1 vs PC2 as Figure S9 with points colored according to the cluster assignment in figure 4 (i.e. using a gaussian mixture model). It appears the clusters are largely preserved.

      Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages. 

      We worry that the idea stems from a priori notions of what the important dimensions should be. The biology of our system is unfortunately not intuitive. For example, it seems like this idea would miss important nonlinearities such as our observation that low fluconazole behaves more like a novel selection pressure than a dialed down version of high fluconazole.

      Third, the choice of 7 clusters as the cutoff for the multiple Gaussian model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered. 

      We agree. We did not rely on the results of BIC alone to make final decisions about how many clusters to include. Another factor we considered was the set of follow-up genotyping and phenotyping studies that confirm biologically meaningful differences between the mutants in each cluster (Figures 5 – 8). We now state this explicitly. Here is the modified paragraph where we describe how we chose a model with 7 clusters, from lines 436 – 446 of the revised manuscript:

      “Beyond the obvious divide between the top and bottom clusters of mutants on the UMAP, we used a gaussian mixture model (GMM) (Fraley and Raftery, 2003) to identify clusters. A common problem in this type of analysis is the risk of dividing the data into clusters based on variation that represents measurement noise rather than reproducible differences between mutants (Mirkin, 2011; Zhao et al., 2008). One way we avoided this was by using a GMM quality control metric (BIC score) to establish how splitting out additional clusters affected model performance (Figure S6). Another factor we considered were follow-up genotyping and phenotyping studies that demonstrate biologically meaningful differences between mutants in different clusters (Figures 5 – 8). Using this information, we identified seven clusters of distinct mutants, including one pertaining to the control strains, and six others pertaining to presumed different classes of adaptive mutant (Figure 4D). It is possible that there exist additional clusters, beyond those we are able to tease apart in this study.”
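      The BIC comparison described above can be illustrated with a minimal sketch. It is not the authors' analysis: the function names are hypothetical, the parameter count assumes a full-covariance Gaussian mixture, and the sign convention is the one where lower BIC is better (some software reports the negated form, where higher is better). The last helper reports how much BIC improves with each added cluster, which is the quantity being read off when one says BIC "levels off".

```python
import math


def gmm_param_count(n_clusters, n_dims):
    """Free parameters of a full-covariance Gaussian mixture (assumed model form)."""
    means = n_clusters * n_dims
    covariances = n_clusters * n_dims * (n_dims + 1) // 2
    weights = n_clusters - 1  # mixture weights sum to one
    return means + covariances + weights


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better under this convention."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def bic_gains(bic_by_k):
    """Drop in BIC when moving from the previous cluster count to k.

    Small or negative gains indicate that adding clusters no longer helps.
    """
    ks = sorted(bic_by_k)
    return {k: bic_by_k[prev] - bic_by_k[k] for prev, k in zip(ks, ks[1:])}
```

      Because several cluster counts can give similar gains, a criterion like this rarely settles the choice on its own, which is why the response pairs it with follow-up experiments.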

      This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase and not decrease the optimal number of clusters. Additional criteria to BIC could have been used to help choose the optimal number of clusters or even if mixed Gaussian modeling is appropriate for this dataset. 

      We are under the following impression: If our clustering method was overfitting, i.e. capturing noise, the optimal number of clusters should decrease when we eliminate noise. It increased. In other words, the observation that our clusters did not collapse (i.e. merge) when we removed noise suggests these clusters were not capturing noise. 

      Most importantly, our validation experiments, described below, provide additional evidence that our clusters capture meaningful differences between mutants (and not noise).  

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays. 

      Some types of bar-seq methods, in particular those that look at fold change across two time points, are noisier than others that look at how frequency changes across multiple timepoints (PMID30391162). Here, we use the less noisy method. We also reduce noise by using a stricter coverage threshold than previous work (e.g., PMID33263280), and by excluding batch effects by performing all experiments simultaneously, since we found this to be effective in our previous work (PMID37237236). 
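      To make the contrast between two-time-point and multi-time-point estimates concrete, here is a simplified sketch. It is not the authors' inference pipeline. It assumes that a lineage's advantage over a reference appears as a constant slope of the log frequency ratio over time, and it fits that slope by ordinary least squares so that every time point contributes to the estimate. The function name and inputs are hypothetical.

```python
import math


def log_ratio_slope(times, mutant_freqs, reference_freqs):
    """Least-squares slope of ln(mutant / reference) against time.

    Under the simplifying assumption of a constant advantage, this slope
    is a per-unit-time relative fitness estimate.
    """
    if not (len(times) == len(mutant_freqs) == len(reference_freqs)):
        raise ValueError("inputs must have equal length")
    if len(times) < 2:
        raise ValueError("need at least two time points")

    log_ratios = [math.log(m / r) for m, r in zip(mutant_freqs, reference_freqs)]
    t_mean = sum(times) / len(times)
    y_mean = sum(log_ratios) / len(log_ratios)

    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_ratios))
    return sxy / sxx
```

      With only two time points the fit passes exactly through both measurements, so any error in either one goes straight into the estimate. Additional time points average that error out.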

      Perhaps also relevant is that the main assay we use to measure fitness has been previously validated (PMID27594428) and no subsequent study using this assay validates using the methods suggested above (see PMID37861305, PMID33263280, PMID31611676, PMID29429618, PMID37192196, PMID34465770, PMID33493203). Similarly, bar-seq has been used, without the suggested validation, to demonstrate that the way some mutant’s fitness changes across environments is different from other mutants (PMID33263280, PMID37861305, PMID31611676, PMID33493203, PMID34596043). This is the same thing that we use bar-seq to demonstrate. 

      For all of these reasons above, we are hesitant to confirm bar-seq itself as a valid way to infer fitness. It seems this is already accepted as a standard in our field. However, please see below.

      Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors. 

      While we don’t agree that fitness measurements obtained from this bar-seq assay generally require validation, we do agree that it is important to validate whether the mutants in each of our 6 clusters indeed are different from one another in meaningful ways.

      Our manuscript has 4 figures (5 - 8) and over 200 lines of text dedicated to validating whether our clusters capture reproducible and biologically meaningful differences between mutants. In the revised manuscript, we added additional validation experiments, such that three figures (Figures 5, 7 and S11) now involve growth curves, as the reviewer requested. 

      Below, we walk through the different types of validation experiments that are present in our manuscript, including those that were added in this revision.

      (1) Mutants from different clusters have different growth curves: In our original manuscript, we measured growth curves corresponding to a fitness tradeoff that we thought was surprising. Mutants in clusters 4 and 5 both have fitness advantages in single drug conditions. While mutants from cluster 4 also are advantageous in the relevant double drug conditions, mutants from cluster 5 are not! We validated these different behaviors by studying growth curves for a mutant from each cluster (Figures 7 and S11), finding that mutants from different clusters have different growth curves. In the revised manuscript, we added growth curves for 6 additional mutants (3 from cluster 1 and 3 from cluster 3), demonstrating that only the cluster 1 mutants have a tradeoff in high concentrations of fluconazole (see Figure 5D & 5E). In sum, this work demonstrates that mutants from different clusters have predictable differences in their growth phenotypes.

      (2) Mutants from different clusters have different evolutionary origins: In our original manuscript, we came up with a novel way to ask whether the clusters capture different types of adaptive mutants. We asked whether the mutants in each cluster originate from different evolution experiments. They often do (see pie charts in Figures 5, 6, 7, 8). In the revised manuscript, we extended this analysis to include mutants from cluster 1. Cluster 1 is defined by high fitness in low fluconazole that declines with increasing fluconazole. In our revised manuscript, we show that cluster 1 lineages were overwhelmingly sampled from evolutions conducted in our lowest concentration of fluconazole (see pie chart in new Figure 5A). No other cluster’s evolutionary history shows this pattern (compare to pie charts in figures 6, 7, and 8).

      These pie charts also provide independent confirmation supporting the fitness tradeoffs observed for each cluster in figure 4E. For example, mutants in cluster 5 appear to have a tradeoff in a particular double drug condition (HRLF), and the pie charts confirm that they rarely originate from that evolution condition. This differs from cluster 4 mutants, which do not have a fitness tradeoff in HRLF, and are more likely to originate from that environment (see purple pie slice in figure 7). Additional cases where results of evolution experiments (pie charts) confirm observed fitness tradeoffs are discussed in the manuscript on lines 320 – 326, 594 – 598, 681 – 685.

      (3) Mutants from each cluster often fall into different genes: We sequenced many of these mutants and show that mutants in the same gene are often found in the same cluster. For example, all 3 IRA1 mutants are in cluster 6 (Fig 8), both GPB2 mutants are in cluster 4 (Figs 7 & 8), and 35/36 PDR mutants are in either cluster 2 or 3 (Figs 5 & 6). 

      (4) Mutants from each cluster have behaviors previously observed in the literature: We compared our sequencing results to the literature and found congruence. For example, PDR mutants are known to provide a fitness benefit in fluconazole and are found in clusters that have high fitness in fluconazole (lines 485 - 491). Previous work suggests that some mutations to PDR have different tradeoffs than others, which corresponds to our finding that PDR mutants fall into two separate clusters (lines 610 - 612). IRA1 mutants were previously observed to have high fitness in our “no drug” condition and are found in the cluster that has the highest fitness in the “no drug” condition (lines 691 - 696). Previous work even confirms the unusual fitness tradeoff we observe where IRA1 and other cluster 6 mutants have low fitness only in low concentrations of fluconazole (lines 702 - 704).

      (5) Mutants largely remain in their clusters when we use alternate clustering methods:  In our original manuscript, we performed various different re-clustering and/or normalization approaches on our data (Fig 6, S5, S7, S8, S10). The clusters of mutants that we observe in figure 4 do not change substantially when we re-cluster the data. In our revised manuscript, we added another clustering method: principal component analysis (PCA) (Fig S9).  Again, we found that our clusters are largely preserved.
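      One way to put a number on "largely preserved" is a pairwise agreement score between two cluster assignments. The sketch below computes the Rand index: the fraction of mutant pairs that both assignments treat the same way, either grouped together in both or separated in both. It is offered as an illustration under assumed inputs (two equal-length label lists), not as a description of the comparison performed in the manuscript.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A value of 1.0 means the groupings are identical up to relabeling.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have equal length")
    if len(labels_a) < 2:
        raise ValueError("need at least two items")

    agreements = 0
    n_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        if together_a == together_b:
            agreements += 1
        n_pairs += 1
    return agreements / n_pairs
```

      The score ignores the label names themselves, so cluster "1" in one method does not need to be called "1" in the other.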

      While these experiments demonstrate meaningful differences between the mutants in each cluster, important questions remain. For example, a long-standing question in biology centers on the extent to which every mutation has unique phenotypic effects versus the extent to which scientists can predict the effects of some mutations from other similar mutations. Additional studies on the clusters of mutants discovered here will be useful in deepening our understanding of this topic and more generally of the degree of pleiotropy in the genotype-phenotype map.

      Reviewer #2 (Public Review): 

      Summary: 

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping. 

      Strengths: 

      There are multiple strengths to this paper, the first of which is pointing out how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates which are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory). 

      We are grateful for this positive review. This was indeed a lot of work! We are happy that the reviewer noted what we feel is a unique strength of our manuscript: that we survey adaptive isolates across multiple environments, including low drug concentrations.  

      Weaknesses: 

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one! 

      We thank the reviewer for these words of encouragement and will work towards catching more low fitness lineages in our next project.

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think: 

      We have expanded the introduction, in particular lines 129 – 157 of the revised manuscript, to walk readers through the connection between fitness tradeoffs and molecular mechanisms. For example, here is one relevant section of new text from lines 131 - 136: “The intuition here is as follows. If two groups of drug resistant mutants have different fitness tradeoffs, it could mean that they provide resistance through different underlying mechanisms. Alternatively, both could provide drug resistance via the same mechanism, but some mutations might also affect fitness via additional mechanisms (i.e. they might have unique “side-effects” at the molecular level) resulting in unique fitness tradeoffs in some environments.”

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm. 

      We ourselves are broadly interested in the structure of the genotype-phenotype-fitness map (PMID33263280, PMID32804946). For example, we are interested in whether diverse mutations converge at the level of phenotype and fitness. Figure 1A depicts a scenario with a lot of convergence in that all adaptive mutations have the same fitness tradeoffs.

      The reason we cite papers from yeast, as well as bacteria and cancer, is that we believe general conclusions about the structure of the genotype-phenotype-fitness map apply broadly. For example, the sentence the reviewer highlights, “previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms” is a general observation about the way genotype maps to fitness. So, we cited papers from across the tree of life to support this sentence.  And in the next sentence, where we cite 3 papers focusing solely on fungal research, we cite them because they are studies about the complexity of this map. Their conclusions, in theory, should also apply broadly, beyond yeast.

      On the other hand, because we study drug resistant mutations, we hope that our dataset and observations are of use to scientists studying the evolution of resistance. We use our introduction to explain how the structure of the genotype-phenotype-fitness map might influence whether a multidrug strategy is successful (Figure 1).

      We are hesitant to rework our introduction to focus more specifically on fungal infections as this is not our primary area of expertise.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae). 

      In the revised manuscript, we have edited several lines (line 95, 186, 822) to state the organism this work was done with is Saccharomyces cerevisiae. 

      In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly? 

      We like this idea and we are working on it, but it is not straightforward. The reviewer is correct in that we can use the sequencing data that we already have. But calling aneuploidy with certainty is tough because its signal can be masked by noise. In other words, some regions of the genome may be sequenced more than others by chance.

      Given this is not straightforward, at least not for us, this analysis will likely have to wait for a subsequent paper. 

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections? 

      Perhaps because our background lies in general study of the genotype-phenotype map, we are hesitant about making bold assertions about how our work might apply to pathogenic yeasts. We are hopeful that our work will serve as a stepping-stone such that scientists from that community can perhaps make (and test) such statements.   

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I found the ideas and the questions asked in this manuscript to be interesting and ambitious. The setup of the evolution and fitness competition experiments was well poised to answer them, but the analysis of the data is not currently enough to properly support the claims made. I would suggest revising the analysis to address the weaknesses raised in the public review and if possible, adding some more experimental validations. As you already have genome sequencing data showing the causal mutation for many mutants across the different clusters, it should be possible for you to reconstruct some of the strains and validate their phenotypes and cluster identity. 

      Yes, this is possible. We added more validation experiments (see figure 5). We already had quite a few validation experiments (figures 5 - 8 and lines 479 - 718), but we did not clearly highlight the significance of these analyses in our original manuscript. Therefore, we modified the text in our revised manuscript in various places to do so. For example, we now make clearer that we jointly use BIC scores as well as validation experiments to decide how many clusters to describe (lines 436 - 446). We also make clearer that our clustering analysis is only the first step towards identifying groups of mutants with similar tradeoffs by using words and phrases like, “we start by” (line 411) and “preliminarily” (line 448) when discussing the clustering analysis.  We also point readers to all the figures describing our validation experiments earlier (line 443), and list these experiments out in the discussion (lines 738 - 741).

      Also, please deposit your genome sequencing data in a public database (I am not sure I saw it mentioned anywhere). 

      We have updated line 1088 of the methods section to include this sentence: “Whole genome sequences were deposited in GenBank under SRA reference PRJNA1023288.”

      Reviewer #2 (Recommendations For The Authors):

      I don't think the figures or experiments can be improved upon, they are excellent. There are a few times I feel things are written in a rather confusing way and could be explained better, but also I feel there are places the authors jump from one thing to another really quickly and the reader (who might not be an expert in this area) will struggle to keep up. For example: 

      Explaining what RAD is - it is introduced in the methods, but what it is, is not really explained. 

      Since the introduction is already very long, we chose not to explain radicicol’s mechanism of action here. Instead, we bring this up later on lines 614 – 621 when it becomes relevant.

      More generally, in response to this advice and that from reviewer 1, we also added text to various places in the manuscript to help explain our work more clearly. In particular, we clarified the significance of our validation experiments and various important methodological details (see above). We also better explained the connection between fitness tradeoffs and mechanisms (see above) and added more details about the potential use cases of our approach (lines 142 – 150).

      The abstract states "some of the groupings we find are surprising. For example, we find some mutants that resist single drugs do not resist their combination, and some mutants to the same gene have different tradeoffs than others". Firstly, this sentence is a bit confusing to read but if I've read it as intended, then is it really surprising? It's difficult for organisms (bacteria and fungi) to develop multiple beneficial mutations conferring drug resistance on the same background, hence why combination antifungal drug therapy is often used to treat infections. 

      This is a place where brevity got in the way of clarity. We added a bit of text to make clear why we were surprised. Specifically, we were surprised because not all mutants behave the same. Some resist single drugs AND their combination. Some resist single drugs but not their combination. The sentence in the abstract now reads, “For example, we find some mutants that resist single drugs do not resist their combination, while others do. And some mutants to the same gene have different tradeoffs than others.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This study is convincing because they performed time-resolved X-ray crystallography under different pH conditions using active/inactive metal ions and PpoI mutants, as with the activity measurements in solution in conventional enzymatic studies. Although the reaction mechanism is simple and may be a little predictable, the strength of this study is that they were able to validate that PpoI catalyzes DNA hydrolysis through "a single divalent cation" because time-resolved X-ray study often observes transient metal ions which are important for catalysis but are not predictable in previous studies with static structures such as enzyme-substrate analog-metal ion complexes. The discussion of this study is well supported by their data. This study visualized the catalytic process and mutational effects on catalysis, providing new insight into the catalytic mechanism of I-PpoI through a single divalent cation. The authors found that His98, a candidate of proton acceptor in the previous experiments, also affects the Mg2+ binding for catalysis without the direct interaction between His98 and the Mg2+ ion, suggesting that "Without a proper proton acceptor, the metal ion may be prone for dissociation without the reaction proceeding, and thus stable Mg2+ binding was not observed in crystallo without His98". In future, this interesting feature observed in I-PpoI should be investigated by biochemical, structural, and computational analyses using other metal-ion dependent nucleases. 

      We appreciate the reviewer for the positive assessment as well as all the comments and suggestions.

      Reviewer #2 (Public Review): 

      Summary: 

      Most polymerases and nucleases use two or three divalent metal ions in their catalytic functions. The family of His-Me nucleases, however, use only one divalent metal ion, along with a conserved histidine, to catalyze DNA hydrolysis. The mechanism has been studied previously but, according to the authors, it remained unclear. By use of time-resolved X-ray crystallography, this work convincingly demonstrated that only one M2+ ion is involved in the catalysis of the His-Me I-PpoI nuclease, and proposed concerted functions of the metal and the histidine. 

      Strengths: 

      This work performs mechanistic studies, including the number and roles of metal ion, pH dependence, and activation mechanism, all by structural analyses, coupled with some kinetics and mutagenesis. Overall, it is a highly rigorous work. This approach was first developed in Science (2016) for a DNA polymerase, in which Yang Cao was the first author. It has subsequently been applied to just 5 to 10 enzymes by different labs, mainly to clarify two versus three metal ion mechanisms. The present study is the first one to demonstrate a single metal ion mechanism by this approach. 

      Furthermore, on the basis of the quantitative correlation between the fraction of metal ion binding and the formation of product, as well as the pH dependence, and the data from site-specific mutants, the authors concluded that the functions of Mg2+ and His are a concerted process. A detailed mechanism is proposed in Figure 6. 

      Even though there are no major surprises in the results and conclusions, the time-resolved structural approach and the overall quality of the results represent a significant step forward for the His-Me family of nucleases. In addition, since the mechanism is unique among different classes of nucleases and polymerases, the work should be of interest to readers in DNA enzymology, or even mechanistic enzymology in general. 

      Thank you very much for your comments and suggestions.

      Weaknesses: 

      Two relatively minor issues are raised here for consideration: 

      p. 4, last para, lines 1-2: "we next visualized the entire reaction process by soaking I-PpoI crystals in buffer....". This is a little over-stated. The structures being observed are not reaction intermediates. They are mixtures of substrates and products in the enzyme-bound state. The progress of the reaction is limited by the progress of the soaking of the metal ion. Crystallography has just been used as a tool to monitor the reaction (and provide structural information about the product). It would be more accurate to say that "we next monitored the reaction progress by soaking....". 

      We appreciate the clarification regarding the description of our experimental approach. We agree that our structures do not represent reaction intermediates but rather mixtures of substrate and product states within the enzyme-bound environment. We have revised the text accordingly to more accurately reflect our methodology.

      p. 5, the beginning of the section. The authors on one hand emphasized the quantitative correlation between Mg ion density and the product density. On the other hand, they raised the uncertainty in the quantitation of Mg2+ density versus Na+ density, thus they repeated the study with Mn2+ which has distinct anomalous signals. This is a very good approach. However, there is still no metal ion density shown in the key Figure 2A. It will be clearer to show the progress of metal ion density in a figure (in addition to just plots), whether it is Mg or Mn. 

      Thank you for your insightful comments. We recognize the importance of visualizing metal ion density alongside product density data. To address this, we included Figure S4 to present Mg2+/Mn2+ and product densities concurrently.

      Reviewer #1 (Recommendations For The Authors): 

      (1) Figure 6. I understand that pre-reaction state (left panel) and Metal-binding state (two middle panels) are in equilibrium. But can we state that the Metal-binding state (two middle panels) and the product state (right panel) are in equilibrium and connected by two arrows? 

      Thank you for your comments. We agree that the DNA hydrolysis reaction process may not be reversible within the I-PpoI active site. To clarify, we removed the backward arrows between the metal-binding state and product state. In addition, we thank the reviewer for suggesting a name for the middle state and agree that it is better to label it. We added the metal-binding state label in the revised Figure 6 and also added “on the other hand, optimal alignment of a deprotonated water and Mg2+ within the active site, labeled as metal-binding state, leads to irreversible bond breakage (Fig. 6a)” within the text.

      (2) The section on DNA hydrolysis assay (Materials and Methods) is not well described. In this section, the authors should summarize the methods for the experiments in Figure 4 AC, Figure 5BC, Figure S3C, Figure S4EF, and Figure S6AB. The authors presented some graphs for the reactions. For clarity, the author should state in the legends which experiments the results are from (in crystallo or in solution). Please check and modify them. 

      Thank you for the suggestion. We have added four paragraphs to detail the experimental procedures for experiments in these figures. In addition, we have checked all of the figure legends and labeled them as “in crystallo or in solution.” To clarify, we also added “in crystallo” or “solution” in the corresponding panels.

      (3) The authors showed the anomalous signals of Mn2+ and Tl+. The authors should mention which wavelength of X-rays was used in the data collections to calculate the anomalous signals. 

      Thank you for the suggestion. We have included the wavelength of the X-ray in the figure legends that include anomalous maps, which were all determined at an X-ray wavelength of 0.9765 Å.

      (4) The full names of "His-Me" and "HNH" are necessary for a wide range of readers. 

      Thank you for the suggestion. We have included the full nomenclature for His-Me (histidine-metal) nucleases and HNH (histidine-asparagine-histidine) nuclease.

      (5) The authors should add the side chain of Arg61 in Figure 1E because it is mentioned in the main text. 

      Thank you for the suggestion. We have added Arg61 to Figure 1E.

      (6) Figure 5D. For clarity, the electron densities should cover the Na+ ion. The same request applies to WatN in Figure S3B.

      Thank you for catching this detail. We have added the electron density for the Na+ ion in Figure 5D and WatN in Figure S3B.

      (7) At line 269 on page 8, what is "previous H98A I-PpoI structure with Mn2+"? Is the structure 1CYQ? If so, it is a complex with Mg2+. 

      Thank you for catching this detail. We have edited the text to “previous H98A I-PpoI structure with Mg2+.”

      (8) At line 294 on page 9, "and substrate alignment or rotation in MutT (66)." I think "alignment of the substrate and nucleophilic water" is preferred rather than "substrate alignment or rotation". 

      Thank you for the suggestion. We have edited the text to “alignment of the substrate and nucleophilic water.”

      (9) At line 305 on page 9, "Second, (58, 69-71) single metal ion binding is strictly correlated with product formation in all conditions, at different pH and with different mutants (Figure 3a and Supplementary Figure 4a-c) (58)". The references should be cited in the correct positions. 

      Thank you for catching this typo. We have removed the references.

      (10) At line 347 on page 10, "Grown in a buffer that contained (50 g/L glucose, 200 g/L α-lactose, 10% glycerol) for 24 hrs." Is this sentence correct? 

      Thank you for catching this detail. We have corrected the sentence.

      (11) At line 395 on page 11, "The His98Ala I-PpoI crystals of first transferred and incubated in a pre-reaction buffer containing 0.1M MES (pH 6.0), 0.2 M NaCl, 1 mM MgCl2 or MnCl2, and 20% (w/v) PEG3350 for 30 min." In the experiments using this mutant, does a pre-reaction buffer contain MgCl2 or MnCl2? 

      Thank you for bringing this to our attention. We have performed two sets of experiments: 1) metal ion soaking in 1 mM Mn2+, which is performed similarly as WT and does not have Mn2+ in the pre-reaction buffer; 2) imidazole soaking, 1 mM Mn2+ was included in the pre-reaction buffer. We reasoned that the Mn2+ will not bind or promote reaction with His98Ala I-PpoI, but pre-incubation may help populate Mn2+ within the lattice for better imidazole binding. However, neither Mn2+ nor imidazole were observed. We have added experimental details for both experiments with His98Ala I-PpoI.

      (12) In the figure legends of Figure 1, is the Fo-Fc omit map shown in yellow not in green? Please remove (F) in the legends. 

      We have changed the Fo-Fc map to be shown in violet. We have also removed (f) from the figure legends.

      (13) I found descriptions of "MgCl". Please modify them to "MgCl2". 

      Thank you for catching these details. We have modified all “MgCl” to “MgCl2.”

      (14) References 72 and 73 are duplicated. 

      We have removed the duplicated reference.

      Reviewer #2 (Recommendations For The Authors): 

      p. 9, first paragraph, last three lines: "Thus, we suspect that the metal ion may play a crucial role in the chemistry step to stabilize the transition state and reduce the electronegative buildup of DNA, similar to the third metal ion in DNA polymerases and RNaseH." This point is significant but the statement seems a little uncertain. You are saying that the single metal plays the role of two metals in polymerase, in both the ground state and the transition state. I believe the sentence can be stronger and more explicit. 

      Thank you for raising this point. We suspect the single metal ion in I-PpoI is different from the A-site or B-site metal ion in DNA polymerases and RNaseH, but similar to the third metal ion in DNA polymerases and nucleases. As we stated in the text,

      (1) the metal ion in I-PpoI is not required for substrate alignment. The water molecule and substrate can be observed in place even in the absence of the metal ion. In contrast, the A-site and B-site metal ions in DNA polymerases and RNaseH are required for aligning the substrates.

      (2) Moreover, the appearance of the metal ion is strictly correlated with product formation, similar to the third metal ion in DNA polymerases and RNaseH.

      To emphasize our point, we have revised the sentence to read:

      “Thus, similar to the third metal ion in DNA polymerases and RNaseH, the metal ion in I-PpoI is not required for substrate alignment but is essential for catalysis. We suspect that the single metal ion helps stabilize the transition state and reduce the electronegative buildup of DNA, thereby promoting DNA hydrolysis.”

      Minor typos: 

      p. 2, line 4 from bottom: due to the relatively low resolution... 

      Thank you for catching this. We have edited the text to “due to the relatively low resolution.”

      Figure 4F: What is represented by the pink color? 

      The structures are color-coded as 320 s at pH 6 (violet), 160 s at pH 7 (yellow), and 20 s at pH 8 (green). We have included the color information in the figure legend and made the labeling clearer in the panel.

      p. 9, first paragraph, last line: ...similar to the third... 

      Thank you for catching this. We have edited the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This important study explores the potential influence of physiologically relevant mechanical forces on the extrusion of vesicles from C. elegans neurons. The authors provide compelling evidence to support the idea that uterine distension can induce vesicular extrusion from adjacent neurons. The work would be strengthened by using an additional construct (preferably single-copy) to demonstrate that the observed phenotypes are not unique to a single transgenic reporter. Overall, this work will be of interest to neuroscientists and investigators in the extracellular vesicle and proteostasis fields. 

      We now include supporting data using a single-copy alternate fluorescent reporter expressed in touch neurons (Fig. 3H).

      In brief, we examined the induction of exophergenesis in an alternative single-copy transgene strain that expresses mKate fluorescent protein specifically in touch receptor neurons. As compared to the multi-copy transgene that is broadly used in this study and expresses mCherry fluorescent protein specifically in touch receptor neurons, the mKate single-copy transgene is associated with a much lower frequency of exophergenesis. However, increasing uterine distension via blocking egg-laying can increase the exophergenesis of the mKate single-copy transgenic line from 0% to approximately 60% on adult day 1, indicating that the observed response is not tied to a single reporter.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors sought to understand the stage-dependent regulation of exophergenesis, a process thought to contribute to promoting neuronal proteostasis in C. elegans. Focusing on the ALMR neuron, they show that the frequency of exopher production correlates with the timing of reproduction. Using many genetic tools, they dissect the requirements of this pathway to eventually find that occupancy of the uterus acts as a signal to induce exophergenesis. Interestingly, the physical proximity of neurons to the egg zone correlates with exophergenesis frequency. The authors conclude that communication between the uterus and proximal neurons occurs through the sensing of mechanic forces of expansion normally provided by egg occupancy to coordinate exophergenesis with reproduction. 

      Strengths: 

      The genetic data presented is thorough and solid, and the observation is novel. 

      Weaknesses: 

      The main weakness of the study is that the detection of exophers is based on the overexpression of a fluorescent protein in touch neurons, and it is not clear whether this process is actually stimulated in wild-type animals, or if neurons have accumulated damaged proteins in relatively young day 2 animals. 

      We now include data using a single-copy alternate fluorescent reporter expressed in touch neurons. Although baseline exopher levels are low in this strain, we demonstrate that inducing egg retention in this background markedly increases exopher generation from a baseline of near zero to ~60% (new Fig. 3H), supporting the conclusion that uterine distention, rather than reporter identity, is associated with early-life exopher elevation. These data also add to our observations indicating that high protein-expressing strains generally produce higher baseline levels of exophers in early adulthood (for example, Melentijevic et al. (PMID 28178240) documented that mCherry RNAi knockdown in the strain primarily studied here can lower exopher levels).

      The second point raised here, regarding the occurrence and physiological role of early-adult exophers in “native” non-stressed neurons is a fascinating question that we are beginning to address in continuing experiments. Readers will appreciate that quantifying relatively rare, “invisible” touch receptor neuron exophergenesis accurately without expressing a fluorescent reporter is technically challenging. Our speculation, outlined now a bit more clearly in the Discussion here, is that certain molecular and organelle debris that cannot readily be degraded in cells during larval development may be stored until release to more capable degradative neighbors or to the coelomocytes for later management, as one component of the early adult transition in proteostasis (see J. Labbadia and R. I. Morimoto, PMID 24592319). Receiving cells may be primed for this at a particular timepoint, possibly analogous to the “bulky garbage” collection of over-sized difficult-to-dispose-of household items that a town will address with specialized action only at specific times. The prediction is that we should be able to detect some mass protein aggregation through early development, and at least partial elimination by adult day 3; this elimination should be impaired when eggs are eliminated. Initial testing is underway.

      Reviewer #2 (Public Review): 

      Summary: 

      This paper reports that mechanical stress from egg accumulation is a biological stimulus that drives the formation of extruded vesicles from the neurons of C. elegans ALMR touch neurons. Using powerful genetic experiments only readily available in the C. elegans system, the authors manipulate oocyte production, fertilization, embryo accumulation, and egg-laying behavior, providing convincing evidence that exopher production is driven by stretch-dependent feedback of fertilized, intact eggs in the adult uterus. Shifting the timing of egg production and egg laying alters the onset of observed exophers. Pharmacological manipulation of egg laying has the predicted effects, with animals retaining fewer eggs having fewer exophers and animals with increased egg accumulation having more. The authors show that egg production and accumulation have dramatic consequences for the viscera, and moving the ALMR process away from eggs prevents the formation of exophers. This effect is not unique to ALMR but is also observed in other touch neurons, with a clear bias toward neurons whose cell bodies are adjacent to the filled uterus. Embryos lacking an intact eggshell with reduced rigidity have impaired exopher production. Acute injection into the uterus to mimic the stretch that accompanies egg production causes a similar induction of exopher release. Together these results are consistent with a model where stretch caused by fertilized embryo accumulation, and not chemical signals from the eggs themselves or egg release, underlies ALMR exopher production seen in adult animals. 

      Strengths: 

      Overall, the experiments are very convincing, using a battery of RNAi and mutant approaches to distinguish direct from indirect effects. Indeed, these experiments provide a model generally for how one would methodically test different models for exopher production. The paper is well-written and easy to understand. I had been skeptical of the origin and purpose of exophers, concerned they were an artefact of imaging conditions, caused by deranged calcium activity under stressful conditions, or as evidence for impaired animal health overall. As this study addresses how and when they form in the animal using otherwise physiologically meaningful manipulations, the stage is now set to address at a cellular level how exophers like these are made and what their functions are. 

      Weaknesses: 

      Not many. The experiments are about as good as could be done. Some of the n's on the more difficult-to-work strains or experiments are comparatively low, but this is not a significant concern because of the number of different, complementary approaches used. The microinjection experiment in Figure 7 is very interesting, but there are missing details that would confirm whether this is a sound experiment. 

      We expanded the description of the microinjection experiment in both the figure legend and the Methods section to enhance clarity and substantiate the approach.

      Reviewer #3 (Public Review): 

      Summary: 

      In this paper, the authors use the C. elegans system to explore how already-stressed neurons respond to additional mechanical stress. Exophers are large extracellular vesicles secreted by cells, which can contain protein aggregates and organelles. These can be a way of getting rid of cellular debris, but as they are endocytosed by other cells can also pass protein, lipid, and RNA to recipient cells. The authors find that when the uterus fills with eggs or otherwise expands, a nearby neuron (ALMR) is far more likely to secrete exophers. This paper highlights the importance of the mechanical environment in the behavior of neurons and may be relevant to the response of neurons exposed to traumatic injury. 

      Strengths: 

      The paper has a logical flow and a compelling narrative supported by crisp and clear figures. 

      The evidence that egg accumulation leads to exopher production is strong. The authors use a variety of genetic and pharmacological methods to show that increasing pressure leads to more exopher production, and reducing pressure leads to lower exopher production. For example, egg-laying defective animals, which retain eggs in the uterus, produce many more exophers, and hyperactive egg-laying is accompanied by low exopher production. The authors even inject fluid into the uterus and observe the production of exophers. 

      Weaknesses: 

      The main weakness of the paper is that it does not explore the molecular mechanism by which the mechanical signals are received or responded to by the neuron, but this could easily be the subject of a follow-up study. 

      We agree that the molecular mechanisms operative are of considerable interest, and our initial pursuit suggests that a comprehensive study will be required for satisfactory elaboration of how mechanical signals are received or responded to by the neuron.

      I was intrigued by this paper, and have many questions. I list a few below, which could be addressed in this paper or which could be the subject of follow-up studies. 

      - Why do such a low percentage of ALMR neurons produce exophers (5-20%)? Does it have to do with the variability of the proteostress? 

      We do not yet understand why some ALMR neurons within the same genotype will produce exophers and some will not. We know that in addition to the uterine occupation we report here, proteostasis compromise, feeding status, oxidative stress, and osmotic stress can elevate exopher numbers (PMID 34475208); cell autonomous influences on exopher levels include aggresome-associated biology (PMID 37488107) and expression levels of the mCherry protein (PMID 28178240). Turek reports that social interaction on plates can influence muscle exopher levels (PMID 34288362). Thus, although variable proteostress experienced by neurons is likely a factor, we have not yet experimentally defined specific trigger rules. We suspect that the summation of internal proteostasis crisis and environmental conditions, including particular force vectors/frequency, will underlie the variable exopher production phenomenon.

      - Why does the production of exophers lag the peak in progeny production by 24-48 hours? Especially when the injection method produces exophers right away?

      Progeny production can track well with exopher production (Fig. 1B), although the nature of egg counts (permanent, one-time events) vs. exophers (which are slowly degraded) can skew the peak scores apart. We synchronized animals at the L4 stage. Adult day 1 was 24 hours later, and we measured then and every subsequent 24 hours. The daily progeny count reflects the total number of progeny produced every 24 hours; exopher events were scored once a day, but exophers can persist such that the daily exopher count can partially reflect slow degradation, with some exophers being counted on two days. We now explain our scoring details better in the Methods section.

      The rapid appearance of exophers, as early as ~10 minutes after sustained injection, is fascinating and probably holds mechanistic implications for exopher biology. For one thing, we can infer that in the mCherry Ag2 background, touch neurons can be poised to extrude exophers, but that the pressure/push acts to trigger or license final expulsion. It is interesting that we needed to administer a sustained injection of two minutes to observe an increase in exophers (now better emphasized in the expanded Methods section). We speculate that multiple pressure events, or a sustained force vector, might be critical (like an egg slowly passing through?). Minimally, this assay may help us assign molecular roles to pathway components as we identify them moving forward. 

      - As mentioned in the discussion, it would be interesting to know if PEZO-1/PIEZO is required for uterine stretching to activate exophergenesis. pezo-1 animals accumulate crushed oocytes in the uterus. 

      We have begun to test the hypothesis that PEZO-1 is a signaling component for ALMR exophergenesis, initially using the N- and C-terminal pezo-1 deletion mutants as in Bai et al. (PMID 32490809). These pezo-1 mutants have a mild decrease in ALMR exophergenesis under normal conditions. However, vulva-less conditions in pezo-1N and pezo-1C increased ALMR exophergenesis from approximately 10% to 60%, similar to the response of wild-type worms to high mechanical stress; these data suggest that PEZO-1 is not a required player in mediating mechanical force-induced ALMR exophergenesis. We are currently testing genetic requirements for other known mechanosensors. We intend a comprehensive investigation of the molecular mechanisms of mechanical signaling in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      -The study would be significantly strengthened by the addition of data detecting regulation of exophergenesis by uterine forces in a more physiological context, in the absence of overexpression of a toxic protein. In other words, is this a process that occurs naturally during reproduction, or is it specific to proteotoxic stress induced by overexpression? Perhaps the authors could repeat key experiments using a single copy transgene, and challenge the animals with exogenous proteotoxic stress if necessary.

      We now include data using a single-copy alternate fluorescent reporter expressed in touch neurons. Although baseline exopher levels are low in this strain, we demonstrate that inducing egg retention in this background markedly increases exopher generation from a baseline of near zero to ~60% (Fig. 3H), supporting the conclusion that uterine distention, rather than reporter identity or over-expression alone, drives early-life exopher elevation.

      Also noteworthy is that we find exophergenesis in the single-copy transgenic line is only approximately 0.3% on adult day 2 (average in three trials, data not shown), which is much lower than the 5-20% exophergenesis rate typically observed in the multi-copy high expression mCherry transgenic line. Therefore, consequences of overexpression of mCherry likely potentiate exophergenesis.

      -The authors mention that exophergenesis has been described in muscle cells. Is this also dependent on the proximity to the uterus? It would have been interesting to include data on other cell types in the vicinity of the reproductive system.

      Yes, in interesting work on exophers produced by muscle, Turek et al. reported that muscle exopher events are mostly located in a region proximal to the uterus. Moreover, this work also documented that sterile hermaphrodites are associated with approximately 0% muscle exophergenesis, and egg retention in the uterus strongly increases muscle exophergenesis (PMID: 34288362).  

      -Is exophergenesis also induced by other forms of mechanical stress? For example, swimming.

      We have looked at crude treatments such as centrifugation or vortexing without observing changes in exopher levels. Our preliminary work indicates that swimming can increase exophergenesis, and this effect depends on the presence of eggs in the uterus. We appreciate the question, and expect to include documentation of alternative pressure screening in our planned future paper on molecular mechanisms.

      -In Figure 1E, the profile of exopher production for the control condition at 25°C is very similar to the profile observed at 20°C in Figure 1B. However, the profile of progeny production at 25°C is known to have an earlier peak of progeny production. Perhaps egg retention is differently correlated with progeny production at this temperature? The authors could easily test this.

      Overall, exophers (which degrade with time) and progeny counts (a fixed number) have slightly different temporal features, anchored in part by how long exophers or their “starry night” debris persist. Most exophers start to degrade within 1-6 hours (PMID: 36861960), but exopher debris can persist for more than 24 hours. An exopher event observed on day 1 may thus also be recorded at the day 2 time point, which leads to a higher frequency of exopher events on day 2 as compared to day 1.

      We have previously published on the impact of temperature on exopher number (Supplemental Figure 2 in PMID 34475208). In brief, increasing culture temperature for animals that are raised at a constant lifetime temperature modestly increases exopher number; a greater increase in exophers is observed under conditions in which animals were switched to a higher temperature in adult life, suggesting that changes in temperature (a mandatory part of the ts mutant studies) engage complex biology that modulates exopher production. Our previous data show that in a temperature shift to 25°C, the peak of exophers was at adult day 1. Here, Fig. 1B reflects a constant temperature of 20°C; Fig. 1E involves a temperature shift from 15 to 25°C. That egg retention might be temperature-influenced is a plausible hypothesis, but given the complexities of temperature shifts for some mutants, we elected to defer a deeper analysis of the temperature-exopher-egg relationship. 

      -It is not clear how to compare panels A and B in Figure 3. In panel A the males are present throughout the adult life of the hermaphrodites whereas in panel B the males are added in later life. Therefore, the effect of later-life mating on progeny production is not shown and the title of panel A in the legend is misleading. The authors need to perform a progeny count in the same conditions of mating presented in Figure 3B to allow direct comparison.

      As Reviewer 1 suggested, we performed a new progeny count now presented in new Fig. 3A, which more appropriately matches the study presented in Fig. 3B; legends adjusted.

      -On page 12, the authors state that the baseline of exophergenesis in rollers is 71%, but then attribute the 71% in Figure 4F to exophergenesis specifically in ALMR that is posterior to AVM. The authors need to clarify this point.

      Good catch on our error. The baseline of exophergenesis in rollers is ~40%, and we corrected the main text.

      -Considering the conclusion of Figure 2 that blocking embryonic events past the 4-cell stage does not impact exopher production, it would have been interesting to compare the uterine length for emb-8 and for mex-3, since it is quite intriguing that the former suppresses exopher production while the latter has no effect.

      We repeated the emb-8 and mex-3 RNAi for these studies and encountered variability in outcome for 2-cell-stage disruption via emb-8 RNAi, which is consistent with the range of published endpoints for emb-8 RNAi. We elected to include these emb-8 findings in the legend of Figure 2G, but removed the RNAi data from the main text figure. mex-3 uterine measures are added to revised panels 5H and 6I.

      Reviewer #2 (Recommendations For The Authors): 

      -Leaving the worms in halocarbon oil for too long (e.g. 10 min) can desiccate and kill them. Did the authors take them out of the oil before analyzing exopher production? The authors refer to these as 'sustained injections' without much description beyond that. As the worms are very small, the flow rate needed for a sustained injection over 2 minutes must be very low - so low that the needle is in danger of being clogged. Do the authors have an estimate of how much fluid was injected or the overall flow rate? I realize the flow rate measured outside of the worm may not compare directly to that of a pressurized worm, but such estimates would be instructive, particularly if they can be related to the relative volume of the eggs the injection is trying to mimic.

      After injection or mock injection, we removed the animal from the oil and flipped it if necessary to observe the ALMR neuron on the NGM-agar plate. We have now expanded the description of the experimental details of injection, including the estimated flow rate, in the revised Methods section.

      - The authors describe the ALMR neurons as "proteostressed", but I am not clear on whether these neurons were treated in a unique procedure to induce such a state or if the authors are merely building on other observations that egg-laying adults are dedicating significant resources to egg production, so they must be proteostressed. If they are not inducing a proteostressed state in their experiments, the authors should refrain from describing their neurons and effects as depending on such a state.

      We revised the text to feature more explicitly the published evidence that the ALMR neurons we track with mCherryAg2 bz166 are likely proteostressed. Overexpression of mCherry in bz166 is associated with enlargement of lysosomes and formation of large mCherry foci that often correspond to LAMP::GFP-positive structures in ALMR neurons (PMID: 28178240; PMID: 37488107). Marked changes in ultrastructure reflect TN stress in this background. These cellular features are not seen in wild-type animals. We previously published that over-expression of mCherry, polyQ74, polyQ128, or Aβ1-42 (each of which enhances proteostress) increases exophers (PMID: 28178240). Likewise, most genetic compromises of different proteostasis branches (heat shock chaperones, proteasome, and autophagy) promote exophergenesis, supporting exophergenesis as a response to proteostress. In sum, the mCherryAg2 bz166 neurons appear markedly more stressed than those of a non-overexpressing line and produce more exophers. RNAi knockdown of the mCherry lowers exopher levels (PMID: 28178240).

      In response to reviewer comment, we added a study with a single copy mKate reporter (new data Fig. 3H). We find a very low baseline of exophers in this background. This would support that high autonomous compromise associated with over-expression influences exopher levels. Interestingly, however, we found that ALMR neurons expressing mKate under a single-copy transgene still exhibit excessive exopher production (>60%) under high mechanical stress (Fig. 3H). These data are consistent with ideas that mechanical stresses can enhance exopher production, and may markedly lower the threshold for exophergenesis in close-to-native stress level neurons.

      - The authors should include more details on the source and use of the RNAi, for example, if the clones were from the Ahringer RNAi library, made anew for this study, or both.

      We now add this information in the methods section.

      - I would be curious if the authors would similarly see an induction in exopher production after acute vulval muscle silencing with histamine. I'm not suggesting this experiment, but it may offer a way to induce exophers in a more controlled manner.

      This is a great suggestion that we will try in future studies.

      - I am not sure if Figure 5 needs to be a main figure in the paper or if it would be more appropriate as a supplement.

      We considered this suggestion, but we think that the strikingly strong correlation of uterus length and exopher levels is a major point of the story, and these data establish a metric that we will use moving forward to distinguish whether an exopher modulation disruption is more likely to act by modulation of reproduction or modulation of touch neuron biology. For this reason we elected to keep Figure 5 in the main text.

      Reviewer #3 (Recommendations For The Authors): 

      -The Statistics section in the methods should be expanded to describe the statistics used in the experiments that aren't nominal, of which there are many.

      We have updated and expanded the statistics section.

      -P.2 Line 49 spelling 'que' should be queue (I remember this by the useless queue of letters lined up after the 'q').

      Corrected 

      -The introduction has a bit too much information about oocyte maturation, not relevant to the study.

      We agree that the information about oocyte maturation is not critical for laying out the related experiments and cut this section to improve focus.

      -p.3 line 22: Some exophers are seen on Day 3, so this should be restated for accuracy.

      Corrected

      -p.3 line 26. Explain here why sperm is necessary (oocytes don't mature or ovulate effectively without sperm).

      We added this clarifying explanation.

      -p.3 line 44 Clarify in the spe-44 the oocytes are in the oviduct (not the uterus). Might be helpful to include a DIC image to accompany the helpful diagram in Figure 1D. 

      We added a sentence describing the impact of sperm absence on oocyte maturation, progression into the uterus, and retention in the gonad, with reference to PMID: 17472754. We were able to add a DIC image in the tightly packed Figure 1.

      In Supplemental Figure 6, we now include a field picture of oocyte retention in the sem-2 mutant and upon treatment of lin-39(RNAi).

      -p.5 line 3 in the Figure 1D legend; recommend delete 'light with' which is confusing and just refer to the sperm as dark dots. 

      Corrected

      -p.6 line 22-24 Check for alignment of the statements with Figure 2 (2F is cited, but it should be 2G).

      Corrected

      -p12 line 13-15; Many ALMRs not in the egg zone (70%) did not produce exophers - this is still quite a lot. It would be good to state this section in a more straightforward way (less leading the reader) and if possible to give a possible explanation.

      We modified the text to be less leading: “Thus, although ALMR soma positioning in the egg zone does not guarantee exophergenesis in the mCherryAg2 strain, the neurons that did make exophers were nearly always in the egg zone.”

      -p.15 paragraph 3 - clarify how uterine length was controlled for the overall body length of the worm.

      We did not systematically measure body length, but rather focused on uterine distention. It would be of interest to determine whether body length correlates with uterine size, and then address how that relationship translates to exopher production, but here our attention came to rest on the striking correlation of uterine length and number of exophers.

      -p.17 line 23-25; Could be stated more simply. 

      We adjusted the text: “Moreover, the oocyte retention was similarly efficacious in elevating exopher production to egg retention, increasing ALMR exophergenesis to approximately 80% in the sem-2(rf) mutant (Fig. 6C)”.

      -p.23 Line 4. I think by the time the reader reaches this sentence, the egg-coincident exophergenesis will not be 'puzzling'. 

      Agreed, corrected.

      -p.26, Line 22, Male 'mating', not 'matting'.

      Corrected.

      -Throughout, leave space between number and unit (this is not required for degree or percent, but be consistent). 

      Corrected.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements [optional]


      We thank the reviewers for their insights and helpful suggestions on the manuscript. Based on these, we have prepared a revision plan for this manuscript, which is outlined below. We believe these revisions will improve the overall quality of the manuscript.

      2. Description of the planned revisions



      Reviewer #1

      (Evidence, reproducibility and clarity (Required)):

      Summary:

      This study builds on previous work from the same group, where they use Drosophila photoreceptors as a model system to investigate the role of ER-plasma membrane contact sites in an in vivo setting. The authors recently described a role of the ER-PM contact site protein dEsyt in regulating photoreceptor function in Drosophila. In this follow-up study, they explore whether this function of dEsyt is connected to Ca2+ signaling downstream of photoreceptor activation. Using a dEsyt mutant that should be unable to bind Ca2+, they find that Ca2+ to some extent is required for dEsyt localization, membrane contact site formation and photoreceptor function.

      Major comments:

      The use of photoreceptor cells in Drosophila is an elegant model system that enables studies of membrane contact sites and associated proteins in a native condition. The data presented by the authors clearly show that these structures are important for photoreceptor function, and that dEsyt plays a role at these sites. However, this was already known from previous studies by the same group. When it comes to whether these contacts are sensing Ca2+ changes and if these changes are acting through dEsyt, which is the focus of the current manuscript, the results are unclear to me and would need to be clarified by the authors both in text and with new experiments.

      1) What is the role of cellular Ca2+ signaling in the regulation of dEsyt function? There are several aspects here that needs to be clarified. 1) How is WT dEsyt localization regulated by Ca2+? This could for example be evaluated in the mutant flies used in Fig. 1 (trpl302; trp343), where lack of light-induced Ca2+ influx would be predicted to result in a localization of dEsyt that resembles that observed for dEsytCaBM. 2) Is Ca2+ important for dEsyt localization, lipid exchange or both? The authors express a version of dEsyt with mutation made in all three C2 domains. In mammalian E-Syts, Ca2+ binding to the C2A domain is important for lipid exchange while binding to C2C (in E-Syt1) is important for interactions with lipids in the plasma membrane. Using more carefully designed mutants will allow the authors to determine how Ca2+ regulates dEsyt function in vivo. In addition, the authors must show experimentally that the mutant dEsytCaBM is unable to bind Ca2+ (could e.g. be done by acute Ca2+ changes in the cell-based model used in Fig. 3). Writing that "This transgene carrying a total of nine mutations should render the protein unable to bind calcium" (p. 6, line 173) is not sufficient.

      1) How is WT dEsyt localization regulated by Ca2+?

      We agree that further experimental evidence would be helpful in establishing the significance of cellular Ca2+ signaling in the control of dEsyt function. As suggested by the reviewer, the localization of wild type dEsyt will be examined in the mutants: norpAP24 (PLC null mutant) and trpl302; trp343 (protein null mutants of TRPL and TRP channels respectively) in which light induced calcium influx is eliminated. These data will be included in the revision.

      2) Is Ca2+ important for dEsyt localization, lipid exchange or both?

      We have already performed experiments to address the question of how important calcium binding to dEsyt is for lipid transport at the ER-PM interface in Drosophila photoreceptors. These results indicate a previously unexpected role for lipid exchange and will be included in the revision.

      3). Writing that "This transgene carrying a total of nine mutations should render the protein unable to bind calcium" (p. 6, line 173) is not sufficient.

      We concur with the reviewers that at present we do not have experimental data to demonstrate that dEsytCaBM cannot bind Ca2+. However, as Reviewer 4 pointed out, it will be challenging to demonstrate this experimentally. A direct proof would only come from measurements of the calcium binding affinity of dEsyt (which involves protein purification that is beyond the scope of the current work). An indirect demonstration would be any cellular or in vivo experiment. To strengthen the in silico analysis already included in Fig 2 C-F, we propose to use an AlphaFold model to demonstrate that the arrangement of the calcium binding residues in the C2 domain of dEsyt is compatible with Ca2+ binding.

      2) The localization of dEsyt shown in Fig. 3B is a bit confusing. First of all, I would recommend including markers of the ER and the plasma membrane, because without these it is difficult to make statements about the localization of dEsyt to these structures.

      As suggested, to better appreciate the localization of dEsyt in photoreceptors, we will perform colocalization of dEsyt with markers of the PM (Rhabdomere) and ER (Sub Microvillar Cisternae).

      Second, it appears that WT dEsyt localize to the reticular ER, and that the CaBM version localize to the plasma membrane. This is somewhat opposite to mammalian ESyts, where mutations that prevent Ca2+ binding either had no effect (for ESyt2) or prevented (for ESyt1) the interaction with the plasma membrane. It also appears different from the localization in vivo (Fig. 3C). Clarifying this will be important. It will also be important to connect this localization to changes in Ca2+ and not just to the localization of a mutant that may or may not be deficient in Ca2+ binding (see comment above).

      In considering this comment, we need to bear in mind the following:

      • Mammalian cells have three genes that encode for Esyt: Esyt 1, 2 and 3 whereas the Drosophila genome encodes only a single gene for Esyt.
      • In terms of sequence similarity and structure, dEsyt and hEsyt2 are very similar. However, in contrast to hEsyt2 and hEsyt3, which localize to the plasma membrane (PMID: 17360437), dEsyt acts like hEsyt1 and localizes to the ER-PM junctions.
      • A single study (PMID: 27065097) has shown that the SMP domain of Esyt1 can transfer lipids in an in vitro assay. In our studies, we have noted an unexpected function for the SMP domain of dEsyt for in vivo function as measured through phenotypes in the eye (data will be presented in the revised ms).
      • While knocking out the single dEsyt in Drosophila photoreceptor neurons results in phenotypes (Nath et al., PMID: 32716137), to date knocking out all three Esyts in mammalian cell culture models or mice has not revealed an in vivo phenotype.

      Bearing these points in mind, it may not be reasonable to expect every observation on mammalian Esyt to be recapitulated in the fly system or vice versa.

      3) I don't fully understand the time course of events. The authors show that dEsytCaBM is mislocalized already at day 1 in dark-reared flies (Fig. 3C) but this mislocalization is not accompanied by a change in MCS density or gap distance, and consistently does not influence the localization of RDGB. The authors next expose the flies to constant light illumination to trigger Ca2+ dependent signaling, and this leads to mislocalization of RDGB, perhaps indicating changes in MCS (this is not shown). From these results it is difficult to know what the role of dEsyt is. It would be necessary to also show a control where Ca2+ signaling is not induced, e.g. a parallel dark-control (same number of days but no illumination).

      It is important to remember that even complete loss of Esyt does not result in altered MCS or mislocalization of RDGB on day 1 post eclosion. This has been published by us previously (Nath et al., PMID: 32716137). Since we show in this manuscript that dEsytCaBM exerts a dominant negative effect when expressed in wild type and phenocopies dEsytKO, one might expect expression of dEsytCaBM to also lead to altered MCS density and mislocalization of RDGB by Day 6 of constant light.

      Bearing this in mind, we will incorporate the following data in the manuscript:

      • MCS density in dEsytKO photoreceptors at Day 1, added to Figure 3C.
      • Electron Microscopy to check MCS density in Rh1>dEsytCaBM at Day 6CL with appropriate control genotypes.
      • Confocal Imaging: RDGB staining in Rh1>dEsytCaBM flies reared to Day 6CD, with appropriate control genotypes. This is the dark control, in which only reduced Ca2+ signaling is induced, due to dark noise or spontaneous PLC activation.

      This is particularly important given that the authors show in Fig. 1 that preventing Ca2+ influx had a dramatic impact on MCS density even at day 1 (which is in sharp contrast to dEsytCaBM-expressing flies, that show normal morphology at day 1, which rather implies that dEsyt is not a major Ca2+ effector).

      In thinking about this comment, it is important to bear in mind the details of the experimental paradigm used in each experiment when comparing the observed results. Throughout the manuscript, dEsytCaBM is expressed selectively in photoreceptors using the Rhodopsin enhancer, which drives expression of the transgene during late eye development. By contrast, in germ line mutant strains such as trpl302;trp343 the channels are absent throughout development. Thus the phenotypes of trpl302;trp343 might be broader than those of expressing dEsytCaBM. Therefore, mutating the calcium binding residues of dEsyt and expressing it using the Rh1 enhancer in a specific developmental time window might not have the same impact on contact site density as completely eliminating the major calcium permeable channels, TRP and TRPL, which are important to sustain the ongoing phototransduction cascade throughout development.

      4) The experiments done in dEsyt KO flies are important, and here the authors show that dEsyt1 could to some extent rescue all phenotypes. Some results are a bit puzzling. For example, dEsyt1CaBM localization in dEsyt1 KO flies is identical to that of WT dEsyt (Fig. 5C), which is in sharp contrast to the data shown in Fig. 3C. What is the reason for this? I would have anticipated the opposite (i.e. that in WT flies, dEsytCaBM can form dimers with endogenous dEsyt through SMP-domain interactions which may have an impact on its localization and the function of endogenous dEsyt, but that in the dEsyt KO cells, dEsytCaBM would show a different localization due to the lack of endogenous dEsyt to interact with). It is important to clarify as one of the major observations here is that dEsytCaBM no longer localizes to MCS. Since the CaBM version of dEsyt could rescue, to some extent, MCS density and delay photoreceptor degeneration, this implies that Ca2+ may not be required for regulation of dEsyt function or that the mutant is still able to partially bind to Ca2+.

      The localization shown in Fig 5C is not of dEsytCaBM in dEsytKO photoreceptors but the localization of RDGB in Rh1>dEsytCaBM; dEsytKO at Day 1 (Figure 5C i) and as a function of age and illumination- Day 6CL (Figure 5C ii).

      One experiment that would help the authors determining the function of dEsyt in vivo would be to use a mutant that lacks functional SMP domain (ideally also with and without mutations in the C2-domains).

      We have data that address how the lipid binding module, SMP, contributes to dEsyt function at the ER-PM interface in Drosophila photoreceptors. These data will be included in the revision.

      5) PLC activation typically couples to rapid signaling and involves hydrolysis of PIP2 and release of Ca2+ from the ER. Mammalian Esyts also require PIP2 for plasma membrane binding (through interactions with C2-domains), so constitutive PLC activity would be expected to impair ESyt localization to MCS. Here, the authors expose flies for days of constant illumination. How does this influence plasma membrane PIP2 levels and could this be of relevance for how data is interpreted?

      This is an interesting question from the reviewer. However, we would like to clarify the fact that constitutive activation of PLC is different from constant activation of PLC during illumination. Flies have robust mechanisms for controlling PLC turnover and PIP2 levels during continuous illumination and Ca2+ is a key regulator of this process; the underlying mechanisms have been described by Raghu and Hardie in multiple past papers (PMID: 11343651, PMID: 15355960). This is why, apart from adaptation, flies grown in constant light for many days do not show electrophysiological defects and neither do they undergo retinal degeneration. We will however measure the kinetics of PIP2 resynthesis in (i) wild type (Day 1 vs Day 6CD vs Day 6CL) and (ii) Control, Rh1>dEsyt and Rh1>dEsytCaBM (Day 1 vs Day6CL). This might reveal some interesting insight into the mutants.

      Do the authors know whether the CaBM mutant has reduced affinity for PIP2?

      The ability of wild type dEsyt to bind PIP2 has not been determined. We will test this and, if it does bind, we can then test the impact of the CaBM mutations on PIP2 binding.

      Minor comments:

      1) The overexpression of WT dEsyt had a dramatic impact on MCS density and gap distance, while expression of dEsytCaBM did not. If these contacts are important for photoreceptor function, is it not surprising that such a dramatic change in photoreceptor structure was without effect on function? This should be further discussed.

      The establishment of more contact sites and the reduction in contact site distance in Rh1>dEsyt::GFP photoreceptors are likely indicative of the proposed tethering role of the protein at the ER-PM MCS. An increase in contact site density or a reduction in distance need not directly parallel an increase in the levels of MCS proteins that are expressed at these contact sites to enhance the ongoing signal transduction. We will test this idea proposed by the reviewer and include the following data in a revision to strengthen our statement:

      • RDGB levels in control vs Rh1>dEsyt::GFP - Western blot
      • Electroretinograms from the genotypes indicated above as a functional readout of the ongoing signaling cascade.
      • PIP2 kinetics in control vs Rh1>dEsyt::GFP to understand if establishing more contact sites can enhance the replenishment of the lipid at the PM.

      2) How is quantification of MCS density and gap distance influenced by retinal degeneration (e.g. induced by dEsyt KO)?

      Wherever we have analyzed MCS density or gap distance, these experiments have been done in flies at ages prior to the onset of retinal degeneration defined as collapse of the microvilli of the rhabdomere. Therefore, our measurements of MCS density and gap in this paper are not affected by retinal degeneration.

      3) The graphical abstract is a bit confusing. It seems to suggest that changes in dEsyt is a consequence of ageing and does not show any role of this protein in photoreceptor function. I think that the abstract could be improved to more clearly highlight the findings in the manuscript. For example, it doesn't at all show the difference in localization between WT and CaBM.

      We will modify the graphical abstract.

      4) P. 5, line 135 the authors state that "The tethering and lipid transfer activity of mammalian Esyts are reported to be influenced by Ca2+". This is a massive understatement. Ca2+ is a critical regulator of Esyt function in mammalian cells.

      The statement will be modified.

      5) In figure legend 1B and C: correct µM to µm.

      Changes will be incorporated as per the suggestion.

      6) In figure legend 2A: should be red rectangles and not black rectangles.

      Changes will be incorporated as per the suggestion.

      7) In Fig. 2B: specify which isoform of human ESyt that is shown.

      Changes will be incorporated as per the suggestion.

      8) In Fig. 2C: do the authors mean D374 or D384 (as indicated in Fig. 2A)?

      Changes will be incorporated as per the suggestion; the residue is D374.

      Significance

      Light-induced signal transduction in photoreceptor cells involves Ca2+ influx and signaling and also depends on correct formation of ER-plasma membrane contact sites. In mammalian cells, the Esyts (esp. Esyt1 and Esyt2) localize to ER-PM contacts in a Ca2+-dependent manner, and the ion has dual effects in both enriching the protein at the membrane contact sites and in promoting lipid transport. Mammalian Esyts form homo- and heterodimers, and the properties of the dimers depend on their composition (PMID: 26202220). Drosophila only have one Esyt (dEsyt) which is structurally most similar to mammalian Esyt2, and the authors have previously shown how this protein is required for photoreceptor function (PMID: 32716137), although the role of Ca2+ was not investigated in that study. However, an earlier study has shown that mutations of all Ca2+-coordinating residues in dEsyt impair protein function in Drosophila neurons (PMID: 28882990), so a similar Ca2+-dependence in the retina would be expected. The results from the present study confirm the requirement of Ca2+ signaling for dEsyt function, and extend this Ca2+-dependent regulation to also involve photoreceptor-induced Ca2+ signaling, which corroborates many other studies showing the requirement of Ca2+ signaling for the regulation of Esyt function in mammalian cells (e.g. PMID: 23791178; PMID: 27065097; PMID: 29222176; PMID: 26202220; PMID: 24183667; PMID: 30589572). As such, the results from this study represent an incremental step towards understanding Esyt function in vivo. These results would be of greatest interest to researchers working on photoreceptor function, and of some interest to a broader audience working on membrane contact sites and signal transduction. My own background is in mammalian cell biology, with a focus on lipid and Ca2+ signaling and inter-organelle communication. I have limited understanding of the model system used here (Drosophila photoreceptor cells).


      We would like to provide an alternative perspective on the reviewer’s view that “As such, the results from this study represent an incremental step towards understanding Esyt function in vivo.”

      We are well aware of the content of several studies of Esyt in mammalian cells, including the ones cited by the reviewer (e.g. PMID: 23791178; PMID: 27065097; PMID: 29222176; PMID: 26202220; PMID: 24183667; PMID: 30589572). These have been cited in our manuscript. However, it is important to recognize that each of these studies is an analysis of the properties of mammalian Esyt as a molecule in the context of Ca2+. None of these studies addresses the key question of whether the regulation of Esyt by Ca2+ is important for cellular function or to support cell physiology. The reason for this is quite straightforward and well known in the field. To date, there is no cellular or physiological phenotype that is reported to depend on endogenous Esyt function in mammalian cellular or animal models. As an illustrative example, deletion of all three mammalian Esyts does not affect cell signalling (PMID: 23791178), including Ca2+ signalling, and a triple knockout of all three Esyts in mice (PMID: 27348751) has no discernible phenotype.

      By contrast, deletion of the single Esyt gene in Drosophila results in robust phenotypes in adult photoreceptors (PMID: 32716137). Using these phenotypes, in this manuscript we study the importance of Ca2+ dependent regulation of cellular functions mediated by dEsyt. Therefore, this study fills an important gap in establishing the mechanism by which dEsyt proteins regulate cellular functions in vivo, in a Ca2+ dependent manner. We respectfully ask that this not be caricatured as an incremental step.


      Reviewer #2

      Evidence, reproducibility and clarity

      Esyt is a C domain (a Ca2+ binding domain) containing protein that localizes to the ER-MCS, playing a role in ER-mitochondria tethering and lipid transfer. At the same time, proteins at the ER-MCS are well-positioned to sense changing levels of Ca2+. Previous studies reported that loss of Esyt in Drosophila causes a loss of ER-PM integrity and retinal degeneration. Here, the authors report the consequence of disrupting the Esyt C domain in Drosophila photoreceptor cells. They used in-silico strategies to identify the Ca2+ contacting residues within the C domain and generated transgenic flies containing either the wild type or the Esyt-CaBM mutants. They show that the wild type transgene rescues several Esyt KO phenotypes in the Drosophila photoreceptors. In some cases, they report dominant negative effects of Esyt-CaBM overexpression.

      This is a straightforward structure-function analysis of the Esyt C domain. Overall, the experiments are well executed. At the same time, a few aspects of the manuscript could be further improved. For example, the authors analyze multiple aspects of photoreceptor integrity. In some cases, they show that the mutant Esyt transgene shows dominant negative effects. In others, there is no evidence or even a partial function. Clarifying these points could be helpful. Below are a few specific points for the authors' consideration:

      Major Comments

      1. RDGB is a protein that localizes to the ER-MCS. Esyt-CABM-GFP expression causes RDGB mis-localization even in the presence of wild type Esyt expression, suggestive of a dominant negative effect (Fig. 4C). But Esyt CaBM-GFP expression doesn't seem to have a dominant negative effect on contact site distance (Fig. 4D). Are the authors not seeing a dominant negative effect because they didn't examine older flies? Or, is there a distinct effect of Esyt CaBM on RDGB localization and contact site distance? If there is a distinct effect, what is the reason?

      As the reviewer correctly mentions, we are not seeing a dominant negative effect of dEsytCaBM::GFP expression on contact site distance because we didn't examine older flies.

      A dominant negative effect of dEsytCaBM on the wild type protein is observed in all phenotypes analyzed. The contact site distance analysis shown in the paper was done on day 1 old flies reared in constant dark. The contact site distance exhibited by dEsytCaBM is like that of dEsytKO photoreceptors at day 1 post eclosion. dEsyt-deprived photoreceptors are comparable to their wild type counterparts at Day 1 in all aspects of phototransduction (PMID: 32716137). But as a function of age and illumination, the dEsytKO photoreceptors exhibit progressive loss in contact site integrity, followed by induction of retinal degeneration and RDGB mis-localisation (PMID: 32716137). These observations are consistent with those in dEsytCaBM-expressing photoreceptors.

      During the revision, the following experiments will be included to strengthen this statement:

      • Add the MCS density and gap distance in dEsytKO photoreceptors at Day1 in Figure 3C.
      • Electron Microscopy to check MCS density and distance in Rh1>dEsytCaBM at Day 6CL with appropriate control genotypes.

      Esyt-CABM-GFP partially rescues the Esyt KO phenotype in retinal degeneration (Fig 6). This is surprising since cellular assays in Fig 4 show a failure of Esyt-CaBM to localize to ER-MCS. The results here contrast with earlier data showing that Esyt-CABM has dominant negative effects. How will the authors interpret the results? Is it possible that Esyt-CABM still has some residual Ca2+ binding activity? Alternatively, does this result imply that Esyt can still function (albeit at lower capacity) without binding Ca2+? Is there Esyt function unrelated to ER-MCS site maintenance when it comes to its role in retinal degeneration? A reasonable explanation is warranted.

      The partial rescue of dEsytKO phenotypes in Rh1>dEsytCaBM; dEsytKO photoreceptors indicates that, apart from calcium sensing, there might be another function for dEsyt at the ER-PM interface which is yet to be discovered.


      Minor Comments:

      Figure legends refer to "SMC" (I am guessing they are referring to Sub microvillar cisternae) without defining it in the text.

      Changes will be incorporated as per the suggestion.


      Significance

      This study will be of interest to those generally interested in the ER mitochondria contact sites. The main significance here is in dissecting the role of the C-domain within the Esyt protein. The authors demonstrate a physiological role using Drosophila photoreceptors as a model.

      We thank the reviewer for appreciating the significance of our study, which seeks to show the importance of the Ca2+ regulation of dEsyt for in vivo function.

      Reviewer #3

      (Evidence, reproducibility and clarity (Required)):

      Summary

      In the present work, the authors explore the role of Ca2+ binding to Esyt in the regulation of ER-PM contact sites using drosophila photoreceptors as a model system. By expressing in wild type or in EsytKO flies a mutated version of dEsyt which is predicted to lose Ca2+ binding, they highlight a potential role of Ca2+ binding to Esyt in the regulation of ER-PM contact sites density and the development of rhabdomeres. The data clearly show the effect of Esyt mutant during development of photoreceptors in Drosophila. However, as discussed below, one essential missing point is the experimental proof that the mutant has indeed lost its ability to bind Ca2+, and that PIP2 binding is not perturbed.

      Major comments

      1. One major comment is the lack of experimental proof that the EsytCABM mutant is indeed unable to bind Ca2+. The MIB tool only gives a prediction and it is not sufficient to prove their statements throughout the manuscript on the requirement of Ca2+ binding for the regulation of MCS.

      We understand the reviewer’s comment that this manuscript does not contain experimental data demonstrating that dEsytCaBM does not bind Ca2+. However, as Reviewer 4 pointed out, it will be challenging to demonstrate this experimentally. A direct proof would likely come from measurements of the calcium binding affinity of dEsyt (which involves protein purification that is beyond the scope of this work). An indirect demonstration would be any cellular or in vivo experiment or any additional in silico analysis. To provide additional indirect evidence to address this question, we will:

      • Use the AlphaFold model to demonstrate that the arrangement of the calcium binding residues in dEsyt is compatible with Ca2+ binding.
      • Evaluate whether wild type dEsyt is mislocalized in the photoreceptors upon eliminating calcium entry to these specialized sensory neurons. The localization of wild type dEsyt will be examined in the mutants: norpAP24 (PLC null mutant) and trpl302; trp343 (protein null mutant of TRPL and TRP channels respectively) in which light induced calcium influx is eliminated.

      Moreover, they should check experimentally the potential differences in the capacity of EsytCABM mutant to bind PI(4,5)P2, which can potentially perturb its subcellular localization.

      As recommended by the reviewer, it is important to determine the PIP2 binding capacity of dEsytCaBM. The ability of wild type dEsyt to bind PIP2 has not been determined. We will test this and, if it does bind, we can then test the impact of the CaBM mutations on PIP2 binding.

      Figure 1A: the legend on the right side of the scheme is missing. On the left, RDGB and dEsyt don't associate with the PM.

      Changes will be incorporated as per the suggestion.

      line 125: the authors should describe more precisely the Trp mutant that they used.

      The text will be modified.

      Concerning the quantification of MCS density done throughout the paper, can the authors mention what they considered as an MCS, in other words, what distance they defined as the maximal distance between the ER and the PM.

      We used fixation methods that allow enhanced membrane preservation and better visualization of membranes and MCS (PMID: 2496206). Such images allowed us to quantify the fraction of SMC that are present at the base of the microvilli in each ultrathin section of a photoreceptor. The MCS is the dark stretch that can be seen at the base of the rhabdomere in each TEM image (PMID: 32716137). Contact site distance measured is the absolute distance between the visible demarcation of the PM and SMC as indicated by the yellow arrows in Figure 4D iii, vi, and ix.

      Figure 3: the localization of Esyt and EsytCABM in S2R cells and in vivo is not precisely analyzed: a co-staining with PM and ER markers should be added in order to state the localization at ER-PM MCS or at apical PM.

      As suggested, to better understand the compartmental localization of dEsyt in photoreceptors, we will use markers of PM (Rhabdomere) and ER (Sub Microvillar Cisternae) and conduct co-localization assays.

      line 181: the authors should specify in which membrane compartments Esyt is localized.

      The text will be modified.

      line 185-187: the conclusion here doesn't seem to fit the data, as the EsytCABM mutant looks enriched at ER-PM contact sites.

      As previously answered, we will remark on whether there is an enrichment of dEsytCaBM at the ER-PM contact sites following the co-localization experiment that is recommended in Q5.

      a paragraph on the production of Drosophila transgene mutants should be added to the Materials and Methods section.

      The text will be added as suggested.

      considering the phenotypes observed for the EsytCABM mutant in vivo, the authors should provide an analysis of the level of expression of the exogenous proteins Esyt and EsytCABM by western blot in the different backgrounds. EsytCABM seems to be expressed at lower levels in Figure 3C.

      As per the suggestion, western blot analysis will be conducted and better representative confocal images depicting the protein levels will be added in the manuscript.

      Fig 4D: considering the perturbation of RDGB localization observed at Day 6, the authors should analyze the organization of MCS by TEM at Day 6, in addition to Day 1.

      We agree that to support the observation of RDGB mis-localization, the decrease in contact site integrity as a function of age and illumination (Day6CL) should be evaluated in Rh1>dEsytCaBM photoreceptors. The manuscript revision will include data from this experiment.

      the EsytCABM mutant exhibits strong dominant negative effects, but rescues completely or partially some of the phenotypes of Esyt KO: could the authors discuss and provide some hypothesis on this apparent discrepancy?

      We are unsure what the reviewer means by “apparent discrepancy”. When dEsytCaBM is expressed in wild type photoreceptors, it exhibits a strong dominant negative effect presumably by inhibiting the function of wild type dEsyt protein.

      dEsytKO is a protein null allele. Therefore, when dEsytCaBM is expressed in the dEsytKO background it does not exert a dominant negative effect as there is no wild type protein to interact with. The partial rescue of dEsytKO phenotypes by Rh1>dEsytCaBM; dEsytKO photoreceptors likely indicates that calcium binding is not the sole factor affecting dEsyt function at the ER-PM interface.

      lines 230-233: the sentence is not clear. I don't see any consistency between data in Figure 5B, showing only very partial rescue by EsytCABM, and the data in Figure 5C (ii) showing complete rescue of RDGB localization by EsytCABM.

      The time point (six days of continuous light exposure following eclosion) at which RDGB localization was analyzed becomes extremely important in thinking about this reviewer comment. If we look at the degeneration kinetics depicted in Figure 5B, we can see that neurodegeneration begins in both dEsytKO and Rh1>dEsytCaBM on Day 8 post-eclosion; prior to this, on Day 6, RDGB is mislocalized from the base. However, in Rh1>dEsytCaBM; dEsytKO, the onset of degeneration is delayed, and the photoreceptors show intact structure until Day 8 or Day 10, and measurable retinal degeneration begins on Day 12. This may be the reason why RDGB continues to be correctly localized in Rh1>dEsytCaBM; dEsytKO at Day 6CL.

      Figure 6D: could the authors comment on the increase of MCS density observed in Esyt-GFP expressing flies.

      Esyt is proposed to function as a tether that connects the ER and PM (PMID: 23791178; PMID: 27065097; PMID: 29222176), bringing them closer together. Based on this idea, perhaps by expressing dEsyt::GFP we are drawing the membranes together thus establishing more MCS.

      on several TEM images, some pictures illustrating different conditions look very similar, as if they were serial cuts: Fig 1B (Day 1 and Day 14), Fig 4D (Rh1 and Rh1>dEsytCABM::GFP), Fig 6B Day 1 and Day 14 and Fig 6C Day 1. Could the authors check if there was a mistake with these pictures?

      The images are not taken from serial sections of the same TEM block as is evident from the arrangement of nucleus of each photoreceptor cell. As mentioned in the figure legends, all experiments are carried out using 3 independent blocks (N=3 fly heads) prepared from each genotype and 10 photoreceptors from each block/ fly retinae are used for quantification of contact site density/ contact site distance. Aside from the arrangement of the accessory cells and cellular nuclei, the TEM images will appear very similar since Drosophila photoreceptor neurons are symmetrically arranged, with around 700–800 ommatidia per eye each comprising 8 photoreceptors.

      Minor comments:

      • lines 84-88: the sentence is not clear. Besides, the authors should specify what they mean by "extra-cellular Ca2+ influx enhance ER-PM contact sites". Which parameter exactly has been shown to be regulated by Ca2+?

      The paper by Idevall-Hagren et al. proposes that following store operated Ca2+ influx, Esyt1 translocates to ER-PM junctions and the number of ER-PM contact sites increases. Please refer to this section of the publication from Idevall-Hagren et al. (2015) (PMID: 26202220):

      “As detected by TIRF microscopy, the depletion of Ca2+ from the lumen of the ER occurring under these conditions led to a progressive accumulation of ER‐anchored STIM1 at the PM, where it activates Orai Ca2+ channels (Fig 4C). Subsequent addition of 1–10 mM Ca2+ to the extracellular medium, either in the absence or in the presence of SERCA inhibitors, caused a massive increase in cytosolic Ca2+ (SOCE) through the activated Ca2+ channels (Figs 4A and EV4D–G). Such increase induced a very robust translocation of E‐Syt1 to the PM (Figs 4B and EV4D–G), which, in the absence of SERCA inhibition (i.e., when a reversible inhibitor of the SERCA pump had been washed out), preceded the dissociation of STIM1 and the inactivation of SOCE (Fig 4D). Inspection of TIRF microscopy images during the manipulation showed that E‐Syt1 does not form new contacts but populates and expands contacts previously occupied by STIM1.”

      • lines 108-110: can you give the reference?

      Reference for the localization of dEsyt to ER-PM MCS: Nath et al. (PMID: 32716137).

      Reference for the localization of TRP and TRPL at the microvillar plasma membrane: numerous primary research papers have shown this; for example, see the reviews PMID: 11557987 and PMID: 22487656.

      • line 189: the authors should summarize the findings in one sentence. "Functional activity" would refer to lipid transfer.

      The text will be modified as per the suggestion.

      Reviewer #3 (Significance (Required)):

      General assessment

      The work relies on a model system that enables the exploration of the role of Esyt in vivo, in a fundamental process highly regulated during development. The data clearly show the effect of the Esyt mutant during development of photoreceptors in Drosophila but, as discussed before, some experimental evidence is missing to completely support the statements.

      Advance

      This work brings new insights in the functional role of lipid transfer during development and explores how the dialog between lipid transfer and Ca2+ flux can influence MCS organization. The interesting points that could be explored in the paper are the effects of a Ca2+ influx on Esyt and EsytCABM localization, and on their lipid transfer activity.

      Audience

      This work would be of interest for the membrane contact sites community and for the Developmental biology community.

      We thank the reviewer for highlighting the significance of our work and the clarity of the data. Additional data to address the points they have raised will be provided.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      In this study, Nath et al., aim at understanding the role of dESyt Ca2+ binding activity on ER-PM MCS in D. melanogaster photoreceptors. Using a combination of transmission electron microscopy and fluorescence microscopy, the authors explore the ability of a dESyt mutant, supposedly unable to bind Ca2+ (based on homology with the human ortholog hESyt2), to recapitulate the function of the wild type version of the protein in establishing ER-PM MCS and modulating their density.

      Findings:

      1) MCS density depends on the activity of TRP and TRPL channels in aging photoreceptors.

      2) Mutation of dESyt Ca2+ binding residues (dEsytCaBM::GFP) leads to a gross mis-localization of the protein, even in the presence of the endogenous protein.

      3) Overexpression of the mutant affects the structure of photoreceptors upon constant illumination.

      4) After 6 days of continuous illumination, RDGB is mis-localized in cells overexpressing dEsytCaBM::GFP.

      5) Overexpressed dEsytCaBM::GFP fails to reduce the distance between ER and PM, meaning it fails to establish ER-PM contact sites, while overexpressed dEsyt::GFP shows reduced MCS distance. Overexpressed dEsyt::GFP also leads to a 10% increase in MCS density compared to WT or cells expressing dEsytCaBM::GFP.

      6) dEsytCaBM::GFP is not able to rescue the light dependent retinal degeneration of dESytKO, although it slightly delays the onset, but is able to rescue RDGB localization at day 6 of constant illumination.

      7) Examining MCS density in dESytKO cells, rescues with dEsyt::GFP and dEsytCaBM::GFP show a slightly higher MCS density than dESytKO at day 1. At day 14, ER-PM MCS were non-existent in dESytKO, unchanged in dEsyt::GFP and reduced by 20% in dEsytCaBM::GFP compared to day 1.

      Specific comments:

      My field of expertise is biochemistry and structural biology (including cellular cryo-electron tomography), but I have no experience with drosophila biology, so I am not able to judge the drosophila work per se.

      While I find the confocal microscopy experiments compelling, I have some reservations regarding the quantification of the TEM images (MCS distances and density) as it was done manually, and therefore, to some extent subjective, especially, when differences between conditions are in the order of 10%. I would have found the quantification more convincing if done systematically, i.e. segmenting the MCS and computationally measuring distances and densities. Otherwise, the authors could expand a little bit on how their methodology is accurate.

      As the reviewer correctly mentions, the quantification would be more convincing if done systematically, i.e. by segmenting the MCS and computationally measuring distances and densities. For MCS measurements, we have experimented with segmentation using ImageJ and Imaris. As mentioned in the answer to Q4 of reviewer 3, we used fixation methods that allow enhanced membrane preservation and better visualization of membranes and MCS (Matsumoto‐Suzuki et al, 1989). However, this method stains all of the ER rather than selectively staining the ER that forms part of the MCS. Because of this, automated segmentation poses significant challenges.

      The primary drawback of the segmentation method is that, once the software is trained to detect distinct cellular compartments, it recognizes all ER membranes, including the SMC as well as ER that is not part of the MCS. As a result, the minimum distance calculated by the software may be between the PM and the SMC or between the PM and generic ER, which does not help the analysis we wish to perform. Similarly, when determining the contact site distance in images with obscure ER and PM boundaries, the software uses the border it can identify, which is typically inside the rhabdomere rather than at its edge. For the contact site density measurements, the software is not able to distinguish between ER and pigment granules close to the rhabdomere, as the grayscale values of these two compartments are comparable.

      Advantages of manual approach:

      To account for potential effects of photoreceptor depth on contact site density and distance, we have analyzed TEM sections obtained directly from the nuclear plane of the photoreceptors to calculate both contact site density and distance. Additionally, by utilizing the freehand line tool, manual analysis enables us to define the length of each little section of the MCS and the base of the rhabdomere. The entire length of the MCS at the base is then calculated by adding each segment together. An illustration of how the manual analysis is done will be included as part of methods in the revision.
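
      The manual measurement described above can be sketched as follows. This is only an illustrative sketch, not the analysis code used in the study: it assumes each freehand trace has been exported as a list of (x, y) coordinates, and it assumes that contact site density is expressed as the summed MCS length divided by the length of the rhabdomere base. The function names and coordinates are hypothetical.

```python
import math


def polyline_length(points):
    """Length of a freehand trace given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def mcs_density(mcs_traces, base_trace):
    """Summed MCS length divided by rhabdomere base length (assumed definition)."""
    base_length = polyline_length(base_trace)
    if base_length == 0:
        raise ValueError("base trace has zero length")
    total_mcs = sum(polyline_length(trace) for trace in mcs_traces)
    return total_mcs / base_length


# Hypothetical coordinates (arbitrary units), for illustration only.
base = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]    # base of the rhabdomere
mcs = [
    [(0.0, 0.0), (3.0, 4.0)],                  # first traced MCS segment
    [(3.0, 4.0), (4.5, 6.0)],                  # second traced MCS segment
]
density = mcs_density(mcs, base)
print(f"MCS density: {density:.2f}")
```

      Summing per-segment lengths in this way mirrors the step of adding each traced segment together; any calibration from pixels to physical units would need to be applied to the coordinates beforehand.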

      Another point is whether the levels of expression of dESyt proteins (dESyt-GFP and dESytCABM-GFP) are comparable. In the overexpression experiments, what are the expression levels of the constructs compared to the endogenous protein? The authors should provide e.g. a Western blot.

      As per the suggestion, western blot analysis will be conducted to compare the expression levels of the constructs utilized to the endogenous protein.

      Concerning the modelling, while I do think that the identification of dESyt Ca2+ binding residues is correct (the sequence alignment is convincing and the sequence identity is very high), and that most likely the structural arrangement will be conserved, homology modelling (using MODELLER with a single reference) leads to models highly similar to the input reference (in particular when the sequence identity is very high). Therefore, rmsd will necessarily be low and the side chain arrangement of conserved residues will be identical. This is unlikely to happen, as protein structures will not be identical despite high sequence conservation. In addition, a crystal structure is a snapshot of a protein conformation that is favorable for crystal formation. It would have been more interesting to use an AlphaFold model and show that the arrangement of the residues is compatible with Ca2+ binding (i.e., the C positions are similar).

      We agree with the reviewer that the data presented to demonstrate the inability of dEsytCaBM to bind Ca2+ are inadequate, as is also pointed out by other reviewers. It would be crucial to prove this using multiple approaches. As suggested, an AlphaFold model will be used to address this point.

      Minor comments:

      Line 102: indicate what PI and PA stand for (I don't think that there is a need for acronyms when they are not reused in the text later on).

      Changes will be incorporated as per the suggestion.

      Line 217-219: "When the same experimental set was examined for MCS density, we discovered that the density enhanced by 10% in Rh1>dEsyt::GFP while being comparable between wild type and dEsytCaBM::GFP flies." The authors don't comment on this finding. Does that imply that increase in the protein levels leads to increase in MCS density?

      Yes. An increase in wild type dEsyt protein levels can establish more contact sites as well as reduce the contact site distance. This further supports the protein's role in functional tethering, as mentioned in line 215 and as proposed by previous studies in other models (PMID: 23791178; PMID: 27065097; PMID: 29222176).

      Lines 298-302: "...implying that dEsytCaBM exerts a dominant negative effect on wild type dEsyt. One possible mechanism for the phenotypes exhibited by dEsytCaBM expression in wild type cells is suggested by the findings of a structural and mass spectrometry investigation of hEsyt2 that reveals that the SMP domain dimerizes to create a 90Å long cylinder to facilitate the transfer of lipids (Schauder et al., 2014)." It is not clear to me what the authors suggest here: because of the dimerisation between wild type and mutant, the mutant has a negative effect or that the SMP dimerization is somehow impaired in dEsytCaBM?

      The SMP domains of Esyt proteins have previously been shown to dimerize (PMID: 23791178, PMID: 24847877). They are known to form either homodimers or heterodimers in mammalian systems, where three genes code for the protein (Esyt1, 2 and 3). In Drosophila, since just one gene codes for the protein, our hypothesis is that the wild type gene product dimerizes with the CaBM mutant protein, thereby rendering the wild type gene product nonfunctional.

      Line 304-305: "...protein expression was restricted to the cell body rather than the presynaptic terminals...". I am not sure that this is correct. The fact that a protein is localizing to a compartment does not mean that its expression is restricted to that compartment (one should measure mRNA levels to conclude this).

      The statement is based on the findings made by Kikuma et al, 2017 (PMID: 28882990) when they tried to understand the role of dEsyt at the NMJs.

      In figure 1B legend, indicate what SMC stands for (the acronym should be indicated in figure 1A legend).

      The text will be added as suggested.

      In the figure 2A legend, Ca binding is described as marked by a black box, but it is marked by red boxes in the figure.

      Changes will be incorporated as per the suggestion.

      Referees cross-commenting

      I agree with the other reviewers that one of the premises of this study is the loss of calcium binding by the dESyt mutant, and that this is not experimentally proven by the authors. However, I find that this will be difficult to prove in vivo. Only measurements of dESyt calcium binding affinity would constitute a direct proof (which requires protein purification). Any in vivo or cellular experiment would be an indirect proof. I believe that based on the high sequence conservation with ESyt proteins, the calcium binding residues have been correctly identified.

      Reviewer #4 (Significance (Required)):

      ESyt proteins are known ER-PM tethers involved in lipid transfer at MCS in a Ca2+ dependent manner. Contrary to yeast and mammals, which have several ESyt orthologs, D. melanogaster has only one ESyt, making it an ideal model to study ESyt function in vivo. It has been previously shown that proper localization of ESyt at MCS depends on Ca2+ concentration: ESyts are anchored to the ER but translocate to the PM in response to elevation of Ca2+ levels in the cytosol (Fernández-Busnadiego et al., 2015). The finding that an ESyt mutant unable to bind calcium is not localized properly is therefore not surprising. The link between RDGB, a protein known to localize at MCS, and ESyt has been shown before, but to my knowledge Nath et al. show for the first time that RDGB localization at MCS is directly dependent on the Ca2+ binding activity of ESyt. In addition, the authors convincingly demonstrate that the Ca2+ binding activity of dESyt is necessary to maintain the structure of aging photoreceptors.

      The main finding of this study is that the Ca2+ binding activity of dESyt regulates the density of ER-PM MCS in photoreceptors. If true (see my comment below), that would be a novel finding, although the authors don't propose any mechanistic explanation for this.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      We haven't made any changes to the manuscript yet. However, we will be able to implement the changes mentioned in the pointwise response to reviewers above.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      We feel that experiments to directly determine the calcium binding of dEsyt, and the loss of this binding in dEsytCaBM, are beyond the scope of this study. This is because of the substantial work required to heterologously express and purify the protein. We have proposed alternative ways to strengthen this conclusion.

    1. Author response:

      The following is the response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Nitta et al. present a manuscript titled "Drosophila model to clarify the pathological significance of OPA1 in autosomal dominant optic atrophy." The novelty of this paper lies in its use of human OPA1 (hOPA1) to try to rescue the phenotype of an OPA1 +/- Drosophila DOA model (dOPA). The authors then use this model to investigate the differences between dominant-negative and haploinsufficient OPA1 variants. The value of this paper lies in the study of DN/HI variants rather than the establishment of the Drosophila model per se, as this has existed for some time and does have some significant disadvantages compared to existing models, particularly in the extra-ocular phenotype which is common with some OPA1 variants but not in humans. I judge the findings of this paper to be valuable with regards to significance and solid with regards to the strength of the evidence.

      Suggestions for improvements:

      (1) Stylistically the results section appears to have significant discussion/conclusion/inferences in section with reference to existing literature. I feel that this information would be better placed in the separate discussion section. E.g. lines 149-154.

      We appreciate the reviewer’s suggestion to relocate the discussion, conclusions, and inferences, particularly those that reference existing literature, to a separate discussion section. For lines 149–154, we placed them in the discussion section (lines 343–347) as follows. “Our established fly model is the first simple organism to allow observation of degeneration of the retinal axons. The mitochondria in the axons showed fragmentation of mitochondria. Former studies have observed mitochondrial fragmentation in S2 cells (McQuibban et al., 2006), muscle tissue (Deng et al., 2008), segmental nerves (Trevisan et al., 2018), and ommatidia (Yarosh et al., 2008) due to the LOF of dOPA1.”

      For lines 178–181, we also placed them in the discussion section (lines 347–351) as follows. “Our study presents compelling evidence that dOPA1 knockdown instigates neuronal degeneration, characterized by a sequential deterioration at the axonal terminals and extending to the cell bodies. This degenerative pattern, commencing from the distal axons and progressing proximally towards the cell soma, aligns with the paradigm of 'dying-back' neuropathy, a phenomenon extensively documented in various neurodegenerative disorders (Wang et al., 2012). ”

      For lines 213–217, 218–220, and 222–223, we also placed them in the discussion section (lines 363– 391) as follows. “To elucidate the pathophysiological implications of mutations in the OPA1 gene, we engineered and expressed several human OPA1 variants, including the 2708-2711del mutation, associated with DOA, and the I382M mutation, located in the GTPase domain and linked to DOA. We also investigated the D438V and R445H mutations in the GTPase domain and correlated with the more severe DOA plus phenotype. The 2708-2711del mutation exhibited limited detectability via HA-tag probing. Still, it was undetectable with a myc tag, likely due to a frameshift event leading to the mutation's characteristic truncated protein product, as delineated in prior studies (Zanna et al., 2008). Contrastingly, the I382M, D438V, and R445H mutations demonstrated expression levels comparable to the WT hOPA1. However, the expression of these mutants in retinal axons did not restore the dOPA1 deficiency to the same extent as the WT hOPA1, as evidenced in Figure 5E. This finding indicates a functional impairment imparted by these mutations, aligning with established understanding (Zanna et al., 2008). Notably, while the 2708-2711del and I382M mutations exhibited limited functional rescue, the D438V and R445H mutations did not show significant rescue activity. This differential rescue efficiency suggests that the former mutations, particularly the I382M, categorized as a hypomorph (Del Dotto et al., 2018), may retain partial functional capacity, indicative of a LOF effect but with residual activity. The I382M missense mutation within the GTPase domain of OPA1 has been described as a mild hypomorph or a disease modifier. Intriguingly, this mutation alone does not induce significant clinical outcomes, as evidenced by multiple studies (Schaaf et al., 2011; Bonneau et al., 2014; Bonifert et al., 2014; Carelli et al., 2015). 
A significant reduction in protein levels has been observed in fibroblasts originating from patients harboring the I382M mutation. However, mitochondrial volume remains unaffected, and the fusion activity of mitochondria is only minimally influenced (Kane et al., 2017; Del Dotto et al., 2018). This observation is consistent with findings reported by de la Barca et al. in Human Molecular Genetics 2020, where a targeted metabolomics approach classified I382M as a mild hypomorph. In our current study, the I382M mutation preserves more OPA1 function compared to DN mutations, as depicted in Figures 5E and F. Considering the results from our Drosophila model and previous research, we hypothesize that the I382M mutation may constitute a mild hypomorphic variant. This might explain its failure to manifest a phenotype on its own, yet its contribution to increased severity when it occurs in compound heterozygosity.”

      (2) I do think further investigation is needed into why a reduction of mitochondria was noticed in the knockdown. There are conflicting reports on this in the literature. My own experience of this is fairly uniform mitochondrial number in WT vs OPA1 variant lines but with an increased level of mitophagy, presumably reflecting a greater turnover. There are a number of ways to quantify mitochondrial load, e.g. mtDNA quantification, or protein quantification for tom20/hsp60 or equivalent. I feel the reliance on ICC here is not enough to draw conclusions. Furthermore, mitophagy markers could be checked at the same time, either at the transcript or protein level. I feel this is important as it helps validate the Drosophila model, as we already have a lot of experimental data about the number and function of mitochondria in OPA+/- human/mammalian cells.

      We thank the reviewer for the insightful comments and suggestions regarding our study on the impact of mitochondrial reduction in a knockdown model. We concur with the reviewer’s observation that our initial results did not definitively demonstrate a decrease in the number of mitochondria in retinal axons. Furthermore, we measured mitochondrial quantity by conducting western blotting using anti-COXII and found no reduction in mitochondrial content with the knockdown of dOPA1 (Figure S4A and B). Consequently, we have revised our manuscript to remove the statement “suggesting a decreased number of mitochondria in retinal axons. However, whether this decrease is due to degradation resulting from a decline in mitochondrial quality or axonal transport failure remains unclear.” Instead, we have refocused our conclusion to reflect our electron microscopy findings, which indicate reduced mitochondrial size and structural abnormalities. The reviewer’s observation of consistent mitochondrial numbers in WT versus mutant variant lines and elevated mitophagy levels prompted us to evaluate mitochondrial turnover as a significant factor in our study. To verify mitophagy markers, we incorporated the mito-QC marker in our experimental design. In our experiments, mito-QC was expressed in the retinal axons of Drosophila to assess mitophagy activity upon dOPA1 knockdown. We observed a notable increase in mCherry positive but GFP negative puncta signals one week after eclosion, indicating the activation of mitophagy (Figure 2D–H). This outcome strongly suggests that dOPA1 knockdown enhances mitophagy in our Drosophila model. The application of mito-QC as a quantitative marker for mitophagy, validated in previous studies, offers a robust approach to analyzing this process. Our findings elucidate the role of dOPA1 in mitochondrial dynamics and its implications for neuronal health. 
These results have been incorporated into Figure 2, with the corresponding text updated as follows (lines 159–167): “Given that an increase in mitophagy activity has been reported in mouse RGCs and nematode ADOA models (Zaninello et al., 2022; Zaninello et al., 2020), the mitoQC marker, an established indicator of mitophagy activity, was expressed in the photoreceptors of Drosophila. The mito-QC reporter consists of a tandem mCherry-GFP tag that localizes to the outer membrane of mitochondria (Lee et al., 2018). This construct allows the measurement of mitophagy by detecting an increase in the red-only mCherry signal when the GFP is degraded after mitochondria are transported to lysosomes. Post dOPA1 knockdown, we observed a significant elevation in mCherry positive and GFP negative puncta signals at one week, demonstrating an activation of mitophagy as a consequence of dOPA1 knockdown (Figure 2D–H).”  
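
      The red-only readout of mito-QC described above can be illustrated with a short sketch. It is not the quantification pipeline used in the study; it assumes that per-punctum mean intensities have already been measured in each channel, and the class name, thresholds, and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Punctum:
    mcherry: float  # mean mCherry intensity of the punctum
    gfp: float      # mean GFP intensity of the punctum


def count_red_only(puncta, mcherry_min, gfp_max):
    """Count puncta that are mCherry positive and GFP negative.

    The thresholds are assumptions; in practice they would be set
    from control samples and background measurements.
    """
    return sum(1 for p in puncta if p.mcherry >= mcherry_min and p.gfp <= gfp_max)


# Hypothetical intensity values (arbitrary units), for illustration only.
puncta = [
    Punctum(mcherry=900, gfp=50),    # red-only: GFP signal lost
    Punctum(mcherry=850, gfp=700),   # both channels present
    Punctum(mcherry=100, gfp=30),    # below the mCherry threshold
    Punctum(mcherry=1200, gfp=80),   # red-only: GFP signal lost
]
red_only = count_red_only(puncta, mcherry_min=500, gfp_max=100)
print(f"red-only puncta: {red_only} of {len(puncta)}")
```

      Comparing this count between control and knockdown samples is one simple way to express the increase in mCherry positive, GFP negative signal.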

      (3) Could the authors comment on the failure of the dOPA1 rescue to return their biomarker, axonal number, to control levels? In Figure 4D, is there significance between the control and rescue? Presumably so, as there is between the mutant and rescue and the difference looks smaller.

      As the reviewer correctly pointed out, there is a significant difference between the control and rescue groups, which we have now included in the figure. Additionally, we have incorporated the following comments in the discussion section (lines 329–342) regarding this significant difference: “In our study, expressing dOPA1 in the retinal axons of dOPA1 mutants resulted in significant rescue, but it did not return to control levels. There are three possible explanations for this result. The first concerns gene expression levels. The Gal4-line used for the rescue experiments may not replicate the expression levels or timing of endogenous dOPA1. Considering that the optimal functionality of dOPA1 may be contingent upon specific gene expression levels, attaining a wild-type-like state necessitates the precise regulation of these expression levels. The second is a non-autonomous issue. Although dOPA1 gene expression was induced in the retinal axons for the rescue experiments, many retinal axons were homozygous mutants, while other cell types were heterozygous for the dOPA1 mutation. If there is a non-autonomous effect of dOPA1 in cells other than retinal axons, it might not be possible to restore the wild-type-like state fully. The third potential issue is that only one isoform of dOPA1 was expressed. In mouse OPA1, to completely restore mitochondrial network shape, an appropriate balance of at least two different isoforms, l-OPA1 and s-OPA1, is required (Del Dotto et al., 2017). This requirement implies that multiple isoforms of dOPA1 are essential for the dynamic activities of mitochondria.”

      (4) The authors have chosen an interesting if complicated missense variant to study, namely the I382M with several studies showing this is insufficient to cause disease in isolation and appears in high frequency on gnomAD but appears to worsen the phenotype when it appears as a compound het. I think this is worth discussing in the context of the results, particularly with regard to the ability for this variant to partially rescue the dOPA1 model as shown in Figure 5.

      As the reviewer pointed out, the I382M mutation is known to act as a disease modifier. However, in our system, as suggested by Figure 5, I382M appears to retain more activity than DN mutations. Considering previous studies, we propose that I382M represents a mild hypomorph. Consequently, while I382M alone may not exhibit a phenotype, it could exacerbate severity in a compound heterozygous state. We have incorporated this perspective in our revised discussion (lines 375-391).

      “Notably, while the 2708-2711del and I382M mutations exhibited limited functional rescue, the D438V and R445H mutations did not show significant rescue activity. This differential rescue efficiency suggests that the former mutations, particularly the I382M, categorized as a hypomorph (Del Dotto et al., 2018), may retain partial functional capacity, indicative of a LOF effect but with residual activity. The I382M missense mutation within the GTPase domain of OPA1 has been described as a mild hypomorph or a disease modifier. Intriguingly, this mutation alone does not induce significant clinical outcomes, as evidenced by multiple studies (Schaaf et al., 2011; Bonneau et al., 2014; Bonifert et al., 2014; Carelli et al., 2015). A significant reduction in protein levels has been observed in fibroblasts originating from patients harboring the I382M mutation. However, mitochondrial volume remains unaffected, and the fusion activity of mitochondria is only minimally influenced (Kane et al., 2017; Del Dotto et al., 2018). This observation is consistent with findings reported by de la Barca et al. in Human Molecular Genetics 2020, where a targeted metabolomics approach classified I382M as a mild hypomorph. In our current study, the I382M mutation preserves more OPA1 function compared to DN mutations, as depicted in Figures 5E and F. Considering the results from our Drosophila model and previous research, we hypothesize that the I382M mutation may constitute a mild hypomorphic variant. This might explain its failure to manifest a phenotype on its own, yet its contribution to increased severity when it occurs in compound heterozygosity.”

      (5) I feel the main limitation of this paper is the reliance on axonal number as a biomarker for OPA1 function and ultimately rescue. I have concerns because a) this is not a well validated biomarker within the context of OPA1 variants b) we have little understanding of how this is affected by over/under expression and c) if it is a threshold effect e.g. once OPA1 levels reach <x% pathology develops but develops normally when opa1 expression is >x%. I think this is particularly relevant when the authors are using this model to make conclusions on dominant negativity/HI with the authors proposing that if expression of a hOPA1 transcript does not increase opa1 expression in a dOPA1 KO then this means that the variant is DN. The authors have used other biomarkers in parts of this manuscript e.g. ROS measurement and mito trafficking but I feel this would benefit from something else particularly in the latter experiments demonstrated in figure 5 and 6.

      The reviewer raised concerns regarding the adequacy of axonal count as a validated biomarker in the context of OPA1 mutants. In response, we corroborated its validity using markers such as MitoSOX, Atg8, and COXII. Experiments employing MitoSOX revealed that the augmented ROS signals resulting from dOPA1 knockdown were mitigated by expressing human OPA1. Conversely, the mutant variants 2708-2711del, D438V, and R445H did not ameliorate these effects, paralleling the phenotype of axonal degeneration observed. These findings are documented in Figure 5F, and we have incorporated the following text into section lines 248–254 of the results:

      “Furthermore, we assessed the potential for rescuing ROS signals. Similar to its effect on axonal degeneration, wild-type hOPA1 effectively mitigated the phenotype, whereas the 2708-2711del, D438V, and R445H mutants did not (Figure 5F). Importantly, the I382M variant also reduced ROS levels comparably to the wild type. These findings demonstrate that both axonal degeneration and the increase in ROS caused by dOPA1 downregulation can be effectively counteracted by hOPA1. Although I382M retains partial functionality, it acts as a relatively weak hypomorph in this experimental setup.”

      Moreover, utilizing mito-QC, we observed elevated mitophagy in our Drosophila model, with these results now included in Figure 2D–H. Given the complexity of the genetics involved and the challenges in establishing lines, autophagy activity was quantified by comparing the ratio of Atg8-1 to Atg8-2 via Western blot analysis. However, no significant alterations were detected across any of the genotypes. Additionally, mitochondrial protein levels derived from COXII confirmed consistent mitochondrial quantities, showing no considerable variance following knockdown. These insights affirm that retinal axon degeneration and mitophagy activation are present in the Drosophila DOA model, although the Western blot analysis revealed no significant changes in autophagy activation. Such findings necessitate caution as this model may not fully replicate the molecular pathology of the corresponding human disease. These Western blot findings are presented in Figure S4, with the following addition made to section lines 255–263 of the results:

      “We also conducted Western blot analyses using anti-COXII and anti-Atg8a antibodies to assess changes in mitochondrial quantity and autophagy activity following the knockdown of dOPA1. Mitochondrial protein levels, indicated by COXII quantification, were evaluated to verify mitochondrial content, and the ratio of Atg8a-1 to Atg8a-2 was used to measure autophagy activation. For these experiments, Tub-Gal4 was employed to systemically knock down dOPA1. Considering the lethality of a whole-body dOPA1 knockdown, Tub-Gal80TS was utilized to repress the knockdown until eclosion by maintaining the flies at 20°C. After eclosion, we increased the temperature to 29°C for two weeks to induce the knockdown or expression of hOPA1 variants. The results revealed no significant differences across the genotypes tested (Figure S4A–D).”

      In assessing the effects of dominant negative mutations, measurements including ROS levels, the ratio of Atg8a-1 to Atg8a-2, and the quantity of COXII protein were conducted, yet no significant differences were observed (Figure S6). This limitation of the fly model is mentioned in the results, noting the observation of the axonal degeneration phenotype but not alterations in ROS signaling, autophagy activity, or mitochondrial quantity as follows (lines 287–290):

      “We investigated the impacts of dominant negative mutations on mitochondrial oxidation levels, mitochondrial quantity, and autophagy activation levels; however, none of these parameters showed statistical significance (Figure S6).”

      The reviewer also inquired about the effects of overexpressing and underexpressing OPA1 on axonal count and whether these effects are subject to a threshold. In response, we expressed both wild-type and variant forms of human OPA1 in Drosophila in vivo and assessed their protein levels using Western blot analysis. The results showed no significant differences in expression levels between the wild-type and variant forms in the OPA1 overexpression experiments, suggesting the absence of a variation threshold effect. These findings have been newly documented as quantitative data in Figure 5C. Furthermore, we have included a statement in the results section for Figure 6A, clarifying that overexpression of hOPA1 exhibited no discernible impact, as detailed on lines 274–276.

      “The results presented in Figure 5C indicate that there are no significant differences in the expression levels among the variants, suggesting that variations in expression levels do not influence the outcomes.”

      (6) Could the authors clarify which exons in Figure 5 are included in their transcript? My understanding is that transcript NM_015560.3 contains exons 4 and 4b but not 5b. According to Song 2007, this transcript invariably produces s-OPA1, as it contains the exon 4b cleavage site. If this is true, this is a critical limitation of this study and, in my opinion, significantly undermines the likelihood of the proposed explanation of the findings presented in Figure 6. The primary functional location of OPA1 is the IMM, and l-OPA1 is the primary, and probably the only, OPA1 isoform that localizes there, as the additional amino acids act as an IMM anchor. Given that this is where the GTPase likely oligomerizes, s-OPA1 expressed alone is unlikely to interact with the native protein. I am not aware of any evidence that s-OPA1 is involved in oligomerization. Therefore, I do not consider this method, and specifically the expression of a hOPA1 transcript that makes only s-OPA1, to be a reliable indicator of dominant negativity/interference with WT protein function. This could be checked by blotting UAS-hOPA1 protein with an OPA1 antibody specific to human OPA1 and not to dOPA1. There are several available on the market, and if the authors see only s-OPA1, it confirms that they are not expressing l-OPA1 with their hOPA1 construct.

      As suggested by the reviewer, we performed a Western blot using a human OPA1 antibody to determine if the expressed hOPA1 was producing the l-OPA1 isoform, as shown in band 2 of Figure 5D. The results confirmed the presence of both l-OPA1 and what appears to be s-OPA1 in bands 2 and 4, respectively. These findings are documented in the updated Figure 5D, with a detailed description provided in the manuscript at lines 224–226. Additionally, NM_015560.3 refers to isoform 1, which includes only exons 4 and 5, excluding exons 4b and 5b. This isoform can express both l-OPA1 and s-OPA1 (refer to Figure 1 in Song et al., J Cell Biol. 2007). We have updated the schematic diagram in the figure to include these exons. The formation of s-OPA1 through cleavage occurs at the OMA1 target site located in exon 5 and the Yme1L target site in exon 5b of OPA1. Isoform 1 of OPA1 is prone to cleavage by OMA1, but a homologous gene for OMA1 does not exist in Drosophila. Although a homologous gene for Yme1L is present in Drosophila, exon 5b is missing in isoform 1 of OPA1, leaving the origin of the smaller band resembling s-OPA1 unclear at this point.

      Reviewer #2 (Public Review):

      The data presented support and extend some previously published data using Drosophila as a model to unravel the cellular and genetic basis of human autosomal dominant optic atrophy (DOA). In humans, mutations in OPA1, a mitochondrial dynamin-like GTPase (amongst others), are the most common cause of DOA. By using Drosophila loss-of-function mutations, RNAi-mediated knockdown, and overexpression, the authors could recapitulate some aspects of the disease phenotype, which could be rescued by the wild-type version of the human gene. Their assays allowed them to distinguish between mutations causing human DOA, affecting the optic system and supposed to be loss-of-function mutations, and those mutations supposed to act as dominant negative, resulting in DOA plus, in which other tissues/organs are affected as well. Based on the lack of information in the Materials and Methods section and in several figure legends, it was not in all cases possible to follow the conclusions of the authors.

      We appreciate the reviewer's constructive feedback and the emphasis on enhancing clarity in our manuscript. We recognize the concerns raised about the lack of detailed information in the Materials and Methods section and several figure legends, which may have obscured our conclusions. In response, we have appended the detailed genotypes of the Drosophila strains used in each experiment to a supplementary table. Additionally, we realized that the description of 'immunohistochemistry and imaging' was too brief, previously referenced simply as “immunohistochemistry was performed as described previously (Sugie et al., 2017).” We have now expanded this section to include comprehensive methodological details. Furthermore, we have revised the figure legends to provide clearer and more thorough descriptions.

      Similarly, how the knowledge gained could help to "inform early treatment decisions in patients with mutations in hOPA1" (line 38) cannot be followed.

      To address the reviewer's comments, we have refined our explanation of the clinical relevance of our findings as follows. We believe this revision succinctly articulates the practical application of our research, directly responding to the reviewer’s concerns about linking the study's outcomes to treatment decisions for patients with hOPA1 mutations. By underscoring the model’s value in differential diagnosis and its influence on initiating treatment strategies, we have clarified this connection explicitly, within the constraints of the abstract’s word limit. The revised sentence now reads: "This fly model aids in distinguishing DOA from DOA plus and guides initial hOPA1 mutation treatment strategies."

      Reviewer #3 (Public Review):

      Nitta et al. establish a fly model of autosomal dominant optic atrophy, of which hundreds of different OPA1 mutations are the cause with wide phenotypic variance. It has long been hypothesized that missense OPA1 mutations affecting the GTPase domain, which are associated with more severe optic atrophy and extra-ophthalmic neurologic conditions such as sensorineural hearing loss (DOA plus), impart their effects through a dominant negative mechanism, but no clear direct evidence for this exists particularly in an animal model. The authors execute a well-designed study to establish their model, demonstrating a clear mitochondrial phenotype with multiple clinical analogs including optic atrophy measured as axonal degeneration. They then show that hOPA1 mitigates optic atrophy with the same efficacy as dOPA1, setting up the utility of their model to test disease-causing hOPA1 variants. Finally, they leverage this model to provide the first direct evidence for a dominant negative mechanism for 2 mutations causing DOA plus by expressing these variants in the background of a full hOPA1 complement.

      Strengths of the paper include well-motivated objectives and hypotheses, overall solid design and execution, and a generally clear and thorough interpretation of their results. The results technically support their primary conclusions with caveats. The first is that both dOPA1 and hOPA1 fail to fully restore optic axonal integrity, yet the authors fail to acknowledge that this only constitutes a partial rescue, nor do they discuss how this fact might influence our interpretation of their subsequent results.

      As the reviewer rightly points out, neither dOPA1 nor hOPA1 achieves a complete recovery. Therefore, we acknowledge that this represents only a partial rescue and have added the following explanations regarding this partial rescue in the results and discussion sections.

      Result:

      Significantly —> partially (lines 207 and 228)

      Discussion (lines 329–342):

      In our study, expressing dOPA1 in the retinal axons of dOPA1 mutants resulted in significant rescue, but it did not return to control levels. There are three possible explanations for this result. The first concerns gene expression levels. The Gal4-line used for the rescue experiments may not replicate the expression levels or timing of endogenous dOPA1. Considering that the optimal functionality of dOPA1 may be contingent upon specific gene expression levels, attaining a wild-type-like state necessitates the precise regulation of these expression levels. The second is a non-autonomous issue. Although dOPA1 gene expression was induced in the retinal axons for the rescue experiments, many retinal axons were homozygous mutants, while other cell types were heterozygous for the dOPA1 mutation. If there is a non-autonomous effect of dOPA1 in cells other than retinal axons, it might not be possible to restore the wild-type-like state fully. The third potential issue is that only one isoform of dOPA1 was expressed. In mouse OPA1, to completely restore mitochondrial network shape, an appropriate balance of at least two different isoforms, l-OPA1 and s-OPA1, is required (Del Dotto et al., 2017). This requirement implies that multiple isoforms of dOPA1 are essential for the dynamic activities of mitochondria.

      The second caveat is that their effect sizes are small. Statistically, the results indeed support a dominant negative effect of DOA plus-associated variants, yet the data show a marginal impact on axonal degeneration for these variants. The authors might have considered exploring the impact of these variants on other mitochondrial outcome measures they established earlier on. They might also consider providing some functional context for this marginal difference in axonal optic nerve degeneration.

      In response to the reviewer’s comment regarding the modest effect sizes observed, we acknowledge that the magnitude of the reported changes is indeed small. To explore the impact of these variants on additional mitochondrial outcomes as suggested, we employed markers such as MitoSOX, Atg8, and COXII for validation. However, we could not detect any significant effects of the DOA plus-associated variants using these methods. We apologize for the redundancy, but to address Reviewer #1's fifth question, we present experimental results showing that while the increased ROS signals observed upon dOPA1 knockdown were rescued by expressing human OPA1, the mutant variants 2708-2711del, D438V, and R445H did not ameliorate this effect. This outcome mirrors the axonal degeneration phenotype and is documented in Figure 5F. The following text has been added to the results section lines 248–254:

      “Furthermore, we assessed the potential for rescuing ROS signals. Similar to its effect on axonal degeneration, wild-type hOPA1 effectively mitigated the phenotype, whereas the 2708-2711del, D438V, and R445H mutants did not (Figure 5F). Importantly, the I382M variant also reduced ROS levels comparably to the wild type. These findings demonstrate that both axonal degeneration and the increase in ROS caused by dOPA1 downregulation can be effectively counteracted by hOPA1. Although I382M retains partial functionality, it acts as a relatively weak hypomorph in this experimental setup.”

      Moreover, utilizing mito-QC, we observed elevated mitophagy in our Drosophila model, with these results now included in Figure 2D–H. Given the complexity of the genetics involved and the challenges in establishing lines, autophagy activity was quantified by measuring the ratio of Atg8a-1 to Atg8a-2 via Western blot analysis. However, no significant alterations were detected across any of the genotypes. Additionally, mitochondrial protein levels derived from COXII confirmed consistent mitochondrial quantities, showing no considerable variance following knockdown. These insights affirm that retinal axon degeneration and mitophagy activation are present in the Drosophila DOA model, although the Western blot analysis revealed no significant changes in autophagy activation. Such findings necessitate caution as this model may not fully replicate the molecular pathology of the corresponding human disease. These Western blot findings are presented in Figure S4, with the following addition made to lines 255–263 of the Results section:

      “We also conducted Western blot analyses using anti-COXII and anti-Atg8a antibodies to assess changes in mitochondrial quantity and autophagy activity following the knockdown of dOPA1. Mitochondrial protein levels, indicated by COXII quantification, were evaluated to verify mitochondrial content, and the ratio of Atg8a-1 to Atg8a-2 was used to measure autophagy activation. For these experiments, Tub-Gal4 was employed to systemically knock down dOPA1. Considering the lethality of a whole-body dOPA1 knockdown, Tub-Gal80TS was utilized to repress the knockdown until eclosion by maintaining the flies at 20°C. After eclosion, we increased the temperature to 29°C for two weeks to induce the knockdown or expression of hOPA1 variants. The results revealed no significant differences across the genotypes tested (Figure S4A–D).”

      In assessing the effects of dominant negative mutations, measurements including ROS levels, the ratio of Atg8a-1 to Atg8a-2, and the quantity of COXII protein were conducted, yet no significant differences were observed (Figure S6). This limitation of the fly model is mentioned in the results, noting the observation of the axonal degeneration phenotype but not alterations in ROS signaling, autophagy activity, or mitochondrial quantity as follows (lines 287–290):

      “We investigated the impacts of dominant negative mutations on mitochondrial oxidation levels, mitochondrial quantity, and autophagy activation levels; however, none of these parameters showed statistical significance (Figure S6).”

      Despite these caveats, the authors provide the first animal model of DOA that also allows for rapid assessment and mechanistic testing of suspected OPA1 variants. The impact of this work in providing the first direct evidence of a dominant negative mechanism is understated considering how important this question is in the development of genetic treatments for DOA. The authors discuss important points regarding the potential utility of this model in clinical science. Comments on the potential use of this model to investigate variants of unknown significance in clinical diagnosis require further discussion of whether there is indeed precedent for this in other genetic conditions (since the model is nevertheless so evolutionarily removed from humans).

      As suggested by the reviewer, we have expanded the discussion in our study to emphasize in greater detail the significance of the fruit fly model and the MeDUsA software we have developed, elaborating on the model's potential applications in clinical science and its precedents in other genetic disorders. Our text is as follows (lines 299–318):

      “We have previously utilized MeDUsA to quantify axonal degeneration, applying this methodology extensively to various neurological disorders. The robust adaptability of this experimental system is demonstrated by its application in exploring a wide spectrum of genetic mutations associated with neurological conditions, highlighting its broad utility in neurogenetic research. We identified a novel de novo variant in Spliceosome Associated Factor 1, Recruiter of U4/U6.U5 Tri-SnRNP (SART1). The patient, born at 37 weeks with a birth weight of 2934g, exhibited significant developmental delays, including an inability to support head movement at 7 months, reliance on tube feeding, unresponsiveness to visual stimuli, and development of infantile spasms with hypsarrhythmia, as evidenced by EEG findings. Profound hearing loss and brain atrophy were confirmed through MRI imaging. To assess the functional impact of this novel human gene variant, we engineered transgenic Drosophila lines expressing both wild type and mutant SART1 under the control of a UAS promoter.

      Our MeDUsA analysis suggested that the variant may confer a gain-of-toxic-function (Nitta et al., 2023). Moreover, we identified heterozygous loss-of-function mutations in DHX9 as potentially causative for a newly characterized neurodevelopmental disorder. We further investigated the pathogenic potential of a novel heterozygous de novo missense mutation in DHX9 in a patient presenting with short stature, intellectual disability, and myocardial compaction. Our findings indicated a loss of function in the G414R and R1052Q variants of DHX9 (Yamada et al., 2023). This experimental framework has been instrumental in elucidating the impact of gene mutations, enhancing our ability to diagnose how novel variants influence gene function.”

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall I enjoyed reading this paper. It is well presented and represents a significant amount of well-executed work. I feel it further characterizes a poorly understood model of OPA1 variants, one that displays significant differences from the human phenotype. However, I feel that the authors' experiments with this model are not enough to validate it as a screening tool for dominant negativity. I have therefore suggested the above experiments as a way both to further validate the mitochondrial dysfunction in this model and to ensure that the expressed transcript is able to affect oligomerization, as this is a prerequisite for the authors' conclusions.

      We assessed the extent to which our model reflects mitochondrial dysfunction using COXII, Atg8, and MitoSOX markers. Unfortunately, neither COXII levels nor the ratio of Atg8a-1 to Atg8a-2 showed significant variations across genotypes that would clarify the impact of dominant negative mutations. Nonetheless, MitoSOX and mito-QC results revealed that mitochondrial ROS levels and mitophagy are increased in Drosophila following intrinsic knockdown of dOPA1. These findings are documented in Figures 2, 5, and S6.

      Regarding oligomer formation, the specifics remain elusive in this study. However, the expression of dOPA1K273A, identified as a dominant negative variant in Drosophila, significantly disrupted retinal axon organization, as detailed in Figure S7. From these observations, we hypothesize that oligomerization of wild-type and dominant negative forms in Drosophila results in axonal degeneration. Conversely, co-expression of Drosophila wild-type with human dominant negative forms does not induce degeneration, suggesting that they likely do not interact.

      Reviewer #2 (Recommendations For The Authors):

      Materials and Methods:

      The authors used GMR-Gal4 to express OPA1-RNAi. I) GMR is expressed in most cells in the developing eye behind the morphogenetic furrow. So the defects observed can be due to knock-down in support cells rather than in photoreceptor cells.

      We have added the following sentences to the Results (lines 194–196): “The GMR-Gal4 driver does not exclusively target Gal4 expression to photoreceptor cells. Consequently, the observed retinal axonal degeneration could potentially be secondary to abnormalities in support cells external to the photoreceptors.”

      OPA1-RNAi: how complete is the knock-down? Have the authors tested more than one RNAi line?

      We conducted experiments with an additional RNAi line, and similarly observed degeneration in the retinal axons (Figure S2 A and B; lines 178–179).

      The loss-of-function allele, induced by a P-element insertion, gives several eye phenotypes when heterozygous (Yarosh et al., 2008). Does RNAi expression lead to the same phenotypes?

      A previous report indicated that the compound eyes of homozygous mutations of dOPA1 displayed a glossy eye phenotype (Yarosh et al., 2008). Upon knocking down dOPA1 using the GMR-Gal4 driver, we also observed a glossy eye-like rough eye phenotype in the compound eyes. These findings have been added to Figure S3 and lines 192–194.

      There is no description of the way the somatic clones were generated. How were mutant cells in clones distinguished from wild-type cells (e.g. in Fig. 4)?

      In the Methods section, we described the procedure for generating clones and their genotypes as follows (lines 502–505): "The dOPA1 clone analysis was performed by inducing flippase expression in the eyes using either ey-Gal4 with UAS-flp or ey3.5-flp, followed by recombination at the chromosomal location FRT42D to generate a mosaic of cells homozygous for dOPA1s3475." Furthermore, we have created a table detailing these genotypes. In these experiments, it was not possible to differentiate between the clone and WT cells. Accordingly, we have noted in the Results section (lines 201–203): "Note that the mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them.”

      Why were flies kept at 29°C? This is rather unusual.

      Increased temperature was demonstrated to induce elevated expression of GAL4 (Kramer and Staveley, Genet. Mol. Res., 2003), which in turn led to an enhanced expression of the target genes. Therefore, experiments involving knockdown assays or Western blotting to detect human OPA1 protein were exclusively conducted at 29°C. However, all other experiments were performed at 25°C, as described in the methods sections: “Flies were maintained at 25°C on standard fly food. For knockdown experiments (Figures 1C–E, 1F–H, 2A–H, 3B–K, 5F, S1, S2 A and B, and S6A), flies were kept at 29°C in darkness.” Furthermore, “We regulated protein expression temporally across the whole body using the Tub-Gal4 and Tub-GAL80TS system. Flies harboring each hOPA1 variant were maintained at a permissive temperature of 20°C, and upon emergence, females were transferred to a restrictive temperature of 29°C for subsequent experiments.”

      Legends:

      It would be helpful to have a description of the genotypes of the flies used in the different experiments. This could also be included as a table.

      We have created a table detailing the genotypes. Additionally, in the legend, we have included a note to consult the supplementary table for genotypes.

      Results:

      Line 141: It is not clear what they mean by "degradation", is it axonal degeneration? And if so, what is the argument for this here?

      In the manuscript, we addressed the potential for mitochondrial degradation; however, recognizing that the expression was ambiguous, the following sentence has been omitted: "Nevertheless, the degradation resulting from mitochondrial fragmentation may have decreased the mitochondrial signal.”

      Fig. 2: Axons of which photoreceptors are shown?

      We have added "a set of the R7/8 retinal axons" to the legend of Figure 2.

      Line 167: The authors write that axonal degeneration is more severe after seven days than after eclosion. Is this effect light-dependent? The same question concerns the disappearance of the rhabdomere (Fig. 3G–J).

      We conducted the experiments in darkness, ensuring that the observed degeneration is not light-dependent. This condition has been added to the methods section to clarify the experimental conditions.

      Line 178/179: Based on what results do they conclude that there is degeneration of the "terminals" of the axons?

      Quantification via MeDUsA has enabled us to count the number of axonal terminals, and a noted decrease has led us to conclude axonal terminal degeneration. We have published two papers on these findings. We have added the following description to the results section to clarify how we defined degeneration (lines 174–176): "We have assessed the extent of their reduction from the total axonal terminal count, thereby determining the degree of axonal terminal degeneration (Richard JNS 2022; Nitta HMG 2023)."

      Line 189: They write: "... we observed dOPA1 mutant axons...". How did they distinguish the mutant from the controls?

      Fig. 5 and Fig. 6: How did they distinguish genetically mutant cells from genetically control cells in the somatic clones?

      Mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them. Accordingly, this point has been added to lines 201–203, “Note that the mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them.” and the text in the results section has been modified as follows:

      Before: “To determine if dOPA1 is responsible for axon neurodegeneration, we observed the dOPA1 mutant axons by expressing full-length versions of dOPA1 in the photoreceptors at one day after eclosion and found that dOPA1 expression significantly rescued the axonal degeneration” —>

      After: “To determine if dOPA1 is responsible for axon neurodegeneration, we quantified the number of axons in the dOPA1 eye clone fly with the expression of dOPA1 at one day after eclosion and found that dOPA1 expression partially rescued the axonal degeneration”

      Line 225/226: It is not clear to me how their approach "can quantitatively measure the degree of LOF".

      To address the reviewer's question and clarify how our approach quantitatively measures the degree of loss of function (LOF), we revised the statement (lines 238–247):

      "Our methodology distinctively facilitates the quantitative evaluation of LOF severity by comparing the rescue capabilities of various mutations. Notably, the 2708-2711del and I382M mutations demonstrated only partial rescue, indicative of a hypomorphic effect with residual activity. In contrast, the D438V and R445H mutations failed to show significant rescue, suggesting a more profound LOF. The correlation between the partial rescue by the 2708-2711del and I382M mutations and their classification as hypomorphic is significant. Moreover, the observed differences in rescue efficacy correspond to the clinical severities associated with these mutations, namely in DOA and DOA plus disorders. Thus, our results substantiate the model’s ability to quantitatively discriminate among mutations based on their impact on protein functionality, providing an insightful measure of LOF magnitude.”
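      As a purely illustrative sketch of what comparing "rescue capabilities" could look like numerically, the snippet below normalizes an axon terminal count in a rescue genotype to the gap between control and knockdown counts. This is an assumption made for illustration: it is not the MeDUsA pipeline, not the statistical analysis used in the manuscript, and the function name `rescue_index` and all counts are hypothetical.

```python
def rescue_index(control: float, knockdown: float, rescued: float) -> float:
    """Fraction of the knockdown-induced loss recovered by a rescue construct.

    0.0 means no rescue (same count as knockdown); 1.0 means full rescue
    (same count as control). All inputs are mean axon terminal counts.
    Hypothetical helper for illustration only.
    """
    loss = control - knockdown
    if loss <= 0:
        raise ValueError("knockdown count must be lower than control count")
    return (rescued - knockdown) / loss


# Hypothetical mean counts, not data from the study.
control_count = 100.0
knockdown_count = 60.0
variant_counts = {"variant_A": 80.0, "variant_B": 60.0}

for name, count in variant_counts.items():
    index = rescue_index(control_count, knockdown_count, count)
    print(f"{name}: rescue index = {index:.2f}")
```

      Under this normalization, a variant with an index near zero would correspond to a more severe loss of function, and an intermediate value to a hypomorph; whether differences between variants are meaningful would still depend on the statistical tests applied to the underlying counts.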

      Discussion:

      Line 251, 252 and line 358: What is "the optic nerve" in the adult Drosophila?

      In humans, the axons of retinal ganglion cells (RGCs) are referred to as the optic nerve, and we posit that the retinal axons in flies are similar to this structure. In the introduction section, where it is described that the visual systems of flies and humans bear resemblance, we have appended the following definition (lines 107–108): “In this study, we defined the retinal axons of Drosophila as analogous to the human optic nerve.”

      Line 344: These bands appear only upon overexpression of the hOPA1 constructs, so this part is very speculative.

      Confirmation was achieved using anti-hOPA1, demonstrating that myc is not nonspecific. These results have been added to Figure 5D. Furthermore, the phrase “The upper band was expected as” has been revised to “From a size perspective, the upper band was inferred to represent the full-length hOPA1 including the mitochondria import sequence (MIS).” (lines 464–465)

      I was missing a discussion about the increase of ROS upon loss/reduction of dOPA1 observed by others and described here. Is there an increase of ROS upon expression of any of the constructs used?

      We demonstrated that not only axonal degeneration but also ROS can be suppressed by expressing human OPA1 in the genetic background of dOPA1 knockdown. Additionally, rescue was not possible with any variants except for I382M. Furthermore, we assessed whether there were changes in ROS in the evaluation of dominant negatives, but no significant differences were observed in this experimental system. These findings have been added to the discussion section as follows (lines 318–328). “Our research established that dOPA1 knockdown precipitates axonal degeneration and elevates ROS signals in retinal axons. Expression of human OPA1 within this context effectively mitigated both phenomena; it partially reversed axonal degeneration and nearly completely normalized ROS levels. These results imply that factors other than increased ROS may drive the axonal degeneration observed post-knockdown. Furthermore, while differences between the impacts of DN mutations and loss-of-function mutations were evident in axonal degeneration, they were less apparent when using ROS as a biomarker. The extensive use of transgenes in our experiments might have mitigated the knockdown effects. In a systemic dOPA1 knockdown, assessments of mitochondrial quantity and autophagy activity revealed no significant changes, suggesting that the cellular consequences of reduced OPA1 expression might vary across different cell types.”

      Reviewer #3 (Recommendations For The Authors):

      Consider being more explicit regarding literature that has or has failed to test a direct dominant negative effect by expressing a variant in question in the background of a full OPA1 complement. My understanding is that this is the first direct evidence of this widely held hypothesis. This lends to the main claim promoting the utility of fly as a model in general. The authors might also outline this in the introduction as a knowledge gap they fill through this study.

      In the introduction, we have incorporated a passage that highlights precedents capable of distinguishing between LOF and DN effects, and we note the absence of models capable of dissecting these distinctions within an in vivo organism. This study aims to address this gap, proposing a model that elucidates the differential impacts of LOF and DN within the context of a living model organism, thereby contributing to a deeper understanding of their roles in disease pathology. We added the following sentences in the introduction (lines 71–80).

      “In the quest to differentiate between LOF and DN effects within the context of genetic mutations, precedents exist in simpler systems such as yeast and human fibroblasts. These models have provided valuable insights into the conserved functions of OPA1 across species, as evidenced by studies in yeast models (Del Dotto et al., 2018) and fibroblasts derived from patients harboring OPA1 mutations (Kane et al., 2017). However, the ability to distinguish between LOF and DN effects in an in vivo model organism, particularly at the structural level of retinal axon degeneration, has remained elusive. This gap underscores the necessity for a more complex model that not only facilitates molecular analysis but also enables the examination of structural changes in axons and mitochondria, akin to those observed in the actual disease state.”

      The authors should clarify the language used in the abstract and introduction on the effect of hOPA1 DOA and DOA plus on the dOPA1- phenotype. It is currently written as "none of the previously reported mutations known to cause DOA or DOA plus were rescued, their functions seem to be impaired," but presumably the authors mean that these variants failed to rescue the dOPA1-deficient phenotype.

      We thank the reviewer for the constructive feedback. We acknowledge the need for clarity in our description of the effects of hOPA1 DOA and DOA plus mutations on the dOPA1- phenotype in both the abstract and the introduction. The current phrasing, "none of the previously reported mutations known to cause DOA or DOA plus were rescued, their functions seem to be impaired," may indeed be confusing. To address your concern, we have revised this statement to reflect our findings more accurately: "Previously reported mutations failed to rescue the dOPA1 deficiency phenotype." In the Abstract, we changed the text as follows: “we could not rescue any previously reported mutations known to cause either DOA or DOA plus.” → “mutations previously identified did not ameliorate the dOPA1 deficiency phenotype.”

      DOA plus is associated with a multiple sclerosis-like illness; as written it suggests that the pathogenesis of sporadic multiple sclerosis and that associated with DOA plus share an underlying pathogenic mechanism. Please use the qualifier "-like illness."

      We have added the term “multiple sclerosis-like illness” wherever “multiple sclerosis” is mentioned.

    1. Studies of the motions of the most remote globular clusters and the small galaxies that orbit our own show that the total mass of the Galaxy is at least 2 × 10^12 M_Sun, which is about twenty times greater than the amount of luminous matter. Moreover, the dark matter (as astronomers have come to call the invisible material) extends to a distance of at least 200,000 light-years from the center of the Galaxy. Observations indicate that this dark matter halo is almost but not quite spherical. The obvious question is: what is the dark matter made of? Let’s look at a list of “suspects” taken from our study of astronomy so far. Since this matter is invisible, it clearly cannot be in the form of ordinary stars. And it cannot be gas in any form (remember that there has to be a lot of it). If it were neutral hydrogen gas, its 21-cm wavelength spectral-line emission would have been detected as radio waves. If it were ionized hydrogen, it should be hot enough to emit visible radiation. If a lot of hydrogen atoms out there had combined into hydrogen molecules, these should produce dark features in the ultraviolet spectra of objects lying beyond the Galaxy, but such features have not been seen. Nor can the dark matter consist of interstellar dust, since in the required quantities, the dust would significantly obscure the light from distant galaxies.

      Dark matter is an interesting concept in astrophysics precisely because we have absolutely no idea what it is. Because of how it affects gravity, we assume it's some form of matter, but in truth we are unsure. Dark matter is more grounded in our current model of the universe than dark energy is, due to its effect on gravity, but it still follows the same habit of physicists encountering an unknown and labeling it "dark something" to account for the discrepancy in their model. It goes to show that there's still a lot more to learn about the universe, and that our current model may not be as correct as we think it is.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been the focus of cognitive neuroscience research, and finding some objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al, 2013 found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with MT+ and several frontal cortex FCs.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      (1) In the intro, it seems to me that the multiple-demand (MD) regions are the key in this study. However, I didn't see any results associated with the MD regions. Did I miss something?

      We thank the reviewer for pointing this out. After careful consideration, we agree with your point of view. According to the results of Melnick et al. (2013), motion surround suppression (SI) and the time thresholds for small and large gratings, which represent hMT+ functionality, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. This suggests that hMT+ does have the potential to be a core of the MD system. However, because our results address only how “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated through the frontal cortex”, they are not yet sufficient to prove that hMT+ is a core node of the MD system. We have therefore adjusted the explanatory logic of the article. Briefly, we emphasize the de-redundancy function of hMT+ in visuo-spatial intelligence and the resulting improvement in information processing efficiency, while de-emphasizing the significance of hMT+ in the MD system.

      (2) How was the sample size determined? Is it sufficient?

      We thank the reviewer for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium effect between SI and the Perceptual Reasoning sub-ability (r=0.47). We used this r value as the correlation coefficient (ρ H1), setting the power at the commonly used threshold of 0.8 and the alpha error probability at 0.05. The required sample size was calculated to be 26. This ensures that our study has reasonable power to yield valid statistical results. Furthermore, compared with earlier within-subject studies such as Schallmo et al. (2018), which used 22 datasets to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a sufficiently large dataset.
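
      The sample-size reasoning above can be checked approximately with a minimal sketch. It uses the Fisher z approximation for a Pearson correlation rather than G*Power itself, so it should be read as a rough cross-check, not as the authors' calculation. The function name `n_for_correlation` is hypothetical, and whether the original calculation was one- or two-tailed is an assumption, since the response does not say.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, two_tailed=False):
    """Approximate N needed to detect a Pearson correlation r.

    Fisher z approximation: N = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    This is NOT the exact bivariate-normal method; results can differ
    from dedicated power software by about one participant.
    """
    std_normal = NormalDist()
    tail = alpha / 2 if two_tailed else alpha  # assumption: tail choice
    z_alpha = std_normal.inv_cdf(1 - tail)
    z_beta = std_normal.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


# Effect size taken from the response (r = 0.47, alpha = 0.05, power = 0.8).
print("one-tailed:", n_for_correlation(0.47))
print("two-tailed:", n_for_correlation(0.47, two_tailed=True))
```

      Under this approximation the one-tailed requirement lands within about one participant of the reported 26, whereas a two-tailed test would call for a somewhat larger sample. The tail assumption therefore matters when judging whether the sample is sufficient.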

      (3) In Schallmo eLife 2018, there was no correlation between GABA concentration and SI. How can we justify the different results here?

      We thank the reviewer for pointing this out. There are several differences between the two studies:

      a. While the earlier study by Schallmo et al. (2018) employed 3T MRS, we utilize 7T MRS, enhancing our ability to detect and measure GABA with greater accuracy.

      b. Schallmo et al. (2018) used the bilateral hMT+ as the MRS measurement region, whereas we used the left hMT+. The reasons for focusing on the left hMT+ are described in our response to Reviewer #1, point (6). Briefly, the use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      c. The MRS sequence in Schallmo et al. (2018) used a 3 cm isotropic voxel, whereas we used a 2 cm isotropic voxel. This helps us locate hMT+ more precisely and exclude more white matter signal.

      (4) Basically this study contains the data of SI, BDT, GABA in MT+ and V1, Glu in MT+ and V1-all 6 measurements. There should be 6x5/2 = 15 pairwise correlations. However, not all of these results are included in Figure 1 and supplementary 1-3. I understand that it is not necessary to include all figures. But I suggest reporting all values in one Table.

      We thank the reviewer for the good suggestion. We have created a correlation matrix reporting all values in Figure Supplementary 9.
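
      The reviewer's count of 6 × 5 / 2 = 15 pairwise correlations can be illustrated with a minimal sketch. The measure labels, the helper names `pearson` and `pairwise_correlations`, and the toy values are all hypothetical placeholders, not the study's data or analysis code.

```python
from itertools import combinations
from math import sqrt

# Hypothetical labels for the six measurements named by the reviewer.
MEASURES = ["SI", "BDT", "GABA_hMT", "GABA_V1", "Glu_hMT", "Glu_V1"]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(data):
    """Return {(name_a, name_b): r} for every unordered pair of measures."""
    return {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}


# Made-up placeholder values, only to make the sketch runnable.
toy = {name: [float((i * (k + 2)) % 11) for i in range(10)]
       for k, name in enumerate(MEASURES)}

table = pairwise_correlations(toy)
print(len(table))  # number of unordered pairs among six measures
for (a, b), r in table.items():
    print(f"{a:>9} vs {b:<9} r = {r:+.2f}")
```

      With real data, participants missing a measurement (for example, an excluded MRS scan in one region) would need to be dropped pair by pair before computing each coefficient; the sketch assumes complete data.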

      (5) In Melnick (2013), the IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used the visual spatial domain of gF. I wonder why only the visuo-spatial subtest was used not the full WAIS-III?

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that the hMT+ region of the brain is a sensory cortex involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.

      (6) In the functional connectivity part, there is no explanation as to why only the left MT+ was set to the seed region. What is the problem with the right MT+?

      We thank the reviewer for pointing this out. The main reason is that our MRS ROI is the left hMT+, and we wanted to keep the ROI consistent across the different analyses. The use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      (7) In Melnick (2013), the authors also reported the correlation between IQ and absolute duration thresholds of small and large stimuli. Please include these analyses as well.

      We thank the reviewer for the good advice. Including these results does help researchers compare our findings with those of Melnick et al. We have added these figures in the revised version (Figure 3f, g).

      Reviewer #2 (Public Review):

      Summary:

      Recent studies have identified specific regions within the occipito-temporal cortex as part of a broader fronto-parietal, domain-general, or "multiple-demand" (MD) network that mediates fluid intelligence (gF). According to the abstract, the authors aim to explore the mechanistic roles of these occipito-temporal regions by examining GABA/glutamate concentrations. However, the introduction presents a different rationale: investigating whether area MT+ specifically, could be a core component of the MD network.

      Strengths:

      The authors provide evidence that GABA concentrations in MT+ and its functional connectivity with frontal areas significantly correlate with visuo-spatial intelligence performance. Additionally, serial mediation analysis suggests that inhibitory mechanisms in MT+ contribute to individual differences in a specific subtest of the Wechsler Adult Intelligence Scale, which assesses visuo-spatial aspects of gF.

      Weaknesses:

      (1) While the findings are compelling and the analyses robust, the study's rationale and interpretations need strengthening. For instance, Assem et al. (2020) have previously defined the core and extended MD networks, identifying the occipito-temporal regions as TE1m and TE1p, which are located more rostrally than MT+. Area MT+ might overlap with brain regions identified previously in Fedorenko et al., 2013; however, the authors attribute these activations to attentional enhancement of visual representations in the more difficult conditions of their tasks. For the aforementioned reasons, it is unclear why the authors chose MT+ as their focus. A stronger rationale for this selection is necessary, along with an explanation of how it fits with the core/extended MD networks.

      We really appreciate the reviewer’s opinions. Our reasons for focusing on hMT+ are as follows. According to the results of Melnick et al. (2013), motion surround suppression (SI) and the time thresholds for small and large gratings, which represent hMT+ functionality, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with high correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. In addition, in Fedorenko et al. (2013), the averaged MD activity region appears to overlap with hMT+. Based on these findings, we assumed that hMT+ does have the potential to be a core of the MD system.

      (2) Moreover, although the study links MT+ inhibitory mechanisms to a visuo-spatial component of gF, this evidence alone may not suffice to position MT+ as a new core of the MD network. The MD network's definition typically encompasses a range of cognitive domains, including working memory, mathematics, language, and relational reasoning. Therefore, the claim that MT+ represents a new core of MD needs to be supported by more comprehensive evidence.

      We thank the reviewer for pointing this out. After careful consideration, we agree with your point of view. Because our results address only visuo-spatial intelligence, they are not yet sufficient to prove that hMT+ is a core node of the MD system. We will adjust the explanatory logic of the article, that is, emphasize the de-redundancy function of hMT+ in visuo-spatial intelligence and the resulting improvement in information processing efficiency, while de-emphasizing the significance of hMT+ in the MD system.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript aims to understand the role of GABA-ergic inhibition in the human MT+ region in predicting visuo-spatial intelligence through a combination of behavioral measures, fMRI (for functional connectivity measurement), and MRS (for GABA/glutamate concentration measurement). While this is a commendable goal, it becomes apparent that the authors lack fundamental understanding of vision, intelligence, or the relevant literature. As a result, the execution of the research is less coherent, dampening the enthusiasm of the review.

      Strengths:

      (1) Comprehensive Approach: The study adopts a multi-level approach, i.e., neurochemical analysis of GABA levels, functional connectivity, and behavioral measures to provide a holistic understanding of the relationship between GABA-ergic inhibition and visuo-spatial intelligence.

      (2) Sophisticated Techniques: The use of ultra-high field magnetic resonance spectroscopy (MRS) technology for measuring GABA and glutamate concentrations in the MT+ region is a recent development.

      Weaknesses:

      Study Design and Hypothesis

      (1) The central hypothesis of the manuscript posits that "3D visuo-spatial intelligence (the performance of BDT) might be predicted by the inhibitory and/or excitation mechanisms in MT+ and the integrative functions connecting MT+ with the frontal cortex." However, several issues arise:

      (1.1) The Suppression Index depicted in Figure 1a, labeled as the "behavior circle," appears irrelevant to the central hypothesis.

      We thank the reviewer for pointing this out. In our study, the inhibitory mechanisms in hMT+ are conceptualized through two models: the neurotransmitter model and the behavioral model. The Suppression Index is essential for elucidating the local inhibitory mechanisms within the behavioral model. However, we acknowledge that our initial presentation in the introduction may not have clearly articulated our hypothesis, potentially leading to misunderstandings. We have revised the introduction to better clarify these connections and ensure the relevance of the Suppression Index is comprehensively understood.

      (1.2) The construct of 3D visuo-spatial intelligence, operationalized as the performance in the Block Design task, is inconsistently treated as another behavioral task throughout the manuscript, leading to confusion.

      We thank the reviewer for pointing this out. We acknowledge that our manuscript may have presented this construct inconsistently across sections, causing confusion. To address this, we have ensured a consistent description of 3D visuo-spatial intelligence in both the introduction and the discussion. However, we have kept 'Block Design task score' in the results section so that readers can see which subtest we used.

      (1.3) The schematics in Figure 1a and Figure 6 appear too high-level to be falsifiable. It is suggested that the authors formulate specific and testable hypotheses and preregister them before data collection.

      We thank the reviewer for pointing this out. We have revised Figure 1a to make it less abstract and more logical. In Figure 6, the schematic represents our theoretical framework of how hMT+ contributes to 3D visuo-spatial intelligence. We believe the elements within this framework are grounded in related theories and supported by the evidence in our results and discussion sections, making them specific and testable.

      (2) Central to the hypothesis and design of the manuscript is a misinterpretation of a prior study by Melnick et al. (2013). While the original study identified a strong correlation between WAIS (IQ) and the Suppression Index (SI), the current manuscript erroneously asserts a specific relationship between the block design test (from WAIS) and SI. It should be noted that in the original paper, WAIS comprises Similarities, Vocabulary, Block design, and Matrix reasoning tests in Study 1, while the complete WAIS is used in Study 2. Did the authors conduct other WAIS subtests other than the block design task?

      Thank you for pointing this out. Reviewer #1 also asked this question, so we reproduce our answer here: “The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that the hMT+ region of the brain is a sensory cortex involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.”

      (3) Additionally, there are numerous misleading references and unsubstantiated claims throughout the manuscript. As an example of misleading reference, "the human MT ... a key region in the multiple representations of sensory flows (including optic, tactile, and auditory flows) (Bedny et al., 2010; Ricciardi et al., 2007); this ideally suits it to be a new MD core." The two references in this sentence are claims about plasticity in the congenitally blind with sensory deprivation from birth, which is not really relevant to the proposal that hMT+ is a new MD core in healthy volunteers.

      Thank you for pointing this out. We have carefully read the corresponding references, considered the corresponding theories, and agree with these comments. Because our results address only how “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not yet sufficient to prove that hMT+ is a core node of the MD system. We will adjust the explanatory logic of the article, that is, emphasize the de-redundancy function of hMT+ in visuo-spatial intelligence and the resulting improvement in information processing efficiency, while de-emphasizing the significance of hMT+ in the MD system. In addition, regarding the potential central role of hMT+ in the MD system, we agree with your view that research on hMT+ as a multisensory integration hub mainly focuses on developmental processes. Meanwhile, in adults, the MST region of hMT+ is considered a multisensory integration area for visual and vestibular inputs, which potentially supports a role for hMT+ in multitasking multisensory systems (Gu et al., J. Neurosci, 26(1), 73–85, 2006; Fetsch et al., Nat. Neurosci, 15, 146–154, 2012). Further research could explore how other intelligence sub-abilities, such as working memory and language comprehension, are facilitated by hMT+'s features.

      Another example of unsubstantiated claim: the rationale for selecting V1 as the control region is based on the assertion that "it mediates the 2D rather than 3D visual domain (Born & Bradley, 2005)". That's not the point made in the Born & Bradley (2005) paper on MT. It's crucial to note that V1 is where the initial binocular convergence occurs in cortex, i.e., inputs from both the right and left eyes to generate a perception of depth.

      Thank you for pointing this out. We acknowledge the inappropriate citation of "Born & Bradley, 2005," which focuses solely on the structure and function of the visual area MT. However, we believe that choosing hMT+ as the domain for 3D visual analysis and V1 as the control region is justified. Cumming and DeAngelis (Annu. Rev. Neurosci., 24:203–238, 2001) state that binocular disparity provides the visual system with information about the three-dimensional layout of the environment, and that the link between perception and neuronal activity is stronger in the extrastriate cortex (especially MT) than in the primary visual cortex. This supports our choice and emphasizes the relevance of hMT+ in our study. We have corrected the reference in the revised version.

      Results & Discussion

      (1) The missing correlation between SI and BDT is crucial to the rest of the analysis. The authors should discuss whether they replicated the pattern of results from Melnick et al. (2013) despite using only one WAIS subtest.

      We thank the reviewer for the suggestion. We have placed this result in the main text (Figure 3e).

      (2) ROIs: can the authors clarify if the results are based on bilateral MT+/V1 or just those in the left hemisphere? Can the authors plot the MRS scan area in V1? I would be surprised if it's precise to V1 and doesn't spread to V2/3 (which is fine to report as early visual cortex).

      We thank the reviewer for the suggestion. We have drawn the V1 ROI MRS scanning area (Figure supplement 1). Using the template, we checked the coverage of V1, V2, and V3. Although the MRS overlap regions extend to V2 (3%) and V3 (32%), the major coverage of the MRS scanning area is in V1, with 65% overlap across subjects.

      (3) Did the authors examine V1 FC with either the frontal regions and/or whole brain, as a control analysis? If not, can the author justify why V1 serves as the control region only in the MRS but not in FC (Figure 4) or the mediation analysis (Figure 5)? That seems a little odd given that control analyses are needed to establish the specificity of the claim to MT+

      We thank the reviewer for the suggestion. We have analyzed the relationship between V1 FC and behavior as a control analysis (Figure supplement 7). Only positive correlations in the frontal area were detected, suggesting that in the 3D visuo-spatial intelligence task, V1 plays a role in feedforward information processing. In contrast, hMT+, which showed specific negative correlations in the frontal area, is involved in the inhibition mechanism. These results further emphasize the de-redundancy function of hMT+ in 3D visuo-spatial intelligence.

      Regarding the mediation analysis, because GABA/Glu concentration in V1 is not correlated with BDT score, the prerequisite for a mediation analysis is not met.

      (4) It is not clear how to interpret the similarity or difference between panels a and b in Figure 4.

      We thank the reviewer for pointing this out. We have further interpreted the difference between panels a and b in the revised version. Panel a shows the hMT+ functional connectivity that correlates with BDT score, which clearly involves the frontal cortex, whereas panel b shows the hMT+ functional connectivity that correlates with SI, which involves relatively fewer regions. The overlapping region is the one we are interested in, as it explains how local inhibitory mechanisms work in 3D visuo-spatial intelligence. In addition, we have revised Figure 4 to point out the overlapping region.

      (5) SI is not relevant to the authors' a priori hypothesis, but is included in several mediation analyses. Can the authors do model comparisons between the ones in Figure 5c, d, and Figure S6? In other words, is SI necessary in the mediation model? There seem to be discrepancies between the necessity of SI in Figures 5c/S6 vs. Figure 5d.

      We thank the reviewer for highlighting this point. The relationship between the Suppression Index (SI) and our a priori hypotheses is elaborated in the response to reviewer 3, section (1). SI plays a crucial role in explicating how local inhibitory mechanisms, on the psychological level, function within the context of the 3D visuo-spatial task. Additionally, Figure 5c illustrates the interaction between the frontal cortex and hMT+, showing how the effects from the frontal cortex (BA46) on the Block Design Task are fully mediated by SI. This further underscores the significance of SI in our model.

      (6) The sudden appearance of "efficient information" in Figure 6, referring to the neural efficiency hypothesis, raises concerns. Efficient visual information processing occurs throughout the visual cortex, starting from V1. Thus, it appears somewhat selective to apply the neural efficiency hypothesis to MT+ in this context.

      We thank the reviewer for highlighting this point. There is no doubt that V1 is involved in efficient visual information processing. However, in our results, V1 GABA has no significant correlation with BDT score, suggesting that efficient processing in V1 might not sufficiently account for individual differences in 3D visuo-spatial intelligence. Additionally, we will clarify our use of the neural efficiency hypothesis by incorporating it into the introduction of our paper to better frame our argument.

      Transparency Issues:

      (1) Don't think it's acceptable to make the claim that "All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary information". It is the results or visualizations of data analysis, rather than the raw data themselves, that are presented in the paper/supp info.

      We thank the reviewer for pointing this out. We realize that this statement could cause confusion, and we have deleted it.

      (2) No GitHub link has been provided in the manuscript to access the source data, which limits the reproducibility and transparency of the study.

      We thank the reviewer for pointing this out. We have attached the GitHub link in the revised version.

      Minor:

      "Locates" should be replaced with "located" throughout the paper. For example: "To investigate this issue, this study selects the human MT complex (hMT+), a region located at the occipito-temporal border, which represents multiple sensory flows, as the target brain area."

      We thank the reviewer for pointing this out. We have revised it.

      Use "hMT+" instead of "MT+" to be consistent with the term in the literature.

      We thank the reviewer for pointing this out. We agree and now use hMT+, consistent with the literature.

      "Green circle" in Figure 1 should be corrected to match its actual color.

      We thank the reviewer for pointing this out. We have revised it.

      The abbreviation for the Wechsler Adult Intelligence Scale should be "WAIS," not "WASI."

      We thank the reviewer for pointing this out. We have revised it.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The figures and tables should be substantially improved.

      We thank the reviewer for pointing this out. We have improved some of the figures’ quality.

      (2) Please explain the sample size, and the difference between Schallmo eLife 2018, and Melnick, 2013.

      We thank the reviewer for pointing this out. These questions are answered in the public review; we reproduce those answers below.

      (2.1) How was the sample size determined? Is it sufficient?

      We thank the reviewer for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium effect between SI and the Perceptual Reasoning sub-ability (r=0.47). We used this r value as the correlation coefficient (ρ H1), setting the power at the commonly used threshold of 0.8 and the alpha error probability at 0.05. The required sample size was calculated to be 26. This ensures that our study has adequate power to yield valid statistical results. Furthermore, compared with earlier within-subject studies such as Schallmo et al. (2018), which used 22 subjects to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a sufficiently large dataset.

      (2.2) In Schallmo eLife 2018, there was no correlation between GABA concentration and SI. How can we justify the different results here?

      Thank you to the reviewer for pointing this out. There are several differences between the two studies, ours and theirs:

      a. While the earlier study by Schallmo et al. (2018) employed 3T MRS, we utilize 7T MRS, enhancing our ability to detect and measure GABA with greater accuracy.

      b. Schallmo et al. (2018) used the bilateral hMT+ as the MRS measurement region, whereas we used the left hMT+. The reasons for focusing on the left hMT+ are described in our response to Reviewer #1, point (6). Briefly, the use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      c. The MRS sequence in Schallmo et al. (2018) used a 3 cm isotropic voxel, whereas we used a 2 cm isotropic voxel. This helps us locate hMT+ more precisely and exclude more white matter signal.

      (3) Table 1 and Table Supplementary 1-3 contain many correlation results. But what are the main points of these values? Which values do the authors want to highlight? Why are only p-values shown with significance symbols in Table Supplementary 2?

      (3.1) what are the main points of these values?

      We thank the reviewer for pointing this out. These correlations represent the relationship between the behavioral tasks (SI/BDT) and resting-state functional connectivity. They indicate that the left hMT+ is involved in the efficient information integration network during the BDT task. In addition, surround suppression in the left hMT+ is associated with several hMT+–frontal connections. Furthermore, the overlapping regions between the two tasks indicate a shared underlying mechanism.

      (3.2) Which values do the authors want to highlight?

      Table 1 and Table Supplementary 1-3 present the preliminary analysis results for Table 2 and Table Supplementary 4-6, so we report all values there. In Table 2 and Table Supplementary 4-6, by contrast, we highlight in bold the correlations that remained significant after correction for multiple comparisons.

      (3.3) Why are only p-values shown with significance symbols in Table Supplementary 2?

      Thank you for pointing this out; it was a mistake. We have revised the table and deleted the significance symbols.

      (4) Line 27, it is unclear to me what is "the canonical theory".

      We thank the reviewer for pointing this out. We have revised “the canonical theory” to “the prevailing opinion”.

      (5) Throughout the paper, the authors use "MT+", I would suggest using "hMT+" to indicate the human MT complex, and to be consistent with the human fMRI literature.

      We thank the reviewer for pointing this out. We have revised the text to use "hMT+" throughout, consistent with the human fMRI literature.

      (6) At the beginning of the results section, I suggest including the total number of subjects. It is confusing what "31/36 in MT+, and 28/36 in V1" means.

      We thank the reviewer for pointing this out. We have included the total number of subjects at the beginning of the Results section.

      (7) Line 138, "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area". This sentence is strange because it is a well-established finding in numerous human fMRI papers. I think the authors should be more specific about what this finding implies.

      We thank the reviewer for pointing this out. We have deleted the inappropriate sentence "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area".

      (8) There are no unit labels for all x- and y-axes in Figure 1. I only see the unit for Conc is mmol per kg wet weight.

      We thank the reviewer for pointing this out. Figure 1 is a schematic and workflow chart, so labels for the x- and y-axes are not needed. We believe this confusion might pertain to Figure 3. In Figures 3a and 3b, the MRS spectrum does not have a standard y-axis unit because it varies with the individual physical conditions of the scanner; it is widely accepted that no y-axis unit is used. The x-axis unit is ppm, which indicates the chemical shift of the different metabolites. In Figure 3c, the BDT represents IQ scores, which do not have a standard unit. Similarly, in Figures 3d and 3e, the Suppression Index does not have a standard unit.

      (9) Although the correlations are not significant in Figure Supplement 2&3, please also include the correlation line, 95% confidence interval, and report the r values and p values (i.e., similar format as in Figure 1C).

      We thank the reviewer for pointing this out. We have revised them.

      (10) There is no need to separate different correlation figures into Figure Supplementary 1-4. They can be combined into the same figure.

      We thank the reviewer for the suggestion. However, each correlation figure in the supplementary figures has its own specific topic and conclusion. The correlation figures in Supplementary Figure 1 indicate that GABA in V1 does not show any correlation with BDT and SI, illustrating that inhibition in V1 is unrelated to both 3D visuo-spatial intelligence and motion suppression processing. The correlations in Supplementary Figure 2 indicate that the excitation mechanism, represented by Glutamate concentration, does not contribute to 3D visuo-spatial intelligence in either hMT+ or V1. Supplementary Figure 3 validates our MRS measurements. Supplementary Figure 4 addresses potential concerns regarding the impact of outliers on correlation significance. Even after excluding two “outliers” from Figures 3d and 3e, the correlation results remain stable.

      (11) Line 213, as far as I know, the study (Melnick et al., 2013) is a psychophysical study and did not provide evidence that the spatial suppression effect is associated with MT+.

      We thank the reviewer for pointing this out. It was a mistake to use this reference, and we have revised it accordingly.

      (12) At the beginning of the results, I suggest providing more details about the motion discrimination tasks and the measurement of the BDT.

      We thank the reviewer for pointing this out. We have included a brief description of the tasks at the beginning of the Results section.

      (13) Please include the absolute duration thresholds of the small and large sizes of all subjects in Figure 1.

      We thank the reviewer for the suggestion. We have included these results in Figure 3.

      (14) Figure 5 is too small. The items in plots a and b are barely visible.

      We thank the reviewer for pointing this out. We have increased the size and resolution of Figure 5.

      Reviewer #2 (Recommendations For The Authors):

      Recommendations for improving the writing and presentation.

      I highly recommend editing the manuscript for readability and the use of the English language. I had significant difficulties following the rationale of the research due to issues with the way language was used.

      We thank the reviewer for pointing this out. We apologize for any shortcomings in our initial presentation. We have invited a native English speaker to revise our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      Heer and Sheffield used 2 photon imaging to dissect the functional contributions of convergent dopamine and noradrenaline inputs to the dorsal hippocampus CA1 in head-restrained mice running down a virtual linear path. Mice were trained to collect water rewards at the end of the track and on test days, calcium activity was recorded from dopamine (DA) axons originating in the ventral tegmental area (VTA, n=7) and noradrenaline axons from the locus coeruleus (LC, n=87) under several conditions. When mice ran laps in a familiar environment, VTA DA axons exhibited ramping activity along the track that correlated with distance to reward and velocity to some extent, while LC input activity remained constant across the track, but correlated invariantly with velocity and time to motion onset. A subset of recordings taken when the reward was removed showed diminished ramping activity in VTA DA axons, but no changes in the LC axons, confirming that DA axon activity is locked to reward availability. When mice were subsequently introduced to a new environment, the ramping to reward activity in the DA axons disappeared, while LC axons showed a dramatic increase in activity lasting 90 s (6 laps) following the environment switch. In the final analysis, the authors sought to disentangle LC axon activity induced by novelty vs. behavioral changes induced by novelty by removing periods in which animals were immobile and established that the activity observed in the first 2 laps reflected novelty-induced signal in LC axons.  

      Strengths:  

      The results presented in this manuscript provide insights into the specific contributions of catecholaminergic input to the dorsal hippocampus CA1 during spatial navigation in a rewarded virtual environment, offering a detailed analysis at the resolution of single axons. The data analysis is thorough and possible confounding variables and data interpretation are carefully considered.  

      Weaknesses:  

      Aspects of the methodology, data analysis, and interpretation diminish the overall significance of the findings, as detailed below.  

      The LC axonal recordings are well-powered, but the DA axonal recordings are severely underpowered, with recordings taken from a mere 7 axons (compared to 87 LC axons).

      Additionally, 2 different calcium indicators with differential kinetics and sensitivity to calcium changes (GCaMP6s and GCaMP7b) were used (n=3, n=4 respectively) and the data pooled. This makes it very challenging to draw any valid conclusions from the data, particularly in the novelty experiment. The surprising lack of novelty-induced DA axon activity may be a false negative. Indeed, at least 1 axon (axon 2) appears to be showing a novelty-induced rise in activity in Figure 3C. Changes in activity in 4/7 axons are also referred to as a 'majority' occurrence in the manuscript, which again is not an accurate representation of the observed data.  

      We appreciate the reviewer's detailed feedback regarding the analysis of VTA axons in our dataset. The relatively low sample size for VTA axons is due to their sparsity in the dCA1 region of the hippocampus and the inherent difficulty in recording from these axons. VTA axons are challenging to capture due to their low baseline fluorescence and long-range axon segments, resulting in a typical yield of only a single axon per field of view (FOV) per animal. In contrast, LC axons are more abundant in dCA1.

      To address the disparity in sample sizes between LC and VTA axons, we down-sampled the LC axons to match the number of VTA axons, repeating this process 1000 times to create a distribution. However, we acknowledge the reviewer's concern that the relatively low sample size for VTA axons might result in insufficient sampling of this population. Increasing the baseline expression of GCaMP to record from VTA axons requires several months, limiting our ability to quickly expand the sample size.
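
      The down-sampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the use of the group mean as the summary statistic, and the fixed seed are all hypothetical, and the sketch does not model the one-axon-per-imaging-session constraint.

```python
import random
from statistics import mean


def downsample_distribution(large_group, n_small, n_iter=1000, seed=0):
    """Repeatedly draw n_small values (without replacement) from large_group.

    Returns one summary value (here the mean, an assumed choice) per draw,
    giving a reference distribution for a group of size n_small.
    """
    if n_small > len(large_group):
        raise ValueError("n_small cannot exceed the size of large_group")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [mean(rng.sample(large_group, n_small)) for _ in range(n_iter)]


def fraction_at_or_above(distribution, observed):
    """Fraction of resampled summaries that are >= the observed value."""
    return sum(value >= observed for value in distribution) / len(distribution)


if __name__ == "__main__":
    # Hypothetical per-axon values standing in for 87 LC axons.
    demo_rng = random.Random(1)
    lc_values = [demo_rng.gauss(0.2, 0.1) for _ in range(87)]
    dist = downsample_distribution(lc_values, n_small=7, n_iter=1000)
    # Compare a hypothetical observed summary for the smaller group.
    print(fraction_at_or_above(dist, observed=0.35))
```

      Comparing the smaller group's observed summary against this distribution keeps the comparison at matched sample sizes rather than pitting 7 axons against 87.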

      In response to the reviewer's comments, we have added recordings from 2 additional VTA axons, increasing the sample size from 7 to 9. We re-analyzed all data from the familiar environment with n=9 VTA axons, comparing them to down-sampled LC axons as previously described. However, the additional axons were not recorded in the novel environment. We agree with the reviewer that the lack of novelty-induced DA axon activity may be a false negative. To address this, we have revised the description of our results to include the following sentence:

      “However, 1 VTA ROI showed an increase in activity immediately following exposure to novelty, indicating heterogeneity across VTA axons in CA1, and the lack of a novelty signal on average may be due to a small sample size.”

      Regarding the use of two different GCaMP constructs, we understand the reviewer's concern. We used GCaMP6s and GCaMP7b variants to determine if one would improve the success rate of recording from VTA axons. Given the long duration of these experiments and the low yield, we pooled the data from both GCaMP variants to increase statistical power. However, we recognize the importance of verifying that there are no differences in the signals recorded with these variants.

      With the addition of 2 VTA DA axons expressing GCaMP6s, we now have n=5 GCaMP6s and n=4 GCaMP7b VTA DA axons. This allowed us to compare the activity of the two sensors in the familiar environment. As shown in new Supplementary Figure 2, both sets of axons responded similarly to the variables measured: position in VR, time to motion onset, and animal velocity (although the GCaMP6s expressing axons showed stronger correlations). Since all LC axons recorded expressed GCaMP6s, we also specifically compared VTA GCaMP6s axons to LC GCaMP6s axons (Supp Fig. 3). Our conclusions remained consistent when comparing this subset of VTA axons to LC axons.

      Overall, our paper now includes comparisons of combined VTA axons (n=9) and separately the GCaMP6s-expressing VTA axons (n=5) with LC axons. Both datasets support our initial conclusions that VTA axons signal proximity to reward, while LC axons encode velocity and motion initiation in familiar environments.

      The authors conducted analysis on recording data exclusively from periods of running in the novelty experiment to isolate the effects of novelty from novelty-induced changes in behavior. However, if the goal is to distinguish between changes in locus coeruleus (LC) axon activity induced by novelty and those induced by motion, analyzing LC axon activity during periods of immobility would enhance the robustness of the results.  

      We appreciate the reviewer's insightful suggestion to analyze LC axon activity during periods of immobility to distinguish between changes induced by novelty and those induced by motion. This additional analysis would indeed strengthen our conclusions regarding the LC novelty signal.

      In response to this suggestion, we performed the same analysis as before, but focused on periods of immobility. Our findings indicate that following exposure to novelty, there was a significant increase in LC activity specifically during immobility. This supports the idea that LC axons produce a novelty signal that is independent of novelty-induced behavioral changes. The results of this analysis are now presented in new Supplementary Figure 5b.

      The authors attribute the ramping activity of the DA axons to the encoding of the animals' position relative to reward. However, given the extensive data implicating the dorsal CA1 in timing, and the remarkable periodicity of the behavior, the fact that DA axons could be signalling temporal information should be considered.  

      This is an insightful comment regarding the potential role of VTA DA axons in signaling temporal information. We agree that VTA DA axons could indeed be encoding temporal information, as previous work from our lab has shown that these axons exhibit ramping activity when averaged by time to reward (Krishnan et al., 2022).

      To address this, we have now examined DA axon activity relative to time to reward, as shown in new Supplementary Figure 4. Our analysis confirms that these axons ramp up in activity relative to time to reward. Given the periodicity of our mice's behavior in these experiments, as the reviewer correctly points out, we are unable to distinguish between spatial proximity to reward and time to reward. We have added a sentence to our paper highlighting this limitation and stating that further experiments are necessary to differentiate these two variables.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      The authors should explain and justify the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments.  

      We appreciate the reviewer's insightful comment regarding the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments. The choice of a 3m track for LC axon recordings was made to align with a previous experiment from our lab (Dong et al., 2021), in which mice were exposed to a novel 3m track while CA1 pyramidal cell populations were recorded. In that study, we detailed the time course of place field formation within the novel track. Our current hypothesis is that LC axons signal novelty, and we aimed to investigate whether the time course of LC axon activity aligns with the time course of place field formation. This hypothesis, and the potential role of LC axons in facilitating plasticity for new place field formation, is further discussed in the Discussion section of our paper.

      For the VTA axon recordings, we utilized a 2m track, consistent with another recent study from our lab (Krishnan et al., 2022), where reward expectation was manipulated, and CA1 pyramidal cell populations were recorded. By matching the track length to this prior study, we aimed to explore how VTA dopaminergic inputs to CA1 might influence CA1 population dynamics along the track under conditions of varying reward expectations.

      We acknowledge that using different track lengths for LC and VTA recordings introduces a variable that could potentially confound direct comparisons. To address this, we normalized the track lengths for our LC versus VTA comparison analysis. This normalization allowed us to directly compare patterns of activity across the two types of axons by adjusting the data to a common scale, thereby ensuring that any observed differences or similarities are attributable to the intrinsic properties of the axons rather than differences in track lengths. By doing so, we could assess relative changes in activity levels at matched spatial bins.
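
      The track-length normalization described above can be sketched as follows. This is an illustrative sketch only: the function name, the number of bins, and the example values are assumptions, not details taken from the manuscript.

```python
def bin_by_normalized_position(positions_m, activity, track_length_m, n_bins=50):
    """Average activity in bins of fractional track position (0 = start, 1 = end).

    Dividing position by track length puts 2 m and 3 m tracks on a common
    scale, so bin i covers the same fraction of the track in both datasets.
    Bins with no samples are returned as None.
    """
    if track_length_m <= 0:
        raise ValueError("track_length_m must be positive")
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, act in zip(positions_m, activity):
        frac = min(max(pos / track_length_m, 0.0), 1.0)  # clamp to [0, 1]
        idx = min(int(frac * n_bins), n_bins - 1)        # end of track goes in last bin
        sums[idx] += act
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


if __name__ == "__main__":
    # Hypothetical samples taken at the same fractional positions on each track.
    short_track = bin_by_normalized_position([0.0, 1.0, 1.9], [1.0, 2.0, 3.0], 2.0, n_bins=4)
    long_track = bin_by_normalized_position([0.0, 1.5, 2.85], [1.0, 2.0, 3.0], 3.0, n_bins=4)
    print(short_track)
    print(long_track)
```

      Because both outputs are indexed by the same fractional bins, activity profiles from the two track lengths can be compared bin by bin.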

      Although the experiences of the animals on the different track lengths are not identical, our observations suggest that LC and VTA axon signals are not majorly influenced by variations in track length. LC axons are associated with velocity and a pre-motion initiation signal, neither of which is affected by track length. VTA axons, which also correlate with velocity, can be compared to LC axon velocity signals because mice reach maximal velocity very quickly along the track, well before the end of the 2m track. The range of velocities is therefore captured on both track lengths. While VTA axons exhibit ramping activity as they approach the reward zone—a signal potentially modulated by track length—LC axons do not show such ramping to reward signals. Thus, a comparison across different track lengths is justified for this aspect of our analysis.

      To further enhance the rigor of our comparisons between axon dynamics recorded on 2m and 3m tracks, we conducted an additional analysis plotting axon activity by time to reward and actual (un-normalized) distance from reward (Supplementary Figure 4). This analysis revealed very similar signals between the two sets of axons, supporting our initial conclusions.

      We thank the reviewer for raising this important point and hope that our detailed explanation and additional analysis address their concern.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      Dong, C., Madar, A. D. & Sheffield, M.E. Distinct place cell dynamics in CA1 and CA3 encode experience in new environments. Nat Commun 12, 2977 (2021).

      Reviewer #2 (Public Review):  

      Summary:  

      The authors used 2-photon Ca2+-imaging to study the activity of ventral tegmental area (VTA) and locus coeruleus (LC) axons in the CA1 region of the dorsal hippocampus in head-fixed male mice moving on linear paths in virtual reality (VR) environments.  

      The main findings were as follows:  

      - In a familiar environment, the activity of both VTA axons and LC axons increased with the mice's running speed on the Styrofoam wheel, with which they could move along a linear track through a VR environment.  

      - VTA, but not LC, axons showed marked reward position-related activity, showing a ramping-up of activity when mice approached a learned reward position.  

      - In contrast, the activity of LC axons ramped up before the initiation of movement on the Styrofoam wheel.  

      - In addition, exposure to a novel VR environment increased LC axon activity, but not VTA axon activity.  

      Overall, the study shows that the activity of catecholaminergic axons from VTA and LC to dorsal hippocampal CA1 can partly reflect distinct environmental, behavioral, and cognitive factors. Whereas both VTA and LC activity reflected running speed, VTA, but not LC axon activity reflected the approach of a learned reward, and LC, but not VTA, axon activity reflected initiation of running and novelty of the VR environment.  

      I have no specific expertise with respect to 2-photon imaging, so cannot evaluate the validity of the specific methods used to collect and analyse 2-photon calcium imaging data of axonal activity.  

      Strengths:  

      (1) Using a state-of-the-art approach to record separately the activity of VTA and LC axons with high temporal resolution in awake mice moving through virtual environments, the authors provide convincing evidence that the activity of VTA and LC axons projecting to dorsal CA1 reflect partly distinct environmental, behavioral and cognitive factors.  

      (2) The study will help a) to interpret previous findings on how hippocampal dopamine and norepinephrine or selective manipulations of hippocampal LC or VTA inputs modulate behavior and b) to generate specific hypotheses on the impact of selective manipulations of hippocampal LC or VTA inputs on behavior.  

      Weaknesses:  

      (1) The findings are correlational and do not allow strong conclusions on how VTA or LC inputs to dorsal CA1 affect cognition and behavior. However, as indicated above under Strengths, the findings will aid the interpretation of previous findings and help to generate new hypotheses as to how VTA or LC inputs to dorsal CA1 affect distinct cognitive and behavioral functions.  

      (2) Some aspects of the methodology would benefit from clarification.  

      First, to help others to better scrutinize, evaluate, and potentially to reproduce the research, the authors may wish to check if their reporting follows the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines for the full and transparent reporting of research involving animals (https://arriveguidelines.org/). For example, I think it would be important to include a sample size justification (e.g., based on previous studies, considerations of statistical power, practical considerations, or a combination of these factors). The authors should also include the provenance of the mice. Moreover, although I am not an expert in 2-photon imaging, I think it would be useful to provide a clearer description of exclusion criteria for imaging data.

      We thank the reviewer for helping us formalize the scientific rigor of our study. There are ten ARRIVE Guidelines and we have addressed most of them in our study already. However, there is an opportunity to add detail. We have listed below all ten points and how we have addressed each one (and point out any new additions):

      (1) Experimental design - we go into great depth explaining the experimental set-up, how we used the autofluorescent blebs as imaging controls, how we controlled for different sample sizes between the two populations, and the statistical tests used for comparisons. We also carefully accounted for animal behavior when quantifying and describing axon dynamics both in the familiar and novel environments.

      (2) Sample size - we state both the number of ROIs and mice for each analysis. We have now also added the number of mice we observed specific types of activity in. 

      (3) Inclusion/exclusion criteria - The following has now been added to the Methods section: Out of the 36 NET-Cre mice injected, 15 were never recorded from for either failing to reach behavioral criteria, or a lack of visible expression in axons. Out of the 54 DAT-Cre mice injected, imaging was never conducted in 36 of them for lack of expression or failing to reach behavioral criteria. Out of the remaining 21 NET-Cre mice, 5 were excluded for heat bubbles, z-drift, or bleaching, while 10 DAT-Cre mice were excluded for the same reasons. This was determined by visually assessing imaging sessions, followed by using the registration metrics output by suite2p. This registration metric conducted a PCA on the motion-corrected ROIs and plotted the first PC. If the PC drifted substantially, to the point where no activity was apparent, the video was excluded from analysis. 

      (4) Randomization - Already included in the paper is a description of random downsampling of LC axons to make statistical comparisons with VTA axons. LC axons were selected pseudo-randomly (only one axon per imaging session) to match VTA sampling statistics. This randomization was repeated 1000 times and comparisons were made against this random distribution. 

      (5) Blinding-masking - no blinding/masking was conducted as no treatments were given that would require this. We will include this statement in the next version. 

      (6) Outcomes - We defined all outcomes measured, such as those related to animal behavior and axon signaling. 

      (7) Statistical methods - None of the reviewers had any issues regarding our description of statistical methods, which we described in great detail in this version of the paper. 

      (8) Experimental animals - We have now described that DAT-Cre mice were obtained through JAX labs, and NET-Cre mice were obtained from the Tonegawa lab (Wagatsuma et al. 2017). This was absent in the initial version of the paper.

      (9) Experimental procedure - Already listed in great detail in Methods section.

      (10) Results - Rigorously described in detail for behaviors and related axon dynamics.

      Wagatsuma, Akiko, Teruhiro Okuyama, Chen Sun, Lillian M. Smith, Kuniya Abe, and Susumu Tonegawa. “Locus Coeruleus Input to Hippocampal CA3 Drives Single-Trial Learning of a Novel Context.” Proceedings of the National Academy of Sciences 115, no. 2 (January 9, 2018): E310–16. https://doi.org/10.1073/pnas.1714082115.

      Second, why were different linear tracks used for studies of VTA and LC axon activity (from line 362)? Could this potentially contribute to the partly distinct activity correlates that were found for VTA and LC axons?  

      We thank the reviewer for pointing this out and giving us a chance to address it directly. A detailed response to this is written above for a similar comment from reviewer 1.

      Third, the authors seem to have used two different criteria for defining immobility. Immobility was defined as moving at <5 cm/s for the behavioral analysis in Figure 3a, but as <0.2 cm/s for the imaging data analysis in Figure 4 (see legends to these figures and also see Methods, from line 447, line 469, line 498)? I do not understand why, and it would be good if the authors explained this.  

      This is a typo leftover from before we converted velocity from rotational units of the treadmill to cm/s. This has now been corrected.

      (3) In the Results section (from line 182) the authors convincingly addressed the possibility that less time spent immobile in the novel environment may have contributed to the novelty-induced increase of LC axon activity in dorsal CA1 (Figure 4). In addition, initially (for the first 2-4 laps), the mice also ran more slowly in the novel environment (Figure 3aIII, top panel). Given that LC and VTA axon activity were both increasing with velocity (Figure 1F), reduced velocity in the novel environment may have reduced LC and VTA axon activity, but this possibility was not addressed. Reduced LC axon activity in the novel environment could have blunted the novelty-induced increase. More importantly, any potential novelty-induced increase in VTA axon activity could have been masked by decreases in VTA axon activity due to reduced velocity. The latter may help to explain the discrepancy between the present study and previous findings that VTA neuron firing was increased by novelty (see Discussion, from line 243). It may be useful for the authors to address these possibilities based on their data in the Results section, or to consider them in their Discussion.  

      We appreciate the reviewer's insightful comment regarding the potential impact of decreased velocity on novelty responses in LC and VTA axons. The decreased velocity in the novel environment could lead to a diminished novelty response in LC axons and could mask a subtle novelty signal in VTA axons. We have now included the following points in our discussion:

      “In addition, as noted above, on average we did observe a velocity-associated signal in VTA axons. When mice were exposed to the novel environment their velocity initially decreased. This would be expected to reduce the average signal across the VTA axon population relative to the higher velocity in the familiar environment. It is possible that this decrease could somewhat mask a subtle novelty-induced signal in VTA axons. Therefore, additional experiments should be conducted to investigate the heterogeneity of these axons and their activity under different experimental conditions during tightly controlled behavior.”

      “As discussed above, the slowing down of animal behavior in the novel environment could have decreased LC axon activity and reduced the magnitude of the novelty signal we detected during running. The novelty signal we report here may therefore be an underestimate of its magnitude under matched behavioral settings.”

      However, it is important to note that although VTA axons, on average, showed activity modulated by velocity in a familiar rewarded environment, this relationship was largely due to the activity of two VTA axons that were strongly modulated by velocity, indicating heterogeneity within the VTA axon population in dCA1. We have highlighted this point in the discussion. We also discuss that:

      “It is possible that some VTA DA inputs to dCA1 respond to novel environments, and the small number of axons recorded here are not representative of the whole population.”

      (4) Sensory properties of the water reward, which the mice may be able to detect, could account for reward-related activity of VTA axons (instead of an expectation of reward). Do the authors have evidence that this is not the case? Occasional probe trials, intermixed with rewarded trials, could be used to test for this possibility.  

      Mice receive their water reward through a water spout that is immobile and positioned directly in front of their mouth. Water delivery is triggered by a solenoid when the mice reach the end of the virtual track. Therefore, because the water spout is immobile and the water reward is not delivered until they reach the end of the track, there is nothing for the mice to detect during their run. We have added clarifications about the water spout to the Methods and Results sections, along with appropriate discussion points.

      Additionally, we note that the ramping activity of VTA axons is still present on the initial laps with no reward (Krishnan et al., 2022), indicating that this activity is not directly related to the presence or absence of water but is instead associated with the animal’s reward expectation.

      We thank the reviewer for raising this point and hope that these clarifications address their concern.

      Reviewer #3 (Public Review):  

      Summary:  

      Heer and Sheffield provide a well-written manuscript that clearly articulates the theoretical motivation to investigate specific catecholaminergic projections to dorsal CA1 of the hippocampus during a reward-based behavior. Using 2-photon calcium imaging in two groups of cre transgenic mice, the authors examine the activity of VTA-CA1 dopamine and LC-CA1 noradrenergic axons during reward seeking in a linear track virtual reality (VR) task. The authors provide a descriptive account of VTA and LC activities during walking, approach to reward, and environment change. Their results demonstrate LC-CA1 axons are activated by walking onset, modulated by walking velocity, and heighten their activity during environment change. In contrast, VTA-CA1 axons were most activated during the approach to reward locations. Together the authors provide a functional dissociation between these catecholamine projections to CA1. A major strength of their approach is the methodological rigor of 2-photon recording, data processing, and analysis approaches. These important systems neuroscience studies provide solid evidence that will contribute to the broader field of learning and memory. The conclusions of this manuscript are mostly well supported by the data, but some additional analysis and/or experiments may be required to fully support the authors' conclusions.  

      Weaknesses:  

      (1) During teleportation from familiar to novel environments, the authors report a decrease in the freezing ratio when combining the mice in the two experimental groups (Figure 3aiii). A major conclusion from the manuscript is the difference in VTA and LC activity following environment change. Given that VTA and LC activity were recorded in separate groups of mice, did the authors observe a similar significant reduction in freezing ratio when analyzing the behavior in LC and VTA groups separately?

      In response to the comment regarding the freezing ratios during teleportation between familiar and novel environments, we have analyzed the freezing ratios and lap velocities of DAT-Cre and NET-Cre mice separately (Fig. 3Aiii). Our analysis shows that the mean lap velocities of both groups overlap in the familiar environment and significantly decrease on the first lap of the novel environment (Fig. 3Aiii, top). For subsequent laps, the velocities in both groups are not significantly different from the familiar environment lap velocities.

      Freezing ratios also show a statistically significant decrease on the first lap of the novel environment compared to the familiar environment in both groups (Fig. 3Aiii, bottom). In the NET-Cre mice, the freezing ratios remain significantly lower in subsequent laps, while in the DAT-Cre mice, the following laps show a similar trend but without statistical significance. This lack of statistical significance in the DAT-Cre mice is likely due to their already lower freezing ratios in the familiar environment. Overall, the data demonstrate similar behavioral responses in the two groups of mice during the switch from the familiar to the novel environment.

      (2) The authors satisfactorily apply control analyses to account for the unequal axon numbers recorded in the LC and VTA groups (e.g. Figure 1). However, given the heterogeneity of responses observed in Figures 3c and 4b and the relatively low number of VTA axons recorded (compared to LC), there are some possible limitations to the authors' conclusions. A conclusion that LC-CA1 axons, as a general principle, heighten their activity during novel environment presentation, would require this activity profile to be observed in some of the axons recorded in most all LC-CA1 mice.

      We agree with the reviewer’s point. To address this issue, when downsampling LC axons to compare to VTA axons, we matched the sampling statistics of the VTA axons/mice by only selecting one LC axon from each mouse to match the VTA dataset.
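
      As a rough illustration of the per-mouse downsampling described above, a minimal Python sketch follows. The function name, the `(mouse_id, axon_id)` data layout, and the use of a random choice within each mouse are assumptions made for illustration; they are not taken from the analysis code used in the study.

```python
import random
from collections import defaultdict


def sample_one_axon_per_mouse(axons, seed=None):
    """Pick one axon per mouse from a list of (mouse_id, axon_id) pairs.

    Hypothetical helper: returns a dict mapping each mouse_id to a single
    randomly chosen axon_id, so every mouse contributes exactly one axon.
    """
    rng = random.Random(seed)  # seeded generator keeps the draw reproducible
    axons_by_mouse = defaultdict(list)
    for mouse_id, axon_id in axons:
        axons_by_mouse[mouse_id].append(axon_id)
    # Iterate in sorted order so the result does not depend on input order.
    return {
        mouse_id: rng.choice(axon_ids)
        for mouse_id, axon_ids in sorted(axons_by_mouse.items())
    }
```

      Repeating the draw with different seeds would give a distribution of downsampled results rather than a single arbitrary subset.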

      Additionally, we have now included the number of recording sessions and the number of mice in which we observed each type of activity. This information has been added to further clarify and support our conclusions.

      Additionally, if the general conclusion is that VTA-CA1 axons ramp activity during the approach to reward, it would be expected that this activity profile was recorded in the axons of most all VTA-CA1 mice. Can the authors include an analysis to demonstrate that each LC-CA1 mouse contained axons that were activated during novel environments and that each VTA-CA1 mouse contained axons that ramped during the approach to reward?  

      As above, we have now added to the paper the number of mice that showed each activity type we report.

      (3) A primary claim is that LC axons projecting to CA1 become activated during novel VR environment presentation. However, the experimental design did not control for the presentation of a familiar environment. As I understand, the presentation order of environments was always familiar, then novel. For this reason, it is unknown whether LC axons are responding to novel environments or environmental change. Did the authors re-present the familiar environment after the novel environment while recording LC-CA1 activity?  

      While we did not vary the presentation order of familiar and novel environments, we recorded the activity of LC axons in some mice when exposed to a dark environment (no VR cues) prior to exposure to the familiar environment. Our analysis of this data demonstrates that LC axons are also active following abrupt exposure to the familiar environment.

      We have added a new figure showing this response (Supplementary Figure 5A) and expanded on our original discussion point that LC axon activity generally correlates with arousal, as this result also supports that interpretation.

      We thank the reviewer for highlighting this important consideration. It certainly helps with the interpretation regarding what LC axons generally encode.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      In addition to what has been described in the public review, I have the following recommendations:  

      The sample size of DA axon recordings should be increased with the use of a single GCaMP for valid conclusions to be made about the lack of novelty-induced activity in these axons.

      We have increased the n of VTA GCaMP6s axons in the familiar environment by including two axons that were recorded in the familiar rewarded condition. We have also conducted an analysis comparing GCaMP6s versus GCaMP7b, which is discussed in detail above.

      Regarding the concerns about valid conclusions of novelty-induced activity in VTA axons, we have added a comment in the discussion to tone down our conclusions regarding the lack of a novelty signal in the VTA axons. This valid concern is discussed in detail above.  

      The title is currently very generic, and non-informative. I recommend the use of more specific language in describing the type of behavior under investigation. It is not clear to the reviewer why 'learning' is included here.  

      Original title: “Distinct catecholaminergic pathways projecting to hippocampal CA1 transmit contrasting signals during behavior and learning”

      To make it more specific to the experiments conducted here, we have changed the title to this:

      New title: “Distinct catecholaminergic pathways projecting to hippocampal CA1 transmit contrasting signals during navigation in familiar and novel environments”

      Error noted in Figure 4C legend - remove reference to VTA ROIs.  

      The reference to VTA ROIs has been removed from the figure legend.

      Reviewer #2 (Recommendations For The Authors):  

      (1) The concluding sentence of the Abstract could be more specific: which distinct types of information are reflected/'signaled'/'encoded' by LC and VTA inputs to dorsal CA1?  

      The abstract has been adjusted accordingly. The new sentence is more specific: “These inputs encode unique information, with reward information in VTA inputs and novelty and kinematic information in LC inputs, likely contributing to differential modulation of hippocampal activity during behavior and learning.”

      (2) Line 46/47: The study by Mamad et al. (2017) did not quite show that VTA dopamine input to dorsal CA1 'drives place preference'. To my understanding, the study showed that suppression of VTA dopamine signaling in a specific place caused avoidance of this place and that VTA dopamine signaling modulated hippocampal place-related firing. So, please consider rephrasing.  

      Corrected, thanks for pointing this out.

      (3) Legend to Figure 3AIII: 'Each lap was compared to the first lap in F . . .' Could you clarify if 'F' refers to the 'familiar environment'?

      The figure legend has been changed accordingly.

      (4) Line 176: '36 LC neurons' - should this not be '36 imaged axon terminals in dorsal CA1' or something along these lines?  

      This reference has been changed to “LC axon ROIs”

      (5) Line 353: Why was water restriction started before the hippocampal window implant, if behavioral training to run for water reward only started after the implant? Please clarify.

      A sentence was added to the methods to explain that this was done to reduce bleeding and swelling during the hippocampal window implantation.  

      (6) Line 377: '. . . which took 10-14 days (although some mice never reached this threshold).' How many mice did not reach the criterion within 14 days? I think it is not accurate to say the mice 'never' reached the threshold, as they were only tested for a limited period of time.  

      We have added details of how many mice were excluded from each group and the reason why they were excluded.

      (7) Exclusion criteria for imaging data: The authors state (from line 402): 'Imaging sessions with large amounts of drift or bleaching were excluded from analysis (8 sessions for NET mice, 6 sessions for LC Mice).' What exactly were the quantitative exclusion criteria? Were these defined before the onset of the study or throughout the study?  

      Imaging sessions were first qualitatively assessed by looking for disappearance or movement of structures in the Z-plane throughout the imaging FOV. Additionally, following motion correction in suite2p, we used the registration metrics, which plot the first Principal Component of the motion-corrected images, to assess for drift, bleaching, or heat bubbles. If this variable increased or decreased greatly throughout a session, to the point where any apparent activity was not visible in the first PC, the dataset was excluded. We have added these exclusion criteria to the methods section.
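
      The criterion above is qualitative, but the idea of a first-PC trace that "increased or decreased greatly throughout a session" can be sketched numerically. The following Python sketch is only an illustration under stated assumptions: the function name, the window length, and the score itself are hypothetical, and this is not the computation performed by suite2p or by the authors.

```python
from statistics import mean


def drift_score(pc1_trace, window):
    """Hypothetical drift score for a first-principal-component trace.

    Compares the mean of the first `window` frames with the mean of the last
    `window` frames and expresses the difference as a fraction of the full
    range of the trace. Values near 0 suggest no net drift; values near 1
    suggest the trace is dominated by a slow monotonic change.
    """
    if window <= 0 or len(pc1_trace) < 2 * window:
        raise ValueError("trace must contain at least two full windows")
    start_level = mean(pc1_trace[:window])
    end_level = mean(pc1_trace[-window:])
    full_range = max(pc1_trace) - min(pc1_trace)
    if full_range == 0:
        return 0.0  # a perfectly flat trace has no drift
    return abs(end_level - start_level) / full_range
```

      Any cutoff applied to such a score would itself be an assumption and would need to be fixed before inspecting the data.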

      Reviewer #3 (Recommendations For The Authors):  

      Please provide a justification or rationale for having two different criteria for immobility (< 5cm/sec) and freezing (<0.2 cm/sec). If VTA and LC axon activities are different between these two velocities, please provide some commentary on this difference.  

      This is a typo leftover from before we converted velocity from rotational units to cm/s.

    1. Welcome back and in this demo lesson I'm going to step through how you can register a domain using Route 53. Now this is an optional step within the course. Worst case you should know how to perform the domain registration process within AWS and optionally you can use this domain within certain demos within the course to get a more real-world like experience.

      To get started, as always, just make sure that you're logged in to the IAM admin user of the general AWS account which is the management account of the organization. Now make sure that you have the Northern Virginia region selected. While Route 53 is a global service, I want you to get into the habit of using the Northern Virginia region. Now we're going to be using the Route 53 product, so click in the search box at the top of the screen, type Route 53 and then click to move to the Route 53 console.

      Now Route 53, at least in the context of this demo lesson, has two major areas. First is hosted zones and this is where you create or manage DNS zones within the product. Now DNS zones, as you'll learn elsewhere in the course, you can think of as databases which store your DNS records. When you create a hosted zone within Route 53, Route 53 will allocate four name servers to host this hosted zone. And that's important, you need to understand that every time you create a new hosted zone, Route 53 will allocate four different name servers to host that zone. Now the second area of Route 53 is registered domains, and it's in the registered domains area of the console where you can register a domain or transfer a domain in to Route 53.

      Now we're going to register a domain, but before we do that, if you do see any notifications about trying out new versions of the console, then go ahead and click to try out that new version. Where possible, I always like to teach using the latest version of the console UI because it's going to be what you'll be using long-term. So in my case, I'm going to go ahead and click on, try out the new console, depending on when you're doing this demo, you may see this or not. In either case, you want to be using this version of the console UI. So if you are going to register a domain for this course, then you need to go ahead and click register domains.

      The first step is to type the domain that you want into this box. Now, a case study that I use throughout the course is Animals for Life. So I'm going to go ahead and register a domain related to this case study. So if I type animalsforlife.com and press enter, it will search for the domain and tell us whether it's available. In this case, animalsforlife.com is not available. It's already been registered. In my case, I'm going to use an alternative, so I'm going to try and register animalsforlife.io. Now, .io domains are one of the most expensive, so if you are registering a domain yourself, I would tend to advise you to look for one of the cheaper ones. I'm going to register this one and it is available.

      Once I've verified that it is available and it's the one I want, we're gonna go ahead and click on select. We can verify the price of this domain for one year, in this case it's 71 US dollars, and then go ahead and click on proceed to checkout. Now it's here where you can specify a duration for the domain registration. You can use the default of one year, or alternatively you can go ahead and pick a longer registration period. For this domain I'm going to choose one year and then you can choose whether you want to auto renew the domain after that initial period. In my case I'm going to leave this selected. You'll see a subtotal of the price and then you can click next to move on to the next step.

      Now at this point you need to specify the contact type. In most cases you'll be putting a person or a company but there's also association, public body or reseller. You need to go ahead and fill in all of these details and they do need to be valid details, that's really important. If you are worried about privacy, most domains will allow you to turn on privacy protection, so any details that you enter here cannot be seen externally. Now obviously to keep my privacy intact, I'm going to go ahead and fill in all of these details and I'm going to hide the specifics and once I've entered them all, I'm going to go ahead and click on 'Next' and you should do the same. Again I've hidden my details on the bottom of the screen.

      Route 53 does tell you that in addition to the domain registration cost there is a monthly cost for the hosted zone which will be created as part of this registration. So there is a small monthly cost for every hosted zone which you have hosted using Route 53 and every domain that you have will need one hosted zone. So I'm going to scroll down. Everything looks good, you'll need to agree to the terms and conditions and then click on submit. Now at this point the domain is registering and it will take some time to complete. You may receive a registration email which may include something that you need to do, clicking on a link or some other form of identity verification. You might not get that, but if you do get it, it's important that you do follow all of the steps contained within that email. And if you don't receive an email, you should check your spam folder, because if there are any actions to perform and you don't, it could result in the domain being disabled.

      You can see the status of the domain registration by clicking on "requests" directly below "registered domains". The status will initially be listed as "in progress", and we need this to change to "successful". So pause the video, wait for this status to change, and then you're good to continue. Welcome back, in my case this took about 20 minutes to complete, but as you can see my domain is now registered. So if we go to registered domains you'll be able to see the domain name listed together with the expiration date, the auto renew status, and the status of the transfer lock. Now transfer lock is a security feature, it means the domain cannot be transferred away from Route 53 without you disabling this lock.

      Now we're able to see additional details on the domain if we click on the domain name. Now obviously I've hidden my contact information. If you click on the DNSSEC keys tab then it's here where you can configure DNSSEC on the domain. We won't be doing anything with that at this stage. One of the important points I want to draw your attention to is the name servers. So I've registered animalsforlife.io and it's these name servers that will be entered into the Animals for Life record within the .io top level domain zone. So these servers are the ones that the DNS system will point at. These currently are set to four Route 53 name servers. And because we've registered the domain inside Route 53, this process is automatic. So a hosted zone is created, four name servers are allocated to host this hosted zone, and then those four name servers are entered into our domain records in our top level domain zone.

      This process end-to-end is all automatic. So the four name servers for the animalsforlife.io hosted zone are entered into the animalsforlife.io record within the .io top level domain zone. It's all automatic. So if we move to the hosted zone area of the console and then go inside animalsforlife.io and then expand the hosted zone details at the top, these are the four name servers which are hosting this hosted zone. And if you're paying attention, you'll note these are the same four servers that are contained within the registered domains area of the console, and these are the same four servers which have been entered into the .io top level domain zone. Now if you ever delete and then recreate a hosted zone, it's going to be allocated four brand new name servers. These name servers will be different from the name servers for the zone which you deleted. So if you delete and recreate a hosted zone, you'll be given four brand new name servers. In order to stop any DNS problems, you'll need to take these brand new name servers and update the items within the registered domains area of the console. But again, because you've registered the domain within Route 53, this process has been handled for you end to end. You won't need to worry about any of this unless you delete and recreate the hosted zone.
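
      The check described above, that the name servers on the hosted zone are the same as the ones on the registered domain, can be sketched in Python. This is a minimal sketch under assumptions: the function names are hypothetical, the two lists are assumed to have been copied from the console by hand, and the server names in the usage note are made up.

```python
def normalize_name_server(name):
    """Lower-case a name server and drop any trailing dot or whitespace."""
    return name.strip().rstrip(".").lower()


def delegation_matches(hosted_zone_servers, registered_domain_servers):
    """Return True when both lists contain the same name servers.

    Order is ignored, and so are case differences and trailing dots,
    because DNS names are case-insensitive.
    """
    zone = {normalize_name_server(n) for n in hosted_zone_servers}
    domain = {normalize_name_server(n) for n in registered_domain_servers}
    return zone == domain
```

      For example, `delegation_matches(["ns-1.example.net."], ["NS-1.example.net"])` is true, while a freshly recreated hosted zone with different servers would make it false until the registered domain is updated.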

      Now that's everything you need to do at this point. If you followed this process throughout this demo lesson, you now have an operational domain within the global DNS infrastructure that's manageable within Route 53. Now as I mentioned earlier, this is an optional step for the course. If you do have a domain registered, then you will have the opportunity to use it within various demo lessons within the course. If you don't, don't worry, none of this is mandatory; you can do the rest of the course without having a domain. At this point though, that is everything I wanted you to do in this demo lesson. Go ahead and complete the video and when you're ready I'll look forward to you joining me in the next.

    1. Welcome back and in this demo lesson you're going to get some experience interacting with CloudWatch. So you're going to create an EC2 instance, you're going to cause that instance to consume some CPU capacity and then you're going to monitor exactly how that looks within CloudWatch. Now to do this in your own environment you'll just need to make sure that you're logged into the general AWS account as the IAM admin user and as always make sure that you have the Northern Virginia region selected, which is us-east-1. Once you've got those set correctly then click in the search box at the top and type EC2, find the EC2 service and then just go ahead and open that in a brand new tab.

      Now we're going to skip through the instance creation process because you've done that in a previous demo lesson. So just go ahead and click on instances and then Launch Instance. Under Name, I just want you to put CloudWatch Test as the instance name. Then scroll down and then under the Amazon Machine Image to use, go ahead and select Amazon Linux. We're going to pick the Amazon Linux 2023 version, so that's the most recent version of this AMI. It should be listed as Free Tier Eligible, so just make sure that's the case. We'll leave the architecture set to 64-bit x86 and scroll down. It should already be set to an instance type which is free tier eligible, in my case t2.micro. We'll be connecting to this instance using EC2 Instance Connect so we won't be using an SSH key pair. So in this drop down just click and then say proceed without a key pair. We won't need one because we won't be connecting with a local SSH client. Scroll down further still and under Network Settings click on Edit and just make sure that the default VPC is selected. There should only be one in this list but just make sure that it's set as default. Under Subnet we can leave this as No Preference because we don't need to set one. We will need to make sure that Auto Assign Public IP is set to Enable.

      Under Create security group, for the name and for the description, just go ahead and type CloudWatch SG, so CloudWatch SG for both the security group name and the description. Now the default for the security group rule should be fine, because it allows SSH to connect from any source location, and that's what we want. Scroll down further still, and we'll be leaving storage as default; remember, this is set from the AMI that we pick. Now because this is a CloudWatch lesson, we're going to set something a little bit different. So expand Advanced Details and then scroll down and look for Detailed CloudWatch Monitoring. Now this does come at an additional cost, so you've got a couple of options. You can just watch me do this or you can do this demo without Detailed Monitoring enabled. And if you don't enable this, it will be entirely free, but you might need to wait a little bit longer for things to happen in the demo lesson, so keep that in mind.

      What I'm going to do is enable detailed CloudWatch monitoring, and if we click on info here we can see some details about exactly what that does, and we can also open this in a new tab and explore what additional charges apply if we want to enable it. Now in this case I'm going to enable it. You don't have to. It's not a huge charge, but I think for me demoing this to you it's good that I enable it. If you don't, you might just have to wait a little bit longer for things to happen in the demo. Now once all of that is set, just scroll all the way down to the bottom and go ahead and click launch instance. Now this might take a few minutes to create. We're first waiting for this success dialog, and once that shows we can go ahead and click on view all instances. Go ahead and click refresh until you see the instance. It will start off in a pending state with nothing listed under status check. After a few moments this will change status, we'll see that it's in a running state, and then we need to wait for this to change to two of two status checks before we continue. So go ahead and pause the video, wait for your status check to update, and once it does we're good to continue.

      Okay, so now this has changed to two out of two checks passed, and that's good, that's what we want. So it should display running under instance state and then two out of two checks passed under status check. Once this is the case, go ahead and click in the search box at the top and just type CloudWatch, locate the CloudWatch service, and then open that in a brand new tab. This is the CloudWatch console, and it's here where we're going to create a CloudWatch alarm. Now if you see anything about a new UI or new features, you can just go ahead and close down that dialog. Once we're here, go ahead and click on Alarms on the left and then click on all alarms. This will show a list of all the alarms that you've configured within CloudWatch, and currently there aren't any. What we're going to do is to create an alarm. So click on create alarm, and then click on select metric. Once we're on this screen, scroll down, and we're going to be looking for an EC2 metric, because we need to find the CPU utilization metric, which is inside the EC2 namespace. In other words, it comes from the EC2 service. So go ahead and click on EC2, and then we're looking for per instance metrics. So click on per instance metrics, and this will show all of the EC2 instance metrics that we currently have. Now if I scroll through this list, what you'll see is that I have two different instance IDs, because I'm using this account to create all of these demo lessons. In my case, I see previous instances. Now if you're doing this in your account, if you go back to the EC2 Management Console, you can see your instance ID here. Just remember the last four digits of this instance ID, and then go back to the CloudWatch Console. If you have more than one instance listed in CloudWatch, look for the instance ID that ends with the four digits that you just noted down, and then from that list you need to identify CPU utilization. And so I'm going to check the box next to this metric.

      Now this is the metric that monitors, as the name suggests, CPU utilization on this specific instance ID, which is our CloudWatch test instance. If I scroll up, I'm able to see any data that's already been gathered for this specific instance. And as you can see, it's not a great deal at the moment because we've only just launched this instance. So I'm gonna go ahead and click on Select Metric, and then because we're creating an alarm, it's going to ask us for what metric and conditions we want to evaluate.

      So I'm going to scroll down, and under Conditions, I'm going to pick Static, because I want this alarm to go into an alarm state when something happens to the CPU utilization. So I'm going to ask CloudWatch that whenever the CPU utilization is greater than or equal to a specific value, it goes into an alarm state. So that value is going to be 15%. So whenever the CPU utilization on this EC2 instance is greater than or equal to 15%, then this alarm will go into the alarm state. So I'm gonna go ahead and click on Next. Now you can set this up so that if this alarm goes into an alarm state, it can notify you using SNS. Now that's useful if this is in production usage, but in this case we're not using it in production, so I'm going to go ahead and click on remove. Scroll down to the bottom, there's also other things that you could pick, so you could do an auto scaling action, an EC2 action, or a systems manager action. But we're going to be talking about these in much more detail as we move through the course. For now we're going to keep this simple, it's just going to be a basic alarm which goes into an alarm state or not. So click on next and then under alarm name I'm going to put CloudWatch test and then high CPU and you should do the same. So type that, click on next, scroll down to the bottom and create that alarm.
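
      To make the static condition concrete, here is a minimal Python sketch of how a greater-than-or-equal threshold maps CPU readings onto alarm states. It is a simplification under assumptions: the function name is hypothetical, only the most recent datapoint is considered, and real CloudWatch alarms evaluate datapoints over configurable periods. The three state names are the ones CloudWatch uses.

```python
def evaluate_static_alarm(cpu_datapoints, threshold_percent=15.0):
    """Return the alarm state for a list of CPU utilization percentages.

    Hypothetical simplification: with no data the state is
    INSUFFICIENT_DATA; otherwise the latest datapoint is compared against
    the threshold with a greater-than-or-equal test.
    """
    if not cpu_datapoints:
        return "INSUFFICIENT_DATA"
    latest = cpu_datapoints[-1]
    return "ALARM" if latest >= threshold_percent else "OK"
```

      Note that a reading of exactly 15% already counts as a breach, because the condition is greater than or equal to, not strictly greater than.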

      Now initially this alarm state will be insufficient data because CloudWatch hasn't yet gathered enough data on the CPU utilization to generate the state. That's fine because we've got another thing that we need to do first. So now move back to the EC2 console and we're going to connect into this instance using EC2 Instance Connect. Remember, that's the web-based way to get access to this instance. So over the top of the CloudWatch Test instance, right click and go to Connect. Make sure that EC2 Instance Connect is selected, so click that tab. You can leave everything as default and click on Connect and that will connect you to this EC2 instance. Now at this point, we need to install an application called stress on this EC2 instance. And stress is an application which will put artificial CPU load onto a system. And that's what we want to do in order to see how CloudWatch reacts. To install stress, we're going to run this command. And this next command will use the yum package manager to install the stress utility. So go ahead and run this command and then clear the screen again. Now the stress command can be run by typing stress and what we're going to do is do a double hyphen help just to get the help for this command. So what we're going to do is we're going to run stress and we're going to specify the number of CPUs to use and we want that number to be the same number of virtual CPUs that this instance has. Now a t2.micro has one virtual CPU and so the command that we need to run is stress space hyphen c space 1 and then space and then we're going to use hyphen t which is the timeout option and this specifies how long we want to run this for. So we're going to specify 3600 so hyphen t and then a space 3600 and this will run the stress for 3600 seconds and that's plenty for us to see how this affects the metrics which are being monitored by CloudWatch.
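
      The stress command is spelled out word by word above, so it can help to see it assembled. This small Python sketch only builds the command string from the two values mentioned (one CPU worker, a 3600 second timeout); the function name is hypothetical and the sketch does not run anything on the instance.

```python
import shlex


def build_stress_command(cpu_workers, timeout_seconds):
    """Build the stress command line described in the lesson.

    -c sets the number of CPU workers and -t sets the timeout in seconds.
    """
    if cpu_workers < 1 or timeout_seconds < 1:
        raise ValueError("cpu_workers and timeout_seconds must be positive")
    arguments = ["stress", "-c", str(cpu_workers), "-t", str(timeout_seconds)]
    return shlex.join(arguments)  # shlex.join requires Python 3.8 or later
```

      Calling `build_stress_command(1, 3600)` gives `stress -c 1 -t 3600`, which is the command typed into the instance in this lesson. On an instance type with more virtual CPUs, the first argument would change to match.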

      Now what I want to do before we do that is go back to the CloudWatch console. You might need to refresh if you haven't seen the state update yet. In my case it's already showing as okay. So this means that it's now got access to some data. So click on this alarm and you'll be able to see that currently the CPU started off at very low levels and then it spiked up and potentially in my case that's because we've just installed some software. But note here this red line which indicates the alarm level for this alarm. So if the CPU utilisation, which is in blue, exceeds this red line then this alarm will move from OK to ALARM. And that's what we want to simulate. So go back to the instance and press Enter to run this stress command. And that's going to begin placing high levels of CPU load on this instance and what we'll see over the next few minutes is CloudWatch will detect this additional CPU load and it will cause this alarm to go from OK into an alarm state. So move back to the CloudWatch console and just keep hitting refresh until you see a change in the alarm state. Again this might take a few minutes. What I suggest you do is pause the video and wait for your alarm to change away from OK and then you're good to continue.

      Now in my case this only took a few minutes and as you can see the CPU load reported by this alarm in CloudWatch went from this value here and spiked all the way up to this value which is well above the 15% of the alarm threshold. So the alarm changed from OK to In alarm based on this excessive CPU and if we keep monitoring this over time you'll see that this trend continues because this CPU is under extremely high load because it's been artificially simulated using the stress utility. Now if we go back to this EC2 instance and press ctrl and C at the same time this will exit out of the stress utility and at this point the artificial CPU load has been removed and the instance will gradually move back down to its normal levels which is very close to zero. So again what you'll see is this may take a few minutes to be reflected inside CloudWatch. So keep refreshing this once you've cancelled the stress utility and wait for the reported CPU utilization to move back down below the alarm value. Again that might take a few minutes so go ahead and pause the video and wait for this blue line to move back under the red line and once it does you should see that the alarm state changes from In alarm to OK again.

      In my case it took a few minutes for the blue line to move below the alarm threshold and then a few more minutes afterwards for the alarm to change from in alarm to OK. But as you can see at this point that's exactly what's happened once the CPU usage goes below the configured threshold value then the alarm changes back to an OK state. And at this point that's everything that I wanted to cover in this demo lesson on CloudWatch. CloudWatch is a topic that I'm going to be going into much more detail later on in the course. This has just been a really brief introduction to the product and how it interacts with EC2. Now at this point the only thing left is to clear up the account and put it back into the same state as it was at the start of this lesson. So to do that go ahead and click on All Alarms, select the CloudWatch Test High CPU Alarm that you created, click on the actions dropdown, select delete, and then confirm that deletion. Then go back to EC2, go to the instances overview, right click on the CloudWatch test instance, making sure that it is the correct instance, so CloudWatch test, and then select terminate instance and confirm that termination. Now that's going to move through a few states, it will start with shutting down, and you need to wait until that instance is in a terminated state. Go ahead and pause the video and wait for your instance to change into terminated.

      Okay, so once your instance has terminated, on the menu on the left scroll down, go to Security Groups, select the CloudWatch SG security group, making sure that you do pick the correct one, so CloudWatch SG, click on Actions, scroll down, select Delete security groups and click on Delete. At that point the account is back in the same state as it was at the start of this demo lesson. So thanks for watching this video. I hope you gained some experience of the CloudWatch product, and again we're going to be talking about it in much more detail later in the course. At this point though, go ahead and complete this video, and when you're ready I'll look forward to you joining me in the next.

    1. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Otero-Coronel and colleagues use a combination of acoustic stimuli and electrical stimulation of the tectum to study MSI in the M-cells of adult goldfish. They first perform a necessary piece of groundwork in calibrating tectal stimulation for maximal M-cell MSI, and then characterize this MSI with slightly varying tectal and acoustic inputs. Next, they quantify the magnitude and timing of FFI that each type of input has on the M-cell, finding that both the tectum and the auditory system drive FFI, but that FFI decays more slowly for auditory signals. These are novel results that would be of interest to a broader sensory neuroscience community. By then providing pairs of stimuli separated by 50ms, they assess the ability of the first stimulus to suppress responses to the second, finding that acoustic stimuli strongly suppress subsequent acoustic responses in the M-cell, that they weakly suppress subsequent tectal stimulation, and that tectal stimulation does not appreciably inhibit subsequent stimuli of either type. Finally, they show that M-cell physiology mirrors previously reported behavioural data in which stronger stimuli underwent less integration.

      The manuscript is generally well-written and clear. The discussion of results is appropriately broad and open-ended. It's a good document. Our major concerns regarding the study's validity are captured in the individual comments below. In terms of impact, the most compelling new observation is the quantification of the FFI from the two sources and the logical extension of these FFI dynamics to M-cell physiology during MSI. It is also nice, but unsurprising, to see that the relationship between stimulus strength and MSI is similar for M-cell physiology to what has previously been shown for behavior. While we find the results interesting, we think that they will be of greatest interest to those specifically interested in M-cell physiology and function.

      Strengths:

      The methods applied are challenging and appropriate and appear to be well executed. Open questions about the physiological underpinnings of M-cell function are addressed using sound experimental design and methodology, and convincing results are provided that advance our understanding of how two streams of sensory information can interact to control behavior.

      Weaknesses:

      Our concerns about the manuscript are captured in the following specific comments, which we hope will provide a useful perspective for readers and actionable suggestions for the authors.

      Comment 1 (Minor):

      Line 124. Direct stimulation of the tectum to drive M-cell-projecting tectal neurons not only bypasses the retina, it also bypasses intra-tectal processing and inputs to the tectum from other sources (notably the thalamus). This is not an issue with the interpretation of the results, but this description gives the (false) impression that bypassing the retina is sufficient to prevent adaptation. Adding a sentence or two to accurately reflect the complexity of the upstream circuitry (beyond the retina) would be welcome.

      Comment 2 (Major):

      The premise is that stimulation of the tectum is a proxy for a visual stimulus, but the tectum also carries the auditory, lateral line, and vestibular information. This seems like a confound in the interpretation of this preparation as a simple audio-visual paradigm. Minimally, this confound should be noted and addressed. The first heading of the Results should not refer to "visual tectal stimuli".

      Comment 3 (Major):

      Figure 1 and associated text.

      It is unclear and not mentioned in the Methods section how phasic and tonic responses were calculated. It is clear from the example traces that there is a change in tonic responses and the accumulation of subthreshold responses. Depending on how tonic responses were calculated, perhaps the authors could overlay a low-passed filtered trace and/or show calculations based on the filtered trace at each tectal train duration.

      Comment 4 (Minor):

      Figure 3 and associated text.

      This is a lovely experiment. Although it is not written in text, it provides logic for the next experiment in choosing a 50ms time interval. It would be great if the authors calculated the first timepoint at which the percentage of shunting inhibition is not significantly different from zero. This would provide a convincing basis for picking 50ms for the next experiment. That said, I suspect that this time point would be earlier than 50ms. This may explain and add further complexity to why the authors found mostly linear or sublinear integration, and perhaps the basis for future experiments to test different stimulus time intervals. Please move calculations to Methods.

      Comment 5 (Major):

      Figure 4C and lines 398-410.

      These are beautiful examples of M-cell firing, but the text suggests that they occurred rarely and nowhere close to significantly above events observed from single modalities. We do not see this as a valid result to report because there is insufficient evidence that the phenomenon shown is consistent or representative of your data.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2024-02497

      Corresponding author(s): Tourriere, Hélene and Maraver, Antonio

      1. General Statements [optional]

      We sincerely thank the Editors and Reviewers for the time devoted to our manuscript. We found their critiques interesting and very helpful. After careful examination and thanks to a large collaborative effort, we will be able to address all the reviewers’ comments by adding significant new experimental data.

      We are also encouraged by the positive comments of the Reviewers:

      “This manuscript will likely engage oncologists who investigate the chemotherapy-resistant mechanisms of platinum compounds in NSCLC treatment” (Reviewer 1);

      “Overall, the authors have conducted experiments that sufficiently elucidate their claims, and the description of the experiments is detailed.”; and “Overall, this work unveils a novel mechanism for Notch activation in response to platinum chemotherapy, providing a renewed outlook on overcoming chemotherapy resistance in NSCLC” (Reviewer 2).

      We are also aware that both reviewers agreed that there is room for improvement, and we are sure that upon accomplishment of all proposed experiments both reviewers will be fully satisfied.

      Please bear in mind that although it was known that platinum-based chemotherapy induced the Notch pathway in lung cancer cells, the underlying molecular mechanism was largely unknown. Thanks to the molecular dissection we performed in our study, we propose an innovative treatment for patients with lung cancer, the main cause of death by cancer in the world. Hence, we agree with both reviewers that our study will be appealing for a large number of cancer researchers, and we feel it will be also the case for those interested in DNA damage, Notch and MDM2 pathways.

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript from Maraver and co-authors investigates the putative resistance mechanisms that hinder the efficacy of platinum-based therapies (e.g., carboplatin) against non-small cell lung carcinoma (NSCLC). Using in vitro lung cancer cell lines, shRNA-based knockdown, and exogenous overexpression systems, the research describes a DNA damage-induced resistance mechanism involving the NOTCH signaling pathway and the E3 ligase MDM2. The authors show that carboplatin treatment induces DNA damage and promotes ATM activation, which in turn activates the NOTCH signaling pathway via ubiquitination and stabilization of the Notch Intracellular Domain (NICD). New findings include the MDM2-mediated ubiquitination and stabilization of NICD. Using in vivo NSCLC-PDX models, they demonstrate that combining carboplatin with Notch and MDM2 inhibitors can enhance tumor killing, suggesting that targeting the MDM2/NICD axis in conjunction with carboplatin may be a viable therapeutic alternative. Furthermore, they show that NICD and MDM2 levels are elevated among tumor samples from chemo-resistant patients. Consistent with these findings, high MDM2 levels correlate with poor progression-free survival (PFS) in NSCLC patients.

      [Authors] We thank this reviewer for her/his fair summary of our work that highlights our new findings.

      Major comments:

      Some of the key conclusions may not be convincing.

      [Authors] We understand the concerns that reviewer might have and we are sure that upon accomplishment of all experiments detailed below, she/he will be convinced that the manuscript will be ready for publication.

      1. One significant weakness of the manuscript is the lack of exploration into the underlying mechanism of how MDM2 mediates the stabilization of NICD. While the observation of MDM2-mediated NICD stabilization is intriguing, it is important to provide a more convincing explanation for the reviewers. This could be achieved by offering a detailed molecular mechanism, especially considering that MDM2 typically targets proteins for degradation.

      [Authors] After reading this reviewer’s comment, we realize that we did a poor job of discussing the previous study demonstrating that MDM2 induces ubiquitination of NICD, but not for degradative purposes (Pettersson et al., 2013). In particular, that study used a form of ubiquitin mutated at lysine 48, i.e., the K48R mutant. In this way, the authors of this seminal study demonstrated that MDM2 was still able to induce ubiquitination of NICD, and hence that this ubiquitination was not degradative.

      Still, to confirm that this is also the case upon DNA damage, we will perform experiments using the same K48R mutant to formally prove that, upon DNA damage, MDM2 does not ubiquitinate NICD via lysine 48-linked polymers, and hence that the ubiquitination is not degradative. Furthermore, following discussion with Laetitia Linares, an author of our study and a long-standing expert in ubiquitination (for instance, see (Riscal et al., 2016) and (Arena et al., 2018)), we will use another ubiquitin mutant, at lysine 63. Lysine 63-linked ubiquitination does not mark proteins for degradation but promotes the association of the targeted protein with DNA, helping DNA repair (Liu et al., 2018). With a ubiquitin mutated at this lysine, i.e., K63R, this type of ubiquitination cannot occur. Given that we observe increased NICD ubiquitination upon DNA damage, the use of K63R will be very informative.

      Hence, we will repeat experiments of current Figure 3A with the same WT ubiquitin as before, and now also with K48R and K63R mutants. Even more, we will also include mutant forms of ubiquitin which can only form ubiquitin chains on lysine 48 (K48 only) or lysine 63 (K63 only) and we anticipate that in the presence of K48 only mutant, NICD will not be ubiquitinated upon DNA damage, while the use of K63 only mutant will be very useful. All these data will be part of the new Figure 3A.

      Of note, Dr Linares has all tools required to perform these experiments and hence we will start them soon.

      Another weakness lies in the unclear role and the underlying mechanism of ATM in the MDM2-mediated NICD stabilization. While the data presented (Fig. 3B, 3C) suggest that carboplatin could elevate MDM2 levels for NICD stabilization, a more precise method to induce MDM2 overexpression specifically for targeting NICD is required. It appears that ATM plays a crucial role in this regulatory process. The following questions must be addressed: Does ATM induce the phosphorylation of MDM2 for its protein stabilization and/or E3 ligase activity?

      [Authors] There are several points here.

      For the first one, the use of a more precise method to induce MDM2 overexpression, it is exactly what we did in Figure 4A, i.e., ectopic expression of MDM2 to demonstrate that MDM2 is sufficient to increase NICD levels.

      For the second one, i.e., the phosphorylation status of MDM2 by ATM in our system, we will perform different experiments. There are up to six residues in MDM2 proposed to be phosphorylated by ATM upon DNA damage: S386, S395, S407, T419, S425, and S429 (Cheng et al., 2011). Among them, S395 is the best known, and again Dr Linares has useful tools that we will use to address this specific point. We will use an MDM2 mutant harboring an aspartate instead of the serine at this position, i.e., S395D, which mimics the serine 395 phosphorylation induced by ATM upon DNA damage. We will use this mutant together with the WT and 464A MDM2 proteins already used, and if this residue is important in our phenotype, total levels of NICD will be even higher and/or NICD will localize more to the nuclei when compared with WT MDM2. All these new data will appear as the new Figure 4A and new Figure 4B.

      Furthermore, we will also use an antibody that recognizes this phosphorylation site by WB after carboplatin treatment and it will be part of the new Figure 3B.

      Finally, we will also express WT MDM2 and purify it by immunoprecipitation in different experimental conditions: steady state, upon carboplatin treatment and also in combination of carboplatin and ATM inhibitor, to perform phospho-proteomics analysis upon all these conditions. Of note, and to show the feasibility of this approach, the proteomic platform at Biocampus in Montpellier has experience using this technique (Kassouf et al., 2019).

      The combination therapy of carboplatin with MDM2 and NICD inhibitors may lack compelling rationale (see below).

      [Authors] This is a very important point but we discuss it below, where more information is provided by the reviewer. Still, we anticipate we will perform a new in vivo experiment to answer to this point.

      In lines 275-276, the authors stated that their preclinical data establish the enhancement of carboplatin's therapeutic effect in NSCLC in vivo through MDM2-NICD axis inhibition. However, it's important to note that this finding remains preliminary at this stage.

      [Authors] We consider that our statement is not exaggerated, but we will tone down the message as proposed by the reviewer in the next submission.

      Minor comments:

      1. The observed loss of NICD during ATMi + carboplatin treatment in Figures 2A and 2B raises the question of whether ATM regulates the gene transcription of NOTCH. In addition to the CHX assay conducted in Figures 2C and 2D, quantifying NOTCH mRNA upon ATM inhibition could provide further insights. Alternatively, referencing relevant studies on this topic may strengthen the discussion.

      [Authors] This is an interesting experiment and we will perform it.

      In Figures 4A and 4B, the noticeable discrepancy between the exogenous expression of wild-type (WT) MDM2 and catalytically inactive MDM2-464A raises concerns. It is essential to consider if the reduced ubiquitination and stability of NICD might be attributed to varying levels of MDM2-464A in the cells rather than its catalytic inactivity. While p53 ubiquitination was utilized as a control, ensuring comparable levels of MDM2 and MDM2-464A expression could enhance the experimental rigor. Compared to the smear poly-ubiquitination bands observed for MDM2 in Figure 4B, the ubiquitination of NICD appears simpler. What distinguishes the feature of MDM2-mediated NICD ubiquitination? Could it potentially involve mono-ubiquitination?

      [Authors] The point of the reviewer is well taken, and importantly, as mentioned above in main point 2, we will repeat these experiments and will appear as new Figure 4A and new Figure 4B.

      Regarding the type of ubiquitination, as explained in detail in major point 1 to same reviewer, we will fully characterize the type of ubiquitination on NICD induced by DNA damage, and we will confirm that MDM2 is required for this specific ubiquitination in future new Figure 4C where we will overexpress the required ubiquitin forms and WT MDM2.

      In Figure 5A, the authors need to consider conducting additional NOTCH-associated factors to definitively demonstrate the activation of NOTCH signaling beyond HES1. Alternatively, in Figure 5B, the NICD Western blot could be complemented by detecting HES1 or other NOTCH-associated factors.

      [Authors] To answer to this particular point, we will test for other downstream targets of Notch as NRARP and it will appear as part of new Figure 5C.

      In Figures 5C and 5D, crucial control groups are missing, specifically mice treated solely with SP141+DBZ, carboplatin+SP141, and SP141+DBZ. It is essential to include these groups to demonstrate that the enhanced tumor killing results from the combination of carboplatin with SP141 and/or DBZ, rather than from SP141 and DBZ alone. Furthermore, in addition to the currently used NSCLC-PDX model harboring the p53 (P151R) mutation, it would be informative to include a NSCLC-PDX model expressing WT p53.

      [Authors] This is a crucial point in this rebuttal as mentioned before in major point 3 and we detail it in here.

      We included only 3 groups because preliminary data indicated that SP141 in combination with carboplatin showed no benefit compared to carboplatin alone, while combining carboplatin with Notch inhibition gave only a slight increase in the therapeutic benefit of carboplatin, and for simplicity we preferred not to show these data. However, after reading this point from Reviewer 1, even if we will later propose only the triple combination for patients, we clearly need to demonstrate that the other combinations are not potent enough, or not potent at all.

      The reviewer asked to include: “SP141+DBZ, carboplatin+SP141, and SP141+DBZ”. We imagine that she/he meant SP141+DBZ, carboplatin+SP141, and carboplatin+DBZ, which together with vehicle, carboplatin and carboplatin+SP141+DBZ make 6 treatment groups. With 8 mice per group devoted to tumor growth and survival, plus 4 mice per group for the acute treatment for IHC and WB purposes (for current Figures 5A and 5B), this makes a total of 72 mice, which is a substantial number. Of note, since we performed the in vivo experiment presented in the current manuscript, a new Notch inhibitor called nirogacestat has come onto the market, being the first-in-class Notch inhibitor to treat patients with solid tumors (desmoid tumors) after demonstrating a significant therapeutic effect in clinical trials (Gounder et al., 2023).
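
      The animal numbers quoted above can be checked with a short calculation. The following Python sketch is only an illustration: the variable names are hypothetical, and the group labels are taken from the paragraph above.

```python
# Treatment groups as enumerated in the response (labels only, no dosing detail).
groups = [
    "vehicle",
    "carboplatin",
    "SP141+DBZ",
    "carboplatin+SP141",
    "carboplatin+DBZ",
    "carboplatin+SP141+DBZ",
]

MICE_GROWTH_AND_SURVIVAL = 8  # per group, tumor growth and survival
MICE_ACUTE_TREATMENT = 4      # per group, acute treatment for IHC and WB

mice_per_group = MICE_GROWTH_AND_SURVIVAL + MICE_ACUTE_TREATMENT
total_mice = mice_per_group * len(groups)

print(f"{len(groups)} groups x {mice_per_group} mice = {total_mice} mice")
```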

      Hence, we will take advantage of the repetition of this experiment to replace DBZ, which is an interesting molecule for preclinical research but has no clinical relevance, with this new molecule. The use of nirogacestat will therefore further increase the medical impact of our data. Importantly, nirogacestat is better tolerated than DBZ, meaning that mice can be treated for longer periods of time, and we propose here to treat for up to 12 weeks. Finally, after discussion with Quentin Thomas, an author of the manuscript and clinical researcher in the lab, we will provide 4 carboplatin cycles, as is currently proposed for NSCLC patients, in an attempt to get closer to the clinical setting. In particular, we will give carboplatin to mice on weeks 1, 4, 7 and 10, while treating with the MDM2 inhibitor (SP141) and the Notch inhibitor (nirogacestat) from Monday to Friday for the 12 weeks.
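
      To make the proposed regimen easier to follow, the following Python sketch lays out the schedule for the triple-combination arm. It is a hypothetical illustration only: the response does not state on which day of the week carboplatin would be given, so Monday is an assumption, and the function and variable names are invented for this example.

```python
CARBOPLATIN_WEEKS = {1, 4, 7, 10}   # four cycles, as stated in the response
TOTAL_WEEKS = 12
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
CARBOPLATIN_DAY = "Mon"             # assumption: dosing day is not specified


def build_schedule():
    """Return a list of (week, day, drugs) for the triple-combination arm."""
    schedule = []
    for week in range(1, TOTAL_WEEKS + 1):
        for day in WEEKDAYS:
            # Both inhibitors are given Monday to Friday throughout.
            drugs = ["SP141", "nirogacestat"]
            if week in CARBOPLATIN_WEEKS and day == CARBOPLATIN_DAY:
                drugs.insert(0, "carboplatin")
            schedule.append((week, day, drugs))
    return schedule


schedule = build_schedule()
carboplatin_doses = sum("carboplatin" in drugs for _, _, drugs in schedule)
inhibitor_days = len(schedule)

print(f"carboplatin cycles: {carboplatin_doses}, inhibitor treatment days: {inhibitor_days}")
```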

      This experiment will be long and will require an important use of resources both human and financial, but we are sure that the effect in tumor growth and survival will be more dramatic than the one presented now.

      On the contrary, and as explained in the 4th subheading of this “revision plan”, including another 72 mice to treat a p53-proficient NSCLC PDX, when we have already demonstrated in vitro that p53 is not required for the phenotype described in this study, is for us totally unfeasible for ethical reasons, i.e., the use of animals in research (please see below for further details).

      All the new data will appear as new Figure 5 (B to E). For new Figure 5A please see below the major comment 2 of Reviewer 2.

      Though beyond the current study's scope, in the discussion section, the authors may want to propose or hypothesize on how MDM2-mediated NICD stabilization contributes to carboplatin resistance. This could provide valuable insights for future research directions.

      [Authors] We will discuss this part as proposed by the reviewer.

      In the Western blot results, the total ATM and ATR controls were absent.

      [Authors] The reviewer is totally right and we will repeat experiments to include all the totals as requested.

      Authors may choose to include a graphical abstract at the end of their study to visually illustrate the mechanisms they have described.

      [Authors] Very good idea thanks, we will do it.

      Reviewer #1 (Significance (Required)):

      Advance: The authors aim to present a novel perspective on the resistance mechanisms to platinum compounds in NSCLC therapy. They explore platinum compounds-induced DNA damage, ATM activation, and MDM2-mediated stabilization of the active form of NOTCH (NICD). However, to strengthen their claims, they must provide more conclusive results.

      Audience: This manuscript will likely engage oncologists who investigate the chemotherapy-resistant mechanisms of platinum compounds in NSCLC treatment, as well as scientists specializing in NOTCH and MDM2 pathways. However, the manuscript's central claims lack robust support from the available data, and the current approaches employed are not sufficiently thoughtful and rigorous; there is room for improvement.

      My expertise is molecular medicine, cancer biology, and epigenetics.

      [Authors] We want to thank again this reviewer for her/his helpful comments that will increase the impact and the relevance of our study while keeping the original message.

      We are also very satisfied when she/he said: “This manuscript will likely engage oncologists who investigate the chemotherapy-resistant mechanisms of platinum compounds in NSCLC treatment”.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Sara Bernardo et al. investigated the molecular mechanisms underlying the activation of the Notch signaling in response to DNA damage induced by platinum-based chemotherapeutic agents in non-small cell lung cancer (NSCLC). They demonstrated that carboplatin treatment induces DNA double-strand breaks (DSBs) and stabilizes NICD, a process dependent on ATM and mediated by MDM2. In vivo experiments in patient-derived xenografts (PDX) showed that inhibition of NICD and MDM2 enhanced platinum effectiveness. Furthermore, clinical analysis revealed a correlation between MDM2 expression and poor prognosis in NSCLC patients treated with platinum compounds, emphasizing the clinical relevance of the MDM2-NICD axis in platinum resistance.

      [Authors] We thank this reviewer for her/his nice synopsis of our study.

      Major comments:

      Overall, the authors have conducted experiments that sufficiently elucidate their claims, and the description of the experiments is detailed. However, there is still room for the improvement.

      [Authors] We are very pleased that the reviewer finds that our experiments “…sufficiently elucidate their claims, and the description of the experiments is detailed.” We are sure that after all the new experiments we are proposing here, she/he will be fully satisfied.

      1. The finding that MDM2 promoted NICD stability through non degradative ubiquitination is interesting and in line with a previous study. As it is also known that NICD is regulated by various post-translational modifications, including ubiquitination that promotes NICD degradation. It is unclear what's the potential difference between these two types of ubiquitination. For example, do these two differ in specific ubiquitination sites? Can the authors provide some discussion?

      [Authors] We agree with the reviewer, and hence we will perform a new set of experiments to determine the role of 2 key lysine residues in the ubiquitin protein, which promote either degradation or DNA binding. As explained in detail in major point 1 from reviewer 1, we will determine whether DNA damage promotes ubiquitination at position 48, i.e., to degrade, or at position 63, i.e., to facilitate binding to DNA for repair upon DNA damage, or at either of these 2 positions. As mentioned above, we will then confirm that MDM2 is responsible for the specific ubiquitination type we uncover. We are sure that the reviewer will be satisfied by these new data once they are generated.

      As for the specific ubiquitination sites in NICD, there are up to 17 lysine residues susceptible to ubiquitination. Hence, we feel that unveiling which residues are targeted by MDM2, and whether they differ from those that induce degradation, such as those targeted by the E3 ligase FBXW7, is beyond the scope of the current manuscript. Still, we will discuss all this as kindly proposed by the reviewer.

      Could the overexpression of MDM2 or NICD lead to carboplatin resistance in A549 or H358 cells?

      [Authors] This is a very interesting experiment and prompted by the reviewer’s comment we started the subcloning of inducible NICD into lentiviral vectors to generate stable cells and test the carboplatin sensitivity in presence of different levels of NICD. These new data will be the new Figure 5A.

      The trends observed in the western blot data within the manuscript appear inconsistent. While the authors propose that NICD levels increased upon incubation with carboplatin, the discrepancy arises when considering the NICD levels without cycloheximide (CHX) treatment in Figure 1E, where no significant elevation is observed (Lane 6 vs. Lane 1).

      [Authors] The point of the reviewer is well taken. Please bear in mind that here we are handling several signaling pathways that interact with each other while each having different kinetics. Our finding of increased NICD upon carboplatin treatment is highly consistent in vitro and in vivo, but it is true that it is not obvious in the experiment mentioned by the reviewer, probably due to some kinetic issue. We are repeating this experiment to show the increase in NICD upon carboplatin as seen in the rest of the manuscript (up to 9 times in the main figures alone).

      The quality of western blots needs to be improved, especially Fig. 1C and S1C, also Figure 3B. Moreover, the NICD western blot sometimes appears as one band and sometimes as two bands. Please provide an explanation. If possible, please quantify the bands in western blots.

      [Authors] We agree with the reviewer that not all WBs have the same quality, and we will repeat some of them to homogenize the quality throughout the manuscript; in particular, we will repeat the ones kindly pointed out by the reviewer.

      We also noticed the two bands, and we will pay attention to this while reproducing the WBs, since it might be related to discrepancies in the percentage of acrylamide. If this is not the case, i.e., if upon repetition we still observe two bands in some conditions and not in others, we will provide an explanation in the new submission as kindly proposed by the reviewer.

      Finally, and also as proposed by the reviewer we will quantify the WB bands.

      Please provide a necessary discussion on whether the targeted treatment approach towards the MDM2-NICD axis is applicable to all patients or only to those with high expression of MDM2/NICD.

      [Authors] In the discussion of the current manuscript, we focused on the subset of patients with high MDM2 expression for this issue, but in the next submission we will extend this to patients with high levels of NICD as well.

      How to interpret the significance of the simultaneous increase in NICD ubiquitination and stability mediated by MDM2? Please provide a relevant discussion.

      [Authors] We will provide strong experimental data to go beyond discussion (please see above the experiments with ubiquitin mutants), but we will also provide discussion of this particular point.

      In Figure 5B, please also check the level of MDM2. In Figure 5C, carboplatin appears to have little impact on tumor growth. How to explain the increase of Ki-67 in the carboplatin treatment group in Figure 5A?

      [Authors] We will measure also levels of MDM2 in the future new Figure 5C as requested by the reviewer.

      As for the interesting observation regarding Ki67, since we will repeat the whole experiment, we will pay special attention to whether it is reproduced. Should this be the case, we will elaborate an explanation.

      Minor comments:

      1. Please include scale bars in Figure 1B and Supplemental Figure 1B.

      [Authors] We thank the reviewer for this comment. We will include the scale bars where required.

      2. Figure 5D, the P values of the survival curve should be indicated in the figures.

      [Authors] We will include the P values in the future new Figure 5E.

      3. The presentation of survival curve data in Figures 5D and 6A should be consistent.

      [Authors] The point of the reviewer is well taken and we will use Prism to draw the PFS for patients in Figure 6A as we did for the mice in current Figure 5D.

      4. It seems that supplemental figure 2 is missing.

      [Authors] We actually jumped from supplemental figure 1 to 3 because we do not have any associated supplemental figure to main Figure 2. We will clarify this point in the next submission.

      5. Please carefully check the spelling of the entire text, for example, on page 20, line 426 it should be 'western'. Also, please spell out the abbreviations DDR and ATM.

      [Authors] We will double check all spelling and provide the abbreviations kindly suggested by the reviewer.

      6. The abbreviation for Cleaved caspase 3 should be CC3.

      [Authors] We thank the reviewer for this information, we will use CC3 in the next submission.

      Reviewer #2 (Significance (Required)):

      Notch signaling is associated with the occurrence and development of non-small cell lung cancer (NSCLC). Previous study indicates that the expression of Notch protein is significantly higher in NSCLC tissues compared to normal tissues (PMID: 31170211). Additionally, the upregulation of Notch1 is correlated with higher tumor grades, lymph node metastasis, tumor-node-metastasis (TNM) staging, and poor prognosis (PMID: 25996086). Abnormal activation of Notch signaling pathway is frequently observed in chemotherapy-resistant NSCLC, and some studies have aimed to address NSCLC drug resistance via modulating Notch signaling (PMID: 30087852, 38301911). This manuscript firstly proposes that MDM2-mediated stabilization of NICD upon DNA damage plays a major role in NSCLC response to platinum chemotherapy. It further suggests that targeting the MDM2-NICD axis could prove to be an effective therapeutic strategy. Overall, this work unveils a novel mechanism for Notch activation in response to platinum chemotherapy, providing a renewed outlook on overcoming chemotherapy resistance in NSCLC. This manuscript will attract those interested in the mechanisms of chemotherapy resistance and novel treatment approaches.

      [Authors] We sincerely thank the reviewer for finding that our “…work unveils a novel mechanism for Notch activation in response to platinum chemotherapy, providing a renewed outlook on overcoming chemotherapy resistance in NSCLC”. We are also very satisfied when she/he says: “This manuscript will attract those interested in the mechanisms of chemotherapy resistance and novel treatment approaches.”

      Finally, we are convinced that the reviewer will appreciate all the new proposed experimental data, and also that upon finishing all experiments, she/he will think that the manuscript will be suitable for publication.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      For simplicity, we decided to introduce all changes in next submission upon conclusion of all experimental approaches proposed above.

      4. Description of analyses that authors prefer not to carry out

      While we will perform almost all experiments proposed by reviewers, there is one we feel is not possible to do due to ethical reasons. Reviewer 1 wanted us to perform a new in vivo experiment with the same PDX using up to 6 treatment groups. We use 8 mice per condition (for tumor growth and survival) plus 4 for the “acute” treatment for WB and IHC purposes, hence 12 mice x 6 groups = 72 mice, and we will perform this experiment as indicated above and proposed by the reviewer.

      By contrast, the reviewer also asked us to repeat the same experiment with a p53-proficient PDX. While we understand the possible interest, we have demonstrated in vitro that p53 is not required for the protective phenotype of MDM2 and Notch upon DNA damage. We therefore honestly believe that using another 72 mice to confirm this aspect in vivo goes against the rational use of animals in research and against the 3Rs rule. Hence, we will not perform this experiment unless the Editors believe it is strictly required.

      REFERENCES

      Arena, G., Cisse, M. Y., Pyrdziak, S., Chatre, L., Riscal, R., Fuentes, M., Arnold, J. J., Kastner, M., Gayte, L., Bertrand-Gaday, C., et al. (2018). Mitochondrial MDM2 Regulates Respiratory Complex I Activity Independently of p53. Mol Cell 69, 594-609 e598.

      Cheng, Q., Cross, B., Li, B., Chen, L., Li, Z., and Chen, J. (2011). Regulation of MDM2 E3 ligase activity by phosphorylation after DNA damage. Mol Cell Biol 31, 4951-4963.

      Gounder, M., Ratan, R., Alcindor, T., Schoffski, P., van der Graaf, W. T., Wilky, B. A., Riedel, R. F., Lim, A., Smith, L. M., Moody, S., et al. (2023). Nirogacestat, a gamma-Secretase Inhibitor for Desmoid Tumors. N Engl J Med 388, 898-912.

      Kassouf, T., Larive, R. M., Morel, A., Urbach, S., Bettache, N., Marcial Medina, M. C., Merezegue, F., Freiss, G., Peter, M., Boissiere-Michot, F., et al. (2019). The Syk Kinase Promotes Mammary Epithelial Integrity and Inhibits Breast Cancer Invasion by Stabilizing the E-Cadherin/Catenin Complex. Cancers (Basel) 11.

      Liu, P., Gan, W., Su, S., Hauenstein, A. V., Fu, T. M., Brasher, B., Schwerdtfeger, C., Liang, A. C., Xu, M., and Wei, W. (2018). K63-linked polyubiquitin chains bind to DNA to facilitate DNA damage repair. Sci Signal 11.

      Pettersson, S., Sczaniecka, M., McLaren, L., Russell, F., Gladstone, K., Hupp, T., and Wallace, M. (2013). Non-degradative ubiquitination of the Notch1 receptor by the E3 ligase MDM2 activates the Notch signalling pathway. Biochem J 450, 523-536.

      Riscal, R., Schrepfer, E., Arena, G., Cisse, M. Y., Bellvert, F., Heuillet, M., Rambow, F., Bonneil, E., Sabourdy, F., Vincent, C., et al. (2016). Chromatin-Bound MDM2 Regulates Serine Metabolism and Redox Homeostasis Independently of p53. Mol Cell 62, 890-902.

    1. Reviewer #2 (Public Review):

      Summary:

      Turning behavior plays a crucial role in animal exploration and escape responses, regardless of the presence or absence of environmental cues. These turns can be broadly categorized into two categories: strong reorientations, characterized by sudden changes in path directionality, and smooth turns, which involve gradual changes in the direction of motion, leading to sinuosity and looping patterns. One of the key model animals to study these behaviors is the nematode Caenorhabditis elegans, in which the role of strong reorientations has been thoroughly studied. Despite their impact on trajectories, smooth turns have received less attention and remain poorly understood. This study addresses this gap in the literature, by studying the interplay between smooth turns and strong reorientations in nematodes moving in a uniform environment, surrounded by an aversive barrier. The authors use this set-up to study both exploration behavior (when the worm is far from the aversive barrier) and avoidance behavior (when the worm senses the aversive barrier). The main claims of the paper are that (1) during exploratory behavior, the parameters governing strong reorientations are optimized to compensate for the effect of smooth turns, increasing exploration efficiency, and (2) during avoidance, strong reorientations are biased towards the side that maximizes escape success. To support these two claims, the paper presents a detailed quantitative characterization of the statistics of smooth turns and strong reorientations. These results offer insights that may interest a diverse audience, including those in movement ecology, animal search behavior, and the study of Caenorhabditis elegans. In our opinion, the experimental work and data analysis are of the highest quality, resulting in a very clean characterization of C. elegans' turning behavior. However, the experimental design and data analyses presented are not fully aligned with some of the central conclusions drawn, and in particular, we believe that further work is needed to fully support the claim that strong reorientations are optimized to increase exploration efficiency.

      Strengths:

      The authors have addressed important questions in movement ecology through hypothesis-driven experiments. The choice of C. elegans as a model organism to investigate the impact of turning dynamics on escape and exploration is well-justified by its limited repertoire of strong reorientation behaviors and consistent turning bias across strains and individuals. The quality of the experimental data is very high, using state-of-the-art techniques, and a set-up where a robust and reproducible avoidance response can be studied. The data analysis benefits from state-of-the-art techniques and a deep understanding of C. elegans' behavior, resulting in a very clean and very clear set of results. We particularly appreciated the use of a ventral/dorsal reference system (rather than a left/right one), which is more natural and insightful. As a result, the paper presents one of the best characterizations of C. elegans sharp turning behavior published to date. We find that the claim that strong reorientations are chosen in a way that optimizes avoidance behavior is solid and well-supported. The manuscript is well-written and maintains a coherent line of reasoning throughout.

      Weaknesses:

      Our primary concerns revolve around the significance and rigor of the research on exploratory behavior. First, we believe that the experimental arena was too small for accurately observing the unfolding of exploration. The movement of assayed animals was clearly impaired by boundary effects, which obscured key elements of C. elegans exploratory behavior such as the mean square displacement or large-scale trajectory structures emerging from curvature bias. Second, we think that the proof that strong reorientations are optimized to maximize exploration performance is too indirect: it relies on a particular model with some unrealistic assumptions and lacks a quantification of the gains provided by the optimization to the individuals. We believe that a more thorough and direct analysis would be needed to fully support the claim.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Responses to recommendations

      Reviewer #1 (Recommendations For The Authors):

      Describe more precisely how gene expression graphs are built (tissues, reads counts). For example, how were read counts normalized? Were they from DESeq2 data, which only works by comparing two samples? If so, all samples should be independently compared to a reference and the normalized expression value of the reference will change from sample to sample... thus introducing a pure technical artifact.

      We have added additional information about the normalisation method to the Material and Methods section (lines 597-598: “Lastly, expression levels shown in figures 2-5 are normalised gene counts produced by DESeq2.”) and to the figure legends (lines 247, 286, 372, 404: “Gene expression data was generated from whole fish. Expression levels were derived from DESeq2 normalised gene counts.”) to address this recommendation.

      DESeq2 provides a reference-independent normalisation through a median of ratios method (a good explanation can be found here: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html). The normalised expression values are independent of any reference and therefore will not change from sample to sample as suggested in this comment. In contrast, pairwise comparisons are made when testing for significantly differentially expressed genes between two treatments using a Wald test, which is done against a reference and generates log2 fold changes and p-values; this is a separate step from the normalisation described above.
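      To illustrate the median of ratios idea for readers unfamiliar with it, here is a minimal Python sketch. It is a simplified illustration and not the DESeq2 implementation: the function names and the toy count matrix are hypothetical, and genes with a zero count in any sample are simply skipped when estimating size factors.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample with the median of ratios idea.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Use only genes with non-zero counts in every sample (log(0) is undefined).
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Per-gene log geometric mean across all samples: a pseudo-reference
    # built from the whole data set, not from any single chosen sample.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise(counts):
    """Divide each raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
print(size_factors(toy_counts))
print(normalise(toy_counts))
```

      Because the pseudo-reference is computed from all samples at once, no individual sample acts as the reference, which is the property referred to above.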

      Provide bioinformatics workflows and, if possible, the set of parameters used, the computing resources, etc. Were some assembly finishing steps carried out (by long-range PCR?) and experimental validations (especially for allele-specific transcripts, by conventional RT-PCR based on diagnostic mutations)?

      We have added additional information on the bioinformatics workflows where required, including parameters used (Lines 530, 536, 549-551, and 574-583.). No finishing steps other than HiC scaffolding were performed. No allele-specific analysis was done as part of this manuscript.

      To further improve transparency, we have also uploaded all the scripts used for this study to https://github.com/R-Huerlimann/Malabar_grouper_genome and the gene models and functional annotation to https://figshare.com/projects/Malabar_grouper_Epinephelus_malabaricus_genome_annotation/199909. This information has been added to the manuscript in lines 600-601 and 609-611.

      Reviewer #3 (Recommendations For The Authors):

      General author response:

      All the recommendations of this reviewer are very relevant and would certainly provide a lot of information, but they would constitute a full project in themselves, as they would require establishing this grouper species as an experimental model in our lab. Currently we only have access to the larval and juvenile stages via a collaboration with the Okinawa Prefectural Sea Farming Center, which is an hour's drive from our lab, and this access is limited to the grouper spawning season. To do everything that is suggested, we would need regular and easy access to the fish. This would require establishing this model in our marine station, which is not possible due to space and time constraints. These groupers grow to a very large size (1-2 m in length, and up to 150 kg in weight) and only mature into males after > 6 years.

      First and foremost, I would advise the authors to extend their TH and cortisol levels measurements to the entire developmental time considered in their analysis.

      For the reasons stated above we could not perform these experiments. We must emphasize that the data regarding TH are available for a closely related species (e.g., Epinephelus coioides, de Jesus et al. 1998) and there is no reason to think that the situation will be drastically different in E. malabaricus. In addition, given that we have now studied several coral reef fish species in the same context (clownfish, surgeonfish, damselfish, gobies) we observed that the transcriptomic data are more robust, more sensitive, and more precise than hormone measurements. 

      Consider carrying out in situ hybridisation of TSH with putative CRH receptors to determine if thyrotrophin could be competent to respond to HPA axis signals.

      We agree that studying the interplay between corticoids and thyroid hormones at the neuroendocrine level would be desirable, and we fully agree with the experiment suggested by the reviewer, but this is impossible in our current situation. We are not working with an established animal model like zebrafish or Xenopus, but with a large, long-lived marine fish that reproduces in spawning aggregations and whose husbandry is notoriously difficult.

      Consider conducting cortisol treatment experiments to functionally determine if indeed cortisol is involved in grouper metamorphosis.

      We tried to do TH and cortisol treatments specifically on the early larval stages corresponding to the early TH peak to see how this would impact the development of the fin spines, but our trials were unsuccessful. The larvae at that stage are extremely fragile and even putting them into small volumes of treatment drugs induced massive mortalities. Again, this would mean establishing this grouper species as a model organism and would require a massive effort to improve larval rearing as discussed above. We feel that our data stands on its own in the meantime and adds valuable information to the existing literature by studying a rarely investigated species.

      Responses to comments

      Reviewer #1 (Public Review):

      Weaknesses:

      The manuscript needs proper editing and is not complete. Some wordings lack precision and make it difficult to follow (e.g. line 98 "we assembled a chromosome-scale genome of ..." should read instead "we assembled a chromosome-scale genome sequence of ..."). Also, panel Figure 2E is missing.

      We made the suggested change of adding “sequence” in lines 32 and 121. Concerning additional changes, we have carefully edited our manuscript and looked for any incomplete sections. Unfortunately, it is difficult to see what other issues are being raised here without any further information. 

      As for panel E of figure 2, it is not missing. The panel is located to the right, just below “Target Cells”.

      The shortcomings of the manuscripts are not limited to the writing style, and important technical and technological information is missing or not clear enough, thereby preventing a proper evaluation of the resolution of the genomic resources provided:

      Several RNASeq libraries from different tissues have been built to help annotate the genome and identify transcribed regions. This is fine. But all along the manuscript, gene expression changes are summarized into a single panel where it is not clear at all which tissue this comes from (whole embryo or a specific tissue ?), or whether it is a cumulative expression level computed across several tissues (and how it was computed) etc. This is essential information needed for data interpretation.

      No fertilised eggs or embryos have been sequenced. The individual tissues derived from juvenile fish were used for the genome annotation only, using ISOseq. The whole larval fish were used for the developmental analysis using RNAseq, as well as the genome annotation. We have added additional information in the figures and text that the results shown are from whole larvae, and added more detail to the material and methods section about which type of sample was analysed in which way.

      Specifically, we have added “Lastly, expression levels shown in figures 2-5 are normalised gene counts produced by DESeq2.” to lines 597-598 in the Material and Methods section, “Gene expression data was generated from whole larvae.” to line 191, and “Gene expression data was generated from whole fish. Expression levels were derived from DESeq2 normalised gene counts.” to the figure legends (lines 247, 286, 372, 404). Additionally, we have added clarifications in lines 489, 497, 530, and 536.

      The bioinformatic processing, especially of the assembly and annotation, is very poorly described. This is also a sensitive topic, as illustrated by the numerous "assemblathon" and "annotathon" initiatives to evaluate tools and workflows. Importantly, providing configuration files and in-depth description of workflows and parameter settings is highly recommended. This can be made available through data store services and documents even benefit from DOIs. This provides others with more information to evaluate the resolution of this work. No doubt that it is well done, but especially in the field of genome assembly and annotation, high resolution is VERY cost and time-intensive. Not surprisingly, most projects are conditioned by trade-offs between cost, time, and labor. The authors should provide others with the information needed to evaluate this.

      We have added additional information on parameters used in the genome assembly, annotation and transcriptome analysis in lines 549-551, 577, 579, 580, and 582. Additionally, we have uploaded all scripts to github as outlined in the Code and Data Availability section (lines 599-614).

      The genome assembly did not use a specific workflow (e.g., nextflow), but was done with a simple command and standard parameters in IPA. Scaffolding was carried out by Phase Genomics using their standardised proprietary workflow, of which a detailed description provided by Phase Genomics can be found in the supplementary material.

      Quantifications of T3 and T4 levels look fairly low and not so convincing. The work would clearly benefit from a discussion about why the signal is so low and what are the current technological limitations of these quantifications. This would really help (general) readers.

      The T3/T4 levels are consistent with other published work in fish. In the present manuscript for grouper we have a peak level of 1.2 ng/g (1,200 pg/g) of T4 and 0.06 ng/g (60 pg/g) of T3. This is a higher level of T4 and comparable level of T3 to what was found in convict tang (Holzer et al. 2017; Figure 2) with 30 pg/g of T4 and 100 pg/g of T3. Of course, there are also examples with higher levels, such as clownfish (Roux et al. 2023; Figure 1), with 10 ng/g (10,000 pg/g) of T4 and 2 ng/g (2,000 pg/g) of T3.

      The differences could be due to differences in the structure of fish tissues and therefore in hormone extraction efficiency, in hormone measurement protocols, in fish physiology, or in fish size (e.g., weighing tiny grouper larvae is difficult and less precise than weighing convict tang). What is important is not the absolute level but the relative level, which shows the change across larval stages of a species measured with identical extraction and measurement protocols. Our data are therefore internally consistent and coherent with the grouper literature.

      Holzer, Guillaume, et al. "Fish larval recruitment to reefs is a thyroid hormone-mediated metamorphosis sensitive to the pesticide chlorpyrifos." Elife 6 (2017): e27595.

      Roux, Natacha, et al. "The multi-level regulation of clownfish metamorphosis by thyroid hormones." Cell Reports 42.7 (2023).

      Differential analysis highlights up to ~ 15,000 differentially expressed genes (DEG), out of a predicted 26k genes. This corresponds to more than half of all genes. ANOVA-based differential analysis relies on the simple fact that only a minority of genes are DEG. Having >50% DEG is well beyond the validity of the method. This should be addressed, or at least discussed.

      The large number of differentially expressed genes arises because this is a larval developmental transcriptome, spanning one-day-old larvae to fully metamorphosed juveniles at around day 60.

      While DESeq2 does assume that most genes are not differentially expressed, this assumption affects normalisation but not hypothesis testing (Wald test, LRT, or ANOVA). Moreover, normalisation in DESeq2 is fairly robust to departures from this assumption. According to the author of DESeq2, Michael Love, DESeq2 uses the median ratio for normalisation, and as long as the numbers of up- and down-regulated genes are relatively even, DESeq2 will be able to handle the data. As part of our general quality control for this project we consulted the MA plots, which do not show any over-representation of up- or down-regulation. Additionally, see Michael Love's comment on comparing different tissues, which is also applicable here when comparing vastly different larval stages (https://support.bioconductor.org/p/63630/):

      “For experiments where all genes increase in expression across conditions, the median ratio method will not be able to capture this difference, but this is typically not the case for a tissue comparison, as there are many "housekeeping" genes with relatively similar expression pattern across tissues.”
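      The behaviour described in this quote can be reproduced with a small Python sketch. This is a toy illustration under stated assumptions, not our analysis code and not DESeq2 itself: the counts are hypothetical, only two samples are compared, and the helper name is ours.

```python
import math
from statistics import median


def two_sample_size_factors(sample_a, sample_b):
    """Median of ratios size factors for two samples (all counts must be > 0)."""
    log_ratios_a, log_ratios_b = [], []
    for a, b in zip(sample_a, sample_b):
        # Log geometric mean of this gene over the two samples.
        log_ref = (math.log(a) + math.log(b)) / 2
        log_ratios_a.append(math.log(a) - log_ref)
        log_ratios_b.append(math.log(b) - log_ref)
    return math.exp(median(log_ratios_a)), math.exp(median(log_ratios_b))


# Hypothetical early-stage sample: ten genes, 100 reads each.
early = [100] * 10

# Scenario 1: 60% of genes change, evenly split between 4-fold up and 4-fold down.
balanced = [400] * 3 + [25] * 3 + [100] * 4
sf_balanced = two_sample_size_factors(early, balanced)

# Scenario 2: every gene doubles, so the median ratio absorbs the change.
all_up = [200] * 10
sf_all_up = two_sample_size_factors(early, all_up)

print("balanced changes:", sf_balanced)
print("global increase:", sf_all_up)
```

      In the first scenario the two size factors stay equal even though more than half of the genes change, because the median ratio is set by the balance of up- and down-regulation. In the second scenario the size factors differ two-fold, so the global increase would be normalised away, which is the limitation the quote describes.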

      Reviewer #3 (Public Review):

      Weaknesses:

      However, the authors make substantial considerations that are not proven by experimental or functional data. In fact, this is a descriptive study that does not provide any functional evidence to support the claims made.

      We agree with the reviewer that our paper lacks functional experiments but despite that, the transcriptomic data clearly show the activation of TH and corticoid pathways during two distinct periods: an early activation between D1 and D10, and a second one between D32 and juvenile stage. These data are interesting as they call for further examination of 1) the existence of an early larval developmental step also involving TH and corticosteroids and 2) the possible interaction of corticoids and TH during metamorphosis. This is a question that is certainly not settled yet in teleost fishes and which is of great interest.

      Point 1) is of particular interest and importance, since this early activation (unique, to our knowledge, among teleost fishes studied so far) raises many new questions and will certainly be scrutinised by other groups in the years to come, therefore ensuring a good citation impact of this study. We hope that the reviewer, while disagreeing with some of our statements, will recognize that our study will be stimulating at that level and that this is what scientific studies should do.

      We acknowledge the descriptive nature of the data and the lack of functional experiments in the Discussion in lines 443 to 445: “This may suggest that in some aspect, cortisol synthesis could work in concert with TH, as has been shown in several different contexts in amphibians, but functional experiments need to be conducted to confirm this hypothesis.” As stated above, doing such functional experiments would require establishing the grouper as an experimental model in our husbandry, which currently is not possible due to the large size of the adult fish.

      The consideration that cortisol is involved in metamorphosis in teleosts has never been shown, and the only example cited by the authors (REF 20) clearly states that cortisol alone does not induce flatfish metamorphosis. In that work, the authors clearly state that in vivo cortisol treatment had no synergistic effect with TH in inducing metamorphosis. Moreover, in Senegalensis, the sole pre-otic CRH neuron number decreases during metamorphosis, further arguing that, at least in flatfish, cortisol is not involved in flatfish metamorphosis (PMID: 25575457).  

      We will do our best to improve the clarity of the revised manuscript to avoid any misunderstanding about our claims. However, we would like to point out the semantic shift in the reviewer's first sentence: “being involved” is not the same as “cortisol alone does not induce”. In ref 20 the authors explicitly wrote that “Cortisol further enhanced the effects of both T4 and T3, but was ineffective in the absence of thyroid hormones”, and in our view this indeed corresponds to “being involved in metamorphosis”.

      We are not claiming that cortisol alone is involved in metamorphosis as the reviewer suggests, but simply that there is a possible involvement of cortisol together with TH in metamorphosis. We stand by this claim, as we indeed observed an activation of corticoid pathway genes around D32, which is sufficient to say it is involved. We do agree that functional experiments will be needed to properly demonstrate the involvement of corticoids in grouper metamorphosis, but this was not possible in the current study, as it would require setting up a full grouper life cycle in lab conditions, which is beyond the scope of this manuscript.

      We also mentioned in the discussion that the role of corticoids in fish larval development is still debated, and we agree that this remains a contentious issue. We have clarified the Discussion on this point (lines 375-376, lines 439-464).

      We wrote that “There is contrasting evidence of communication between these two pathways during teleost fish larval development with some data suggesting a synergic and other an antagonistic relationship. In terms of synergy, an increase in cortisol level concomitantly with an increase in TH levels has been observed in flatfish [26], golden sea bream [64] and silver sea bream [65]. Cortisol was also shown to enhance in vitro the action of TH on fin ray resorption (phenomenon occurring during flatfish metamorphosis) in flounder[27]. It has also been shown that cortisol regulates local T3 bioavailability in the juvenile sole via regulation of deiodinase 2 in an organ-specific manner [66]. On the antagonistic side, it has been shown that experimentally induced hyperthyroidism in common carp decreases cortisol levels[67], whereas cortisol exposure decreases TH levels in European eel [68]. Given this scattered evidence, the existence of a crosstalk active during teleost larval development and metamorphosis has never been formally demonstrated. The results we obtained in grouper are clearly indicating that HPI axis is activated during both early development and metamorphosis and that cortisol synthesis is activated during early development. This may suggest that in some aspect, cortisol synthesis could work in concert with TH, as has been shown in several different contexts in amphibians [25], but functional experiments need to be conducted to confirm this hypothesis.” In the revised manuscript, we have also added the interesting case of the Senegal sole mentioned by the reviewer.

      In the last revision, we had also added that our results “brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy” meaning that we clearly acknowledge that we are only revealing a hypothesis that remains to be tested. We later follow up with a discussion about the most novel observation and focus of our study, the increase in THs and cortisol during early development, which was unexpected and very intriguing. Again, these results suggest that there might be a link between the two, as has been shown in amphibians. This is typically the kind of results that should encourage more investigations into other fish species. Indeed, this has been pointed out by other authors and in particular by Bob Denver (probably the foremost expert on this topic) in Crespi and Denver 2012: “Elevation in HPA/I axis activity has been described prior to Metamorphosis in amphibians and fish, birth in mammals (reviewed in Crespi & Denver 2005a; Wada 2008)”. B. Denver also adds that: “Experiments in which GCs were elevated prior to metamorphosis or prior to hatching or birth (e.g. Weiss, Johnston & Moore 2007) or inhibited by treatments with GC synthesis blockers (e.g. metyrapone) or receptor antagonists (e.g. RU486, Glennemeir & Denver 2002) demonstrate that GCs play a causal role in precipitating these life-history transitions (also reviewed in Crespi & Denver 2005a; Wada 2008).” We believe the reviewer will be convinced by these elements coming from a colleague unanimously respected in the field. 

      Furthermore, the authors need to recognise that the transcriptomic analysis is whole-body and that HPA axis genes are upregulated, which does not mean they are involved in regulating the HPT axis. The authors do not show that in thyrotrophs, any CRH receptor is expressed or in any other HPT axis-relevant cells and that changes in these genes correlate with changes in TSH expression. An in-situ hybridisation experiment showing co-expression on thyrotrophs of HPA genes and TSH could be a good start. However, the best scenario would be conducting cortisol treatment experiments to see if this hormone affects grouper metamorphosis.

      We agree that functional experiments are needed to validate our hypothesis. As the early peaks of expression levels observed for many genes were very intriguing for us, we did carry out thyroid hormones and goitrogenic treatment on young grouper larvae to test their effect on the morphological changes. Unfortunately, such experiments, already tricky on metamorphosing larvae, are even more risky on such tiny individuals just after hatching and we encountered high mortality rates. We must add that because we cannot establish a full grouper life cycle under lab conditions, we have done these experiments in the context of a commercial husbandry system in Japan, which while excellent limits the scope of possible experiments. We were thus not able to provide functional validation of our hypothesis. Such experiments will be a full project in itself, requiring setting up a rearing system suitable for both larval survival and economical constraints related to drug treatments. We were further limited by the spawning times of the grouper in the operational aquaculture farm, which are limited to a short time during each year. So even if we strongly agree with the necessity of conducting such experiments, we think that this is not in the scope of the present paper, but something future research can explore.

      High TSH and Tg levels usually parallel whole-body TH levels during teleost metamorphosis. However, in this study, high Tg expression levels are only achieved at the juvenile stage, whereas high TSH is achieved at D32, and at the juvenile stage, they are already at their lowest levels.

      This is exactly our point. We observe two peaks in TSH expression, one at D3 and one at D32. The peak at D3 coincides with high thyroid hormone levels on the same day, and while we have not measured TH at D32, existing literature shows that there is a peak in TH during that time (e.g., de Jesus et al., 1998). Similarly, there is a small peak of Tg at D3. Our manuscript focused more on the upregulation of these genes at D3, which has not been reported before in the literature and raised the question of the role of TH so early in the larval development, outside of the metamorphosis period. 

      Regarding the respective levels of TSH and Tg, we would first like to add that their respective order of appearance before metamorphosis (TSH at D32, Tg after) is consistent with what we would expect. We agree, however, that the strong increase of Tg and TPO expression is later than expected. Therefore, we have added the following sentence in lines 212 to 216: “The respective order of appearance of TSH and Tg (TSH at D32, Tg after) is consistent with what we would expect but a bit later than expected given the morphological transformation. It would be interesting to revisit this in a future series of experiments, with tighter temporal sampling to study how gene expression and morphological transformation aligned.”

      It is very difficult to conclude anything with the TH and cortisol levels measurements. The authors only measured up until D10, whereas they argue that metamorphosis occurs at D32. In this way, these measurements could be more helpful if they focus on the correct developmental time. The data is irrelevant to their hypothesis.

      We respectfully disagree with the reviewer, considering that 1) TH levels have already been investigated in groupers and coincide with pigmentation changes and fin ray resorption (Figure 4 in de Jesus et al., 1998), 2) there is also evidence in numerous fish species that the increase in TH levels is concomitant with increased expression of TH-related genes, and 3) we observed in our data an increase in the expression of TH-related genes as well as pigmentation changes and fin ray resorption. Based on our experience in fish metamorphosis and the literature, we can say confidently that those observations indicate that metamorphosis is occurring between D32 and the juvenile stage. This clearly shows that our inference is correct. Additionally, we would like to reemphasize that, in our experience with several fish species, transcriptomic data are more robust and precise than hormone measurements.

      However, we were surprised by the activation of TH and corticoid pathway genes very early in larval development (at D3), which is clearly outside the metamorphosis period. We therefore decided to measure TH and cortisol levels during this period to determine whether this surprising early activation indeed corresponded to an increase in both TH and cortisol. As such an observation has never been made in other teleost species (to our knowledge), and as we were wondering whether gene activation was accompanied by a hormonal increase, the measurements we made for TH and cortisol between D1 and D10 are relevant. To clarify our message further, we have changed some of the mentions of “metamorphosis” to “larval development” throughout the manuscript and made other improvements to avoid any confusion between the two periods we are studying: early larval development (between D1 and D10) and metamorphosis (between D32 and juvenile stage).

      Moreover, as stated in the previous review, a classical sign of teleost metamorphosis is the upregulation of TSHb and Tg, which does not occur at D32; therefore, it is very hard for me to accept that this is the metamorphic stage. With the lack of TH measurements, I cannot agree with the authors. I think this has to be toned down and made clear in the manuscript that D32 might be a putative metamorphic climax but that several aspects of biology work against it. Moreover, at D10, the authors show the highest cortisol level and lowest T4 and T3 levels. These observations are irreconcilable with cortisol enhancing or participating in TH-driven metamorphosis.

      We thank the reviewer for this comment, but we think that there might be a misunderstanding here. 

      (1) We clearly observed an increase of TSHb (occurring between D18 and the juvenile stage) and an increase of tg from D32, which coincide with the activation of other genes involved in the TH pathway (dio2, dio3, and also a strong increase of TRb). All this, put in the context of what we know from previous grouper studies, clearly supports our conclusion that TH-regulated metamorphosis starts at around D32 in grouper. We also observed morphological changes such as fin ray resorption and pigmentation changes between D32 and the juvenile stage. Such morphological changes have already been associated with metamorphosis in groupers (De Jesus et al 1998), as they occur during the TH level increase and are under the control of TH in grouper (De Jesus et al 1998). Based on this study, and on studies conducted on many other teleost species showing that the increase of TH levels is always associated with an activation of TH pathway genes and with morphological and pigmentation changes, we concluded that metamorphosis of E. malabaricus occurs between D32 and the juvenile stage. We have improved the clarity of the manuscript in several places to make sure that our conclusion is based on our transcriptomic and morphological data plus the available literature.

      (2) We clearly observed another activation of TH-related genes earlier in development (between D1 and D10, with a surge of trhrs, tg and tpo at D3). As this activation was very unexpected for us, we decided to focus the analysis of TH levels between D1 and D10 and, very interestingly, we observed a high level of T4 at D3, indicating that THs are instrumental very early in the larval development of the malabar grouper, which has never been shown before. We stated in lines 224-225 that our “data reinforce the existence of two distinct periods of TH signalling activity, one early on at D3 and one late corresponding to classic metamorphosis at D32”. We agree that we could have been clearer and explained that this early activation was very intriguing for us and that we wanted to investigate hormonal levels around that period. However, we never claimed anywhere in the manuscript that this early developmental period corresponds to metamorphosis. Something else is occurring and both TH and cortisol seem to be involved, but further experiments need to be conducted to understand their role and their possible interaction. We have added corresponding statements in the abstract (lines 39-43) and discussion (lines 447 to 449).

      (3) Finally, regarding the comment about cortisol enhancing or participating in TH-driven metamorphosis, our data clearly showed an activation of the corticoid pathway genes around metamorphosis (between D32 and juvenile stage), suggesting a potential implication of corticoids in metamorphosis, but we agree with the reviewer that further experiments are needed to test that. We never claimed that cortisol was enhancing or participating in metamorphosis; on the contrary, we are “suggesting a possible interaction between TH and corticoid pathway during metamorphosis”. We also say that our “results brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy.” Nonetheless, we agree that some parts of our manuscript can be confusing with regard to cortisol synthesis during metamorphosis, as we did not measure cortisol levels between D32 and juvenile stage. We have therefore made changes throughout the Introduction and Discussion to make this clearer.

      Given this, the authors should quantify whole-body TH levels throughout the entire developmental window considered to determine where the peak is observed and how it correlates with the other hormonal genes/systems in the analysis.

      We did not measure TH levels at later stages as they have already been measured during Epinephelus coioides metamorphosis, and the morphological changes observed in this species around the TH peak correspond to what we observed in Epinephelus malabaricus around the peak of expression of TH pathway genes (see De Jesus et al., 1998 General and Comparative Endocrinology, 112:10-16). The main focus of this manuscript is the novel observation of an early activation period at D3, for which we needed TH levels to determine whether they were involved in another early developmental process (not related to metamorphosis). Our hypothesis is that this early activation might be related to the growth of fin rays necessary to enhance floatability during the oceanic larval dispersal. As we may have arrived at the explanation of this hypothesis too rapidly without setting up the context well enough, we have made changes to the introduction and discussion.

      Even though this is a solid technical paper and the data obtained is excellent, the conclusions drawn by the authors are not supported by their data, and at least hormonal levels should be present in parallel to the transcriptomic data. Furthermore, toning down some affirmations or even considering the different hypotheses available that are different from the ones suggested would be very positive.

      We thank the reviewer for acknowledging the solidity of the method of our paper and the quality of the results. We agree that there were several parts where our message was unclear. We have addressed these points in the revised version of the manuscript to make sure there is no more confusion between the two distinct periods we studied in this paper (early larval development and metamorphosis). We also made sure that our claims about TH/corticoid interaction during both periods remain hypothetical, as we cannot yet, despite trials, support them with functional experiments.

    1. Author response:

      eLife assessment

      This study offers a useful treatment of how the population of excitatory and inhibitory neurons integrates principles of energy efficiency in their coding strategies. The analysis provides a comprehensive characterisation of the model, highlighting the structured connectivity between excitatory and inhibitory neurons. However, the manuscript provides an incomplete motivation for parameter choices. Furthermore, the work is insufficiently contextualized within the literature, and some of the findings appear overlapping and incremental given previous work.

      We thank the Reviewers and the Reviewing Editor for taking the time to provide extremely valuable suggestions and comments, which will help us to substantially improve our paper. In what follows we summarize our current plan to improve the paper by taking up their suggestions.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, leading to the experimentally observed phenomenon of feature competition. They also characterise the impact of various (hyper)parameters, such as adaptation timescale, ratio of excitatory to inhibitory cells, regularisation strength, and background current. These results add useful biological realism to a particular model of efficient coding. However, not all claims seem fully supported by the evidence. Specifically, several biological features, such as the ratio of excitatory to inhibitory neurons, which the authors claim to explain through efficient coding, might be contingent on arbitrary modelling choices. In addition, earlier work has already established the importance of structured connectivity for feature competition. A clearer presentation of modelling choices, limitations, and prior work could improve the manuscript.

      Thanks for these insights and for this summary of our work.

      Major comments:

      (1) Much is made of the 4:1 ratio between excitatory and inhibitory neurons, which the authors claim to explain through efficient coding. I see two issues with this conclusion: (i) The 4:1 ratio is specific to rodents; humans have an approximate 2:1 ratio (see Fang & Xia et al., Science 2022 and references therein); (ii) the optimal ratio in the model depends on a seemingly arbitrary choice of hyperparameters, particularly the weighting of encoding error versus metabolic cost. This second concern applies to several other results, including the strength of inhibitory versus excitatory synapses. While the model can, therefore, be made consistent with biological data, this requires auxiliary assumptions.

      We will better describe the ratio of numbers of E and I neurons found in real data, as suggested. The first submission already contained an analysis of how this ratio of neuron numbers depends on the weighting of the loss of E and I neurons and on the relative weighting of the encoding error vs the metabolic cost in the loss function (see Fig 6E). We will make sure that these results are suitably expanded and better emphasized in revision. We will also include new analyses of the dependence of optimal parameters on the relative weighting of encoding error vs metabolic cost in the loss function when studying other parameters (namely: noise intensity, metabolic constant, ratio of mean I-I to E-I connectivity, time constants of single E and I neurons).

      (2) A growing body of evidence supports the importance of structured E-I and I-E connectivity for feature selectivity and response to perturbations. For example, this is a major conclusion from the Oldenburg paper (reference 62 in the manuscript), which includes extensive modelling work. Similar conclusions can be found in work from Znamenskiy and colleagues (experiments and spiking network model; bioRxiv 2018, Neuron 2023 (ref. 82)), Sadeh & Clopath (rate network; eLife, 2020), and Mackwood et al. (rate network with plasticity; eLife, 2021). The current manuscript adds to this evidence by showing that (a particular implementation of) efficient coding in spiking networks leads to structured connectivity. The fact that this structured connectivity then explains perturbation responses is, in the light of earlier findings, not new.

      We agree that the main contribution of our manuscript in this respect is to show how efficient coding in spiking networks can lead to structured connectivity similar to that proposed in the above papers. We apologize if this was not clear enough in the previous version. We will make it clearer in revision. We nevertheless think it useful to report the effects of perturbations within this network because the structure derived in our network is not identical to those studied in the above papers, and because these results give information about how lateral inhibition works in this network. Thus, we will keep presenting it in the revised version, although we will de-emphasize and simplify its presentation to give more emphasis to the novelty of the derivation of this connectivity rule from the principles of efficient coding.

      (3) The model's limitations are hard to discern, being relegated to the manuscript's last and rather equivocal paragraph. For instance, the lack of recurrent excitation, crucial in neural dynamics and computation, likely influences the results: neuronal time constants must be as large as the target readout (Figure 4), presumably because the network cannot integrate the signal without recurrent excitation. However, this and other results are not presented in tandem with relevant caveats.

      We will improve the Limitations paragraph in Discussion, and also anticipate caveats in tandem with results when needed, as suggested.

      (4) On repeated occasions, results from the model are referred to as predictions claimed to match the data. A prediction is a statement about what will happen in the future - but most of the "predictions" from the model are actually findings that broadly match earlier experimental results, making them "postdictions".

      This distinction is important: compared to postdictions, predictions are a much stronger test because they are falsifiable. This is especially relevant given (my impression) that key parameters of the model were tweaked to match the data.

      We will better distinguish between pre- and post-dictions in revision.

      Reviewer #2 (Public Review):

      Summary: In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength. It thus argues that some of these observations may come as a direct consequence of efficient coding.

      Strengths:

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models.

      In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important.

      Though several of the observations have been reported and studied before (see below), this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Thanks for these insights and for the kind words of appreciation of the strengths of our work.

      Weaknesses:

      Though the text of the paper may suggest otherwise, many of the modeling choices and observations found in the paper have been introduced in previous work on efficient spiking models, thereby making this work somewhat repetitive and incremental at times. This includes the derivation of the network into separate excitatory and inhibitory populations, discussion of physical units, comparison of voltage versus spike-timing correlations, and instantaneous E/I balance, all of which can be found in one of the first efficient spiking network papers (Boerlin et al. 2013), as well as in subsequent papers. Metabolic cost and slow adaptation currents were also presented in a previous study (Gutierrez & Deneve 2019). Though it is perfectly fine and reasonable to build upon these previous studies, the language of the text gives them insufficient credit.

      We will improve the text to make sure that credit to previous studies is more precisely and more clearly given.

      Furthermore, the paper makes several claims of optimality that are not convincing enough, as they are only verified by a limited parameter sweep of single parameters at a time, are unintuitive and may be in conflict with previous findings of efficient spiking networks. This includes the following. Coding error (RMSE) has a minimum at intermediate metabolic cost (Figure 5B), despite the fact that intuitively, zero metabolic cost would indicate that the network is solely minimizing coding error and that previous work has suggested that additional costs bias the output. Coding error also appears to have a minimum at intermediate values of the ratio of E to I neurons (effectively the number of I neurons) and the number of encoded variables (Figures 6D, 7B). These both have to do with the redundancy in the network (number of neurons for each encoded variable), and previous work suggests that networks can code for arbitrary numbers of variables provided the redundancy is high enough (e.g., Calaim et al. 2022). Lastly, the performance of the E-I variant of the network is shown to be better than that of a single cell type (1CT: Figure 7C, D). Given that the E-I network is performing a similar computation as to the 1CT model but with more neurons (i.e., instead of an E neuron directly providing lateral inhibition to its neighbor, it goes through an interneuron), this is unintuitive and again not supported by previous work. These may be valid emergent properties of the E-I spiking network derived here, but their presentation and description are not sufficient to determine this.

      We are addressing this issue in two ways. First, we will present results of joint sweeps of pairs of parameters whose joint variations are expected to influence optimality in a way that cannot be understood by varying one parameter at a time. Namely, we plan to vary jointly the noise intensity and the metabolic constant, as well as the ratio of E to I neuron numbers and the ratio of mean I-I to E-I connectivity. Second, we will identify a reasonable/realistic range of possible variations of each individual parameter and then perform a Monte Carlo search for the optimal point within this range, and compare the results so obtained with the understanding gained from varying one or two parameters at a time. We will also add the suggested citation to Calaim et al. 2022 in regard to the points discussed above.
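
      As a rough illustration of the two strategies described above, the following Python sketch runs a joint two-parameter grid sweep and a uniform Monte Carlo search over a bounded range. The function `toy_loss`, the parameter names `noise` and `cost`, the grids and the bounds are all hypothetical placeholders chosen for illustration; they are not the network loss or the parameter ranges of the manuscript.

```python
import itertools
import random


def toy_loss(noise, cost):
    """Hypothetical stand-in for a network loss; NOT the model from the paper."""
    # The cross term makes the best `noise` depend on `cost`, an interaction
    # that sweeping one parameter at a time cannot reveal.
    dn, dc = noise - 1.0, cost - 2.0
    return dn ** 2 + dc ** 2 + 0.8 * dn * dc


def grid_sweep(noise_values, cost_values):
    """Evaluate the loss on every (noise, cost) pair and return the best pair."""
    pairs = itertools.product(noise_values, cost_values)
    return min(pairs, key=lambda p: toy_loss(*p))


def monte_carlo_search(bounds, n_samples, seed=0):
    """Draw uniform samples inside `bounds` and return the best sample."""
    rng = random.Random(seed)
    samples = [
        tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        for _ in range(n_samples)
    ]
    return min(samples, key=lambda p: toy_loss(*p))


# Assumed (illustrative) ranges for the two swept parameters.
noise_grid = [0.25 * i for i in range(13)]  # 0.0 to 3.0
cost_grid = [0.25 * i for i in range(17)]   # 0.0 to 4.0

best_grid = grid_sweep(noise_grid, cost_grid)
best_mc = monte_carlo_search([(0.0, 3.0), (0.0, 4.0)], n_samples=5000)

print("grid optimum:", best_grid, "loss:", toy_loss(*best_grid))
print("Monte Carlo optimum:", best_mc, "loss:", toy_loss(*best_mc))
```

      Comparing the two optima gives a simple consistency check: if the random search finds a point well outside the region suggested by the low-dimensional sweeps, the sweeps are probably missing an interaction between parameters.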

      We will improve the comparison between the Excitatory-Inhibitory and the 1-Cell-Type model (see reply to the suggestions of Referee 3 for more details).

      Alternatively, the methodology of the model suggests that ad hoc modeling choices may be playing a role. For example, an arbitrary weighting of coding error and metabolic cost of 0.7 to 0.3, respectively, is chosen without mention of how this affects the results. Furthermore, the scaling of synaptic weights appears to be controlled separately for each connection type in the network (Table 1), despite the fact that some of these quantities are likely linked in the optimal network derivation. Finally, the optimal threshold and metabolic constants are an order of magnitude larger than the synaptic weights (Table 1). All of these considerations suggest one of the following two possibilities. One, the model has a substantial number of unconstrained parameters to tune, in which case more parameter sweeps would be necessary to definitively make claims of optimality. Or two, parameters are being decoupled from those constrained by the optimal derivation, and the optima simply corresponds to the values that should come out of the derivation.

      In the previously submitted manuscript we presented both the encoding error and the metabolic cost separately as a function of the parameters, so that readers could get an understanding of how stable optimal parameters would be to the change of the relative weighting of encoding error and metabolic cost. We will improve this work by adding the suggested calculations to provide quantitative measures of the dependence of the optimal network parameters and configurations on this relative weighting.
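
      The dependence of an optimum on the relative weighting can be sketched as follows. This minimal Python example assumes a combined loss of the form g·(encoding error) + (1 − g)·(metabolic cost); the two curves below are invented for illustration (error falling and cost rising with a generic swept parameter) and are not results from the manuscript. The weights 0.3, 0.5 and 0.7 are arbitrary test values, with 0.7 matching the error weight mentioned by the reviewer.

```python
def weighted_loss(error, cost, g):
    """Combine encoding error and metabolic cost with weight g on the error."""
    return g * error + (1.0 - g) * cost


def optimal_index(errors, costs, g):
    """Index of the swept parameter value that minimizes the weighted loss."""
    losses = [weighted_loss(e, c, g) for e, c in zip(errors, costs)]
    return min(range(len(losses)), key=losses.__getitem__)


# Hypothetical sweep: error decreases and cost increases with the parameter.
params = [0.5 * i for i in range(1, 21)]  # 0.5 to 10.0
errors = [1.0 / p for p in params]
costs = [0.1 * p for p in params]

# Optimal parameter value for several error weights.
optima = {g: params[optimal_index(errors, costs, g)] for g in (0.3, 0.5, 0.7)}

for g, p in optima.items():
    print(f"weight on error g={g}: optimal parameter = {p}")
```

      With these toy curves, the optimal parameter shifts upward as more weight is placed on the encoding error, which is the kind of sensitivity a reader would want to see quantified.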

      Reviewer #3 (Public Review):

      Summary: In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work?

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs.
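
      The rule that a neuron spikes only if doing so lowers the loss can be sketched in a few lines. This is a deliberately simplified, single-population illustration without leak, noise, or separate E and I losses, and it assumes a linear per-spike cost `beta`; the decoders, signal and cost value are hypothetical. With loss ‖x − x̂‖² + β·(number of spikes), a spike of a neuron with decoding vector w lowers the loss exactly when w·(x − x̂) > (‖w‖² + β)/2, which follows from expanding ‖x − x̂ − w‖².

```python
def loss(x, x_hat, n_spikes, beta):
    """Squared coding error plus an assumed linear cost per spike."""
    err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return err + beta * n_spikes


def should_spike(x, x_hat, w, beta):
    """Threshold form of 'spiking lowers the loss'."""
    projection = sum(wi * (a - b) for wi, a, b in zip(w, x, x_hat))
    threshold = 0.5 * (sum(wi * wi for wi in w) + beta)
    return projection > threshold


def run_greedy(x, decoders, beta, max_sweeps=50):
    """Let neurons spike one at a time until no spike improves the loss."""
    x_hat = [0.0] * len(x)
    n_spikes = 0
    history = [loss(x, x_hat, n_spikes, beta)]
    for _ in range(max_sweeps):
        fired = False
        for w in decoders:
            if should_spike(x, x_hat, w, beta):
                x_hat = [b + wi for b, wi in zip(x_hat, w)]
                n_spikes += 1
                history.append(loss(x, x_hat, n_spikes, beta))
                fired = True
        if not fired:
            break
    return x_hat, n_spikes, history


# Hypothetical two-dimensional signal and two neurons with axis-aligned decoders.
signal = [1.0, 0.5]
decoders = [[0.5, 0.0], [0.0, 0.5]]
estimate, spike_count, loss_history = run_greedy(signal, decoders, beta=0.05)

print("estimate:", estimate, "spikes:", spike_count)
print("loss after each spike:", loss_history)
```

      In the full model described in the summary, each population has its own loss and the threshold condition is expressed through membrane potentials; the sketch only shows why a greedy spike rule can never increase the loss.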

      They then investigate in-depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and showing the networks can operate in a biologically realistic regime.

      Strengths:

      (1) The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field.

      (2) They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly.

      (3) They put sensible constraints on their networks, while still maintaining the good properties these networks should have.

      Thanks for this summary and for these kind words of appreciation of the strengths of our work.

      Weaknesses:

      (1) The paper has somewhat overstated the significance of their theoretical contributions, and should make much clearer what aspects of the derivations are novel. Large parts were done in very similar ways in previous papers. Specifically: the split into E and I neurons was also done in Boerlin et al (2008) and in Barrett et al (2016). Defining the networks in terms of realistic units was already done by Boerlin et al (2008). It would also be worth it to discuss Barrett et al (2016) specifically more, as there they also use split E/I networks and perform biologically relevant experiments.

      We will improve the text to make sure that credit to previous studies is more precisely and more clearly given.

      (2) It is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. While the constraints of Dale's law are sensible (splitting the population in E and I neurons, and removing any non-Dalian connection), they are imposed from biology and not from any coding principles. A discussion of how this could be done would be much appreciated, and in the main text, this should be made clear.

      We indeed removed non-Dalian connections because having only connections respecting Dale’s law is a major constraint for biological plausibility. Our logic was to consider efficient coding within the space of networks that satisfy this (and other) biological plausibility constraints. We did not intend to claim that removing the non-Dalian connections was the result of an analytical optimization. However, to get better insights into how Dale’s Law constrains or influences the design of efficient networks, we added a comparison of the coding properties of networks that either do or do not satisfy Dale’s law. We apologize if this was not sufficiently clear in the previous version and we will clarify this in revision. 

      (3) Related to the previous point, the claim that the network with split E and I neurons has a lower average loss than a 1 cell-type (1-CT) network seems incorrect to me. Only the E population coding error should be compared to the 1-CT network loss, or the sum of the E and I populations (not their average). In my author recommendations, I go more in-depth on this point.

      We will perform the suggested detailed comparisons between the network loss in the 1CT-model and E-I model and then revise or refine conclusions if and as needed, according to the results we will obtain.

      (4) While the paper is supposed to bring the balanced spiking networks they consider in a more experimentally relevant context, for experimental audiences I don't think it is easy to follow how the model works, and I recommend reworking both the main text and methods to improve on that aspect.

      We will try to make the presentation of the model more accessible to a non-computational audience.

      Assessment and context: Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporating aspects of energy efficiency. For computational neuroscientists, this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers, the model provides a clearer link between efficient coding spiking networks to known experimental constraints and provides a few predictions.

      Thanks for these kind words. We will make sure that these points emerge more clearly and in a more accessible way from the revised paper.

    1. https://web.archive.org/web/20240725080148/https://fossacademic.tech/2024/02/11/Move-Slowy-Preview.html [[Move Slowly and Build Bridges by Robert Gehl]] is a forthcoming book on 'Mastodon, the Fediverse, and the Struggle for Ethical Social Media'. This post gives summaries per chapter of the draft. Ch 1 focuses on Xodus after Musk only. Odd, there are many examples where costs of leaving socmed platforms played a role, which may well be more informative than just n=1. Ch 2 on AP as protocol. Ch 3 CoC as a social layer on networked tech (no regard here it seems for the fact that human networks exist outside of tech and span multiple tech platforms simultaneously, and themselves have social norms that guide behaviour regardless of whether codified in CoC or expressed in federation choices). Ch 4 on blocking and defederation as a needed safety tool. Socially I think the default might need to be the other way around, federating is the choice, defed the default, as it is how we do it socially irl. We are not unwelcoming to newcomers in a group but we are wary. Ch 5. Who pays for the fediverse infra. Short answer is we all do/many of us do. I pay for my own instance, and also contribute hours to the governance of the largest Dutch instance. Good point about people forgetting there are other bizz models for digital media than what centralised adtech kraken do. Ch 6 on eco impact of socmed, and need of awareness of what running this stuff costs ecologically. Seems to then pivot to how degrowth and solarpunk people are using fediverse tech to interact, which is not the same thing. (It says mitigate, but compared to what, X?) Ch 7. Threads, or the corp reaction to a growing fediverse. Conclusion, this is where the ethics will be discussed finally.

      Forthcoming w Oxford Univ Press. Not sure this is for me, reads like a snapshot with a limited time window in which it might be informative. Perhaps of interest for [[Stichting ActivityClub Bestuur Hoofdnote]].

    1. Reviewer #1 (Public Review):

      Summary:

      Boldt et al test several possible relationships between transdiagnostically-defined compulsivity and cognitive offloading in a large online sample. To do so, they develop a new and useful cognitive task to jointly estimate biases in confidence and reminder-setting. In doing so, they find that over-confidence is related to less utilization of reminder-setting, which partially mediates the negative relationship between compulsivity and reminder-setting. The paper thus establishes that, contrary to the over-use of checking behaviors in patients with OCD, greater levels of transdiagnostically-defined compulsivity predict less deployment of cognitive offloading. The authors offer speculative reasons as to why (perhaps it's perfectionism in less clinically-severe presentations that lowers the cost of expending memory resources), and set an agenda to understand the divergence in cognition between clinical and nonclinical samples. Because only a partial mediation had robust evidence, multiple effects may be at play, whereby compulsivity impacts cognitive offloading via overconfidence and also by other causal pathways.

      Strengths:

      The study develops an easy-to-implement task to jointly measure confidence and replicates several major findings on confidence and cognitive offloading. The study uses a useful measure of cognitive offloading - the tendency to set reminders to augment accuracy in the presence of experimentally manipulated costs. Moreover, the study utilizes multiple measures of presumed biases - overall tendency to set reminders, the empirically estimated indifference point at which people engage reminders, and a bias measure that compares optimal indifference points to engage reminders relative to the empirically-observed indifference points. That the study observes convergence along all these measures strengthens the inferences made relating compulsivity to the under-use of reminder-setting. Lastly, the study does find evidence for one of several a priori hypotheses and sets a compelling agenda to try to explain why such a finding diverges from an ostensible opposing finding in clinical OCD samples and the over-use of cognitive offloading.

      Weaknesses:

      Although I think this design and study are very helpful for the field, I felt that a feature of the design might reduce the task's sensitivity to measuring dispositional tendencies to engage cognitive offloading. In particular, the design introduces prediction errors (PEs) that could induce learning and interfere with natural tendencies to deploy reminder-setting behavior. These PEs concern whether a selected strategy will or will not be allowed to be engaged. We know individuals with compulsivity can learn even when instructed not to learn (e.g., Sharp, Dolan, and Eldar, 2021, Psychological Medicine), and that, more generally, they have trouble with structure knowledge (e.g., Seow et al.; Fradkin et al.), and thus might be sensitive to these PEs. Thus, a dispositional tendency to set reminders might be differentially impacted for those with compulsivity after a negative prediction error (NPE), where they want to set a reminder but aren't allowed to. After such an NPE, they may be more inclined to avoid setting reminders. Those with compulsivity likely have superstitious beliefs about how checking behaviors lead to a resolution of catastrophes, which might in part originate from inferring structure in the presence of noise or from purely irrelevant sources of information for a given decision problem.

      It would be good to know whether such learning effects exist, whether they're modulated by PE (you can imagine PEs are higher if you are more incentivized to use reminders, e.g., 9 points as opposed to only 3 points, and you are told you cannot use them), and whether this learning effect confounds the relationship between compulsivity and reminder-setting.

      A more subtle point: I think this study is better described as an exploration than as a deductive test of a particular model -> hypothesis -> experiment. Typically, when we test a hypothesis, we contrast it with competing models. Here, the tests were two-sided because multiple models with mutually exclusive predictions (over-use or under-use of reminders) were tested. Moreover, it's unclear exactly how to make sense of what is called the direct mechanism, which is supported by partial (as opposed to complete) mediation.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2024-02394

      Corresponding author(s): Altman, Brian J

      1. General Statements [optional]

      We thank all three Reviewers for their insightful and helpful feedback and suggestions. We strongly believe that addressing these comments has now resulted in a much-improved manuscript. We appreciate that the Reviewers found the manuscript "interesting" with "valuable insights and... obvious novelty", "an important study that is well-done", and "an important understanding of the crosstalk between cancer cells and immune cells as well as the understanding of how the TME disrupts circadian rhythms". All three Reviewers requested a significant revision, which we provide here. We carefully and completely responded to each Reviewer question or suggestion, in most cases with new experiments and text, and in a very few cases with changes or additions to the Discussion section. This includes new data in seven of the original Figures and Supplementary Figures, and one new main Figure and three new Supplementary Figures. Highlights of these new data include testing the role of low pH in cancer cell supernatant on macrophage rhythms, and analysis of single-cell RNA-sequencing data for heterogeneity in macrophage circadian gene expression. Additional experiments were also performed that were not included in the manuscript, and these data are presented in this Response. A detailed point-by-point response to each comment is included below with excerpts of the data and updated text for the reviewers. Please note that the PDF version of this Response includes images of the new Figures inserted into the manuscript.

      2. Point-by-point description of the revisions

      Reviewer #1

      Evidence, reproducibility and clarity

      The manuscript by Knudsen-Clark et al. investigates the novel topic of circadian rhythms in macrophages and their role in tumorigenesis. The authors explore how circadian rhythms of macrophages may be influenced by the tumor microenvironment (TME). They utilize a system of bone marrow-derived macrophages obtained from transgenic mice carrying PER2-Luciferase (PER2-Luc), a trackable marker of rhythmic activity. The study evaluates how conditions associated with the TME, such as polarizing stimuli (to M1 or M2 subtype), acidic pH, and elevated lactate, can each alter circadian rhythms in macrophages. The authors employ several approaches to explore macrophage functions in cancer-related settings. While the manuscript presents interesting findings and may be the first to demonstrate that tumor stimuli alter circadian rhythms in macrophages and impact tumor growth, it lacks a clear conclusion regarding the role of altered circadian rhythms in suppressing tumor growth. Several discrepancies need to be addressed before publication, therefore, the manuscript requires revision before publication, addressing the following comments:

      We thank Reviewer #1 for the comments regarding the quality of our work and are pleased that the Reviewer finds that this manuscript "presents interesting findings and may be the first to demonstrate that tumor stimuli alter circadian rhythms in macrophages and impact tumor growth". We have addressed all comments and critiques from Reviewer #1 below. To summarize, we added new data on how different macrophage polarization states affect media pH (Supplementary Figure 4), further characterized gene expression in our distinct macrophage populations (Supplementary Figure 1), provided clarity in the data and text on the universal nature of Clock Correlation Distance (CCD) across macrophage populations (Figure 6), included human tumor-associated macrophage (TAM) data for CCD (Figure 7), analyzed single-cell RNA-sequencing data of TAMs to demonstrate heterogeneity in circadian gene expression (Figure 9), and used tumor-conditioned media to show that low pH still affects macrophage rhythms in this context (Supplementary Figure 5). Thanks to the helpful suggestions of the Reviewer, we also made numerous clarifications and fixed a critical referencing error that the Reviewer identified.

      Major comments: 1. It is well known that pro-inflammatory macrophages primarily rely on glycolysis during inflammation, exhibiting dysregulated tricarboxylic acid (TCA) cycle activity. These pro-inflammatory macrophages are commonly referred to as 'M1' or pro-inflammatory, as noted in the manuscript. In contrast, M2 macrophages, or pro-resolution macrophages, are highly dependent on active mitochondrial respiration and oxidative phosphorylation (OXPHOS). Given that M1 macrophages favor glycolysis, they create an acidic environment due to elevated lactate levels and other acidifying metabolites. However, the study does not address this effect. The authors' hypothesis revolves around the acidic environment created by glycolytic tumors, yet they overlook the self-induced acidification of media when culturing M1 macrophages. This raises the question of how the authors explain the reduced circadian rhythms observed in pro-inflammatory macrophages in their study, while low pH and higher lactate levels enhance the amplitude of circadian rhythms. I would encourage the authors to incorporate the glycolytic activity of pro-inflammatory macrophages into their experimental setup. Otherwise, the data look contradictory and, to some extent, misleading.

      We appreciate the important point Reviewer #1 made that macrophages polarized toward a pro-inflammatory phenotype such as those stimulated with IFNγ and LPS (M1 macrophages) prioritize metabolic pathways that enhance glycolytic flux, resulting in increased release of protons and lactate as waste products from the glycolysis pathway. In this way, polarization of macrophages toward the pro-inflammatory phenotype can lead to acidification of the media, which may influence our observations given that we are studying the effect of extracellular pH on rhythms in macrophages. To address this point, we have performed additional experiments in which we measured pH of the media to capture changes in media pH that occur during the time in which we observe changes in rhythms of pro-inflammatory macrophages.

      In line with the documented enhanced glycolytic activity of pro-inflammatory macrophages, the media of pro-inflammatory macrophages is acidified over time, in contrast to media of unstimulated or pro-resolution macrophages. Notably, while pH decreased over time in the pro-inflammatory group, the pH differential between the pH 7.4, pH 6.8, and pH 6.5 sample groups was maintained over the period in which we observe and measure changes in circadian rhythms of pro-inflammatory macrophages. Additionally, media that began at pH 7.4 was acidified only to pH 7 by day 2, above the acidic pH of 6.8 or 6.5. As a result, there remained a difference in pH between the two groups (pH 7.4 and pH 6.5) out to 2 days, consistent with the changes in rhythms that we observe between these two groups. This indicates that the difference in circadian rhythms observed in pro-inflammatory macrophages cultured at pH 7.4 compared to pH 6.5 was indeed due to the difference in extracellular pH between the two conditions. We have incorporated these data, shown below, into Supplementary Figure 4 and added the following discussion of these data to the Results section:

      "In line with their documented enhanced glycolytic capacity, pro-inflammatory macrophages acidified the media over time (Supplementary Figure 4C). Notably, while the pH of the media in which the pro-inflammatory macrophages were cultured decreased over time, the pH differential between the pH 7.4, pH 6.8, and pH 6.5 sample groups of pro-inflammatory macrophages was maintained out to 2 days, consistent with the changes in rhythms that we observe and measure between these groups."

      The article examines the role of circadian rhythms in tumor-associated macrophages, yet it lacks sufficient compelling data to support this assertion. Two figures, Figure 7 and Figure 9, are presented in relation to cancer. In Figure 7, gene expression analysis of Arg1 (an M2 marker) and Crem (a potential circadian clock gene) is conducted in wild-type macrophages, BMAL1-knockout macrophages with dysregulated circadian rhythms, and using publicly available data on tumor-associated macrophages from a study referenced as 83. However, it is noted that this referenced study is actually a review article by Geeraerts et al. (2017) titled "Macrophage Metabolism as Therapeutic Target for Cancer, Atherosclerosis, and Obesity" published in Frontiers in Immunology. This raises concerns about the reliability of the results. Furthermore, comparing peritoneal macrophages from healthy mice with macrophages isolated from lung tumors is deemed inaccurate. It is suggested that lung macrophages from healthy mice and those from mice with lung tumors should be isolated separately for a more appropriate comparison. Consequently, Figure 7B is further questioned regarding how the authors could compare genes from the circadian rhythm pathway between these non-identical groups. As a result, the conclusion drawn from these data, suggesting that tumor-associated macrophages exhibit a gene expression pattern similar to BMAL1-KO macrophages, is deemed incorrect, affecting the interpretation of the data presented in Figure 8.

      We thank Reviewer #1 for pointing out our error in the reference provided as the source of the TAM data used for CCD in Figure 7. While we took care to provide the GEO ID for the data set (GSE188549) in the Methods section, we mistakenly cited Geeraerts (2017) Front Immunol when we should have cited Geeraerts (2021) Cell Rep. We have corrected this citation error in the main text.

      We also appreciate Reviewer #1's concern that we are comparing circadian gene expression of peritoneal macrophages to tumor-associated macrophages derived from LLC tumors, which are grown ectopically in the flank for the experiment from which the data set was produced. To ensure an accurate comparison of gene expression, we downloaded the raw FASTQ files from each dataset and processed them in identical pipelines. Our main comparison between these cell types is Clock Correlation Distance (CCD), which compares the pattern of co-expression of circadian genes (Shilts et al PeerJ 2018). CCD was built from multiple mouse and human tissues to be a "universal" tool to compare circadian rhythms, and designed to compare between different tissues and cell types. Each sample is compared to a reference control built from these multiple tissues. To better convey this concept to readers and give confidence in the suitability of CCD for comparing data sets across different tissues, we have added the reference control to Figure 7 (now Figure 6B). We have also expanded our analysis to include bone marrow-derived macrophages, to further demonstrate that the organization of clock gene co-expression is not specific to peritoneal macrophages; we have added these data to Figure 7 (now Figure 6C,D). Finally, we have included an abbreviated explanation of the points made above in the Results section.
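
      As a rough illustration of the idea behind CCD, and not the pipeline used for the analyses in this Response, the sketch below computes a gene-by-gene Spearman correlation matrix for a set of clock genes and takes the Euclidean distance between its upper triangle and that of a reference matrix. The function names, the four-gene toy data, and the toy reference are all hypothetical; the published tool uses a fixed reference built from multiple mouse and human tissues, which is not reproduced here.

```python
import numpy as np

def spearman_matrix(expr):
    """Gene-by-gene Spearman correlation for a samples x genes array.

    Assumption of this sketch: ties are not averaged, which is adequate
    for continuous expression values without exact duplicates.
    """
    ranks = np.argsort(np.argsort(expr, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def clock_correlation_distance(ref_corr, expr):
    """Euclidean distance between the upper triangles of the reference
    correlation matrix and the correlation matrix computed from expr."""
    corr = spearman_matrix(expr)
    upper = np.triu_indices_from(corr, k=1)
    return float(np.sqrt(np.sum((corr[upper] - ref_corr[upper]) ** 2)))

# Toy data: four hypothetical "clock genes" with fixed phase relationships,
# sampled at random times of day (no time labels are needed for CCD).
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, size=40)
phases = [0.0, np.pi / 2, np.pi, 3 * np.pi / 2]
ordered = np.column_stack([np.cos(t - p) for p in phases])
ordered += rng.normal(scale=0.05, size=ordered.shape)

# Toy reference: here simply the ordered population's own correlations.
ref_corr = spearman_matrix(ordered)

# "Disordered" population: each gene shuffled independently across samples,
# which destroys the co-expression structure but keeps each gene's values.
disordered = np.column_stack([rng.permutation(col) for col in ordered.T])

ccd_ordered = clock_correlation_distance(ref_corr, ordered)
ccd_disordered = clock_correlation_distance(ref_corr, disordered)
print(ccd_ordered, ccd_disordered)
```

      In this toy setting the ordered population has zero distance to its own reference, while the shuffled population has a larger distance, which is the sense in which a higher CCD indicates a more disordered clock.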

      Due to the universal nature of the CCD tool, we disagree with Reviewer #1's assertion that "the conclusion drawn from these data, suggesting that tumor-associated macrophages exhibit a gene expression pattern similar to BMAL1-KO macrophages, is deemed incorrect". Indeed, this finding mirrors findings in the original CCD paper, which showed that tumor tissues universally exhibit a disordered molecular clock as compared to normal tissue. Notably, the original CCD paper also compared across cell and tumor types.

      As an additional note to the review, we would like to clarify that nowhere in the manuscript do we propose that Crem is a potential circadian clock gene. We are clear throughout the manuscript that we are using Crem as a previously established biomarker for acidic pH-sensing in macrophages. Please see below for the modified Figure and text.

      "To understand the status of the circadian clock in TAMs, we performed clock correlation distance (CCD) analysis. This analysis has previously been used to assess functionality of the circadian clock in whole tumor and in normal tissue[102]. As the circadian clock is comprised of a series of transcription/translation feedback loops, gene expression is highly organized in a functional, intact clock, with core clock genes existing in levels relative to each other irrespective of the time of day. In a synchronized population of cells, this ordered relationship is maintained at the population level, which can be visualized in a heatmap. CCD is designed to compare circadian clock gene co-expression patterns between different tissues and cell types. To accomplish this, CCD was built using datasets from multiple different healthy tissues from mouse and human to be a universal tool to compare circadian rhythms. Each sample is compared to a reference control built from these multiple tissues (Figure 6B)[102]. To validate the use of this analysis for assessing circadian disorder in macrophages, we performed CCD analysis using publicly available RNA-sequencing data from bone marrow-derived macrophages and wild type peritoneal macrophages, as a healthy control for functional rhythms in a synchronized cell population, and BMAL1 KO peritoneal macrophages, as a positive control for circadian disorder[44]."

      And in the Discussion:

      "Interestingly, analysis of TAMs by clock correlation distance (CCD) presents evidence that rhythms are disordered in bulk TAMs compared to other macrophage populations (Figure 6). CCD is one of the most practical tools currently available to assess circadian rhythms due to its ability to assess rhythms independent of time of day and without the need for a circadian time series, which is often not available in publicly available data from mice and humans[102]."

      If the authors aim to draw a clear conclusion regarding the circadian rhythms of tumor-associated macrophages (TAMs), they may need to analyze single-sorted macrophages from tumors and corresponding healthy tissues. Such data are publicly available (of course not in #83)

      We agree with Reviewer #1 that while our interpretation of the data is that there may be heterogeneity in circadian rhythms of tumor-associated macrophages, we cannot prove this without assessing circadian rhythms at the single cell level. While single-cell RNA-sequencing data of freshly isolated tumor associated macrophages of sufficient read depth for circadian gene expression analysis has historically been unavailable, fortunately a dataset was released recently (May 2024) which we were able to use to address this point. We have analyzed publicly available single-cell RNAseq data of tumor-associated macrophages (GSE260641, Wang 2024 Cell) to determine whether there are differences in expression of circadian clock genes between different TAM populations. We have added these data as a new Figure 9. Please see the figure and updated text below.

      "Tumor-associated macrophages exhibit heterogeneity in circadian clock gene expression.

      Our findings suggested that heterogeneity of the circadian clock may lead to disorder in bulk macrophage populations, but did not reveal if specific gene expression changes exist in tumor-associated macrophages at the single-cell level. To determine whether heterogeneity exists within the expression of circadian clock genes of the tumor-associated macrophage population, we analyzed publicly available single-cell RNA sequencing data of macrophages isolated from B16-F10 tumors[107]. To capture the heterogeneity of macrophage subsets within the TAM population, we performed unbiased clustering (Figure 9A). We then performed differential gene expression to determine if circadian clock genes were differentially expressed within the TAM subpopulations. The circadian clock genes Bhlhe40 (DEC1), Bhlhe41 (DEC2), Nfil3 (E4BP4), Rora (RORα), Dbp (DBP), and Nr1d2 (REV-ERBβ) were significantly (adj.p We next sought to determine whether differences in circadian clock gene expression between TAM subpopulations were associated with exposure to acidic pH in the TME. To this end, we first assessed Crem expression in the TAM subpopulations that were identified by unbiased clustering. Crem expression was significantly higher in TAM clusters 4, 5, and 6 compared to TAM clusters 1-3 and 7-9 (Figure 9C). Clusters were subset based on Crem expression into Crem high (clusters 4-6) and Crem low (clusters 1-3, 7-9) (Figure 9D), and differential gene expression analysis was performed. The circadian clock genes Nfil3, Rora, Bhlhe40, and Cry1 (CRY1) were significantly (adj.p

      And in the Discussion:

      "Supporting the notion that population-level disorder may exist in TAMs, we used scRNA-sequencing data and found evidence of heterogeneity between the expression of circadian clock genes in different TAM subpopulations (Figure 9A, B). Phenotypic heterogeneity of TAMs in various types of cancer has previously been shown[20, 21, 125, 126], and we have identified distinct TAM subpopulations by unbiased clustering (Figure 9A). Within those TAM subpopulations, we identified differential expression of circadian clock genes encoding transcription factors that bind to different consensus sequences: DEC1 and DEC2 bind to E-boxes, NFIL3 and DBP bind to D-boxes, and RORα and REV-ERBβ bind to retinoic acid-related orphan receptor elements (ROREs)[127, 128]. While little is known about regulation of macrophages by E-box and D-box elements beyond the circadian clock, aspects of macrophage function have been shown to be subject to transcriptional regulation through ROREs[129, 130]. Thus, we speculate that variations in these transcription factors may exert influence on expression of genes to drive diversity between TAM subpopulations. Differential expression of circadian clock genes between TAM subpopulations was also associated with Crem expression (Figure 9C-E), suggesting that exposure of TAMs to acidic pH within the TME can alter the circadian clock. However, there remained significant variation in expression of circadian clock genes within the Crem high and Crem low groups (Figure 9B), suggesting that acidic pH is not the only factor in the TME that can alter the circadian clock. Together, these data implicate the TME in driving heterogeneity in TAM circadian rhythms just as it drives heterogeneity in TAM phenotype.

      Interestingly, in contrast to our observations of circadian disorder in TAMs isolated from LLC tumors (Figure 6), rhythmicity in expression of circadian genes was observed in bulk TAMs isolated from B16 tumors[107]. This suggests that circadian rhythms of TAMs are maintained differently in different types of cancer. Notably, both of these observations were at the population level. Upon separation of the B16 TAM population into subsets by unbiased clustering of single-cell RNA sequencing data, we measured differences in expression of circadian clock genes between TAM subpopulations (Figure 9A,B). This suggests that even within a rhythmic TAM population, there is heterogeneity in the circadian clock of TAM subpopulations."

      Additionally, it is widely acknowledged that human and mouse macrophages exhibit distinct gene expression profiles, both in vitro and in vivo. While assuming that genes involved in circadian rhythms are conserved across species, the authors could consider extending their findings to include analyses of single-sorted macrophages from cancer patients, such as those with lung cancer or pancreatic ductal adenocarcinoma (PDAC). These experiments would provide relevant insights into TAM biology.

      We agree with Reviewer #1 that, ultimately, being able to relate findings in mice to humans is critical. It is important to assess if circadian disorder is observed in TAMs in human cancers as it is for LLC tumor-derived macrophages in mice. To address this point, we have performed CCD using a human data set (GSE116946; Garrido 2020 J Immunother Cancer) suitable for use with CCD (wherein macrophages were isolated from bulk tumor in humans, with a high enough sample size, and not cultured prior to sequencing). We have added these data as a new Figure 7, shown below. Please see the added data and updated text below.

      "We next assessed the status of the circadian clock in human TAMs from NSCLC patients. We performed CCD with publicly available RNA-seq data of tumor-adjacent macrophages and tumor-associated macrophages from NSCLC patients, using alveolar macrophages from healthy donors as a control[104, 105]. To assess the contribution of the acidic TME to circadian disorder, we subset TAM NSCLC patient samples into groups (Crem high TAMs and Crem low TAMs) based on median Crem expression. Notably, in macrophages from human NSCLC there was a trend toward disorder in Crem low but not Crem high TAM samples (Figure 7A,B). Additionally, the co-variance among core clock genes observed in alveolar macrophages from healthy donors was absent within Crem low and Crem high TAM samples (Figure 7C). In all, these data indicate that there is population-level disorder in the circadian rhythms of tumor-associated macrophages in humans and mice, suggesting that circadian rhythms are indeed altered in macrophages within the TME."

      And in the Discussion:

      "Indeed, we observed differences in the circadian clock of Crem low human TAM samples compared to Crem high human TAM samples, suggesting that acidic pH influences circadian disorder in TAMs (Figure 7). Interestingly, Crem low TAM samples exhibited a trend toward disorder while Crem high TAM samples did not. This is of particular interest, as we have observed that acidic pH can enhance circadian rhythms in macrophages, raising the question of whether acidic pH promotes or protects against circadian disorder."

      Minor comments: 1. Figure 2C needs clarification. It's unclear why pro-inflammatory macrophages treated with lactic acid would have a shorter amplitude and period, while acidic pH would increase amplitude and period in M2 macrophages.

      We thank Reviewer #1 for this important observation. Based on the comment, it is our understanding that the Reviewer is referring to the data in Figure 2 (low pH) compared to Figure 4 (lactate). We also find it very interesting that lactate alters rhythms in a manner distinct from the way in which acidic pH alters rhythms. Reviewer 3 asked for clarification on how lactate affected circadian gene expression at pH 7.4 or 6.5. We have added these data as Figure 4C (data and text below). It is notable that lactate had opposing effects on circadian gene expression at pH 6.5, enhancing the effects of low pH in some cases (Nr1d1) while blunting them in other cases (Cry1). This is mentioned in the text.

      "Lactate was also observed to alter expression of the circadian clock genes Per2, Cry1, and Nr1d1 over time in BMDMs cultured at pH 6.5, while having more subtle effects at pH 7.4 (Figure 4C). Notably, lactate blunted the effect of pH 6.5 on Cry1 expression, while enhancing the effect of low pH on Nr1d1 expression."

      Why these two stimuli alter rhythms differently remains an open question that is addressed in the Discussion section and is a prime topic for future investigation. We have added to the Discussion section potential reasons why these conditions may alter rhythms differently, such as the different pathways downstream of sensing these two conditions. Please see the updated text, below.

      "Although lactate polarizes macrophages toward a pro-resolution phenotype similar to acidic pH[30, 93], exposure to lactate had different effects on circadian rhythms - and in some cases, circadian clock gene expression - than exposure to acidic pH (Figure 4). Sensing of lactate occurs through different pathways than acid-sensing, which may contribute to the different ways in which these two stimuli modulate circadian rhythms of macrophages[111]. One previously published finding that may offer mechanistic insight into how phenotype can influence circadian rhythms is the suppression of Bmal1 by LPS-inducible miR-155[54]. It has also been observed that RORα-mediated activation of Bmal1 transcription is enhanced by PPARγ co-activation[112]. In macrophages, PPARγ expression is induced upon stimulation with IL-4 and plays a key role in alternative activation of macrophages, promoting a pro-resolution macrophage phenotype, and supporting resolution of inflammation[113-115]. Such observations prompt the question of whether there are yet-unidentified factors induced downstream of various polarizing stimuli that can modulate expression of circadian genes at the transcriptional and protein levels. Further work is required to understand the interplay between macrophage phenotype and circadian rhythms."

      The scale in Figure 2C should be equal for all conditions (e.g., -200).

      We appreciate Reviewer #1's preference for the axes to be scaled similarly to enable cross-comparison between graphs. However, due to the different amplitude of pro-inflammatory macrophages compared to the others, we feel that making all axes the same will make it hard to see the rhythms of pro-inflammatory macrophages, hindering the reader's ability to observe the data. Thus, we have put the matched-axis plots, shown below, in Supplementary Figure 4A.

      Absolute values of amplitude, damping, and period differ between Figure 1 and Figure 2A, B, C. The authors should explain these discrepancies.

      As with many experimental approaches, there is slight variation in absolute values between independent experiments, which Reviewer #1 correctly notes. However, while the absolute values vary slightly, the relationship between the values in each of these conditions remains the same across the panels mentioned by Reviewer #1.

      The authors should consider modulating the acidic environment of macrophages in settings more representative of cancer. For example, by adding conditioned media from tumor cells with pronounced glycolysis.

      We appreciate Reviewer #1's desire to more closely mimic the tumor microenvironment. To address Reviewer #1's point, we cultured macrophages in RPMI or cancer cell (KCKO) supernatant at pH 6.5 or pH-adjusted to pH 7.4 and assessed rhythms by measuring rhythmic activity of Per2-Luc with LumiCycle analysis. We then compared changes in rhythms between macrophages cultured in normal media and those cultured in cancer cell supernatant in pH-matched conditions to assess how cancer cell-conditioned media may influence circadian rhythms of macrophages, and the contribution of acidic pH. We have added these data, shown below, as a new Supplementary Figure 5, and included a discussion of these data in the manuscript. Please see the new Figure and updated text below.

      "Cancer cell supernatant alters circadian rhythms in macrophages in a manner partially reversed by neutralization of pH.

      We have observed that polarizing stimuli, acidic pH, and lactate can alter circadian rhythms. However, the tumor microenvironment is complex. Cancer cells secrete a variety of factors and deplete nutrients in the environment. To model this, we cultured BMDMs in RPMI or supernatant collected from KCKO cells, which are a murine model of pancreatic ductal adenocarcinoma (PDAC)[94, 95], at pH 6.5 or neutralized to pH 7.4 (Supplementary Figure 5). Circadian rhythms of BMDMs cultured in cancer cell supernatant at pH 7.4 or pH 6.5 exhibited increased amplitude and lengthened period compared to RPMI control at pH 7.4 or 6.5, respectively, indicating that cancer cell supernatant contains factors that can alter circadian rhythms of BMDMs. Notably, BMDMs cultured in cancer cell supernatant at pH 6.5 had increased amplitude and shortened period compared to BMDMs cultured in cancer cell-conditioned media at pH 7.4, indicating that pH-driven changes in rhythms were maintained in BMDMs cultured in cancer cell supernatant. When the pH of cancer cell supernatant was neutralized to pH 7.4, the increased amplitude was decreased, and the shortened period was lengthened, indicating that neutralizing acidic pH partially reverses the changes in rhythms observed in macrophages cultured in cancer cell supernatant at pH 6.5. These data further support our observations that acidic pH can alter circadian rhythms of macrophages both alone and in combination with various factors in the TME."

      And, in the Discussion:

      "We have shown that various stimuli can alter rhythms of macrophages in a complex and contributing manner, including polarizing stimuli, acidic pH, and lactate. TGFβ is produced by a variety of cells within the TME, and was recently identified as a signal that can modulate circadian rhythms[123, 124]. Additionally, when we exposed macrophages to cancer cell-conditioned media, rhythms were modulated in a manner distinct from acidic pH or lactate, with these changes in rhythms partially reversed by neutralization of the cancer cell-conditioned media pH (Supplementary Figure 5). It is conceivable that, in addition to acidic pH, other stimuli in the TME are influencing circadian rhythms to drive population-level disorder that we observed by CCD."

      Arg1 alone is not sufficient as an M2 polarization marker. The authors should include additional markers.

      We thank Reviewer #1 for bringing up this critical point in experimental rigor. While Arg1 is a commonly-used marker for M2 polarization, Reviewer #1 points out that polarization of macrophages is typically assessed by a full panel of markers characteristic of the M2 state. To address this point, we have expanded our panel to include several other markers of M2 polarization in mice such as Retnla, Ym1, MGL1, and CD206. In response to Reviewer 2's major point 2 and Reviewer 3's major point 4 below, we have also expanded our panel of markers used to assess the M1 polarization state with Tnfa, Il1b, and Il6. We have added these data, shown below, to Supplementary Figure 1 and updated the text appropriately. Please see the new Figure and updated text below.

      "Consistent with previous studies, we found that genes associated with anti-inflammatory and pro-resolution programming characteristic of IL-4 and IL-13-stimulated macrophages such as Arg1, Retnla, Chil3 (Ym1), Clec10a (MGL1), and Mrc1 (CD206) were induced in IL-4 and IL-13-stimulated macrophages, but not IFNγ and LPS-stimulated macrophages. In contrast, genes associated with pro-inflammatory activity characteristic of IFNγ and LPS-stimulated macrophages such as Nos2 (iNOS), Tnfa, Il1b, and Il6 were induced in IFNγ and LPS-stimulated macrophages, but not IL-4 and IL-13-stimulated macrophages (Supplementary Figure 1)[28, 30, 65, 71, 74, 75]. This indicates that macrophages stimulated with IL-4 and IL-13 were polarized toward a pro-resolution phenotype, while macrophages stimulated with IFNγ and LPS were polarized toward a pro-inflammatory phenotype."

      Significance

      While the manuscript provides valuable insights and has obvious novelty, it requires a significant revision

      We thank Reviewer #1 for their deep read of our manuscript, and their helpful feedback and suggestions. As shown by the comments above, we are confident we have fully addressed each of the points that were made to result in a much-improved revised manuscript.

      Reviewer #2

      Evidence, reproducibility and clarity

      Knudsen-Clark et al. showed that the circadian rhythm of bone marrow-derived macrophages (BMDM) can be affected by polarization stimuli, pH of the microenvironment, and by the presence of sodium-lactate. Mechanistically, the acidic pH of cell microenvironment is partly regulated by intracellular cAMP-mediated signaling events in BMDM. The authors also showed that the circadian clock of peritoneal macrophages is also modified by the pH of the cell microenvironment. Using publicly available data, the authors showed that the circadian rhythm of tumor-associated macrophages is similar to that of Bmal1-KO peritoneal macrophages. In a murine model of pancreatic cancer, the authors showed that the tumor growth is accelerated in C57BL/6 mice co-injected with cancer cells and Bmal1-KO BMDM as compared to mice co-injected with cancer cell and wild type BMDM.

      We thank Reviewer #2 for their insightful and helpful comments and feedback. Their Review guided key clarifying experiments and additions to the Discussion and Methods. To summarize, we added new data to Supplementary Figure 1 to characterize distinct gene expression in our different polarized macrophage populations, showed in Supplementary Figure 2 that serum shock independently induces cAMP and Icer, discussed the limitations of the artificial polarization models more clearly, and updated our Methods to better explain how macrophages were isolated from the peritoneum. We also quantified multiple immunoblots of pCREB, provided clarity in the Methods and Reviewer-only data on how our protein-extraction protocol isolates nuclear protein, better introduced the BMAL1-KO mouse model, and showed in Supplementary Figure 6 that low pH can induce oscillations in the absence of a serum shock.

      Major points of criticism: 1. Nine main figures include different experimental models on a non-systematic manner in the manuscript, and only literature-based correlation is used to link the results each other. The authors used in vitro BMDM and peritoneal cell-based model systems to study the effects of IL4+IL13, IFNg+LPS, low pH, sodium-lactate, adenylate cyclase inhibitors on the circadian clock of macrophages. The link between these microenvironment conditions of the cells is still correlative with the tumor microenvironment: publicly available data were used to correlate the increased expression level of cAMP-activated signaling events with the presence of acidic pH of tumor microenvironment. Notably, the cell signaling messenger molecule cAMP is produced by not only low extracellular pH by activated GPCRs, but also starvation of the cell. The starvation is also relevant to this study, since the BMDM used in the in vitro culture system were starving for 24 hours before the measurement of Per2-Luc expression to monitor circadian rhythm.

      We agree with the important point that Reviewer #2 makes that our synchronization protocol of serum starvation followed by serum shock can impact the cAMP signaling pathway. Indeed, it has previously been shown that serum shock induces phosphorylation of CREM in rat fibroblasts, which is indicative of signaling through the cAMP pathway. To address this point, we have added a schematic of our synchronization protocol to Supplementary Figure 2B for additional clarity. We have also performed additional experiments to test whether cAMP signaling is induced in macrophages by our synchronization protocol. For this, we assessed downstream targets of the cAMP signaling pathway, Icer and pCREB, after serum starvation but before serum shock, and at several time points post-treatment with serum shock (Supplementary Figures 2D,E). We observed that Icer expression and phosphorylation of CREB are induced rapidly in macrophages upon exposure to serum shock, as early as 10 minutes for pCREB and 1 hour post-exposure for Icer. Notably, this signaling is transient and rapidly returns to baseline, with pCREB levels fully returned to baseline by 2 hours post-treatment, at which time media is replaced and the experiment begins (CT 0). These data, shown below, have been added to Supplementary Figure 2 and a discussion of these data has been added to the manuscript - please see the modified text below.

      "The synchronization protocol we use to study circadian rhythms in BMDMs involves a 24-hour period of serum starvation followed by 2 hours of serum shock. It has previously been shown that serum shock can induce signaling through the cAMP pathway in rat fibroblasts[98]. To determine whether the synchronization protocol impacts cAMP signaling in macrophages, we harvested macrophages before and after serum shock. We then assessed Icer expression and phosphorylation of cyclic AMP-response element binding protein (CREB), which occur downstream of cAMP and have been used as readouts to assess induction of cAMP signaling in macrophages[29, 96, 100]. Serum shock of macrophages following serum starvation led to rapid phosphorylation of CREB and Icer expression that quickly returned to baseline (Supplementary Figure 2D,E). This indicates that serum starvation followed by serum shock in the synchronization protocol we use to study circadian rhythms in BMDMs induces transient signaling through the cAMP signaling pathway."

      The definition of pre-resolution macrophages (MF) used across the manuscript could be argued. The authors defined BMDM polarized with IL-4 and IL-13 as pre-resolution MF. Resolution is followed by inflammation, but the IL-4 secretion does not occur in every inflammatory setting. Moreover, IL-4 and IL-13 are secreted during specific tissue environment and immunological settings involving type 2 inflammation or during germinal center reactions of the lymph nodes. • What are the characteristics of pre-resolution macrophages (MF)? The authors indicated that IL-4 and IL-13 cytokines were used to model the pre-resolution macrophages. In which pathological context are these cytokines produced and induce pre-resolution macrophages? IL-4 polarized BMDM can also produce pro-inflammatory protein and lipid mediators as compared to LPS-stimulated BMDM, and IL-4 polarized BMDM still have potent capacity to recruit immune cells and to establish type 2 inflammation.

      • The authors showed Arg1 and Vegfa qPCR data from BMDM only. Based on the literature, these MFs are anti-inflammatory cells particularly. Resolution-related MFs followed by acute inflammation are a specific subset of MFs, and the phenotype of pre-resolution MF should be described, referred, and measured specifically.

      We thank Reviewer #2 for bringing up this important point that clarity is required in describing our in vitro macrophage models. We chose the most commonly used models of in vitro macrophage polarization in the tumor immunology field, M2 (IL-4+IL-13) and M1 (IFNγ+LPS). These polarization conditions have been used for over two decades in the field, and have been well-characterized to drive a pro-inflammatory (for M1) and pro-resolution or anti-inflammatory (for M2) macrophage phenotype (Murray 2017 Annu Rev Phys). Each of these cell states has similarities in phenotype to pro-inflammatory and pro-resolution (pro-tumorigenic) macrophages found in tumors. In fact, in the literature, pro-inflammatory and pro-resolution TAMs will frequently be categorized as "M1" or "M2", respectively, even though this is a gross oversimplification (Ding 2019 J Immunol, Garrido-Martin 2020 J Immunother Cancer).

      As Reviewer #2 points out, IL-4 and IL-13 play a role in inflammatory settings, mediating protective responses to parasites and pathological responses to allergens. Importantly, IL-4 and IL-13 are also key regulators and effectors of resolution and wound repair (Allen 2023 Annu Rev Immunol). In line with this, M2 macrophages show many of the characteristics of pro-resolution programming in their gene expression profile, expressing genes associated with wound healing (e.g., Vegf) and immunoregulation (e.g., Arg1) (Mantovani 2013 J Pathol). These cells have frequently been used as a model for studying TAMs in vitro, due to the similarity in pro-resolution programming that is dysregulated/hijacked in TAMs (Biswas 2006 Blood). M2 macrophages have also been referred to as anti-inflammatory, and this is in line with their role in the type 2 response driven by IL-4 and IL-13, as this is primarily a response induced by allergy or parasites where tissue damage drives an anti-inflammatory and pro-resolution phenotype in macrophages (Pesce 2009 PLOS Pathogens and Allen 2023 Annu Rev Immunol).

      We do not assert that these in vitro models recapitulate the macrophage polarization cycle that Reviewer #2 astutely describes, and indeed, stimuli polarizing macrophages in tumors are much more diverse and complex (Laviron 2022 Cell Rep). We also fully agree with Reviewer #2 that, while IL-4 and IL-13 may exist in the tumor and be secreted by Th2 CD4 T cells (see Shiao 2015 Cancer Immunol Res), there may be multiple reasons why macrophages may be polarized to a pro-resolution, M2-like state in a tumor (in fact, exposure to low pH and to lactate each independently does this, as we show in Supplementary Figure 2 and Figure 4, and as was previously shown in Jiang 2021 J Immunol and Colegio 2014 Nature). Nonetheless, using the well-described M1 and M2 in vitro models allows our findings to be directly comparable to the vast literature that also uses these models, and to understand how distinct polarization states respond to low pH.

      We fully agree with Reviewer #2 that these cells must be defined more clearly in the text. We have taken care to discuss the limitations of using in vitro polarization models to study macrophages in our Limitations of the Study section. To better address Reviewer #2's concern, we have more thoroughly introduced the M2 macrophages as a model, and are clear that these are type 2-driven macrophages that share characteristics of pro-resolution macrophages. We have also added additional citations to the manuscript, including those highlighted above in our response. Finally, we have expanded our panel to better characterize the IL-4/IL-13 stimulated macrophages using more markers that have been characterized in the literature, in line with Reviewer #2's comments and those of Reviewer #1 and Reviewer #3. Please see the updated data and text, below.

      "As macrophages are a phenotypically heterogeneous population in the TME, we first sought to understand whether diversity in macrophage phenotype could translate to diversity in circadian rhythms of macrophages. To this end, we used two well-established in vitro polarization models to study distinct macrophage phenotypes[5, 60-63]. For a model of pro-inflammatory macrophages, we stimulated macrophages with IFNγ (interferon γ) and LPS (lipopolysaccharide) to elicit a pro-inflammatory phenotype[60, 64]. These macrophages are often referred to as 'M1' and are broadly viewed as anti-tumorigenic, and we will refer to them throughout this paper as pro-inflammatory macrophages[65, 66]. For a model at the opposite end of the phenotypic spectrum, we stimulated macrophages with IL-4 and IL-13[60, 67]. While these type 2 stimuli play a role in the response to parasites and allergy, they are also major drivers of wound healing; in line with this, IL-4 and IL-13-stimulated macrophages have been well-characterized to adopt gene expression profiles associated with wound-healing and anti-inflammatory macrophage phenotypes[68-71]. As such, these macrophages are often used as a model to study pro-tumorigenic macrophages in vitro and are often referred to as 'M2' macrophages; throughout this paper, we will refer to IL-4 and IL-13-stimulated macrophages as pro-resolution macrophages[66, 72, 73]. Consistent with previous studies, we found that genes associated with anti-inflammatory and pro-resolution programming characteristic of IL-4 and IL-13-stimulated macrophages such as Arg1, Retnla, Chil3 (Ym1), Clec10a (MGL1), and Mrc1 (CD206) were induced in IL-4 and IL-13-stimulated macrophages, but not IFNγ and LPS-stimulated macrophages. 
      In contrast, genes associated with pro-inflammatory activity characteristic of IFNγ and LPS-stimulated macrophages such as Nos2 (iNOS), Tnfa, Il1b, and Il6 were induced in IFNγ and LPS-stimulated macrophages, but not IL-4 and IL-13-stimulated macrophages (Supplementary Figure 1)[28, 30, 65, 71, 74, 75]. This indicates that macrophages stimulated with IL-4 and IL-13 were polarized toward a pro-resolution phenotype, while macrophages stimulated with IFNγ and LPS were polarized toward a pro-inflammatory phenotype."

      In the Limitations of the Study section, we now write the following:

      "Our observations of rhythms in macrophages of different phenotypes are limited by in vitro polarization models. It is important to note that while our data suggest that pro-inflammatory macrophages have suppressed rhythms and increased rate of desynchrony, it remains unclear the extent to which these findings apply to the range of pro-inflammatory macrophages found in vivo. We use IFNγ and LPS co-treatment in vitro to model a pro-inflammatory macrophage phenotype that is commonly referred to as 'M1', but under inflammatory conditions in vivo, macrophages are exposed to a variety of stimuli that result in a spectrum of phenotypes, each highly context-dependent. The same is true for 'M2'; tissue microenvironments differ, and pro-resolution macrophages exist along a spectrum."

      The authors used IFNg and LPS, or IL-4 and IL-13 and co-treatments to polarize BMDM in to type 1 (referred as pro-inflammatory MF) and type 2 (referred as pre-resolution MF) activation state. The comparison between these BMDM populations has limitations, since LPS induces a potent inflammatory response in MF. The single treatment with MF-polarizing cytokines enable a more relevant comparison to study the circadian clock in classically and alternatively activated MF.

      We thank Reviewer #2 for bringing up this important point to provide additional clarity on our polarization conditions. The use of IFNγ and LPS to polarize macrophages toward a pro-inflammatory, M1 phenotype, and the use of IL-4 and IL-13 to polarize macrophages toward a pro-resolution, M2 phenotype have been common for over two decades, and thus are well-characterized in the literature (please see Murray 2017 Annu Rev Phys for an extensive review on the history of these polarization models, as well as Hörhold 2020 PLOS Computational Biology, Binger 2015 JCI, McWhorter 2013 PNAS, Ying 2013 J Vis Exp for more recent studies using these models). The use of LPS alone or in combination with IFNγ, and IL-13 along with IL-4, was introduced in 1998 (Munder 1998 J Immunol). This approach was originally designed to mimic what could happen when macrophages were exposed to CD4+ Th1 cells, which produce IFNγ, or Th2 cells, which produce IL-4 and IL-13 (Munder 1998 J Immunol, Murray 2017 Annu Rev Phys). As Reviewer #2 points out, these stimuli induce potent responses, driving macrophages to adopt pro-inflammatory or pro-resolution/anti-inflammatory phenotypes that are two extremes at opposite ends of the spectrum of macrophage phenotypes (Mosser 2008 Nat Rev Immunol). Since our goal was to study rhythms of distinct macrophage phenotypes in vitro, and how TME-associated conditions such as acidic pH and lactate affect their rhythms, these cell states were appropriate for our questions. Thus, the polarization models used in this paper allowed us to achieve this goal. We include a section in the Discussion on the limitations of in vitro polarization models.

      "A critical question in understanding the role of circadian rhythms in macrophage biology is determining how different polarization states of macrophages affect their internal circadian rhythms. This is especially important considering that tumor-associated macrophages are a highly heterogeneous population. Our data indicate that compared to unstimulated macrophages, rhythms are enhanced in pro-resolution macrophages, characterized by increased amplitude and improved ability to maintain synchrony; in contrast, rhythms are suppressed in pro-inflammatory macrophages, characterized by decreased amplitude and impaired ability to maintain synchrony (Figure 1). These agree with previously published work showing that polarizing stimuli alone and in combination with each other can alter rhythms differently in macrophages[80, 81]. In a tumor, macrophages exist along a continuum of polarization states and phenotypes[18-21, 24]. Thus, while our characterizations of rhythms in in vitro-polarized macrophages provide a foundation for understanding how phenotype affects circadian rhythms of macrophages, further experiments will be needed to assess macrophages across the full spectrum of phenotypes. Indeed, alteration of rhythms may be just as highly variable and context-dependent as phenotype itself."

      There are missing links between the results of showing the circadian rhythm of polarized BMDM, sodium-lactate treated BMDM, and tumor growth. Specifically, do the used pancreatic ductal adenocarcinoma cells produce IL-4 and sodium-lactate? In the LLC-based experimental in silico analysis of tumors, the LLC do not produce IL-4.

      Reviewer #2 raises important points about the sources of lactate and IL-4 in tumors, which bear on the relevance of our investigation of how these factors can alter rhythms in macrophages. Tumor-infiltrating Th2 CD4 T cells are potential sources of IL-4 and IL-13 in the tumor (see Shiao 2015 Cancer Immunol Res). Various cells in the tumor can produce lactate. We discuss this in both the Introduction and the Results: poor vascularization of tumors results in hypoxic areas, where cells are pushed toward glycolysis to survive and thus secrete increased glycolytic waste products such as protons and lactate. Lactate is the ionized form of lactic acid; sodium L-lactate is its sodium salt.

      How can the circadian rhythm affect the function of BMDM? The Authors should provide evidence that circadian rhythm affects the function of polarized MF.

      We agree with Reviewer #2 that the next step is to determine how altered rhythms influence function of macrophages. This will be the topic of future work, but is outside the scope of this paper. Our contribution with this paper is providing the first evidence that rhythms are altered in the TME and that TME-associated conditions can alter rhythms in macrophages. We have added what is currently known about how circadian rhythms influence macrophage function to the Discussion section to facilitate a conversation about this important future direction. Please see the updated text below.

      "Considering our observations that conditions associated with the TME can alter circadian rhythms in macrophages, it becomes increasingly important to understand the relevance of macrophage rhythms to their function in tumors. It has been shown that acidic pH and lactate can each drive functional polarization of macrophages toward a phenotype that promotes tumor growth, with acidic pH modulating phagocytosis and suppressing inflammatory cytokine secretion and cytotoxicity[28, 30, 93]. However, how the changes in circadian rhythms of macrophages driven by these conditions contribute to their altered function remains unknown. Current evidence suggests that circadian rhythms confer a time-of-day-dependency on macrophage function by gating the macrophage response to inflammatory stimuli based on time-of-day. As such, responses to inflammatory stimuli such as LPS or bacteria are heightened during the active phase while the inflammatory response is suppressed during the inactive phase. An important future direction will be to determine how changes in circadian rhythms of macrophages, such as those observed under acidic pH or high lactate, influence the circadian gating of their function."

      In Figure 3, the authors show data from peritoneal cells. The isolated peritoneal cells are not pure macrophage populations. Based on the referred article in the manuscript, the peritoneal cavity contains more then 50% of lymphocytes, and the myeloid compartment contains 80% macrophages.

      Reviewer #2 raises important concerns about the purity of the peritoneal population used in our experiments. We enrich for peritoneal macrophages from the peritoneal exudate cells by removing non-adherent cells in culture. This is described in our Methods section and is a method of isolation that is commonly used in the field, as lymphocytes are non-adherent. In addition to the source cited in the paper within our Methods section (Goncalves 2015 Curr Prot Immunol), please see Layoun 2015 J Vis Exp, de Jesus 2022 STAR Protocols, and Harvard HLA Lab protocol - macrophages enriched in this manner have been shown to be over 90% pure. We have modified our Methods section to make this clear, and added the additional references in this response to this section of our Methods. Please see the modified text below.

      "Peritoneal exudate cells were harvested from mice as previously published[137]. To isolate peritoneal macrophages, peritoneal exudate cells were seeded at 1.2 × 10⁶ cells/mL in RPMI/10% HI FBS supplemented with 100 U/mL Penicillin-Streptomycin and left at 37°C for 1 hour, after which non-adherent cells were rinsed off[136]. Isolation of peritoneal macrophages using this method has been shown to yield a population that is over 90% in purity[138, 139]. Peritoneal macrophages were then cultured in Atmospheric Media at pH 7.4 or 6.5 with 100 μM D-luciferin, and kept at 37°C in atmospheric conditions."

      The figure legend of Figure 3 describes the effects of pH on the circadian rhythm of bone marrow-derived macrophages ex vivo. Peritoneal macrophages involve tissue resident peritoneal macrophages with yolk sac and fetal liver origin, and also involve small peritoneal MF with bone marrow origin. The altered description of results and figure legends makes confusion.

      We are very grateful to Reviewer #2 for pointing out our typo. We have fixed the caption of Figure 3 to properly describe the data as "peritoneal macrophages ex vivo".

      In Figure 6C, one single Western blot is shown with any quantification. The authors should provide data of the relative protein level of p-CREB from at least 3 independent experiments. In the Western-blot part of the methods, the authors described that the pellet was discarded after cell lysis. The p-CREB is the activated form of the transcription factor CREB and there is increased binding to the chromatin to regulate gene expression. By discarding the pellet after cell lysis, the chromatin-bond p-CREB could be also removed at the same time.

      We thank Reviewer #2 for bringing up this point. We agree that quantification is an important aspect of western blot analysis. We have repeated the experiment to reach n=3 and provide quantification of pCREB normalized to total protein. We have added these data, shown below, to Figure 5.

      Reviewer #2 also expressed concern that we may not be capturing all of the CREB due to nuclear localization and chromatin binding. We specifically chose the lysis buffer M-Per, which is formulated to lyse the nucleus and solubilize nuclear and chromatin-bound proteins. To demonstrate this, we show in the Reviewer-only figure below that the nuclear protein p85 is solubilized and readily detectable by western blot using our protein extraction method.

      We have also added an additional sentence in the Methods section for clarity - please see the modified text below.

      "Cells were lysed using the M-Per lysis reagent (Thermo Scientific, CAT#78501), supplemented with protease and phosphatase inhibitor cocktail (1:100; Sigma, CAT#PPC1010) and phosphatase inhibitor cocktail 2 (1:50; Sigma, CAT#P5726), with 200 μM deferoxamine (Sigma, CAT#D9533). M-Per is formulated to lyse the nucleus and solubilize nuclear and chromatin-bound proteins, allowing isolation of nuclear proteins as well as cytosolic proteins. Lysates were incubated on ice for 1 hour, then centrifuged at 17,000 × g to pellet out debris; supernatant was collected."

      It is confusing that adenylate-cyclase inhibitor MDL-12 elevated the phospho-CREB levels in BMDM. How can the authors exclude any other inducers of CREB phosphorylation?

      We agree with Reviewer #2 that it is surprising pCREB was elevated with MDL-12 treatment alone, and we do indeed think that there are other pathways contributing to this. We have addressed this point in the Discussion - please see the text below.

      "The mechanism through which acidic pH can modulate the circadian clock in macrophages remains unclear. Evidence in the literature suggests that acidic pH promotes a pro-resolution phenotype in macrophages by driving signaling through the cAMP pathway[29]. It has previously been shown that cAMP signaling can modulate the circadian clock[99]. However, our data indicated that cAMP signaling was not fully sufficient to confer pH-mediated changes in circadian rhythms of macrophages (Figure 5A,B). Treatment with MDL-12, commonly known as an inhibitor of adenylyl cyclase[29, 117], resulted in suppression of pH-induced changes in amplitude of circadian rhythms but did not inhibit signaling through the cAMP signaling pathway (Figure 5C,D). While MDL-12 is commonly used as an adenylyl cyclase inhibitor, it has also been documented to have inhibitory activity toward phosphodiesterases (PDEs) and the import of calcium into the cytosol through various mechanisms[118, 119]. This is of particular interest, as calcium signaling has also been shown to be capable of modulating the circadian clock[120]. Furthermore, while acid-sensing through GPCRs has been the most well-characterized pathway in macrophages, there remain additional ways in which acidic pH can be sensed by macrophages, such as acid-sensing ion channels[121, 122]. Further work is required to understand the signaling pathways through which pH can influence macrophage phenotype and circadian rhythms."

      It is described in the methods that BMDM were starving for 24 hours in serum-free culture media followed by serum shock (50% FBS). The cAMP production can be induced during cell starvation which should be considered for the data representation.

      We appreciate that Reviewer #2 points out that our synchronization protocol of serum starvation followed by serum shock may impact the cAMP signaling pathway in macrophages, as serum shock has been shown to induce pCREB, a downstream mediator of cAMP signaling, in rat fibroblasts. Indeed, we show in additional experiments performed (in response to Reviewer #2's major comment 1) evidence that cAMP signaling is induced in macrophages following the serum shock phase of our synchronization protocol, as indicated by elevation of Icer and pCREB. As we note above, this induction is transient and returns to baseline by 2 hours post-serum shock, the time at which we replace media and begin our experiments (CT 0).

      Despite the transient nature of cAMP induction by our synchronization protocol, we agree wholeheartedly with Reviewer #2 that this must be considered in light of our experimental system in which we are studying the effect of acidic pH on circadian rhythms of macrophages, which in itself induces signaling through the cAMP signaling pathway. To address Reviewer #2's point, we have performed experiments in which we culture unstimulated BMDMs in neutral pH 7.4 or acidic pH 6.5, without prior serum starvation and serum shock (i.e. we do not submit these BMDMs to the synchronization protocol). We then observed circadian rhythms of Per2-Luc by LumiCycle to determine whether acidic pH alters circadian rhythms of BMDMs in the absence of prior serum starvation followed by serum shock. Similar to our observations in Figure 2, circadian rhythms of macrophages at pH 6.5 had increased amplitude and shortened period compared to rhythms of macrophages at pH 7.4. This indicates that pH-driven changes in circadian rhythms observed in our system are not due to the synchronization protocol. The data, shown below, have been placed in a new Supplementary Figure 6, and a discussion of these results has been added to the Results section - please see the updated text below.

      "As acidic pH induces signaling through the cAMP pathway, we sought to determine whether acidic pH independently contributed to the pH-driven changes in circadian rhythms we observe in BMDMs. To test this, we omitted the synchronization step and observed BMDM rhythms by LumiCycle when cultured in neutral pH 7.4 or acidic pH 6.8 or pH 6.5 (Supplementary Figure 6). Circadian rhythms of BMDMs cultured at pH 6.5 exhibited similar changes as previously observed, with enhanced amplitude and shortened period relative to BMDMs at pH 7.4. This indicates pH-driven changes observed in circadian rhythms of BMDMs occur in the absence of prior serum starvation and serum shock."

      How can the authors explain and prove that the wild type and Bmal1-KO BMDM co-injected with pancreatic cancer cells subcutaneously survive, present, and have effector functions at the same extent in the subcutaneous tissue, before and during tumor growth (Figure 9)? In other words, what kind of MF-derived parameters could be modified by disrupting the circadian rhythm of MF during tumor development? The production of MF-derived regulatory enzymes, cytokines, growth factors are affected by the disrupted circadian clock in MF?

      Reviewer #2 poses the very important question of why we see differences in tumor growth in our co-injection model, and what might be driving it. Of note, co-injection models of tumor growth are commonly used to determine macrophage-specific roles in tumor growth (Colegio 2014 Nature, Mills 2019 Cell Rep, Lee 2018 Nat Comm). We observed that tumor growth is altered when macrophages with disrupted circadian rhythms (BMAL1 KO) are co-injected compared to when macrophages with intact circadian rhythms (WT) are co-injected in a murine model of pancreatic cancer using KCKO cells. Our observation is supported by a previously published study that used a co-injection model of melanoma, which we cite in the manuscript (Alexander 2020 eLife). What drives this difference in tumor growth remains an open question that is the subject of future work and is outside the scope of this paper, which focuses on our discovery that factors associated with the tumor microenvironment can alter circadian rhythms in macrophages. We have included a discussion on what is currently known about how circadian rhythms alter macrophage function, acknowledging that we have yet to answer these important questions and identifying them as a subject for future work. Please see the text below.

      "Considering our observations that conditions associated with the TME can alter circadian rhythms in macrophages, it becomes increasingly important to understand the relevance of macrophage rhythms to their function in tumors. It has been shown that acidic pH and lactate can each drive functional polarization of macrophages toward a phenotype that promotes tumor growth, with acidic pH modulating phagocytosis and suppressing inflammatory cytokine secretion and cytotoxicity[28, 30, 93]. However, how the changes in circadian rhythms of macrophages driven by these conditions contributes to their altered function remains unknown. Current evidence suggests that circadian rhythms confer a time-of-day-dependency on macrophage function by gating the macrophage response to inflammatory stimuli based on time-of-day. As such, responses to inflammatory stimuli such as LPS or bacteria are heightened during the active phase while the inflammatory response is suppressed during the inactive phase. An important future direction will be to determine how changes in circadian rhythms of macrophages, such as those observed under acidic pH or high lactate, influences the circadian gating of their function. Data from our lab and others suggest that disruption of the macrophage-intrinsic circadian clock accelerates tumor growth, indicating that circadian regulation of macrophages is tumor-suppressive in models of PDAC (our work) and melanoma [109]. This agrees with complementary findings that behavioral disruption of circadian rhythms in mice (through chronic jetlag) disrupts tumor macrophage circadian rhythms and accelerates tumor growth[56]. It remains unclear whether this is through the pro-tumorigenic functions of macrophages such as extracellular matrix remodeling or angiogenesis, through suppression of the anti-tumor immune response, or a combination of both functions. Further work will be needed to tease apart these distinctions."

      Minor points of criticism: 1. The figure legends of the graphs and diagrams are missing in Figure 2D,E,F

      We thank Reviewer #2 for pointing out that figure legends were missing. We have added legends for Figure 2D,E,F.

      The BMAL1-based in vivo murine model of circadian rhythm is not introduced in the manuscript.

      We thank Reviewer #2 for bringing to our attention that the BMAL1 KO macrophage model was not well-introduced in the manuscript. To address this point, we have modified the text to better introduce this model. Please see the modified text below.

      "As a positive control for circadian clock disruption, we used data from BMAL1 KO peritoneal macrophages [44]. BMAL1 KO macrophages have a genetic disruption of the circadian clock due to the loss of Bmal1, the central clock gene. As a result, circadian rhythms of BMAL1 KO macrophages are disrupted, lacking rhythmicity and downstream circadian regulation of macrophage function (Supplementary Figure 8)[45, 54]. "As a positive control for circadian clock disruption, we used data from BMAL1 KO peritoneal macrophages [44]. BMAL1 KO macrophages have a genetic disruption of the circadian clock due to the loss of Bmal1, the central clock gene. As a result, circadian rhythms of BMAL1 KO macrophages are disrupted, lacking rhythmicity and downstream circadian regulation of macrophage function (Supplementary Figure 8)[45, 54]."__ __

      Significance

      Knudsen-Clark et al. showed that the circadian rhythm of bone marrow-derived macrophages (BMDM) can be affected by polarization stimuli, pH of the microenvironment, and by the presence of sodium-lactate. Mechanistically, the acidic pH of cell microenvironment is partly regulated by intracellular cAMP-mediated signaling events in BMDM. The authors also showed that the circadian clock of peritoneal macrophages is also modified by the pH of the cell microenvironment. Using publicly available data, the authors showed that the circadian rhythm of tumor-associated macrophages is similar to that of Bmal1-KO peritoneal macrophages. In a murine model of pancreatic cancer, the authors showed that the tumor growth is accelerated in C57BL/6 mice co-injected with cancer cells and Bmal1-KO BMDM as compared to mice co-injected with cancer cell and wild type BMDM.

      We are grateful to Reviewer #2 for their very helpful comments and suggestions, which we believe have greatly enhanced the clarity and reproducibility of this manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Review for Knudsen-Clark et al.

      "Circadian rhythms of macrophages are altered by the acidic pH of the tumor microenvironment"

      Knudsen-Clark and colleagues explore the impact of TME alterations on macrophage circadian rhythms. The authors find that both acidic pH and lactate modulate circadian rhythms which alter macrophage phenotype. Importantly, they define circadian disruption of tumor-associated macrophages within the TME and show that circadian disruption in macrophages promotes tumor growth using a PDAC line. This represents an important understanding of the crosstalk between cancer cells and immune cells as well as the understanding of how the TME disrupts circadian rhythms. The study is well-done, however, authors need to address several important points below.

      We thank Reviewer #3 for their in-depth and insightful comments and suggestions, which have resulted in a much-improved manuscript. We were pleased that Reviewer #3 found the work to be "an important study that is well-done" and that it "represents an important understanding of the crosstalk between cancer cells and immune cells as well as the understanding of how the TME disrupts circadian rhythms." In response to Reviewer #3's comments, we have added several new key experiments and changes to the text. To summarize, we added new data to Supplementary Figure 1 to better characterize our macrophage polarization states, showed in Figure 3 that low pH affects peritoneal macrophage circadian gene expression in a similar fashion as bone marrow-derived macrophages, added new data in Figure 4 to show how lactate and low pH affect circadian gene expression over time, and added new computational analysis to Figures 6, 7, and Supplementary Figure 9 to probe circadian gene covariance from publicly available data. We also made several key additions to the Discussion to discuss the functional implications of macrophage circadian rhythm disruption by low pH and potential mechanisms of this disruption. Finally, at the request of Reviewer #3, we consolidated several existing Figures and added new data, where appropriate, to existing figures, and we worked to describe new findings succinctly.

      Major comments:

      1) In Figures 3 and 4, the authors can include additional clock genes that can be run by qPCR. This was done in Figure 2 and was a nice addition to the data.

      We agree with Reviewer #3's suggestion that an analysis of clock gene expression at the mRNA level would enhance our data in Figures 3 and 4. To address this point, we have performed short time course experiments to assess circadian clock gene expression over time in BMDMs cultured with or without lactate at neutral or acidic pH (for Figure 4). In line with the difference in circadian rhythms of Per2-Luc levels between BMDMs cultured in the presence or absence of lactate, which we observed by LumiCycle analysis, we measured changes in expression of the circadian clock genes Per2, Nr1d1, and Cry1 in macrophages cultured with 25 mM sodium-L-lactate compared to those cultured with 0 mM sodium-L-lactate at pH 6.5. We have added these data, shown below, to Figure 4, and updated the manuscript accordingly to discuss these results. Please see below for the new Figure panel and modified text.

      "Lactate was also observed to alter expression of the circadian clock genes Per2, Cry1, and Nr1d1 over time in BMDMs cultured at pH 6.5, while having more subtle effects at pH 7.4 (Figure 4C). Notably, lactate blunted the effect of pH 6.5 on Cry1 expression, while enhancing the effect of low pH on Nr1d1 expression. In all, these data indicate that concentration of lactate similar to that present in the TME can influence circadian rhythms and circadian clock gene expression of macrophages."

      As an additional measure to address Reviewer #3's point about Figure 3 (peritoneal macrophages), we have compared expression of circadian clock genes in peritoneal macrophages cultured at neutral pH 7.4 or acidic pH 6.8 for 24 hours using a publicly available RNA-seq data set from Jiang 2021 J Immunol (GSE164697). In line with previous observations in macrophages cultured under acidic compared to neutral pH conditions, including the clock gene expression data from Figure 2 in BMDMs and the Per2-Luc levels observed in peritoneal macrophages in Figure 3, we found that peritoneal macrophages exhibited differences in expression of circadian clock genes when cultured at acidic pH 6.8 compared to neutral pH 7.4. We have added these data, shown below, as Figure 3B, and have updated the manuscript accordingly - please see below for the new Figure panel and modified text.

      "To test whether pH-driven changes in circadian rhythms of peritoneal macrophages were reflected at the mRNA level, we compared expression of circadian clock genes in peritoneal macrophages cultured at neutral pH 7.4 or acidic pH 6.8 for 24 hours using publicly available RNA-sequencing data [30]. In line with altered circadian rhythms observed by Lumicycle, peritoneal macrophages cultured at pH 6.8 expressed different levels of circadian clock genes than peritoneal macrophages culture at pH 7.4 (Figure 3B). The trends in changes of gene expression in peritoneal macrophages cultured at pH 6.8 matched what we observed in BMDMs, where low pH generally led to higher levels of circadian clock gene expression (Figure 2D-F). These data support our observations by LumiCycle and indicate that acidic pH drives transcriptional changes in multiple components of the circadian clock. In all, these data are evidence that pH-dependent changes in circadian rhythms are relevant to in vivo-differentiated macrophages."

      We have also updated the Methods section accordingly:

      "FASTQ files from a previously published analysis of peritoneal macrophages cultured under neutral pH 7.4 or acidic pH 6.8 conditions were downloaded from NCBI GEO (accession #GSE164697) [30]."

      2) There are far too many figures with minimal data in each. Please consolidate the figures. For example, Figures 1-3 can be fully combined, Figures 4-6 can be combined, and Figures 7-8 can be combined. Additionally, it is unclear if Figure 5 needs to be in the main, it can be moved to the supplement.

      We appreciate the preference of Reviewer #3 to see some of the figures consolidated. We have combined Figures 5 and 6 into a single new Figure 5. Additionally, we have added new data from revisions to current figures to increase the amount of data in each figure and minimize the number of new figures generated. In all, despite the large amount of new data added to the paper in response to Reviewer comments and suggestions (including additional data in Figure 4 and new Figures 6 and 8), our manuscript now contains 10 main Figures, only one more than the initial submission.

      3) The observation that conditions like pH and lactate alter macrophage phenotype and rhythmicity are important. However, macrophage phenotype via gene expression does not always correlate to function. It is important for authors to demonstrate the effect of pH or lactate on macrophage function. This can be done using co-culture assays with cancer cells. If these experiments cannot be performed, it is suggested that authors discuss these limitations and consideration in the discussion.

      Reviewer #3 correctly points out that changes in phenotype do not always correlate with changes in function. Others have shown that acidic pH and lactate can each alter macrophage phenotype, and also alter macrophage function and the ability to promote tumor growth (please see El-Kenawi 2019 Br J Cancer, Jiang 2021 J Immunol, Colegio 2014 Nature). How changes in rhythms influence macrophage function remains unknown, and we agree with Reviewer #3 that this is an important future direction. We have added a section in the Discussion to address this important future direction. Please see the text below.

      "Considering our observations that conditions associated with the TME can alter circadian rhythms in macrophages, it becomes increasingly important to understand the relevance of macrophage rhythms to their function in tumors. It has been shown that acidic pH and lactate can each drive functional polarization of macrophages toward a phenotype that promotes tumor growth, with acidic pH modulating phagocytosis and suppressing inflammatory cytokine secretion and cytotoxicity[28, 30, 93]. However, how the changes in circadian rhythms of macrophages driven by these conditions contributes to their altered function remains unknown. Current evidence suggests that circadian rhythms confer a time-of-day-dependency on macrophage function by gating the macrophage response to inflammatory stimuli based on time-of-day. As such, responses to inflammatory stimuli such as LPS or bacteria are heightened during the active phase while the inflammatory response is suppressed during the inactive phase. An important future direction will be to determine how changes in circadian rhythms of macrophages, such as those observed under acidic pH or high lactate, influences the circadian gating of their function."

      4) On line 119-122, authors describe a method for polarization of macrophages. They then reference one gene to confirm each macrophage polarization state. To more definitively corroborate proper macrophage polarization, authors should perform qPCR for additional target genes that are associated with each phenotype. For example, Socs3, CD68, or CD80 for M1, and CD163 or VEGF for M2. Alternatively, the authors should cite previous literature validating this in vitro polarization model.

      We appreciate Reviewer #3's suggestion to better validate the phenotypic identity of our polarization models with additional canonical markers. To address this point, we have expanded our panel using transcriptional markers commonly used in the murine polarization model for M1 macrophages such as Tnfa, Il6, and Il1b. As discussed in the response to Reviewer #1's minor point 5 and Reviewer #2's major point 2, we have also expanded our panel to include additional markers for M2 such as Vegf, Retnla, Ym1, Mgl1, and CD206. We have added these new data to Supplementary Figure 1. Finally, we have added additional citations for the in vitro polarization models. Please see the modified text and new data below.

      "As macrophages are a phenotypically heterogeneous population in the TME, we first sought to understand whether diversity in macrophage phenotype could translate to diversity in circadian rhythms of macrophages. To this end, we used two well-established in vitro polarization models to study distinct macrophage phenotypes[5, 60-63]. For a model of pro-inflammatory macrophages, we stimulated macrophages with IFNγ (interferon γ) and LPS (lipopolysaccharide) to elicit a pro-inflammatory phenotype[60, 64]. These macrophages are often referred to as 'M1' and are broadly viewed as anti-tumorigenic, and we will refer to them throughout this paper as pro-inflammatory macrophages[65, 66]. For a model at the opposite end of the phenotypic spectrum, we stimulated macrophages with IL-4 and IL-13[60, 67]. While these type 2 stimuli play a role in the response to parasites and allergy, they are also major drivers of wound healing; in line with this, IL-4 and IL-13-stimulated macrophages have been well-characterized to adopt gene expression profiles associated with wound-healing and anti-inflammatory macrophage phenotypes[68-71]. As such, these macrophages are often used as a model to study pro-tumorigenic macrophages in vitro and are often referred to as 'M2' macrophages; throughout this paper, we will refer to IL-4 and IL-13-stimulated macrophages as pro-resolution macrophages[66, 72, 73]. Consistent with previous studies, we found that genes associated with anti-inflammatory and pro-resolution programming characteristic of IL-4 and IL-13-stimulated macrophages such as Arg1, Retnla, Chil3 (Ym1), Clec10a (MGL1), and Mrc1 (CD206) were induced in IL-4 and IL-13-stimulated macrophages, but not IFNγ and LPS-stimulated macrophages. 
In contrast, genes associated with pro-inflammatory activity characteristic of IFNγ and LPS-stimulated macrophages such as Nos2 (iNOS), Tnfa, Il1b, and Il6 were induced in IFNγ and LPS-stimulated macrophages, but not IL-4 and IL-13-stimulated macrophages (Supplementary Figure 1)[28, 30, 65, 71, 74, 75]. This indicates that macrophages stimulated with IL-4 and IL-13 were polarized toward a pro-resolution phenotype, while macrophages stimulated with IFNγ and LPS were polarized toward a pro-inflammatory phenotype.

      5) Several portions of the manuscript are unnecessarily long, including the intro and discussion. Please consolidate the text. The results section is also very lengthy, please consider consolidation.

      We appreciate Reviewer #3's preference for a shorter manuscript. The revised manuscript, in response to the many Reviewer comments and requests, contains many new pieces of data, and we have taken care to describe these new data as briefly and simply as possible. In preparation for this Revision, we also removed and shortened several sections of the Results and Discussion where we felt extra explanation was not necessary. We will work with the editor of the journal to which we submit to ensure that the length of the manuscript sections complies with the journal's guidelines.

      6) The authors find that macrophage phenotype impacts rhythmicity. However, there is no mechanistic understanding of why this occurs. The authors should provide some mechanistic insight on this topic in the discussion.

      We agree with Reviewer #3 that while the mechanism by which macrophage phenotype alters rhythms remains unknown, this is an important topic of discussion. While there is some literature on how circadian rhythms modulate inflammatory response (and hints at how it may influence phenotype) in macrophages, there is very little on the converse: how phenotype may influence circadian rhythms. We address this point by expanding on our Discussion - please see the modified text below.

      "Elucidating the role of circadian rhythms in regulation of macrophage biology necessitates a better understanding of the crosstalk between phenotype and circadian rhythms. Although lactate polarizes macrophages toward a pro-resolution phenotype similar to acidic pH[30, 93], exposure to lactate had different effects on circadian rhythms - and in some cases, circadian clock gene expression - than exposure to acidic pH (Figure 4). Sensing of lactate occurs through different pathways than acid-sensing, which may contribute to the different ways in which these two stimuli modulate circadian rhythms of macrophages[111]. One previously published finding that may offer mechanistic insight into how phenotype can influence circadian rhythms is the suppression of Bmal1 by LPS-inducible miR-155[54]. It has also been observed that RORα-mediated activation of Bmal1 transcription is enhanced by PPARγ co-activation[112]. In macrophages, PPARγ expression is induced upon stimulation with IL-4 and plays a key role in alternative activation of macrophages, promoting a pro-resolution macrophage phenotype, and supporting resolution of inflammation[113-115]. Such observations prompt the question of whether there are yet-unidentified factors induced downstream of various polarizing stimuli that can modulate expression of circadian genes at the transcriptional and protein levels. Further work is required to understand the interplay between macrophage phenotype and circadian rhythms."

      7) The data presented in Figure 9 is very intriguing and arguably the strongest aspect of the paper. To strengthen the point, the authors could repeat this experiment with an additional cell model, another PDAC line or a different cancer line.

      We appreciate Reviewer #3's comment about the impact of the tumor growth data. Indeed, our finding that deletion of Bmal1 in co-injected macrophages accelerated PDAC growth has been recapitulated by others in different cancer models. This lends strength to our observations. We discuss and cite complementary work on macrophage rhythms and tumor growth in other models of cancer in the Discussion; please see below.

      "Data from our lab and others suggest that disruption of the macrophage-intrinsic circadian clock accelerates tumor growth, indicating that circadian regulation of macrophages is tumor-suppressive in models of PDAC (our work) and melanoma [109]. This agrees with complementary findings that behavioral disruption of circadian rhythms in mice (through chronic jetlag) disrupts tumor macrophage circadian rhythms and accelerates tumor growth[56]."

      Minor Comments:

      1) Data in Figure 2 are interesting and the impact on circadian rhythms is clear based on changes in amplitude and period. However, though the impact on period and amplitude is clear from Figures 2A-C, the changes in circadian gene expression are less clear. For instance, though amplitude is up in 2B, amplitude is suppressed in 2C. However, that does not appear to be reflected in the gene expression data in Figures 2E and F. The authors should comment on this.

      Reviewer #3 correctly points out that there appear to be discrepancies between the LumiCycle data in Figure 2 and the circadian gene expression data in Figure 2. This discrepancy is perhaps unsurprising given that the gene expression data cover only a short time course of 12 hours, while the LumiCycle data are collected over a course of 3 days. The gene expression data do not allow us to determine changes in period or rhythm. Another point of interest is that circadian regulation has been shown to occur on many different levels (transcriptional, post-transcriptional, translational, post-translational). As a result of this, circadian patterns observed in gene transcripts don't always match those of their encoded proteins; just the same, circadian patterns of proteins aren't always reflected in their encoding gene transcripts (Collins 2021 Genome Res). Due to this multi-level regulation, we propose that the results of the LumiCycle analysis, which measures PER2-Luc levels, are a more robust readout of rhythms because they are further downstream of the molecular clock than transcriptional readouts. That said, observing changes at both the protein (by LumiCycle) and transcriptional level confirms that all components of the clock are altered by acidic pH, even if the way in which they are altered appears to differ. We have incorporated the points we raised above into the Results section.

      Please see the modified text below.

      "Low pH was also observed to alter the expression of the circadian clock genes Per2, Cry1, and Nr1d1 (REV-ERBα) over time across different macrophage phenotypes, confirming that multiple components of the circadian clock are altered by acidic pH (Figure 2D-F). Notably, the patterns in expression of circadian genes did not always match the patterns of PER2-Luc levels observed by LumiCycle. This is perhaps unsurprising, as circadian rhythms are regulated at multiple levels (transcriptional, post-transcriptional, translational, post-translational); as a result, circadian patterns observed in circadian proteins such as PER2-Luc do not always match those of their gene transcripts[77]."

      2) On line 156-158, authors describe damping rate. I believe the authors are trying to say that damping rate increases as the time it takes cells to desynchronize decreases and vice versa. However, this point needs to be better explained.

      We thank Reviewer #3 for bringing to our attention that this was not communicated clearly in the text. We have adjusted our explanation to be clearer. Please see the modified text below.

      "Damping of rhythms in most free-running cell populations (defined as populations cultured in the absence of external synchronizing stimuli) occurs naturally as the circadian clocks of individual cells in the population become desynchronized from each other; thus, damping can be indicative of desynchrony within a population[84]. The damping rate increases as the time it takes for rhythms to dissipate decreases; conversely, as damping rate decreases as the time it takes for rhythms to dissipate increases."

      3) Data presented in Figures 3 and 4 are different in terms of the impact of changing the pH. The source of the macrophages is different, but the authors could clarify this further.

      We thank Reviewer #3 for this comment. Our conclusion is that the impact of low pH is largely similar in Figure 3 (peritoneal macrophages) and Figure 4 (BMDMs exposed to low pH and lactate). In both Figures 3 and 4, exposure to acidic pH by culturing macrophages at pH 6.5 increased amplitude, decreased period, and increased damping rate compared to macrophages cultured at neutral pH 7.4.

      4) For heatmaps shown in Figures 7 and 8, please calculate covariance and display asterisks where P [...]

      We thank Reviewer #3 for the excellent suggestion to use an additional approach to assess circadian clock status in samples by measuring co-variance in the circadian clock gene network. To address this point, we have performed weighted gene co-expression network analysis (WGCNA) to calculate covariance, as was originally performed in Chun and Fortin et al Science Advances 2022. For the samples analyzed in Figure 7 (now Figure 6), we have added these data to the figure. We have applied this analysis to a new set of human data that we analyzed and added it to the new Figure 7. Finally, for the samples analyzed in Figure 8, we have added these data as a new Supplementary Figure 9. Please see the data and modified text below.

      Figure 6

      "Weighted gene co-expression network analysis (WGCNA) has been used as an alternate approach to measure the co-variance between clock genes and thus assess bi-directional correlations among the core clock gene network in healthy tissue and tumor samples [103]. In line with the circadian disorder observed by CCD, while many bi-directional correlations among the core clock gene network were significant and apparent in wild type peritoneal macrophages, these relationships were altered or abolished within BMAL1 KO peritoneal macrophages and TAM samples, and in some cases replaced by new relationships (Figure 6E). This indicates that there is population-level disorder in the circadian rhythms of tumor-associated macrophages in murine lung cancer."

      Figure 7

      "We next assessed the status of the circadian clock in human TAMs from NSCLC patients. We performed CCD with publicly available RNA-seq data of tumor-adjacent macrophages and tumor-associated macrophages from NSCLC patients, using alveolar macrophages from healthy donors as a control[104, 105]. To assess the contribution of the acidic TME to circadian disorder, we subset TAM NSCLC patient samples into groups (Crem high TAMs and Crem low TAMs) based on median Crem expression. Notably, in macrophages from human NSCLC there was a trend toward disorder in Crem low but not Crem high TAM samples (Figure 7A,B). Additionally, the co-variance among core clock genes observed in alveolar macrophages from healthy donors was absent within Crem low and Crem high TAM samples (Figure 7C). In all, these data indicate that there is population-level disorder in the circadian rhythms of tumor-associated macrophages in humans and mice, suggesting that circadian rhythms are indeed altered in macrophages within the TME."

      Supplementary Figure 9

      "CCD score worsened as populations became increasingly desynchronized, with the 12hr desynchronized population having a significantly worse CCD score than synchronized, homogenous macrophage population (Figure 8C). This indicates that as circadian rhythms of individual macrophages within a population become more different from each other, circadian disorder increases at the population-level. This is further supported by WGCNA, which revealed that the significant co-variance of circadian clock genes in the synchronized population was progressively altered and lost as the population is increasing desynchronized to 12 hours (Supplementary Figure 9)."

      Reviewer #3 (Significance (Required)):

      This is an important study that is well-done. It is the feeling of the reviewer that the study warrants a revision, at the discretion of the editor. The study represents an important understanding of the crosstalk between cancer cells and immune cells as well as the understanding of how the TME disrupts circadian rhythms.

      We thank Reviewer #3 for their comments regarding the impact and significance of our work. As shown by the comments above, we are confident we have fully addressed each of the points that were made to result in a much-improved revised manuscript.




    1. Author response:

      The following is the authors’ response to the original reviews.

      Editors’ recommendations for the authors

      The reviewers recommend the following: 

      (a) Digging deeper into the discussion of the density-dependent dispersal. 

      (b) Clarifying the microfluidic setup.  

      (c) Clarifying the description and interpretation of the transcriptomic evidence. 

      (d) Toning down carbon cycle connections (some reviewers felt the evidence did not fully support the claims). 

      We would like to thank the editors for their thoughtful evaluation of our manuscript and their clear suggestions. We have revised the manuscript in the light of these comments, as we outline below and address in detail in the point-by-point response to the reviewers’ comments that follows. 

      (a) We have expanded the discussion of density-dependent dispersal and revised Figure 2C to improve clarity. 

      (b) We have also added further information concerning the microfluidic setup in the results section and provide an illustration of the setup in a new figure panel, Figure 1A.

      (c) Addressing the reviewers’ comments on the transcriptomic analysis, we have added more information in the description and interpretation of the results. 

      (d) We have rephrased the text describing the role of degradation-dispersal cycles for carbon cycling to highlight it as the motivation of this study and emphasize the link to literature on foraging, without creating expectations of direct measurements of global carbon cycling.

      Public Reviews:

      Reviewer #1 (Public Review):

      [...]

      Weaknesses: 

      Much of the genetic analysis, as it stands, is quite speculative and descriptive. I found myself confused about many of the genes (e.g., quorum sensing) that pop up enriched during dispersal quite in contrast to my expectations. While the authors do mention some of this in the text as worth following up on, I think the analysis as it stands adds little insight into the behaviors studied. However, I acknowledge that it might have the potential to generate hypotheses and thus aid future studies. Further, I found the connections to the carbon cycle and marine environments in the abstract weak --- the microfluidics setup by the authors is nice, but it provides limited insight into naturalistic environments where the spatial distribution and dimensionality of resources are expected to be qualitatively different. 

      We thank the reviewer for their suggestions to improve our manuscript. We agree that the original manuscript would have benefitted from more detailed interpretation of the observed changes in gene expression. We have revised the manuscript to elaborate on the interpretation of the changes in expression of quorum sensing genes (see response to reviewer 1, comment 3), motility genes (see response to reviewer 1, comment 6), alginate lyase genes (see response to reviewer 1, comment 7 and reviewer 2, comment 2), and ribosomal and transporter genes (see response to reviewer 2, comment 2).

      In general, we think that the gene expression study not only supports the phenotypic observations that we made in the microfluidic device, such as the increased swimming motility when exposed to digested alginate medium, but also adds further insights. We studied the transcriptomes in well-mixed batch cultures because our microfluidic setup does not allow us to measure the gene expression dynamics that would support the phenotypic observations about differential motility and chemotaxis. The transcriptomic data clearly show that even in well-mixed environments, growth on digested alginate instead of alginate is sufficient to increase the expression of motility and chemotaxis genes. In addition, the finding that expression of alginate lyases and metabolic genes is increased during growth on digested alginate was revealed through the analysis of transcriptomes, something which would not have been possible in the microfluidic setup. We agree with the reviewer that our analyses implicate further, perhaps unexpected, mechanisms like quorum sensing in the cellular response to breakdown products, and that this represents an interesting avenue for further studies.

      Finally, we also agree with the reviewer that it would be good to be more explicit in the text that our microfluidic system cannot fully capture the complex dynamics of natural environments. Our approach does, however, allow the characterization of cellular behaviors at spatial and temporal scales that are relevant to the interactions of bacteria, and thus provides a better understanding of colonization and dispersal of marine bacteria in a manner that is not possible through in situ experiments. We have edited our manuscript to highlight this and modified our statements regarding carbon cycling to emphasize the role of degradation-dispersal cycles in the remineralization of polysaccharides (see response to reviewer 1, comment 2).

      Reviewer #2 (Public Review):

      [...]

      Weaknesses: 

      The explanation of the microfluidics measurements is somewhat confusing but I think this could be easily remedied. The quantitative interpretation of the dispersal data could also be improved and I'm not clear if the data support the claim made. 

      We thank the reviewer for their comments and helpful suggestions. We have revised the manuscript with these suggestions in mind and believe that the manuscript is improved by a more detailed explanation of the microfluidic setup. We have added more information in the text (detailed in response to reviewer 2, comments 1 and 2) and have added a depiction of the microfluidic setup (Fig. 1A). We have also modified the presentation and discussion of the dispersal data (Fig. 2C), as described in detail below in response to reviewer 2, comment 4, and argue that they clearly show density-dependent dispersal. We believe that this modification of how the results are presented provides a more convincing case for our main conclusion, namely that the presence of degradation products controls bacterial dispersal in a density-dependent manner.  

      Reviewer #3 (Public Review):

      [...]

      Weaknesses: 

      I find this paper very descriptive and speculative. The results of the genetic analyses are quite counterintuitive; therefore, I understand the difficulty of connecting them to the observations coming from experiments in the microfluidic device. However, they could be better placed in the literature of foraging - dispersal cycles, beyond bacteria. In addition, the interpretation of the results is sometimes confusing. 

      We thank the reviewer for their suggestions to improve the manuscript. We have edited the manuscript to interpret the results of this study more clearly, in particular with regard to the fact that breakdown products of alginate cause cell dispersal (see response to reviewer 2, comment 1), gene expression changes of ribosomal proteins and transporters (see response to reviewer 2, comment 2), as well as genes relating to alginate catabolism (see response to reviewer 2, comment 3).

      To provide more context for the interpretation of our results we now also embed our findings in more detail in the previous work on foraging strategies and dispersal tradeoffs.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should clarify in more detail what they mean by density dependence in Figure 2. Usually density dependence refers to a per capita dependence, but here it seems that the per capita rate of dispersal might be roughly independent of density (Figure 2c; if you double the number of cells it doubles the number of cells leaving). Rather it seems the dispersal is such that the density of remaining cells falls below a threshold (~300 cells). 

      We thank the reviewer for raising this important point. To analyze the data more explicitly in terms of per capita dependence and so make the density dependence in the dispersal from the microfluidic chambers more clear, we have modified Figure 2C and edited the text. 

      In the modified Figure 2C, we computed the fraction of dispersed cells for each chamber (i.e., the change in cell number divided by the cell number at the time of the nutrient switch). This quantity directly reveals the per-capita dependence, as mentioned by reviewer 1, and is now represented on the y-axis of Figure 2C instead of the absolute change in cell number. 

      These data demonstrate that the fraction of dispersed cells increases with increasing numbers of cells present in the chamber at the time of switching, with more highly populated chambers showing a higher fraction of dispersed cells. These findings indicate that there is a strong density dependence in the dispersal process.

      As pointed out by reviewer 1, another interesting aspect of the data is the transition at low cell number. The fraction of dispersed cells is negative in the case of the chamber with approximately 70 cells, consistent with no dispersal at this low density, and a moderate density increase as a function of continued growth.  
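
      The per-chamber quantity described above can be sketched in a few lines of Python. This is only an illustration: the function name, the sign convention (positive values mean net loss of cells), and the cell counts are assumptions chosen for the example, not data or code from the study.

```python
def dispersed_fraction(n_at_switch: int, n_after: int) -> float:
    """Fraction of cells lost from a chamber after the nutrient switch.

    Assumed sign convention: positive values indicate net dispersal,
    negative values indicate that the population grew instead.
    """
    if n_at_switch <= 0:
        raise ValueError("chamber must contain cells at the time of the switch")
    return (n_at_switch - n_after) / n_at_switch


# Hypothetical (cells at switch, cells after switch) pairs, one per chamber.
chambers = [(70, 80), (300, 200), (800, 250)]
fractions = [dispersed_fraction(before, after) for before, after in chambers]

for (before, _), frac in zip(chambers, fractions):
    print(f"{before:4d} cells at switch -> dispersed fraction {frac:+.2f}")
```

      In this made-up example the sparsely populated chamber has a negative fraction (continued growth, no net dispersal), whereas the fraction rises with the starting cell number in the denser chambers, which is the pattern that a per-capita density dependence would produce.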

      In addition to the new analysis presented in Figure 2C, we have modified the paragraph that discusses this result as follows (line 208):

      “We indeed found that the nutrient switch caused a few or no cells to disperse from small cell groups (Fig. 2B), whereas a large fraction of cells from large cell groups dispersed (Fig. 2C). In fact, the fraction of cells that dispersed upon imposition of the nutrient switch showed a strong positive relationship with the number of cells present, meaning that cells in chambers with many cells were more likely to disperse than cells in chambers with fewer cells (Fig. 2C).”

      (2) The authors should tone down their claims about the carbon cycle in the abstract. I do not believe the results as they stand could be used to understand degradation-dispersal cycles in marine environments relevant to the carbon cycle, since these behaviors have been studied in microfluidic environments which in my understanding are quite different. As such, statements such as "degradation-dispersal cycles are an integral part in the global carbon cycle, we know little about how cells alternate between degradation and motility" and "Overall, our findings reveal the cellular mechanisms underlying bacterial degradation-dispersal cycles that drive remineralization in natural environments" are overstated in the abstract. 

      We appreciate the reviewer’s comments regarding the connections of our work with the carbon cycle. We have now rephrased these statements in our manuscript to describe a potential connection between our work and the marine carbon cycle. The colonization of polysaccharide particles by bacteria and subsequent degradation has been widely acknowledged to play a significant role in controlling the carbon flow in marine ecosystems (Fenchel, 2002; Preheim et al., 2011; Yawata et al., 2014, 2020). We still refer to carbon flow in the revised manuscript, though cautiously, as microbial remineralization of biomass, which is recognized as an important factor in the marine biological carbon pump (e.g., Chisholm, 2000; Jiao et al., 2024). As stated in the previous version of the manuscript, the main motivation of our work was to study the growth behaviors of marine heterotrophic bacteria during polysaccharide degradation, especially to understand when bacteria depart already colonized and degraded particles and find novel patches to grow and degrade, a process that is poorly understood. Therefore, it is conceivable that degradation-dispersal cycles do play a role in the flow of carbon in marine ecosystems. However, we acknowledge that the carbon cycle is influenced by a multitude of biological and chemical processes, and the bacterial degradation-dispersal cycle might not be the sole mechanism at play. 

      We also appreciate the reviewer’s comments highlighting that the complexity of natural environments is not fully captured in our microfluidics system. However, our microfluidics setup does allow us to quantify responses and behaviors of microbial groups at high spatial and temporal resolution, especially in the context of environmental fluctuations. Microbes in nature interact at small spatial scales and have to respond to changes in the environment, and the microfluidics setup enables the quantification of these responses. Moreover, dispersal of the bacterium V. cyclitrophicus that we use in our study has previously been observed even during growth on particulate alginate (Alcolombri et al., 2021), but the cues and regulation controlling dispersal behaviors have been unclear. Microfluidic experiments have now allowed us to study this process in a highly quantitative manner, and the results align well with observations from experiments in more nature-like settings. These quantitative experiments on bacterial strains isolated from marine particles are expected to constrain quantitative models of carbon degradation in the ocean (Nguyen et al., 2022).

      We have now adjusted our statements throughout our manuscript to reflect the knowledge gaps in understanding the triggers of degradation-dispersal cycles and their links with carbon flow in marine ecosystems. The revised manuscript, especially, contains the following statements (line 47 and line 60):

      “Even though many studies indicate that these degradation-dispersal cycles contribute to the carbon flow in marine systems, we know little about how cells alternate between polysaccharide degradation and motility, and which environmental factors trigger this behavioral switch.”

      “Overall, our findings reveal cellular mechanisms that might also underlie bacterial degradation-dispersal cycles, which influence the remineralization of biomass in marine environments.”

      (3) The authors should clarify why they think quorum-sensing genes are increased in expression on digested alginate. The authors currently mention that QS could be used to trigger dispersal, but given the timescales of dispersal in Figure 2 (~half an hour), I find it hard to believe that these genes are expressed and have the suggested effect on those timescales. As such I would have expected the other way round - for QS genes to be expressed highly during alginate growth, so that density could be sensed and responded to. Please clarify. 

      We have now clarified this point in the revised manuscript. While the triggering of dispersal by quorum-sensing genes may indeed appear counterintuitive, and the response is rapid (we see dispersal of cells within 30-40 minutes), both observations are in line with previous studies in another model organism, Vibrio cholerae. The dispersal time is similar to the dispersal time of V. cholerae cells from biofilms, as described by Singh and colleagues (Figure 1E of Ref. Singh et al., 2017). In that case, induction of the quorum sensing dispersal regulator HapR was observed during biofilm dispersal within one hour after the switch of conditions (Fig. 2, middle panel of Ref. Singh et al., 2017). Even though the specific quorum sensing signaling molecules are probably different in our strain (there is no annotated homolog of the hapR gene in V. cyclitrophicus), we observed that the full set of quorum sensing genes was enriched in cells growing on digested alginate (as reported in line 314 and Fig. 4A).

      We have added this information in the manuscript (line 317): 

      “The set of quorum sensing genes was also positively enriched in cells growing on digested alginate (Fig. 4A and S4F, Table S13). This role in dispersal is in agreement with a previous study that showed induction of the quorum sensing master regulator in V. cholerae cells during dispersal from biofilms on a similar time scale as here (less than an hour) [28].”

      Reviewer #2 (Recommendations For The Authors):

      (1) Around line 144 - I don't really understand how you flow alginate through the microfluidic platform. It seems if the particles are transiently going through the microfluidic chamber then the flow rate and hence residence time of the alginate particles will matter a lot by controlling the time the cells have to colonize and excrete enzymes for alginate breakdown. Or perhaps the alginate is not particulate but is instead a large but soluble polymer? I think maybe a schematic of the microfluidic device would help -- there is an implicit assumption that we are familiar with the Dal Co et al device, but I don't recall its details and maybe a graphic added to Figure 1 would help. 

      a. In reviewing the Dal Co paper I see that cells are trapped and the medium flows through channels and the plane where the cells are held. I am still a little confused about the size of the polymeric alginate -- large scale (>1um) particles or very small polymers? 

      We have now provided a detailed description of our microfluidic experimental system. At the start of the experiments, cells are in fact not trapped within the microfluidic device, but grow and can move freely within a chamber designed with dimensions (sub-micron heights) so that growth occurs only as a monolayer. Cells were exposed to nutrients, either alginate or alginate digestion products, both in soluble form (not particles). These compounds were flowed into the device through a main channel, but entered the flow-free growth chambers by diffusion. To make these aspects of our experiments clearer, we have added further information on this in the Materials & Methods section (line 556), in the abstract (line 51), and in the results (line 123).

      To make our microfluidic setup clearer, we have followed this advice and added a schematic as Figure 1A and have added more information on the setup to the main text (line 153):

      “In brief, the microfluidic chips are made of an inert polymer (polydimethylsiloxane, PDMS) bound to a glass coverslip. The PDMS layer contains flow channels through which the culture medium is pumped continuously. Each channel is connected to several growth chambers that are laterally positioned. The dimensions of these growth chambers (height: 0.85 µm, length: 60 µm, width: 90-120 µm) allow cells to freely move and grow as monolayers. The culture medium, containing either alginate or digested alginate in their soluble form, is constantly pumped through the flow channel and enters the growth chambers primarily through diffusion [15,16,4,17,8]. Therefore, the number of cells and their positioning within microfluidic chambers is determined by the cellular growth rate as well as by cell movement [4]. This setup combined with time-lapse microscopy allowed us to follow the development of cell communities over time.”

      (2) What makes this confusing is the difference between Figure 1C and Figure S2A -- the authors state that the difference in Figure 1C is due to dispersal, but is there flow through the microfluidic device? So what role does that flow through the device have in dispersal? Is the adhesion of the cell groups driven at all by a physical interaction with high molecular weight polymers in the microfluidic devices or is this purely a biological effect? Could this also be explained by different real concentrations of nutrients in the two cases? 

      We realize from this comment that the role of flow of the medium in the microfluidic setup was not clearly addressed in our manuscript. In fact, cells were not exposed to flow, and nutrients were provided to the growth chambers by diffusion. We have added a clearer explanation of this point on line 158:

      “The culture medium, containing either alginate or digested alginate in their soluble form, is constantly pumped through the flow channel and enters the growth chambers primarily through diffusion [15,16,4,17,8]. Therefore, the number of cells and their positioning within microfluidic chambers is determined by the cellular growth rate as well as by cell movement [4].”

      One purely physical effect that we anticipate is that a high viscosity of the medium could immobilize cells. To address this point, we measured the viscosity of both alginate and digested alginate and conclude that the increase in viscosity is not strong enough to immobilize cells. We added a statement in the text (line 170):

      “To test the role of increased viscosity of polymeric alginate in causing the increased aggregation of cells, we measured the viscosity of 0.1% (w/v) alginate or digested alginate dissolved in TR media. For alginate, the viscosity was 1.03±0.01 mPa·s (mean and standard deviation of three technical replicates) whereas the viscosity of digested alginate in TR media was found to be 0.74±0.01 mPa·s. Both these values are relatively close to the viscosity of water at this temperature (0.89 mPa·s [18]) and, while they may affect swimming behavior [19], they are insufficient to physically restrain cell movement [20].”

      as well as a section in the Materials and Methods (line 594):

      “Viscosity of the alginate and digested alginate solution

      We measured the viscosity of alginate solutions using shear rheology measurements. We used a 40 mm cone-plate geometry (4° cone) in a Netzsch Kinexus Pro+ rheometer. 1200 µL of sample was placed on the bottom plate, the gap was set at 150 µm and the sample trimmed. We used a solvent trap to avoid sample evaporation during measurement. The temperature was set to 25 °C using a Peltier element. We measured the dynamic viscosity over a range of shear rates of 0.1 – 100 s⁻¹. We report the viscosity of each solution as the average viscosity measured over the shear rates 10 – 100 s⁻¹, where the shear-dependence of the viscosity was low.

      We measured the viscosity of 0.1% (w/v) alginate dissolved in TR media, which was 1.03±0.01 mPa·s (reporting the mean and standard deviation of three technical replicates). The viscosity of 0.1% digested alginate in TR media was found to be 0.74±0.01 mPa·s. This means that the viscosity of alginate in our microfluidic experiments is 36% higher than that of digested alginate, but the viscosities are close to those expected of water (0.89 mPa·s at 25 °C according to Berstad and colleagues [18]).”
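
      The averaging step described in the methods text above (mean viscosity over the 10 – 100 s⁻¹ window, then mean and standard deviation across technical replicates) could be sketched as follows. The sweep values, the function name, and the use of the sample standard deviation are assumptions made for illustration; they are not the measured data or the authors' analysis code.

```python
from statistics import mean, stdev


def plateau_viscosity(sweep, lo=10.0, hi=100.0):
    """Mean viscosity (mPa·s) over shear rates within [lo, hi] (1/s).

    `sweep` is a list of (shear_rate, viscosity) pairs from one replicate.
    """
    selected = [eta for rate, eta in sweep if lo <= rate <= hi]
    if not selected:
        raise ValueError("no measurements inside the requested shear-rate window")
    return mean(selected)


# Hypothetical shear-rate sweeps for three technical replicates.
replicates = [
    [(0.1, 1.40), (1.0, 1.15), (10.0, 1.04), (31.6, 1.03), (100.0, 1.02)],
    [(0.1, 1.38), (1.0, 1.12), (10.0, 1.05), (31.6, 1.04), (100.0, 1.03)],
    [(0.1, 1.42), (1.0, 1.16), (10.0, 1.03), (31.6, 1.02), (100.0, 1.01)],
]

per_replicate = [plateau_viscosity(sweep) for sweep in replicates]
# Sample standard deviation is assumed here; the text does not say which
# estimator was used.
print(f"viscosity = {mean(per_replicate):.2f} ± {stdev(per_replicate):.2f} mPa·s")
```

      Restricting the average to the upper decade of shear rates discards the low-shear points, where the readings in this made-up sweep still depend on shear rate.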

      While our microfluidic setup allows us to track the position and movement of cells in a spatially structured setting, these observations do not allow us to distinguish directly whether the differences in dispersal are a result of purely physical effects of polymers on cells or are a result of them triggering a biological response in cells that causes them to become sessile. It is known that bacterial appendages like pili interact with polysaccharide residues (Li et al., 2003). Therefore, it is quite plausible that cross-linking by polysaccharides can contribute to growth behaviors on alginate. However, our analysis of gene expression demonstrates that flagellum-driven motility is decreased in the presence of alginate compared to digested alginate, alongside other major changes in gene expression. In addition, our measures of dispersal show that dispersal of cells when exposed to digested alginate is density dependent. Both observations suggest that the patterns in dispersal are governed by decision-making processes by cells resulting in changes in cell motility, rather than being a product of purely physical interactions with the polymer. 

      The finding that the viscosities of both alginate and digested alginate are similar to that of water suggests that diffusion of nutrients in the growth chambers should be similar. Therefore, we think that differences in the real concentrations of nutrients are unlikely to contribute to the observed differences in behavior. 

      (3) Why is Figure S1 arbitrary units? Does this have to do with the calibration of LC-MS? It would be better, it seems, to know the concentrations in real units of the monomer at least. 

      We agree with the reviewer that it would have been better to have absolute concentrations for these compounds. However, to calibrate the mass spectrometer signals (ion counts) to absolute concentrations for the different alginate compounds, we would need an analytical standard of known concentration. We are not aware of such a standard and thus report only relative concentrations. We agree that the y-axis label of Figure S1 should not contain ‘arbitrary’ units, as it shows a ratio (of measurements in the same arbitrary units). We have edited the labels of Figure S1 accordingly and the figure legend in line 26 of the Supplemental Material (“Relative concentrations…”).

      (4) Line 188 - density-dependent dispersal. The claim here is that "cells in chambers with many cells were more likely to disperse than cells in chambers with less cells." (my emphasis). Looking at the data in Figure 2C it appears that about 40% of the cells disperse irrespective of the density, before the switch to digested alginate. So it would seem that there is not a higher likelihood of dispersal at higher cell densities. For the very highest cell density, it does appear that this fraction is larger, but I'd be concerned about making this claim from what I understand to be a single experiment. To support the claim made should the authors plot Change in Cell number/Starting Cell number on the y-axis of Fig. 2C to show that the fraction is increasing? It would seem some additional data at higher starting cell densities would help support this claim more strongly. 

      We thank the reviewer for this comment, which is in line with a remark made by reviewer 1 in their comment 1. In response to these two comments (and as described above), we have edited Figure 2C and now plot the change in cell number relative to the starting cell number on the y-axis to directly show the density dependence. We observe a positive (approximately linear) relationship between the fraction of dispersed cells and the number of cells present in the chamber at the time of switching. This indicates that there is a density dependence in the dispersal process, with highly populated chambers showing a higher fraction of dispersed cells. 

      In addition to the change in Figure 2C, we have modified the paragraph around line 208: “We indeed found that the nutrient switch caused a few or no cells to disperse from small cell groups (Fig. 2B), whereas a large fraction of cells from large cell groups dispersed (Fig. 2C). In fact, the fraction of cells that dispersed upon imposition of the nutrient switch showed a strong positive relationship with the number of cells present, meaning that cells in chambers with many cells were more likely to disperse than cells in chambers with fewer cells (Fig. 2C).”

      The highest cell number at the time of the switch that we include is about 800 cells. The maximum number of cells that can fit into a chamber is ca. 1000. Thus, 800 resident cells are close to the maximal density.

      (5) A comment -- I find the result of significant chemotaxis towards alginate but not the monomers of alginate to be quite surprising. The ecological relevance of this (line 219) seems like an important result that is worth expanding on a bit at least in the discussion. For now, my question is whether the authors know of any mechanism by which chemotaxis receptors could respond to alginate but not the monomer. How can a receptor distinguish between the two? 

      We agree that this result is surprising, given that oligomers can be more easily transported into the periplasm where sensing takes place, and they also provide a more easily accessible nutrient source. Indeed, in the case of the insoluble polymer chitin it has been shown that chemotaxis towards chitin is mediated by chitin oligomers (Bassler et al., 1991), which was suggested as a general motif to locate polysaccharide nutrient sources (Keegstra et al., 2022). However, a recent study has changed this perspective by showing widespread chemotaxis of marine bacteria towards the glucose-based marine polysaccharide laminarin, but not towards laminarin oligomers or glucose (Clerc et al., 2023). Together with our results on chemotaxis towards alginate (but not significantly towards alginate oligomers), this suggests that chemotaxis towards soluble polysaccharides can be mediated by direct sensing of the polysaccharide molecules.

      As recommended, we expanded the discussion of the ecological relevance and also added more information on possible mechanisms of selective sensing of alginate and its breakdown products (around line 479):

      “Direct chemotaxis towards polysaccharides may facilitate the search for new polysaccharide sources after dispersal. We found that the presence of degradation products not only induces cell dispersal but also increases the expression of chemotaxis genes. Interestingly, we found that V. cyclitrophicus ZF270 cells show chemotaxis towards polymeric alginate but not digested alginate. This contrasts with previous findings for bacterial strains degrading the insoluble marine polysaccharide chitin, where chemotaxis was strongest towards chitin oligomers [53], suggesting that oligomers may act as an environmental cue for polysaccharide nutrient sources [55]. However, recent work has shown that certain marine bacteria are attracted to the marine polysaccharide laminarin, and not laminarin oligomers [56]. Together with our results, this indicates that chemotaxis towards soluble polysaccharides may be mediated by the polysaccharide molecules themselves. The mechanism of this behavior is yet to be identified, but could be mediated by polysaccharide-binding proteins as have been found in Sphingomonas sp. A1 facilitating chemotaxis towards pectin [57]. Direct polysaccharide sensing adds complexity to chemosensing as polysaccharides cannot freely diffuse into the periplasm, which can lead to a trade-off between chemosensing and uptake [58]. Furthermore, most polysaccharides are not immediately metabolically accessible as they require degradation. But direct polysaccharide sensing can also provide certain benefits compared to using oligomers as sensory cues. First, it could enable bacterial strains to preferably navigate to polysaccharide nutrient sources that are relatively uncolonized and hence show little degradation activity. Second, strong chemotaxis towards degradation products could hinder a timely dispersal process as the dispersal then requires cells to travel against a strong attractant gradient formed by the degradation products. 
Overall, this strategy allows cells to alternate between degradation and dispersal to acquire carbon and energy in a heterogeneous world with nutrient hotspots [44,59–61].”

      (6) Comment on lines 287-8 -- that the "positive enrichment of the gene set containing bacterial motility proteins matched the increase in motile cells that we observe in Fig 3E." I'm confused about what is meant by the word "matched" here. Is the implication that there is some quantitative correspondence between increased motility in Figure 3 and the change in expression in Figure 4? Or is the statement a qualitative one -- that motility genes are upregulated in the presence of digested alginate? Table S12 didn't help me answer this question. 

      We thank the reviewer for their helpful comment. Our original statement was a qualitative one - observing that gene expression enrichment in genes associated with bacterial motility aligned with our expectations based on the previous observation of an increase in motile cells. We have now changed the wording to highlight the qualitative nature of this statement (line 315):

      “The positive enrichment of the gene set containing bacterial motility proteins aligned with our expectations based on the increase in motile cells that we observed in Figure 3E (Fig. 4A, Table S12).”

      (7) Line 326 - what is the explanation for the production of public enzymes in the presence of digest? How does this square with the previous narrative about cells growing on alginate digest expressing motility genes and chemotaxing towards alginate? It seems like the story is a bit tenuous here in the sense that digested alginates stimulate both motility - which is hypothesized to drive the discovery of new alginate particles - and lyase enzymes which are used to degrade alginate. So do the high motility cells that are chemotaxing towards alginate also express lyases en route? I'm of the opinion that constructing narratives like these in the absence of a more quantitative understanding of the colonization and degradation dynamics of alginate particles presents a major challenge and may be asking more of the data than the data can provide. 

      a. I noted later that this is addressed later around lines 393 in the Discussion section.

      Indeed, the notion that the presence of breakdown products triggers motility and also increases the expression of alginate lyases and other metabolic genes for alginate catabolism seems counterintuitive. We have now expanded our discussion of these results to contextualize these findings (around line 443):

      "One reason for this observation may be that cells primarily rely on intracellular monosaccharide levels to trigger the upregulation of genes associated with polysaccharide degradation and catabolism, as has previously been observed for E. coli across various carbon sources [50,51]. In fact, the majority of carbon sources are sensed by prokaryotes through one‑component sensors inside the cell [50]. In the one‑component internal sensing scheme, the enzymes and transporters for the use of various carbon sources are expressed at basal levels, which leads to an increase in pathway intermediates upon nutrient availability. The pathway intermediates are sensed by an internal sensor, usually a transcription factor, and lead to the upregulation of transporter and enzyme expression [50,51]. This results in a positive feedback loop, which enables small changes in substrate abundance to trigger large transcriptional responses [50,52]. Thus, the presence of alginate breakdown products likely results in increased expression of all components of the alginate degradation pathway, including the expression of degrading enzymes. As the gene expression analysis was performed on well-mixed cultures in culture medium containing alginate breakdown products, we therefore expect a strong stimulation of alginate catabolism. In a natural scenario, where cells disperse from a polysaccharide hotspot before its exhaustion, the expression of alginate catabolism genes likely decreases again once the local concentration of breakdown products decreases. However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and may thus help cells to prepare for likely future environments. 
Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients."

      (8) I like Figure 6, and I think this hypothesis is a good result from this paper, but I think it would be important to emphasize this as a proposal that needs further quantitative analysis to be supported. 

      We have now edited the manuscript to make this point clearer. While both degradation and dispersal are well-appreciated parts of microbial ecology, the transitions between them and the underlying mechanisms are unclear. We have edited the discussion to improve the clarity (line 419):

      “This cycle of biomass degradation and dispersal has long been discussed in the context of foraging e.g., [44,45,13,46,47], but the cellular mechanisms that drive the cell dispersal remain unclear.”

      Also, we have updated Figure 6 to indicate more clearly which new findings this work proposes (now bold font) and which previous findings, made in different bacterial taxa and with different carbon sources, align with our work (now light font). We edited the figure legend accordingly (line 503):

      "By integrating our results with previous studies on cooperative growth on the same system, as well as results on dispersal cycles in other systems, we highlight where the specific results of this work add to this framework (bold font)."

      Minor comments 

      (1) Is there any growth on the enzyme used for alginate digestion? E.g. is the enzyme used to digest the alginate at sufficiently high concentrations that cells could utilize it for a carbon/nitrogen source? 

      We thank the reviewer for raising this point. We added the following paragraph as Supplemental Text to address it (line 179):

      “Protein amount of the alginate lyases added to create digested alginate

      Based on the following calculation, we conclude that the amount of protein added to the growth medium by the addition of alginate lyases is so small that we consider it negligible. In our experiment we used 1 unit/ml of alginate lyases in a 4.5 ml solution to digest the alginate. As the commercially purchased alginate lyases are 10,000 units/g, our 4.5 ml solution contains 0.45 mg of alginate lyase protein. The digested alginate solution was diluted 45x when added to culture medium. This means that we added 0.18 µg alginate lyase protein to 1 ml of culture medium.

      As a comparison, for 1 ml of alginate medium, 1,000 µg of alginate is added, and for 1 ml of Lysogeny broth (LB) culture medium, 3,500 µg of LB is added. Thus, the amount of alginate lyase protein that we added is ca. 5,000 to 20,000 times smaller than the amount of alginate or LB that one would add to support cell growth. Therefore, we expect the growth that the digestion of the added alginate lyases would allow to be negligible.”
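      The two ends of this comparison can be checked with a short sketch. It assumes only the figures quoted above; the function and constant names are illustrative and not taken from the manuscript. The per-ml lyase concentration (0.18 µg) is taken as stated rather than re-derived, because the dilution step depends on volumes that are not fully spelled out in the quoted text.

```python
# Illustrative check of the quoted figures; all names here are hypothetical.

UNITS_PER_ML = 1.0                  # lyase activity used for digestion (stated)
DIGEST_VOLUME_ML = 4.5              # volume of the digestion solution (stated)
SPECIFIC_ACTIVITY_U_PER_G = 10_000  # activity of the commercial lyase (stated)

LYASE_UG_PER_ML_STATED = 0.18       # final concentration, taken as stated
ALGINATE_UG_PER_ML = 1_000.0        # alginate per ml of alginate medium (stated)
LB_UG_PER_ML = 3_500.0              # LB per ml of LB medium (stated)


def lyase_mass_mg(units_per_ml, volume_ml, units_per_g):
    """Mass of lyase protein (mg) in the digestion solution."""
    total_units = units_per_ml * volume_ml
    return total_units / units_per_g * 1000.0  # grams -> milligrams


def fold_excess(substrate_ug_per_ml, lyase_ug_per_ml):
    """How many times more substrate than lyase protein is present per ml."""
    return substrate_ug_per_ml / lyase_ug_per_ml


if __name__ == "__main__":
    mass = lyase_mass_mg(UNITS_PER_ML, DIGEST_VOLUME_ML, SPECIFIC_ACTIVITY_U_PER_G)
    print(f"lyase in digestion solution: {mass:.2f} mg")
    for name, amount in (("alginate", ALGINATE_UG_PER_ML), ("LB", LB_UG_PER_ML)):
        ratio = fold_excess(amount, LYASE_UG_PER_ML_STATED)
        print(f"{name}: about {ratio:,.0f}-fold more than lyase protein")
```

      Under these inputs the ratios fall between roughly 5,000 and 20,000, in line with the range given in the supplemental text.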

      (2) The lines in Figure 2B are very hard to see. 

      We have addressed this comment by using thicker lines in Figure 2B.

      (3) The black background and images in Figure 3A and B are hard to see as well. 

      We have now replaced Figure 3A and B, now using a white background.

      (4) Typo at the beginning of line 251? 

      Unfortunately we failed to find the typo referred to. We are happy to address it if it still exists in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) I think there is not enough experimental evidence to conclude that the underlying cause of increased motility is the accumulation of digested alginate products. To conclusively show that this is the cause and not just some signal linked to cell density, perhaps the experiment should be repeated with a different carbon source. 

      We thank the reviewer for their comment, which made us realize that we did not make the nature of the dispersal cue clear. The gene expression data were obtained from batch cultures at approximately the same bacterial densities, which indeed shows that digested alginate is a sufficient signal for an increase in motility gene expression. This agrees very well with our observation that cells growing on digested alginate in microfluidic chambers have an increased fraction of motile cells in comparison with cells exposed to alginate (Fig 3E). However, we did not mean to suggest that the observed dispersal by bacterial motility is not influenced by cell density. In fact, we see that dispersal (and hence the increase in cell motility) in microfluidic chambers that are switched from polymeric to digested alginate depends on the bacterial density in the chamber, with higher bacterial densities showing increased dispersal. This shows that the presence of alginate oligomers does trigger dispersal through motility, but that this signal affects bacterial groups in a cell-density-dependent manner.

      Similar observations have been made in Caulobacter crescentus, which was found to form cell groups on the polymer xylan while cells disperse when the corresponding monomer xylose becomes available (D’Souza et al., 2021). We reference the additional work in lines 179 and 230. Taken together, these observations indicate a more general phenomenon in dispersal from polysaccharide substrates.

      (2) About the expression data: 

      • Ribosomal proteins and ABC transporters are enriched in cells grown on digested alginate and the authors discuss that this explains the difference in max growth rate between alginate and digested alginate. However, in Figure S2E the authors report no statistical difference between growth rates. 

      We have now edited the manuscript to clarify this point. We found that cells grown on degradation products reached their maximal growth rate around 7.5 hours earlier (Fig. S2D) and showed increased expression of ribosomal biosynthesis and ABC transporters in late-exponential phase (Fig. 4A). We consider this shorter lag time as a sign of a different growth state and therefore a possible reason for the difference in ribosomal protein expression.

      As the reviewer correctly points out, the maximum growth rates that were computed from the two growth curves were not significantly different (Fig. S2E). However, for our gene expression analysis, we harvested the transcriptome of cells that reached OD 0.39-0.41 (mid- to late-exponential phase). At this time point, the cell cultures may have differed in their momentary growth rate.

      We edited the manuscript to make this clearer (line 287):

      “Both observations likely relate to the different growth dynamics of V. cyclitrophicus ZF270 on digested alginate compared to alginate (Fig. S2A), where cells in digested alginate medium reached their maximal growth rate 7.5 hours earlier and thus showed a shorter lag time (Fig. S2D). As a consequence, the growth rate at the time of RNA extraction (mid-to-late exponential phase) may have differed, even though the maximum growth rate of cells grown in alginate medium and digested alginate medium were not found to be significantly different (Fig. S2E).”

      • The increased expression of transporters for lyases in cells grown on digested alginate (lines 273-274 and 325-328) is very confusing and the explanation provided in lines 412-420 is not very convincing. My two cents on this: Expression of more enzymes and induction of motility might be a strategy to be prepared for more likely future environments (after dispersal, alginate is the most likely carbon source they will find). This would be in line with observed increased chemotaxis towards the polymer rather than the monomer (Similar to C. elegans). 

      This comment is in line with reviewer 2, comment 7. In response to these two comments (and as described above), we expanded our discussion of these results to contextualize these findings (around line 443):

      “One reason for this observation may be that cells primarily rely on intracellular monosaccharide levels to trigger the upregulation of genes associated with polysaccharide degradation and catabolism, as has previously been observed for E. coli across various carbon sources [50,51]. In fact, the majority of carbon sources are sensed by prokaryotes through one‑component sensors inside the cell [50]. In the one‑component internal sensing scheme, the enzymes and transporters for the use of various carbon sources are expressed at basal levels, which leads to an increase in pathway intermediates upon nutrient availability. The pathway intermediates are sensed by an internal sensor, usually a transcription factor, and lead to the upregulation of transporter and enzyme expression [50,51]. This results in a positive feedback loop, which enables small changes in substrate abundance to trigger large transcriptional responses [50,52]. Thus, the presence of alginate breakdown products may likely result in increased expression of all components of the alginate degradation pathway, including the expression of degrading enzymes. As the gene expression analysis was performed on well-mixed cultures in culture medium containing alginate breakdown products, we therefore expect a strong stimulation of alginate catabolism. In a natural scenario, where cells disperse from a polysaccharide hotspot before its exhaustion, the expression of alginate catabolism genes may likely decrease again once the local concentration of breakdown products decreases. However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments. 
Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”

      Additionally, we agree with the intriguing comment that continued expression of alginate lyases may also prepare cells for likely future environments. Further studies that aim to answer whether marine bacteria are primed by their growth on one carbon source towards faster re-initiation of degradation on a new particle will be an interesting research question. We now address this point in our manuscript (line 458):

      “However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments. Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”

      (3) The yield reached by Vibrio on alginate is significantly higher than the yield in digested alginate, not similar, as stated in lines 133-134. Only cell counts are similar. Perhaps the author can correct this statement and speculate on the reason leading to this discrepancy: perhaps cells tend to aggregate in alginate despite the fact that these are well-mixed cultures. 

      We have edited the description of the OD measurements accordingly and agree with the reviewer that aggregation is indeed a possible reason for the discrepancy (line 141):

      “We also observed that the optical density at stationary phase was higher when cells were grown on alginate (Fig. S2B and C). However, colony counts did not show a significant difference in cell numbers (Fig. S3), suggesting that the increased optical density may stem from aggregation of cells in the alginate medium, as observed for other Vibrio species [7].”

      (4) I suggest toning down the importance of the results presented in this study for understanding global carbon cycling. There is a link but at present it is too much emphasized. 

      We have edited our statements regarding the carbon cycle. In the revised manuscript we stress the lack of direct quantifications of carbon cycling. We still refer to carbon flow in the revised manuscript, as we would argue that microbial remineralization of biomass is recognized as an important factor in the marine biological carbon pump (e.g., Chisholm, 2000) and research on marine bacterial foraging investigates how bacterial cells manage to find and utilize this biomass.

      Our revised manuscript contains the following modified statements (line 47 and line 60): “Even though many studies indicate that these degradation-dispersal cycles contribute to the carbon flow in marine systems, we know little about how cells alternate between polysaccharide degradation and motility, and which environmental factors trigger this behavioral switch.”

      “Overall, our findings reveal cellular mechanisms that might also underlie bacterial degradation-dispersal cycles, which influence the remineralization of biomass in marine environments.”

      References

      • Alcolombri, U., Peaudecerf, F. J., Fernandez, V. I., Behrendt, L., Lee, K. S., & Stocker, R. (2021). Sinking enhances the degradation of organic particles by marine bacteria. Nature Geoscience, 14(10), 775–780. https://doi.org/10.1038/s41561-021-00817-x
      • Bassler, B. L., Gibbons, P. J., Yu, C., & Roseman, S. (1991). Chitin utilization by marine bacteria. Chemotaxis to chitin oligosaccharides by Vibrio furnissii. Journal of Biological Chemistry, 266(36), 24268–24275. https://doi.org/10.1016/S0021-9258(18)54224-1
      • Chisholm, S. W. (2000). Stirring times in the Southern Ocean. Nature, 407(6805), 685–686. https://doi.org/10.1038/35037696
      • Chubukov, V., Gerosa, L., Kochanowski, K., & Sauer, U. (2014). Coordination of microbial metabolism. Nature Reviews. Microbiology, 12(5), 327–340. https://doi.org/10.1038/nrmicro3238
      • Clerc, E. E., Raina, J.-B., Keegstra, J. M., Landry, Z., Pontrelli, S., Alcolombri, U., Lambert, B. S., Anelli, V., Vincent, F., Masdeu-Navarro, M., Sichert, A., De Schaetzen, F., Sauer, U., Simó, R., Hehemann, J.-H., Vardi, A., Seymour, J. R., & Stocker, R. (2023). Strong chemotaxis by marine bacteria towards polysaccharides is enhanced by the abundant organosulfur compound DMSP. Nature Communications, 14(1), 8080. https://doi.org/10.1038/s41467-023-43143-z
      • Dal Co, A., van Vliet, S., Kiviet, D. J., Schlegel, S., & Ackermann, M. (2020). Short-range interactions govern the dynamics and functions of microbial communities. Nature Ecology and Evolution, 4(3), 366–375. https://doi.org/10.1038/s41559-019-1080-2
      • D’Souza, G., Ebrahimi, A., Stubbusch, A., Daniels, M., Keegstra, J., Stocker, R., Cordero, O., & Ackermann, M. (2023). Cell aggregation is associated with enzyme secretion strategies in marine polysaccharide-degrading bacteria. The ISME Journal. https://doi.org/10.1038/s41396-023-01385-1
      • D’Souza, G. G., Povolo, V. R., Keegstra, J. M., Stocker, R., & Ackermann, M. (2021). Nutrient complexity triggers transitions between solitary and colonial growth in bacterial populations. The ISME Journal, 15(9), 2614–2626. https://doi.org/10.1038/s41396-021-00953-7
      • D’Souza, G., Schwartzman, J., Keegstra, J., Schreier, J. E., Daniels, M., Cordero, O. X., Stocker, R., & Ackermann, M. (2023). Interspecies interactions determine growth dynamics of biopolymer-degrading populations in microbial communities. Proceedings of the National Academy of Sciences of the United States of America, 120(44), e2305198120. https://doi.org/10.1073/pnas.2305198120
      • Fenchel, T. (2002). Microbial Behavior in a Heterogeneous World. Science, 296(5570), 1068–1071. https://doi.org/10.1126/science.1070118
      • Jiao, N., Luo, T., Chen, Q., Zhao, Z., Xiao, X., Liu, J., Jian, Z., Xie, S., Thomas, H., Herndl, G. J., Benner, R., Gonsior, M., Chen, F., Cai, W.-J., & Robinson, C. (2024). The microbial carbon pump and climate change. Nature Reviews Microbiology. https://doi.org/10.1038/s41579-024-01018-0
      • Keegstra, J. M., Carrara, F., & Stocker, R. (2022). The ecological roles of bacterial chemotaxis. Nature Reviews Microbiology, 20(8), 491–504. https://doi.org/10.1038/s41579-022-00709-w
      • Konishi, H., Hio, M., Kobayashi, M., Takase, R., & Hashimoto, W. (2020). Bacterial chemotaxis towards polysaccharide pectin by pectin-binding protein. Scientific Reports, 10(1), 3977. https://doi.org/10.1038/s41598-020-60274-1
      • Li, Y., Sun, H., Ma, X., Lu, A., Lux, R., Zusman, D., & Shi, W. (2003). Extracellular polysaccharides mediate pilus retraction during social motility of Myxococcus xanthus. Proceedings of the National Academy of Sciences, 100(9), 5443–5448. https://doi.org/10.1073/pnas.0836639100
      • Martínez-Antonio, A., Janga, S. C., Salgado, H., & Collado-Vides, J. (2006). Internal sensing machinery directs the activity of the regulatory network in Escherichia coli. Trends in Microbiology, 14(1), 22–27. https://doi.org/10.1016/j.tim.2005.11.002
      • McDougald, D., Rice, S. A., Barraud, N., Steinberg, P. D., & Kjelleberg, S. (2012). Should we stay or should we go: Mechanisms and ecological consequences for biofilm dispersal. Nature Reviews Microbiology, 10(1), 39–50. https://doi.org/10.1038/nrmicro2695
      • Nguyen, T. T. H., Zakem, E. J., Ebrahimi, A., Schwartzman, J., Caglar, T., Amarnath, K., Alcolombri, U., Peaudecerf, F. J., Hwa, T., Stocker, R., Cordero, O. X., & Levine, N. M. (2022). Microbes contribute to setting the ocean carbon flux by altering the fate of sinking particulates. Nature Communications, 13(1), 1657. https://doi.org/10.1038/s41467-022-29297-2
      • Norris, N., Alcolombri, U., Keegstra, J. M., Yawata, Y., Menolascina, F., Frazzoli, E., Levine, N. M., Fernandez, V. I., & Stocker, R. (2022). Bacterial chemotaxis to saccharides is governed by a trade-off between sensing and uptake. Biophysical Journal, 121(11), 2046–2059. https://doi.org/10.1016/j.bpj.2022.05.003
      • Povolo, V. R., D’Souza, G. G., Kaczmarczyk, A., Stubbusch, A. K., Jenal, U., & Ackermann, M. (2022). Extracellular appendages govern spatial dynamics and growth of Caulobacter crescentus on a prevalent biopolymer. bioRxiv, 2022.06.13.495907. https://doi.org/10.1101/2022.06.13.495907
      • Preheim, S. P., Boucher, Y., Wildschutte, H., David, L. A., Veneziano, D., Alm, E. J., & Polz, M. F. (2011). Metapopulation structure of Vibrionaceae among coastal marine invertebrates. Environmental Microbiology, 13(1), 265–275. https://doi.org/10.1111/j.1462-2920.2010.02328.x
      • Schwartzman, J. A., Ebrahimi, A., Chadwick, G., Sato, Y., Orphan, V., & Cordero, O. X. (2021). Bacterial growth in multicellular aggregates leads to the emergence of complex lifecycles. bioRxiv, 2021.11.01.466752. https://doi.org/10.1101/2021.11.01.466752
      • Singh, P. K., Bartalomej, S., Hartmann, R., Jeckel, H., Vidakovic, L., Nadell, C. D., & Drescher, K. (2017). Vibrio cholerae Combines Individual and Collective Sensing to Trigger Biofilm Dispersal. Current Biology, 27(21), 3359-3366.e7. https://doi.org/10.1016/j.cub.2017.09.041
      • Ulrich, L. E., Koonin, E. V., & Zhulin, I. B. (2005). One-component systems dominate signal transduction in prokaryotes. Trends in Microbiology, 13(2), 52–56. https://doi.org/10.1016/j.tim.2004.12.006
      • Wall, M. E., Hlavacek, W. S., & Savageau, M. A. (2004). Design of gene circuits: Lessons from bacteria. Nature Reviews Genetics, 5(1), 34–42. https://doi.org/10.1038/nrg1244
      • Yawata, Y., Carrara, F., Menolascina, F., & Stocker, R. (2020). Constrained optimal foraging by marine bacterioplankton on particulate organic matter. Proceedings of the National Academy of Sciences, 117(41), 25571–25579. https://doi.org/10.1073/pnas.2012443117
      • Yawata, Y., Cordero, O. X., Menolascina, F., Hehemann, J.-H., Polz, M. F., & Stocker, R. (2014). Competition–dispersal tradeoff ecologically differentiates recently speciated marine bacterioplankton populations. Proceedings of the National Academy of Sciences, 111(15), 5622–5627. https://doi.org/10.1073/pnas.1318943111
      • Zöttl, A., & Yeomans, J. M. (2019). Enhanced bacterial swimming speeds in macromolecular polymer solutions. Nature Physics, 15(6), 554–558. https://doi.org/10.1038/s41567-019-0454-3
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility, and clarity (Required)):

      Summary: Viruses exploit host endoplasmic reticulum (ER)-resident chaperones to support new protein synthesis during viral replication. Here, Najarro et al. study the role of the ER-resident HSP70 family member Binding immunoglobulin protein (BiP) during lytic infection by the Kaposi's sarcoma-associated herpesvirus (KSHV). Using the established doxycycline-inducible lytic reactivation infection model cell line iSLK-BAC16, they showed that KSHV reactivation leads to an upregulation of total BiP protein but not RNA, and is independent of the unfolded protein response. siRNA knockdown or pharmacological inhibition by HA15 of BiP significantly reduced global viral gene expression and infectious virus production. The authors attribute this to at least the reduction of levels of the K1 gene which is required for efficient viral replication. Finally, they showed that HA15 has cytostatic activity in KSHV-transformed B cells and cytotoxic effects in KSHV-infected lymphatic endothelial cells arguing for BiP inhibition as a potential therapeutic strategy to treat KSHV-driven malignancies. The manuscript is well-written and the conclusions were generally supported by the data with a few exceptions below.

      Major comments:

      • They propose in lines 196-199 that the reduction of K1 from HA15 treatment partially explains the defect in virion production during lytic reactivation. I am not convinced that this statement is fully supported by their data. Reduction of K1 is likely a downstream consequence and not the cause of the inhibition of lytic replication.

      We thank the reviewer for this comment. We conducted a more detailed analysis of our RNAseq data in iSLK.219 cells and confirmed the downregulation of the K1 transcript in latently infected cells treated with HA15 (See Fig 3 and Sup Fig 5). It is likely that the drop in transcript levels results from IRE1-mediated degradation in a recently described process known as RIDDLE (IRE1-mediated RNA decay lacking endomotif), in which IRE1 depletes mRNAs [1]. We have included this hypothesis in the discussion.

      Unfortunately, we cannot confirm the downregulation of K1 at the protein level in iSLK.219 cells since the antibodies are highly specific for K1 variants in PEL cells. To overcome this technical limitation, we conducted mass spectrometry analysis of the viral proteome from whole cell lysates of latent and lytic cells undergoing HA15 treatment. While we detect the expected global downregulation of viral proteins in lytic cells treated with HA15, we were not able to detect any viral proteins except for LANA in the latently infected cells, and our detection of several lytic proteins was limited. We speculate that the levels of latent viral proteins expressed in iSLK.219 cells are below the limits of detection of our assay, or that extensive modification of some of these viral proteins may hinder their detection. Due to these limitations, we decided not to include these data in the manuscript.

      Additionally, we note that the lower levels of K1 detected in latent iSLK.219 and TREx-BCBL-1 cells treated with HA15 may affect viral reactivation, which is consistent with findings from the Damania lab showing K1's crucial role in viral replication [2].

      • The quantification of the K1 blots in Fig. 3C only has n=2. With subtle differences by eye, large error bars, and no statistical analysis, it is hard to conclude here with confidence.

      We agree with the reviewer. We have moved the K1 blot to Sup. Fig. 3E and adjusted the text accordingly.

      • Like K1, ORF45, and K8.1 proteins are similarly decreased at 24 h in Fig. 2E, suggesting that the defect is upstream of K1. Does HA15 affect the amount of endogenous and/or transgene copy of RTA being produced (hence the broader effect in early gene expression at 24h?)?

      To answer the Reviewer's query, we re-evaluated the impact of HA15 treatment on the activity of dox-inducible RTA. However, we think it is unlikely for HA15 to alter RTA activity since RTA does not enter the secretory pathway.

      To evaluate the activity of RTA in HA15-treated cells, we measured the expression of the viral episome-encoded RFP reporter, driven by the viral PAN promoter [4], at 24 h post-doxycycline treatment of iSLK.219 cells. We compared the response of the PAN promoter to RTA in cells treated with or without HA15 at this early timepoint, to avoid any potential confounding effects stemming from elevated endogenous RTA expression at later times post-reactivation. We demonstrate that the levels of RFP in iSLK.219 cells treated with Dox are identical in the presence or absence of HA15. This result, included in Sup. Fig. 3, indicates that the activity of RTA, crucial for initiating the lytic cycle in this context, is unaffected by BiP inhibition at early times post-reactivation.

      • K1 levels appear to decrease even during latency. Are the other latent proteins also affected? What about latent genome copies?

      To address this query, we compared the Log2 fold change of latent transcripts (K1, K2, K12, ORF71, ORF72, ORF73) in the iSLK.219 RNAseq data set (Fig 3). Only the K1 transcript is reduced in HA15-treated cells. We include these data in Sup Fig 5A.

      Regarding differences in genome copies, the consistent levels of the viral genome-encoded GFP in iSLK.219 cells treated with or without HA15 (Sup Fig 3) indicate no significant changes in the levels of viral genomes at 24 h post-treatment (prior to DNA replication). Previous studies by our lab and others show that knockdown of the major latency protein LANA results in episomal loss and lower levels of GFP [5]. These results validate the use of GFP fluorescence in iSLK.219 cells as a proxy for genome copies.

      • Fig. 3C was performed in a PEL cell line which they showed to enter cytostasis upon HA15 treatment (Fig. 5). This cytostasis (rather than K1) may be the root cause of the defect in viral replication as cells could be arrested at a different stage compared to the G2 requirement for lytic replication in PEL cells (Balistreri et al., PLOS Pathogens 2016, PMID: 26891221).

      See point 2. below

      • The cytostatic effect in PEL cell lines (Fig. 5) should be demonstrated using more direct methods that measure cell cycle (e.g. PI-BrdU).

      We thank the reviewer for this comment. While more direct methods to measure the cell cycle stage affected by HA15 treatment will inform on its mechanism of action, these experiments lie outside the scope of this manuscript and, in our view, are better suited for future studies on the anticancer properties of HA15. The data presented in Fig. 5 demonstrate that HA15 treatment of PEL cells causes a reduction in cell numbers without cytotoxicity, thus supporting our conclusion of a net negative effect on proliferation rather than cell death. The loss of our LN2 tank and PEL cell lines currently limits our ability to do these more detailed analyses. At the moment, we do not have an accurate estimate of how long it will take to replace these cell lines for our subsequent studies.

      • While having an uninfected B cell as a matched negative control for PEL is challenging, primary peripheral B cells (mostly of mature memory B cell stage) may not be the appropriate negative control. PEL cells are of plasma cell lineage which have unusually high protein translation and overloaded ER. The plasma cell lineage may explain the sensitivity of PEL cells to HA15. It is possible that HA15 may be toxic to plasma cells when used as a therapeutic agent.

      We agree with the reviewer on the potential impact of HA15 on plasma cell viability. Indeed, HA15 (>2 µM) treatment reduces the viability of plasma cell myeloma lines (NCI-H929 and U266 cells), substantiating its use as a potential anti-cancer drug [6]. Although HA15 has not been tested as a therapeutic agent in humans, studies in mice have demonstrated tolerability without evident toxicity, measured as normal body weight [7]. The potential therapeutic application of HA15 for cancer warrants further investigation and is beyond the scope of our manuscript.

      • Does HA15 have cytostatic effects in uninfected or latently infected iSLK cells?

      We observed no cytostatic or cytotoxic effects in uninfected or latently infected iSLK cells exposed to up to 30 µM of HA15. Although HA15 has been tested on various cancer types [8], it has not been evaluated in renal carcinoma cells (RCC), the cellular background of iSLK.219 cells. The mechanism behind the resistance of these cells to HA15 eludes us, but its link to the cellular background of iSLK.219 cells merits exploration in future studies.

      Minor comments:

      1. Consider changing the title of line 98 to specify cell type since BiP levels do not increase in BCBL-1 (Supp. Fig. 3).

      Revised in the manuscript

      Fig. 3A may benefit from using z-scores instead of log2TPM so differences are more obvious per gene.

      Since the data have already been collected, can the authors include both latent and lytic cells with and without HA15 treatment in Fig. 3A? It may give more information for the reader.

      We have reanalyzed all the RNAseq data and included a z-score plot for all samples in Fig. 3. We are also providing three new supplementary tables with the raw counts, the z-scores for viral genes, and the log2 of the normalized counts.
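
      The reviewer's suggestion amounts to centering and scaling each gene across samples, so that genes with very different absolute expression can be compared on one color scale. The following is a minimal sketch of that transformation, not the authors' pipeline: the gene labels, the sample values, and the function name `zscore_rows` are hypothetical, and the sample standard deviation (n - 1) is an assumed choice.

```python
import math


def zscore_rows(log2_expr):
    """Per-gene z-scores for a {gene: [value per sample]} mapping.

    Each gene is centered on its own mean and divided by its own sample
    standard deviation. Genes with no variation (or a single sample) get zeros.
    """
    scored = {}
    for gene, values in log2_expr.items():
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        else:
            sd = 0.0
        scored[gene] = [0.0 if sd == 0 else (v - mean) / sd for v in values]
    return scored


if __name__ == "__main__":
    # Hypothetical log2 expression values across three samples.
    example = {
        "geneA": [1.0, 2.0, 3.0],     # low expression, clear trend
        "geneB": [11.0, 12.0, 13.0],  # high expression, same trend
        "geneC": [5.0, 5.0, 5.0],     # no variation
    }
    for gene, scores in zscore_rows(example).items():
        print(gene, [round(s, 2) for s in scores])
```

      In this toy input, geneA and geneB differ about 1,000-fold in absolute level but receive identical z-scores, which is why per-gene trends become easier to see in a heatmap.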

      Reviewer #1 (Significance (Required)):

      Significance: Here, the authors convincingly demonstrate the proviral role of the ER chaperone BiP during KSHV reactivation. This manuscript will be relevant to researchers in the gammaherpesvirus field. Although the authors did present some interesting data, the scope is narrow, and mechanistic studies were not pursued that would have added more insight in BiP and/or KSHV biology. For instance, how do BiP protein levels increase during reactivation (is this at the level of RNA sequestration/export, translation, or protein stability?)? How does BiP promote lytic replication?

      Field of expertise: KSHV, molecular and cell biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Many viruses have complex relationships with cellular ER proteostasis machinery that remain poorly understood. Here Najarro, et al. report on studies of the oncogenic gammaherpesvirus KSHV. They report that the ER chaperone BiP is upregulated in epithelial cells during KSHV lytic replication. Unexpectedly, BiP upregulation is independent of the unfolded protein response, which stimulates transcriptional activation of BiP to meet the protein folding demand in the ER. Using a combination of genetic and pharmacologic approaches (CRISPRi and selective chemical inhibitor) they demonstrate that BiP inhibition interferes with the replication of diverse enveloped viruses including poxviruses and several herpesviruses, and reduces proliferation of KSHV-infected cells.

      Figure-by-figure:

      Fig. 1: This figure convincingly demonstrates the selective upregulation of BiP at the protein level during the course of KSHV lytic replication, and that KSHV late genes are dispensable for this upregulation. It further demonstrates that BiP is not upregulated at the mRNA level at all during KSHV infection, despite the fact that the UPR-dependent BiP mRNA upregulation pathway (presumably via ATF6 and IRE1) remains functional.

      Fig. 2: This figure convincingly demonstrates that BiP ATPase activity is required to support KSHV lytic replication in both epithelial and B cell models on infection, even though it is also clear that BiP is not upregulated in the B cell model.

      Fig. 3: This data demonstrates that steady-state levels of KSHV lytic gene products are reduced following HA15-treatment, whereas later gene expression was unaffected. As an interesting side note, v-IL6 bucks the trend of HA15-mediated downregulation of viral mRNA levels, suggesting that it may be regulated in a different manner. One thing that the authors may consider is the report from Drs. Yuan Chang and Patrick Moore (PMID: 12434062) that demonstrated that the v-IL6 gene is transactivated by type I interferon. Considering the poor replication of this virus during HA15 treatment, it may be valuable to investigate IFN production by these cells, and the extent to which it is impacted by inhibition of BiP ATPase activity.

      We thank the reviewer for bringing this report to our attention. We also found the specific transcriptional upregulation of IL6 in IFN-α-treated BCP-1 cells intriguing. Although we see a dramatic upregulation of vIL6 in HA15-treated cells, we still detect the expression of most viral genes, albeit at significantly lower levels than in untreated cells, which indicates that the viral transcriptional program in lytic+HA15 iSLK.219 cells is different from the one seen in IFN-treated BCP-1 cells. Preliminary analyses of the host transcriptome from our RNAseq results show the expression of several ISGs (OAS1, 2 and 3, IFI6, IFIT1, IFIT3, IFITM1) in lytic-untreated iSLK.219 cells, but not in those treated with HA15. Together, these observations substantiate the notion that there is no IFN-driven expression of vIL6 in HA15-treated iSLK.219 cells.

      Fig. 4: This figure demonstrates that HA15 has broad, non-cytotoxic, antiviral activity against diverse enveloped viruses.

      Figs. 5/6: These figures show cytotoxic effects of HA15 on latently infected PEL cells, either solely infected with KSHV or co-infected with KSHV and EBV, whereas normal B cells were unaffected. HA15 was also cytotoxic to KSHV-infected lymphatic endothelial cells.

      **Referees cross-commenting**

      I appreciate the insightful comments from Reviewer #1 and Reviewer #3. I think we are largely on the same page. The data is generally supportive of the authors' conclusions, with a few exceptions that are straightforward to address in revisions. The manuscript is limited in scope, which could also be addressed by additional experimentation if the authors are motivated to explore mechanism in greater depth. Of particular note is the lack of mechanistic insight into how BiP is upregulated at the protein level during lytic replication, if the mRNA is unchanged. The experimental approaches to this are straightforward.

      We appreciate the reviewers' comments on the scope of our study. The mechanism of BiP upregulation remains an outstanding question for the following technical reasons. We hypothesized that the upregulation of BiP may depend on the IRES element present in its 5' UTR (ref. 9). We tested this hypothesis by transfecting iSLK.219 cells with a bicistronic Renilla-(BiP)IRES-Firefly luciferase reporter from Licursi et al. (ref. 10). Unfortunately, for reasons that still elude us, our reactivation rates in transfected cells were consistently low in all of our experiments, and we were therefore unable to measure luciferase changes consistently and reliably. A potential workaround for this technical limitation is to use a lentivirus-encoded IRES reporter, as transduction of iSLK.219 cells does not alter viral reactivation in our experience. At the moment, we do not have access to these reporters due to our lab's move to a different institution, and the first author of our study has started the next stage of their career. Therefore, we will not be able to pursue these experiments in a timely manner.

      As for the scope of this manuscript, although the mechanism of BiP upregulation in KSHV-infected cells remains unresolved, we consider the broad-spectrum antiviral effect of BiP inhibition an exciting finding that advances the field and benefits the virology community: the proteostasis network has seldom been explored as a potential node for broad-spectrum antiviral intervention. Our results provide important proof of concept for continuing the investigation of factors involved in protein synthesis, folding and transport as potential targets for the development of versatile broad-spectrum antivirals.

      Reviewer #2 (Significance (Required)):

      Strengths: This is a well-written manuscript. The text and figures are clear and accurate and the methods are sufficiently informative that the study can be reproduced. The data generally supports the authors' conclusions. BiP appears to be a druggable target with minimal off-target cytotoxicity in normal, uninfected cells, although this study does not go beyond cell culture studies to validate in vivo.

      Weaknesses: The study is somewhat limited in scope. The authors make the case for UPR transcription-independent upregulation of BiP during KSHV infection, and that late gene synthesis is dispensable, but the mechanism is not investigated further.

      Point by point discussion:

      Could an early KSHV gene product involved in this phenotype be identified by screening an ORF library or viral genome-wide CRISPRi screen?

      The question of the viral protein responsible for the upregulation of BiP during lytic infection is indeed a fascinating one. However, we suspect that the mechanism may not be specifically directed at BiP, but may instead be a general modulation of IRES-related translation. Identifying the gene product(s) affected and corroborating IRES involvement is a major undertaking and a long-term goal requiring considerable effort. These analyses are outside the scope of this manuscript, but we will pursue them in the future.

      Or, beyond implicating viral factors in the mechanism of BiP upregulation, can some simple biochemical studies be performed to investigate BiP protein? Is the BiP mRNA more efficiently spliced and exported in KSHV infected cells?

      Do alternative translation initiation mechanisms like eIF2A play a role in boosting BiP levels during infection?

      What is the normal BiP protein turnover mechanism, and is this hindered during KSHV lytic replication? Is BiP AMPylation/de-AMPylation by FICD affected (PMID: 36041787)? These kinds of mechanistic studies are well within reach and would help extend the impact and interest to a broad audience.

      We agree on the putative involvement of translation initiation factors like eIF2A in promoting the translation of BiP (see discussion). We tested the effect of siRNA-mediated knockdown (KD) of eIF2A on BiP expression and found that, interestingly, the levels of BiP rose above those of controls in latent iSLK.219 cells (data included in the manuscript; the discussion has been modified accordingly). This finding aligns with previous reports suggesting that eIF2A may suppress IRES-mediated translation in yeast cells and in mammalian in vitro translation assays. Moreover, Starck et al. (ref. 11) observed a 50% increase in endogenous BiP levels in HeLa cells transfected with siRNAs against eIF2A, supporting an IRES-suppressor role for eIF2A in mammalian cells. Future work will be required to address the role of eIF2A in BiP translation. These analyses are beyond the scope of our manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Najarro et al. investigates the contribution of BiP/GRP78 to double-stranded DNA virus infection, primarily focusing on the oncogenic gammaherpesvirus Kaposi's sarcoma-associated herpesvirus (KSHV). The authors observe that BiP expression is increased in lytic iSLK.219 cells as well as in KSHV-infected LECs. Interestingly, the authors' data suggest a post-translational regulation of BiP in the iSLK.219 cells. Using various knockdown approaches and chemical inhibitors, the authors demonstrate that inhibition of BiP impacts KSHV reactivation in multiple cell lines. Importantly, the authors also find that BiP inhibition can selectively kill KSHV-infected cells, while sparing primary B cells. Overall, this is a very well controlled and presented manuscript. My comments for the manuscript are minor, and largely cosmetic to aid the presentation of the data.

      • Fig 1C, It would be ideal to show that PAA treatment did indeed prevent the virus from entering the late stage of gene expression.

      We have included an immunoblot for K8.1 in Figure 1C to confirm the effect of PFA on arresting the KSHV lytic cycle.

      Sup Fig2, should show KD efficiency of XBP1, same goes for ATF6.

      Sup. Fig. 2D shows the expression of XBP1s in NS vs. XBP1KD cells in the presence or absence of Tg. In Sup Fig. 2G we have also included a bar graph showing the efficiency of downregulation of ATF6 mRNA in the presence of the targeting sgRNA.

      Sup Fig 3. It is interesting that the authors do not see increased BiP in TREx-BCBL1-RTA cells. A potential caveat is that lytic reactivation in TREx-BCBL1-RTA cells is not as efficient as in iSLK.219 cells. Therefore, it may simply be a result of the reduced population entering the lytic cycle. It may be worth adding a comment regarding this.

      • Images of the microscopy for Figure 4 would be useful.

      Images have been included in Fig. 4

      • Add label of the cell types for Figure 5.

      DONE

      • Does HSV1, HCMV, or VacV increase BiP expression compared to mock-infected cells?

      Yes, we have included a comment on this in the discussion

      Reviewer #3 (Significance (Required)):

      Overall, this is a very well controlled and presented manuscript.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Many viruses have complex relationships with cellular ER proteostasis machinery that remain poorly understood. Here Najarro, et al. report on studies of the oncogenic gammaherpesvirus KSHV. They report that the ER chaperone BiP is upregulated in epithelial cells during KSHV lytic replication. Unexpectedly, BiP upregulation is independent of the unfolded protein response, which stimulates transcriptional activation of BiP to meet the protein folding demand in the ER. Using a combination of genetic and pharmacologic approaches (CRISPRi and selective chemical inhibitor) they demonstrate that BiP inhibition interferes with the replication of diverse enveloped viruses including poxviruses and several herpesviruses, and reduces proliferation of KSHV-infected cells.

      Figure-by-figure:

      Fig. 1: This figure convincingly demonstrates the selective upregulation of BiP at the protein level during the course of KSHV lytic replication, and that KSHV late genes are dispensable for this upregulation. It further demonstrates that BiP is not upregulated at the mRNA level at all during KSHV infection, despite the fact that the UPR-dependent BiP mRNA upregulation pathway (presumably via ATF6 and IRE1) remains functional.

      Fig. 2: This figure convincingly demonstrates that BiP ATPase activity is required to support KSHV lytic replication in both epithelial and B cell models of infection, even though it is also clear that BiP is not upregulated in the B cell model.

      Fig. 3: This data demonstrates that steady-state levels of KSHV lytic gene products are reduced following HA15-treatment, whereas later gene expression was unaffected. As an interesting side note, v-IL6 bucks the trend of HA15-mediated downregulation of viral mRNA levels, suggesting that it may be regulated in a different manner. One thing that the authors may consider is the report from Drs. Yuan Chang and Patrick Moore (PMID: 12434062) that demonstrated that the v-IL6 gene is transactivated by type I interferon. Considering the poor replication of this virus during HA15 treatment, it may be valuable to investigate IFN production by these cells, and the extent to which it is impacted by inhibition of BiP ATPase activity.

      Fig. 4: This figure demonstrates that HA15 has broad, non-cytotoxic, antiviral activity against diverse enveloped viruses.

      Figs. 5/6: These figures show cytotoxic effects of HA15 on latently infected PEL cells, either solely infected with KSHV or co-infected with KSHV and EBV, whereas normal B cells were unaffected. HA15 was also cytotoxic to KSHV-infected lymphatic endothelial cells.

      Referees cross-commenting

      I appreciate the insightful comments from Reviewer #1 and Reviewer #3. I think we are largely on the same page. The data is generally supportive of the authors' conclusions, with a few exceptions that are straightforward to address in revisions. The manuscript is limited in scope, which could also be addressed by additional experimentation if the authors are motivated to explore mechanism in greater depth. Of particular note is the lack of mechanistic insight into how BiP is upregulated at the protein level during lytic replication, if the mRNA is unchanged. The experimental approaches to this are straightforward.

      Significance

      Strengths: This is a well-written manuscript. The text and figures are clear and accurate and the methods are sufficiently informative that the study can be reproduced. The data generally supports the authors' conclusions. BiP appears to be a druggable target with minimal off-target cytotoxicity in normal, uninfected cells, although this study does not go beyond cell culture studies to validate in vivo.

      Weaknesses: The study is somewhat limited in scope. The authors make the case for UPR transcription-independent upregulation of BiP during KSHV infection, and that late gene synthesis is dispensable, but the mechanism is not investigated further. Could an early KSHV gene product involved in this phenotype be identified by screening an ORF library or viral genome-wide CRISPRi screen? Or beyond implicating viral factors in the mechanism of BiP upregulation, can some simple biochemical studies be performed to investigate BiP protein? Is the BiP mRNA more efficiently spliced and exported in KSHV infected cells? Do alternative translation initiation mechanisms like eIF2A play a role in boosting BiP levels during infection? What is the normal BiP protein turnover mechanism, and is this hindered during KSHV lytic replication? Is BiP AMPylation/de-AMPylation by FICD affected (PMID: 36041787)? These kinds of mechanistic studies are well within reach and would help extend the impact and interest to a broad audience.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Engineering of PAClight1P78A: A High-Performance Class-B1 GPCR-Based Sensor for PACAP1-38" by Cola et al. presents the development of a novel genetically encoded sensor, PAClight1P78A, based on the human PAC1 receptor. The authors provide a thorough in vitro and in vivo characterization of this sensor, demonstrating its potential utility across various applications in life sciences, including drug development and basic research.

      The diverse methods to validate PAClight1P78A demonstrate a comprehensive approach to sensor engineering by combining biochemical characterization with in vivo studies in rodent brains and zebrafish. This establishes the sensor's biophysical properties (e.g., sensitivity, specificity, kinetics, and spectral properties) and demonstrates its functionality in physiologically relevant settings. Importantly, the inclusion of control sensors and the testing of potential intracellular downstream effects such as G-protein activation underscore a careful consideration of specificity and biological impact.

      Strengths:

      The fundamental development of PAClight1P78A addresses a significant gap in sensors for Class-B1 GPCRs. The iterative design process -starting from PAClight0.1 to the final PAClight1P78A variant - demonstrates compelling optimization. The innovative engineering results in a sensor with a high apparent dynamic range and excellent ligand selectivity, representing a significant advancement in the field. The rigorous in vitro characterization, including dynamic range, ligand specificity, and activation kinetics, provides a critical understanding of the sensor's utility. Including in vivo experiments in mice and zebrafish larvae demonstrates the sensor's applicability in complex biological systems.

      Weaknesses:

      The manuscript shows that the sensor fundamentally works in vivo, albeit in a limited capacity. The titration curves show sensitivity in the nmol range at which endogenous detection might be possible. However, perhaps the sensor is not sensitive enough or there are not any known robust paradigms for PACAP release. A more detailed discussion of the sensor's limitations, particularly regarding in vivo applications and the potential for detecting endogenous PACAP release, would be helpful.

      We thank the reviewer for carefully analyzing our in vivo data and highlighting the limitation of our results regarding the sensor’s applicability in detecting endogenous PACAP. We added several sections discussing future possibilities for optimization to the discussion (see paragraphs 2-4). We agree that a more specific discussion of the limitations of our study is an important addition to help design future experiments.

      There are several experiments with an n=1 and other low single-digit numbers. I assume that refers to biological replicates such as mice or culture wells, but it is not well defined. n=1 in experimental contexts, particularly in Figure 1, raises significant concerns about the exact dynamic range of the sensor, data reproducibility, and the robustness of conclusions drawn from these experiments. Also, ROI for cell cultures, like in Figure 1, is not well defined. The methods mentioned ROIs were manually selected, which appears very selective, and the values in Figure 1c become unnecessarily questionable. The lack of definition for "ROI" is confusing. Do ROIs refer to cells, specific locations on the cell membrane, or groups of cells? It would be best if the authors could use unbiased methods for image analysis that include the majority of responsive areas or an explanation of why certain ROIs are included or excluded.

      We thank the reviewer for the helpful suggestions. We have increased the number of replicates to n=3 for both HEK293T and neuron data depicted in Fig.1c. Furthermore, we have added Fig.1c’, which contains the quantification of the maximum responses obtained in the dataset shown in Fig.1c and also depicts the single values for each replicate. To clarify the definition of an ROI in our manuscript, we have detailed the process of ROI selection in the Methods section “Cell culture, imaging and quantification”. Additionally, we increased mouse numbers for the in vivo PACAP infusions (see Figure 4g).

      Reviewer #2 (Public Review):

      Summary:

      The PAClight1 sensor was developed using an approach successful for the development of other fluorescence-based GPCR sensors, which is the complete replacement of the third intracellular loop of the receptor with a circularly-permuted green fluorescent protein. When expressed in HEK cells, this sensor showed good expression and a weak but measurable response to the extracellular presence of PACAP1-38 (a ΔF/Fo of 43%). Additional mutation near the site of insertion of the linearized GFP, at the C-terminus of the receptor, and within the second intracellular loop produced a final optimized sensor with ΔF/Fo of >1000%. Finally, screening of mutational libraries that also included alterations in the extracellular ligand-binding domain of the receptor yielded a molecule, PAClight1P78A, that exhibited a high ligand-dependent fluorescence response combined with a high differential sensitivity to PACAP (EC50 30 nM based on cytometric sorting of stably transfected HEK293 cells) compared to its congener VIP, (with which PACAP shares two highly related receptors, VPAC1 and VPAC2) as well as several unrelated neuropeptides, and significantly slowed activation kinetics by PACAP in the presence of a 10-fold molar excess of the PAC1 antagonist PACAP6-38. A structurally highly similar control construct, PAClight1P78Actl, showed correspondingly similar basal expression in HEK293 cells, but no PACAP-dependent enhancement in fluorescent properties.

      PAClight1P78A was expressed in neurons of the mouse cortex via AAV9.hSyn-mediated gene transduction. Slices taken from PAClight1P78A-transfected cortex, but not slices taken from PAClight1P78Actl-transfected cortex, exhibited prompt and persistent elevation of ΔF/Fo after 2 minutes of perfusion with PACAP1-38 which persisted for up to 14 minutes and was statistically significant after perfusion with 3000, but not 300 or 30 nM, of peptide. Likewise, microinfusion of 200 nL of 300 µM PACAP1-38 into the cortex of optical fiber-implanted freely moving mice elicited a ΔF/Fo (%) of greater than 15, and significantly higher than that elicited by application of similar concentrations of VIP, CRF, or enkephalin, or vehicle alone. In vivo experiments were carried out in zebrafish larvae by the introduction of PAClight1P78A into single-cell stage Danio rerio embryos using a Tol2 transposase-based plasmid with a UAS promoter via injection (of plasmid and transposase mRNA), and sorting of post-fertilization embryos using a marker for transgenesis carried in the UAS:PAClight1P78A construct. Expression of PAClight1P78A was directed to cells in the olfactory bulb which express the fish paralog of the human PAC1 receptor by using the Tg(GnRH3:gal4ff) line, and fluorescent signals were elicited by intracerebroventricular administration of PACAP1-38 at a single concentration (1 mM), which were specific to PACAP and to the presence of PAClight1P78A per se, as controlled by parallel experiments in which PAClight1P78Actl instead of PAClight1P78A was contained in the transgenic plasmid.

      Major strengths and weaknesses of the methods and results

      The report represents a rigorous demonstration of the elicitation of fluorescent signals upon pharmacological exposure to PACAP in nervous system tissue expressing PAClight1P78A in both mammals (mice) and fish (zebrafish larvae). Figure 4d shows a change in GFP fluorescence activation by PACAP occurring several seconds after the cessation of PACAP perfusion over a two-minute period, and its persistence for several minutes following. One wonders if one is apprehending the graphical presentation of the data incorrectly, or if the activation of fluorescence efficiency by ligand presentation is irreversible in this context, in which case the utility of the probe as a real-time indicator, in vivo, of released peptide might be diminished.

      We thank the reviewer for their careful consideration of our manuscript and agree that the activation of PAClight persisting for several minutes at micromolar concentrations could be a potential limitation for in vivo applications. We added a possible explanation for the persisting sensor activation in response to artificial application of PACAP38 in paragraph 3 of the discussion. We agree that this addition eases the interpretation of PAClight signals detected in vivo. 

      Appraisal of achievement of aims, and data support of conclusions:

      Small cavils with controls are omitted for clarity; the larger issue of appraisal of results based on the scope of the designed experiments is discussed in the section below. An interesting question related to the time dependence of the PACAP-elicited activation of PAClight1P78A is its onset and reversibility, and additional data related to this would be welcome.

      We agree that the reversibility of the sensor’s fluorescence is indeed an important feature, especially for detecting endogenous PACAP release. Our data indicate that the sensor’s fluorescence is reversible when detecting small to medium doses of PACAP38 (see Figure 4d, application of 30-300 nM), which are presumably closer to physiological concentrations than the 3000 nM concentration at which the response did not reverse. Please see also our new discussion of peptide concentrations in paragraph 4 of the discussion. For future experiments, it is indeed advisable to adjust the interval of repeated applications to the decay of the response at the respective concentration. Considering the long-lasting downstream effects of endogenous signaling, longer intervals between ligand applications are generally preferred to match more closely the physiological range in which endogenous PAC1 is most likely effective.

      Discussion of the impact of the work, and utility of the methods and data:

      Increasingly, neurotransmitter function may be observed in vivo, rather than by inferring in vivo function from in vitro, in cellular, or ex vivo experimentation. This very valuable report discloses the invention of a genetically encoded sensor for the class B1 GPCR PAC1. PAC1 is the major receptor for the neuropeptide PACAP, which in turn is a major neurotransmitter involved in brain response to psychogenic stress, or threat, in vertebrates as diverse as mammals and fishes. If this sensor possesses the sensitivity to detect endogenously released PACAP in vivo it will indeed be an impactful tool for understanding PACAP neurotransmission (and indeed PACAP action in general, in immune and endocrine compartments as well) in future experiments.

      However, the sensor has not yet been used to detect endogenously released PACAP. Until this has been done, one cannot answer the question as to whether the levels of exogenously perfused/administered PACAP used here merely to calibrate the sensor's sensitivity are indeed unphysiologically high. If endogenous PACAP levels don't get that high, then the sensor will not be useful for its intended purpose. The authors should address this issue and allude to what kind of experiments would need to be done in order to detect endogenous PACAP release in living tissue in intact animals. The authors could comment upon the success of other GPCR sensors that have been used to observe endogenous ligand release, and where along the pathway to becoming a truly useful reagent this particular sensor is.

      We thank the reviewer for pointing out that we had not made clear that the detection of endogenous PACAP release was outside the intended scope of this paper. We therefore expanded our discussion to encompass the intended purpose of detecting artificially infused or applied PAC1 agonists, such as conducting fundamental tests of drug specificity and developing new pharmacological ligands to selectively target PAC1. This includes a more detailed discussion of our in vivo findings and a clearer phrasing that stresses the potential application for applied drugs and not endogenous PACAP (see last paragraph in the discussion).

      We also agree that little is known about endogenous concentrations of PACAP in the brain. However, we have supplemented our discussion with several references estimating lower concentrations of PACAP and other peptides in vivo, suggesting average PACAP levels below the detection threshold of the sensor. Importantly, within certain brain regions and in closer proximity to release sites, significantly higher concentrations might be reached. Additionally, our data indicate that the concentrations observed under our current conditions do not saturate the sensor in vivo.  

      We therefore acknowledge the reviewer’s comment on the sensor’s potential limitations under our current experimental conditions. Hence, we expanded our discussion and suggest the use of higher resolution imaging to potentially reveal loci of high PACAP concentrations, which should be validated by future studies (see also our added discussion in paragraph 4). 

      Reviewer #3 (Public Review):

      Summary:

      The manuscript introduces PAClight1P78A, a novel genetically encoded sensor designed to facilitate the study of class-B1 G protein-coupled receptors (GPCRs), focusing on the human PAC1 receptor. Addressing the significant challenge of investigating these clinically relevant drug targets, the sensor demonstrates a high dynamic range, excellent ligand selectivity, and rapid activation kinetics. It is validated across a variety of experimental contexts including in vitro, ex vivo, and in vivo models in mice and zebrafish, showcasing its utility for high-throughput screening, basic research, and drug development efforts related to GPCR dynamics and pharmacology.

      Strengths:

      The innovative design of PAClight1P78A successfully bridges a crucial gap in GPCR research by enabling real-time monitoring of receptor activation with high specificity and sensitivity. The extensive validation across multiple models emphasizes the sensor's reliability and versatility, promising significant contributions to both the scientific understanding of GPCR mechanisms and the development of novel therapeutics. Furthermore, by providing the research community with detailed methodologies and access to the necessary viral vectors and plasmids, the authors ensure the sensor's broad applicability and ease of adoption for a wide range of studies focused on GPCR biology and drug targeting.

      Weaknesses:

      To further strengthen the manuscript and validate the efficacy of PAClight1P78A as a selective PACAP sensor, it is crucial to demonstrate the sensor's ability to detect endogenous PACAP release in vivo under physiological conditions. While the current data from artificial PACAP application in mouse brain slices and microinfusion in behaving mice provide foundational insights into the sensor's functionality, these approaches predominantly simulate conditions with potentially higher concentrations of PACAP than naturally occurring levels.

      We thank the reviewer for their valuable comments and agree that the use of PAClight for detecting endogenous PACAP will be of great interest to the scientific community and should be a goal for future research. Considering the time, equipment and additional animal licenses necessary, we are convinced that these questions go beyond the scope of the current paper and would be better addressed in a follow-up publication. We therefore rephrased the discussion and added more details to further clarify the intended purpose of the current study. Additionally, we added a paragraph to the discussion suggesting experiments needed to validate PAClight for putative future in vivo applications.

      Although the sensor's specificity for the PAC1 receptor and its primary ligand is a pivotal achievement, exploring its potential application to other GPCRs within the class-B1 family or broader categories could enhance the manuscript's impact, suggesting ways to adapt this technology for a wider array of receptor studies. Additionally, while the sensor's performance is convincingly demonstrated in short-term experiments, insights into its long-term stability and reusability in more prolonged or repeated measures scenarios would be valuable for researchers interested in chronic studies or longitudinal behavioral analyses. Addressing these aspects could broaden the understanding of the sensor's practical utility over extended research timelines.

      We extend our gratitude to the reviewer for diligently assessing our results. 

      Indeed, the very high level of sensitivity that we could achieve in PAClight leads us to think that potentially a grafting-based approach, such as the one we’ve recently described for class-A GPCR-based sensors (PMID: 37474807) could also work for the direct generation of multiple class-B1 sensors based on the optimized fluorescent protein module present in PAClight. Unfortunately, considering the amount of work that testing this hypothesis would entail, we are not able to perform these experiments in the context of this revision, and would rather pursue them as a future project. Nevertheless, we have expanded the discussion of the manuscript with a paragraph with these considerations.

      While we lack comprehensive data on the long-term stability of the sensor, our preliminary findings from optimizing the photometry recordings indicate consistent baseline expression of PAClight and PAClight ctrl over several weeks. Conducting experiments to systematically assess stability would require several months, which is currently impractical due to limitations in tools and licenses for repeated in vivo infusions. Hence, we intend to include these experiments in potential follow-up studies.

      Furthermore, the current in vivo experiments involving microinfusion of PACAP near sensor-expressing areas in behaving mice are based on a relatively small sample size (n=2), which might limit the generalizability of the findings. Increasing the number of subjects in these experimental groups would enhance the statistical power of the results and provide a more robust assessment of the sensor's in vivo functionality. Expanding the sample size will not only validate the findings but also address potential variability within the population, thereby reinforcing the conclusions drawn from these crucial experiments.

      We agree with the reviewer that a sample size of N=2 is not sufficient for in vivo recordings. We therefore increased the sample size and now present recordings with 5 PAClight1P78A and 4 PAClight-control mice. Of note, the new data validate our previous findings and conclusions and give a better idea of the variability in vivo, which we now discuss in much more detail in the discussion (see paragraph 2).

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      The lower potency of maxadilan activation might reflect broader implications for ligand-receptor dynamics. Perhaps the authors could discuss the maxadilan binding from a structural perspective, including AlphaFold models. Also, discussing how these findings might influence sensor application in diverse biological contexts would be insightful. Clear definitions and consistent use of these terms are crucial for ensuring that readers understand the methods and results.

      We would like to thank the reviewer for the comments. As part of this work, we did not obtain a dose-response curve for the maxadilan peptide, and only reported the maximal response of the sensor to a high concentration of the peptide (10 µM). Thus, our findings inform us on the maximal efficacy of the peptide rather than on its potency towards the PAC1R. Furthermore, we would like to point out that, due to the lack of structural details for any GPCR-based sensor published to date, we cannot draw any molecularly accurate conclusion regarding the precise reasons why a different ligand (in this case the sandfly maxadilan) induces a lower maximal efficacy of the response compared to the endogenous cognate ligand of the receptor. We do not believe that AlphaFold models can accurately replace structural information in this regard, especially given that the amino acid linker regions between the GPCR and the fluorescent protein, which are a critical determinant of allosteric chromophore modulation by ligand-induced conformational changes, typically obtain the lowest confidence score in all AlphaFold-predicted structural models of GPCR-based sensors. Finally, we would like to refer the reviewer to a very nice recent publication (PMID: 32047270) that resolved the structures of each of these peptides bound to the PAC1 receptor-Gs protein complex, providing accurate molecular details on the different modalities of receptor binding and activation by PACAP1-38 versus maxadilan.
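
      The distinction between potency and maximal efficacy drawn above can be illustrated with a minimal sketch. It assumes a standard Hill-type dose-response model; the function name and all parameter values are hypothetical and are not taken from the manuscript. It shows why a single measurement at a saturating concentration (10 µM) constrains the maximal response but says little about the EC50.

```python
def hill_response(conc, emax, ec50, hill=1.0):
    """Response predicted by a Hill-type dose-response model.

    conc and ec50 must share the same units (here: molar).
    emax is the maximal response, ec50 the half-maximal concentration.
    """
    return emax * conc**hill / (ec50**hill + conc**hill)


# Hypothetical ligands (illustrative values only, not measured data).
saturating = 10e-6  # 10 µM, a single high test concentration

# Same maximal efficacy, 100-fold difference in potency (EC50).
potent = hill_response(saturating, emax=1.0, ec50=1e-9)
less_potent = hill_response(saturating, emax=1.0, ec50=1e-7)

# Same potency, lower maximal efficacy.
partial = hill_response(saturating, emax=0.5, ec50=1e-9)

print(f"potent:      {potent:.3f}")
print(f"less potent: {less_potent:.3f}")
print(f"partial:     {partial:.3f}")
```

      Under these assumptions, the two ligands that differ only in EC50 give nearly the same response at 10 µM, whereas the ligand with a lower maximal efficacy is clearly distinguishable, which is why a full dose-response curve would be needed to compare potencies.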

      Reviewer #2 (Recommendations For The Authors):

      The authors are congratulated on the meticulous achievement of their aim, i.e. a fluorescence-based sensor for the detection of PACAP with in vivo utility. Whether or not this sensor will have the requisite sensitivity to detect the release of endogenous PACAP within various regions of the nervous system, in response to specific environmental stimuli or changes in brain or physiological state, remains to be determined.

      We thank the reviewer for the very positive evaluation of our manuscript and for the suggested additions that will improve the strength of our arguments.

      We agree that the in vivo detection of endogenous PACAP will be an important objective for future studies. Due to time, resource and animal license constraints, we are not able to address this objective in our current study, but we now detail possible future experiments in the discussion section. Please see also our answer to the suggested discussion points previously.

      Reviewer #3 (Recommendations For The Authors):

      To comprehensively assess the sensor's sensitivity and specificity to endogenous PACAP, I recommend conducting additional in vivo experiments where PAClight1P78A is expressed in neurons that endogenously express the Pac1r receptor (using Adcyap1r1-Cre mouse line). These experiments should involve applying sensory or emotional stimuli known to evoke PACAP release or activating upstream PACAP-expressing neurons. Such studies would offer valuable data on the sensor's performance under natural physiological conditions and its potential utility for exploring PACAP's roles in vivo.

      We express our gratitude to the reviewer for providing detailed methodological approaches to examine endogenous PACAP release. These suggestions will prove invaluable for future investigations and are important additions to a follow-up publication. As mentioned earlier, we have incorporated some of these approaches into our discussion. Additionally, we have underscored the existing limitations in detecting endogenous PACAP in vivo and emphasized the relevance of PAClight for drug development purposes.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Throughout, the authors claim that there is a cross-talk between UPRmt and SG. This is unsubstantiated and unclear.

      We strongly disagree with this comment. Throughout the manuscript, we show how manipulating UPRmt signalling affects SG formation, and how manipulating SG assembly alters mitochondrial functions and UPRmt-associated mitochondrial outputs. In addition, both other reviewers are supportive of our conclusions.

      Major: Link between UPRmt and stress granules:

      The authors claim a link between the UPRmt and stress granule formation based on the finding that the loss of ATF5 affects the expression of UPRmt markers, but not ISR markers. Yet, the authors actually show that GTPP-induced SGs form in a manner independent of ATF5 (Supp. Fig. 2). Thus, there is no data in the manuscript that substantiates this claim.

      In the revised manuscript, we show that reducing ATF5 levels results in defective SG assembly, with SGs being smaller and more numerous, reflecting a maturation defect (Sup Figure 6B, 6C and 6D). In addition, we show a clear dependence of SGs on PERK activation (see comment below) and a specific increase of the main negative regulator of the ISR, GADD34 (Figure 2A and 2B). Therefore, we disagree with this reviewer's conclusion and provide data supporting a link between UPRmt and SG formation.

      PERK-mediated activation of the ISR. The authors claim that PERK mediates activation of the ISR following GTPP treatment. However, the experiments in Fig. 2E were done 1h after treatment. The authors in Fig. 1C nicely show that SG formation begins at 2h. Thus, it is possible that following a longer GTPP treatment (i.e. >2h) the ISR is activated by different branches; for example, the mitochondrial branch that is mediated by HRI. Thus, the authors should determine which kinase mediates ISR activation at the time point that SG formation is maximal.

      We apologise if the description of the experimental procedure was unclear. These experiments were performed at 2h post GTPP treatment, as explained in the text (see line 222) and legend (see lines 715-717, Figure 2 legend). The identification of PERK as the driver of eIF2α-P and SG formation was therefore carried out at a time point where SG formation is maximal.

      Role of SG-linked decrease in cellular adaptation to stress. The finding that SGs limit mitochondrial respiration is interesting. Presumably this promotes cellular adaptation to mitochondrial stresses. The authors should test whether G3BP1/2 DKO cells are more susceptible to death following longer GTPP treatments.

      We thank the reviewer for this comment. These data are presented in Figure 8, where we show that G3BP1/2 dKO cells are less viable compared to wild-type cells following GTPP treatment for up to 28 hours.

      Minor: Fig. 2C should be moved to supplemental as well as the data indicated the lack of ISR inhibition.

      Figure 2C is now supplementary Figure 3.

      Fig. 3A should have representative images of all conditions from Fig. 3B.

      This has now been included as supplementary Figure 4.

      IFAs in Fig. 3 and 4 are hard to interpret given both DAPI and G3BP1 are in shades of blue. Ideally, insets of a merged panel should show each individual panel.

      We adopted the combination of cyan, magenta and blue for our images to make scientific figures accessible to readers with red/green colour-blindness. For these figures, G3BP1 is in light cyan and DAPI in dark blue, a colour scheme we adopted previously in three publications (PMID 36965618, PMID 35098996, PMID 31905230), allowing colour-blind readers to appreciate the results.

      Reviewer #1 (Significance (Required)): The link between the UPRmt and SGs is interesting and would be an advance. However, the authors put forward data that indicates SGs form in an UPRmt (ATF5)- independent manner. An interesting aspect of this story for which there is data is that SGs limit mitochondrial function. This should be explored further (i.e. although it limits mitochondrial respiration, perhaps SGs protect mitochondria against chronic ISR stress).

      As suggested, we now provide an extensive amount of additional data supporting a role in mitochondrial functions, with data demonstrating that the absence of SGs rescues cell viability (Figure 8A and 8B), restoring mitochondrial functions such as respiration, ATP production (Figure 6D, 6E and 6F) or translation (Figure 7A), and reducing the production of mitochondrial ROS (Figure 6C) or mitochondrial fragmentation (Figure 6A and 6B).

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Summary: The article by Lopez-Nieto Jordana et al entitled "Activation of the mitochondrial unfolded protein response regulates the dynamic formation of stress granules" describes the identification of a novel cross talk between the mitochondrial unfolded protein response (UPRmt) and the integrated stress response (ISR) and the contributory role SG regulation plays in mitochondrial function and adaptation to stress. This manuscript presents data highlighting that activation of the UPRmt results in the temporal modulation of SG formation via GADD34 levels and further this analysis by suggesting that these levels of GADD34 may enable cells to be protected from prolonged stress.

      Minor comments: This is a very well written manuscript with beautifully presented data. There are some inconsistencies/typos with the abbreviation GTPP- this needs to be checked within the manuscript but examples are on Lines: 204/206/214/324/328/357.

      This has now been corrected throughout.

      Check reference list for inconsistencies; line 680 reference has no page numbers, line 718 reference has no issue or page numbers

      This has now been corrected, references curated throughout.

      Line 255 - is it correct to say induction here? I think impairment should be used.

      This has now been corrected, see lines 283-284.

      Cell type not mentioned in Fig 2 legend.

      This has now been corrected, see line 707.

      Errors in Fig 4 legend - 4F, G do not exist.

      This has now been corrected, see lines 748-750.

      Major comments: In figure 1- the GTPP treatment only results in 25% of cells showing SGs compared with 80% in Ars treated cells. While the activation of ISR markers by GTPP treatment is convincing (in Figure 2A), What happens to overall protein synthesis levels in these cells? Puromycin incorporation assays would be a useful addition here.

      We now show in Figure 1D that GTPP treatment results in a global reduction in translation, and that cells displaying SGs present with a stronger shut-off when compared with treated cells lacking SGs.

      Fig. 1A - ATF4 upregulation is lower in ATF5 siRNA treated cells - what is % uptake of the siRNA in these cells - also see comment below. If possible, it would be nice to see the re-localisation of ATF5 to the nucleus to confirm the UPRmt activation of this protein

      These are experiments that we had planned to perform; however, in our hands none of the commercially available antibodies allowed us to determine with confidence the localisation of ATF5. We have not determined the uptake of ATF5 siRNA but show by qPCR a reduction in ATF5 mRNA levels following siRNA treatment (see Figure 1A).

      Does the dispersal of SGs also correlate with a recovery of protein synthesis? There is still a relatively high level of eIF2α-P at 8h (from Figure 2A).

      We have not performed these experiments as we do not believe they would have added depth to our study. It is well accepted that SG disassembly results in mRNA re-entry in polysomes and the restart of translation (PMID: 30664789). SGs disappear a few minutes before translation is resumed.

      In Figure 2A the 30 min treatment of GTPP induces a robust level of eIF2α-P yet SGs are only observed following the induction of ATF4/GADD34 at 2h. Puromycin incorporation assays may also be able to shed light on the lack of SG inductions at this stage. The formation of SGs around the time when ATF4 and GADD34 are induced seems counterintuitive and should be commented on.

      As commented in response to an earlier point, our analysis shows that GTPP results in a global reduction in translation level. The assembly of SGs in a subpopulation of cells (as also reported in the context of many viral infections) may reflect cell-specific differences in the levels of eIF2α kinases and/or differences in reaching the threshold needed for eIF2α phosphorylation to induce SG assembly (as shown in PMID 30674674 and PMID 35319985).

      In line 207-208 you state that "PERK is the main eIF2α kinase responsive to GTTP. Overall, these results suggest that induction of the UPRmt is associated with an early SG assembly and ISR activation through PERK." Does the PERK inhibitor inhibit the formation of SG following GTPP treatment?

      This is now shown in Figures 2E and 2F. Indeed, pharmacological inhibition of PERK following GTPP treatment resulted in inhibition of SG assembly.

      Additionally, does GTPP activation of the UPRmt also induce an oxidative stress and therefore activate an additional EIF2AK such as HRI? If so could be the reason you don't get formation of SGs following Ars treatment? Have you considered what would happen if you used the UV stress which activates GCN2 followed by Ars treatment?

      As shown in Figures 2D and 2E, we could not detect a contribution from the other eIF2α kinases GCN2 and PKR following GTPP treatment, and Figures 2E and 2F demonstrate that PERK inhibition is sufficient to revert eIF2α phosphorylation and ablate SG induction, as noted in the response to the point above. This strongly suggests that the eIF2α kinase HRI does not contribute to eIF2α signalling; however, we do not exclude, in the broader sense (beyond eIF2α signalling), an induction of oxidative stress during UPRmt activation. Furthermore, as shown in Figure 2D, A-92 treatment reduced p-eIF2α levels in response to UV treatment but not those induced by GTPP; therefore, we can exclude a contribution from GCN2. If we understand correctly, this reviewer asks what would happen if cells were UV-stressed to activate GCN2 followed by oxidative stress with arsenite. This is outside the scope of this manuscript, but based on our previous work showing that GADD34 mRNA levels act as the molecular memory of the ISR and drive cell adaptation to acute and chronic stress, we would expect that the response to a second pulse of stress would be dampened by the sustained level of GADD34 mRNA induced following the first stress (see PMID 35319985). In these previous studies we already demonstrated that induction of p-eIF2α and SGs by a first acute stress (heat shock or thapsigargin) impairs the induction of p-eIF2α and SGs by a second acute (heat shock or arsenite) or chronic (HCV infection) stress (PMID 35319985, see Figure 6; PMID 38602876, see Figure 7).

      Overall, this and the response to the previous comment strongly support that PERK activation, and the resulting induction of GADD34, are responsible for SG regulation following GTPP treatment.

      In Figure 3, for the paraquat experiments have you missed the transient induction of SGs by only looking at 48h? You already have GADD34 levels high here so SGs/eIF2α-P levels will already be lowered.

      We have now included additional timepoints, see supplementary Figure 5, showing the absence of SGs at 1, 2, 6 and 24h post paraquat treatment, to complement the 48h treatment previously shown.

      In addition, when analysing GTPP + Ars treatment impact on SG formation (Fig 2B), could the 2 h GTPP + Ars data also be included, as this is the peak time for SG induction by GTPP

      This is now included in Figure 3B.

      In line 211 you refer to the early and late stages of the stress, how have these been defined? It seems that the ability of the UPRmt to be protective to an additional stressor is time dependent- the number of SGs that are present following the additional stress increases from 4-8h. Does this correlate with a decrease in the level of GADD34?

      We define early and late as the time points corresponding to the induction (early) or disassembly (late) of SGs. Also see lines 227-230.

      In line 254 you state that ATF5 silencing didn't impact the ISR or SG formation? These data suggest that the formation of SGs is not a direct impact of activation of the UPRmt but rather activation of the cellular ISR possibly due to the proteotoxic and/or oxidative stress? Can the authors comment on this?

      We now show in supplementary Figure 6 that reducing the expression of ATF5 results in defects in SG maturation, with GTPP treatment resulting in more numerous and smaller SGs. Moreover, it should be noted that HSF1, in addition to ATF5, is a key controller of UPRmt induction, and future studies could aim at dissecting the role of HSF1 in the SG-UPRmt crosstalk (discussed in lines 459-461).

      In Figure 4, If GADD34 was driving the loss of SGs in GTPP treated cells why are SGs not persistent in these KO cells. Please comment on this.

      Two phosphatases are known to catalyse eIF2α-P dephosphorylation, GADD34 and CReP. The current model proposes that GADD34, which is induced following stress, acts in a negative feedback loop to resolve cellular stress. In contrast, CReP is constitutively expressed and controls basal eIF2α-P levels independently of stress levels (PMID 27161320). In recent work, we have shown that when GADD34 expression is silenced, CReP takes over to revert eIF2α-P and therefore disassemble SGs (PMID: 38602876). This work also showed that CReP is stress-induced in the absence of GADD34. Therefore, in Figure 4 we can speculate that the absence of SGs in GTPP-treated KO cells is due to the ability of CReP to compensate for the absence of GADD34. In the context of GTPP treatment followed by arsenite, GADD34 is important to increase the threshold at which SGs can form, altering the response to a second pulse of stress.

      In addition, in these GADD34KO cells there should also be a persistent level of eIF2α-P when treated with GTPP and Pq, there is some as evidenced by the quantification but this is not very convincing

      As noted here, we do provide evidence of sustained levels of eIF2α-P, at least in cells treated with GTPP: the results of independent experiments (n=3) show persistent phosphorylation in GADD34 KO cells relative to WT cells. But, as noted in the point above, the likely activity of CReP can compensate for the lack of GADD34 and therefore dampen the amount of eIF2α phosphorylation observed.

      Fig 4B shows no cells exhibiting SG following 4h GTPP treatment, which does not correlate with other experiments in the original cell line, e.g. supp 2B - please explain. Can GTPP still activate the UPR-mt in this CRISPR control cell line

      GTPP still activates the UPRmt in the CRISPR control cell line, as shown by the inhibition of arsenite-induced SG assembly when cells are pre-treated with GTPP for 4h (Figure 4A). However, we have noted that the timing of the response to GTPP can vary slightly, impacting the exact SG kinetics, depending on the purity of the drug (synthesised through organic routes by our collaborator Dr Altieri), with the SG peak either at 2 h or at 4 h post-GTPP treatment. Live imaging of SGs in control and GADD34 KO cells could potentially alleviate this caveat; however, in the time frame of the rebuttal, further engineering of GADD34 KO and parental lines into G3BP1/2 knock-outs / GFP-G3BP1 knock-ins was not achievable.

      In Figure 5, of the 80% of SG still present in GTPP treated Sil SGs- was size or frequency impacted here too as in Pq treatment?

      These data are now provided, see Figure 5C and the result section, lines 325-329. They show that GTPP treatment resulted in a reduction in the average size of silvestrol-induced SGs, from 0.98 μm² to 0.9 μm², and an increase in the average number of SGs, from 18 to 22, when compared to non-treated cells. Additionally, we also quantified features of Ars-induced SGs in GTPP-pretreated cells; these data are provided in Figure 3C and in the result section, lines 245-250. The analysis showed that, like paraquat, GTPP pre-treatment also impacts the size and frequency of arsenite-induced SGs.

      This is just for clarification but If GTPP is a hsp90 inhibitor, is it specific to mitochondrial Hsp90 proteins?

      Indeed GTPP is specific to mitochondrial Hsp90.

      In the last results section the authors suggest that G3BP1/2 KO cells unable to assemble SGs present with improved mitochondrial function during stress. Firstly, is the UPRmt activated in these KO cells? Could the increased activity just be a consequence of the cells not being able to sense the stress and adapt? Are these cells able to recover from the GTPP stress to the same extent as the wt? Do they die at later timepoints? If you inhibited the disassembly of SGs using DYRK3 inhibitors would you decrease mitochondrial activity?

      The figure below confirms the upregulation of UPRmt gene mRNA levels after GTPP treatment in U2OS G3BP1/2 dKO cells (rebuttal Figure 1). We did not include this in the main manuscript given that it is already figure-heavy and this did not add depth to our results. Our extensive additional analysis shows that cells unable to assemble SGs present with multiple restored mitochondrial functions following UPRmt induction, including increased ATP production (Fig 6D) and respiration (Fig 6E, 6F), and reduced mitochondrial ROS levels (Fig 6C) and fragmentation (Fig 6A, 6B). These all support a model in which SGs assembled following UPRmt induction contribute to impaired mitochondrial function and in which their inhibition/disassembly is necessary to restore mitochondrial homeostasis.

      Rebuttal Figure 1: RT-qPCR analysis of the UPRmt and ISR markers DNAJA3, HSPD1, CHOP and ATF4 mRNA levels in U2OS cells treated with GTPP for up to 6 h. Results shown are representative of n=3, normalised to RPL9 mRNA and shown relative to DMSO.

      Reviewer #2 (Significance (Required)): Significance: This is an interesting and clearly important observation providing mechanistic insight into the role SGs may play in the cell's control of mitochondrial function during stress. The functional role of SGs in disease and stress is still widely unknown and this manuscript therefore sheds light on how the cell may use SGs to modulate and adapt to mitochondrial stress. This is an exciting area of research that will be applicable to a large audience as SGs are implicated in a wide range of diseases. While the data is significant there are currently a number of important experiments required to strengthen the current observational analysis. Below are some minor and major comments linked to the manuscript.

      We thank the reviewer for highlighting the importance of our work in an 'exciting area of research'.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): As it stands, this study will be suited for a specialized cell biology journal. In order to be published in a journal of a broader readership, the authors would need to address two major points:

      1. Mitochondrial dysfunction affects cellular function in many ways. Reduced levels of ATP, oxidative stress by increased ROS levels and mitochondrial precursor proteins that challenge proteostasis in the cytosol are just three major consequences of mitochondrial defects. Arguably, for the generation of stress granules, it will be important which of these consequences of mitochondrial dysfunction are prevalent. Since mitochondrial dysfunction is an ill-defined umbrella term, this study would be stronger if the authors could link stress granule formation to the specific molecular defects that arise from specific inhibition of mitochondrial functions.

      We agree with this reviewer that mitochondrial dysfunction can take many shapes; to address their comment, we have now performed extensive additional experiments probing various aspects of mitochondrial function. In addition to the data previously included, we can now show that inhibition of SG formation during UPRmt induction results in increased cell viability (Figure 8A-B), restoring mitochondrial functions such as respiration, ATP production (Figure 6C-F) or translation (Figure 7A), and reducing mitochondrial ROS (Figure 6C) or fragmentation (Figure 6A-B). These all support a model in which SGs assembled following UPRmt induction contribute to impaired mitochondrial function and in which their inhibition/disassembly is necessary to restore mitochondrial homeostasis.

      2. Also stress granules are an umbrella term. Different treatments will presumably change the spectrum of transcripts that are sequestered in these granules. As mitochondrial defects remodel the transcription and translation of mitochondrial precursor proteins, the study would benefit from a comprehensive analysis of the spectrum of transcripts that are contained in granules induced by GTPP and sodium arsenite, respectively.

      Previous studies, including our own, have demonstrated that different stresses (or infections) can indeed result in the assembly of compositionally distinct SGs (or SG-like foci) that sequester specific subsets of mRNAs or proteins. These studies are based on affinity purification or proximity ligation approaches followed by multi-omics analysis of SG components by RNA-seq and mass spectrometry. While we agree with this reviewer that determining the composition of UPRmt-induced SGs could help understand their function, we believe these studies are outside the scope of the current manuscript and would instead form the basis of a subsequent study and manuscript.

      Reviewer #3 (Significance (Required)): The study is interesting but descriptive. It confirms previous observations. The advance in mechanistic insights is limited. Nevertheless, the study is technically sound and of interest for a specialized readership. As it stands, the study might be published in a specialized journal. In order to be of general interest for a large and general readership, the authors will have to provide much more mechanistic and molecular insight, which will require at least another six months of work.

      We have now produced an extensive additional body of work to answer specific comments made by all three reviewers, bolstering our hypothesis, and delving deeper into the impact of SG assembly on mitochondrial functions.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements

      We are grateful for the valuable, constructive comments of the reviewers, which helped to substantially improve the quality of our manuscript. We particularly agree that the original structure of the manuscript was confusing and in parts misleading, since we followed the history of the project, which first identified the RBM39-mediated impact on IRF3 expression, whereas the -omics studies, identifying additional factors, were done at a far later point. Many discrepancies further arose from the low sensitivity of our initial proteomics analysis, which we have now repeated, thereby obtaining far more sensitive detection of the key factors we also found in the transcriptomics data.

      We have re-structured the entire manuscript by moving the -omics data from the end of the paper towards the middle, and we now provide downstream analysis of similar depth for all relevant key factors identified (RIG-I/MDA5, IFN receptors, STAT1/2), to reduce the focus on IRF3, as suggested. We further changed the title and abstract to reflect this major conceptual change. Thanks to this helpful comment, we think that our manuscript is now conceptually much clearer.

      We further added new data to support the central claims of our manuscript, including a repetition of the proteomics study. Proteomics and transcriptomics now consistently demonstrate the impact of RBM39 knockdown as well as indisulam treatment on several key factors of innate immunity, including IRF3, STAT1/2, RIG-I and MDA5 (now in Fig. 5), with IFNAR2 and IL10RB additionally found in transcriptomics. We provide additional functional evidence that IRF3 is the key factor affected in the TLR3 pathway (IRF3 overexpression, Fig. 6B, C), whereas diminished abundance of RIG-I/MDA5 is equally important in the respective pathway, thereby also affecting the NF-κB response (Fig. 6F-I). We further show the functional significance of IFN-receptor/STAT downregulation on type I and III IFN responses (Fig. 7E-G).

      The reviewers also pointed to some datasets showing the expected trends, but in some cases lacking statistical significance, due to variability in knockdown efficiency. We repeated all mentioned datasets with new batches of siRNA with sufficient biological replicates (n=3). We thereby obtained consistent, statistically significant data in all cases. Importantly, all experiments implementing the RBM39.esc control now show consistent rescue (Fig. 2A-E).

      To generate a homogenous experimental design for virus infections, we further added new data showing a comparable impact of siRNA knockdown (Fig. 3F) and indisulam treatment (new Fig. 3G) on Sendai virus infection in A549 cells and took this as a rationale to consistently use indisulam for all other infections.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): This manuscript by Li and colleagues examines the role of RBM39 in innate immune signaling. Splicing factor RBM39 was identified through a genome wide screen with a death reporter under control of the IFIT1 promoter that got stimulated with pIC in a TLR3-dependent manner. Besides IFIT1, further experiments showed that RBM39 is also involved in optimal expression of other innate immunity genes like IFNB, CXCL10, RIG-I or MDA5. While NFkB-dependent genes seem not to depend on RBM39, for IRF3 it was shown that protein levels decrease under conditions of RBM39 depletion, because IRF3 mRNAs are (slightly) reduced and spliced differently. The sulfonamide Indisulam could largely recapitulate the phenotype of RBM39 depletion. Further analyses using proteomics and transcriptomics showed that RBM39 is required for mRNA splicing and expression of a large set of other proteins. Altogether, this well designed and written study highlights the fundamental role played by RBM39 in maintaining the pathways of immunity and metabolism. The key conclusions are convincing but some additional experiments would strengthen them further.

      We are grateful for the very positive general comments of this reviewer.

      Major comments: - For the statistics, authors seem not to have done multiple tests but rather tested individual datasets within larger graphs against each other. Please explain where this is the case and use corrections if multiple testing was done

      We apologize for not having been clearer here; we did indeed correct for multiple testing. In the proteomics, statistical significance was evaluated by "two-sample tests" (Student's t-test with a permutation-based FDR of 0.05 and 250 randomizations). For the analysis of RNAseq data, p values were calculated with the Wald test and corrected for multiple testing according to Benjamini-Hochberg. We have now included this information in the materials and methods section and in the respective figure legends.
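
      As an illustration of the Benjamini-Hochberg correction named above, the following is a minimal Python sketch of how adjusted p values can be computed from a list of raw p values. It is not the authors' analysis pipeline; the function name `benjamini_hochberg` and the example p values are hypothetical and chosen only for demonstration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order.

    Hypothetical helper for illustration; not taken from the manuscript.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down to the smallest, so that each
    # adjusted value is never larger than the one ranked above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p values, e.g. from per-gene tests (assumption for the demo).
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = benjamini_hochberg(example_p)
# Flag tests that pass a 5% false discovery rate threshold.
significant = [q <= 0.05 for q in q_values]
```

      In this toy input, raw p values of 0.039 and 0.041 would pass an uncorrected 0.05 threshold but no longer pass after adjustment, which is the kind of difference the reviewer's question about multiple testing is concerned with.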

      • Fig. 4 shows that RBM39 depletion reduces IFIT expression in virus infected cells and slightly increases virus replication. RBM39 has a major effect on IRF3 levels, but also on other players in innate immunity. What happens if IRF3 is ectopically expressed as in figure 5? With this experiment one could measure how high the contribution of IRF3 mis-splicing is to innate immunity.

      We thank this reviewer for the valuable suggestion. We restructured the entire manuscript to address several reviewer comments regarding the focus on IRF3 and the lack of data on other factors in the pathway. We now clearly demonstrate that ectopic IRF3 expression entirely rescues the TLR3 response to poly(I:C) in PH5CH cells (Fig. 6B-C), which also explains the lack of impact on the NF-κB pathway (Fig. 2G-H). In contrast, overexpression of IRF3 does not rescue the RIG-I/MDA5 response in A549 cells (new data, Fig. 6F-I). Here, the NF-κB pathway is also affected by knockdown of RBM39, suggesting that reduced RIG-I/MDA5 abundance upon RBM39 knockdown substantially contributed to the diminished innate immune response.

      • Fig. 4 A uses siRNAs but B, C and D only indisulam treatment. It would be better if siRNAs would also be used for the other viruses.

      We agree that a homogenous setup for virus infection would be favorable; however, the use of different cell lines was necessary due to the limited permissiveness of the cell types used towards virus infection, and it appeared challenging to achieve similar knockdown efficiencies. To generate a homogenous experimental design, we now added new data showing a comparable impact of siRNA knockdown (Fig. 3F) and indisulam treatment (new Fig. 3G) on Sendai virus infection in A549 cells and took this as a rationale to consistently use indisulam for all other infections.

      • RBM39 depletion strongly reduces IRF3 levels in the WB, but not so much in RT-PCR and not at all in proteomics. Is the antibody used for WB perhaps recognizing a domain that is underrepresented in isoforms after disturbed splicing? Please clarify.

      Our previous proteomics data suffered from a very low sensitivity, therefore we missed clear detection of many factors, including IRF3. We repeated the whole proteomics analysis with siRNA and indisulam treatment (new Fig. 5A, B) and now found significantly reduced IRF3 protein levels in both conditions (new Fig. S5C), in agreement with the WB data. The lower impact on IRF3 mRNA abundance is due to the additional contribution of alternative splicing (Fig. 6A, Fig. S6A-D), which both in combination affect protein abundance.

      • Volcano plots in figure 7 show a lot of hits obtained after both RBM39 siRNA and indisulam (green dots), and some that are additionally identified in transcriptomes and in proteomes (red dots). Nonetheless only innate immunity and stress response genes are marked, although they do not belong to these highly conserved classes. Please elaborate more on the most RBM39-dependent genes, e.g. by presenting them in a heat map.

      To our knowledge, our study is the first with a comprehensive comparison of the impact of RBM39 knockdown and indisulam treatment on the host cell proteome and transcriptome. However, several studies have already performed -omics analyses on individual conditions/readouts (e.g. (Coomar et al, 2023; Dou et al, 2023; Mai et al, 2016; Nijhuis et al, 2022)). These studies already identified and described in detail key changes in transcriptome and proteome, e.g. affecting genes involved in cell cycle control and metabolism, which we find as well. However, the novelty of our paper is the impact on the innate immune response; we therefore decided to put an even stronger focus on these genes and to omit other factors, like stress response pathway components, etc. This strategy is supported by the higher sensitivity of our new proteome analysis, which now generated a far better overlap with the transcriptomics, favoring a display that highlights in the volcano plots only those factors that were further analyzed in detail (Fig. 5). Still, interested readers will find the comprehensive list of data in the supplementary Excel datasheets as well as in our primary data in online depositories.

      Minor comments: - Some abbreviations are not explained, like PGK, siNT, siVTN

      We apologize and have added the missing explanation of abbreviations.

      • Welsch should read Welch

      Corrected.

      • Fig. 2H: were cells also stimulated and if yes, how?

      These were unstimulated conditions, to show the impact of RBM39 on basal expression of the IFNlambda receptor chains. However, we deleted this dataset due to the re-organisation of the manuscript. The analysis of the type I and type III receptor and STAT1/2 expression is now comprehensively shown in Fig. 7/S6E, F, solely based on the transcriptomic data for consistency reasons, along with the functional impact on the IFN response.

      • Fig. 6E: I cannot see a difference between the IRF3-203 and 228 isoforms. And what are the white boxes?

      • Also 6E: Location of the primers is barely visible

      Due to the re-organization of the manuscript these data are now shown in Fig. S6D. Both isoforms are indeed very similar and only differ by a very small (16nt) additional exon in isoform 228. The white boxes are exons not translated in the respective isoforms. We have included this important information in the legend to Fig. S6 and increased the arrows indicating the positions of the primer.

      • Some materials are not properly referenced, like the death reporter, the lentiviral system, or the Rift Valley fever luciferase virus

      We are sorry for the missing information, which has now been added to the materials and methods section.

      • Supplement has no page numbers

      We have added page numbers to the supplementary information.

      Reviewer #1 (Significance (Required)):

      The study advances our knowledge about the regulation of innate immunity. Strengths are the discovery of a novel layer of innate immunity regulation by splicing and the in-depth analysis of the importance of RBM39 for cellular gene expression. A potential weakness might be the focus on innate immunity as other biological functions seem even more dependent on RBM39. However, this reviewer sees the necessity that covering all aspects of RBM39 function would be beyond the scope of a single study. The relevant literature is appropriately cited (except for some materials, see minor comments). Results will be of interest not only to people doing basic research on innate immunity, but also to those interested in gene regulation in general or to cancer researchers using indisulam.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): The authors performed a CRISPR-based screen for genes required for TLR3-mediated signaling and gene expression in hepatoma cells. Interferon-stimulated expression of an apoptosis inducer was used as a read-out system. A number of candidate genes were identified and one of these, RBM39, investigated in detail. The protein has previously been linked to both transcriptional control and RNA processing. Validation studies confirm that reduction of cellular RBM39 results in less TLR3-mediated IFN-beta synthesis and lower levels of ISG mRNA synthesis. Initial studies suggest a role of RBM39 in regulating IRF3 levels, the transcription factor activated by TLR3 signaling to induce IFN-beta synthesis. However, the effect is variable and poorly supported by transcriptomic and proteomic data. Moreover, only one out of four cell-based viral infection models reports a substantial effect of the RBM39 knockdown.

      We apologize for the lack of consistency among several datasets, which was mainly due to the low sensitivity of the proteomic analysis. This has been repeated and now fully confirms all other data. In part due to the comments of this reviewer, we further broadened the scope of the manuscript away from IRF3, including a change of the title.

      Major comments:

      1. The data do not support the claim that RBM39 is a broadly acting player in innate immune responses. In addition, they suggest that IRF3 may not be the only relevant RBM39 target. The most informative knockdown control in this regard would be IRF3 siRNA.

      We have re-structured the entire manuscript and added new data to support the central claims of our manuscript, including a repetition of the proteomics study. Proteomics and transcriptomics now consistently demonstrate the impact of RBM39 knockdown as well as indisulam treatment on several key factors of innate immunity, including IRF3, STAT1/2, RIG-I and MDA5 (now in Fig. 5), with IFNAR2 and IL10RB additionally found in transcriptomics. We further provide functional evidence that IRF3 is the key factor affected in the TLR3 pathway (IRF3 overexpression, Fig. 6B, C), whereas diminished abundance of RIG-I/MDA5 is equally important in the respective pathway, thereby also affecting the NF-κB response (Fig. 6F-I). We further show the functional significance of IFN-receptor/STAT downregulation on type I and III IFN responses (Fig. 7E-G). We hope this reviewer now agrees with our claim that RBM39 is a broadly acting player in innate immune responses.

      2. The structure of the manuscript is rather confusing because IRF3 is presented as the main RBM39 target in figures 3-6, but the -omics data in figures 7 and 8 do not support this view. The authors argue different sensitivities of the experimental approaches, but I think few people would agree that western blots are more sensitive than MS. In my opinion a narrative with less focus on IRF3 and a broader integration of candidates of the -omics approaches would be preferable.

      We are grateful for this valuable comment and fully agree that the original structure of the manuscript was confusing and in parts misleading. This was mainly because we followed the history of the project, which first identified the RBM39-mediated impact on IRF3 expression, whereas the -omics studies, identifying additional factors, were done at a far later point. Many discrepancies further arose from the low sensitivity of our proteomics analysis, which we now repeated, thereby obtaining far more sensitive detection of the key factors we also found in the transcriptomics data. We now moved the -omics data from the end of the paper towards the middle and provide downstream analysis of similar depth for all relevant key factors identified (RIG-I/MDA5, IFN receptors, STAT1/2), to reduce the focus on IRF3, as suggested. We further changed the title and abstract to reflect this major conceptual change. Thanks to this helpful comment, we think that our manuscript is now conceptually much clearer.

      Investigating the role of RBM39 by RNA-seq in pIC-treated cells would further strengthen the manuscript. It will yield a broader view of the protein's role in induced innate immunity.

      We did not add pIC treatment to the RNA-seq analysis, since, based on our own experience and numerous papers, this will change the expression of literally thousands of genes. Based on the key factors of the pIC response modulated by RBM39 (RLRs and IRF3), this would very likely simply result in reduced induction of the whole ISG panel (as exemplified for IFIT1, ISG15, MxA and CXCL10 in Fig. 2B-E).

      3. The results in figures 6A-C are confusing for two reasons. First, the siRNA-mediated knockdown should result in reduced RBM39 protein as well (as shown in Fig. 3A) and, therefore, in an increase in RBM39 levels. Second, why was this effect not noted in the experiments shown in figs. 1-5? To avoid this confusion it might be good to mention which IRF3 splice isoforms are detected by the primers and antibodies used in these figures.

      Unfortunately, the reviewer seems to have conceptually misinterpreted Fig. 6A-C of the original paper, which did not show protein, but transcriptome data. We now added the corresponding data of the proteomic analysis in the new Fig. S5, for all detectable, relevant candidates, showing consistency with all previous data. The confusing point in previous Fig. 6B, which the reviewer appears to refer to, is the upregulation of RBM39 transcript levels upon indisulam treatment, which was not apparent in previous experiments, since we always used WB to show diminished RBM39 protein levels upon indisulam treatment. This increase in RBM39 mRNA is due to an autoregulation of RBM39 mRNA by protein abundance, which has been reported in the literature (Campagne et al, 2023). Since this is rather confusing and not relevant for our study, we removed previous Fig. 6B and show this aspect only in the volcano plot in Fig. 5D, mentioning and citing the paper on autoregulation.

      Minor comments.

      1. Fig S1: the figure panels and legend are inconsistent. IFIT1 is labeled as ISG56 in panel S1A.

      We apologize for this inconsistency and now use IFIT1 throughout the paper.

      2. Data with the siRNA escape mutant of RBM39 are inconsistent. For example, why is its effect significantly different only in 1 out of 4 ISG in figures S2A-D?

      We apologize for the inconsistency, which is due to variability of silencing efficiency. We repeated the entire set of experiments (n=3) with a new batch of siRNA and obtained comparable, significant differences for all ISGs analyzed (new Fig. 2B-E).

      3. Line 164: the statement that TRIF and RBM39 siRNAs produce effects of similar magnitude is incorrect for the IFIT1 gene in figure S2A.

      This experiment was repeated (see previous point), now obtaining significant, more homogenous data. We have modified the text accordingly.

      4. Fig. 2H: In absence of additional evidence for functional implications, the data showing reduced IL10RB expression should be omitted.

      We omitted the data, as suggested by the reviewer, however, we provide a more in depth analysis of the type I and III IFN response in Fig. 7, based on the transcriptomic data and a functional analysis.

      5. Fig. 3: More datapoints would be needed in panel A to sustain the lack of significant difference between the untreated and escape mutant samples. Are the viability data in panels B and C normalized to untreated cells to control for Indisulam toxicity? In figure S3A the effect of the mutant is rather small. To allow for comparison, the Indisulam titration curves should be adapted to the concentrations used in Fig. 3.

      Fig. 3 (now Fig. 4) was replaced by another representative experiment, now also containing the quantification of the shown western blots; however, the statistical analysis shown in the previous version was and is based on three independent biological replicates, as indicated in the figure legend. Viability data were normalized to controls and this information is now added to the figure legend as well. The mutant analyzed in Fig. S3A (now S4A) confers only partial resistance, which explains the limited but clear rescue. We did not include higher indisulam concentrations here due to the increased cytotoxicity of concentrations above 5 µM in PH5CH, in the absence of pronounced additional effects on RBM39 abundance (Fig. 4B).

      6. RNA-seq measures steady-state RNA, not transcription.

      This is of course correct, we changed all sentences, where our wording might have indicated that we are measuring transcription by RNAseq. However, we still need to differentiate between the role of RBM39 in transcriptional regulation and splicing, where changes in RNA abundance found in RNAseq rather point to transcriptional regulation.

      Reviewer #2 (Significance (Required)):

      The identification of RBM39 as a candidate player in innate immune responses is of interest to a large scientific community with interest in signalling by pattern recognition receptors. Its role should be strengthened with additional infection models. It is puzzling that three out of four viruses don't benefit from the reduced IFN-beta synthesis in the RBM39 knockdown. Moreover, the data are not convincing (or too diverse) to nail down IRF3 as a major, or the most relevant, RBM39 target.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): CRISPR Screen for factors that are required for dsRNA-dependent ISG production. Found a large number of hits but most did not validate in subsequent assays. The authors follow up the one candidate that did pass secondary screening criteria, RBM39, although re-expression of RBM39 only rescues the phenotype of the siRNAs against RBM39 (siRBM39) in one of the two cell lines tested. Additionally, siRBM39 impacts only a subset of polyIC-induced ISGs and does not regulate NFkB-driven gene expression. They go on to attempt to investigate the impact of siRBM39 on other key innate immune genes and proteins, although many key controls and appropriate methods are missing.

      We thank this reviewer for pointing at inconsistencies and missing controls in our manuscript. We have critically re-evaluated the respective datasets.

      Major comments: 1) The authors propose some rationale for the limited success of the screen, however, while RBM39 may have a role in dsRNA-induced innate immunity, in general the screen seems to have limited value.

      The aim of our CRISPR/Cas9 death reporter screen was the identification of so far unknown contributors to the innate immune response. This was achieved by identifying a critical role of RBM39, followed by an in-depth validation focusing on RBM39. We further found known components of the TLR3 pathway in our candidate list (e.g. TRIF and UNC93B1), pointing to the overall quality of the experimental setup. At no point in the manuscript do we claim that our screen aimed for or delivered a comprehensive overview of innate immunity pathways. Honestly, no comparable screen (e.g. on cytopathic viruses) has delivered such data.

      2) Given that the siRBM39 clearly has off-target effects (since expression of a resistant RBM39 cDNA only gives limited rescue in many cases - Fig S2), each of the experiments in which siRBM39 is used (i.e. Fig 2) should have the RBM39.esc control - especially those that drive subsequent experiments such as the expression of IFNbeta and IFNLR1 (Fig 2a, h)

      The inconsistency in some datasets, which all showed the same trends but in some cases lacked statistical significance, was due to variability in knockdown efficiency. We repeated all mentioned datasets with new batches of siRNA with sufficient biological replicates (n=3), with all of them now revealing consistent, statistically significant data. Importantly, all experiments implementing the RBM39.esc control now show consistent rescue.

      3) Since RBM39 reduction has an apparent impact even in IFNLR1-deficient cells (although need the rescue control to know if this is real) the authors conclude that RBM39 regulates the initial wave of dsRNA signaling-events, but this should be tested with the use of ruxolitinib to block JAK-STAT signaling.

      Due to the general major re-organization of the manuscript, aiming for a less confusing data presentation and consistency towards depth of candidate evaluation, we have removed the data on the IFNLR-deficient cell line. The claim that RBM39 affects the initial wave of ISG responses is based on reduced IFNb expression, which is exclusively induced by the initial wave of ISG response and by the general impact on ISG expression, which we measure at 6h after induction, too early for autocrine IFN stimulation (Burkart et al, 2023). However, we further demonstrate that downregulation of type I and type III IFN receptors in conjunction with STAT1/2 affects the type I and the type III IFN response as well (Fig. 7E-G, in part new data). Therefore, RBM39 affects both the initial wave and the auto-/paracrine IFN response, and we therefore undertook no further efforts to separate these effects.

      4) IRF3 expression in the Indisulam-treated cells more closely tracks cell viability than RBM39 expression. For example in Fig 3C 10 microM gives 50% IRF3 expression and 50% viability but still 95% RBM39 expression - arguing that the impact of siRBM39 on IRF3 might be very indirect (and error bars on rescue are large so unclear if the rescue really worked in Fig 3A).

      Based on this reviewer comment we re-evaluated the quantification in previous Fig. 3C (now Fig. 4C), which combines data from three independent experiments. We deeply apologize, but the initial quantification proved to be wrong, due to erroneous background subtraction; the background was relatively high in one of the PHH replicates (Replicate 1, see Reviewer Fig. 1 in uploaded file). The re-evaluated quantification revealed 55% for the RBM39 abundance at 10µM indisulam, which better reflects the data shown and is now in line with the impact on cytotoxicity and IRF3 abundance.

      5) It is unclear in Fig 4 why some cell/virus combinations are tested with siRBM39 and others are tested with Indisulam. Also the conclusion that RBM39 "substantially contributes to the cell intrinsic innate immune response to viral infections" is greatly overstated given that the differences are between ~3 fold and non-significant.

      We agree that a homogenous setup for virus infection would be favorable; however, the use of different cell lines was necessary due to the limited permissiveness of the cell types used towards virus infection, and it appeared challenging to achieve similar knockdown efficiencies. To generate a homogenous experimental design, we now added new data showing a comparable impact of siRNA knockdown (Fig. 3F) and indisulam treatment (new Fig. 3G) on Sendai virus infection in A549 cells and took this as a rationale to consistently use indisulam for all other infections. Overall, the aim of the virus infection experiments was to use a variety of natural triggers of innate immunity beyond synthetic poly(I:C). Here we indeed found significant reductions of ISG induction for all viruses tested, similar to poly(I:C); this is the basis for the statement that RBM39 contributes to the cell-intrinsic innate immune response to viral infections. Our experimental design did not intend to reveal pronounced effects on viral replication; this was only measured to ensure that reduced ISG induction was not due to inhibition of viral replication. We have now explained this strategy more clearly and toned down the corresponding statements, to exclude potential overinterpretation of the data.

      6) Neither DTU/DRIMseq or qPCR are valid methods to measure splice isoform differences. The authors need to use rMATS or MAJIQ and validate by gel-based RT-PCR.

      Output generated by modern alignment algorithms like Salmon is suitable for studies on an isoform level (Love et al, 2018) and has been used in a variety of studies (e.g. Jabs et al, 2020; Xiong et al, 2023). MAJIQ and rMATS are only superior tools if the detection of so far unknown isoforms is of interest (Love et al., 2018), which is beyond the scope of this project. We have validated the data for IRF3 in RT-qPCR, showing close to identical results to the DTU analysis (compare Fig. 6A and S6D). We disagree that a gel-based RT-PCR analysis would be superior here, due to the lack of quantification.

      7) The conclusions from the proteomic and transcriptomic analyses should be treated with extreme caution given the caveats of methodology and controls discussed above.

      We are aware of the caveats of these technologies. The previous proteomic analysis indeed suffered from low sensitivity, failing to detect essential candidates like IRF3. The repetition of the experiment (new Fig. 5A, B, new Fig. S5) now revealed data very consistent with the transcriptomic data. Overall, the strength of our approach is the direct comparison of siRNA based RBM39 knockdown and RBM39 depletion by indisulam throughout transcriptomics and proteomics analyses. The wide overlap argues for the validity of our data and suggests that we thereby circumvented many caveats.

      Reviewer #3 (Significance (Required)):

      Innate immune signaling is a complex and essential pathway for maintaining health. While much is known about key components of this pathway, additional regulators are likely to exist. This manuscript describes an attempt to identify new regulators of dsRNA-mediated gene expression.

      References

      Burkart SS, Schweinoch D, Frankish J, Sparn C, Wust S, Urban C, Merlo M, Magalhaes VG, Piras A, Pichlmair A et al (2023) High-resolution kinetic characterization of the RIG-I-signaling pathway and the antiviral response. Life Sci Alliance 6

      Campagne S, Jutzi D, Malard F, Matoga M, Romane K, Feldmuller M, Colombo M, Ruepp MD, Allain FH (2023) Molecular basis of RNA-binding and autoregulation by the cancer-associated splicing factor RBM39. Nat Commun 14: 5366

      Coomar S, Mota P, Penson A, Schwaller J, Abdel-Wahab O, Gillingham D (2023) Overlaid Transcriptional and Proteome Analyses Identify Mitotic Kinesins as Important Targets of Arylsulfonamide-Mediated RBM39 Degradation. Mol Cancer Res 21: 768-778

      Dou Z, Zhang X, Su W, Zhang T, Ye F, Zhao D, Chen X, Li Q, Zhang H, Di C (2023) Indisulam exerts anticancer effects via modulation of transcription, translation and alternative splicing on human cervical cancer cells. Am J Cancer Res 13: 2922-2937

      Jabs S, Biton A, Becavin C, Nahori MA, Ghozlane A, Pagliuso A, Spano G, Guerineau V, Touboul D, Giai Gianetto Q et al (2020) Impact of the gut microbiota on the m(6)A epitranscriptome of mouse cecum and liver. Nat Commun 11: 1344

      Love MI, Soneson C, Patro R (2018) Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Res 7: 952

      Mai S, Qu X, Li P, Ma Q, Cao C, Liu X (2016) Global regulation of alternative RNA splicing by the SR-rich protein RBM39. Biochim Biophys Acta 1859: 1014-1024

      Nijhuis A, Sikka A, Yogev O, Herendi L, Balcells C, Ma Y, Poon E, Eckold C, Valbuena GN, Xu Y et al (2022) Indisulam targets RNA splicing and metabolism to serve as a therapeutic strategy for high-risk neuroblastoma. Nat Commun 13: 1380

      Xiong L, Liu J, Han SY, Koppitch K, Guo JJ, Rommelfanger M, Miao Z, Gao F, Hallgrimsdottir IB, Pachter L et al (2023) Direct androgen receptor control of sexually dimorphic gene expression in the mammalian kidney. Dev Cell 58: 2338-2358 e2335

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This paper by Beath et al. identifies a potential regulatory role for proteins involved in cytoplasmic streaming and maintaining the grouping of paternal organelles: holding sperm contents in the fertilized embryos away from the oocyte meiotic spindle so that they don't get ejected into the polar body during meiotic chromosome segregation. The authors show that by time-lapse video, paternal mitochondria (used as a readout for sperm and its genome) are excluded from yolk granules and maternal mitochondria, even when moving long distances by cytoplasmic streaming. To understand how this exclusion is accomplished, they first show that it is independent of both internal packing and the engulfment of the paternal chromosomes by maternal endoplasmic reticulum creating an impermeable barrier. They then test whether the control of cytoplasmic streaming affects this exclusion by knocking down two microtubule motors, Katanin and kinesis I. They find that the ER ring, which is used as a proxy for paternal chromosomes, undergoes extensive displacement with these treatments during anaphase I and interacts with the meiotic spindle, supporting their hypothesis that the exclusion of paternal chromosomes is regulated by cytoplasmic streaming. Next, they test whether a regulator of maternal ER organization, ATX-2, disrupts sperm organization so that they can combine the double depletion of ATX-2 and KLP-7, presumably because klp-7 RNAi (unlike mei-1 RNAi) does not affect polar body extrusion and they can report on what happens to paternal chromosomes. They find that the knockdown of both ATX-2 and KLP-7 produces a higher incidence of what appears to be the capture of paternal chromosomes by the meiotic spindle (5/24 vs 1/25). However, this capture event appears to halt the cell cycle, preventing the authors from directly observing whether this would result in the paternal chromosomes being ejected into the polar body. 

      Strengths: 

      This is a useful, descriptive paper that highlights a potential challenge for embryos during fertilization: when fertilization results in the resumption of meiotic divisions, how are the paternal and maternal genomes kept apart so that the maternal genome can undergo chromosome segregation and polar body extrusion without endangering the paternal genome? In general, the experiments are well-executed and analyzed. In particular, the authors' use of multiple ways to knock down ATX-2 shows rigor. 

      Weaknesses: 

      The paper makes a case that this regulation may be important but the authors should do some additional work to make this case more convincing and accessible for those outside the field. In particular, some of the figures could include greater detail to support their conclusions, they could explain the rationale for some experiments better and they could perform some additional control experiments with their double depletion experiments to better support their interpretations. Also, the authors' inability to assess the functional biological consequences of the capture of the sperm genome by the oocyte spindle should be discussed, particularly in light of the cell cycle arrest that they observe. 

      These general comments are addressed in the more specific critiques below.

      Reviewer #2 (Public Review): 

      Summary 

      In this manuscript, Beath et al. use primarily C. elegans zygotes to test the overarching hypothesis that cytoplasmic mechanisms exist to prevent interaction between paternal chromosomes and the meiotic spindle, which are present in a shared zygotic cytoplasm after fertilization. Previous work, much of which by this group, had characterized cytoplasmic streaming in the zygote and the behavior of paternal components shortly after fertilization, primarily the clustering of paternal mitochondria and membranous organelles around the paternal chromosomes. This work set out to identify the molecular mechanisms responsible for that clustering and test the specific hypothesis that the "paternal cloud" helps prevent the association of paternal chromosomes with the meiotic spindle. 

      Strengths 

      This work is a collection of technical achievements. The data are primarily 3- and 4-channel time-lapse images of zygotes shortly after fertilization, which were performed inside intact animals. There are many instances in which the experiments show extreme technical skill, such as tracking the paternal chromosomes over large displacements throughout the volume of the embryo. The authors employ a wide variety of fluorescent reporters to provide a remarkably clear picture of what is going on in the zygote. These reagents and the novel characterization of these stages that they provide will be widely beneficial to the community. 

      The data provide direct visualization of what had previously been a mostly hypothetical structure, the "paternal cloud," using simultaneous labeling of paternal DNA and mitochondria in combination with a variety of maternal proteins including maternal mitochondria, yolk granules, tubulin, and plasma membrane. Together, these images provided convincing evidence of the existence of this specified cytoplasmic domain. They go on to show that the knockdown of the ataxin-2 homolog ATX-2, a protein previously shown to affect ER dynamics, disrupted the paternal cloud, identifying a role for ER organization in this structure.

      The authors then used the system to test the functional consequences of perturbing the cytoplasmic organization. Consistent with the paternal cloud being a stable structure, it stayed intact during large movements the authors generated using previously published knockdowns (of mei-1/katanin and kinesin-13/klp-7) that increased cytoplasmic streaming. They used this data to document instances in which the paternal chromosomes were likely to have been attached to the spindle. They concluded with direct evidence of spindle fibers connecting to the paternal chromatin upon knockdown of ATX-2 in combination with increased cytoplasmic streaming, providing strong, direct support for their overarching hypothesis.

      Weaknesses 

      While the data is convincing, the narrative of the paper could be streamlined to highlight the novelty of the experiments and better articulate the aims. For example, the cloud of paternal mitochondria and membranous organelles was previously shown, but Figures 1-2 largely reiterate that observation. The innovation seems to be that the combination of ER, yolk, and maternal mitochondrial markers makes the existence of a specified domain more concrete. There are also some instances where more description is needed to make the conclusions from the images clear. 

      These general comments are addressed in the more specific critiques below.

      The manuscript intersperses what read like basic characterizations of fluorescent markers that, as written, can distract from the main story. The authors characterized the dynamics of ER organization throughout the substages of meiosis and the permeability of the envelope of ER that surrounds the paternal chromatin, but it could be more clearly established how the ability to visualize these structures allowed them to address their aims.

      We have added the following after the initial description of ER morphology changes: (ER morphology was used to determine cell-cycle stages during live imaging reported below in Fig. 6.)

      More background on what was previously known about ER organization in M-phase and the role of ataxin proteins specifically may help provide more continuity. 

      We have added references to transitions to ER sheets during mitotic M-phase in HeLa cells and Xenopus extracts.

      Reviewer #3 (Public Review): 

      Summary: 

      This study by Beath et al. investigated the mechanisms by which sperm DNA is excluded from the meiotic spindle after fertilization. Time-lapse imaging revealed that sperm DNA is surrounded by paternal mitochondria and maternal ER that is permeable to proteins. By increasing cytoplasmic streaming using kinesin-13 or katanin RNAi, the authors demonstrated that limiting cytoplasmic streaming in the embryo is an important step that prevents the capture of sperm DNA by the oocyte meiotic spindle. Further experiments showed that the Ataxin-2 protein is required to hold paternal mitochondria together and close to the sperm DNA. Finally, double depletion of kinesin-13 and Ataxin-2 suggested an increased risk of meiotic spindle capture of sperm DNA. 

      Overall, this is an interesting finding that could provide a new understanding of how meiotic spindle capture of sperm DNA and its accidental expulsion into the polar body is prevented. However, some conceptual gaps need to be addressed and further experiments and improved data analyses would strengthen the paper. 

      - It would be helpful if the authors could discuss in good detail how they think maternal ER surrounds the sperm DNA

      We have added 2 references to papers about nuclear envelope re-assembly from Shirin Bahmanyar’s lab and suggest the ER envelope is a halted intermediate in nuclear envelope reassembly.

      and why is it not disrupted following Ataxin disruption. 

      We have been attempting to disrupt ER structures in the meiotic embryo for the last 5 years by depleting profilin, BiP, atlastin, ATX-2 and by optogenetically packing ER into a ball in the middle of the oocyte.  None of these treatments prevent envelopment of the sperm DNA by maternal ER.  None of these treatments remove ER from the spindle envelope and none remove ER from the plasma membrane.  These treatments mostly result in “large aggregates” of ER that we have not examined by EM.  Wild speculation: any disruption of the ER strong enough to prevent ER envelopment around chromatin would be sterile because the M to S transition in the mitotic zone of the germline would be blocked.  Rapid depletion of ATX-2 to the extent shown by rigorous data in this manuscript does not prevent ER envelopment around chromatin.  We chose not to speculate about the reasons for this because we do not know why.

      - Since important phenotypes revealed in RNAi experiments (e.g. kinesin-13 and ataxin-2 double depletion) are not very robust, the authors should consider toning down their conclusions and revising some of their section headings. I appreciate that they are upfront about some limitations, but they do nonetheless make strong concluding sentences. 

      We have changed the discussion of the klp-7 atx-2 double depletion to: “The capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos suggests that the integrity of the exclusion zone around the sperm DNA might insulate the sperm DNA from spindle microtubules. However, a much larger number of klp-7(RNAi) singly depleted and atx-2(degron) singly depleted time-lapse sequences are needed to rigorously support this idea.”

      - The discussion section could be improved further to present the authors' findings in the larger context of current knowledge in the field. 

      We have expanded the discussion as suggested.

      - The authors previously demonstrated that F-actin prevents meiotic spindle capture of sperm DNA in this system. However, the current manuscript does not discuss how the katanin, kinesin-13 and Ataxin-2 mechanisms could work together with previously established functions of F-actin in this process. 

      We have added pfn-1(RNAi) to the discussion section.

      - How can the authors exclude off-target effects in their RNAi depletion experiments? Can kinesin-13, katanin, and Ataxin phenotypes be rescued for instance? 

      For ataxin-2 phenotypes, two completely independent controls for off-target effects are shown: GFP(RNAi) on a strain with an endogenous ATX-2::GFP tag vs GFP(RNAi) on a strain with no tag on ATX-2, and ATX-2::AID with or without auxin. For kinesin-13 and katanin, we did not do a rigorous control for off-target effects of RNAi. However, the effects of these depletions on cytoplasmic microtubules have been previously reported by others.

      - How are the authors able to determine if the paternal genome was actually captured by the spindle? Does lack of movement definitively suggest capture without using a spindle marker? 

      mKate::tubulin labels the spindle in each capture event. This can be seen in Video S3 for mei-1(RNAi) and Figure 9 for atx-2 klp-7 double depletions.

      (1) Major issues: 

      The images provided do not convincingly show that mitochondria are entirely excluded from the regions with yolk granules. Please provide insets of magnified images of the paternal mitochondria in Figure 1E to more clearly show the exclusion even when paternal mitochondria are streaming. Providing grayscale images, individual z-sections and/or some quantification of this data might also be more convincing to this reviewer.

      We have modified Fig. 1 by adding single wavelength magnified insets to more clearly show that paternal mitochondria are in a “black hole” in the maternal yolk granules during cytoplasmic streaming.

      Figure 2 - This figure can be retitled to highlight that the paternal organelle cloud is impermeable to mitochondria and conserved.

      The legend has been re-titled as suggested.

      Figure 3B - An image of the DNA within the ring of maternal ER would strengthen the authors' claims, especially since the maternal ER ring is used as a proxy for the paternal chromosomes in later figures.

      We have added a panel showing DAPI-stained DNA in the center of the ER ring and paternal mitochondria cloud. 

      Why is the faster time scale imaging significant? I think this could be more clearly set up in the paper. Perhaps rapid imaging of maternal mito-labeled kca-1(RNAi) embryos would better show the difference in time scale, with the expectation that the paternal cloud forms and persists while the ER invades. 

      We are not sure what the reviewer means.  5 sec time intervals were used throughout the paper.  We are also not sure how kca-1(RNAi) would help.  Movement of the entire oocyte into and out of the spermatheca is what limits the ability to keep a fusing sperm in focus.  kca-1(RNAi) would prevent cytoplasmic streaming but not ovulation movements.

      Figure 4 - The question about the permeability of the ER envelope seems to come out of nowhere as written. It isn't clear how it contributes to the larger story about preventing sperm incorporation in the spindle.

      This section of the results is introduced with: “If the maternal ER envelope around sperm DNA was sealed and impermeable during meiosis, this could both prevent the sperm DNA from inducing ectopic spindle assembly and prevent the sperm DNA from interacting with meiotic spindle microtubules.” 

      The data in Figure 4 would probably not be expected in this paper based on its title. Maybe the title needs something about ER dynamics, e.g. "ATX-2 but not an ER envelope isolates the paternal chromatin"?

      In Figure 5, it seems that RNAi of klp-7 and mei-1 had slightly different effects on short-axis displacement of the ER envelope (klp-7 affecting it more dramatically than mei-1) and slightly different effects on interaction with the meiotic spindle (capture vs streaming past the spindle). The authors mention in their discussion that the difference in the interaction with the meiotic spindle might reflect the effects that loss of MEI-1 may have on the spindle, but could it also be a consequence of the differences in cytoplasmic streaming observed?

      With our current data, the only statistically significant difference between cytoplasmic streaming of the sperm contents in mei-1(RNAi) vs klp-7(RNAi) is that excessive streaming persists longer into metaphase II in klp-7(RNAi).  We have added a sentence describing this difference to the results.  If differences in streaming were the cause of different capture frequencies, then klp-7(RNAi) would cause more capture events than mei-1(RNAi) but the opposite was observed.  We have avoided too much discussion here because the frequency of capture events is too low to demonstrate statistically significant differences between mei-1(RNAi), klp-7(RNAi), and atx-2(degron) + klp-7(RNAi) without a very large increase in the number of time-lapse sequences.  
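
      To give a rough sense of how many time-lapse sequences such a comparison might need, the following is a minimal sketch using the standard normal-approximation sample-size formula for comparing two proportions. The function name `n_per_group` and the example frequencies are illustrative assumptions, not values from the manuscript, and the approximation is coarse when events are rare.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate number of embryos needed per condition to distinguish
    two capture frequencies p1 and p2 with a two-sided test.

    Uses the textbook normal approximation for two independent proportions.
    All inputs are hypothetical; this is a planning sketch only.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical capture frequencies of 5% vs 15% (not the paper's data).
embryos_needed = n_per_group(0.05, 0.15)
```

      Under these assumed frequencies the estimate is on the order of 140 embryos per condition, and it grows quickly as the two frequencies approach each other.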

      Also, the authors should find a way to represent this interaction with the meiotic spindle in a quantitative or table form to allow the reader to observe some of the patterns they report more easily.

      We have added a table to Fig. 9 that summarizes capture data.

      Finally, can the authors report when they observe the closest association with the meiotic spindle: Does it correlate with the period of greatest displacement (AI) or are they unlinked? 

      The low frequency of capture events makes it difficult to test this rigorously.

      Figure 6 - 'Endogenously tagged ATX-2 was observed throughout oocytes and meiotic embryos without partial co-localization with ER.' How can the authors exclude co-localization with ER?

      We have changed the wording to: “Endogenously tagged ATX-2 was observed throughout oocytes and meiotic embryos (Fig. 6A; Fig. S2). ATX-2 did not uniquely co-localize with ER (Fig. S2).”

      The rationale for why the authors think that the integrity of sperm organelles is important to keep the genomes apart is not clear to this reviewer and needs to be explained better. Moving the discussion of the displacement experiments in Figure S3 from the end of the results section to the ATX-2 knockdown section would help accomplish this. 

      We have added the sentence: “The frequency of sperm capture by the meiotic spindle (Fig. 9D) was significantly higher than wild-type controls in klp-7(RNAi) atx-2(AID) double depleted embryos (p=0.011 Fisher’s exact test).   Although the number of single mutant embryos analyzed was too low to demonstrate a significant difference between single and double mutant embryos,  these results qualitatively support the hypothesis that limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria are both important for preventing capture events between the meiotic spindle and sperm DNA.”

      It looks like, in the double knockdown of ATX-2 and KLP-7, the spread of paternal mitochondria is less affected than when only ATX-2 is depleted. What effect does this result have on the observation that the incidence of sperm capture appears to increase in the double depletion? What does displacement of the ER ring look like in the double depletion? Is it additive, consistent with their interpretation that both limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria is required to keep the genomes separate? 

      We cannot show a significant difference between single and double knockdowns without substantially increasing n. We did not analyze ER ring displacement in the double mutant.

      Is the increased incidence of capture in the double-depleted embryos significant? 

      We have added the sentence: “The frequency of sperm capture by the meiotic spindle (Fig. 9D) was significantly higher than wild-type controls in klp-7(RNAi) atx-2(AID) double depleted embryos (p=0.011 Fisher’s exact test).   Although the number of single mutant embryos analyzed was too low to demonstrate a significant difference between single and double mutant embryos,  these results qualitatively support the hypothesis that limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria are both important for preventing capture events between the meiotic spindle and sperm DNA.”
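
      For readers outside the field, a two-sided Fisher's exact test on a 2x2 table of capture counts can be sketched in a few lines. This is a minimal pure-Python illustration that sums the probabilities of all tables no more likely than the observed one; the function name and the counts in the usage line are hypothetical and are not the data behind the reported p=0.011.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are conditions (e.g. control, double depletion); columns are
    outcomes (e.g. capture, no capture). Returns the p-value.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 0 captures in 20 controls vs 5 captures in 15 treated.
p_value = fisher_exact_two_sided(0, 20, 5, 10)
```

      With capture events this rare, an exact test is a natural choice over a chi-squared approximation, which is unreliable when expected cell counts are small.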

      What do the authors make of the cell cycle arrest observed when paternal chromosomes are captured? Is there an argument to be made that this arrest supports the idea that preventing this capture is actively regulated and therefore functionally important? 

      We chose not to discuss the mechanism of this arrest because considerably more work would be required to prove that it is not caused by a combination of imaging conditions and genotype.  The low frequency of these capture + arrest events would make it very difficult to show that the arrest does not occur after depleting a checkpoint protein.

      (2) Minor concerns: 

      Top of page 4: "streaming because depletion tubulin stops cytoplasmic streaming (7)" should be "streaming because depletion of tubulin stops cytoplasmic streaming (7)" 

      The “of” has been inserted.

      Page 6: "This result indicated that the volume of paternal mitochondria excludes maternal mitochondria and yolk granules but not maternal ER." The authors have only shown this for maternal mitochondria, not yolk granules. 

      We have deleted the mention of yolk granules here.

      Page 7: "These results suggest that all maternal membranes are initially excluded from the sperm at fusion." Should be "These results show that maternal ER are initially excluded from the sperm at fusion. Since maternal mitochondria and yolk granules are excluded later, this suggests that all maternal membranes are initially excluded from the sperm at fusion." 

      We have changed this sentence as suggested.

      It's not clear why the authors show other types of movement that might be quantified when cytoplasmic streaming is affected in Figure 5A and only quantify long-axis and short-axis displacement. 

      We have deleted the other types of movement from the schematic.  Although these parameters were quantified, we did not include this data in the results so it would be confusing for the reader to have them in the schematic.

      Bottom of page 7: Mention that the GFP::BAF-1 was maternally provided. 

      We have added “Maternally provided..”

      Missing an Arrow on Figure 1A 9:20. 

      We removed the text citation to an arrow in Fig. 1A because we moved most of the description of the ER ring to Fig. 3 to address other reviewer suggestions.

      Supplemental videos should be labeled appropriately to indicate what structures are labeled. It is currently difficult to understand what is being shown. 

      (3) Issues with the Discussion section: 

      "The simplest explanation is that cytoplasm does not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm." - Citation page 12. 

      We have changed the sentence to: “The simplest hypothesis is that maternal and paternal cytoplasm might not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm.” 

      "The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubule" - Pages 12-13 reference the figures. 

      This sentence has been rewritten in response to other comments but the new sentence now references revised Fig. 9.

      "ATX-2 is required to maintain the integrity of the ball of paternal mitochondria around the sperm DNA, but the mechanism is unknown." - Page 13 reference figure. 

      A reference to Figs 7 and 8 has been inserted.

      " In control embryos, the sperm contents rarely came near the meiotic spindle in agreement with a previous study that found that male and female pronuclei rarely form next to each other (6). Streaming of the sperm contents was most commonly restricted to a jostling motion with little net displacement, circular streaming in the short axis of the embryo, or long axis streaming in which the sperm turned away from the spindle before the halfway point of the embryo. Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)." - Page 13, the corresponding figures need to be referenced for these sentences. 

      We have inserted figure references.

      "In capture events observed after double depletion of ATX-2 and KLP-7, a bundle of microtubules was discernible extending from the spindle into the ER envelope surrounding the sperm DNA. Such bundles were not observed in mei-1(RNAi) capture events, likely because of the previously reported low density of microtubules in mei-1(RNAi) spindles (36, 37)." - Pages 13-14 references figures here. 

      We have inserted figure references.

      "The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubules." - This should be toned down since this phenotype is not robust. 

      We have changed this to: “The capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos suggests that the integrity of the exclusion zone around the sperm DNA might insulate the sperm DNA from spindle microtubules. However, a much larger number of klp-7(RNAi) singly depleted and atx-2(degron) singly depleted time-lapse sequences are needed to rigorously support this idea.”

      ATX-2 depletion alters ER morphology but does not impact the maternal ER envelope - could the authors provide a potential explanation for this? 

      In the discussion, we cite papers showing that ATX-2 depletion affects many different cellular processes so the effect we see on paternal mitochondria might have nothing to do with the ER ring.   We have been attempting to disrupt ER structures in the meiotic embryo for the last 5 years by depleting profilin, BiP, atlastin, ATX-2 and by optogenetically packing ER into a ball in the middle of the oocyte.  None of these treatments prevent envelopment of the sperm DNA by maternal ER.  None of these treatments remove ER from the spindle envelope and none remove ER from the plasma membrane.  These treatments mostly result in “large aggregates” of ER that we have not examined by EM.  Wild speculation: any disruption of the ER strong enough to prevent ER envelopment around chromatin would be sterile because the M to S transition in the mitotic zone of the germline would be blocked.  Rapid depletion of ATX-2 to the extent shown by rigorous data in this manuscript does not prevent ER envelopment around chromatin.  We chose not to speculate about the reasons for this because we do not know why.

      It would be good to have representative images of what the altered spindle looks like in MEI-1-depleted oocytes. 

      The structure of MEI-1-depleted spindles has been described in the cited references.

      "Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)" - It is intriguing that this does not happen in the double depletion experiments of kinesin-13 and ATX-2. The authors should perhaps discuss this. 

      This does happen in KLP-7 ATX-2 double depleted embryos as shown in Fig. 9.

      (4) Missing citations: 

      "This analysis was restricted to embryos from anaphase I through anaphase II because our streaming data and that of Kimura 2020 indicate that the sperm contents have not moved significantly before anaphase I." - This needs an appropriate citation. Page 10. 

      We have inserted citations here.

      " The simplest explanation is that cytoplasm does not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm." - Citation page 12. Not referencing figures in the discussion. 

      We have changed the sentence to: “The simplest hypothesis is that maternal and paternal cytoplasm might not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm.” 

      "The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubule" - Pages 12-13 reference the figures. 

      A reference to the revised Fig. 9 has been inserted in the revised version of this sentence.

      "ATX-2 is required to maintain the integrity of the ball of paternal mitochondria around the sperm DNA, but the mechanism is unknown." 

      References to Figs. 7 and 8 have been inserted.

      Page 13 reference figure 

      " In control embryos, the sperm contents rarely came near the meiotic spindle in agreement with a previous study that found that male and female pronuclei rarely form next to each other (6). Streaming of the sperm contents was most commonly restricted to a jostling motion with little net displacement, circular streaming in the short axis of the embryo, or long axis streaming in which the sperm turned away from the spindle before the halfway point of the embryo. Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)." Page 13, the corresponding figures need to be referenced for these sentences. 

      We have inserted citations here.

      "In capture events observed after double depletion of ATX-2 and KLP-7, a bundle of microtubules was discernible extending from the spindle into the ER envelope surrounding the sperm DNA. Such bundles were not observed in mei-1(RNAi) capture events, likely because of the previously reported low density of microtubules in mei-1(RNAi) spindles (36, 37)." Pages 13-14 references figures here. 

      We have inserted citations here.

      (5) Referencing wrong figures in the text: 

      Figure 5 - In the figure legend there is a 5C but there is no 5C panel in the figure. 

      A C has been inserted in Fig. 5.

      Figure 6A - "Dark holes were observed suggesting exclusion from the lumens of larger membranous organelles (Fig. 6A; Fig. S2)." Page 10. 

      6A has been changed to 6C.

      Figure 6A is showing background autofluorescence in WT oocytes so I am not certain why it is cited here. 

      The Figure citation has been corrected to 6B, C.

      Figure 8 - I could not find the supplemental data file with the individual mitochondria distance measurements. 

      We are including the Excel file with the revised submission.

      The last sentence of the first paragraph should be re-worded to be more concise ". In C. elegans, the nucleus is positioned away from the site of future fertilization so that the meiosis I spindle assembles at the opposite end of the ellipsoid zygote from the site of fertilization (2-4). " 

      Every word of this sentence is important.

      Last sentence second paragraph typo "These microtubules are thought to drive meiotic cytoplasmic streaming because depletion tubulin stops cytoplasmic streaming (7) and depletion of the microtubule-severing protein katanin by RNAi results in an increased mass of cortical microtubules and an increase in cytoplasmic streaming (8)." Pages 3-4. 

      “of” has been inserted.

      (6) Typos in the introduction should be corrected: 

      Ataxin or kinesin-13 are not mentioned in the introduction but these are a big focus of the paper. 

      Gong et al 2024 written instead of number citation (page 5), no citation in References.

      This has been corrected. 

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Summary:

      The authors used four datasets spanning 30 countries to examine funding success and research quality score for various disciplines. They examined whether funding or research quality score were influenced by majority gender of the discipline and whether these affected men, women, or both within each discipline. They found that disciplines dominated by women have lower funding success and research quality score than disciplines dominated by men. These findings are surprising because even the men in women-dominated fields experienced lower funding success and research quality score.

      Strengths:

      - The authors utilized a comprehensive dataset covering 30 countries to explore the influence of the majority gender in academic disciplines on funding success and research quality scores.

      - Findings suggest a systemic issue where disciplines with a higher proportion of women have lower evaluations and funding success for all researchers, regardless of gender.

      - The manuscript is notable for its large sample size and the diverse international scope, enhancing the generalizability of the results.

      - The work accounts for various factors including age, number of research outputs, and bibliometric measures, strengthening the validity of the findings.

      - The manuscript raises important questions about unconscious bias in research evaluation and funding decisions, as evidenced by lower scores in women-dominated fields even for researchers that are men.

      - The study provides a nuanced view of gender bias, showing that it is not limited to individuals but extends to entire disciplines, impacting the funding and the perceived quality or worth of research.

      - This work underscores the need to explore motivations behind gender distribution across fields, hinting at deep-rooted societal and institutional barriers.

      - The authors have opened a discussion on potential solutions to counter bias, like adjusting funding paylines or anonymizing applications, or other practical solutions.

      - While pointing out limitations such as the absence of data from major research-producing countries, the manuscript paves the way for future studies to examine whether its findings are universally applicable.

      Weaknesses:

      - The study does not provide data on the gender of grant reviewers or stakeholders, which could be critical for understanding potential unconscious bias in funding decisions. These data are likely not available; however, this could be discussed. Are grant reviewers in fields dominated by women more likely to be women?

      - There could be more exploration into whether the research quality score is influenced by inherent biases towards disciplines themselves, rather than only being gender bias.

      - The manuscript should discuss how non-binary gender identities were addressed in the research. There is an opportunity to understand the impact on this group.

      - A significant limitation is the absence of data from other major research-producing countries like China and the United States, raising questions about the generalizability of the findings. How comparable are the observed findings likely to be for these other countries?

      - The motivations and barriers that drive gender distribution in various fields could be expanded on. Are fields striving to reach gender parity through hiring or other mechanisms?

      - The authors could consider whether the size of funding awards correlates with research scores, as this may be an overlooked factor in the evaluation of research quality. Presumably there is less data on smaller 'pilot' funds and startup funds for disciplines where these are more common. Would funding success follow the same trend for these types of funds?

      - The language used in the manuscript at times may perpetuate bias, particularly when discussing "lower quality disciplines," which could influence the reader's perception of certain fields.

      - The manuscript does not clarify how many gender identities were represented in the datasets or how gender identity was determined, potentially conflating gender identity with biological sex.

      Reviewer #3 (Public Review):

      This study seeks to investigate one aspect of disparity in academia: how gender balance in a discipline is valued in terms of evaluated research quality score and funding success. This is important in understanding disparities within academia.

      This study uses publicly available data to investigate covariation between gender balance in an academic discipline and:

      i) Individual research quality scores of New Zealand academics as evaluated by one of 14 broader subject panels.

      ii) Funding success in Australia, Canada, Europe, UK.

      The study would benefit from further discussion of its limitations, and from the clarification of some technical points (as described in the recommendations for the authors).

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      This is a very nice study as-is. In the following comments, I have mainly put my thoughts as I was reading the manuscript. If there are practical ways to answer my questions, I think they could improve the manuscript but the data required for this may not be available.

      Are there any data on the gender of grant reviewers or stakeholders who make funding decisions?

      The research quality score metrics seem to be more related to unconscious bias. The funding metrics may also, but there are potentially simple fixes (higher paylines for women or remove gender identities from applications).

      We have included some details about PBRF funding panel gender diversity. These panels are usually more gender balanced than the field they represent, but in the extreme cases (Engineering, Education, Mathematics) they are skewed as would be expected. Data on the panels of other award decision makers were not available.

      I wonder if the research score metric isn't necessarily reflecting on the gender bias in the discipline but rather on the discipline itself? Terms like "hard science" and "soft science" are frequently used and may perpetuate these biases. This is somewhat supported by the data - on line 402-403 the authors state that women in male-dominated fields like Physics have the same expected score as a man. Could it be that Physics has a higher score than Education even if Physics was woman-dominated and Education was man-dominated? Are there any instances in the data where traditionally male- or female-dominated disciplines are outliers and happen to be the opposite? If so, in those cases, do the findings hold up?

      Overall we would love to answer this question! But our data is not enough. We mention these points in the Discussion (Lines 472-466). We have extended this a little to cover the questions raised here.

      How are those with non-binary gender identities handled in this article? If there is any data on the subject, I would be curious to know how this affects research score and funding success.

      These data were either unavailable or the sample size was too small to be considered anonymously (Mentioned on Lines 74-76).

      A limitation of the present article is a lack of data on major research-producing countries like China and the United States. Is there any data relevant to these or other countries? Is there reason to believe the findings outlined in this manuscript would apply or not apply to those countries also?

      We would be very excited to see if the findings held up in other countries, particularly any that were less European based. Unfortunately we could not find any data to include. Maybe one day!

      What are the motivations or other factors driving men to certain fields and women to certain fields over others? What are the active barriers preventing all fields from 50% gender parity?

      Field choice is a highly studied area and the explanations are myriad; we have included a few references in the discussion section on job choice. I usually recommend my students read the blog post at

      https://www.scientificamerican.com/blog/hot-planet/the-people-who-could-have-done-science-didnt/

      It is very thoughtful but unfortunately not appropriate to reference here.

      The authors find very interesting data on funding rates. Have you considered funding rates and the size of funding awards as a factor in research score? Some disciplines like biomedical science receive larger grants than others like education.

      A very interesting thought for our next piece of work. We would definitely like to explore our hypothesis further.

      There are instances where the authors' writing may perpetuate bias. If possible these should be avoided. One example is on line 458-459 where the authors state "...why these lower quality disciplines are more likely..." This could be re-written to emphasize that some disciplines are "perceived" as lower quality. Certainly those in these disciplines would not characterize their chosen discipline as "low quality".

      Well-spotted! Now corrected as you suggest.

      Similar to the preceding comment, the authors should use care with the term "gender". In the datasets used, how many gender identities were captured? How many gender identity options were given in the surveys or data intake forms? Could individuals in these datasets have been misgendered? Do the data truly represent gender identity or biological sex?

      We know that in the PBRF dataset gender was a binary choice and transgender individuals were able to choose which group they identified with. There was no non-binary option (in its defence, the latest dataset is from 2018, and NZ has only recently started updating official forms to be more inclusive) and individuals with gender not stated (a very small number) were excluded. ARC did mention that a small number of individuals were either non-binary or gender not stated; again, these are not included here for reasons of anonymity. This is now mentioned on Lines 74-76. The effects on this group are important and understudied, likely because, as here, the numbers are too small to be included meaningfully.

      Reviewer #3 (Recommendations For The Authors):

      Major revisions:

      Could you add line numbers to the Supplementary Materials for the next submission?

      Yes! Sorry for the omission.

      (1) In the main text L146 and Figure 1, it is not clear why the expected model output line is for a 50 year old male from University of Canterbury only, but the data points are from disciplines in all eight universities in New Zealand. I think it would be more clear and informative to report the trend lines that represent the data points. At the moment it is hard to visualise how the results apply to other age groups or universities.

      As age and institution are linear variables with no interactions, they are only a constant adjustment above or below this line, and the adjustment is small in comparison to the linear trend. Unfortunately, including them graphically does not aid understanding. We agree that including raw data with an adjusted trend line can be confusing, but after a lot of between-author discussion this was the most informative compromise we could find (many people like raw data, so we included it).

      (2) Does your logistic regression model consider sample size weighting in pmen? Weighting according to sample sizes needs to be considered in your model. At the moment it is unclear and suggests a proportion between 0 and 1 only is used, with no weighting according to sample size. If using R, you can use glm(cbind(nFem, nMalFem).

      Yes. All data points were weighted by group size exactly as you suggest. We have updated the text on Line 317 to make this clear.
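
      The group-size weighting discussed in this exchange can be pictured with the following minimal sketch. It is written in Python rather than the Matlab the authors report using, and it is not their code: the function name `fit_binomial_logit`, the single predictor, and the toy numbers are all assumptions made for illustration. Each group contributes a count of successes out of a total, so larger groups carry more weight in the fit, which is the behaviour of the `cbind(successes, failures)` response form in R's binomial `glm`.

```python
import math


def fit_binomial_logit(x, successes, totals, n_iter=25):
    """Fit logit(p) = b0 + b1 * x by Newton's method on grouped binomial data.

    Each group i has totals[i] trials and successes[i] successes, so its
    contribution to the likelihood scales with its size.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ki, ni in zip(x, successes, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)          # binomial variance weight for this group
            g0 += ki - ni * p               # gradient w.r.t. intercept
            g1 += (ki - ni * p) * xi        # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: x is the proportion of men in three groups,
# every group has the same 25% success rate but very different sizes.
x = [0.2, 0.5, 0.8]
totals = [40, 400, 80]
successes = [10, 100, 20]
b0, b1 = fit_binomial_logit(x, successes, totals)
print(b0, b1)
```

      With identical success rates in every group the fitted slope is essentially zero. Replacing one small group's count with an outlying value moves the fit far less than the same change in a large group would, which is the practical effect of the weighting.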

      (3) For PBRF, I think it is useful to outline the 14 assessment panels and the disciplines they consider. Did you include the assessment panel as an explanatory variable in your model too to investigate whether quality is assessed in the same manner between panels? If not, then suggest reasons for not doing so.

      We have now included more detail in main text on the gender split of the panels. They were not included as an explanatory variable. In theory there was some cross-referencing of panel scores to ensure consistency as part of the PBRF quality assurance guidelines.

      (4) There are several limitations which should be discussed more openly:

      Patterns only represent the countries studied, not necessarily academia worldwide.

      Mentioned on Line 485-487.

      Gender is described as a binary variable.

      Discussed on Line 74-76.

      The measure of research evaluation as a reflection of academic merit.

      This is acknowledged in the data limitations paragraph at the end of the Discussion.

      Minor revisions:

      (1) L186. Why do you analyse bibliometric differences between individuals from University of Canterbury only? It would be helpful to outline your reasons.

      Although bibliometric data is publicly available it is difficult to collect for a large number of individuals. You also need some private data to match bibliometrics with PBRF data which is anonymous. We were only able to do this for our own institution with considerable internal support.

      (2) How many data records did you have to exclude in L191 because they could not be linked? This is helpful to know how efficient the process was, should anyone else like to conduct similar studies.

      We matched over 80% of available records (384 individuals). We have mentioned this on Line 194.

      (3) Check grammar in the sentence beginning in L202.

      Thank you. Corrected.

      (4) Please provide a sample size gender breakdown for "University of Canterbury (UC) bibliometric data", as you do for the preceding section. A table format is helpful.

      Included on Line 194.

      (5) L377 I think this sentence needs revision.

      Thank you, we have reworked that paragraph.

      (6) L389-392 Is it possible evaluation panels can score women worse than men and that because more women are present in female-biassed disciplines, the research score in these are worse? Women scoring worse between fields, may be a result of some scaling to the mean score.

      No. This is not possible because women in male-dominated fields score higher.

      (7) L393 Could you discuss explanations for why men outperform women in research evaluation scores more when disciplines are female dominated?

      Unfortunately, we don’t have an explanation for this and can’t get one from our data. We hope it will be an interesting topic for future work.

      (8) Could the figures be improved by having the crosses, x and + scaled, for example, in thickness corresponding to sample size? Alternatively, some description of the sample size variation? Sorting the rows by order of pmen in Table E1 would also be helpful for the reader.

      As with the previous figure, we have tried many ways of presenting it (including this one). Unfortunately nothing helped.

      We have provided Table E1 as a spreadsheet to allow readers to do this themselves.

      (9) Please state in your methods section the software used to aid repeatability.

      This is now in Supplementary Materials (Matlab 2022b).

      (10) It is great to report your model findings into real terms for PBRF and ARC. Please can you extend this to CIHR and EIGE. i.e. describing how a gender skew increase of x associates with a y increase in funding success chance.

      We have added similar explanations for both these datasets comparing the advantage of being male with the advantage of working in a male dominated discipline.

      (11) I would apply care to using pronouns "his" and "her" in L322-L324 and avoid if at all possible, instead, replacing them with "men" and "women".

      We have updated the text to avoid these pronouns in most places.

      The article in general would benefit from a disclosure statement early on conceding that gender investigated here is only as a binary variable, discounting its spectrum.

      See Line 74-76.

      Please also report how gender balance is defined in the datasets as in the data summary in supplementary materials, within the main text.

      Our definition of gender balance (proportion of researchers who are men, pmen) is given on Line 103.

      (12) The data summary Table S1 could benefit from explaining the variables in the first column. It is currently unclear how granularity, size of dataset and quotas/pre-allocation? are defined.

      These lines have been removed as the information they contained is included elsewhere in the table with far better explanations!

      (13) There are only 4 data points for investigating covariation between gender balance and funding success in CIHR. This should be discussed as a limitation.

      The small size of the dataset is now mentioned on Line 348.

      (14) L455 "Research varies widely across disciplines" in terms of what?

      This sentence has been extended.

      (15) L456 Maybe I am missing something but I don't understand the relevance of "Physicists' search for the grand unified theory" to research quality.

      Removed.

      (16) Can you provide more discussion into the results of your bibliographic analysis and Figure 2? An explanation into the relationships seen in the figure at least would be helpful.

      Thank you, we have clarified the relationships seen in each of Figures 2A (Lines 226-235), 2B (Lines 236-252), and 2C (Lines 260-268).

      (17) It would be helpful to include in the discussion a few more sentences outlining:

      - Potential future research that would help disentangle mechanisms behind the trends you find.

      - How this research could be applied. Should there be some effort to standardise?

      We have added a short paragraph to the discussion about implications/applications, and future research (Lines 481-484).

      (18) The introduction could benefit from discussing and explaining their a priori hypotheses for how research from female-biassed disciplines may be evaluated differently.

      While not discussed in the introduction, possible explanations for why and how research in female dominated fields might be evaluated differently are explored in some detail in the Discussion.  We think once is enough, and towards the end is more effective than at the beginning.

      (19) L16 "Our work builds on others' findings that women's work is valued less, regardless of who performs that work." I find this confusing because in your model, there is a significant interaction effect between gender:pmen. This suggests that for female-biassed disciplines, there is even more of a devaluation for women, which I think your lines in figure 1 suggest.

      Correct, but men are still affected, so the sentence is correct. What is confusing is that the finding is counter to what we might expect.

    1. Overall Rating (⭐⭐⭐⭐☆)

      Impact (⭐⭐⭐⭐⭐): This paper compares the ability of three different species, two primates and one non-primate, to persist in behaviors and works to explain why there are such similarities in some actions but differences in others. It makes some interesting findings that may be relevant to brain cognition in humans and therefore has the potential to have high impact in the behavioral neuroscience field. In this study the authors utilized a well-known decision-making paradigm to study how decision making compares between primates and mice. This is important because rodent models are increasingly being used as replacements for primates in cognitive studies, particularly developmental studies, drug development, and injury models. This use of rodents can only be of value if their decision-making behaviors properly model those of the primates they are replacing. Neural network studies in primates would suggest that rodents would not be a total replacement for cognitive studies and this paper seems to corroborate that, at least when it comes to task persistence, with rodents switching tasks at a more rapid pace than the primates. It is not yet clear how to incorporate this information into cognitive studies using rodents but having this information is an important step in being able to

      Methods (⭐⭐⭐⭐☆): The authors use three different species, mice (Mus musculus (males and females). They were presented with a species appropriate k-armed bandit task where targets were placed for them to choose from to get a reward. Individuals had to choose from a known reward or to explore for a new reward. Switching between choices was analyzed and compared using standard ANOVA.

      Note: Please add IACUC and IRB protocol numbers to the methods sections.

      Results (⭐⭐⭐⭐⭐): The authors examined switching behavior and exploratory behaviors in each species and found that while all three species engaged in switching and exploring, the mice switched targets most often indicating a lack of task persistence. In other words, they seemed to explore their options more than the primates did. This remained even after controlling for trial times and task design. Overall the results were compelling and well documented. The statistical analysis was thorough. The figures were clear but, if space were not an object, I would recommend that the 4 across panels be reworked to be a 4 square with 2 top panels and 2 lower panels, all of which could be a bit larger for better viewing.

      Discussion (⭐⭐⭐⭐☆): The discussion is fairly thorough, although I think adding a discussion of the neural network models of task switching could add value. The neural networks are vastly different between rodents and primates and may also be a reason for the difference in task persistence seen in this study. A discussion of next steps on how to potentially incorporate this information into the analysis of rodent studies on cognitive abilities would be very helpful, seeing as we will only continue to increase the use of rodents in these types of studies, although that may take further experimentation.

      Overall, the study is excellent and should be published.

      Reviewer Information The reviewer is the Chair of Biotechnology at the Franklin Cummings Technical Institute. Her PhD is in neuroscience, but her work is as a protein biochemist working on inflammation, signal transduction, and cell-cell communication. She has worked in both industry and academia for over 20 years.

      Dr. Heather Duffy on ResearchHub: https://www.researchhub.com/user/1790894/

      ResearchHub Peer Reviewer Statement: This peer review has been uploaded from ResearchHub as part of a paid peer review initiative. ResearchHub aims to accelerate the pace of scientific research using novel incentive structures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Although the manuscript is well organized and written, it could be largely improved and therefore made more plausible and easier to read. See my point-by-point comments listed below:

      (1) The introduction section is a bit overloaded with some unnecessary information. For example, the authors discussed the relationship between neurotransmitters in the prefrontal and striatum and substance use/sustained attention. However, the results are related to neither the neurotransmitters nor the striatum. In addition, there is a contradictory description about neurotransmitters there, Nicotine/THC leads to increased neurotransmitters, and decreased neurotransmitters is related to poor sustained attention. Does that mean that the use of Nicotine/THC could increase sustained attention?

      Thanks for this insightful question. We understand your concern regarding the seemingly contradictory statements about neurotransmitters and sustained attention. Previous studies have shown that acute administration of nicotine can improve sustained attention (Lawrence et al., 2002; Potter and Newhouse, 2008; Valentine and Sofuoglu, 2018; Young et al., 2004). On the other hand, the acute effects of smoking cannabis on sustained attention are mixed and depend on factors such as dosage and individual differences (Crean et al., 2011). For instance, a previous study (Hart et al., 2001) found that performance on a tracking task, which requires sustained attention, improved significantly after smoking cannabis with a high dose of THC, albeit in experienced cannabis users. However, chronic substance use, including nicotine and cannabis, has been associated with impaired sustained attention (Chamberlain et al., 2012; Dougherty et al., 2013).

      To address your concerns and improve clarity and succinctness of the Introduction, we have removed the description of neurotransmitters from the Introduction. This revision should make the introduction more concise and focus on the direct relationships pertinent to our study.

      (2) It is a bit hard to follow the story for the readers because the Results section went straight into detail. For example, the authors directly introduced that they used the ICV from the Go trials to index sustained attention without basic knowledge about the task. Why use the ICV of Go trials instead of other trials (i.e., successful stop trials) as an index of sustained attention? I suggest presenting the subjects and task details about the data before the detailed behavioral results. The results section should include enough information to understand the presenting results for the readers, rather than forcing the reader to find the answer in the later Methods section.

      We appreciate your suggestion to provide more context about the task and ICV before diving into the detailed behavioural results.

      We used the ICV derived from the Go trials instead of Successful stop trials as an index of sustained attention, based on the nature of the stop-signal task and the specific data it generates. Previous studies have indicated that reaction time (RT) variability is a straightforward measure of sustained attention, with increasing variability thought to reflect poorer ability to sustain attention (Esterman and Rothlein, 2019). RT variability is defined as ICV, calculated as the standard deviation of Go RT divided by the mean Go RT across Go trials (O'Halloran et al., 2018). The stop signal task includes both Go trials and stop trials. During Go trials, participants are required to respond as quickly and accurately as possible to a Go signal, allowing for the recording of RT for calculating ICV. In contrast, stop trials are designed to measure inhibitory control, where successful response inhibition results in no RT or response recorded in the output. Therefore, Go trials are specifically used to assess sustained attention, while Stop trials primarily assess inhibitory control (Verbruggen et al., 2019).

      We acknowledge the importance of providing this contextual information within the Results section to enhance reader understanding. We have added this information before presenting the behavioural results on Page 6.

      Results

      (1) Behavioural changes over time

      Reaction time (RT) variability is a straightforward measure of sustained attention, with increasing variability thought to reflect poor sustained attention. RT variability is defined as intra-individual coefficient of variation (ICV), calculated as the standard deviation of Go RT divided by the mean Go RT across Go trials in the stop signal task. Lower ICV indicates better sustained attention.
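
      As a worked illustration of the ICV definition, the following minimal Python sketch computes it for two hypothetical participants. The function name `icv`, the use of the sample standard deviation (n - 1 denominator), and the reaction times in milliseconds are assumptions for illustration; the text does not state which standard deviation estimator was used.

```python
from statistics import mean, stdev


def icv(go_rts):
    """Intra-individual coefficient of variation: SD of Go RTs / mean Go RT."""
    if len(go_rts) < 2:
        raise ValueError("need at least two Go-trial reaction times")
    return stdev(go_rts) / mean(go_rts)  # sample SD is an assumption


# Hypothetical Go-trial reaction times in milliseconds.
steady = [400, 410, 395, 405, 390]    # consistent responder
erratic = [300, 520, 380, 610, 190]   # variable responder

print(icv(steady), icv(erratic))
```

      The steady responder yields the lower ICV, which under the definition above corresponds to better sustained attention.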

      (3) The same problem for section 2 in the Results. What are the predictive networks? Are the predictive networks the same as the networks constructed based on the correlation with ICV? My intuitive feeling is that they are the circular analyses here. The positive/negative/combined networks are calculated based on the correlation between the edges and ICV. Then the author used the network to predict the ICV again. The manipulation from the raw networks (I think they are based on PPI) to the predictive network, and the calculation of the predicted ICV are all missing. The direct exposure of the results to the readers without enough detailed knowledge made everything hard to digest.

      We thank the Reviewer for the insightful comment. We agree with the need for more clarity regarding the predictive networks and the CPM analysis before presenting results. CPM, a data-driven neuroscience approach, is applied to predict individual behaviour from brain functional connectivity (Rosenberg et al., 2016; Shen et al., 2017). The CPM analysis used the strength of the predictive network to predict the individual difference in traits and behaviours. CPM includes several steps: feature selection, feature summarization, model building, and assessment of prediction significance (see Fig. S1).

      During feature selection, we assessed whether connections between brain areas (i.e., edges) in a task-related functional connectivity matrix (derived from generalized psychophysiological interaction analysis) were positively or negatively correlated with ICV using a significance threshold of P < 0.01. These positively or negatively correlated connections are regarded as the positive or negative network, respectively. The network strength of the positive network (or negative network) was determined in each individual by summing the connection strength of each positively (or negatively) correlated edge. The combined network was determined by subtracting the strength of the negative network from the positive network. Next, CPM built a linear model between the network strength of the predictive network and ICV. This model was initially developed using the training set. The predictive networks were then applied to the test set, where network strength was calculated again, and the linear model was used to predict ICV using k-fold cross-validation. Following your advice, we have updated the Results section to include these details on Page 7.

      Results

      (2) Cross-sectional brain connectivity

      This study employed CPM, a data-driven neuroscience approach, to identify three predictive networks (positive, negative, and combined) that predict ICV from brain functional connectivity. CPM typically uses the strength of the predictive networks to predict individual differences in traits and behaviors. The predictive networks were obtained based on connectivity analyses of the whole brain. Specifically, we assessed whether connections between brain areas (i.e., edges) in a task-related functional connectivity matrix derived from generalized psychophysiological interaction analysis were positively or negatively correlated with ICV using a significance threshold of P < 0.01. These positively or negatively correlated connections were regarded as the positive or negative network, respectively. The network strength of positive networks (or negative networks) was determined for each individual by summing the connection strength of each positively (or negatively) correlated edge. The combined network was determined by subtracting the strength of the negative network from the positive network. We then built a linear model between network strength and ICV in the training set, computed network strength for the same predictive networks in the test set, and applied the linear model to obtain predicted ICV, using k-fold cross-validation.
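
      The CPM steps described above (feature selection, feature summarization, model building, prediction) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' pipeline: edges are selected with a threshold on the correlation coefficient rather than the P < 0.01 criterion, a single train/test split stands in for k-fold cross-validation, and all function names and toy values are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def select_edges(edges, behaviour, r_threshold):
    """Feature selection: split edges into positive and negative networks.

    edges is a subjects x edges list of lists. Thresholding on r instead of
    a P value is a simplification made for this sketch.
    """
    pos, neg = [], []
    for j in range(len(edges[0])):
        r = pearson([subj[j] for subj in edges], behaviour)
        if r > r_threshold:
            pos.append(j)
        elif r < -r_threshold:
            neg.append(j)
    return pos, neg


def combined_strength(subject_edges, pos, neg):
    """Feature summarization: positive network strength minus negative."""
    return sum(subject_edges[j] for j in pos) - sum(subject_edges[j] for j in neg)


def fit_line(x, y):
    """Model building: ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


# Hypothetical training set: 5 subjects, 3 edges, and their ICV values.
train_edges = [
    [1.0, 5.0, 2.0],
    [2.0, 4.0, 1.0],
    [3.0, 3.0, 2.0],
    [4.0, 2.0, 1.0],
    [5.0, 1.0, 2.0],
]
train_icv = [0.10, 0.15, 0.20, 0.25, 0.30]

pos, neg = select_edges(train_edges, train_icv, r_threshold=0.8)
strengths = [combined_strength(s, pos, neg) for s in train_edges]
slope, intercept = fit_line(strengths, train_icv)

# Apply the same networks and the same linear model to an unseen subject.
test_subject = [3.5, 2.5, 9.0]
predicted_icv = slope * combined_strength(test_subject, pos, neg) + intercept
print(pos, neg, predicted_icv)
```

      The key design point is that edge selection and model fitting use only training subjects, and the held-out subject is scored with the networks and coefficients learned there. Repeating this over folds gives the cross-validated predictions.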

      (4) The authors showed the positive/negative/combined networks from both Go trials and successful stop trials can predict the ICV. I am wondering how the author could validate the specificity of the prediction of these positive/negative/combined networks. For example, how about the networks from the failed stop trials?

      We appreciate the opportunity to clarify the specificity of the predictive networks identified in our study. Here is a more detailed explanation of our findings and their implications.

      To validate the specificity of the sustained attention network identified from CPM analysis, we calculated correlations between the network strength of positive and negative networks and performances from a neuropsychology battery (CANTAB) at each timepoint separately. CANTAB includes several tasks that measure various cognitive functions, such as sustained attention, inhibitory control, impulsivity, and working memory. We found that all positive and negative networks derived from Go and Successful stop trials significantly correlated with a behavioural assay of sustained attention – the rapid visual information processing (RVP) task – at ages 14 and 19 (all P values < 0.028). Age 23 had no RVP task data in the IMAGEN study. There were sporadic significant correlations between constructs such as delay aversion/impulsivity and negative network strength, for example, but the correlations with the RVP were always significant. This demonstrates that the strength of the sustained attention brain network was specifically and robustly correlated with a typical sustained attention task, rather than other cognitive measures. The results are described in the main text on Page 8 and shown in Supplementary materials (Pages 1 and 3) and Table S12.

      In addition, we conducted a CPM analysis to predict ICV using gPPI under Failed stop trials. Our findings showed that positive, negative, and combined networks derived from Failed stop trials significantly predicted ICV: at age 14 (r = 0.10, P = 0.033; r = 0.19, P < 0.001; and r = 0.17, P < 0.001, respectively), at age 19 (r = 0.21; r = 0.18; and r = 0.21, all P < 0.001, respectively), and at age 23 (r = 0.33, r = 0.35, and r = 0.36, respectively, all P < 0.001). Similar results were obtained using a 5-fold CV and leave-site-out CV.

      Our analysis further showed that task-related functional connectivity derived from Go trials, Successful Stop trials, and Failed Stop trials could predict sustained attention across three timepoints. However, the predictive performances of networks derived from Go trials were higher than those from Successful Stop and Failed Stop trials. This suggests that sustained attention is particularly crucial during Go trials when participants need to respond to the Go signal. In contrast, although Successful Stop and Failed Stop trials also require sustained attention, these tasks primarily involve inhibitory control along with sustained attention.

      Taken together, these findings underscore the specificity of the predictive networks of sustained attention. We have updated these results in the Supplementary Materials (Pages 3-5 and Page 7):

      Method

      CPM analysis using Failed stop trials

      We performed another CPM analysis on Failed stop trials using the gPPI matrix obtained from the second GLM, described in the main text. The CPM analysis was conducted using 10-fold CV, 5-fold CV and leave-site-out CV.

      Results

      CPM predictive performance under Failed stop trials

      Positive, negative, and combined networks derived from Failed stop trials significantly predicted ICV: at age 14 (r = 0.10, P = 0.033; r = 0.19, P < 0.001; and r = 0.17, P < 0.001, respectively), at age 19 (r = 0.21; r = 0.18; and r = 0.21, all P < 0.001, respectively), and at age 23 (r = 0.33, r = 0.35, and r = 0.36, respectively, all P < 0.001). We obtained similar results using a 5-fold CV and leave-site-out CV (Table S6).

      Discussion

      Specificity of the prediction of predictive networks

      We found that task-related functional connectivity derived from Go trials, Successful stop trials, and Failed stop trials successfully predicted sustained attention across three timepoints. However, predictive performances of predictive networks derived from Go trials were higher than those derived from Successful stop trials and Failed stop trials. These results suggest that sustained attention is particularly crucial during Go trials when participants need to respond to the Go signal. In contrast, although Successful Stop and Failed Stop trials also require sustained attention, these tasks primarily involve inhibitory control along with sustained attention.

      (5) The author used PPI to define the connectivity of the network. I am not sure why the author used two GLMs for the PPI analysis separately. In the second GLM, Go trials were treated as an implicit baseline. What does this exactly mean? And the gPPI analysis across the entire brain using the Shen atlas is not clear. Normally, as I understand, the PPI/gPPI is conducted to test the task-modulated connectivity between one seed region and the voxels of the whole rest brain. Did the author perform the PPI for each ROI from Shen atlas? More details about how to use PPI to construct the network are required.

      Thank you for your insightful questions. Here, we’d like to clarify how we applied generalized PPI across the whole brain using the Shen atlas and why we used two separate GLMs for the gPPI analysis.

      Yes, PPI is conducted to test the task-modulated connectivity between one seed region and other brain areas. This method can be either voxel-based or ROI-based. In our study, we performed an ROI-based gPPI analysis using the Shen atlas with 268 regions. Specifically, we performed the PPI on each seed region of interest (ROI) to estimate the task-related FC between this ROI and the remaining 267 ROIs under a specific task condition. By performing this analysis for each ROI in the Shen atlas, we generated a 268 × 268 gPPI matrix for each task condition. Each matrix was then transposed and averaged with the original matrix to yield a symmetrical matrix, which was subsequently used for the CPM analysis.
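
      The symmetrization step described above (averaging a matrix with its transpose) can be sketched as follows. This is a minimal illustration in plain Python that assumes a toy 3 × 3 matrix rather than the 268 × 268 matrices used in the study; the function name `symmetrize` and the example values are hypothetical.

```python
def symmetrize(matrix):
    """Average a square matrix with its transpose so that out[i][j] == out[j][i]."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return [[(matrix[i][j] + matrix[j][i]) / 2.0 for j in range(n)] for i in range(n)]


# Toy directed gPPI estimates: seed ROI in rows, target ROI in columns.
gppi = [
    [0.0, 0.2, 0.4],
    [0.6, 0.0, 0.1],
    [0.0, 0.3, 0.0],
]
sym = symmetrize(gppi)
```

      Averaging is needed because a PPI estimate with region A as seed and region B as target is generally not equal to the estimate with the roles reversed.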

      Regarding the use of two separate GLMs for the gPPI analysis, our study aimed to define the task-related FC under two conditions: Go trials and Successful stop trials. The first GLM including Go trials was built to estimate the gPPI during Go trials. However, due to the high frequency of Go trials in the stop signal task, it is common to regard the Go trials as an implicit baseline, as in previous IMAGEN studies (D'Alberto et al., 2018; Whelan et al., 2012). Therefore, to achieve a more accurate estimation of FC during Successful stop trials, we built a second GLM specifically for these trials. Accordingly, we have updated it in the Method Section in the main text on Page 16.

      Method

      2.5 Generalized psychophysiological interaction (gPPI) analysis

      In this study, we adopted gPPI analysis to generate task-related FC matrices and applied CPM analysis to investigate predictive brain networks from adolescence to young adulthood. PPI analysis describes task-dependent FC between brain regions, traditionally examining connectivity between a seed region of interest (ROI) and the voxels of the rest of the brain. However, this study conducted a generalized PPI analysis on an ROI-to-ROI basis (Di et al., 2021) to yield a gPPI matrix across the whole brain instead of just a single seed region.

      Given the high frequency of Go trials in the SST, Go trials are commonly treated as an implicit baseline, as in previous IMAGEN studies (D'Alberto et al., 2018; Whelan et al., 2012). Hence, we built a separate GLM for Successful stop trials, which included two task regressors (Failed and Successful stop trials) and 36 nuisance regressors.

      (6) Why did the author use PPI to construct the network, rather than the other similar methods, for example, beta series correlation (BSC)?

      Thanks for your question. PPI is an approach used to calculate the functional connectivity (FC) under a specific task (i.e., task-related FC). Although most brain connectomic research has utilized resting-state FC (e.g., beta series correlation), FC during task performance has demonstrated superiority in predicting individual behaviours and traits, due to its potential to capture more behaviourally relevant information (Dhamala et al., 2022; Greene et al., 2018; Yoo et al., 2018). Specifically, Zhao et al. (2023) suggested that task-related FC outperforms both typical task-based and resting-state FC in predicting individual differences. Therefore, we chose to use task-related FC to predict sustained attention over time. We have updated it in the Introduction on Page 5.

      Introduction

      Although most brain connectomic research has utilized resting-state fMRI data, functional connectivity (FC) during task performance has demonstrated superiority in predicting individual behaviours and traits, due to its potential to capture more behaviourally relevant information (Dhamala et al., 2022; Greene et al., 2018; Yoo et al., 2018). Specifically, Zhao et al. (2023) suggested that task-related FC outperforms both typical task-based and resting-state FC in predicting individual differences. Hence, we applied task-related FC to predict sustained attention over time.

      (7) In the section of 'Correlation analysis between the network strength and substance use', the author just described that 'the correlations between xx and xx are shown in Fig5X', and repeated it three times for three correlation results. What exactly are the results? The author should describe the results in detail. And I am wondering whether there are scatter plots for these correlation analyses?

      We’d like to clarify the results in Fig. 5. Fig. 5 illustrates the significant correlations, after FDR correction, between cigarette and cannabis use (Cig+CB) and both behaviour and brain activity associated with sustained attention. Panel A shows the significant correlations between the behavioural level of sustained attention and Cig+CB. Panels B and C show the correlations between brain activity associated with sustained attention and Cig+CB: Panel B presents brain activity derived from Go trials, and Panel C presents brain activity derived from Successful stop trials. In response to your suggestion, we have described these results in detail on Page 9. We have also included scatter plots for the significant correlations shown in Fig. 5 in the Supplementary Materials (Fig. S10).

      Results

      (6) Correlation between behaviour and brain to cannabis and cigarette use

      Figs. 5A-C summarize the results showing the correlation between ICV/brain activity and Cig+CB per timepoint and across timepoints. Fig. 5A shows correlations between ICV and Cig+CB (Tables S14-15). ICV was correlated with Cig+CB at ages 19 (Rho = 0.13, P < 0.001) and 23 (Rho = 0.17, P < 0.001). ICV at ages 14 (Rho = 0.13, P = 0.007) and 19 (Rho = 0.13, P = 0.0003) was correlated with Cig+CB at age 23. Cig+CB at age 19 was correlated with ICV at age 23 (Rho = 0.13, P = 9.38E-05). Fig. 5B shows correlations between brain activity derived from Go trials and Cig+CB (Tables S18-19). Brain activities of positive and negative networks derived from Go trials were correlated with Cig+CB at age 23 (positive network: Rhop = 0.12, P < 0.001; negative network: Rhon = -0.11, P < 0.001). Brain activity of the negative network derived from Go trials at age 14 was correlated with Cig+CB at age 23 (Rhon = -0.16, P = 0.001). Cig+CB at age 19 was correlated with brain activity of the positive network derived from Go trials at age 23 (Rhop = 0.10, P = 0.002). Fig. 5C shows the correlations between brain activity derived from Successful stop trials and Cig+CB (Tables S18-19). Brain activities of positive and negative networks derived from Successful stop trials were correlated with Cig+CB at ages 19 (positive network: Rhop = 0.10, P = 0.001; negative network: Rhon = -0.08, P = 0.013) and 23 (positive network: Rhop = 0.13, P < 0.001; negative network: Rhon = -0.11, P = 0.001).
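
      The correlations reported above were retained after FDR correction. The text does not state which FDR procedure was used, so the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name and the example P values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which tests survive Benjamini-Hochberg FDR control."""
    m = len(p_values)
    # Rank the tests from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose P value is at or below alpha * k / m.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            largest_rank = rank
    # Every test ranked at or below that k is declared significant.
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            significant[index] = True
    return significant


# Hypothetical uncorrected P values for four correlations.
example_p = [0.001, 0.013, 0.2, 0.04]
survives = benjamini_hochberg(example_p)
```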

      (8) Lastly, the labels of (A), (B) ... in the figure captions are unclear. The authors should find a better way to place the labels in the caption and keep them consistent throughout all figures.

      Thank you for this valuable comment. We have revised the figure captions in the main text to ensure the labels (A), (B), etc., are placed more clearly and consistently across all figures.

      Reviewer #2 (Public Review):

      While the study largely achieves its aims, several points merit further clarification:

      (1) Regarding connectome-based predictive modeling, an assumption is that connections associated with sustained attention remain consistent across age groups. However, this assumption might be challenged by observed differences in the sustained attention network profile (i.e., connections and related connection strength) across age groups (Figures 2 G-I, Fig. 3 G_I). It's unclear how such differences might impact the prediction results.

      Thank you for your insightful comment. We’d like to clarify that we did not assume that connections associated with sustained attention remain completely consistent across age groups. Indeed, we expected that connections would change across age groups, due to the developmental changes in brain function and structure from adolescence to adulthood. Our focus was on the consistency of individual differences in sustained attention networks over time, recognising that the actual connections within those networks may change. However, we did show that there is some consistency in the specific connections associated with sustained attention over time. Notably, this consistency markedly increases when comparing ages 19 and 23, when developmental factors are less relevant. We support our reasoning above with the following analyses:

      (1) Supplementary materials (Pages 2 and 5), relevant sections highlighted here for emphasis.

      Method

      Comparison of predictive networks identified at one timepoint versus another

      Steiger’s Z test was employed to compare the predictive performance of networks identified at different timepoints. This analysis involved comparing the r values derived from networks defined at distinct ages to predict ICV at the same age. For example, we compared the r values of brain networks defined at age 14 when predicting ICV at 19 (i.e., positive network: r = 0.25, negative network: r = 0.25, combined network: r = 0.28) with the r values of brain networks defined at age 19 itself (i.e., positive network: r = 0.16, negative network: r = 0.14, combined network: r = 0.16) derived from Go trials using Steiger's Z test (age 14 → age 19 vs. age 19 → age 19). Similarly, comparisons were made between networks defined at age 14 predicting ICV at age 23 and those at age 23 predicting ICV at age 23 (age 14 → age 23 vs. age 23 → age 23), as well as between networks defined at age 19 predicting ICV at age 23 and those at age 23 predicting ICV at age 23 (age 19 → age 23 vs. age 23 → age 23). These comparisons were performed separately for Go trials and Successful stop trials.
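
      A comparison of this kind can be sketched with Steiger's (1980) Z for two dependent correlations that share one variable (here, the observed ICV). This is an illustrative sketch only: the text does not report the correlation between the two sets of predictions or the exact variant of the test, so the function signature and all example values below are assumptions.

```python
import math
from statistics import NormalDist


def steiger_z(r_xy, r_xz, r_yz, n):
    """Steiger's (1980) Z for two dependent correlations sharing variable x.

    r_xy, r_xz: the two correlations being compared (both involve x).
    r_yz: correlation between the two non-shared variables y and z.
    n: number of participants.
    Returns (Z, two-sided P value).
    """
    z_xy = math.atanh(r_xy)  # Fisher r-to-z transform
    z_xz = math.atanh(r_xz)
    r_bar_sq = ((r_xy + r_xz) / 2.0) ** 2
    # Estimated covariance between the two transformed correlations.
    psi = r_yz * (1 - 2 * r_bar_sq) - 0.5 * r_bar_sq * (1 - 2 * r_bar_sq - r_yz ** 2)
    cov = psi / (1 - r_bar_sq) ** 2
    z = (z_xy - z_xz) * math.sqrt(n - 3) / math.sqrt(2 - 2 * cov)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical inputs: x = observed ICV, y and z = predictions from two networks.
z_value, p_value = steiger_z(r_xy=0.30, r_xz=0.15, r_yz=0.50, n=1000)
```

      A positive Z indicates that the first correlation is the larger of the two; swapping the first two arguments flips the sign.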

      Results

      Comparison of predictive performance at different timepoints

      For positive, negative, and combined networks predicting ICV derived from Go trials at age 19, the R values were higher when using predictive networks defined at 19 than those defined at 14 (Z = 3.79, Z = 3.39, Z = 3.99, all P < 0.00071). Similarly, the R values for positive, negative, and combined networks predicting ICV derived from Go trials at age 23 were higher when using predictive networks defined at age 23 compared to those defined at ages 14 (Z = 6.00, Z = 5.96, Z = 6.67, all P < 3.47e-9) or 19 (Z = 2.80, Z = 2.36, Z = 2.57, all P < 0.005).

      At age 19, the R value for the positive network predicting ICV derived from Successful stop trials was higher when using predictive networks defined at 19 compared to those defined at 14 (Z = 1.54, P = 0.022), while the negative and combined networks did not show a significant difference (Z = 0.85, P = 0.398; Z = 2.29, P = 0.123). At age 23, R values for the positive and combined networks predicting ICV derived from Successful stop trials were higher when using predictive networks defined at 23 compared to those defined at 14 (Z = 3.00, Z = 2.48, all P < 3.47e-9) or 19 (Z = 2.52, Z = 1.99, all P < 0.005). However, the R value for the negative network at age 23 did not significantly differ when using predictive networks defined at 14 (Z = 1.80, P = 0.072) or 19 (Z = 1.48, P = 0.138).

      These results indicate that some specific pairwise connections associated with sustained attention at earlier ages, such as 14 and 19, are still relevant as individuals grow older. However, some connections are not optimal for good sustained attention at older ages. That is, the brain reorganizes its connection patterns to maintain optimal functionality for sustained attention as it matures.

      (2) Consistency of Individual Differences:

      We found that individual differences in ICV were significantly correlated between the three timepoints (Fig. 1B). In addition, we calculated the correlations between timepoints of the network strength of the networks predicting sustained attention derived from Go trials and Successful stop trials. These correlations were also significant (all P < 0.003). We have updated these results in the main text (Pages 7-8) and Supplementary Materials (Table S7).

      (2) Cross-sectional brain connectivity

      In addition, we found that network strength of positive, negative, and combined networks derived from Go trials was significantly correlated between the three timepoints (Table S7, all P < 0.003).

      In addition, we found that network strength of positive, negative, and combined networks derived from Successful stop trials was significantly correlated between the three timepoints (Table S7, all P < 0.001).

      (3) Predictive networks across timepoints: Predictive networks defined at age 14 were successfully applied to predict ICV at ages 19 and 23. Similarly, predictive networks defined at age 19 were successfully applied to predict ICV at age 23 (Fig. 4). These results reflect the robustness of the brain network associated with sustained attention over time.

      (4) Dice coefficient analysis: We calculated the Dice coefficient to quantify the similarity of predictive networks across the three timepoints. Connections in the sustained attention networks were significantly similar from ages 14 to 23 (Table S13), despite relatively few overlapping edges over time (as discussed in Supplementary Materials on Page 6).
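
      The Dice coefficient mentioned above can be sketched as follows, treating each predictive network as a set of edges, with each edge written as a pair of ROI indices. This is a minimal illustration; the function name and the edge lists are hypothetical, and the significance testing of the coefficient is not shown.

```python
def dice_coefficient(edges_a, edges_b):
    """Dice similarity between two edge sets: 2 * |A and B| / (|A| + |B|)."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty networks
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical edges (ROI index pairs, smaller index first) at two timepoints.
edges_age14 = [(0, 1), (0, 2), (3, 5)]
edges_age23 = [(0, 1), (3, 5), (4, 6), (7, 8)]
dice = dice_coefficient(edges_age14, edges_age23)
```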

      (5) Global brain activation: Based on these findings, we indicate that sustained attention relies on global brain activation (i.e., network strength) rather than specific regions or networks (see also (Zhao et al., 2021)).

      In summary, brain network connections undergo change and are not completely consistent across time. However, individual differences in sustained attention and its network are consistent across time, as we found that: 1) the brain reorganizes its connection patterns to maintain optimal functionality for sustained attention as it matures; 2) ICV and the network strength of the sustained attention network were significantly correlated between timepoints; 3) sustained attention networks identified at earlier timepoints could predict ICV at subsequent timepoints; 4) Dice coefficient analysis indicated that the edges in the sustained attention networks were significantly similar from ages 14 to 23; and 5) sustained attention networks function through global activation rather than through specific regions or networks.

      (2) Another assumption of the connectome-based predictive modeling is that the relationship between sustained attention network and substance use is linear and remains linear over development. Such linear evidence from either the literature or their data would be of help.

      Thanks for your valuable suggestion. We'd like to clarify that while CPM assumes a linear relationship between brain and behaviour (Shen et al., 2017), it does not assume that the relationship between the sustained attention network and substance use remains linear over development.

      Our approach in applying CPM to predict sustained attention across different timepoints was based on previous neuroimaging studies (Rosenberg et al., 2016; Rosenberg et al., 2020), which indicated linear associations between brain connectivity patterns and sustained attention using CPM analysis. These findings support the notion of a linear relationship between brain connectivity and sustained attention. In this study, we performed CPM analysis to identify networks predicting sustained attention, not substance use, and used the network strength of these predictive networks to represent brain activity associated with sustained attention.

      To examine the relationship between substance use and sustained attention, as well as its associated brain activity, we conducted correlation analyses and utilized a latent change score model instead of CPM analysis. This decision was informed by cross-sectional studies (Broyd et al., 2016; Lisdahl and Price, 2012) that consistently reported linear associations between substance use and impairments in sustained attention. Additionally, longitudinal research by Harakeh et al. (2012) indicated a linear relationship between poorer sustained attention and the initiation and escalation of substance use over time.

      Given these previous findings, we assumed a linear relationship between sustained attention and substance use. Our analyses included calculating correlations between substance use and sustained attention, as well as its associated brain activity, at each timepoint and across timepoints (Fig. 5). Furthermore, we employed a three-wave bivariate latent change score model, a longitudinal approach, to assess the relationship between substance use and the behaviour and brain activity associated with sustained attention (Figs. 6-7). We have added more information to the Introduction on Page 6 to make this clearer.

      Introduction

      Additionally, previous cross-sectional and longitudinal studies (Broyd et al., 2016; Harakeh et al., 2012; Lisdahl and Price, 2012) have shown that there are linear relationships between substance use and sustained attention over time. We therefore employed correlation analyses and a latent change score model to estimate the relationship between substance use and both behaviours and brain activity associated with sustained attention.

      (3) Heterogeneity in results suggests individual variability that is not fully captured by group-level analyses. For instance, Figure 1A shows decreasing ICV (better-sustained attention) with age on the group level, while there are both increasing and decreasing patterns on the individual level via visual inspection. Figure 7 demonstrates another example in which the group with a high level of sustained attention has a lower risk of substance use at a later age compared to that in the group with a low level of sustained attention. However, there are individuals in the high sustained attention group who have substance use scores as high as those in the low sustained attention group. This is important to take into consideration and could be a potential future direction for research.

      Thanks for this valuable comment. We appreciate your observation regarding the individual variability that is not fully captured by group-level analyses to some degree. Fig. 1A shows the results from a linear mixed model, which explains group-level changes over time while accounting for the random effect within subjects. Similarly, Fig. 7 shows the group-level association between substance use and sustained attention. We agree that future research could indeed consider individual variability. For example, participants could be categorized based on their consistent trajectories of ICV or substance use (i.e., keep decreasing/increasing) over multiple timepoints. We agree that incorporating individual-level analyses in the future could provide valuable insights and are grateful for your suggestion, which will inform our future research directions.

      The above-mentioned points might partly explain the significant but low correlations between the observed and predicted ICV as shown in Figure 4. Addressing these limitations would help enhance the study's conclusions and guide future research efforts.

      We have updated the text in the Discussion on Page 13:

      Discussion

      However, some individual variability is not captured in this study, which could be attributed to the diversity in genetic, environmental, and developmental factors influencing sustained attention and substance use. Future research should aim to explore this variability in greater depth to gain a better understanding of the relationship between sustained attention and substance use.

      Reviewer #3 (Public Review):

      Weaknesses: It's questionable whether the prediction approach (i.e., CPM), even when combined with longitudinal data, can establish causality. I recommend removing the term 'consequence' in the abstract and replacing it with 'predict'. Additionally, the paper could benefit from enhanced rigor through additional analyses, such as testing various thresholds and conducting lagged effect analyses with covariate regression.

      Thank you for your comment. We have replaced “consequence” with “predict” in the abstract.

      Abstract

      Previous studies were predominantly cross-sectional or under-powered and could not indicate if impairment in sustained attention was a predictor of substance-use or a marker of the inclination to engage in such behaviour.

      Reviewer #3 (Recommendations For The Authors):

      (1) The connectivity analysis predicts both baseline and longitudinal attention measures. However, given the high correlation in attention abilities across the three time-points, it's unclear whether the connectivity predicts shared variations of attention across three time points. It would be insightful to assess if predictions at the 2nd and 3rd-time points remained significant after controlling for attention abilities at the initial time point.

      Thanks for your comments. We performed the CPM analysis to predict ICV at the 2nd and 3rd timepoint, controlling for ICV at age 14 as a covariate. We found that controlling for ICV at age 14, positive, negative, and combined networks derived from Successful stop trials defined at age 14 still predicted ICV at ages 19 and 23. In addition, positive, negative, and combined networks derived from Successful stop trials defined at age 19 predicted ICV at age 23. In addition, positive, negative, and combined networks derived from Go trials defined at age 19 still predicted ICV at age 23, after controlling for ICV at age 14. However, positive, negative, and combined networks derived from Go trials defined at age 14 had lower predictive performances in predicting ICV at ages 19 and 23, after controlling for ICV at age 14. Notably, controlling for ICV at the initial timepoint did not significantly impact the performances of predictive networks derived from Successful stop trials. Accordingly, we have added this analysis and the results in the Supplementary Materials (Pages 3 and 5).

      Method

      Prediction across timepoints controlling for ICV at age 14

      To examine whether connectivity predicts variation in sustained attention that is shared across timepoints, we applied predictive models developed at ages 14 and 19 to predict ICV at subsequent timepoints, controlling for ICV at age 14. Specifically, we used predictive models (including parameters and selected edges) developed at age 14 to predict ICV at ages 19 and 23 separately. First, we calculated the network strength using the gPPI matrix at ages 19 and 23 based on the selected edges identified from CPM analysis at age 14. We then estimated the predicted ICV at ages 19 and 23 by applying the linear model parameters (slope and intercept) obtained from CPM analysis at age 14 to the network strength. Finally, we evaluated the predictive performance by calculating the partial correlation between the predicted and observed values at ages 19 and 23, controlling for ICV at age 14. Similarly, we applied models developed at age 19 to predict ICV at age 23, also controlling for ICV at age 14. To assess the significance of the predictive performance, we used a permutation test, shuffling the predicted ICV values and recalculating the partial correlation to generate a null distribution over 1,000 iterations.
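
      The final two steps above (partial correlation controlling for baseline ICV, then a permutation test that shuffles the predicted values) can be sketched as follows. This is a minimal illustration in plain Python under several assumptions: a single covariate, Pearson correlations, a one-sided P value, and simulated toy data. The function names and the data are hypothetical and are not the authors' code or results.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(x, y, covariate):
    """Correlation between x and y after removing the linear effect of one covariate."""
    r_xy = pearson(x, y)
    r_xc = pearson(x, covariate)
    r_yc = pearson(y, covariate)
    return (r_xy - r_xc * r_yc) / ((1 - r_xc ** 2) * (1 - r_yc ** 2)) ** 0.5


def permutation_p(predicted, observed, covariate, n_perm=1000, seed=0):
    """One-sided permutation P value for the partial correlation."""
    rng = random.Random(seed)
    r_observed = partial_corr(predicted, observed, covariate)
    shuffled = list(predicted)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between predicted and observed values
        if partial_corr(shuffled, observed, covariate) >= r_observed:
            exceed += 1
    return r_observed, (exceed + 1) / (n_perm + 1)


# Simulated toy data: baseline ICV, later observed ICV, and model-predicted ICV.
data_rng = random.Random(42)
baseline = [data_rng.gauss(0, 1) for _ in range(200)]
signal = [data_rng.gauss(0, 1) for _ in range(200)]
observed = [0.5 * b + s + 0.5 * data_rng.gauss(0, 1) for b, s in zip(baseline, signal)]
predicted = [s + 0.5 * data_rng.gauss(0, 1) for s in signal]
r_partial, p_perm = permutation_p(predicted, observed, baseline)
```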

      Results

      Predictions across timepoints controlling for ICV at age 14

      Positive and combined networks derived from Go trials defined at age 14 predicted ICV at age 19 (r = 0.10, P = 0.028; r = 0.08, P = 0.047), but the negative network did not (r = 0.06, P = 0.119). The positive network derived from Go trials defined at age 14 predicted ICV at age 23 (r = 0.11, P = 0.013), but the negative and combined networks did not (r = 0.04, P = 0.187; r = 0.08, P = 0.056). Positive, negative, and combined networks derived from Go trials defined at age 19 predicted ICV at age 23 (r = 0.22, r = 0.19, and r = 0.22, respectively, all P < 0.001).

      Positive, negative, and combined networks derived from Successful stop trials defined at age 14 predicted ICV at age 19 (r = 0.08, P = 0.036; r = 0.10, P = 0.012; r = 0.11, P = 0.009) and 23 (r = 0.11, P = 0.005; r = 0.13, P = 0.005; r = 0.13, P = 0.017) respectively. Positive, negative, and combined networks derived from Successful stop trials defined at age 19 predicted ICV at age 23 (r = 0.18, r = 0.18, and r = 0.17, respectively, all P < 0.001).

      (2) In the Results section, a significance threshold of p = 0.01 was used for the CPM analysis. It would be beneficial to test the stability of these findings using alternative thresholds such as p = 0.05 or p = 0.005.

      We appreciate the suggestion to test the stability of our findings using alternative significance thresholds. Indeed, we had already conducted CPM analyses using a range of thresholds, including 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, and 0.0001 (see Table S8 in the Supplementary Materials). The results were similar across thresholds. Following prior studies (Feng et al., 2024; Ren et al., 2021; Yoo et al., 2018), which used P < 0.01 for feature selection, we focused on the threshold of P < 0.01 for our main analysis. Following your suggestion, we have highlighted this in the Method section on Pages 17-18.

      Method

      2.6.1 ICV prediction

      The r value with an associated P value for each edge was obtained, and a threshold P = 0.01 (Feng et al., 2024; Ren et al., 2021; Yoo et al., 2018) was set to select edges.
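
      The edge-selection step described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes Pearson correlations, approximates the P value with the Fisher z normal approximation (a standard CPM implementation would typically use the exact t-based P value), and uses deterministic toy data. All function names, edges, and values are hypothetical.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def approx_p_value(r, n):
    """Two-sided P value for r via the Fisher z normal approximation (an assumption)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_edges(edge_values, behaviour, p_threshold=0.01):
    """Split edges into positive and negative networks by the sign of a significant r."""
    n = len(behaviour)
    positive, negative = [], []
    for edge, values in edge_values.items():
        r = pearson(values, behaviour)
        if approx_p_value(r, n) < p_threshold:
            (positive if r > 0 else negative).append(edge)
    return positive, negative


def network_strength(edge_values, edges, subject):
    """Sum one subject's connectivity over the selected edges."""
    return sum(edge_values[edge][subject] for edge in edges)


# Toy data for 50 subjects: behaviour scores and three edges (ROI index pairs).
behaviour = [float(i) for i in range(50)]
edge_values = {
    (0, 1): [i + ((i * 7) % 5 - 2) for i in range(50)],   # tracks behaviour
    (0, 2): [-i + ((i * 3) % 7 - 3) for i in range(50)],  # tracks it inversely
    (1, 2): [(-1) ** i for i in range(50)],               # essentially unrelated
}
positive_edges, negative_edges = select_edges(edge_values, behaviour, p_threshold=0.01)
strength_subject0 = network_strength(edge_values, positive_edges, subject=0)
```

      Rerunning `select_edges` with other values of `p_threshold` is how a sensitivity analysis across thresholds, such as the one summarized in Table S8, could be organized.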

      2.6.2 Three cross-validation schemes

      In addition, we conducted the CPM analysis using a range of thresholds for feature selection and observed similar results across different thresholds (See Supplementary Materials Table S8).

      (3) Could you clarify if you used one sub-sample to extract connectivity related to sustained attention and then used another sub-sample to predict substance use with attention-related connectivity?

      Thank you very much for the question. We used the same sample to extract the brain network strength and estimated the correlation with substance use using both the Spearman correlation and latent change score model across three timepoints. We controlled for covariates including sex, age, and scan site at the same time. Accordingly, we have clarified this in the Method section on Page 20. We note that the CPM analyses were conducted using cross-validation, plus a leave-site-out analysis.

      Method

      2.7.3 Correlation between network strength and substance use

      It is worth noting that all the correlations between substance use and sustained attention were conducted using the same sample across three timepoints.

      (4) Could you clarify whether you have regressed covariates in the lagged effects analysis of part 7?

      Thanks for this question. Yes, we confirm that we controlled for covariates including age, sex, and scan site in the latent change score model. We have now described them more clearly in the Method section (Page 18).

      Method

      2.7.3 Correlation between network strength and substance use

      Additionally, cross-lagged dynamic coupling (i.e., bidirectionality) was employed to explore individual differences in the relationships between substance use and linear changes in ICV/brain activity, as well as the relationship between ICV/brain activity and linear change in substance use. The model accounted for covariates such as age, sex and scan sites.

      References:

      Broyd, S.J., van Hell, H.H., Beale, C., Yucel, M., Solowij, N., 2016. Acute and Chronic Effects of Cannabinoids on Human Cognition-A Systematic Review. Biol Psychiatry 79, 557-567.

      Chamberlain, S.R., Odlaug, B.L., Schreiber, L.R.N., Grant, J.E., 2012. Association between Tobacco Smoking and Cognitive Functioning in Young Adults. The American Journal on Addictions 21, S14-S19.

      Crean, R.D., Crane, N.A., Mason, B.J., 2011. An evidence based review of acute and long-term effects of cannabis use on executive cognitive functions. J Addict Med 5, 1-8.

      D'Alberto, N., Chaarani, B., Orr, C.A., Spechler, P.A., Albaugh, M.D., Allgaier, N., Wonnell, A., Banaschewski, T., Bokde, A.L.W., Bromberg, U., Buchel, C., Quinlan, E.B., Conrod, P.J., Desrivieres, S., Flor, H., Frohner, J.H., Frouin, V., Gowland, P., Heinz, A., Itterman, B., Martinot, J.L., Paillere Martinot, M.L., Artiges, E., Nees, F., Papadopoulos Orfanos, D., Poustka, L., Robbins, T.W., Smolka, M.N., Walter, H., Whelan, R., Schumann, G., Potter, A.S., Garavan, H., 2018. Individual differences in stop-related activity are inflated by the adaptive algorithm in the stop signal task. Hum Brain Mapp 39, 3263-3276.

      Dhamala, E., Yeo, B.T.T., Holmes, A.J., 2022. Methodological Considerations for Brain-Based Predictive Modelling in Psychiatry. Biological Psychiatry.

      Di, X., Zhang, Z.G., Biswal, B.B., 2021. Understanding psychophysiological interaction and its relations to beta series correlation. Brain Imaging and Behavior 15, 958-973.

      Dougherty, D.M., Mathias, C.W., Dawes, M.A., Furr, R.M., Charles, N.E., Liguori, A., Shannon, E.E., Acheson, A., 2013. Impulsivity, attention, memory, and decision-making among adolescent marijuana users. Psychopharmacology (Berl) 226, 307-319.

      Esterman, M., Rothlein, D., 2019. Models of sustained attention. Curr Opin Psychol 29, 174-180.

      Feng, Q., Ren, Z., Wei, D., Liu, C., Wang, X., Li, X., Tie, B., Tang, S., Qiu, J., 2024. Connectome-based predictive modeling of Internet addiction symptomatology. Soc Cogn Affect Neurosci 19.

      Greene, A.S., Gao, S., Scheinost, D., Constable, R.T., 2018. Task-induced brain state manipulation improves prediction of individual traits. Nature Communications 9, 2807.

      Harakeh, Z., de Sonneville, L., van den Eijnden, R.J., Huizink, A.C., Reijneveld, S.A., Ormel, J., Verhulst, F.C., Monshouwer, K., Vollebergh, W.A., 2012. The association between neurocognitive functioning and smoking in adolescence: the TRAILS study. Neuropsychology 26, 541-550.

      Hart, C.L., van Gorp, W., Haney, M., Foltin, R.W., Fischman, M.W., 2001. Effects of acute smoked marijuana on complex cognitive performance. Neuropsychopharmacology 25, 757-765.

      Lawrence, N.S., Ross, T.J., Stein, E.A., 2002. Cognitive mechanisms of nicotine on visual attention. Neuron 36, 539-548.

      Lisdahl, K.M., Price, J.S., 2012. Increased marijuana use and gender predict poorer cognitive functioning in adolescents and emerging adults. J Int Neuropsychol Soc 18, 678-688.

      O'Halloran, L., Cao, Z.P., Ruddy, K., Jollans, L., Albaugh, M.D., Aleni, A., Potter, A.S., Vahey, N., Banaschewski, T., Hohmann, S., Bokde, A.L.W., Bromberg, U., Buchel, C., Quinlan, E.B., Desrivieres, S., Flor, H., Frouin, V., Gowland, P., Heinz, A., Ittermann, B., Nees, F., Orfanos, D.P., Paus, T., Smolka, M.N., Walter, H., Schumann, G., Garavan, H., Kelly, C., Whelan, R., 2018. Neural circuitry underlying sustained attention in healthy adolescents and in ADHD symptomatology. Neuroimage 169, 395-406.

      Potter, A.S., Newhouse, P.A., 2008. Acute nicotine improves cognitive deficits in young adults with attention-deficit/hyperactivity disorder. Pharmacol Biochem Behav 88, 407-417.

      Ren, Z., Daker, R.J., Shi, L., Sun, J., Beaty, R.E., Wu, X., Chen, Q., Yang, W., Lyons, I.M., Green, A.E., Qiu, J., 2021. Connectome-Based Predictive Modeling of Creativity Anxiety. Neuroimage 225, 117469.

      Rosenberg, M.D., Finn, E.S., Scheinost, D., Papademetris, X., Shen, X., Constable, R.T., Chun, M.M., 2016. A neuromarker of sustained attention from whole-brain functional connectivity. Nat Neurosci 19, 165-171.

      Rosenberg, M.D., Scheinost, D., Greene, A.S., Avery, E.W., Kwon, Y.H., Finn, E.S., Ramani, R., Qiu, M., Constable, R.T., Chun, M.M., 2020. Functional connectivity predicts changes in attention observed across minutes, days, and months. Proc Natl Acad Sci U S A 117, 3797-3807.

      Shen, X., Finn, E.S., Scheinost, D., Rosenberg, M.D., Chun, M.M., Papademetris, X., Constable, R.T., 2017. Using connectome-based predictive modeling to predict individual behavior from brain connectivity. Nat Protoc 12, 506-518.

      Valentine, G., Sofuoglu, M., 2018. Cognitive Effects of Nicotine: Recent Progress. Curr Neuropharmacol 16, 403-414.

      Verbruggen, F., Aron, A.R., Band, G.P.H., Beste, C., Bissett, P.G., Brockett, A.T., Brown, J.W., Chamberlain, S.R., Chambers, C.D., Colonius, H., Colzato, L.S., Corneil, B.D., Coxon, J.P., Dupuis, A., Eagle, D.M., Garavan, H., Greenhouse, I., Heathcote, A., Huster, R.J., Jahfari, S., Kenemans, J.L., Leunissen, I., Li, C.S.R., Logan, G.D., Matzke, D., Morein-Zamir, S., Murthy, A., Pare, M., Poldrack, R.A., Ridderinkhof, K.R., Robbins, T.W., Roesch, M.R., Rubia, K., Schachar, R.J., Schall, J.D., Stock, A.K., Swann, N.C., Thakkar, K.N., van der Molen, M.W., Vermeylen, L., Vink, M., Wessel, J.R., Whelan, R., Zandbelt, B.B., Boehler, C.N., 2019. A consensus guide to capturing the ability to inhibit actions and impulsive behaviors in the stop-signal task. Elife 8.

      Whelan, R., Conrod, P.J., Poline, J.B., Lourdusamy, A., Banaschewski, T., Barker, G.J., Bellgrove, M.A., Buchel, C., Byrne, M., Cummins, T.D., Fauth-Buhler, M., Flor, H., Gallinat, J., Heinz, A., Ittermann, B., Mann, K., Martinot, J.L., Lalor, E.C., Lathrop, M., Loth, E., Nees, F., Paus, T., Rietschel, M., Smolka, M.N., Spanagel, R., Stephens, D.N., Struve, M., Thyreau, B., Vollstaedt-Klein, S., Robbins, T.W., Schumann, G., Garavan, H., Consortium, I., 2012. Adolescent impulsivity phenotypes characterized by distinct brain networks. Nat Neurosci 15, 920-925.

      Yoo, K., Rosenberg, M.D., Hsu, W.T., Zhang, S., Li, C.R., Scheinost, D., Constable, R.T., Chun, M.M., 2018. Connectome-based predictive modeling of attention: Comparing different functional connectivity features and prediction methods across datasets. Neuroimage 167, 11-22.

      Young, J.W., Finlayson, K., Spratt, C., Marston, H.M., Crawford, N., Kelly, J.S., Sharkey, J., 2004. Nicotine improves sustained attention in mice: evidence for involvement of the alpha7 nicotinic acetylcholine receptor. Neuropsychopharmacology 29, 891-900.

      Zhao, W., Makowski, C., Hagler, D.J., Garavan, H.P., Thompson, W.K., Greene, D.J., Jernigan, T.L., Dale, A.M., 2023. Task fMRI paradigms may capture more behaviorally relevant information than resting-state functional connectivity. Neuroimage, 119946.

      Zhao, W., Palmer, C.E., Thompson, W.K., Chaarani, B., Garavan, H.P., Casey, B.J., Jernigan, T.L., Dale, A.M., Fan, C.C., 2021. Individual Differences in Cognitive Performance Are Better Predicted by Global Rather Than Localized BOLD Activity Patterns Across the Cortex. Cereb Cortex 31, 1478-1488.

    1. Reviewer #1 (Public Review):

      Summary:

      This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though still in this revised paper I have substantive concerns about how the analyses were performed. While scene-specific reinstatement decreased for remote memories in both children and adults, claims about its presence cannot be made given the analyses. Gist-level reinstatement was observed in children but not adults, but I also have concerns about this analysis. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and the paper takes a step towards characterizing how.

      Strengths:

      The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.

      Weaknesses:

      As noted above and in my review of the original submission, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within scanning run. Unfortunately, these issues remain of concern in this revision because they were not rectified. Most of my review focuses on this analytic issue, though I also outline additional concerns.

      (1) The pattern similarity analyses are largely uninterpretable due to how they were performed.

      (a) First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, and which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, which is not possible given the design.
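
      The cross-run restriction described above can be illustrated with a minimal sketch. It is not taken from the manuscript or the review: the function name `cross_run_similarity`, the array layout (one row per trial pattern, one column per voxel) and the run labels are all assumptions made for illustration. It correlates every pattern in one set with every pattern in another and averages only those pairs that come from different scanning runs.

```python
import numpy as np

def cross_run_similarity(patterns_a, patterns_b, runs_a, runs_b):
    """Mean Pearson r between rows of patterns_a and rows of patterns_b,
    keeping only pairs whose patterns come from different runs.

    patterns_a, patterns_b: (n_trials, n_voxels) arrays (hypothetical layout).
    runs_a, runs_b: run label for each row of the corresponding array.
    """
    a = np.asarray(patterns_a, dtype=float)
    b = np.asarray(patterns_b, dtype=float)
    n_a = a.shape[0]

    # np.corrcoef stacks the rows of a and b; the upper-right block holds
    # the correlations between each row of a and each row of b.
    r = np.corrcoef(a, b)[:n_a, n_a:]

    # True where the two patterns were acquired in different runs.
    cross = np.asarray(runs_a)[:, None] != np.asarray(runs_b)[None, :]
    if not cross.any():
        raise ValueError("no cross-run pairs available for this comparison")

    return float(r[cross].mean())
```

      If every pair of interest falls within the same run, as the reviewer notes is the case for the fixation-scene comparison in this design, the function raises an error rather than returning a value, which mirrors the point that the analysis cannot be rescued by masking alone.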

      To remedy this, in the revision the authors have said they will refrain from making conclusions about the presence of scene-specific reinstatement (i.e., reinstatement above baseline). While this itself is an improvement from the original manuscript, I still have several concerns. First, this was not done thoroughly and at times conclusions/interpretations still seem to imply or assume the presence of scene reinstatement (e.g., line 979-985, "our research supports the presence of scene-specific reinstatement in 5-to-7-year-old children"; line 1138). Second, the authors' logic for the neural-behavioural correlations in the PLSC analysis involved restricting to regions that showed significant reinstatement for the gist analysis, which cannot be done for the analogous scene-specific reinstatement analysis. This makes it challenging to directly compare these two analyses since one was restricted to a small subset of regions and only children (gist), while scene reinstatement included both groups and all ROIs. Third, it is also unclear whether children and adults' values should be directly comparable given pattern similarity can be influenced by many factors like motion, among other things.

      My fourth concern with this analysis relates to the lack of regional specificity of the effects. All ROIs tested showed a virtually identical pattern: "Scene-specific reinstatement" decreased across delays, and was greater in children than adults. I believe control analyses are needed to ensure artifacts are not driving these effects. This would greatly strengthen the authors' ability to draw conclusions from the "clean" comparison of day 1 vs. day 14. (A) The authors should present results from a control ROI that should absolutely not show memory reinstatement effects (e.g., white matter?). Results from the control ROI should look very different - should not differ between children and adults, and should not show decreases over time. (B) Do the recent items from day 1 vs. day 14 differ? If so, this could suggest something is different about the later scans (and if not, it would be reassuring). (C) If the same analysis was performed comparing the object cue and immediately following fixation (rather than the fixation and the immediately following scene), the results should look very different. I would argue that this should not be an index of reinstatement at all since it involves something presented visually rather than something reinstated (i.e., the scene picture is not included in this comparison). If this control analysis were to show the same effects as the primary analysis, this would be further evidence that this analysis is uninterpretable and hopelessly confounded.

      (b) For the category-based neural reinstatement: (1) This suffers from the same issue of correlations being performed within run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). The authors in their response letter have indicated that because the patterns being correlated are not derived from events in close temporal proximity, they should not suffer from the issue of temporal autocorrelation. This is simply not true. For example, see the paper by Prince et al. (eLife 2022; on GLMsingle). This is not the main point of Prince et al.'s paper, but it includes a nice figure that shows that, using standard modelling approaches, the correlation between (same-run) patterns can be artificially elevated for lags as long as ~120 seconds (and can even be artificially reduced after that; Figure 5 from that paper) between events. This would affect many of the comparisons in the present paper. The cleanest way to proceed is to simply drop the within-run comparisons, which I believe the authors can do and yet they have not. Relatedly, in the response letter the authors say they are focusing mainly on the change over time for reinstatement at both levels including the gist-type reinstatement; however, this is not how it is discussed in the paper. They in fact are mainly relying on differences from zero, as children show some "above baseline" reinstatement while adults do not, but I believe there were no significant differences over time (i.e., the findings the authors said they would lean on primarily, as they are arguably the most comparable). (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. In their response letter and the revised paper, the authors do provide a bit of reasoning as to why this is the most sensible. 
      However, it is still not clear to me whether this is really "reinstatement" which (in my mind) entails the re-evoking of a neural pattern initially engaged during perception. Rather, could this be a shared neural state that is category specific? In any case, I think additional information should be added to the text to clarify that this definition differs from others in the literature. The authors might also consider using some term other than reinstatement. Again (as I noted in my prior review), the finding of no category-level reinstatement in adults is surprising and confusing given prior work and likely has to do with the operationalization of "reinstatement" here. I was not quite sure about the explanation provided in the response letter, as category-level reinstatement is quite widespread in the brain for adults and is robust to differences in analytic procedures etc. (3) Also from a theoretical standpoint, I'm still a bit confused as to why gist-based reinstatement would involve reinstatement of the scene gist, rather than the object's location (on the screen) gist. Were the locations on the screen similar across scene backgrounds from the same category? It seems like a different way to define memory retrieval here would be to compare the neural patterns when cued to retrieve the same vs. similar (at the "gist" level) vs. different locations across object-scene pairs. This is somewhat related to a point from my review of the initial version of this manuscript, about how scene reinstatement is not necessary. The authors state that participants were instructed to reinstate the scene, but that does not mean they were actually doing it. The point that what is being measured via the reinstatement analyses is actually not necessary to perform the task should be discussed in more detail in the paper.

      (2) Inspired by another reviewer's comment, it is unclear to me the extent to which age group differences can be attributed to differences in age/development versus memory strength. I liked the other reviewer's suggestions about how to identify and control for differences in memory strength, which I don't think the authors actually did in the revision. They instead showed evidence that memory strength does seem to be lower in children, which indicates this is an interpretive confound. For example, the reviewer's suggestion of performing analyses on subsets of participants who were actually matched in initial learning/memory performance would have been very informative. As it is, the authors didn't really control for memory strength adequately in my opinion, and as such their conclusions about children vs. adults could have been reframed as people with weak vs. strong memories. This is obviously a big drawback given what the authors want to conclude. Relatedly, I'm not sure the DDM was incorporated as the reviewer was suggesting; at minimum I think the authors need to do more work in the paper to explain what this means and why it is relevant. (I understand putting it in the supplement rather than the main paper, but I still wanted to know more about what it added from an interpretive perspective.)

      (3) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). Precuneus also interestingly seems to show numerically recent>remote (values mostly negative), whereas most other regions show the opposite. This difference from zero (in either direction) or lack thereof seems important to the message. In response to this comment on the original manuscript, the authors seem to have confirmed that hippocampal activity was greater during retrieval than implicit baseline. But this was not really my question - I was asking whether hippocampus is (and other ROIs in this same figure are) differently engaged for recent vs. remote memories.

      (4) Related to point 3, the claims about hippocampus with respect to multiple trace theory feel very unsupported by the data. I believe the authors want to conclude that children's memory retrieval shows reliance on hippocampus irrespective of delay, presumably because this is a detailed memory task. However the authors have not really shown this; all they have shown is that hippocampal involvement (whatever it is) does not vary by delay. But we do not have compelling evidence that the hippocampus is involved in this task at all. That hippocampus is more active during retrieval than implicit baseline is a very low bar and does not necessarily indicate a role in memory retrieval. If the authors want to make this claim, more data are needed (e.g., showing that hippocampal activity during retrieval is higher when the upcoming memory retrieval is successful vs. unsuccessful). In the absence of this, I think all the claims about multiple trace theory supporting retrieval similarly across delays and that this is operational in children are inappropriate and should be removed.

      (5) There are still not enough methodological details in the main paper to make sense of the results. Some of these problems were addressed in the revision but others remain. For example, a couple of things that were unclear: that initially learned locations were split, where half were tested again at day 1 and the other half at day 14; what specific criterion was used to pick the 'well-learned' associations that were used for comparisons at different delay periods (object-scene pairs that participants remembered accurately in the last repetition of learning? Or across all of learning?).

      (6) I still find the revised Introduction a bit unclear. I appreciated the added descriptions of different theories of consolidation, though the order of presented points is still a bit hard to follow. Some of the predictions I also find a bit confusing as laid out in the introduction. (1) As noted in the paper, multiple trace theory predicts that hippocampal involvement will remain high provided memories retained are sufficiently high detail. The authors however also predict that children will rely more on gist (than detailed) memories than adults, which would seem to imply (combined with the MTT idea) that they should show reduced hippocampal involvement over time (while in adults, it should remain high). However, the authors' actual prediction is that hippocampus will show stable involvement over time in both kids and adults. I'm having a hard time reconciling these points. (2) With respect to the extraction of gist in children, I was confused by the link to Fuzzy Trace Theory given the children in the present study are a bit young to be showing the kind of gist extraction shown in the Brainerd & Reyna data. Would 5-7 year olds not be more likely to show reliance on verbatim traces under that framework? Also from a phrasing perspective, I was confused about whether gist-like information was something different from just gist in this sentence: "children may be more inclined to extract gist information at the expense of detailed or gist-like information." (p. 8) - is this a typo?

      (7) For the PLSC, if I understand this correctly, the profiles were defined for showing associations with behaviour across age groups. (1) As such, is it not "double dipping" to then show that there is an association between brain profile and behaviour? Must this not be true by definition? If I am mistaken, it might be helpful to clarify this in the paper. (2) In addition, I believe for the univariate and scene-specific reinstatement analyses these profiles were defined across both age groups. I assume this doesn't allow for separate definition of profiles across the two groups (i.e., a kind of "interaction"). If this is the case, it makes sense that there would not be big age differences... the profiles were defined for showing an association across all subjects. If the authors wanted to identify distinct profiles in children and adults they may need to run another analysis. (3) Also, as for differences between short delay brain profile and long delay brain profile for the scene-specific reinstatement - there are 2 regions that become significant at long delay that were not significant at a short delay (PC, and CE). However, given there are ceiling effects in behaviour at the long but not short delay, it's unclear if this is a meaningful difference or just a difference in sensitivity. Is there a way to test whether the profiles are statistically different from one another? (4) As I mentioned above, it also was not ideal in my opinion that all regions were included for the scene-specific reinstatement due to the authors' inability to have an appropriate baseline and therefore define above-chance reinstatement. It makes these findings really challenging to compare with the gist reinstatement ones.

      (8) I would encourage the authors to be specific about whether they are measuring/talking about memory representations versus reinstatement, unless they think these are the same thing (in which case some explanation as to why would be helpful). For example, especially under the Fuzzy Trace framework, couldn't someone maintain both verbatim and gist traces of a memory yet rely more on one when making a memory decision?

      (9) With respect to the learning criteria - it is misleading to say that "children needed between two to four learning-retrieval cycles to reach the criterion of 83% correct responses" (p. 9). Four was the maximum, and looking at the Figure 1C data it appears as though there were at least a few children who did not meet the 83% minimum. I believe they were included in the analysis anyway? Please clarify. Was there any minimum imposed for inclusion?

      (10) For the gist-like reinstatement PLSC analysis, results are really similar at short and long delays and yet some of the text seems to imply specificity to the long delay. One is a trend and one is significant (p. 31), but surely these two associations would not be statistically different from one another?

      (11) As a general comment, I had a hard time tying all of the (many) results together. For example adults show more mature neocortical consolidation-related engagement, which the authors say is going to create more durable detailed memories, but under multiple trace theory we would generally think of neocortical representations as providing more schematic information. If the authors could try to make more connections across the different neural analyses, as well as tie the neural findings in more closely with the behaviour & back to the theoretical frameworks, that would be really helpful.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):

      Summary:

      This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though I have substantive concerns about how this analysis was performed and as such will not summarize the results. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.

      Strengths:

      The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.

      Weaknesses:

      As noted above, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within the scanning run. Below, I focus my review on this analytic issue, though I also outline additional concerns.

      We thank the reviewer for both the positive and critical appraisal of our paper.

      (1) The pattern similarity analyses were not done correctly, rendering the results uninterpretable (assuming my understanding of the authors' approach is correct).

      a. First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within the scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, but I don't believe this is possible unfortunately given the authors' design; I believe the target (presumably reinstated) scene only appears once during scanning, so there is no separate neural pattern during the presentation of this picture that they can use. For these reasons, any evidence for "significant scene-specific reinstatement" and the like is completely uninterpretable and would need to be removed from the paper.

      We thank the reviewer for this important input. We acknowledge that our study design leads to temporal autocorrelation in the BOLD signal when calculating RSA between fixation and scene time windows. We also recognize that we cannot interpret the significance of scene-specific reinstatement compared to zero and have accordingly removed this information. Nevertheless, our primary objective was to investigate changes in scene-specific reinstatement in relation to the different time delays of retrieval. Given that the retrieval procedure is the same over time and presumably similarly influenced by temporal autocorrelations, we argue that our results must be attributed to the relative differences in reinstatement across recent and remote trials. Bearing this in mind, we argue that our results can be interpreted in terms of delay-related changes in reinstatement. This information is discussed in pp. 21, 40 of the manuscript.

      We agree with the reviewer that cross-run comparisons would be extremely interesting. This could be achieved by introducing the same items repeatedly across different runs, which was not possible in our current setup because we were interested in single-exposure retrieval and faced practical time restrictions in scanning children. We have introduced this idea in the Limitations and Discussion sections (pp. 40, 44) of the manuscript to inform future studies.

      Finally, thanks to the reviewer’s comment, we identified a bug in the final steps of our RSA calculation. Fisher’s z-transformation was incorrectly applied to r-1 values, resulting in abnormally high values. We apologize for this error. We have revised the scripts and rectified the bug by correctly applying Fisher’s z-transformation to the r similarity values. We also adjusted the methods description figure accordingly (Figure 5, p. 22). This adjustment led to slightly altered reinstatement indices. Nevertheless, the overall pattern of delay-related attenuation in the scene-specific reinstatement index, observed in both children and adults, remains consistent. Similarly, we observed gist-like reinstatement uniquely in children.
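
      For readers unfamiliar with the transformation mentioned here, the following is a minimal sketch of Fisher's z applied directly to correlation values before averaging. It is not the authors' script: the function names `fisher_z` and `mean_similarity_z` and the clipping constant `eps` are assumptions chosen for illustration. The transformation is z = arctanh(r), so it is only meaningful when its input is the correlation coefficient itself rather than a quantity derived from it.

```python
import numpy as np

def fisher_z(r, eps=1e-7):
    """Fisher z-transform of Pearson correlations: z = arctanh(r).

    Values are clipped just inside (-1, 1) so that r = 1 or r = -1
    does not produce an infinite z (eps is an illustrative choice).
    """
    r = np.clip(np.asarray(r, dtype=float), -1.0 + eps, 1.0 - eps)
    return np.arctanh(r)

def mean_similarity_z(r_values):
    """Average a set of correlations in z space and return the mean z."""
    return float(np.mean(fisher_z(r_values)))
```

      Averaging in z space rather than r space is the usual motivation for the transform, since the sampling distribution of z is closer to normal than that of r.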

      b. From a theoretical standpoint, I believe the way this analysis was performed considering the fixation and the immediately following scene also means that the differences between recent and remote could have to do with either the reactivation (processes happening during the fixation, presumably) or differences in the processing of the stimulus itself (happening during the scene presentation). For example, people might be more engaged with the more novel scenes (recent) and therefore process those scenes more; such a difference would be interpreted in this analysis as having to do with reinstatement, but in fact could be just related to the differential scene processing/recognition, etc.

      Thank you for your insightful comments. We acknowledge the theoretical concerns raised about distinguishing between the effects of reactivation processes occurring during fixation and differential processing of the stimulus itself during scene presentation. Specifically, the notion that engagement levels with recent scenes could result in enhanced processing, which might be misattributed to memory reinstatement mechanisms.

      We argue, however, that during scene presentation, scenes are processed more “memory-wise” rather than “perception-wise”, since both recent and remote memories are well-learned, as we included only correctly recalled memories in the analysis.

      We concur that scene presentations entail perceptual processing; however, such processing would be consistent across all items, given that they were presented with the same repeated learning procedure, rendering them equally familiar to participants. In addition, we would argue that distinct activation patterns elicited during varying delays are more likely attributable to memory-related processing, since participants actively engaged in a memory-based decision-making task during these intervals. We have incorporated this rationale into the discussion section of our manuscript (p. 40).

      With this in mind, we hypothesized that in the case of “memory-wise” processing, the neural engagement during the scene time window should be higher for remote compared to recent items, and that this difference should increase with passing time, as more control and effort should be exhibited during retrieval due to the reorganized and distributed nature of memories. If the scenes are processed more “perception-wise”, we would expect higher neural engagement during the retrieval of recent compared to remote items. Our exploratory analysis (detailed overview in supplementary materials, Figure S3, Table S9) revealed a higher neural activation for remote compared to recent items in medial temporal, prefrontal, occipital and cerebellar brain regions, supporting the notion of “memory-wise” processes during the scene time window. However, this exploratory analysis cannot provide a direct solution to the reviewer’s concern, as our paradigm per se cannot arbitrate between the “memory-wise” and “perception-wise” nature of retrieval. We added this point to the discussion (see p. 40).

      c. For the category-based neural reinstatement:

      (1) This suffers from the same issue of correlations being performed within the run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). With this restriction, it may or may not be possible to perform this analysis, depending upon how the same-category scenes are distributed across runs. However, there are other issues with this analysis, as well.

      (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. The authors do not motivate the reason for this switch. Please provide reasoning as to why fixation-fixation is more appropriate than fixation-scene similarity for category-level reinstatement, particularly given the opposite was used for item-level reinstatement. Even if the analyses were done properly, it would remain hard to compare them given this difference in approach.

      (3) I believe the fixation cross with itself is included in the "within category" score. Is this not a single neural pattern correlated with itself, which will yield maximal similarity (Pearson r=1) or minimal dissimilarity (1-Pearson r=0)? Including these comparisons in the averages for the within-category score will inflate the difference between the "within-category" and "between-category" comparisons. These (e.g., forest1-forest1) should not be included in the within-category comparisons considered; rather, they should be excluded, so the fixations are always different but sometimes the comparisons are two retrievals of the same scene type (forest1-forest2), and other times different scene types (forest1-field1).

      (4) It is troubling that the results from the category reinstatement metric do not seem to conceptually align with past work; for example, a lot of work has shown category-level reinstatement in adults. Here the authors do not show any category-level reinstatement in adults (yet they do in children), which generally seems extremely unexpected given past work and I would guess has to do with the operationalization of the metric.

      Thank you for this important input regarding category-based reinstatement.

      (1) The distribution of within-category items across runs was approximately balanced. Additionally, within runs, they were presented randomly without close temporal proximity. Based on this arrangement, we believe that the issue of close temporal autocorrelation, as pointed out by the reviewer in the context of scene-specific reinstatement, may not apply to the same extent here. Again, our focus is not on the absolute level of category-based reinstatement, but on the relative difference across conditions (recent vs. remote short delay vs. remote long delay), which are equally impacted by the autocorrelations.

      (2) We apologize for not motivating this analysis further. Whereas the scene-reinstatement index (i.e., fixation to scene correlation) gives us a measure of the pre-activation of a concrete scene (e.g., a yellow forest in autumn), the gist-like reinstatement gives us a measure of the pre-activation of a whole category of scenes (e.g., forests). Critically, our window of interest is the fixation period for both sets of analyses (in the absence of any significant visual input). The scene-specific reinstatement uses the scene window as a neural template against which the fixation period can be compared, while the gist-like reinstatement compares the similarity of reactivation patterns for trials that come from the same category but differ in the exact memory content. The reinstatement of more generic, gist-like memory (e.g., forest) across multiple trials should yield more similar neural activation patterns. Significant gist-like reinstatement would suggest that neural patterns for scenes within the same category are more generic, as indicated by higher similarity among them. On the other hand, a more detailed reinstatement of specific types of forests (e.g., a yellow forest in autumn, green pine trees, a bare-leaved forest in spring, etc.) that differ in various dimensions could result in neural activation patterns that are as dissimilar as those seen in the reinstatement of scenes from entirely different categories. Through this methodology, we could distinguish between more generic, gist-like reinstatement and more specific, detailed reinstatement. This is now clarified in the manuscript, see p. 25.

      (3) We apologize for the confusion caused by the figure and analysis description. In our analysis, we indeed excluded the correlation of the fixation cross with itself. Consequently, the diagonal in the figure should be blank to indicate this. This is now revised in the manuscript (Figure 7B and in Methods).

      (4) We appreciate your concern and recognize that the terminology we used might not align perfectly with the conventional understanding of category-based reinstatement. Typically, category-level neural representations (as discussed in Polyn et al., 2005; Jafarpour et al., 2014; among others) are investigated to identify specific brain areas associated with encoding/perception of scenes or faces. Our aim, however, was to explore the mnemonic reinstatement of highly detailed scenes that were elaborately encoded, with the hypothesis that substantial representational transformations would occur over time and vary with age. This hypothesis is based on the memory literature, including the Fuzzy-Trace Theory, the Contextual Binding Theory, and the Trace Transformation Theory (Brainerd & Reyna, 1998; Yonelinas, 2019; Moscovitch & Gilboa, 2023). Therefore, we renamed 'category-based' reinstatement to 'gist-like' reinstatement, which clarifies our concept and better aligns it with existing literature.

      We anticipated that young adults, having the ability to retain detailed narratives post-encoding, would demonstrate a reinstatement of scenes with distinct details, making these scenes dissimilar from each other (see similar findings in Sommer et al., 2021). In contrast, given the anticipated lesser strategic elaboration during learning in children, we hypothesized that they would demonstrate a shallower, more gist-like reinstatement (for instance, children recalling a forest or a field in a general sense without specific details or vivid imagery). This could result in higher category-based similarity, as children might reinstate a more generic forest concept.

      We did not gather additional data on the verbal quality of reinstatement due to the limited scanning time available for children, so these assumptions remain unverified. However, anecdotal observations post-retrieval indicated that adults often reported very vivid scenes associated with clear narrative recall. In contrast, children frequently described more vague memories (e.g., “I know it was a forest”) without specific details. Future studies should include measures to assess the quality of reinstatement, potentially outside the scanning environment.
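      The within- versus between-category logic discussed in points (1) to (4) above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used in the manuscript: the function names, the dictionary layout of a trial, and the toy patterns are all hypothetical, and plain Pearson correlations are averaged without any further transformation. Only distinct trial pairs are compared, so no fixation pattern is ever correlated with itself, and an optional flag restricts comparisons to pairs from different runs.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activation patterns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def gist_similarity(trials, across_runs_only=False):
    """Mean within- and between-category similarity of fixation patterns.

    `trials` is a list of dicts with the (hypothetical) keys
    'pattern', 'category' and 'run'.
    """
    within, between = [], []
    # combinations() yields each pair of *different* trials exactly once,
    # so self-correlations (the diagonal) never enter either average.
    for a, b in combinations(trials, 2):
        if across_runs_only and a["run"] == b["run"]:
            continue  # skip pairs that share a run
        r = pearson(a["pattern"], b["pattern"])
        if a["category"] == b["category"]:
            within.append(r)
        else:
            between.append(r)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Toy data, invented purely for illustration.
toy_trials = [
    {"pattern": [1.0, 2.0, 3.0, 4.0], "category": "forest", "run": 1},
    {"pattern": [2.0, 4.0, 6.0, 8.0], "category": "forest", "run": 2},
    {"pattern": [4.0, 3.0, 2.0, 1.0], "category": "field", "run": 1},
]
within_mean, between_mean = gist_similarity(toy_trials, across_runs_only=True)
print("within:", within_mean, "between:", between_mean)
```

      A gist-like index could then be taken as the difference between the two means; whether to transform the correlations first is a separate choice that this sketch leaves open.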

      (2) I did not see any compelling statistical evidence for the claim of less robust consolidation in children.

      Specifically in terms of the behavioral results of retention of the remote items at 1 vs 14 days, shown in Figure 2B, the authors conclude that memory consolidation is less robust in children (line 246). Yet they do not report statistical evidence for this point, as there was no interaction of this effect with the age group. Children had worse memory than adults overall (in terms of a main effect - i.e. across recent and remote items). If it were consolidation-specific, one would expect that the age differences are bigger for the remote items, and perhaps even most exaggerated for the 14-day-old memories. Yet this does not appear to be the case based on the data the authors report. Therefore, the behavioral differences in retention do not seem to be consolidation specific, and therefore might have more to do with differences in encoding fidelity or retrieval processes more generally across the groups. This should be considered when interpreting the findings.

      Thank you for highlighting this important issue. We acknowledge that our initial description and depiction of our behavioral findings may not have effectively conveyed the main message about memory consolidation. Therefore, we have revised the behavioral results section (see pp. 12-14) to communicate our message more clearly.

      As detailed in the methods section, we reported retention rates only for those items that were correctly (100%) learned on day 0, day 1, and day 14. This approach meant that different participants had varying numbers of items learned correctly. However, this strategy allowed us to address our primary question: whether memory consolidation, based on all items initially encoded successfully, is comparably robust between groups.

      To illustrate more clearly how the retention rate slopes change over time for recently learned items (i.e., tested 30 minutes after learning), short delay remote items, and long delay remote items, relative to the initially correctly learned items, we conducted the following analysis: after observing no differences between sessions in both age groups for recent items on days 1 and 14, we combined the recent items. This approach enabled us to investigate how the slope of memory retention for initially correctly learned items (with a baseline of 100%) changes over time. We observed a significant interaction between item type (recent, short delay remote, long delay remote) and group (F(3,250) = 17.35, p < .001, ω² = .16). Follow-up analysis of this interaction revealed significantly less robust memory consolidation across all delay times in children compared to young adults. This information has been added to the manuscript on pp. 12-14. We have also updated the figures, incorporating the baseline of 100% correct performance.
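      The retention measure described above, computed relative to a 100% baseline of initially correctly learned items, can be written out as a short sketch. The function name and the item labels are hypothetical and serve only to make the definition concrete; this is not the code used for the reported analysis.

```python
def retention_rate(learned_correctly, recalled_correctly):
    """Percentage of initially correctly learned items still recalled correctly.

    Items that were not learned correctly are ignored, so the baseline
    is 100% by construction for every participant.
    """
    learned = set(learned_correctly)
    if not learned:
        raise ValueError("no correctly learned items to compute retention from")
    retained = learned & set(recalled_correctly)
    return 100.0 * len(retained) / len(learned)


# Invented example: four items learned correctly, two of them recalled later.
learned_items = ["fox", "owl", "deer", "hare"]
recalled_items = ["fox", "owl", "boat"]  # "boat" was never learned correctly
print(retention_rate(learned_items, recalled_items))
```

      Because the denominator is each participant's own set of correctly learned items, participants can contribute different numbers of items while sharing the same baseline.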

      (3) Please clarify which analyses were restricted to correct retrievals only. The univariate analyses states that correct and incorrect trials were modelled separately but does not say which were considered in the main contrast (I assume correct only?). The item specific reinstatement analysis states that only correct trials were considered, but the category-level reinstatement analysis does not say. Please include this detail.

      Thank you for bringing this to our attention. We indeed limited our analysis – including univariate, specific reinstatement, and gist-like analyses – to only correctly remembered items. This decision was made because our goal was to observe delay-related changes in the neural correlates of correct memories, which are potentially stronger. We have incorporated this information into the manuscript.

      (4) To what extent could performance differences be impacting the differences observed across age groups? I think (see prior comment) that the analyses were probably limited to correct trials, which is helpful, but still yields pretty big differences across groups in terms of the amount of data going into each analysis. In general, children showed more attenuated neural effects (e.g., recent/remote or session effects); could this be explained by their weaker memory? Specifically, if only correct trials are considered that means that fewer trials would be going into the analysis for kids, especially for the 14-day remote memories, and perhaps pushing the remote > recent difference for this condition towards 0. The authors might be able to address this analytically; for example, does the remote > recent difference in the univariate data at day 14 correlate with day 14 memory?

      Thank you for pointing this out. Indeed, there was a significant relationship between the remote > recent difference in the univariate data and memory performance at day 14 across both age groups (see Figure 4C-D). The performance of all participants, including children, was above chance level for remote trials on day 14. In addition, although the number of remote trials was lower in children (18 trials on average) than in adults (22 trials on average), we believe that the number of remote trials was neither too low nor too different across groups for the contrast.
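      One common way to make an "above chance" statement concrete for a three-alternative task is a one-sided binomial tail probability with a guessing rate of 1/3. The sketch below is an assumption about how such a check could look, not a description of the test used in the manuscript; the function name is hypothetical, and the example call assumes, for illustration only, 18 correct responses out of 30 remote trials.

```python
from math import comb


def p_at_least(k_correct, n_trials, p_chance=1 / 3):
    """Probability of k_correct or more correct responses under pure guessing.

    Uses the exact binomial distribution with success probability p_chance
    (1/3 for a three-alternative forced-choice task).
    """
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(k_correct, n_trials + 1)
    )


# Illustrative call: how likely are 18 or more hits in 30 trials by guessing?
print(p_at_least(18, 30))
```

      A small tail probability indicates that the observed number of correct responses is unlikely to arise from guessing alone, although it does not rule out that some individual correct responses were guesses.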

      (5) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example, in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). This difference from zero or lack thereof seems important to the message - is that correct? If so, can the authors incorporate descriptions of these findings?

      Thank you for this valuable input. When examining recent and remote retrieval separately, both the anterior and posterior regions of the hippocampus indeed exhibited activation significantly different from zero in adults (all p < .0003, FDR-corrected) and children (all p < .014, FDR-corrected, except for recent posterior hippocampus) during all delays. We have included this information in the manuscript (see p. 17) and added it to the supplementary materials (Figure S2, Table S7).

      (6) Please provide more details about the choices available for locations in the 3AFC task. (1) Were they different each time, or always the same? If they are always the same, could this be a motor or stimulus/response learning task? (2) Do the options in the 3AFC always come from the same area - in which case the participant is given a clue as to the gist of the location/memory? Or are they sometimes randomly scattered across the image (in which case gist memory, like at a delay, would be sufficient for picking the right option)? Please clarify these points and discuss the logic/impact of these choices on the interpretation of the results.

      Thank you for pointing this out. During learning and retrieval, we employed the 3AFC (Three-Alternative Forced Choice) task.

      The choices for locations varied across scenes but remained the same across time within individuals. There were 18 different key locations for the objects, distributed across the stimulus set. This means the locations of the objects were quite heterogeneous and differed between objects. The location of the object within the task was presented once during encoding and remained consistent throughout learning. Given the location heterogeneity, we believe our task cannot be reduced to a mere “stimulus/response learning task” but is more accurately described as an object-location associations task.

      As described above, the options for the 3AFC task did not originate from the same area, as there were 18 different areas in total. The correct answer was distributed equally across the three choice options: sometimes it was the left option, sometimes the middle option, and sometimes the right option. Therefore, we believe that the 3AFC task did not provide clues to the location but required detailed and precise memory of the location. Moreover, the options were not randomly scattered but rather presented close together in the scene, demanding a high level of differentiation between choices.

      Taking all the above into consideration, we assert that precise object-location associative memory is necessary for a correct answer. We have added this information to the manuscript (p. 9).

      (7) Often p values are provided but test statistics, effect sizes, etc. are not - please include this information. It is at times hard to tell whether the authors are reporting main effects, interactions, pairwise comparisons, etc.

      Thank you for bringing this to our attention. We realize that including this information in the Tables may not be the most straightforward approach. Therefore, we have incorporated the test statistics, effect sizes, and related details into the text of the results section for clarity.

      (8) There are not enough methodological details in the main paper to make sense of the results. For example, it is not clear from reading the text that there are new object-location pairs learned each day.

      Thank you for pointing this out. We have added this information to the main manuscript. Additionally, we have emphasized this information in the text referring to Figure 1B.

      (9) The retrieval task does not seem to require retrieval of the scene itself, and as such it would be helpful for the authors to explain their reasoning for using this task to measure reinstatement. Strictly speaking, participants could just remember the location of the object on the screen. Was it verified that children and adults were recalling the actual scene rather than just the location (e.g. via self-report)? It's possible that there may be developmental differences in the tendency to reinstate the scene depending on e.g., their strategy.

      Thank you for highlighting this important point. Indeed, the retrieval task included explicit instructions for participants to recall and visualize the scene associated with the object presented during the fixation time window. Participants were also instructed to recollect the location of the object within the scene. Since the location was contextually bound to the scene and each object had a unique location in each scene, the location of the object was always embedded in the specific scene context. We have added this information to both the Methods and Results sections.

      In their self-reports (which unfortunately were not systematically collected on all occasions), participants indicated that recalling the stories they had created during strategic encoding aided their memory for the scene and location immensely. We also concur with your observation that children and young adults may differ in their ability to reinstate scenes, depending on the success of their employed recall strategies. This task was conducted with an awareness of potential developmental differences in the ability to form complex contextual memories. Our elaborative learning procedure was designed to minimize these differences. It is important to note, though, that we did not expect children to achieve performance levels fully comparable to adults. There may indeed be developmental differences in reinstatement, for example due to differences in knowledge availability and accessibility (Brod, Werkle-Bergner, & Shing, 2013). We think that these differences may underlie our findings of neural reinstatement. This is now discussed on pp. 34-35 and 39-43 of the manuscript.

      (10) In general I found the Introduction a bit difficult to follow. Below are a few specific questions I had.

      a. At points findings are presented but the broader picture or take-home point is not expressed directly. For example, lines 112-127, these findings can all be conceptualized within many theories of consolidation, and yet those overarching frameworks are not directly discussed (e.g., that memory traces go from being more reliant on the hippocampus to more on the neocortex). Making these connections directly would likely be helpful for many readers.

      Thank you for bringing this to our attention. We have incorporated a summary of the general frameworks of memory consolidation into the introduction. This addition outlines how our summarized findings, particularly those related to memory consolidation for repeatedly learned information, align with these frameworks (see lines 126-138, 146-150).

      b. Lines 143-153 - The comparison of the Tompary & Davachi (2017) paper with the Oedekoven et al. (2017) reads like the two analyses are directly comparable, but the authors were looking at different things. The Tompary paper is looking at organization (not reinstatement); while the Oedekoven et al. paper is measuring reinstatement (not organization). The authors should clarify how to reconcile these findings.

      Thank you for highlighting this aspect. We have revised how we present the results from Tompary & Davachi (2017). This study examined memory reorganization for memories both with and without overlapping features, and it observed higher neural similarity for memories with overlapping features over time. The authors also explored item-specific reinstatement for recent and remote memories by assessing encoding-retrieval similarity. Since Oedekoven et al. (2017) utilized a similar approach, their results are comparable in terms of reinstatement. We have updated and expanded our manuscript to clarify the parallels between these studies (see lines 157-162).

      c. Line 195-6: I was confused by the prediction of "stable involvement of HC over time" given the work reviewed in the Introduction that HC contribution to memory tends to decrease with consolidation. Please clarify or rephrase.

      Drawing on the Contextual Binding Theory (Yonelinas et al., 2019), as well as the Multiple Trace Theory (Nadel et al., 2000) and supported for instance by evidence from Sekeres et al. (2018), we hypothesized that detailed contextual memories formed through repeated and strategic learning would strengthen the specificity of these memories, resulting in consistent hippocampal involvement for successfully recalled contextualized detailed memories. We have included additional explanatory information in the manuscript to clarify this hypothesis (see lines 217-219).

      d. Lines 200-202: I was a bit confused about this prediction. Firstly, please clarify whether immediate reinstatement has been characterized in this way for kids versus adults. Secondly, don't adults retain gist more over long delays (with specific information getting lost), at least behaviourally? This prediction seems to go against that; please clarify.

      Thank you for raising this important point. Indeed, there are no prior studies that examined memory reinstatement over extended durations in children. The primary existing evidence suggests that neural specificity or patterns of neural representations in children can be robustly observed, while neural selectivity or univariate activation in response to the same stimuli tends to mature later (i.e., Fandakova et al., 2019). Bearing this in mind and recognizing that such neural patterns can be observed in both children and adults, we hypothesized that adults may form stronger detailed contextual memories compared to children. By employing strategies such as creating stories, adults might more easily recall scenes without the need to resort to forming generic or gist-like memories (for example, 'a red fox was near the second left pine tree in a spring green forest'). This assumption aligns with the Fuzzy Trace Theory (Reyna & Brainerd, 1995), which posits that verbatim memories can be created without the extraction of a gist.

      Conversely, we hypothesized that children, due to the ongoing maturation of associative and strategic memory components (as discussed in Shing et al., 2008 and 2010), which are dependent respectively on the hippocampus (HC) and the prefrontal cortex (PFC), would be less adept at creating, retaining, and extracting stories to aid their retrieval process. This could result in them remembering more generic integrated information, like the relationship between a fox and some generic image of a forest. We have added explanatory information to the manuscript to elucidate these points (see lines 225-230).

      Reviewer #1 (Recommendations For The Authors):

      (1) For Figure 3, I would highly recommend changing the aesthetics for the univariate data - at least on my screen they appear to be open boxes with solid vs. dashed lines, and as such look identical to the recent vs. remote distinction in Figure 2B. It also doesn't match the legend for me, which shows the age groups having purple vs. yellow coloring.

      Thank you for this observation. We have adjusted Figure 2 (now Figure 3) (please refer to p. 14) accordingly, now utilizing purple and yellow colors to distinguish between the age groups.

      (2) Lines 329-330, it is not true that "all" indices were significant from zero but this is only apparent if you read the next sentence. Please rephrase to clarify. e.g., "All ... indices with a few exceptions ... were significantly..."?

      Based on the above suggestions and considering our primary focus on time-related changes in scene-specific reinstatement, we will refrain from further interpreting the relative expression of individual scene-specific indices against 0. Consequently, we have removed this information from our analysis.

      (3) It is challenging to interpret some of the significance markers, such as those in Figure 3. For example what effects are being denoted by the asterisks and bars above vs. below the data on panel D? Please clarify and/or note in the legend.

      We have included a note in the legend to clarify the meaning of all significance markers. In addition, we decided to state any significant main and interaction effects in the figure rather than use significance markers.

      (4) For Figures 2 and 3, only the meaning of error bars is described in the caption. It is not explained in the caption what the boxes, lines, and points denote. Please clarify.

      Thank you for highlighting this. We have added explanations to the figure annotations for clarity. Please note that, in light of the other reviewers’ suggestions, figure plots may have been adjusted or changed, with corresponding adjustments to the explanations in the figure annotations.

      (5) How were recent and remote interspersed relative to one another? The text says that each run had 10 recent and 10 remote pairs, presented in a "pseudo-random order" - not clear what that (pseudo) means in this case. Please clarify.

      Thank you for raising this point. We provide this information in the Methods section “Materials and Procedure”: 'The jitters and the order of presentation for recent and remote items were determined using OptimizeXGUI (Spunt, 2016), following an exponential distribution (Dale, 1999). Ten unique recently learned pairs (from the same testing day) and ten unique remotely learned items (from Day 0) were distributed within each run (in total three runs) in the order as suggested by the software as the most optimal. There were three runs with unique sets of stimuli each resulting in thirty unique recent and thirty unique remote stimuli overall.'

      (6) Figure 1A, second to last screen on the learning cycles row - what would be presented to participants here, one of these three emojis? What does the sleepy face represent? I see some of these points were mentioned in the methods, but additional clarification in the caption would be helpful.

      Thank you for highlighting this. We have included this information in the figure caption. Specifically, the sleepy face symbol in the figure denotes a 'missed response'.

      (7) Not clear how the jittered fixation time between object presentation and scene test is dealt with in representational similarity analyses.

      Thank you for pointing this out. Beta estimates were obtained from a Least Squares Separate (LSS) regression model. Each event was modeled with its respective onset and duration and, as such, one beta value was estimated per event (with the lags between events differing from trial to trial). We have edited the corresponding section (see p. 53).
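      For readers unfamiliar with the approach, the core idea of LSS can be sketched as follows: a separate regression is fitted for every event, with one regressor for the event of interest and one regressor that pools all other events. The code below is a simplified illustration under stated assumptions, not the pipeline used in the manuscript. The function names are hypothetical, the per-trial regressors are assumed to be already built from each event's onset and duration (here plain boxcars without a haemodynamic response function), and nuisance regressors are reduced to a single intercept.

```python
def solve_least_squares(X, y):
    """Ordinary least squares via the normal equations (small designs only)."""
    n_cols = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n_cols)]
           for i in range(n_cols)]
    xty = [sum(row[i] * y_t for row, y_t in zip(X, y)) for i in range(n_cols)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    aug = [xtx[i] + [xty[i]] for i in range(n_cols)]
    for col in range(n_cols):
        pivot = max(range(col, n_cols), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n_cols):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v_r - factor * v_c for v_r, v_c in zip(aug[r], aug[col])]
    return [aug[i][n_cols] for i in range(n_cols)]


def lss_betas(trial_regressors, signal):
    """One beta per trial, each from its own model: target, all others, intercept."""
    n_time = len(signal)
    betas = []
    for k, target in enumerate(trial_regressors):
        # Pool every other trial into a single regressor.
        others = [sum(reg[t] for j, reg in enumerate(trial_regressors) if j != k)
                  for t in range(n_time)]
        design = [[target[t], others[t], 1.0] for t in range(n_time)]
        betas.append(solve_least_squares(design, signal)[0])
    return betas


def boxcar(onset, duration, n_time):
    """Regressor that is 1 during the event and 0 elsewhere."""
    return [1.0 if onset <= t < onset + duration else 0.0 for t in range(n_time)]


# Invented toy signal: three events with jittered onsets and different amplitudes.
n_time = 20
regressors = [boxcar(2, 2, n_time), boxcar(8, 2, n_time), boxcar(15, 2, n_time)]
true_amplitudes = [2.0, 0.5, 1.0]
signal = [10.0 + sum(a * reg[t] for a, reg in zip(true_amplitudes, regressors))
          for t in range(n_time)]
print(lss_betas(regressors, signal))
```

      Because every event enters its own model with its own onset and duration, variable lags between events are handled by the design itself rather than by any post hoc alignment.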

      (8) It was a little bit strange to have used anterior vs posterior HPC ROIs separately in univariate analysis but then combined them for multivariate. There are many empirical and theoretical motivations for looking at item-specific and category reinstatement in anterior and posterior HPC separately, so I was surprised not to see this. Please explain this reasoning.

      Thank you for pointing this out. We agree with the reviewer and included the anterior and posterior HC ROIs into the multivariate analysis. Please see the revised results section (pp. 13-15).

      (9) The term "neural specificity" is introduced (line 164) without explanation; please clarify.

      Thank you for bringing this to our attention. The term ‘neural specificity,’ as defined by Fandakova et al. (2019), refers to the distinctiveness of neural representations in the regions that process a given sensory input. We decided, however, to refrain from using this term and instead to use ‘neural representational distinctiveness’, which is more self-explanatory and was also introduced in the manuscript.

      (10) Age range is specified as 5-7 years initially (line 187) and then 6-7 years (line 188).

      We have corrected the age range in line 188 to '5 to 7 years.'

      Reviewer #2 (Public Reviews):

      Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children. Despite these strengths, there are quite a few important design and analytical choices that derail my enthusiasm for the paper. If the authors could address these concerns, this manuscript would provide a solid foundation to better understand memory consolidation in children.

      We thank the reviewer for both the positive and critical appraisal of our paper.

      Reviewer #2 (Recommendations For The Authors):

      (1) My greatest concern is the difference in memory accuracy that emerges as soon as immediate learning, which undermines the interpretation of any consolidation-related differences. This concern is two-fold. The authors utilize an adaptive learning approach in which participants learn to criteria or stop after 4 repetitions. This type of approach leads to children seeing the stimuli more often during learning compared to adults, which on its own could have consequences for consolidation-related neural markers. Specifically, theoretical and empirical work in adults shows that repeating information can actually lead to more gist-like representations, which is the exact profile the children are showing. While there could be a strength to this approach because it allows for equivalent memory, the decision to stop repetitions before criteria means that memory performance is significantly lower in the children, which again could have consequences for consolidation-related neural markers. First, the authors do not show any of the learning-related data which would be critical to assess the impact of this design choice. Second, there are likely differences in memory strength at the delay, making it extremely difficult to determine if the neural markers reflect development, worse memory strength, or both. This issue is compounded by the use of a 3-AFC paradigm, wherein "correct responses" included in the analysis could contain a significant amount of guessing responses. I think a partial solution to this problem is to analyze the RT data and include them in the analyses or use a drift-diffusion modeling approach to get more precise estimates of memory strength to control for this feature. An alternative is to sub-select participants in each group to have a sample matched on performance (including # of repetitions) and re-run all the analyses in this sub-sample. Without addressing these concerns it is near impossible to interpret the presented data.

      Thank you for highlighting this point.

      Firstly, we believe that our approach, involving strategic and repeated learning coupled with feedback, enhances the formation of detailed contextual memories. The retrieval procedure also emphasized the need for detailed memory for location. These are critical differences in experimental procedure from previous studies, which enhanced the importance of detailed representations and likely reduced the likelihood of forming gist-like memories.

      Indeed, we ceased further learning after the fourth repetition. Extensive piloting, in which we initially stopped after the seventh repetition, showed no improvement beyond the fourth repetition. In fact, performance tended to decline due to fatigue. Therefore, we limited the number of repetition cycles to the range in which an improvement in performance was still feasible. Even though children exhibited lower final learning performance overall, we believe our procedure allowed them to reach their maximal performance within the experimental setup.

      To address the reviewer’s concern, we included learning data to illustrate the progression of learning (see Fig. 1C, pp. 9-10 in Results).

      When interpreting the retention rates, it is important to note that we reported retention rates only for items that were correctly learned (100%) on day 0, day 1, and day 14. This approach meant that different participants had varying numbers of items learned correctly. However, this method enabled us to address our primary question: whether memory consolidation, based on all items initially encoded successfully, is comparably robust between the groups. To simultaneously examine the change in retention rate slopes over time for recent (30 minutes after learning), short delay (one night after) remote, and long delay (two weeks after) remote items, we conducted a separate analysis of retention rates for recent items on days 1 and 14. After observing no differences between sessions in both age groups, we combined the data for recent items. This allowed us to investigate how the slope of memory retention for initially correctly learned items (with a baseline of 100%) changes over time. We observed a significant interaction between item type (recent, short delay remote, long delay remote) and group. Analysis of this interaction revealed significantly less robust memory consolidation across all delay times for children compared to young adults. The figures have been adjusted accordingly to incorporate the baseline of 100% correct performance.
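      The retention-rate logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name `retention_rate` and the per-item dictionaries are hypothetical. Only items answered correctly at the end of learning enter the denominator, so every participant starts from a 100% baseline regardless of how many items they learned.

```python
def retention_rate(learned_correct, retrieved_correct):
    """Percent of initially learned items that are still correct at retrieval.

    learned_correct:   dict item_id -> bool (correct at the end of learning)
    retrieved_correct: dict item_id -> bool (correct at the retrieval session)
    Both layouts are hypothetical and chosen for illustration only.
    """
    # Keep only items that were learned correctly (the 100% baseline).
    learned = [item for item, ok in learned_correct.items() if ok]
    if not learned:
        return None  # no baseline items, so the rate is undefined
    # Count how many baseline items are still answered correctly.
    retained = sum(1 for item in learned if retrieved_correct.get(item, False))
    return 100.0 * retained / len(learned)


# Hypothetical participant: 3 of 4 items learned, 2 of those 3 retained.
learned = {"a": True, "b": True, "c": False, "d": True}
retrieved = {"a": True, "b": False, "c": True, "d": True}
rate = retention_rate(learned, retrieved)
```

      Item "c" is ignored even though it was answered correctly at retrieval, because it was never part of the learned baseline.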

      Following your suggestion, we also employed the drift diffusion model approach to characterize memory strength, calculating drift rate, boundary and non-decision time parameters. We added the results to the Supplementary Materials (section S2.1, Figure S1).
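      To make the three parameters concrete, the sketch below uses the closed-form EZ-diffusion equations, which map accuracy, RT variance, and mean RT onto drift rate, boundary separation, and non-decision time. This is an assumption made for illustration: the response does not state which fitting procedure was used, and EZ-diffusion is a simplified two-choice model, so it is not a direct description of the analysis applied to the 3-AFC data. The function name `ez_diffusion` and the example values are hypothetical.

```python
import math


def ez_diffusion(prop_correct, rt_variance, rt_mean, s=0.1):
    """Closed-form EZ-diffusion estimates (RTs in seconds).

    Returns (drift_rate, boundary_separation, non_decision_time).
    s is the conventional scaling parameter of the diffusion process.
    """
    if prop_correct in (0.0, 0.5, 1.0):
        # The logit is undefined at 0 and 1, and drift is zero at 0.5.
        raise ValueError("prop_correct must not be exactly 0, 0.5, or 1")
    logit = math.log(prop_correct / (1.0 - prop_correct))
    x = logit * (logit * prop_correct ** 2 - logit * prop_correct
                 + prop_correct - 0.5) / rt_variance
    # Drift rate: quality of the evidence driving the decision.
    drift = math.copysign(1.0, prop_correct - 0.5) * s * x ** 0.25
    # Boundary separation: amount of evidence required before responding.
    boundary = s ** 2 * logit / drift
    # Mean decision time implied by drift and boundary.
    y = -drift * boundary / s ** 2
    decision_time = (boundary / (2.0 * drift)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))
    # Non-decision time: encoding and motor components of the RT.
    non_decision = rt_mean - decision_time
    return drift, boundary, non_decision


# Hypothetical summary statistics for one participant and condition.
v, a, ter = ez_diffusion(prop_correct=0.802, rt_variance=0.112, rt_mean=0.723)
```

      With these inputs the estimates come out at roughly 0.10 for drift rate, 0.14 for boundary separation, and 0.30 s for non-decision time.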

      Generally, our findings indicate a lower overall drift rate in children when considering all items that had to be learned. We also observed that adults show a steeper decline in drift rate across the short and long delays, although their memory strength remained higher than that of children at both delays. Both age groups required a similar amount of evidence to make a decision, and this amount declined with delay, which may indicate an adaptation to weaker memory. Further, we observed a shorter non-decision time in children compared to adults, potentially suggesting less error checking, or less thorough processing and strategic memory access, in children.

      Overall, these results indicate weaker memory strength in children as a quantitative measure. This may nevertheless stem from the qualitatively different memory representations that children form, as our RSA findings suggest. We believe that our neural effect reflects the effect of interest (i.e., worse memory due to lower memory strength in children). Controlling for memory strength would therefore remove variance of interest from the neural data. For this reason, we refrain from including memory strength in the model. However, we include mean RT as an indicator of general response tendencies.

      Given that the paper is already very complex and long, we opted to add the diffusion model results to the Supplementary Materials (section S2.1, Fig. S1), while discussing the results in the discussion (p. 35).

      (2) More discussion of the behavioral task should be included in the results, in particular the nature of the adaptive learning paradigm including the behavioral results as well as the categorical nature of the memoranda. Without this information, it is difficult for the reader to understand what category-level versus item-level reinstatement reflects.

      Thank you for this valuable input. We have incorporated this information into the results section. Please refer to pp. 9-10, 12, 14, 21, 25-26 for the added details.

      (3) Some of the methods for the reinstatement analysis were unclear to me or warranted further adjustment. I believe the authors compared the scene against all other scenes. I believe it would be more appropriate to only compare this against scenes drawn from the same category as opposed to all scenes. Secondly, from my reading, it seems like the reinstatement was done during the scene presentation, rather than the object presentation in which they would retrieve the scene. I believe the reinstatement results would be much stronger if it was captured during the object presentation rather than the re-presentation of the scene. Or perhaps both sets of analyses should be included.

      We apologize for the confusion regarding the analysis method.

      During the review process we have improved the description of this analysis and hope it is easier to follow now. In short, we used both approaches (within and between categories) to suit different goals (i.e., measuring scene-reinstatement and gist-like reinstatement).

      Both types of reinstatement were assessed during the fixation cross to avoid confounds with the object itself being on the screen. We only used the scene window in one analysis (scene-reinstatement index) as a neural template to track its pre-activation during the fixation. So, as the reviewer suggests, our rationale is that the reinstatement indeed starts taking place at the short object presentation window, but importantly, extends to the fixation window. We added this clarifying information to the results section (see pp. 21-27).

      (4) For the univariate results, it was unclear to me when reading the results whether they were focusing on the object presentation portion of the trial or the scene presentation portion of the trial. Again, I think the claims of reinstatement related activity would be stronger if they accounted for the object presentation period.

      Thank you for pointing this out. Indeed, the univariate results were based on the object presentation time window. We added this information to the results section (Fig. 3, pp. 14, 16).

      (5) Further, given the univariate differences shown across age groups, the authors should re-run all analyses for the RSA controlling for mean activation within the ROI.

      Thank you for highlighting this. We re-ran all analyses for the RSA controlling for the mean activation within the ROI. The results remained unchanged. We have added this information to the results section, with further details in Tables S8 and S11 in the Supplementary Materials.
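      One common way to control a similarity measure for mean activation is to residualize it against the covariate before further testing. The sketch below illustrates that idea under stated assumptions: the response does not specify how the control was implemented (for example, as a covariate in a mixed model), and the function name `residualize` and the example values are hypothetical.

```python
from statistics import mean


def residualize(values, covariate):
    """Remove the linear effect of `covariate` from `values`.

    Fits an ordinary least squares line with intercept and returns the
    residuals, which are uncorrelated with the covariate by construction.
    """
    mx, my = mean(covariate), mean(values)
    sxx = sum((x - mx) ** 2 for x in covariate)
    if sxx == 0:
        raise ValueError("covariate has no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, values))
    slope = sxy / sxx
    return [(y - my) - slope * (x - mx) for x, y in zip(covariate, values)]


# Hypothetical per-participant values for one ROI.
similarity = [0.10, 0.15, 0.22, 0.18, 0.30]   # e.g., a reinstatement index
activation = [0.50, 0.70, 1.10, 0.90, 1.40]   # mean activation in the same ROI
adjusted = residualize(similarity, activation)
```

      Group comparisons run on `adjusted` then reflect pattern similarity that is not linearly explained by overall activation level.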

      (6) The authors should include explicit tests across groups for their brain-behavior analyses if they want to make any developmentally relevant interpretations of the data. Also, It would be helpful to include similar analyses to those using the univariate signals, and not just the RSA results.

      Following the reviewer’s suggestion, we included brain-behavior analyses for univariate data as well as RSA data with explicit tests across groups. These can be found in the Results section, pp. 18-20, 28-32. Due to the interdependence of predefined ROIs and to avoid running a high number of correlation tests, we employed partial least squares correlation analysis for this purpose. This approach focuses on multivariate links between specified Regions of Interest (ROIs) and fluctuations in memory performance over short and long delays across different age cohorts. We argue that this multivariate strategy offers a more comprehensive understanding of the relationships between brain metrics across various ROIs and memory performance, given their mutual dependence and connectivity (refer to Genon et al. (2022) for similar discussions).

      (7) There could be dramatic differences in memory processing across 5-7 year olds. I know the sample is a little small for this, but I would like to see regressions done within the middle childhood group in addition to the across-group comparisons.

      We have included information detailing the relationship between memory retention rate and age within the child group (refer to p. 13). In the child group, both recent and short delay remote memory improved with age. However, the retention rate for long-delayed memory did not show a significant improvement with increasing age in children.

      (8) I am concerned that the authors used global-signal as a regressor in their first-level analyses, given that there could be large changes in the amount of univariate activation that occurs across groups. This approach can lead to false positives and negatives that obscure localized differences. The authors should remove this term, and perhaps use the mean sum of the white matter or CSF to achieve the noise regressor they wanted to include.

      We understand the reviewer's concerns. However, we believe that our approach is recommended for the pediatric population. Specifically, Graff et al., 2021, found that global signal regression is a highly efficacious denoising technique in their study of 4 to 8-year-old children. This technique was previously suggested for adults by Ciric et al., 2017, and the benefits in terms of motion and physiological noise removal outweigh the potential costs of removing some signal of interest, as indicated by Behzadi et al., 2007. Additionally, we incorporated six anatomical component-based noise correction (CompCor) components to account for WM and CSF signals, as recommended in the pediatric literature.

      (9) The authors discuss the relationship between hippocampal reactivation and worse memory through the lens of Schapiro et al., but a new paper by Tanriverdi et al came out in JOCN recently that is more similar to the authors' findings.

      Thank you for highlighting the recent paper by Tanriverdi et al. in JOCN, which aligns closely with our findings. We appreciate the suggestion and agree that exploring this alignment could further enrich our discussion on the relationship between hippocampal reactivation and memory retention. We incorporated this work in our revised manuscript.

      Minor Comments

      - I was surprised that the authors did not see any differences in univariate signals for memory retrieval as a function of development, as much of the prior work has shown differences (for example work by Tracy Riggins). I believe this contrast should be highlighted in the discussion.

      - Given the robust differences in sleep patterns across childhood and the role of sleep in systems consolidation framework, I think this feature should be highlighted in either the introduction or discussion.

      - Could the authors report on differences (or lack of differences) in head motion across the groups, and if they are different whether they could include them as a confounding variable.

      I believe we included six motion parameters and their derivatives into the model

      Thank you for your comments.

      First, prior work on univariate signals of memory retrieval focused mostly on remembered vs forgotten contrasts, while in our study we focused on remote vs recent in short and long delay only for correctly remembered items. This can partially explain the results. We highlighted this information in the discussion section.

      Second, we agree with the reviewer that sleep patterns across childhood should be addressed. Therefore, we incorporated them in the discussion section.

      Third, head motion parameters were indeed included in the analysis as confounding variables, as adding them is highly recommended for the developmental population (e.g., Graff et al. 2021). As an example, we observed higher framewise displacement in children compared to adults, t(218) = -16, p < .001, as well as in translational y, t(288) = -2.33, p = .02.
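      For readers unfamiliar with the measure, framewise displacement summarizes frame-to-frame head movement from the six rigid-body motion parameters. The sketch below follows a commonly used definition (sum of absolute parameter changes, with rotations converted to millimetres on a sphere). It is an assumption for illustration: the response does not state which definition or software was used, and the function name, the 50 mm radius default, and the example frames are hypothetical.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Framewise displacement per frame.

    motion: sequence of frames, each (tx, ty, tz, rx, ry, rz), with
            translations in millimetres and rotations in radians.
    Rotations are converted to arc length on a sphere of `radius_mm`.
    The first frame has no predecessor and is assigned 0.0.
    """
    fd = [0.0]
    for prev, cur in zip(motion, motion[1:]):
        # Absolute change in the three translation parameters (mm).
        trans = sum(abs(c - p) for p, c in zip(prev[:3], cur[:3]))
        # Absolute change in the three rotation parameters (radians).
        rot = sum(abs(c - p) for p, c in zip(prev[3:], cur[3:]))
        fd.append(trans + radius_mm * rot)
    return fd


# Three hypothetical frames of realignment parameters.
frames = [
    (0.0, 0.0, 0.0, 0.00, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.00, 0.0, 0.0),
    (0.1, 0.2, 0.0, 0.01, 0.0, 0.0),
]
fd = framewise_displacement(frames)
```

      A per-participant mean of `fd` is the kind of summary value that can be compared between age groups or entered as a confound.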

      Reviewer #3 (Public Reviews):

      Summary:

      This study aimed to understand the neural correlates of memory recall over short (1-day) and long (14-days) intervals in children (5-7 years old) relative to young adults. The results show that children recall less than young adults and that this is accompanied by less activation (relative to young adults) in brain networks associated with memory retrieval.

      Strengths:

      This paper is one of few investigating long-term memory (multiple days) in a developmental population, an important gap in the field. Also, the authors apply a representational similarity analysis to understand how specific memories evolve over time. This analysis shows how the specificity of memories decreases over time in children relative to adults. This is an interesting finding.

      We thank the reviewer for the appraisal of our manuscript.

      Weaknesses:

      Overall, these results are consistent with what we already know: recall is worse in children relative to adults (e.g., Cycowicz et al., 2001) and children activate memory retrieval networks to a lesser extent than adults (Bauer et al, 2017).

      It seems that the reduced activation in memory recall networks is likely associated with less depth of memory encoding in children due to inattentiveness, reduced motivation, and documented differences in memory strategies. In regard to this, there was consideration of IQ, sex, and handedness but these were not included as covariates as they were not significant although I note p<.16 suggests there was some level of association nonetheless. Also, IQ is measured differently for the children and adults so it's not clear these can be directly contrasted. The authors suggest the instructed elaborative encoding strategy is effective for children and adults but the reference in support of this (Craik & Tulving, 1975) does not seem to support this point.

      Thank you for your review, and we appreciate your valuable feedback. Here are our responses and clarifications:

      Regarding the novelty of the results relative to the existing literature mentioned, we believe that, in contrast to Cycowicz et al. (2001), Bauer et al. (2017), and others, we assess not only immediate memory after encoding with semantic judgement of abstract associations, but also add to these findings by investigating consolidation-related changes in complex associative and contextual information in a much under-investigated sample of 5-to-7-year-old preschoolers. This also allows us to infer how children's neural representations change over time, providing valuable insights into knowledge formation in this developmental cohort.

      Thus, the observed age differences are of less primary importance than the time-related changes in mnemonic representations observed in children.

      Regarding the assumption of inattentiveness in children, we want to emphasize that the experimenter was present throughout the learning process, closely supervising the children. We observed prompt responses to every trial in children and noted an increase in accuracy over the encoding-learning cycles, leading us to conclude that the children were indeed attentive to the task. The observed accuracy improvement across learning cycles indicates an increase in remembered information. Furthermore, we took measures to ensure their engagement, including extensive training in both verbal and computerized versions to ensure that they understood and actively created stories to support their learning.

      We collected motivation data after each task execution in children, and the results indicated that they scored high in motivation. Children not only completed the tasks but also expressed their willingness to participate in subsequent appointments, highlighting their active involvement in the study.

      The observed differences in the efficiency of strategy utilization were expected, given developmental differences in the associative and strategic components of memory in children, as noted in prior research (Shing, 2008, 2010).

      We appreciate your point about IQ, sex, and handedness. These variables were indeed included in the behavioral models, and mean brain activation was also included in the brain data models, addressing the potential influence of these factors on our results.

      While it's true that we applied different tests to measure IQ in children and adults, these tests targeted comparable subtests that addressed similar cognitive constructs. As the final IQ values are standardized, we believe it is appropriate to compare them between the two groups.

      Lastly, we agree that the citation Craik & Tulving, 1975 supports the notion of effectiveness of instructed elaborative learning only in adults, but not in children. For this purpose, we added relevant literature for the child cohort (i.e., Pressley, 1982; Pressley et al., 1981; Shing et al., 2008).

      Reviewer #3 (Recommendations For The Authors):

      An additional point for the authors to consider is that the hypotheses were uncertain. The first is that prefrontal, parietal, cerebellar, occipital, and PHG brain regions would have greater activation over time in adults and not children - which is very imprecise as this is basically the whole brain. Moreover, brain imaging data may be in opposition to this prediction: e.g., the hippocampus has a delayed maturational pattern beyond 5-yrs (e.g., Canada 2019; Uematsu 2012) and some cortical data predicts earlier development in these regions.

      Thank you for your feedback, and we appreciate your insights regarding our hypotheses.

      The selection of our regions of interest (ROIs) was guided by prior literature that has demonstrated the interactive involvement of multiple brain areas in memory retrieval and consolidation processes. Additionally, our recent work utilizing multivariate partial least squares correlation analysis (Schommartz, 2022, Developmental Cognitive Neuroscience) has indicated that unique profiles derived from the structural integrity of multiple brain regions are differentially related to short and long-delay memory consolidation.

      Indeed, the literature suggests that the hippocampus may exhibit a more delayed maturational pattern extending into adolescence, as supported by studies such as Canada (2019) and Uematsu (2012), etc. We added this information as well as findings from the literature on cortical development to be more balanced in our review of the literature.

      Given this complexity, we believe it is important to emphasize in our discussion that both the medial temporal lobe, including the hippocampus, and cortical structures, as well as the cerebellum, undergo profound neural maturation. We highlight these nuances in our revised manuscript to provide a more comprehensive perspective on the developmental differences in memory retention over time.

      The writing was challenging to follow - consider as an example on page 9 the sentence that spans 10 lines of text.

      Thank you for bringing this to our attention. We have carefully reviewed the manuscript and have made efforts to streamline the text, ensuring that sentences are not overly long or complex to improve readability and comprehension.

      I found the analysis (and accompanying figures) a bit of a data mine - there are so many results that are hard to digest and in other cases highly redundant one from the other. This may be resolved in part by moving redundant findings to the supplemental. Some were hard to follow - so when there is a line between recent and recent data, that seems confusing to connect data that, I believe, are different sets of items. Later scatterplots (Fig 7) have pale yellow dots that I had a hard time seeing.

      Thank you for bringing up your concerns regarding the analysis and figures in our manuscript. We have carefully considered your feedback and made several improvements to address these issues.

      To alleviate the challenge of digesting numerous results, we have taken steps to enhance clarity and reduce redundancy. Specifically, we have moved some of the redundant findings to the supplementary sections, which should help streamline the main manuscript and make it more reader-friendly.

      Regarding the line between 'recent' and 'recent data,' the figures were redrawn for clarity. Furthermore, we have improved the visibility of certain elements, such as the pale-yellow dots in the scatterplots (Figs. 1, 2, 4, etc.), to ensure that readers can better discern the data points.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      […] 

      Weaknesses: 

      The question of the physiological relevance of short bouts of ischemia remains.

      The chemical ischemia protocol induces a duration-dependent ATP depletion in acute slices on a time scale of minutes (Pape and Rose 2023). This is about the same time scale as the peri-infarct depolarisation (Lauritzen et al. 2011) that the protocol attempts to model. Of course, such models do not completely replicate the complex situation in vivo. However, the presented analyses of synapse function cannot be performed in vivo. We discuss this now in the manuscript.

      The precise mechanisms underlying the shift between ischemia-induced long-term potentiation and long-term failure of synaptic responses were not addressed. Could this be cell death?

      Thank you for the comment. Yes, we indeed believe that the persistent failure of synaptic transmission is because of neuronal cell death (i.e., of CA1 pyramidal cells) or at least persistent depolarisation. We did not explicitly state that in the original submission but do so in the revised manuscript. It is supported by the unquantified observation of swelling and/or loss of integrity of CA1 pyramidal cell bodies in parallel to postsynaptic failure. It is also in line with many reports from the literature, of which we now cite two (lines 186-198).

      Sex differences are not addressed or considered.

      We have performed all experiments on male mice, as indicated in Material and Methods. We have indeed not addressed sex differences of the observed effects. We consider this, and many other important factors, to be interesting topics for follow-up studies. This is now discussed (lines 413-424).

      Reviewer #2 (Public Review): 

      […]

      Weaknesses: 

      The weaknesses are minor and only relate to the interpretation of some of the data regarding the presynaptic mechanisms causing the potentiation of release. The authors measured the fiber volley, which reflects the extracellular voltage of the compound action potential of the fiber bundle. The half-duration of the fiber volley was increased, which could be due to the action potential broadening of the individual axons but could also be due to differences in conduction velocity. We are therefore skeptical whether the conclusion of action potential broadening is justified.

      These are excellent points. We have added an analysis demonstrating that axonal conduction velocity is unlikely to be affected. Nonetheless, the fiber volley is indeed an indirect measure of what happens in individual axons. We have adjusted our interpretation accordingly and now also discuss alternative explanations of our findings (lines 363-379).

      Reviewer #3 (Public Review): 

      […]

      Weaknesses: 

      The data on fiber volley duration should be supported by more direct measurements to prove that chemical ischemia increases presynaptic Ca2+ influx due to a presynaptic broadening of action potentials. Given the influence that positioning of the stimulating and recording electrode can have on the fiber volley properties, I found this data insufficient to support the assumption of a relationship between increased iGluSnFR fluorescence, action potential broadening, and increased presynaptic Ca2+ levels.

      We have added a new analysis showing that the latency of the fiber volley is unaffected and relatively constant, which strengthens our conclusion. But the fiber volley is indeed an indirect measure of action potential firing in individual axons. The suggested experiment, which would require simultaneous recording of Ca2+ and action potentials in single axons in combination with chemical ischemia, is extremely difficult, if possible at all. Instead, we have extended the discussion and now include further alternative mechanistic explanations (lines 363-379).

      The results are obtained in an ex-vivo preparation, it would be interesting to assess if they could be replicated in vivo models of cerebral ischemia. 

      This would certainly be very interesting but also extremely challenging technically. For a detailed analysis of synaptic changes as presented here, the main difficulty will be to stimulate and visualise glutamate release exclusively in an isolated population of synapses while recording postsynaptic responses in a stroke model.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      […]

      Labelling of experimental groups of 2-minute and 5-minute chemical ischemia is more accurate than "metabolic stress" and "with postsynaptic failure". The critical difference between these two conditions is lost with this nomenclature. The reader could be misled to believe that the two groups form a heterogenous population of responses from the same experimental manipulation which is incorrect.

      We had stated in the manuscript that we ‘ … grouped combined iGluSnFR and electrophysiological recordings according to the effect of chemical ischemia on the synaptic response: ‘chemical ischemia with postsynaptic failure’ if the postsynaptic response did not recover to above 50% of the baseline level and ‘chemical ischemia’ when it did (as indicated in Fig. 1H). …’. The recordings were not grouped according to chemical stress duration but according to the effect on the postsynaptic response. We have revised the text explaining this (lines 125-135) and illustrate that now also in Fig. 1H. We hope this is easier to follow now.

      More details on the long-term impact of 5-minute ischemia on cell viability would be enlightening regarding the specific mechanism separating these two conditions. With 2 minutes it would appear that cells remain alive (i.e. intact post-synaptic responses), 5 minutes however, inducing cell death. 

      Yes, our observations, although not quantified, are in line with cell death as CA1 pyramidal cell bodies appeared swollen and/or lost their integrity when chemical ischemia was followed by postsynaptic failure. This is also in line with reports from the literature. We have revised the results section accordingly (lines 186-201).

      In the paragraph titled "glutamate uptake is unaffected after acute chemical ischemia", there are two erroneous citations of Figure S3 that should be Figure S4.

      Thank you. We corrected this mistake.

      The sex of animals is not given. This is essential information. 

      We used male mice as indicated in the initial version of the manuscript (Material and Methods). We have added a statement regarding the role of sex to the final section of the Discussion.

      Reviewer #2 (Recommendations For The Authors):

      We propose addressing the weaknesses mentioned in the public review. As said, the fibre volley is a very indirect measure of action potential broadening. Based on the iGluSnFR data, the authors predict that the potentiation is mediated by depolarization, action potential broadening, and increased presynaptic calcium influx. The latter could be tested experimentally, but this does not seem necessary if the data are interpreted more cautiously. For example, other explanations for the broadened fiber volley could be mentioned, such as a slowing and/or dispersion of the action potential propagation speed. Furthermore, depolarization could cause elevated resting calcium concentrations, which could potentiate release independently of action potential broadening. Finally, classical forms of presynaptic potentiation of the release machinery that occur during homeostatic plasticity or Hebbian plasticity may operate independently of calcium dynamics.

      Thank you for this comment. The discussion of the mechanism was indeed too short. We have added an analysis of the fiber volley delay after stimulation, which was not affected. Presynaptic action potential broadening is, in our opinion, a very likely explanation for our observations but we did not perform direct experiments. Directly recording presynaptic action potentials and Ca2+ transients in the chemical ischemia model over extended periods of time is a major technical challenge and certainly of interest in the future. As suggested, we have expanded the discussion section and now mention various alternative explanations (lines 363-379).

      There are the following minor suggestions:

      Add line numbers.

      We have added line numbers.

      We would suggest providing exact P values instead of asterisks in the figures. 

      We agree that having exact P values in the figure panels can be very helpful. However, in the present figures they are hard to integrate without overcrowding the already complex panels and thereby obscuring other important details. All p-values are included in the figure legends and/or main text.

      Abstract: "We also observed an unexpected hierarchy of vulnerability of the involved mechanisms and cell types." This sentence is hard to understand and cell types were not directly compared (i.e. axons of CA3 and axons of CA1 neurons were not compared).

      We have revised this statement and removed the reference to cell types.

      In Figure 1G there seems to be an increase in the fiber volley. Is this significant? Could this be due to swelling of the slice during chemical ischemia? Or an increase in excitability? Maybe this could be discussed. 

      The effect was analysed in the context of Fig. 2. A significant increase of the fiber volley amplitude was detected in chemical ischemia (Fig. 2H) but also under control conditions (Fig. 2F). We therefore consider this a change that is detectable but not related to chemical ischemia and not a potential explanation for increased glutamate release (lines 157-160). Also, no significant fiber volley increase was detected in chemical ischemia with postsynaptic failure (Fig. 2H) and in the experiments illustrated in Fig. 4E. Our interpretation is that the fiber volley increases nonspecifically in some experiments over the time course of the experiment (~ 60 min) but that this is unrelated to chemical ischemia.

      In the results: "A fully separate set of experiments..." Please explain better what this means. 

      We have revised the entire section to explain more clearly how recordings were grouped (lines 125-135).

      In the results: "...(Syková and Nicholson, 2008) (Figure S3). However, this was not observed for chemical ischemia without postsynaptic failure (Figure S3), in which the increased glutamate transients were observed." This should probably refer to Figure S4. 

      Thank you for spotting this mistake. We corrected it.

      The last sentence in the results "... most likely by increased presynaptic Ca2+ influx, and, at the same time, the postsynaptic response." This is difficult to understand. Does "at the same time" refer to another mechanism or the consequence of more Ca2+? 

      We revised this part of the results section to improve clarity and toned down our conclusions (lines 328-335 and 363-379).

      Reviewer #3 (Recommendations For The Authors): 

      There are a few points that the author needs to clarify: 

      The authors do not discuss the different behaviour of iGlu F0 during chemical ischemia and chemical ischemia with postsynaptic failure shown in Figure 2, panels D and E. In the first case, during the application of the solution to induce ischemia, iGluF0 decreases while in the other case, it strongly increases before falling down. In both cases, the fEPSP slope is decreased. How does the author explain this observation? 

      We attribute the transient increase of extracellular glutamate during prolonged chemical ischemia to the increase of synaptic glutamate release observed previously under such conditions (Hershkowitz et al. 1993; Tanaka et al. 1997) and other mechanisms reviewed by us (Passlick et al. 2021) (e.g., glial glutamate release, transiently reduced glutamate uptake), which we could not detect during shorter chemical ischemia. The initial drop of the fEPSP slope is most likely due to postsynaptic depolarisation, which is followed by a repolarisation if the chemical stress duration is short. We now explain this in more detail in lines 185-200 of the revised manuscript. Although we focussed on the bi-directional effect on longer timescales in this manuscript, this transient phase during chemical ischemia is very interesting for further investigations.

      On page 8, first line, I think that the authors meant Figure S4, not Figure S3 when they mentioned results on ECS diffusivity and ECS fraction. 

      Yes, thank you for spotting this. We corrected the mistake.

      In Supplementary Figure 5 panel B It seems that PPR is significantly reduced upon chemical ischemia (asterisk on columns green) but the authors claimed in the paper at page 10 that "Analysing the paired-pulse ratio (PPR) of postsynaptic response and iGluSnFR transients revealed no consistent changes after chemical ischemia (Figure S5).". Did the authors refer to the data normalized in panel D? In this case, I do not see the need to normalize raw data that have been already shown in a previous panel and that give different statistical results, probably due to the different tests used (paired in panel B and not paired in panel D). 

      We have clarified this point in the supplementary material (Figure S5, legend). There is a relevant difference between the analyses presented in panels B and D. The paired test presented in B analyses the change of the electrophysiological PPR in response to chemical ischemia. The test in D on the electrophysiological PPR asks whether the reduction in B is significantly different from the changes seen under control conditions. Because it is not, we conclude that chemical ischemia has no relevant effect on the electrophysiological PPR and, in combination with the results on the iGluSnFR PPR, also not on short-term plasticity, as tested here.

      References

      Hershkowitz N, Katchman AN, Veregge S. Site of synaptic depression during hypoxia: a patch-clamp analysis. Journal of Neurophysiology 69: 432–441, 1993.

      Lauritzen M, Dreier JP, Fabricius M, Hartings JA, Graf R, Strong AJ. Clinical Relevance of Cortical Spreading Depression in Neurological Disorders: Migraine, Malignant Stroke, Subarachnoid and Intracranial Hemorrhage, and Traumatic Brain Injury. J Cereb Blood Flow Metab 31: 17–35, 2011.

      Pape N, Rose CR. Activation of TRPV4 channels promotes the loss of cellular ATP in organotypic slices of the mouse neocortex exposed to chemical ischemia. The Journal of Physiology 601: 2975–2990, 2023.

      Passlick S, Rose CR, Petzold GC, Henneberger C. Disruption of Glutamate Transport and Homeostasis by Acute Metabolic Stress. Front Cell Neurosci 15: 637784, 2021.

      Tanaka E, Yamamoto S, Kudo Y, Mihara S, Higashi H. Mechanisms Underlying the Rapid Depolarization Produced by Deprivation of Oxygen and Glucose in Rat Hippocampal CA1 Neurons In Vitro. Journal of Neurophysiology 78: 891–902, 1997.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Review:

      Reviewer #1 (Public Review):

      In 'Systems analysis of miR-199a/b-5p and multiple miR-199a/b-5p targets during chondrogenesis', Patel et al. present a variety of analyses using different methodologies to investigate the importance of two miRNAs in regulating gene expression in a cellular model of cartilage development. They first re-analysed existing data to identify these miRNAs as one of the most dynamic across a chondrogenesis development time course. Next, they manipulated the expression of these miRNAs and showed that this affected the expression of various marker genes as expected. An RNA-seq experiment on these manipulations identified putative mRNA targets of the miRNAs which were also supported by bioinformatics predictions. These top hits were validated experimentally and, finally, a kinetic model was developed to demonstrate the relationship between the miRNAs and mRNAs studied throughout the paper.

      I am convinced that the novel relationships reported here between miR-199a/b-5p and target genes FZD6, ITGA3, and CAV1 are likely to be genuine. It is important for researchers working on this system and related diseases to know all the miRNA/mRNA relationships but, as the authors have already published work studying the most dynamic miRNA (miR-140-5p) in this biological system I was not convinced that this study of the second miRNA in their list provided a conceptual advance on their previous work.

      We believe this study is an enhancement on our previous work for two reasons, which have been alluded to in new text within the introduction. Firstly, our previous work used experimental and bioinformatic analysis to identify microRNAs with significant regulatory roles during chondrogenesis. This new manuscript additionally uses a systems biology approach to identify novel miRNA-mRNA interactions and capture these within an in silico model. Secondly, this work was initiated by the analysis of our previously generated data – using a novel tool we developed for this type of data (Bioconductor - TimiRGeN).

      I was also concerned with the lack of reporting of details of the manipulation experiments. The authors state that they have over-expressed miR-199a-5p (Figure 2A) and knocked down miR-199b-5p (Figure 2B) but they should have reported their proof that these experiments had worked as predicted, e.g. showing the qRT-PCR change in miRNA expression. Similarly, I was concerned that one miRNA was over-expressed while the other was knocked down - why did the authors not attempt to manipulate both miRNAs in both directions? Were they unable to achieve a significant change in miRNA expression or did these experiments not confirm the results reported in the manuscript?

      We agree with the reviewer that some additional data were needed to demonstrate the effective regulation of miR-199-5p. Hence, Supplementary Figure 1 is now included, which provides validation of the effects of miR-199a-5p overexpression (Supplementary Figure 1A) and inhibition of miR-199a/b-5p (Supplementary Figure 1B). Within the main manuscript, Figure 2B has been amended to include the consequences of inhibition of miR-199a-5p, with 2C showing the consequences of miR-199b-5p inhibition. Further, we include new data with regards to miR-199a/b-5p inhibition on CAV1 (Figure 4A).

      I had a number of issues with the way in which some of the data was presented. Table 1 only reported whether a specific pathway was significant or not for a given differential expression analysis but this concealed the extent of this enrichment or the level of statistical significance reported. Could it be redrawn to more similarly match the format of Figure 3A? The various shades of grey in Figure 2 and Figure 4 made it impossible to discriminate between treatments and therefore identify whether these data supported the conclusions made in the text. It also appeared that the same results were reported in Figure 3B and 3C and, indeed, Figure 3B was not referred to in the main text. Perhaps this figure could be made more concise by removing one of these two sets of panels.

      We agree with all points made here and have amended the manuscript accordingly. Figure 1A now shows pathway enrichment plots from the TimiRGeN R Bioconductor package, and the table which previously showed the pathways enriched at each time point is now in the supplementary materials (Supplementary Table 1). Figures 2 and 4 now use color instead of shades of grey. Figure 3C has been moved to the supplementary materials (Supplementary Figure 2) and is referenced in the text.

      Overall, while I think that this is an interesting and valuable paper, I think its findings are relatively limited to those interested in the role of miRNAs in this specific biomedical context.

      Reviewer #2 (Public Review):

      Summary:

      This study represents an ambitious endeavor to comprehensively analyze the role of miR199a/b-5p and its networks in cartilage formation. By conducting experiments that go beyond in vitro MSC differentiation models, more robust conclusions can be achieved.

      Strengths:

      This research investigates the role of miR-199a/b-5p during chondrogenesis using bioinformatics and in vitro experimental systems. The significance of miRNAs in chondrogenesis and OA is crucial, warranting further research, and this study contributes novel insights.

      Weaknesses:

      While miR-140 and miR-455 are used as controls, these miRNAs have been demonstrated to be more relevant to Cartilage Homeostasis than chondrogenesis itself. Their deficiency has been genetically proven to induce Osteoarthritis in mice. Therefore, the results of this study should be considered in comparison with these existing findings.

      We agree with the reviewer's comments. miR-455-null mice develop normally but miR-140-null (or mutated) mice and humans do have skeletal abnormalities (e.g. Nat Med. 2019 Apr;25(4):583-590. doi: 10.1038/s41591-019-0353-2), indicating a role in chondrogenesis. We have made an addition in the description to point towards the need to assess the roles miR-199a/b-5p may play during skeletogenesis and OA. We anticipate miR-199a/b-5p to be relevant in OA and have ongoing additional work for this – but this is beyond the scope of this manuscript.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Beyond the issues raised in the public review, I had a few minor recommendations that are largely designed to help improve the understanding of the manuscript as it is currently written.

      (1) Please provide the statistical tests used to obtain p-values in the Figure 2 and 4 legends.

      We have now added statistical test information to the figure legends of figures 2 and 4.

      (2) It is stated on p. 9 that both miRNAs may share a functional repertoire because 25 and 341 genes are intersected between their inhibition experiments. Please provide statistical support that this overlap is an enrichment over the null background in this experiment. Total DE genes – chi squared. Expected / Observed.

      A chi-squared test is now presented in the manuscript, which shows that the number of significant genes found in common between miR-199a-5p knockdown and miR-199b-5p knockdown was significantly greater than expected for day 0 or day 1 of the experiments.
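
      As a rough illustration of how an overlap test of this kind can be set up (a minimal sketch, not the analysis reported in the manuscript), the two knockdown gene lists can be arranged in a 2x2 contingency table and compared against the overlap expected by chance. The function name and all gene counts below are hypothetical placeholders; the background size in particular is an assumption and would need to be the number of genes actually tested.

```python
import math

def overlap_chi_squared(n_a, n_b, n_both, n_total):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction)
    for the overlap between two differentially expressed (DE) gene lists.

    n_a     -- DE genes in experiment A (hypothetical count)
    n_b     -- DE genes in experiment B (hypothetical count)
    n_both  -- genes DE in both experiments
    n_total -- background: all genes tested in both experiments (assumed)
    """
    # Observed 2x2 table: rows = DE in A (yes/no), columns = DE in B (yes/no)
    observed = [
        [n_both, n_a - n_both],
        [n_b - n_both, n_total - n_a - n_b + n_both],
    ]
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]

    # Expected counts under independence: row total * column total / grand total
    expected = [[row_sums[i] * col_sums[j] / n_total for j in range(2)]
                for i in range(2)]

    chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))

    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return expected[0][0], chi2, p_value

# Hypothetical example: 500 and 400 DE genes, 100 shared, 10,000 genes tested
expected_overlap, chi2, p_value = overlap_chi_squared(500, 400, 100, 10_000)
print(f"expected overlap = {expected_overlap:.1f}, chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

      The test itself is two-sided, so the direction of the effect (enrichment rather than depletion) is read off by comparing the observed overlap with the expected one.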

      (3) The final sentence on p. 12 (beginning 'Size of the points reflect...') seemed out of place - is it part of a legend?

      Thank you for pointing out this mistake - it was part of figure 3C and now is in the supplementary materials.

      (4) A sentence on p. 14 reads that 'FZD6 and ITGA3 levels increased significantly' but this should read decreased, rather than increased. Quite an important typo!

      Thank you for pointing this error out. It has been corrected.

      (5) Theoretical transcripts are mentioned in the legend of Figure 5A but these were not present in the figure. Please include these or remove them from the legend.

      This error has been removed from Figure 5A.

      (6) On p 20, the references 22 and 27 should I think be moved to earlier in the sentence (after 'miR-199a-5p-FZD6 has been predicted previously'). Currently, it reads as if these references support your luciferase assays which you claim are the first evidence for this target relationship.

      We agree with this change and have corrected the manuscript.

      (7) The reference to Figure 5D on p. 20 should be a reference to Figure 5C.

      Thank you for pointing this error out – this has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) The paper is based on the importance of miR-140 and miR-455 as miRNAs in chondrogenesis, citing only Barter, M. J. et al. Stem Cells 33, (2015). Considering the scope and results of this study, this citation is insufficient.

      We agree with this reviewer's comments. For many years miR-140 and miR-455 have been studied experimentally, and their importance in OA research has become apparent. We included additional references within the introduction to address this.

      (2) Analyzing chondrogenesis solely through differentiation experiments from MSCs is inadequate. It is essential to perform experiments involving the network within normal cartilage tissue and/or the generation of knockout mice to understand the precise role of miR199a/b-5p in chondrogenesis.

      We have added an additional paragraph in the discussion to state this, and do believe it is highly important that miR-199a/b-5p be tested in OA samples – however this would be beyond the intended scope of this article.

      (3) In light of the above points, it is imperative to investigate the role of miR-199a/b-5p beyond the in vitro differentiation model from MSCs, encompassing mouse OA models or human disease samples.

      In line with our previous response, we agree with the premise and believe additional experiments should be performed to gain more insight into the mechanism by which miR-199a/b-5p regulates OA. However, development of a new mouse line to investigate this is not within the scope of this manuscript.

    1. Author response:

      eLife assessment

      This important study describes the crystallographic screening of a number of small molecules against a viral enzyme critical for the 5' capping of SARS-CoV-2 RNA and viral replication. While the high-quality crystal structures and complementary biophysical assays in this study provide solid evidence to support the major claims regarding how these small molecule compounds bind to the viral enzyme, the mismatch between the antiviral activity and binding to the viral enzyme of several small molecule compounds could have been more thoroughly investigated or discussed. This paper would be of interest to the fields of coronavirus biology, structural biology, and drug discovery.

      We fully agree that the antiviral assay results could be placed in better context by clarifying that the antiviral effects of tubercidin and its derivatives are due to off-target effects.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript describes the crystallographic screening of a number of small molecules derived from the natural substrates S-adenosyl methionine (SAM) and adenine, against the SARS-CoV-2 2'-O-methyltransferase NSP16 in complex with its partner NSP10. High-quality structures for several of these are presented together with efforts to evaluate their potential biophysical binding and antiviral activities. The structures are of high quality and the data are well presented but do not yet show potency in biophysical binding. They only offer limited insights into the design of inhibitors of NSP16/10.

      Strengths:

      The main strengths of the study are the high quality of the structural data, and associated electron density maps making the structural data highly accurate and informative for future structure-based design. These results are clearly presented and communicated in the manuscript. Another strength is the authors' attempts to probe the binding of the identified fragments using biophysical assays. Although in general the outcome of these experiments shows negative data or very weak binding affinities the authors should be commended for attempting several techniques and showing the data clearly. This study is also useful as an example of the complexities associated with drug discovery on a bi-substrate target such as a methyltransferase. Several of the observed binding poses were unexpected, with compounds that are relatively similar to substrates binding in different parts of the active site or in other unexpected orientations. This serves as an example of how experimental structural information is still of crucial importance to structure-based drug design. In general, the claims in the manuscript are well supported by the data.

      Weaknesses:

      The main limitations of the study are that the new structures generated in the study are fairly limited in terms of chemical space being similar to either SAM or RNA-CAP analogues. It feels a little bit of a lost opportunity to expand this to more diverse ligands which may reveal potential inhibitors that are distinct from current methyltransferase inhibitors based on SAM analogues and truly allow a selective targeting of this important target.

      It is true that it makes sense to screen more diverse compounds to obtain a more diverse ligand set, and we hope our study motivates others to do so. Given the limited number of crystal structures of nsp10-16 with potential drug molecules, the aim of this study was to expand the database with new complex structures, providing a pool of complex structures for future compound designs with increased selectivity. Furthermore, some of the hits are known inhibitors of similar enzymes, and the most prominent and potent methyltransferase inhibitors, such as sinefungin and tubercidin, are structurally related to SAM. We think that knowing which SAM compounds or fragments of SAM are able to bind in the nsp10-16 active site is highly valuable for further specific and optimized inhibitor design.

      Another limitation is the potentially misleading nature of the antiviral assays. It is not possible to say if these compounds display on-target activity in these assays or even if the inhibition of NSP16/10 would have any effect in these assays. Whilst the authors do mention these points I think this should be emphasized more strongly.

      That is a very valid point, and we do not believe that the antiviral activity is based on on-target effects. We agree that the way it is currently presented can be considered misleading, and we will clarify this point in the revised version.

      Minor critical points:

      The authors state that their crystals and protein preps have co-purified SAM occupying the active site of the crystals. Presumably, this complicates the interpretation of electron density maps as many of the ligands share overlap with the existing SAM density making traditional analysis of difference maps challenging. The authors did not utilize the PanDDA analysis for this step, perhaps this is related to the presence of SAM in the ground state datasets? Also, occupancies are reported in the manuscript in some cases to two significant figures, this seems to be an overestimation of the ability of refinement to determine occupancy based on density alone and the authors should clarify how these figures were reached.

      We used PanDDA in parallel for hit finding. However, for this target we did not see any advantage over the hit-finding results from visual inspection. As mentioned, this is probably because SAM is present in the “ground state”, which complicates the PanDDA map calculations.

      Regarding the occupancies, we fully agree with this comment and will report them to a reasonable number of significant figures and clarify how the values were obtained.

      The molecular docking approach to pre-selection of library compounds to soak did not appear to be successful. Could the authors make any observations about the compounds selected by docking or the docking approach used that may explain this?

      Yes, this is a good point. Giving possible explanations for why the docking approach was not successful would facilitate similar approaches in future studies.

      Reviewer #2 (Public Review):

      Summary:

      The study by Kremling et al. describes a study of the nsp16-nsp10 methyl transferase from SARS CoV-2 protein which is aimed at identifying inhibitors by x-ray crystallography-based compound screening. A set of 234 compounds were screened resulting in a set of adenosine-containing compounds or analogues thereof that bind in the SAM site of nsp16-nsp10. The compound selection was mainly based on similarity to SAM and docking of commercially available libraries. The resulting structures are of good quality and clearly show the binding mode of the compounds. It is not surprising to find that these compounds bind in the SAM pocket since they are structurally very similar to portions of SAM. Nevertheless, the result is novel and may be inspirational for the future design of inhibitors. Following up on the crystallographic screen the identified compounds were tested for antiviral activity and binding to nsp16-nsp10. In addition, an analysis of similar binding sites was presented.

      Strengths:

      The crystallography is solid and the structures are of good quality. The compound binding constitutes a novel finding.

      Weaknesses:

      The major weakness is the mismatch between antiviral activity and binding to the target protein. Only one of the compounds could be demonstrated to bind to the nsp16-nsp10 protein. By performing a displacement experiment using ITC Sangivamycin is concluded to bind with a Kd > 1mM. However, the same compound displays antiviral activity with an EC50 of 0.01 microM. Even though the authors do not make specific claims that the antiviral effect is due to inhibition of nsp16-nsp10, it is implicit. If the data is included, it should state specifically that the effect is not likely due to nsp16-nsp10 inhibition.

      We believe that the antiviral data are valuable and should be published within this work. We also agree that it should be clearly stated that the antiviral effect is unlikely to be due to nsp10-16 inhibition, and we will revise the text accordingly.

      The structure of the paper and the language needs quite a lot of work to bring it to the expected quality.

      We will go through the manuscript again and further improve the structure and language as much as possible.

      Technical point:

      Refinement of crystallographic occupancies to single digit percentage is not normally supported by electron density.

      We agree with that point and will correct it in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Ewing sarcoma is an aggressive pediatric cancer driven by the EWS-FLI oncogene. Ewing sarcoma cells are addicted to this chimeric transcription factor, which represents a strong therapeutic vulnerability. Unfortunately, targeting EWS-FLI has proven to be very difficult, and a better understanding of how this chimeric transcription factor works is critical to achieving this goal. Towards this perspective, the group had previously identified a DBD-α4 helix (DBD) in FLI that appears to be necessary to mediate EWS-FLI transcriptomic activity. Here, the authors used multi-omic approaches, including CUT&tag, RNAseq, and MicroC to investigate the impact of this DBD domain. Importantly, these experiments were performed in the A673 Ewing sarcoma model where endogenous EWS-FLI was silenced, and EWS-FLI-DBD proficient or deficient isoforms were re-expressed (isogenic context). They found that the DBD domain is key to mediating EWS-FLI cis activity (at msat) and to generating the formation of specific TADs. Furthermore, cells expressing DBD-deficient EWS-FLI display very poor colony-forming capacity, highlighting that targeting this domain may lead to therapeutic perspectives.

      We thank Reviewer 1 for their strong summary of Ewing sarcoma background and accurate description of our experimental approaches and findings.

      Strengths:

      The group has strong expertise in Ewing sarcoma genetics and epigenetics and also in using and analyzing this model (Theisen et al., 2019; Boone et al., 2021; Showpnil et al., 2022).

      We thank the reviewer.  

      They aim at better understanding how EWS-FLI mediated its oncogenic activity, which is critical to eventually identifying novel therapies against this aggressive cancer.

      We are happy to see that our overall aim was also appreciated by Reviewer 1.

      They use the most recent state-of-the-art omics methods to investigate transcriptome, epigenetics, and genome conformation methods. In particular, Micro-C enables achieving up to 1kb resolved 3D chromatin structures, making it possible to investigate a large number of TADs and sub-TADs structures where EWS-FLI1 mediates its oncogenic activity.

      We thank Reviewer 1 for their acknowledgement of our approaches and the resolution achieved with our Micro-C experiments.  

      They performed all their experiments in an Ewing sarcoma genetic background (A673 cells) which circumvents bias from previously reported approaches when working in non-orthologous cell models using similar approaches.

      We agree with the reviewer about the importance of using model systems that accurately capture features of the disease being studied. As we have added an additional cell line in the revision we should note that this second model also represents a Ewing sarcoma genetic background while representing tumors expressing another oncogenic fusion found in this disease. 

      Weaknesses:

      The main weakness comes from the poor reproducibility of Micro-C data. Indeed, it appears that the distances/clustering observed between replicates are typically similar or even larger than between biological conditions. For instance, in Figure 1B, I do not see any clustering when considering DBD1, DBD2, DBD+1, DBD+2.

      Lanes 80-83: "KD replicates clustered together with DBD replicate 1 on both axes and with DBD replicate 2 on the y-axis. DBD+ replicates, on the other hand, clustered away from both KD and DBD replicates. These observations suggest that the global chromatin structure of DBD replicates is more similar to KD than DBD+ replicates."

      When replacing DBD replicate 1 with DBD replicate 2, their statement would not be true anymore.

      Additional replicates to clarify this aspect seem absolutely necessary since those data are paving the way for the entire manuscript.

      These are valid concerns and we thank the reviewers for highlighting this limitation of poor clustering of Micro-C replicates on the MDS plot. We account for this variability between replicates when identifying differentially interacting regions. By using an adjusted p-value < 0.05, we aim to ensure that, upon repeating the experiments, we would discover the same differentially interacting regions with a false discovery rate of 5%.
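
      For readers unfamiliar with adjusted p-values, the following is a minimal sketch of one common adjustment, the Benjamini-Hochberg procedure. The adjustment method is not stated in this response, so its use here is an assumption for illustration only, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value is scaled by m / rank (rank 1 = smallest p-value), and a
    running minimum taken from the largest rank downwards keeps the adjusted
    values monotone in the raw p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four candidate regions
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
print(adj, significant)
```

      Calling regions with an adjusted p-value below 0.05 controls the expected proportion of false positives among the called regions at 5%, under the assumptions of the procedure.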

      We would also like to note that the replicates cluster much more closely on the PCA plot of RNA-seq data (Supplementary Figure 1C) as well as on the PCA plot of H3K27ac CUT&Tag data (Figure 4A). Notably, the RNA-seq result has now been reproduced when performed by different sets of hands across multiple studies (Boone et al., 2021 and this report), as well as in a second cell line (as reported in this manuscript revision). These observations suggest that the cells of these replicates are functionally similar to each other at a population level. Chromatin organization detected by Micro-C is highly heterogeneous among the cells of a population (Misteli et al., 2020). Moreover, despite the increased resolution of Micro-C over Hi-C, the conventional sequencing depth at which Micro-C is performed makes resolving finer-scale 3D interactions, particularly between enhancers and promoters, challenging (Goel et al., 2023). Thus, biologically relevant interactions driving EWSR1::ETS transcriptional regulation through de novo enhancers may have relatively weak signal in Micro-C. Both the strength of the signal and the heterogeneous chromatin state present in bulk samples could affect the average signal, leading to poor clustering of replicates (Hafner and Boettiger, 2022).

      Importantly, rather than add an additional replicate of a single cell line, we repeated our study in an additional cell line, TTC466, and largely reproduced our high-level findings for transcription, enhancer formation, and 3D chromatin. Specific limitations of the TTC466 study are addressed in the Discussion section (392-420). The reproduction of weak/moderate clustering in the MDS plot in both A673 and TTC466 cell lines suggests that the α4 helix of EWSR1::ETS fusions is important for reshaping 3D chromatin. However, higher-resolution analyses focused on specific EWSR1::ETS-bound loci are likely an important area of future study required to fully understand the role of the α4 helix in chromatin regulation in Ewing sarcoma.

      Similarly:

      - In Figure 1C, how would the result look when comparing DBD2/KD2/DBD+2? Same when comparing DBD 1 with KD1 and DBD+1. Would the difference go in the same direction?

      This is a great point. We added distance decay plots of individual replicates in Supplementary Figure 2 and added discussion of these results in lines 88-89 of the text.

      - Figure 1D-E. How would these plots look like when comparing each replicate to each other's? How much difference would be observed when comparing, for instance, DBD1/DBD2 ? or DBD1/DBD+1?

      Unfortunately, separate replicates are required to conduct Differentially Interacting Region analysis as it determines statistically significant interactions. Therefore, we are unable to plot these analyses with individual replicates. 

      - Figure 2: again, how would these analyses look like when performing the analysis with only DBD1/DBD+1/KD1 or DBD2/DBD+2/KD?

      This is a good suggestion. Such an analysis is possible. However, we would lose resolution to the extent that we might not accurately detect TADs, especially smaller TADs. Therefore, we decided to combine the biological replicates.

      Another major question is the stability of EWS-FLI DBD vs EWS-FLI DBD+ proteins. In the WB, FLAG intensities seem also higher (2/3 replicates) in DBD+ condition compared to the DBD condition (Figure S1B).

      This is a valid concern with the shRNA knock-down/rescue system, and we regularly validate new constructs to ensure that they have expression levels similar to rescue with the wildtype fusion before proceeding to more exhaustive experimental workups. We would note that while we have not tested for differences in protein stability, for these constructs we largely see similar expression levels across multiple experiments, multiple cell lines, and multiple sets of hands. There may be some variation in expression level from experiment to experiment, but western blotting is a semiquantitative assay and it is not possible to rule out that slight differences in band intensity result from error in gel loading. For this reason, alongside western blotting for construct expression, we also validate construct function using RNA-seq and colony formation assays (as reported in this manuscript), and these show good agreement across biological replicates.

      Indeed, it seems that they have more FLAG (i.e., EWS-FLI) peaks in the DBD+ condition compared to the DBD condition (Figure 2B). 

      We appreciate the comment, since the legend of Figure 2B led to a misunderstanding. Figure 2B depicts the number of TADs detected in the DBD and DBD+ conditions (height of the bars) and, on the y-axis, the proportion of those TADs that overlap FLAG peaks, CTCF peaks, both, or neither. The number of FLAG peaks is actually lower in DBD+ than in DBD, as shown in Figure 5A-B. We clarified the Figure 2 legend to accurately describe the proportions (color-coded sections) of TADs bound by DBD/DBD+ FLAG and CTCF.

      Would it be possible that DBD+ is just more expressed or more stable than DBD? The higher stability of the re-expressed DBD+ could also partially explain their results independently of the 3D conformational change. In other words, can they exclude that DBD+ and DBD binding are not related to their respective protein stability or their global re-expression levels?

      It is possible that the DBD+ protein is more highly expressed or more stable than DBD. With our current data, we cannot conclusively exclude the possibility that binding by DBD and DBD+ is related to their expression level or stability. We would note, as above, that western blots, RNA-seq, and agar assays have largely reproduced across experiments, hands, and cell lines, and that western blotting is an imperfect assay for assessing protein stability.

      Surprisingly, WB FLI bands in DBD+ conditions are systematically (3/3 replicates) fainter than in DBD conditions (Figure S1B). How do the authors explain these opposite results between FLI and FLAG in the WB?

      This is an excellent observation that highlights one of the intricacies of studying EWSR1::FLI1 in our KD/rescue system. Often the limiting factor for an experiment is whether or not the KD condition maintains KD through a second viral transduction for rescue and selection. We have observed over many years of working with this system that rescue conditions which are fully functional (i.e. wildtype EWSR1::FLI1, DBD+, etc.) tend to maintain better KD of endogenous EWSR1::FLI1. Constructs that don’t rescue EWSR1::FLI1 function sometimes maintain KD to a lesser degree, though frequently to a functional degree (i.e. cells are not transformed and EWSR1::FLI1 transcriptional regulation is not rescued). We suspect that this observation, also raised by Reviewer 1, results from selection for cells in which more endogenous EWSR1::FLI1 escapes KD in the DBD condition, owing to selective pressures during expansion in tissue culture.

      We should note that the antibody used for detecting FLI recognizes residues that are deleted in DBD and DBD+ constructs, such that the FLI1 blot in Supplementary Figure 1B does not detect either construct. It only detects endogenous EWSR1::FLI1 and the 3X-FLAG-EWSR1::FLI1 construct in the middle lane that runs at a slightly higher molecular weight. The FLAG antibody is the only antibody that detects all three rescue constructs.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Bayanjargal et al. entitled "The DBD-alpha4 helix of EWS::FLI is required for GGAA microsatellite binding that underlies genome regulation in Ewing sarcoma" reports on the critical role of a small alpha helix in the DNA binding domain (DBD) of the FLI1 portion of EWS::FLI1 that is critical for binding to repetitive stretches of GGAA-motifs, i.e. GGAA microsatellites, which serve as potent neoenhancers in Ewing sarcoma.

      We thank Reviewer 2 for their succinct and accurate summary of our manuscript. 

      Strengths:

      The paper is generally well-written and easy to follow, and the data presented are of high quality, well-described, and underpin the conclusions of the authors. The report sheds new light on how EWS::FLI1 mechanistically binds to and activates GGAA microsatellite enhancers, which is of importance to the field.

      We appreciate the reviewer’s assessment of our work. 

      Weaknesses:

      While there are no major weaknesses in this paper, there are a few minor issues that the authors may wish to address before publication:

      (1) While the official protein symbol for the gene EWSR1 is indeed EWS, the protein symbol for the gene FLI1 is identical, i.e. FLI1. The authors name the fusion oncoprotein EWS::FLI (even in the title) but it appears more adequate to use EWS::FLI1.

      We thank the reviewer for bringing this to our attention. Indeed, the most recent guideline for fusion protein nomenclature is to use the full gene symbols separated by double colons. Therefore, the accurate nomenclature is EWSR1::FLI1. We replaced instances of EWS::FLI with EWSR1::FLI1 and have used the EWSR1::ERG nomenclature in our revised manuscript.

      (2) The used cell lines should be spelled according to their official nomenclature (e.g. A-673 instead of A673).

      Corrected, thanks!

      (3) It appears as if the vast majority of results were generated in a single Ewing sarcoma cell line (A-673) which is an atypical Ewing sarcoma cell line harboring an activating BRAF mutation and may be genomically quite unstable as compared to other Ewing sarcoma cell lines (Kasan et al. 2023 preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2023.11.20.567802v1). Hence, it may be supportive for the paper to recapitulate/cross-validate a few key results in other Ewing sarcoma cell lines, e.g. by using EWS::ERG-positive cell lines. Perhaps the authors could make use of available published data.

      We thank Reviewer 2 for this helpful comment. We replicated the experiments in TTC-466 cells, which contain the EWSR1::ERG fusion, and found that, as in A-673 cells, the DBD-α4 helix is important for transcriptional, enhancer, and 3D chromatin regulation (Supplementary Figures 9-18).

      (4) Figure 6 and Supplementary Figure 5 are very interesting but focus on two selected target genes of the fusion (FCGRT and CCND1). It would be interesting to see whether these findings also extend to common EWS::ETS transcriptional signatures that have been reported. The authors could explore their data and map established consensus EWS::ETS signatures to investigate which other hubs might be affected at relevant target genes.

      We expanded our analysis to other genes shown to be regulated by EWSR1::FLI1-nucleated transcriptional hubs (Chong, et. al., 2018) and included the NKX2-2 and GSTM4 gene regions in Supplementary Figures 7-8 for A-673 cells. We also investigated the same gene regions (FCGRT, CCND1, NKX2-2, GSTM4) in TTC466 cells and report them in Supplementary Figures 14-17. For the sake of brevity, we decided to include only the above examples. We may need to develop different tools for further analysis of the gene regulatory networks driven by DBD and DBD+ in relation to hub formation. Although mapping such a network is a great suggestion, it may be outside the scope of this manuscript. We thank the reviewer for bringing such a good point to our attention.

      (5) Table 1 is a bit hard to read. In my opinion, it is not necessary to display P-values with up to 8 decimal positions. The gene symbols should be displayed in italic font.

      Suggestions are adopted, thanks!

      Reviewing Editor (Recommendations For The Authors):

      We would draw the authors' attention to the following issues that would best benefit from additional revision.

      As indicated by Referee 1, an important issue concerns the apparent poor reproducibility of Micro-C data. In Figure 1B, the clustering of the DBD1, DBD2, DBD+1, and DBD+2 is poor.

      It appears that the distances/clustering observed between replicates are typically similar or even larger than between biological conditions. Lines 80-83: "KD replicates clustered together with DBD replicate 1 on both axes and with DBD replicate 2 on the y-axis. DBD+ replicates, on the other hand, clustered away from both KD and DBD replicates." If one replaced DBD replicate 1 with DBD replicate 2, this statement would no longer be true. The referees believe that it is important to fully account for these potential discrepancies. Most of the study is based on analyses of these data sets, so if there are issues with them it has repercussions on the entire study. We note however that in Figure 4A the clustering of the H3K27ac data is much more convincing. The referees also feel that it is important to show immunoblots of the expression of DBD and DBD+ levels in the experiments performed here. While this was previously shown in the Boone et al publication in 2021, it could be illustrated again here.

      We thank the editors for concisely summarizing the main weaknesses of the paper and underscoring the importance of the Micro-C data in the rest of the paper. While the Editors note tighter clustering of the H3K27ac data (Figure 4A), we would like to note that the replicates cluster much more closely on the PCA plot of RNA-seq data (Supplementary Figure 1C). Notably, the RNA-seq result has now been reproduced by different sets of hands across multiple studies (Boone, et. al., 2021 and this report), as well as in a second cell line (as reported in this manuscript revision). Though not as tight, the H3K27ac CUT&Tag also reproduces in TTC466 cells. Thus, we interpret these findings to indicate that our replicates are functionally similar to each other. As discussed above in the response to Reviewer 1 in more detail, there are several factors that could affect how these functional similarities are represented in Micro-C data. Micro-C is ultimately a readout of the chromatin organization in a heterogeneous population of cells (Misteli et al., 2020). Additionally, sequencing depth limitations in conventional Micro-C experiments limit the ability to faithfully assess the enhancer-promoter interactions that may be relevant for our model system (Goel, et. al., 2023). Thus, both the strength of the biologically relevant signal and the heterogeneous chromatin state present in bulk samples could affect the average signal and lead to poorly clustering replicates (Hafner and Boettiger, 2022).

      To address these important concerns about rigor and reproducibility of the analyses, we repeated our study in an additional cell line, TTC466, and largely reproduced our high-level findings for transcription, enhancer formation, and 3D chromatin. These additional studies were not without their own limitations, which are addressed in the Discussion section (lines 392-420). The reproduction of weak/moderate clustering in the MDS plot in both the A-673 and TTC466 cell lines suggests that the α4 helix of EWSR1::ETS fusions is important for reshaping 3D chromatin. However, additional genomic analyses geared toward higher resolution at specific EWSR1::ETS-bound loci are likely an important area of future study required to fully understand the role of the α4 helix in chromatin regulation in Ewing sarcoma. Live cell imaging, as performed by Chong, et. al., 2018, and additional biochemical techniques may also be informative but are beyond the scope of this report.

      With regards to concerns about construct expression, we have included immunoblots of the rescue constructs in both cell lines (Supplementary Figure 1B and 9A) and discussed Reviewer 1’s specific concerns in detail above.  

      The referees also raise the issue of using an additional cell line to make a more general message. Although it would perhaps be asking too much to repeat the MicroC experiments, consolidation of the observations could be performed by focusing on specific loci such as FCGRT and CCND1 that were analyzed in this study. Could the authors use 4C-type experiments to reproduce the conclusions in an additional cell line? It would also be pertinent to consolidate the findings at these loci by 4C-type approaches even in the cell line used here. For the moment, all conclusions are based on the same set of data and a single technical approach.

      We repeated the experiments in TTC466 cells and analyzed the data using the same cut-offs as in A-673 cells, which allows us to compare the two cell lines. We hope this new set of experiments and analyses addresses the reviewers’ concerns.

      Reviewer #1 (Recommendations For The Authors):

      All the data are performed in A673 cells. Knowing the transcriptomic and epigenetic heterogeneity of Ewing sarcoma cells, some of the experiments supporting their findings should be replicated in at least another Ewing sarcoma model.

      Per our discussion above, we have replicated our experiments in an additional cell line model of Ewing sarcoma. Importantly, the TTC466 cell line used expresses the EWSR1::ERG fusion found in 10-15% of Ewing sarcoma cases.  

      Supplementary Figure 2B. Proportion of TAD boundaries bound by FLAG (i.e., EWS-FLI1) and CTCF. The number/proportion of FLAG (i.e., EWS-FLI) peaks observed at CTCF peak/TAD boundaries seems unexpectedly high. How do they explain this result since EWS-FLI peaks are rather intra-TAD to mediate their enhancer function?

      In our previous study, we showed that EWSR1::FLI1 binding can be detected at boundaries of TADs (Showpnil, et. al., 2022). We think therefore it is likely that EWSR1::FLI1 binding is able to mediate enhancer function both inside TADs as well as at the borders of TADs and may, in some cases, function as an insulator between TADs.  

      For the >50kb loop analysis, what was the low-range threshold? Up to 15-20 kb, contact frequency interactions may be caused by PFA crosslink (did they use a 5kb threshold?). Were those excluded from that analysis?

      We acknowledge that we did not use a lower threshold to exclude those short-range loop interactions. In our previous study, we observed that EWSR1::FLI1 binding reduces long-range interactions in favor of short-range interactions (Showpnil, et. al., 2022) and wanted to be able to capture short-range loops in our analysis.  

      In Figure 2D, they observed that within TADs containing FLAG peaks at GGAA microsatellites, the intensity of the DBD+ FLAG peaks was higher compared to DBD FLAG peaks. How would this analysis look when considering the ETS FLAG peaks (i.e., EWS-FLI rather repressive peaks)? Could they compare TAD with GGAA msat vs TAD with ETS peaks?

      We agree that this is an interesting observation. In our prior analyses we found no discernible relationship between EWSR1::FLI1 binding and changes in 3D chromatin associated with repression (Showpnil, et. al., Nucleic Acids Research, 2022). In contrast, EWSR1::FLI1-bound superenhancers had greater H3K27ac deposition when overlapping both a bound GGAA repeat and a non-microsatellite site. While there have been several additional reports about the relevance of EWSR1::FLI1 binding at non-microsatellite peaks, motifs at these loci have not yet been rigorously defined as GGAA repeats were by Johnson, et. al. in PLoS One, 2017. Each ETS factor binds different motifs containing the core 5’-GGAA-3’ with varying affinities depending on the flanking residues. There may be >100-fold difference in sequence-specific binding affinity for “high” vs. “low” affinity motifs. Better defining the types of ETS motifs bound by EWSR1::FLI1 and the functional changes associated with them thus represents an interesting area of future study.

      Figure 1F: What is the biological meaning of these results (29.7, 39.5, and 54Mbp)? These distances are typically the size of a chromosome arm and clearly beyond classical chromatin loop/TAD structures in which EWS-FLI mediates its cis-activity.

      We agree with the referee here. This panel has been removed from our revised manuscript.

      How do DBD, KD, and DBD+ conditions compare with WT parental cells in the omics data? (Figures 1B, 4A). Do DBD+ conditions overlap with WT conditions? It would be nice to have these analyses also for Micro-C and Cut&Tag data. To be acknowledged here, the transcriptome data showing this aspect in Figure S1C are very convincing.

      This is a fair point. We were not able to obtain a sequencing depth for the wtEF Micro-C libraries similar to that of the KD, DBD, and DBD+ libraries, because a disproportionate share of the wtEF libraries was used in troubleshooting. Therefore, we decided to exclude the wtEF condition from these analyses.

      EWS-FLI cis-regulation at CCND1 also occurs through a much closer EWS-FLI peak (~-20kb msat upstream of CCND1 TSS) which was not taken into consideration. EWS-FLI peak intensity in both DBD and DBD+ at this msat seems similar. How would this fit into their model?

      The referee is correct. The closest peak upstream of the CCND1 TSS is ~19kb away. We highlighted this peak with the dashed boxes near the CCND1 TSS (Supplementary Figure 6). The peak intensity of DBD+ FLAG is slightly higher than that of DBD. Nonetheless, we acknowledge that the difference is small. We suspect that the DBD-α4 helix affects binding dynamics at GGAA repeats, but these genomics approaches are not well suited to detecting small, but significant, changes in binding affinity or dynamics. In this case a more biochemical approach may be needed. Even though both proteins can still bind the same microsatellites, they might differ in the stability of their binding or in the recruitment of additional proteins. These possibilities are discussed in the Discussion section (lines 444-463).

      For the Micro-C, they sequenced only 7 to 8 million reads per condition. This coverage seems particularly low, especially for their analyses using 1-5kb bins. How does this compare with other published Micro-C data? Can this explain the variability observed between replicates?

      We apologize for the inconsistent wording about sequencing coverage, which may have caused confusion. The 7 to 8 million reads were used for shallow sequencing and QC analysis. Once a sample passed QC, we then sequenced 300 million reads per sample. "300M" has been changed to "300 million" at line 598 to prevent misunderstanding.
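
      To give a rough sense of scale for the two depths mentioned above, the following sketch estimates the average number of reads per genomic bin at 1 kb and 5 kb resolution. It is an illustration only, not part of the analysis in the manuscript: the genome length of 3.1 Gb is an approximation, reads are assumed to be spread uniformly, and the function name `mean_reads_per_bin` is hypothetical. It also ignores mappability, duplicates, and the fact that Micro-C contacts populate a two-dimensional matrix.

```python
# Back-of-the-envelope read density per bin (illustrative assumptions only).
GENOME_SIZE_BP = 3.1e9  # assumption: approximate length of the human genome


def mean_reads_per_bin(total_reads, bin_size_bp, genome_size_bp=GENOME_SIZE_BP):
    """Average reads per 1D bin if reads were spread uniformly over the genome."""
    n_bins = genome_size_bp / bin_size_bp
    return total_reads / n_bins


if __name__ == "__main__":
    depths = [("shallow QC run", 8_000_000), ("full-depth run", 300_000_000)]
    for label, reads in depths:
        for bin_size in (1_000, 5_000):
            density = mean_reads_per_bin(reads, bin_size)
            print(f"{label}: {bin_size} bp bins -> {density:.1f} reads per bin")
```

      Under these assumptions the full-depth run yields roughly 37 times more reads per bin than the shallow QC run at any bin size, simply because 300 million is about 37 times 8 million.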

      They mention:

      "In our recent studies of EWS::FLI, we found a small alpha helix in the DNA binding domain, DBD-α4, to be required for transcription and regulation by the fusion protein (Boone et al., 2021). Interestingly, this study did not find any change in chromatin accessibility (ATAC-Seq) and genome localization of EWS::FLI constructs (CUT&RUN) when DBD-α4 helix was deleted leaving the mechanistic basis for the requirement of DBD-α4 in transcription regulation unclear."

      And

      "To assay the enhancer landscape, we collected H3K27ac CUT&Tag data from KD, DBD, and DBD+ cells. Principal component analysis of H3K27ac localization shows that the DBD replicates were clustered closer to the KD replicates while being in between the KD and the DBD+ replicates (Figure 4A), suggesting that DBD-α4 helix is required to reshape the enhancer landscape."

      But now H3K27ac CUT&Tag show strong differences which were not observed in ATAC seq. How to explain this discrepancy?

      Though both H3K27ac and ATAC signal are associated with enhancers and promoters in euchromatin, they do not measure exactly the same thing. H3K4me2 is a mark more closely associated with ATAC signal than H3K27ac is (Henikoff, et. al., 2020). Nonetheless, there are clear differences between the prior publication (Boone, et. al., 2021) and this work: ATAC signal was similar across conditions, whereas H3K27ac differs. We suspect this may reflect a tighter association of EWSR1::FLI1-mediated genome regulation with H3K27ac than with ATAC signal. Notably, there were very few differentially accessible regions between EWSR1::FLI1-depleted cells and conditions with EWSR1::FLI1 expression (either endogenous or wildtype rescue) using the A673 KD/Rescue system in Boone, et. al., 2021. In contrast, other A673 KD-rescue studies have reported differences in H3K27ac in EWSR1::FLI1-expressing conditions relative to EWSR1::FLI1-depleted conditions (Theisen, et. al., 2021).

      The authors mention:

      "Our study thus uncovered a surprising role for FLI DBD in the process of hub formation which is usually attributed to the EWS low complexity domain."

      Not sure this can be claimed, hubs are composed of many other factors that are not investigated here. Furthermore, promoter enhancer hubs/loops often include combined ETS and mSat chains to generate transcriptional hubs which have not been considered here. None of these points were discussed here.

      We replaced “uncovered” with “suggest” in our revised manuscript at line 476.  

      What are the barcode patterns in Supp 5, are those frequently observed in their Micro-C data, likely mapping artifacts, do they have any impact on their analyses?

      The barcode patterns in what is now Supplementary Figure 6 are blind spots in the hg19 genome assembly. Since they are few in number, we don’t expect these blind spots to impact our analysis.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2024-02516

      Corresponding author(s): Christopher Shoemaker

      1. General Statements [optional]

      Thank you to all the reviewers for their helpful efforts on behalf of our manuscript. We appreciate the time and effort they have invested in providing valuable feedback.

      Overall, the positive reception from our reviewers highlighted their appreciation for our approach and findings. Moreover, their comments underscored the relevance and potential impact of our findings, particularly within the fields of autophagy and protein interaction networks. Their detailed and constructive critiques will also help refine both the content and presentation of our work.

      In response to the reviews, we have proposed targeted revisions to the manuscript, all of which are well within our lab's capabilities and can be executed efficiently. We have detailed our responses to each specific point raised by the reviewers below.


      2. Description of the planned revisions


      Reviewer #1

      Evidence, reproducibility and clarity

      Summary:

      Selective autophagy receptors (SARs) of the Sequestosome-1 like receptor group (SLRs) including SQSTM1(Sequestosome-1)/p62, NBR1, TAX1BP1, NDP52, CALCOCO1 and Optineurin are soluble SARs that engage cargo and ATG8 family proteins as well as components of the core autophagy machinery like FIP200/RBCC1 to bring about the autophagic degradation of the cargo and themselves. In the autophagic degradation of protein aggregates (aggrephagy) the most studied SAR p62 collaborates with the archetypal autophagy receptor NBR1 and also TAX1BP1 to bring about effective turnover of ubiquitinated cargos sequestered into p62 bodies or droplets by liquid-liquid phase separation. How this intricate co-operation of these SARs is orchestrated is incompletely understood. In the paper by North et al entitled "The LC3-interacting region of NBR1 is a protein interaction hub enabling optimal flux" the authors use peptide arrays to map the binding sites for ATG8-family proteins LC3A and GABARAPL1, FIP200 and TAX1BP1 to the autophagy receptor NBR1. The authors find that three short linear interaction motifs (SLiMs), the LIR, FIR and TIR interacting with ATG8 family proteins, FIP200 and TAX1BP1, respectively, partly overlap in a short region of NBR1 that can adopt different conformations to accommodate the different binding partners. In short, the different interactions are mediated by distinct overlapping determinants, rather than a single, convergent, SLiM. While the important binding determinants for ATG8 proteins and FIP200 show more overlap and it was not possible here to find mutations that distinguish LIR and FIR binding, TAX1BP1 bound more to a region downstream of the LIR and a specific mutation in NBR1 and in TAX1BP1 could abolish binding. Checking the role of phosphorylations in augmenting binding using phosphomimetic mutations it was seen that while FIP200 and Atg8-family binding were generally augmented by phosphorylation, TAX1BP1 binding did not respond to these mutations. 
      Very interestingly, the authors found that co-expression of TAX1BP1 with tandem-tagged NBR1 in pentaKO cells (not expressing the SLRs p62, NBR1, NDP52, TAX1BP1 and OPTN) significantly increased the autophagic turnover of NBR1. None of the other SLRs could do this. Instead, this over-expression assay revealed a competition.

      Major points:

      1) In Fig 4 the peptide array binding assay is not sufficient as it is only semiquantitative. The data shown should be accompanied by a more direct binding assay allowing the determination of Kd values for the binding, where the WT peptides are directly compared to the phosphomimetic mutant peptides. Here the fluorescence anisotropy assay the authors use in Suppl Fig. 1E, or ITC, OctetRed96 or another assay suitable for Kd determinations, should be used.

      Response: Thank you for the constructive comments regarding our peptide array binding assay. We agree that the semi-quantitative nature of this method limits its ability to provide detailed binding affinity measurements. To address this, we will purify multiple peptides and assess the binding affinities between phosphomimetic+/- LIR peptides and Atg8s, FIP200, and TAX1BP1. While testing all peptides may be cost and time prohibitive, we will prioritize a representative range for this validation effort.

      2) As this paper is already dominated by the use of peptides it would significantly enhance the quality of the data if the authors had included studies with peptides phosphorylated at the specific positions to allow comparison with the phosphomimetic substitutions to aspartate.

      Response: Thank you for your insightful comment. We agree that incorporating studies with peptides phosphorylated at specific positions could provide a more nuanced comparison with the phosphomimetic substitutions to aspartate. Previous studies, including Popelka and Klionsky (2022) and Kliche et al. (2022), have indeed suggested that phosphomimetic substitutions do not perfectly replicate phosphorylation events.

      In response, we plan to order a peptide array containing phosphorylated peptides, not merely phosphomimetics, and will conduct additional experiments with TAX1BP1, FIP200, and LC3A. This approach will allow us to directly assess the effects of actual phosphorylation compared to phosphomimetic substitutions.

      While we acknowledge the possibility of subtle differences in binding affinity or regulatory interactions, we anticipate that the primary conclusions of our study—namely, that TAX1BP1 is largely insensitive to phosphorylation, whereas FIP200 and LC3A binding activities are affected—will remain unchanged. These experiments will provide valuable data to confirm the robustness of our conclusions under the conditions of true phosphorylation.

      3) The quality of the 2D peptide array probing of GST-LC3A binding in Fig 3A is poor. Is this a stripped and re-probed membrane? I do not think these data are publication quality and the experiment should be redone unless the authors have very good arguments against my suggestion. It would also be nice to see a 2D peptide array of GABARAPL1 binding too to make the comparative study complete.

      Response: Thank you for your constructive feedback regarding the quality of the 2D peptide array probing of GST-LC3A in Figure 3A. As you rightly pointed out, the membrane was indeed stripped and reprobed, with LC3A being the final probe. This method sometimes introduces artifacts, such as the 'ring' effect observed, which are common with this technique. However, the results consistently aligned with established consensus sequences for LC3, reinforcing the reliability of our findings despite the suboptimal image quality.

      Recognizing the concerns about the quality of the blot, we are prepared to repeat this experiment using a new commercial vendor, as our previous collaborator is no longer available. We anticipate some differences in the appearance of the blots due to changes in dot size and spacing from the new supplier. Given these variations, we propose adding the revised blot to the supplementary materials rather than the main figures to avoid disrupting the visual continuity of the data presentation.

      Additionally, in response to the reviewer’s suggestion, we will include a 2D peptide array probing for GABARAPL1. This will enhance the comparative analysis within our study.

      One alternative (related to Reviewer 3, comment 3) that we can deliver is using our LIR arrays to derive consensus sequences for LC3 binders and GABARAPL1 binders. In doing this, we find the same differences in LC3 and GABARAP binding preferences that were reported previously in Rogov et al 2017. Recovering these known, and somewhat subtle, differences in binding preference further bolsters the validity of our approach.

      4) For the data shown in Fig 6 it should be noted that although these are very interesting results a clear limitation of the study is that the results on the autophagic turnover is based on overexpressing the SLRs in the pentaKO cells. In a physiological setting with all relevant actors in place and with a different stoichiometry the effects could likely be different.

      Response: We appreciate the observation regarding the limitations of our study due to the use of overexpressed SLRs in pentaKO cells. As the reviewer rightly points out, the stoichiometry and interaction dynamics in a physiological setting might differ significantly. Critically, after submission of this manuscript, a recent preprint by Sascha Martens’ group (Bauer et al. BioRxiv) has shown similar results using endogenously tagged p62, TAX1BP1, and NBR1. This study corroborates our results, suggesting that the interactions we observed are not merely artifacts of overexpression but reflect genuine biological phenomena. We will incorporate a detailed discussion of this study in the Discussion section of our manuscript to contextualize our findings within a more physiologically relevant framework.

      Therefore, we believe that our reductionist approach, while not fully reflective of physiological conditions, offers valuable and generalizable insights into the intricate cooperation of SARs in autophagy.

      Minor points:

      1) It would be beneficial for the reader to show a cartoon of the domain organization of both TAX1BP1 and NBR1 in Figure 1. NBR1 is shown in supplemental figure 1, but there is no depiction of the domain organization of TAX1BP1.

      Response: As suggested, a domain schematic for NBR1 and TAX1BP1 will be included.

      2) The authors say at the bottom of page 4 "Complementary in vivo studies reveal that while SLRs typically compete". But do they actually typically compete? Is this not a result of the experimental strategies employed? There is more a shortage of SLRs based on cargo competition as shown recently by Peter Kim's group that excessive pexophagy may reduce mitophagy etc. (Germain et al. 2023).

      Response: Thank you for pointing out this overstatement. We will soften this statement.

      3) In Fig. 3D it should be shown that D, E, A and V are preferred residues at position +1 for LC3A binding.

      Response: As suggested, we will amend the figure to include these residues at the +1 position.

      4) In such a 2D mutational analysis it is often just as important to determine which residues are not allowed for binding. It would therefore be nice if the authors could summarize/visualize their results in a better way in Fig 3D to also show the residues that lead to loss of binding. These could be shown below the sequence and the use of color to distinguish basic, acidic, hydrophobic and aromatic residues could be attempted.

      Response: As suggested, we will make this figure more comprehensive by including both the residues that are preferred and those that lead to loss of binding. Furthermore, we have incorporated the use of color to distinguish the traits of the residues (basic, acidic, hydrophobic and aromatic) that are (dis)favored at each position.

      5) Line 327: To be clear about the fact that this is an overexpression assay, "simultaneous expression" should be corrected to "simultaneous overexpression".

      Response: We will make the suggested change.

      6) There are LIRs and FIRs that overlap and those that do not. To check the degree of overlaps that may occur among known LIRs the authors made a peptide array with 100 established LIR sequences taken from the LIR-Central database (Chatzichristofi et al., 2023). The peptide array was probed with LC3A (29 bound), GABARAPL1 (49 bound), the FIP200 Claw domain (57 bound) and the TAX1BP1 CC2 domain (49 bound). As much as one third (32) of the LIR peptides were not bound by any of the four probes. Do the authors have a good explanation for the fact that so many peptides did not bind?

      Response: Thank you for highlighting the significant number of LIR peptides that did not bind to any of the probes in our study. At first, we were similarly surprised by this. In our manuscript, we will expand on several factors that might explain this observation:

      • Specificity of Atg8 Family Proteins: The LIR-Central database indicates that these sequences bind at least one Atg8-family protein, but not necessarily all. Our assay might not have included the specific Atg8 proteins that some LIRs preferentially bind.
      • Peptide Solubility and Conformation: The solubility and conformational stability of peptides printed on an array can vary, affecting binding efficiency. Certain sequences may not adopt the optimal conformation for binding under these assay conditions.
      • Sequence Context and Accessibility: The native context in which the LIR motif is contained, including neighboring amino acids, can influence binding. Peptide arrays strip these peptides of their physiological context. As short linear interaction motifs, the assumption is that context will not strongly affect binding, but it’s known that many LIRs adopt partially structured motifs that influence binding (e.g. a C-terminal helix). Our peptide array approach is likely to impede such secondary structures from forming and may limit binding.
      • Misannotated Sequences: The LIRs included from the database have varying levels of validation. Some sequences might be misannotated and, therefore, do not bind any of the probes.

      These discussion points will be included in the manuscript to provide a comprehensive explanation for the observed data.

      7) Strangely enough, the NBR1 peptide used in Figure 2A did not bind any of the probes while the NBR1 peptides used in Fig. 1C bound very well. Do the authors have any explanation for this?

      Response: Thank you for noting the discrepancy in NBR1 peptide binding observed in Figure 2A compared to Figure 1C. This observation was noted by all reviewers. The difference likely arises from the solubility issues associated with the NBR1 peptide in the format used for Figure 2A, where the peptide sequence included the LIR motif plus 10 amino acids on each side. The core LIR sequence of NBR1 (YIII) is highly hydrophobic, which can affect its solubility and, consequently, its observed binding in our peptide array.

      To overcome this, we optimized the LIR sequence of NBR1 for peptide arrays (amino acids 725-749), which includes seven residues before the LIR and 14 residues after. This shift enhanced solubility and facilitated more reliable probing in our experiments (notably Fig 3). In Fig2A and other assays, both the standard and the optimized formats of the NBR1 LIR were included: the standard format to maintain consistency with other LIRs extracted from the LIR-Central database and the optimized version as a control to validate our results.

      We will detail this explanation in the manuscript, clarifying the rationale behind the observed binding differences.


      Significance

      I found this paper very interesting to read with a lot of interesting new detailed and useful information on binding specificity for the proteins and motifs involved. It is a generally well performed study with interesting results. I also very much enjoyed the Discussion section which opens up for several interesting possible scenarios. The study also produced important point mutants that can be used in future studies to selectively abolish TAX1BP1 binding to NBR1. I think this is a "must read" paper for researchers interested in selective autophagy and co-operation between SARs, and more generally for getting some insight into how SLiMs may work. As such, this paper will be of interest for all interested in autophagy research and for a wider audience too as it is in essence about how overlapping SLiMs may be employed to orchestrate multiple protein-protein interactions using distinct overlapping determinants, rather than a single, convergent, SLiM. It is also one of the very few papers I have come across exploiting the power of the peptide array method so extensively with success for mapping protein binding sites.

      It could perhaps be interesting if the authors discussed their results in relation to another study from the group of Sascha Martens on the role of TAX1BP1 in p62 bodies or condensates (doi: https://doi.org/10.1101/2024.05.17.594671). These two papers should be read together as they are both very interesting and important contributions.

      Response: Thank you for pointing out this important reference that was posted shortly after our manuscript was submitted. As mentioned above, we will include an expanded discussion section to discuss these corroborating findings. We will also include a citation to Ferrari et al (PMID: ) on Tau evasion of autophagy through exclusion of TAX1BP1.

      Reviewer #2

      Evidence, reproducibility and clarity

      Summary: In this manuscript, North et al. examined how short linear interaction motifs (SLiMs) help to orchestrate the function of selective autophagy receptors (SARs) during cargo engulfment in autophagosomes. In particular, the authors focused on NBR1 as a model SAR to address its role in the clearance of protein aggregates (aggrephagy). Using binding assays, the authors showed that a SLiM harboring NBR1's LIR motif also mediates binding to FIP200 and TAX1BP1. Intrigued by these overlapping binding sites, the authors probed 100 LIRs for their binding to TAX1BP1's coiled-coil 2 region (CC2), FIP200's claw domain and two different ATG8 family members, and found heterogeneous binding patterns and distinct correlations between these four binding partners. Using mutational peptide arrays of NBR1's SLiM, the authors revealed unique binding determinants of these NBR1 partners and their potential differential regulation by phosphorylation. Taking advantage of their new NBR1 binding insights, the authors structurally modeled the binding of TAX1BP1's CC2 to NBR1's SLiM and identified crucial residues in both proteins for this interaction. Lastly, the authors turned to autophagy flux assays in cells and showed that TAX1BP1 acts synergistically with NBR1 to increase its lysosomal delivery. Overall, the claims and the conclusions are largely supported by the data. However, a few critical issues should be addressed.

      Are the data and the methods presented in such a way that they can be reproduced?

      Are the experiments adequately replicated and statistical analysis adequate?

      Major comments

      1) What are the expression levels of the different tf-SAR fusions compared to the endogenous levels of the respective SAR? And are tf-NBR1 protein levels changed upon co-expression of the other SARs?

      Response: We appreciate the questions concerning the expression levels of tf-SAR fusions relative to the endogenous levels of the respective SARs, similar to inquiries from Reviewer 1 (major comment 4). In our study, the levels of tf-NBR1 are notably higher than the endogenous levels. Interestingly, we observed that the co-expression of autophagy-competent NBR1 and TAX1BP1 generally leads to a decrease in the levels of both proteins, likely due to enhanced autophagic turnover. This pattern is not seen with autophagy-deficient mutants, suggesting a functional interaction affecting protein stability.

      Furthermore, a recent preprint by Sascha Martens’ group (Bauer et al., BioRxiv) has presented findings that echo our results using endogenously tagged versions of p62, TAX1BP1, and NBR1. This study supports our observations, indicating that the interactions and effects we report are not artifacts of overexpression but are reflective of genuine biological processes. These findings will be thoroughly discussed in the Discussion section of our manuscript to provide context for our results within a physiologically relevant framework.

      Therefore, we believe that our reductionist approach, while not fully reflective of physiological conditions, offers valuable and generalizable insights into the intricate cooperation of SARs in autophagy.

      2) Which of the 100 LIRs have been shown to specifically bind LC3A or GABARAPL1? The authors should include this information from the literature in Figure 2 (e.g., highlighted by color or else).

      Response: Thank you for your suggestion to detail the specific interactions between the 100 LIRs and Atg8 homologs like LC3A and GABARAPL1 in Figure 2. While each LIR in the LIR-Central database has been validated, detailed information on which LIRs bind specific Atg8 homologs, and with what relative affinity, is often lacking in the literature. This gap makes it challenging to present comprehensive binding preferences in a visually coherent way within Figure 2.

      Nevertheless, we recognize the value of such information. We plan to conduct a thorough literature review on all 100 LIRs included in our study. Should we find sufficient and reliable data regarding binding specificities, we will incorporate this into Figure 2, potentially using color coding or another method to highlight these relationships clearly.

      We can also perform the reciprocal experiment by using our LIR arrays to derive consensus sequences for LC3 binders and GABARAPL1 binders. In doing this, we find the same differences in LC3 and GABARAP preferences that were reported previously in Rogov et al. 2017. Recovering these known, and somewhat subtle, differences in binding preference further bolsters the validity of our approach. These new data will be added to the manuscript.


      3) How effective is the stripping of the peptide array? The authors should provide evidence that there is no carry over binding from sequential probing the array. As a control, the authors should at least repeat probing for the last binder in their sequential binding assay with a new peptide array that has not yet been incubated with a different binder and then stripped.

      Response: This is an important question, related to Reviewer 1 (comment 3), as the stripping of the peptide array can be variably effective. Prior to performing any of the arrays included in this manuscript, we ran several validation arrays to identify the proper ordering of probes (e.g. which proteins can be stripped and which cannot). FIP200 and TAX1BP1 probing was performed on fresh or successfully stripped blots. LC3A probing was done last, as there is substantial previous literature defining the LC3 motif. The results of the LC3A binding consistently aligned with established consensus sequences for LC3, reinforcing the reliability of our findings despite the stripping process. Therefore, while stripping sometimes introduces artifacts, such as the 'ring effect' observed in Figure 3A, the results did not appear to be influenced by prior probes.

      As suggested, we are prepared to repeat the LC3A probing on a new array to fully cement this interpretation. We note, however, that this will be done using a new commercial vendor, as our previous collaborator is no longer available (The original blots were ordered over 3 years ago). We anticipate some differences in the appearance of the blots due to changes in dot size and spacing from the new supplier. Given these variations, we propose adding the revised blot to the supplementary materials rather than the main figures to avoid disrupting the visual continuity of the data presentation.

      4) What is the number of replicates for the peptide array assays?

      Response: Due to cost considerations, peptide array assays in our study were conducted as one or two replicates. We understand the limitations this presents in terms of statistical robustness and variability assessment. However, where possible, we supplemented these assays with additional validation experiments and controls to ensure the reliability of our findings. For critical experiments, including key interaction validations, we used independent biochemical assays to confirm the results obtained from the peptide arrays.

      5) The authors should test whether the enhancement of NBR1 flux by TAX1BP1 is only due to the contribution of an additional LIR or potential other functions of TAX1BP1 (e.g. ubiquitin binding or FIP200 binding). The authors should expand the panel shown in Figure 6E with TAX1BP1 mutant which are deficient in ubiquitin or FIP200 binding.

      Response: We thank the reviewer for their suggestion. We will include data with TAX1BP1 mutants that are deficient in ubiquitin or FIP200 binding.

      Minor comments

      6) Molecular weight markers are missing on immunoblots.

      Response: We apologize for this oversight. We will amend the figures to include molecular weight markers.

      7) It would be more informative (since some proteins have more than one LIR) if the actual LIR motif would be displayed next to the peptide array (as e.g. done for NBR1) and not only in the supplements.

      Response: We appreciate this thoughtful input and will consider its implementation carefully. We will explore the feasibility of integrating this detail in a manner that maintains figure clarity.

      8) Along this line in Figure 2A, NBR1's LIR (marked with a red star) is among the LIRs for which no binding was observed. The authors should explain this.

      Response: Thank you for noting the discrepancy in NBR1 peptide binding observed in Figure 2A compared to Figure 1C. This observation was noted by all reviewers. The difference likely arises from the solubility issues associated with the NBR1 peptide in the format used for Figure 2A, where the peptide sequence included the LIR motif plus 10 amino acids on each side. The core LIR sequence of NBR1 (YIII) is highly hydrophobic, which can affect its solubility and, consequently, its observed binding in our peptide array.

      To overcome this, we optimized the LIR sequence of NBR1 for peptide arrays (amino acids 725-749), which includes seven residues before the LIR and 14 residues after. This shift enhanced solubility and facilitated more reliable probing in our experiments (notably Fig 3). In Fig2A and other assays, both the standard and the optimized formats of the NBR1 LIR were included: the standard format to maintain consistency with other LIRs extracted from the LIR-Central database and the optimized version as a control to validate our results.

      We will detail this explanation in the manuscript, clarifying the rationale behind the observed binding differences.


      Significance

      Collectively, the work of North and colleagues provides valuable new mechanistic insights into the network of interactions that governs the function of SARs. Importantly, this work extends the knowledge in the field that SARs act in an orchestrated manner which reinforces their delivery to lysosomes. However, given the involvement of several SARs in the same process, it is crucial to dissect the binding modalities among these factors. In this regard, the current study on fine mapping of binding sites provides an important contribution, in particular in probing the in vitro findings in reconstituted KO cells. This part is really strong. In addition, the identification of critical residues for these binding events represents an important tool for the autophagy community, which will be among the basic research audience most interested in this technical study.


    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study entitled "Rifampicin tolerance and growth fitness among isoniazid-resistant clinical Mycobacterium tuberculosis isolates: an in-vitro longitudinal study" by Vijay et al. provides valuable insights into the association of rifampicin tolerance and growth fitness with isoniazid resistance among clinical isolates of M. tuberculosis. Antibiotic tolerance in M. tuberculosis is an important topic since it contributes to the lengthy and complicated treatment required to cure tuberculosis disease and may portend the emergence of antibiotic resistance. The authors found that rifampicin tolerance was correlated with bacterial growth, rifampicin minimum inhibitory concentrations, and isoniazid-resistance mutations.

      Strengths:

      The large number of clinical isolates evaluated and their longitudinal nature during treatment for TB (including exposure to rifampin) are strengths of the study.

      Weaknesses:

      Some of the methodologies are not well explained or justified and the association of antibiotic tolerance with growth rate is not a novel finding. In addition, the molecular mechanisms underlying rifampicin tolerance only in rapidly growing isoniazid-resistant isolates have not been elucidated and the potential implications of these findings for clinical management are not immediately apparent.

      We thank the reviewer for the comments. We have modified the Methods section and Figure 1 to clarify the method, as suggested by the reviewer.

      Although we agree that previous studies have shown the association of slow growth rate with antibiotic tolerance, ours is, to our knowledge, the most comprehensive assessment of rifampicin tolerance among clinical isolates. In particular, we show that the degree of tolerance in clinical isolates can vary over several orders of magnitude, which had not been previously documented or appreciated. Furthermore, the association of high tolerance with IR isolates is a new finding and, given the potential for tolerance to increase the risk of de novo drug resistance, our study suggests that IR isolates with high rifampicin tolerance may present a risk for development of MDR-TB.

      In addition, we have also analysed the longitudinal isolates and the genetic variants emerging in them that are associated with an increase in rifampicin tolerance. This analysis reveals multiple possible pathways to increased rifampicin tolerance among clinical M. tuberculosis isolates. Possible clinical implications include treating the combination of high rifampicin tolerance and isoniazid resistance as a risk factor for tuberculosis treatment failure. This study helps to inform further clinical studies evaluating the role of rifampicin tolerance in IR isolates and treatment outcome. We have focused on these aspects in the discussion of the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study by Vijay and colleagues addresses a clinically important, and often overlooked, aspect of TB treatment. Variations in the level of antibiotic tolerance amongst otherwise antibiotic-susceptible isolates are difficult to routinely screen for, and such screening is consequently not performed. The authors present a convincing argument that, indeed, there is significant variation in the susceptibility of isoniazid-resistant strains to killing by rifampicin, in some cases at the same tolerance levels as bona fide resistant strains. On the whole, the study is easy to follow and the results are justified. This work should be of interest to the wider TB community at both a clinical and basic level.

      Weaknesses:

      The manuscript is long, repetitive in places, and the figures could use some amending to improve clarity (this could be a me-specific issue as they look ok on my screen, yet the colour is poor when printed).

      We thank the reviewer for the comments. We have modified the manuscript as per the reviewer's suggestions.

      It would have been great to have seen some correlation between increased rifampicin tolerance and treatment outcome, although I'm not sure if this data is available to the researchers. I agree with the researchers that the use of a single media condition is a limitation. However, this is true of a lot of studies.

      Rifampicin tolerance and treatment outcome analysis.

      We agree with the reviewer that correlation between rifampicin tolerance and treatment outcome is important. This needs to be performed in future studies with a better design to correlate rifampicin tolerance with treatment progression or outcome data.

      Reviewer #3 (Public Review):

      Summary:

      The authors have initiated studies to understand the molecular mechanisms underlying the development of multi-drug resistance in clinical Mtb strains. They demonstrate the association of isoniazid-resistant isolates by rifampicin treatment supporting the idea that selection of MDR is a microenvironment phenomenon and involves a group of isolates.

      Strengths:

      The methods used in this study are robust and the results support the authors' claims to a major extent.

      Weaknesses:

      The manuscript needs a thorough vetting of the language. At present, the language makes it very difficult to comprehend the methodology and results.

      We thank the reviewer for the comments. We have revised the manuscript as per the reviewer’s suggestions.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Methods: The authors attempt to differentiate between "fast"- and "slow"-growing bacteria in order to determine if the growth rate is associated with rifampicin tolerance. This is accomplished by assessing growth on solid agar at 15 and 60 days post-incubation, respectively. However, mycobacterial growth rate is not a binary phenomenon but rather a continuous variable. Moreover, it is not clear why 15 and 60 days were selected. Also, instead of a "slow growth" phenotype, the 60-day time point might simply reflect a longer lag phase. Were the plates examined at any interval time points? It would be interesting to know whether colony growth was delayed overall in the populations observed only at 60 days, or simply if the appearance of microcolonies visible to the naked eye was delayed (with normal growth afterwards).

      We thank the reviewer for the comments. We want to clarify that we did not use agar plates but rather the most-probable-number (MPN) method to determine the survival fraction after antibiotic treatment. We have clarified this in the revised manuscript and in the revised Figure 1. The MPN method is a binary measure (growth/no growth) and therefore cannot differentiate between a long lag time and other mechanisms. In our original analysis, we included an intermediate time point of 30 days, but these data (included as Supplementary Figure 1) cannot address the issue of lag phase directly. Since the 30-day time point did not add to the overall analysis and interpretation, we had not included it in the original submission.
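      To illustrate how growth/no-growth readouts translate into a survival fraction, the following is a minimal sketch of the standard maximum-likelihood MPN estimator. The dilution volumes, well counts and the function name `mpn_estimate` are hypothetical and chosen for illustration only; the assay layout actually used in the study is not specified here.

```python
import math

def mpn_estimate(volumes, n_wells, n_positive, lo=1e-9, hi=1e9, iters=200):
    """Maximum-likelihood most-probable-number estimate (organisms per unit volume).

    volumes[j]    : amount of undiluted sample inoculated per well at dilution j
    n_wells[j]    : number of wells at dilution j
    n_positive[j] : number of wells showing growth at dilution j
    Assumes organisms are Poisson-distributed across wells.
    """
    if sum(n_positive) == 0:
        return 0.0                      # no growth anywhere
    if all(p == n for p, n in zip(n_positive, n_wells)):
        return math.inf                 # every well positive: no finite estimate
    total = sum(n * v for n, v in zip(n_wells, volumes))

    def score(lam):
        # Root of this function is the MLE:
        # sum_j p_j * v_j / (1 - exp(-lam * v_j)) = sum_j n_j * v_j
        return sum(p * v / -math.expm1(-lam * v)
                   for p, v in zip(n_positive, volumes) if p) - total

    # score() decreases with lam, so bisect on a log scale.
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical readouts before and after rifampicin exposure.
volumes, wells = [1.0, 0.1, 0.01], [3, 3, 3]
before = mpn_estimate(volumes, wells, [3, 3, 1])
after = mpn_estimate(volumes, wells, [3, 1, 0])
survival_fraction = after / before
```

      Because each well contributes only a positive or negative call, the estimate carries no information about when growth began, which is why a long lag phase cannot be separated from other mechanisms with this readout.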

      (2) Methods/Results/Discussion: Some important clinical information is missing-how were the patients treated who had IR isolates? Did they receive the standard regimen for DS TB or was another drug substituted for isoniazid? Exposure to different drugs could affect the rifampicin-tolerant populations during the intensive phase (Figure 5).

      Thank you for this comment. We have included the information regarding the treatment regimen in the revised manuscript.

      Were there differences in microbiological (sputum culture conversion rate at 8 weeks or time to culture negativity) or clinical outcomes based on isoniazid susceptibility? Perhaps more importantly, were there differences in microbiological/clinical outcomes based on the proportion of bacterial subpopulations with rifampicin tolerance for a particular isolate? There should be more discussion on the potential clinical implications of the study's findings.

      We agree with the reviewer that correlation between rifampicin tolerance and treatment progression or outcome is important. This needs to be performed in future studies with a better design to correlate rifampicin tolerance with treatment progression or outcome data.

      (3) Results (Figure 3A): Although an interesting finding, the increased rifampicin tolerance observed only in the "rapidly" growing populations of isoniazid-resistant isolates (IR) vs. isoniazid-susceptible (IS) isolates is not explained. In contrast, equally, increased rifampicin tolerance is seen in the "slowly" growing populations of both IR and IS isolates. It would be interesting to know if these slowly growing populations show specific tolerance to rifampicin or if, as expected, slow growth confers tolerance to a range of different bactericidal antibiotics.

      We thank the reviewer for the suggestions. We agree that these will be interesting to investigate in a future study, but they are outside the scope of the current study.

      (4) Results (Figure 3B): The basis for the classification into tertiles is not clear and appears somewhat arbitrary-does this represent the survival of a particular isolate following rifampicin exposure relative to the other isolates based on isoniazid susceptibility (IS or IR) or the % growth relative to other populations for the same isolate? Figure 3B is missing a y-axis label. Is it a log10 MPN ratio?

      We thank the reviewer for pointing this out. We want to clarify that, for the classification into tertiles, we first pooled both groups of isolates, isoniazid susceptible (IS) and isoniazid resistant (IR), into a single population. Subsequently, we categorized this unified population into three distinct groups (low, medium and high) based on their survival fraction following rifampicin treatment. Consequently, the 'low', 'medium' and 'high' tertiles represent the survival of each isolate following rifampicin exposure relative to the total number of isolates, combining both IS and IR isolates.

      For clarity, we provide a breakdown of the criteria for each tertile:

      +Low tertile: Consists of isolates with the lowest survival fraction (bottom 25%).

      +Medium tertile: Encompasses isolates with survival fractions that fall between the bottom 25% and the top 25%.

      +High tertile: Comprises isolates with the highest survival fractions (top 25%).

      We have modified the revised manuscript to clarify this.

      We have also modified Figure 3B to correct the y-axis label.
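      The pooling and ranking described above can be sketched as follows. This is an illustrative example only: the isolate identifiers and survival fractions are hypothetical, the function name `classify_by_survival` is not from the manuscript, and the 25% cut-offs are simply the values stated in the breakdown above, exposed as parameters.

```python
def classify_by_survival(survival, low_share=0.25, high_share=0.25):
    """Label pooled isolates as 'low', 'medium' or 'high' tolerance.

    survival : dict mapping isolate id -> survival fraction after rifampicin,
               with IS and IR isolates pooled into one population.
    """
    ranked = sorted(survival, key=survival.get)   # ascending survival fraction
    n = len(ranked)
    n_low = int(n * low_share)                    # size of the bottom group
    n_high = int(n * high_share)                  # size of the top group
    labels = {}
    for rank, isolate in enumerate(ranked):
        if rank < n_low:
            labels[isolate] = "low"
        elif rank >= n - n_high:
            labels[isolate] = "high"
        else:
            labels[isolate] = "medium"
    return labels

# Hypothetical pooled population of IS and IR isolates.
pooled = {
    "IS-1": 0.001, "IS-2": 0.01, "IS-3": 0.05, "IS-4": 0.2,
    "IR-1": 0.005, "IR-2": 0.1,  "IR-3": 0.5,  "IR-4": 0.9,
}
labels = classify_by_survival(pooled)
```

      Because the ranking is done on the pooled population, an isolate's label reflects its survival relative to all isolates rather than relative to its own IS or IR group.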

      (5) Results (lines 185-186): For correlating relative growth in the absence of antibiotics, 19 clinical isolates "outliers" were removed without explanation.

      We have added an explanation for the “outliers”, which were removed earlier due to deviation from a normal distribution. We have also provided Supplementary Figure 3, which includes these outliers.

      (6) Results (lines 203-211): The authors attempted to investigate a potential association between the mechanism of M. tuberculosis isoniazid resistance and the degree of rifampicin tolerance. However, the vast majority of IR clinical isolates (n=71) had a katG_S315X mutation and only 8 isolates had alternative mutations (inhA_I21T and fabG1_C-15X). Given the wide range of rifampicin tolerance observed within these isoniazid-resistant isolates, they concluded that other genetic or epigenetic determinants must be playing a role. WGS of longitudinally collected isolates from the same patients during TB treatment yielded non-synonymous SNPs in a list of genes previously reported to be associated with persistence, tolerance, and mycobacterial survival. However, precise mechanisms (including, e.g., expression of efflux pumps) are not investigated.

      We thank the reviewer for summarising the findings. Yes, we agree that investigating the precise mechanism of rifampicin tolerance is beyond the scope of the current work.

      Minor comments:

      (1) Abstract (line 41): The nonstandard abbreviations "IR" and "IS" have not been introduced prior to this usage.

      We have modified this in the abstract.

      (2) Introduction (line 60): Insert "phenomena" or "mechanisms" after "two".

      We have modified this in the introduction.

      (3) Introduction (lines 66-69): This sentence is confusing, especially the second part ("supporting this studies...").

      We have modified the lines to clarify.

      (4) Introduction (line 84): In the current text, it appears as if "IR" is the abbreviation for "isoniazid". Therefore, I recommend changing "resistance to isoniazid" to "isoniazid resistance".

      We have modified this in the revised manuscript.

      (5) Results (line 141): Insert "the" before "rest".

      We have modified this in the revised manuscript.

      (6) Results (line 187): Replace "did not had" with "did not have".

      We have modified this in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Abstract:

      The abstract is long and repetitive. It needs reworking and shortening to improve clarity and highlight the main takeaway message.

      We thank the reviewer for the suggestions and have modified this in the revised manuscript.

      The introduction is interesting and contains relevant information. However, it is long and takes a while to get to the point of the study. It needs re-writing to emphasise key prior results and the purpose of this study.

      We thank the reviewer for the suggestions and have modified this in the revised manuscript.

      Results:

      As the study relies predominately on the use of MPN, I think a simple schematic of how the experiment is performed would be informative. Could this be added to Figure 1?

      We have revised Figure 1 in the manuscript to include the schematic representation.

      Some of the differences in MDK90, whilst they may be significant, are small so it would at least provide context as to the relevance of these differences. This may also alleviate my confusion as to how the authors can measure the time required to achieve MDK90 as 1.23-1.31 days when the first time point that is taken is day 2 (the data in Figure 2). They have Fig. S6 but this is small and hard to follow.

      We thank the reviewer for this suggestion. We have modified this in the revised manuscript and in Figure S6.
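      As a hedged illustration of how an MDK90 earlier than the first sampled time point can arise, the sketch below assumes log-linear interpolation between a day-0 baseline (survival fraction 1) and the measured time points. This interpolation scheme is an assumption made for illustration, not a description of the calculation used in the manuscript; the function name and the example values are hypothetical.

```python
import math

def mdk(times, log10_survival, kill_fraction=0.90):
    """Time at which the kill curve first reaches the given kill fraction.

    times          : sampling times in days, starting at 0
    log10_survival : log10 survival fraction at each time (0.0 at day 0)
    Assumes survival declines log-linearly between consecutive time points.
    Returns None if the target is never reached.
    """
    target = math.log10(1.0 - kill_fraction)   # about -1 for 90% killing
    points = list(zip(times, log10_survival))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if y0 >= target >= y1 and y0 != y1:
            # Linear interpolation on the log10 scale.
            return t0 + (target - y0) * (t1 - t0) / (y1 - y0)
    return None

# Hypothetical kill curve sampled at days 0, 2 and 5.
mdk90 = mdk([0, 2, 5], [0.0, -1.6, -3.0])
```

      Under this assumption, any isolate that has already lost more than one log10 of viability by day 2 yields an interpolated MDK90 below 2 days, so small differences between such values depend mainly on the day-2 measurement.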

      Figure 2:

      Would be helpful to have -1 on the Y axis.

      The grey dots don't print very well (Might be my printer)

      We have modified this in the revised manuscript, figure 2.

      Line 142: The authors note a difference in RIF tolerance at day 15 that disappeared by day 60. I assume they are referring to the day 5 timepoint although this isn't clear as written.

      Yes, it is referring to the day 5 time point and we have clarified this in the revised manuscript.

      The section starting at line 148 (fig 3) is interesting, but it is difficult to read and follow what the difference is between this data and the prior data in Figure 2. It also wasn't until about line 165 that the purpose became clear. Overall the conclusions are sound and interesting.

      We have modified this in the revised manuscript.

      Line 154: What are the early and late time recovery time points?

      Is Figure 3A the same data as Figure 2?

      We have clarified this in the revised manuscript; Figure 3A shows the same data as Figure 2.

      I found Figure 6 hard to follow. I'm not sure how better to present this data, but it should be improved. Some further clarification in the text would be helpful.

      We thank the reviewer for the suggestions. We have added more explanation in the text to clarify Figure 6.

      Conclusions:

      The conclusions are sound, based on the data presented. The clinical relevance is highlighted, yet appropriately phrased to not be too far-reaching.

      Again, I think the conclusions could be condensed considerably. It is repetitive in places, which distills the main outcomes of this otherwise interesting and important study. The authors appropriately highlight some of the limitations of their study.

      We thank the reviewer for these comments and have modified this in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript "Rifampicin tolerance and growth fitness among isoniazid-resistant clinical Mycobacterium tuberculosis isolates: an in-vitro longitudinal study" by Srinivasan et al. details the identification/development of isoniazid-resistant strains in clinical isolates following treatment with rifampicin. This is an important aspect of understanding MDR development in TB strains. The results are promising and gel well with the hypothesis. However, the manuscript requires a thorough language modification. While the overall idea is clear, the methodology does not come out clearly.

      Specific comments:

      (1) It is not clear whether rifampicin treatments were given for 2 and 5 days before kill curves or for 15 and 60 days. The methodology needs to be phrased clearly. Why was this time interval of 15 days and 60 days taken? Is there a rationale for this?

      We thank the reviewer for the suggestions; we have modified the methods and Figure 1 to clarify this in the revised manuscript.

      (2) A concentration of 2 µg/mL was used for in vitro culture in this study. While the authors themselves indicate that this is well above the MIC, this might represent a non-natural dose and hence may force the evolution of strains. What will be the scenario in the natural course of antibiotic treatment (dose at MIC or less than MIC)?

      We observed no significant emergence of resistance up to day 5; resistance emerges only after day 5. We therefore avoided determining the survival fraction after resistance emergence, so the kill curve mostly represents the tolerant subpopulation. Pharmacokinetic studies of rifampicin dosing suggest that peak concentrations of >2-32 µg/mL are typical for standard doses of the drug; we therefore believe the chosen concentration of 2 µg/mL to be physiologically relevant.

      (3) As described in line 155, the survival spanned a broad distribution, across a million times in difference. This is rather surprising that 5 days of rifampicin treatment would lead to such a spread in resistance patterns. Did the authors study the different populations to understand this phenomenon? This is important given the scale of resistance developed in this short time.

      We want to clarify that the broad range of survival fractions reflects differences in the rifampicin-tolerant subpopulations, not in resistant subpopulations, as they are determined after rifampicin treatment in rifampicin-free media. This has been clarified in the revised Figure 1.

      Overall, the manuscript is a detailed study with new insights into the development of multi-drug resistance by Mtb. A thorough vetting for language is essential for a greater impact of the study.

      We thank the reviewer and have attempted to improve the clarity of the language to increase the potential impact of our findings.

    1. Author response:

      The following is the authors' response to the current reviews.

      Reviewer #1 (Public Review):

      I'll begin by summarizing what I understand from the results presented, and where relevant how my understanding seems to differ from the authors' claims. I'll then make specific comments with respect to points raised in my previous review (below), using the same numbering. Because this is a revision I'll try to restrict comments here to the changes made, which provide some clarification, but leave many issues incompletely addressed.

      As I understand it the main new result here is that certain recurrent network architectures promote emergence of coordinated grid firing patterns in a model previously introduced by Kropff and Treves (Hippocampus, 2008). The previous work very nicely showed that single neurons that receive stable spatial input could 'learn' to generate grid representations by combining a plasticity rule with firing rate adaptation. The previous study also showed that when multiple neurons were synaptically connected their grid representations could develop a shared orientation, although with the recurrent connectivity previously used this substantially reduced the grid scores of many of the neurons. The advance here is to show that if the initial recurrent connectivity is consistent with that of a line attractor then the network does a much better job of establishing grid firing patterns with shared orientation.

      Beyond this point, things become potentially confusing. As I understand it now, the important influence of the recurrent dynamics is in establishing the shared orientation and not in its online generation. This is clear from Figure S3, but not from an initial read of the abstract or main text. This result is consistent with Kropff and Treves' initial suggestion that 'a strong collateral connection... from neuron A to neuron B... favors the two neurons to have close-by fields... Summing all possible contributions would result in a field for neuron B that is a ring around the field of neuron A.' This should be the case for the recurrent connections now considered, but the evidence provided doesn't convincingly show that attractor dynamics of the circuit are a necessary condition for this to arise. My general suggestion for the authors is to remove these kind of claims and to keep their interpretations more closely aligned with what the results show.

      We would like to clarify that the simple (flexible) attractor is a weaker condition than the ones previously used to align grid cells. However, by no means do we claim that it is a necessary condition for grid maps to align. Other architectures, certainly more complex ones but perhaps even simpler ones, can align grid maps in our model.

      Major (numbered according to previous review)

      (1) Does the network maintain attractor dynamics after training? Results now show that 'in a trained network without feedforward Hebbian learning the removal of recurrent collaterals results in a slight increase in gridness and spacing'. This clearly implies that the recurrent collaterals are not required for online generation of the grid patterns. This point needs to be abundantly clear in the abstract and main text so the reader can appreciate that the recurrent dynamics are important specifically during learning.

      We respectfully disagree with the interpretation of this result. In this model cells self-organize to produce aligned grid maps. In such systems it makes sense to characterize the equilibrium states of the system. We turned learning off in Figure S3 to show that the recurrent connections have a contractive effect on grid spacing. But artificially turning off learning means that one can no longer make claims about the equilibrium states of the system, since it can no longer evolve freely. In a functional network, if the recurrent attractor is removed, the system will evolve towards poor gridness and no alignment no matter what the starting point is, as also shown in Figure S3. Several experimental results invite us to think of grid cells as the equilibrium solution of a series of constraints that is ready to change at any time: Barry et al, 2012; Yoon et al, 2013; Carpenter et al, 2015; Krupic et al, 2015; Krupic et al, 2018; Jayakumar et al, 2019.

      One point in which we perhaps agree with the reviewer is that information about the hexagonal maps is kept in the feedforward weights, while behavior and the recurrent collaterals act as constraints of which these feedforward weights are the equilibrium solution.

      (2) Additional controls for Figure 2 to test that it is connectivity rather than attractor dynamics (e.g. drawing weights from Gaussian or exponential distributions). The authors provide one additional control based on shuffling weights. However, this is far from exhaustive and it seems difficult on this basis to conclude that it is specifically the attractor dynamics that drive the emergence of coordinated grid firing.

      Again, we do not claim that this is the only way in which grid maps can be aligned, but it is the simplest one proposed so far. We were asked if it was the specific combination of input weights to a cell rather than the organization provided by the attractor which resulted in aligned maps. By shuffling the inputs to a cell we keep the combination of inputs invariant but lose the attractor architecture. Since grid maps in this new situation are not aligned, we can safely conclude that it is not the combination of inputs per se, but the specific organization of these inputs that allows grid alignment. It is not fully clear to us what ‘exhaustive’ means in this context.
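      The logic of the shuffling control can be sketched as follows. This is a minimal illustration, not the model's actual implementation: the Gaussian ring profile, the network size, and the names `ring_weights` and `shuffle_inputs` are assumptions made for the example. Each row holds the recurrent inputs to one cell; shuffling within a row keeps that cell's set of input weights but destroys the neighbor-to-neighbor (ring) organization.

```python
import numpy as np

def ring_weights(n, sigma=2.0):
    """Hypothetical 1D ring connectivity: weight decays with circular distance."""
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)              # distance measured along the ring
    w = np.exp(-d**2 / (2 * sigma**2))
    np.fill_diagonal(w, 0.0)              # no self-connections
    return w

def shuffle_inputs(w, rng):
    """Permute each cell's incoming weights (one row) among the other cells."""
    n = w.shape[0]
    shuffled = np.zeros_like(w)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        shuffled[i, others] = rng.permutation(w[i, others])
    return shuffled

def is_circulant(w):
    """True if w[i, j] depends only on (j - i) mod n, i.e. ring-organized."""
    return np.allclose(w, np.roll(np.roll(w, 1, axis=0), 1, axis=1))

rng = np.random.default_rng(0)
w_ring = ring_weights(50)
w_shuffled = shuffle_inputs(w_ring, rng)
```

      In this toy setting every cell receives exactly the same multiset of weights before and after shuffling, while only the unshuffled matrix retains the ring structure.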

      (3) What happens if recurrent connections are turned off? The new data clearly show that the recurrent connections are not required for online grid firing, but this is not clear from the abstract and is hard to appreciate from the main text.

      This point is related to (1). Absent this constraint, Figure S3 shows that the system evolves toward larger spacing, with poorer gridness and no alignment.

      (4) This is addressed, although the legend to Fig. S2D could provide an explanation / definition for the y-axis values.

      We have now added: Mean input fields are the sum of all inputs of a given kind entering a neuron at a given moment in time, averaged across cells and time.
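      Read literally, this definition amounts to a weighted sum followed by an average. The sketch below assumes that inputs of a given kind are described by a weight matrix and a matrix of presynaptic activity over time; the function name and array shapes are hypothetical.

```python
import numpy as np

def mean_input_field(weights, presyn_activity):
    """Mean input field for one kind of input (e.g. feedforward or recurrent).

    weights:         (n_cells, n_inputs) synaptic weights onto each cell
    presyn_activity: (n_time, n_inputs) presynaptic activity at each time step
    Returns the summed input per cell and time step, averaged over cells and time.
    """
    total_input = presyn_activity @ weights.T   # shape (n_time, n_cells)
    return float(total_input.mean())

# Toy example: 3 cells, 4 inputs, 5 time steps.
weights = np.ones((3, 4))
activity = 0.5 * np.ones((5, 4))
field = mean_input_field(weights, activity)
```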

      (5) Given the 2D structure of the network input it perhaps isn't surprising that the network generates 2D representations and this may have little to do with its 1D connectivity. The finding that the networks maintain coordinated grids when recurrent connections are switched off supports my initial concern and the authors' explanation, to me at least, remains confusing. I think it would be helpful to consider that the connectivity is specifically important for establishing the coordinated grid firing, but that the online network does not require attractor dynamics to generate coordinated grid firing.

      This point is related to (1) and (3). We agree with the reviewer that the input lies within a 2D manifold, but this is not something that the network has to find out because it receives one datapoint of information at a time. This alone is not enough to form aligned grid cells, since each grid cell can find a roughly equivalent equilibrium in a different direction. It is only the constraint imposed by the recurrent collaterals that aligns grid maps, and, as we show, this constraint does not need to be constructed ad hoc to work on 2D, as previously thought. When recurrent connections are switched off, the system evolves toward unaligned grid maps, with larger spacing and lower gridness. Regarding the results obtained after modifying the network and turning off learning, we think they have a very limited scope (in this case showing the contractive effect of recurrent collaterals on grid spacing), given that the system is artificially being kept out of its natural equilibrium.

      (6) Clarity of the introduction. This is somewhat clearer, but I wonder if it would be hard for someone not familiar with the literature to accurately appreciate the key points.

      We have made our best effort to improve the clarity of the introduction.

      (7) Remapping. I'm not sure why this is ill posed. It seems the proposed model can not account for remapping results (e.g. Fyhn et al. 2007). Perhaps the authors could just clearly state this as a limitation of the model (or show that it can do this).

      We view our model as perfectly consistent with Fyhn et al, 2007. Remapping is not triggered by the network itself, though, but rather by a re-arrangement of the inputs requiring the network to learn new associations. Different simulations of the same model with identical parameters can be interpreted as remapping experiments.

      Reviewer #3 (Public Review):

      Summary:

      The paper proposes an alternative to the attractor hypothesis, as an explanation for the fact that grid cell population activity patterns (within a module) span a toroidal manifold. The proposal is based on a class of models that were extensively studied in the past, in which grid cells are driven by synaptic inputs from place cells in the hippocampus. The synapses are updated according to a Hebbian plasticity rule. Combined with an adaptation mechanism, this leads to patterning of the inputs from place cells to grid cells such that the spatial activity patterns are organized as an array of localized firing fields with hexagonal order. I refer to these models below as feedforward models.

      It has already been shown by Si, Kropff, and Treves in 2012 that recurrent connections between grid cells can lead to alignment of their spatial response patterns. This idea was revisited by Urdapilleta, Si, and Treves in 2017. Thus, it should already be clear that in such models, the population activity pattern spans a manifold with toroidal topology. The main new contributions in the present paper are (i) in considering a form of recurrent connectivity that was not directly addressed before, (ii) in applying topological analysis to simulations of the model, and (iii) in interpreting the results as a potential explanation for the observations of Gardner et al.

      We wanted to note that we do not see this paper as proposing an alternative to the attractor hypothesis, given that we use attractor networks, but rather as an exploration of possibilities not yet visited by this hypothesis.

      Strengths:

      The exploration of learning in a feedforward model, when recurrent connectivity in the grid cell layer is structured in a ring topology, is interesting. The insight that this not only aligns the grid cells in a common direction but also creates a correspondence between their intrinsic coordinate (in terms of the ring-like recurrent connectivity) and their tuning on the torus is interesting as well, and the paper as a whole may influence future theoretical thinking on the mechanisms giving rise to the properties of grid cells.

      Weaknesses:

      (1) In Si, Kropff and Treves (2012) recurrent connectivity was dependent on the head direction tuning, in addition to the location on a 2d plane, and therefore involved a ring structure. Urdapilleta, Si, and Treves considered connectivity that depends on the distance on a 2d plane. The novelty here is that the initial connectivity is structured uniquely according to latent coordinates residing on a ring.

      The recurrent architectures in the cited works are complex and require arranging cells in a 2D manifold to calculate connectivity based on their relative 2D position. In other words, the 2D structure is imprinted in the architecture, as in our 2D condition. In this work the network is much simpler and only requires neighboring relations in 1D. Such relationships have been shown to spontaneously emerge in the hippocampal formation (Pastalkova et al, 2008; Gonzalo Cogno et al, 2024).

      (2) The paper refers to the initial connectivity within the grid cell layer as one that produces an attractor. However, it is not shown that this connectivity, on its own, indeed sustains persistent attractor states. Furthermore, it is not clear whether this is even necessary to obtain the results of the model. It seems possible that (possibly weaker) connections with ring topology, that do not produce attractor dynamics but induce correlations between neurons with similar locations on the ring would be sufficient to align the spatial response patterns during the learning of feedforward weights.

      Regarding the first part of the comment, the recurrent collaterals create one or at times multiple bumps of activity in the network so that neighboring (interconnected) cells activate together. An initial random state of activity rapidly falls into this dynamic, constrained by the attractor. To us this is not surprising given that this connectivity is the classical means of creating a continuous attractor. Perhaps there is some deeper meaning in this comment that we are not fully grasping.

      Regarding the second part of the comment, we fully agree with the reviewer. We are presenting what so far is the simplest connectivity that can align grid maps, but by no means do we claim that it is the simplest possible one. Regarding weaker connections with ring topology, we show in Figure S2 that a ring attractor with too weak or too strong connections is incapable of aligning grids, since a balance between feedforward and feedback inputs is required.

      (3) Given that all the grid cells are driven by an input from place cells that span a 2d manifold, and that the activity in the grid cell network settles on a steady state which is uniquely determined by the inputs, it is expected that the manifold of activity states in the grid cell layer, corresponding to inputs that locally span a 2d surface, would also locally span a 2d plane. The result is not surprising. My understanding is that this result is derived as a prerequisite for the topological analysis, and it is therefore quite technical.

      We understand that the reviewer is referring to the motivation behind studying local dimensionality. We agree that the topological analysis approach is quite technical, but it provides unique insights. The theorem of closed surfaces, which allows us to deduce a toroidal topology from Betti numbers (1,2,1), only applies to closed surfaces. One thus needs to show that the point cloud is a surface (local dimensionality of 2) and is closed (no borders or singularities). If borders or singularities were present, a toroidal topology could not be claimed from these Betti numbers. Thus, it is a crucial step of the analysis.
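      The deduction from Betti numbers to a torus can be made explicit with a small sketch. It assumes the point cloud samples a connected, closed, orientable surface, for which the Betti numbers are (1, 2g, 1) with g the genus; the function names are hypothetical. Orientability is an assumption here: over coefficient fields such as Z/2, the Klein bottle also has Betti numbers (1, 2, 1).

```python
def euler_characteristic(betti):
    """Alternating sum of Betti numbers (b0, b1, b2) of a surface."""
    b0, b1, b2 = betti
    return b0 - b1 + b2

def orientable_genus_from_betti(betti):
    """Genus of a connected, closed, orientable surface with these Betti numbers.

    Such surfaces have Betti numbers (1, 2g, 1). Returns None when the input
    is incompatible with that form, e.g. for a surface with a border.
    """
    b0, b1, b2 = betti
    if b0 != 1 or b2 != 1 or b1 < 0 or b1 % 2 != 0:
        return None
    return b1 // 2

genus = orientable_genus_from_betti((1, 2, 1))   # genus 1 corresponds to a torus
chi = euler_characteristic((1, 2, 1))            # Euler characteristic of a torus
```

      A cylinder or a disc, which have borders, would not pass this check, which is why closedness and local two-dimensionality need to be established before the Betti numbers are interpreted.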

      (4) The modeling is all done in planar 2d environments, where the feedforward learning mechanism promotes the emergence of a hexagonal pattern in the single neuron tuning curve. Under the scenario in which grid cell responses are aligned (i.e. all neurons develop spatial patterns with the same spacing and orientation) it is already quite clear, even without any topological analysis that the emerging topology of the population activity is a torus.

      However, the toroidal topology of grid cells in reality has been observed by Gardner et al also in the wagon wheel environment, in sleep, and close to boundaries (whereas here the analysis is restricted to a sub-region of the environment, far away from the walls). There is substantial evidence based on pairwise correlations that it persists also in various other situations, in which the spatial response pattern is not a hexagonal firing pattern. It is not clear that the mechanism proposed in the present paper would generate toroidal topology of the population activity in more complex environments. In fact, it seems likely that it will not do so, and this is not explored in the manuscript.

      We agree that our work was constrained to exploration in 2D and that the situations posed by the reviewer are challenging, but we do not see them as insurmountable. The wagon wheel shows a preservation of toroidal topology locally, where the behavior of the animal is rather 2-dimensional. Globally, hexagonal maps are lost, which is compatible with some flexibility in the way grid maps are formed. If sleep meant that all inputs are turned off, our model would predict a dynamic dictated by the architecture (1D for the ring attractor, for example), but we do not really know that this is the case. In the future, we intend to explore predictive activity along the linear attractor, which could result both in path integration and in some level of preservation of the activity when inputs are completely turned off.

      Regarding boundaries, as we have argued before, the cited work chooses to filter away what looks like more than half of the overall explained variance through PCA, and this is only before applying a non-linear dimensionality reduction algorithm. It is specifically shown that the analyzed components are the ones with global periodicity throughout the environment. Thus, it is conceivable that through this approach, local irregularities found only at the borders are disregarded in favor of a clearer global picture. While using a different methodology, our approach follows a similar spirit, albeit with far less noisy data.

      (5) Moreover, the recent work of Gardner et al. demonstrated much more than the preservation of the topology in the different environments and in sleep: the toroidal tuning curves of individual neurons remained the same in different environments. Previous works, that analyzed pairwise correlations under hippocampal inactivation and various other manipulations, also pointed towards the same conclusion. Thus, the same population activity patterns are expressed in many different conditions. In the present model, this preservation across environments is not expected. Moreover, the results of Figure 6 suggest that even across distinct rectangular environments, toroidal tuning curves will not be preserved, because there are multiple possible arrangements of the phases on the torus which emerge in different simulations.

      We agree with this observation. A symmetry in our implementation means that only ~50% of the time the system falls into the preferred solution, and the rest of the time it falls into other local minima. Whether this result is at odds with current observations can be debated on the basis of probabilities. However, we believe that the symmetry we found is purely circumstantial, and that it can be broken by elements such as head direction modulation or other ingredients used to achieve path integration. In other words, we acknowledge that symmetry is an issue of the implementation we show here (which has been kept as simple as possible to serve as a proof-of-principle) but we do not think that it is a defining feature of flexible attractors in general. We expect that future implementations that incorporate path integration capabilities will not present this kind of symmetry in the space of solutions.

      Regarding the rigid phase translation across modalities, while this effect is very clear in Gardner et al, it is less so in other datasets. The analyses shown in Hermansen et al (2024) can rather be interpreted as lying somewhere between perfect rigid translation and fully randomized phases across navigation modalities.

      (6) In real grid cells, there is a dense and fairly uniform representation of all phases (see the toroidal tuning of grid cells measured by Gardner et al). Thus, the highly clustered phases obtained in the model (Fig. S1) seem incompatible with the experimental reality. I suspect that this may be related to the difficulty in identifying the topology of a torus in persistent homology analysis based on the transpose of the matrix M.

      We partly agree with this observation and note that a pattern of ordered phases is an issue not only for the 1D attractor but also for the 2D one, which appears much more uniform than in experimental data. The low number of neurons we used for computational economy and the full connectivity could be key ingredients to generate these phase patterns. To show that this is not a defining feature of flexible attractors, apart from the fact that these patterns appear also with non-flexible 2D architectures, we included in Figure S1 simulations with ‘fragmented 1D’ architectures. In this case the architecture is a superposition of 20 random 1D stripe-like attractors. While the alignment of maps achieved with this architecture is almost at the same level as the one obtained with 1D and 2D attractors, the phases are much more similar to what has been observed experimentally, and less uniform than what is obtained with 2D attractors.

      (7) The motivations stated in the introduction came across to me as weak. As now acknowledged in the manuscript, attractor models can be fully compatible with distortions of the hexagonal spatial response patterns - they become incompatible with these spatial distortions only if one adopts a highly naive and implausible hypothesis that the attractor state is updated only by path integration. While attractor models are compatible with distortions of the spatial response pattern, it is very difficult to explain why the population activity patterns are tightly preserved across multiple conditions without a rigid two-dimensional attractor structure. This strong prediction of attractor models withstood many experimental tests - in fact, I am not aware of any data set where substantial distortions of the toroidal activity manifold were observed, despite many attempts to challenge the model. This is the main motivation for attractor models. The present model does not explain these features, yet it also does not directly offer an explanation for distortions in the spatial response pattern.

      Some interesting examples are experiments in 3D, where grid cells presumably communicate with each other through the same recurrent collaterals, but global periodicity is lost and only some local order is preserved even away from boundaries (Ginosar et al, 2021; Grieves et al, 2021). While these datasets have not been explored using topological analysis, they serve as strong motivators to understanding 2D grid cells as one equilibrium solution that arises under some set of constraints, but belongs to a wider space of possible solutions that may arise as well under more flexible constraints. Even (and especially) if one adheres to the hypothesis that grid cells are pre-wired into a 2D torus, a concept like flexible attractors might become useful to understand how their activity is rendered in 3D. Another strong motivation is our lack of understanding of how a perfectly balanced 2D structure is formed and maintained. Simpler architectures could be thought of as alternatives, but also as an intermediate step towards it.

      Regarding the rigid phase translation across modalities, while this effect is very clear in Gardner et al, it is less so in other datasets. The analyses shown in Hermansen et al (2024) can rather be interpreted as lying somewhere between perfect rigid translation and fully randomized phases.

      In a separate point, although it might not be strictly related to the comment, we do not fully share the idea that persistent activity patterns during sleep are necessary or sufficient conditions for attractor dynamics, although we do agree that attractors could be the mechanism behind them and any alternative is at least as complex as attractors. On the necessity side, attractors in the hippocampus are not constantly engaged (Wills et al, 2005). For sufficiency, one should prove that no other network is capable of reproducing the phenomenon, and to our best knowledge we are still far from that point.

      (8) There is also some weakness in the mathematical description of the dynamics. Mathematical equations are formulated in discrete time steps, without a clear interpretation in terms of biophysically relevant time scales. It appears that there are no terms in the dynamics associated with an intrinsic time scale of the neurons or the synapses (a leak time constant and/or synaptic time constants). I generally favor simple models without lots of complexity, yet within this style of modelling, the formulation adopted in this manuscript is unconventional, introducing a difficulty in interpreting synaptic weights as being weak or strong, and a difficulty in interpreting the model in the context of other studies.

      We chose to keep the model as simple as possible and in line with the previous publications that developed it. However, we see the usefulness of putting it in what in the meantime has become a canonical framework. Fortunately, this has been done by D’Albis and Kempter (2017). In our simplified version of the model there is no leak term and adaptation on its own brings down activity in the absence of input, but we agree that such a term could be added, albeit not without modifying all other network parameters.

      In my view, the weaknesses discussed above limit the ability of the model, as it stands, to offer a compelling explanation for the toroidal topology of grid cell population activity patterns, and especially the rigidity of the manifold across environments and behavioral states. Still, the work offers an interesting way of thinking on how the toroidal topology might emerge.

      Reviewer 1:

      Reviewer #1 (Recommendations For The Authors):

      See comments above. In addition:

      (1) Abstract: '...interconnected by a two-dimensional attractor guided by path integration'. This is unclear. I think the intended meaning might be along the lines of '...their being computed by a 2D continuous attractor that performs path integration'?

      'path integration allowing for no deviations from the hexagonal pattern' This is incorrect. Local modulation of the gain of the speed input to a standard CAN would distort the grid pattern.

      'Using topological data analysis, we show that the resulting population activity is a sample of a torus' Activity in the model?

      'More generally, our results represent a proof of principle against the intuition that the architecture and the representation manifold of an attractor are topological objects of the same dimensionality, with implications to the study of attractor networks across the brain' I guess one might hold this intuition, but it strikes me as obvious that if you impose a sufficiently strong n-dimensional input on a network then its activity could have the same dimensionality. I don't really see this as being a point worth highlighting. Perhaps the more interesting point is that during learning the recurrent connectivity aligns the grid fields of neurons in the network, and this may be a specific function of the 1D attractor dynamics, although I don't think the authors have made this point convincingly.

      'The flexibility of this low dimensional attractor allows it to negotiate the geometry of the representation manifold with the feedforward inputs'. See above for comments on the use of 'negotiate'.

      'while the ensemble of maps preserves features of the network architecture'. I don't understand this. What is the 'ensemble of maps' and what are the features referred to?

      We have reviewed the abstract considering these points. Regarding the ‘strong n-dimensional input’, we want to point out that it is not the input itself that generates a torus (the no attractor condition does not lead to a torus) but rather the interplay between the input and the attractor.

      Regarding ‘Perhaps the more interesting point …’, we do not fully understand how this sentence deviates from our own conclusions. We show here that a strong n-dimensional input is not enough to align grid cells (produce an n-torus); it is the interplay between inputs and attractor dynamics that does so, even if the attractor is not n-dimensional in terms of architecture.

      The ensemble of maps refers to the transpose of the population activity matrix, where each point in the cloud is a map, and the features refer to the persistent homology.
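      The distinction between the two point clouds can be illustrated with a minimal sketch. It assumes a population activity matrix with one row per spatial bin and one column per neuron; this orientation, the function name and the array sizes are illustrative assumptions, not taken from the manuscript.

```python
import numpy as np

def population_and_map_clouds(rate_maps):
    """Build both point clouds from rate maps of shape (n_neurons, height, width)."""
    n_neurons = rate_maps.shape[0]
    # Assumed layout: M has one row per spatial bin and one column per neuron.
    M = rate_maps.reshape(n_neurons, -1).T
    population_cloud = M   # each point is the population vector at one position
    map_cloud = M.T        # each point is the full spatial map of one neuron
    return population_cloud, map_cloud

rng = np.random.default_rng(0)
example_maps = rng.random((100, 20, 20))   # hypothetical: 100 neurons, 20 x 20 bins
pop_cloud, ens_cloud = population_and_map_clouds(example_maps)
```

      Persistent homology would then be computed on either cloud in its original high-dimensional space; that step is omitted here.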

      (2) The manuscript still fails to clarify the difference between a model that path integrates in two dimensions and a model that simply represents information with a given dimensionality. The argument that it's surprising that a network with 1D architecture represents a higher dimensional input strikes me as incorrect and an unnecessary attempt to argue for conceptual importance. At least to me this isn't surprising. It would be surprising if the 1D network could path integrate but this doesn't seem to be the case.

      In response to the reviewer’s concerns, we have made clear in the introduction and discussion that this model has no path integration capabilities, although we aim to develop a model capable of path integration using the kind of simple architecture presented here. We want to highlight here that equating attractor dynamics with path integration would be a conceptual mistake.

      (3) Other wording also seems to make unnecessary conceptual claims. E.g. The repeated use of 'negotiate' implies some degree of intelligence, or at least an exchange of information, that isn't shown to exist. I wonder if more precise language could be used? As I understand it the dimensionality is bounded by the inputs on the one hand, and the network connectivity on the other, with the actual dimensionality being a function of the recurrent and feedforward synaptic weights. There's clearly some role for the relative weights and the properties of plasticity rules, but I don't see any evidence for a negotiation.

      An interesting observation in Figure S2 is that grid maps are aligned only if the relative strength of feedforward and recurrent inputs is similar. If one of them can impose over the other, grid maps do not align. This equilibrium can metaphorically be thought of as a negotiation instance, where the negotiation is an emergent property of the system rather than something happening at an individual synapse.


      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Reviewer #1 (Recommendations For The Authors):

      Major

      (1) What is the evidence that, after training, the 1D network maintains its attractor dynamics when feedforward inputs are active? If the claim is that it does then it's important to provide evidence, e.g. responses to perturbations, or other tests. The alternative is that after training the recurrent inputs are drowned out by the feed forward spatial inputs.

      We agree with the reviewer on the importance of this point. In our model, networks are always learning, and the population activity represented by aligned grid maps in a trained network is a dynamic equilibrium that emerges from the interplay between feedforward and collateral constraints. If Hebbian learning is turned off, one gets a snapshot of the network at that moment. We now show in Fig. S3 that in a trained network without feedforward Hebbian learning the removal of recurrent collaterals results in a slight increase in gridness and spacing. The expansion is due to the fact that, as we argue in the Results section, the attractor has a contractive effect on grid maps, which could relate to observations in novel environments (Barry et al, 2007). If Hebbian learning is turned on in the same situation, the maps, no longer constrained by the attractor, drift toward the equilibrium solution of the ‘No attractor’ condition, with significantly larger spacing, no alignment and lower individual gridness. Thus, the attractor is the force preventing them from doing so when feedforward Hebbian learning is on.

      These observations point to the key role played by the attractor not only in forming but also in sustaining grid activity. The dynamic equilibrium framework fits well-known properties of the system, such as its capacity to recalibrate very fast (Jayakumar et al, 2019), although this particular feature cannot be modeled with the current version of our model, which lacks path integration capabilities.

      (2) It would be useful to include additional control conditions for Figure 2 to test the hypothesis that it is simply connectivity, rather than attractor dynamics, that drives alignment.

      This could be achieved by randomly assigning strengths to the recurrent connections, e.g. drawing from exponential or Gaussian distributions.

      We agree and have included Fig. S2b-d, showing that the same distribution of collateral input weights entering each neuron, but lacking the 1D structure provided by the attractor, does not align grid maps. This is achieved by shuffling rows in the connectivity matrix, while avoiding self-connections to make the comparison fair (self-connections substantially alter the dynamics of the network, making it much more rigid). We observed that individual grid maps have very low gridness levels, even lower than in the no-attractor condition. In contrast, they have levels of population gridness slightly higher than in the no-attractor condition, but closer to 0 than to levels achieved with attractors. Our interpretation of these results is that irregular connectivity achieves some alignment in a few arbitrary directions and/or locations, which improves the coordination between maps at the expense of impairing rather than improving hexagonal responses of individual cells. Such observations stand in clear contrast to what is observed with continuous attractors with an orderly architecture.

      These results suggest that it is the structure of the attractor that allows grid cells to be aligned rather than the mere presence of recurrent collateral connections.
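      The shuffling control can be sketched as follows. This is only an illustration under stated assumptions: W[i, j] is taken to be the weight from neuron j onto neuron i, the diagonal holds the self-connections, and the function name is hypothetical. It is not the authors' code.

```python
import numpy as np

def shuffle_incoming_weights(W, rng):
    """Permute each neuron's incoming weights among the other neurons.

    Assumes W[i, j] is the weight from neuron j onto neuron i. The diagonal
    (self-connections) is left untouched, so no weight is moved onto it.
    """
    n = W.shape[0]
    W_shuffled = W.copy()
    for i in range(n):
        others = np.delete(np.arange(n), i)          # all presynaptic cells except i
        W_shuffled[i, others] = rng.permutation(W[i, others])
    return W_shuffled
```

      Each neuron keeps exactly the same set of incoming weights, and therefore the same total input strength, while the ordered 1D structure across neurons is destroyed.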

      (3) It seems conceivable that once trained the recurrent connections would no longer be required for alignment. Can this be evaluated by considering what happens if the recurrent connections are turned off after training (or slowly turned off during training)? Does the network continue to generate aligned grid fields?

      This point has elements in common with point 1. As we argued in that response, the attractor has two main effects on grid maps: it aligns them and it contracts them. If the attractor is turned off, feedforward Hebbian learning progressively drives maps toward the solution obtained for the ‘no attractor’ condition, characterized by maps with larger spacing, poorer gridness and lack of alignment.

      (4) After training what is the relative strength of the recurrent and feedforward inputs to each neuron?

      Both recurrent and feedforward synaptic-strength matrices are normalized throughout training, so that the overall incoming synaptic strength to each neuron is invariant. Because of this, although individual feed-forward and recurrent input fields vary dynamically, their average is constant, with the exception of the very first instances of the simulation, before a stable regime is reached in grid-cell activity levels. We have included Fig. S2d, showing the dynamics of feedforward and recurrent mean fields throughout learning as well as their ratio. In addition, Fig. S2a shows that the strength of recurrent relative to feedforward inputs is an important parameter, since alignment is only obtained in an intermediate range of ratios.
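      A minimal sketch of this kind of normalization and of the ratio of mean fields is given below. The choice of an L2 norm, the row-wise layout (one row per postsynaptic neuron) and the function names are assumptions made for illustration; the response does not specify them.

```python
import numpy as np

def normalize_incoming(W, target=1.0):
    """Rescale each row so every neuron's incoming weights have norm `target`.

    Assumes one row per postsynaptic neuron and an L2 norm (an assumption).
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    norms[norms == 0] = 1.0            # leave all-zero rows unchanged
    return target * W / norms

def mean_field_ratio(W_rec, W_ff, grid_rates, input_rates):
    """Ratio of the mean recurrent field to the mean feedforward field."""
    recurrent_field = W_rec @ grid_rates
    feedforward_field = W_ff @ input_rates
    return recurrent_field.mean() / feedforward_field.mean()
```

      Applying normalize_incoming after every Hebbian update would keep the total incoming strength per neuron fixed while individual weights continue to change.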

      (5) It would be helpful to also evaluate the low dimensional structure of the input to the network. Assuming it has a 2D structure, as it represents 2D space, can an explanation be provided for why it is surprising that the trained network also encodes activity with a 2D manifold? It strikes me that the more interesting finding might relate to alignment of the grids rather than claims about a 1D attractor encoding a 2D representation. Either way, stronger evidence and clearer discussion would be helpful.

      The reviewer is correct in assuming that the input has a 2D structure, that can be represented by a sheet embedded in a high dimensional space and thus has the Betti numbers [1,0,0]. The surprising element in our results is that we are showing for the first time that the population activity of an attractor network is constrained to a manifold that results from the negotiation between the architecture of the attractor and the inputs, and does not merely reflect the former as previously assumed. In this sense, the alignment of grid cells by a 1D attractor is an instance of the more general case that 1D attractors can encode 2D representations.

      It is certainly the case that the 2D input is a strong constraint pushing population activity toward a 2D manifold. However, the final form of the 2D manifold is strongly constrained by the attractor, as shown by the contrast with the no-attractor condition (a 2D sheet, as in the input, vs a torus when the attractor is present). The 1D attractor is able to flexibly adapt to the constraint posed by the inputs while doing its job (as demonstrated in previous points), which results in 2D grid maps aligned by a 1D attractor. Generally speaking, this work provides a proof of principle demonstrating that the topology of the attractor architecture and the manifold of the population activity space need not be identical, as previously widely assumed by the attractor community, and need not even have the same dimensionality. Instead, a single architecture can potentially be applied to many purposes. Hence, our work provides a valuable new perspective that applies to the study of attractors throughout the brain.
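      The topological signatures discussed here can be summarized in a short sketch. The Betti numbers of the sheet are those quoted above; those of the ring and the torus are the standard values. The helper name is hypothetical.

```python
def euler_characteristic(betti):
    """Alternating sum of Betti numbers: b0 - b1 + b2 - ..."""
    return sum((-1) ** k * b for k, b in enumerate(betti))

# Betti numbers [b0, b1, b2]: connected components, independent loops, enclosed cavities.
SHEET = [1, 0, 0]   # 2D sheet, as in the input and the 'no attractor' condition
RING = [1, 1, 0]    # 1D ring attractor in the absence of feedforward input
TORUS = [1, 2, 1]   # population activity when input and attractor interact

for name, betti in [("sheet", SHEET), ("ring", RING), ("torus", TORUS)]:
    print(name, betti, euler_characteristic(betti))
```

      The sheet and the torus are both two-dimensional, yet they differ in b1 and b2, which is what persistent homology is used to detect.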

      (6) The introduction should be clearer about the different types of grid model and the computations they implement. E.g. The authors' previous model generates grid fields from spatial inputs, but if my understanding is correct it isn't able to path integrate. By contrast, while the many 2D models with continuous attractor dynamics also generate grid representations, they do so by path integration mechanisms that are computationally distinct from the spatial transformation implemented by feedforward models (see also general comments above).

      We agree with the reviewer and have made this point explicit in the introduction.

      (7) A prediction from continuous attractor models is that when place cells remap the low dimensional manifold of the grid activity is unaffected, except that the location of the activity bump is moved. It strikes me as important to test whether this is the case for the model presented here (my intuition is that it won't be, but it would be important to establish either way).

      We want to emphasize that our model is a continuous attractor model, so the question regarding the difference between what our model and continuous attractor network models predict is an ill-posed one. One of our main conclusions is precisely that attractors can work in a wider spectrum of ways than previously thought.

      For lack of a better definition, our multiple simulations could be thought of as training in different arenas. It is true that in our model maps take time to form, but this is also the case in novel environments (Barry et al, 2007), and continuous attractor models exclusively or strongly guided by self motion cues struggle to replicate this phenomenon. We show that the current version of our model accepts multiple solutions (in practice four but conceptually countably infinite), all of them resulting in a torus for the population activity (i.e. the same topology or low dimensional manifold). It is not clear to us how easy it would be to differentiate between most of these solutions in experimental data, with only incomplete information. This said, incorporating a symmetry-breaking ingredient to the model, for example related to head direction modulation, could perhaps lead to the prevalence of a single type of solution. We intend to explore this possibility in the future in order to add path-integration capabilities to the system, as described in the discussion.

      (8) The Discussion implies that 1D networks could perform path integration in a manner similar to 2D networks. This is a strong claim but isn't supported by evidence in the study. I suggest either providing evidence that this is the case for models of this kind or replacing it with a more careful discussion of the issue.

      The current version of our model has no path integration capabilities, as is now made explicit in the Introduction and Discussion. In addition, we have now made clear that the idea that path integration could perhaps be implemented using 1D networks is, although reasonable, purely speculative.

      Minor

      (1) Introduction. 'direct excitatory communication between them'. Suggest rewording to 'local synaptic interactions', as communication can also be purely inhibitory (e.g. Burak and Fiete, 2009) or indirect by excitation of local interneurons (e.g. Pastoll et al., Neuron, 2013).

      We agree and have adopted this phrasing.

      (2) The decision to focus the topology analysis on the 60 cm wide central square appears somewhat arbitrary. Are the irregularities referred to a property of the trained networks or would they also emerge with analysis of simulated ideal data? Can more justification be expanded and supplementary analyses be shown when the whole arena is used?

      In practical terms, a subsampling of the data to around half was needed because the persistent homology packages struggle to handle large amounts of data, especially in the calculation of H2. We decided to cut a portion of contiguous pixels in the open field at least larger than the hexagonal tile representing the whole grid population period (as represented in Figure 6). Leaving the borders aside was a logical choice since it is known that the solution at the borders is particularly influenced by the speed anisotropy of the virtual rat (see Si, Kropff & Treves, 2012), in a way that mimics how borders locally influence grid maps in actual rats (Krupic et al, 2015). The specific way in which our virtual rat handles borders is arbitrary and might not generalize. A second issue around borders is that maps are differently affected by incomplete smoothing, although this issue does not apply to our data because we did not smooth across neighboring pixels. In sum, considering the central 60 cm wide square was sufficient to contain the whole torus and a reasonable compromise that would allow us to perform all analyses in the part of the environment less influenced by boundaries.
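      The cropping step can be sketched as follows. The arena size and bin count in the example are hypothetical values chosen for illustration (they are not stated in this response), and the function name is an assumption.

```python
import numpy as np

def crop_central_square(rate_maps, arena_cm, crop_cm=60.0):
    """Keep only the central crop_cm x crop_cm region of square rate maps.

    rate_maps: array whose last two axes are equal-sized spatial bins.
    arena_cm: side length of the full arena (an assumed input).
    """
    n_bins = rate_maps.shape[-1]
    bin_cm = arena_cm / n_bins
    crop_bins = int(round(crop_cm / bin_cm))
    start = (n_bins - crop_bins) // 2
    stop = start + crop_bins
    return rate_maps[..., start:stop, start:stop]

# Hypothetical example: 5 maps of a 100 cm arena sampled on a 50 x 50 grid.
example_maps = np.arange(5 * 50 * 50, dtype=float).reshape(5, 50, 50)
central_maps = crop_central_square(example_maps, arena_cm=100.0)
```

      The cropped maps would then be flattened into the point cloud passed to the persistent homology computation, which reduces the number of points and leaves out the bins closest to the borders.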

      (3) It could help the general reader to briefly explain what a persistence diagram is.

      This is developed in the Appendix, but we have now added a reference to it and a brief description in the main text.

      (4) For the analyses in Figure 3-4, and separately for Figure 5, it might help the reader to provide visualizations of the low dimensional point cloud.

      All these calculations take place in the original high-dimensional point cloud. Doing them in a reduced space would be incorrect because there is no dimensionality reduction technique that guarantees the preservation of topology. In Figure 7 we reduce the dimensionality of data but emphasize that it is only done for visualization purposes, not to characterize topology. We also point out in this Figure that the same non-linear dimensionality reduction technique applied to objects with identical topology yields a wide variety of visualizations, some of them clear and some less clear. This observation further exemplifies why one cannot assume that a dimensionality-reduction technique preserves topology, even for a low-dimensional object embedded in a high-dimensional space.

      (5) The detailed comparison of the dynamics of each model is limited by the number of data points. Why not address this by new simulations with more neurons?

      We are not sure we understand this comment. In Figure 2, the dynamics for each model are markedly different. These are averages over 100 simulations. We are not sure what benefit would be obtained from adding more neurons. Before starting this work we searched for the minimal number of neurons that would result in convergence to an aligned solution in 2D networks, which we found to be around 100. Optimizing this parameter in advance was important to reduce computational costs throughout our work.

      (6) Could the variability in Figure 7 also be addressed by increasing the number of data points?

      As we argued in a previous point, there is no reason to expect preservation of topology after applying Isomap. We believe this lack of topology preservation to be the main driver of variability.

      (7) Page/line numbers would be useful.

      We agree. However, the text is curated by bioRxiv which, to the best of our knowledge, does not include them.

      Reviewer 2:

      Reviewer #2 (Recommendations For The Authors):

      (1) I highly suggest that the author rewrite some parts of the Results. There are lots of details which should be put into the Methods part, for example, the implementation details of the network, the analysis details of the toroidal topology, etc. It will be better to focus on the results part first in each section, and then introduce some of the key details of achieving these results, to improve the readability of the work.

      This suggestion contrasts with that of Reviewer #1. As a compromise, we decided to include in the Results section only methodological details that are key to understanding the conclusions, and describe everything else in the Methods section.

      (2) 'Progressive increase in gridness and decrease in spacing across days have been observed in animals familiarizing with a novel environment...' From Fig.2c I didn't see much decrease. The authors may need to carry out some statistical test to prove this. Moreover, even if the changes are significant, this might not be the consequence of the excitatory collateral constraint. To prove this, the authors may need to offer some direct evidence.

      We agree that the decrease is not evident in this figure due to the scale, so we are adding the correlation in the figure caption as proof. In addition, several arguments, some related to new analyses, demonstrate that the attractor contracts grid maps. First, the ‘no attractor’ condition has a markedly larger spacing compared to all other conditions (Fig. 2a). We also now show that spacing monotonically decreases with the strength of recurrent relative to feedforward weights, in a way that is rather independent of gridness (Fig. S2a). Second, as we now show in Fig. S2b-d, simulations with a shuffled 1D attractor, such that the sum of input synapses to each neuron is the same as in the 1D condition but no structure is present, lead to a spacing that is mid-way between the ‘no attractor’ condition and the conditions with attractors. Third, as we now show in Fig. S3a, turning off both recurrent connections and feedforward learning in a trained network results in a small increase in spacing. Fourth, as we now show in Fig. S3b, turning off recurrent connections while feedforward learning is kept on increases grid spacing to levels comparable to those of the ‘no attractor’ condition. All these elements support a role of the attractor in contracting grid spacing.

      (3) Some of the items need to be introduced first before going into details in the paper, for instance, the stripe-like attractor network, the Betti number, etc.

      We have added in the Results section a brief description and references to full developments in the Appendix.

      Reviewer 3 (Public Review):

      (1) It is not clear to me that the proposal here is fundamentally new. In Si, Kropff and Treves (2012) recurrent connectivity was dependent on the head direction tuning and thus had a ring structure. Urdapilleta, Si, and Treves considered connectivity that depends on the distance on a 2d plane.

      In the work of Si et al connectivity is constructed ad-hoc for conjunctive cells to represent a torus, it depends on head-directionality but also on the distance in a 2D plane. The topology of this architecture has not been assessed, but it is close to the typical 2D ‘rigid’ constraint. In the work of Urdapilleta et al, the network is a simple 2D one. The difference with our work is that we focus on the topology of the recurrent network and do not use head-direction modulation. In this context, we prove that a 1D network is enough to align grid cells and, more generally, we provide a proof of principle that the topology of the architecture and the representation space of an attractor network do not need to be identical, as previously assumed by the attractor community. These two important points were neither argued, speculated nor self-evident from the cited works.

      (2) The paper refers to the connectivity within the grid cell layer as an attractor. However, would this connectivity, on its own, indeed sustain persistent attractor states? This is not examined in the paper. Furthermore, is this even necessary to obtain the results in the model? Perhaps weak connections that do not produce an attractor would be sufficient to align the spatial response patterns during the learning of feedforward weights, and reproduce the results? In general, there is no exploration of how the strength of collateral interactions affects the outcome.

      The reviewer makes several important points. Local excitation combined with global inhibition is the archetypical architecture for continuous attractors (see for example Knierim and Zhang, Annual review of neuroscience, 2012). Thus, in the absence of feedforward input, we observe a bump of activity. As in all continuous attractors, this bump is not necessarily ‘persistent’ and instead is free to move along the attractor.

      We cannot prove that there is not a simpler architecture that has the same effect as our 1D or 1DL conditions, and we think that there are some interesting candidates to investigate in the future. What we now prove in new Fig. S2b-d is that it is not the strength of recurrent connections themselves, but instead the continuous attractor structure that aligns grid cells in our model. To demonstrate this, we shuffle incoming recurrent connections to each neuron in the 1D condition (while avoiding self-connections for fairness), and show that training does not lead to grid alignment. We also show in Fig. S1 that an architecture represented by 20 overlapping 1DL attractors, each formed by concatenating 10 random cells, aligns grid cells to levels slightly lower but similar to the 1D or 1DL attractors. This architecture can perhaps be considered as simpler to build in biological terms than all the others, but it is still constituted by continuous attractors.

      The strength of recurrent collaterals, or more precisely the recurrent to feedforward ratio, is crucial in our model to achieve a negotiated outcome from constraints imposed by the attractor and the inputs. We now show explicit measures of this ratio in Fig. S2, as well as examples showing that an imbalance in this ratio impairs grid alignment. When the ratio is too high or too low, both individual and population gridness are low. Interestingly, grid spacing behaves differently, decreasing monotonically with the relative strength of recurrent connections.
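      The local-excitation, global-inhibition architecture named in this response can be sketched as a connectivity matrix. The Gaussian kernel, all parameter values and the function name are illustrative assumptions rather than the model's actual definition; the periodic flag is assumed to distinguish a ring from a strip without periodicity.

```python
import numpy as np

def attractor_connectivity(n, sigma=5.0, excitation=1.0, inhibition=0.2, periodic=True):
    """1D attractor weights: Gaussian local excitation minus uniform inhibition.

    periodic=True wraps distances around a ring; periodic=False gives a strip
    with open ends. Kernel shape and parameter values are assumptions.
    """
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :]).astype(float)
    if periodic:
        d = np.minimum(d, n - d)                 # circular distance on the ring
    W = excitation * np.exp(-d ** 2 / (2.0 * sigma ** 2)) - inhibition
    np.fill_diagonal(W, 0.0)                     # no self-connections
    return W

W_ring = attractor_connectivity(100)
W_strip = attractor_connectivity(100, periodic=False)
```

      Scaling such a matrix relative to the feedforward weights is one way to explore the recurrent to feedforward ratio discussed above.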

      (3) I did not understand what is learned from the local topology analysis. Given that all the grid cells are driven by an input from place cells that spans a 2d manifold, and that the activity in the grid cell network settles on a steady state that depends only on the inputs, isn't it quite obvious that the manifold of activity in the grid cell layer would have, locally, a 2d structure?

      The dimensionality of the input is important, although not the only determinant of the topology of the activity. The recurrent collaterals are the other determinant, and their architecture is a crucial feature. For example, as we now show in Figure S2b-d, shuffled recurrent synaptic weights fail to align grid cells. In the 1D condition, if feedforward inputs were absent, the dynamics of the activity would be confined to a ring. The opposite condition is our ‘no attractor’ condition, in which activity in the grid cell layer mimics the topology of inputs, a 2D sheet (and not a torus). It is in the intermediate range, when both feedforward and recurrent inputs are important, that a negotiated solution (a torus) is achieved.

      The analyses of local dimensionality and local homology of Figure 3 are crucial steps to demonstrate toroidal topology. According to the theorem of classification of closed surfaces, global homology is not enough to uniquely define the topology of a point cloud, and thus this step cannot be skipped. The step is aimed at proving that the point cloud is indeed a closed surface.

      (4) The modeling is all done in planar 2d environments, where the feedforward learning mechanism promotes the emergence of a hexagonal pattern in the single neuron tuning curve. This, combined with the fact that all neurons develop spatial patterns with the same spacing and orientation, implies even without any topological analysis that the emerging topology of the population activity is a torus.

      We cannot agree with this intuition. In the ‘no attractor’ condition, individual maps have hexagonal symmetry with standardized spacing, but given the lack of alignment the population activity is not a closed surface and thus not a torus. It can rather be described as a 2D sheet embedded in a high dimensional space, a description that also applies to the input space.

      While it is rather evident that an ad hoc toroidal architecture folds this 2D population activity into a torus, it is less evident and rather surprising that 1D architectures have the same capability. This is the main novelty in our work.

      (5) Moreover, the recent work of Gardner et al. demonstrated much more than the preservation of the topology in the different environments and in sleep: the toroidal tuning curves of individual neurons remained the same in different environments. Previous works, that analyzed pairwise correlations under hippocampal inactivation and various other manipulations, also pointed towards the same conclusion. Thus, the same population activity patterns are expressed in many different conditions. In the present model, the results of Figure 6 suggest that even across distinct rectangular environments, toroidal tuning curves will not be preserved, because there are multiple possible arrangements of the phases on the torus which emerge in different simulations.

      We agree with the reviewer on the main point, although the recently found ring activity in the absence of sensory feedback (Gonzalo Cogno et al, 2023) suggests that what is happening in the EC is more nuanced than a pre-wired torus. Solutions in Figure 6 are different ways of folding a 1D strip into a torus, with or without the condition of periodicity in the 1D strip. Whether or not these different solutions would be discernible from one another in a practical setup is not clear to us. For example, global homology, as addressed in the Gardner paper, is the same for all these solutions. Furthermore, while our solutions of up to order 3 are highly discernible, higher order solutions, potentially achievable with other network parameters, would be impossible to discern by eye in representations similar to the ones in Figure 6. In addition, while we chose to keep our model in the simplest possible form as a clear proof of principle, new elements introduced to the model such as head directionality could break the symmetry and lead to the prevalence of one preferred solution for all simulation replicates. We plan to investigate this possibility in the future when attempting to incorporate path-integration capabilities to the model.

      (6) In real grid cells, there is a dense and fairly uniform representation of all phases (see the toroidal tuning of grid cells measured by Gardner et al). Here the distribution of phases is not shown, but Figure 7 suggests that phases are non uniformly represented, with significant clustering around a few discrete phases. This, I believe, is also the origin for the difficulty in identifying the toroidal topology based on the transpose of the matrix M: vectors representing the spatial response patterns of individual neurons are localized near the clusters, and there are only a few of them that represent other phases. Therefore, there is no dense coverage of the toroidal manifold that would exist if all phases were represented equally. This is not just a technical issue, however: there appears to be a mismatch between the results of the model and the experimental reality, in terms of the phase coverage.

      As mentioned in the results section, Figure 7 is meant for visualization purposes only, and serves more as a cautionary tale regarding the unpredictable risks of non-linear dimensionality reduction than as a proof of the organization of activity in the network. Isomap is a non-linear transformation that deforms each of our solutions in a unique way so that, while all have the topology of a torus embedded in a high dimensional space, only a few of them exhibited one of two possible toroidal visualizations in a 3D Isomap reduction. Isomap, like all other popular dimensionality reduction techniques, provides no guarantee of topology invariance. A better argument to judge the homogeneous distribution of phases is persistent homology, which identifies relatively large holes (compared to the sampling spacing) in the original manifold embedded in a high dimensional space. In our case, persistent homology identified only two holes significantly larger than noise (the two cycles of a torus) and one cavity in all conditions that included attractors. Regarding the specific distribution of phases in different conditions, however, see our reply below.

      (7) The manuscript makes several strong claims that incorrectly represent the relation between experimental data and attractor models, on one hand, and the present model on the other hand. For the latter, see the comments above. For the former, I provide a detailed list in the recommendations to the authors, but in short: the paper claims that attractor models induce rigidness in the neural activity which is incompatible with distortions seen in the spatial response patterns of grid cells. However, this claim seems to confuse distortions in the spatial response pattern, which are fully compatible with the attractor model, with distortions in the population activity patterns, which would be incompatible with the attractor model. The attractor model has withstood numerous tests showing that the population activity manifold is rigidly preserved across conditions - a strong prediction (which is not made, as far as I can see, by feedforward models). I am not aware of any data set where distortions of the population activity manifold have been identified, and the preservation has been demonstrated in many examples where the spatial response pattern is disrupted. This is the main point of two papers cited in the present manuscript: by Yoon et al, and Gardner et al.

      First of all, we would like to note that our model is a continuous attractor model. Different attractor models have different outcomes, and one of the main conclusions of our manuscript is that attractors can do a wider range of operations than previously thought.

      We agree with the reviewer that distortions in spatial activity (which speak against a purely path-integration guided attractor) should not be confused with distortions in the topology of the population activity (which would instead speak against the attractor dynamics itself). We have rephrased these observations in the manuscript. In fact, we believe that the capacity of grid cells to present distorted maps without a distortion of the population activity topology, as shown for example by Gardner and colleagues, could result from a tension between feedforward and recurrent inputs, the potential equilibriums of which our manuscript aims to characterize.

      (8) There is also some weakness in the mathematical description of the dynamics. Mathematical equations are formulated in discrete time steps, without a clear interpretation in terms of biophysically relevant time scales. It appears that there are no terms in the dynamics associated with an intrinsic time scale of the neurons or the synapses, and this introduces a difficulty in interpreting synaptic weights as being weak or strong. As mentioned above, the nature of the recurrent dynamics within the grid cell network (whether it exhibits continuous attractor behavior) is not sufficiently clear.

      We agree with the reviewer that our model is rather simple, and we value the extent to which this simplicity allows for a deep characterization. All models are simplifications, and the best model in any given setup is the one with the minimum amount of complexity necessary to describe the phenomenon under study. We believe that to understand whether or not a 1D continuous attractor architecture can result in a toroidal population activity, a biophysically detailed model, with prohibitive computational costs, would have been unnecessarily complex. This argument is not intended to discredit biophysically detailed models, which are capable of addressing a wider range of questions regarding, for example, the spiking dynamics of grid cells, which cannot be addressed by our simple model.

      Reviewer #3 (Recommendations For The Authors):

      The work points to an interesting scenario for the emergence of toroidal topology, but the interpretation of this idea should be more nuanced. I recommend reconsidering the claims about limitations of the attractor theory, and acknowledging the limitations of the present theory.

      I don't see the limitations mentioned above as a reason to reject the ideas proposed in this manuscript, for two main reasons: first, additional research might reveal a regime of parameters where some issues can be resolved (e.g. the clustering of phases). In addition, the mechanism described here might act at an early stage in development to set up initial dynamics along a toroidal manifold, while other mechanisms might be responsible for the rigidity of the toroidal manifold in an adult animal. But all this implies that the novelty in the present manuscript is weaker than implied, the ability to explain experimental observations is more limited than implied, and these limitations should be acknowledged and discussed.

      I recommend reporting on the distribution of grid cell phases and, if indeed clustered, this should be discussed. It will be helpful to explore whether this is the reason for the difficulty in identifying the toroidal topology based on the collection of spatial response patterns (using the transpose of the matrix M).

      Ideally, a more complete work would also explore in a more systematic and parametric way the influence of the recurrent connectivity's strength on the learning, and whether a toroidal manifold also emerges in non-planar environments, such as the wagon-wheel environment studied in Gardner et al.

      Part of these recommendations have been addressed in the previous points (public review). Regarding the reason why the transpose of M does not fully recapitulate architecture with our conservative classification criteria, we believe that there is no reason why it should in the first place. We view the fact that the transpose of M recapitulates some features of the architecture as a purely phenomenological observation, and we think it is important as a proof that M is not exactly the same for the different conditions. We imagined that if M matrices were exactly the same this could be due to poor spatial sampling by our bins. Knowing that they are intrinsically different is important even if the reason why they have these specific features is not fully clear to us.

      Although we do not think that the distribution of phases is related to the absence of a cavity in the transpose of M or to the four clusters found in Isomap projections, it remains an interesting question that we did not explore initially. We are now showing examples of the distribution of phases in Figure S1. We observed that in both 2D and 1D conditions phases are distributed following rather regular patterns. Whether or not these patterns are compatible with experimental observations of phase distribution is in our view debatable, given that so far state-of-the-art techniques have only allowed simultaneous recording of a small fraction of the neurons belonging to a given module. That said, we think that it is important to note that ordered phase patterns are an anecdotal outcome of our simulations rather than a necessary outcome of flexible attractors or attractors in general. To prove this point, we simulated a condition with a new architecture represented by the overlap of 20 short 1DL attractors, each recruiting 10 random neurons from the pool of 100 available ones.

      The rest of the parameters of the simulations were identical to those in the other conditions.

      By definition, the topology of this architecture has Betti numbers [20,0,0]. We show in Figure S1 that this architecture aligns grid cells, with individual and population gridness reaching slightly lower levels compared to the 1D condition. However, the distribution of phases of these grid cells has no discernible pattern. This result is an arbitrary example that serves as a proof-of-principle to show that flexible attractors can align grid cells without exhibiting ordered phases, not a full characterization of the outcome of this type of architecture, which we leave for future work. For the rest of our work, we stick to the simplest versions of 1D architectures, which allow for a more in-depth characterization.
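
      The overlapping architecture described above can be illustrated with a minimal sketch. It assumes that "1DL" denotes a one-dimensional line (open chain) attractor and that overlapping connections simply add up with unit weight; neither the function name nor the weight values come from the manuscript, and the sketch does not compute Betti numbers.

```python
import random


def overlapping_line_attractors(n_neurons=100, n_attractors=20, length=10, seed=0):
    """Build a symmetric connectivity matrix from overlapping open chains.

    Hypothetical illustration: each chain recruits `length` distinct random
    neurons and links consecutive members with unit weight (an assumption).
    """
    rng = random.Random(seed)
    weights = [[0] * n_neurons for _ in range(n_neurons)]
    chains = []
    for _ in range(n_attractors):
        # Sample distinct neurons; their sampled order defines the chain.
        chain = rng.sample(range(n_neurons), length)
        chains.append(chain)
        for a, b in zip(chain, chain[1:]):
            # Neighbours along the chain are connected in both directions.
            weights[a][b] += 1
            weights[b][a] += 1
    return weights, chains
```

      Because 20 chains of 10 neurons are drawn from only 100 neurons, most neurons belong to more than one chain, which is what makes the attractors overlap.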

      The wagon-wheel is an interesting case in which maps lose hexagonal symmetry although the population activity lies in a torus, perhaps evidencing the tension between feedforward and recurrent inputs and suggesting that grid cell response does not obey the single master of path integration. If we modeled it with a 1D attractor, we believe the outcome would strongly depend on virtual rat trajectory. If the trajectory was strictly linear, the population activity would be locally one-dimensional and potentially represented by a ring. Instead, if the trajectory allowed for turns, i.e. a 2D trajectory within a corridor-like maze, the population activity would be toroidal as in our open field simulations, while maps would not have perfect hexagonal symmetry, mimicking experimental results.

      More minor comments:

      Recurrent dynamics are modeled as if there is no intrinsic synaptic or membrane time constant. This may be acceptable for addressing the goals of this paper, but it is a bit unusual and it will be helpful to explain and justify this choice.

      As mentioned above, we believe that the best model in a given setup is the one with the lowest number of complexities that can still address the phenomenon under study. One does not use general relativity to build a bridge, although it provides a ‘more accurate’ description of the physics involved. All models are simplifications, and the more complex a model, the more it has to be taken as a black box.

      The Introduction mentions that in most models interaction between co-modular neurons occurs through direct excitatory communication, but in quite a few models the interaction is inhibitory. The crucial feature is that the interaction is strongly inhibitory between neurons that differ in their tuning, and either less inhibitory or excitatory between neurons with similar phases.

      We agree that directed inhibition has been shown to be as efficient as directed excitation, and we have modified the introduction to reflect this.

      The Discussion claims that the present work is the first one in which the topology of the recurrent architecture differs from the topology of the emergent state space. However, early works on attractor models of grid cells showed how neural connectivity which is arranged on a 2d plane, without any periodic boundary conditions, leads to a state space that exhibits the toroidal topology. Therefore, this claim should be revised.

      We agree, although the 2D sheet in this case acts as a piece of the torus, and locally the input space and architecture are identical objects. It could be argued that architectures that represent a 2D local slice of the torus, the whole torus, or several cycles around the torus form a continuous family parametrized by the extension of recurrent connections, and as a consequence it is not surprising that these works have not made claims about the incongruence between architecture and representation topologies. The 2D sheet connectivity is still constructed ad hoc to organize activity in a 2D bump, and there is no negotiation between disparate constraints because locally the constraints imposed by input and architecture are the same. We believe this situation is conceptually different from our flexible 1D attractors. We have adapted our claim to include this technical nuance.

      Why are neural responses in the perimeter of the environment excluded from the topological analysis? The whole point of the toroidal manifold analysis on real experimental data is that the toroidal manifold is preserved regardless of the animal's location and behavioral condition.

      We agree, although experimental data needs to go through extensive pre-processing such as dimensionality reduction before showing a toroidal topology. Such manipulations might smooth away the specific effects of boundaries on maps, together with other sources of noise. In our case, the original reason to downsample the dataset is related to the explosion in computational time that we experience with the ripser package when using more than ~1000 data points. For a proof-of-principle characterization we were much more interested in what happened in the center of the arena, where a 1D attractor could fold itself to confine population activity into a torus. The area we chose was sufficiently large to contain the whole torus. Borders do affect the way the attractor folds (they also affect grid maps in real rats). We feel that these imperfections could be interesting to study in relation to the parameters controlling how our virtual rat behaves at the borders, but not at this proof-of-principle stage.
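
      The downsampling step described above could look roughly like the following sketch. The square arena, the 20% margin and the function name are assumptions made for illustration; only the cap of about 1000 points comes from the response, and the persistent homology call itself is left out.

```python
import random


def central_subsample(positions, activity, arena_size=1.0, margin=0.2,
                      max_points=1000, seed=0):
    """Keep population vectors recorded in the arena centre, capped at max_points.

    positions: list of (x, y) tuples; activity: list of population vectors,
    one per position. The margin fraction is a hypothetical choice.
    """
    lo = margin * arena_size
    hi = (1.0 - margin) * arena_size
    # Discard samples taken near the borders of the (assumed square) arena.
    keep = [i for i, (x, y) in enumerate(positions)
            if lo <= x <= hi and lo <= y <= hi]
    if len(keep) > max_points:
        # Random subsample without replacement, keeping temporal order.
        keep = sorted(random.Random(seed).sample(keep, max_points))
    return [activity[i] for i in keep]
```

      The resulting point cloud is what would then be passed to a persistent homology routine.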

      The periodic activity observed in Ref. 29 could in principle provide the basis for the ring arrangement of neurons. However, it is not yet clear whether grid cells participate in this periodic activity.

      We agree. So far it seems that entorhinal cells in general participate in the ring, which would imply that all kinds of cells are involved. However, it could well be that only some functional types participate in the ring and grid cells specifically do not, as future experiments will tell.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work explores death coding data to understand the impact of COVID-19 on cancer mortality. The work provides solid evidence that deaths with cancer as a contributing cause were not above what would be expected during pandemic waves, suggesting that cancer did not strongly increase the risk of dying of COVID-19. These results are an interesting exploration into the coding of causes of death that can be used to make sense of how deaths are coded during a pandemic in the presence of other underlying diseases, such as cancer.

      We thank the editor and reviewers for the time they took to review our manuscript and for the thoughtful suggestions they provided. We have completed several revisions based on their feedback and we feel our paper is stronger as a result. However, none of these revisions change the overall conclusions of our study.

      Reviewer #1 (Public Review):

      Summary:

      In the paper "Disentangling the relationship between cancer mortality and COVID-19", the authors study whether the number of deaths in cancer patients in the USA went up or down during the first year (2020) of the COVID-19 pandemic. They found that the number of deaths with cancer mentioned on the death certificate went up, but only moderately. In fact, the excess with-cancer mortality was smaller than expected if cancer had no influence on the COVID mortality rate and all cancer patients got COVID with the same frequency as in the general population. The authors conclude that the data show no evidence of cancer being a risk factor for COVID and that the cancer patients were likely actively shielding themselves from COVID infections.

      Strengths:

      The paper studies an important topic and uses sound statistical and modeling methodology. It analyzes both deaths with cancer listed as the primary cause of death and deaths with cancer listed as one of the contributing causes. The authors argue, correctly, that the latter is a more important and reliable indicator to study relationships between cancer and COVID. The authors supplement their US-wide analysis by analysing three states separately.

      Weaknesses:

      The main findings of the paper can be summarized as six numbers. Nationally, in 2020, multiple-cause cancer deaths went up by 2%, Alzheimer's deaths by 31%, and diabetes deaths by 39%. At the same time, assuming no relationship between these diseases and either Covid infection risk or Covid mortality risk, the deaths should have gone up by 7%, 46%, and 28%. The authors focus on cancer deaths and as 2% < 7%, conclude that cancer is not a risk factor for COVID and that cancer patients must have "shielded" themselves against Covid infections.

      However, I did not find any discussion of the other two diseases. For diabetes, the observed excess was 39% instead of "predicted by the null model" 28%. I assume this should be interpreted as diabetes being a risk factor for Covid deaths. I think this should be spelled out, and also compared to existing estimates of increased Covid IFR associated with diabetes.

      And what about Alzheimer's? Why was the observed excess 31% vs the predicted 46%? Is this also a shielding effect? Does the spring wave in NY provide some evidence here? Why/how would Alzheimer's patients be shielded? In any case, this needs to be discussed and currently, it is not.

      We thank the reviewer for their positive feedback on the paper and for these suggestions. It is true that we have emphasized the impact on cancer deaths, as this was the primary aim of the paper. In the revised version, we have expanded the results and discussion sections to more fully describe the other chronic conditions we used as comparators (lines 267-284;346 – 386).

      Note that we are somewhat reluctant to designate any of these conditions as risk factors based solely on comparing the time series model with the demographic model of our expectations. As we mention in the discussion, there is considerable uncertainty around estimates from the demographic model in terms of the size of the population-at-risk, the mean age of the population-at-risk, and the COVID-19 infection rates and infection fatality ratios. Our demographic model is primarily used to demonstrate the effects of competing risks across types of cancers and chronic conditions, since these findings are robust to model assumptions. In contrast, the demographic model should be used with caution if the goal is to titrate the level of these risk factors (as the level of imputed risk is dependent on model assumptions). In the updated version of the manuscript, we have included uncertainty intervals in Table 3, using the upper and lower bounds of the estimated infection rates and IFRs, to better represent this uncertainty. We have also discussed this uncertainty more explicitly in the text and ran sensitivity analyses with different infection rate assumptions in the discussion (lines 354-362; 367 -370).

      We would like to note that rather than interpreting the absolute results, we used this demographic model as a tool to understand the relative differences between these conditions. From the demographic model we determined that we would expect to see much higher mortality in diabetes and Alzheimer’s deaths compared to cancer deaths due to three factors (1. Size of population-at-risk, 2. Mean age of the population-at-risk, 3. Baseline risk of mortality from the condition), that are separate from the COVID-19 associated IFR. And in general, this is what we observed.
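
      The competing-risks intuition behind this comparison can be shown with a toy calculation. This is not the authors' demographic model: it assumes a single attack rate and IFR, ignores age structure and depletion of the population at risk, and uses made-up inputs. Under these assumptions the population size cancels out of the percent excess, so only the baseline risk of death matters.

```python
def expected_percent_excess(attack_rate, ifr, baseline_annual_mortality,
                            relative_risk=1.0):
    """Toy null-model excess: COVID-19 deaths as a percentage of baseline deaths.

    All arguments are probabilities per person per year, except relative_risk,
    which scales the IFR (1.0 means the condition does not change COVID risk).
    """
    if baseline_annual_mortality <= 0:
        raise ValueError("baseline_annual_mortality must be positive")
    covid_death_risk = attack_rate * ifr * relative_risk
    return 100.0 * covid_death_risk / baseline_annual_mortality


# Hypothetical inputs: identical COVID-19 risk, different baseline mortality.
high_baseline = expected_percent_excess(0.10, 0.02, 0.20)  # about 1%
low_baseline = expected_percent_excess(0.10, 0.02, 0.02)   # about 10%
```

      With the same infection risk and IFR, the condition with the higher baseline mortality shows the smaller percent excess, which is the qualitative pattern the response describes for cancer relative to diabetes and Alzheimer’s disease.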

      In comparing the results from the demographic model to the observed excess, diabetes does stand out as an outlier from cancer and Alzheimer’s disease in that the observed excess is consistently above the expectation under the null hypothesis, which lends support to the conclusion that diabetes is in fact a risk factor for COVID-19, a conclusion that is also supported by many other studies. Our findings for hematological cancers are similar, in that we find consistent support for this condition being a risk factor. We have commented on this in the discussion and added a few references (lines 346-354; 395-403).

      Our hypothesis regarding non-hematological cancer deaths (lower than anticipated mortality due to shielding) could also apply to Alzheimer’s deaths. Furthermore, we used the COVID-19 attack rate for individuals >65 years (based on the data that is available), but we estimate that the mean age of Alzheimer’s patients is actually 80-81 years, so this attack rate may in fact be a bit too high, which would increase our expected excess. We have commented on this in the discussion (lines 363-377).

      Reviewer #2 (Public Review):

      The article is very well written, and the approach is quite novel. I have two major methodological comments, that if addressed will add to the robustness of the results.

      (1) Model for estimating expected mortality. There is a large literature using different models to predict expected mortality during the pandemic. Different models come with different caveats, see the example of the WHO estimates in Germany and the performance of splines (Msemburi et al Nature 2023 and Ferenci BMC Medical Research Methodology 2023). In addition, it is a common practice to include covariates to help the predictions (e.g., temperature and national holidays, see Kontis et al Nature Medicine 2020). Last, fitting the model independently for each region neglects potential correlation patterns in the neighbouring regions, see Blangiardo et al 2020 PlosONE.

      Thank you for these comments and suggestions. We agree there are a range of methods that can be used for this type of analysis, and they all come with their strengths, weaknesses, and caveats. Broadly, the approach we chose was to fit the data before the pandemic (2014-2019), and project forward into 2020. To our knowledge it is not a best practice to use an interpolating spline function to extrapolate to future years. This is demonstrated by the WHO estimates in Germany in the paper you mention. This was our motivation for using polynomial and harmonic terms.
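
      As an illustration of what polynomial and harmonic terms look like in practice, the sketch below builds the covariates for one week. The single annual harmonic, the period of 52.18 weeks and the week counts are assumptions; the response does not give the exact specification.

```python
import math

WEEKS_PER_YEAR = 52.18  # assumed average number of weeks per year


def design_row(week_index, degree=2, period=WEEKS_PER_YEAR):
    """Covariates for one week: polynomial trend plus one annual harmonic."""
    t = float(week_index)
    trend = [t ** p for p in range(degree + 1)]  # 1, t, t^2, ...
    angle = 2.0 * math.pi * t / period
    seasonal = [math.sin(angle), math.cos(angle)]
    return trend + seasonal


# Weeks used for fitting (roughly 2014-2019, assumed to be 313 weeks) ...
fit_rows = [design_row(t) for t in range(313)]
# ... and the following year, for which the same functions of t are evaluated.
forecast_rows = [design_row(t) for t in range(313, 365)]
```

      Because every covariate is a fixed function of time, the fitted coefficients can be applied to later weeks directly, which is what projecting the baseline forward amounts to.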

      Based on the above:

      a. I believe that the authors need to run a cross-validation to justify model performance. I would suggest training the data leaving out the last year for which they have mortality and assessing how the model predicts forward. Important metrics for the prediction performance include mean square error and coverage probability, see Konstantinoudis et al Nature Communications 2023. The authors need to provide metrics for all regions and health outcomes.

      Thank you for this suggestion. We agree that our paper could be strengthened by including cross validation metrics to justify model performance. Based on this suggestion, and your observations regarding Alzheimer’s disease, we have done two things. First, for the full pre-pandemic period (2014-2019) for each chronic condition and location we tested three different models with different degree polynomials (1. linear only, 2. linear + second degree polynomial, 3. linear + second degree polynomial + third degree polynomial) and used AIC to select the best model for each condition and location. Next, also in response to your suggestion, we estimated coverage statistics. Using the best fit model from the previous step, we then fit the model to data from 2014-2018 only and used the model to predict the 2019 data. We calculated the coverage probability as the proportion of weekly observed data points that fell within the 95% prediction interval. For all causes of death and locations the coverage probability was 100% (with the exception of multiple cause kidney disease in California, which is only shown in the appendix). The methods and results have been updated to reflect this change and we have added a figure to the appendix showing the selected model and coverage probability for each cause of death and location (lines 504 – 519; 847-859; Appendix 1- Figure 11).
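
      The two validation steps described above can be sketched as follows. The log-likelihoods, parameter counts and weekly values are invented for illustration, and the helper names are hypothetical; in the actual analysis these quantities would come from the fitted regression models.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_by_aic(candidates):
    """candidates maps a model name to (log_likelihood, n_params)."""
    return min(candidates, key=lambda name: aic(*candidates[name]))


def coverage_probability(observed, lower, upper):
    """Share of observations that fall inside their prediction interval."""
    if not observed or not (len(observed) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return inside / len(observed)


# Hypothetical fits of the three candidate trend specifications.
candidates = {
    "linear": (-1520.4, 4),
    "quadratic": (-1498.7, 5),
    "cubic": (-1498.1, 6),
}
best = select_by_aic(candidates)

# Hypothetical held-out weeks with their 95% prediction bounds.
coverage = coverage_probability(
    observed=[100, 105, 98, 120],
    lower=[90, 95, 92, 100],
    upper=[110, 112, 104, 115],
)
```

      In this made-up example the cubic term improves the likelihood too little to offset its extra parameter, so the quadratic model is selected, and three of the four held-out weeks fall inside their intervals.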

      b. In the context of validating the estimates, I think the authors need to carefully address the Alzheimer case, see Figure 2. It seems that the long-term trends pick an inverse U-shape relationship which could be an overfit. In general, polynomials tend to overfit (in this case the authors use a polynomial of second degree). It would be interesting to see how the results change if they also include a cubic term in a sensitivity analysis.

      Thank you for this observation. Based on the changes described above, the model for Alzheimer’s disease now includes a cubic term in the national data and in Texas and California. The model with the second-degree polynomial remained the best fit for New York (Appendix 1 – Figure 11).

      c. The authors can help with the predictions using temperature and national holidays, but if they show in the cross-validation that the model performs adequately, this would be fine.

      At the scale of the US, adding temperature or environmental covariates is difficult and few US-wide models do so (see Goldstein 2012 and Quandelacy 2014 for examples from influenza). Furthermore, because we are looking at chronic disease outcomes, it is unclear that viral covariates or national holidays would drive these outcomes in the same way as they would if we were looking at mortality outcomes more directly related to transmissible diseases (such as respiratory mortality). Our cross validation also indicates that our models fit well without these additional covariates.

      d. It would be nice to see a model across the US, accounting for geography and spatial correlation. If the authors don't want to fit conditional autoregressive models in the Bayesian framework, they could just use a random intercept per region.

      We think the reviewer is mistaken here about the scale of our national analysis. Our national analysis did not fit independent models for each state or region. Rather, we fit a single model to the weekly-level national mortality data where counts for the whole of the US have been aggregated. We have clarified in the text (lines 156, 464). As such, we do not feel a model accounting for spatial correlation would be appropriate nor would we be able to include a random intercept for each region. We did fit three states independently (NY, TX, CA), but these states are very geographically distant from each other and unlikely to be correlated. These states were chosen in part because of their large population sizes, yet even in these states, confidence intervals were very wide for certain causes of death. Fitting models to each of the 50 US states, most of which are smaller than those chosen here, would exacerbate this issue.

      (2) I think the demographic model needs further elaboration. It would be nice to show more details, the mathematical formula of this model in the supplement, and explain the assumptions.

      Thank you for this comment. We have added additional details on the demographic model to the methods. We have also extended this analysis to each state to further strengthen our conclusions (lines 548-590).

      Reviewing Editor Recommendations:

      I think that perhaps something that is missing is that the authors never make their underlying assumption explicit: they are assuming that if cancer increases the risk of dying of COVID-19, this would be reflected in the data on multiple causes of death where cancer would be listed as one of the multiple causes rather than as the underlying cause, and that their conclusions are predicated on this assumption. I would suggest explicitly stating this assumption, as opposed to other reasons why cancer mortality would increase (ex. if cancer care worsened during pandemic waves leading to poorer cancer survival).

      Response: Thank you for this suggestion. We have added a few sentences to the introduction to make this assumption clear (lines 106-112).

      Reviewer #1 (Recommendations For The Authors):

      - It could make sense to add "in the United States" into the title, as the paper only analyses US data.

      - It may make sense to reformulate the title from "disentangling the relationship..." into something that conveys the actual findings, e.g. "Lack of excess cancer mortality during Covid-19 pandemic" or something similar. Currently, the title tells nothing about the findings.

      Thank you for these suggestions. We have added “in the US” to the title. However, we feel that our findings are a bit more subtle than the suggested reformulation would imply, and we prefer to leave it in its current form.

      - Abstract, lines 42--45: This is the main finding of the paper, but I feel it is simplified too strongly in the abstract. Your simulations do *not* "largely explain" excess mortality with cancer; they give higher numbers! Which you interpret as "shielding" etc., but this is completely absent from the abstract. This sentence makes the impression that you got a good fit between simulated excess and real excess, which I would say is not the case.

      Thank you for this comment. We have rephrased the sentence in the abstract to better reflect our intentions for using the demographic model (lines 46-49). As stated above, the purpose of the demographic model was not to give a good fit with the observed excess mortality. Rather, we used the demographic model as a tool to understand the relative differences between these conditions in terms of expected excess mortality given the size, age-distribution, and underlying risk of death from the condition itself, assuming similar IFR and attack rates. And based on this, we conclude that it is not necessarily surprising that we see higher excess mortality for diabetes and Alzheimer’s compared to cancer.

      - Results line 237: you write that it's "more consistent with the null hypothesis", however clearly it is *not* consistent with the null hypothesis either (because 2% < 7%). You discuss in the Discussion that it may be due to shielding, but it would be good to have at least one sentence about it already here in the Results, and refer to the Discussion.

      We have mentioned this in the results and refer to the discussion (lines 277-278).

      - Results line 239: why was it closer to the assumption of relative risk 2? If I understand correctly, your model prediction for risk=1 was 7% and for risk=2 it was 13%. In NY you observed 8% (line 187). How is this closer to risk=2?

      Thank you for this observation. We have updated the demographic model with new data, extended the model to state-level data, and included confidence intervals on these estimates. We have also added additional discussion around the differences between our observations and expectations (lines 249-284).

      - Discussion line 275: "we did not expect to see large increases" -- why exactly? Please spell it out here. Was it due to the age distribution of the cancer patients? Was it due to the high cancer death risk?

      We demonstrate that it is the higher baseline risk of death for cancer that seems to be driving our low expectations for cancer excess mortality (lines 304-320). We have added this to the sentence to clarify our conclusions on this point and have added a figure to better illustrate this concept of competing risks (Figure 6).

      - Methods, line 405: perhaps it makes sense to cite some other notable papers on Covid excess mortality such as Msemburi et al Nature 2023, Karlinsky & Kobak eLife 2021, Islam et al BMJ 2021, etc.

      Thank you for mentioning this oversight. We certainly should have cited these papers and have included them in the updated version.

      - Methods line 410: why did you use a 5-week moving average? Why not fit raw weekly death counts? NB regression should be able to deal with it.

      Smoothing time series data with a moving average prior to running regression models is a very common practice. We did a sensitivity analysis using the raw data. This produced excess estimates with slightly larger confidence intervals, but did not change the overall conclusions of the paper.
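
      For readers unfamiliar with the smoothing step, a minimal sketch of a 5-week moving average follows. Whether the window was centered, and how the edges of the series were handled, is not stated in the response, so both choices here are assumptions.

```python
def centered_moving_average(values, window=5):
    """Centered moving average; edge weeks without a full window are None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i >= len(values) - half:
            smoothed.append(None)  # not enough neighbouring weeks
        else:
            chunk = values[i - half:i + half + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


weekly_deaths = [10, 20, 30, 40, 50, 60, 70]  # made-up counts
smoothed = centered_moving_average(weekly_deaths)
```

      Smoothed values are generally not integers, which is one reason to compare the results with a fit to the raw weekly counts.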

      - Methods line 416: please indicate the software/library/package you used for fitting NB regression.

      We fit the NB regression using the MASS package in R version 4.3. We have added this to the methods (line 519).

      - Line 489: ORCHID -> ORCID

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      Codol et al. present a toolbox that allows simulating biomechanically realistic effectors and training Artificial Neural Networks (ANNs) to control them. The paper provides a detailed explanation of how the toolbox is structured and several examples that demonstrate its usefulness.

      Main comments:

      (1) The paper is well written and easy to follow. The schematics help in understanding how the toolbox works and the examples provide an idea of the results that the user can obtain.

      We thank the reviewer for this comment.

      (2) As I understand it, the main purpose of the paper should be to facilitate the usage of the toolbox. For this reason, I have missed a more explicit link to the actual code. As I see it, researchers will read this paper to figure out whether they can use MotorNet to simulate their experiments, and how they should proceed if they decide to use it. I'd say the paper provides an answer to the first question and assures that the toolbox is very easy to install and use. Maybe the authors could support this claim by adding "snippets" of code that show the key steps in building an actual example.

      This is an important point, which we also considered when writing this paper. We instead decided to focus on the first approach, because it is easier to illustrate the scientific use of the toolbox using code or interactive (Jupyter) notebooks than in a publication format. We find the “how to proceed” aspect of the toolbox can more easily and comprehensively be covered using online, interactive tutorials. Additionally, this allows us to update these tutorials as the toolbox evolves over different versions, while it is more difficult to update a scientific article. Consequently, we explicitly avoided code snippets in the article itself. However, we appreciate that the paper would gain in clarity if this was more explicitly stated early. We have modified the paper to include a pointer to where to find tutorials online. We added this to the last paragraph of the introduction section:

      The interested reader may consult the full API documentation, including interactive tutorials on the toolbox website at https://motornet.org.

      (3) The results provided in Figures 1, 4, 5 and 6 are useful, because they provide examples of the type of things one can do with the toolbox. I have a few comments that might help improving them:

      a. The examples in Figures 1 and 5 seem a bit redundant (same effector, similar task). Maybe the authors could show an example with a different effector or task? (see point 4).

      The effectors from figures 1 and 5 are indeed very similar. However, the tasks in figures 1 and 5 present some important differences. The training procedure in figure 1 never includes any perturbations, while the one from figure 5 includes a wide range of perturbations of different magnitudes, timings and directions. The evaluation procedure of figure 1 includes center-out reaches with permanent viscous (proportional to velocity) external dynamics, while that of figure 5 includes fixed, transient, square-shaped perturbations orthogonal to the reach direction. Finally, the networks in figure 1 undergo a second training procedure after evaluation while the networks of figure 5 do not.

      While we agree that some variation of effectors would be beneficial, we do show examples of a point-mass effector in figure 6. Overall, figure 5 shows a task that is quite different from that of figure 1 with a similar effector, while the opposite is true for figure 6. We have modified the text to clarify this for the reader, by adding the following.

      End of 1st paragraph, section 2.4.

      Therefore, the training protocol used for this task largely differed from section 2.1 in that the networks are exposed to a wide range of mechanical perturbations with varying characteristics.

      1st paragraph of section 2.5

      […] this asymmetrical representation of PMDs during reaching movements did not occur when RNNs were trained to control an effector that lacked the geometrical properties of an arm such as illustrated in Figure 4c-e and section 2.1.

      b. I missed a discussion on the relevance of the results shown in Figure 4. The moment arms are barely mentioned outside section 2.3. Are these results new? How can they help with motor control research?

      We thank the reviewer for this comment. This relates to a point from reviewer 2 indicating that the purpose of each section was sometimes difficult to grasp as one reads. Section 2.3 explains the biomechanical properties that the toolbox implements to improve realism of the effector. They are not new results in the sense that other toolboxes implement these features (though not in differentiable formats) and these properties of biological muscles are empirically well-established. However, they are important to understand what the toolbox provides, and consequently what constraints networks must accommodate to learn efficient control policies. An example of this is the results in figure 6, where a simple effector versus a more biomechanically complex effector will yield different neural representations.

      Regarding the manuscript itself, we agree that more clarity on the goal of every paragraph may improve the reader’s experience. Consequently, we made sure to specify such goals at the start of each section. In particular, we clarify the purpose of section 2.3 by adding several sentences on this at the end of the first paragraph in that section. We also now clearly state the purpose of section 2.3 with the results of figure 6 and reference figure 4 in that section.

      c. The results in Figure 6 are important, since one key asset of ANNs is that they provide access to the activity of the whole population of units that produces a given behavior. For this reason, I think it would be interesting to show the actual "empirical observations" that the results shown in Fig. 6 are replicating, hence allowing a direct comparison between the results obtained for biological and simulated neurons.

      These empirical observations are available from previous electrophysiological and modelling work. Particularly, polar histograms across reaching directions like panel C are displayed in figures 2 and 3 of Scott, Gribble, Graham, Cabel (2001, Nature). Colormaps of modelled unit activity across time and reaching directions like panel F are also displayed in figure 2 of Lillicrap, Scott (2013, Neuron). Electrophysiological recordings of M1 neurons during a similar task in non-human primates can also be seen on “Preserved neural population dynamics across animals performing similar behaviour” figure 2 B (https://doi.org/10.1101/2022.09.26.509498) and “Nonlinear manifolds underlie neural population activity during behaviour” figure 2 B as well (https://doi.org/10.1101/2023.07.18.549575). Note that these two pre-prints use the same dataset.

      We have added these citations to the text and made it explicit that they contain visualizations of similar modelling and empirical data for comparison:

      This heterogeneous set of responses matches empirical observations in non-human primate primary motor cortex recordings (Churchland & Shenoy, 2007; Michaels et al., 2016) and replicates similar visualizations from previously published work (Fortunato et al., 2023; Lillicrap & Scott, 2013; Safaie et al., 2023).

      (4) All examples in the paper use the arm26 plant as effector. Although the authors say that "users can easily declare their own custom-made effector and task objects if desired by subclassing the base Plant and Task class, respectively", this does not sound straightforward. Table 1 does not really clarify how to do it. Maybe an example that shows the actual code (see point 2) that creates a new plant (e.g. the 3-joint arm in Figure 7) would be useful.

      Subclassing is a Python process more than a MotorNet process, as Python is an object-oriented language. Therefore, there are many Python tutorials on subclassing in the general sense that would be beneficial for that purpose. We have amended the main text to ensure that this is clearer to the reader.

      Subclassing a MotorNet object, in a more specific sense, requires overwriting some methods from the base MotorNet classes (e.g., Effector or Environment classes, which correspond to the original Plant and Task object, respectively). Since we made the decision (mentioned above) to not include code in the main text, we added tutorials to the online documentation, which include dedicated tutorials for MotorNet class subclassing. For instance, this tutorial showcases how to subclass Environment classes:

      https://colab.research.google.com/github/OlivierCodol/MotorNet/blob/master/examples/3-environments.ipynb
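
      As a minimal sketch of the general Python subclassing pattern described above, the example below defines a base class and overrides one of its methods in a subclass. The names BaseEnvironment, ReachEnvironment, compute_target and describe are hypothetical and were invented for this illustration; they are not MotorNet's actual classes or methods, which are covered in the tutorial linked above.

```python
import math


class BaseEnvironment:
    """Hypothetical base class (NOT MotorNet's API) holding shared task logic."""

    def __init__(self, reach_distance=0.1):
        self.reach_distance = reach_distance

    def compute_target(self, start, direction_deg):
        # Default behaviour: the target is the start position (no movement).
        return tuple(start)

    def describe(self, start, direction_deg):
        # Shared logic that calls whichever compute_target the subclass defines.
        x, y = self.compute_target(start, direction_deg)
        return f"target at ({x:.3f}, {y:.3f})"


class ReachEnvironment(BaseEnvironment):
    """Subclass overriding a single method to define a centre-out reach."""

    def compute_target(self, start, direction_deg):
        angle = math.radians(direction_deg)
        return (start[0] + self.reach_distance * math.cos(angle),
                start[1] + self.reach_distance * math.sin(angle))


env = ReachEnvironment(reach_distance=0.1)
print(env.describe(start=(0.0, 0.0), direction_deg=90))  # target at (0.000, 0.100)
```

      Only compute_target is rewritten in the subclass; the constructor and describe are inherited unchanged, which is the sense in which a custom object can be declared by overwriting a few methods of a base class.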

      (5) One potential limitation of the toolbox is that it is based on Tensorflow, when the field of Computational Neuroscience seems to be, or at least that's my impression, transitioning to pyTorch. How easy would it be to translate MotorNet to pyTorch? Maybe the authors could comment on this in the discussion.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to enact this, and the next version of the toolbox will be exclusively in PyTorch. We will maintain the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the online website. However, going forward we will focus exclusively on bug-fixing and expanding from the latest version of MotorNet, which will be in PyTorch. We now believe that the greater popularity of PyTorch in the academic community makes that choice more sustainable while helping a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected by changes made throughout the manuscript, notably in Figure 3 and Table 1.

      (6) Supervised learning (SL) is widely used in Systems Neuroscience, especially because it is faster than reinforcement learning (RL). Thus providing the possibility of training the ANNs with SL is an important asset of the toolbox. However, SL is not always ideal, especially when the optimal strategy is not known or when there are different alternative strategies and we want to know which is the one preferred by the subject. For instance, would it be possible to implement a setup in which the ANN has to choose between 2 different paths to reach a target? (e.g. Kaufman et al. 2015 eLife). In such a scenario, RL seems to be a more natural option. Would it be easy to extend MotorNet so it allows training with RL? Maybe the authors could comment on this in the discussion.

      The new implementation of MotorNet that relies on PyTorch is already standardized to use an API that is compatible with Gymnasium. Gymnasium is a standard and popular interfacing toolbox used to link RL agents to environments. It is very well-documented and widely used, which will ensure that users who wish to employ RL to control MotorNet environments will be able to do so relatively effortlessly. We have added this point to accurately reflect the updated implementation, so users are aware that it is now a feature of the toolbox (new section 3.2.4.).
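
      To illustrate what a Gymnasium-compatible interface looks like from the user's side, the sketch below follows the Gymnasium calling convention, in which reset returns (observation, info) and step returns (observation, reward, terminated, truncated, info). PointMassEnv, run_episode and pd_policy are hypothetical names written for this example; the code imports neither gymnasium nor motornet and should not be read as the toolbox's implementation.

```python
class PointMassEnv:
    """Toy 1-D point mass exposing Gymnasium-style reset/step signatures.

    Hypothetical stand-in: it imports neither gymnasium nor motornet.
    """

    def __init__(self, target=1.0, dt=0.01, max_steps=200):
        self.target, self.dt, self.max_steps = target, dt, max_steps
        self.pos, self.vel, self.t = 0.0, 0.0, 0

    def _obs(self):
        return (self.pos, self.vel, self.target)

    def reset(self, seed=None):
        # `seed` is accepted to mirror the convention; this toy is deterministic.
        self.pos, self.vel, self.t = 0.0, 0.0, 0
        return self._obs(), {}  # (observation, info)

    def step(self, action):
        # Semi-implicit Euler integration of a unit mass driven by a force.
        self.vel += self.dt * action
        self.pos += self.dt * self.vel
        self.t += 1
        error = abs(self.target - self.pos)
        reward = -error  # closer to the target is better
        terminated = error < 1e-3 and abs(self.vel) < 1e-2
        truncated = self.t >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}


def run_episode(env, policy):
    """Generic agent-environment loop; works with any env of this shape."""
    obs, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        done = terminated or truncated
    return total_reward, obs


def pd_policy(obs, gain=20.0, damping=9.0):
    """Hand-written controller standing in for a trained RL agent."""
    pos, vel, target = obs
    return gain * (target - pos) - damping * vel


total, final_obs = run_episode(PointMassEnv(), pd_policy)
print(f"return = {total:.2f}, final position = {final_obs[0]:.3f}")
```

      Because run_episode only relies on the reset and step signatures, the policy can be swapped for any agent that maps observations to actions, which is what makes a shared interface convenient for RL toolboxes.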

      Impact:

      MotorNet aims at simplifying the process of simulating complex experimental setups to rapidly test hypotheses about how the brain produces a specific movement. By providing an end-to-end pipeline to train ANNs on the simulated setup, it can greatly help guide experimenters to decide where to focus their experimental efforts.

      Additional context:

      Since the main result is a toolbox, the paper is complemented by a GitHub repository and a documentation webpage. Both the repository and the webpage are well organized and easy to navigate. The webpage walks the user through the installation of the toolbox and the building of the effectors and the ANNs.

      Reviewer #2 (Public Review):

      MotorNet aims to provide a unified interface where the trained RNN controller exists within the same TensorFlow environment as the end effectors being controlled. This architecture provides a much simpler interface for the researcher to develop and iterate through computational hypotheses. In addition, the authors have built a set of biomechanically realistic end effectors (e.g., a 2-joint arm model with realistic muscles) within TensorFlow that are fully differentiable.

      MotorNet will prove a highly useful starting point for researchers interested in exploring the challenges of controlling movement with realistic muscle and joint dynamics. The architecture features a conveniently modular design and the inclusion of simpler arm models provides an approachable learning curve. Other state-of-the-art simulation engines offer realistic models of muscles and multi-joint arms and afford more complex object manipulation and contact dynamics than MotorNet. However, MotorNet's approach allows for direct optimization of the controller network via gradient descent rather than reinforcement learning, which is a compromise currently required when using other simulation engines (as these engines' code cannot be differentiated through).
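
      The idea of optimizing a controller by differentiating through the plant can be sketched on a toy problem. The example below is an assumption-laden illustration, not MotorNet code: it simulates a unit point mass under a proportional-derivative controller and propagates the derivative of the state with respect to the controller gain through every simulation step by hand, standing in for the automatic differentiation that TensorFlow or PyTorch would provide. The functions rollout and train and all parameter values are hypothetical.

```python
def rollout(gain, damping=2.0, target=1.0, dt=0.01, steps=200):
    """Simulate a unit point mass under a PD controller.

    Returns the loss and its derivative with respect to `gain`, obtained by
    carrying sensitivities (d state / d gain) through each simulation step.
    """
    pos = vel = 0.0
    dpos = dvel = 0.0  # d(pos)/d(gain) and d(vel)/d(gain)
    loss = dloss = 0.0
    for _ in range(steps):
        # Controller output and its sensitivity, both from the current state.
        force = gain * (target - pos) - damping * vel
        dforce = (target - pos) - gain * dpos - damping * dvel
        # Plant update (semi-implicit Euler) and the matching sensitivity update.
        vel += dt * force
        dvel += dt * dforce
        pos += dt * vel
        dpos += dt * dvel
        # Accumulate squared position error and its derivative.
        loss += dt * (pos - target) ** 2
        dloss += dt * 2.0 * (pos - target) * dpos
    return loss, dloss


def train(gain=1.0, lr=0.5, iterations=50):
    """Plain gradient descent on the controller gain."""
    history = []
    for _ in range(iterations):
        loss, grad = rollout(gain)
        history.append(loss)
        gain -= lr * grad
    return gain, history


final_gain, history = train()
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}, gain: 1.0 -> {final_gain:.2f}")
```

      The gradient here is available only because every operation in the plant update has a known derivative; a simulator that cannot be differentiated through would have to be treated as a black box, for instance with reinforcement learning.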

      The paper could be reorganized to provide clearer signposts as to what role each section plays (e.g., that the explanation of the moment arms of different joint models serves to illustrate the complexity of realistic biomechanics, rather than a novel discovery/exposition of this manuscript). Also, if possible, it would be valuable if the authors could provide more insight into whether gradient descent finds qualitatively different solutions to RL or other non gradient-based methods. This would strengthen the argument that a fully differentiable plant is useful beyond improving training time / computational power required (although this is a sufficiently important rationale per se).

      We thank the reviewer for these comments. We agree that more clarity on the section goals may improve the reader’s experience and ensured this is the case throughout the manuscript. Particularly, we added the following on the first paragraph of section 2.3, for which an explicit goal was most missing:

      In this section we illustrate some of these biomechanical properties displayed by MotorNet effectors using specific examples. These properties are well-characterised in the biology and are often implemented in realistic biomechanical simulation software.

      Regarding the potential difference in solutions obtained from reinforcement or supervised learning, establishing this conclusively would represent a non-trivial amount of work and so may not be within the scope of the current article. We do appreciate however that in some situations RL may be a more fitting approach to a given task design. In relation to this point we now specify in the discussion that the new API can accommodate interfacing with reinforcement learning toolboxes for those who may want to pursue this type of policy training approach when appropriate (new section 3.2.4.).

      Reviewer #3 (Public Review):

      Artificial neural networks have developed into a new research tool across various disciplines of neuroscience. However, specifically for studying neural control of movement it was extremely difficult to train those models, as they require not only simulating the neural network, but also the body parts one is interested in studying. The authors provide a solution to this problem which is built upon one of the main software packages used for deep learning (Tensorflow). This allows them to make use of state-of-the-art tools for training neural networks.

      They show that their toolbox is able to (re-)produce several commonly studied experiments e.g., planar reaching with and without loads. The toolbox is described in sufficient detail to get an overview of the functionality and the current state of what can be done with it. Although the authors state that only a few lines of code can reproduce such an experiment, they unfortunately don't provide any source code to reproduce their results (nor is it given in the respective repository).

      The possibility of adding code snippets to the article is something we originally considered, and which aligns with comment two from reviewer one (see above). Hopefully this provides a good overview of the motivation behind our choice not to add code to the article.

      The modularity of the presented toolbox makes it easy to exchange or modify single parts of an experiment e.g., the task or the neural network used as a controller. Together with the open-source nature of the toolbox, this will facilitate sharing and reproducibility across research labs.

      I can see how this paper can enable a whole set of new studies on neural control of movement and accelerate the turnover time for new ideas or hypotheses, as stated in the first paragraph of the Discussion section. Having such a low effort to run computational experiments will be definitely beneficial for the field of neural control of movement.

      We thank the reviewer for these comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The main goal of the authors was to study the testis-specific role of the protein FBXO24 in the formation and function of the ribonucleoprotein granules (membraneless electron-dense structures rich in RNAs and proteins).

      We appreciate the summary comment of reviewer #1.

      Strengths:

      The wide variety of methods used to support their conclusions (including transgenic models)

      We appreciate the positive comment of reviewer #1.

      Weaknesses:

      The lack of specific antibodies against FBXO24. Some of the experiments showing a specific phenotype are descriptive and lack a logical explanation of the possible mechanism (i.e. AR or the tail structure).

      Because we could not obtain specific antibodies against FBXO24, we generated Fbxo24-FLAG transgenic mice, which can be used to show the interaction between FBXO24 and IPO5. For the mechanism of impaired acrosome reaction, we added some results and discussion as written in the response to the question (1) of reviewer #1 (public review). For the mechanism of abnormal flagellar structure, we added new results and fixed the manuscript as written in the response to the major comments of reviewer #3 (recommendations for the authors).

      Questions:

      The paper is excellent and employs a wide variety of methods to substantiate the conclusions. I have very few questions to ask:

      (1) KO mice cannot undergo acrosome reaction (AR) even spontaneously. How do you account for this, given that no visible defects were observed in the acrosome?

      One possibility is that Fbxo24 KO spermatozoa cannot undergo capacitation; however, it is difficult to analyze the capacitation status such as tyrosine phosphorylation because most Fbxo24 KO spermatozoa are not alive (Figure S3A). Another possibility is that AR-related proteins are affected in Fbxo24 KO spermatozoa. Therefore, we analyzed the amounts of AR-related proteins with mass spectrometry (Figure S3C). Although previous studies indicate that the assembly of the SNARE complex is a key event prior to AR [Hutt et al., 2005 (PMID: 15774481); Katafuchi et al., 2000 (PMID: 11066067); Schulz et al., 1997 (PMID: 9356173); Tomes et al., 2002 (PMID: 11884041)], no clear differences were detected for SNARE proteins (Figure S3C and D). PLCD4, which is important for AR [Fukami et al., 2001 (PMID: 11340203)], was also detected in Fbxo24 KO spermatozoa (Figure S3C). Although we could not find differences in the amounts of AR-related proteins, it is still possible that FER1L5, another AR-related protein [Morohoshi et al., 2023 (PMID: 36696506)] not detected in the mass spectrometry analyses, or AR-related proteins not yet identified are affected in Fbxo24 KO spermatozoa. We added these results and discussion (line 160-166 and 305-312).

      (2) KO sperm are unable to migrate in the female tract, and, more intriguingly, they do not pass through the utero-tubal junction (UTJ). The levels of ADAM3 are normal, suggesting that the phenotype is influenced by other factors. The authors should investigate the levels of Ly6K since mice also exhibit the same phenotype but with normal levels of ADAM3.

      We detected LY6K in Fbxo24 KO spermatozoa with immunoblotting, but no difference was found. We added the results (Figure S3E and line 172–175).

      (3) In Figure 4A, the authors assert that "RBGS Tg mice revealed that mitochondria were abnormally segmented in Fbxo24 KO spermatozoa." I am unable to discern this from the picture shown in that panel. Could you please provide a more detailed explanation or display the information more explicitly?

      We are sorry for the ambiguous explanation of the morphology of the sperm mitochondrial sheath. Fbxo24 KO cauda epididymal spermatozoa show a disorganized mitochondrial sheath rather than a “segmented” one. We fixed the sentence (line 190-192) and added white arrowheads that indicate the disorganized regions (Figure 4A).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kaneda et al "FBXO24 ensures male fertility by preventing abnormal accumulation of membraneless granules in sperm flagella" is a significant paper on the role of FBXO24 in murine male germ cell development and sperm ultrastructure and function. The body of experimental evidence that the authors present is extraordinarily strong in both breadth and depth. The authors investigate the protein's functions in male germ cells and sperm using a wide variety of approaches but focusing predominantly on their novel mouse model featuring deletion of the Fbxo24 gene and its product. Using this mouse, and a cross of it with another model that expresses reporters in the head and midpiece, they logically build from one experiment to the next. Together, their data show that this protein is involved in the regulation of membraneless electron-dense structures; loss of FBXO24 led to an accumulation of these materials and defects in the sperm flagellum and fertilizing ability. Interestingly, the authors found that several of the best-known components of electron-dense ribonucleoprotein granules that are found in the intermitochondrial cement and chromatoid body were not disrupted in the Fbxo24 knockout, suggesting that the electron-dense material and these structures are not all the same, and the biology is more complicated than some might have thought. They found evidence for the most changes in IPO5 and KPNB1, and biochemical evidence that FBXO24 and IPO5 could interact.

      We appreciate the summary comment of reviewer #2.

      Strengths:

      The authors are to be commended for the thoroughness of their experimental approaches and the extent to which they investigated impacts on sperm function and potential biochemical mechanisms. Very briefly, they start by showing that the Fbxo24 message is present in spermatids and that the protein can interact with SKP1, in a way that is dependent on its F-box domain. This points toward a potential function in protein degradation. To test this, they next made the knockout mouse, validated it, and found the males to be sterile, although capable of plugging a female. Looking at the sperm, they identified a number of ultrastructural and morphological abnormalities, which they looked at in high resolution using TEM. They also cross their model with RBGS mice so that they have reporters in both the acrosome and mitochondria. The authors test a variety of sperm functions, including motility parameters, ability to fertilize by IVF, cumulus-free IVF, zona-free-IVF, and ICSI. They found that ICSI could rescue the knockout but not other assisted reproductive technologies. Defects in male fertility likely resulted from motility disruption and failure to get through the utero-tubal junction but defects in acrosome exocytosis also were noted. The authors performed thorough investigations including both targeted and unbiased approaches such as mass spectrometry. These enabled them to show that although the loss of the FBXO24 protein led to more RNA and elevated levels of some proteins, it did not change others that were previously identified in the electron-dense RNP material.

      The manuscript will be highly significant in the field because the exact functions of the electron-dense RNP materials have remained somewhat elusive for decades. Much progress has been made in the past 15 years but this work shows that the situation is more complex than previously recognized. The results show critical impacts of protein degradation in the differentiation process that enables sperm to change from non-descript round cells into highly polarized and compartmentalized mature sperm, with an equally highly compartmentalized flagellum. This manuscript also sets a high bar for the field in terms of how thorough it is, which reveals wide-ranging impacts on processes such as mitochondrial compaction and arrangement in the midpiece, the correct building of the major cytoskeletal elements in the flagellum, etc.

      We appreciate the positive comment of reviewer #2.

      Weaknesses:

      There are no real weaknesses in the manuscript that result from anything in the control of the authors. They attempted to rescue the knockout by expressing a FLAG-tagged Fbxo24 transgene, but that did not rescue the phenotype, either because of inappropriate levels/timing/location of expression, or because of interference by the tag. They also could not make anti-FBXO24 that worked for coimmunoprecipitation experiments, so relied on the FLAG epitope, an approach that successfully showed co-IP with IPO5 and SKP1.

      We could not rescue the phenotype with Fbxo24-FLAG transgene, but different Fbxo24 mutant mice show the same phenotypes (Figure S6G). Further, another group showed that Fbxo24 KO mice exhibited abnormal mitochondrial coiling [Li et al., 2024 (PMID: 38470475)], confirming that FBXO24 is involved in the mitochondrial sheath formation.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors found that FBXO24, a testis-enriched F-box protein, is indispensable for male fertility. Fbxo24 KO mice exhibited malformed sperm flagella and compromised sperm motility.

      We appreciate the summary comment of reviewer #3.

      Strengths:

      The phenotype of Fbxo24 KO spermatozoa was well analyzed.

      We appreciate the positive comment of reviewer #3.

      Weaknesses:

      The authors observed numerous membraneless electron-dense granules in the Fbxo24 KO spermatozoa. They also showed abnormal accumulation of two importins, IPO5 and KPNB1, in the Fbxo24 KO spermatozoa. However, the data presented in the manuscript do not support the conclusion that FBXO24 ensures male fertility by preventing the abnormal accumulation of membraneless granules in sperm flagella, as indicated in the manuscript title.

      Fbxo24 KO mice showed abnormal accumulation of membraneless granules in sperm flagella and male infertility, suggesting that FBXO24 is involved in these processes, but there are no results that show the direct relationship as reviewer #3 mentioned. Therefore, we fixed the title.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      On page 4, lines 152-154, the authors introduce the RBGS mouse model and use it in their experiments.

      However, they left out an obvious but helpful sentence that tells the reader that they crossed the Fbxo24-null mouse with the RBGS. As one continues reading it is clear, but best to avoid even slight confusion.

      We revised the explanation in the result section (line 150-153).

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, the authors found that FBXO24, a testis-enriched F-box protein, is indispensable for male fertility. Fbxo24 KO mice exhibited malformed sperm flagella and compromised sperm motility. The phenotype of Fbxo24 KO spermatozoa was well analyzed.

      The authors observed numerous membraneless electron-dense granules in the Fbxo24 KO spermatozoa. They also showed abnormal accumulation of two importins, IPO5 and KPNB1, in the Fbxo24 KO spermatozoa. However, the data presented in the manuscript do not support the conclusion that FBXO24 ensures male fertility by preventing the abnormal accumulation of membraneless granules in sperm flagella, as indicated in the manuscript title.

      Fbxo24 KO mice showed abnormal accumulation of membraneless granules in sperm flagella and male infertility, suggesting that FBXO24 is involved in these processes, but there are no results that show the direct relationship as reviewer #3 mentioned. Therefore, we fixed the title.

      Major comments:

      In the title, abstract, introduction, and some sections such as lines 275-276, the authors conclude that FBXO24 prevents the accumulation of importins and RNP granules during spermiogenesis. However, the provided data do not substantiate this claim. To provide conclusive evidence to support the current title, the authors need to present evidence supporting: 1) direct degradation of IPO5 and KPNB1 by FBXO24; 2) the direct requirement of IPO5 for the formation of the membraneless granules, and 3) infertility resulting from the presence of membraneless granules, rather than other issues such as abnormal ODF and AX.

      (1) direct degradation of IPO5 and KPNB1 by FBXO24.

      To examine if IPO5 can be degraded by FBXO24, we performed a ubiquitination assay using HEK293T cells. Ubiquitination of IPO5 was upregulated in the presence of WT FBXO24 but not with the mutant ΔF-box FBXO24, suggesting that IPO5 can be ubiquitinated by FBXO24. We did not examine the ubiquitination of KPNB1 because we failed to construct a plasmid vector expressing mouse KPNB1. We think that KPNB1 is not the substrate because we did not detect the interaction between FBXO24 and KPNB1 (Figure 5E). We added the results of the ubiquitination assay (Figure 5F and line 261-265) and mentioned it in the abstract (line 35).

      (2) the direct requirement of IPO5 for the formation of the membraneless granules.

      (3) infertility resulting from the presence of membraneless granules, rather than other issues such as abnormal ODF and AX.

      We revealed that IPO5 aggregates under stress conditions in COS7 cells (Figure 6C and D); however, we did not examine whether IPO5 is required for the formation of the membraneless granules. We consider that protein degradation systems such as PROTAC or Trim-Away to knock down IPO5 at the protein level in Fbxo24 KO mice could be a good way to see if the membraneless granules are diminished and male fertility is rescued. However, it takes time to apply the degradation systems in vivo. Therefore, we would like to leave this rescue experiment for future studies. We fixed the title and abstract (line 37-38), and removed the last sentence of the introduction.

      Also, another group reported analyses of Fbxo24 KO mice [Li et al., 2024 (PMID: 38470475)] right after we submitted our manuscript to eLife. They reported not only disorganized flagellar structures but also abnormal head morphology, which may lead to male infertility. The differences from our study may be due to different mouse genetic backgrounds. We mentioned it in the discussion section (line 348-353).

      Minor comments:

      (1) The authors claimed a significant increase in the total amount of RNAs in Fbxo24 KO spermatozoa (lines 259-261), suggesting that the ...contain RNAs. More direct evidence supporting this claim should be provided.

      We show that the amounts of IPO5 and KPNB1 increased in Fbxo24 KO spermatozoa (Figure 5A and B), both of which could be incorporated into RNP granules in COS7 cells (Figure 6C and D), supporting the idea that membraneless electron-dense structures may be RNP granules. However, because we did not show direct evidence that electron-dense structures contain RNAs, we removed the sentences (line 259-261 of the 1st submission manuscript).

      (2) The author should provide an explanation for the absence of a FLAG band in the input Tg in Figure 5D and the larger size of the IPO5 band in the FLAG-IP group compared to the input. Similar observations are also noted in Figure 5E.

      The FLAG band is weak because the protein amount is low. When we increase the contrast, we can see the FLAG band. We added an image with high contrast (Figure 5D). Sometimes, proteins run differently with SDS-PAGE after immunoprecipitation, likely due to varying protein composition in the sample. We explained it in the figure legend (line 868-869).

      (3) In Line 526, clarify the procedure for sperm purification, and determine the potential for contamination from somatic cells.

      We did not perform sperm purification, but when we observed spermatozoa obtained from cauda epididymis, we rarely observed either somatic cells or immature spermatogenic cells. We added pictures in Figure S7. Further, we added a detailed explanation of how to collect spermatozoa from the epididymis (line 549-550).

      (4) Define the Y-axis in Figure 2E, F, and G.

      We have revised the figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors investigate the impact of fecal microbiota transfer (FMT) on intestinal recovery from enterotoxigenic E. coli infection following antibiotic treatment. Using a piglet model of intestinal infection, the authors demonstrate that FMT reduces weight loss and diarrhea and enhances the expression of tight junction proteins. Sequencing analysis of the intestinal microbiota following FMT showed significant increases in Akkermansia muciniphila and Bacteroides fragilis. Using additional mouse and organoid models, the authors examine the impact of these microbes on intestinal recovery and modulation of the Wnt signaling pathway. Overall, the data support the notion that FMT following ETEC infection is beneficial, however, additional investigation is required to fully elucidate the mechanisms involved.

      Strengths:

      Initial experiments used a piglet model of infection to test the value of FMT on recovery from E. coli. The FMT treatment was beneficial and the authors provide solid evidence that the treatment increased the diversity of the microbiota and enhanced the recovery of the intestinal epithelium. Sequencing data highlighted an increase in Akkermansia muciniphila and Bacteroides fragilis after FMT.

      The mouse data are consistent with the observations in pigs, and reveal that daily gavage with A. muciniphila or B. fragilis enhances intestinal recovery based on histological analysis, expression of tight junction proteins, and analysis of intestinal barrier function.

      The authors demonstrate the benefit of probiotic treatment following infection using a range of model systems.

      Weaknesses:

      Without sequencing the pre-infection pig microbiota or the FMT input material itself, it's challenging to firmly say that the observed bloom in Akkermansia muciniphila and Bacteroides fragilis stemmed from the FMT.

      Response: We have determined the relative abundance of each bacterium in the fecal bacterial suspension, following Hu et al. (2018). The absolute abundances of Akkermansia muciniphila and Bacteroides fragilis in the FMT were 1.3 × 10^3 ± 2.6 × 10^3 and 4.5 × 10^3 ± 6.1 × 10^3, respectively.

      Reference:

      Hu LS, Geng SJ, Li Y, et al. Exogenous Fecal Microbiota Transplantation from Local Adult Pigs to Crossbred Newborn Piglets. Front. Microbiol. 2018, 8.

      The lack of details for the murine infection model, such as weight loss and quantification of bacterial loads over time, make it challenging for a reader to fully appreciate how treatment with Akkermansia muciniphila and Bacteroides fragilis is altering the course of infection. Bacterial loads of E. coli were only quantified at one time point, and the mice that received A. muciniphila and B. fragilis had very low levels of E. coli. Therefore, it is not clear if all mice were subjected to the same level of infection in the first place. The reduced translocation of E. coli to the organs and enhanced barrier function may just reflect the low level of infection in these mice. Further, the authors' conclusion that the effect is specific to A. muciniphila or B. fragilis would be more convincing if the experiments included an inert control bacterium, to demonstrate that gavage with any commensal microbe would not elicit a similar effect.

      The weight loss was added in Figure S2A. All mice were subjected to the same level of infection in the first place.

      Many of the conclusions in the study are drawn from the microscopy results. However, the methods describing both light microscopy and electron microscopy lack sufficient detail. For example, it is not clear how many sections and fields of view were imaged or how the SEM samples were prepared and dehydrated. The mucus layer does not appear to be well preserved, which would make it challenging to accurately measure the thickness of the mucus layer.

      For light microscopy, 3-4 fields were selected from each mouse to count about 30 crypts. The electron microscopy method was supplemented on lines 263-270. We have removed the mucus layer data.

      Gene expression data appears to vary across the different models, for example, Wnt3 expression in mice versus organoids. Additional experiments may be required to clarify the mechanisms involved. Considering that both of the bacteria tested elicited similar changes in Wnt signaling, this pathway might be broadly modulated by the microbiota.

      The difference in the Wnt3 expression pattern between mice and porcine intestinal organoids may be caused by the different ETEC infection periods in vivo and in vitro. Furthermore, in vivo, the niche of intestinal stem cells is regulated not only by intestinal epithelial cells but also by mesenchymal cells in connective tissues (Luo et al., 2022). In in vitro models, however, the stem cell niche is regulated only by epithelial secretory factors, which may also account for the differences between the in vitro and in vivo results.

      It has been reported that B. fragilis pretreatment significantly increased the relative abundance of A. muciniphila in the intestine of CDI mice, and the growth and maintenance of A. muciniphila were involved in the restoration of intestinal barrier integrity after CDI infection, indicating that there might exist a bacterial metabolic symbiosis between A. muciniphila and B. fragilis (Deng et al., 2018).

      References:

      Luo HM, Li MX, Wang F, et al. The role of intestinal stem cell within gut homeostasis: Focusing on its interplay with gut microbiota and the regulating pathways. Int. J. Biol. Sci. 2022, 18(13): 5185-5206.

      Deng HM, Yang SQ, Zhang YC, et al. Bacteroides fragilis Prevents Clostridium difficile Infection in a Mouse Model by Restoring Gut Barrier and Microbiome Regulation. Front. Microbiol. 2018, 9.

      The unconventional choice to not include references in the results section makes it challenging for the reader to put the results in context with what is known in the field. Similarly, there is a lack of discussion acknowledging that B. fragilis is a potential pathogen, associated with intestinal inflammation and cancer (Haghi et al. BMC Cancer 19, 879 (2019) ), and how this would impact its utility as a potential probiotic.

      Bacteroides fragilis is one of the symbiotic anaerobes in the mammalian gut and is also an opportunistic pathogen that is often isolated from clinical specimens. It was first isolated from a site of infection and was therefore considered a pathogenic bacterium. However, further research has gradually shown that, over the course of long-term evolution, Bacteroides fragilis colonizing the gut has established a mutually beneficial relationship with the host and is an essential component in maintaining host health, especially with respect to obesity, diabetes and immune deficiency diseases. We have supplemented the discussion on lines 598-603.

      Reviewer #2 (Public Review):

      Ma X. et al proposed that A. muciniphila was a key strain that promotes the proliferation and differentiation of intestinal stem cells by acting on the Wnt/β-catenin signaling pathway. They used various models, such as the piglet model, mouse model, and intestinal organoids to address how A. muciniphila and B. fragilis offer protection against ETEC infection. They showed that FMT with fecal samples, A. muciniphila or B. fragilis protected piglets and/or mice from ETEC infection, and this protection is manifested as reduced intestinal inflammation/bacterial colonization, increased tight junction/Muc2 proteins, as well as proper Treg/Th17 cells. Additionally, they demonstrated that A. muciniphila protected basal-out and/or apical-out intestinal organoids against ETEC infection via Wnt signaling. While a large body of work has been performed in this study, there are quite a few questions to be addressed.

      Major comments:

      - The similar protective effect of FMT with fecal samples, A. muciniphila or B. fragilis is perhaps not that surprising, considering that FMT likely restores microbiota-mediated colonization resistance against ETEC infection. While FMT with fecal samples increases SCFAs, it is unclear whether/how FMT with A. muciniphila or B. fragilis alter the microbiota composition/abundance as well as metabolites in the current models in a way that offers protection.

      We examined changes in the gut microbiota of mice treated with A. muciniphila and B. fragilis by 16S rRNA sequencing. The results showed that both A. muciniphila and B. fragilis improved the alpha and beta diversity of the microbiota, although these results were not included in this manuscript.

      - Does ETEC infection in piglets/mice cause histological damage in the intestines? These data should be shown.

      The results of scanning electron microscopy (Figure 3A) showed the intestinal damage of piglets after ETEC infection. H&E staining and transmission electron microscopy (Figure 5A and 5B) showed the intestinal damage of mice after ETEC infection.

      - Line 447, "ETEC adheres to intestinal epithelial cells". However, there is no data showing the adherence (or invasion) of ETEC to intestinal epithelial cells, irrespective of piglets/mouse/organoids.

      Scanning electron microscopy (Figure 3A, bottom) showed obvious rod-shaped bacteria adhering to the surface of the microvilli in piglets infected with ETEC K88. Figure 2C shows the colonization of ETEC K88 in the jejunum and colon of piglets. Figure S2A shows E. coli colonization in the intestines and other tissues of mice.

      - In both basal-out and apical-out intestinal organoid models, A. muciniphila protects organoids against ETEC infection. Did ETEC enter into intestinal epithelial cells at all after only one hour of infection? Is the protection through certain A. muciniphila metabolites?

      It has been reported that the co-culture duration used to study host-microbiota cross-talk in the apical-out organoid model is 1 hour (Poletti et al., 2021). In addition, Co et al. (2019) used the apical-out organoid model to study host-pathogen interactions, with Salmonella enterica serovar Typhimurium or Listeria monocytogenes invading organoids for an hour.

      References:

      Poletti M, Arnauts K, Ferrante M, et al. Organoid-based Models to Study the Role of Host-microbiota Interactions in IBD. J. Crohns Colitis. 2021, 15(7): 1222-1235.

      Co JY, Margalef-Catala M, Li XN, et al. Controlling Epithelial Polarity: A Human Enteroid Model for Host-Pathogen Interactions. Cell Reports. 2019, 26(9): 2509-2520.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Ma et al. describes a multi-model (pig, mouse, organoid) investigation into how fecal transplants protect against E. coli infection. The authors identify A. muciniphila and B. fragilis as two important strains and characterize how these organisms impact the epithelium by modulating host signaling pathways, namely the Wnt pathway in lgr5 intestinal stem cells.

      Strengths:

      The strengths of this manuscript include the use of multiple model systems and follow-up mechanistic investigations to understand how A. muciniphila and B. fragilis interacted with the host to impact epithelial physiology.

      Weaknesses:

      The major weakness is that, as presented, the manuscript is quite difficult to follow, even for someone familiar with the field. The lack of detail in figure legends, organization of the text, and frequent use of non-intuitive abbreviated group names without a clear key (ex. EP/EF, or C E A B) make comprehension challenging. The results section is perhaps too succinct and does not provide sufficient information to understand experimental design and interpretation without reading the methods section first or skipping to the discussion (as an example: WNT-c59 treatment). Extensive revisions could be encouraged to aid in communicating the potentially exciting findings.

      The abbreviations of the experimental groups are first defined in the Methods and Materials, and we have supplemented the experimental design in the results section on lines 397-399, 439-442 and 516-520.

      The bioinformatics section of the methods requires revision and may indicate issues in the pipeline. Merging the forward and reverse reads may represent a problem for denoising. Also since these were sequenced on a NovaSeq, the error learning would have to be modified or the diversity estimates would be inappropriately multiplied. "Alpha diversity and beta diversity were calculated by normalized to the same sequence randomly." Not sure what this means, does this mean subsampled? "Blast was used for sequence alignment", does this mean the taxonomic alignment? This would need to be elaborated on and database versions should be included. The methods, including if any form of multiple testing was included, for LEFSE was also not included.

      Denoising was conducted using UNOISE3 to correct for sequencing errors. Subsequent analyses of alpha diversity and beta diversity were all performed on the normalized output data. Multiple sequence alignment was performed using MUSCLE (v3.8.31) to obtain the phylogenetic relationships of all OTU sequences. We have supplemented the method of multiple testing on lines 323-328.
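
      The reviewer asks whether "normalized to the same sequence randomly" means subsampling. If it does (an assumption; the response does not confirm it), the usual procedure is rarefaction: every sample is randomly subsampled, without replacement, to the read depth of the shallowest sample before diversity is computed. A minimal sketch with hypothetical OTU counts and hypothetical helper names, not the authors' pipeline:

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample an OTU count table (OTU name -> reads) to `depth` reads
    without replacement, so samples can be compared at equal depth."""
    # Expand the table into one entry per read, then draw `depth` reads.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the total number of reads")
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def observed_otus(counts):
    """Richness: the number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical samples sequenced at very different depths.
sample_a = {"OTU1": 500, "OTU2": 300, "OTU3": 200}
sample_b = {"OTU1": 50, "OTU2": 30, "OTU3": 20}

# Rarefy both samples to the depth of the shallowest one.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
rarefied_a = rarefy(sample_a, depth)
rarefied_b = rarefy(sample_b, depth)
```

      After this step both samples contain the same number of reads, so richness and distance measures are no longer driven by sequencing depth alone.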

      Reviewer #1 (Recommendations For The Authors):

      At some points, the rationale for using both porcine and murine models was unclear, and it would be helpful for the reader to elaborate on the benefits of these models and why they were used in the introduction. Similarly, it would be helpful to describe the benefits of basal-in organoids versus injecting standard organoids with bacteria.

      The main subject of this study was piglets, supplemented by a mouse model for validation. Interpretation of measurements from organoid microinjection experiments must account for multiple confounding variables, such as heterogeneous exposure concentrations and durations, as well as the impact of disrupting the organoid wall. We have added this description to the introduction on lines 88-90.

      Line 165 -- The number of piglets used seems high, is it correct approximately 100 pigs were used?

      Nine litters were selected for processing, while only 18 piglets were finally slaughtered.

      There is very little discussion of the preliminary experiment that the authors used to determine how much bacteria to use. I recommend either discussing the data and how the doses were chosen or omitting it. It was not clear if the authors used pasteurized or live bacteria in the experiments. It would also be interesting to include a discussion of the observation that relatively low levels of Akkermansia (10^6 CFU) appeared more beneficial than the higher doses, typically used in these types of experiments.

      We removed these results. The experiments used live bacteria.

      Microscopy methods for both light microscopy and EM would be stronger with added details including how many sections and fields of view were imaged and how the numbers of goblet cells normalized across samples. Without having a clear cross-section of a crypt, it is not clear to me how the images can be used to accurately quantify the number of cells per crypt. Additional details in the methods on how many total crypts were counted should also be included.

      For light microscopy, 3-4 fields were selected from each mouse to count about 30 crypts. We have removed the data of the mucus layer and goblet cells.

      Line 236 -- missing which gene was used.

      The GenBank accession number was added on lines 232-233.

      Line 310 -- OTU nomenclature.

      We have supplemented the OTU nomenclature on line 314.

      Line 413 -- This line seems inconsistent with the data analysis described in the methods section. The authors may need to expand their description of the 16S data analysis to be clear and reproducible.

      We have redescribed the 16S data analysis on lines 312-328.

      Line 413 -- it is not surprising that 16s analysis did not capture species, it will have limited resolution beyond the genus level.

      We deleted this sentence.

      Methods are missing some details on the data analysis, eg. methods/programs and statistical analysis of PCoA and NMDS, LefSe.

      The methods and statistical analysis of PCoA, NMDS and LEfSe were supplemented on lines 323-328.

      Fig 4C -- The images do not clearly capture the mucus layer or how it was analyzed. The sections appear to be cut at a slight angle, with multiple partial sections of crypts. I think this might make it challenging to count goblet cells, especially if the counts are normalized over the number of crypts or villi. The mucus layer does not appear well preserved. For example, I would expect to see an intact mucus layer lining the colon in the PBS control group. Re-cutting sections with a clean cross-section through the tissue will make data analysis easier.

      We have removed the mucus layer data.

      Fig 4D -- The images appear to be of the mouse proximal colon, whereas the mucus layer and most muc2 will be in the distal colon. If the authors have tissue sections of the distal colon, this may give a clearer image of the mucus layer and might be more consistent with the TEM images in Fig. 4B.

      We apologize for the absence of the distal colon sections.

      To fully preserve the mucus layer, in addition to fixing in Carnoy's solution, the embedding process must be run without the standard washes in 70% ethanol (see: Johansson and Hansson. Methods Mol Biol. (2012) 229; doi: 10.1007/978-1-61779-513-8_13). The mucus will wash away during standard paraffin embedding if the tissue is washed with 70% ethanol, and I wonder if that has occurred in these samples.

      The tissue wasn’t washed with 70% ethanol.

      Fig 6A and 6B -- Although the legend indicates that the data is representative of two independent experiments, it is not clear how many fields of view or cells were imaged. In the bar graphs, it is not clear how many crypts were analyzed and from how many fields of view.

      3-4 fields were selected from each mouse to count about 30 crypts.

      **For all of the bar graphs, this could be addressed by displaying all of the data points, rather than just the mean, to give the reader a sense of how many cells were counted. (as was done in Fig 7B).

      We have changed the bar graphs with data points.

      498-501 -- The text says that the gene expression patterns in the organoids are consistent with the in vivo data, but the data patterns of gene expression appear to be different. For example, patterns for Wnt3 and B-catenin expression in mice, appear to be the opposite of what was observed in the organoid?

      Lines 509-512 state that the expression patterns in mouse organoids and in vivo are consistent. Figure 7C was incorrectly written as Figure 8C; we have corrected it.

      Since Akkermansia does not grow under aerobic conditions, it should be made clear that the organoid co-culture treatment does not involve actively growing bacterial cultures.

      Reunanen et al. found that Akkermansia can tolerate oxygen: more than 90% of Akkermansia cells remain viable for 1 h under oxic, 5% CO2 conditions.

      Reference:

      Reunanen J, Kainulainen V, Huuskonen L, et al. Akkermansia muciniphila Adheres to Enterocytes and Strengthens the Integrity of the Epithelial Cell Layer. Appl. Environ. Microbiol. 2015, 81(11): 3655-3662.

      Minor points

      Line 50 -"evidence".

      We have changed to “evidence” on line 49.

      Line 64, 422 - italicize, check italics throughout.

      We have checked italics throughout the manuscript.

      Line 64 - may need to be reworded.

      We have changed to “Clostridioides difficile” on line 66.

      Line 77 - pathogen.

      We have changed to “pathogen” on line 77.

      Line 161 - the.

      We have removed “the” on line 161.

      Line 178 - mouse.

      We have changed to “mouse” on line 179.

      Line 313 -- wording is confusing.

      We have changed the description on lines 319-320.

      Line 318 -- Silva version #.

      The version is Silva 132. We have added it on line 316.

      Line 334 - Manufacturer for Live/Dead cell stain?

      The Live/Dead cell stain used was BD Biosciences FVS510. We have added it on line 345.

      Line 433 -- FD4 not defined until here.

      We have defined FD4 on lines 218-219.

      Line 512 -- but did not promote.

      We have changed to “but did not promote” on line 526.

      Line 517 -- Looks like this should be "basal-in organoids" instead of basal-out?

      We have changed "basal-out" to "apical-out" on line 531.

      Line 546 -- induced neonatal should be protected?

      They are in separate pens.

      Jumps from Fig 7B to Fig 8C in the text.

      We apologize for the error and have corrected it.

      Reviewer #2 (Recommendations for The Authors):

      The title itself is a bit misleading. Please consider changing it. The authors meant that A. muciniphila prevents pathogen invasion, but does not function in pathogen invasion.

      We have changed the title.

      Major comments:

      - Figures 4A, 4D, and 6B should include presentation of cross-section pictures.

      We provided cross-section pictures to the journal.

      - Figures 7, 8, and 9 should indicate clearly whether mouse or piglet organoids are used. For instance, in the main text, line 490, it indicates piglet organoids, but in Figure 7A legend, it indicates mouse tissue.

      We apologize for the error and have changed it to “mice” on lines 501-502.

      - In Figure 7A, the 3rd row, 2nd panel, crypts formed into spherical organoids; whereas in Figure 8, ETEC infection of basal-out organoids formed budding organoids. This needs to be better explained.

      Mouse intestinal organoids were cultured ex vivo from crypts isolated from mice infected with ETEC, while porcine intestinal organoids were co-cultured with ETEC in vitro.

      Minor comments:

      - In the result section, the numbering of Figures or supplementary Figures is problematic, i.e it should start with Figure 1..., Figure S1, but not directly go to Figure S2A etc.

      Figure 1 is in the Materials and Methods.

      - Line 458, please add the gating strategy used in the flow cytometry study.

      The gating strategy was added on lines 351-356.

      - The effect of A. muciniphila on the proliferation of intestinal epithelium through the Wnt/β-catenin signaling pathway is well known (such as PMID: 32138776). The authors should discuss this in detail.

      We have supplemented the discussion on lines 637-639.

      Reviewer #3 (Recommendations For The Authors):

      It is somewhat unusual that the results from the piglets are in the supplement as this is a major strength of the manuscript (Fig S2).

      We have put these results into Figure 2 of the manuscript.

      "Collectively, our results may provide theoretical basis that FMT is a promising mitigation method for pathogenic bacteria infection and a new strategy for precise application of FMT in clinical and livestock production"- This is somewhat of an odd statement as the introduction of the manuscript completely skips over most of what is known about FMTs in the context of C. difficile. Also if anything, does the authors' own data not point mostly at using A. muciniphila on its own? Clinical trials are well underway in humans.

      We have changed the sentences to “Collectively, our results may provide theoretical basis that A. muciniphila is a promising method to repair intestinal barrier damage and a new strategy for the precise application of A. muciniphila in livestock production.” on line 98-100.

      Line 26: I am not sure probiotic is the right word here given its strict scientific definition. Perhaps beneficial or protective would be more appropriate.

      We have changed “probiotic” to “beneficial” on line 25.

      Line 27: I believe AIMD is antibiotic-induced microbiome-depletion in most usages which may be more accurate and informative than dysregulated.

      The type, dosing, and time of antibiotic we used were applied to induce microbiota disorder.

      It would appear that there are issues in the reference formatting where a number of journal names are missing.

      We have re-edited the reference formatting.

      Line 64- I believe eLife requires the standard practice of italicizing genus and species names. Also Clostridium difficile should now be referred to as Clostridioides difficile.

      We have changed it to “Clostridioides difficile” and italicized it on lines 66 and 569. The italicization of genus and species names was checked throughout the manuscript.

      Figure S2C: is it not clear why the melt curve was included here, but the legend should make it more clear what is being shown. I assume this is to provide evidence of specificity?

      The melting curve was used to demonstrate that only the ETEC K88 could be amplified by the primers we used. We have added an illustration in the figure legend.

      Figure 2D: there should be a quantitative analysis done on the staining of Muc2.

      We have quantified the staining of MUC2 in Figure 3D.

      Figure 3: The legends are not sufficient. For example: it is not clear what Figure 3A actually shows as the y-axis is not labelled and it is not clear what the relationship is between this and the anosim which is a function for permanova.

      ANOSIM was performed in R using the anosim function, based on the rank order of Bray-Curtis distance values, to test the significance of differences between groups. The y-axis is the rank of the distance between samples.
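
      To make the rank-based statistic concrete, the sketch below computes the ANOSIM R value from Bray-Curtis distances in plain Python. It uses the standard definition R = (mean between-group rank - mean within-group rank) / (n(n-1)/4); the sample counts and function names are hypothetical, and the permutation test that yields the p-value is omitted. It illustrates the statistic only and is not the authors' R code.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def anosim_r(samples, groups):
    """ANOSIM R: contrast of between-group and within-group distance ranks."""
    pairs = list(combinations(range(len(samples)), 2))
    ranks = average_ranks([bray_curtis(samples[i], samples[j]) for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    n = len(samples)
    return (mean(between) - mean(within)) / (n * (n - 1) / 4)


# Hypothetical two-taxon count vectors for two groups of two samples each.
samples = [[10, 0], [9, 1], [0, 10], [1, 9]]
groups = ["C", "C", "E", "E"]
r_stat = anosim_r(samples, groups)
```

      R close to 1 means that between-group distances rank higher than within-group distances, which is what the between-group and within-group rank distributions on the y-axis display.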

      Line 416- OTU not OUT.

      We have changed to “OTU” on line 428.

      Figure 4- the naming key needs to be included in the figure legend. C, E, A, and B are not immediately obvious.

      The naming key was included in the figure legend.

      Methods: additional information on the flow cytometry gating strategy/controls should be included.

      The gating strategy was added on lines 351-356.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript addresses a fundamental question about how different types of communication signals differentially affect brain state and neurochemistry. In addition, the manuscript highlights the various processes that modulate brain responses to communication signals, including prior experience, sex, and hormonal status. Overall, the manuscript is well-written and the research is appropriately contextualized.

      That being said, it remains important for the authors to think more about their analytical approaches, in particular the effect of normalization and the explicit outlining and interpretation of statistical models. As mentioned in the original review, the normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis. By normalizing all data to the baseline data and including this baseline data in the repeated-measures analysis, one artificially creates a baseline period with minimal variation that dramatically differs in variance from other periods (akin to heteroscedasticity). If the authors want to analyze how a stimulus changes neurochemical concentrations, they could analyze the raw data but depict normalized data in their figures (similar to other papers). Or they could analyze group differences in the normalized data of the two stimulus periods (i.e., excluding the baseline period used for normalization).

      We appreciate the reviewer’s point on the difference in variance caused by including the 100% baseline values in the analysis. After consulting with our statistician, we chose the latter of the two approaches suggested by the reviewer. Specifically, we reran the analysis to exclude the baseline and focus only on the playback windows and the group differences. The text in the results, the significance signs in the figures, and the discussion are corrected accordingly. Despite these changes, our major conclusions remain as before.

      We also followed this reviewer’s suggestions to clarify the statistical model used to study the experience effect. After further consultation with our statistician, we reran the analysis of the experience effect, including all the groups of EXP and INEXP animals together. We have corrected the text in the figure captions, results, discussion, and data analysis sections of the manuscript related to the effect of experience and its interactions. This has not changed our conclusions regarding the experience effect in the dataset.

      It would also be useful for the authors to provide further discussion of the potential contributions of different types of experiences (mating vs. restraint) to the change in behavior and neurochemical responses to the vocalization playbacks and to try to disentangle sensory and motor contributions to neurochemical changes.

      We have acknowledged in the Discussion that previous studies suggest that the effect of experience involving stress could be generalized. We believe that this is an important area of future research. Our Discussion acknowledges that the relationship between sensory and motor contributions to neurochemical changes remains an area of interest. We further point out that the time resolution of microdialysis data renders the suggested discussion highly speculative. We plan to use other methods to assess this in future experiments.

      Reviewer #3 (Public Review):

      The work by Ghasemahmad et al. has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the basolateral amygdala (BLA) while an animal listens to social vocalizations.

      Ghasemahmad et al. made changes to the manuscript that have significantly improved the work. In particular, the transparency in showing the underlying levels of Ach, DA, and 5HIAA is excellent. My previous concerns have been adequately addressed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the authors' responses to my previous queries (and to the comments by other reviewers). The introduction does a better job contextualizing the data, and the additional details in the results and Methods sections help readers digest the material. I continue to think the topic is interesting and the manuscript is potentially impactful. However, I continue to be concerned about their analytical approaches and other aspects of the revised manuscript.

      (a) Normalization

      In my original review I wrote: "The normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis and could be problematic; by normalizing all data to the baseline data (p. 24), one artificially creates a baseline period with minimal variation (all are "0"; Figures 2, 3 & 5) that could inflate statistical power." I continue to feel that an analysis of normalized data that includes the baseline data is inappropriate because of the minimal variation in the normalized data for the baseline period. When the normalized data for the baseline period is included in the analysis, there is clearly variation in the extent of variability within each of the time periods (no variability at baseline, variability during periods 1 & 2; analogous to heteroscedasticity). For example, when analyzing the RAW DATA about the change in ACh release in experienced males listening to restraint vocalizations (thank you for releasing the raw data), there was a non-significant effect of time (baseline, period 1, and period 2; linear mixed effects model; F(2,12)=3.2, p=0.0793). However, when the normalized data for this dataset was analyzed (with baseline values being set at 100% for each mouse), there was a statistically significant effect (F(2,12)=4.5, p=0.0352). This example is just to illustrate how normalization can affect (e.g., inflate) statistical power.
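
      The variance argument can be illustrated with a small sketch. The concentrations below are invented for illustration (they are not the study's data), and the helper names are hypothetical. Expressing each animal's values as a percentage of its own baseline forces every baseline value to exactly 100%, so the baseline period has zero between-animal variance while the playback periods retain theirs.

```python
from statistics import variance

# Hypothetical raw concentrations: animal -> (baseline, period 1, period 2).
raw = {
    "m1": (2.0, 2.6, 2.4),
    "m2": (5.0, 5.5, 6.5),
    "m3": (1.0, 1.5, 1.2),
}


def normalize_to_baseline(values):
    """Express each value as a percentage of the animal's own baseline."""
    baseline = values[0]
    return tuple(100.0 * v / baseline for v in values)


def per_period_variance(table):
    """Between-animal sample variance for each period (column)."""
    return [variance(column) for column in zip(*table.values())]


normalized = {animal: normalize_to_baseline(v) for animal, v in raw.items()}

raw_var = per_period_variance(raw)          # all three periods vary
norm_var = per_period_variance(normalized)  # baseline variance collapses to 0
```

      Here `norm_var[0]` is zero by construction, which is the unequal-variance pattern described above; restricting the analysis to periods 1 and 2 avoids feeding that degenerate period into the model.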

      That being said, I do think that it is reasonable to analyze normalized data if the period used for normalization is NOT included in the analysis (see Figure 3 of one of the papers the authors listed in their response to reviewers: Galvez-Marquez et al., 2022). However, from the reading of this manuscript, it does seem like normalized baseline data are analyzed to assess how stimuli affect neurochemical concentrations.

      We appreciate the reviewer’s point on the difference in variance caused by including the 100% baseline values in the analysis. After consulting with our statistician, we chose one of the two approaches suggested by the reviewer. Specifically, we reran the analysis to exclude the baseline and focus only on the playback windows and the group differences. The text in the results, the significance signs in the figures, and the discussion are corrected accordingly. Despite these changes, our major conclusions remain as before. We have included some descriptive statistics in the text because we think these are informative.

      We decided to take this approach because the inter-individual variability in the raw data levels, caused by non-experimental factors, is too great to be useful. As we have stated before, these values are affected by probe placement, collection process, or differences in the HPLC or LC/MS runs. These effects are widely recognized in the field.

      It is worth pointing out a few things about the papers listed by the authors. Li et al. (2023) does depict normalized microanalysis data but it isn't clear that any analysis of the normalized data is conducted. The same can be said about Holly et al. (2016). Further, in Bagley et al. (2011), the authors depict normalized data in the figures but conduct analyses on the raw data ("After chronic morphine treatment, systemic naloxone injection increased GABA outflow in PAG by 41% (from 24.6 ± 2.9 nM to a peak of 34.8 ± 3.8 nM, n = 6, P = 0.016), but did not alter GABA levels after vehicle treatment (39.8 ± 8.3 to 38.6 ± 7.4 nM with naloxone at matched peak time, n = 4; Fig. 3a)"). This latter approach (analyzing raw data in a repeated-measures manner and depicting normalized data) seems reasonable for the authors of the current study.
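
      The variance argument in point (a) can be illustrated with a minimal sketch. The concentrations below are hypothetical and are not taken from the study; the names `raw`, `percent_of_baseline`, and `variance_by_period` are introduced here for illustration only. The sketch shows that expressing each animal's values as a percentage of its own baseline leaves no between-animal variance in the baseline period, while the playback periods keep theirs.

```python
from statistics import pvariance

# Hypothetical raw concentrations (arbitrary units) for four animals.
# These numbers are illustrative assumptions, not data from the study.
raw = {
    "m1": {"baseline": 2.0, "period1": 2.6, "period2": 3.0},
    "m2": {"baseline": 5.0, "period1": 5.5, "period2": 7.0},
    "m3": {"baseline": 1.0, "period1": 1.4, "period2": 1.2},
    "m4": {"baseline": 8.0, "period1": 9.6, "period2": 10.0},
}
PERIODS = ("baseline", "period1", "period2")


def percent_of_baseline(animal):
    """Express every period as a percentage of that animal's own baseline."""
    base = animal["baseline"]
    return {p: 100.0 * animal[p] / base for p in PERIODS}


def variance_by_period(table):
    """Population variance across animals, computed separately per period."""
    return {p: pvariance([row[p] for row in table.values()]) for p in PERIODS}


normalized = {name: percent_of_baseline(row) for name, row in raw.items()}
raw_var = variance_by_period(raw)        # baseline varies between animals
norm_var = variance_by_period(normalized)  # baseline is 100% for every animal
```

      In this toy table the raw baseline differs widely between animals, which is the inter-individual variability the authors describe, whereas the normalized baseline column is constant. Dropping that column from the model, as the authors did, avoids mixing a zero-variance period with periods that vary.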

      (b) Clarification and modification of statistical models

      When analyzing the effect of experience on neuromodulator release, the authors analyze the experienced and inexperienced mice independently (e.g., figure 3 vs. 6). The ideal way to assess the effects of experience is to create a factorial model. For example, one could analyze a full factorial model with experience (exp vs. inexp), stimulus type (mating vs. restraint) and time (baseline, period 1 vs period 2, assuming raw data are used). If one wanted to exclude the baseline period because group differences in baseline are not informative, conducting a factorial analysis of normalized data with just the data from period 1 and 2 seems fine. I believe an analysis like this will help increase the legitimacy of the analysis. For example, when analyzing the normalized data (periods 1 and 2) of experienced and inexperienced males in response to mating or restraint vocalizations, you find a significant interaction between experience and stimulus type. Finding an effect of experience in an analysis that includes both experienced and inexperienced mice is ideal from an analytical framework.

      In Figure 6, it is not clear what the statistical model is and what the interactions mean. For example, in the figure legend for figure 6, the authors report time*context and time*sex interactions. However, in this analysis there are two groups of inexperienced males (males that are listening to restraint vocalizations, males that are listening to mating vocalizations) and one group of females (females that are listening to mating vocalizations); in other words, this is an unbalanced analysis. So, when the authors indicate a time*context interaction, does that mean they are comparing the male-restraint group to the combination of males and females listening to mating vocalizations? And when they talk about a time*sex interaction, are they analyzing how males listening to either mating or restraint vocalizations differ from females listening to a mating vocalization? This all seems peculiar to me.

      - A similar set of questions could be raised about interaction effects depicted in Figure 4.

      Overall, I would like this manuscript to be reviewed by a statistician to provide additional input on how best to analyze the data.

      We followed the reviewer’s suggestions to clarify the statistical model in studying the experience effect. After further consultation with the statistician, we reran the analysis on experience effect, including all the groups of EXP and INEXP animals together.

      Design: Intercept + Sex + Context + Experience + Sex*Experience + Context*Experience.

      The model is not full factorial as recommended by the statistician, because we don’t have females in the restraint group and that would make an unbalanced design. Therefore, running GLM based on the above model and included factors, as advised by the statistician, is the best way of approaching the analysis for the current dataset.
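
      As a rough illustration of the design stated above, the sketch below builds one dummy-coded design row per group. The 0/1 coding, the level names, and the assumption that every sex-by-context combination other than female/restraint exists at both experience levels are ours and are not confirmed by the manuscript. It shows one consequence of the missing female/restraint group: a Sex*Context column would be zero for every observed group, so that term could not be estimated.

```python
from itertools import product

SEXES = ("male", "female")
CONTEXTS = ("mating", "restraint")
EXPERIENCE = ("EXP", "INEXP")

# Groups assumed to exist: everything except female/restraint.
observed_cells = [
    (s, c, e)
    for s, c, e in product(SEXES, CONTEXTS, EXPERIENCE)
    if not (s == "female" and c == "restraint")
]

COLUMNS = ("Intercept", "Sex", "Context", "Experience",
           "Sex*Experience", "Context*Experience")


def design_row(sex, context, experience):
    """Dummy-coded row for the stated design (reference: male, mating, EXP)."""
    s = 1 if sex == "female" else 0
    c = 1 if context == "restraint" else 0
    e = 1 if experience == "INEXP" else 0
    return (1, s, c, e, s * e, c * e)


design = {cell: design_row(*cell) for cell in observed_cells}

# A Sex*Context term would be 1 only for female/restraint animals,
# which are absent, so the column is all zeros in the observed groups.
sex_by_context = [
    (1 if s == "female" else 0) * (1 if c == "restraint" else 0)
    for s, c, _ in observed_cells
]
```

      Under these assumptions the six listed terms are all that the six observed groups can support, which is consistent with fitting a reduced rather than a full factorial model.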

      We have corrected text in the figure captions, results, discussion, and data analysis sections of the manuscript related to the effect of experience and its interactions. The GLM models are clarified for all the figures in the “data analysis” section of the manuscript. We have clarified that the major effect of experience on neuromodulators was seen in the ACh data.

      (c) Analysis of post-stimulus period

      I agree with Reviewer 3 that analyzing the post-stimulus period would be useful. As mentioned in the original review, these data could serve as an opportunity to show that the neurochemical levels returned to baseline and add further support for the model described in Figure 6. In addition, these data could help reveal the link between neurochemical release, auditory responses, and behavior. If neurochemical changes reflect auditory responses, then these should return to baseline during the post-stimulus period. In addition, if behavioral variation (e.g., between mice hearing mating vs. restraint stimuli) persists following the termination of playback, then one could similarly assess whether neurochemical variation persists following playback. If the latter is the case, then the neurochemical release could be more related to the behavior than to the playback stimulus itself.

      We did not change this analysis. Our response to Reviewer 3’s comment is shown below.

      “We decided not to include analyses of the post-stimulus period because this period is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question—the change in neuromodulator release DURING vocal playback. We agree that the general question is of interest to the field, but we don’t think our study is best designed to answer that question.”

      This was accepted by Reviewer 3. We also note that release patterns have multiple time courses (e.g., Aitta-aho et al., 2018 for ACh), and thus may not support an assumption that levels should return to baseline shortly after playback offset.

      Minor comments:

      Page 7, line 15: I suggest changing "vocalization-dependent" to "stimulus-dependent" because the former could connote patterns of release related to the animal itself vocalizing.

      Changed to: “There were also distinct patterns of ACh and DA release into the BLA depending on the type of vocalization playback (Fig 3C,D).”

      Discussion section: The authors should point out a few caveats with their experiments in the Discussion section. First, experienced animals received both mating (social) and restraint experiences, and it is not clear to what degree each type of experience affected neural and behavioral responses (i.e., specificity of experience effects). For example, mating experience can lead to a wide range of physiological changes, including a resilience to stress (e.g., Leuner et al., PLoS One, 2010; Arnold et al., Hormones and Behavior, 2019), so it is possible that mating experiences by themselves could have induced these changes. Or it could be that experiencing restraint stress affects responses to mating stimuli. This could be added to the first paragraph in page 16. (The authors could also discuss which aspects of the sexual encounters might be most important for the behavioral and neural plasticity.)

      We have added text to raise this issue, stating that it is unknown whether the experience effects are specific and citing the above references concerning the generalized effects of certain experiences.

      Discussion section: It would also be useful for the authors to discuss the extent to which behavior might be driving the neurochemical changes. Some of the analyses suggest that the release is independent of the behavior (e.g., reflects a sensory response) but this could be emphasized more in the Discussion.

      We believe that we have addressed this issue sufficiently in our previous response to related issues raised by this reviewer. As we note, there are limitations in the time resolution of microdialysis data that render the suggested discussion highly speculative. We plan to use other methods to assess this in future experiments.

      Figure 2, legend: Please note that the text above the images describes the stimulus played back to these animals and their hormonal state, and not the type of experience they underwent (i.e., clarify the titles).

      Changed as requested.

      I also agree with Reviewer 3 that "mating experience" is a misnomer for this manuscript. "Social experience with a female" is a more accurate descriptor. If they wanted to specifically provide mating experience, males should have only been tested with estrus (receptive females). I don't think this wording change detracts from their findings.

      We have not changed this term. As noted in our previous response to Reviewer #3, we stated: “In the mating experience, mounting or attempted mounting was required for the animal to be included in subsequent testing.” Due to this requirement, the term “mating behavior” is informative and appropriate. In our view, “Social experience with a female” does not adequately describe our inclusion criterion or the experience.

      Reviewer #3 (Recommendations For The Authors):

      The work by Ghasemahmad et al. has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the basolateral amygdala (BLA) while an animal listens to social vocalizations.

      Ghasemahmad et al. made changes to the manuscript that have significantly improved the work. In particular, the transparency in showing the underlying levels of ACh, DA, and 5HIAA is excellent. My previous concerns have been adequately addressed. I only have a few minor suggestions for the text and one figure.

      Minor suggestions:

      Page 2, Ln 9: add adult before male and female mice

      Changed as requested

      Page 4, Ln 10: add a period after Tsukano et al., 2019)

      Changed as requested

      Page 6, Ln 9: what did you mean by "their interaction"? Being more specific, but concise, would help the readers.

      We revised the wording to clarify that the neuromodulatory systems interact in the emission of positive and negative vocalizations.

      Page 6, Ln 17: You mention Stim 1 and Stim 2, but the stimuli are not defined at this point. The clear explanation is provided in the following paragraph. Maybe consider switching the order and defining the stimuli before you describe the liquid chromatography/mass spectrometry technique.

      We have revised and merged these paragraphs so that Stim 1 and Stim 2 are defined on first use. We also revised our description of the depiction and analysis of neurochemical data.

      Page 11, Ln 12: replace well-proven with well-documented

      Changed as requested

      Figure 2: There are two arrows pointing towards a single track. I assume one of the arrows is a duplicate. If so, delete one of the arrows. If not, please explain what the second arrow represents.

      Arrow removed

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show these genes get upregulated in C2C12 myoblasts when treated with conditioned media or 15d-PGJ2. 15d-PGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modified Cys184 of HRas, which is required for its activation as indicated by the FRET analysis with RAF RBD. They also showed that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through the electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRas (only WT, not C181 or C184 mutant). They also showed that expressing the WT HRas followed by 15d-PGJ2 treatment led to a decrease in the levels of MHC mRNA and protein, and this defect is dependent on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, some clarification and experiments will help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers for inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (Inflammatory cells vs senescent cells). The expression of beta Gal and other markers of senescence in the tissue sections will help to delineate these.

      As pointed out, Doxo induces systemic inflammation along with inducing DNA damage-mediated senescence. Therefore, along with the inflammatory markers of the SASP (CXCL1/2, TNF1α, IL6, PTGS1/2, PTGDS), we also observed an increase in the mRNA levels of canonical markers of DNA damage-mediated senescence. We observed an increase in the mRNA levels of cell cycle and senescence associated proteins p16 and p21 (Fig. 1C). We also observed an increased nuclear accumulation of p21 (Fig. 1A) and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B).

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We measured the viability of C2C12 cells after 24 hours of treatment with 15d-PGJ2 using the MTT assay and observed that the viability of cells was decreased after treatment with 15d-PGJ2 (10 µM) but not with 15d-PGJ2 (1 µM, 2 µM, 4 µM, or 5 µM) (see Fig. S2A of the updated manuscript). The results and figures of the manuscript have been updated accordingly.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The treatment with Doxorubicin is irreversible as the senescent phenotype was not reversed after withdrawal of Doxorubicin, even after 20 days.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 is an acute experiment for only 1 hour, at which time no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt proteins after 1 hour of treatment with 15d-PGJ2 (10 µM) during which we did not observe any cell death.

      Even at such a high concentration of 15d-PGJ2, the increase in ERK phosphorylation is minimal.

      We observe a ~30% increase in the phosphorylation of Erk proteins after treatment with 15d-PGJ2 in 0.2% serum medium compared to treatment with vehicle (DMSO). This is reproducible and significant.

      The experiment in Figure 4C shows that C181 and C184 mutants of the HRas show higher levels in Golgi compared with WT. However, this could very well be due to the defect in palmitoylation rather than the modification with 15d-PGJ2.

      Our data does not suggest higher levels of C184S mutant in the Golgi compared with WT (Fig. S4A). We observed that the ratio of HRas levels in the Golgi to the HRas levels in the plasma membrane were similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To have a meaningful conclusion, the authors need to compare the palmitoylation and modification with 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. The inhibition of palmitoylation of C181, either by mutation (C181S) or treatment with protein palmitoyl transferase inhibitor (2-Bromopalmitate), results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could inhibit the palmitoylation of HRas at C181. However, our data does not support this hypothesis, as modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, unlike the inhibition of cysteine palmitoylation caused by the C181S mutation.

      To test if the inhibition of myoblast differentiation depends on HRas, they overexpressed the HRas and mutants in the C2C12 lines. However, this experiment does not take the endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment to test this would be to knock down or knock out HRas (or make knock-in mutations of C184) and show that the effect of 15d-PGJ2 disappears.

      Endogenous HRas (wild type) is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs. Therefore, we only observe a partial rescue in the differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, since HRas is expressed under high expression CMV promoter and in the absence of other regulatory elements, the overexpressed constructs do show a dominant effect over the endogenous HRas, showing cysteine mutant dependent inhibition of differentiation of myoblasts after treatment with 15d-PGJ2 (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing HRas constructs after treatment with 15d-PGJ2 were normalized to the mRNA levels in C2C12 cells expressing the corresponding constructs and treated with vehicle (DMSO). mRNA levels in C2C12 cells treated with vehicle were not shown as they were normalized to 1. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were normalized to those in C2C12 cells treated with vehicle (DMSO). Since the aim was to study the effect of HRas cysteine mutations on the differentiation of myoblasts after treatment with 15d-PGJ2, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on the differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to the general toxicity of 15d-PGJ2 in cells. We show that the inhibition of differentiation of myoblasts after 15d-PGJ2 treatment depends on modification of HRas at C184, i.e., failure to modify HRas at C184 (Fig. 3A) and the resultant lack of activation (Fig. 3B) by 15d-PGJ2 rescues this inhibition of differentiation of C2C12 cells (Fig. 4D and E). This separates the inhibition of myoblast differentiation by 15d-PGJ2 from the general toxic effects of 15d-PGJ2 on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024). On the other hand, 15d-PGJ2 also exerts pro-carcinogenic effects by induction of epithelial to mesenchymal transition in breast cancer cells MCF7 and inhibition of tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      (1) I still think the novelty is limited by previously published findings. The authors themselves noted that the accumulation of 15d-PGJ2 in senescent cells has been reported in various cell types, including human fibroblasts, HEPG2 hepatocellular carcinoma cells, and HUVEC endothelial cells (PMCID: PMC8501892). Although the current study observed similar activation of 15d-PGJ2 in myoblasts, it appears to be additive rather than fundamentally novel. The covalent adduct of 15d-PGJ2 with Cys-184 of H-Ras was reported over 20 years ago (PMID: 12684535), and the biochemical principles of this interaction are likely universal across different cell types. The regulation of myogenesis by both HRas and 15d-PGJ2 has also been extensively reported previously (PMID: 2654809, 1714463, 17412879, 20109525, 11477074). The main conceptual novelty may lie in the connection between these points in myoblasts. But as discussed in another comment, the use of C2C12 cells as a model for senescence study is questionable due to the lack of the key regulator p16. The findings in C2C12 cells may not accurately represent physiologically relevant myoblasts. It is recommended that these findings be validated in primary myoblasts to strengthen the study's conclusions.

      This is the first study to show a molecular mechanism where activation of HRas signaling in skeletal myoblasts due to covalent modification by 15d-PGJ2 at C184 of HRas inhibits the differentiation of skeletal myoblasts.

      (2) The C2C12 cell line is not an ideal model for senescence study.

      C2C12 cells are a well-established model for studying myogenesis. However, their suitability as a model for senescence studies is questionable. C2C12 cells are immortalized and do not undergo normal senescence like primary cells as C2C12 cells are known to have a deleted p16/p19 locus, a crucial regulator of senescence (PMID: 20682446). The use of C2C12 cells in published studies does not inherently validate them as a suitable senescence model. These studies may have limitations, and the appropriateness of the C2C12 model depends on the specific research goals.

      Several reports have shown that cells undergo senescence independent of p16 expression. MCF7 human breast adenocarcinoma cells have been shown to undergo DNA damage mediated and Oncogene induced senescence as seen after treatment with Doxorubicin (PMID: PMC7025418) and expression of constitutively active HRas (PMID: 17135242), despite the homozygous deletion of p16 locus (ISBN 9780124375512 Chapter 17 Table 2) by upregulation of cell cycle inhibitor protein p21. In this study, we observe an increase in the senescence markers in C2C12 cells after treatment with Doxo (Fig. 1). We also observed an increase in the markers of DNA damage-mediated senescence in MCF7 after treatment with Doxo (Data will be included in the revised manuscript). Based on these observations, we have concluded that C2C12 cells undergo senescence despite lacking the p16/p19 locus.

      In the study by Moustogiannis et al. (PMID: 33918414), they claimed to have aged C2C12 cells through multiple population doublings. However, the SA-β-gal staining in their data, which is often used to confirm senescence, showed almost fully confluent "aged" C2C12 cells. This confluent state could artificially increase SA-β-gal positivity, suggesting that these cells may not truly represent senescence. Moreover, the "aged" C2C12 cells exhibited normal proliferation, which contradicts the definition of senescence. Similar findings were reported in another study of C2C12 cells subjected to 58 population doublings (PMID: 21826704), where even at this late stage, the cells were still dividing every 2 or 3 days, similar to younger cells at early passages. More importantly, I do not know how the p16 was detected in that paper since the locus was already mutated. In terms of p21, there was no difference in the proliferative C2C12 cells at day 0.

      In the study by Moiseeva et al. in 2023 (PMID: 36544018), C2C12 cells were used for senescence modeling for siRNA transfection. However, the most significant findings were obtained using primary satellite cells or confirmed with complementary data.

      In conclusion, while molecular changes observed in studies using C2C12 cells may be valid, the use of primary myoblasts is highly recommended for senescence studies due to the limitations and questionable senescence characteristics of the C2C12 cell line.

      (3) Regarding the source of increased PGD in the conditioned medium, I want to emphasize that it's unclear whether the PGD or its metabolites increase in response to DNA damage or the senescence state. Thus, using a different senescent model to exclude the possibility of DNA damage-induced increase will be crucial.

      Though senescence can be induced by several stress stimuli such as DNA damage, oncogene expression, ROS, and mitochondrial dysfunction, DNA damage remains critical for the induction of the SASP (reviewed in PMID: 20078217). Also, other models of senescence, like Oncogene Induced Senescence (reviewed in PMID: 17671427), ROS Induced Senescence (PMID: 24934860), and Mitochondrial Dysfunction Associated Senescence (MiDAS) (PMID: 26686024), have shown upregulation of DNA damage-associated signaling pathways. In this study, we have explored the SASP of cells undergoing senescence upon DNA damage mediated by the chemotherapy drug Doxorubicin.

      (4) Similarly for the in vivo Doxorubicin (Doxo) injection, both reviewers have raised concerns about the potential side effects of Doxo, including inflammation, DNA damage, and ROS generation. These effects could potentially confound the results of the study. The physiological significance of this study will heavily rely on the in vivo data. However, the in vivo senescence component is confounded by the side effects of Doxo.

      We concur that this is a limitation of this study and the subsequent work will demonstrate the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (5) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of conditioned medium. The author took it for granted that the conditioned medium from senescent cells would inhibit myogenesis, relying on previous publications (PMID: 37468473). However, that study was conducted in the context of myotonic dystrophy type 1. To support the inhibitory effect in the current experimental settings, direct evidence is required. It would be necessary to include another control with conditioned medium from normal, proliferative C2C12 cells.

      Conditioned medium of senescent cells of several types, like senescent myoblasts in case of DM1 (PMID: 37468473), adipocytes undergoing senescence due to H2O2 treatment, Insulin Resistance, and Replicative senescence (PMID: 37321332), has been shown to inhibit the differentiation of myoblasts. Therefore, in this study, we measured the effect of prostaglandin PGD2 and its metabolites on the differentiation of myoblasts by inhibiting the biosynthesis of PGD2 in senescent myoblasts by treatment with AT-56. We inhibited the synthesis of PGD2 in senescent cells by treatment with AT-56, and then collected the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as a control for the experiment.

      (6) Statistical analyses problems.

      Only the t-test was used throughout the study, even when there are more than two groups. Please have a statistician evaluate the replicates and statistical analyses used.

      In experiments with more than two groups, the t-test was used for column-wise comparison of the experiment samples to the control sample. Multiple sample comparisons using one-way or two-way ANOVA were avoided as experimental samples were individually compared to the control sample.
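
      For readers weighing this exchange, the following is a minimal sketch of what is at stake when several treatment-versus-control t-tests are run side by side. It is not the authors' analysis: the p-values are hypothetical, the independence assumption in `familywise_error_rate` is a simplification, and Holm's step-down procedure is used only as one common example of a multiplicity adjustment.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply by the number of hypotheses not yet handled, cap at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical unadjusted p-values from three treatment-vs-control t-tests.
example_p = [0.04, 0.01, 0.03]
example_adjusted = holm_adjust(example_p)
three_test_fwer = familywise_error_rate(0.05, 3)
```

      With three comparisons at a nominal 0.05 level, the chance of at least one false positive under independence is roughly 14%, which is why each comparison that looks significant on its own may not remain so after adjustment.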

      For the 15d-PGJ2/cell concentration measurements in Figure 1F, there were only two replicates, which were provided in the supplementary table upon request. Was that experiment repeated with more biological replicates?

      Additional replicates of the experiment will be included in the revised manuscript.

      For figure 1C, Fig 1F, 1G, 1J, 2C, 2E, 3A, 3E, 3F, 4D, 4E, please include each data point in bar graphs as used in Fig 1D, or at least provide how many biological replicates were used for each experiment.

      Appropriate revisions will be made in the figure legends of the revised manuscript.

      There is no error bar in a lot of control groups (Fig 2C, 2E, 3EF, 4E, S4B).

      There are no error bars for the control groups in figures 2C, 2E, 3E, 3F, 4E, and S4B because the experimental samples of each replicate were normalized to the corresponding control sample, which sets the value of the control sample in each replicate to 1.

      For qPCR data in Figure 1C, the authors responded that the data were plotted using 2^-ΔCT instead of 2^-ΔΔCT to show the variability in the expression of mRNAs isolated from animals treated with Saline. This statement does not align with the method section. Please revise.

      Appropriate revisions will be made to the method sections of the revised manuscript.
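
      The two qPCR quantities under discussion differ only in whether a calibrator is subtracted. The sketch below uses hypothetical Ct values, and the choice of the mean Saline ΔCT as calibrator is an assumption made for illustration; the manuscript's own calibrator is not stated here. Function names are ours.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Delta Ct: target-gene Ct minus reference-gene Ct for one sample."""
    return ct_target - ct_reference


def expression_2_neg_dct(ct_target, ct_reference):
    """2^-dCt: expression relative to the reference gene only."""
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


def fold_change_2_neg_ddct(sample_dct, calibrator_dct):
    """2^-ddCt: expression relative to a calibrator (e.g. control) dCt."""
    return 2.0 ** (-(sample_dct - calibrator_dct))


# Hypothetical (target Ct, reference Ct) pairs, one pair per animal.
saline = [(26.0, 18.0), (27.0, 18.0), (28.0, 18.0)]
doxo = [(24.0, 18.0), (25.0, 18.0)]

saline_dct = [delta_ct(t, r) for t, r in saline]
doxo_dct = [delta_ct(t, r) for t, r in doxo]

# Assumed calibrator: mean dCt of the Saline group.
calibrator = mean(saline_dct)

saline_rel = [expression_2_neg_dct(t, r) for t, r in saline]
doxo_fold = [fold_change_2_neg_ddct(d, calibrator) for d in doxo_dct]
saline_fold = [fold_change_2_neg_ddct(d, calibrator) for d in saline_dct]
```

      Whichever quantity is plotted, the methods section should name it and state the calibrator, since the axis values differ by a constant factor set by that choice.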

      (7) For Figure 1, the title may not be appropriate as there is insufficient data to support the inhibition of myoblast differentiation.

      Appropriate revisions will be made to the revised manuscript.

      Recommendations for the authors:

      After careful review, the editors advise you to carefully address the following concerns.

      (1) There were concerns that in the revised manuscript, the DMSO and Doxo experiments depicted in Figure 1H appeared quite homogenous despite the author's description to the contrary. This leads to concerns about the type of statistics employed and the possible low number of replicates of experiments shown in Fig. 1.

      (2) Experiments in Figure 1F, 1I, and 1J had as few as n=2 experiments. Figures 1C, 1D, 1F, 1G, and 1J, the statistics used a two-tailed student's t-test; for all other experiments, they marked N/A for statistics. Using a t-test for multi-group comparisons (as indicated in the figure legend) and relying on only 2 replicates for many experiments are not appropriate.

      Additional replicates for the experiments shown in figures 1F, 1I, and 1J have been done and the data will be revised along with updated statistical tests during the revision of the manuscript.

      (3) In several experiments, the difference between technical replicates is too high.

      Reviewer #1 (Recommendations For The Authors):

      Most of my concerns were addressed in the revised manuscript.

      We thank the reviewer for their time in reviewing the manuscript and for their consideration of the authors’ response to their comments during the previous round of review.

      Reviewer #2 (Recommendations For The Authors):

      Validating the findings in a primary myoblast is highly recommended for senescence studies due to the limitations and questionable senescence characteristics of the C2C12 cell line.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      Validate the finding in a different senescent model to exclude the possibility of DNA damage-response.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      For Fig 2A, add another control with a conditioned medium from normal, proliferative C2C12 cells.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      Please have a statistician evaluate the replicates and statistical analyses used.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      For the bar plots (Fig. 1C, 1F, 1G, 1J, 2C, 2E, 3A, 3E, 3F, 4D, 4E), please include each data point, or at least state how many biological replicates were used for each experiment.

      Appropriate revisions will be made in the figure legends of the revised manuscript.

      For Figure 1, the title may not be appropriate as there is insufficient data to support the inhibition of myoblast differentiation.

      Appropriate revisions will be made to the revised manuscript.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides useful information about the lipid metabolite 15d-PGJ2 as a potential regulator of myoblast senescence. The authors provide experimental evidence that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas. However, the manuscript is incomplete in its current form, as it lacks robust support from the data regarding the main conclusions related to senescence and technical concerns related to the senescence models used in this study.

      We are grateful to the editors and the reviewers for their time and comments in sharpening the science and the writing of the manuscript. We have attached a detailed response to emphasize that the manuscript does include robust evidence regarding the claims, which could have been missed during the review process. We have provided a better context for these points now.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show these genes get upregulated in C2C12 myoblasts when treated with conditioned media or 15d-PGJ2. 15d-PGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modified Cys184 of HRas, which is required for its activation as indicated by the FRET analysis with RAF RBD. They also showed that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through the electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRas (only WT, not the C181 or C184 mutant). They also showed that expressing the WT HRas followed by 15d-PGJ2 treatment led to a decrease in the levels of MHC mRNA and protein, and this defect is dependent on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, some clarification and experiments will help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers for inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (inflammatory cells vs senescent cells). The expression of beta Gal and other markers of senescence in the tissue sections will help to delineate these.

      As pointed out, Doxo induces systemic inflammation along with DNA damage-mediated senescence. Therefore, along with the inflammatory markers of the SASP (CXCL1/2, TNF1α, IL6, PTGS1/2, PTGDS), we also observed an increase in the mRNA levels of canonical markers of DNA damage-mediated senescence, namely the cell cycle and senescence-associated proteins p16 and p21 (Fig. 1C). We also observed an increased nuclear accumulation of p21 (Fig. 1A) and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B).

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We measured the viability of C2C12 cells after 24 hours of treatment with 15d-PGJ2 using the MTT assay and observed that the viability of cells was decreased after treatment with 15d-PGJ2 (10 µM) but not with 15d-PGJ2 (1 µM, 2 µM, 4 µM, or 5 µM) (see Fig. S2A of the updated manuscript). The results and figures of the manuscript have been updated accordingly.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The treatment with Doxorubicin is irreversible as the senescent phenotype was not reversed after withdrawal of Doxorubicin, even after 20 days.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 is an acute experiment for only 1 hour, at which time no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt proteins after 1 hour of treatment with 15d-PGJ2 (10 µM) during which we did not observe any cell death. 

      Even at such a high concentration of 15d-PGJ2, the increase in ERK phosphorylation is minimal.

      We observe a ~30% increase in the phosphorylation of Erk proteins after treatment with 15d-PGJ2 in 0.2% serum medium compared to treatment with vehicle (DMSO). This is reproducible and significant.

      The experiment in Figure 4C shows that the C181 and C184 mutants of HRas show higher levels in the Golgi compared with WT. However, this could very well be due to the defect in palmitoylation rather than the modification with 15d-PGJ2.

      Our data does not suggest higher levels of C184S mutant in the Golgi compared with WT (Fig. S4A). We observed that the ratio of HRas levels in the Golgi to the HRas levels in the plasma membrane were similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To have a meaningful conclusion, the authors need to compare the palmitoylation and modification with 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. The inhibition of palmitoylation of C181, either by mutation (C181S) or by treatment with the protein palmitoyl transferase inhibitor 2-Bromopalmitate, results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could inhibit the palmitoylation of HRas at C181. However, our data do not support this hypothesis, as modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, unlike the inhibition of cysteine palmitoylation by the C181S mutation.

      To test if the inhibition of myoblast differentiation depends on HRas, they overexpressed the HRas and mutants in the C2C12 lines. However, this experiment does not take the endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment to test this would be to knock down or knock out HRas (or make knock-in mutations of C184) and show that the effect of 15d-PGJ2 disappears.

      Endogenous HRas (wild type) is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs. Therefore, we only observe a partial rescue in the differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, since HRas is expressed under the high-expression CMV promoter and in the absence of other regulatory elements, the overexpressed constructs do show a dominant effect over the endogenous HRas, showing cysteine mutant-dependent inhibition of differentiation of myoblasts after treatment with 15d-PGJ2 (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing HRas constructs after treatment with 15d-PGJ2 were normalized to the mRNA levels in C2C12 cells expressing the corresponding constructs and treated with vehicle (DMSO). mRNA levels in C2C12 cells treated with vehicle were not shown as they were normalized to 1. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were normalized to those in C2C12 cells treated with vehicle (DMSO). Since the hypothesis concerns the effect of HRas cysteine mutations on the differentiation of myoblasts after treatment with 15d-PGJ2, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on the differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to the general toxicity of 15d-PGJ2 in cells. We show that the inhibition of differentiation of myoblasts after 15d-PGJ2 treatment depends on modification of HRas at C184, i.e., failure of 15d-PGJ2 to modify HRas at C184 (Fig. 3A) and to activate it (Fig. 3B) rescues this inhibition of differentiation of C2C12 cells (Fig. 4D and E). This separates the inhibition of differentiation of myoblasts by 15d-PGJ2 from the general toxic effects of 15d-PGJ2 on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On the one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024). On the other hand, 15d-PGJ2 also exerts pro-carcinogenic effects by inducing epithelial to mesenchymal transition in MCF7 breast cancer cells and inhibiting the tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      The novelty of the study is compromised as the activation of PGD and 15d-PGJ2, as well as the regulation of HRas and cell proliferation, have been previously reported. 

      Literature does not support this statement, and it is important to clarify this misimpression for the field as a whole. 

      Let us clarify:

      Covalent modification of HRas by 15d-PGJ2 has been reported only twice in the literature (Luis Oliva et al., 2003; Yamamoto et al., 2011), in fibroblasts and neurons, respectively.

      Interaction between HRas and 15d-PGJ2 in skeletal muscles has not been shown before, even though both HRas and 15d-PGJ2 are shown to be key regulators of muscle homeostasis.

      Activation of HRas by 15d-PGJ2 was first reported by Luis Oliva et al. (2003). However, that study does not comment on the functional implications of activation of HRas signaling.

      Recently, our lab contributed to a study where the functional implication of activation of HRas signaling due to covalent modification by 15d-PGJ2 was shown in the maintenance of the senescence phenotype (Wiley et al., 2021).

      15d-PGJ2 was shown to inhibit the differentiation of myoblasts by Hunter et al. (2001). Although that study hypothesized that the inhibition of myoblast differentiation occurs via 15d-PGJ2-mediated activation of PPARγ signaling, it also showed inhibition of myoblast differentiation independent of PPARγ activity, suggesting the presence of other mechanisms.

      This is the first study to show a molecular mechanism where activation of HRas signaling in skeletal myoblasts due to covalent modification by 15d-PGJ2 at C184 of HRas inhibits the differentiation of skeletal myoblasts.

      Additionally, there are major technical concerns related to the senescence models, limiting data interpretation regarding the relevance to senescent cells.

      Major concerns:

      (1) The C2C12 cell line is not an ideal model for senescence study due to its immortalized nature and lack of normal p16 expression. A more suitable myoblasts model is recommended, with a more comprehensive characterization of senescence features.

      C2C12 is a good model for DNA damage-based senescence that is used in this manuscript. Several reports in the literature have shown the induction of senescence in C2C12 cells. Moiseeva et al 2023 show induction of senescence in C2C12 cells after etoposide-mediated DNA damage. Moustogiannis et al 2021 show the induction of replicative senescence in C2C12 cells. In this study, we show that C2C12 cells undergo DNA damage-mediated senescence after treatment with Doxo. We measured the induction of senescence in C2C12 cells upon DNA damage using several physiological (Nuclear Size, Cell Size, and SA β-gal) and molecular markers (mRNA levels of p21 and SASP factors (IL6 and TGFβ), protein levels of p21) of senescence (see Fig. 1 of the updated manuscript). The results and the figures in the manuscript have been updated accordingly.

      (2) The source of increased PGD or its metabolites in the conditioned medium is unclear. Including other senescence models, such as replicative or oncogene-induced senescence, would strengthen the study.

      Fig. 1E shows a time-dependent increase in the expression of PGD2 biosynthetic enzymes in senescent C2C12 cells. Fig. 1F shows an increase in the levels of 15d-PGJ2 secreted by senescent C2C12 cells in the conditioned medium. These data show that senescent C2C12 cells are the source of PGD and its metabolites in the conditioned medium.

      Again, C2C12 is not suitable for replicative senescence due to its immortalized status.

      We and others have shown that C2C12 cells undergo senescence, and this manuscript only used DNA damage-induced senescence.

      (3) In the in vivo part, it is unclear whether the increased expression of PTGS1, PTGS2, and PTGDS is due to senescence or other side effects of DOXO.

      We concur that this is a limitation of this study and the subsequent work will demonstrate the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (4) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of a conditioned medium.

      Figure 2A tests the effect of prostaglandin PGD2 and its metabolites secreted by the senescent cells on the differentiation of myoblasts. Therefore, we inhibited the synthesis of PGD2 in senescent cells by treatment with AT-56, and then collected the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as a control for the experiment, whereas differentiation of C2C12 cells without any treatment serves as a positive control.

      There is no explanation of how differentiation was quantified or how the fusion index was calculated.

      The fusion index was calculated using a published myotube analyzer software (Noë et al., 2022). Appropriate information has been added to the materials and methods section of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 3: Expand SA in "SA β-gal".

      The manuscript has been updated accordingly (See line 3).

      Line 68: HRas is highly regulated by lipid modifications.

      The manuscript has been updated accordingly (See line 67).

      Figures

      Figure S1A seemed incomplete (maybe some processing issue).

      The Figure has been updated in the revised manuscript (See Fig. S1A).

      Figure S1B-H are mislabeled.

      The figure has been updated in the revised manuscript (See Fig. S1C, D, E, and F).

      Figures S1E-H are not mentioned in the manuscript.

      The manuscript has been updated accordingly (See line 120).

      Many supplementary figures are not cited in the article.

      The manuscript has been updated accordingly. (See lines 85, 120, 123, 166, 225, 356, 364, 412, and 413)

      Reviewer #2 (Recommendations For The Authors):

      (1) Clarify the injection method for Doxorubicin in B6J mice on line 83 (IP or IM).

      Mice were injected intraperitoneally with Doxorubicin (as mentioned in the materials and methods, see lines 83 and 794)

      (2) Address missing information in figures or figure legends.

      There is a missing piece in Sup Fig 1A.

      The figure has been updated in the revised manuscript (See Fig. S1A).

      Correct labels in Sup Fig 1C and 1D.

      The figure has been updated in the revised manuscript (See Fig. S1C, D, E, and F).

      How would the authors explain the dramatic differences in the morphology of C2C12 cells treated with DOXO between bright field and SA-beta-gal staining images in Sup Fig 1B and 1C?

      The SA β-gal image after treatment with Doxo does show a flattened cell morphology. Another field of view from the same experiment has been added in the figure to show the difference in the cell morphology more prominently in the revised manuscript (See Fig. 1H).

      Provide explanations for Sup Fig 1E-1G, including the meaning of the y-axis and the blue dots and red lines.

      We have provided an explanation of the multiple reaction monitoring mass spectrometry used to measure the concentration of 15d-PGJ2 in the conditioned medium in the revised manuscript (see lines 119-130 and the legends of Fig. S1C, D, and E).

      (3) Please review the calculation of qPCR data in Figure 1C for correctness, ensuring reference samples with an average expression level of 1.

      The data in Fig. 1C were plotted using 2^(-ΔCT) instead of 2^(-ΔΔCT) to show the variability in the expression of mRNAs isolated from animals treated with Saline.
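
      As a minimal sketch of how the two quantities named in this exchange relate, the snippet below computes both for a control group. The Ct values, the function names, and the choice of the control-group mean ΔCT as calibrator are illustrative assumptions, not data or code from the manuscript.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """dCt: target-gene Ct minus reference-gene Ct for one sample."""
    return ct_target - ct_reference


def expr_2_dct(dct):
    """2^(-dCt): expression relative to the reference gene only."""
    return 2.0 ** (-dct)


def expr_2_ddct(dct, control_dcts):
    """2^(-ddCt): expression relative to the mean dCt of the control group."""
    return 2.0 ** (-(dct - mean(control_dcts)))


# Hypothetical Ct values for three saline-treated animals (assumed numbers).
saline_dct = [delta_ct(25.0, 18.0), delta_ct(26.0, 18.0), delta_ct(24.0, 18.0)]

per_animal = [expr_2_dct(d) for d in saline_dct]            # 2^(-dCt) values
fold_change = [expr_2_ddct(d, saline_dct) for d in saline_dct]  # 2^(-ddCt) values
```

      Under these assumptions the two scales differ only by the constant factor 2^(mean control ΔCT), so the control animals are centered on 1 (as a geometric mean) in the 2^(-ΔΔCT) form, while their relative spread is the same in both.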

      (4) Please explain the calculation of 15d-PGJ2/cell concentration in Figure 1F and provide raw data for review, considering the substantial changes and small error bars. The method or result section lacks an explanation of how this calculation was performed. Additionally, there is no mention of the cell number count.

      All the raw values (concentration of 15d-PGJ2 measured using mass spectrometry and cell numbers counted at the time of collection of the conditioned medium) are provided in Supplementary Table 1. The standard curve used to calculate the concentration of 15d-PGJ2 in the conditioned medium is shown in Fig. S1F. The cell number was counted after trypsinization using a hemocytometer on the day of collection of the conditioned medium.
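
      The response does not spell out the arithmetic, so the following is only one plausible sketch of a per-cell calculation. It assumes a linear standard curve (signal versus known concentration), a known volume of conditioned medium, and division by the counted cell number; all names, units, and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def amount_per_cell(signal, slope, intercept, volume_ml, n_cells):
    """Invert the standard curve, then normalize the total amount by cell number."""
    concentration = (signal - intercept) / slope  # e.g. ng/mL (assumed unit)
    return concentration * volume_ml / n_cells    # e.g. ng per cell


# Hypothetical standards: known concentrations and their measured signals.
std_conc = [0.0, 1.0, 2.0, 4.0]
std_signal = [0.0, 100.0, 200.0, 400.0]
slope, intercept = fit_line(std_conc, std_signal)

# Hypothetical sample: signal 250 in 2 mL of medium from 100,000 cells.
per_cell = amount_per_cell(250.0, slope, intercept, volume_ml=2.0, n_cells=100_000)
```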

      (5) Please clarify how cell number normalization and doubling time calculation were done in Fig 2B. Consider replacing the figure with a growth curve showing confluence on the y-axis for easier interpretation.

      Cells were counted every 24 hours and normalized to the number of cells counted on day 0 of the treatment (to account for attachment efficiency and other cell culture parameters). Doubling time was calculated as the reciprocal of the slope of the graph of log2(normalized cell number) vs time.
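
      The calculation described above can be sketched as follows. The least-squares fit and the example counts are assumptions for illustration; the response states only that the doubling time is the reciprocal of the slope of log2(normalized cell number) against time.

```python
import math


def doubling_time(times_h, cell_counts):
    """Doubling time (hours) as 1 / slope of log2(N_t / N_0) versus time."""
    n0 = cell_counts[0]  # day-0 count used for normalization
    y = [math.log2(c / n0) for c in cell_counts]
    t_mean = sum(times_h) / len(times_h)
    y_mean = sum(y) / len(y)
    # Least-squares slope, in doublings per hour.
    slope = (
        sum((t - t_mean) * (v - y_mean) for t, v in zip(times_h, y))
        / sum((t - t_mean) ** 2 for t in times_h)
    )
    return 1.0 / slope


# Hypothetical counts taken every 24 hours; the population doubles each day.
times_h = [0, 24, 48, 72]
counts = [10_000, 20_000, 40_000, 80_000]
td = doubling_time(times_h, counts)
```

      Because the slope has units of doublings per hour, its reciprocal is hours per doubling. A slope near zero (no growth) makes the estimate diverge, so treated populations that stop dividing are better reported as "no measurable doubling" than as a very large number.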

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer 1

      The paper is overall convincing. However, a little more attention to data presentation and possibly the addition of at least another technique (see below) would greatly strengthen the findings.

      As we hope to demonstrate below, we have taken steps to improve our manuscript on both fronts (data presentation and experimental evidence).

      The absence of statistics immediately catches the eye. I am sure that the shown differences are statistically significant (thanks to the number of analyzed cells), but reporting the result of some statistical test would help the reader identify the relevant data in a plot. This is somewhat necessary considering that sometimes in the text something is deemed to be "significant" or "not significant", and I felt that I really needed that when looking at the plot in Fig. 3D.

      To facilitate the interpretation of figures that contain data from multiple strains (such as the one mentioned by the reviewer), we have carried out a nonparametric single-step multiple comparison test (Games-Howell) to identify mutants whose means differ significantly from each other. To avoid overcrowding the figures, we have graphically summarized the p-values of all pairwise comparisons in a small matrix within the corresponding panel, and provided 99% confidence intervals and p-values of all differences in the Supplement.
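
      For readers unfamiliar with the test named above, here is a minimal sketch of the Games-Howell statistic for a single pair of groups. It computes the studentized-range statistic q and the Welch-Satterthwaite degrees of freedom; the p-value, which requires the studentized range distribution (available, for example, as `scipy.stats.studentized_range` in recent SciPy releases), is deliberately left out to keep the block dependency-free. The data and function name are hypothetical.

```python
import math
from statistics import mean, variance


def games_howell_pair(a, b):
    """Return (q, df) for one pairwise Games-Howell comparison.

    q  : |mean difference| / sqrt((s_a^2/n_a + s_b^2/n_b) / 2)
    df : Welch-Satterthwaite degrees of freedom
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = abs(mean(a) - mean(b)) / math.sqrt(va + vb)
    q = t * math.sqrt(2.0)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return q, df


# Hypothetical measurements for two strains.
strain_a = [1.0, 2.0, 3.0, 4.0, 5.0]
strain_b = [2.0, 4.0, 6.0, 8.0, 10.0]
q, df = games_howell_pair(strain_a, strain_b)
```

      The p-value for the pair would then be the upper tail of the studentized range distribution at q, with the number of groups in the whole comparison and df as parameters.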

      Related to the previous point: for every N/C distribution analysis, a number of analyzed cells is reported. From the way it is written, it seems that the replication relies solely on the cells in that specific population, i.e., each cell is treated as a replicate. At least I could not find anything to the contrary in the legends or in the methods. I wonder what the results would be (and their significance) if each replicate were a new assay on another population.

      Cell populations exhibit significant variability in their phenotypic characteristics. Consequently, the quantification of a specific feature (e.g., the Sfp1 nuclear/cytoplasmic ratio) across a sample of cells from a given population results in a distribution rather than a single fixed value. For each quantification, we report the number of cells that were used to construct the corresponding distribution, i.e. the sample size. To compare samples from different populations (e.g., different Sfp1 mutant strains), we run them in parallel during microscopy experiments and compare their means, as described above. Throughout our study, we have tried to ensure that we quantify a sufficiently large number of cells to overcome cell-to-cell variability and enhance the reliability of our results.

      In this context, the question of the reviewer is not entirely clear to us, as individual measurements of a sample are not replicates. However, one can replicate the entire experiment on a different day by re-growing the different strains, running microscopy, quantifying the new movies etc. In this sense, the experiments shown in the manuscript consist of single replicates, i.e. experiments that were carried out on the same day, with all the relevant mutants and controls quantified together. However, we have monitored many of our mutants multiple times over the course of our work. For example, Fig. 1 below shows replicates of the Sfp1 N/C ratio distributions at steady-state in the analog-sensitive (A) and wild-type (B) background, which were quantified several times across various experiments. Day-to-day variability in the empirical distributions of the same mutant exists, but it is quite small.

      The scale of x axes in N/C ratio plots. Besides not being consistent throughout the figures, it originates from 1, visually enhancing the differences.

      We believe the reviewer was referring to the y-axes, as the x-axes represent time. Summarizing the N/C ratio dynamics of different Sfp1 mutants has been challenging. First, the average N/C ratios at steady-state vary considerably across different mutants, as shown in the panels that summarize steady-state N/C ratios. To compare the magnitude and features of their responses, normalization is necessary. We chose to normalize the time series of each mutant to have a mean of 1 prior to the onset of a perturbation. This allows the normalized time series to represent the percentage-wise changes in the Sfp1 N/C ratio upon perturbation.

      Using a common y-axis scale for all plots of N/C ratio dynamics is not ideal, as some responses are subtler than others. Additionally, we do not believe that N/C dynamics across different figures need to (or should) be compared to each other. However, within a figure, panels that require comparison are placed in the same row and share the same y-axis scale. We believe that this approach optimizes data visualization and facilitates important visual comparisons.
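
      The normalization described in this response (scaling each time series to a mean of 1 before the perturbation) can be sketched as below. The time points, N/C values, and function name are hypothetical, and treating every sample with a time strictly before the perturbation as baseline is an assumption.

```python
def normalize_to_baseline(times, nc_ratio, t_perturb):
    """Scale a time series so its mean before t_perturb equals 1."""
    baseline = [v for t, v in zip(times, nc_ratio) if t < t_perturb]
    if not baseline:
        raise ValueError("no samples before the perturbation")
    baseline_mean = sum(baseline) / len(baseline)
    return [v / baseline_mean for v in nc_ratio]


# Hypothetical population-averaged N/C ratios; perturbation applied at t = 0.
times_min = [-2, -1, 0, 1, 2]
nc = [2.0, 2.0, 2.0, 1.5, 1.0]
normalized = normalize_to_baseline(times_min, nc, t_perturb=0)
```

      After this scaling, a value of 0.75 reads directly as a 25% drop relative to the pre-perturbation level, which is what allows mutants with different steady-state ratios to be compared on one axis.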

      Related to the previous point: it is evident from the plots that the N/C ratio is always positive, even in the most deficient of the analyzed mutants. This implies that a relevant fraction of Sfp1 is still nuclear. I thus wonder what the impact of these mutations would be on the actual function of Sfp1. For this reason, I feel that qPCR evaluation of transcripts of Sfp1 target genes is particularly needed. Since lack of Sfp1 is known to yield some of the smallest cells possible, it would also be cool to have an estimate of the size of mutants where Sfp1 is less nuclear. These analyses could confer phenotypical relevance to the data, but would also help in assessing a currently unexplored possibility, that phosphorylation events by PKA influence Sfp1 function besides its localization, i.e.: the still somehow nuclear fraction is not as functional as wt Sfp1 in promoting transcription.

      It is indeed the case that the recorded N/C ratios are larger than 1 in all strains that we have monitored. We have never observed an N/C ratio smaller than 1 using widefield microscopy for two main reasons: first, out-of-focus light from the cytosol above and below the nucleus is added to the nuclear signal, causing the nuclear signal to always be non-zero, even for predominantly cytosolic proteins. Second, both in- and out of focus vacuoles are devoid of the fluorescent protein fusions that we quantify, which reduces the average brightness of the cytosol. For these reasons, even when a protein is largely cytosolic, the average N/C ratio over a cell population is no lower than around 1.5. Keeping these points in mind, one can observe that our most delocalized Sfp1 mutants have an N/C ratio that is around 1.6-1.7, which is very close to the lower limit. This means that these Sfp1 mutants are largely cytosolic, and the nuclear fraction (if non-zero) is quite small.

      We agree that assessing the phenotypic relevance of Sfp1 mutations is of interest. However, this was impossible with our original strains, as we introduced each Sfp1 mutant as an extra copy in the HO locus while leaving the endogenous Sfp1 locus intact. This was done in order to avoid any phenotypic changes that might result from changes in Sfp1 activity.

      To address the suggestion of the reviewer, we therefore deleted the endogenous Sfp1 copy in strains carrying sfp1PKA2A, sfp1PKA2D and sfp113A, leaving only the mutated Sfp1 copy at the HO locus. Surprisingly, the growth rate and drug sensitivity (determined by halo assays) of these single-copy mutants did not differ much from those of the mutants carrying the functional Sfp1 copy or of the wild type (Supp. Figs. 4J and 7). This observation aligns with findings for the single-copy sfp1-1 mutant in [Lempiäinen et al. 2009], which corresponds to sfp1TOR7A in our work. [Lempiäinen et al. 2009] had suggested that Sch9 compensates for the loss of Sfp1 activity via a feedback mechanism, which could explain our results as well. If this is the case, acute depletion of wild-type Sfp1 could unveil transient changes in cell growth before the compensatory effect of Sch9 is established. Unfortunately, we were unable to efficiently degrade wild-type Sfp1 carrying a C-terminal auxin-inducible degron. Instead, we followed the same approach as [Lempiäinen et al. 2009] and deleted SCH9.

      As we describe in the last section of Results, the difference was dramatic for sfp113A mutants, which were extremely slow-growing in the absence of Sch9 (doubling time was around 4 hours, but it was hard to estimate because we could not grow the cells consistently). Interestingly, SCH9 deletion had a negative impact on sfp1PKA2D but not sfp1PKA2A cells (Supp. Fig. 7). Overall, these results demonstrate that Sch9 can compensate for loss of Sfp1 activity, which makes it challenging to study the impact of Sfp1 mutations on cellular phenotypes.

      To further understand to what extent Sch9 compensates for loss of Sfp1 phosphorylation, we carried out RNA-seq on WT and cells carrying a single copy of sfp113A (with the endogenous SFP1 copy removed). Despite the fact that sfp113A cells grow as well as WT, RNA-seq picked up several differentially expressed genes related to amino acid biosynthesis. This surprising finding is presented in the last section of Results, and in Supplementary Figures 8, 9 and 10. We explore the relevance of these results and their connection with past literature on Sfp1 and Sch9 in the Discussion section.

      I found some typos here and there, and it would greatly help to report them if in the manuscript line numbers were included.

      We apologize for the typos. We have tried to eliminate them, and we have also added line numbers to the manuscript.

      Reviewer 2

      There is no biochemical evidence presented that the putative PKA sites (S105 and S136) are genuinely phosphorylated by PKA. The fact that they match the PKA consensus motif, alone, does not guarantee this. In order to claim that they are looking at the effect of PKA by mutagenizing these residues, the authors have to demonstrate the PKA-dependency of S105 and S136 phosphorylation by, for example, mass spec experiments or western blotting with phospho-specific antibodies (Cell Signaling Technology #9624 for example). Also, is the band shift caused by PKA inhibition (Fig 3C) canceled by the S105A/S136A mutation?

      We took several actions to demonstrate that the putative PKA sites are indeed phosphorylated by PKA. We first tried to detect Sfp1 phosphorylation using the antibody mentioned by the reviewer, but failed as the sensitivity of this antibody appears to be quite low. On the other hand, mass spectrometry did not produce the right fragments to detect the sites of interest. We therefore resorted to an in vitro kinase assay using [γ-32P]ATP together with purified PKA and Sfp1. Unfortunately, bacterial overexpression of MBP-tagged Tpk1, Tpk2 and Tpk3 (the catalytic subunits of PKA) was quite challenging and we were unable to produce soluble protein. We therefore resorted to commercially available bovine PKA (bPKA, PKA catalytic subunit, Sigma-Aldrich 539576), which shows high homology to the yeast Tpk kinases [Toda et al. 1987]. Moreover, 87% of bPKA substrates have been shown to also be Tpk1 substrates [Ptacek et al. 2005], and bPKA has been used to identify new Tpk substrates in budding yeast [Budovskaya et al. 2005]. As we show in the revised manuscript, bovine PKA does phosphorylate Sfp1. Moreover, phosphorylation is reduced by 50% in the double S105A, S136A mutant (Fig. 1F), and becomes undetectable in the 13A mutant (Supp Fig. 6). Together with the rapid response of Sfp1 localization to acute PKA inhibition which we had already reported, we believe that these results provide strong evidence that Sfp1 is a direct PKA substrate, and that the two phosphosites that we identified are functional.

      As the above in vivo experiments do not exclude S105/S136 phosphorylation by other kinases downstream of PKA, in order to claim the direct phosphorylation, the authors need in vitro PKA kinase assay. These biochemical experiments are not trivial, but I think absolutely necessary for this story.

      One cannot exclude that S105/S136 are also phosphorylated by other kinases of the AGC family (note that [Lempiäinen et al. 2009] has already excluded Sch9). However, as we hope to have shown, PKA indeed phosphorylates Sfp1. Examining if other kinases besides PKA and TORC1 target Sfp1 is a very interesting question that should be addressed in future work.

      The authors only look at the localization of Sfp1. To assess its functionality and so physiological impact, it would be informative to measure the mRNA level of target ribosomal genes in various Sfp1 mutants they created.

      As we described in our response to Reviewer 1 above, we did perform RNA-seq on WT and cells carrying a single copy of sfp113A. We observed a notable absence of differentially expressed ribosomal genes and ribosome-related categories in the GO analysis (Supp. Figs. 8, 9 and 10). Together with our observations on SCH9 deletion (Supp. Fig. 7), these results suggest that Sch9 can largely compensate for the loss of Sfp1 activity. On the other hand, the emergence of differentially expressed amino acid biosynthesis genes is a finding that merits further investigation, as it connects with previous observations made with Sch9 deletion mutants and the [ISP+] prion form of Sfp1 (cf. Discussion).

      In the experiments using analog-sensitive PKA (Fig 1D and E for example), they directly compare wildtype-PKA versus analog sensitive-PKA, or with 1-NM-PP1 versus without 1-NM-PP1. This makes interpretation difficult, particularly because 1-NM-PP1 itself has a significant impact even in the wild PKA strain. The real question is the difference between wild-type Sfp1 versus mutant Sfp1. In the current form, they compare Fig 1D versus 1E, these two do not look like a single, side-by-side experiment. They should compare wild-type Sfp1 versus mutant Sfp1 side-by-side.

      Figure 1D shows that 1-NM-PP1 has a transient off-target effect on Sfp1 localization in WT cells, which could also affect Sfp1 mutants. This observation prompted us to use wild-type PKA as a control when testing the effect of 1-NM-PP1 on sfp1PKA2D in cells carrying PKAas (Figure 1E). As Fig. 1E shows, the effect of 1-NM-PP1 on sfp1PKA2D localization in PKAas cells is quite similar to the off-target effect in cells carrying sfp1PKA2D and wild-type PKA. This behavior of sfp1PKA2D is clearly different from the response of wild-type Sfp1 to PKAas inhibition, which results in sustained delocalization. We have made the latter observation repeatedly, both in this study and our previously published work [Guerra et al. 2021].

      In Figure 3, the argument around the additive effects of PKA and TORC1 is confusing. The authors say they are additive referring to Figure 3E, but say they are not additive referring to Figure 3B. Which is true? In fact, Figure 3B appears to show an additive effect as well.

      We did not use the word "additive" in the text, because we find it difficult to interpret. Instead, we state that PKA and TORC1 appear to control Sfp1 phosphorylation independently of each other. PKA and TORC1 phosphorylation converges to the same response, affecting Sfp1 localization. It appears that loss of either kinase delocalizes Sfp1, while loss of both kinases may only have a small additional effect.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study identifies new types of interactions between Drosophila gustatory receptor neurons (GRNs) and shows that these interactions influence sensory responses and behavior. The authors find that HCN, a hyperpolarization-activated cation channel, suppresses the activity of GRNs in which it is expressed, preventing those GRNs from depleting the sensillum potential, and thereby promoting the activity of neighboring GRNs in the same sensilla. HCN is expressed in sugar GRNs, so HCN dampens the excitation of sugar GRNs and promotes the excitation of bitter GRNs. Impairing HCN expression in sugar GRNs depletes the sensillum potential and decreases bitter responses, especially when flies are fed on a sugar-rich diet, and this leads to decreased bitter aversion in a feeding assay. The authors' conclusions are supported by genetic manipulations, electrophysiological recordings, and behavioral assays.

      Strengths:

      (1) Non-synaptic interactions between neurons that share an extracellular environment (sometimes called "ephaptic" interactions) have not been well-studied, and certainly not in the insect taste system. A major strength of this study is the new insight it provides into how these interactions can impact sensory coding and behavior.

      We appreciate the reviewer’s view that our findings may allow researchers to better understand sensory coding and behavior. However, we respectfully disagree that the SP homeostasis in Drosophila gustation we describe here pertains to ephaptic interaction. Although SP reduction was proposed as the basis of post-ephaptic hyperpolarization in Drosophila olfaction, we find that SP changes are too slow to mediate the fast action of ephaptic inhibition in gustation reported in ref#17. We observed a slow, sweet-dependent SP depletion (Fig. 5B, revised), which takes more than one hour. The real-time change of SP was also slow even upon contact with 200-mM sucrose; this result was set aside for another manuscript in preparation. Therefore, we believe the main findings in this paper concern the homeostatic preservation of SP for the maintenance of gustatory function, not ephaptic interaction.

      (2) The authors use many different types of genetic manipulations to dissect the role of HCN in GRN function, including mutants, RNAi, overexpression, ectopic expression, and neuronal silencing. Their results convincingly show that HCN impacts the sensillum potential and has both cell-autonomous and nonautonomous effects that go in opposite directions. There are a couple of conflicting or counterintuitive results, but the authors discuss potential explanations.

      (3) Experiments comparing flies raised on different food sources suggest an explanation for why the system may have evolved the way that it did: when flies live in a sugar-rich environment, their bitter sensitivity decreases, and HCN expression in sugar GRNs helps to counteract this decrease.

      Weaknesses/Limitations:

      (1) The genetic manipulations were constitutive (e.g. Ih mutations, RNAi, or misexpression), and depleting Ih from birth could lead to compensatory effects that change the function of the neurons or sensillum. Using tools to temporally control Ih expression could help to confirm the results of this study.

      We attempted to address this point by using the tub-Gal80ts system. The result is now included as Fig. 1-figure supplement 2. At 29°C, a non-permissive temperature for GAL80ts that allows GAL4-dependent expression of Ih-RNAi, we observed that bGRN responses were decreased and sGRN responses were increased compared to the control maintained at 18°C, consistent with the result in Fig. 1C,D. For this experiment, we inserted “To exclude the possibility that Ih is required for normal gustatory development, we temporally controlled Ih RNAi knockdown to occur only in adulthood, which produced similar results (Fig. 1-figure supplement 2).” (~line 113).

      (2) The behavioral experiment shows a striking loss of bitter sensitivity, but it was only conducted for one bitter compound at one concentration. It is not clear how general this effect is. The same is true for some of the bitter GRN electrophysiological experiments that only tested one compound and concentration.

      We conducted additional behavioral experiments with other bitters such as lobeline and theophylline (Fig. 5-figure supplement 1), which showed sensitivity losses in Ih mutants similar to caffeine. For these results, the following is inserted at ~line 274: “These results were recapitulated with other bitters, lobeline and theophylline (Fig. 5-figure supplement 1).”

      We also added single sensillum recording data with bitters, berberine, lobeline, theophylline and umbelliferone, which yielded results similar to those obtained with caffeine (Fig. 1-figure supplement 1). This is described with the sentence at ~line 105 “Other bitter chemical compounds, berberine, lobeline, theophylline, and umbelliferone, also required Ih for normal bGRN responses (Fig. 1-figure supplement 1).”

      (3) Several experiments using the Gal4/UAS system only show the Gal4/+ control and not the UAS/+ control (or occasionally neither control). Since some of the measurements in control flies seem to vary (e.g., spiking rate), it is important to compare the experimental flies to both controls to ensure that any observed effects are in fact due to the transgene expression.

      We thank the reviewer for raising this point. Indeed, there was a small logical flaw with the controls. We have now included all the necessary controls for Fig. 1C-F, Fig. 2I,J, Fig. 4E, and Fig. 5D, as the reviewers suggested. These experiments remained statistically significant after including the new control groups.

      (4) I was surprised that manipulations of sugar GRNs (e.g. Ih knockdown, Gr64a-f deletion, or Kir silencing) can impact the sensillum potential and bitter GRN responses even in experiments where no sugar was presented.

      We are afraid there is a misunderstanding on the early part of the paper. We suspected that the manipulations impacted bGRNs and SP due to the sweetness in the regular cornmeal food, as stated in lines 214-220 “Typically, we performed extracellular recordings on flies 4-5 days after eclosion, during which they were kept in a vial with fresh regular cornmeal food containing ~400 mM D-glucose. The presence of sweetness in the food would impose long-term stimulation of sGRNs, potentially requiring the delimitation of sGRN excitability for the homeostatic maintenance of gustatory functions. To investigate this possibility, we fed WT and Ihf03355 flies overnight with either non-sweet sorbitol alone (200 mM) or a sweet mixture of sorbitol (200 mM) + sucrose (100 mM).”

      I believe the authors are suggesting that the effects of sugar GRN activity (e.g., from consuming sugar in the fly food prior to the experiment) can have long-lasting effects, but it wasn't entirely clear if this is their primary explanation or on what timescale those long-lasting effects would occur. How much / how long of a sugar exposure do the flies need for these effects to be triggered, and how long do those effects last once sugar is removed?

      We attempted to address this point with additional experiments (Fig. 5A,B). A reduction of SP could be observed to similar degrees in WT and HCN-deficient mutants 1 hr after the flies were transferred from nonsweet sorbitol-containing vials to sweet sucrose-containing ones. Moreover, the mutants, but not WT, showed further depression of SP when the sweetness persisted in the media for 4 hrs and overnight. This exposure to sweetness for longer than 1 hr may simulate feeding on the regular sweet cornmeal food. The recovery of SP was also tested by removing flies from the sweet media after overnight sweet exposure and placing them on sorbitol food. SPs of WT and the mutants recovered to similar levels 1 hr after separating the animals from sweetness, although the HCN-lacking mutants showed much lower SP right after overnight sweetness exposure. The unimpaired recovery of the mutants suggests that HCN is not required for generating the transepithelial potential itself. Therefore, regardless of HCN, SP changes are not fast even in the presence of strong sweetness, and SP is much better guarded when sGRNs express HCN in a sweet environment.

      We inserted the following at ~line 260 to describe the newly added recovery experiment: “Following overnight sweet exposure, SPs of WT and Ihf03355 were recovered to similar levels after 1-hr incubation with sorbitol only food. However, it was after 4 hrs on the sorbitol food that the two lines exhibited SP levels similar to those achieved by overnight incubation with sorbitol only food (Fig. 5B). These results indicate that SP depletion by sweetness is a slow process, and that the dysregulated reduction and recovery of SPs in Ihf03355 manifest only after long-term conditioning with and without sweetness, respectively.”.

      (5) The authors mention that HCN may impact the resting potential in addition to changing the excitability of the cell through various mechanisms. It would be informative to record the resting potential and other neuronal properties, but this is very difficult for GRNs, so the current study is not able to determine exactly how HCN affects GRN activity.

      On this point, we cannot but rely on previous studies of biophysical and electrophysiological characterization on mammalian HCN channels and a heterologous expression study that revealed a robust hyperpolarization-activated cation current from Drosophila HCN channels (PMID: 15804582).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors start by showing that HCN loss-of-function mutation causes a decrease in spiking in bitter GRNs (bGRN) while leaving sweet GRN (sGRN) response in the same sensillum intact. They show that a perturbation of HCN channels in sweet-sensing neurons causes a similar decrease while increasing the response of sugar neurons. They were also able to rescue the response by exogenous expression. Ectopic expression of HCN in bitter neurons had no effect. Next, they measure the sensillum potential and find that sensillum potential is also affected by HCN channel perturbation. These findings lead them to speculate that HCN in sGRN increases sGRN spiking which in turn affects bGRNs. To test this idea, they carried out multiple perturbations aimed at decreasing sGRN activity. They found that decreasing sGRN activity by either using receptor mutant or by expressing Kir (a K+ channel) in sGRN increased bGRN responses. These responses also increase the sensillum potential. Finally, they show that these changes are behaviorally relevant as conditions that increase sGRN activity decrease avoidance of bitter substances.

      Strengths:

      There is solid evidence that perturbation of sweet GRNs affects bitter GRN in the same sensillum. The measurement of transsynaptic potential and how it changes is also interesting and supports the authors' conclusion.

      Weaknesses:

      The ionic basis of how perturbation in GRN affects the transepithelial potential which in turn affects the second neuron is not clear.

      We speculate that HCN-dependent membrane potential regulation, rather than ionic composition change, is responsible for the observed SP preservation, as further discussed in our author response in the “Recommendations for the authors” section. The transepithelial potential can be dissipated by increased conductance through receptor-linked ion channels following gustatory receptor activation in GRNs. The volume of the sensillum lymph is very small according to electron micrographs of horizontally sliced bristles (PMID: 11456419). Therefore, robust excitation of a gustatory neuron may easily deplete the extracellular potential built as a form of polarized ion concentrations across the tight junction. When the consumption is too strong and extended, the neighboring neuron, which shares the TEP with the activated GRN, can be negatively affected. We propose that HCN suppresses overexcitation of sGRNs by means of membrane potential stabilization. This stabilization prevents sGRNs from excessively reducing the TEP, thereby protecting the activity of neighboring bGRNs.

      Reviewer #3 (Public Review):

      Ephaptic inhibition between neurons housed in the same sensilla has been long discovered in flies, but the molecular basis underlying this inhibition is underexplored. Specifically, it remains poorly understood which receptors or channels are important for maintaining the transepithelial potential between the sensillum lymph and the hemolymph (known as the sensillum potential), and how this affects the excitability of neurons housed in the same sensilla.

      Although a reduction of sensillum potential was proposed to underlie membrane hyperpolarization of post-ephaptic olfactory neurons in Drosophila, our preliminary data (not shown due to a manuscript in preparation) and the results included in the paper (Fig. 5B) strongly suggest that SP reduction is not a requisite for ephaptic inhibition at least in GRNs. Ephaptic inhibition is expected to be instantaneous, whereas we find that SP reduction in gustation is very slow. Therefore, we would like to indicate that the findings we report in this manuscript are not directly related to ephaptic inhibition.

      Lee et al. used single-sensillum recordings (SSR) of the labellar taste sensilla to demonstrate that the HCN channel, Ih, is critical for maintaining sensillum potential in flies. Ih is expressed in sugar-sensing GRNs (sGRNs) but affects the excitability of both the sGRNs and the bitter-sensing GRNs (bGRNs) in the same sensilla. Ih mutant flies have decreased sensillum potential, and bGRNs of Ih mutant flies have a decreased response to the bitter compound caffeine. Interestingly, ectopic expression of Ih in bGRNs also increases sGRN response to sucrose, suggesting that Ih-dependent increase in sensillum potential is not specific to Ih expressed in sGRNs. The authors further demonstrated, using both SSR and behavior assays, that exposure to sugars in the food substrate is important for the Ih-dependent sensitization of bGRNs. The experiments conducted in this paper are of interest to the chemosensory field. The observation that Ih is important for the activity in bGRNs albeit expressed in sGRNs is especially fascinating and highlights the importance of non-synaptic interactions in the taste system.

      Despite the interesting results, this paper is not written in a clear and easily understandable manner. It uses poorly defined terms without much elaboration, contains sentences that are borderline unreadable even for those in the narrower chemosensory field, and many figures can clearly benefit from more labeling and explanation. It certainly needs a bit of work.

      We would like to revise the language aspect of the manuscript after finalizing the scientific revision.

      Below are the major points:

      (1) Throughout the paper, it is assumed that Ih channels are expressed in sugar-sensing GRNs but not bitter-sensing GRNs. However, both this paper and citation #17, another paper from the same lab, contain only circumstantial evidence for the expression of Ih channels in sGRNs. A simple co-expression analysis, using the Ih-T2A-GAL4 line and Gr5a-LexA/Gr66a-LexA line, all of which are available, could easily demonstrate the co-expression. Including such a figure would significantly strengthen the conclusion of this paper.

      We did conduct confocal imaging with Ih-T2A-Gal4 in combination with GRN Gal4s (ref#17 version2). The expression is very broad, including both neurons and non-neuronal cells. We observed much stronger sGRN expression than bGRN expression, but the promiscuous expression of the reporter in many cells hindered us from clearly demonstrating the absence of the reporter in bGRNs. However, the functional and physiological examination of Ih-T2A-Gal4 with neuronal modifiers such as TRPA1 and Kir2.1 in ref#17 indicates strong expression of Ih in sGRNs and little expression in bGRNs. Furthermore, the RNAi knockdown results present another line of evidence that HCN expressed in sGRNs regulates SP and bGRN activity (Fig. 1C,D, Fig. 1-figure supplement 2). Ih-RNAi expression in bGRNs did not result in any statistically significant changes in the activities of sGRNs and bGRNs compared to controls (Fig. 1C,D, revised), supporting our claim that Ih acts in sGRNs for the functional homeostasis of SP and GRNs.

      (2) Throughout this paper, it is often unclear which class of labellar taste sensilla is being recorded. S-a, S-b, I-a, and I-b sensilla all have different sensitivities to bitters and sugars. Each figure should clearly indicate which sensilla is being recorded. Justification should be provided if recordings from different classes of sensilla are being pooled together for statistics.

      We mainly performed SSR (single sensillum recording) on i-type bristles as they have the simplest composition of GRNs compared to s- and L-type bristles. As s-type bristles also each contain an sGRN and a bGRN, we measured SP for s-types as well (Figs. 2, 3F and 4D). In the case of Fig. 3-figure supplement 1, L-types were tested for the relationship between water cell activity and SP. Now all the panels are labelled with the tested bristle types.

      (3) In many figures, there is a lack of critical control experiments. Examples include Figures 1C-F (lacking UAS control), Figure 2I-J (lacking UAS control), Figure 4E (lacking the UAS and GAL4 control, and it is also strange to compare Gr64f > RNAi with Gr66a > RNAi, instead of with parental GAL4 and UAS controls.), and Figure 5D (lacking UAS control). Without these critical control experiments, it is difficult to evaluate the quality of the work.

      Thank you for pointing this out. We appreciate the feedback and have addressed these concerns by including all the requested controls in the figures. Specifically, we have added the UAS controls for Figs 1C-F and 2I-J, as well as the UAS and GAL4 controls for Fig. 4E. We have also included the UAS control for Fig. 5D.

      (4) Figure 2A could benefit from more clarification about what exactly is being recorded here. The text is confusing: a considerable amount of text is spent on explaining the technical details of how SP is recorded, but very little text about what SP represents, which is critical for the readers. The authors should clarify in the text that SP is measuring the potential between the sensillar lymph, where the dendrites of GRNs are immersed, and the hemolymph. Adding a schematic figure to show that SP represents the potential between the sensillar lymph and hemolymph would be beneficial.

      SP was defined at lines 55-56 in the first paragraph of the introduction, which also contains the background information for SP as a transepithelial potential. As the reviewer suggested, we have now also included a sentence describing SP (“SP is known as a transepithelial potential between the sensillum lymph and the hemolymph, generated by active ion transport through support cells”, line 126) and a drawing to illustrate the concept of SP (Fig. 2A), and revised the legend.

      (5) The sGRN spiking rate in Figure 4B deviates significantly from previous literature (Wang, Carlson, eLife 2022; Jiao, Montell PNAS 2007, as examples), and the response to sucrose in the control flies is not dosage-dependent, which raises questions about the quality of the data. Why are the responses to sucrose not dosage-dependent? The responses are clearly not saturated at these (10 mM to 100 mM) concentrations.

      Our recordings show different spiking frequencies from others’ work because the frequencies are calculated from 5-sec bins, not only the first 0.5 sec. This lowers the frequencies, as spikes are relatively more frequent at the beginning of the recording (Fig. 4-figure supplement 1).
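
      The effect of the averaging window described above can be illustrated with a minimal sketch. The spike train below is synthetic and invented purely for illustration (it is not the authors' data), and the function names `mean_rate` and `post_stimulus_histogram` are hypothetical helpers, not part of any analysis software named in the response. The sketch assumes a response that adapts after the first 0.5 s.

```python
def mean_rate(spike_times, start, stop):
    """Mean firing rate (spikes/s) in the half-open window [start, stop)."""
    count = sum(1 for t in spike_times if start <= t < stop)
    return count / (stop - start)


def post_stimulus_histogram(spike_times, duration, bin_width):
    """Firing rate (spikes/s) in consecutive bins starting at stimulus contact."""
    n_bins = int(round(duration / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return [c / bin_width for c in counts]


# Synthetic adapting response (illustrative assumption only):
# 20 spikes in the first 0.5 s, then one spike every 0.1 s until 5 s.
spikes = [0.0125 + 0.025 * i for i in range(20)]
spikes += [0.55 + 0.1 * i for i in range(45)]

rate_first_half_second = mean_rate(spikes, 0.0, 0.5)  # early, pre-adaptation window
rate_five_seconds = mean_rate(spikes, 0.0, 5.0)       # whole 5-s bin
one_second_bins = post_stimulus_histogram(spikes, 5.0, 1.0)

print(rate_first_half_second, rate_five_seconds, one_second_bins)
```

      With this toy train, the same recording yields a much higher rate when only the first 0.5 s is counted than when the whole 5-s window is averaged, which is the direction of the discrepancy discussed above; the 1-s histogram makes the adaptation itself visible.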

      Why are the responses to sucrose not dosage-dependent? The responses are clearly not saturated at these (10 mM to 100 mM) concentrations.

      We were also puzzled by the flat dose dependence to sucrose. This result may suggest the existence of another mechanism moderating sucrose responses of sGRNs. This flat curve reappeared with other genotypes with the same concentration range (5-50 mM) in Fig. 4E. However, 1-mM sucrose produced much lower spiking frequencies (Fig. 4E), suggesting that sGRN responses are saturated at 5 mM sucrose under our recording/analysis conditions.

      (6) In Figure 4C, instead of showing the average spike rate of the first five seconds and the next 5 seconds, why not show a peristimulus time histogram? It would help the readers tremendously, and it would also show how quickly the spike rate adapts to overexpression and control flies. Also, since taste responses adapt rather quickly, a 500 ms or 1 s bin would be more appropriate than a 5-second bin.

      Taste single sensillum recording starts upon contact with the stimulant, which prevents us from recording pre-stimulus responses of GRNs. Therefore, we showed post-stimulus graphs with 1-sec bins (Fig. 4-figure supplement 1), as the reviewer suggested.

      (7) Lines 215 - 220. The authors state that the presence of sugars in the culture media would expose the GRNs to sugar constantly, without providing much evidence. What is the evidence that the GRNs are being activated constantly in flies raised with culture media containing sugars? The sensilla are not always in contact with the food.

      We agree with the reviewer. We replaced “long-term stimulation of sGRNs” with “strong and frequent stimulation of sGRNs for extended period”, because the word “long-term” may be interpreted as “constant”.

      (8) Line 223. To show that bGRN spike rates in Ih mutant flies "decreased even more than WT", you need to compare the difference in spike rates between the sorbitol group and the sorbitol + sucrose group, which is not what is currently shown.

      The data were examined by ANOVA and a multiple comparison test (Dunn’s) between all the groups regardless of genotypes and conditions in the panel (all the groups sharing the y axis). Therefore, the differences were statistically examined. However, the expression we used read as though it was about the slope or extent of the decrease. We intended to indicate the difference in the absolute values of spiking frequencies after overnight sweet exposure between the genotypes, while bGRN activities were statistically indistinguishable between WT and Ih mutants when they were kept only on sorbitol food. We revised it to “decreased to the level significantly lower than WT”. We also changed the graph style to effectively present the trend of changes in bGRN sensitivity with comparison between genotypes. Again, the groups were statistically examined together regardless of the genotypes and conditions.
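
      The idea of comparing all groups in a panel together can be sketched as follows. This is only an illustration under stated assumptions: a hand-written version of Dunn's rank-based pairwise test with a Bonferroni adjustment. The choice of adjustment, the group labels and the numbers are all invented for the example; the response does not state which software or settings were actually used, and none of the values below are the authors' data.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks (ties share their average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def dunn_test(groups):
    """Bonferroni-adjusted two-sided p-values for every pair of groups."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)  # ranks are taken over ALL groups pooled

    mean_rank, start = {}, 0
    for name in names:
        n = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + n]) / n
        start += n

    tie_term = sum(t ** 3 - t for t in tie_sizes) / (12 * (n_total - 1))
    base_var = n_total * (n_total + 1) / 12 - tie_term

    pairs = list(combinations(names, 2))
    adjusted = {}
    for a, b in pairs:
        se = math.sqrt(base_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return adjusted


# Hypothetical spike frequencies (spikes/s), invented for illustration only.
groups = {
    "WT_sorbitol": [20, 22, 24, 26, 28, 30],
    "Ih_sorbitol": [20, 22, 24, 26, 28, 30],
    "Ih_sucrose": [2, 3, 4, 5, 6, 7],
}
adjusted_p = dunn_test(groups)
for pair, p in adjusted_p.items():
    print(pair, round(p, 4))
```

      Because the ranks are computed over the pooled data, each pairwise comparison depends on every group sharing the axis, which is the sense in which groups are examined together regardless of genotype and condition.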

      (9) To help readers better understand the proposed mechanisms here, including a schematic figure would be helpful. This should show where Ih is expressed, how Ih in sGRNs impacts the sensillum potential, how elevated sensillum potential increases the electrical driving force for the receptor current, and affects the excitability of the bGRNs in the same sensilla, and how exposure to sugar is proposed to affect ion homeostasis in the sensillum lymph.

      As the reviewer suggested, we included two panels showing a working model for gustatory homeostasis via SP maintenance by HCN (Fig. 5E,F).

      Reviewer #1 (Recommendations For The Authors):

      (1) The relationship between this paper and the authors' bioRxiv preprint posted last year is not clear. In the introduction they made it seem like this paper is a follow-up that builds on the preprint, but most or all of the experiments in this paper were already performed in the preprint. I guess the authors are planning to divide the original paper into two papers. I would suggest updating the preprint to avoid confusion.

      Thank you for the comment. We updated the preprint by removing part of Fig. 6 and the entire Fig. 7 along with the associated text. As the reviewer pointed out, our eLife paper was spun off from part of the preprint, because we feel that the two stories could confuse readers when presented together.

      (2) Have the authors considered testing responses of water GRNs? They reside in the same sensilla as sugar neurons, so are they also affected by Ih mutation or RNAi in sugar neurons? This would strengthen the evidence that the indirect (non-cell autonomous) effects of Ih are due to the sensillum potential and not some specific interaction between sweet and bitter cells.

      As the reviewer proposed, we examined water GRN activity in the L-type bristles of WT, Ihf03355 and a genomic rescue line for Ihf03355. Spiking responses in water GRNs were evoked by hypo-osmolarity of electrolyte (0.1 mM tricholine citrate, TCC). Interestingly, the Ih mutant showed reduced 0.1 mM TCC-provoked spiking frequencies compared to WT. This impairment was rescued by the genomic fragment containing an intact Ih locus (Figure 3-figure supplement 1A).

      Additionally, SPs in L-type bristles were reduced by Ih deficiencies but increased in Gr64af, suggesting that HCN regulates sGRNs in L-type bristles as well (Figure 3-figure supplement 1B). Again, the bristles of animals with both mutations together exhibited SPs similar to those of WT.

      Furthermore, when we conducted cDNA rescue experiments in L bristles, introduction of Ih-RF cDNA in sGRNs restored SPs, while expressing it in bGRNs did not, unlike the results from the i- and s-bristles (Fig. 2K,L), likely because L-bristles lack bGRNs. These cDNA rescue and genetic interaction experiments were conducted using flies fed on fresh cornmeal food with strong sweetness, suggesting that the sweetness in the media is the likely key factor producing the genetic interaction and necessitating HCN, consistent with other results in the manuscript. Therefore, SP regulation by HCN is observed in the L-type bristles.

      Minor comments:

      Line 52: typo, "Many of"

      Thank you. Corrected

      Line 95: typo, "sensilla do an sGRN"

      Corrected

      Line 98: typo, "we observed reduced the spiking responses"

      Corrected

      Line 206: typo, "a relatively low sucrose concentrations"

      Corrected

      Line 260: "inverse relationship between the two GRNs in excitability" - I am not exactly sure what data you are referring to.

      Although Ih mutant alleles did not show increased sGRN activities, knockdown of Ih decreased bGRN activity but increased sGRN activity (Fig. 1C,D, Fig.1-figure supplement 2B), while suppression of sGRNs increased bGRN activity (Fig. 3). To clarify this point, we revised the phrase to “the inverse relationship between the two GRNs in excitability observed in Fig. 1C,D, Fig. 1-figure supplement 2B, and Fig. 3”.

      Methods: typo, "twenty of 3-5 days with 10 males and 10 females"

      Corrected to “Twenty flies, aged 3-5 days and consisting of 10 males and 10 females,”

      Methods: typo, "Kim's wipes" should be "Kimwipes"

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      (1) More clarification is necessary on Transepithelial potential (TEP). TEP is typically created by having pumps and tight junctions between the sensillar lymph and the hemolymph.

      We have an introduction to TEP or SP in the context of sensory functions (lines 40-57) with relevant references. The involvement of pumps and tight junctions was mentioned in the same paragraph: “Glia-like support cells exhibit close physical association with sensory receptor neurons, and conduct active transcellular ion transport, which is important for the operation of sensory systems” (line 40) and “Tight junctions between support cells separate the externally facing sensillar lymph from the internal body fluid known as hemolymph” (line 53).

      It is not clear how HCN channels in one of the neurons might change the composition of the sensillum lymph. An explanation of their model of how TEP depends on HCN is necessary.

      Although the ionic composition of the sensillum lymph is a contributing factor to the sensillum potential, it is more conceptually relevant to describe our findings with the perspective of membrane potential regulation given the role of HCN in membrane potential stabilization as discussed in our manuscript.

      We speculate that HCN controls the membrane potential at rest and/or in motion to modulate sGRN activity so as to conserve SP despite the sweetness of the niche. We positioned our results in relation to SP in the Discussion: “Our results provide multiple lines of evidence that HCN suppresses HCN-expressing GRNs, thereby sustaining the activity of neighboring GRNs within the same sensilla. We propose that this modulation occurs by restricting SP consumption through HCN-dependent neuronal suppression rather than via chemical and electrical synaptic transmission.” (lines 252-255). Moreover, it is unclear whether HCN is localized to the dendrite bathed in the sensillum lymph to influence the ionic composition of the lymph. It would be very interesting to study in the future whether the ionic flow through HCN channels itself is critical for the function of HCN in this context, and whether HCN is exclusively present in the dendrite to support this postulation. However, we would like to remind the reviewer that Kir2.1 and HCN channels in sGRNs showed similar effects on SP and bGRNs, although they differ in Na+ conductance.

      In the initially submitted manuscript (lines 325-343), we discussed the potential mechanism by which Kir2.1 and HCN channels commonly increase SP in terms of how the membrane potential regulation in the soma can control the SP consumption in the dendrite of sGRNs.

      Another point about the TEP that needs some explanation is that these sensilla are open to the environment as tastants must flow in and are different from mechanical sensilla in that sense.

      This is a very important question regarding the general physiology of the taste sensilla, as the sensillum lymph is in contact with the external environment through the pore of the sensillum. It is indeed interesting to consider how the composition and potential of the lymph are maintained despite the relatively vast volume of food the sensilla encounter during gustation and the continuous evaporation to air between episodes of gustation. However, we believe that this question, while important, is distinct from the primary focus of our manuscript.

      Are the TEP measurements in Figure 2 under control conditions where there are no tastants?

      There is no tastant in the SP-measuring glass electrode other than the electrolyte. We apologize for not specifying the recording electrode condition. We inserted a clause in the Methods: “For SP recordings, the recording electrode contained 2 mM TCC as the electrolyte, and…”

      Does the TEP change dynamically as sGRN is activated?

      SP does shift in response to sweets. Please see Fig. 5B. We also showed SP changes evoked by mechanical stimuli, which depended on the mechanoreceptor NompC (Fig. 2D-F). Mechanoreceptor neurons share the sensillum lymph with GRNs.

      (2) More clarification on the potential transduction mechanism and how TEP affects one neuron differentially. Essentially, sGRN perturbation affects sGRN activity and it affects the TEP. More explanation is needed for the potential ionic mechanism of each.

      Our results strongly suggest that HCN lowers the activity of HCN-expressing GRNs, mitigating SP consumption. This modulation is crucial because the SP serves as a driving force for neuronal activation within the sensillum. HCN is particularly necessary in sGRNs because of the flies’ sweet feeding niche, which is expected to result in frequent and strong activation of sGRNs. The SP saved by HCN-dependent delimitation of sGRNs can be used to raise the responsiveness of bGRNs.

      (3) The authors refer to their own unreviewed paper (Reference 17). This paper is on a similar topic and there seems to be some overlap. Clarification on this point would be important.

      We revised the bioRxiv preprint so that version 2 of the preprint does not contain the parts overlapping with this eLife paper. This eLife paper was originally part of the preprint, but it was separated to clarify the messages of the two stories. As we explained in the Discussion (lines 276-297), HCN provides resistance to both hyperpolarization and depolarization of the membrane potential. Simply put, one paper focuses on the role of HCN in resisting hyperpolarization, while the other (this paper in eLife) focuses on resisting depolarization.

      (4) Methods are sparse. Many details on the method are necessary. For example, Sensilla recordings are being done by the tip-dip method (I assume). What does "number of experiments" mean in Figure 1? Is it the number of animals or the number of sensilla? How many trials/sensilla?

      We indicated the extracellular recording was performed by the tip-dip method; “In vivo extracellular recordings were performed by the tip-dip method as detailed previously”. We also added a statement on the number of experiments; “The number of experiments indicated in figures are the number of naïve bristles tested. The naïve bristles were from at least three different animals.”

      (5) Figure 1: I understand the author's interpretation. But if one compares WT in Figure 1A to Gr64a-IhRNAi in 1C, we can come to the conclusion that there is no change. In other words, the control in Figure 1C (grey) has a much higher response than WT. Similar conclusions can be made for other experiments. Is the WT response stable enough to make the conclusions made here?

      The genetic background of each genotype may influence GRN activity to some extent. RNAi knockdown experiments are well known for their hypomorphic nature, and their effects should be evaluated by comparison with their parental controls such as Gal4 and UAS lines. As all reviewers pointed out, we added the results from the UAS control. This effort confirms that Gr89a>Ih RNAi is statistically indistinguishable from the UAS control as well as the Gr64f-Gal4 control in bGRN spiking evoked by 2 mM caffeine, while Gr64f>Ih RNAi showed reduced bGRN responses to 2 mM caffeine compared to all the controls.

      (6) Figure 3: Why is bGRN spiking not plotted against sensillum potential to observe the dependence more directly?

      This is a very interesting suggestion. We are not, however, equipped to measure spiking and sensillum potential simultaneously. Therefore, they are independent experiments, and we treated them accordingly.

      (7) Figure 4: Why bGRN response is only affected at high caffeine concentrations is not clear.

      We were also surprised by the differences in the dose-dependence results for bGRNs and sGRNs, genetically manipulated to mis-express and over-express HCN in Fig. 4A and 4E, respectively. Each gustatory neuron likely has distinct sets of players and parameters that set its own membrane potential and excitability.

      One possibility is that there is a range of membrane potentials within which HCN does not engage. In bGRNs, the resting membrane potential may lie low within this range, so that some degree of membrane depolarization by low concentrations of caffeine does not significantly close HCN channels, thus preventing their hyperpolarizing effects. On the other hand, the membrane potential of sGRNs may be high within this range, showing suppressive effects at all tested sucrose concentrations. However, we find this explanation too speculative to include in the main text, although we stated in the original manuscript, “implying a complex cell-specific regulation of GRN excitability.” (line 210).

      (8) Minor:

      L98 - there is a small typo

      Corrected

      L274: "funny" !?

      “Funny” currents, denoted If, were initially observed by electrophysiologists and later attributed to HCN channels, now indicated by Ih (thus the gene name Ih in Drosophila). These currents were termed "funny" due to their unusual properties compared to other currents. For more detailed information, please refer to the cited references.

      L257: Neuropeptide seemed to be abrupt

      We attempted to discuss possible mechanisms that mediate excitability changes across GRNs beyond the mechanism by SP shifts. Neuropeptides, which are chemical neurotransmitters along with small neurotransmitters, were mentioned following the discussion on synaptic transmission to suggest alternative pathways for excitability regulation. This inclusion is meant to provide a comprehensive overview of potential mechanisms influencing GRN activity.

      Reviewer #3 (Recommendations For The Authors):

      Congratulations on your fascinating research! The results are certainly of interest to the chemosensory field. However, I suggest using academic editing services to enhance the clarity of your text and ensure that the terminology and jargon align with standard usage in the field. The current choice of words may not be consistent with commonly used terms. As it is now, the writing might not fully showcase the compelling story and the effort behind your study, and is underselling your interesting results. Proper refinement could make sure your valuable findings are appropriately recognized.

      We appreciate your comments and apologize for any difficulties reviewers faced during the review process. We are currently prioritizing the review of scientific content and plan to address language issues in a subsequent revision. It would be very helpful for future revisions if the problematic sentences or expressions could be indicated in detail after this revision. This will allow us to ensure that our terminology and expression align with standard usage in the field, and that our findings are clearly and effectively communicated.

      Minor points:

      (1) Line 110: what is Ih-RF?

      We apologize that we relied on a reference in describing the cDNA. The following clause was inserted with an additional reference and the FlyBase ID: “(FlyBase ID: FBtr0290109), which previously rescued Ih deficiency in other contexts [17, 26],”

      (2) Line 158: Gr64af mutant flies still have Gr5a and a residual response to fructose and sucrose (Slone, Amrein 2007).

      We revised the line to “is severely impaired in sucrose and glucose sensing”, since there is a substantial loss of sucrose and glucose sensing in both Gr64af from Kim et al. 2018 and ΔGr64 from Slone et al. 2007, when they were examined by the proboscis extension reflex assay. This was also confirmed in the study by Jiao et al. 2009. We also deleted “sugar-ageusic” and instead describe the mutant as “impaired in sucrose and glucose sensing” in the Fig. 3 legend.

      (3) Lines 264-273 seem unnecessary. This paper is not about the function of HCN in mammals, and these discussions seem largely irrelevant.

      We feel that it is important to position our results within a broader context by discussing the potential implications of our findings for sensory systems of other animals. As we stated, HCN channels have been localized in mammalian sensory systems, but their roles are often not well understood. By including this discussion, we aim to highlight the relevance of our findings beyond the model organism used in our study and suggest possible areas for future research in mammalian systems.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Overall, the manuscript is very well written, the approaches used are clever, and the data were thoroughly analyzed. The study conveyed important information for understanding the circuit mechanism that shapes grid cell activity. It is important not only for the field of MEC and grid cells, but also for broader fields of continuous attractor networks and neural circuits.

      We appreciate the positive comments.

      (1) The study largely relies on the fact that ramp-like wide-field optogenetic stimulation and focal optogenetic activation both drove asynchronous action potentials in SCs, and therefore, if a pair of PV+ INs exhibited correlated activity, they should receive common inputs. However, it is unclear what criteria/thresholds were used to determine the level of activity asynchronization, and under these criteria, what percentage of cells actually showed synchronized or less asynchronized activity. A notable percentage of synchronized or less asynchronized SCs could complicate the results, i.e., PV+ INs with correlated activity could receive inputs from different SCs (different inputs), which had synchronized activity. More detailed information/statistics about the asynchronization of SC activity is necessary for interpreting the results.

      The short answer here is that spiking responses from the pairs of SCs that we sampled appear asynchronous. We now show this in the form of cross-correlograms for all recorded pairs of SCs (Figure 2, Figure Supplement 1). The correlograms lack peaks that would indicate synchronous activation. Thus, while our dataset is not large enough to rule out occasional direct synchronisation of SCs, this appears unlikely to account for synchronised input to PV+INs.

      This conclusion is consistent with consideration of mechanisms that could in principle synchronise SCs:

      First, if responses to ramping light inputs were fully deterministic, then this could lead to fixed relative timing of spikes fired by different SCs. This is unlikely given the influence of stochastic channel gating on SC spiking (Dudman and Nolan 2009) and is inconsistent with trial-to-trial variability in spike timing (Figure 2, Figure Supplement 2).

      Second, as SCs are glutamatergic they could excite one another. However, excitatory connections between stellate cells are rare (Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016) and when detected they have low amplitude (mean < 0.25 mV; (Winterer et al. 2017)). Our finding that spiking by pairs of SCs is not correlated is consistent with this.

      Third, strong interaction between stellate cells mediated by local inhibitory pathways (Pastoll et al. 2013; Couey et al. 2013) could coordinate their activity. The lack of correlation between spiking of pairs of SCs suggests that such coordination is rarely recruited by our ramping protocols. Nevertheless, recruitment of inhibition may happen to some extent as experiments in Figure 4 show that correlated input from SCs to more distant, but not nearby PV+INs, is reduced by blocking inhibitory synapses. Given that we don't find evidence for synchronised spiking of SCs, this additional common input to widely separated PV+INs is instead best explained by recruitment of interneurons that act directly on the target SCs. We have modified Figure 8 to make this clear.

      Thus, for experiments with ramping light stimuli, synchronous activation of SCs is unlikely to explain common input to PV+INs. Input from the same SC best explains correlated responses of nearby PV+IN inhibitory populations, while recruitment of an additional inhibitory pathway may contribute to correlated responses of more distant PV+INs.

      For experiments using focal stimulation, substantial trial-to-trial variation in SC spike timing argues strongly against deterministic coordination. Indirect coordination of presynaptic neurons is also extremely unlikely given that focal activation is sparse and brief, while inputs from many presynaptic SCs are required to drive a postsynaptic interneuron to spike (e.g. Pastoll et al. 2013; Couey et al. 2013). Results from these experiments thus corroborate results from experiments using ramping light stimulation.

      In revising the manuscript we have tried to ensure these arguments are clear (e.g. p 5, para 3; p 6, para 2; p 10, para 1).
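
      The cross-correlogram check described in this response can be sketched roughly as follows. This is a minimal illustration and not the authors' analysis code: the function names, bin width, lag window, firing rates and the synthetic spike trains are all assumptions chosen for the example. The idea is that a pair of synchronised cells produces a sharp peak around zero lag, whereas an asynchronous pair produces a flat histogram.

```python
import random

def cross_correlogram(spikes_a, spikes_b, max_lag=0.05, bin_width=0.005):
    """Histogram of spike-time differences (t_b - t_a) within +/- max_lag seconds."""
    n_bins = int(round(2 * max_lag / bin_width))
    counts = [0] * n_bins
    for ta in spikes_a:
        for tb in spikes_b:
            lag = tb - ta
            if -max_lag <= lag < max_lag:
                # Clamp the index to guard against floating-point rounding at the edge.
                idx = min(int((lag + max_lag) / bin_width), n_bins - 1)
                counts[idx] += 1
    return counts

def peak_ratio(counts):
    """Largest bin divided by the mean bin; values near 1 indicate a flat correlogram."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean > 0 else 0.0

# Synthetic spike trains (hypothetical, for illustration only).
rng = random.Random(0)
duration = 20.0  # seconds
cell_a = sorted(rng.uniform(0.0, duration) for _ in range(200))
cell_b_independent = sorted(rng.uniform(0.0, duration) for _ in range(200))
# A "synchronised" partner: every spike of cell_a copied with 1 ms jitter.
cell_b_synchronised = sorted(t + rng.gauss(0.0, 0.001) for t in cell_a)

cc_independent = cross_correlogram(cell_a, cell_b_independent)
cc_synchronised = cross_correlogram(cell_a, cell_b_synchronised)

print("independent pair peak ratio:", round(peak_ratio(cc_independent), 2))
print("synchronised pair peak ratio:", round(peak_ratio(cc_synchronised), 2))
```

      In practice the correlogram of recorded pairs would be compared against a shuffled or trial-shifted control rather than against a fixed threshold.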

      (2) The hypothesis about the "direct excitatory-inhibitory" synaptic interactions is made based on the GABAzine experiments in Figure 4. In the Figure 8 diagram, the direct interaction is illustrated between PV+ INs and SCs. However, the evidence supporting this "direct interaction" between these two cell types is missing. Is it possible that pyramidal cells are also involved in this interaction? Some pieces of evidence or discussions are necessary to further support the "direct interaction".

      Indirect connections between stellate cells mediated via fast spiking inhibitory interneurons are well established by previous studies (e.g. Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016), and so were not addressed here. Previous work also establishes that connections from stellate cells to pyramidal cells are extremely rare (Winterer et al. 2017). Because the Sim1:Cre mouse line is specific to stellate cells and does not drive transgene expression in pyramidal cells (Sürmeli et al. 2015), it's therefore unlikely that pyramidal cells play a role.

      To make these points clearer we have modified the text in the discussion (p 5, para 3; p 10, paras 1 & 2). We have also modified Figure 8 to highlight that the indirect interaction may be best accounted for by inhibitory pathways onto PV+INs rather than via SCs (which our new cross-correlation analyses indicate is unlikely).

      Reviewer #2 (Public Review):

      In this study, Huang et al. employed optogenetic stimulation alongside paired whole-cell recordings in genetically defined neuron populations of the medial entorhinal cortex to examine the spatial distribution of synaptic inputs and the functional-anatomical structure of the MEC. They specifically studied the spatial distribution of synaptic inputs from parvalbumin-expressing interneurons to pairs of excitatory stellate cells. Additionally, they explored the spatial distribution of synaptic inputs to pairs of PV INs. Their results indicate that both pairs of SCs and PV INs generally receive common input when their relative somata are within 200-300 µm of each other. The research is intriguing, with controlled and systematic methodologies. There are interesting takeaways based on the implications of this work to grid cell network organization in MEC.

      We appreciate the positive comments.

      (1) Results indicate that in brain slices, nearby cells typically share a higher degree of common input. However, some proximate cells lack this shared input. The authors interpret these findings as: "Many cells in close proximity don't seem to share common input, as illustrated in Figures 3, 5, and 7. This implies that these cells might belong to separate networks or exist in distinct regions of the connectivity space within the same network.". Every slice orientation could have potentially shared inputs from an orthogonal direction that are unavoidably eliminated. For instance, in a horizontal section, shared inputs to two SCs might be situated either dorsally or ventrally from the horizontal cut, and thus removed during slicing. Given the synaptic connection distributions observed within each intact orientation, and considering these distributions appear symmetrically in both horizontal and sagittal sections, the authors should be equipped to estimate the potential number of inputs absent due to sectioning in the orthogonal direction. How might this estimate influence the findings, especially those indicating that many close neurons don't have shared inputs?

      Given we find high probabilities of correlated inputs to nearby cells in both planes, our conclusion that nearby cells are likely to receive common inputs appears to be independent of the slice plane. For cells further apart, where the degree of correlated input becomes more variable, it is possible that cell pairs that have low input correlations measured in one slice plane would have high input correlations if measured in a different plane. An argument against this is that as the cell pairs are further apart, it is less likely that an orthogonal axon would intersect dendritic trees of both cells. Nevertheless, we can't rule this out given the data here. We have amended the discussion to highlight this possibility (p 10, para 1). We agree it would be interesting to address this point further with quantitative analyses but this will be difficult without detailed reconstructions of the circuit.

      (2) The study examines correlations during various light-intensity phases of the ramp stimuli. One wonders if the spatial distribution of shared (or correlated) versus independent inputs differs when juxtaposing the initial light stimulation phase, which begins to trigger spiking, against subsequent phases. This differentiation might be particularly pertinent to the PV to SC measurements. Here, the initial phase of stimulation, as depicted in Figure 7, reveals a relatively sparse temporal frequency of IPSCs. This might not represent the physiological conditions under which high-firing INs function. While the authors seem to have addressed parts of this concern in their focal stim experiments by examining correlations during both high and low light intensities, they could potentially extract this metric from data acquired in their ramp conditions. This would be especially valuable for PV to SC measurements, given the absence of corresponding focal stimulation experiments.

      We understand the gist of the question to be whether differences in correlation scores between the initial and later phases of responses to ramping light inputs can be used to infer spatial organisation. These differences are likely to reflect heterogeneity in the spiking of the input neurons, for example through differences in spike threshold, spike frequency adaptation and saturation of spiking (e.g. Figure 2, Figure Supplement 1A; see also Pastoll et al. 2020). We don't expect these differences to have any spatial organisation along the mediolateral axis, and while spike threshold follows a dorsoventral organisation there is nevertheless substantial local variation between neurons (Pastoll et al. 2020). It's therefore unlikely we can use differences in early versus late correlations to make the inferences proposed by the reviewer.

      With respect to PV to SC measurements, similar heterogeneity is likely. We note that we were unable to carry out focal stimulation experiments for PV to SC connections as PV neurons did not spike in response to focal optogenetic stimulation.

      With respect to physiological conditions, our aim here is simply to assess connectivity in well controlled conditions, e.g. voltage-clamp, minimal spontaneous activity, known neuronal locations, etc. It's not clear that physiological activation patterns would improve on these tests and quite likely data would be noisier and harder to interpret.

      (3) Re results from Figure 2: Please fully describe the model in the methods section. Generally, I like using a modeling approach to explore the impact of convergent synaptic input to PVs from SCs that could effectively validate the experimental approach and enhance the interpretability of the experimental stim/recording outcomes. However, as currently detailed in the manuscript, the model description is inadequate for assessing the robustness of the simulation outcomes. If the IN model is simply integrate-and-fire with minimal biophysical attributes, then the results shown in Fig 2F might be trivial. Conversely, if the model offers a more biophysically accurate representation (e.g., with conductance-based synaptic inputs, synapses appropriately dispersed across the model IN dendritic tree, and standard PV IN voltage-gated membrane conductances), then the model's results could serve as a meaningful method to both validate and interpret the experiments.

      We appreciate the simulation descriptions were insufficient and have modified the manuscript to include additional details and clarification (p 14, paras 1-3).

      We're not sure we follow the logic here with respect to model types. The experiments were carried out in the voltage-clamp recording configuration with the goal of identifying correlated inputs independently from how they are integrated by the postsynaptic neuron. Given that membrane potential doesn't change (and so the C dVm/dt term of the membrane equation is zero), integrate-and-fire and point conductance-based models both reduce to summation of input currents. We achieve this by convolving spike times with experimentally measured synaptic current waveforms. An assumption of our approach is that we achieve a reasonable space clamp. We believe this is justified given that stellate cells and PV interneurons are reasonably electrotonically compact, and that our analysis relies on consistent correlations rather than absolute amplitudes or time constants of the postsynaptic response and so should tolerate moderate space clamp errors.
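
      A minimal sketch of the approach described above, in which voltage-clamp input is modelled as a sum of synaptic current waveforms placed at presynaptic spike times, is given below. It is illustrative only: the double-exponential kernel stands in for the experimentally measured waveform, and the time constants, firing rates, pool sizes and all function names are assumptions made for this example. Two postsynaptic cells that share half of their presynaptic neurons show correlated summed currents, whereas cells with non-overlapping inputs do not.

```python
import math
import random

DT = 0.0005  # simulation time step in seconds (assumed)

def synaptic_kernel(tau_rise=0.0005, tau_decay=0.003, length=0.03):
    """Double-exponential current waveform, normalised to a peak of 1 (hypothetical shape)."""
    n = int(round(length / DT))
    k = [math.exp(-i * DT / tau_decay) - math.exp(-i * DT / tau_rise) for i in range(n)]
    peak = max(k)
    return [v / peak for v in k]

def summed_current(spike_trains, kernel, duration):
    """Convolve each presynaptic spike train with the kernel and sum the results."""
    n = int(round(duration / DT))
    current = [0.0] * n
    for train in spike_trains:
        for t in train:
            start = int(t / DT)
            for j, k in enumerate(kernel):
                if start + j < n:
                    current[start + j] += k
    return current

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
duration = 10.0
# Pool of 20 asynchronous presynaptic neurons, 40 spikes each (assumed values).
pool = [[rng.uniform(0.0, duration) for _ in range(40)] for _ in range(20)]
kernel = synaptic_kernel()

cell_1 = summed_current(pool[0:10], kernel, duration)
cell_2_shared = summed_current(pool[5:15], kernel, duration)        # 5 of 10 inputs in common
cell_2_independent = summed_current(pool[10:20], kernel, duration)  # no inputs in common

r_shared = pearson(cell_1, cell_2_shared)
r_independent = pearson(cell_1, cell_2_independent)
print("shared input r:", round(r_shared, 2))
print("independent input r:", round(r_independent, 2))
```

      Because the postsynaptic membrane potential is held fixed, no spiking or membrane integration step is needed in this sketch; only the summation of input currents is modelled.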

      Reviewer #3 (Public Review):

      This paper presents convincing data from technically demanding dual whole-cell patch recordings of stellate cells in medial entorhinal cortex slice preparations during optogenetic stimulation of PV+ interneurons. The authors show that the patterns of postsynaptic activation are consistent with dual recorded cells close to each other receiving shared inhibitory input and sending excitatory connections back to the same PV neurons, supporting a circuitry in which clusters of stellate cells and PV+IN interact with each other with much weaker interactions between clusters. These data are important to our understanding of the dynamics of functional cell responses in the entorhinal cortex. The experiments and analysis are quite complex and would benefit from some revisions to enhance clarity.

      These are technically demanding experiments, but the authors show quite convincing differences in the correlated response of cell pairs that are close to each other in contrast to an absence of correlation in other cell pairs at a range of relative distances. This supports their main point of demonstrating anatomical clusters of cells receiving shared inhibitory input.

      We appreciate the positive comments.

      The overall technique is complex and the presentation could be more clear about the techniques and analysis. In addition, due to this being a slice preparation they cannot directly relate the inhibitory interactions to the functional properties of grid cells which was possible in the 2-photon in vivo imaging experiment by Heys and Dombeck, 2014.

      We have modified the manuscript to try to improve the presentation (specific changes are detailed below). We agree that an important future challenge is to relate our findings to in vivo observations (p 11, para 2).

      Reviewer #1 (Recommendations For The Authors):

      Major points

      (1) The study largely relies on the fact that ramp-like wide-field optogenetic stimulation and focal optogenetic activation both drove asynchronous action potentials in SCs, and therefore, if a pair of PV+ INs exhibited correlated activity, they should receive common inputs. In Figure 2 and its supplementary figures, the authors also showed examples of asynchronized activity. However, it is unclear to me what criteria/thresholds were used to determine the level of activity asynchronization, and under these criteria, what percentage of cells actually showed synchronized or less asynchronized activity. A notable percentage of synchronized or less asynchronized SCs could complicate the results, i.e., PV+ INs with correlated activity could receive inputs from different SCs (different inputs), which had synchronized activity. Related to this concern, it would also be important to simulate what level of activity asynchronization in SCs could still lead to correlated PV+ IN activity above shuffle, and among the recorded SCs, what percentage of cells belong to this synchronized/less asynchronized category.

      We address this point in our response to the public review. In brief, we have added additional cross-correlograms showing that ramp activation of SC pairs does not cause detectable synchronous activation. We also clarify that sensitivity of correlations of some widely separated pairs to GABA-blockers is suggestive of SCs activating common inhibitory inputs to cell pairs.

      (2) The above concern is more relevant to the focal stimulation experiments, in which the authors tried to claim that a pair of PV+ INs with correlated activity could receive inputs from the same SCs neurons. The authors also showed that the stimulation patterns leading to the activation of PV+ INs were more similar if PV+ INs had correlated activity (Figure 5D). However, if nearby SCs were more synchronized than distal SCs within this stimulation scale, even though a pair of PV+ INs showed correlated activity, they could still receive inputs from different but nearby SCs. In this case, it would be helpful to quantify the relationship between the level of activity synchronization of SCs and their distances. In Figure 5 Supplementary Figure 1, the data were only provided for 8 cells. If feasible, collecting data from more cells would be needed for the proposed analysis.

      We explain in our responses to point 1 above and in the public review that direct synchronisation of SCs is unlikely. This is particularly unlikely for focal stimulation experiments as the timing of responses of individual SCs is extremely variable between trials. Thus, even if there were strong synaptic connections between SCs, which the evidence suggests there is not (Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016), then this would be unlikely to result in reliably timed coordinated firing.

      (3) It is unclear what the definition of "common inputs" is. Do they refer to inputs from the same group of cells? If different groups of cells provide synchronized inputs, will the inputs be considered "common inputs" or "different inputs"?

      We used "common" in an attempt to be consistent with classic work by Yoshimura et al. and in an attempt to be succinct. Thus, by common input we are referring to cell pairs for which a proportion of their input is from the same presynaptic neuron(s), as opposed to cell pairs for which their input is from different neurons and which therefore have no common input. We have attempted to make sure this is clear in the revised manuscript (e.g. description of simulations on p 4, para 2).

      (4) In the introduction and abstract, it was mentioned that "dense, but specific, direct excitatory-inhibitory synaptic interactions may operate at the scale of grid cell clusters". It is unclear to me how "dense" was demonstrated in the data. Can the authors clarify?

      Thanks for flagging this, we were insufficiently clear. We have revised the text to refer to cell pairs for which a proportion of their input is from the same presynaptic neurons (e.g. p 3, para 1), and separately about indirect coordination, by which we mean inputs to cell pairs that appear correlated because of coordination between upstream neurons.

      (5) The hypothesis about the "direct excitatory-inhibitory" synaptic interactions is made based on the GABAzine experiments in Figure 4. In the Figure 8 diagram, the direct interaction is illustrated between PV+ INs and SCs. Is there any evidence supporting this "direct interaction"?

      The direct interactions from SCs to PV+INs and from PV+INs to SCs were previously demonstrated by experiments with recordings from pairs of neurons (e.g. Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016; Winterer et al. 2017). Our results in Figures 3-5, which show that exciting SCs by light activation of ChR2 leads to excitation of PV+INs, and in Figure 7, which show that light activation of PV+INs expressing ChR2 leads to inhibition of SCs, are consistent with these previous conclusions. We have modified the manuscript to make sure this is clear (p 2, para 3).

      Is it possible that pyramidal cells are also involved in this interaction? If this is unlikely, the author may provide some pieces of evidence (e.g., timing of responses after optogenetic stimulation) or some discussions.

      This is unlikely given that previous studies indicate that connections from stellate to pyramidal cells are weak or absent (Winterer et al. 2017). We now clarify this in the Discussion (p 10, para 1).

      Minor points

      (1) Page 4, last paragraph: the author claimed that CCpeakmean was reduced and CClagvar increased with cell separation. Although the trends are visible in the figures, the author may provide appropriate statistics to support this statement, such as a correlation between cell separation and CCpeakmean and CClagvar.

      We have inserted summaries of linear model fits into the legends for Figure 3E-F, Figure 5F-H and Figure 7D.

      (2)  If I understood correctly, in the second last paragraph on page 6, "pairs of SCs" should be changed to "pairs of PV+ INs".

      Thanks. Corrected.

      (3)  Page 9: the 7th line to the end: where is Figure S4?

      Corrected to 'Figure 3, Figure Supplement 2'.

      (4)  Page 27: at the end of figure caption B: two ".

      Corrected.

      (5)  Figures 3A and B: what are the red vertical rectangles?

      These are the regions shown on an expanded time base in C and D. This is now clarified in the legend.

      (6)  Page 28 Figure caption of D and E: (C) and (D) should be (D) and (E).

      Corrected.

      (7)  The first sentence of the third paragraph in INTRODUCTION: 'later' should be 'layer'.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      - Some related work has been done by Beed et al. 2013 to map the spatial distribution of inputs to neurons in MEC. Certainly, there are differences in the approaches and the key questions, but the contribution of this study would benefit from a more detailed comparison of the results from Beed vs the current study and should be included in the discussion.

      It's hard to include a detailed comparison of results, at least without losing focus, as the two studies address different questions with different approaches. We already noted that 'Local optical activation of unidentified neurons has also been used to infer connectivity principles but with a focus on responses of single postsynaptic neurons (Beed et al., 2013, 2010)'. In addition, we now note that 'Our focal optogenetic stimulation approach also offers insight into the spatial organization of presynaptic neuronal populations, with the advantage, compared to focal glutamate uncaging previously used to investigate connectivity in the MEC (Beed et al., 2013, 2010), that the identity of the presynaptic cell population is genetically defined'.

      - There are a few places where the language is ambiguous or needs a more detailed description for clarity. • 3rd paragraph under "Focal activation of SCs generates common input to nearby PV+Ins". The correlation probability description in this paragraph and a similar sentence in the methods are very hard to understand. I had to look up the analysis in Yoshimura et al. 2005 to understand what was done here. It's a nice analysis, but the manuscript could benefit from a more detailed description of this measure in the methods.

      We agree, it is a somewhat complex metric and is challenging to explain. In the interests of keeping the main text succinct, we have left the bare bones explanation as it was in the Results, but have expanded the explanation in the Methods. We hope this is now clear.

      - " Alternatively, if there is no clear spatial organization of SC to PV+INs connections, then the similarity between stimulus locations for pairs of SCs should have a random distribution." This sentence is hard to understand. I think the use of the phrase "similarity of stimulus location" is a strange phrasing and is driving the confusion in this sentence.

      We have replaced this with 'correspondence between active stimulus locations'.

      - In the discussion under "Spatial extent and functional organization of L2 circuits" there is a grammatical mistake (seems to be 2x phrasing of "leads to common synaptic input").

      Corrected.

      - Citation in the introduction/discussion. Introduction: in addition to Gu et al. 2018, Heys et al 2014 also showed there are non-random correlations among putative grid cells as a function of their somatic distance. In the discussion section, in addition to Gu et al. 2018, Heys et al. 2014 showed there is anatomical clustering of grid cells in MEC. This earlier work investigating functional correlations among neurons in the superficial aspect of MEC in vivo should be cited and is particularly relevant in these two sections of the manuscript.

      Thanks, we apologise for the oversight. We're well aware of this important study and have now cited it.

      -Typo - Paragraph 3 of the intro; "later" should be layer.

      Corrected.

      -Figure 5 (D-E) there is a typo high correlation probability is D and low correlation is E (text says C/D).

      Corrected.

      Reviewer #3 (Recommendations For The Authors):

      The paper is missing the bibliography section. This makes the review somewhat difficult as some cited papers are not immediately familiar based on the citation.

      Thanks, and our apologies for making extra work by omitting this. It is now included.

      Page 2 - "cell clusters" - they should also cite the paper by Heys and Dombeck, 2014 that shows a spatial scale of inhibitory interactions computed based on correlations of grid cells recorded using 2-photon calcium imaging.

      Added (see above).

      Page 2 - "later 2 of the MEC" - layer.

      Corrected.

      Page 2 - "synaptic interactions" - again they should mention the work by Heys and Dombeck, 2014 that indirectly measured the spatial scale of inhibition.

      Now cited in this paragraph.

      Page 4 "we simulated responses" and Figure 2E - in each simulation - did they fit the magnitude and time constant of the simulated EPSCs to individual EPSCs in the data? Or did they randomly vary these to find the best fit?

      The parameters for the simulations are given in the Methods and were chosen to correspond to the experimental values. We have rewritten this section to make the simulation methods clearer. Simulations using different time constants within a physiological range support similar conclusions.

      Page 4 - "we identified 35/71" - Are these the cells that appear in yellow as correlated in Figures 3E-F? If so, the text should indicate that these cells are shown in yellow.

      We have added this and have also updated the legends for additional clarification.

      Figure 2, Figure Supplement 1 - B,C - the following phrase is not clear: "when the 4 / 8 of each neurons inputs from SCs also project to the other neuron (B)," Should the "the" be removed? Also, by 4/8 do they mean 50%, or do they mean 4 to 8?

      Thanks, we've reworded to improve the clarity.

      E - "receiving presynaptic inputs consisted of 4 overlapping SCs" - should it say "consisting"?

      Corrected.

      Figure 3, Figure Supplement 1 part E - "the same data as (C )" - should this be the same data as (D)?? I do not see how doing clustering on the shuffled data in (C ) would give two groups, but it makes sense if it is from (D).

      That's right, now corrected.

      Page 5 - "used action potentials" - this is confusing. Is the word "used" supposed to be there?

      Corrected.

      Page 5 - "widefield activation experiments" - they should cite the experiments that they are referring to here.

      Added.

      Page 5 - "effect of blocking" - "Figure 4" - I find it very odd that the agent GABAzine in Figure 4 is not explicitly mentioned in the main text (though it is mentioned in the methods). The main text should indicate that blocking was performed using GABAzine.

      Added.

      Page and page 14 and Figure 5 - "shifted" - do they mean shuffled?

      We do. The classic papers by Yoshimura et al. used shifted so we keep this here so it's clear we've used their approach. We've added additional explanation to try to make sure the meaning is clear.
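
      The "shifted" control mentioned above can be illustrated with a minimal sketch. It assumes, as is usual for a shift predictor, that the control pairs each trial of one cell with a different trial of the other cell, so that only stimulus-locked structure can contribute to the correlation. The synthetic traces, the function names and the use of a zero-lag Pearson correlation are illustrative assumptions, not the authors' analysis code or their correlation probability metric.

```python
import math
import random


def pearson(x, y):
    """Zero-lag Pearson correlation between two equal-length traces."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def make_trials(n_trials, n_samples, rng):
    """Synthetic paired recordings (hypothetical data, for illustration only).

    Each trace is the sum of a stimulus-locked bump (identical on every
    trial), a component shared by both cells on a given trial (standing in
    for common input), and private noise.
    """
    stimulus = [math.exp(-((t - 10) / 3.0) ** 2) for t in range(n_samples)]
    cell_a, cell_b = [], []
    for _ in range(n_trials):
        shared = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        cell_a.append([s + c + 0.5 * rng.gauss(0.0, 1.0)
                       for s, c in zip(stimulus, shared)])
        cell_b.append([s + c + 0.5 * rng.gauss(0.0, 1.0)
                       for s, c in zip(stimulus, shared)])
    return cell_a, cell_b


def mean_correlation(cell_a, cell_b, shift):
    """Average correlation of trial k of cell A with trial k+shift of cell B."""
    n = len(cell_a)
    return sum(pearson(cell_a[k], cell_b[(k + shift) % n])
               for k in range(n)) / n


rng = random.Random(0)
cell_a, cell_b = make_trials(n_trials=20, n_samples=50, rng=rng)
simultaneous = mean_correlation(cell_a, cell_b, shift=0)  # same trial
shifted = mean_correlation(cell_a, cell_b, shift=1)       # neighbouring trial
print(f"simultaneous: {simultaneous:.2f}, shifted: {shifted:.2f}")
```

      In this toy example the trial-specific shared component contributes only to the simultaneous correlation, so the difference between the two values isolates the correlation that is not explained by the stimulus alone.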

      Figure 5 A, B, D, and E would benefit from a more detailed description. They should state whether the labels "1a" and "1b" and "2a" and "2b" refer to different recorded neurons in each pair. They should indicate that 2a and 2b are a different pair? Are the x, y axes of the images corresponding to anatomical position? Does "B" indicate the location of recordings shown in Figure 5B? The authors probably think this is all obvious, but it is not immediately obvious to the reader.

      We have added additional clarification.

      Page 8 - "Beed et al." - These papers by Beed ought to be cited in the introduction as well as they are highly relevant.

      We now cite Beed et al. 2013 in the Introduction when we discuss local inhibitory input to SCs. While the Beed et al. 2010 paper is an important contribution to understanding about pathways from deep to superficial layers, the introduction focuses on communication between identified pre- and postsynaptic populations within layer 2 and therefore we haven't found a way to cite it without losing focus. We do cite this paper multiple times elsewhere.

      Page 10 - "Excitatory-inhibitory interactions" - this summary of attractor models ought to cite the paper by Burak and Fiete as well.

      The discussion focuses on models with excitatory-inhibitory connectivity and cites an important paper from the Fiete group. The model by Burak and Fiete, while also important, is purely inhibitory and so is not well constrained by the known circuitry, and therefore could not be correctly cited here.

      Page 10 - "be consistent with models…or that focus on pyramidal neurons have also been proposed" - this seems ungrammatical as if two different sentences were merged.

      Corrected.

      References

      Couey, Jonathan J, Aree Witoelar, Sheng-Jia Zhang, Kang Zheng, Jing Ye, Benjamin Dunn, Rafal Czajkowski, et al. 2013. “Recurrent Inhibitory Circuitry as a Mechanism for Grid Formation.” Nat. Neurosci. 16 (3): 318–24. https://doi.org/10.1038/nn.3310.

      Dudman, Joshua T, and Matthew F Nolan. 2009. “Stochastically Gating Ion Channels Enable Patterned Spike Firing through Activity-Dependent Modulation of Spike Probability.” Plos Comput. Biol. 5 (2): e1000290. https://doi.org/10.1371/journal.pcbi.1000290.

      Fuchs, Elke C, Angela Neitz, Roberta Pinna, Sarah Melzer, Antonio Caputi, and Hannah Monyer. 2016. “Local and Distant Input Controlling Excitation in Layer II of the Medial Entorhinal Cortex.” Neuron 89 (1): 194–208. https://doi.org/10.1016/j.neuron.2015.11.029.

      Pastoll, Hugh, Derek L Garden, Ioannis Papastathopoulos, Gülşen Sürmeli, and Matthew F Nolan. 2020. “Inter- and Intra-Animal Variation in the Integrative Properties of Stellate Cells in the Medial Entorhinal Cortex.” Elife 9 (February). https://doi.org/10.7554/eLife.52258.

      Pastoll, Hugh, Lukas Solanka, Mark C W van Rossum, and Matthew F Nolan. 2013. “Feedback Inhibition Enables Theta-Nested Gamma Oscillations and Grid Firing Fields.” Neuron 77 (1): 141–54. https://doi.org/10.1016/j.neuron.2012.11.032.

      Sürmeli, Gülşen, Daniel Cosmin Marcu, Christina McClure, Derek L F Garden, Hugh Pastoll, and Matthew F Nolan. 2015. “Molecularly Defined Circuitry Reveals Input-Output Segregation in Deep Layers of the Medial Entorhinal Cortex.” Neuron 88 (5): 1040–53. https://doi.org/10.1016/j.neuron.2015.10.041.

      Winterer, Jochen, Nikolaus Maier, Christian Wozny, Prateep Beed, Jörg Breustedt, Roberta Evangelista, Yangfan Peng, Tiziano D’Albis, Richard Kempter, and Dietmar Schmitz. 2017. “Excitatory Microcircuits within Superficial Layers of the Medial Entorhinal Cortex.” Cell Rep. 19 (6): 1110–16. https://doi.org/10.1016/j.celrep.2017.04.041.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      While there are many models for sequence retrieval, it has been difficult to find models that vary the speed of sequence retrieval dynamically via simple external inputs. While recent works [1,2] have proposed some mechanisms, the authors here propose a different one based on heterogeneous plasticity rules. Temporally symmetric plasticity kernels (that do not distinguish between the order of pre and post spikes, but only their time difference) are expected to give rise to attractor states, asymmetric ones to sequence transitions. The authors incorporate a rate-based, discrete-time analog of these spike-based plasticity rules to learn the connections between neurons (leading to connections similar to Hopfield networks for attractors and sequences). They use either a parametric combination of symmetric and asymmetric learning rules for connections into each neuron, or separate subpopulations having only symmetric or asymmetric learning rules on incoming connections. They find that the latter is conducive to enabling external inputs to control the speed of sequence retrieval.

      Strengths:

      The authors have expertly characterised the system dynamics using both simulations and theory. How the speed and quality of retrieval vary across phase space has been well-studied. The authors are also able to vary the external inputs to reproduce a preparatory followed by an execution phase of sequence retrieval as seen experimentally in motor control. They also propose a simple reinforcement learning scheme for learning to map the two external inputs to the desired retrieval speed.

      Weaknesses:

      (1) The authors translate spike-based synaptic plasticity rules to a way to learn/set connections for rate units operating in discrete time, similar to their earlier work in [5]. The bio-plausibility issues of learning in [5] carry over here, e.g., the authors ignore any input due to the recurrent connectivity during learning and effectively fix the pre and post rates to the desired ones. While the learning itself is not fully bio-plausible, it does lend itself to writing the final connectivity matrix in a manner that is easier to analyze theoretically.

      We agree with the reviewer that learning is not 'fully bio-plausible'. However, we believe that extending the results to a model in which synaptic plasticity depends on recurrent inputs is beyond the scope of this work. We have added a mention of this issue in the Discussion in the revised manuscript.
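
      The kind of connectivity under discussion can be sketched as follows. This is a minimal illustration assuming binary (+1/-1) patterns and a bilinear Hopfield-like rule in which pre- and post-synaptic rates are clamped to the stored patterns and recurrent input during learning is ignored. The function names, network size and the split into two equal subpopulations are hypothetical choices, not the parameters or learning rule of the manuscript.

```python
import random


def build_connectivity(patterns, n_asym):
    """Return an N x N weight matrix storing a sequence of patterns.

    Hypothetical layout: neurons with index < n_asym use a temporally
    asymmetric rule on their incoming connections (post-synaptic pattern
    mu+1 paired with pre-synaptic pattern mu); the remaining neurons use a
    temporally symmetric rule (pattern mu paired with pattern mu).
    """
    n_patterns = len(patterns)
    n_neurons = len(patterns[0])
    weights = [[0.0] * n_neurons for _ in range(n_neurons)]
    for i in range(n_neurons):
        shift = 1 if i < n_asym else 0
        for j in range(n_neurons):
            if i == j:
                continue  # no self-connections
            total = 0.0
            for mu in range(n_patterns - shift):
                total += patterns[mu + shift][i] * patterns[mu][j]
            weights[i][j] = total / n_neurons
    return weights


def sign_agreement(fields, target, indices):
    """Fraction of the given neurons whose input has the sign of the target."""
    hits = sum(1 for i in indices if fields[i] * target[i] > 0)
    return hits / len(indices)


random.seed(0)
N, P, n_asym = 200, 5, 100
patterns = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(P)]
J = build_connectivity(patterns, n_asym)

# Recurrent input to every neuron when the network sits in the first pattern.
fields = [sum(J[i][j] * patterns[0][j] for j in range(N)) for i in range(N)]

# Asymmetric neurons are pushed towards the next pattern in the sequence,
# symmetric neurons are held in the current pattern.
asym_match = sign_agreement(fields, patterns[1], range(n_asym))
sym_match = sign_agreement(fields, patterns[0], range(n_asym, N))
print(asym_match, sym_match)
```

      In this sketch the asymmetric subpopulation drives the transition to the next pattern while the symmetric subpopulation stabilises the current one, which is why the relative strength of external input to the two subpopulations could plausibly set the retrieval speed.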

      (2) While the authors learn to map the set of two external input strengths to speed of retrieval, they still hand-wire one external input to the subpopulation of neurons with temporally symmetric plasticity and the other external input to the other subpopulation with temporally asymmetric plasticity. The authors suggest that these subpopulations might arise due to differences in the parameters of Ca dynamics as in their earlier work [29]. How these two external inputs would connect to neurons differentially based on the plasticity kernel / Ca dynamics parameters of the recurrent connections is still an open question which the authors have not touched upon.

      The issue of how external inputs could self-organize to drive the network to retrieve sequences at appropriate speeds is addressed in the Results section, paragraph 'Reward-driven learning'. These inputs are not 'hand-wired': they are initially random and then acquire the necessary strengths to allow the network to retrieve the sequences at different speeds thanks to a simple reinforcement learning scheme. We have rewritten this section to clarify this issue.

      (3) The authors require that temporally symmetric and asymmetric learning rules be present in the recurrent connections between subpopulations of neurons in the same brain region, i.e. some neurons in the same brain region should have temporally symmetric kernels, while others should have temporally asymmetric ones. The evidence for this seems thin. Though, in the discussion, the authors clarify 'While this heterogeneity has been found so far across structures or across different regions in the same structure, this heterogeneity could also be present within local networks, as current experimental methods for probing plasticity only have access to a single delay between pre and post-synaptic spikes in each recorded neuron, and would therefore miss this heterogeneity'.

      We agree with the reviewer that this is currently an open question. We describe this issue in more detail in the Discussion of the revised manuscript.

      (4) An aspect which the authors have not connected to is one of the authors' earlier works:

      Brunel, N. (2016). Is cortical connectivity optimized for storing information? Nature Neuroscience, 19(5), 749-755. https://doi.org/10.1038/nn.4286 which suggests that the experimentally observed over-representation of symmetric synapses suggests that cortical networks are optimized for attractors rather than sequences.

      We thank the reviewer for this suggestion. We have added a paragraph in the discussion that discusses work on statistics of synaptic connectivity in optimal networks. We expect that in networks that contain two subpopulations of neurons, the degree of symmetry should be intermediate between a network storing fixed point attractors exclusively, and a network storing sequences exclusively.

      Despite the above weaknesses, the work is a solid advance in proposing an alternate model for modulating speed of sequence retrieval and extends the use of well-established theoretical tools. This work is expected to spawn further works like extending to a spiking neural network with Dale's law, more realistic learning taking into account recurrent connections during learning, and experimental follow-ups. Thus, I expect this to be an important contribution to the field.

      We thank the reviewer for the insightful comments.

      Reviewer #2 (Public Review):

      Sequences of neural activity underlie most of our behavior. And as experience suggests we are (in most cases) able to flexibly change the speed for our learned behavior which essentially means that brains are able to change the speed at which the sequence is retrieved from the memory. The authors here propose a mechanism by which networks in the brain can learn a sequence of spike patterns and retrieve them at variable speed. At a conceptual level I think the authors have a very nice idea: use of symmetric and asymmetric learning rules to learn the sequences and then use different inputs to neurons with symmetric or asymmetric plasticity to control the retrieval speed. The authors have demonstrated the feasibility of the idea in a rather idealized network model. I think it is important that the idea is demonstrated in more biologically plausible settings (e.g. spiking neurons, a network with exc. and inh. neurons with ongoing activity).

      Summary

      In this manuscript, the authors have addressed the problem of learning and retrieving sequential activity in neuronal networks. In particular, they have focussed on the problem of how sequence retrieval speed can be controlled.

      They have considered a model with excitatory rate-based neurons. Authors show that when sequences are learned with both temporally symmetric and asymmetric Hebbian plasticity, by modulating the external inputs to the network the sequence retrieval speed can be modulated. With the two types of Hebbian plasticity in the network, sequence learning essentially means that the network has both feedforward and recurrent connections related to the sequence. By giving different amounts of input to the feed-forward and recurrent components of the sequence, authors are able to adjust the speed.

      Strengths

      - Authors solve the problem of sequence retrieval speed control by learning the sequence in both feedforward and recurrent connectivity within a network. It is a very interesting idea for two main reasons: 1. It does not rely on delays or short-term dynamics in neurons/synapses 2. It does not require that the animal is presented with the same sequences multiple times at different speeds. Different inputs to the feedforward and recurrent populations are sufficient to alter the speed. However, the work leaves several issues unaddressed as explained below.

      Weaknesses

      - The main weakness of the paper is that it is mostly driven by a motivation to find a computational solution to the problem of sequence retrieval speed. In most cases they have not provided any arguments about the biological plausibility of the solution they have proposed e.g.:

      - Is there any experimental evidence that some neurons in the network have symmetric Hebbian plasticity and some temporally asymmetric? In the references authors have cited some references to support this. But usually the switch between temporally symmetric and asymmetric rules is dependent on spike patterns used for pairing (e.g. bursts vs single spikes). In the context of this manuscript, it would mean that in the same pattern, some neurons burst and some don't and this is the same for all the patterns in the sequence. As far as I see here authors have assumed a binary pattern of activity which is the same for all neurons that participate in the pattern.

      There is currently only weak evidence for heterogeneity of synaptic plasticity rules within a single network, though there is plenty of evidence for such heterogeneity across networks or across locations within a particular structure (see references in our Discussion). The reviewer suggests another interesting possibility, that the temporal asymmetry could depend on the firing pattern of the post-synaptic neuron. An example of such behavior can be found in a paper by Wittenberg and Wang in 2006, where they show that pairing single spikes of pre and post-synaptic neurons leads to LTD at all time differences in a symmetric fashion, while pairing a pre-synaptic spike with a burst of post-synaptic spikes leads to temporally asymmetric plasticity, with an LTP window at short positive time differences. We now mention this possibility in the Discussion, but we believe that fully exploring this scenario is beyond the scope of the paper.

      - How would external inputs know that they are impinging on a symmetric or asymmetric neuron? Authors have proposed a mechanism to learn these inputs. But that makes the sequence learning problem a two stage problem -- first an animal has to learn the sequence and then it has to learn to modulate the speed of retrieval. It should be possible to find experimental evidence to support this?

      Our model does not assume that the two processes necessarily occur one after the other. Importantly, once the correct external inputs that can modulate sequence retrieval are learned, sequence retrieval modulation will automatically generalize to arbitrary new sequences that are learned by the network.

      - Authors have only considered homogeneous DC input for sequence retrieval. This kind of input is highly unnatural. It would be more plausible if the authors considered fluctuating input which is different for each neuron.

      We have modified Figure 1e and Figure 2c to show the effects of fluctuating inputs on pattern correlations and single unit activity. We find that these inputs do not qualitatively affect our results.

      - All the work is demonstrated using a firing rate based model of only excitatory neurons. I think it is important that some of the key results are demonstrated in a network of both excitatory and inhibitory spiking neurons. As the authors very well know it is not always trivial to extend rate-based models to spiking neurons.

      I think at a conceptual level authors have a very nice idea but it needs to be demonstrated in a more biologically plausible setting (and by that I do not mean biophysical neurons etc.).

      We have included a new section in the discussion with an associated figure (Figure 7) demonstrating that flexible speed control can be achieved in an excitatory-inhibitory (E-I) spiking network containing two excitatory populations with distinct plasticity mechanisms.

      Reviewer #1 (Recommendations For The Authors):

      In the introduction, the authors state: 'symmetric kernels, in which coincident activity leads to strengthening regardless of the order of pre and post-synaptic spikes, have also been observed in multiple contexts with high frequency plasticity induction protocols in cortex [21]'. To my understanding, [21]'s final model 3, ignores LTD if the post-spike also participates in LTP, and only considers nearest-neighbour interactions. Thus, the kernel would not be symmetric. Can the authors clarify what they mean and how their conclusion follows, as [21] does not show any kernels either.

      In this statement, we were not referring to the model in [21], but rather the experimentally observed plasticity kernels at different frequencies. In particular, we were referring to the symmetric kernel that appears in the bottom panel of Figure 7c in that paper.

      The authors should also address the weaknesses mentioned above. They don't need to solve the issues but expand (and maybe indicate resolutions) on these issues in the Discussion.

      For ease of reproducibility, the authors should make their code available as well.

      We intend to publish the code required to reproduce all figures on Github.

      Reviewer #2 (Recommendations For The Authors):

      -  Show the ground state of the network before and after learning.

      We have decided not to include such a figure, as we have not analyzed the learning process, but instead a network with a fixed connectivity matrix which is assumed to be the end result of a learning process.

      -  Authors have only considered a network of excitatory neurons. This does not make sense. I think they should demonstrate a network of both exc. and inh. neurons (spiking neurons) exhibiting ongoing activity.

      See our comment to Reviewer #2 in the previous section.

      -  Show how the sequence dynamics unfolds when we assume a non-zero ongoing activity.

      We are not sure what the reviewer means by 'non-zero ongoing activity'. We now show the dynamics of the network in the presence of noisy inputs, which can represent ongoing activity from other structures (see Fig 1e and 2c).

      -  From the correlation (==quality) alone it is difficult to judge how well the sequence has been recovered. Authors should consider showing some examples so that the reader can get a visual estimate of what 0.6 quality may mean. High speed is not really associated with high quality (Fig 2b). So it is important to show how the sequence retrieval quality is for non-linear and heterogeneous learning rules.

      We believe that some insight into the relationship between speed and quality for the case of non-linear and heterogeneous learning rules is provided by the correlation plots for chosen input configurations (see Fig. 3a and 5b). We leave a full characterization for future work.

      -  Authors should show how the retrieval and quality of sequences change when they are recovered with positive input, or positive input to one population and negative to another. In the current version sequence retrieval is shown only with negative inputs. This is a somewhat non-biological setting. The inhibitory gating argument (L367-389) is really weak.

      We would like to clarify that with the parameters chosen in this paper, the transfer function has half its maximal rate at zero input. This is because we chose the threshold to be zero, using the fact that any threshold can be absorbed in the external inputs. Thus, negative inputs really mean sub-threshold inputs, and they are consistent with sub-threshold external excitatory inputs. We have clarified this issue in the revised manuscript.
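
      The point about absorbing the threshold into the external input can be checked with a short numerical sketch. It assumes a logistic sigmoid transfer function, which is an illustrative choice and may differ from the function used in the manuscript. The names and the values of `r_max`, `theta` and `sigma` are hypothetical.

```python
import math


def transfer(x, r_max=1.0, theta=0.0, sigma=0.1):
    """Sigmoidal rate as a function of total input x (assumed logistic form)."""
    return r_max / (1.0 + math.exp(-(x - theta) / sigma))


# With zero threshold, zero input gives half the maximal rate.
half_rate = transfer(0.0)

# A positive (excitatory) external input that stays below a positive threshold...
recurrent_input = 0.15
external_input = 0.2
threshold = 0.5
rate_with_threshold = transfer(recurrent_input + external_input, theta=threshold)

# ...is equivalent to a negative effective input in the zero-threshold model.
effective_input = external_input - threshold
rate_zero_threshold = transfer(recurrent_input + effective_input, theta=0.0)

print(half_rate, effective_input, rate_with_threshold, rate_zero_threshold)
```

      Under this assumption, a negative input in the zero-threshold formulation corresponds to a positive external drive that is smaller than the threshold, not to net inhibition.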

      -  Authors should demonstrate how the sequence retrieval dynamics is altered when they assume a fluctuating input current for sequence retrieval instead of a homogeneous DC input.

      See our comment to Reviewer #2 in the previous section.

      -  Authors should show what are the differences in synaptic weight distribution for the two types of learning (bi-linear and non-linear). I am curious to know if the difference in the speed in the two cases is related to the weight distribution. In general I think it is a good idea to show the synaptic weight distribution before and after learning.

      As mentioned above, we do not study any learning process, but rather a network with a fixed connectivity matrix, assumed to represent the end result of learning. In this network, the distribution of synaptic weights converges to a Gaussian in the large p and cN limits, independently of the functions f and g, because of the central limit theorem, if there are no sign constraints on weights. In the presence of sign constraints, the distribution is a truncated Gaussian.
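
      The central limit theorem argument can be illustrated numerically. The sketch below assumes Gaussian pattern values and step-like nonlinearities standing in for f and g. It also draws every weight from independent pattern values, which reproduces the marginal distribution of a single weight but not the correlations between weights in a real connectivity matrix. All names and parameter values are hypothetical.

```python
import random


def step(x, threshold=1.0):
    """Illustrative stand-in for the nonlinearities f and g (an assumption)."""
    return 1.0 if x > threshold else 0.0


def sample_weights(n_patterns, n_weights, rng):
    """Draw unnormalised weights, each a sum over patterns of f(post) * g(pre)."""
    weights = []
    for _ in range(n_weights):
        total = 0.0
        for _ in range(n_patterns):
            total += step(rng.gauss(0.0, 1.0)) * step(rng.gauss(0.0, 1.0))
        weights.append(total)
    return weights


def excess_kurtosis(values):
    """Fourth standardised moment minus 3 (zero for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


rng = random.Random(1)
kurt_few = excess_kurtosis(sample_weights(1, 3000, rng))     # one pattern
kurt_many = excess_kurtosis(sample_weights(200, 3000, rng))  # many patterns
print(f"excess kurtosis, 1 pattern: {kurt_few:.1f}; 200 patterns: {kurt_many:.2f}")
```

      With a single stored pattern the weight distribution is strongly non-Gaussian, whereas with many patterns the excess kurtosis approaches zero, as expected for a sum of many independent terms regardless of the specific nonlinearities.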

      -  I suggest the use of a monochromatic color scale for figure 2b and 3b.

      Figure 3: The sentence describing panel 2 seems incomplete.

      Also explain why there is a non-monotonic relationship between I_s and speed for some values of I_a in 3b.

      There is a non-monotonic relationship for retrieval quality, not speed. We have clarified this in the manuscript text, but don’t currently have an explanation for why this phenomenon occurs for these specific values of I_a.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Additional Discussion Points

      (1) There is not much exploration of potential mechanisms, i.e., the impact of PV neuron activity on the broader circuit. Additionally, the study exclusively focuses on PV cells and does not explore the role of other prefrontal populations, particularly those known to respond to cue-evoked fear states. The discussion should consider how PV activity might impact the broader circuit and whether the present findings are specific to PV cells or applicable to other interneuron subtypes.

      We have added an extensive discussion of potential mechanisms and the potential contributions of other interneuron subtypes:

      “For example, PV neurons aid in improving visual discrimination through sharpening response selectivity in visual cortex (Lee et al., 2012). In prefrontal cortex, PV neurons are critical for task performance, particularly during performance of tasks that require flexible behavior such as rule shift learning (Cho et al., 2020) and reward extinction (Sparta et al., 2014). Further, PV neurons play an essential role in the generation of cortical gamma rhythms, which contribute to synchronization of selective populations of pyramidal neurons (Sohal et al., 2009; Cardin et al., 2009). Courtin et al (2014) showed that brief suppression of dorsomedial prefrontal (dmPFC) PV neural activity enhanced fear expression, one of the main functions of the dmPFC, by synchronizing the spiking activity of dmPFC pyramidal neurons (Courtin et al., 2014). This result is potentially relevant to our findings, but likely involves different circuit mechanisms because of the difference in timescale, targeted area, and downstream projection targets (Vertes, 2004). These and other studies support the idea that PV neural activity supports the execution of a behavior by shaping rather than suppressing cortical activity, potentially by selecting among conflicting behaviors by the synchronization of different pyramidal populations (Warden et al., 2012; Lee et al., 2014).

      The roles of other inhibitory neural subtypes (such as somatostatin (SOM)-expressing and vasoactive intestinal peptide (VIP)-expressing IL GABA neurons) in avoidance behavior are currently unknown, but are likely important given the role of SOM neurons in gamma-band synchronization (Veit et al., 2017), and the role of VIP neurons in regulating PV and SOM neural activity (Cardin, 2018).” 

      (2) There is some discordance between changes in neural activity and behavior. For example, in Figure 4C, the relationship between PV neuron activity and movement emerges almost immediately during learning, but successful active avoidance emerges much more gradually. Why is this?

      We have added extensive text to the discussion that addresses this issue:

      “Interestingly, the rise in IL PV neural activity during movement does not require avoidance learning. IL PV neurons begin to respond during movement immediately after the animal has received a single shock in an environment, but learning to cross the chamber to avoid the signaled shock takes tens of trials. Why is there a discordance between the emergence of the IL PV signal during movement and avoidance learning?

      The components underlying active avoidance have been debated over the years, but are thought to involve at least two essential behaviors – suppressing freezing, and moving to safety (LeDoux et al., 2017). Freezing is the default response of mice upon hearing a shock-predicting tone, and can be learned in a single trial (LeDoux, 1996; Fanselow, 2010; Zambetti et al., 2022). When a predator is in the distance, freezing can increase the chance of survival by reducing the chances of detection. However, a strategic avoidance behavior may prevent a future encounter with the predator altogether. The importance of IL PV neural activity in defensive behavior may be to suppress reactive defensive behaviors such as freezing in order to permit a flexible goal-directed response to threat.

      The freezing suppression and avoidance movement components of the avoidance response are dissociable, both because freezing precedes avoidance learning, and because animals intermittently move prior to avoidance learning. Our finding that the rise in PV activity during movement emerges immediately after receiving a single shock, tens of trials before animals have learned the avoidance behavior, suggests that the IL PV signal is associated with the suppression of freezing. Further, IL PV neurons do not respond during movement toward cued rewards because in reward-based tasks there is no freezing response in conflict with reward approach behavior.” 

      (3) vmPFC was defined here as including the infralimbic (IL) and dorsal peduncular (DP) regions. While the role of IL has been frequently characterized for motivated behavior, relatively few studies have examined DP. Perhaps the authors are just being cautious, given the challenges involved in the viral targeting of the IL region without leakage to nearby regions such as DP. But since the optical fibers were positioned above the IL region, it is possible that DP did not contribute much to either the fiber photometry signals or the effects of the optogenetic manipulations. Perhaps DP should be completely omitted, which is more consistent with the definitions of vmPFC in the field.

      Yes, we included DP to be cautious as our viral expression sometimes leaks into DP, though the optic fiber targets IL. We have replaced vmPFC with IL throughout the manuscript. 

      (4) In the Discussion, the authors should consider why PV cells exhibit increased activity during both movement initiation and successful chamber crossing during avoidance. While the functional contribution of the PV signal during movement initiation was tested with optogenetic inhibition, some discussion on the possible role of the additional PV signal during chamber crossing is of interest to readers who are intrigued by the signaling of two events. Is the chamber crossing signal related to successful avoidance or learned safety (e.g., see Sangha, Diehl, Bergstrom, Drew 2020)?

      IL PV neural activity starts to increase at movement initiation, peaks at chamber crossing (when movement speed is highest), and decreases after chamber crossing (Figure 1E). Thus, the increase in PV neural activity at movement initiation and at chamber crossing are different phases of the same event. 

      We think this signal is unlikely to be a safety signal, and have added text to the discussion to clarify this issue:

      “We think the IL PV signal is unlikely to be a safety signal (Sangha et al., 2020). First, the PV signal rises during movement not only in the avoidance context, but during any movement in a “threatening” context (i.e. a context where the animal has been shocked). For example, PV neural activity rises during movement during the intertrial interval in the avoidance task. Further, the emergence of the PV signal during movement happens quickly – after the first shock – and significantly before the animal has learned to move to the safe zone. This suggests a close association with enabling movement in a threatening environment, when animals must suppress a freezing response in order to move. Additionally, the rise in PV activity was specifically associated with movement and not with tone offset, the indicator of safety in this task. Finally, if IL PV neural activity reflects safety signals one would expect the response to be enhanced by learning, but the amplitude of the IL PV response was unaffected by learning after the first shock.”

      (5) The primary conclusion here that PV cells control the fear response should be considered within the context of prior findings by the Herry laboratory. Courtin et al (2014) demonstrated a select role of prefrontal PV cells in the regulation of fear states, accomplished through their control over prefrontal output to the basolateral amygdala. The observations in this paper, which used both ChR2 and Arch-T to address the impact of vmPFC PV activity on reactive behavior, are highly relevant to issues raised both in the Introduction and Discussion.

      Courtin et al (2014)’s finding is very important. We did not discuss this paper originally because Courtin et al. is about dmPFC, which has a different role in fear processing than IL/vmPFC. We have added text about this finding to the discussion:

      “Courtin et al (2014) showed that brief suppression of dorsomedial prefrontal (dmPFC) PV neural activity enhanced fear expression, one of the main functions of the dmPFC, by synchronizing the spiking activity of dmPFC pyramidal neurons (Courtin et al., 2014). This result is potentially relevant to our findings, but likely involves different circuit mechanisms because of the difference in timescale, targeted area, and downstream projection targets (Vertes, 2004).”

      Additional analyses

      (1) As avoidance trials progress (particularly on days 2 and 3), do PFC PV responses attenuate? That is, do continued unreinforced tone presentations lead to reduced reliance on PV cell-mediated suppression in order for successful avoidance to occur?

      We added Figure 1—Figure supplement 1M and 1N and a sentence on page 5: “IL PV neural activity during the avoidance movement was not attenuated by learning or repeated reinforcement (Figure 1—Figure supplement 1M and N, N = 8 mice, p = 0.8886, 1-way ANOVA).” We only included data from days 1 and 2, since we started to introduce short and long tone trials on day 3 which might interfere. 

      (2) In Figure 3D, it would be very informative and further support the claim of "no role for movement during reward" if the response of these cells during the "initiation of movement during reward-approach" was shown (similar to Figure 1F for threat avoidance).

      Thank you for the question. We added Figure 3—Figure supplement 1B and C to show IL PV neural activity aligned to initiation of movement during reward-approach. IL PV activity decreased after movement initiation for reward approach (N = 6 mice, p=0.0382, paired t-test). This further solidifies our claim that IL PV neuron activity only increases for threat avoidance.   

      Reviewer 1 (Recommendations For The Authors):

      (1) Fig1G shows the average response of PV cells during chamber crossing on an animal-to-animal basis. It would be informative to also see a similar plot for movement initiation.

      We have added the suggested figure in Figure 1—Figure supplement 1B.  

      (2) In the Results section (Page 5), there is a small issue with the logic. It says: "As vmPFC inactivation impairs avoidance behavior, the activity of inhibitory vmPFC PV neurons might be predicted to be low during successful avoidance trials." As opposed to "low", it should say "high", right? If inhibition impairs avoidance, then high responding by these cells would be presumed to drive the avoidance response, as supported by your findings.

      We have re-worded the text in this section. Based on prior findings that IL inactivation impairs avoidance (Moscarello et al., 2013), we predicted that inhibitory PV neurons would be less active during avoidance, because activating these neurons could suppress IL. However, we found that they were selectively active during avoidance.

      (3) In the caption/legend for Fig1E, it says that the "black ticks" indicate "tone onset". But it should say "movement initiation".

      We thank the reviewer for pointing out this error. The ticks do indicate tone onset, and we have corrected the figure to reflect this. 

      Reviewer 2 (Recommendations For The Authors):

      (4) Perhaps replace the term 'good outcomes' with 'reinforcing outcomes' or simply 'reinforcement'.

      Thank you for the suggestion. We have replaced ‘good outcomes’ with ‘reinforcing outcomes’.

      Reviewer 3 (Recommendations For The Authors):

      (5) It would be useful to provide some (perhaps speculative) explanation for the discordance between the PV activity-movement relationship and success of active avoidance in Fig. 4C

      We have added text to the discussion that addresses this issue:

      “Interestingly, the rise in IL PV neural activity during movement does not require avoidance learning. IL PV neurons begin to respond during movement immediately after the animal has received a single shock in an environment, but learning to cross the chamber to avoid the signaled shock takes tens of trials. Why is there a discordance between the emergence of the IL PV signal during movement and avoidance learning?

      The components underlying active avoidance have been debated over the years, but are thought to involve at least two essential behaviors – suppressing freezing, and moving to safety (LeDoux et al., 2017). Freezing is the default response of mice upon hearing a shock-predicting tone, and can be learned in a single trial (LeDoux, 1996; Fanselow, 2010; Zambetti et al., 2022). When a predator is in the distance, freezing can increase the chance of survival by reducing the chances of detection. However, a strategic avoidance behavior may prevent a future encounter with the predator altogether. The importance of IL PV neural activity in defensive behavior may be to suppress reactive defensive behaviors such as freezing in order to permit a flexible goal-directed response to threat.

      The freezing suppression and avoidance movement components of the avoidance response are dissociable, both because freezing precedes avoidance learning, and because animals intermittently move prior to avoidance learning. Our finding that the rise in PV activity during movement emerges immediately after receiving a single shock, tens of trials before animals have learned the avoidance behavior, suggests that the IL PV signal is associated with the suppression of freezing. Further, IL PV neurons do not respond during movement toward cued rewards because in reward-based tasks there is no freezing response in conflict with reward approach behavior.” 

      (6) I don't really understand what is shown in Figure 4D -- exactly what time points does this represent? Was habituation performed everyday?

      Figure 4D shows data from the approach task, not the avoidance task. These data are from well-trained mice, not the first day of training on this task. There was a pre-task recording period every day.

      (7) Why was optogenetic inhibition only delivered from 0.5-2.5 sec after the tone cue?

      We wanted to avoid any possibility that perception of the tone would be disrupted, so we delayed the onset of optogenetic inhibition. We chose 0.5 sec onset because animals typically begin to move ~1 second after tone onset.

      (8) The regression analysis with shuffled time points is not well explained -- some additional methodological details are needed (Fig. 2H).

      We added the following to the methods section to provide a clearer explanation: 

      “DF/F (t) was modeled as the linear combination of all event kernels. Given the occurrence time points of all event types, we can use linear regression to decompose characteristic kernels for each event type. Kernel coefficients of the model were solved by minimizing the mean squared error between the model and the actual recorded signals. To test whether kernel ki is an essential component of the raw calcium dynamics, we compared the explanatory power of the full model to that of a reduced model in which the time points of the occurrence of event ki were randomly assigned. Thus, the kernel coefficients should not reflect the response to the event in the reduced model.”
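      The comparison described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' analysis code: the function names, the intercept term, the kernel length, and the use of R² as the measure of explanatory power are all assumptions made for the example.

```python
import numpy as np

def lagged_design(event_times, n_samples, kernel_len):
    """Design matrix with one column per kernel lag.

    X[t, j] counts events that occurred j samples before sample t, so that
    X @ kernel is the summed, time-shifted response to every event.
    """
    X = np.zeros((n_samples, kernel_len))
    for t0 in event_times:
        for lag in range(kernel_len):
            if 0 <= t0 + lag < n_samples:
                X[t0 + lag, lag] += 1.0
    return X

def fit_kernels(events, signal, kernel_len):
    """Least-squares fit of one kernel per event type (plus an intercept).

    `events` maps an event name to an array of sample indices (hypothetical
    layout). Returns the fitted kernels and the R^2 of the model.
    """
    n = len(signal)
    blocks = [lagged_design(times, n, kernel_len) for times in events.values()]
    X = np.hstack(blocks + [np.ones((n, 1))])  # intercept is an assumption
    coef, *_ = np.linalg.lstsq(X, signal, rcond=None)
    resid = signal - X @ coef
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((signal - signal.mean()) ** 2)
    kernels = {
        name: coef[i * kernel_len:(i + 1) * kernel_len]
        for i, name in enumerate(events)
    }
    return kernels, r2

def reduced_model_r2(events, signal, kernel_len, shuffled_event, rng):
    """R^2 of the reduced model: one event type gets random time points."""
    n = len(signal)
    reduced = dict(events)
    reduced[shuffled_event] = rng.integers(
        0, n - kernel_len, size=len(events[shuffled_event])
    )
    _, r2 = fit_kernels(reduced, signal, kernel_len)
    return r2
```

      If the full model explains clearly more variance than the reduced model across repeated shuffles, the shuffled event type carries information about the signal that the other events do not.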

      Editor's notes:

      -  Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      Thank you for pointing this out. We have included all the test statistics and exact p values as suggested.

      -  Please note the sex of the mice and distribution of sexes in each group for each experiment.

      We have added the sex of mice for all experiments in the methods section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work successfully identified and validated TRLs in hepatic metastatic uveal melanoma, providing new horizons for enhanced immunotherapy. Uveal melanoma is a highly metastatic cancer that, unlike cutaneous melanoma, has a limited effect on immune checkpoint responses, and thus there is a lack of formal clinical treatment for metastatic UM. In this manuscript, the authors described the immune microenvironmental profile of hepatic metastatic uveal melanoma by sc-RNAseq, TCR-seq, and PDX models. Firstly, they identified and defined the phenotypes of tumor-reactive T lymphocytes (TRLs). Moreover, they validated the activity of TILs by in vivo PDX modelling as well as in vitro coculture of 3D tumorsphere cultures and autologous TILs. Additionally, the authors found that TRLs are mainly derived from depleted and late activated T cells, which recognize melanoma antigens and tumor-specific antigens. Most importantly, they identified TRLs associated phenotypes, which provide new avenues for targeting expanded T cells to improve cellular and immune checkpoint immunotherapy.

      Strengths:

      Jonas A. Nilsson et al. have been working on new therapies for melanoma. The team has also previously performed the most comprehensive genome-wide analysis of uveal melanoma available, presenting the latest insights into metastatic disease. In this work, the authors performed paired sc-RNAseq and TCR-seq on 14 patients with metastatic UM, which is the largest single-cell map of metastatic UM available. This provides huge data support for other studies of metastatic UM.

      We thank the reviewer for these kind words about our work.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are that these strengths are not directly demonstrated. That is, insufficient analyses are performed to fully support the key claims in the manuscript by the data presented. In particular:

      The author's description of the overall results of the article should be logical, not just a description of the observed phenomena. For example, the presentation related to the results of TRLs lacked logic. In addition, the title of the article emphasizes the three subtypes of hepatic metastatic UM  TRLs, but these three subtypes are not specifically discussed in the results as well as the discussion section. The title of the article is not a very comprehensive generalization and should be carefully considered by the authors.

      We thank the reviewer for the critical reading of our work. We have added more data and more discussion.

      The authors' claim that they are the first to use autologous TILs and sc-RNAseq to study immunotherapy needs to be supported by the corresponding literature to be more convincing. This can help the reader to understand the innovation and importance of the methodology.

      We have gone through the manuscript and found that we only refer to being first in using PDX models and autologous TILs to study immunotherapy responses by single-cell sequencing. While there are data to be deduced from other studies, we still believe this to be an accurate statement.

      In addition, the authors argue that TILs from metastatic UM can kill tumor cells. This is the key and bridging point to the main conclusion of the article. Therefore, the credibility of this conclusion should be considered. Metastatic UM1 and UM9 remain responsive to autologous tumors under in vitro conditions with their autologous TILs.

      UM1 responds also in vivo in the subcutaneous model in the paper. We have also finished an experiment where we show that this model also responds in a liver metastasis model. These data have been added in this revised version of the paper. We add two main figures and one supplementary figure where we characterize the response in vivo and also by single-cell sequencing of TILs.

      In contrast, UM22, also as a metastatic UM, did not respond to TIL treatment. In particular, the presence of MART1-responsive TILs. The reliability of the results obtained by the authors in the model of only one case of UM22 liver metastasis should be considered. The authors should likewise consider whether such a specific cellular taxon might also exist in other patients with metastatic UM, producing an immune response to tumor cells. The results would be more comprehensive if supported by relevant data.

      The reviewer has interpreted the results correctly: the allogenic and autologous MART1-specific TILs, while reactive in vitro against UM22, cannot kill this tumor in either a subcutaneous or a liver metastasis model. We hypothesize that this has to do with an immune exclusion phenotype, and we show weak immunohistochemistry that suggests this. We hope the addition of more UM1 data can be viewed as supportive of tumor reactivity in vivo as well.

      In addition, the authors in that study used previously frozen biopsy samples for TCR-seq, which may be associated with low-quality sequencing data, high risk of outcome indicators, and unfriendly access to immune cell information. The existence of these problems and the reliability of the results should be considered. If special processing of TCR-seq data from frozen samples was performed, this should also be accounted for.  

      We agree with the reviewers and acknowledge that we never anticipated the development of single-cell sequencing techniques when we started the biobank in 2013. We performed dead cell removal before the 10x Genomics experiment. We have also done extensive quality controls and believe that the data from the biopsies should be viewed as a whole and that quantitative intra-patient comparisons cannot be done.

      Reviewer #2 (Public Review):  

      Summary:  

      The study's goal is to characterize and validate tumor-reactive T cells in liver metastases of uveal melanoma (UM), which could contribute to enhancing immunotherapy for these patients. The authors used single-cell RNA and TCR sequencing to find potential tumor-reactive T cells and then used patient-derived xenograft (PDX) models and tumor sphere cultures for functional analysis. They discovered that tumor-reactive T cells exist in activated/exhausted T cell subsets and in cytotoxic effector cells. Functional experiments with isolated TILs show that they are capable of killing UM cells in vivo and ex vivo.

      Strengths:  

      The study highlights the potential of using single-cell sequencing and functional analysis to identify T cells that can be useful for cell therapy and marker selection in UM treatment. This is important and novel as conventional immune checkpoint therapies are not highly effective in treating UM. Additionally, the study's strength lies in its validation of findings through functional assays, which underscores the clinical relevance of the research. 

      We thank the reviewer for these kind words about our work.

      Weaknesses:  

      The manuscript may pose challenges for individuals with limited knowledge of single-cell analysis and immunology markers, making it less accessible to a broader audience.

      The first draft of the manuscript (excluding methods) was written by a person (J.A.N) who is not a bioinformatician. It has been corrected to include the correct nomenclature where applicable but overall it is written with the aim to be understandable. We have made an additional effort in this version. 

      Reviewer #1 (Recommendations For The Authors):  

      (1) Firstly, the authors should provide high-resolution pictures to ensure readability for readers. 

      We have converted to pdf ourselves and that improved resolution. We are happy to provide high-resolution to the office if needed for the printing.

      (2) Furthermore, some parts of the article are more colloquial, and the authors should consider the logic and academic nature of the overall writing of the article. For example, authors should double-check whether the relevant expressions in the results are correct. For example, 'TCR' in the fourth part of the results should be 'TRLs'.

      We thank the reviewer for the recommendations and have gone through the manuscript.

      (3) Moreover, UM22 is described several times in the results as a metastatic UM and should be clearly defined in the methodology.

      The UM22 and UM1 samples are described in-depth in Karlsson et al., Nature Communications, 2020, a paper that is cited in the beginning of Results as part of the narrative. The current work can be viewed as an extension of that work.

      (4) Finally, it is recommended that authors describe a part of the results in full before citing the corresponding picture, otherwise, it will lead to confusion among readers.

      We have made an effort in the revised version to describe the new data in more detail.

      Reviewer #2 (Recommendations For The Authors):  

      The manuscript is very interesting and important to understanding key aspects of uveal melanoma immune profile and functionality. However, in my opinion, there are a few aspects that could be addressed.  

      - The manuscript lacks comprehensive details about the samples used, such as their disease progression, response to treatment, or any relevant information that could shed light on potential differences between samples. It would be valuable to know whether these samples were collected before any systemic treatment or if any of the patients underwent immunotherapy post-sample collection, along with the outcomes of such treatments. Providing this information would enrich the manuscript and provide a more holistic view of the research.

      We thank the reviewer for the recommendation and have included a new Supplementary table 7 with information about the samples. We have also pasted in individual samples’ contribution to the UMAP to add further holistic view.  

      - The results presented and discussed in the manuscript seem to indicate that there were no significant differences across the various samples, including comparisons between lymph-node and liver metastases. However, this lack of variation or the reasons for not discussing any observed differences should be clarified. If there are distinctions between the samples, it would be beneficial to discuss these findings in the manuscript.

      We thank the reviewer for the recommendation. Whereas 14 samples are many for a uveal melanoma study, the study is not really powered to do intra-patient comparisons.

      - The manuscript may pose difficulties for individuals with limited knowledge of single-cell analysis and immunology markers, potentially limiting its accessibility. To make the research more inclusive, the authors might consider presenting the technical aspects of their work in a less descriptive manner and providing explanations for those less familiar with the technology. This would help a broader audience grasp the significance of the study's findings. 

      The manuscript is from a multidisciplinary team where all have read and commented. The draft was written by a tumor biologist and edited by a bioinformatician for accuracy. We honestly think it is more understandable than most studies in this bioinformatics era. But we have tried to describe the new data in an easier way.

    1. if we fail to control our numbers and our appetites well then yes our society will start to to crash in a similar way to that of 00:35:32 easter island only on a worldwide scale and that means the whole industrial civilization will break down and 00:35:45 our descendants will essentially be uh savages to use that term very advisably and savages in the sense that they will have lost 00:35:58 the fruits of civilization and hate us

      for - progress trap - dark futures scenario - like Easter Island but on a global scale

      comment - The potential global breakdown of global industrialized society, rupturing supply chains so that our highly interdependent world becomes the very Achilles Heel that hastens its demise is chilling - It could mean a huge disruption to the most important aspect of civilization - the continuing accruing and inter-generational transmission of knowledge - It would be catastrophic to lose that, but it is entirely possible - As Wright himself famously said, to use a computer metaphor, we humans are like 50,000 year old hardware, running modern software - By that, he meant that our cognitive physiology (brain and sensory processing system) has not changed for tens of thousands of years, yet cultural evolution happens at exponentially faster rates, so much so that our biological systems are not adapted to keep up with the pace, and that spells disaster - When we no longer have the sensory or cognitive apparatus to sense danger, and we are offloading that to AI, we are in an extremely vulnerable situation

      progress trap - Gedanken - Think of our ancestors from 50,000 years ago. - What Wright is saying with his metaphor is that if a child from 50,000 years ago were transported by a time machine to modernity, (s)he would have little problem integrating into modern society - LIKEWISE, if we lose all the fruits of knowledge accumulated over so many thousands of years, it would be like being born into a human tribe 50,000 years ago. - We would likely still have language, but all our technology may have to start from scratch!

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      The authors provide solid data on a functional investigation of potential nucleoid-associated proteins and the modulation of chromosomal conformation in a model cyanobacterium. While the experiments presented are convincing, the manuscript could benefit from restructuring towards the precise findings; alternatively, additional data buttressing the claims made would significantly enhance the study. These valuable findings will be of interest to the chromosome and microbiology fields.

      We thank the editors for taking the time to assess our work and the reviewers for their critical suggestions. Both reviewers were concerned about our interpretation of the 3C data, and Reviewer #2 suggested biochemical analysis of cyAbrB2 to reinforce our claim. We agree with the concern and suggest that the editors add the sentence “How cyAbrB2 affects chromosome structure is still elusive from this study, and the biochemical assays are needed in the future experiment.” to the eLife assessment.

      The major revision points are the following:

      Reconstruction of Figures

      Previous Figure 5E has been omitted

      Additional 3C data on the nifJ region

      Rephrasing the conclusion of 3C data

      Additional discussion on cyAbrB2 and NAPs

      Reviewer #1 (Public Review): 

      Strength: 

      At first glance, I had a very positive impression of the overall manuscript. The experiments were well done, the data presentation looks very structured, and the text reads well in principle.

      Weakness: 

      Having a closer look, the red line of the manuscript is somewhat blurry. Reading the abstract, the introduction, and parts of the discussion, it is not really clear what the authors exactly aim to target. Is it the regulation of fermentation in cyanobacteria because it is under-investigated? Is it to bring light to the transcriptional regulation of hydrogenase genes? The regulation by SigE? Or is it to get insight into the real function of cyAbrB2 in cyanobacteria? All of this would be good of course. But it appears that the authors try to integrate all these aspects, which in the end is a little bit counterintuitive and in some places even confusing. From my point of view, the major story is a functional investigation of the presumable transcriptional regulator cyAbrB2, which turned out to be a potential NAP. To demonstrate/prove this, the hox genes have been chosen as an example due to the fact that a regulatory role of cyAbrB2 has already been described. In my eyes, it would be good to restructure or streamline the introduction according to this major outcome. 

      As you pointed out, the major focus of this study is cyAbrB2 as a potential NAP. To focus on NAPs, we simplified the first paragraph of the discussion (ll.246-263) and added a section comparing cyAbrB2 with other known NAPs (ll.269-299). To emphasize the description of cyAbrB2, we also rearranged the figures and divided the analysis of the cyAbrB2 ChIP into two figures. We shortened the first paragraph of the introduction but mostly preserved the composition of the introduction to keep the general-to-specific pattern, even though the manuscript is blurry.

      Points to consider: 

      The authors suggest that the microoxic condition is the reason for the downregulation of e.g. photosynthesis (l.112-114). But of course, they also switched off the light to achieve a microoxic environment, which presumably is the trigger signal for photosynthesis-related genes. I suggest avoiding making causal conclusions exclusively related to oxygen and recommend rephrasing (for example, "were downregulated under the conditions applied").

      We agree with this point. We rephrased l.114 to “by the transition to dark microoxic conditions from light aerobic conditions” (ll.108-109).

      The authors hypothesized that cyAbrB2 modulates chromosomal conformation and conducted a 3C analysis. But if I read the data in Figure 5B & C correctly, there is a lot of interaction in a range of 1650 and 1700 kb, not only at marked positions c and j. Positions c and j have been picked because it appears that cyAbrB2 deletion impacts this particular interaction. But is it really significant? In the case of position j the variation between the replicates seems quite high, in the case of position c the mean difference is not that high. Moreover, does all this correlate with cyAbrB2 binding, i.e. with positions of gray bars in panel A? If this was the case, the data obtained for the cyabrB2 mutant should look totally different but they are quite similar to WT. That's why the sentence "By contrast, the interaction frequency in Δcyabrb2 mutant were low and unchanged in the aerobic and microoxic conditions" does not fit to the data shown. But I have to mention that I am not an expert in these kinds of assays. Nevertheless, if there is a biological function that shall be revealed by an experiment, the data must be crystal clear on that. At least the descriptions of the 3C data and the corresponding conclusions need to be improved. For me, it is hard to follow the authors' thoughts in this context. 

      Following your suggestion, we have carefully re-examined the 3C data. Furthermore, we conducted an additional 3C experiment on the nifJ region (Figures 7F-J). We acknowledge that we had overinterpreted the 3C data. Therefore, we rewrote the results and discussion of the 3C assay in line with the data (ll.220-245) and removed the previous Figure 5E. Our individual responses follow.

      Positions c and j have been picked because it appears that cyAbrB2 deletion impacts this particular interaction. But is it really significant?

      We could not find statistically significant differences at loci c and j. Therefore, we added the following to the Results section: “Note that the interaction scores exhibit considerable variability and we could not detect statistical significance at those loci.” (ll.231-232)

      does all this correlate with cyAbrB2 binding, i.e. with positions of gray bars in panel A?

      As you suspected, interaction frequency and cyAbrB2 binding do not correlate. Therefore, we withdrew the previous claim and now state the following: “Moreover, our 3C data did not support bridging at least in hox region and nifJ region, as the high interaction locus and cyAbrB2 binding region did not seem to correlate (Figure 7).” (ll.280-282)

      If this was the case, the data obtained for the cyabrB2 mutant should look totally different but they are quite similar to WT.

      We rewrote it as follows; “Then we compared the chromatin conformation of wildtype and cyabrb2∆. Although overall shapes of graphs did not differ, some differences were observed in wildtype and cyabrb2∆ (Figures 7B and 7G); interaction of locus (c) with hox region were slightly lower in cyabrb2∆ and interaction of loci (f’) and (g’) with nifJ region were different in wildtype and cyabrb2∆. Note that the interaction scores exhibit considerable variability and we could not detect statistical significance at those loci.” (ll.228-232)

      That's why the sentence "By contrast, the interaction frequency in Δcyabrb2 mutant were low and unchanged in the aerobic and microoxic conditions" does not fit to the data shown.

      We rewrote the sentence as follows: “While the interaction scores exhibit considerable variability, the individual data over time demonstrate declining trends of the wildtype at locus (c) and (j) (Figure S8). In ∆cyabrb2, by contrast, the interaction frequency of loci (c) and (j) was unchanged in the aerobic and microoxic conditions (Figure 7E). The interaction frequency of locus (c) in ∆cyabrb2 was as low as that in the microoxic condition of wildtype, while that of locus (j) in ∆cyabrb2 was as high as that in the aerobic condition of wildtype (Figures 7B and 7C).” (ll.238-243)

      The figures are nicely prepared, albeit quite complex and in some cases not really supportive of the understanding of the results description. Moreover, they show a rather loose organization that sometimes does not fit the common thread of the results section. For example, Figure 1D is not mentioned in the paragraph that refers to several other panels of the same figure (see lines 110-128). Panel 1D is mentioned later in the discussion. Does 1D really fit into Figure 1 then? Are all the panels indeed required to be shown in the main document? As some elements are only briefly mentioned, the authors might also consider moving some into the supplement (e.g. left part of Figure 1C, Figure 2A, Figure 3B ...) or at least try to distribute some panels into more figures. This would reduce complexity and increase comprehensibility for future readers. Also, Figure 3 is way too complex. Panel G could be a stand-alone figure. The latter would also allow for an increase in font sizes or to show ChIP data of both conditions (L+O2 and D-O2) separately. Moreover, a figure legend typically introduces the content as a whole by one phrase but here only the different panels are described, which fits the impression that all the different panels are not well connected. Of course, it is the decision of the authors what to present and how but may they consider restructuring and simplifying.

      According to the advice, we have rearranged the Figure composition.

      The left side of Figure 1C has been moved to the supplement. Instead, representative expression fold changes of “Transient”, “Plateau”, “Continuous”, and “Late” genes are shown for comprehensibility. We left Figure 1D in Figure 1, as this diagram shows our motivation for focusing on hox and nifJ. We moved Figure 2A to the supplement. We did not move Figure 3B, as this figure shows the distribution of cyAbrB2 (“long tract of AT-rich DNA”) comprehensively and simply. We agree that Figure 3 was too complex. Therefore, we moved Figures 3F and 3G to a new independent figure (Figure 4). In Figure 4C (former 3G), we show the ChIP data of the L+O2 condition only, and the change of ChIP data under the D-O2 condition is shown in Figure 5. The schematic image showing the cyanobacterial chromosome and NAPs (previous Figure 5E) was omitted because it overinterpreted the data.

      The authors assume a physiological significance of transient upregulation of e.g. hox genes under microoxic conditions. But does the hydrogenase indeed produce hydrogen under the conditions investigated and is this even required? Moreover, the authors use the term "fermentative gene". But is hydrogen indeed a fermentation product, i.e. are protons the terminal electron acceptor to achieve catabolic electron balance? Then huge amounts of hydrogen should be released. Comment should be made on this.

      This is a very important point. Yes, hydrogenase indeed produces hydrogen under the conditions we investigated, and protons accept the majority of the reducing power under the dark microoxic condition. We wrote the following in the introduction: “Hydrogen is generated in quantities comparable to lactate and dicarboxylic acids as the result of electron acceptance in the dark microoxic condition (Akiyama and Osanai 2023; Iijima et al. 2016)” (ll.54-55). A detailed explanation is given below, although it is omitted from the manuscript.

      A recent study (Akiyama and Osanai 2023) quantified the consumed glycogen and secreted fermentative products (hydrogen, lactate, dicarboxylic acid, and acetate) in Synechocystis under the dark microoxic condition, the same condition we investigated. The system in that study consisted of a 10 mL liquid layer and a 10 mL gas layer, cultivated for 3 days under dark microoxic conditions. The amounts of lactic acid, dicarboxylic acid, and hydrogen were approximately 2 µmol, 3.5 µmol, and 11 µmol, respectively (assuming the gas layer was at 1 atm and ignoring hydrogen dissolved in the aqueous phase). Meanwhile, glycogen equivalent to 15 µmol of glucose was consumed in the system. This estimate supports the view that hydrogen accounts for a substantial portion of the fermentative products under dark microoxic conditions.
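
      The arithmetic behind this estimate can be sketched as follows. This is only an illustration that treats the approximate values quoted above as exact; the function and variable names are hypothetical, acetate is left out because no amount is quoted for it, and a molar share is not the same as a full electron balance.

```python
# Approximate amounts quoted in the response (µmol per 10 mL culture plus
# 10 mL gas phase after 3 days); treating them as exact is an assumption.
products_umol = {
    "lactate": 2.0,
    "dicarboxylic acids": 3.5,
    "hydrogen": 11.0,
}
glucose_consumed_umol = 15.0  # glycogen consumed, as glucose equivalents


def molar_shares(amounts):
    """Return each product's fraction of the summed molar amount."""
    total = sum(amounts.values())
    return {name: value / total for name, value in amounts.items()}


shares = molar_shares(products_umol)
h2_per_glucose = products_umol["hydrogen"] / glucose_consumed_umol

for name, share in shares.items():
    print(f"{name}: {share:.1%} of the listed products")
print(f"H2 per glucose equivalent consumed: {h2_per_glucose:.2f} mol/mol")
```

      On these numbers, hydrogen makes up about two thirds of the three listed products on a molar basis, which is consistent with the statement that it is a substantial fermentative product.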

      The necessity of hydrogen production under dark microoxic conditions was demonstrated by Gutekunst et al. (2014), who showed that hydrogenase activity is required for mixotrophic growth in a light-dark and microoxic cycle with arginine. The necessity remains unclear in our conditions because we only tested continuous dark microoxic conditions without glucose.

      The authors also mention a reverse TCA cycle. But is its existence an assumption or indeed active in cyanobacteria, i.e. is it experimentally proven? The authors are a little bit vague in this regard (see lines 241-246).

      We misused the terminology. We meant to refer to the “reductive branch of TCA”. Cyanobacteria operate a branched TCA cycle under microoxic conditions. One of the branches is the reductive branch, which reduces oxaloacetate to produce malate. We corrected “reverse TCA cycle” to “reductive branch of TCA”. (Figure 1D and ll.260-262)

      Reviewer #2 (Public Review): 

      This work probes the control of the hox operon in the cyanobacterium Synechocystis, where this operon directs the synthesis of a bidirectional hydrogenase that functions to produce hydrogen. In assessing the control of the hox system, the authors focused on the relative contributions of cyAbrB2, alongside SigE (and to a lesser extent, SigA and cyAbrB1) under both aerobic and microoxic conditions. In mapping the binding sites of these different proteins, they discovered that cyAbrB2 bound many sites throughout the chromosome, repressed many of its target genes, and preferentially bound regions that were (relatively) rich in AT-residues. These characteristics led the authors to consider that cyAbrB2 may function as a nucleoid-associated protein (NAP) in Synechocystis, given its functional similarities with other NAPs like H-NS. They assessed the local chromosome conformation in both wild-type and cyabrB2 mutant strains at multiple sites within a 40 kb window on either side of the hox locus, using a region within the hox operon as bait. They concluded that cyAbrB2 functions as a nucleoid-associated protein that influences the activity of SigE through its modulation of chromosome architecture.

      The authors approached their experiments carefully, and the data were generally very clearly presented and described.

      Based on the data presented, the authors make a strong case for cyAbrB2 as a nucleoid-associated protein, given the multiple ways in which it seems to function similarly to the well-studied Escherichia coli H-NS protein. It would be helpful to provide some additional commentary within the discussion around the similarities and differences of cyAbrB2 to other nucleoid-associated proteins, and possible mechanisms of cyAbrB2 control (post-translational modification; protein-protein interactions; etc.). The manuscript would also be strengthened with the inclusion of biochemical experiments probing the binding of cyAbrB2, particularly focusing on its oligomerization and DNA polymerization/bridging potential.

      We agree with the comment that biochemical experiments would deepen our insights into cyAbrB2 and chromatin conformation. As the reviewer pointed out, biochemical assays would provide valuable information on the mechanisms of cyAbrB2 control, such as post-translational modification, cooperation with cyAbrB1, oligomerization, and the structure of cyAbrB2-bound DNA. However, we think those potential findings are worthy of a new, independent research paper rather than a part of this paper. Therefore, we added a discussion mentioning biochemistry as future work (ll.275-290; the section “The biochemistry of cyAbrB2 will shed light on the regulation of chromatin conformation in the future”).

      Previous work had revealed a role for SigE in the control of hox cluster expression, which nicely justified its inclusion (and focus) in this study. However, the results of the SigA studies here suggested that SigA both strongly associated with the hox promoter, and its binding sites were shared more frequently than SigE with cyAbrB2. The focus on cyAbrB2 is also well-justified, given previous reports of its control of hox expression; however, it shares binding sites with an essential homologue cyAbrB1. Interestingly, while the B1 protein appears to bind similar sites, instead of repressing hox expression, it is known as an activator of this operon. It seems important to consider how cyAbrB1 activity might influence the results described here.

      We infer that the minor side of the bimodal SigE peak is the genuine population that contributes to hox transcription, as hox genes are expressed in a SigE-dependent manner (Figure S2). We consider that the strong SigA peak upstream of the hox operon reflects binding to the promoter of TU1715, which is transcribed in the opposite direction to the hox operon. We added a description of the single SigA peak and the bimodal SigE peak near the TSS of the hox operon as follows:

      “A bimodal peak of SigE was observed at the TSS of the hox operon in a microoxic-specific manner (Figure 6C bottom panel). The downstream side of the bimodal SigE peak coincides with SigA peak and the TSS of TU1715. Another side of the bimodal peak lacked SigA binding and was located at the TSS of the hox operon (marked with an arrow in Figure 6C), although the peak caller failed to recognize it as a peak.” (ll.206-209)

      The point that cyAbrB1 binds similar sites to cyAbrB2, despite regulating hox expression in the opposite direction, is very interesting. Therefore, we referred to the transcriptome data of the cyAbrB1 knockdown strain and compared the impact of cyAbrB1 knockdown and cyAbrB2 deletion. We describe this in the Results and Discussion as follows:

      “we referred to the recent study performing transcriptome of cyAbrB1 knockdown strain, whose cyAbrB1 protein amount drops by half (Hishida et al. 2024). Among 24 genes induced by cyAbrB1 knockdown, 12 genes are differentially downregulated genes in cyabrb2∆ in our study (Figure S5D).” (ll.162-165)

      “CyAbrB1, the homolog of cyAbrB2, may cooperatively work, as cyAbrB1 directly interacts with cyAbrB2 (Yamauchi et al. 2011), their distribution is similar, and they partially share their target genes for suppression (Figures 3A S5C and S5D). The possibility of cooperation would be examined by the electrophoretic mobility shift assay of cyAbrB1 and cyAbrB2 as a complex. Despite their similar repressive function, cyAbrB1 and cyAbrB2 regulate hox expression in the opposite directions, and their mechanism remains elusive.” (ll.292-296)

      The hox operon differs from this general tendency. To see whether cyAbrB1 behaves differently from cyAbrB2 at the hox operon, we performed an additional ChIP-qPCR experiment on cyAbrB1 in the aerobic condition and the dark microoxic condition (Figure 5C). However, we could not detect any difference.

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1B: I recommend changing the header in the grey bar to terms like "upregulated" and "downregulated", which are also used in the legend description. Upregulation of genes can also be a result of de-repression, which is why the term "activated" is somewhat misleading.

      Corrected.

      Lines 114-116: It is unclear what the authors exactly mean here. Please clarify. 

      We rephrased the sentence as follows: “The enrichment in the butanoate metabolism pathway indicates the upregulation of genes involved in carbohydrate metabolism. We further classified genes according to their expression dynamics.” (ll.110-111)

      Reviewer #3 (Recommendations For The Authors): 

      Major/experimental comments: 

      (1) For the chromosome conformation capture experiments, it is indicated that these were conducted at aerobic (1hr) and microoxic (4 hr) conditions. But the data presented in Figure 1 suggest that 1 hr corresponds to the beginning of microoxic growth, and that time 0 is aerobic. The composite 3C data in Figure 5 show some interesting but specific differences. It is appreciated that the authors presented the profiles for individual samples in Figure S7, and the differences here do not seem to be as compelling. Are the major differences being highlighted significantly (statistically) different (e.g. at the (c) and (j) loci)? Might the differences be starker if an earlier aerobic condition (e.g. time 0) had been used instead of the 1 hr - microoxic - timepoint?

      The previous Figure 5 consisted of three time points (solid line: aerobic condition; dashed line: 1 hr of microoxic condition; dotted line: 4 hr of microoxic condition). We omitted the 4 hr data from the main figure (Figure 7) because they complicate the figure. All three time points are shown in the profiles of individual loci (Figure S8).

      No statistically significant difference was found at loci (c) and (j) by t-test. Therefore, we have toned down the interpretation of the 3C data as follows: “Our 3C result demonstrated that cyAbrB2 influences the chromosomal conformation of hox and nifJ region to some extent (Figure 7).” (ll.325-326)
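
      For readers unfamiliar with this kind of comparison, a two-sample t statistic for replicate interaction scores at one locus can be sketched as below. The response does not state which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the replicate values are hypothetical placeholders rather than data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical interaction scores at one locus, three replicates per strain.
wildtype = [0.9, 1.4, 0.6]
mutant = [0.7, 0.5, 0.8]

t_stat, dof = welch_t(wildtype, mutant)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

      Converting the statistic into a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. With few replicates and high variability, as described for these loci, such a test has little power.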

      (2) This is a complicated system that involves multiple regulatory proteins, each of which is differentially affected by the growth conditions (aerobic/microoxic). It is obviously beyond the scope of this work to probe deeply into all of these proteins. The focus here was on cyAbrB2, and to a slightly lesser extent SigE; however, based on the data presented, it seems that SigA and cyAbrB1 may be equally important contributors to hox control/expression, and in the case of cyAbrB1, possibly also to chromosome conformation. cyAbrB1 appears to have the same binding sites as cyAbrB2, and has been reported to interact with cyAbrB2. Given this association, it is possible that the two proteins may affect the binding of each other, and that loss of one might lead to enhanced binding by the other (or binding may require heterooligomerization?). Probing the regulatory interplay between these two proteins (or at least discussing it) feels important. Conducting e.g. mobility shift assays with each protein, both individually and together, could possibly allow for some understanding of how they function together. 

      We agree that the biochemistry of cyAbrB2 and cyAbrB1 may explain why cyAbrB1 and cyAbrB2 bind long tracts of AT-rich genome regions in vitro. We would like to leave the biochemistry as future work, as we think biochemical data are beyond the scope of the present study.

      The idea that cyAbrB1 and cyAbrB2 cooperate to form heterooligomers and bind broadly to the genome is a very rational and interesting prediction. We added this idea to the discussion: “Overall, the biochemistry integrating assay conditions (PTM, buffer condition, and cooperation with cyAbrB1) and output (DNA binding, oligomerization, and DNA structure) will deepen the understanding of cyAbrB2 as cyanobacterial NAPs.” (ll.287-290). We also compared our transcriptome of ∆cyabrb2 with the recent study of cyabrb1 knockdown (ll.162-165), and concluded that “they partially share their target genes for suppression (Figures 3A S5C and S5D)” (l.293).

      (3) Throughout the manuscript, there is reference made to cyAbrB2 binding becoming 'blurry' or non-specific under microoxic conditions. It is not clear what this means. It appears that when cyAbrB2 binds, any given protected region can be quite extensive, which can be suggestive of polymerization along the chromosome. Are the boundaries for binding sites typically clearly delineated, and this changes when the cultures are growing under microoxic conditions? There is also no mention made anywhere about oligomerization potential for cyAbrB2, which would be important for the polymerization, and bridging suggested for cyAbrB2 in the model presented in Figure 5. Previous publications (Song et al., 2022; Ishi et al., 2008) have suggested that it can exist as a dimer in vivo, but that in vitro it is largely monomeric. The manuscript would benefit from some additional biochemical analyses of cyAbrB2 binding activity, with a particular focus on DNA binding and oligomerization/bridging potential, and some additional discussion about these characteristics as well. 

      Throughout the manuscript, there is reference made to cyAbrB2 binding becoming 'blurry' or non-specific under microoxic conditions. It is not clear what this means.

      To describe clearly what “cyAbrB2 binding becomes blurry” means, we rearranged the figure composition and created a dedicated figure (Figure 5). We also rephrased the description by adopting the reviewer’s phrase “boundaries for binding sites”, as it describes the change well: “When cells entered microoxic conditions, the boundaries of the cyAbrB2 binding region and cyAbrB2-free region became obscure (Figure 5),” (ll.319-320)

      There is also no mention made anywhere about oligomerization potential for cyAbrB2,

      We added a discussion about oligomerization: “DNA-bound cyAbrB2 is expected to oligomerize, based on the long tract of cyAbrB2 binding region in our ChIP-seq data. However, no biochemical data mentioned the DNA deforming function or oligomerization of cyAbrB2 in the previous studies and preference for AT-rich DNA is not fully demonstrated in vitro (Dutheil et al. 2012; Ishii and Hihara 2008; Song et al. 2022)” (ll.277-280) and “Overall, the biochemistry integrating assay conditions (PTM, buffer condition, and cooperation with cyAbrB1) and output (DNA binding, oligomerization, and DNA structure) will deepen the understanding of cyAbrB2 as cyanobacterial NAPs.” (ll.287-290)

      The manuscript would benefit from some additional biochemical analyses of cyAbrB2 binding activity, with a particular focus on DNA binding and oligomerization/bridging potential, and some additional discussion about these characteristics as well. 

      We added a discussion that integrates the known features of cyAbrB2, our novel findings on cyAbrB2, and a comparison with known NAPs (ll.269-290).

      (4) Given that the major take-away for the authors (based on the title) seems to be the nucleoid-associated protein potential for cyAbrB2, the Discussion would benefit from some additional focus in this area. How similar is cyAbrB2 to other nucleoid-associated proteins (e.g. H-NS, Lsr2)? How does counter-silencing work for other nucleoid-associated proteins? Can the authors definitively exclude the possibility of binding site competition/occlusion, given that cyAbrB2 covers the promoter region of hox? What other nucleoid-associated proteins have been characterized in the cyanobacteria?

      We agree with this point, so we added a discussion comparing cyAbrB2 with H-NS and Lsr2, the canonical NAPs (ll.269-290).

      We did not deny the possibility that cyAbrB2 excludes RNAP, but the previous manuscript did not discuss it sufficiently. To emphasize that cyAbrB2 excludes RNA polymerase, we simplified Figure 6 and employed mosaic plots showing the anti-co-occurrence of cyAbrB2 binding regions and SigE peaks. Furthermore, we added a discussion about SigE exclusion by cyAbrB2 (ll.355-359).

      We mention the possibility of other nucleoid-associated proteins in cyanobacteria in the discussion. “Furthermore, the conformational changes by deletion of cyAbrB2 were limited, suggesting there are potential NAPs in cyanobacteria yet to be characterized.” (ll.336-339)

      (5) Previous work (Song et al., 2022) showed that changing the AT content of cyAbrB2 binding sites did not affect its ability to bind DNA. There are also previous papers suggesting that cyAbrB2 may be subject to diverse post-translational modifications (e.g. phosphorylation - Spat et al., 2023; glutathionylation - Sakr et al., 2013), as well as association with cyAbrB1. These collectively suggest there may be other factors that contribute to cyAbrB2 binding specificity/activity. These seem like relevant points to discuss, particularly given the transient nature of the cyAbrB2 effects on some genes.

      We have included a discussion of AT content, post-translational modifications and transient regulation, and association with cyAbrB1 (ll.284-295).

      (6) Given the major binding site for SigA upstream of the hox operon, it seems that it likely also contributes to hox cluster expression, together with SigE. Is there a sense for the relative contribution of each sigma factor to hox cluster expression? And whether both are subject to the same inhibitory effect of cyAbrB2? 

      As described above in our response to the public review, the SigA binding site upstream of the hox operon should be assigned to the TSS of TU1715 (Figure 6C). Transcription of the hox operon is highly dependent on SigE, as shown in Figure S2, and residual transcription in the sigE∆ strain is derived from other sigma factors (SigABCD). Estimating the relative contribution of sigma factors other than SigE is difficult at present because SigABCDE can partially compensate for each other.

      Because H-NS has been shown to affect the primary and alternative sigma factors differently (Shin et al. 2005), whether the primary sigma factor (SigA) and the alternative sigma factor (SigE) are inhibited by cyAbrB2 to the same extent is a very interesting question.

      We calculated the odds ratios of SigE and SigA being in the cyAbrB2-free region and wrote the following in the Results: “SigE preferred the cyAbrB2-free region in the aerobic condition more than SigA did (Odds ratios of SigE and SigA being in the cyAbrB2-free region were 4.88 and 2.74, respectively).” (ll.193-195). We also added to the Discussion: “The higher exclusion pressure of cyAbrB2 on SigE may contribute to sharpening the transcriptional response of hox and nifJ on entry to microoxic conditions.” (ll.357-359)
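
      An odds ratio of this kind is typically derived from a 2x2 contingency table. The sketch below assumes that genomic units (for example transcriptional units or bins) are classified by whether they carry a sigma-factor peak and whether they lie in a cyAbrB2-free region; the response does not spell out the exact table construction, so that layout, the function name, and the counts are all hypothetical.

```python
def odds_ratio(peak_free, peak_bound, nopeak_free, nopeak_bound):
    """Odds of lying in the cyAbrB2-free region for units with a peak,
    divided by the same odds for units without a peak."""
    cells = (peak_free, peak_bound, nopeak_free, nopeak_bound)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    return (peak_free / peak_bound) / (nopeak_free / nopeak_bound)


# Hypothetical counts, not taken from the study.
example = odds_ratio(peak_free=30, peak_bound=10, nopeak_free=20, nopeak_bound=40)
print(f"odds ratio = {example:.2f}")
```

      A value above 1 means that peaks are enriched in cyAbrB2-free regions. Under this reading, the reported values of 4.88 for SigE and 2.74 for SigA indicate that both sigma factors favor cyAbrB2-free regions, SigE more strongly.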

      (7) The 3C experiments suggest there are indeed changes in chromosome architecture in the hox region as growth conditions change and when different regulators are present. Across the chromosome, analogous changes are expected; however, it may be premature to draw this conclusion based on changes at one locus. Is there a reason that the authors did not take full advantage of their 3C samples and sequence them, to capture the full chromosome interactome at the two time-points? This would allow broader conclusions to be drawn regarding changes in chromosome structure and the impact of cyAbrB2.

      In response to the suggestion, we performed an additional 3C assay on the nifJ region using the remaining 3C samples. Extending the analysis to genome-wide sequencing (Hi-C) requires enrichment of ligated fragments by biotinylation, a step that was omitted from our 3C samples.

      We rewrote the result as obtained from the 3C data of hox and nifJ (ll.220-245) and omitted the schematic image of an entire chromosome of cyanobacteria (previous Figure 5E).

      Editorial comments: 

      (1) The data presentation in Figure 1 is very effective. 

      (2) Line 87: please rephrase - you can have 'high similarity' or 'high levels of identity', but not high levels of homology - genes/proteins are either homologous or not.

      (3) Line 118: classified into four 'groups'? 

      (4) Line 590: remove 'the'. 

      (5) Figure 2S, panel B: please define acronyms in the legend (GT, IP) and write out 'FLAG' in full for AbrB1.

      (2) to (5) have been corrected.

      (6) Please provide information on or a reference for the tagging of SigA for use in the ChIP-seq experiments within the Materials and Methods.

      Added (l.365)

      (7) Line 648: space between 'binding' and 'regions'. 

      corrected.

      (8) Fig 4E: please make the solid lines thicker - they are currently difficult to see.

      We have made Figure 6C (former 4E) larger and the line thicker.

      (9) Line 666: location. 

      (10) Line 673: Individual. 

      (11) Figure S5, panel C graph title: should this be 'Relative'? 

      (12) Figure S7: What is 'GT'? Should this be 'WT'? 

      (9) to (12) have been corrected.

      (13) In addition to the data presented in Figure 3G, it would be nice to have a small table or Venn diagram summarizing the number of cyAbrB2 binding sites that fall into the different categories (full gene/operon; downstream of a gene; within a gene; promoter region). 

      In response to the comment, we noticed the categories we had applied (full gene/operon; downstream of a gene; within a gene; promoter region) were arbitrary. Therefore, we categorized transcriptional units (TUs) according to the extent of occupancy by cyAbrB2. (Figures 4B and 4C)

      (14) Line 280-281: suggest replacing 'mediates' with 'influences'. 'Mediates' sounds like a direct interaction (for which the evidence is not currently strong without some additional biochemical data), but 'influences' could better accommodate both direct and indirect possibilities. 

      (15) Line 410: it is not clear what this means. 

      We have omitted “As a result, DNA ~600-fold condensed DNA than 3C samples were ligated.”, as it does not give any information about the experimental procedure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript builds upon the authors' previous work on the cross-talk between transcription initiation and post-transcriptional events in yeast gene expression. These prior studies identified an mRNA 'imprinting' phenomenon linked to genes activated by the Rap1 transcription factor (TF), a surprising role for the Sfp1 TF in promoting RNA polymerase II (RNAPII) backtracking, and a role for the non-essential RNAPII subunits Rpb4/7 in the regulation of mRNA decay and translation. Here the authors aimed to extend these observations to provide a more coherent picture of the role of Sfp1 in transcription initiation and subsequent steps in gene expression. They provide evidence for (1) a physical interaction between Sfp1 and Rpb4, (2) Sfp1 binding and stabilization of mRNAs derived from genes whose promoters are bound by both Rap1 and Sfp1 and (3) an effect of Sfp1 on Rpb4 binding or conformation during transcription elongation. 

      Strengths: 

      This study provides evidence that a TF (yeast Sfp1), in addition to stimulating transcription initiation, can at some target genes interact with their mRNA transcripts and promote their stability. Sfp1 thus has a positive effect on two distinct regulatory steps. Furthermore, evidence is presented indicating that strong Sfp1 mRNA association requires both Rap1 and Sfp1 promoter binding and is increased at a sequence motif near the polyA track of many target mRNAs. Finally, they provide compelling evidence that Sfp1-bound mRNAs have higher levels of RNAPII backtracking and altered Rpb4 association or conformation compared to those not bound by Sfp1. 

      Weaknesses: 

      The Sfp1-Rpb4 association is supported only by a two-hybrid assay that is poorly described and lacks an important control. Furthermore, there is no evidence that this interaction is direct, nor are the interaction domains on either protein identified (or mutated to address function). 

      Indeed, our two-hybrid, immunoprecipitation and imaging results do not allow us to conclusively discern whether the interaction between Rpb4 and Sfp1 is direct or indirect. While the interaction holds significance, we consider the direct versus indirect distinction to be of secondary importance in the context of this paper. In the current text, we indicate that 'our two hybrid, immunoprecipitation and imaging results do not differentiate between a direct or indirect interactions' (see page 6, sentences highlighted in blue).

      The contention that Sfp1 nuclear export to the cytoplasm is transcription-dependent is not well supported by the experiments shown, which are not properly described in the text and are not accompanied by any primary data. 

      This section has been re-written for better clarity (see page 7). We note that this assay was originally developed and published by Lee, M. S., M. Henry, and P. A. Silver in their 1996 paper in G&D and has since been reported in numerous subsequent studies. Reassuringly, our conclusion is bolstered by the observation that Sfp1 binds to Pol II transcripts co-transcriptionally, suggesting that Sfp1 is exported in the context of the mRNA.

      The presence of Sfp1 in P-bodies is of unclear relevance and the authors do not ask whether Sfp1-bound mRNAs are also present in these condensates. 

      P-bodies consist of both RNA and proteins (reviewed in doi: 10.1021/acs.biochem.7b01162). The significance of this experiment lies in its contribution to further confirming the co-localization of Sfp1 with mRNAs and Rpb4. This observation could also yield valuable insights for future investigations into the role of Sfp1.

      Further analysis of Sfp1-bound mRNAs would be of interest, particularly to address the question of whether those from ribosomal protein genes and other growth-related genes that are known to display Sfp1 binding in their promoters are regulated (either stabilized or destabilized) by Sfp1. 

      Fig. 4A, C and D show that RP mRNAs become destabilized in sfp1Δ cells.

      The authors need to discuss, and ideally address, the apparent paradox that their previous findings showed that Rap1 acts to destabilize its downstream transcripts, i.e. that it has the opposite effect of Sfp1 shown here. 

      We would like to thank Reviewer 1 for this valuable comment. In the revised paper, we discuss our hypothesis that Rap1 is likely responsible for regulating the imprinting of other proteins, such as Rpb4, that in turn lead to the destabilization of mRNAs. See the blue paragraph on page 20.

      Finally, recent studies indicate that the drugs used here to measure mRNA stability induce a strong stress response accompanied by rapid and complex effects on transcription. Their relevance to mRNA stability in unstressed cells is questionable. 

      Half-lives were determined mainly by GRO analysis of optimally proliferating cells. This method does not require any drug or stressful treatment. The results obtained by this method were consistent with those obtained after thiolutin addition. Using both methods, we discovered that disruption of Sfp1 results in substantial mRNA destabilization. Nevertheless, in our revised manuscript, we show results obtained by subjecting cells to a temperature shift to 42°C, a natural method to inhibit transcription. This approach to determine half-lives has been previously reported in our publications, such as Lotan et al. (2005, 2007) and Goler Baron et al. (2008). This may rule out effects of the drug on half-lives. Indeed, this assay determines half-lives under heat stress. Thus, it can clearly demonstrate that, at least during heat shock, Sfp1 stabilizes mRNAs. Since the results are similar to those obtained by the GRO method at 30°C, we concluded that Sfp1 stabilizes mRNA under optimal and hot conditions.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Kelbert et al. presents results on the involvement of the yeast transcription factor Sfp1 in the stabilisation of transcripts whose synthesis it stimulates. Sfp1 is known to affect the synthesis of a number of important cellular transcripts, such as many of those that code for ribosomal proteins. The hypothesis that a transcription factor can remain bound to the nascent transcript and affect its cytoplasmic half-life is attractive, but the methods used to demonstrate the half-life effects and the association of Sfp1 with cytoplasmic transcripts remain to be fully validated, as explained in my comments on the results below: 

      Comments on methodology and results: 

      (1) A two-hybrid-based assay for protein-protein interactions identified Sfp1, a transcription factor known for its effects on ribosomal protein gene expression, as interacting with Rpb4, a subunit of RNA polymerase II. Classical two-hybrid experiments depend on the presence of the tested proteins in the nucleus of yeast cells, suggesting that the observed interaction occurs in the nucleus. Unfortunately, the two-hybrid method cannot determine whether the interaction is direct or mediated by nucleic acids. 

      Indeed, our two hybrid, immunoprecipitation and imaging results do not allow us to conclusively discern whether the interaction between Rpb4 and Sfp1 is direct or indirect. While the interaction holds significance, we consider the direct versus indirect distinction to be of secondary importance in the context of this paper. In the current text we indicated that 'our two hybrid, immunoprecipitation and imaging results do not differentiate between a direct or indirect interactions' (see page 6)

      (2) Inactivation of nup49, a component of the nuclear pore complex, resulted in the redistribution of GFP-Sfp1 into the cytoplasm at the temperature non-permissive for the nup49-313 strain, suggesting that GFP-Sfp1 is a nucleo-cytoplasmic shuttling protein. This observation confirmed the dynamic nature of the nucleo-cytoplasmic distribution of Sfp1. For example, a similar redistribution to the cytoplasm was previously reported following rapamycin treatment and under starvation (Marion et al., PNAS 2004). In conjunction with the observation of an interaction with Rpb4, the authors observed slower nuclear import kinetics for GFP-Sfp1 in the absence of Rpb4 when cells were transferred to a glucose-containing medium after a period of starvation. Since the redistribution of GFP-Sfp1 was abolished in an rpb1-1/nup49-313 double mutant, the authors concluded that Sfp1 localisation to the cytoplasm depends on transcription. The double mutant yeast cells may show a variety of non-specific effects at the restrictive temperature, and whether transcription is required for Sfp1 cytoplasmic localisation remains incompletely demonstrated. 

      We agree with Reviewer 2 that any heat inactivation of a temperature-sensitive (ts) protein can lead to non-specific effects. It is evident that nup49-313 does not prevent Sfp1 export to the cytoplasm. In the case of rpb1-1, these non-specific effects are expected due to transcriptional arrest, which can eventually result in a reduction in protein content. However, this process takes some time, while the impact on export is more rapid. It is worth noting that this assay was developed and previously published by Pam Silver (Henry and Silver G&D 1996) and has been reported in many subsequent papers. Importantly, our conclusion is supported by the observation that Sfp1 binds both nascent RNA (co-transcriptionally) and mature mRNA (cytoplasmic). These observations, along with the reduced mRNA export upon transcription blocking, are consistent with our proposal that Sfp1 is exported in association with mRNA.

      (3) Under starvation conditions, which led to the presence of Sfp1 in the cytoplasm and have previously been correlated with a decrease in the transcription of Sfp1 target genes, the authors observed that a plasmid-based expressed GFP-Sfp1 accumulated in cytoplasmic foci. These foci were also labelled by P-body markers such as Dcp2 and Lsm1. The quality of the microscopic images provided does not allow to determine whether Rpb4-RFP colocalises with GFP-Sfp1. 

      The submitted PDF figure is of low quality. We believe that the high-quality figure in the final submission is convincing.

      (4) To understand to which RNA Sfp1 might bind, the authors used an N-terminally tagged fusion protein in a cross-linking and purification experiment. This method identified 264 transcripts for which the CRAC signal was considered positive and which mostly correspond to abundant mRNAs, including 74 ribosomal protein mRNAs or metabolic enzyme-abundant mRNAs such as PGK1. The authors did not provide evidence for the specificity of the observed CRAC signal, in particular, what would be the background of a similar experiment performed without UV cross-linking. In a validation experiment, the presence of several mRNAs in a purified SFP1 fraction was measured at levels that reflect the relative levels of RNA in a total RNA extract. Negative controls showing that abundant mRNAs not found in the CRAC experiment were clearly depleted from the purified fraction with Sfp1 would be crucial to assessing the specificity of the observed protein-RNA interactions. The NON-CRAC+ selected mRNAs were enriched for genes whose expression was previously shown to be upregulated upon Sfp1 overexpression (Albert et al., 2019). The presence of unspliced RPL30 pre-mRNA in the Sfp1 purification was interpreted as a sign of co-transcriptional assembly of Sfp1 into mRNA, but in the absence of valid negative controls, this hypothesis would require further experimental validation.

      We would like to thank Reviewer 2 for bringing this issue up, as it helped us to clarify it in the revised paper.

      First, we emphasized in the Discussion that many CRAC+ genes do not fall into the category of highly transcribed genes. Please see more detailed discussion below.

      Secondly, we examined various features of the 264 genes - classified as CRAC+ - to estimate their specificity and biological significance. Our various experiments revealed that the CRAC+ genes represent a distinct group with many unique features.

      The biological significance of the 264 CRAC+ mRNAs was demonstrated by various experiments; all are inconsistent with technical flaws. In fact, all the experiments and analyses that we have pursued indicate the unique nature of the CRAC+ genes. Some examples are:

      (1) Fig. 2A and B show that most reads of CRAC+ mRNAs mapped to a specific location, close to the pA sites.

      (2) Fig. 2C shows that most reads of CRAC+ mRNAs mapped to a specific RNA motif located near the 3’ ends of the mRNAs.

      (3) Most RiBi CRAC+ promoters contain Rap1 binding sites (p = 1.9 × 10⁻²²), whereas the vast majority of RiBi non-CRAC+ promoters do not (Fig. 3C).

      (4) Fig. 4A shows that RiBi CRAC+ mRNAs become destabilized due to Sfp1 deletion, whereas RiBi non-CRAC+ mRNAs do not. Fig. 4B shows similar results due to Sfp1 depletion.

      (5) Fig. 6B shows that the impact of Sfp1 on backtracking is substantially higher for CRAC+ than for non-CRAC+ genes. This is most clearly visible in RiBi genes.

      (6) Fig. 7A shows that the Sfp1-dependent changes along the transcription units are substantially more pronounced for CRAC+ than for non-CRAC+ genes.

      (7) In Fig. S4B, the chromatin binding profile of Sfp1 is shown to be different for CRAC+ and non-CRAC+ genes.

      Taken together, the many unique features, in fact, any feature that we examined, indicate the specificity and significance of this group, demonstrating that our CRAC results are biologically significant.

      Most importantly, these genes do not all fall into the category of highly transcribed genes.  On the contrary, as depicted in Figure 6A (green dots), it is evident that CRAC+ genes exhibit a diverse range of Rpb3 ChIP and GRO signals. Furthermore, as illustrated in Figure 7A, when comparing CRAC+ to Q1 (the most highly transcribed genes), it becomes evident that the Rpb4/Rpb3 profile of CRAC+ genes behaves differently from the Q1 group. Evidently, despite the heterogeneous transcription of CRAC+ genes (as mentioned above), the Rpb4/Rpb3 profile decreases more substantially than that of the highly transcribed genes (Q1).  Moreover, despite similar expression levels among all RiBi mRNAs, only a portion of them binds Sfp1.

      Thus, all our results indicate that CRAC+ genes represent a biologically significant group, irrespective of the expression of its members. In response to this comment, we included a new paragraph discussing the validity of our conclusions. See page 18, blue paragraph.

      (5) To address the important question of whether co-transcriptional assembly of Sfp1 with transcripts could alter their stability, the authors first used a reporter system in which the RPL30 transcription unit is transferred to vectors under different transcriptional contexts, as previously described by the Choder laboratory (Bregman et al. 2011). While RPL30 expressed under an ACT1 promoter was barely detectable, the highest levels of RNA were observed in the context of the native upstream RPL30 sequence when Rap1 binding sites were also present. Sfp1 showed better association with reporter mRNAs containing Rap1 binding sites in the promoter region. However, removal of the Rap1 binding sites from the reporter vector also led to a drastic decrease in reporter mRNA levels. Whether the fraction of co-purified RNA is nuclear and co-transcriptional or not cannot be inferred from these results.

      The proposed co-transcriptional binding of Sfp1 is based on the findings presented in Figure 5C and Figure S2D, as well as the observed binding of Sfp1 to transcripts containing introns, as shown in Figures 2D and 3B.  The results of Fig. 3 led us to the assertion that the "RNA-binding capacity of Sfp1 is regulated by Rap1-binding sites located at the promoter." We maintain our stance on this conclusion. Indeed, the Rap1 binding site does impact mRNA levels, as highlighted by Reviewer 2. However, "construct E," which possesses a promoter with a Rap1 binding site, exhibits lower transcript levels compared to "construct F," which lacks such a binding site in its promoter. Despite this difference in transcript levels, Sfp1 was able to pull down the former transcript but not the latter, even though expression of the former gene is relatively low. Thus, the results appear to be more reliant on the specific capacity of Sfp1 to interact with the transcript rather than on the transcript's expression level.

      (6) To complement the biochemical data presented in the first part of the manuscript, the authors turned to the deletion or rapid depletion of SFP1 and used labelling experiments to assess changes in the rate of synthesis, abundance, and decay of mRNAs under these conditions. An important observation was that in the absence of Sfp1, mRNAs encoding ribosomal protein genes not only had a reduced synthesis rate but also an increased degradation rate. This important observation needs careful validation, as genomic run-on experiments were used to measure half-lives, and this particular method was found to give results that correlated poorly with other measures of half-life in yeast (e.g. Chappelboim et al., 2022 for a comparison). Similarly, the use of thiolutin to block transcription as a method of assessing mRNA half-life has been reported to be problematic, as thiolutin can specifically inhibit the degradation of ribosomal protein mRNA (Pelechano & Perez-Ortin, 2008). Specific repressible reporters, such as those used by Baudrimont et al. (2017), would need to be tested to validate the effect of Sfp1 on the half-life of specific mRNAs. Also, it would be very difficult to infer from the images presented whether the rate of deadenylation is altered by Sfp1.

      Various methods exist for assessing mRNA half-lives (HLs), and each of them carries its own set of challenges and biases. Consequently, it becomes problematic to directly compare HL values of a specific mRNA when different methods are employed. The superiority of one particular method over others remains unclear (in my opinion). However, they exhibit a high degree of reliability when it comes to comparing different strains under identical conditions using a single method.

      Estimating HLs through the GRO approach is a non-invasive method, applied to optimally proliferating cells, which has been employed in numerous publications. While no method is without its limitations, our experience over the years has reassured us that this approach is among the most dependable. Our HL determination using thiolutin to block transcription provided results that were consistent with the values obtained by the GRO approach.

      Nevertheless, in our revised manuscript, we supplemented the HL data obtained with thiolutin with results obtained by subjecting cells to a temperature shift to 42°C, a natural method to block transcription in wild-type (WT) cells. This approach to determine HLs has been previously reported in our publications, such as Lotan et al. (2005, 2007) and Goler Baron et al. (2008). The new results are shown in Fig. S3B. They are consistent with our conclusion that Sfp1 stabilizes mRNAs.
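      For readers unfamiliar with transcription shut-off assays, the following is a minimal sketch of how a half-life can be estimated from such a time course, assuming simple first-order (exponential) decay. The function name and the example numbers are hypothetical and are not taken from the manuscript; the authors' actual quantification procedure is not described in this response.

```python
import math

def estimate_half_life(times, levels):
    """Estimate an mRNA half-life from a transcription shut-off time course.

    Assumes first-order decay, level(t) = level(0) * exp(-k * t), and fits
    ln(level) against time by ordinary least squares. Returns ln(2) / k in
    the same units as `times`. Illustrative only (hypothetical helper).
    """
    logs = [math.log(v) for v in levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx  # equals -k under exponential decay
    if slope >= 0:
        raise ValueError("mRNA level does not decay over the time course")
    return math.log(2) / -slope

# Hypothetical time course (minutes after transcription block, relative mRNA level)
times = [0, 10, 20, 30]
levels = [100 * 0.5 ** (t / 15) for t in times]
print(estimate_half_life(times, levels))
```

      Comparing such estimates between strains measured with the same method and under the same conditions is the kind of comparison the response argues is reliable.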

      Using a repressible promoter to determine mRNA HL is, unfortunately, not suitable in this paper because the promoter itself is involved in HL regulation. This observation is supported by Bregman et al. (2011) and depicted in Fig. 3, which illustrates that the promoter is critical for mRNA imprinting, consequently regulating HL.

      (7) The effects of SFP1 on transcription were investigated by chromatin purification with Rpb3, a subunit of RNA polymerase, and the results were compared with synthesis rates determined by genomic run-on experiments. The decrease in Pol II presence on transcripts in the absence of SFP1 was not accompanied by a marked decrease in transcript output, suggesting an effect of Sfp1 in ensuring robust transcription and avoiding RNA polymerase backtracking. To further investigate the phenotypes associated with the depletion or absence of Sfp1, the authors examined the presence of Rpb4 along transcription units compared to Rpb3. One effect of sfp1 deficiency was that this ratio, which decreased from the start of transcription towards the end of transcripts, increased slightly. The results presented are largely correlative and could arise from the focus on very specific types of mRNAs, such as those of ribosomal protein genes, which are sensitive to stress and are targeted by very active RNA degradation mechanisms activated, for example, under heat stress (Bresson et al., 2020).

      Figure 7A illustrates a significant reduction in Rpb4/Rpb3 ratios along the transcription unit in WT cells. This reduction is notably more pronounced in CRAC+ genes compared to the highly transcribed quartile (Q1), which includes all ribosomal protein (RP) genes, and it is completely absent in sfp1∆ cells. Furthermore, it's important to highlight that the CRAC+ gene group displays a wide range of transcription rates, as measured by either Rpb3 ChIP or GRO (Figure 6A). Given these observations, we do not think that heightened sensitivity of RP mRNA degradation in response to stress is responsible for the pronounced difference in the configuration of the Pol II elongation complex that is detected in CRAC+ genes, mainly because this experiment was performed under standard (non-stress) culture conditions.

      Correlative studies are particularly informative when a gene mutation eliminates a correlation, and this is precisely the type of study depicted in Figure 7B-C. The correlations shown in these panels are dependent on Sfp1. Indeed, RP genes are sensitive to stress. However, we used non-stressed conditions. Furthermore, CRAC+ genes did not display any apparent unusual destabilization but rather exhibited higher (not lower) mRNA stability compared to non-CRAC+ genes (Figure 7C).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The paper combines phenotypic and genomic analyses of the "sheltered load" (i.e. the accumulation of deleterious mutations linked to S-loci that are hidden from selection in the homozygous state) in Arabidopsis. The authors compare results to previous theoretical predictions concerning the extent of the load in dominant vs recessive S-alleles, and further develop exciting theory to reconcile differences between previous theory and observed results.

      Strengths:

      This is a very nice combination of theory and data to address a classical question in the field.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The "genetic load" is a poorly defined concept in general, and its quantification via the number of putatively deleterious mutations is quite difficult. Furthermore counting up the number of derived mutations at fully constrained nucleotides may not be a great estimate of the load, and certainly does not allow for evaluation of recessivity -- a concept critical to ideas concerning the sheltered load. Alternative approaches - including estimating the severity of mutations - could be helpful as well. This imperfection in available approaches to test theory must be acknowledged more strongly by the authors.

      As suggested by the reviewer, we implemented alternative approaches to estimate the severity of deleterious mutations and now report the results of SNPeff and SIFT4G analyses in Table S6. The results we obtained with these other metrics were overall very similar to those based on our previous counting of mutations at 0-fold and 4-fold degenerate sites. More generally, we tried to improve the presentation of our strategy to estimate the genetic load (clarified in lines 262-268, 271, 292-295, 297). In particular, we made it clear that our population genetic analysis cannot assess the recessivity of the observed mutations (lines 428-434).
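      As background on the 0-fold and 4-fold degenerate sites mentioned above, here is a minimal sketch of how a codon position can be classified under the standard genetic code. It is an illustration only: the function names are hypothetical, and the authors' actual annotation pipeline is not described in this response.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_bases(codon, pos):
    """Count the bases (1 to 4) at `pos` (0, 1 or 2) that keep the amino acid unchanged."""
    target = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        variant = codon[:pos] + base + codon[pos + 1:]
        if CODON_TABLE[variant] == target:
            count += 1
    return count

def classify_site(codon, pos):
    """Label a codon position as '0-fold', '4-fold' or 'other' (hypothetical helper)."""
    n = synonymous_bases(codon, pos)
    if n == 1:
        return "0-fold"  # every substitution changes the amino acid
    if n == 4:
        return "4-fold"  # no substitution changes the amino acid
    return "other"

print(classify_site("GGT", 2), classify_site("GGT", 1), classify_site("TTT", 2))
```

      Mutations at 0-fold sites always alter the protein and so serve as a proxy for potentially deleterious changes, whereas 4-fold sites provide a putatively neutral reference.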

      Reviewer #2 (Public Review):

      Summary:

      This study looks into the complex dominance patterns of S-allele incompatibilities in Brassicaceae, through which it attempts to learn more about the sheltering of deleterious load. I found several weak points in the analyses that diminished my excitement about the results. In particular, the way in which deleterious mutations were classified lacked the ability to distinguish the severity of the mutations and thus their expected associated dominance.

      First, we would like to clarify that our goal with this study is NOT to learn something about dominance of the linked deleterious mutations (we cannot). Instead, we compare the accumulation of deleterious mutations linked to dominant vs recessive S-ALLELES, but are agnostic regarding the dominance level of the LINKED mutations themselves. The rationale is that the different intensities of natural selection between dominant vs recessive S-alleles provide a powerful way to examine the process by which deleterious mutations are sheltered in general. We further clarified this aspect on lines 70-73 and 399-401.

      Second, as mentioned above in response to Reviewer 1, we complemented the analysis by predicting the severity of the deleterious mutations by SIFT4G and SNPeff. The results were largely consistent, with the exception that the number of sites included in SIFT4G was low, such that the statistical power was reduced (lines 296-300).

      Furthermore, the simulation approach could have provided this exact sort of insight but was not designed to do so, making this comparison to the empirical data also less than exciting for me.

      As explained above, studying dominance of the linked mutations we observed is an interesting research question (albeit a difficult one), but it was not our goal here. Instead, our study was designed as an empirical test of the predictions presented in Llaurens et al (2009), and we re-analysed some aspects of the model outcome to illustrate our points.

      We now better explain that we based our choice of parameters on the fact that in the theoretical study by Llaurens et al (2009), recessive deleterious mutations are predicted to accumulate in a much more straightforward manner (lines 316-318).

      We now dedicate a paragraph of the discussion to explain how our stochastic simulations could be improved, and acknowledge that a full exploration of the interaction between dominance of the S-alleles and dominance of the linked deleterious mutations would be an interesting follow-up, albeit beyond the scope of our study (lines 437-441).

      Major and minor comments:

      I think the introduction (or somewhere before we dive into it in the results) of the dominance hierarchy for the S-alleles needs a more in-depth explanation. Not being familiar with this beforehand really made this paper inaccessible to me until I then went to find out more before continuing. I would expect this paper to be broad enough that self-contained information makes it accessible to all readers. For example, lines 110-115 could be in the Introduction.

      We thank the reviewer for this useful remark. We now give a more comprehensive description of the dominance hierarchy and introduce the classes of dominance in A. lyrata already in the introduction, on lines 64-70.

      Along with my above comment, perhaps it is not my place to comment, but I find the paper not of a broad enough scope to be of interest to a broad readership. This S-allele dominance system is more than simple balancing selection, it is a very complex and specific form of dominance between several haplotypes, and the mechanism of dominance does not seem to be genetic. I am not sure that it thus extrapolates to broad comments on general dominance and balancing selection, e.g. it would not be the same as considering inversions and this form of balancing selection where we also expect recessive deleterious mutations to accumulate.

      We disagree with these interpretations by the reviewer, for two reasons:

      First, the mechanism of dominance is actually entirely genetic. In fact, we uncovered some years ago that it is based on the molecular interaction between small non-coding RNAs from dominant alleles and their target sites on recessive alleles (Durand et al. Science 2014, see lines 68-70). If there is something specific with this system, it is that the dominance phenomenon is better understood at the mechanistic level than in most other cases, but the resulting phenomenon in itself (a dominance hierarchy) is rather common.

      Second, the kind of variation in the intensity of linked selection created by this mechanism is actually a general phenomenon, so our results have broad relevance beyond our particular study system. We modified the introduction to explain this point more clearly, highlighting in particular the fact that the situation we study closely resembles the case of sex chromosomes, where X (or Z) chromosomes are genetically recessive and Y (or W) chromosomes are genetically dominant. We cite this example in lines 83-87 of the introduction and also several other well-studied examples on lines 480-489 of the discussion.

      It would have been particularly interesting, or a nice addition, to see deleterious mutations classed by something like SNPeff or GERP where you can have different classes of moderate to severe deleterious variants, which we would expect also to be more recessive the more deleterious they are. In line with my next comment on the simulations, I think relative differences between mutations expected to be more or less dominant may be even more insightful into the process of sheltering which may or may not be going on here.

      We agree with the reviewer, and as detailed above we have now integrated such analyses with SNPeff and SIFT4G (Table S6). These new results reinforce our conclusion that while S-allele dominance influences the fixation of deleterious mutations, it has no effect on their total number. See lines 270-272 and 296-300.

      In the simulations, h=0 and s=0.01 (as in Figure 5) for all deleterious mutations seems overly simplistic, and at the convenient end for realistic dominance. I think besides recessive lethals which we expect to be close to h=0 would have a much larger selection coefficient, and other deleterious mutations would only be partially recessive at such an s value. I expect this would change some of the simulation results seen, though to what degree I am not certain. It would be nice to at least check the same exact results for h=0.3 or 0.2 (or additionally also for recessive lethals, e.g. h=0 and s=-0.9). I would also disagree with the statement in line 677, many studies have shown, particularly those on balancing selection, that partially recessive deleterious mutations are not eliminated by natural selection and do play a role in population genetic dynamics. I am also not surprised that extinction was found for higher s values when the mutation rate for such mutations was very high and the distribution of s values was constant. An influx of such highly deleterious mutations is unlikely to ever let a population survive, yet that does NOT mean that in nature, the rare influx of such mutations does lead to them being sheltered. I find overall that the simulation results contribute very little, to none, to this paper, as without something more realistic, like a simultaneous distribution of s and h values, you cannot say which, if any class of these mutations are the ones expected to accumulate because of S-allele dominance.

      We understand that the previous version of our manuscript was confusing between dominance of the S-alleles and dominance of the linked deleterious mutations. We clarified that our study focuses on the effect of the former only (lines 99, 263-264 and 581-583).

      We agree that a complete exploration of the interaction between dominance of the S-alleles and dominance of the linked mutations being sheltered would have been an asset, but as explained above this is not the focus of our study. The previous work by Llaurens et al (2009) has already established that deleterious mutations can fix within S-allele lineages, especially when linked to dominant S-alleles, and when the number of S-alleles is large. Under the conditions they examined, deleterious mutations were much more strongly eliminated if not fully recessive (h=0 vs h=0.2), so for the present study we decided to simulate fully recessive mutations only. We now formally acknowledge the possibility that some complex interaction may take place between dominance of the S-alleles and dominance of the linked deleterious mutations (lines 440-442). However, as explained above we feel that fully exploring this complex interaction would require a detailed investigation, which is clearly beyond the scope of the present study.

      Rather they only show the disappointing or less exciting result that fully recessive, weakly deleterious mutations (which I again think do not even exist in nature as I said above) have minor, to no effect across the classes of S-allele dominance. They provide no insight into whether any type of recessive deleterious mutation can accumulate under the S-allele dominance hierarchy, and that is the interesting question at hand. I would either remove these simulations or redo them in another approach. The authors never mention what simulation approach was used, so I can only assume this is custom, in-house code. Yet I do not find that code provided on the github page. I do not know if the lack of a distribution for h and s values is then a choice or a programming limitation, but I see it as one that should be overcome if these simulations are meant to be meaningful to the results of the study.

      The code we used (in C) was adapted from the previous study by Llaurens et al. (2009), which at the time was not deposited in a data repository, unfortunately. With the agreement of the authors of that study, this code is now available on GitHub (https://github.com/leveveaudrey/model_ssi_Llaurens; line 723).

      It is correct that our simulations were not aimed at determining whether “any type of recessive deleterious mutation can accumulate”, but we strongly believe that they help in interpreting the observations made in the genomic data.

      Recommendations for the authors:

      Notes from the editor:

      I found Table 1 confusing, with column headings of observed proportion but perhaps numbers reflecting counts.

      Thank you for pointing out this confusion. There was indeed an error in the last column, which we have now corrected.

      I found Figure 2 a bit hard to parse, with the vertical lines being unclear and the x-axis ticks of insufficient resolution to evaluate the physical extent of the signals.

      We increased the size of the x-axis label and made it more detailed in Figure 2, which is now hopefully clearer. Moreover, we increased the size of the vertical lines.

      Finally, I wonder, given the rapid decay of signal in lyrata, whether 25kb is the right choice for evaluating load and whether the pattern may look different on a smaller scale.

      It is true that the signal decays rapidly in A. lyrata, as can be seen in the haplotype structure analysis and in line with our previous analysis of the same populations (Le Veve et al., MBE 2023; in this study we explored the effect of the choice of the size of the chromosomal region analyzed; lines 266-269). However, for the sake of comparison, we prefer to stick to the same window size. The fact that we still see an effect of dominance in spite of the lower statistical power associated with the more rapid decay (because a smaller number of genes is expected to be impacted) actually reinforces our conclusions.

      Reviewer #1 (Recommendations For The Authors):

      I have a few additional suggestions to improve the manuscript.

      (1) How does the load linked to the S-locus compare to that observed in other genomic regions? It would be useful to provide a comparison of the results quantified in Figures three and four to comparable genomic regions unlinked to the S-locus. How severe is the linked load?

      This comparison to the genomic background was actually the core of our previous study (Le Veve et al., MBE 2023), which was based on the same populations. This analysis revealed that polymorphism at the 0-fold degenerate sites was more than twice as high in the 25kb immediately flanking the S-locus as in a series of 100 unlinked control regions. The main focus of the present study is on the effect of linkage to particular S-alleles (which was not possible previously because haplotypes had to be phased).

      (2) Details of the GLM for data underlying Figures 3 and 4 are somewhat unclear. Is the key explanatory variable (Dominance) treated as continuous? Categorical? Ordinal etc…

      Dominance is treated as a continuous variable. We specify this in line 162 of the Results, in the legends of Figures 3 and 4, in the Materials and Methods (lines 627 and 660) and in the legend of Table S4.

      (3) I had some trouble understanding the two different p-values in columns five and six of table one. Please provide more detail.

      We understand that the two p-values in Table 1 were confusing. The first was related to the binomial test and the second to the permutation test. To be consistent with the rest of the manuscript, we retained only the p-value of the permutation test.
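      For readers less familiar with the second of these tests, the following is a minimal, illustrative sketch of a two-sided permutation test for a difference in group means. It is not the authors' procedure: the function name `permutation_p_value`, the number of permutations, and the example values are all hypothetical and serve only to show the general logic of shuffling group labels.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    Hypothetical helper for illustration; not taken from the manuscript.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values, i.e. reassign group labels at random.
        rng.shuffle(pooled)
        perm_a = pooled[:n_a]
        perm_b = pooled[n_a:]
        diff = abs(sum(perm_a) / n_a - sum(perm_b) / n_b)
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up example values, for illustration only.
p = permutation_p_value([10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5])
print(f"permutation p-value: {p:.4f}")
```

      Unlike a binomial test, this approach makes no assumption about the sampling distribution of the statistic; the null distribution is built directly from the shuffled data.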

      (4) As mentioned in the "weaknesses" above, the authors should be more clear about what they are quantifying. They are explicitly counting the number of variants at 0-fold degenerate sites as a proxy for the genetic load. How good this proxy is is unclear. The most egregious misstatement here was on line 314 in which they make reference to the "total load." However, this limitation should be acknowledged throughout the manuscript and deserves more attention in the methods and discussion.

      As mentioned above, we now integrate additional methods to define and quantify the load (SIFT4G and SNPeff), which reinforced our previous conclusions (lines 271-272, 297-302).

      We clarified our wording and replaced the mention of “total load” with “mean number of linked deleterious mutations per copy of S-allele” (lines 324-325). In the discussion we tried to better explain the limitations of approaches to estimate the genetic load (lines 431-437).

      Reviewer #2 (Recommendations For The Authors):

      Line 60, it should be specified that this is only for recessive deleterious mutations.

      Non-recessive deleterious mutations would certainly not be expected to accumulate.

      As explained in detail above, the question of whether and how non-recessive deleterious mutations can accumulate when linked to the S-locus is difficult and would in itself deserve a full treatment, which is clearly beyond the scope of the present study. We clarified this point on line 56.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      The goal of the current study was to evaluate the effect of neuronal activity on blood-brain barrier permeability in the healthy brain, and to determine whether changes in BBB dynamics play a role in cortical plasticity. The authors used a variety of well-validated approaches to first demonstrate that limb stimulation increases BBB permeability. Using in vivo-electrophysiology and pharmacological approaches, the authors demonstrate that albumin is sufficient to induce cortical potentiation and that BBB transporters are necessary for stimulus-induced potentiation. The authors include a transcriptional analysis and differential expression of genes associated with plasticity, TGF-beta signaling, and extracellular matrix were observed following stimulation. Overall, the results obtained in rodents are compelling and support the authors' conclusions that neuronal activity modulates the BBB in the healthy brain and that mechanisms downstream of BBB permeability changes play a role in stimulus-evoked plasticity. These findings were further supported with fMRI and BBB permeability measurements performed in healthy human subjects performing a simple sensorimotor task. There is literature to suggest that there are sex differences in BBB dysfunction in pathophysiological conditions and the authors have acknowledged the use of only males as a minor limitation of the study that should be addressed in the future. Future studies should also test whether the upregulation of OAT3 plays a role in cortical plasticity observed following stimulation. Overall, this study provides novel insights into how neurovascular coupling, BBB permeability, and plasticity interact in the healthy brain. 

      Reviewer #2 (Public Review): 

      Summary: 

      This study builds upon previous work that demonstrated that brain injury results in leakage of albumin across the blood brain barrier, resulting in activation of TGF-beta in astrocytes. Consequently, this leads to decreased glutamate uptake, reduced buffering of extracellular potassium and hyperexcitability. This study asks whether such a process can play a physiological role in cortical plasticity. They first show that stimulation of a forelimb for 30 minutes in a rat results in leakage of the blood brain barrier and extravasation of albumin on the contralateral but not ipsilateral cortex. The authors propose that the leakage is dependent upon neuronal excitability and is associated with an enhancement of excitatory transmission. Inhibiting the transport of albumin or the activation of TGF-beta prevents the enhancement of excitatory transmission. In addition, gene expression associated with TGF-beta activation, synaptic plasticity and extracellular matrix are enhanced on the "stimulated" hemisphere. That this may translate to humans is demonstrated by a break down in the blood brain barrier following activation of brain areas through a motor task. 

      Strengths: 

      This study is novel and the results are potentially important as they demonstrate an unexpected break down of the blood brain barrier with physiological activity and this may serve a physiological purpose, affecting synaptic plasticity. 

      The strengths of the study are: 

      (1) The use of an in vivo model with multiple methods to investigate the blood brain barrier response to a forelimb stimulation. 

      (2) The determination of a potential functional role for the observed leakage of the blood brain barrier from both a genetic and electrophysiological view point 

      (3) The demonstration that, by inhibiting different points in the putative pathway from activation of the cortex to transport of albumin and activation of the TGF-beta pathway, the effect on synaptic enhancement could be prevented.

      (4) Preliminary experiments demonstrating a similar observation of activity dependent break down of the blood brain barrier in humans.

      Weaknesses: 

      The authors adequately addressed most of my points. A few remain: 

      (1) Although the authors have addressed the possible effects of anaesthesia on neuro-vascular coupling, they have not mentioned or addressed the possible effects of ketamine (an NMDA receptor antagonist) on synaptic plasticity. Indeed, the low percentage of SEP increase following potentiation (10-20%) could perhaps be explained by partial block of NMDA receptors by ketamine.

      We agree and apologize for this oversight. This important issue is now addressed in the Discussion.

      “Notably, the antagonistic effect of ketamine on NMDA receptors might attenuate the magnitude of SEP potentiation recorded in our experiments (Anis et al., 1983; Salt et al., 1988).”

      (2) The experimental paradigms remain unclear to me. Now, it appears that drugs are applied for 50 minutes and that the stimulation occurs during the "washout period". The more conventional approach would be to have the drug application during the stimulation period to determine if the drugs occlude or enhance the effects of stimulation and then washout the drugs. The problem is that drugs variably washout at different rates depending upon their lipid solubility.

      We agree that the more conventional approach would have been to continue applying the drug throughout the experiment and that differential rates of washout may add variability to our experiments. However, despite this limitation, within each treatment group we found that the SEP response at 50 minutes (immediately after the drug application window) does not differ from SEP response at 80 minutes (after 30 minutes of stimulation and washout) [Figure 3H&G]. This suggests that the drug effects were still present despite terminating drug application and performing potentiation-inducing stimulation. Moreover, our analysis showed that animals within each treatment group (except AP5) had similar SEP responses with little intra-group variability.

      (3) It is still not clear to what extent the experimenters and those doing the analysis were blinded to group. If one or both were blind to group, then please put this in the methods.

      Thank you for this comment. We revised the Methods section to clearly confirm that data were collected and analyzed in a blinded manner.

      Reviewer #3 (Public Review): 

      Summary: 

      This study used prolonged stimulation of a limb to examine possible plasticity in somatosensory evoked potentials induced by the stimulation. They also studied the extent that the blood brain barrier (BBB) was opened by the prolonged stimulation and whether that played a role in the plasticity. They found that there was potentiation of the amplitude and area under the curve of the evoked potential after prolonged stimulation and this was long-lasting (>5 hrs). They also implicated extravasation of serum albumin, caveolae-mediated transcytosis, and TGFb signalling, as well as neuronal activity and upregulation of PSD95. Transcriptomics was done and implicated plasticity related genes in the changes after prolonged stimulation, but not proteins associated with the BBB or inflammation. Next, they address the application to humans using a squeeze ball task. They imaged the brain and suggest that the hand activity led to an increased permeability of the vessels, suggesting modulation of the BBB. 

      Strengths: 

      The strengths of the paper are the novelty of the idea that stimulation of the limb can induce cortical plasticity in a normal condition, and it involves opening of the BBB with albumin entry. In addition, there are many datasets and both rat and human data. 

      Weaknesses: 

      The conclusions are not compelling however because of a lack of explanation of methods.

      In the revised paper, we added a section titled ‘study design’ that presents an overview of the experimental approach.

      The explanation of why prolonged stimulation in the rat was considered relevant to normal conditions should be as clear in the paper as it is in the rebuttal.

      We added a new paragraph to the Discussion section explaining this point as we did in the rebuttal:  

      “Our animal experiments show that a 30 min limb stimulation (at 6Hz and 2mA) increases cross-BBB influx, while a 1 min stimulation (of similar frequency and magnitude) does not. We believe that both types of stimulations fall within the physiological range because our continuous electrophysiological recordings showed no signs of epileptiform or otherwise pathological activity. Moreover, the recorded SEP levels were similar to those reported in previous physiological LTP studies in rats (Eckert & Abraham, 2010; Han et al., 2015; Mégevand et al., 2009) and humans (McGregor et al., 2016). In humans, skill acquisition often involves motor training sessions that last ≥30 minutes (Bengtsson et al., 2005; Classen et al., 1998) and result in physiological plasticity of sensory and motor systems (Classen et al., 1998; Draganski et al., 2004; Sagi et al., 2012). Hence, the experimental task in our human study (30 minutes of repetitive squeezing of an elastic stress-ball) is likely to represent physiological activity, with neuronal activation in primarily motor and sensory areas (Halder et al., 2005). Future human and animal studies are needed to explore the BBB modulating effects of additional stimulation protocols – with varying durations, frequencies, and magnitudes. Such studies may also elucidate the temporal and ultrastructural characteristics that differentiate between physiological and pathological BBB modulation.”

      The authors need to ensure other aspects of the rebuttal are as clear in the paper as in the rebuttal too. 

      Thank you for this comment. This was addressed in the revised paper.

      The only remaining concern that is significant is that it is hard to understand the figures. 

      Thank you for this comment. We revised the figures according to the reviewer’s recommendations. We hope that these changes increase the legibility of the figures. 

      Reviewer #3 (Recommendations For The Authors): 

      The manuscript is improved but there are still suggestions that do not appear to have been addressed. More experiments are not involved in addressing these concerns but one wants the paper to be clarified in terms of what was done. 

      Figures. Please use arrows to point to the effect that the reader should see. Please note what the main point is. 

      Major concerns: 

      Please add explanations, exact p values, and other revisions in the rebuttal to the paper. 

      Rebuttal explanations were added to the paper and p values appear in figure legends.

      Fig 1d shows a seizure-like event which the authors don't think is a seizure because it lacks a depolarization shift. This explanation is not convincing because an LFP would not necessarily show a depolarization shift. Another argument, or a discussion of the event as a seizure, is warranted. Note that expanding the trace might also show it is unlike a seizure. Regarding the idea that 6Hz 2 mA stimuli for 30 min are physiological, the authors make three arguments which are not clear. First, no epileptiform activity was found, but in Fig. 1 it looks like a seizure occurred. Second, memory and skill acquisition in humans often involve a similar training duration - but what about 6Hz 2 mA?

      Rats are known to rhythmically move their whiskers at frequencies ranging between 5 and 15 Hz (Mégevand et al., 2009). We agree that there is no clear way to justify the similarity between the experimental design in humans and rats. However, we believe that both paradigms (paw stimulation in rats and ball squeeze in humans) represent non-pathological input that we found to modulate barrier permeability. This argument was added to the discussion of the paper:

      “We believe that both types of stimulations fall within the physiological range because in rats, activity between 5 and 15 Hz represents physiological rhythmic whisker movement during environment exploration (Mégevand et al., 2009).”

      Seizures are typically induced in rats via direct tetanic stimulation of the brain (at 50 Hz and 0.3-2.5mA) or maximal electroshock test to the cornea (at 50 Hz and 150 mA) (Swinyard et al., 1952). We, therefore, assert that the activity we observe represents physiological responses and not seizures. This argument is beyond the scope of the current paper. 

      Please note a limitation is that the high level of serum albumin is unlikely to be physiological but may not have been as high in the animal because of the low diffusion rate and degradation (please add the refs in the rebuttal). 

      Thank you, we added the following to the Results section: 

      “The relatively high concentration of albumin was chosen to account for factors that lower its effective tissue concentration such as its low diffusion rate and its likelihood to encounter a degradation site or a cross-BBB efflux transporter (Tao & Nicholson, 1996; Zhang & Pardridge, 2001).”

      Fig. 1. 

      Please consider a box in b to show where the expanded traces in the lower row came from. 

      Thank you for the suggestion. We added lines indicating where the trace excerpts were taken from.

      c. Please use arrows to point to the parts that the authors want the reader to note. In the legend, explain what t is, and delta HbT.

      Thank you. We implemented this suggestion.

      d. It is not clear what the double-sided arrows are meant to show compared to the arrow without two sides. 

      We replaced the two-headed arrow with two single ones.

      e. Please explain what the upward lines at the top signify. What does the red asterisk mean? 

      Thank you. We implemented this suggestion.

      f. Is the reader supposed to note the yellow area? Please make it with an arrow or circle if so. 

      Thank you, we added a white circle to mark the area of tracer accumulation.

      g. Please explain what the permeability index is or reference the part of the paper that does. 

      Following this suggestion, we added a reference to the appropriate Methods section in the legend.

      h. Please use arrows to point to the area of interest. 

      Thank you. We implemented this suggestion.

      m-n. Please mark areas of interest with arrows.  m. the top right two images are unclear. I suggest making them say ipsi inset and contra inset instead of using asterisks. 

      Thank you. We added the ipsi and contra labels to panels in m. The images in panel n represent a phenomenon with no particular region of interest, but rather peri-vascular tracer accumulation along the entire depicted blood vessel. We clarified that panel n represents a separate experiment from panel m: “n. In an animal injected with both EB and NaFlu post stimulation, fluorescence imaging shows extravascular accumulation of both tracers along a cortical small vessel in the stimulated hemisphere.”

      Figure 2. 

      (2) a. Middle. What are the vertical lines at the top? The rebuttal states that was explained in the revised legends but I don't see it. 

      Our apologies. We have now included an explanation that “an excerpt of the stimulation trace is shown above the middle LFP trace”.

      c and d are very different field potentials in shape and therefore hard to compare. The rebuttal addresses this but the explanation is not in the revised text. 

      We agree that there is variability in SEP responses between animals. We have now added a statement acknowledging this in the Methods section: “To overcome potential variability in SEP morphology between animals (Mégevand et al., 2009), each animal’s plasticity measures (max amplitude and AUC of post stimulation SEP) were compared to the same measures at baseline.”

      In d, it is not clear there is potentiation because the traces are not aligned. 

      All panels depicting SEP traces represent raw data with no alignment. The shift observed in panel d exemplifies why we compare post-stimulation parameters of max amplitude and area under curve to baseline in each animal. 

      Exact P values are said to have been added in the rebuttal but they were not. 

      Exact P values appear in Figure legends.

      (3) b. Use arrows to mark the area of interest. 

      Thank you. We added a white circle to mark the area of tracer accumulation similar to Figure 1f.

      d. Why is there an oscillation superimposed on all traces except CNQX? 

      We agree that this is an interesting question. Future studies should determine the source of this SEP pattern.   

      (4) What does the line and the number 2 mean? How were data normalized? What was counted? What area of cortex?

      The number 2 labels the scale bar: the length of the scale bar line corresponds to a log fold change of 2.

      The plot shows the log fold change against the mean count of each gene in the contralateral somatosensory cortex between 1 and 24 hours after stimulation.

      The x axis title was changed to “mean expression” and the legend was modified to:

      “Scatter plot of gene expression from RNA-seq in the contralateral somatosensory cortex 24 vs. 1 h after 30 min stimulation. The y axis represents the log fold change, and the x axis represents the mean expression levels (see methods, RNA Sequencing & Bioinformatics). Blue dots indicate statistically significant differentially expressed genes (DEGs) by Wald Test (n=8 rats per group).”

      How were the pericytes, smooth muscle cells, etc. distinguished?

      This was explained under Methods->RNA Sequencing & Bioinformatics: “Analysis of cell-specific and vascular zonation genes was performed as described (Vanlandewijck et al., 2018), using the database provided in (http://betsholtzlab.org/VascularSingleCells/database.html).”

      What were the chi square statistics? If there were cells used instead of rats, please justify. 

      Thank you. The legend was expanded to include the following:

      “The contralateral somatosensory cortex was found to have a significantly higher number of DEGs related to synaptic plasticity, than the ipsilateral side (***p<0.001, Chi-square).”     
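      As a purely illustrative aside, a comparison of this kind can be framed as a Pearson chi-square test on a 2x2 table (hemisphere by gene category). The sketch below is an assumption about the general form of such a test, not the authors' analysis: the function name `chi_square_2x2` and the counts are hypothetical, and no continuity correction is applied.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) for the table [[a, b], [c, d]].

    Hypothetical helper for illustration. All row and column totals
    must be non-zero, otherwise the statistic is undefined.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: rows = contralateral / ipsilateral hemisphere,
# columns = DEGs in the category of interest / all other DEGs.
chi2, p = chi_square_2x2(30, 10, 10, 30)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

      Reporting the statistic and degrees of freedom alongside the p-value, as the reviewer requests, lets readers judge the size of the difference as well as its significance.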

      (5) b. what do the icons mean? 

      We agree that the icons were confusing. We simplified this panel to just show when participants were asked to squeeze the ball (black icon). This explanation was added to the Figure legend.

      Abbreviations? 

      Abbreviations of MRI protocols were added to the figure legend for clarity.

      In c-e what are the units of measure? Fold-change? 

      The units represent t-statistics values for each voxel. The label ‘t-statistic’ was added to the figure.  

      What are the white lines, + and - signs?

      The white lines point to voxels of highest activation (t-statistic). This was added to the legend.

      What appear to be + and - signs are in fact voxels with significant activation that merely resemble those symbols.

      f. Please explain f and g for clarity. 

      Thank you. The explanation was modified for added clarity.

      Supplemental Fig. 4. 

      Original question: If ipsilateral and contralateral showed many changes why do the authors think the effects were only contralateral? 

      The authors replied: Our gene analysis was designed to complement our in vivo and histological findings, by assessing the magnitude of change in differentially expressed genes (DEGs). This analysis showed that: (1) the hemisphere contralateral to the stimulus has significantly more DEGs than the ipsilateral hemisphere; and (2) the DEGs were related to synaptic plasticity and TGF-b signaling. These findings strengthen the hypothesis raised by our in vivo and histological experiments. 

      Could the authors clarify the answer to the question in the text? 

      Thank you. This section was added to the Discussion. 

      Papers referenced in this letter:

      Anis, N. A., Berry, S. C., Burton, N. R., & Lodge, D. (1983). The dissociative anaesthetics, ketamine and phencyclidine, selectively reduce excitation of central mammalian neurones by N-methyl-aspartate. British Journal of Pharmacology, 79(2), 565–575. https://doi.org/10.1111/j.1476-5381.1983.tb11031.x

      Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience, 8(9), 1148–1150. https://doi.org/10.1038/nn1516

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. Journal of Neurophysiology, 79(2), 1117–1123. https://doi.org/10.1152/JN.1998.79.2.1117

      Draganski, B., Gaser, C., Busch, V., Schuierer, G., Bogdahn, U., & May, A. (2004). Changes in grey matter induced by training. Nature, 427(6972), 311–312. https://doi.org/10.1038/427311a

      Eckert, M. J., & Abraham, W. C. (2010). Physiological effects of enriched environment exposure and LTP induction in the hippocampus in vivo do not transfer faithfully to in vitro slices. Learning and Memory, 17(10), 480–484. https://doi.org/10.1101/lm.1822610

      Halder, P., Sterr, A., Brem, S., Bucher, K., Kollias, S., & Brandeis, D. (2005). Electrophysiological evidence for cortical plasticity with movement repetition. European Journal of Neuroscience, 21(8), 2271–2277. https://doi.org/10.1111/J.1460-9568.2005.04045.X

      Han, Y., Huang, M. De, Sun, M. L., Duan, S., & Yu, Y. Q. (2015). Long-term synaptic plasticity in rat barrel cortex. Cerebral Cortex, 25(9), 2741–2751. https://doi.org/10.1093/cercor/bhu071

      McGregor, H. R., Cashaback, J. G. A., & Gribble, P. L. (2016). Functional Plasticity in Somatosensory Cortex Supports Motor Learning by Observing. Current Biology, 26(7), 921–927. https://doi.org/10.1016/j.cub.2016.01.064

      Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M., & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. Journal of Neuroscience, 29(16), 5326–5335. https://doi.org/10.1523/JNEUROSCI.5965-08.2009

      Sagi, Y., Tavor, I., Hofstetter, S., Tzur-Moryosef, S., Blumenfeld-Katzir, T., & Assaf, Y. (2012). Learning in the Fast Lane: New Insights into Neuroplasticity. Neuron, 73(6), 1195–1203. https://doi.org/10.1016/j.neuron.2012.01.025

      Salt, T. E., Wilson, D. G., & Prasad, S. K. (1988). Antagonism of N-methylaspartate and synaptic responses of neurones in the rat ventrobasal thalamus by ketamine and MK-801. British Journal of Pharmacology, 94(2), 443–448. https://doi.org/10.1111/j.1476-5381.1988.tb11546.x

      Swinyard, E. A., Brown, W. C., & Goodman, L. S. (1952). Comparative assays of antiepileptic drugs in mice and rats. The Journal of Pharmacology and Experimental Therapeutics, 106(3), 319–330. http://jpet.aspetjournals.org/content/106/3/319.abstract

      Tao, L., & Nicholson, C. (1996). Diffusion of albumins in rat cortical slices and relevance to volume transmission. Neuroscience, 75(3), 839–847. https://doi.org/10.1016/0306-4522(96)00303-X

      Vanlandewijck, M., He, L., Mäe, M. A., Andrae, J., Ando, K., Del Gaudio, F., Nahar, K., Lebouvier, T., Laviña, B., Gouveia, L., Sun, Y., Raschperger, E., Räsänen, M., Zarb, Y., Mochizuki, N., Keller, A., Lendahl, U., & Betsholtz, C. (2018). A molecular atlas of cell types and zonation in the brain vasculature. Nature, 554(7693), 475–480. https://doi.org/10.1038/nature25739

      Zhang, Y., & Pardridge, W. M. (2001). Mediated efflux of IgG molecules from brain to blood across the blood–brain barrier. Journal of Neuroimmunology, 114(1–2), 168–172. https://doi.org/10.1016/S0165-5728(01)00242-9

    1. to simply slot Clarkson into the standard history of the field would miss much of the point

      I think this is too much modesty, or a kind of self-undercutting to try to convey the importance of the point. But functionally it undercuts the significance of the earlier parts of the chapter. At the end here I'm understanding the argument as being

      1. The antislavery campaign shows what state-of-the-art data visualization meant c. 1800, and these two different visualizations from Clarkson make the case that he should be considered one of the canonical figures.
      2. That's important because it heightens a set of ethical and political questions about whether and when to visualize. Clarkson's work can be considered a countervisualization or something -- possibly a concept to introduce? -- because it's taking advantage of the trade etc. It also highlights dataviz as a political-rhetorical form, not just a scientific practice about astronomy etc.
      3. Just because we admire things about Clarkson's career doesn't mean we should literally canonize him as a saint. Equiano's reaction shows that even at that time there was a different set of requirements.

      And then there is the metaphor of water and streams. This does a few things: 1. provides a counterpoint to the God's-eye, object view by adding a contingency of flow and direction, fluidity, and contingency. 2. was useful for ~1800 readers who ALSO weren't always looking for this objective god's-eye view, which is OK. (I think the infographic/dataviz distinction from the introduction here is useful, because it underlines that the more 'subjective' or whatever flow timelines are an ADVANCE on Priestley's straight lines and can be seen as such.) 3. Motivates your own data visualization of the streams with the also-canonical Mississippi visualization. I may have missed this but I think the connection here is almost fully implicit. This could be one key to motivating the water thing as your own choice.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1:

      We thank the reviewer for his/her time and for the constructive comments. Below please find our detailed responses to your points.

      STING is a key signalling hub in the innate immune system, receiving multiple inputs from upstream activators (such as cGAS) and in turn triggering multiple downstream events (such as IFN induction, NF-kB signalling, autophagy, cell death). Mutations in the STING gene cause a rare inflammatory disease called SAVI. Using a previously established STING ki mouse that recapitulates some of the clinical observations in SAVI patients, this manuscript tests the hypothesis that TNF signalling drives pathology. Using anti-TNF antibody and TNF receptor knockout, the authors show that TNF indeed plays important roles in causing disease in this mouse model. For example, the loss of T cells and neurons is prevented when TNF signalling is blocked, and lung pathology is rescued in STING ki mice lacking TNF receptors. Overall, the manuscript is well written and laid out, and the experimental work is of a high technical standard.

      Major comments

        • Most figures show pooled data from two independent experiments including a total of 5-8 mice. Given the variability in some of the readouts, this raises the question of whether there is sufficient statistical power to draw conclusions. For example, in Figure 2, the conclusion that "Infliximab did not alter the expression of inflammatory mediators" seems questionable given the results in Figure 2F and G. Did the authors perform a power calculation? What effect size can the authors detect given the variability and number of replicates? Similarly, in Figure 3, the authors conclude that "Disruption of TNFR signaling did not significantly prevent T cell lymphopenia"; however, with some more replicates, the data in Figure 3D would likely reach significance. Similar concerns apply to several panels in Figures 4 and 6 and to Figure S5M. Ideally, the authors should perform additional repeat experiments to increase the number of replicates. If that is not possible, power calculations need to be provided and conclusions should explicitly mention the minimum effect size that the author can detect given the small sample size (for example "Infliximab did not alter the expression of inflammatory mediators more than x-fold").

      Thank you for this suggestion. However, it is not possible to repeat the treatment of mice with Infliximab to generate more replicates. The blockade of TNF signalling by treatment with drugs did not cure the murine SAVI disease. According to animal welfare restrictions, we cannot perform additional treatment experiments with Infliximab or Etanercept.

      We calculated the effect sizes (d and f) and the power for all of these results and collected them in Table S4. Additional explanations of the effect sizes were added to the text corresponding to Figures 2 and 3. The results shown in Figures 4 and 6 already contain significant data, so we did not include effect size calculations for them.
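      As a rough illustration of the kind of calculation the reviewer asks for (this is not the analysis behind Table S4), the sketch below estimates the smallest standardized effect size (Cohen's d) that a two-group comparison can detect at a given sample size. It assumes a two-sided test at alpha = 0.05 with 80% power and uses a normal approximation, which is somewhat optimistic for very small groups. The function name and the group sizes are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Approximate smallest Cohen's d detectable in a two-group comparison.

    Uses the normal approximation d = (z_{1-alpha/2} + z_{power}) * sqrt(2 / n).
    For very small n the exact t-based value is larger, so treat the result
    as a lower bound on the detectable effect size.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


# Hypothetical group sizes in the range mentioned by the reviewer (5-8 mice).
for n in (5, 8):
    print(n, round(min_detectable_d(n), 2))
```

      Under these assumptions, groups of 5 to 8 animals can only detect effects of roughly 1.8 to 1.4 standard deviations, which is why a non-significant result at this sample size is best reported together with the minimum detectable effect.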

      • The authors should not make unjustified overstatements. For example, STING KI; TNFR1/2 KO mice should not be referred to as a "new mouse model". The manuscript simply tests the role of TNFR1/2 in the already published STING N153S model. In line 687, avoid using "impressively" and in line 734 avoid using "massively".

      Thank you for this suggestion. We changed this sentence to: …“these newly generated mouse lines of TNFR”… (see line 796). Additionally, we omitted “impressively” in line 687 (actual line 705) and changed “massively produced” in line 734 to “elevated” (actual line 752).

      Minor comments

      • Line 767-769: The statement that spike activates cGAS is misleading, because this effect is an indirect consequence of cell-to-cell fusion (Liu et al 2022).

      Thank you for this suggestion. We changed this sentence into: Cell fusion caused by the SARS-CoV-2 spike protein is a potent… (actual line 785).

      Reviewer #1 (Significance (Required)):

      The main strengths of this study are (1) the use of complementary antibody-based and genetic methods to test the role of TNF signalling; (2) the use of multiple different readouts; and (3) the analysis of many different cell types / organ systems. The main weaknesses are (1) small sample sizes limiting statistical power (see above) and (2) the exclusive use of mouse models.

      Overall, my opinion is that the advance is important, both fundamentally and clinically. Studies of this and the related V154M mouse model previously showed an important role of non-IFN pathways in driving disease. This study indicates that TNF signalling may cause pathology. This not only extends our understanding of STING's role in autoinflammation but also opens a direct therapeutic avenue using approved TNF targeting drugs.

      This study will be primarily of interest to specialised audiences working on STING and SAVI, and secondarily to the wider innate immunity field.

      This reviewer has expertise in the field of nucleic acid sensing, including cGAS-STING.

      Reviewer #2:

      We thank the reviewer for his/her time and for the constructive comments. Below please find our detailed responses to your points.

      In this paper, Luksch et al (2024) examine the role of TNF signaling in STING-associated vasculopathy with onset in infancy (SAVI). By using pharmacological inhibition and genetic inactivation of TNF receptors in a murine SAVI model (STING ki), the research found that pharmacologically inhibiting TNF signaling improved T cell lymphopenia but had limited effects on lung disease. Genetic inactivation of TNFR signaling, particularly TNFR1, enhanced thymocyte survival and expanded the peripheral T cell pool, reducing inflammation and neurodegeneration. The development and progression of severe lung disease in STING ki mice are also reliant on TNFR1 signaling, while TNFR2 deletion did not alleviate lung inflammation. The authors also explored the severe inflammatory lung disease manifestation, showing that primary lung endothelial cells in STING ki mice allowed more neutrophil attachment compared to those in STING WT mice, indicating chronic STING activity in endothelial cells disrupts the endothelial barrier and promotes severe lung disease. The study highlights TNFR signaling as crucial in SAVI and COVID-19 progression and suggests blocking TNFR1 signaling as a potential therapeutic approach for both diseases.

      Major comments:

      The paper establishes a strong connection between TNFR1 depletion and the reduction of SAVI disease severity in lung and neuroinflammation, suggesting TNFR1 blockade as a viable therapeutic strategy for SAVI. To strengthen the arguments and improve the therapeutic potential, the authors should address the following major comments:

        • The authors conclude that TNFR1 signaling drives murine SAVI disease, as evidenced by the reduced severity of lung disease in TNFR1 -/- mice. While the genetic model is convincing, the discrepancy between pharmacological inhibition and genetic models needs clarification. Before attributing the pharmacological failure to late administration, have the authors considered that Infliximab might not sufficiently deplete TNF to achieve therapeutic benefits? In figure 2H, serum TNF levels were not significantly altered in STING ki mice treated with Infliximab. Have the authors considered using other TNF inhibitors or alternative methods to measure TNF depletion efficacy in STING ki murine models, such as qPCR, flow cytometry, or immunohistochemistry in lymph nodes or lung tissues?

      Thank you for this suggestion. In a preliminary experiment that is not included in the paper, we already treated STING WT and STING ki mice with Etanercept. Three-week-old mice were treated by subcutaneous injection of 25 mg/kg Etanercept or saline, twice per week, for 7 weeks. After treatment, all mice were euthanized, and single-cell suspensions of blood and spleen were used for flow cytometry analysis. Lung tissue was harvested for histological analysis. Gene expression was quantified in snap-frozen lung and kidney tissue, and secreted proteins were quantified in snap-frozen serum.

      The transcription of ISGs and proinflammatory mediators in lung tissue was not significantly improved by Etanercept treatment (see additional figure below, A – D). Interestingly, the amount of secreted CXCL9 in the serum was reduced in Etanercept-treated mice compared to vehicle-treated mice (E). We concluded that our treatment strategy had no impact on the manifestation and progression of murine SAVI disease in highly inflamed tissues / organs. However, we found a (partially significant) reduction in the transcription of proinflammatory mediators in the kidney of Etanercept-treated mice compared to vehicle control mice. Murine SAVI disease is a systemic autoinflammatory disease without histological alterations in the kidney tissue of 10-week-old mice. Remarkably, transcription of ISGs and proinflammatory mediators is highly upregulated in SAVI mice. Treatment with Etanercept improved this aberrant gene expression in this SAVI-affected tissue / organ (I – K). These results encouraged us to perform the treatment with Infliximab, because we expected a more pronounced effect: Infliximab can bind both the monomeric and the trimeric form of TNF, whereas Etanercept can only bind the active trimeric form.

      Etanercept treatment of STING WT (in black) and STING ki (in red) mice.

      (A) Relative expression level of Cxcl10, (B) Mx1, (C) Tnf and (D) Il1b in lung tissue of Etanercept or saline treated STING WT and STING ki mice. (E) Quantification of CXCL9, (F) CXCL10, (G) IL-6 and (H) TNF in serum samples from STING WT and STING ki mice after treatment. (I) Relative expression level of Cxcl10, (J) Mx1, (K) Tnf and (L) Il1b in kidney tissue of treated mice.

      • The TNF pathway exhibits redundancy, as multiple signaling molecules or pathways can compensate for the loss of TNF function to maintain cellular processes and immune responses. The authors showed that thymocytes of STING ki mice lacking TNFR1/2 expressed significantly lower levels of IFN-related genes (Cxcl10, Sting1), and mice lacking TNFR1 and TNFR1/2 expressed reduced levels of NF-κB-related genes. Does this imply that IFN and NF-κB pathways are downstream of TNF signaling driving SAVI progression? It would be valuable to hear the authors' comments or postulations on the potential mechanisms of TNF driving SAVI progression in the discussion, and the methods to dissect the mechanisms further using genetic or pharmacological methods.

      Thank you for this suggestion. STING is a key player in various proinflammatory mechanisms and is directly involved in IFN and NF-κB signalling. We assume that these signalling pathways are adaptable to various proinflammatory situations. Knockout of TNFR1 and TNFR1/2 results in a strong inhibition of all inflammatory reactions in the whole organism. We think it is not possible to infer the mechanisms of murine SAVI manifestation and progression from the results of these mouse lines alone. These observations provide new hypotheses but cannot completely explain the mechanism.

      • The authors mentioned that the pharmacological inhibition of TNF by Infliximab is ineffective due to late administration compared to the onset of SAVI. How would this affect the therapeutic treatment of TNF if the treatment is going to be later than the disease onset? Can the authors elaborate on the potential ways to circumvent the timing of treatment? Would TNFR1 antagonists experience the same issue? To understand disease progression and optimal targeting times, the creation of an inducible TNFR1/2 -/- mouse model could be beneficial. This is optional, but the authors are encouraged to comment on improving TNFR1/2 -/- mouse SAVI models to further study the therapeutic potential of TNF signaling blockage in treating SAVI.

      We agree with the suggestion. In the next project, we want to generate STING ki mice with inducible knock out.

      Minor comments:

      • The authors separate STING WT and STING ki into different graphs, which can sometimes make it hard to compare STING WT and STING ki baseline levels. It would be beneficial to merge the two genotypes into single graphs for easier comparison.

      Thank you for this suggestion. In the first version of this manuscript, we collected results from STING WT and STING ki mice in one graph with 8 bars in different colours and textures in the case of TNFR knockout lines. These graphs were overloaded and very confusing. It was not possible to mark the statistical calculations inside these graphs without losing focus. Hence, we created the graph design shown. We think this is the most convincing version.

      • Figure S5 lacks statistical annotations, although the legends mention them. Are the statistics usually shown when a comparison is mentioned in the text, or are they only displayed when the differences are significant? It would be helpful if the authors could clarify this and ensure that all relevant statistical comparisons are clearly reflected in the graphs, regardless of the significance level. This consistency would improve the clarity and interpretation of the data presented.

      Thank you for this suggestion. We removed the significance level from the legend of Figure S5 (actual line 1199).

      The authors did an excellent job discussing the study's implications, but some of this content could be moved to the introduction. The hypothesis that "tumor necrosis factor (TNF) signaling is involved in the manifestation and progression of murine SAVI disease" can be introduced more naturally once the authors present previous findings on TNF's association with various autoimmune disorders. This would set a clear context for the study's objectives and rationale.

      We agree with this suggestion and inserted the sentence: “In our previous investigations, we observed an elevated transcription of Tnf in spleen and thymus of STING ki mice (Siedel et al., 2020).” (actual line 89/90).

      General Assessment: The study identifies enhanced TNF signaling as a driver of SAVI and specifies TNFR1 blockage as a promising treatment to reduce disease severity. It thoroughly characterizes pharmacological inhibition and genetic perturbations of TNF signaling in murine SAVI models and creates a novel mouse model for studying TNF-targeted therapies in SAVI treatment.

      However, the study is limited in characterizing the discrepancy between pharmacological inhibition and genetic depletion of TNF and understanding the underlying mechanisms of TNF driving chronic STING activation and tissue inflammation.

      Advances: The study extends knowledge in the field by demonstrating that enhanced TNF signaling drives SAVI, establishing causation rather than mere correlation. The authors provide strong rationale for treating SAVI with TNF inhibitors/blockage, previously used in other autoimmune disorders like IBD or Crohn's disease, but not in SAVI. They also present a valuable genetic model for studying TNFR signaling blockage in SAVI progression, which is important for both the field of SAVI and future therapy development.

      Audience: The research provides translational and clinical insights by suggesting that targeting TNFR1 signaling could inspire novel treatments for SAVI. The study also advances basic research on SAVI disease progression. Immunologists and clinicians studying and treating autoimmune disorders are the intended audience, but the findings have broader implications. The study highlights the potential role of TNF signaling in COVID-19 disease progression and treatment, thus attracting interest beyond the field of autoimmune disorders.

      Field of expertise:

      cGAS-STING regulation in chromosomally unstable cancers, genomic instability, nuclear envelope rupture and repair

      Do not have sufficient expertise in:

      Immunological underpinning of autoimmune disorders, clinical models or manifestations of SAVI

      Reviewer #3:

      We thank the reviewer for his/her time and for the constructive comments. Below please find our detailed responses to your points.

      Uncontrolled activation of STING is linked to autoinflammatory disease "STING-associated vasculopathy with onset in infancy (SAVI)". The authors had previously published a mouse model of SAVI, which was generated by knocking in the disease causing variant N153S into the endogenous murine Sting1 gene (STING ki) (Luksch et.al., 2019). In the current study, the author further investigated the role of tumor necrosis factor (TNF) signaling in manifestation and progression of murine SAVI disease by using the approach of pharmacologic and genetic inhibition of TNF receptors TNFR1 and TNFR2. Overall, the authors were able to demonstrate the following novel findings:

      1) Infliximab treatment of STING ki mice significantly increased the number of blood CD8+ T cells and the thymic cell count. The authors claimed that the pharmacological inhibition of TNF signalling has a partial rescue effect on T cell lymphopenia. However, pharmacologic inhibition of TNF signalling has no effect on lung disease.

      2) On the other hand, STING ki;Tnfr1-/- (lacking TNFR1) showed the similar modest rescue of the CD8+ T and CD4+ T cells in blood compared to the WT C57BL/6 (BL6) but not with STING ki;Tnfr2-/- (lacking TNFR2). STING ki;Tnfr1-/-, STING ki;Tnfr2-/- and STING ki;Tnfr1/2-/- had modest rescue of thymic cell count and reduced spleen cell count (reduced splenomegaly). Along with the rescued thymic content and reduced splenomegaly, genetic ablation of TNF signalling (STING ki;Tnfr1-/-) also prevented manifestation of severe inflammatory lung disease.

      3) To investigate the role of lung endothelial cells in the development of interstitial lung disease, primary murine lung endothelial cells from STING WT, STING ki and STING WT;Tnfr1/2-/- and STING ki;Tnfr1/2-/- mice were isolated and bulk RNAseq was performed. This showed decreased level of several proinflammatory cytokines (e.g. Tnf, Il1b) and chemokines (e.g. Cxcl1, Cxcl2, Cxcl9, Cxcl10, Ccl2, Ccl3 and Ccl4) in STING ki mice lacking TNFR1/2 compared to STING ki mice.

      4) Neutrophils were isolated from bone marrow and were added to cultured primary lung endothelial cell monolayers. The experiments demonstrated that the attachment and transmigration of neutrophil cells were dependent on expression of STING gain-of-function mutation in endothelial cells.

      A few points require clarification before publication of this study.

      1) Tnfr1-/-, Tnfr2-/- and Tnfr1/2-/- did not show any statistically significant improvement of thymic cell count in STING ki mice. As such, the statement in the conclusion/summary section of the discussion that Tnfr1 can restore thymocyte numbers should be toned down.

      Thank you for this suggestion. In Figure 4 E, we demonstrated that knockout of TNFR1 leads to an increase in the SP CD8 thymocyte count and, partially, in the SP CD4 thymocyte count (Fig. 4 D). In agreement with this suggestion, we now specify this subpopulation of thymocytes in the discussion and summary sections (see actual lines 684 and 794).

      2) The section on Neuroinflammation and neurodegeneration and dependency of TNFR1/2 signaling is currently very difficult to follow (based on how the data are presented in figures and text). This section needs to be rewritten for clarity.

      Thank you for this suggestion. We re-wrote this section, see line 472 - 499.

      Neuroinflammation and neurodegeneration in dependency of TNFR1/2 signaling

      The extent of inflammation in mouse brain resulting from constitutive activation of STING N153S was reported by quantifying the density of Iba1-positive microglia (Fig.5 A). Consistent with our previous findings (Szego et al., 2022), the density of Iba1-positive microglia in the substantia nigra was higher in STING ki;BL6 mice than in STING WT mice (Fig.5 B). TNFR deficiency did not affect neuroinflammation because there was no significant difference between the density of Iba1-positive microglia between STING ki;BL6 mice and STING ki;Tnfr1/2-/- mice (Fig.5 B). This suggests that the TNF pathway is not required for STING-induced microglia activation in the substantia nigra.

      In addition, we measured the extent of STING-induced astrogliosis by quantifying the density of GFAP-positive cells (Fig. 5 A). Consistent with our previous findings, the density of GFAP-positive astroglia was higher in STING ki than in STING WT mice (Fig. 5C). Yet, as for microglia, there was no significant difference between the density of GFAP-positive astroglia between STING ki;BL6 mice and STING ki;Tnfr1/2-/- mice (Fig.5 C), suggesting that the TNF pathway is not required for STING-induced astrogliosis in the substantia nigra.

      Finally, we measured the extent of STING-induced neurodegeneration by quantifying the density of TH-positive dopaminergic neurons in the substantia nigra (Fig. 5A). As in our previous findings, the density of TH-positive neurons was lower in STING ki;BL6 mice than in STING WT mice (Fig.5 D). The density of TH-positive neurons in the substantia nigra of STING ki;Tnfr1/2-/- mice was higher than the density of TH-positive neurons in the substantia nigra of STING ki;BL6 mice (Fig. 5 D), suggesting that the STING-induced degeneration of TH-positive neurons was blunted in Tnfr1/2-/- mice and that TNFR1/2 are involved in the STING-induced degeneration of dopaminergic neurons.

      Hence, there is a discrepancy between STING-induced effects on glial cells as opposed to STING-induced effects on neurons. The dependence of STING-induced neurodegeneration but not glial response on TNFR1/2 suggests that the STING-induced degeneration of dopaminergic neurons is not a direct consequence of microglia or astroglia activation. This is consistent with the emerging concept of a neuron-specific inflammatory response (Welikovitch et al., 2020).

      The powerful use of in vivo genetic KO models and TNF inhibitor makes this study a valuable contribution to the field, helping further decipher the importance of the NF-κB/TNF branch of STING in SAVI (knowledge gap). The audience for this work would be specialised to STING biology and potential clinical treatments of SAVI.

      Our expertise is in nucleic acids sensing (such as STING) and auto-immunity.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to Reviewers

      We thank the reviewers for their comments and suggestions, which we think are helpful and will improve the manuscript. We intend to address them with the changes and planned revisions below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Bello et al look at the effect of the SNP rs28834970, which is associated with Alzheimer's disease (AD) and has C as the risk allele, on chromatin accessibility and expression of a nearby gene, PTK2B, in microglia. Their contention is that the single SNP affects chromatin accessibility and binding of the transcription factor CEBP[beta] in an intronic region of PTK2B and thereby affects PTK2B expression. I had a few questions that I think are critical to be addressed. Please note that my numbering of panels is based on the figures, not the legends, which do not seem to quite agree with each other. There are also some figure legends that say "IFNg" while the figures say "LPS", which should be fixed.

      We apologise for the mistake in the figure legend that made this confusing, which we have now revised.

      The abstract says that editing a line that is homozygous for protective alleles to homozygous for risk results in "subtle downregulation of PTK2B expression". It isn't clear to me that the presented data fully supports this contention, which is central to the argument of the paper. In figure 2e, the authors show in both RNAseq and ddPCR that there is numerically lower PTK2B expression but this is not indicated to be statistically significant by one-way paired ANOVA. If there is no nominally significant difference in the edited lines, compared to the proposed significant differences in lines carrying the full risk haplotype (figure 1), then it would not seem sensible to ascribe the effects to the single edited base pair.

      We agree with the reviewer that given the effect of the SNP on PTK2B expression in the edited lines is small and only significant in macrophages, we should not interpret the effects to be mediated solely through PTK2B expression, and have substantially reworded the manuscript accordingly.

      Whilst the effects in the eQTL analysis are significant, it is worth noting that this is likely due to the much larger number of donors (133-217) giving greater power to detect the subtle changes in expression (~1.1 to 2 fold in eQTL). This change is of a similar magnitude in our SNP edited lines (~1.2 fold in SNP edited lines) as would be expected of most common regulatory variants so we believe that it could be the primary causal variant. However, we cannot exclude that other variants in the haplotype could contribute to the effect, so have also reworded accordingly to make this clear.

      Given this uncertainty about the overall strength of effect of the single base pair change it would seem important to evaluate the proposed mechanism of CEBPb binding. It wasn't clear whether the ATAC-seq data summarized in the volcano plot in 2C is proposed to be a cause or a consequence of the CEBPb binding change. I am assuming that the 'fold change' estimate here is CC compared to TT, which would be consistent with direction of effect in figure 1, but please clarify.

      We apologise for the mistake in the figure legend that made this confusing, which we have now revised along with clarification in the revised text. It is difficult to be sure whether changes in chromatin accessibility are a cause or consequence of CEBPb binding, but the fact that the binding of CEBPb is increased in the CC allele (Fig 2a, Fig 2c), that the C allele better matches the consensus sequence (Fig 2b) and there is increased chromatin accessibility (Fig 2a, Supp Fig 3b) suggests that CEBPb binding is causing the formation of the region of chromatin accessibility.

      In contrast to the subtle effects at PTK2B, the global transcriptional effects in figure 3 look quite strong. Are any of these changes dependent on PTK2B, that is to say, are they mimicked by partial suppression of PTK2B expression or activity?

      We agree that the downstream effects of the SNP are much stronger than the effects on PTK2B expression, and we have substantially reworded the manuscript to make it clear that we are unsure that the effects of the SNP are all mediated via PTK2B.

      However, we note that there is evidence in the literature of a loss in CCL4 and CCL5 expression upon PTK2B knockout in macrophages (https://www.nature.com/articles/s41467-021-27038-5) and inhibition of PTK2B in monocytes results in a reduction in CCL5 and CXCL1 (https://www.nature.com/articles/s41598-019-44098-2) consistent with our observations.

      Experiments to manipulate PTK2B expression in microglia and readout changes at the RNA level would take a few months to complete, but we would be willing to do this if the reviewer felt this was necessary.

      Finally, in figure 4, it should be clarified as to why lower expression of PTK2B would be expected to have a detrimental effect on Alzheimer's risk. If understood correctly, and again fixing the figure legends would be helpful, the CC edited lines (risk) have lower chemokine induction than the unedited TT lines.

      We apologise for the error in this figure which we have corrected in the revised version. You are correct that the CC lines have a lower chemokine level in both unstimulated and stimulated cells, and we have now discussed further how this may be linked to increased disease risk.

      "Even though overexpression of these chemokines is characteristic of neuroinflammation, correlated with disease progression and found in late stages of AD, knockout of chemokines, such as CCL2, and chemokine receptors, such as CCR2 and CCR5, in mice is associated with increased Aβ deposition and accumulation [47,50-52,107]. It has also been found that patients carrying CCR5Δ32 mutation, which prevents CCR5 surface expression, develop AD at a younger age[108]. Therefore, we hypothesize that in individuals carrying the C/C allele of rs28834970 downregulation of these chemokines in macrophages and microglia harbouring the C/C allele of rs28834970 affects Aβ-induced microglia chemotaxis, leukocytes recruitment and clearance of Aβ, and may increase the risk of developing symptomatic AD"

      Reviewer #1 (Significance (Required)):

      Going from GWAS hits, which represent blocks of high LD inherited variants, to single functional variants is a difficult problem in human genetics. The current paper attempts to isolate the effect of a single variant within an LD block on IPSC derived macrophages and microglia. This idea might be useful in nominating PTK2B as a therapeutic target for AD, although there is some question in my mind as to direction of effect.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      SUMMARY: In this manuscript the authors explore the biological effects of an intronic SNP in the PTK2B gene, previously shown to be associated with late onset Alzheimer's disease (AD) risk. Based on the likely effect of the SNP locus on PTK2B expression in the macrophage lineage, the authors explore the consequences of introducing with the Crispr/Cas9 technique the biallelic SNP base change (C/C vs T/T) in a human IPSC line that is then differentiated into macrophages or microglia. They observe that C/C increases chromatin accessibility and CEBPb binding in comparison to T/T, with a slight decrease in PTK2B expression, significant in macrophages but not in microglia. The authors then investigate the transcriptome changes induced by the C/C mutation and find alteration in many genes, including a decreased expression of a number of cytokine or receptor proteins involved in inflammatory responses. The authors also mention a decreased effect on IFNg-induced reduced mobility but the data are missing (see Figure errors below). Overall the authors propose that the risk SNP is associated with a decreased PTK2B expression and hypothesize a link between this change and a decreased function of macrophages/microglia that may contribute to AD pathology.

      MAJOR COMMENTS

      1- The authors claim that their results show that the investigated SNP has causal effects in "microglial function" (Title) and in Alzheimer's disease (AD) (Abstract 2nd sentence "Here we validate a causal single nucleotide polymorphism (SNP) associated with an increased risk of Alzheimer's disease"). The word "causal" is repeated many times. However, the authors should qualify their claim with respect to AD. Their results do show that the SNP has an effect on chromatin accessibility, CEBP binding, PTK2B expression and transcriptome, but the link between these changes is not formally demonstrated and their potential role in AD-like phenotype is not explored. The "causal" role is not formally and logically demonstrated. It remains an interesting, plausible hypothesis and the results provide strong arguments in support of that hypothesis but do not prove it, yet.

      Concerning the title, "causal effects on microglial function" is awkward, anything that has effects is logically "causal" in these effects. The title should be "... has effects on microglial functions" or "... alters microglial function".

      We agree with the reviewer that given the effect of the SNP on PTK2B expression in the edited lines is small and only significant in macrophages, we should not interpret the effects to be mediated solely through PTK2B expression, or that they cause AD. We have substantially reworded the manuscript throughout to account for this.

      2- One major difficulty in the results is to link the slight decrease in PTK2B transcript, which is only significant in macrophages, with the rest of the phenotype. Because what matters to make this link is not the mRNA but the protein, and because mRNA levels are often not strictly correlated with the protein levels, the authors should measure the PTK2B/PYK2 protein levels in their differentiated cell lines in basal conditions and following activation (as they do for other readouts) using immunoblotting. A robust and significant diminution in PYK2 protein would strongly support its role in linking PTK2B expression and transcriptome change.

      We have performed preliminary analyses of PTK2B expression by Western blot in these cell lines after differentiation, but were unable to observe a significant change in abundance in the edited cell lines. This is not unexpected given the results at the RNA level, since the effect size of this common regulatory variant is likely very small (estimated to be ~1.2 fold from the eQTL analysis), and likely within the variability of this assay.

      As mentioned above, we have reworded the manuscript to avoid interpreting that the effects of rs28834970 are mediated solely through effects on PTK2B expression. We think that an experiment to manipulate PTK2B levels (see next point) may be a better way to demonstrate whether these effects are mediated through PTK2B expression.

      An optional additional key experiment would be to reverse the transcriptome phenotype by increasing the expression of PTK2B (e.g. by cDNA transfection). Note that these points are important because an alternative hypothesis to explain the effects of C/C mutation on macrophage function would be that the C/C mutation has a long distance effect on other chromatin regions with key role in regulating these cells.

      We agree that this would be a valuable experiment, and are planning additional experiments to investigate the effect of manipulating PTK2B levels (through knockout) on microglia.

      3- The manuscript contains several errors in the figures and figure legends. In Fig. 2 the legends for the figure items are shuffled. Figure 4 and Supplementary Figure 5 are duplicates of the same one. Consequently important data are not presented.

      We apologise for the errors in these figures that were due to a mistake during uploading where the incorrect versions were used. The legends for figure 2 and panels in figure 4 have now been corrected, and show the effects of rs28834970 on microglial migration and chemokine release in the presence or absence of IFNg.

      4- When the number of replicates is small (e.g. n = 3) it is preferable to use non-parametric tests (rank analysis, e.g. the Mann-Whitney test) rather than a t-test. This applies to Figures 2D (current legend 2A), 2E (current legend 2B), Figure 4A-C, Supplementary Figures 2A, 2B. In Supplementary Fig 4E (MARCO) the number of replicates (presumably 3 because based on RNAseq) and the test used are not indicated. Is it the RNAseq statistical analysis?

      We thank the reviewer for this comment. We acknowledge that the t-test may lead to inflated false discovery rates. However, it has been shown that for small sample sizes parametric tests have a power advantage compared to non-parametric ones that may outweigh the possibly exaggerated false positives. See https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02648-4#Sec3 which states:

      "In conclusion, when the per-condition sample size is less than 8, parametric methods may be used because their power advantage may outweigh their possibly exaggerated false positives."

      We have also modified the legend of supplementary figure 4E to clarify the number of replicates used.
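
      One small-sample property relevant to this exchange: with n = 3 per group, an exact two-sided rank-sum (Mann-Whitney) test has only 20 possible group assignments, so its p-value cannot fall below 2/20 = 0.1. The minimal Python sketch below illustrates this by enumeration. The function name `exact_rank_sum_p` and the example values are hypothetical, are not taken from the manuscript, and ties are assumed to be absent.

```python
from itertools import combinations


def exact_rank_sum_p(x, y):
    """Two-sided exact permutation p-value for the rank-sum statistic.

    Illustrative helper (hypothetical, not from the manuscript).
    Assumes all values are distinct, i.e. no ties.
    """
    pooled = sorted(x + y)
    # Rank of each value in the pooled sample (1 = smallest).
    ranks = {value: i + 1 for i, value in enumerate(pooled)}
    n = len(x)
    total = len(pooled)
    observed = sum(ranks[value] for value in x)
    # Expected rank sum of group x under the null hypothesis.
    expected = n * (total + 1) / 2
    # Enumerate every way of assigning n of the ranks to group x.
    all_sums = [sum(c) for c in combinations(range(1, total + 1), n)]
    extreme = sum(
        1 for s in all_sums if abs(s - expected) >= abs(observed - expected)
    )
    return extreme / len(all_sums)


# Hypothetical, fully separated groups with three replicates each.
group_a = [1.0, 1.1, 1.2]
group_b = [2.0, 2.1, 2.2]
p_min = exact_rank_sum_p(group_a, group_b)
print(p_min)
```

      Under these assumptions `p_min` evaluates to 0.1 even for fully separated groups, which is one way to see why a rank-based test cannot reach the conventional 0.05 threshold at this sample size.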

      5- In addition to the above comment on tests, when the number of replicates is small it is not appropriate (and misleading) to show box plots or bars with SEM. In the indicated figures the individual data points should be shown.

      We now show individual replicates on box plots (Figure 2D, 2E and supp figure 4E).

      MINOR COMMENTS:

      a- Macrophages and microglia are very similar cell types. Could the authors comment more on the differences they observe and how they are related to those previously described?

      We have now referenced the original papers and commented on the markers that we see differentially expressed, notably P2RY12 which is a key homeostatic microglia marker that distinguishes these cells from macrophages.

      b- In Fig. 2A CEBPb cut and run plot, the differences are not limited to the SNP immediate vicinity, there are also visible differences between T/T and C/C plots in at least a 40-kb range. Is it due to multiple interactions of CEBPb? How can the point difference have broad consequences? Please explain this potentially interesting and relevant finding.

      Whilst there may be small changes in CEBPb binding at the second intronic PTK2B chromatin peak, this is not statistically significant given the variability between repeats. In fact, the only significant change we see in CEBPb binding genome-wide is at the locus overlapping the SNP (Fig 2c).

      c- Potentially cis-altered genes near the SNP include CHRNA2 and EPHX2 (see Sup. Fig. 3a). Their expression may not be detected in macrophage lineage. If this is the case please indicate in the text, otherwise please include the corresponding data in Sup. Fig. 3b to show the presence or absence of SNP-induced change.

      You are correct that CHRNA2 and EPHX2 are not expressed in our macrophages or microglia, and we have now explicitly stated this in the revised text.

      d- In general the Figures are not of very high quality and are difficult to read or understand without constantly going back and forth to the legends (which are mislabeled in some instances). To improve:

      . Please increase font size whenever possible.

      . Please improve Fig. 1d by indicating the position of the SNP, numbering the exons (an intermediate scale plot may be necessary and lines on bottom trace are hardly visible).

      . Please indicate the correct color code for T/T and C/C in Fig 3a and b, left panels, which currently doesn't match.

      . Please label the Venn's diagrams comparisons in Sup. Fig. 4b.

      . In the text and legends the Figure items are identified with letters in upper case, in the figures they are in lower case. Please be consistent.

      We have improved the resolution of the images in the pdf and Fig 1d has been revised to include the position of the SNP. The colour code for T/T and C/C is correct in fig 3a and 3b, but since the PCA plots are independently created, we would not always expect the position of the T/T and C/C alleles to be the same. The Venn diagrams in Sup Fig 4b have been updated, and the letters for the figure panels made consistently upper case throughout.

      e- In Fig. 2D and 2E, the Y axes should start at zero to avoid artificially increasing the visual differences. If there is a strong reason not to do so (I don't see any here), the Y axis should be clearly interrupted to avoid confusion.

      We have altered this accordingly.

      f- In the introduction the authors provide some background about previous work about the potential role of PTK2B/PYK2 in AD pathophysiology. The cited preclinical results suggest that PTK2B activity could have a deleterious effect (references in the manuscript). In contrast, some other reports (PMID: 29803828, 33718872) suggest a protective effect of PTK2B/PYK2. Because the evidence in the current manuscript suggests that the risk-associated SNP results in a decreased function of PTK2B/PYK2 (through decreased levels), at least in cells of the macrophage lineage, the authors could broaden their discussion to include these results.

      We have now discussed the conflicting evidence in the revised manuscript.

      Reviewer #2 (Significance (Required)):

      ADVANCE: Late onset Alzheimer's disease is a major medical issue. It has a complex genetic risk component with many associated loci identified in GWAS. Most of these have only a small individual impact on the risk. One of the SNPs associated with increased risk (rs28834970) is located in an intron of the PTK2B gene. Although various reports have investigated the role of the PTK2B gene product, the tyrosine kinase PYK2, in several AD models, the possible link with rs28834970, is unclear.

      An important point is to determine whether the T→C SNP corresponding to rs28834970 alters PTK2B expression and how it does so. An alternative hypothesis could be that the SNP is in strong linkage disequilibrium with an unidentified allele in human populations that could be responsible for AD risk. The current manuscript is a significant step forward in addressing that question. By generating a biallelic C/C SNP mutation in a human iPSC line, the current study makes it possible to rule out such a linked contribution.

      The strength of the manuscript is to show an effect on chromatin accessibility, CEBP binding and possibly PTK2B transcripts. It also provides interesting evidence of a broad effect of the C/C mutation on the transcriptome of macrophage lineage cells. In its current form the manuscript presents weaknesses that could be improved. These flaws include issues with the presentation discussed above and the incomplete demonstration that it is the decrease in PTK2B expression that causes the macrophage/microglia phenotype. If these flaws were overcome the paper would represent a significant advance.

      AUDIENCE: The expected audience is specialized in AD with a possible broader range if all weaknesses are addressed.

      REVIEWER EXPERTISE: Basic science close to the field.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Response to Reviewer #1:

      We agree with Reviewer #1 that a function of ROPGEFs in this process was expected to some degree. However, we want to point out that this manuscript focuses on the requirement of ROPGEFs and especially on the spatio-temporal description of ROP signalling polarisation and activation during pollen germination. Moreover, unlike the downstream ROPs, we show that ROPGEFs do not act in a strictly redundant manner, confirming results from root hair initiation and providing additional evidence that multiple signalling pathways are required for pollen germination and that ROPGEFs might be essential for bringing specificity to these signals.

      Major comments:

      1. Only one GEF11 mutant line, gef11-t1, was analyzed for germination ratio. It is presumptuous to conclude that GEF11 has no function in the pollen germination of Arabidopsis thaliana (line 241- line 242).

      After the initial negative results, we did not focus on GEF11 further. Thus, we fully agree that it is presumptuous to make such strong statements about the role of GEF11 during pollen germination. We generated additional gef11 mutant alleles for this revision plan using CRISPR/Cas9, as no other suitable lines were available. Moreover, we now have additional higher-order mutants available to demonstrate the function of GEF11 during pollen germination. These additional lines have been generated and confirmed and are currently growing. Thus, we will be able to add new results addressing this point in a timely manner, allowing us to make a better-founded statement about the function of GEF11 (see Response to Reviewer #2).

      Minor comments:

      1. In Figure 2A, pollen germination ratio was not provided for the single mutants gef8-c△3 and gef9-c△

      This is due to the generation process of the CRISPR/Cas9 alleles. These alleles were generated by a construct mutating both genes simultaneously; thus, these mutants are unavailable as single mutant lines. Instead of separating these alleles by outcrossing, we included additional single mutant alleles for both GEFs with a similar deletion. As all these CRISPR/Cas9 mutants have a complete deletion of the GEF-ORF, we are sure about the loss of the corresponding GEF function. The additional alleles account for possible non-specific effects.

      In Figure 3D, the subcellular localization of GEF12GEF8C is fuzzy. Better imaging is needed.

      We agree that the quality of these images is not ideal due to this specific line having less fluorescent signal. We screened for more lines of this construct and already performed more experiments. We will provide better images for this genotype.

      In Figure 3E, it is intriguing that both GEF8-S518A and GEF8-S518D are not associated with the PM in germinating pollen grains. Does it mean that phosphorylation at S518 is not relevant to polar distribution of GEF8?

      We also find this very intriguing, as we did not expect this result. However, we interpret it slightly differently: the S518 site is relevant for GEF polarisation, which might be conferred by RLK interaction. We think both mutant forms alter this potential association with RLKs, thus losing polarisation. We will include more imaging experiments of these constructs and additional lines to strengthen our results. Moreover, we generated lines to study the functionality and complementation capacity of these constructs, which will be included in a revised manuscript.

      T-DNA insertion lines, gef11-t1 and gef12-t1, need to be verified by PCRs in Figure S3D.

      Thanks for pointing this out. This control should be provided, and we will include the verification in the supplement.

      Response to Reviewer #2:

      Like Reviewer #2, we are also very intrigued by the biphasic accumulation of GEFs, as this is an entirely novel feature of this process. Like Reviewer #2, we also interpret this as an exploration and establishment phase, which could help us to understand how the pollen germination site is decided in species without aperture-dependent pollen germination.

      Major comments:

      1. In line 241, the authors conclude that GEF11 has no function in pollen germination. However, it is likely that GEF11 also plays a redundant role as GEF12 does. I recommend the authors check the phenotypes of gef11,gef12 double mutant and gef8,gef9,gef11 triple mutant to confirm that GEF11 has indeed no function. Otherwise, this conclusion should be better rephrased.

      This point is well justified and similar to the comment of Reviewer #1. As stated before, we had to generate additional lines for this. We will analyse an additional gef11 allele, gef8/gef11 and gef9/11 double mutants, and gef9/11/12 triple mutants to address the function of GEF11 in more detail. The conclusions of the original manuscript will, of course, be adjusted according to the new results.

      Although GEF12 is in the cytosol, the strong pollen germination defects in gef8,gef9,gef12 triple mutants do indicate a critical role of GEF12. Is it possible that GEFs could function in the cytosol? The authors can test this possibility by examining the rescuing ability of several constructs that express, for example, GEF12, GEF12(+GEF8C), GEF8(SA), or GEF8(SD) in gef8. The authors may not perform all of these rescue experiments, but some of the mentioned lines are already in hand. They could readily check the phenotypes.

      We thank the Reviewer for this great point. This information is crucial to discriminate the function of the individual GEFs. We have generated new lines expressing some of the mentioned constructs in the gef8 background to address this. We now have lines that complement gef8 with GEF12, GEF12GEF8C, GEF8S518A, GEF8S518D, and GEF8ΔC. We are currently performing experiments to determine the functionality of these constructs, which will allow us to make more conclusive statements about the function of GEFs in the cytosol and how important the PRONE domain alone, or the membrane attachment of GEFs, is for their function.

      The authors conclude that the C-terminus of GEF8 and GEF9 is necessary and sufficient for membrane localization because GEF8/9C can target GEF12 PRONE domain to the membrane. It is intriguing whether the C-terminus alone could confer membrane targeting ability. Currently, it is not fully understood how GEFs localize to the membrane. Examining the localization of GEF8/9C itself would help clarify this and improve our understanding of GEF regulation. Alternatively, the authors may discuss evidence that supports or disagrees with this possibility.

      This is a good suggestion by the reviewer, and it is indeed intriguing whether the C-terminus alone could confer membrane attachment. Meanwhile, we obtained plants expressing such constructs, showing that the C-terminus alone is insufficient for membrane attachment. This is not surprising, as these domains are largely disordered, and we suspect that the context of an adjacent PRONE domain is required to carry out this function. We will include our new results in the revised manuscript.

      Minor comments:

      1. The N- and C-terminus of GEF8 are predicted to inhibit complex formation. How is the prediction performed? Do the authors use monomer prediction or multimer prediction? AlphaFold2 has a low accuracy in predicting non-conserved regions. How confident are the predicted inhibitory contacts?

      We used the multimer prediction of AlphaFold2 for the shown structures. However, we fully agree that the predicted AlphaFold2 structures have low accuracy in that regard, especially for disordered domains like this. We will provide confidence models and predicted aligned error (PAE) plots for this structure. Additionally, we will put our conclusions in better perspective given these structure confidences and tone down our interpretations in this section.

      Localization of ROPs and calcium reporter in Figure 4 appears to be variable. It would help clarify the specific effects on each reporter if the authors present these data more quantitatively.

      We agree with the reviewer that some of the observations are variable. We will present the data more quantitatively, including overviews of the percentage of cells in which we observed the described phenomena and a more quantitative analysis of the strength and timing of signal accumulation (see also Response to Reviewer #3).

      Response to Reviewer #3:

      Major points:

      1. One of my major points is that the manuscript is now mainly based on the observations of individual pollen grains. These are then subjected to well-performed image analysis approaches but still represent somewhat anecdotal evidence (Fig 1A, B, Fig 3C-E, etc). The analysis and (numerical) presentation of a more robust data sample (which I presume the authors have acquired) would strengthen the ms considerably. This goes beyond the Figs - e.g. in l. 164-165 authors state rather vaguely, "we observed that mCit-GEF8 and mCit-GEF9 accumulated at a defined region in the cell periphery, which strongly correlated with the future germination site." Here, I would appreciate the data showing the actual correlation, if every germinated pollen grain displays GEF8/9 accumulation, whether there is a population of pollen grains showing the GEF8/9 transient but not germinating, etc...

      We very much appreciate the reviewer's comment, as this version of the manuscript indeed seems like we made our conclusions based on observations made from individual pollen. However, this is not the case. As the reviewer suspected, more data is available but not included in the manuscript. We have multiple observations for each of the shown constructs and only show a representative one. Furthermore, we imaged more pollen germination events of lines that showed variability and included additional lines for some constructs. We will provide a more quantitative analysis of the results to better represent the variability of the individual constructs, and we will adjust the manuscript accordingly (see comment 2).

      Where the authors analyse multiple cells, we are still missing some info - e.g. it is not stated what the error bars in Fig 1C, D represent (SD, SEM, CI?), size of the sample, etc. In any case, it is evident that there is quite substantial variability in the data, which is understandable. Maybe the authors can plot the individual profile lines along the average? Plus, GEF9 seems to have the maximum pre-germination localisation at -5 min rather than -9 min.

      We agree with the Reviewer that information is missing or not obviously stated. We will correct this for the revised manuscript. Moreover, we agree that the suggested way of showing the data would provide more information and allow a better representation of the results and their variability. This can be seen in the reviewer's interpretation of the results of GEF9. In this case, we see some variability in the timing of GEF9 accumulation, leading to the peak maximum shift. In a revised manuscript, we will, as suggested, show the data as individual lines, providing a better representation of the data. Moreover, we will include such representations for other used constructs to provide a general, more quantitative data analysis (see comment 1).

      I know it is very challenging, but the ms would be much stronger with the in vivo imaging of pollen germination on stigmatic papillae (i) GEF8/9 in wt, (ii) gef8/9 double mutant. This would bring crucial data about the role of the GEF polar domain and its functional relation to pollination.

      This would indeed be great to see. We put effort into establishing such in vivo imaging experiments with our fluorescent markers. However, we cannot image these events in an in vivo setup (at least with our resources). This has two reasons: 1. The events are very fast and limited to a small region at the pollen-papilla contact site, which we have difficulty resolving optically and temporally. 2. The marker lines used have only a low fluorescence level due to the native promoter, and stronger expression would lead to overexpression artefacts. Even in vitro, it is difficult to see the observed signal accumulation. In the in vivo situation, we face additional diffraction from the papilla cells, which would make the observation of GEF accumulation impossible with our microscopes.

      The phylogeny presented in Fig S1 is only rudimentary and not very interesting. Given the authors' results, I would love to see if GEF8/9 orthologs also exist in species with defined pollen apertures (where establishing a dynamic site makes little sense). The authors touch on this (L409-411), but it would deserve better analysis and discussion.

      We agree with the reviewer that studying GEF function/accumulation in species with aperture-dependent germination would be interesting. However, we cannot infer functional orthologs in other species based on phylogeny. Such phylogenetic analyses were done, for example, by Kim et al. (BMC Plant Biology, 2020, doi: 10.1186/s12870-020-2298-5). The issue is that all Arabidopsis pollen-expressed GEFs form a closed phylogenetic group, which does not allow us to determine which rice homolog is the functional ortholog of the respective Arabidopsis GEF (the same is true for maize). Thus, such phylogenetic analyses are not conclusive, and experimental data would be required to prove orthology. However, we agree that this point can be interpreted and discussed better, and we will include this in the revised manuscript.

      I am not entirely convinced by the authors' interpretation of rather strange S518 mutation data. Could S518A mutation affect overall GEF8 structure/stability?

      We were also suspicious about these results, as they were unexpected (see also Response to Reviewer #1). To confirm these results, we made additional lines for these constructs, double-checked that the constructs were correct and made more observations for both GEF8S518A and GEF8S518D. Additionally, we have started investigating the functionality of these constructs and will have these data available soon. Preliminary results suggest that the constructs are partially to fully functional compared to the WT GEF8, arguing against an effect of these mutations on structure or stability. We will include more data for these constructs in a revised manuscript to allow a more conclusive interpretation of these unexpected observations.

      Although the authors cannot observe the localisation of ROPs in the plasma membrane, they see the apparent accumulation of active ROP marker CRIB4 there - implying that ROPs must localise to the pollen PM at the germination site. This discrepancy should be solved or at least discussed more.

      The reviewer is correct in that we cannot observe ROP accumulation but rather the accumulation of ROP activity (as seen by CRIB4). This is in line with the observation made by Xiang et al. (2023, Plant Physiology, doi: 10.1093/plphys/kiad196), who also could not detect ROP accumulation. We are convinced that ROPs are present at the plasma membrane of the pollen germination site, but no accumulation is observable. We believe this is due to the high mobility of ROPs and that no accumulation is required, as only a few ROPs are sufficient to activate downstream signals. We will discuss these results in more detail in a revised manuscript to better explain the observed discrepancy.

      Given that calcium oscillates very rapidly in pollen and pollen tubes (with a period of ~6-20 s), the profound, long-term changes in calcium levels reported by the authors can hardly be referred to as oscillations. The phenomenon observed should again be analysed using a bigger sample.

      We agree that the terminology is not good, as it suggests similarities to the oscillations found in pollen tubes. Thus, we will change the revised manuscript and refer to the changes in Ca2+ levels as “elevations”. Moreover, we will provide a more quantitative analysis and a bigger sample size, as stated in Response to Reviewer #2.

      Minor points:

      1. In Fig 1F, GEF12 also seems to be polarly localised to the future site.

      The chosen sample is not ideal, as it looks like GEF12 would also slightly accumulate. However, as seen in the quantification of this cell, GEF12 does not significantly accumulate at the pollen germination site, and we never observed any accumulation of GEF12 that is comparable to GEF8 or GEF9. We will include another sample of this colocalisation in the revised manuscript to avoid misinterpretation of the data.

      It is difficult to make any assumptions based on the AlphaFold2 predictions without showing their confidence assessments (e.g., PAE plots). The authors state this themselves in the discussion (L. 447-449).

      As stated in the Response to Reviewer #2, we will include structures with confidence values and PAE plots in the supplement. We will additionally tone down our interpretation of these structure predictions to make clear that these structures should be interpreted carefully.

      On the one hand, the authors repeatedly state that pollen GEFs do not act in a redundant manner (and provide some evidence for it); on the other hand, the absence of an in vivo phenotype for single and double knockout lines and only a mild phenotype for a triple ko line does suggest a level of redundancy. This should be rephrased.

      We agree that this is not clearly phrased. In a revised version, we will change the manuscript to indicate which type and level of redundancy are described. We will discriminate between genetic redundancy, as seen in the mild in vivo effects, and non-redundant molecular function, as observed by protein localisation.

    1. Reviewer #1 (Public Review):

      Summary:

      The novel advance by Wang et al is in the demonstration that, relative to a standard extinction procedure, the retrieval-extinction procedure more effectively suppresses responses to a conditioned threat stimulus when testing occurs just minutes after extinction. The authors provide some solid evidence to show that this "short-term" suppression of responding involves engagement of the dorsolateral prefrontal cortex.

      Strengths:

      Overall, the study is well-designed and the results are potentially interesting. There are, however, a few issues in the way that it is introduced and discussed. Some of the issues concern clarity of expression/communication. However, others relate to a theory that could be used to help the reader understand why the results should have come out the way that they did. More specific comments and questions are presented below.

      Weaknesses:

      INTRODUCTION & THEORY

      (1) Can the authors please clarify why the first trial of extinction in a standard protocol does NOT produce the retrieval-extinction effect? Particularly as the results section states: "Importantly, such a short-term effect is also retrieval dependent, suggesting the labile state of memory is necessary for the short-term memory update to take effect (Fig. 1e)." The importance of this point comes through at several places in the paper:

      1A. "In the current study, fear recovery was tested 30 minutes after extinction training, whereas the effect of memory reconsolidation was generally evident only several hours later and possibly with the help of sleep, leaving open the possibility of a different cognitive mechanism for the short-term fear dementia related to the retrieval-extinction procedure." ***What does this mean? The two groups in study 1 experienced a different interval between the first and second CS extinction trials; and the results varied with this interval: a longer interval (10 min) ultimately resulted in less reinstatement of fear than a shorter interval. Even if the different pattern of results in these two groups was shown/known to imply two different processes, there is absolutely no reason to reference any sort of cognitive mechanism or dementia - that is quite far removed from the details of the present study.

      1B. "Importantly, such a short-term effect is also retrieval dependent, suggesting the labile state of memory is necessary for the short-term memory update to take effect (Fig. 1e)." ***As above, what is "the short-term memory update"? At this point in the text, it would be appropriate for the authors to discuss why the retrieval-extinction procedure produces less recovery than a standard extinction procedure as the two protocols only differ in the interval between the first and second extinction trials. References to a "short-term memory update" process do not help the reader to understand what is happening in the protocol.

      (2) "Indeed, through a series of experiments, we identified a short-term fear amnesia effect following memory retrieval, in addition to the fear reconsolidation effect that appeared much later." ***The only reason for supposing two effects is because of the differences in responding to the CS2, which was subjected to STANDARD extinction, in the short- and long-term tests. More needs to be said about how and why the performance of CS2 is affected in the short-term test and recovers in the long-term test. That is, if the loss of performance to CS1 and CS2 is going to be attributed to some type of memory updating process across the retrieval-extinction procedure, one needs to explain the selective recovery of performance to CS2 when the extinction-to-testing interval extends to 24 hours. Instead of explaining this recovery, the authors note that performance to CS1 remains low when the extinction-to-testing interval is 24 hours and invoke something to do with memory reconsolidation as an explanation for their results: that is, they imply (I think) that reconsolidation of the CS1-US memory is disrupted across the 24-hour interval between extinction and testing even though CS1 evokes negligible responding just minutes after extinction.

      (3) The discussion of memory suppression is potentially interesting but, in its present form, raises more questions than it answers. That is, memory suppression is invoked to explain a particular pattern of results but I, as the reader, have no sense of why a fear memory would be better suppressed shortly after the retrieval-extinction protocol compared to the standard extinction protocol; and why this suppression is NOT specific to the cue that had been subjected to the retrieval-extinction protocol.

      3A. Relatedly, how does the retrieval-induced forgetting (which is referred to at various points throughout the paper) relate to the retrieval-extinction effect? The appeal to retrieval-induced forgetting as an apparent justification for aspects of the present study reinforces points 2 and 3 above. It is not uninteresting but needs some clarification/elaboration.

      (4) Given the reports by Chalkia, van Oudenhove & Beckers (2020) and Chalkia et al (2020), some qualification needs to be inserted in relation to reference 6. That is, reference 6 is used to support the statement that "during the reconsolidation window, old fear memory can be updated via extinction training following fear memory retrieval". This needs a qualifying statement like "[but see Chalkia et al (2020a and 2020b) for failures to reproduce the results of 6]."

      https://pubmed.ncbi.nlm.nih.gov/32580869/
      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7115860/

      CLARIFICATIONS, ELABORATIONS, EDITS

      (5) The Abstract was not easy to follow:

      5A. What does it mean to ask: "whether memory retrieval facilitates update mechanisms other than memory reconsolidation"? That is, in what sense could or would memory retrieval be thought to facilitate a memory update mechanism?

      5B. "First, we demonstrate that memory reactivation prevents the return of fear shortly after extinction training in contrast to the memory reconsolidation effect which takes several hours to emerge and such a short-term amnesia effect is cue independent (Study 1, N = 57 adults)." ***The phrasing here could be improved for clarity: "First, we demonstrate that the retrieval-extinction protocol prevents the return of fear shortly after extinction training (i.e., when testing occurs just minutes after the end of extinction)." Also, cue-dependence of the retrieval-extinction effect was assessed in study 2.

      5C. "Furthermore, memory reactivation also triggers fear memory reconsolidation and produces cue-specific amnesia at a longer and separable timescale (Study 2, N = 79 adults)."
      ***In study 2, the retrieval-extinction protocol produced a cue-specific disruption in responding when testing occurred 24 hours after the end of extinction. This result is interesting but cannot be easily inferred from the statement that begins "Furthermore..." That is, the results should be described in terms of the combined effects of retrieval and extinction, not in terms of memory reactivation alone; and the statement about memory reconsolidation is unnecessary. One can simply state that the retrieval-extinction protocol produced a cue-specific disruption in responding when testing occurred 24 hours after the end of extinction.

      5D. "...we directly manipulated brain activities in the dorsolateral prefrontal cortex and found that both memory retrieval and intact prefrontal cortex functions were necessary for the short-term fear amnesia."
      ***This could be edited to better describe what was shown: E.g., "...we directly manipulated brain activities in the dorsolateral prefrontal cortex and found that intact prefrontal cortex functions were necessary for the short-term fear amnesia after the retrieval-extinction protocol."

      5E. "The temporal scale and cue-specificity results of the short-term fear amnesia are clearly dissociable from the amnesia related to memory reconsolidation, and suggest that memory retrieval and extinction training trigger distinct underlying memory update mechanisms."
      ***The pattern of results when testing occurred just minutes after the retrieval-extinction protocol was different from that obtained when testing occurred 24 hours after the protocol. Describing this in terms of temporal scale is unnecessary, and suggesting that memory retrieval and extinction trigger different memory update mechanisms is not obviously warranted. The results of interest are due to the combined effects of retrieval+extinction and there is no sense in which different memory update mechanisms should be identified with retrieval (mechanism 1) and extinction (mechanism 2).

      5F. "These findings raise the possibility of concerted memory modulation processes related to memory retrieval..."
      ***What does this mean?

      (6) "...suggesting that the fear memory might be amenable to a more immediate effect, in addition to what the memory reconsolidation theory prescribes..."
      ***What does it mean to say that the fear memory might be amenable to a more immediate effect?

      (7) "Parallel to the behavioral manifestation of long- and short-term memory deficits, concurrent neural evidence supporting memory reconsolidation theory emphasizes the long-term effect of memory retrieval by hypothesizing that synapse degradation and de novo protein synthesis are required for reconsolidation."
      ***This sentence needs to be edited for clarity.

      (8) "previous behavioral manipulations engendering the short-term declarative memory effect..."
      ***What is the declarative memory effect? It should be defined.

      (9) "The declarative amnesia effect emerges much earlier due to the online functional activity modulation..."
      ***Even if the declarative memory amnesia effect had been defined, the reference to online functional activity modulation is not clear.

      (10) "However, it remains unclear whether memory retrieval might also precipitate a short-term amnesia effect for the fear memory, in addition to the long-term prevention orchestrated by memory consolidation."
      ***I found this sentence difficult to understand on my first pass through the paper. I think it is because of the phrasing of memory retrieval. That is, memory retrieval does NOT precipitate any type of short-term amnesia for the fear memory: it is the retrieval-extinction protocol that produces something like short-term amnesia. Perhaps this sentence should also be edited for clarity.

      I will also note that the usage of "short-term" at this point in the paper is quite confusing: Does the retrieval-extinction protocol produce a short-term amnesia effect, which would be evidenced by some recovery of responding to the CS when tested after a sufficiently long delay? I don't believe that this is the intended meaning of "short-term" as used throughout the majority of the paper, right?

      (11) "To fully comprehend the temporal dynamics of the memory retrieval effect..."
      ***What memory retrieval effect? This needs some elaboration.

      (12) "We hypothesize that the labile state triggered by the memory retrieval may facilitate different memory update mechanisms following extinction training, and these mechanisms can be further disentangled through the lens of temporal dynamics and cue-specificities."
      ***What does this mean? The first part of the sentence is confusing around the usage of the term "facilitate"; and the second part of the sentence that references a "lens of temporal dynamics and cue-specificities" is mysterious. Indeed, as all participants received the same retrieval-extinction exposures in Study 2, it is not clear how or why any differences between the groups are attributed to "different memory update mechanisms following extinction".

      (13) "In the first study, we aimed to test whether there is a short-term amnesia effect of fear memory retrieval following the fear retrieval-extinction paradigm."
      ***Again, the language is confusing. The phrase, "a short-term amnesia effect" implies that the amnesia itself is temporary; but I don't think that this implication is intended. The problem is specifically in the use of the phrase "a short-term amnesia effect of fear memory retrieval." To the extent that short-term amnesia is evident in the data, it is not due to retrieval per se but, rather, the retrieval-extinction protocol.

      (14) The authors repeatedly describe the case where there was a 24-hour interval between extinction and testing as consistent with previous research on fear memory reconsolidation. Which research exactly? That is, in studies where a CS re-exposure was combined with a drug injection, responding to the CS was disrupted in a final test of retrieval from long-term memory which typically occurred 24 hours after the treatment. Is that what the authors are referring to as consistent? If so, which aspect of the results are consistent with those previous findings? Perhaps the authors mean to say that, in the case where there was a 24-hour interval between extinction and testing, the results obtained here are consistent with previous research that has used the retrieval-extinction protocol. This would clarify the intended meaning greatly.

      DATA

      (15) Points about data:

      15A. The eight participants who were discontinued after Day 1 in study 1 were all from the no-reminder group. Can the authors please comment on how participants were allocated to the two groups in this experiment so that the reader can better understand why the distribution of non-responders was non-random (as it appears to be)?

      15B. Similarly, in study 2, of the 37 participants that were discontinued after Day 2, 19 were from Group 30 min, and 5 were from Group 6 hours. Can the authors comment on how likely these numbers are to have been by chance alone? I presume that they reflect something about the way that participants were allocated to groups, but I could be wrong.

      15C. "Post hoc t-tests showed that fear memories were resilient after regular extinction training, as demonstrated by the significant difference between fear recovery indexes of the CS+ and CS- for the no-reminder group (t26 = 7.441, P < 0.001; Fig. 1e), while subjects in the reminder group showed no difference of fear recovery between CS+ and CS- (t29 = 0.797, P = 0.432, Fig. 1e)."
      ***Is the fear recovery index shown in Figure 1E based on the results of the 1st test trial only? How can there have been a "significant difference between fear recovery indexes of the CS+ and CS- for the no-reminder group" when the difference in responding to the CS+ and CS- is used to calculate the fear recovery index shown in 1E? What are the t-tests comparing exactly, and what correction is used to account for the fact that they are applied post-hoc?

      15D. "Finally, there is no statistical difference between the differential fear recovery indexes between CS+ in the reminder and no reminder groups (t55 = -2.022, P = 0.048; Fig. 1c, also see Supplemental Material for direct test for the test phase)."
      ***Is this statement correct - i.e., that there is no statistically significant difference in fear recovery to the CS+ in the reminder and no reminder groups? I'm sure that the authors would like to claim that there IS such a difference; but if such a difference is claimed, one would be concerned by the fact that it is coming through in an uncorrected t-test, which is the third one of its kind in this paragraph. What correction (for the Type 1 error rate) is used to account for the fact that the t-tests are applied post-hoc? And if no correction, why not?

      15E. In study 2, why is responding to the CS- so high on the first test trial in Group 30 min? Is the change in responding to the CS- from the last extinction trial to the first test trial different across the three groups in this study? Inspection of the figure suggests that it is higher in Group 30 min relative to Groups 6 hours and 24 hours. If this is confirmed by the analysis, it has implications for the fear recovery index which is partly based on responses to the CS-. If not for differences in the CS- responses, Groups 30 minutes and 6 hours are otherwise identical.

      15F. Was the 6-hour group tested at a different time of day compared to the 30-minute and 24-hour groups; and could this have influenced the SCRs in this group?

      15G. Why is the range of scores in "thought control ability" different in the 30-minute group compared to the 6-hour and 24-hour groups? I am not just asking about the scale on the x-axis: I am asking why the actual distribution of the scores in thought control ability is wider for the 30-minute group?

      (16) During testing in each experiment, how were the various stimuli presented? That is, was the presentation order for the CS+ and CS- pseudorandom according to some constraint, as it had been in extinction? This information should be added to the method section.

      (17) "These results are consistent with previous research which suggested that people with better capability to resist intrusive thoughts also performed better in motivated forgetting in both declarative and associative memories."
      ***Which parts of the present results are consistent with such prior results? It is not clear from the descriptions provided here why thought control ability should be related to the present findings or, indeed, past ones in other domains. This should be elaborated to make the connections clear.
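
      Points 15B and 15D above ask how likely the dropout split is under chance and what a post-hoc correction would do to the reported p-values. The following is a minimal sketch of both checks, not the authors' analysis. It rests on several assumptions that are not stated in the review: that participants were allocated to the three Study 2 groups in equal proportions, that the remaining 13 of the 37 discontinued participants (37 - 19 - 5) came from the 24-hour group, that "P < 0.001" can be stood in for by its upper bound of 0.001, and that Holm's step-down procedure is an acceptable choice of correction. The function names are illustrative only.

```python
import math


def chi_square_gof_three_groups(observed):
    """Chi-square goodness-of-fit test against equal expected counts.

    Restricted to exactly three groups (df = 2), where the chi-square
    survival function has the closed form exp(-x / 2).
    """
    if len(observed) != 3:
        raise ValueError("this sketch only handles three groups (df = 2)")
    expected = sum(observed) / 3  # ASSUMPTION: equal allocation to groups
    stat = sum((o - expected) ** 2 / expected for o in observed)
    p_value = math.exp(-stat / 2)
    return stat, p_value


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# 15B: discontinued participants per group (30 min, 6 hours, 24 hours).
# ASSUMPTION: the 24-hour count is inferred as 37 - 19 - 5 = 13.
dropout_stat, dropout_p = chi_square_gof_three_groups([19, 5, 13])

# 15C/15D: the three post-hoc t-test p-values quoted in the review.
# ASSUMPTION: "P < 0.001" is replaced by its upper bound, 0.001.
raw_p = [0.001, 0.432, 0.048]
holm_p = holm_adjust(raw_p)

print(f"dropout chi-square = {dropout_stat:.2f}, p = {dropout_p:.4f}")
print("Holm-adjusted p-values:", [round(p, 3) for p in holm_p])
```

      Under these assumptions the dropout split gives a chi-square of 8.00 (p of about 0.018), and the between-group comparison reported at P = 0.048 moves to about 0.096 after Holm adjustment. Both figures would change if the actual allocation ratios or the exact first p-value differ from what is assumed here.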

    2. Reviewer #3 (Public Review):

      SUMMARY

      Wang et al. have addressed how acquired fear and extinction memories evolve over time. Using a retrieval-extinction procedure in healthy humans, they have investigated the recovery of fear memories 30-60 minutes, 6 hours, and 24 hours after the retrieval-extinction phase. They have addressed this research question through 3 different experiments which included manipulations of the reminder cue, the time interval, and brain activity. Together, the studies suggest that early on after retrieval-extinction (30-60 min. later), retrieval-extinction may lead to an attenuation of fear recovery (after reinstatement) for all fear cues, including the non-reminded ones. Study 3 moreover suggests that this effect may depend on normal dlPFC function. In addition, the paper also contains data in line with prior findings suggesting that a 6-hour interval does not benefit from the reminder cue, and that a 24-hour interval does, and specifically for the reminded fear cue. The latter findings are seen as evidence of fear memory reconsolidation.

      STRENGTHS

      (1) The paper combines three related human fear conditioning studies, each with decent sample sizes. The authors are transparent about the fact that they excluded many participants and about which conditions they belonged to.

      (2) The effect that this paper investigates (short-term fear memory after a retrieval-extinction procedure) has not been studied extensively, thus making it a relevant topic.

      (3) The application of brain stimulation as a means to study causal relationships is interesting and goes beyond the purely behavioral or pharmacological interventions that are often used in human fear conditioning research. Also, the use of an active control stimulation is a strength of the study.

      WEAKNESSES

      (1) The entire study hinges on the idea that there is memory 'suppression' if (1) the CS+ was reminded before extinction and (2) the reinstatement and memory test takes place 30 minutes later (in Studies 1 & 2). However, the evidence supporting this suppression idea is not very strong. In brief, in Study 1, the effect seems to only just reach significance, with a medium effect size at best, and, moreover, it is unclear if this is the correct analysis (which is a bit doubtful, when looking at Figure 1D and E). In Study 2, there was no optimal control condition without reminder and with the same 30-min interval (which is problematic, because we can assume generalization between CS1+ and CS2+, as pointed out by the authors, and because generalization effects are known to be time-dependent). Study 3 is more convincing, but entails additional changes in comparison with Studies 1 and 2, i.e., applications of cTBS and an interval of 1 hour instead of 30 minutes (the reason for this change was not explained). So, although the findings of the 3 studies do not contradict each other and are coherent, they do not all provide strong evidence for the effect of interest on their own.

      Related to the comment above, I encourage the authors to double-check if this statement is correct: "Also, our results remain robust even with the "non-learners" included in the analysis (Fig. S1 in the Supplemental Material)". The critical analysis for Study 1 is a between-group comparison of the CS+ and CS- during the last extinction trial versus the first test trial. This result only just reached significance with the selected sample (p = .048), and Figures 1D and E even seem to suggest otherwise. I doubt that the analysis would reach significance when including the "non-learners" - assuming that this is what is shown in Supplemental Figure 1 (which shows the data from "all responded participants").

      Also related to the comment above, I think that the statement "suggesting a cue-independent short-term amnesia effect" in Study 2 is not correct and should read: "suggesting extinction of fear to the CS1+ and CS2+", given that the response to the CS+'s is similar to the response to the CS-, as was the case at the end of extinction. Also the next statement "This result indicates that the short-term amnesia effect observed in Study 2 is not reminder-cue specific and can generalize to the non-reminded cues" is not fully supported by the data, given the lack of an appropriate control group in this study (a group without reinstatement). The comparison with the effect found in Study 1 is difficult because the effect found there was relatively small (and may have to be double-checked, see remarks above), and it was obtained with a different procedure using a single CS+. The comparison with the 6-h and 24-h groups of Study 2 is not helpful as a control condition for this specific question (i.e., is there reinstatement of fear for any of the CS+'s) because of the large procedural difference with regard to the intervals between extinction and reinstatement (test).

      (2) It is unclear which analysis is presented in Figure 3. According to the main text, it either shows the "differential fear recovery index between CS+ and CS-" or "the fear recovery index of both CS1+ and CS2+". The authors should clarify what they are analyzing and showing, and clarify to which analyses the ** and NS refer in the graphs. I would also prefer the X-axes and particularly the Y-axes of Fig. 3a-b-c to be the same. The image is a bit misleading now. The same remarks apply to Figure 5.

      (3) In general, I think the paper would benefit from being more careful and nuanced in how the literature and findings are represented. First of all, the authors may be more careful when using the term 'reconsolidation'. In the current version, it is put forward as an established and clearly delineated concept, but that is not the case. It would be useful if the authors could change the text in order to make it clear that the reconsolidation framework is a theory, rather than something that is set in stone (see e.g., Elsey et al., 2018 (https://doi.org/10.1037/bul0000152), Schroyens et al., 2022 (https://doi.org/10.3758/s13423-022-02173-2)).

      In addition, the authors may want to reconsider if they want to cite Schiller et al., 2010 (https://doi.org/10.1038/nature08637), given that neither the main findings of this paper nor the analyses could be replicated (see Chalkia et al., 2020 (https://doi.org/10.1016/j.cortex.2020.04.017; https://doi.org/10.1016/j.cortex.2020.03.031)).

      Relatedly, it should be clarified that Figure 6 is largely speculative, rather than a proven model as it is currently presented. This is true for all panels, but particularly for panel c, given that the current study does not provide any evidence regarding the proposed reconsolidation mechanism.

      Lastly, throughout the paper, the authors equate skin conductance responses (SCR) with fear memory. It should at least be acknowledged that SCR is just one aspect of a fear response, and that it is unclear whether any of this would translate to verbal or behavioral effects. Such effects would be particularly important for any clinical application, which the authors put forward as the ultimate goal of the research.

      (4) The Discussion quite narrowly focuses on a specific 'mechanism' that the authors have in mind. Although it is good that the Discussion is to the point, it may be worthwhile to entertain other options or (partial) explanations for the findings. For example, have the authors considered that there may be an important role for attention? When testing very soon after the extinction procedure (and thus after the reminder), attentional processes may play an important role (more so than with longer intervals). The retrieval procedure could perhaps induce heightened attention to the reminded CS+ (which could be further enhanced by dlPFC stimulation)?

      (5) There is room for improvement in terms of language, clarity of the writing, and (presentation of the) statistical analyses, for all of which I have provided detailed feedback in the 'Recommendations for the authors' section. Idem for the data availability; they are currently not publicly available, in contrast with what is stated in the paper. In addition, it would be helpful if the authors would provide additional explanation or justification for some of the methodological choices (e.g., the 18-s interval and why stimulate 8 minutes after the reminder cue, the choice of stimulation parameters), and comment on reasons for (and implications of) the large number of excluded participants (>25%).

      Finally, I think several statements made in the paper are overly strong in light of the existing literature (or the evidence obtained here) or imply causal relationships that were not directly tested.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Review, 3D SIM + AO, Wang and coworkers

      In this manuscript, Wang and coworkers report an upright 3D SIM system with adaptive optics (AO) correction. They demonstrate that AO improves imaging into thick 3D samples, including Drosophila larval brain. They also explore the use of remote focusing with their setup. The authors clearly demonstrate a gain with AO, and we are convinced that the microscope they built offers some utility over existing state of the art, particularly in samples thicker than a single cell. That said, we have concerns with the manuscript that we would like to see addressed before recommending publication:

      • Given the emphasis on super-resolution imaging deep inside a sample, we were surprised to see no mention of other forms of structured illumination that allow super-resolution imaging in samples thicker than a single cell. These include the 'spot-scanning' implementations of SIM that offer better imaging at depth by virtue of pinholes, and include MSIM, iSIM, and rescan confocal technologies. The two-photon / AO implementation of iSIM seems particularly germane, e.g. https://pubmed.ncbi.nlm.nih.gov/28628128/ Please consider citing these works, as they help place the existing work into context.
      • As we're sure the authors appreciate, besides aberrations, a major additional obstacle to 3D SIM in thick tissues is the presence of out-of-focus background. Indeed, this point was mentioned by Gustafsson in his classic 2008 paper on 3D SIM (https://pubmed.ncbi.nlm.nih.gov/18326650/): 'The application area of three-dimensional structured illumination microscopy overlaps with that of confocal microscopy, but the two techniques have different and complementary strengths. Structured illumination microscopy offers higher effective lateral resolution, because it concentrates much of the excitation light at the very highest illumination angles, which are most effective for encoding high-resolution information into the observed data, whereas confocal microscopy spreads out its illumination light more or less uniformly over all available angles to form a focused beam. For very thick and compactly fluorescent samples, however, confocal microscopy has an advantage in that its pinhole removes out-of-focus light physically. Structured illumination microscopy is quite effective at removing out-of-focus light computationally, because it is not subject to the missing-cone problem, but computational removal leaves behind the associated shot noise. Therefore confocal microscopy may be preferable on very thick and dense samples, for which the in-focus information in a conventional microscope image would be overwhelmed by out-of-focus light, whereas structured illumination microscopy may be superior in a regime of thinner or sparser samples.' This point is not mentioned at all in the manuscript, yet we are certain it is at least partially responsible for the residual image artifacts the authors mention. Please discuss the problem of out-of-focus light on 3D samples, particularly with an eye to the 'spot-scanning' papers mentioned above.
      • The authors use a water dipping lens, yet they image into samples that are mounted on coverslips, i.e. they use a dipping lens to image through a coverslip: see attached pdf for reference

      This almost certainly introduces spherical aberration, which the authors seem to observe: see attached pdf for reference

      We find this troubling, as it seems that in the process of building their setup, the authors have made a choice of objective lens that introduces aberrations - that they later correct. At the very least, this point needs to be acknowledged in the manuscript (or please correct us if we're wrong) - as it renders the data in Figs. 3-4 somewhat less compelling than if the authors used an objective lens that allowed correction through a coverglass, e.g. a water dipping lens with a correction collar. In other words, in the process of building their AO setup, the authors have introduced system aberrations that render the comparison with 3D SIM somewhat unfair. Ideally the authors would show a comparison with an objective lens that can image through a glass coverslip.
      • The authors tend to include numbers for resolution without statistics. This renders the comparisons meaningless in my opinion; ideally every number would have a mean and error bar associated with it. We have included specific examples in the minor comments below.
      • In Fig. 5, after the 'multipoint AO SIM', the SNR in some regions seems to decrease after AO: see attached pdf for reference

      Please comment on this issue.

      • Please provide timing costs for the indirect AO methods used in the paper, so the reader understands how this time compares to the time required for taking a 3D SIM stack. In a similar vein, the authors in Lines 213-215, mention a 'disproportionate measurement time' when referring to the time required for AO correction at each plane - providing numbers here would be very useful to a reader, so they can judge for themselves what this means. What is the measurement time, why is it so long, and how does it compare to the time for 3D SIM? It would also be useful to provide a comparison between the time needed for AO correction at each (or two) planes without remote focusing (RF) vs. with RF, so the reader understands the relative temporal contributions of each part of the method. We would suggest, for the data shown in Fig. 5, to report a) the time to acquire the whole stack without AO (3D SIM only); b) the time to acquire the data as shown; c) the time to acquire the AO stack without RF. This would help bolster the case for remote focusing in general; as is, we are not sure we buy that this is a capability worth having, at least for the data shown in this paper.
      • Some further discussion on possibly extending the remote focusing range would be helpful. We gather that limitations arose from an older model of the DM being used, due to creep effects. We also gather from the SI that edge effects at the periphery of the DM were also problematic. Are these limitations likely non-issues with modern DMs, and how much range could one reasonably expect to achieve as a result? We are wondering if the 10 um range is a fundamental practical limitation or if in principle it could be extended with commercial DMs.

      Minor comments

      • The paper mentions Ephys multiple times, even putting micromanipulators into Fig. 1 - although it is not actually used in this paper. If including in Figure 1, please make it clear that these additional components are aspirational and not actually used in the paper.
      • The abstract mentions '3D SIM microscopes'; 'microscopes' is redundant, as the 'M' in 'SIM' stands for 'microscopy'.
      • 'fast optical sectioning', line 42, how can optical sectioning be 'fast'? Do they mean rapid imaging with optical sectioning?
      • line 59, 'effective imaging depth may be increased to some extent using silicone immersion objectives', what about water immersion objectives? We would guess these could also be used.
      • line 65 - evidence for 'water-dipping objectives are more sensitive to aberrations' ? Please provide citation or remove. They are certainly more prone to aberrations if used with a coverslip as done here.
      • 'fast z stacks' is mentioned in line 103. How fast is fast?
      • line 116 'we imaged 100 nm diameter green fluorescent beads'. Deposited on glass? Given that this paper is about imaging deep this detail seems worth specifying in the main text.
      • lines 127-130, when describing changes in the bead shape with numbers for the FWHM, please provide statistics - quoting single numbers for comparison is almost useless and we cannot conclude that there is a meaningful improvement without statistics.
      • In the same vein, how can we understand that remote focus actually improves the axial FWHM of the widefield bead? Is this result repeatable, or is it just noise?
      • line 155, 'Because of the high spatial information...' -> 'Because of the high resolution spatial information...'
      • When quoting estimated resolution #s from microtubules (lines 158-163) similarly please provide statistics as for beads.
      • It seems worth mentioning the mechanism of AO correction (i.e. indirect sensing) in the main body of the text, not just the methods.
      • How long do the AO corrections take for the datasets in the paper?
      • Were the datasets in Fig. 2-4 acquired with remote focusing, or in conventional z stack mode? Please clarify this point in the main text and the figure captions.
      • It would be helpful when showing z projections in Figs. 3-5 to indicate the direction of increasing depth (we assume this is 'down' due to the upright setup, but this would be good to clarify)
      • line 174, 'showed significant improvements in both intensity and contrast after reconstruction' - we see the improvements in contrast and resolution, it is harder to appreciate improvements in intensity. Perhaps if the authors showed some line profiles or otherwise quantified intensity this would be easier to appreciate.
      • line 195 'reduced artefacts' due to AO. We would agree with this statement - the benefit from AO is obvious, and yet there are still artefacts. If the authors could clarify what these (residual) artefacts are, and their cause (out of focus light, uncorrected residual aberrations, etc) this would be helpful for a reader that is not used to looking at 3D SIM images.
      • Line 197, 'expected overall structure', please clarify what is expected about the structure and why.
      • Line 199, what is a 'pseudo structure'?
      • Fig. 4B, 'a resolution of ~200 nm is retained at depth', please clarify how this estimate was obtained, ideally with statistics.
      • Fig. 4D, please comment on the unphysical negative valued intensities in Fig. 4D, ideally explaining their presence in the caption. It would also be helpful to highlight where in the figure these plots arise, so the reader can visually follow along.
      • Line 245, 'rapid mitosis'. What does rapid mean, i.e. please provide the expected timescale for mitosis.
      • For the data in Fig. 6, was remote refocusing necessary?
      • What is the evidence for 'reduced residual aberrations', was a comparative stack taken without AO? In general we feel that the results shown in Fig. 6 would be stronger if there were comparative results shown without AO (or remote focusing).
      • Line 350, 'incorporation of denoising algorithms' - citations would be helpful here.
      • Line 411, 'All three were further developed and improved' - vague, how so?
      • Sensorless AO description; how many Zernike modes were corrected?
      • Multi-position aberration correction. Was the assumption of linearity in the Zernike correction verified or met? Why is this a reasonable assumption?
      • Fig. S1B is not useful; if the idea is to give a visual impression of the setup, we would recommend providing more photos with approximate distances indicated so that the reader has a sense of the scale of the setup. As is - it looks like a photograph of some generic optical setup.
      • SI pattern generation - 'the maximum achievable reconstruction resolution was only slightly reduced to about 95% of the theoretical maximum'. We don't understand this sentence, as the resolution obtained on the 100 nm beads is considerably worse than 95% of the theoretical maximum. Or do the authors mean 95% of the theoretical maximum given their pitch size of 317 nm for green and 367 nm for red?

      SI Deformable mirror calibration:

      'spanning the range [0.1, 0.9]' - what are the units here?

      What are the units in Fig. S5C, S5D?

      It would be useful to define 'warmup' also in the caption of SI Fig. S6A.

      SI Remote Focusing: 'four offsets, {-5 mm, -2.5 mm, 2.5 mm, 5 mm}...' are the units mm or um? '...whereas that of the 10 beads was...' here, do the authors mean the position of the beads derived from the movement of the piezo stage, as opposed to the remote focusing? The authors refer to the 'results from Chapter 3.2'. What are they talking about? Do they mean a supplementary figure, or earlier supplementary results? In general, we found the discussion in this paragraph difficult to follow.

      Supplementary Fig. 9 seems to be not referred to anywhere in the text.

      • Since the paper emphasizes 3D SIM, OTFs along the axial direction would also be useful to show, in addition to the lateral OTFs shown in Fig. 2D.
      • When the sample is moved by the piezo, the axial phase of the 3D-SIM illumination pattern is stable as the sample is scanned through the illumination pattern. When remote focusing is performed, the sample is always stable so the axial phase of the 3D-SIM illumination pattern is presumably changing with remote focusing. Can the authors clarify if the 3D SIM illumination pattern is scanned when remote focusing is applied, or is the intensity pattern stable in z?
      • In Supplementary Fig. 9, primary spherical is referred to twice, both at index 11 and 22. The latter is presumably secondary spherical?
      • We do not understand the x axis label, in Fig. S4D, is it really [0, 50, 50, 50] as written? see attached pdf for reference

      Referee Cross-Commenting

      I don't have much to add; the other reviewers raise good points and I think it would be good if the authors could respond to their feedback in a revised manuscript.

      Significance

      Nearly all fluorescence images deteriorate as a function of depth. Methods to ameliorate this depth-dependent degradation are thus of great practical value, as they improve the information content of images and thus (hopefully) biological insight. In this work, the authors develop a method to improve super-resolution imaging (3D SIM) at depth, by combining it with adaptive optics.

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1:

      This study provides negative in vivo evidence for the use of two PERK inhibitors and of TUDCA for the treatment of Sil1-related Marinesco-Sjögren syndrome (MSS).

      Overall, the manuscript reports a substantial amount of work and the study could be published in its present format. The experiments are well described in terms of methodology and appropriate analysis has been applied. Claims are proportionate and not overstated.

      I would have only minor comments related to some clarifications that the authors could make in the present manuscript and a suggestion for experiments that could improve the manuscript.

      First, although this is not my expertise, the in vitro analysis of CHOP luciferase assays suggests that very high concentrations, in particular of TUDCA, are needed to observe an effect. The authors may wish to clarify their opinion and whether this could be the reason why in vivo they have been unable to obtain any inhibition of the PERK pathway.

      The reviewer is correct in pointing out that high concentrations of trazodone, DBM and TUDCA were required to inhibit the PERK pathway in the CHOP::luciferase reporter cell lines. However, as we state in the Discussion, we do not think that their lack of effect in vivo was due to insufficient drug levels, since woozy mice were treated with trazodone, DBM or TUDCA according to dose regimens and administration routes that have proved effective in other neurodegenerative disease mouse models. Moreover, our analysis did not find major differences in drug bioavailability between mice with the woozy genetic background (CXB5/ByJ) and C57BL/6J mice in which these drugs had shown neuroprotective effects (see also the response to the next point).

      Second, it seems to me that when measuring the Trazodone metabolism there is a difference between acute and chronic treatment. It would be worth discussing what the authors make of that and what is more relevant (I assume chronic) to the disease model outcome.

      We realized that the nomenclature used in Figures 6 and 7 was confusing, leading the reader to think there were differences in trazodone levels between chronically and acutely treated mice.

      The experiment shown in Figure 6 was designed to test whether there were differences in trazodone pharmacokinetics and metabolism between mice of the woozy strain, which have the CXB5/ByJ genetic background, and C57BL/6J mice in which trazodone had shown neuroprotective effects in previous studies. In contrast, Figure 7 illustrates the levels of trazodone and m-CPP in control and woozy mice (both of which have the CXB5/ByJ genetic background) that had been chronically treated with trazodone for 5 weeks. These are the same animals as in Figure 3, as we state in Figure 7 legend. Therefore one should compare the levels of trazodone and m-CPP in Figure 7 with those of the "woozy" group (CXB5/ByJ genetic background) in Figure 6. This comparison shows that trazodone and m-CPP levels are comparable after chronic and acute (6h) treatment.

      To avoid confusion, we have changed the mouse nomenclature. We have renamed the control group of mice as "CT" (previously "WT") throughout the text and figures. In Figure 6, we have used CXB5/ByJ instead of "woozy" to emphasize the comparison between the different genetic backgrounds (CXB5/ByJ vs C57BL/6J). Finally, we have replaced the colors of symbols in Figure 7 in order to match those of Figure 3. We have also made the description and discussion of these results clearer in the revised manuscript.

      With respect to the experiments a simple and informative addition would be the evaluation of the PERK pathway in mice treated with TUDCA, as this is missing.

      The effect of TUDCA treatment on the PERK pathway is shown in Figure 5, where we measured CHOP mRNA levels in Purkinje cells microdissected from mice treated with 0.4% TUDCA in the chow, and in Figure 9C and D, where we measured the percentage of CHOP-immunopositive Purkinje cells in the cerebellum of same groups of mice by immunohistochemistry.

      Figure 10 illustrates the results of an additional experiment in which woozy mice were treated with 500 mg/kg TUDCA intraperitoneally (ip), to test whether this alternative dosing regimen was any better. Like the treatment per os, TUDCA ip had no beneficial effect on motor dysfunction. Therefore we deemed it unnecessary to check the effect on PERK pathway inhibition in this group of mice.

      A more difficult but potentially more interesting line of investigation is that of searching for potential actions of Trazodone that are PERK independent and might be responsible for the partial rescue observed in the beam walking test, which is much more sensitive and specific than rotarod, so worth considering. Assuming authors want to go down this route and add significance to their study my suggestion would be an unbiased RNA seq from the brain samples they already have. However, this is a suggestion to steer the study towards a more positive outcome and it is not necessary to support their current conclusions.

      We agree with the reviewer that it would be interesting to investigate the mechanism by which trazodone slightly ameliorated the motor performance of woozy mice in the beam walking test. In the Discussion, we speculated that this could be due to an effect of trazodone on cerebellar serotonergic neurotransmission, which would require electrophysiological investigations to demonstrate. Of course, other mechanisms may also be operative, which RNA seq may help identify, as the reviewer suggests. However, this would be a complex and lengthy investigation, the results of which would not change the main conclusions of the present paper. We plan to explore this line of investigation in a future study.

      Reviewer #2:

      Lavigna et al. described the effect of Trazodone in Marinesco-Sjögren syndrome model mice. Although the results are somewhat disappointing, this research has provided fundamental evidence for the future development of MSS therapeutics. There are a few minor comments to further improve the manuscript.

      Major comment

      P14

      "Trazodone metabolism to m-CPP was slightly impaired in woozy mice compared to C57BL/6J mice. This was evident from the m-CPP/trazodone ratio, calculated on the AUC0-t in the plasma, which was 0.34 in woozy and 0.67 in C57BL/6J mice."

      Why was the concentration different between WT and woozy mice? Which organ mainly contributes to the metabolism of trazodone? Is the function of this target organ different between WT and woozy mice?

      Similar to trazodone, m-CPP clearance from plasma was slightly faster in woozy than in C57BL/6J mice.

      Is m-CPP eliminated via the kidney or the liver? Why is there a difference? Does SIL1 function in the liver or kidney? This needs discussion. The same applies to brain m-CPP levels.

      As explained in the response to the second comment of reviewer #1, "woozy" in Figure 6 referred to mice with the CXB5/ByJ genetic background, and in this experiment we compared trazodone pharmacokinetics and metabolism between CXB5/ByJ and C57BL/6J mice. We have modified the nomenclature of Figure 6 and the Results to make this clear.

      Trazodone undergoes extensive hepatic metabolism, and only a small percentage is excreted unchanged in the urine. Metabolism involves hydroxylation, oxidation and dealkylation reactions, forming in particular the 5HT-active metabolite m-CPP (by CYP3A4). This and other metabolites are mainly excreted in urine, as conjugates [1-3]. The slight differences in trazodone pharmacokinetics and metabolism between the CXB5/ByJ and C57BL/6J mice shown in Figure 6 are not attributable to loss of SIL1 function, since both groups of mice carried wild-type Sil1 alleles, but are most likely due to subtle differences between the two strains, for example in the binding to plasma proteins, metabolic enzymes, transporters and/or the excretion processes. The available data do not allow us to clarify this issue.

      The main point, however, is that no major differences were found in the plasma and brain concentrations of trazodone between these two strains of mice, which could have explained the lack of efficacy of trazodone in woozy mice, as we now further stress in the revised Discussion.
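      For readers unfamiliar with the metric discussed above, the m-CPP/trazodone ratio "calculated on the AUC0-t" can be sketched as follows. This is only an illustration under stated assumptions: the linear trapezoidal rule is assumed as the integration method (the response does not say which was used), and the sampling times and concentrations are hypothetical values, not data from the study.

```python
def auc_trapezoid(times_h, conc):
    """AUC from the first to the last sampling time (linear trapezoidal rule)."""
    if len(times_h) != len(conc) or len(times_h) < 2:
        raise ValueError("need at least two matched time/concentration points")
    area = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        if dt <= 0:
            raise ValueError("sampling times must be strictly increasing")
        area += dt * (conc[i] + conc[i - 1]) / 2.0
    return area


def metabolite_parent_ratio(times_h, parent_conc, metabolite_conc):
    """Ratio of metabolite AUC0-t to parent-drug AUC0-t."""
    return auc_trapezoid(times_h, metabolite_conc) / auc_trapezoid(times_h, parent_conc)


# Hypothetical plasma profiles (arbitrary concentration units), for illustration only.
times_h = [0.0, 1.0, 2.0, 4.0]
trazodone = [0.0, 4.0, 2.0, 0.0]
m_cpp = [0.0, 1.0, 2.0, 0.0]

ratio = metabolite_parent_ratio(times_h, trazodone, m_cpp)
print(f"m-CPP/trazodone AUC ratio: {ratio:.2f}")
```

      A lower ratio in one strain indicates that less of the parent drug exposure is matched by metabolite exposure over the sampled interval, which is how the values of 0.34 and 0.67 quoted by the reviewer are read.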

      Minor comments

      P3 L5 mutation should be variant.

      This has been changed.

      P4 L1 eIF2a-P should be phosphorylated eIF2α (p-eIF2α). The reviewer prefers (p-eIF2α) to (eIF2α-p) throughout the manuscript.

      There is no standard rule for indicating phosphorylated proteins, and phosphorylated eIF2α is referred to in various ways in different papers, with the "p" in capital or lowercase, preceding or following the protein name, separated by a dash or not. We would prefer to maintain the current nomenclature for consistency with our previous publications, unless the Editor deems otherwise.

      P9 L11 M-CPP should be fully spelled out the first time it appears. m-Chlorophenylpiperazine (m-CPP)

      M-CPP is spelled out the first time it appears in the Material and Methods, subheading Drug treatments and bioanalysis.

      Please explain the difference between the expected function of trazodone and its metabolite m-CPP. Why is m-CPP not effective?

      Based on the observation that mice of the woozy strain had lower brain levels of m-CPP than C57BL/6J mice (Figure 6), we hypothesized that the lack of effect of trazodone in woozy mice could be due to m-CPP mediating the PERK signaling inhibitory activity of trazodone. However, experiments in CHOP::luciferase reporter cells demonstrated that m-CPP does not inhibit PERK signaling (Figure 2D). The precise mechanism by which trazodone inhibits PERK signaling is not known [4], which makes it difficult to speculate why its main metabolite, m-CPP, does not exhibit this activity.

      P11 L3 Fig. 3 Fig. 3A and B?

      Yes, we specifically refer to panels A and B of Figure 3. We have indicated this in the revised manuscript.

      P11 L6 at 7 weeks of age?

      We have re-done the statistical analysis by two-way ANOVA and reported the results in the legend to Figure 3. There is a significant difference between vehicle- and trazodone-treated woozy mice in the number of missteps when the two groups are compared globally. No statistically significant difference in the number of missteps is detected at specific time points by post-hoc analysis. There is no statistically significant difference between vehicle- and trazodone-treated woozy mice in the time to traverse the beam. The Results section has been revised accordingly.

      P12 L17 ~4 times, 4 times? Please state the exact value.

      Done.

      Figure 7 Why are brain m-CPP levels higher than plasma levels? Is trazodone metabolized in brain tissue?

      Trazodone is extensively metabolized in the liver through Cytochrome P450 (Rotzinger et al., 1999). It is well documented that m-CPP readily passes the blood-brain barrier, much better than the parent compound, explaining its high brain levels [2].

      P19 L7 ISRIB; please fully spell out the first time it appears.

      Done.

      References

      1. Rotzinger S, Bourin M, Akimoto Y, Coutts RT, Baker GB (1999) Metabolism of some “second”- and “fourth”-generation antidepressants: iprindole, viloxazine, bupropion, mianserin, maprotiline, trazodone, nefazodone, and venlafaxine. Cell Mol Neurobiol 19:427–442. https://doi.org/10.1023/a:1006953923305
      2. Caccia S, Ballabio M, Samanin R, Zanini MG, Garattini S (1981) (--)-m-Chlorophenylpiperazine, a central 5-hydroxytryptamine agonist, is a metabolite of trazodone. J Pharm Pharmacol 33:477–478. https://doi.org/10.1111/j.2042-7158.1981.tb13841.x
      3. DeVane CL, Boulton DW, Miller LF, Miller RL (1999) Pharmacokinetics of trazodone and its major metabolite m-chlorophenylpiperazine in plasma and brain of rats. Int J Neuropsychopharm 2:17–23. https://doi.org/10.1017/S1461145799001303
      4. Halliday M, Radford H, Zents KAM, Molloy C, Moreno JA, Verity NC, Smith E, Ortori CA, Barrett DA, Bushell M, Mallucci GR (2017) Repurposed drugs targeting eIF2alpha-P-mediated translational repression prevent neurodegeneration in mice. Brain 140:1768–1783. https://doi.org/10.1093/brain/awx074
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Author responses


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, Dutta and colleagues compared the meiotic recombination landscapes between five budding yeast species. In the first part of the work, the authors constructed a high-resolution map of meiotic recombination events in Kluyveromyces lactis supported by high-quality genome assemblies for two strains of this yeast. Then, partially repeating their CO and NCO mapping strategy, they compared a number of meiotic recombination parameters between the five species (sometimes three, depending on the quality of the data for each species). They particularly focused on key parameters for meiotic recombination, such as crossover interference and homeostasis and obligate crossover. Although the analysis is interesting, it is underdeveloped in many places and lacks the general conclusions regarding the evolution of recombination and the broader perspective that would be expected from a comparison of these phenomena in budding yeasts.

      [R] Tackling the evolution of recombination is ambitious. Here, with a dataset of five species, it is hard to draw strong evolutionary conclusions beyond the variations in the crossover (CO) landscapes and in the control of CO formation that we observed, which are already significant. The multiple losses of CO interference that we describe here may constitute our strongest evolutionary conclusion. They potentially underscore the minor evolutionary advantage associated with CO interference, at least in budding yeasts. In this context, we changed the title to be more factual and updated the text to better highlight the significance and implications of our findings.

      Major comments:

      The authors indicate that the distribution of hotspots and coldspots is not preserved between species, but this finding is not properly documented. I think it would be useful to include recombination maps in a main figure for all species (or at least for S. cerevisiae, K. lactis and L. waltii) with the elements highlighted. This will allow for a visual illustration of the variability in the recombination landscape between the studied species.

      [R] The genomes of the species show blocks of synteny but overall, they are not collinear and therefore, it is not possible to have a direct comparison of the recombination maps. In our previous work, we have highlighted the non-conservation of CO hotspots between S. cerevisiae, L. kluyveri and L. waltii (Brion et al. 2017; Dutreux et al. 2023). Briefly, we retrieved conserved syntenic blocks in L. kluyveri and L. waltii genomes containing at least two S. cerevisiae orthologs associated with one hotspot. L. waltii shares only five out of the 92 S. cerevisiae crossover hotspots (RHO5, SLS1, GYP6, OLE1 and MRPL8), while L. kluyveri shares only one. L. waltii and L. kluyveri share no crossover hotspots. In addition, our current study shows that none of the K. lactis hotspots is conserved in any of the four other species (response figure 1 and new supplementary figure S11).

      Response Figure 1. Density of crossovers along the genome using a 5 kb window in the S. cerevisiae genome (Mancera et al. 2008; Oke et al. 2014; Krishnaprasad et al. 2015 combined dataset). Horizontal dotted green line represents crossover hotspot significance threshold. Solid spheres represent the conserved CO hotspots with either L. kluyveri (red) or L. waltii (blue). None of the 92 S. cerevisiae crossover hotspots is conserved in K. lactis.
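      To make the kind of analysis behind Response Figure 1 concrete, the sketch below bins crossover positions into 5 kb windows and flags windows whose counts exceed a significance threshold. It is a minimal illustration only: the response does not state how the threshold was derived, so the Poisson tail test, the `alpha` value, the chromosome length and the positions used here are all assumptions chosen for the example.

```python
import math


def window_counts(positions_bp, chrom_length_bp, window_bp=5000):
    """Count crossover positions falling into consecutive fixed-size windows."""
    n_windows = math.ceil(chrom_length_bp / window_bp)
    counts = [0] * n_windows
    for pos in positions_bp:
        if not 0 <= pos < chrom_length_bp:
            raise ValueError(f"position {pos} lies outside the chromosome")
        counts[pos // window_bp] += 1
    return counts


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


def hotspot_windows(counts, alpha=0.01):
    """Indices of windows with more events than expected under a uniform Poisson model."""
    lam = sum(counts) / len(counts)  # mean events per window
    return [i for i, c in enumerate(counts)
            if c > 0 and poisson_upper_tail(c, lam) < alpha]


# Hypothetical crossover midpoints (bp) on a hypothetical 50 kb chromosome.
co_positions = [100, 12_000, 12_500, 13_000, 14_000, 14_999, 30_000, 47_000]
counts = window_counts(co_positions, chrom_length_bp=50_000)
hotspots = hotspot_windows(counts)
print(counts, hotspots)
```

      Comparing hotspots between species additionally requires mapping windows through syntenic blocks, as described in the response; that step is not sketched here.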

      Although analyses analogous to those presented in Fig. S5 had already been published in other comparisons of the recombination landscape in yeast (e.g. Dutreux et al., 2023), I think that Figs. S5A and S5B are worth presenting in the main figures (not supplementary data). In many species of eukaryotes, the detection of NCOs is practically impossible, therefore only results for COs are presented. Therefore, it is perhaps also worth discussing the fact that the relationship applies to all recombination events and not only COs, and therefore is related to the regulation of DSB frequency and not individual DSB repair pathways.

      [R] Figures S5A-B are now included in the main figure, Figure 2B. The association holds true for total recombination (CO+NCO) events as well (new supplementary figure S6A).

      The authors find that CO coldspots were associated with DNA repair genes. Unfortunately, an equivalent analysis was not performed for all recombination events (CO + NCO). I presume this approach is based on the belief that COs are more mutagenic than NCOs. However, recent studies in humans suggest that, at least in mammals, meiotic DSBs themselves are mutagenic, regardless of the pathway used for their repair (Hinch et al., Science 2023). Therefore, I would suggest repeating the analysis also considering NCOs (although I am aware that the picture of NCOs may be incomplete). I would also like to see some graphical representation of the analysis. Is it possible to perform a classic analysis of coldspots/hotspot enrichment in relation to gene ontology?

      [R] As suggested, we performed the analysis to independently detect coldspots for all recombination events (CO+NCO). Based on a threshold of

      In relation to the previous point - it may be worth repeating this type of analysis also for other yeasts used in this study, or at least for S. cerevisiae, to be able to consider the extent to which this relationship is universal and dependent on the meiotic DSB repair pathway.

      [R] The analysis regarding the CO coldspots has been performed in the other species as well. As mentioned in the main text, although some overlap between CO coldspots and DNA repair genes has been observed in the other species as well, we observed a significant enrichment in K. lactis only, maybe because the dataset is larger than in the other species.

      In Fig. S7, the point where WGD occurred is marked in the wrong place, or at least that is what the sentence in the text says ("The Lachancea and Kluyveromyces species branched from the Saccharomyces lineage more than 100 million years ago, before to the ancestral whole-genome duplication (WGD) event specific of the S. cerevisiae lineage").

      [R] We regret the oversight and have corrected the figure.

      The result presented in Fig. S8 is interesting and should be shown in the main figures. Perhaps it would be worth adding a diagram illustrating simple versus complex COs.

      [R] The old Figure S8 is now a part of main Figure 2C-D with the illustrations describing the CO types.

      The last part of the results includes an analysis of the evolutionary rates of the ZMM genes. In the discussion, the authors should also refer the results of this analysis to the previous analysis of the overrepresentation of DNA repair genes in recombination coldspots. I understand that ZMM are not DNA repair proteins in the strict sense, but I think it is worth familiarizing readers with the authors' view on this matter. Moreover, I would suggest showing where MLH1 and MLH3 are located on the plot in Fig. 6 (especially the meiosis-specific MLH3), whether the selection pressure acts on them as on ZMM proteins, or rather as on DNA repair proteins. Showing the SLX4 and MUS81 would also be interesting.

      [R] Figure 6 has been updated as suggested and now shows the Mlh1, Mlh3, Slx4 and Mus81 dN/dS values for the three species.

      I feel like the discussion is underdeveloped. I missed a deeper summary of the comparison between meiotic recombination among the tested budding yeasts in the context of the presence and absence of functional ZMM. Even the title of the work is not properly developed in the manuscript text. The analysis shows that it is not the presence of a functional ZMM pathway or its lack that introduces differences between the individual recombination landscapes, although ZMM determines the presence of proper CO interference. With the caveat that for L. kluyveri it is basically unknown whether it has a functional ZMM or not. Maybe confirming the lack of expression of some ZMM genes in meiosis of this species would answer the question of how it should be treated?

      [R] We agree with this reviewer that our original title was imprecise, so we changed it to be more factual, emphasizing the multiple losses of crossover interference in budding yeasts. As stated above, it potentially underscores the minor/negligible evolutionary advantage associated with CO interference, at least in budding yeasts. From there, it is hard to draw deeper conclusions since the actual roles/functions of CO interference are still under debate, notably in yeasts where the CO frequency tends to be high. We improved the discussion to better highlight these points.

      We also agree that a deeper characterization of the ZMM factors persisting in the non-Saccharomyces yeasts would be informative, but we believe it is beyond the scope of the current manuscript and more suitable for a follow-up work. However, our recent publication about L. kluyveri (Legrand et al. 2024) shows that Zip3 is properly expressed in meiosis and behaves as in S. cerevisiae since it is located at DSB sites. Furthermore, we have unpublished transcriptomic data (Response Figure 2) showing that all the ZMM genes from L. kluyveri are specifically induced in meiosis (fold increase >16 at least compared to pre-sporulation conditions). Therefore, so far, although the level of CO interference in L. kluyveri is minimal, there is no indication that the ZMM genes are misregulated.

      Response Figure 2. Transcriptomic data showing that all the ZMM genes from L. kluyveri are specifically induced in meiosis (Unpublished data from Llorente Lab, CRCM, Marseille).






      Minor comments:

      In general, Figure captions are imprecise, many of them lack clear information explaining what is depicted. Authors should remember that figure legends should be self-sufficient.

      [R] The figure legends have been updated and are now self-sufficient.

      In the revised manuscript, I would suggest placing figure numbers on the figures and using line numbering, which would facilitate the reception of the work and possible reference to its individual elements in the review.

      [R] We regret the omission. Figure numbers, Line numbers and Page numbers have been added.

      Reviewer #1 (Significance (Required)):

      The study provides a new insight into the variation in recombination landscape within budding yeast species with a special emphasis on crossover control. This includes also de novo assemblies of Kluyveromyces lactis genome and high-resolution tetrad-based maps of meiotic recombination events. Previously, recombination maps of different yeast species were compared, however this study focuses on budding yeasts, some of which lost ZMM pathway and differ in some crossover parameters, like interference and homeostasis. Although the analysis is interesting, it lacks the general conclusions regarding the evolution of recombination and the broader perspective that would be expected from a comparison of these phenomena in budding yeasts.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This paper describes the genome-wide mapping of meiotic recombination in a non-Saccharomyces yeast, Kluyveromyces lactis. By using heterologous parental strains, the authors mapped crossovers (COs) and noncrossovers (NCOs) on the genome of K. lactis, which lacks some of the proteins necessary for CO formation in organisms such as S. cerevisiae, mammals and plants. This is an extension of previous works by the authors' group which mapped COs and NCOs in other yeasts, Lachancea kluyveri and L. waltii, by a similar approach. The authors found that CO frequencies in K. lactis are much lower than those in S. cerevisiae and COs showed weaker interference, which facilitates the non-random distribution of COs along a chromosome. Overall, the experiments and informatic analyses have been done in good quality and the results are convincing. The paper provides additional new information on the landscape of meiotic recombination in different yeast species. These results are of great interest to researchers in the field of meiotic recombination and evolution of meiosis. There are some issues that the authors may be able to address before the publication.

      Major points:

      While the authors noted that K. lactis shows the loss of the pro-CO factors (ZMM proteins) Spo16 and Msh5 (due to the introduction of an in-frame stop codon), it still possesses other proteins such as Zip1, Zip2, Zip3, Zip4/Spo22, Mer3, and Msh4. It is still likely that these pro-CO factors control CO formation (and interference) in this yeast. It would be nice for the authors to study whether these genes are dispensable for CO formation and interference in meiosis. A similar analysis should be done for L. kluyveri, which retains all ZMM genes, but this is clearly out of the scope of this paper.

      [R] The question of the functions of the remaining ZMM factors is indeed interesting and related to point #8 from reviewer 1 (please see above). Although this is beyond the scope of our work, we would like to refer here to work from Amy MacQueen's lab using K. lactis Zip1 in S. cerevisiae (Voelkel-Meiman 2015). This study shows that K. lactis Zip1 does not allow synaptonemal complex assembly in S. cerevisiae but allows the formation of COs that are independent of the Msh4/5 complex but depend on Zip2/4/Spo16 and Mlh1/3 for their resolution. Overall, these results suggest that K. lactis Zip1 at least retained ancestral functions shared with S. cerevisiae Zip1. However, it is not possible to conclude whether the lack of full complementation by K. lactis Zip1 in S. cerevisiae comes from functional divergence or simply from the inability of K. lactis Zip1 to function properly in a heterologous context.

      Minor points:

      No page number, no main Figure number. It is hard to review this paper.

      [R] We regret the oversight. Figure numbers, Line numbers and Page numbers have been added.

      References: In some cases, in the Introduction, the authors referred to review papers such as Pyatnitskaya et al. (2019) for ZMM proteins while in the other parts, they referred to original papers; for example, three papers for Mlh1-Mlh3. If the number of references is not limited, original papers should be cited in the text.

      [R] We regret this omission. Original papers have now been included in the citations.

      Figure 3A, page 9, second paragraph: When the authors compared CO and NCO densities, it would be nice to show P-values for the comparison.

      [R] p-values have now been added to the updated figure.

      Please show a ratio of CO to NCO in each yeast in Figure 3B in the second paragraph of page 9 in the main text.

      [R] The ratios have now been included in the figure for both the CO:NCO ratios and CO:corrected_NCO ratios, in the main text and figure legends.

      Figure S5 and page 7, the first paragraph and page 9, third paragraph: CO/NCO densities (negative correlation to chromosome sizes) in S. cerevisiae should be checked with or without short chromosomes (I, III, and VI), which show very unique regulation of meiotic DSB formation (see Murakami et al. Nature 2020).

      [R] Even excluding the small chromosomes, the size-dependent trend persists for S. cerevisiae and S. paradoxus.

      Table S7: Please add the S. cerevisiae gene name such as ZIP1 next to S. cerevisiae orthologs such as YDR285W. Moreover, please explain the column in detail or clarify the data. What does "meiosis" mean here? For example, YJL074C is SMC3, which is expressed in mitosis as well as in meiosis. The same is true for YGL163C, which is RAD54, which plays a minor role in meiosis, but plays a critical role in mitotic DSB repair.

      [R] We corrected Table S7 as desired by systematically including the standardized gene names.

      The Gene Ontology (GO) annotation is a statement about the function of a particular gene. It offers a structured framework and a comprehensive set of concepts to describe the functions of gene products across all organisms. It is specifically crafted to support the computational representation of biological systems. In our specific case, we only looked at genes with the gene ontology annotation "meiosis". Together, these statements comprise a "snapshot" of current biological knowledge and are by no means absolute. This has been detailed in the supplementary Table S7.

      Reviewer #2 (Significance (Required)):

      This study provides the landscape of meiotic recombination in non-Saccharomyces yeast, Kluyveromyces lactis. The genome-wide recombination map in K. lactis shows lower crossover frequencies with weaker crossover interference than those in S. cerevisiae. Overall, the experiments and informatic analyses have been done in good quality and the results are convincing. The paper provides additional new information on the landscape of meiotic recombination in different yeast species, particularly in terms of the evolution of meiotic recombination. These results are of great interest to researchers in the field of meiotic recombination and evolution of meiosis.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Dutta et al. have compiled a genome-wide meiotic recombination map for Kluyveromyces lactis and compared it to a compilation of meiotic recombination maps for four other species, two of which (Lachancea kluyveri and Lachancea waltii), like K. lactis, predate the genome duplication event that produced the other two (Saccharomyces cerevisiae and S. paradoxus). Meiosis in many species studied (including metazoans and plants) shows control over the number and distribution of crossovers, which are critical for faithful chromosome segregation during meiosis. This takes the form of crossover interference, where crossovers are spaced more evenly than expected by chance, and crossover homeostasis, where many fewer chromosomes lack a crossover than is expected by chance. While both of the post-duplication species show both crossover interference and homeostasis, none of the pre-duplication species show crossover homeostasis, and crossover interference is very weak. In two cases (K. lactis and L. waltii), this can be explained by mutational loss of a few of the genes (called the ZMM genes) that promote meiotic crossovers in many species. However, L. kluyveri behavior cannot be explained in this way. Recombination hotspots are present but are not shared between the pre-duplication species or between the pre- and post-duplication species, perhaps not surprising for species that diverged more than 100 million years ago. Overall, this work will be a useful contribution to our understanding of the different possible flavors of meiotic recombination mechanisms and control that are possible (and, one might add, promote long-term species viability).

      A) Evaluation, reproducibility and clarity

      The work presented in this paper is straightforward and unimpeachable and will largely be of interest to those studying meiotic recombination, be it mechanistic studies or studies of the implications for population genetics. The analysis is technically correct, although there are some aspects where a slightly different emphasis should be considered (see comments below). However, the data and the analysis could stand as they currently are, without further revision.

      Suggestions are below. 1. (trivial) it would have been useful if pages and lines were numbered.

      [R] We regret the oversight. Figure numbers, Line numbers and Page numbers have been added.

      "Across the 205 meioses...". In general, it would be desirable to apply compensation for the fact that NCOs and COs are differently detected. Since, in K. lactis, 35% of COs are not accompanied by detectable gene conversion, it seems reasonable to apply a correction to measured NCOs here and throughout the paper, regardless of the species. For example, if one assumes that 35% of NCOs are not detected, how does this affect estimates of chromosomes that do not appear to have undergone interhomolog recombination? Estimates of CO/NCO bias? In a similar vein, if the CO event is not considered (just the conversion events associated with it), how does this affect measures of conversion tract lengths in COs and NCOs?

      [R] We thank the reviewer for this suggestion. We have performed the correction for the NCO estimates as described in Mancera et al. 2008, on a per-tetrad basis across all the species. The fractions of missed NCOs were 7%, 34%, 30%, 23% and 25% for S. paradoxus, S. cerevisiae, K. lactis, L. waltii and L. kluyveri, respectively. The fraction of missed NCOs depends on the parental marker density. In addition, we performed the CO:NCO bias analysis with both the detected and the corrected NCO frequencies, and the trends remain unchanged (now included in Figure 3). Finally, we refrain from using the corrected NCO frequencies when reporting the NCO frequencies (Table 1, main text) to maintain uniformity with our previous work and because these corrections do not alter any results.
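
      As a rough illustration of how a missed fraction translates into a corrected count, the sketch below scales a detected NCO count by 1 / (1 - f). It assumes that the quoted fraction f is the share of true NCO events that go undetected; the simulation used to estimate f in Mancera et al. 2008 is not reproduced here. The function name `corrected_nco` and the count of 14 detected events are hypothetical.

```python
# Fractions of missed NCOs as quoted in the response above.
MISSED_FRACTION = {
    "S. paradoxus": 0.07,
    "S. cerevisiae": 0.34,
    "K. lactis": 0.30,
    "L. waltii": 0.23,
    "L. kluyveri": 0.25,
}


def corrected_nco(detected, missed_fraction):
    """Scale a detected NCO count up to an estimated true count.

    Assumes missed_fraction is the share of true events that escape
    detection, so detected = true * (1 - missed_fraction).
    """
    if not 0.0 <= missed_fraction < 1.0:
        raise ValueError("missed_fraction must be in [0, 1)")
    return detected / (1.0 - missed_fraction)


# Hypothetical tetrad with 14 detected NCOs in K. lactis (about 20 after correction).
example = corrected_nco(14, MISSED_FRACTION["K. lactis"])
```

      Under this assumption the correction is a per-species rescaling, so it shifts absolute NCO counts but cannot change the sign of a within-species comparison.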

      It might be useful to report recombination event frequencies in terms of events/chromosome, as this, rather than event/unit distance, is functionally more relevant. In the same vein, it might be useful to consider total event homeostasis, in addition to just crossover homeostasis.

      [R] This has been updated as suggested.

      An interesting observation is that two of the three pre-duplication species clearly at one time had a full complement of ZMM genes but lost some due to mutation. Have there ever been attempts to detect either synaptonemal complex or axial elements in these species?

      [R] This is related to point #8 from reviewer 1 and to the major point of reviewer 2 (please see above).

      To our knowledge, cytological observations of synaptonemal complex (SC) or axial elements have been performed in L. kluyveri only by us, and the SC is clearly visible (Legrand et al 2024).

      However, it is important to note here that the K. lactis axis protein-encoding genes HOP1 and RED1 were cloned by the Roeder lab by functional complementation of the corresponding S. cerevisiae mutants, supporting the functional conservation of these genes (Smith and Roeder 2000). Finally, as mentioned above, K. lactis Zip1 retained at least some functions of the ancestral Zip1 protein that are also shared by the S. cerevisiae protein (Voelkel-Meiman 2015).

      The observation of elevated evolutionary rates in ZMM genes is also intriguing, but it would help if "dN/dS ratio" was defined.

      [R] It is now defined in the text.

      The observation of frequent E0 chromosomes is taken to suggest efficient achiasmate segregation; has the "corrected" NCO frequency been considered? Do the different frequencies of E0 chromosomes predict the different spore viabilities seen between species?

      [R] E0 is not at all predictive of spore viability, as we have shown in previous studies (see L. kluyveri - Brion et al. 2017, L. waltii - Dutreux et al. 2023). In addition, this has been shown in S. cerevisiae as well (Nishant et al. 2009).

      Figure 3A-what would this look like if it were plotted as "Events per chromosome" rather than per megabase?

      [R] We changed the figure (now figure 2A) and plotted as events per chromosome to show the variability of events at the chromosome level.

      Figure legends tend to be unreasonably terse, which makes figures more difficult to interpret.

      [R] This has been updated as suggested.

    1. We would like to thank you and the reviewers for your thoughtful comments, which helped us improve the manuscript. We carefully followed the reviewers’ recommendations and provide a detailed point-by-point account of our responses to the comments. 

      Please find below the important changes in the updated manuscript.

      (1) We changed the title according to the comments provided by reviewer #1.

      (2) We edited the introduction, results, and discussion to improve the link between the objectives of the study, the findings, and their discussion, as reviewer #2 recommended.

      (3) We clarified the link between camouflage and fitness, which is now presented as a hypothesis, as reviewer #1 suggested.

      (4) We added new analyses and figures in the main text and in the supplementary materials to better emphasize sex differences in landing force, foraging strategies and hunting success, following reviewer #1's suggestion.

      (5) Following reviewer #2's comments, we edited the results, adding key information about methods to help the reader understand the findings without reading the Methods section.

      (6) We added important details about the model selection approach along with a discussion of the low R-square values reported in our analyses on hunting success, as reviewer #2 suggested.

      eLife assessment 

      This fundamental work substantially advances our understanding of animals' foraging behaviour, by monitoring the movement and body posture of barn owls in high resolution, in addition to assessing their foraging success. With a large dataset, the evidence supporting the main conclusions is convincing. This work provides new evidence for motion-induced sound camouflage and has broad implications for understanding predator-prey interactions. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this paper, Schalcher et al. examined how barn owls' landing force affects their hunting success during two hunting strategies: strike hunting and sit-and-wait hunting. They tracked tens of barn owls that raised their nestlings in nest boxes and utilized high-resolution GPS and acceleration loggers to monitor their movements. In addition, camcorders were placed near their nest boxes and used to record the prey they brought to the nest, thus measuring their foraging success. 

      This study generated a unique dataset and provided new insights into the foraging behavior of barn owls. The researchers discovered that the landing force during hunting strikes was significantly higher compared to the sit-and-wait strategy. Additionally, they found a positive relationship between landing force and foraging success during hunting strikes, whereas, during the sit-and-wait strategy, there was a negative relationship between the two. This suggests that barn owls avoid detection by generating a lower landing force and producing less noise. Furthermore, the researchers observed that environmental characteristics affect barn owls' landing force during sit-and-wait hunting. They found a greater landing force when landing on buildings, a lower landing force when landing on trees, and the lowest landing force when landing on poles. The landing force also decreased as the time to the next hunting attempt decreased. These findings collectively suggest that barn owls reduce their landing force as an acoustic camouflage to avoid detection by their prey. 

      The main strength of this work is the researchers' comprehensive approach, examining different aspects of foraging behavior, including high-resolution movement, foraging success, and the influence of the environment on this behavior, supported by impressive data collection. The weakness of this study is that the results only present a partial biological story contained within the data. The focus is on acoustic camouflage without addressing other aspects of barn owls' foraging strategy, leaving the reader with many unanswered questions. These include individual differences, direct measurements of owls' fitness, a detailed analysis of the foraging strategy of males and females, and the collective effort per nest box. However, it is possible that these data will be published in a separate paper. 

      We greatly appreciate your recognition of the comprehensive approach and extensive data collection. Our primary objective was to study the role of acoustic camouflage. Nonetheless, the manuscript now includes a detailed analysis of the foraging strategy and hunting success of males and females (lines 164-225).

      The results presented support the authors' conclusion that lower landing force during sit-and-wait hunting increases hunting success, likely due to a decreased probability of detection by their prey, resulting in acoustic camouflage. The authors also argue that hunting success is crucial for survival, and thus, acoustic camouflage has a direct link to fitness. While this statement is reasonable, it should be presented as a hypothesis, as no direct evidence has been provided here.

      Thank you for the comment. We agree and thus have edited the language accordingly.  

      However, since information about nestling survival is typically monitored when studying behavior during the breeding period, the authors' knowledge of the effect of acoustic camouflage on owls' fitness can probably be provided. Furthermore, it will be interesting to further examine the foraging strategies used by different individuals during foraging, the joint foraging success of both males and females within each nest box, and the link between landing force and foraging success if the data are available.

      We are currently writing a manuscript on these topics. We are aware that several scientific questions regarding the foraging ecology of the barn owl still need our attention. Regarding the link between landing force and foraging success, we believe that our revised manuscript addresses this specific topic; please see the specific responses below.

      However, even without this additional analysis on survival, this paper provides an unprecedented dataset and the first measurement of landing force during hunting in the wild. It is likely to inspire many other researchers currently studying animal foraging behavior to explore how animals' movements affect foraging success.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors provide new evidence for motion-induced sound camouflage and can link the hunting approach to hunting success (detailing the adaptation and inferring a fitness consequence). 

      Strengths: 

      Strong evidence by combining high-resolution accelerometer data with a ground-truthed data set on prey provisioning at nest boxes. A good set of co-variates to control for some of the noise in the data provides some additional insights into owl hunting attempts. 

      Weaknesses: 

      There is a disconnect between the hypotheses tested and the results presented, and insufficient detail is provided on the statistical approach. R2 values of the presented models are very small compared to the significance of the effect presented. Without more detail, it is impossible to assess the strength of the evidence.

      In the revised manuscript, we changed the way results are presented and we improved the link between the hypotheses and the results. The R2 values are indeed small. It is however important to keep in mind that we are assessing the outcome of one specific behavior (i.e. landing force during sit-and-wait hunts) on hunting success in a wild environment, where many complex ecological interactions likely influence hunting success. Nonetheless, the coefficients (as reported in the results) show that for every 1 N increase in landing force, there is a 15% reduction in hunting success, which is substantial. In the discussion we also note that 50 Hz is a relatively low sampling frequency for estimating the peak ground reaction force. We have gone back over the presentation of our results and made our discussion more nuanced to acknowledge this aspect. 

      We have also added a detailed description about our model selection process in the methods section and provide a model selection table for each analysis in the supplementary materials.

      The authors seem to overcome persisting challenges associated with the validation and calibration of accelerometer data by ground-truthing on-board measures with direct observations in captivity, but here the methods are not described any further and sample sizes (2 owls - how many different loggers were deployed?) might be too small to achieve robust behavioural classifications.

      Thank you for the comment. Details of our methods of behavioural identification are provided in lines 385 – 429. There are two reasons why our results should not be limited by the sample size. First, we used the temporal sequence of changes in acceleration, and rates of change in acceleration data, which make the methods robust to individual differences in acceleration values. Second, our methods for behavioural identification were not based on machine learning. Instead, we use a Boolean-based approach (as described in Wilson et al. 2018, MEE), which is more robust to small differences in absolute values that might occur, e.g., in relation to slight changes in device position. 

      Recommendation for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Comment 1. This study provides new insights into animals' foraging behavior and will probably inspire other researchers to examine foraging behavior in such high resolution.

      We hope so, thank you.

      Comment 2. However, it is necessary to describe better the measured landing force and the hunting strike and perching behavior so the readers can understand these methods when reading the results (and without reading the Methods).

      We have now changed the text in the “Results” to help the reader understand the key methods while reading the results.

      Comment 3. In addition, make sure you use the same terminology for hunting strategies during the entire paper and especially in all figures and corresponding result descriptions.

      We now use consistent terminology throughout the text and figures. We hope that this is now clear in the revised manuscript.

      Comment 4. In addition, although I find your statement about the link between acoustic camouflage and fitness reasonable, it should be described as a hypothesis or examined if you want to keep the direct link statement. I believe showing a direct link can add an additional outstanding aspect to this paper, but I also understand that it can be addressed in a separate paper.

      We agree that the relationship between hunting success and barn owl fitness is an important topic, but it necessitates a consideration of both hunting strategies, including hunting on the wing, which extends beyond the limits of our current study. Indeed, our primary objective was to conduct a detailed examination of the interplay between acoustic camouflage and the success of the sit-and-wait technique.

      However, we have edited the manuscript to explicitly describe the link between acoustic camouflage and fitness as a hypothesis. We believe this adjustment provides a more accurate representation of our approach. We hope this clarifies the specific emphasis of our work and its contribution to the understanding of barn owl hunting behavior.

      Here are my detailed comments about the paper: 

      Comment 5. Title: Consider changing the title to "Acoustic camouflage predicts hunting success in a wild predator." 

      We would like to thank you for your nice proposition. However, we opted for a different title, which is now “Landing force reveals new form of motion-induced sound camouflage in a wild predator”.

      Comment 6. Line 91-93: Please provide additional information about the collected dataset, including: 

      Description of the total period of observations, an average and standard deviation of perching and hunting attempt events per individual per night, number of foraging trips per individual per night, details about the geographic location and characteristics of the habitat, season, and reproductive state. 

      The revised manuscript now includes detailed information about the collected dataset (i.e. study area, reproductive state, etc…). “We used GPS loggers and accelerometers to record high resolution movement data during two consecutive breeding seasons (May to August in 2019 and 2020) from 163 wild barn owls (79 males and 84 females) breeding in nest boxes across a 1,000 km² intensive agricultural landscape in the western Swiss plateau.” Results section, lines 79 – 82

      Details about the number of foraging trips per individual and per night are now presented in the results: “Sexual dimorphism in body mass was marked among our sampled individuals. Males were lighter than females (84 females, average body mass: 322 ± 22.6 g; 79 males, average body mass 281 ± 16.5 g, Fig S6) and provided almost three times more prey per night than females (males: 8 ± 5 prey per night; females: 3 ± 3 prey per night; Fig.S7). Males also displayed higher nightly hunting effort than females (Males: 46 ± 16 hunting attempts per night, n= 79; Females: 25 ± 11 hunting attempts per nights, n=84; Fig. 3A, Fig S8). However, females were more likely to use a sit and wait strategy than males (females: 24% ± 15%, males: 13% ± 10%, Fig.S9). As a result, the number of perching events per night was similar between males and females (Females: 76 ± 23 perching events per nights; Males: 69 ± 20 perching events per night; Fig S8).” (lines 165 – 174) 

      Comment 7. In addition, state if the information describes breeding pairs of males and females and provides statistics on the number of tracked pairs and the number of nest boxes.

      The revised manuscript now includes a description of the number of tracked breeding pairs and the number of nest boxes. “Of these individuals, 142 belonged to pairs for which data were recovered from both partners (71 pairs in total, 40 in 2019, 31 in 2020). The remaining 21 individuals belonged to pairs with data from one partner (11 females and 1 male in 2019; 4 females and 5 males in 2020).” (lines 82 – 85.)

      Comment 8. Line 93: Briefly define the term "landing force" and explain how it was measured (and let the reader know that there is a detailed description in the Methods).

      We now include a brief definition of the “landing force” along with a brief explanation of how it was measured in the results section. “We extracted the peak vectoral sum of the raw acceleration during each landing and converted this to ground reaction force (hereafter “landing force”, in Newtons) using measurements of individual body mass (see methods for detailed description).” (lines 92 – 95).
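
      The conversion described above can be sketched as follows. This is a minimal illustration and not the authors' processing code: it assumes that acceleration is logged in units of g on three axes, that the peak vectorial sum is the largest Euclidean norm among the samples of one landing, and that force follows from F = m·a with g = 9.81 m/s². The function name `landing_force_newtons` and the sample values are hypothetical; whether the static (gravitational) component is removed first is not stated in the response and is left out here.

```python
import math

G = 9.81  # standard gravity in m/s^2 (assumed conversion constant)


def landing_force_newtons(samples_g, body_mass_kg):
    """Convert the peak vectorial sum of tri-axial acceleration to force.

    samples_g: iterable of (surge, sway, heave) tuples in units of g.
    body_mass_kg: body mass of the individual in kilograms.
    """
    # Peak vectorial sum: largest Euclidean norm over the landing window.
    peak_g = max(math.sqrt(x * x + y * y + z * z) for x, y, z in samples_g)
    # F = m * a, with acceleration converted from g to m/s^2.
    return peak_g * G * body_mass_kg


# Hypothetical samples around one landing, for a bird weighing 281 g.
samples = [(0.1, 0.0, 1.0), (0.3, 0.4, 1.2), (0.0, 0.0, 2.0), (0.1, 0.1, 1.1)]
force = landing_force_newtons(samples, body_mass_kg=0.281)
```

      Because body mass enters as a simple multiplier, dividing the result by body mass recovers a mass-specific value that can be compared between sexes.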

      Comment 9. Line 94: All definitions, including "pre-hunting force," need to be better described in the Results section.

      Thank you for this suggestion. We now provide a better description of those key definitions directly in the results section: 

      Measurement of landing force: “Barn owls employing a sit-and-wait strategy land on multiple perches before initiating an attack, with successive landings reducing the distance to the target prey (Fig. 2C). 

      We used the acceleration data to identify 84,855 landings. These were further categorized into perching events (n = 56,874) and hunting strikes (n = 27,981), depending whether barn owls were landing on a perch or attempting to strike prey on the ground (Fig. 1A and B, see methods for specific details on behavioral classification).” (lines 88 – 95)

      Pre-hunt perching force predicts hunting success: “Finally, we analyzed whether the landing force in the last perching event before each hunting attempt (i.e. pre-hunt perching force) predicted variation in hunting success” (lines 229 – 230)

      Comment 10. Line 102: Remove "Our analysis of 27,981 hunting strikes showed that" and add "n = 27,981" after the statistics. You have already stated your sample size earlier. There is no need to emphasize it again, although your sample size is impressive.

      We modified the text in the results section as suggested.

      Comment 11. Line 104: The results so far suggest that the difference in landing force between males and females is an outcome of their different body masses. However, it is not clear what is the reason for the difference in the number of hunting strike attempts between males and females (Lines 104-106). Can you compare the difference in landing force between males and females with similar body mass (females from the lower part of the distribution and males from the upper part)? Is there still a difference?

      Thank you. Following your comment, we performed new analyses that clarify the differences between sexes in the landing force involved in perching and hunting strike events. First, however, we would like to clarify why there is a difference in the number of hunting attempts between males and females. During the breeding season, females typically perform most of the incubation, brooding, and feeding of nestlings in the nest, while the male primarily hunts food for the female and chicks. The female supports the male in providing food in a very irregular way, and this varies from pair to pair (paper in prep.). The difference in the number of hunting attempts between males and females reflects this asymmetry in food provisioning between sexes during this specific period. We specified this in the revised version of the manuscript (lines 164 – 174). 

      We also provide a new analysis to investigate sex differences in mass-specific landing force (force/body mass). We found that males and females produce similar force per unit of body mass during perching events. This demonstrates that the overall higher perching force in females (see Fig. 4C in the manuscript) is driven by their higher body mass (lines 194 – 199).

      Comment 12. Line 154: I believe Boonman et al. (2018) is relevant to this part of the discussion. Boonman, Arjan, et al. found that barn owl noise during landing and taking off is worth considering. ["The sounds of silence: barn owl noise in landing and taking off." Behavioral Processes 157 (2018): 484-488.]

      We have now cited this paper in the discussion.

      Comment 13. Line 164: Your results do not directly demonstrate a link to fitness, although they potentially serve as a proxy for fitness (add a reference). However, you might have information regarding nestlings' survival - that will provide a direct link for fitness. Change your statement or add the relevant data.

      We appreciate your feedback and have adjusted the language accordingly.

      Comment 14. Line 213: If the poles are closer to the ground - is it possible that the higher trees and buildings serve for resting and gathering environmental information over greater distances? For example, identifying prey at farther distances or navigating to the next pole?

      Yes, this is indeed the most likely explanation for the fact that owls land more on buildings and trees than on poles until the last period (about 6 minutes) before hunting. In these last minutes, barn owls preferentially use poles, as we showed in figure 2B. The revised manuscript now includes this explanation in the discussion (lines 269 – 284).

      Comment 15. Line 250: The product "AXY-Trek loggers" does not appear on the Technosmart website (there are similar names, but not an exact match). Are you sure this is the correct name of the tracking device you used? 

      Thank you for pointing out this detail that we missed. The device we used is now called "AXY-Trek Mini" (https://www.technosmart.eu/axy-trek-mini/). We have corrected this error directly in the revised manuscript.

      Comment 16. Line 256: Please explain how the devices were recovered. Did you recapture the animals? If so, how? Additionally, replace "after approximately 15 days" with the exact average and standard deviation. Furthermore, since you have these data, please state the difference in body mass between the two measurements before and after tagging.

      The birds were recaptured to recover the devices. Adult barn owls were recaptured at their nest sites, again using automatic sliding traps that are activated when birds enter the nest box. The statement "after approximately 15 days" was replaced by the exact mean and standard deviation, which were 10.47 ± 2.27 days. Those numbers exclude five of the 163 individuals included in this study. They could not be recaptured in the appropriate time window but were re-encountered when they initiated a second clutch later in the season (4 individuals) or a new clutch the year after (1 individual).

      We integrated this previously missing information in the revised manuscript (lines 370 – 372).

      Comment 17. Line 259: What was the resolution of the camera? What were the recording methods and schedule? How did you analyze these data? 

      The resolution was set to 3.1 megapixels. Motion-sensitive camera traps were installed at the entrance to each nest box throughout the period when the barn owls were wearing data loggers, and each movement detected triggered the capture of three photos in bursts. The photos recorded were not analyzed as such for this study, but were used to confirm each supply of prey, which had previously been detected from the accelerometer data. We added these details in the revised manuscript (lines 377 – 380).

      Comment 18_1. Figure 1: 

      Panel A) Include the sex of the described individual. 

      The sex of the described individual is now included in the figure caption.

      Comment 18_2. It would be interesting to show these data for both males and females from the same nest box (choose another example if you don't have the data for this specific nest box). 

      Although we agree that showing tracks of males and females from the same nest is very interesting, the purpose of this figure was to illustrate our data annotation process and we believe that adding too many details on this figure will make it appear messy. However, the revised manuscript now includes a new figure (Fig. 3A) which shows simultaneous GPS tracks of a male and a female during a complete night, with detailed information about perching and hunting behaviors.

      Comment 18_3. Add the symbol of the nest box to the legend. 

      Done

      Comment 18_4. Provide information about the total time of the foraging trip in the text below. 

      The duration of the illustrated foraging trip has been included in the figure caption.

      Comment 18_5. To enhance the figure’s information on foraging behavior, consider color coding the trajectory based on time and adding a background representing the landscape. Since this paper may be of interest to researchers unfamiliar with barn owl foraging behavior, it could answer some common questions. 

      For reasons similar to those explained in our answer above (Comment 18_2), we would rather keep this figure as clean as possible. However, we followed your recommendations and included these details in the new Figure 3 described above. In this new figure, GPS tracks are color coded according to the foraging trip number, and a background represents the landscape. To provide even more detail about the landscape, we added another figure in the supplementary materials (Fig. S2), which illustrates barn owl foraging grounds and nest sites and which we think might be of interest to people unfamiliar with barn owls.

      Comment 18_6. Inset panels) provide a detailed description of the acceleration insert panels. 

      Done

      Comment 18_7. Color code the acceleration data with different colors for each axis, add x and y axes with labels, and ensure the time frame on the x-axis is clear. How was the self-feeding behavior verified (should be described in the methods section)? 

      We kept both inset panels as simple as possible since they serve here as examples, but a complete representation of these behaviors (with time frame, different colors and labels) is provided in the supplementary materials (figure S3). We included this statement in the figure caption and added a reference to the full representations from the supplementary materials: 

      In the Figure caption: “Inset panels show an example of the pattern of the tri-axial acceleration corresponding to both nest-box return and self-feeding behaviors (but see Fig S3 for a detailed representation of the acceleration pattern corresponding to each behavior).” 

      In the Method section: “Self-feeding was evident from multiple and regular acceleration peaks in the surge and heave axes (resulting in peaks in VeDBA values > 0.2 g and < 0.9 g, Fig.S3D), with each peak corresponding to the movement of the head as the prey was swallowed whole.”.
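
      For readers unfamiliar with VeDBA, the quantity thresholded above can be sketched as follows. This is an illustrative assumption about the general technique, not the authors' pipeline: the static component of each axis is approximated by a centered running mean, the dynamic component is the raw value minus that mean, and VeDBA is the vector norm of the three dynamic components. The window length, the helper names `running_mean`, `vedba` and `in_self_feeding_band`, and the use of a running mean are all hypothetical; only the 0.2 g and 0.9 g bounds come from the quoted text.

```python
import math


def running_mean(values, window):
    """Centered running mean; near the edges, use the samples available."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def vedba(surge, sway, heave, window=5):
    """Vectorial dynamic body acceleration for each sample, in g."""
    static = [running_mean(axis, window) for axis in (surge, sway, heave)]
    result = []
    for i in range(len(surge)):
        # Dynamic acceleration: raw value minus the smoothed (static) part.
        dx = surge[i] - static[0][i]
        dy = sway[i] - static[1][i]
        dz = heave[i] - static[2][i]
        result.append(math.sqrt(dx * dx + dy * dy + dz * dz))
    return result


def in_self_feeding_band(value, low=0.2, high=0.9):
    """True when a VeDBA peak lies strictly between the quoted bounds."""
    return low < value < high
```

      A Boolean rule of this kind depends on the pattern of peaks over time as well as on their absolute values, which is consistent with the robustness argument made in the response.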

      Comment 18_8. Panel B) Note in the caption that you refer to the acceleration z-axis.

      We believe that keeping the statement “the heave acceleration…” in the figure caption is more informative than referring to the “z-axis” as it describes the real dimension to which we are referring. The use of the x, y and z axes can be misleading as they can be interchanged depending on the type and setting of recorders used.

      Comment 18_9. Present the same time scale for both hunting strategies to facilitate comparison. You can achieve this by showing only part of the flight phase before perching. 

      Done

      Comment 18_10. Panel C) Presenting the data for both hunting strategy and sex would provide more comprehensive information about the results and would be relatively easy to implement. 

      We agree with your comment. We present the differences in landing force for both landing contexts and sexes in the new Figure 3 as well as in the supplementary materials (Figure S10) of this revised manuscript.

      Comment 19. Figure 2: Please provide an explanation of the meaning of the circles in the figure caption.  

      Done

      Comment 20. Figure 3: 

      Panel A) It is unclear how the owl illustration is relevant to this specific figure, unlike the previous figures where it is clear. Also, suggest removing the upper black line from the edge of the figure or add a line on the right side. 

      Done (now in Figure 2).

      Panel B) "Density" should be capitalized. 

      Done

      Panel C) Add a scale in meters, and it would be helpful to include an indication of time before hunting for each data point. 

      Done

      Comment 21. Figure S1: Mark the locations of the nest boxes and ensure that trajectories of different individuals and sexes can be identified. 

      The purpose of this figure was to show the spatial distribution of the data. We think that adding nest locations and coloring the paths according to individuals and/or sex will make the figure less clear. However, the new Figure 3 highlights those details.

      Comment 22. Figure S2: Show the pitch angle similarly to how you showed the acceleration axes, and explain what "VeDBA" stands for. Provide a description of the perching behavior, clearly indicating it on the figure. Add axes (x, y, z) to the illustration of the acceleration explanation. 

      We edited this figure (now figure S3) to show the pitch angle and provide an explanation of what “VeDBA” stands for in the figure caption. The figure caption now also provides a better description of the perching behavior. For the axes (i.e. X, Y, Z), we prefer to refer to the heave, surge, and sway as this is more informative and refers to what is usually reported in studies working with tri-axial accelerometers.

      Comment 23. Table S1: Improve the explanation in the caption and titles of the table. 

      Done

      Reviewer #2 (Recommendations For The Authors): 

      Comment 1. From the public review and my assessment there, the authors can be assured that I thoroughly enjoyed the read and am looking forward to seeing a revised and improved version of this paper. 

      We thank the reviewer for this comment. We revised the manuscript according to their comments.

      Comment 2. In addition to my major points stated above, I would like to add the following recommendations: 

      The manuscript is overall well written, but it uses a very pictorial language (a little as if we were in a David Attenborough documentary) that I find inappropriate for a research paper (especially in the abstract and introduction, "remarkable" (2x), "sophisticated" (are there any unsophisticated adaptations? We are referring to something under selection after all) etc.

      We appreciate that you found the paper overall well written, and we understand the comment about pictorial language. We therefore slightly changed the text to make sure that the adjectives used to describe adaptive strategies are not over-emphasized.

      Comment 3. Abstract 

      "While the theoretical benefits of predator camouflage are well established, no study has yet been able to quantify its consequences for hunting success." - This claim is actually not fully true: 

      Nebel Carina, Sumasgutner Petra, Pajot Adrien and Amar Arjun 2019: Response time of an avian prey to a simulated hawk attack is slower in darker conditions, but is independent of hawk colour morph. Soc. open sci.6:190677 

      We edited our claim to specify that the consequences of predator camouflage for hunting success have never been quantified in natural conditions, and we cited the reference in the introduction.

      Comment 4. Line 23. Rephrase to: "We used high-resolution movement data to quantify how barn owls (Tyto alba) conceal their approach when using a sit-and-wait strategy, as well as the power exerted during strikes." 

      We edited this sentence in the abstract, as suggested.

      Comment 5. Results 

      There is a disconnect between the objectives outlined at the end of the introduction and the following results that should be improved. 

      The authors state: "Using high-frequency GPS and accelerometer data from wild barn owls (Tyto alba), we quantify the landing dynamics of this sit-and-wait strategy to (i) examine how birds adjust their landing force with the behavioral and environmental context and (ii) test the extent to which the magnitude of the predator cue affects hunting success." But one of the first results presented are sex differences. 

      This is a fair point. We have now changed our statement at the end of the introduction, as well as the order of the results, to improve the link between the objectives outlined in the introduction and the way the results are presented.

      Comment 6. At this stage, the reader does not even know yet that we are presented with a size-dimorphic species that also has very different parental roles during the breeding season. This should be better streamlined, with an extra paragraph in the introduction. And these sex differences are then not even discussed, so why bring them up in the first place (and not just state "sex has been fitted as additional co-variate to account for the size-dimorphism in the species" without further details). 

      We edited the way the objectives are outlined in the introduction to cover the size dimorphism (lines 70 – 76). We also completely changed the way the sex differences are presented in the results, including a new analysis that we believe provides a more comprehensive understanding of barn owl foraging behavior (lines 164 – 206). Finally, we added a new paragraph in the discussion to consider those results (lines 319 – 339).

      Comment 7. It is not clear to me where and how high-resolution GPS data were used? The results seem to concentrate on ACC – why GPS was used and how it features should be foreshadowed in a few lines in the introduction. I definitively prefer having the methods at the end of a manuscript, but with this structure, it is crucial to give the reader some help to understand the storyline. 

      GPS data were used to validate some behavioral classifications (prey provisioning for example), but most importantly they were used to link each landing event with perch types. We edited the text in the result section to clarify where GPS and/or ACC data were used.

      Comment 8. Discussion 

      Move the orca example further down, where more detail can be provided to understand the evidence. 

      After our extensive edits in the discussion, we felt this example was interrupting the flow. We now cite this study in the introduction. 

      Comment 9. Size dimorphism and evident sex differences are not discussed. 

      The revised manuscript now includes a new paragraph in the discussion in which sex differences are discussed (lines 319 – 339).

      Comment 10. Be more precise in the terminology used (for example, land use seems to be interchangeable with habitat characteristics?). 

      We modified “land use” with “habitat data” in the revised manuscript.

      Comment 11. Methods 

      Please provide a justification for the very high weight limit (5%; line 256). This limit is outdated and does not fulfill the international standard of 3% body weight. I assume the ethics clearance went through because of the short nature of the study (i.e., the birds were not burdened for life with the excess weight? But a line is needed here or under the ethics considerations to clarify this). 

      The 5% weight limit was considered acceptable due to the short deployment period, and we now edited the ethics statement to emphasize this point. However, it is important to note that there is no real international standard, with both 3% and 5% weight limits being commonly used. Both limits are arbitrary and the impact of a fixed mass on a bird varies with species and flight style. All owls survived and bred similarly to the non-tagged individuals in the population (lines 373 – 376 & lines 558 – 561)

      EDITORIAL COMMENT: We strongly encourage you to provide further context and clarification on this issue, as suggested by the Reviewer. On a related point, the ethics statement refers to GPS loggers, rather than GPS and ACC devices; we encourage you to clarify wording here.

      Thank you for highlighting this point, which indeed needed clarification.

      Although we used the terminology "GPS recorders", the authorization granted by the Swiss authorities for this study effectively covers the entire tracking system, which combines both GPS and ACC recorders in the same device. We have therefore changed the wording used in the ethics statement to avoid any misunderstanding (lines 373 – 376 & lines 558 – 561).

      Comment 12. Please provide more information on the model selection approach, what does "Non-significant terms were dropped via model simplification by comparing model AIC with and without terms." mean? Did the authors use a stepwise backward elimination procedure (drop1 function)? Or did they apply a complete comparison of several candidate models? I think a model comparison approach rather than stepwise selection would be more informative, as several rather than only one model could be equally probable. This might also improve model weights or might require a model averaging procedure - current reported R2values are very small and do not seem to support the results well. 

      We apologize for the lack of details about this important aspect of the statistical analysis. We applied an automated stepwise selection using the dredge function from the R package “MuMIn”, therefore applying a complete comparison of several candidate models. The final models were chosen as the best models since the number of candidate models within ∆AIC<2 was relatively low in each analysis, and thus model averaging was not appropriate here. We edited the methods section to ensure clarity, and added model selection tables for each analysis, ranked according to AICc scores, in the supplementary materials (lines 532 – 552).

      In addition, we agree that the reported R-squared values in our analyses are quite low, specifically regarding the influence of pre-hunt perching force on hunting success (cond R2 = 0.04). Nonetheless, landing impact still has a notable effect size (an increase of 1N reduces hunting success by 15%). The reported values are indicative of the inherent complexity in studying hunting behavior in a wild setting where numerous variables come into play. We specifically investigated the hypothesis that the force involved during pre-hunt landings, and consequently the emitted noise, influences the success of the next hunting attempt in wild barn owls. Factors such as prey behavior and micro-habitat characteristics surrounding prey (such as substrate type and vegetation height) are most likely to be influential but hard, or nearly impossible, to model. We now cover this in a more nuanced way in the discussion (lines 266 – 268)
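
      The ranking step described in the response to Comment 12 (candidate models ranked by AICc, with the top set defined by ∆AIC < 2) can be sketched as follows. This is a minimal illustration only: the model names, log-likelihoods, parameter counts, and sample size are hypothetical placeholders, not values from the study, and the authors' actual analysis was run in R rather than Python.

```python
# Hypothetical sketch of AICc-based model ranking; all numbers are made up.

def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters fitted to n observations."""
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def rank_models(models, n):
    """Return (name, AICc, delta) tuples sorted from best (lowest AICc) to worst."""
    scored = [(name, aicc(log_lik, k, n)) for name, (log_lik, k) in models.items()]
    best = min(score for _, score in scored)
    return sorted(((name, score, score - best) for name, score in scored),
                  key=lambda row: row[1])

# Hypothetical candidate set: name -> (log-likelihood, number of parameters)
candidates = {
    "null": (-520.0, 2),
    "force": (-512.0, 3),
    "force+sex": (-511.6, 4),
    "force+sex+perch": (-511.5, 5),
}

table = rank_models(candidates, n=400)

# Models within delta < 2 of the best model form the "top set"
top_set = [name for name, _, delta in table if delta < 2]

for name, score, delta in table:
    print(f"{name:18s} AICc={score:9.3f} delta={delta:6.3f}")
```

      When the top set contains only a few models, as the authors report, selecting the best-ranked model rather than averaging across models is a common choice.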

      Comment 13. Please explain why BirdID was nested in NightID - this is not clear to me.

      There may be a misunderstanding here: we wrote that NightID was nested in BirdID (and not BirdID in NightID).

      Comment 14. I hope the final graphs and legends will be larger, they are almost impossible to read. 

      We enlarged the graphs and legends as much as possible to improve readability. However, looking at the graphs in the published version they seem clear and readable.

      Comment 15. Figure S1: Does "representation" mean the tracks don't show all of the 163 owls? If so, be precise and tell us how many are illustrated in the figure. 

      Figure S1 represents the tracks for each of the 163 barn owls used in the study. We changed the terminology used in the figure caption to avoid any misunderstanding.

      Comment 16. Figure S4: Please adjust the y-axis to a readable format. 

      Done

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer #1 comments:

      (1) SY1 aggregation enhances (in terms of number of aggregates) when Sphingolipid biosynthesis is blocked.

      a. Line no 132-133: I agree that there is circumstantial evidence that the maturation pathway of SY1 IB is perturbed by knocking down sphingolipid biosynthesis. However, to prove this formally, a time course of IB maturation needs to be reported in the knock-down strains.

      Please see Figure 2-figure supplement 1 for the time course of SY1 IB maturation in the knock-down strains. We have added this result to the manuscript; please see lines 129-131 on page 5 in the revised version.

      b. It will be good to have formal evidence that sphingolipids are indeed downregulated when these genes are downregulated (knocked down).

      This has been clearly demonstrated in previous reports, and we have added the appropriate references to the main text. For example, Huang et al. (https://doi.org/10.1371/journal.pgen.1002493) reported that down-regulation of LCB1 or SPT in yeast decreased sphingolipid levels. According to Tafesse FG, et al. (https://doi.org/10.1371/journal.ppat.1005188), in mammalian cells in which Sptlc2 was knocked down by CRISPR/Cas9, sphingolipid and glucosylceramide production is almost completely blocked. In addition, the levels of sphingosine, sphingomyelin, and ceramide were significantly lower compared to control cells. Please see lines 143-144 on page 6 and lines 232-233 on page 9 in the revised version.

      (2) In a normal cell (where sphingolipid biosynthesis is not hampered), the aggregate of SY1 (primarily the Class I aggregate) is localized only on the mitochondrial endomembrane system. These results have been published for other aggregation-prone proteins and are partly explained in the literature. However, their role in the context of maturation is relatively unclear. The authors however provide no strong evidence to show if mitochondria are preferentially involved in any of the stages of IB maturation. Specifically:

      a. Line 166-167: It is not clear from Figure 4B that this is indeed the case. Only the large IB seems to colocalize in all three panels (Class I, 2, 3) with Mitotracker. The smaller IBs in 2 and 3 do not show any obvious co-localization. It is also possible that they do co-localize, but it is not clear from the images. I would appreciate it if the authors either provide stronger evidence (better image) or revise this statement. This point is crucial in some claims made later in the manuscript. (pls see comment #5A).

      Based on the reviewer's suggestion, we replaced the images in Figure 4B. In addition, we added the 3D reconstruction results of the interrelationship between Class 3 and Mitotracker in Figure 4-figure supplement 1B, to further show their relationship.

      (3) The localization is due to the association of SY1 (aggregates) with mitochondrial proteins like Tom70, Tim44 etc. There are some critical points (that can strengthen the manuscript) that are not addressed here. Primarily, the important role of mitochondria in the context of toxicity is neglected. Although the authors have mentioned in the discussion that it was not their main focus, I believe that this is the novel part of the manuscript and this part is potentially a beautiful addition to literature. The questions I found unanswered are:

      a. Is the localization completely lost upon deleting these genes? I see only a partial loss in shape/localization. This is not properly explained in the manuscript. The shape of the IB seems to remain intact while the localization is slightly altered. This indicates that even when sphingolipid is present, SY1 localization is dictated by the (lipid-raft embedded) proteins. Interestingly, it shows that even in the absence of mitochondrial localization the shape of the aggregates is not altered in these deletion strains! How do the authors explain this if mitochondrial surface sphingolipids are important for IB maturation? (the primary screen found that sphingolipid biosynthesis promotes the formation of Class I IBs).

      We agree that mutation of a single mitochondrial binding protein causes only a partial loss in shape/localization, and we have replaced “association” with “surrounding” in the manuscript. Please see lines 163-166 on page 6 in the revised version. In deletion mutants of the proteins that interact with SY1, we counted the proportion of Class 3 aggregates formed by SY1 and found that it increased compared to controls, indicating that a partial loss of the interaction between SY1 and mitochondria does affect the shape of the aggregates, as detailed in line 184 on page 7 and Figure 4-figure supplement 1D. We think that SY1 interactions with mitochondrial proteins are important for the localization of SY1 IBs to mitochondria, whereas sphingolipids play an important role in facilitating the formation of Class 1 IBs from Class 3 aggregates.

      b. What happens to the toxicity when the aggregates are not localized on mitochondria?

      We thank the reviewer for the comment. Because a single mutant only partially affects the phenotype, investigating this issue may require constructing combinations of mutants in different genes, which we will pursue in future studies. What we want to show in this work is that SY1 binds to mitochondria by interacting with these mitochondrial proteins.

      c. It is important to note that sphingolipids may affect the whole process indirectly by altering pathways involved in protein quality control or UPR. UPR may regulate the maturation of IBs. It is therefore important to test if any of the effects seen could be of direct consequence.

      We agree with the reviewer's comments, but there was no significant enrichment for protein quality control or UPR-related pathways in our genome-wide screen, so it is unlikely that sphingolipids indirectly cause maturation of IBs by affecting these two pathways. We addressed this issue in our discussion. Please see lines 325-328 on page 12 in the revised version.

      d. In Figure 4D, the authors find SY1 when they pull down Tom70, Tom37 or Tim44. Tim44 is a protein found in the mitochondrial matrix, how do the authors explain that this protein is interacting with a protein outside the mitochondrial outer membrane?

      This interaction could potentially arise because some of the soluble SY1 enters the mitochondrial matrix and interacts with Tim44.

      e. Is it possible that the authors are immunoprecipitating SY1 since IBs have some amount of unimported mitochondrial proteins in aggregates formed during proteotoxic stress (https://doi.org/10.1073/pnas.2300475120) (Liu et al. 2023).

      Our Co-IP experiments were performed on the soluble supernatant, so mitochondrial proteins in aggregates were not detected.

      f. Line 261 (Discussion): Does deletion of Tom70 or one of the anchors increase Class III aggregation and increase toxicity? Without this, it is hard to say if mitochondria are involved in detoxification.

      We thank the reviewer for the comments, please see our response to comment 3b.

      (4) This fuels the loss of mitochondrial function.

      a. Line 218-219: Although the change is significant, the percentage change is very slight. Is this difference enough to be of physiological relevance in mitochondrial function? In our hands, the DCF fluorescence is much more variable.

      We agree with the reviewer that the difference is small (but significant). The extent to which such a difference is physiologically relevant to mitochondrial function needs to be investigated further.

      b. Is SY1-induced loss of mitochondrial function less in knockouts of Tom70 or the other ones found to be important for localizing the SY1 aggregate to mitochondria?

      We examined mitochondrial membrane potential (indicated by Rho 123 fluorescence intensity) in tom70Δ, tom37Δ and control his3Δ strains and found that knocking out Tom70 or Tom37 reduced the mitochondrial toxicity caused by SY1 expression. Please see lines 212-214 on page 8 in the revised version, and Figure 5-figure supplement 2.

      (5) Mitochondrial function is further abrogated when there is a block in sphingolipid biosynthesis.

      a. Myriocin acted like the deletion strains that showed less structured aggregates. There were more aggregates (Class 3) but visually they seemed to be spread apart. The first comment (#2A) on aggregate classes and their interaction with mitochondria may become relevant here.

      According to a recent review article (https://doi.org/10.3389/fcell.2023.1302472), sphingolipids are present in the mitochondrial membrane, bind to many mitochondrial proteins and have emerged as key regulators of mitochondrial morphology, distribution and function. Dysregulation of sphingolipid metabolism in mitochondria disrupts many mitochondrial processes, leading to mitochondrial fragmentation, impaired bioenergetics and impaired cellular function. Myriocin treatment, which affects sphingolipid metabolism, causes mitochondria to become more fragmented, which may explain why the aggregates appear visually spread apart. Regarding the interaction with mitochondria, we counted the proportion of SY1 aggregates surrounded by mitochondria after treatment with myriocin, and the results were not significantly different compared to the control. Please see lines 168-169 on page 6 in the revised version, and Figure 4-figure supplement 1C.

      (6) A similar phenomenon is conserved in mammalian cell lines.

      a. Line 225-226: Did the authors confirm that this was the only alteration in the genome? Or did they complement the phenotype, genetically?

      We performed SPTLC2 gene complementation experiments in knockout cell lines and found that SPTLC2 gene complementation was able to reduce the number of cells forming IBs and the percentage of dispersed irregular IBs compared to controls. Please see lines 240-242 on page 9 in the revised version, and Figure 6-figure supplement 2B.

      b. Line 241-245: One of the significant phenotypes observed by downregulating sphingolipid biosynthesis in yeast and mammalian cells was the increase in the number of aggregates. This is not shown for myriocin treatment in mammalian cells. This needs to be shown to maintain concordance with the original screen and the data presented with the KO mammalian cell line.

      Please see Figure 7-figure supplement 1A for the data on the proportion of cells forming SY1 IBs after myriocin treatment in mammalian cells. The effect of myriocin treatment in mammalian cells was the same as that observed in the KO mammalian cell line.

      Minor Comments:

      Line 273-275: How is this statement connected to the previous statement? Was it observed that aggregate fusion was advantageous to the cells?

      Yes, aggregate/oligomer fusion is advantageous to the cells, and we have modified the previous statement. Please see line 280 on page 10 in the revised version.

      Line 293-294: I am not sure I understand this statement.

      We have modified this statement. Please see lines 302-303 on page 11 in the revised version.

      Line 295-296: But the authors have commented at multiple places that mitochondria detoxify the cell from SY1 aggregates. I find this link fascinating and worth investigating. Most of the current work has some known links in literature (not everything). The mitochondrial connection being the most fascinating one.

      We have removed this sentence. We have added a validation experiment for the role of mitochondrial activity in SY1 IB maturation in the revised version.

      Line 318: Do the authors mean: The open question is...

      We thank the reviewer; we have corrected it.

      Response to Reviewer #2 comments:

      I recommend considering live cell microscopy to analyze whether sphingolipid-dependent formation of SY1 IB takes place at the mitochondrial outer membrane. The IBs could also be produced at other membranes and then transported to the mitochondrial outer membrane for storage.

      As shown in Figure 4A, SY1 IB primarily interacts with mitochondria.

      I recommend analyzing whether mitochondrial activity is needed for sphingolipid-dependent SY1 IB formation. Are these IBs localized to mitochondrial membrane solely as scaffold or are these organelles needed to provide the energy for driving IB formation in concert with sphingolipids? This point could be addressed with rho0 strains lacking mitochondrial DNA.

      We thank the reviewer for this recommendation. We expressed SY1 protein in the BY4741 rho0 strain as suggested and found that neither the maturation of SY1 IBs nor the extent to which they were surrounded by mitochondria was affected by mitochondrial activity. Please see lines 185-187 on page 7 in the revised version, and Figure 4-figure supplement 1E and 1F.

      The authors should be more precise in the statistical methods used in their study (method, pre-/post-tests, number of replicates...).

      We thank the reviewer for the comment and we have provided a more precise description of the statistical methods. Please see lines 531-534 on page 19 and figure legends in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript aims at a quantitative model of how visual stimuli, given as time-dependent light intensity signals, are transduced into electrical currents in photoreceptors of macaque and mouse retina. Based on prior knowledge of the fundamental biophysical steps of the transduction cascade and a relatively small number of free parameters, the resulting model is found to fairly accurately capture measured photoreceptor currents under a range of diverse visual stimuli and with parameters that are (mostly) identical for photoreceptors of the same type.

      Furthermore, as the model is invertible, the authors show that it can be used to derive visual stimuli that result in a desired, predetermined photoreceptor response. As demonstrated with several examples, this can be used to probe how the dynamics of phototransduction affect downstream signals in retinal ganglion cells, for example, by manipulating the visual stimuli in such a way that photoreceptor signals are linear or have reduced or altered adaptation. This innovative approach had already previously been used by the same lab to probe the contribution of photoreceptor adaptation to differences between On and Off parasol cells (Yu et al, eLife 2022), but the present paper extends this by describing and testing the photoreceptor model more generally and in both macaque and mouse as well as for both rods and cones.

      Strengths:

      The presentation of the model is thorough and convincing, and the ability to capture responses to stimuli as different as white noise with varying mean intensity and flashes with a common set of model parameters across cells is impressive. Also, the suggested approach of applying the model to modify visual stimuli that effectively alter photoreceptor signal processing is thought-provoking and should be a powerful tool for future investigations of retinal circuit function. The examples of how this approach can be applied are convincing and corroborate, for example, previous findings that adaptation to ambient light in the primate retina, as measured by responses to light flashes, mostly originates in photoreceptors.

      Weaknesses:

      In the current form of the presentation, it doesn't become fully clear how easily the approach is applicable at different mean light levels and where exactly the limits for the model inversion are at high frequency. Also, accessibility and applicability by others could be strengthened by including more details about how parameters are fixed and what consensus values are selected.

      Thank you - indeed a central goal of writing this paper was to provide a tool that could be easily used by other laboratories. We have clarified and expanded four points in this regard: (1) we have stated more clearly that mean light levels are naturally part of the inversion process, and hence the approach can be applied across a broad range of light levels (lines 292-297); (2) we have expanded our analysis of the high frequency limits to the inversion and added that expanded figure to the main text (new Fig 5); (3) we have included additional detail about our calibration procedures, including our calibration code, to facilitate transfer to other labs; and (4) we have detailed the procedure for identification of consensus parameters (lines 172-182, 191-199 and Methods section starting on line 831).

      Reviewer #2 (Public Review):

      Summary:

      This manuscript proposes a modeling approach to capture nonlinear processes of photocurrents in mammalian (mouse, primate) rod and cone photoreceptors. The ultimate goal is to separate these nonlinearities at the level of photocurrent from subsequent nonlinear processing that occurs in retinal circuitry. The authors devised a strategy to generate stimuli that cancel the major nonlinearities in photocurrents. For example, modified stimuli would generate genuine sinusoidal modulation of the photocurrent, whereas a sinusoidal stimulus would not (i.e., because of asymmetries in the photocurrent to light vs. dark changes); and modified stimuli that could cancel the effects of light adaptation at the photocurrent level. Using these modified stimuli, one could record downstream neurons, knowing that any nonlinearities that emerge must happen post-photocurrent. This could be a useful method for separating nonlinear mechanisms across different stages of retinal processing, although there are some apparent limitations to the overall strategy.

      Strengths:

      (1) This is a very quantitative and thoughtful approach and addresses a long-standing problem in the field: determining the location of nonlinearities within a complex circuit, including asymmetric responses to different polarities of contrast, adaptation, etc.

      (2) The study presents data for two primary models of mammalian retina, mouse, and primate, and shows that the basic strategy works in each case.

      (3) Ideally, the present results would generalize to the work in other labs and possibly other sensory systems. How easy would this be? Would one lab have to be able to record both receptor and post-receptor neurons? Would in vitro recordings be useful for interpreting in vivo studies? It would be useful to comment on how well the current strategy could be generalized.

      We agree that generalization to work in other laboratories is important, and indeed that was a motivation for writing this as a methods paper. The key issue in such generalization is calibration. We have expanded our discussion of our calibration procedures and included that code as part of the github repository associated with the paper. Figure 10 (previously Figure 9) was added to illustrate generalization. We believe that the approach we introduce here should generalize to in vivo conditions. We have expanded the text on these issues in the Discussion (sections starting on line 689 and 757).

      Weaknesses:

      (1) The model is limited to describing photoreceptor responses at the level of photocurrents, as opposed to the output of the cell, which takes into account voltage-dependent mechanisms, horizontal cell feedback, etc., as the authors acknowledge. How would one distinguish nonlinearities that emerge at the level of post-photocurrent processing within the photoreceptor as opposed to downstream mechanisms? It would seem as if one is back to the earlier approach, recording at multiple levels of the circuit (e.g., Dunn et al., 2006, 2007).

      Indeed the current model is limited to a description of rod and cone photocurrents. Nonetheless, the transformation of light inputs to photocurrents can be strongly nonlinear, and such nonlinearities can be difficult to untangle from those occurring late in visual processing. Hence, we feel that the ability to capture and manipulate nonlinearities in the photocurrents is an important step. We have expanded Figure 10 to show an additional example of how manipulation of nonlinearities in phototransduction can give insight into downstream responses. We have also noted in text that an important next step would be to include inner segment mechanisms (section starting on line 661); doing so will require not only characterization of the current-to-voltage transformation, but also horizontal cell feedback and properties of the cone output synapse.

      (2) It would have been nice to see additional confirmations of the approach beyond what is presented in Figure 9. This is limited by the sample (n = 1 horizontal cell) and the number of conditions (1). It would have been interesting to at least see the same test at a dimmer light level, where the major adaptation mechanisms are supposed to occur beyond the photoreceptors (Dunn et al., 2007).

      We have added an additional experiment to this figure (now Figure 10) which we feel nicely exemplifies the approach. The approach that we introduce here really only makes sense at light levels where the photoreceptors are adapting; at lower light levels the photoreceptors respond near-linearly, so our “modified” and “original” stimuli as in Figure 10 (previously Figure 9) would be very similar (and post-phototransduction nonlinearities are naturally isolated at these light levels).
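
      The point that “modified” and “original” stimuli converge at low light levels can be illustrated with a deliberately simplified toy. The sketch below is not the phototransduction model from the paper: it assumes a static saturating nonlinearity with hypothetical parameters (`r_max`, `s_half`, arbitrary units) purely to show how inverting a forward model yields a stimulus whose response is linear in the original stimulus, and how that stimulus barely differs from the original when the stimulus is dim.

```python
# Toy illustration only: a static saturating nonlinearity stands in for the
# (much richer, dynamic) photoreceptor model. All parameters are hypothetical.

def forward(stimulus, r_max=1.0, s_half=1000.0):
    """Toy forward model: response saturates as the stimulus grows."""
    return r_max * stimulus / (stimulus + s_half)

def inverse(response, r_max=1.0, s_half=1000.0):
    """Stimulus that produces the given response under the toy forward model."""
    if not 0.0 <= response < r_max:
        raise ValueError("response must lie in [0, r_max)")
    return s_half * response / (r_max - response)

def modified_stimulus(stimulus, r_max=1.0, s_half=1000.0):
    """Stimulus whose toy response equals the linear (dim-light) prediction."""
    gain = r_max / s_half          # slope of forward() near zero
    target = gain * stimulus       # desired, linearized response
    return inverse(target, r_max, s_half)

dim = modified_stimulus(10.0)      # stimulus far below s_half
bright = modified_stimulus(500.0)  # stimulus comparable to s_half

print(f"dim:    original=10.0,  modified={dim:.3f}")
print(f"bright: original=500.0, modified={bright:.3f}")
```

      In this toy, the dim stimulus needs almost no correction while the brighter one must be boosted substantially, which mirrors the authors' argument that the approach is informative mainly at light levels where the photoreceptors respond nonlinearly.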

      Reviewer #3 (Public Review):

      Summary:

      The authors propose to invert a mechanistic model of phototransduction in mouse and rod photoreceptors to derive stimuli that compensate for nonlinearities in these cells. They fit the model to a large set of photoreceptor recordings and show in additional data that the compensation works. This can allow the exclusion of photoreceptors as a source of nonlinear computation in the retina, as desired to pinpoint nonlinearities in retinal computation. Overall, the recordings made by the authors are impressive and I appreciate the simplicity and elegance of the idea. The data support the authors' conclusions but the presentation can be improved.

      Strengths:

      -  The authors collected an impressive set of recordings from mouse and primate photoreceptors, which is very challenging to obtain.

      -  The authors propose to exploit mechanistic mathematical models of well-understood phototransduction to design light stimuli that compensate for nonlinearities.

      -  The authors demonstrate through additional experiments that their proposed approach works.

      Weaknesses:

      -  The authors use numerical optimization for fitting the parameters of the photoreceptor model to the data. Recently, the field of simulation-based inference has developed methods to do so, including quantification of the uncertainty of the resulting estimates. Since the authors state that two different procedures were used due to the different amounts of data collected from different cells, it may be worthwhile to rather test these methods, as implemented e.g. in the SBI toolbox (https://joss.theoj.org/papers/10.21105/joss.02505). This would also allow them to directly identify dependencies between parameters, and obtain associated uncertainty estimates. This would also make the discussion of how well constrained the parameters are by the data or how much they vary more principled because the SBI uncertainty estimates could be used.

      Thank you - we have improved how we describe and report parameter values in several ways. First, the previous text erroneously stated that we used different fitting procedures for different cell types - but the real difference was in the amount of data and range of stimuli we had available between rods and cones. The fitting procedure itself was the same for all cell types. We have clarified this along with other details of the model fitting both in the main text (lines 121-130) and in the Methods (section starting on line 832). We also collected parameter values and estimates of allowed ranges in two tables. Finally, we used sloppy modeling to identify parameters that could covary with relatively small impact on model performance; we added a description of this analysis to the Methods (section starting on line 903).
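      As a rough illustration of the kind of analysis the phrase "sloppy modeling" refers to, the sketch below computes the eigenvalue spectrum of a sensitivity matrix (Jacobian transposed times Jacobian) for a toy model. The toy model, the function names, and the finite-difference step are all assumptions made for illustration; this is not the phototransduction model or the authors' Matlab code. Small eigenvalues mark parameter combinations that can covary with little effect on the model output.

```python
import numpy as np

def toy_model(params, t):
    # Hypothetical two-parameter model: only the sum a + b affects the output,
    # so the direction (1, -1) in parameter space is perfectly "sloppy".
    a, b = params
    return np.exp(-(a + b) * t)

def sensitivity_spectrum(model, params, t, eps=1e-6):
    # Finite-difference Jacobian of the model output with respect to each parameter.
    params = np.asarray(params, dtype=float)
    base = model(params, t)
    jac = np.empty((base.size, params.size))
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        jac[:, i] = (model(params + step, t) - base) / eps
    # Eigen-decomposition of J^T J; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(jac.T @ jac)
    return eigvals, eigvecs

t = np.linspace(0.0, 2.0, 50)
eigvals, eigvecs = sensitivity_spectrum(toy_model, [1.0, 2.0], t)
```

      In this toy case the eigenvector belonging to the smallest eigenvalue points along (1, -1), i.e. the two parameters can trade off against each other without changing the prediction.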

      -  In several places, the authors refer the reader to look up specific values e.g. of parameters in the associated MATLAB code. I don't think this is appropriate, important values/findings/facts should be in the paper (lines 142, 114, 168). I would even find the precise values that the authors measure interesting, so I think the authors should show them in a figure/table. In general, I would like to see also the average variance explained by different models summarized in a table and precise mean/median values for all important quantities (like the response amplitude ratios in Figures 6/9).

      We have added two tables with these parameter values and estimates of allowable ranges. We also added points to show the mean (and SD) across cells to the population figures and added those numerical values to the figure legends throughout.

      -  If the proposed model is supposed to model photoreceptor adaptation on a longer time scale, I fail to see why this can be an invertible model. Could the authors explain this better? I suspect that the model is mainly about nonlinearities as the authors also discuss in lines 360ff.

      For the stimuli that we use we see little or no contribution of slow adaptation in phototransduction. We have expanded the description of this point in the text and referred to Angueyra et al (2022) which looks at this issue in more detail for primate cones (paragraph starting on line 280).

      -  The important Figures 6-8 are very hard to read, as it is not easy to see what the stimulus is, the modified stimulus, the response with and without modification, what the desired output looks like, and what is measured for part B. Reworking these figures would be highly recommended.

      We have reworked all of the figures to make the traces clearer.

      -  If I understand Figure 6 correctly, part B quantifies the size of the response to the small second flash relative to the response to the small first flash. While the response to the second flash is clearly only 50% of the response to the first flash in primate rods and cones in the original condition, the modified stimulus seems to overcompensate and results in a 130% response for the second flash. How do the authors explain this? A similar effect occurs in Figure 9, which the authors should also discuss.

      Indeed, in those instances the modified stimulus does appear to overcompensate. We suspect this is due to differences in sensitivity between the specific cells probed for these experiments and those used in the model construction. We now describe this limitation in more detail (lines 524-526). A similar point comes up for those experiments in which we speed the photoreceptor responses (new Figure 9B), and we similarly note that the cells used to test those manipulations differed systematically from those used to fit the model (lines 558-560).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I only have a few minor questions and suggestions for clarification.

      It hasn't become fully clear to me how general the model is when different mean light levels (on long-time scales) are considered. Are there slow adaptation processes not captured in the model that affect model performance? And how should one go about setting the mean light level when, for example, probing ganglion cells with a stimulus obtained through model inversion? Should it work to add an appropriate DC component to the current that is provided as input to the inverted model? (Presumably, deriving a stimulus and then just adding background illumination should not work, or could this be a good approximation, given a steady state that is adapted to the background?)

      We have clarified in the main text that slow adaptation does not contribute substantially to responses to the range of stimuli we explored (lines 281-289). We have also clarified that the stimulus in the model inversion is specified in isomerizations per second - so the mean value of the stimulus is automatically included in the model inversion (lines 293-298).

      Furthermore, a caveat for the model inversion seems to be the potential amplification of high-frequency noise. The suggested application of a cutoff temporal frequency seems appropriate, but data are shown only for a few example cells. Is this consistent across cells? (Given that performance between, e.g., mouse cones can vary considerably according to Fig. 4B?) I would also like to suggest moving the corresponding Supplemental Figure (4.1) into the main part of the manuscript, as it seems quite important.

      We have added population analysis to the new Figure 5 (which was Figure 4 - Figure Supplement 1). We have also clarified that the amplification of high frequency noise is an issue only when we try to apply model inversion to measured stimuli. When we use model inversion to identify stimuli that elicit desired responses, the target responses are computed from a linear model that has no noise, so this is not a concern in applications like those in Figures 6-10.

      Also, could the authors explain more clearly what the effect of the normalization of the estimated stimulus by the power of the true stimulus is? Does this simply reduce power at high frequency or also affect frequencies below the suggested cutoff (where the stimulus reconstruction should presumably be accurate even without normalization)?

      Indeed this normalization reduces high frequency power and has little impact on low frequencies where the inversion is accurate; this is now noted in the text (line 363). As for amplification of high frequency noise (previous comment), the normalization by the stimulus power is only needed when inverting measured responses (i.e. responses with noise) and is omitted when we are identifying stimuli that elicit desired responses (e.g. in Figures 6-10).
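      To make the idea of suppressing high-frequency power in a reconstructed stimulus concrete, here is a minimal sketch of a hard frequency cutoff applied in the Fourier domain. Note that this is a simpler stand-in, not the normalization by the power of the true stimulus described above; the function name, the sampling interval, and the cutoff value are illustrative assumptions.

```python
import numpy as np

def lowpass(signal, dt, cutoff_hz):
    # Zero all Fourier components above cutoff_hz and transform back.
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=dt)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# Example: a slow 5 Hz component plus a fast 200 Hz component standing in for noise.
dt = 0.001                                   # assumed 1 ms sampling interval
t = np.arange(1000) * dt
slow = np.sin(2 * np.pi * 5 * t)
estimate = slow + np.sin(2 * np.pi * 200 * t)
filtered = lowpass(estimate, dt, cutoff_hz=50.0)
```

      Components below the cutoff pass through unchanged, which mirrors the point that low frequencies, where the inversion is accurate, are little affected.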

      While the overall performance of the model to predict photoreceptor currents is impressive, it seems that particular misses occur for flashes right after a step in background illumination and for the white-noise responses at low background illumination (e.g. Figure 1B). Is that systematic, and if so what might be missing in the model?

      Indeed the model (at least with fixed parameters across stimuli) appears to systematically miss a few aspects of the photoreceptor responses. These include the latency of the response to a bright flash and the early flashes in the step + flash protocol in Figure 1B. Model errors for the variable mean noise stimulus (Figure 2) showed little dependence on time even when responses were sorted by mean light level and by previous mean level. Model errors did not show a clear systematic dependence on light level; this likely reflects, at least in part, the use of mean-square-error to identify model parameters. We have expanded our discussion of these systematic errors in the text (lines 164-166).

      I was also wondering whether this is related to the fact that in Figure 9B, the gain in the modified condition is actually systematically higher when there is more background light. Do the authors think that this could be a real effect or rather an overcompensation from the model? (By the way, is it specified what "Delta-gain" really is, i.e., ratio or normalized difference?)

      We suspect this is an issue with the sensitivity of the specific cells for which we did these experiments (i.e. variability in the gamma parameter between cells). This sensitivity varies between cells, and such variations are likely to place the strongest limitation on our ability to use this approach to manipulate responses in different retinas. We now note those issues in the Results (lines 523-526, 557-559 and 591-593) with reference to Figures 9 (previously Figure 8) and 10 (previously Figure 9), and describe this limitation more generally in the Discussion (section starting on line 649). We have also changed delta-gain to response ratio, which seemed more intuitive.

      Maybe I missed this, but it seems that the parameter gamma is fitted in a cell-type-specific fashion (e.g. line 163), but then needs to be fixed for held-out cells. How was this done? Is there much variability of gamma between cells?

      There is variability in gamma between cells, and this likely explains some of the systematic differences between data and model (see above and Methods, lines 902-903). For the consensus models in Figure 2B, gamma was allowed to vary for each cell while the remaining consensus model parameters were fixed. Gamma was set equal to the mean value across cells for model inversion (i.e. for all of the analyses in Figures 4-10). We have described the fitting procedure in considerably more detail in the revised Methods (starting on line 832).

      For completeness, it would be nice to have the applied consensus model parameters in the manuscript rather than just in the Matlab code (especially since the code has not been part of the submission). Also, some notes on how the numerical integration of the differential equations was done would be nice (time step size?).

      We have added tables with consensus parameters and estimates of the sensitivity of model predictions to each parameter. We have also added additional details about the numerical approaches (including the time step) to Methods.

      Similarly, it would be nice to explicitly see the relationships that are used to fix certain model parameters (lines 705ff). And can the constants k and n (lines 709-710) be assumed identical for different species and receptor types?

      We have added more details to the model fitting to the methods, including the use of steady-state conditions to hold certain parameters fixed (lines 862 and 866). We are not aware of any direct comparisons of k and n across species and receptor types. We have noted that model performance was not improved by modest changes in these parameters (due to compensation by other model parameters). More generally, we have explained how some parameters trade for others and hence the logic of fixing some even when exact values were not available.

      For the previous measurements of m and beta (lines 712-713), is there a reference or source?

      We have added references for these values.

      Did the authors check for differences in the model parameters between cone types (e.g., S vs. M)?

      We did not include S cones here. They are harder to record from and collecting a fairly large data set across a range of stimuli would be challenging. Our previous work shows that S cones have slower responses than L and M cones, and this would certainly be reflected in differences in model parameters. We have noted this in the text (Methods, lines 808-810).

      For the stated flash responses time-to-peak (lines 183-184), is this for a particular light intensity with no background illumination?

      Those are flashes from darkness - now noted in the text.

      Figure 2 - Supplement 1 doesn't have panel labels A and B, unlike the legend.

      Fixed - thank you.

      Reviewer #2 (Recommendations For The Authors):

      (1) Fig. 2B - for some cells, the consensus model seems to fit better than the individual model. How is this possible?

      This was mostly an error on our part (we inadvertently included responses to more stimuli in fitting the individual models, which slightly hampered their performance). Even with this correction, however, a few cells remain for which the consensus model outperforms an individual model. We believe this is because there is more data to constrain model parameters for the consensus models (since they are fit to all cells at the same time), and that can compensate for improvements associated with customizing parameters to specific cells.

      (2) Fig. 2 Supplement 1, it would be useful to see a blow-up of the data in an inset, as in Fig. 2B.

      Thanks - added.

      (3) Line 400 - this paragraph could include additional quantification and statistics to back up claims re 'substantially reduced', 'considerably lower'.

      We quantify that in the next sentence by computing the mean-square-error between responses and sinusoidal fits (also in Figure 7B, which now includes statistics as well). We have made that connection more direct in the text.
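      A minimal sketch of the kind of measure described here (mean-square-error between a response and its best-fitting sinusoid) is given below. It assumes the stimulus frequency is known and fits amplitude, phase, and offset by linear least squares; the function name and the example signals are illustrative assumptions, not the authors' analysis code.

```python
import numpy as np

def sinusoid_fit_mse(response, t, freq_hz):
    # Fit a * sin + b * cos + c at a known frequency by linear least squares,
    # then return the mean-square-error between the response and the fit.
    design = np.column_stack([
        np.sin(2 * np.pi * freq_hz * t),
        np.cos(2 * np.pi * freq_hz * t),
        np.ones_like(t),
    ])
    coeffs, *_ = np.linalg.lstsq(design, response, rcond=None)
    fit = design @ coeffs
    return float(np.mean((response - fit) ** 2))

t = np.arange(1000) * 0.001                      # 1 s sampled at 1 kHz (assumed)
pure = np.sin(2 * np.pi * 2 * t)                 # undistorted 2 Hz response
distorted = pure + 0.3 * np.sin(2 * np.pi * 4 * t)  # added second harmonic
mse_pure = sinusoid_fit_mse(pure, t, 2.0)
mse_distorted = sinusoid_fit_mse(distorted, t, 2.0)
```

      A response that is closer to sinusoidal gives a smaller error, so the measure drops when a nonlinear distortion has been compensated.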

      (4) Maybe a supplement to Fig. 8 could show the changes to the stimulus required to alter the kinetics in both directions - to give more insight into part B., especially.

      Good suggestion - we have added the stimuli to all of the panels of the figure (now Figure 9).

      (5) Fig. 8B - in 'Speed response up' condition - there seems to be error in the model for the decay time of the response - especially for the 'original' condition, which is not quantified in 8C. Was it generally difficult to predict responses to flashes?

      That seems largely to reflect that the cells used for those experiments had faster initial kinetics than the average cells (responses to the control traces are also faster than model predictions in these cells - black traces in Figure 9B). We have added this to the text.

      (6) Line 678, possibly note that 405 nm equally activates S and M photopigments in mice, since most of the cones co-express the two photopigments (Rohlich et al., 1994; Applebury et al., 2000; Wang et al., 2011).

      Thanks - we have added this (lines 827-829).

      (7) The discussion could include a broader description of the various approaches to identifying nonlinearities within retinal circuitry, which include (incomplete list): recording at multiple levels of the circuit (e.g., Kim and Rieke 2001; Rieke, 2001; Baccus and Meister, 2002; Dunn et al., 2006; 2007; Beaudoin et al., 2007; Baccus et al., 2008); recording currents vs. spiking responses in a ganglion cell (e.g., Kim and Rieke, 2001; Zaghloul et al., 2005; Cui et al., 2016); neural network modeling approaches (e.g., Maheswaranathan et al., 2023); optogenetic approaches to studying filtering/nonlinear behavior at synapses (e.g., Pottackal et al., 2020; 2021).

      Good suggestion - we have added this to the final paragraph of the Discussion.

      Reviewer #3 (Recommendations For The Authors):

      -  I am personally not a fan of the style: "... as Figure 4A shows..." or comparable and much prefer a direct "We observe that X is the case (Figure 4A)". If the authors agree, they may want to revise their paper in this way.

      We have revised the text to avoid the “... as Figure xx shows” construction. We have retained multiple instances which follow a “Figure xx shows that …” construction (which is both active rather than passive and does not use a personal pronoun).

      -  I am not a fan of the title. Light-adaption clamp caters only to a very specialized audience.

      We have changed the title to “Predictably manipulating photoreceptor light responses to reveal their role in downstream visual responses.”

      -  The parameter fitting procedure should not only be described in Matlab code, but in the paper.

      Thanks - we have expanded this in the Methods considerably (section starting on line 832).

      -  The authors should elaborate on why different fitting procedures were used.

      We did not describe that issue clearly. The fitting procedures used across cells were identical, but we had different data available for different cell types due to experimental limitations. We have substantially revised that part of the main text to clarify this issue (paragraph starting on line 121).

      -  The authors state in line 126 that the input stimulus is supposed to mimic eye movements: mouse, monkey, or human? Please clarify.

      Thanks - we have changed this sentence to “abrupt and frequent changes in intensity that characterize natural vision.”

      -  Please improve the figure style. For example, labels should be in consistent capitalization and ideally use complete words (e.g. Figure 2B, 4B, and others).

      We have made numerous small changes in the figures to make them more consistent.

      -  Is the fraction of variance calculated on held-out-data? Linear models should be added to Figure 2B.

      The fraction of variance explained was not calculated on held out data because of limitations in the duration of our recordings. Given the small number of free parameters, and the ability of the model to capture held out cells, we believe that the model generalizes well. We have added a supplemental figure with linear model performance (Figure 2 - Figure Supplement 2).
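      For readers unfamiliar with the metric, one common definition of the fraction of variance explained is one minus the ratio of the residual mean-square-error to the variance of the measured response. The sketch below assumes that definition (the exact formula used in the paper is not stated here), and the function name is hypothetical.

```python
import numpy as np

def fraction_variance_explained(measured, predicted):
    # 1 - (mean squared residual) / (variance of the measured response).
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = measured - predicted
    return 1.0 - np.mean(residual ** 2) / np.var(measured)

measured = np.array([1.0, 2.0, 3.0, 4.0])
perfect = fraction_variance_explained(measured, measured)
mean_only = fraction_variance_explained(measured, np.full(4, measured.mean()))
```

      Under this definition a perfect prediction scores 1 and a prediction equal to the mean of the data scores 0.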

      -  Fig. 9A is lacking bipolar cell and amacrine cell labels. Currently, it looks like HC is next to the BC in the schematic.

      Thanks - we have updated that figure (now Figure 10A).

      -  Maybe I am misunderstanding something, but it seems like the linear model prediction shown in Figure 2A for the rod could be easily improved by scaling it appropriately. Is this impression correct or why not?

      We have clarified how the linear model is constructed (by fitting the linear model to low contrast responses of the full model at the mean stimulus intensity). We also added a supplemental figure, following the suggestion above, showing the linear model performance when a free scaling factor is included for each cell.
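      The free scaling factor mentioned here has a closed-form least-squares solution. The sketch below assumes a single multiplicative factor per cell, chosen to minimize the squared error between the scaled prediction and the measured response; the function name and example values are illustrative assumptions.

```python
import numpy as np

def best_scale(prediction, response):
    # Least-squares scale s minimizing sum((s * prediction - response)^2).
    p = np.asarray(prediction, dtype=float)
    r = np.asarray(response, dtype=float)
    return float(np.dot(p, r) / np.dot(p, p))

prediction = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
response = 2.5 * prediction          # hypothetical response, a scaled copy
scale = best_scale(prediction, response)
```

      Scaling can only correct an overall amplitude mismatch; it cannot change the time course of the linear prediction.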

      -  The verification experiment in Fig. 5 is only anecdotal and is elaborated only in Figure 6. If I am not mistaken, this does not necessitate its own figure/section but could rather be merged.

      We have kept this figure separate (now Figure 6) as we felt that it was important to highlight the approach in general in a figure before getting into quantification of how well it works.

      -  Figure 5 right is lacking labels. What is red and grey?

      Thanks for catching that - labels are added now.

      -  The end of the Discussion is slightly unusual. Did some text go missing?

      Thanks - we have rearranged the Discussion so as not to end on Limitations.

      -  There is a bonus figure at the end which seems also not to belong in the manuscript.

      Thanks - the bonus figure is removed now.

      -  The methods should also describe briefly what kind of routines were used in the Matlab code, e.g. gradient descent with what optimizer?

      We’ve added that information as well.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their positive assessment of our manuscript. We agree that there are some further experiments suggested by the reviewers that would enhance our study. We have highlighted further proposed experimental work in bold for clarity.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      1. EVIDENCE, REPRODUCIBILITY AND CLARITY Summary: The Matrix 2 (M2) protein of influenza A virus (IAV) is a single-pass transmembrane protein known to act as a tetrameric ion channel that is important for both viral entry and egress. The paper by Figueras-Novoa et al. entitled "Caspase cleavage of Influenza A virus M2 disrupts M2-LC3 interaction and regulates virion production" reports on the regulation of IAV virion production through a regulatory interplay between a caspase cleavage site and an LC3 interacting region (LIR) motif in M2. In its C-terminal cytoplasmic tail the IAV M2 protein contains a LIR motif interacting with LC3. The authors show that this LIR motif is preceded by a functional caspase cleavage motif cleaved predominantly by caspase-6, with some contribution from caspase-3. The motif 82-SAVD-85 directs cleavage after the aspartate (D) at position 85. The cleavage leads to loss of the remaining C-terminal sequence from amino acid 86 to 97. The core LIR motif 91-FVSI-94 is then lost from M2, which can no longer bind LC3. As previously described by the same group using point mutations in the LIR motif (Ref. 12), loss of a functional LIR, here by caspase-mediated deletion of the LIR, affects virion production and inhibits filamentous budding. LC3B lipidation is increased upon treatment with a caspase inhibitor. The authors show for the first time that LC3 is included into IAV virions via binding to M2. Furthermore, they also report a co-crystal structure of the M2 C terminus (aa 70-97), containing the caspase cleavage site and LIR, and LC3B (aa 3-125), adding new insights into this interaction and showing that the caspase cleavage site is in a flexible region N-terminal to the LIR. This work shows how caspase cleavage may modulate LC3B lipidation, trafficking to the plasma membrane, incorporation of LC3B in the virions, filamentous budding and virion production (viral titer).

      Major comments: The findings reported here are very well supported by the data shown. This is a very clearly written paper with well described and nicely visualized results that are accompanied by adequate statistical analyses.

      We thank the reviewer for their assessment of our manuscript.

      The authors report a new way the LC3B binding to the C-terminal tail of the M2 proteins is regulated and suggest that this is an adaptation the virus has made to adjust virion production to host cell status by hijacking the function of host caspases. They show that the caspase cleavage motif is evolutionary conserved and use that as an argument. Perhaps it could be discussed if it also could be an argument that the host protects itself against a too massive virion production as this could be too detrimental to the host? Would it not also be an evolutionary advantage to the virus in the long run by avoiding killing the host?

      This is an interesting point. We agree there could be an advantage for the virus not to overproduce virions under certain circumstances. Consistent with this, caspase-6 deficient mice had increased mortality in response to IAV PR8 infection, and presented an increase in viral spread in the lungs (Zheng, 2021; doi: 10.1016/j.cell.2020.03.040). This is also relevant for the comments made by Reviewer 2. The manuscript will be updated to include a discussion of this point.

      A question I may raise which is optional as it may be too much work to address as part of this study is if the reported regulation of LC3B binding has any role in regulating the ion channel function of the M2 tetramer?

      It is well established that there is no impact of distal C-terminal truncations on M2 ion channel activity (Cady et al., 2009, doi: 10.1021/bi9008837; Schnell and Chou, doi: 10.1038/nature06531; Nguyen et al., 2008, doi: 10.1021/bi801315m; Tobler et al., 1999, doi: 10.1128/jvi.73.12.9695-9701.1999). This is also consistent with data from our lab (Ulferts et al., 2021, doi: 10.1016/j.celrep.2021.109899; Beale et al., 2014, doi: 10.1016/j.chom.2014.01.006) as well as others (Ren et al., 2015, doi: 10.1128/JVI.00576-15) showing the effects of the LIR motif and the proton channel are distinct. We appreciate the reviewer suggesting further work here as optional, but there is already compelling evidence to show there is no substantial effect of the LIR motif on ion channel activity. (See also Reviewer 2 points 4 and 5).

      Minor comments: Delete "with" in line 145.

      This will be changed in the updated manuscript.

      Line 217: It should be written more specifically how "cells were surface stained with M2"

      The protocol for surface staining of M2 will be explained in more detail in the updated manuscript.

      1. SIGNIFICANCE

      This is a very well performed study with a sound experimental strategy and well performed assays with clear results increasing our insight into the interplay between the Influenza A virus and host cells. Although caspase-mediated cleavage of the autophagy receptor and signaling scaffold protein p62 (Ref. 25), removing the LIR and LC3-binding, has been reported before, I consider this study as novel in reporting this type of regulation of LC3 binding. The cleavage of p62 deletes a large part of the protein while here it is a "clean" deletion of the LIR sequence, representing a conceptual advance of regulation of LC3 binding. The study also reports for the first time on LC3B incorporated into virions. The effects on trafficking to the plasma membrane and viral budding and virion production are similar to those reported before (Ref. 12) using viruses with point mutations crippling the LIR motif. This research will be of interest to all studying virus-host interaction and to the autophagy field both as a non-autophagic role of LC3B, and as a regulatory mechanism of LIR-LC3B interactions involving the irreversible caspase cleavage-mediated deletion of the LIR motif.

      We thank the reviewer for this assessment of our manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The influenza A virus (IAV) M2 protein is a small transmembrane protein which plays a role in virus entry and egress. In a previous study, Beale et al. (2014) identified an LC3-interacting region (LIR) in the M2 cytoplasmic domain that was found to recruit the LC3B protein to the plasma membrane. Recombinant IAV harboring mutations in the LIR motif showed reduced particle stability and lost filamentous morphology.

      In the present study, Figueras-Novoa et al. show that the LIR motif is removed in response to activation of cellular caspases. The authors demonstrate that in IAV-infected THP-1 cells M2 is partially cleaved at the motif (82)SAVD(85)↓A by caspase 6. Caspase inhibitors abolished cleavage, and a mutant virus harboring the D85A substitution was found to be resistant to caspase action. A crystal structure of purified M2 C-terminus and LC3B revealed that the caspase cleavage site lies in a flexible region that is accessible to caspases.

      Mutant virus encoding a truncated M2 protein (M2∆86-97) was unable to interact with LC3, in accordance with the absence of the LIR motif. The M2∆86-97 mutant showed reduced lipidation of LC3, while enhanced lipidation of LC3 was observed when wild-type virus-infected cells were treated with caspase inhibitors. The authors also observed that cell surface transport of M2∆86-97 but not M2-D85A was impaired. However, in purified virus particles a mix of cleaved and uncleaved M2 was detected. The authors also demonstrated that lipidated LC3B was present in purified virions of wild-type virus particles but even more abundant in M2-D85A virions. Finally, M2∆86-97 mutants produced significantly fewer infectious particles compared to wild-type virus while the D85A cleavage mutant replicated to similar titers as wt virus.

      Based on these findings the authors concluded that caspases regulate the interaction of M2 protein with LC3 which impacts virion production. Specifically, they propose that caspase-mediated removal of the LIR motif may enable a switch between filamentous and non-filamentous budding in response to depletion of cellular resources. However, the authors were unable to rescue a filamentous IAV with a truncated M2 protein and therefore could not provide direct proof for their guess.

      While the data are sound and presented well, they do not support the conclusions of the authors.

      1. In the authors' opinion, the conserved caspase cleavage site in the M2 protein might provide an evolutionary advantage for the virus. However, the M2-D85A mutation has no effect on viral replication, so the biological significance of why M2 needs to be cleaved at all is unclear. The conclusion that caspase-induced M2 cleavage is a fine-tuning mechanism of IAV has not been supported by experiments.

      We thank the reviewer for the assessment of our data. We think the reviewer is specifically objecting to the phrase “We conclude that this highly conserved interaction and cleavage act as a regulatory mechanism exploited by IAV to fine-tune virion production in different cellular contexts.” This is a reasonable inference from our results, but we accept that it is not proven. We will change the wording to make it clear this has not been definitively demonstrated.

      2. The finding that the permanently truncated IAV M2 mutant virus was substantially attenuated does not necessarily mean that abrogation of M2-LC3 interaction was responsible for this attenuation. As the M2 protein plays a role in virus budding at the plasma membrane (recruitment of M1 protein, induction of membrane curvature, membrane scission), the impaired transport of the truncated M2 protein might already explain that the virus was attenuated and that incorporation of the protein into the viral envelope was reduced.

      We will confirm this further with additional experiments using LIR mutants. Recapitulating the plasma membrane transport defect of truncated M2 with LIR mutants including the newly characterised M2D87A and M2D88A mutants and a more severe mutant with a FVSI_AAAA substitution would strongly imply this truncation mutant phenotype is due to the lack of LIR motif.

      1. It is also not clear whether the loss of the C-terminal 11 amino acids may have affected the interaction of the M2 protein with other proteins such as TRAPPC6A-delta (Zhu et al., 2017).

      This is a reasonable point; however, Zhu et al., 2017 (https://doi.org/10.1128/jvi.01757-16) reported that the interaction with TRAPPC6A retains M2 intracellularly. If the phenotype observed with our truncation were due to the loss of interaction with TRAPPC6A, the opposite phenotype would be observed (more M2 in the plasma membrane with the truncated M2∆86-97 mutant). To address this point directly we will attempt to rescue an M2 mutant virus that has disrupted the reported TRAPPC6A binding site and assess M2 plasma membrane localization.

      The authors did not rule out whether the truncation of the M2 protein by 11 amino acids would have an effect on proton channel activity. Proton channel activity, however, might be important to preserve the metastable conformation of HA in the secretory pathway and might be also important for virus uncoating.

      M2∆86-97 induced less LC3 lipidation than wild-type M2 or the D85A mutant. The remaining lipidation was attributed to the ion channel activity of the M2 protein. Can the authors rule out that the truncation of the M2 protein led to reduced ion channel activity which in turn led to reduced LC3B lipidation?

      We have addressed points 4 and 5 in response to Reviewer 1.

      The suggested role of caspase cleavage as a regulatory switch between filamentous and spherical virions (lines 304-313) is highly speculative as long as the authors do not provide any experimental proof for it. The authors indicated that they were unable to rescue filamentous IAV with M2∆86-97. However, would it be possible to use caspase inhibitors to test their hypothesis?

      We acknowledge that M2∆86-97 could not be rescued in a filamentous background. The use of caspase inhibitors would only increase the amount of full length M2 present, and does not provide an alternative strategy for increasing the proportion of truncated M2. However, since M2∆86-97 mutant could not be rescued, we will attempt to rescue additional LIR motif mutants to address this point. In particular, D87A and D88A mutants could be generated in a MUd background, as well as the F91S mutant.

      The authors used only the PR8 strain for their studies, a highly cell culture-adapted strain with spherical morphology. Are the findings obtained with this strain also valid for other IAV strains?

      As we highlight in Figure 2I, both the caspase cleavage motif and LIR motif are highly conserved in human IAV strains. PR8 was used as it is the reverse genetic system in use and approved for use in the lab. We will attempt to address this by testing whether other IAV strains we are able to obtain also undergo caspase mediated cleavage of M2. If possible, we will obtain recent clinical isolates to show cleavage of M2 in a strain that has not adapted to cell culture.

      1. The authors mainly used the THP-1 cells for their studies, a human macrophage-like cell line. However, human IAV mostly replicate in epithelial cells of the respiratory tract and cause only abortive infections of macrophages. Why did the authors choose this cell line? Can the findings obtained with this cell line be translated to epithelial cells of the airways?

      THP-1 cells are widely used for the study of caspase activity. However, we also show M2 cleavage in MDCK cells and HAP1 cells. PR8 infection of A549 cells does not induce significant amounts of cell death in the infection time points used and, as caspase activation is linked to cell death, we did not observe M2 cleavage in this cell type. We will attempt to infect some epithelial cell types to confirm this phenotype.

      Minor issues:

      • Fig. 1C: There seem to be quite some differences in the cleavage efficiency of M2 between panels A, B, C, and D? Any explanations?

      Different cell types (THP-1 cells and HAP1 cells) are used for the experiments mentioned above, which accounts for the different amount of M2 cleavage.

      • Fig. 1: Panel E: The labeling of the first amino acids as aa 76 seems to be wrong!

      We thank the reviewer for pointing this out; this will be corrected in the updated manuscript.

      Line 147: ...caspase mediated disruption of the M2-LC3 interaction (Fig 2A-B). Should be Fig. 2A-C.

      This sentence was referring to Figure 2A-B, as it refers to LC3B lipidation and not the coIP. This sentence will be changed in the text to reflect the intended meaning.

      • Growth kinetics of the various mutant viruses are missing?

      We will provide growth kinetics for the relevant mutants (M2D85A and M2∆86-97).

      • Line 195: The authors speculate that aa85 is important for viral fitness: That should be demonstrated!

      This speculation is based on the very strong conservation of D85 in human IAV strains. The importance of D85 in viral fitness (permitting cleavage of M2) is only likely to be directly demonstrable in transmission models (for example ferrets) which is not feasible or justifiable.

      Reviewer #2 (Significance (Required)):

      Authors concluded that caspases regulate the interaction of M2 protein with LC3 which impacts virion production. Specifically, they propose that caspase-mediated removal of the LIR motif may enable a switch between filamentous and non-filamentous budding in response to depletion of cellular resources. However, the authors were unable to rescue a filamentous IAV with a truncated M2 protein and therefore could not provide direct proof for their guess.

      As stated in the response to the comments above, we will attempt to rescue LIR mutant viruses (D87A and D88A) in a MUd background which would provide further support for our hypothesis. Our data has significance for the understanding of the cell biology of influenza infection as commented on by Reviewers 1 and 3.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In this article, the authors identify a caspase cleavage site in the influenza A virus (IAV) Matrix 2 protein (M2) that leads to a truncated form of M2 deleted from its C-term LC3-interacting region (LIR). This cleaved form of M2 is seen and accumulates starting at 12 hours post-infection. IAV expressing M2 delta 86-97 mutant, corresponding to cleaved M2, seems to disrupt LC3B localization to cell plasma membrane upon infection. The authors also show that the IAV M2 delta 86-97 has a reduced viral titer compared to IAV WT. Overall the data are quite exciting where the authors identify the specific caspase responsible for the cleavage and show the residues of M2 necessary for LC3 interaction. However, some of the data showing the consequence of the cleavage for viral replication could be better clarified.

      We thank Reviewer 3 for their kind comments and we propose further experiments to clarify the consequences of cleavage.

      Major comments:

      • In Fig3A-B, the authors seek to demonstrate that the localization of M2 to the plasma membrane requires the LIR motif. However, the representative images for cells infected with the delta 86-97 mutant show relatively few cells expressing M2, raising questions about the infectivity of this mutant virus or whether the overall expression of M2 in this assay is lower for the delta 86-97 mutant. The authors should consider first quantifying the ratio of M2 cell surface staining over total M2 staining and second re-evaluating the representative images chosen.

      We will include more examples of permeabilised cells in which comparable numbers of cells are M2 positive between mutants. We will also include high-content microscopy based quantification to support this. To clarify, we confirm that the quantification of M2 intensity in the plasma membrane is carried out relative to the number of M2 positive cells, as the reviewer agrees is the most accurate way. To avoid confusion, we will update figure legends to describe more accurately the quantification process. A comparison between surface M2 and total M2 cannot be done on an individual cell basis, as once cells are permeabilized (to look for internal M2), robust differentiation between surface and internal M2 is difficult. The above clarification and additional data should provide the necessary support for our conclusions.

      • In fig3E, it is unclear what is being quantified in the graph as the legend and text lines 222-223 mention that spot intensity was measured but the y axis indicates LC3 relocalization intensity. Given that LC3 is punctate, particularly in the cytosol, it is unclear which spots of LC3 they are referring to. Based on the images shown, using a graph with LC3 surface staining as performed for M2 would clarify the data. The authors should clarify the reporting of these data in the results section. Additionally, the images of the control non-infected cells should be added to 3C.

      We agree with the reviewer on this point. The figure will be updated to describe more accurately what is being quantified. Additionally, images for uninfected cells in 3C will be added.

      • The data in Fig4 and FigS3 need to be strengthened to be conclusive. The volcano plot in FigS3A indicates that there is more LC3B and IAV proteins in M2 D85A than M2delta86-97. However in Fig4E, both LC3 I and LC3 II are increased in virions M2 delta 86-97 compared to M2 D85A which is opposite to the authors' conclusions in lines 244-245. In other words, the total amount of lipidated LC3 is higher in virions from IAV M2 without LIR motif than M2 with LIR. LC3II/I ratio in fig4F would suggest in virions containing M2 with LIR motif, LC3B II may be preferentially incorporated compared to virions containing M2 without LIR, which incorporates both LC3B I and LC3B II. Since this is a critical point made by the authors, performing a co-immunoprecipitation of M2 D85A and M2delta86-97 in the particles and then assessing for binding of LC3 I or II would bolster their conclusions.

      Figure 4F quantifies the ratio of LC3II to LC3I in infectious particles. Another two repeats used to quantify this ratio will be shown in addition, with a better representation of increased amounts of lipidated LC3II in M2D85A infectious particles, as well as an increased LC3II/LC3I ratio in said particles when compared to M2∆86-97. Because of the low yield acquired from the purification of IAV virions, performing an IP would be difficult. Even if this were technically feasible it would not prove that M2 is binding LC3 inside the virion; we do not make this claim in our paper, merely that LC3B can be detected in the purified viral particles. We will clarify this point in the revised manuscript.

      • In Fig4J, even if statistically significant, the PFU difference between M2 D85A and M2 delta86-97 is minimal; performing a growth curve assay would help appreciate this difference over time. In THP-1 cells, as the authors show caspase cleavage of M2 at 12, 14, 16 hpi, etc. (fig1), they should also show PFU data at these same time points for M2 mutant D85A compared to WT and M2 delta 86-97.

      We agree with the reviewer and indeed this was a point we attempted to make in our manuscript: Figure 4J shows a statistically significant difference between the titers. However, in the text we state that, even though statistically significant, the difference is much smaller than in other titer quantifications performed. Given the nature of a plaque assay, differences of less than a log fold cannot be considered as definitively indicating biological significance. We will clarify this in a revised manuscript. We will also provide the relevant growth kinetics (as per response to Reviewer 2).
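
      The "less than a log fold" criterion mentioned above can be made concrete with a short calculation. The following is a minimal sketch, not the authors' analysis: the function names and the example titers are hypothetical, and the one-log threshold is taken only from the wording of the response.

```python
import math


def log10_fold_difference(titer_a: float, titer_b: float) -> float:
    """Signed log10 fold difference between two titers (e.g. in PFU/mL)."""
    if titer_a <= 0 or titer_b <= 0:
        raise ValueError("titers must be positive")
    return math.log10(titer_a / titer_b)


def exceeds_one_log(titer_a: float, titer_b: float) -> bool:
    """True if the two titers differ by at least one log10 unit (ten-fold)."""
    return abs(log10_fold_difference(titer_a, titer_b)) >= 1.0


# Hypothetical titers, not data from the study.
example_a, example_b = 2e6, 8e5
print(f"log10 fold difference: {log10_fold_difference(example_a, example_b):.2f}")
print("at least one log apart:", exceeds_one_log(example_a, example_b))
```

      A 2.5-fold difference such as the hypothetical one above can reach statistical significance across replicates while remaining well under one log, which is the distinction the response draws.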

      • The title of Fig4 and FigS3 and in text line 226 should be changed as M2 incorporation into virions is not shown and not described in the text. Plus, in figS3B, the authors show that between the M2 mutants, there is no difference in the abundance of M2 and other viral proteins compared to M1.

      The title of Figures 4 and S3 will be changed to more accurately reflect all of the points made by the figure.

      • In the image shown in Fig4H the number of plaques is higher for M2delta86-97 even though the size in smaller than M2 WT. Could the authors clarify in the text of the results section how they quantify PFU in their plaque assay and if they used a size criterion when quantifying the number of plaques?

      The plaque images were taken at different dilutions: the M2∆86-97 image comes from a well two dilutions lower than the M2WT image. We will include the calculation used for PFU/mL, which does not take plaque size into account. Furthermore, images of the whole plate, showing the plaqued serial dilutions, will be shown.
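
      The response does not give the calculation itself. As a hedged illustration, the conventional plaque-assay formula is the plaque count divided by the product of the dilution and the inoculum volume. The sketch below assumes that formula; the function name, counts, dilutions and volume are all hypothetical.

```python
def pfu_per_ml(plaque_count: int, dilution: float, volume_ml: float) -> float:
    """Titer estimated from one countable well.

    plaque_count: number of plaques counted in the well (size is ignored)
    dilution:     dilution of the inoculum relative to the stock, e.g. 1e-5
    volume_ml:    volume of inoculum added to the well, in mL
    """
    if dilution <= 0 or volume_ml <= 0:
        raise ValueError("dilution and volume_ml must be positive")
    return plaque_count / (dilution * volume_ml)


# Hypothetical wells, not data from the study: the same plaque count read at
# two dilutions that are two ten-fold steps apart, 0.2 mL inoculum each.
titer_a = pfu_per_ml(30, 1e-5, 0.2)
titer_b = pfu_per_ml(30, 1e-3, 0.2)
print(f"{titer_a:.2e} PFU/mL vs {titer_b:.2e} PFU/mL")
```

      Under these assumptions, the number of plaques visible in a single image says little about titer unless the dilution of that well is known, which is why whole-plate images of the dilution series are informative.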

      • In fig3B, the legend indicates 8 hpi but on the graphs it is 9 hpi.

      We thank the reviewer for pointing out this mistake. Both should read 8 hpi; this will be corrected in the new manuscript.

      Reviewer #3 (Significance (Required)):

      The authors demonstrated that IAV M2 binding to LC3 is regulated by caspase cleavage. The authors clearly identify the cleavage site and the caspase involved: caspase 6. The cleaved form of M2 seems relevant to IAV infection as it is accumulating after 12hpi. Using a M2 mutant D85A that cannot be cleaved by caspase 6 and truncated M2 mutant delta86-97 mimicking caspase cleaved M2, the authors are able to elegantly address the role of M2 cleavage. However, the importance of M2 caspase cleavage on IAV infection is not demonstrated. Eventually, addressing the impact of the caspase cleavage of M2 LIR motif on autophagy or CASM would be interesting. - Advance: conceptual. - Audience: basic research, specialized in virology, specialized in autophagy. - Field of expertise: virology, autophagy.

      We agree with the reviewer that we have made a conceptual advance in our understanding of the cell biology of influenza A virus infection. We have also determined the structure of the terminal part of the M2 tail in complex with LC3B. The biological importance of the phenotypes we show is most likely in transmission of the virus between hosts, which for IAV would require animal experiments outside the scope of this study. We have demonstrated regulation of the LIR motif by caspase cleavage in a variety of ways, using cell biological and biochemical methods. IAV is a very significant human and animal pathogen, and we believe we have made an important advance in describing a host-pathogen interaction of relevance for viral egress.

    1. Author response:

      Reviewer #1 (Public Review):

      Weaknesses:

      There are some minor weaknesses.

      Notably, there are not a lot of new insights coming from this paper. The structural comparisons between MCC and PCC have already been described in the literature and there were not a lot of significant changes (outside of the exo- to endo- transition) in the presence vs. absence of substrate analogues.

      We agree that the structures of the human MCC and PCC holoenzymes are similar to their bacterial homologs. That is due to the conserved sequences and functions of MCC and PCC across different species.

      There is not a great deal of depth of analysis in the discussion. For example, no new insights were gained with respect to the factors contributing to substrate selectivity (the factors contributing to selectivity for propionyl-CoA vs. acetyl-CoA in PCC). The authors state that the longer acyl group in propionyl-CoA may mediate stronger hydrophobic interactions that stabilize the alpha carbon of the acyl group at the proper position. This is not a particularly deep analysis and doesn't really require a cryo-EM structure to invoke. The authors did not take the opportunity to describe the specific interactions that may be responsible for the stronger hydrophobic interaction nor do they offer any plausible explanation for how these might account for an astounding difference in the selectivity for propionyl-CoA vs. acetyl-CoA. This suggests, perhaps, that these structures do not yet fully capture the proper conformational states.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We will discuss this limitation in our revised manuscript.

      The authors also need to be careful with their over-interpretation of structure to invoke mechanisms of conformational change. A snapshot of the starting state (apo) and final state (ligand-bound) is insufficient to conclude *how* the enzyme transitioned between conformational states. I am constantly frustrated by structural reports in the biotin-dependent enzymes that invoke "induced conformational changes" with absolutely no experimental evidence to support such statements. Conformational changes that accompany ligand binding may occur through an induced conformational change or through conformational selection and structural snapshots of the starting point and the end point cannot offer any valid insight into which of these mechanisms is at play.

      Point accepted. We will revise our manuscript to use "conformational differences" instead of "conformational changes" to describe the differences between the apo and ligand-bound states.

      Reviewer #2 (Public Review):

      Comments and questions to the manuscripts:

      I'm quite impressed with the protein purification and structure determination, but I think some functional characterization of the purified proteins should be included in the manuscript. The activity of enzymes should be the foundation of all structures and other speculations based on structures.

      We appreciate this comment. However, since we purified the endogenous BDCs and the sample we obtained was a mixture of four BDCs, the enzymatic activity of this mixture cannot accurately reflect the catalytic activity of PCC or MCC holoenzyme. We will acknowledge this limitation in the discussion section of our revised manuscript.

      In Figure 1B, the structure of MCC is shown as two layers of beta units and two layers of alpha units, while there is only one layer of alpha units resolved in the density maps. I suggest the authors show the structures resolved based on the density maps and show the complete structure with the docked layer in the supplementary figure.

      We appreciate this comment. We have shown the cryo-EM maps of the PCC and MCC holoenzymes in fig. S8 to indicate the unresolved regions in these structures. The BC domains in one layer of MCCα in the MCC-apo structure were not resolved. However, we think it would be better to show a complete structure in Fig. 1 to provide an overall view of the MCC holoenzyme. We will revise Fig. 1B and the figure legend to clearly point out which domains were not resolved in the cryo-EM map and were built in the structure through docking.

      In the introduction, I suggest the authors provide more information about the previous studies on the structure and reaction mechanisms of BDCs, what the knowledge gap is, and what problem a higher resolution structure will resolve. For example, you mentioned in line 52 that G437 and A438 are catalytic residues; have these residues been reported as catalytic residues or is this based on your structures? Has the catalytic mechanism been reported before? Has the role of biotin in catalytic reactions been revealed in previous studies?

      Point accepted. It was reported that G419 and A420 in S. coelicolor PCC, corresponding to G437 and A438 in human PCC, were the catalytic residues (PMID: 15518551). The same study also reported the catalytic mechanism of the carboxyl transfer reaction. The role of biotin in the BDC-catalyzed carboxylation reactions has been extensively studied (PMIDs: 22869039, 28683917). We will include this information in the introduction section of our revised manuscript.

      In the discussion, the authors indicate that the movement of biotin could be related to the recognition of acyl-CoA in BDCs, however, they didn't observe a change in the propionyl-CoA bound MCC structure, which is contradictory to their speculation. What could be the explanation for the exception in the MCC structure?

      We appreciate this comment. We do not have a good explanation for why we did not observe a change in the propionyl-CoA bound MCC structure. It is noteworthy that neither acetyl-CoA nor propionyl-CoA is the natural substrate of MCC. Recently, a cryo-EM structure of the human MCC holoenzyme in complex with its natural substrate, 3-methylcrotonyl-CoA, has been resolved (PDB code: 8J4Z). In this structure, the binding site of biotin and the conformation of the CT domain closely resemble those in our acetyl-CoA-bound MCC structure. Therefore, the movement of biotin induced by acetyl-CoA binding mimics that induced by the binding of MCC's natural substrate, 3-methylcrotonyl-CoA, indicating that in comparison with propionyl-CoA, acetyl-CoA is closer to 3-methylcrotonyl-CoA regarding its ability to bind to MCC. We will discuss this possibility in our revised manuscript.

      In the discussion, the authors indicate that the selectivity of PCC to different acyl-CoA is determined by the recognition of the acyl chain. However, there are no figures or descriptions about the recognition of the acyl chain by PCC and MCC. It will be more informative if they can show more details about substrate recognition in Figures 3 and 4.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We will discuss this limitation in our revised manuscript.

      How do the solved structures compare with the latest AlphaFold3 prediction?

      Since AlphaFold3 was not released when our manuscript was submitted, we did not compare the solved structures with the AlphaFold3 predictions. We have now carried out the predictions using AlphaFold3. Due to the token limitation of the AlphaFold3 server, we can only include two α and six β subunits of human PCC or MCC in the prediction. The overall assembly patterns of the AlphaFold3-predicted structures are similar to those of the cryo-EM structures. The RMSDs between PCCα, PCCβ, MCCα, and MCCβ in the apo cryo-EM structures and those in the AlphaFold3-predicted structures are 7.490 Å, 0.857 Å, 7.869 Å, and 1.845 Å, respectively. The PCCα and MCCα subunits adopt an open conformation in the cryo-EM structures but adopt a closed conformation in the AlphaFold3-predicted structures, resulting in large RMSDs.
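
      The response does not state which program or atom selection was used for the RMSD values. As a minimal sketch under stated assumptions, the following computes the RMSD after optimal rigid-body superposition (the Kabsch algorithm) with NumPy. It assumes the matched atoms (for example C-alpha atoms) have already been extracted into two arrays of shape (N, 3); the function name and the coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q (row-vector convention) and measure the residual.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


# Hypothetical coordinates: a random point cloud and a rotated, shifted copy.
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))
theta = np.deg2rad(30.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = coords @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(coords, moved))  # close to zero: rigid-body motion is removed
```

      Because superposition removes only rigid-body motion, a relative domain rearrangement within a subunit, such as an open versus closed conformation, remains in the residual and raises the whole-subunit RMSD.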

    1. Abstract

      Defining a multicellular model can be challenging. There may be hundreds of parameters that specify the attributes and behaviors of objects. Hopefully the model will be defined using some format specification, e.g., a markup language, that will provide easy model sharing (and a minimal step toward reproducibility). PhysiCell is an open source, physics-based multicellular simulation framework with an active and growing user community. It uses XML to define a model and, traditionally, users needed to manually edit the XML to modify the model. PhysiCell Studio is a tool to make this task easier. It provides a graphical user interface that allows editing the XML model definition, including the creation and deletion of fundamental objects, e.g., cell types and substrates in the microenvironment. It also lets users build their model by defining initial conditions and biological rules, run simulations, and view results interactively. PhysiCell Studio has evolved over multiple workshops and academic courses in recent years which has led to many improvements. Its design and development has benefited from an active undergraduate and graduate research program. Like PhysiCell, the Studio is open source software and contributions from the community are encouraged.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.128), and has published the reviews under the same license. This is part of the PhysiCell Ecosystem Series: https://doi.org/10.46471/GIGABYTE_SERIES_0003

      Reviewer 1. Meghna Verma:

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      The authors have provided links for video descriptions for installation and that is appreciated.

      One overall recommendation: if all the screenshots (e.g., Figs 1-12 of the main paper and all the subsections in the Supplementary) can be combined in one figure, that will help convey the complete overview and the overall flow of the paper.

      Additional comments are available here: https://gigabyte-review.rivervalleytechnologies.com/download-api-file?ZmlsZV9wYXRoPXVwbG9hZHMvZ3gvVFIvNTA3L1Jldmlld19QaHlzaUNlbGxTdHVkaW9fTVYucGRm

      Reviewer 2. Koert Schreurs and Lin Wouters supervised by Inge Wortel

      Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?

      The problem statement is addressed in the introduction, which mentions the need for a GUI tool as a much more accessible way to edit the XML-based model syntax. However, it is somewhat confusing who exactly the intended audience of the paper is. Is the paper targeted at researchers that already use PhysiCell, but might want to switch to the GUI version? Or should it (also) target the potential new user-base of researchers interested in using ABMs, for whom the XML version was not sufficiently accessible and who will now gain access to these models because there is a GUI? Specifying the intended audience might impact some sections of the paper. For example, for users who already use PhysiCell, the step-by-step tutorials might not be useful since they would already know most of the available options; they would just need a quick overview of what info is in which tab. But if the paper is (also) targeted at potential new users, then some additional information could make both the paper and the tool much more accessible, such as:
      
      • A clear comparison to other modeling frameworks and their functionalities. Why should they use PhysiCell instead of one of the other available (GUI) tools? For example, the referenced Morpheus, CC3D and Artistoo all focus on a different model framework (CPMs); this might be worth mentioning. And what about Chaste? Does it represent different types of models, or are there other reasons to consider PhysiCell over Chaste or vice versa? For new users, this would be important information to include. The paper currently also does not mention other frameworks except those that offer a GUI. While the main point of the paper is the addition of the GUI, for completeness sake it might still be good to mention a broader overview of ABM frameworks and how they compare to PhysiCell, or simply to refer to an existing paper that provides such an overview.
      • The current tutorial immediately dives into very specific instructions (what to click and exact values to enter), often without explaining what these options mean or do. New users would probably appreciate to get a rough outline of which types of processes can be modelled, and which steps they would take to do so. This could be as easy as summarising the different main tabs before going into the details. I understand that some of these explanations will overlap with the main PhysiCell software – but considering that the GUI will open up modelling to a different type of community, it might make sense to outline them here to get a self-contained overview of functionality.
      • Indeed, if the above information is provided, the detailed tutorial might fit better as an appendix or in online documentation. That would also leave more space to explain not only which values to enter, but also what these variables do, why choose these values, what other options to consider, etc. Having this information together in one place would be very useful for beginning users.

      Is the source code available, and has an appropriate Open Source Initiative license been assigned to the code?

      The software is available under the GPL v3 licence.

      As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?

      There is a Github repository, ensuring that it is possible to contribute and report issues, and the paper explicitly invites community contributions. However, although the paper mentions that it is possible to seek support through Github Issues and “Slack channels”, we could find no link to the latter resource. This should probably be added to make this resource usable for the reader (or otherwise the statement should be removed).

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      Mostly yes, as installation and deployment are outlined in the paper and documentation. However, we did notice a couple of issues:
      
      • The studio guide explains how to compile a project in PhysiCell (https://github.com/PhysiCell-Tools/Studio-Guide/blob/main/README.md), but does not mention that Mac users need to specify the g++ version at the top of the Makefile. This is explained in a separate blog (http://www.mathcancer.org/blog/setting-up-gcc-openmp-on-osx-homebrew-edition/) but should be outlined (or at least referenced) here as well.
      • There are several different resources covering the installation process, referring to e.g. github.com/physicell-training, github.com/PhysiCell-Tools/Studio-Guide, and the abovementioned blog. But this might not be very accessible to all users targeted by the new GUI functionality (especially when command line interventions and manual Makefile edits are involved). While not all of this has to be changed before publication, having all information in one place would already improve accessibility to a larger user-base.
      • When following the instructions (https://github.com/PhysiCell-Tools/Studio-Guide/blob/main/README.md) and running “python studio/bin/studio.py -p -e virus-sample”, the -p flag gives an error: “Invalid argument(s): [‘-p’]”. We assumed it has to be left out, but perhaps the docs have to be updated.

      Is the documentation provided clear and user friendly?

      Mostly yes, as there is already a lot of documentation available. However, the user-friendliness could be improved with some minor changes. For example, the documentation could be made more user-friendly if resources were available from a central spot. Currently, information can be found in different places:
      
      • https://github.com/PhysiCell-Tools/Studio-Guide/blob/main/README.md provides installation instructions and a nice overview of what is where in the GUI, but as mentioned above, does not mention potential issues when installing on MacOS.
      • The paper provides very detailed examples; these might be nice to include along with the abovementioned overview.
      • Potentially other places as well.

      It would be great if the main documentation page could at least link to these other resources with a brief description of what the user will find there. Further, some additions would make the documentation more complete:
      
      • It would be good to have an overview somewhere of all the configuration files that can be supplied/loaded (e.g. those for “rules” and for initial configurations).
      • A clearer instruction/small tutorial on how to use simularium and paraview with physicell studio; especially for paraview there is no instruction on how to use your own data or make your own `.pvsm` file.

      In the longer term, it might be worthwhile to set up a self-contained documentation website (this is relatively easy nowadays using e.g. Github pages), which can outline dependencies, installation instructions, a quick overview, detailed tutorials, example models, links to Github issues/slack communities. This is not a requirement for publication but might be worth looking into in the future as it would be more user-friendly.
      

      Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

      No. The core functionality of the software is nicely outlined in the Github README (https://github.com/PhysiCell-Tools/Studio-Guide/blob/main/README.md), but as mentioned before, this high-level overview is missing in the paper itself. The README and paper recommend installing the Anaconda python distribution to get the required python dependencies. This is fine, but adding a setup file or requirements.txt might still be useful for users who are more familiar with python and want a more minimal installation. Providing a conda environment.yml that allows running the studio along with paraview and/or simularium might also be helpful. Note that running the studio with simularium in anaconda did not work because anaconda did not have the required vtk v9.3.0; instead we had to install simularium without anaconda (“pip3 install simularium”).

      Are there (ideally real world) examples demonstrating use of the software?

      The detailed tutorial nicely walks the reader through the tool (although as mentioned before, a high-level overview is missing and the level of detail feels slightly out of place in the paper itself). When walking through the example in the paper and the supplementary, we did run into a few (minor) issues:
      - It might be good to stress explicitly that after copying the template.xml into tumor_demo.xml, the first step is always to compile using “make”. The paper mentions “Assuming … you have compiled the template project executable (called “project”) …”. But it might not be immediately clear to all users how exactly they should do so (presumably by running “make tumor_demo” after copying the xml file?).
      - When running “python studio/bin/studio.py -c tumor_demo.xml -e project” as instructed, a warning pops up that “rules0.csv” is not valid (although the tool itself still works).
      - The instructions for plotting say to press “enter” when changing cmin and cmax, but Mac offers only a return key. Pressing fn+return to get the enter functionality also does not work; it might be good to offer an alternative for Mac.
      - When reproducing the supplementary tutorial, results were slightly different. It might be good if the example would offer a random seed so that users can verify that they can reproduce these results exactly. In our hands, reproducing figs 39, 40, 48, and 49 yields many more (red) macrophages (even when running multiple times), but we could not be sure if this is due to variation between runs, or a mistake in the settings somewhere.
      
      
      The paper mentions that they have started setting up automated testing, but it does not give an idea of what the current test coverage is. Did they add a few tests here and there, or start to systematically test all parts of the software? I understand the latter might not be achievable immediately, but it would be good if users and/or contributors can at least get a sense of how good the current coverage is. (Note: the framework uses pytest, which seems to offer some functionality to generate coverage reports, see e.g. https://www.lambdatest.com/blog/pytest-code-coverage-report/). The code in studio_for_pytest.py has a comment “do later, otherwise problems sometimes”, but it is not entirely clear if the relevant issue has been resolved.
      

      Additional Comments: The presented tool offers a GUI for the PhysiCell framework for agent-based modeling. As outlined in the paper, this offers significant value to the users since editing a model is now much more accessible. The tool comes with extensive functionality and instructions. Overall, the tool functions as advertised, and will be of great value to the community of PhysiCell users who currently have to edit XML files by hand. It is therefore (mostly) publishable as is if some of the issues with installation (mentioned above) can be straightened out. That said, we do think some improvements could make both the tool and the paper more accessible to a larger user audience. Most of these have been mentioned in the other questions, but we will list some additional ones below. Note that many of these are just suggestions, so we will leave it up to the authors if and when they implement them.

      Suggestions for the paper: While the paper nicely outlines design ideas and usage of the tool, there were some points where we felt that the main point did not quite come across, for example:
      - As mentioned in the question about problem statement and intended audience, adding some information to the paper would make it a more useful resource to users not yet familiar with PhysiCell (see remarks there).
      - The section “Design and development” describes the development history of the tool. In principle this is a valuable addition, because it illustrates how the project is under ongoing development and has already been improved several times based on feedback of users. However, the amount of information on each previous stage is slightly confusing; it is not entirely clear how this relates to the paper and current tool. If the main point is to showcase that the current tool has been built based on practical user experiences, this would probably come across better if this section was somewhat shorter and focused on the design choices rather than previous versions. If the main point is something else, it should be clarified what the main idea is.
      - The point of Table 1 was unclear to us; consider removing it or explaining the main idea.
      - Several figures do not have captions (e.g. Figure 1 but also others); it would be helpful to clarify what message the figure should convey.
      - P4 “adjust the syntax for Windows if necessary”: is it self-explanatory how users should adjust? Consider adding the correct code for Windows as well if possible, since users that want to use the GUI tool might not be familiar with command line syntax.
      - P6 “if you create your own custom C++ code referring directly to cell type ID”: this functionality is never discussed. This might be part of the general PhysiCell functionality, but it would be good to at least provide a link to a resource on how you could do this.
      - P8 “Only those parameters that display … editing the C++ code”: it was not entirely clear to me what this means, could you clarify?
      - P13 mentions you can immediately see changes to the model parameters made. This is very useful for prototyping when users want immediate feedback. However, what happens when you try to save output for a simulation where parameters were changed while the simulation was running? Would users be reminded that their current output is not representative?
      - Discussion: it is good to mention that the tool is already being used. Can you give an indication based on your experience how long it takes new users to learn to navigate the tool? This might be useful information to add in the paper.
      - The last statement on LLMs seems to come out of nowhere. Consider leaving it out or expanding further on what would be needed to make this work/how feasible this is.

      Further comments on the tool itself:
      - The paper mentions that results may not be fully reproducible if multiple threads are used (I assume this is the case even when a random seed is set). In this case, would it make sense to throw a warning the first time a user tries to set a seed with multiple threads, to avoid confusion as to why the results are not reproducible?
      - Unusable fields are not always greyed out to indicate that they are disabled, which sometimes makes it seem as though the tool is unresponsive. In other places unusable options are set to grey, so it might be good to double-check if this is consistent.
      - At the initial conditions (IC) page there is no legend; it might be good to add one.
      - There are some small inconsistencies between the field names mentioned in the paper and those in the tool/screenshots. For example “boundary condition” (p5) should be “dirichlet BC”, “uptake” (p6) should be “uptake rate”. For the latter, the paper mentions that the length scale is 100 micron but this should be visible in the tool as well.
      - Not all fields have labels, so it is not always clear what the options do (see e.g. drop-downs in Figure 6).
      - There are a few points in the tool where you have to “enable” a functionality before it works, but this might not always be intuitive. For example, if you upload a file with initial conditions, it can be assumed that you want to use it. There might be good reasons for this in some cases but in general, consider if all these checkpoints are necessary or if this could be simplified. The same goes for the csv files that have to be saved separately instead of through the main “save” button; in the long term it might be worth saving all relevant files when they are updated, or at least throwing a warning that you have to save some of them separately.
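
      The suggested warning about seeds and multithreading could look roughly like the sketch below. The function name `check_reproducibility` and its parameters are hypothetical and are not part of PhysiCell Studio or PhysiCell; the sketch only illustrates the idea of alerting the user when a seed is combined with more than one thread.

```python
import warnings
from typing import Optional


def check_reproducibility(seed: Optional[int], n_threads: int) -> bool:
    """Return True if a run can be expected to be reproducible.

    Hypothetical helper for illustration only; not an existing Studio API.
    """
    if seed is None:
        # Without a fixed seed, runs are not expected to match.
        return False
    if n_threads > 1:
        # A seed alone does not guarantee identical results with several threads.
        warnings.warn(
            "A random seed is set, but more than one thread is used; "
            "results may still differ between runs.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

      A GUI could call such a check whenever the seed or thread count field changes and show the message in a dialog instead of on the console.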

    1. Abstract: Despite advances in identifying genetic markers associated with severe COVID-19, the full genetic characterisation of the disease remains elusive. This study explores the use of imputation in low-coverage whole genome sequencing for a severe COVID-19 patient cohort. We generated a dataset of 79 imputed variant call format files using the GLIMPSE1 tool, each containing an average of 9.5 million single nucleotide variants. Validation revealed a high imputation accuracy (squared Pearson correlation ≈0.97) across sequencing platforms, showing GLIMPSE1’s ability to confidently impute variants with minor allele frequencies as low as 2% in Spanish ancestry individuals. We conducted a comprehensive analysis of the patient cohort, examining hospitalisation and intensive care utilisation, sex and age-based differences, and clinical phenotypes using a standardised set of medical terms developed to characterise severe COVID-19 symptoms. The methods and findings presented here may be leveraged in future genomic projects, providing vital insights for health challenges like COVID-19.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.127), and the reviews have been published under the same license. For a video summary from the author, see: https://youtu.be/x6oVzt_H_Pk?si=Byufhl0mIL3h0K6u

      The reviews are as follows:

      Reviewer 1. Jong Bhak:

      Data from severe COVID-19 cases are critical. This manuscript deals with a genome set with detailed clinical information, as a subset of exome sequences, and provides invaluable data for ongoing global COVID-19 omics studies.

      Reviewer 2. Alfredo Iacoangeli:

      The authors present the release of a new dataset that includes low-coverage WGS data of 79 individuals who experienced severe COVID-19 in Madrid (Spain). The authors processed the data and imputed common variants, and they are making this dataset available to the scientific community. They also present the clinical data of these patients in a descriptive and informative fashion. Finally, the authors also validated the quality of their imputation, showcasing the potential of low-coverage WGS as an alternative to microarrays. Overall, the manuscript is very well written, clear, and exhaustive. The data are certainly valuable. Their generation, processing, and analysis appear robust.
      

      Overall, I support the publication of this article and dataset. I only have a small number of minor suggestions for the authors: The sentence "Traditionally, the genotyping process has relied on array technologies as the standard, both at the broader GWAS level and the more specific genetic scoring and genetic diagnostics levels" sounds a little off. I totally understand where the authors are coming from, but given the central role of NGS and Sanger sequencing in genetic diagnostics, I would suggest that the authors modify it accordingly or keep the GWAS focus.

      Please double-check the use of statistical terms in the description of the imputed data. For example: "On average, each VCF file in this rich dataset contains 9.49 million high-confidence single nucleotide variants [95%CI: 9.37 million - 9.61 million] (Figure 1)." The use of CI in this context is a little misleading, as it is not strictly referring to a distribution of probability but to a finite collection. A range would be more appropriate. The authors say that they examined the ethnicity of the 79 individuals; however, I do not think the ancestry is actually reported anywhere, while a few figures show ancestral population data. The authors might clarify or correct the terminology.
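
      The difference between a confidence interval for the mean and the spread of the per-file counts can be illustrated with a minimal sketch. The counts below are invented purely for illustration and are not taken from the dataset, and the normal-approximation interval is only an assumption about how such an interval might be computed.

```python
import math
import statistics

# Hypothetical per-file SNV counts in millions (illustrative values only,
# not taken from the manuscript's dataset).
counts = [8.9, 9.2, 9.4, 9.5, 9.5, 9.6, 9.7, 9.9, 10.1]

mean = statistics.mean(counts)
sd = statistics.stdev(counts)        # sample standard deviation
sem = sd / math.sqrt(len(counts))    # standard error of the mean

# Normal-approximation 95% CI: describes uncertainty about the mean
# and narrows as the number of files grows.
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem

# Observed range: describes the spread of the files themselves.
obs_min, obs_max = min(counts), max(counts)

print(f"mean = {mean:.2f} M")
print(f"95% CI of the mean = [{ci_low:.2f}, {ci_high:.2f}] M")
print(f"observed range = [{obs_min:.2f}, {obs_max:.2f}] M")
```

      With these made-up values the interval around the mean is noticeably narrower than the observed range, which is why a range (or percentiles) may describe a fixed set of files more directly.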

      Looking at figure 2 the sentence " although the male age distribution exhibits a broader range and higher variability, suggestive of a greater" does not appear justified. The authors might want to clarify or correct accordingly.

      The sentence "This exploratory analysis highlights the diverse ways in which severe COVID-19 can present, and the importance of comprehensive and nuanced clinical phenotyping in improving our understanding and management of the disease." suggests some basic clustering might be useful. The readers might benefit from a couple of graphs or figures quantifying the overlap of the SNPs across samples and maybe one that shows the density of SNPs across the genome.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Tutak et al use a combination of pulldowns, analyzed by mass spectrometry, reporter assays, and fluorescence experiments to decipher the mechanism of protein translation in fragile X-related diseases. The topic is interesting and important.

      Although a role for Rps26-deficient ribosomes in toxic protein translation is plausible based on already available data, the authors' data are not carefully controlled and thus do not support the conclusions of the paper.

      Strengths:

      The topic is interesting and important.

      Weaknesses:

      In particular, there is very little data to support the notion that Rps26-deficient ribosomes are even produced under the circumstances. And no data that indicate that they are involved in the RAN translation. Essential controls (for ribosome numbers) are lacking, no information is presented on the viability of the cells (Rps26 is an essential protein), and the differences in protein levels could well arise from block in protein synthesis, and cell division coupled to differential stability of the proteins.

      We agree that the presented data could benefit from the addition of the suggested experiments. We will address the ribosome content, global translation rate and cell viability upon RPS26 depletion. We also plan to apply polysome profiling to determine if RPS26-depleted ribosomes are translationally active.

      Specific points:

      (1) Analysis of the mass spec data in Supplemental Table S3 indicates that for many of the proteins that are differentially enriched in one sample, a single peptide is identified. So the difference is between 1 peptide and 0. I don't understand how one can do a statistical analysis on that, or how it would give out anything of significance. I certainly do not think it is significant. This is exacerbated by the fact that the contaminants in the assay (keratins) are many, many-fold more abundant, and so are proteins that are known to be mitochondrial or nuclear, and therefore likely not actual targets (e.g. MCCC1, PC, NPM1; this includes many proteins "of significance" in Table S1, including Rrp1B, NAF1, Top1, TCEPB, DHX16, etc...).

      The data in Table S6/Figure 3A suffer from the same problem.

      Tables S3 and S6 show the mass spectrometry output data from MaxQuant analysis without any filtering. Certain identifications, i.e. those denoted as contaminants (such as keratins), were removed during statistical analysis in Perseus software. Regarding the data presented in Table S6 (SILAC data), we argue that these data are of very good quality. More than 2000 proteins were identified in a 125 min gradient, and over 80% of these proteins were identified with at least 2 unique peptides. However, we acknowledge that the description of Tables S3 and S6 may lead to misunderstanding; thus, we will clarify their explanation.

      I am not convinced that the mass spec data is reliable.

      (2) The mass-spec data however claims to identify Rps26 as a factor binding the toxic RNA specifically. The rest of the paper seeks to develop a story of how Rps26-deficient ribosomes play a role in the translation of this RNA. I do not consider that this makes sense.

      Indeed, we identified RPS26 as a protein co-precipitated with FMR1 RNA containing expanded CGG repeats. However, we do not claim that they interact directly. Downregulation of FMRpolyG biosynthesis could be an outcome of altered ribosomal assembly, changes in the efficiency and fidelity of PIC scanning, impeded elongation or, more likely, a combination of some of these processes. We will provide a better explanation of these issues in the revised version of the manuscript.

      (3) Rps26 is an essential gene, and I am sure the same is true for DHX15. What happens to cell viability? Protein synthesis? The yeast experiments were carefully carried out under conditions where Rps26 was reduced, not fully depleted, to give small growth defects.

      We agree with Reviewer 1 that RPS26 is an essential protein. Previously, it was shown that cell viability is decreased in cells with a C-terminal deletion mutation of RPS26 (Havkin-Solomon T, Nucleic Acids Res 2023). We will address the question regarding the suppression of FMRpolyG in models with partial RPS26 knock-down.

      (4) Knockdown efficiency must be shown for all tested genes.

      The missing experiments showing knock-down efficiency will be included in the revised version of the manuscript.

      (5) The data in Figure 1E have just one mock control, but two cell types (control si and Rps26 depletion).

      We will clarify this ambiguity in the revised version of the manuscript.

      (6) The authors' data indicate that the effects are not specific to Rps26 but indeed also observed upon Rps25 knockdown. This suggests strongly that the effects are from reduced ribosome content or blocked protein synthesis. Additional controls should deplete a core RP to ascertain this conclusion.

      We agree that the observed effect may stem partially from reduced ribosome content; however, we argue that this is not the only explanation. In the publication concerning RPS25 regulation of G4C2-related RAN translation (Yamada SB, 2019, Nat Neurosci), it was shown that RPS25 KO does not affect global translation. Our experiments (SUnSET assay, unpublished) indicated that RPS26 KD also did not reduce the global translation rate significantly. We will present these data in the revised version of the manuscript.

      (7) Supplemental Figure S3 demonstrates that the depletion of S26 does not affect the selection of the start codon context. Any other claim must be deleted. All the 5'-UTR logos are essentially identical, indicating that "picking" happens by abundance (background).

      The results shown in Fig. S3 do not imply that RPS26 has no effect at all on the selection of the start codon context. We only tested a few hypotheses. We decided to test the -4 position because this position was indicated as the most sensitive to RPS26 regulation in yeast (Ferretti M, 2017, Nat Struct Mol Biol). Regarding the WebLOGO analysis, we wrote in the manuscript that we did not identify any specific motif or enrichment within the analysed transcripts in comparison to background. We will clarify this ambiguity in the revised version of the manuscript.

      (8) Mechanism is lacking entirely. There are many ways in which ribosomes could have mRNA-specific effects. The authors tried to find an effect from the Kozak sequence, unsuccessfully (however, they also did not do the experiment correctly, as they failed to recognize that the Kozak sequence differs between yeast, where it is A-rich, and mammalian cells, where it is GGCGCC). Collisions could be another mechanism.

      As in (7).

      Reviewer #2 (Public Review):

      Summary:

      Translation of CGG repeats leads to the accumulation of poly G, which is associated with neurological disorders. This is a valuable paper in which the authors sought out proteins that modulate RAN translation. They determined which proteins in HeLa cells bound to CGG repeats and affected levels of polyG encoded in the 5'UTR of the FMR1 mRNA. They then showed that siRNA depletion of ribosomal protein RPS26 results in less production of FMR1polyG than in control. There are data supporting the claim that RPS26 depletion modulates RAN translation in this RNA, although for some results, the Western results are not strong. The data to support increased aggregation by polyG expression upon S26 KD are incomplete.

      Strengths:

      The authors have proteomics data that show the enrichment of a set of proteins on FMR1 RNA but not a related RNA.

      Weaknesses:

      - It is insinuated that RPS26 binds the RNA to enhance CGG-containing protein expression. However, RPS26 reduction was also shown previously to affect ribosome levels, and reduced ribosome levels can result in ribosomes translating very different RNA pools.

      We agree that the presented data could benefit from the addition of some experiments. Therefore, we will address questions regarding the ribosome content, global translation rate and cell viability upon RPS26 depletion. We also plan to apply polysome profiling to determine if RPS26-depleted ribosomes are translationally active. However, we did not state that RPS26 binds directly to RNA with expanded CGG repeats or that this interaction is crucial for translation regulation of the studied RNA. We only tested such hypotheses. We will improve the narrative in the revised version of the manuscript to make the major conclusions clearer.

      - A significant claim is that RPS26 KD alleviates the effects of FMRpolyG expression, but those data aren't presented well.

      We thank Reviewer 2 for this comment. We will show the data derived from a few different cell models that we have already obtained. Moreover, we will include results of experiments with luminescence readout for FMRpolyG fused with luciferase upon RPS26 KD.

      Reviewer #3 (Public Review):

      Tutak et al provide interesting data showing that RPS26 and relevant proteins such as TSR2 and RPS25 affect RAN translation from CGG repeat RNA in fragile X-associated conditions. They identified RPS26 as a potential regulator of RAN translation by RNA-tagging system and mass spectrometry-based screening for proteins binding to CGG repeat RNA and confirmed its regulatory effects on RAN translation by siRNA-based knockdown experiments in multiple cellular disease models and patient-derived fibroblasts. Quantitative mass spectrometry analysis found that the expressions of some ribosomal proteins are sensitive to RPS26 depletion while approximately 80% of proteins including FMRP were not influenced. Since the roles of ribosomal proteins in RAN translation regulation have not been fully examined, this study provides novel insights into this research field. However, some data presented in this manuscript are limited and preliminary, and their conclusions are not fully supported.

      (1) While the authors emphasized the importance of ribosomal composition for RAN translation regulation in the title and the article body, the association between RAN translation and ribosomal composition is apparently not evaluated in this work. They found that specific ribosomal proteins (RPS26 and RPS25) can have regulatory effects on RAN translation (Figures 1C, 2B, 2C, 2E, 4A, 5A, and 5B), and that the expression levels of some ribosomal proteins can be changed by RPS26 knockdown (Figure 3B); however, the change in the composition of the ribosomes involved in actual translation has not been elucidated. Therefore, their conclusive statement, that is, "ribosome composition affects RAN translation", is not fully supported by the presented data and is misleading.

      We thank Reviewer 3 for critical comments and suggestions. We agree that the proposed title may be misleading and the presented data does not fully support the aforementioned statement regarding ribosomal composition affecting FMRpolyG synthesis. Hence, we will change the title together with a narrative regarding these unfortunate statements that go beyond the presented results.

      (2) The study provides insufficient data on the mechanisms of how RPS26 regulates RAN translation. Although the authors speculate that RPS26 may affect initiation codon fidelity and regulate RAN translation in a CGG repeat sequence-independent manner (Page 9 and Page 11), what they really have shown is just the identification of this protein by the screening for proteins binding to CGG repeat RNA (Figure 1A, 1B), and the effects of this protein on CGG repeat-RAN translation. It is essential to clarify whether the regulatory effect of RPS26 on RAN translation is dependent on the CGG repeat sequence or on near-cognate initiation codons like ACG and GUG in the 5' upstream sequence of the repeat. It would be better to validate the effects of RPS26 on translation from control constructs, such as one composed of the 5' upstream sequence of FMR1 with no CGG repeat, and one with an ATG substitution in the 5' upstream sequence of FMR1 instead of near-cognate initiation codons.

      We will address the question regarding the influence of the content of CGG repeats and START codon selection (including different near-cognate start codons) on RPS26-sensitive translation, and present these data in the revised version of the manuscript.

      (3) The regulatory effects of RPS26 and other molecules on RAN translation have all been investigated as effects on the expression levels of FMRpolyG-GFP proteins in cellular models expressing CGG repeat sequences (Figures 1C, 2B, 2C, 2E, 4A, 5A, and 5B). In these cellular experiments, there are multiple confounding factors affecting the expression levels of FMRpolyG-GFP proteins other than RAN translation, including template RNA expression, template RNA distribution, and FMRpolyG-GFP protein degradation. Although the authors evaluated the effect on the expression levels of template CGG repeat RNA, it would be better to confirm the effect of these regulators on RAN translation by other experiments, such as an in vitro translation assay that can directly evaluate RAN translation.

      We agree that there are multiple factors affecting final translation of investigated mRNA including aforementioned processes. We evaluated the level of FMR1 mRNA, which turned out not to be affected upon RPS26 depletion (Figure 2B&C), however, we will address other possibilities as well.

      (4) While the authors state that RPS26 modulated the FMRpolyG-mediated toxicity, they presented limited data on apoptotic markers, not cellular viability (Figure 1E), not fully supporting this conclusion. Since previous work showed that FMRpolyG protein reduces cellular viability (Hoem G et al., Front Genet 2019), additional evaluations for cellular viability would strengthen this conclusion.

      We thank Reviewer 3 for this suggestion. We addressed the effect of RPS26 KD on the apoptotic process induced by FMRpolyG. We will also perform other experiments regarding different aspects of FMRpolyG-mediated cell toxicity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The mechanisms by which axonal projections find their correct target require the interplay of signalling pathways and cell adhesion that act over short and long distances. The current study aims to use the small ventral lateral clock neurons (s-LNvs) of the Drosophila clock circuit as a model to study axon projections. These neurons are born during embryonic stages and are part of the core of the clock circuit in the larval brain. Moreover, these neurons are maintained through metamorphosis and become part of the adult clock circuit. The authors use anti-Pdf antibody staining or Pdf>GFP as a read-out for axonal length. Using ablation of the MB, the overall target region of the s-LNvs, the authors find defects in the projections. Next, by using Dscam mutants or knock-down, they observe defects in the projections. Manipulations of the DNs, another group of clock neurons, can induce defects in the s-LNvs axonal form, suggesting an active role of these neurons in the morphology of the s-LNvs.

      Strengths:

      The use of Drosophila genetics and a specific neural type allows targeted manipulations with high precision.

      Proposing a new model for a small group of neurons for axonal projections allows us to explore the mechanism with high precision.

      Weaknesses:

      It is unclear how far the proposed model can be seen as developmental.

      The study of changes in fully differentiated and functioning neurons may affect the interpretation of the findings.

      We appreciate the reviewer's feedback on the strengths and weaknesses of our study.

      We acknowledge the strengths of our research, particularly the precision afforded by using Drosophila genetics and a specific neural type for targeted manipulations, as well as the proposal of a new model for studying axonal projections in a small group of neurons.

      We understand the concerns about the developmental aspects of our proposed model and the use of Pdf-GAL4 >GFP as a read-out for the axonal length (revised manuscript Figure 1--figure supplement 1). However, even with the use of Clk856-GAL4 that began to be expressed at the embryonic stage (revised manuscript Figure 3--figure supplement 1) to suppress Dscam expression, the initial segment of the dorsal projection of s-LNvs (the vertical part) remained unaffected. Instead, the projection distance is severely shortened towards the midline, and this defect persists until the adult stage. It is for this reason that we delineate the dorsal projections of s-LNvs into two distinct phases: the vertical and horizontal parts, rather than a mere expansion in correspondence with the development of the larval brain.

      Thank you for your valuable feedback, and we have incorporated these considerations into our revised manuscript to enhance the clarity and depth of our research.

      Reviewer #2 (Public Review):

      Summary:

      The paper from Li et al shows a mechanism by which axons can change direction during development. They use the sLNv neurons as a model. They find that a new group of neurons (DNs), which appears during post-embryonic proliferation, secretes netrins and repels the axonal tip of the LNvs horizontally towards the midline.

      Strengths:

      The experiments are well done and the results are conclusive.

      Weaknesses:

      The novelty of the study is overstated, and the background is understated. Both things need to be revised.

      We appreciate your acknowledgment that the experiments were well-executed and the results conclusive. This validation reinforces the robustness of our findings.

      We take note of your feedback regarding the novelty of the study being overstated and the background being understated. When axonal projections navigate without distinct landmarks, such as the midline or layers, columns, and segments, they face more challenges and uncertainties. As highlighted, our key contribution lies in elucidating how axonal projections without clear landmarks are guided, with our research demonstrating how a newly formed cluster of cells at a specific time and location provides the necessary guidance cues for axons.

      We value your insights, and we have carefully addressed these points in our manuscript revision to improve the overall quality and presentation of our research.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      The overall idea of using the s-LNvs as a model is indeed intriguing. There are genetic tools available to tackle these cells with great precision.

      However, the stage at which these cells are investigated raises some issues that I feel are critical to address.

      These neurons develop their axonal projections during embryogenesis and are fully functioning when the larvae hatch, thus to investigate axonal pathfinding one would have to address embryonic development.

      The larval brain indeed continues to grow during larval life, however extensive work from the Hartenstein lab, Truman lab, and others have shown that the secondary (larval born) neurons do not yet wire into the brain, but stall their axonal projections.

      It is thus quite unclear, what the authors are actually studying.

      One interpretation could be that the authors observe changes in axon length due to morphological changes in the brain. Indeed, as the MB expands, the anatomy of the surrounding neuropil changes too.

      Moreover, it is unclear when exactly the Pdf-Gal4 (and other drivers) are active, thus how far (embryonic) development of s-LNvs is affected, or if it's all happening in the differentiated, functioning neuron. (Gal4 temporal delay and dynamics during embryonic development may further complicate the issue). As far as I am aware the MB drivers might already be active during embryonic stages.

      Since the raised issue is quite fundamental, I am not sure what might be the best and most productive fashion to address this.

      E.g., either to completely re-focus the topic on "neural morphology maintenance" or to study the actual development of these cells.

      We thank the reviewer for the detailed and insightful feedback on our study. We have tested whether Pdf-GAL4 could effectively label s-LNvs, and tracked the s-LNv projection in the early stage after larval hatching. We did not observe the PDF antibody staining signal or the GFP signal driven by Pdf-GAL4 when the larvae were newly hatched. At 2-4 hours ALH, PDF signals were primarily concentrated at the end of axons, while GFP signals were mainly concentrated at the cell body. Helfrich-Förster initially detected immunoreactivity for PDF in the brains approximately 4-5 hours ALH. The GFP signal driven by the Pdf-GAL4 driver is indeed delayed. However, at 8 hours ALH, the GFP signal strongly co-localized with the PDF signal within the axons (see revised manuscript lines 98-101) (Figure 1—figure supplement 1).

      Based on previous research findings and our staining of Clk856-GAL4 >GFP, it is indeed confirmed that the dorsal projection of s-LNvs in Drosophila is formed during the embryonic stage (Figure 3—figure supplement 1). The s-LNvs in first-instar larval Drosophila are capable of detecting signal output and may play a role in regulating certain behaviors. Our selection of tools for characterizing the projection pattern of s-LNv was not optimal, leading us to overlook the crucial detail that the projection had already formed during its embryonic stage.

      However, even when employing Clk856-GAL4 to suppress Dscam expression from the embryonic stage, the initial segment of the dorsal projection of s-LNvs (the vertical part) remains unaffected. Instead, the projection distance is severely shortened towards the midline, and this defect persists until the adult stage. It is for this reason that we delineate the dorsal projections of s-LNvs into two distinct phases: the vertical and horizontal parts, rather than a mere expansion in correspondence with the development of the larval brain.

      From the results searched in the Virtual Fly Brain (VFB) database (https://www.virtualflybrain.org/), it is clear that the neurons that form synaptic connections with s-LNvs at the adult stage are essentially completely different from the neurons that are associated with them at the L1 larval stage. Thus, most neurons that form synapses with s-LNvs in the early larvae either cease to exist after metamorphosis or assume other roles in the adult stage. Similar to the scenario where Cajal-Retzius cells and GABAergic interneurons establish transient synaptic connections with entorhinal axons and commissural axons, respectively, these cells form a transient circuit with presynaptic targets and subsequently undergo cell death during development. In our model, the neurons that synapse with s-LNvs in early development serve as "placeholders," offering positive or negative cues to guide the axonal targeting of s-LNvs towards their ultimate destination.

      Thank you again for your valuable feedback, and we have incorporated these considerations into our revised manuscript to enhance the clarity and depth of our research.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      In the introduction too many reviews are cited and very few actual research papers. This should be corrected and the most significant papers in the field should be cited. For example, there is no reference to the pioneering work from the Christine Holt lab or the first paper looking at axon guidance and guideposts by Klose and Bentley, Isbister et al 1999.

      The introduction should encapsulate the actual knowledge based on actual research papers.

      We acknowledge your concern regarding the citation of review papers rather than primary research papers in the introduction. Following your suggestion, we have revised the introduction section to incorporate references to relevant research papers.

      In the introduction and discussion: The authors cite reviews where the signals that guide axons across different regions including turning are shown and they end up saying: "However, how the axons change their projection direction without well-defined landmarks is still unclear." I think the sentence should be changed. Many things are still not clear but this is not a good phrasing. Maybe they could focus on their temporal finding?

      We appreciate the reviewer's feedback and insightful suggestions. We agree that emphasizing the temporal aspect is crucial in our study. However, we also recognize the significance of understanding the origin of signals that guide axonal reorientation at specific locations. While axonal projections navigating without distinct landmarks pose more challenges and uncertainties compared to those guided by prominent landmarks like the midline, our research demonstrates the crucial role of a specific cell population near turning points in providing accurate guidance cues to ensure precise axonal reorientation. We have revised our phrasing in the introduction and discussion to better reflect these key points (see revised manuscript lines 69-71 and 350-354). Thank you for highlighting the significance of focusing on our temporal findings and the complexities involved in studying axonal projection.

      Many rather old papers have looked into the effect of repulsive guideposts to guide axon projections. In particular, I can think of the paper from Isbister et al. 1999 (DOI: 10.1242/dev.126.9.2007) that not only shows how semaphorin guides Ti axon projection but also shows how the pattern of expression of sema 2a changes during development to guide the correct projection. I really think that the novelty of the paper should be revised in light of the actual knowledge in the field.

      We appreciate the reviewer's reference to the seminal work by Isbister et al. (1999) and the importance of guidepost cells in axon projection guidance, which we have already cited in our revised manuscript. It is crucial to recognize that segmented patterns such as the limb segment traversed by Ti1 neuron projections or neural circuits formed in a layer- or column-specific manner also serve as intrinsic "guideposts," offering valuable insights into axonal pathfinding processes. In our model, explicit guidance cues are lacking. As highlighted, our key contribution lies in elucidating how axonal projections without clear landmarks are guided, with our research demonstrating how a newly formed cluster of cells at a specific time and location provides the necessary guidance cues for axons (see revised manuscript lines 350-354). We have ensured that our revised manuscript reflects these insights and emphasizes the significance of studying axonal guidance in the absence of distinct guideposts. Thank you for underscoring these essential points, which enhance our understanding of axonal projection dynamics.

      Minors:

      Line 54, the authors start talking about floorplate at the end of a section on Drosophila. Please use “In vertebrates”, or “in invertebrates” or “in Drosophila” etc.. when needed to put things in context.

      We thank the reviewer for this suggestion and have modified this sentence. Please refer to lines 62-63 of the revised manuscript.

      Line 69: many factors change the axonal outgrowth. The authors are missing the paper from Fernandez et al. 2020, who have shown that unc5 the receptor of netrin induces the stalling for sLNvs projections before the turn. https://doi.org/10.1016/j.cub.2020.04.025

      We thank the reviewer for this suggestion and have added this research article. Please refer to line 79 of the revised manuscript.

      Line 99: "precisely at the pivotal juncture". It is hard to see how it was done in the figures shown. Can the authors add a small panel with neuronal staining showing this (please no HRP)?

      For all figures, the magenta is too strong and it is really hard to see the sLNvs projections. Can this be sorted, please?

      We have depicted the pivotal juncture in the schematic diagram on the left side of Figure 1C. Additionally, we have included a separate column of images without HRP in Figure 1A. Moreover, we have modified the pseudo-color of HRP from magenta to blue to enhance the visualization of the s-LNv projection. The figure legends have also been correspondingly modified.

      Line 407: Spatial position relationship between calyx and s-LNvs. OK107-GAL4 labels ... calyx and s-LNvs labeled by, which which.

      We have modified it according to your suggestion. Please refer to lines 430-432 of the revised manuscript.

      Line 137 typo RPRC

      We thank the reviewer for noticing this mistake, which has now been corrected. Please refer to lines 148-149 of the revised manuscript.

      Section 158-164. the paper from Zhang et al 2019 needs to be cited since they have found the same effect of decreasing Dscam even if they didn't think about horizontal projection.

      Thank you for the suggestion. We have included in the manuscript the phenotype observed by Zhang et al. (2019) upon knocking down Dscam1-L in adults. Please refer to lines 170-172 of the revised manuscript.

      Line 176: typo senses (instead of sensor).

      Thank you for pointing out our mistake. We have modified it according to your suggestion. Please refer to line 189 of the revised manuscript.

      Line 193: more than Interesting it is Notable. Add "ubiquitous" knockdown.

      Thank you for the suggestion. We have included the word "ubiquitous" to enhance the precision of the narrative. Please refer to line 206 of the revised manuscript.

      Line 224: the pattern of expression of the crz cells is not visible where the projections of sLNvs are located. Are they in that region? Or further away?

      We've changed the pseudo-color of HRP, and in the updated Figure 5- figure supplement 1, you can see the projection pattern of crz+ cells, positioned close to the end of the s-LNv axon terminal.

      Line 243: applied? Do you mean "used"

      Thank you for the suggestion. We have revised it at line 256.

      Figure 5 Sup1: the schematic shows DNs proliferation that is not visible on the GFP image. Please comment.

      We have modified Figure 5 figure supplement 1 for the 120 h per-GAL4, Pdf-GAL80 >GFP expression pattern. Due to the strong GFP intensity in some DN neurons, there was a loss of GFP signal. Additionally, in Figure 6 figure supplement 1, we have added co-localization images of DN and s-LNv at 72 h and 96 h. To better illustrate the co-localization information, we have shown only a portion of the layers in the right panel. We hope these additions address your concerns.

      Line 251: cite Fernandez et al. 2020 with Purohit et al 2012.

      We have modified it according to your suggestion. Please refer to line 264 of the revised manuscript.

      Line 272: you have not shown synergistic effects because you have not modulated both pathways at the same time. You should talk about complementary.

      We have modified it according to your suggestion at lines 25, 285, 439.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Point for more elaborate discussion: Apparently the timescale of negative feedback signals is conserved between endothelial cell migration in vitro (with human cells) and endothelial migration during the formation of ISVs in zebrafish. What do you think might be an explanation for such conserved timescales? Are there certain processes within cytoskeletal tension build up that require this quantity of time to establish? Or does it relate to the time that is needed to begin to express the YAP/TAZ target genes that mediate feedback?

      The underlying mechanisms responsible for the conserved timescale are a major direction that we continue to explore. Localization of YAP/TAZ to the nucleus is likely not rate-limiting. We showed previously that acute RhoA activation produced significant YAP/TAZ nuclear localization within minutes, while subsequent co-transcriptional activity aligned with the gene expression dynamics observed here (Berlew et al., 2021). We hypothesize that the dynamics of YAP/TAZ-dependent transcription and the translation of those target genes are rate-limiting for initial feedback loop completion (tic = 4 hours). This is supported by work from us and others in a variety of cell lines showing that YAP/TAZ transcriptional responses take place during the first few hours after activation (Franklin et al., 2020; Mason et al., 2019; Plouffe et al., 2018). While our data identify mediators of initial feedback loop completion, the molecular effectors that determine the timescale of new cytoskeletal equilibrium establishment (teq = 8 hours) remain unclear.

      (2) Do you expect different timescales for slower endothelial migratory processes (e.g. for instance during fin vascular regeneration which takes days)?

      We selected the ISV development model because it exhibits similar migratory kinetics to our previously-explored human ECFC migration in vitro. The comparable kinetics allowed us to study dynamics of the feedback loop in vivo on similar time scales, but we have not explored models featuring either slower or faster dynamics. 

      It would be interesting to test how feedback dynamics are impacted in distinct endothelial migratory processes. Our data suggest that the feedback loop is necessary for persistent migration; however, YAP and TAZ respond to a diversity of upstream regulators in addition to mechanical signals, which might depend on the process of vascular morphogenesis. For example, after fin amputation, inflammation and tissue regeneration may impact the biochemical and mechanical environment experienced by the endothelium. Additionally, cells display different migratory behaviors in ISV morphogenesis compared to fin regeneration. During ISV formation, sprouting tip cells migrate dorsally through avascular tissue, followed by stalk cells (Ellertsdóttir et al., 2010). In contrast, the fin vasculature regenerates by forming an intermediate vascular plexus, where some venous-derived endothelial cells migrate towards the sprouting front, while others migrate against it (Xu et al., 2014). We are excited to study the role of this feedback loop in these different modes of neovessel formation in future studies.

      (3) Is the ~4hrs and 8hrs feedback time window a general property or does it differ between specific endothelial cell types? In the veins the endothelial cells generate fewer stress fibers and adhesions compared to in the arteries. Does this mean that there might be a difference in the feedback time window, or does that mean that certain endothelial cell types may not have such a YAP/TAZ-controlled feedback system?

      Recent studies suggest that venous endothelial cells are the primary endothelial subtype responsible for blood vessel morphogenesis (Lee et al., 2022, 2021; Xu et al., 2014). They are highly motile and mechanosensitive, migrating against blood flow (Lee et al., 2022). The Huveneers group has shown that the actin cytoskeleton is differently organized in adult arteries and veins in response to biomechanical properties of its extracellular matrix, rather than intrinsic differences between arterial and venous cells (van Geemen et al., 2014). This suggests that arterial and venous cells have distinct cytoskeletal setpoints due to mechanical cues in their environment (Price et al., 2021). We expect this to impact the degree of cytoskeletal remodeling and cell migration at equilibrium, rather than the kinetics of the feedback loop per se, though we have not yet tested this hypothesis. Testing these predictions on cytoskeletal setpoint stability and adaptation is a major direction that we continue to explore.

      (4) The experiments are based on perturbations to prove that transcriptional feedback is needed for endothelial migration. What would happen if the feedback system is always switched on? An experiment to add might be to analyse the responsiveness of endothelial cells expressing constitutively active YAP/TAZ.

      This is a problem that we are actively pursuing. Though the feedback system forms a coherent loop, we anticipate that the identity of the node of the loop selected for constitutive activation will influence the outcome, depending on whether that node is rate-limiting for feedback kinetics and the extent of intersection of that node with other signaling events in the cell. For example, we have observed that constitutive YAP activation drives profound changes to the transcriptional landscape including, but not limited to, RhoA signaling (Jones et al., 2023). We further anticipate that constitutive activation of feedback loop nodes may alter feedback dynamics, while dynamic or acute perturbation will be required to dissect these contributions in real time. For these reasons, ongoing work in the lab is pursuing these questions using optogenetic tools that enable precise spatial and temporal control (Berlew et al., 2021).   

      (5) To investigate the role of YAP-mediated transcription in an accurate time-dependent manner the authors may consider using the recently developed optogenetic YAP translocation tool: https://doi.org/10.15252/embr.202154401

      We are enthusiastic about the power of optogenetics to interrogate the nodes and timescales of this feedback system, and we are now funded to pursue this line of research. 

      Reviewer #2:

      The idea is intriguing, but it is not clear how the feedback actually works, so it is difficult to determine if the events needed could occur within 4 hrs. Specifically, it is not clear what gene changes initiated by YAP/TAZ translocation eventually lead to changes in Rho signaling and contractility. Much of the evidence to support the model is preliminary. Some of the data is consistent with the model, but alternative explanations of the data are not excluded. The fish washout data is quite interesting and does support the model. It is unclear how some of the in vitro data supports the model and excludes alternatives.

      Major strengths:

      The combination of in vitro and in vivo assessment provides evidence for timing in physiologically relevant contexts, and a rigorous quantification of outputs is provided. The idea of defining temporal aspects of the system is quite interesting.

      Major weaknesses:

      The evidence for a "loop" is not strong; rather, most of the data can also be interpreted as a linear increase in effect with time once a threshold is reached. Washout experiments are key to setting up a time window, yet these experiments are presented only for the fish model. A major technical challenge is that siRNA experiments take time to achieve depletion status, making precise timing of events on short time scales problematic. Also, Actinomycin D blocks most transcription so exposure for hours likely leads to secondary and tertiary effects and perhaps effects on viability. No RNA profiling is presented to validate proposed transcriptional changes.

      We thank the reviewer for these helpful suggestions. We have expanded our explanation of the history and known mediators of the feedback loop in the introduction. We and, independently, the Huveneers group recently reported that human endothelial cells maintain cytoskeletal equilibrium for persistent motility through a YAP/TAZ-mediated feedback loop that modulates cytoskeletal tension (Mason et al., 2019; van der Stoel et al., 2020). Because YAP and TAZ are activated by tension of the cytoskeleton (Dupont et al., 2011), suppression of cytoskeletal tension by YAP/TAZ transcriptional target genes constitutes a negative feedback loop (Fig. 1A). We described key components of this cell-intrinsic feedback loop, which acts as a control system to maintain cytoskeletal homeostasis for persistent motility via modulation of Rho-ROCK-myosin II activity (Mason et al., 2019). Both we and the Huveneers group found that perturbation of genes and pathways regulated by YAP/TAZ mechanoactivation (e.g., RhoA/ROCK/myosin II, NUAK2, DLC1) can functionally rescue motility in YAP/TAZ-depleted cells (Mason et al., 2019; van der Stoel et al., 2020). We further showed previously that both YAP/TAZ depletion and acute YAP/TAZ-TEAD inhibition consistently increased stress fiber and FA maturation and arrested cell motility, accounting for these limitations of siRNA (Mason et al., 2019).

      Enduring limitations to the temporal, spatial, and cell-specific control of the genetic and pharmacologic methods have inspired us to initiate alternative approaches, which are the subject of ongoing efforts. Further research will be necessary in the zebrafish to determine the extent to which the observed migratory dynamics are driven by cytoskeletal arrest. 

      To identify early YAP/TAZ-regulated transcriptional changes, we have added RNA profiling of control and YAP/TAZ depleted cells cultured on stiff matrices for four hours. Genes upregulated by YAP/TAZ depletion were enriched for Gene Ontology (GO) terms associated with Rho protein signal transduction, vascular development, cellular response to vascular endothelial growth factor (VEGF) stimulus, and endothelial cell migration (Fig. 9B). These data support a role for YAP and TAZ as negative feedback mediators that maintain cytoskeletal homeostasis for endothelial cell migration and vascular morphogenesis.  

      Reviewer #3:

      The authors used ECFC - endothelial colony forming cells (circulating endothelial cells that activate in response to vascular injury).

      Q: Did the authors characterize these cells and make sure that they are truly endothelial cells - for example examine specific endothelial markers, arterial-venous identity markers & Notch signalling status, overall morphology etc prior to the start of the experiment. How were ECFC isolated from human individuals, are these "healthy" volunteers - any underlying CVD risk factors, cells from one patient or from pooled samples, what injury were these humans exposed to trigger the release of the ECFCs into the circulation, etc. The materials & methods on ECFC should be expanded.

      Human umbilical cord blood-derived ECFCs were isolated at Indiana University School of Medicine and kindly provided by Dr Mervin Yoder. Cells were cultured as described by the Yoder group (Rapp et al., 2011) and our prior paper (Mason et al., 2019). We have expanded the materials and methods section to describe the source and characterization of these cells.

      The authors suggest that loss of YAP/TAZ phenocopies actinomycin-D inhibition - "both transcription inhibition and YAP/TAZ depletion impaired polarization, and induced robust ventral stress fiber formation and peripheral focal adhesion maturation". However, the cell size of actinomycin-D treated cells (Fig. 1B, top right panel), differs from the endothelial cell size upon siYAP/TAZ (Fig. 1E, top right panel) - and vinculin staining seems more pronounced in actinomycin-D treated cells (B, bottom right) when compared to siYAP/TAZ group. Cell shape is defined by acto-myosin tension.

      Q: Besides Fraction of focal adhesion >1um; focal adhesion number did the authors measure additional parameters related to cytoskeleton remodelling / focal adhesions that can substantiate their statement on similarity between loss of YAP/TAZ and actinomycin-D treatment. Would it be possible to make a more specific genetic intervention (besides YAP/TAZ) interfering with the focal adhesion pathway as opposed to the broad spectrum inhibitor actinomycin-D.

      Our previous paper (Mason et al., 2019) delineated the mechanistic relationships between YAP/TAZ signaling, focal adhesion turnover, actomyosin polymerization, and the intervening mechanisms of myosin regulation. Specifically, we demonstrated that YAP/TAZ regulate the myosin phosphatase kinase, NUAK2, and ARHGAP genes to mediate this feedback. Expanding on this work, the current study aimed to define the temporal kinetics of the cytoskeletal mechanotransductive feedback in vitro and in vivo. We used actinomycin-D and YAP/TAZ depletion to interrogate the role of transcriptional regulation and YAP/TAZ signaling, respectively. In this revision, we have added RNA profiling that identifies early YAP/TAZ-regulated transcriptional changes and further points to other molecular mediators of focal adhesions (e.g. TRIO, RHOB, THBS1) that will be the subjects of future studies.    

      Q: Does the actinomycin-D treatment affect responsiveness to VEGF? Does it induce apoptosis or reduce survival of the ECFCs?

      We have not looked specifically at the effect of actinomycin-D treatment on responsiveness to VEGF. However, actinomycin-D has been reported to reduce transcription of VEGF receptors (E et al., 2012). In contrast, we found that YAP/TAZ depletion upregulated GO terms associated with endothelial cell migration and response to VEGF stimulus (Fig. 9B), as well as receptors to angiogenic growth factors, including KDR and FLT4 (Fig. 9E). These results suggest YAP/TAZ depleted cells may be more sensitive to VEGF stimulation but remain nonmotile due to cytoskeletal arrest.

      We showed previously that long-term treatment with actinomycin-D reduces ECFC survival (Mason et al., 2019).

      Q: Which mechanism links ECM stiffness with endothelial surface area in the authors' scenario? In zebrafish, activity of endothelial guanine exchange factor Trio specifically at endothelial cell junctions (Klems, Nat Comms, 2020) and endoglin in response to hemodynamic factors (Siekmann, Nat Cell Biol 2017) have been shown to control EC shape/surface area - do these factors play a role in the scenario proposed by the authors?

      Our new transcriptional profiling indicates both Trio and endoglin are regulated through YAP and TAZ in human ECFCs. We plan to follow up on these findings.

      Q: The authors report that EC migrate faster on stiff substrate, and concomitantly these cells have a larger surface area. What is the physiological rationale behind these observations. Did the authors observe such behaviors in their zebrafish ISV model? How do these observations integrate with the tip - stalk cell shuffling model (Jakobsson & Gerhardt, Nat Cell Biol, 2011) and Notch activity in developing ISVs.

      This question raises important distinctions between the mode of migration in ISV morphogenesis and endothelial cells adherent to substrates. Cells behave and respond to mechanical cues differently in 2D vs. 3D matrices (LaValley and Reinhart-King, 2014). Additionally, the microenvironment in vivo is much more complex, combining numerous biochemical signals and changing mechanical properties (Whisler et al., 2023). We are actively investigating the downstream targets of YAP/TAZ mechanotransduction and how that integrates with other pathways known to regulate vascular morphogenesis, such as Notch signaling.

      The authors examined the formation of arterial intersegmental vessels in the trunk of developing zebrafish embryos in vivo. They used a variety of pharmacological inhibitors of transcription and acto-myosin remodelling and linked the observed morphological changes in ISV morphogenesis with changes in endothelial cell motility.

      Q: Reduced formation and dorsal extension of ISVs may have several reasons, including reduced EC migration and proliferation. The Tg(fli1a:EGFP) reporter however is not the most suitable line to monitor migration of individual endothelial cells. Can the authors repeat the experiments in Tg(fli1a:nEGFP); Tg(kdrl:HRAS-mCherry) double transgenics to visualize movement-migration of the individual endothelial cells and EC proliferation events, in the different treatment regimes.

      So far, we have not tracked individual endothelial cells during ISV morphogenesis. We agree this is the best approach and are pursuing a similar technique for these experiments.

      ISV formation is furthermore affected by Notch signalling status and a series of (repulsive) guidance cues.

      Q: Does de novo blockade of gene expression with Actinomycin D affect Notch signalling status, expression of PlexinD - sFlt1, netrin1 or arterial-venous identity genes.

      While we have not performed gene expression analysis under the Actinomycin D condition, Actinomycin D functions as a broad transcription inhibitor. We are currently pursuing the downstream targets of YAP/TAZ mechanotransduction in both ECFCs and zebrafish.

      Remark: The authors may want to consider using the Tg(fli1:LIFEACT-GFP) reporter for in vivo imaging of actin remodelling events.

      We thank the reviewer for their helpful suggestion.

      Remark: the authors report "As with broad transcription inhibition, in situ depletion of YAP and TAZ by RNAi arrested cell motility, illustrated here by live-migration sparklines over 10 hours: siControl: , siYAP/TAZ: (25 μm scale-bar: -)". Can the authors make a separate figure panel for this, how many cells were measured?

      Please refer to our previous publication for the complete details on this data (Mason et al., 2019). We have added the citation in the text.

      Remark: in the wash-out experiments, exposure to the inhibitors is not the same in the different scenarios - could it be that the longer exposure time induces "toxic" side effect that cannot be "washed out" when compared to the short treatment regimes?

      This is a possible limitation of the pharmacological approach, and we have included it in the discussion section. We are currently exploring alternative approaches to interrogate the timescale of the feedback loop more precisely.

      References

      Berlew EE, Kuznetsov IA, Yamada K, Bugaj LJ, Boerckel JD, Chow BY. 2021. Single-Component Optogenetic Tools for Inducible RhoA GTPase Signaling. Advanced Biology 5:2100810. doi:10.1002/adbi.202100810

      Dupont S, Morsut L, Aragona M, Enzo E, Giulitti S, Cordenonsi M, Zanconato F, Le Digabel J, Forcato M, Bicciato S, Elvassore N, Piccolo S. 2011. Role of YAP/TAZ in mechanotransduction. Nature 474:179–183. doi:10.1038/nature10137

      E G, Cao Y, Bhattacharya S, Dutta S, Wang E, Mukhopadhyay D. 2012. Endogenous Vascular Endothelial Growth Factor-A (VEGF-A) Maintains Endothelial Cell Homeostasis by Regulating VEGF Receptor-2 Transcription. J Biol Chem 287:3029–3041. doi:10.1074/jbc.M111.293985

      Ellertsdóttir E, Lenard A, Blum Y, Krudewig A, Herwig L, Affolter M, Belting H-G. 2010. Vascular morphogenesis in the zebrafish embryo. Developmental Biology, Special Section: Morphogenesis 341:56–65. doi:10.1016/j.ydbio.2009.10.035

      Franklin JM, Ghosh RP, Shi Q, Reddick MP, Liphardt JT. 2020. Concerted localization-resets precede YAP-dependent transcription. Nat Commun 11:4581. doi:10.1038/s41467-020-18368-x

      Jones DL, Hallström GF, Jiang X, Locke RC, Evans MK, Bonnevie ED, Srikumar A, Leahy TP, Nijsure MP, Boerckel JD, Mauck RL, Dyment NA. 2023. Mechanoepigenetic regulation of extracellular matrix homeostasis via Yap and Taz. Proceedings of the National Academy of Sciences 120:e2211947120. doi:10.1073/pnas.2211947120

      LaValley DJ, Reinhart-King CA. 2014. Matrix stiffening in the formation of blood vessels. Advances in Regenerative Biology 1:25247. doi:10.3402/arb.v1.25247

      Lee H-W, Shin JH, Simons M. 2022. Flow goes forward and cells step backward: endothelial migration. Exp Mol Med 54:711–719. doi:10.1038/s12276-022-00785-1

      Lee H-W, Xu Y, He L, Choi W, Gonzalez D, Jin S-W, Simons M. 2021. Role of Venous Endothelial Cells in Developmental and Pathologic Angiogenesis. Circulation 144:1308–1322. doi:10.1161/CIRCULATIONAHA.121.054071

      Mason DE, Collins JM, Dawahare JH, Nguyen TD, Lin Y, Voytik-Harbin SL, Zorlutuna P, Yoder MC, Boerckel JD. 2019. YAP and TAZ limit cytoskeletal and focal adhesion maturation to enable persistent cell motility. Journal of Cell Biology 218:1369–1389. doi:10.1083/jcb.201806065

      Plouffe SW, Lin KC, Moore JL, Tan FE, Ma S, Ye Z, Qiu Y, Ren B, Guan K-L. 2018. The Hippo pathway effector proteins YAP and TAZ have both distinct and overlapping functions in the cell. J Biol Chem 293:11230–11240. doi:10.1074/jbc.RA118.002715

      Price CC, Mathur J, Boerckel JD, Pathak A, Shenoy VB. 2021. Dynamic self-reinforcement of gene expression determines acquisition of cellular mechanical memory. Biophysical Journal 120:5074–5089. doi:10.1016/j.bpj.2021.10.006

      Rapp BM, Saadatzedeh MR, Ofstein RH, Bhavsar JR, Tempel ZS, Moreno O, Morone P, Booth DA, Traktuev DO, Dalsing MC, Ingram DA, Yoder MC, March KL, Murphy MP. 2011. Resident Endothelial Progenitor Cells From Human Placenta Have Greater Vasculogenic Potential Than Circulating Endothelial Progenitor Cells From Umbilical Cord Blood. Cell Med 2:85–96. doi:10.3727/215517911X617888

      Tammela T, Zarkada G, Nurmi H, Jakobsson L, Heinolainen K, Tvorogov D, Zheng W, Franco CA, Murtomäki A, Aranda E, Miura N, Ylä-Herttuala S, Fruttiger M, Mäkinen T, Eichmann A, Pollard JW, Gerhardt H, Alitalo K. 2011. VEGFR-3 controls tip to stalk conversion at vessel fusion sites by reinforcing Notch signalling. Nat Cell Biol 13:1202–1213. doi:10.1038/ncb2331

      van der Stoel M, Schimmel L, Nawaz K, van Stalborch A-M, de Haan A, Klaus-Bergmann A, Valent ET, Koenis DS, van Nieuw Amerongen GP, de Vries CJ, de Waard V, Gloerich M, van Buul JD, Huveneers S. 2020. DLC1 is a direct target of activated YAP/TAZ that drives collective migration and sprouting angiogenesis. Journal of Cell Science 133:jcs239947. doi:10.1242/jcs.239947

      van Geemen D, Smeets MWJ, van Stalborch A-MD, Woerdeman LAE, Daemen MJAP, Hordijk PL, Huveneers S. 2014. F-Actin–Anchored Focal Adhesions Distinguish Endothelial Phenotypes of Human Arteries and Veins. Arteriosclerosis, Thrombosis, and Vascular Biology 34:2059–2067. doi:10.1161/ATVBAHA.114.304180

      Whisler J, Shahreza S, Schlegelmilch K, Ege N, Javanmardi Y, Malandrino A, Agrawal A, Fantin A, Serwinski B, Azizgolshani H, Park C, Shone V, Demuren OO, Del Rosario A, Butty VL, Holroyd N, Domart M-C, Hooper S, Szita N, Boyer LA, Walker-Samuel S, Djordjevic B, Sheridan GK, Collinson L, Calvo F, Ruhrberg C, Sahai E, Kamm R, Moeendarbary E. 2023. Emergent mechanical control of vascular morphogenesis. Science Advances 9:eadg9781. doi:10.1126/sciadv.adg9781

      Xu C, Hasan SS, Schmidt I, Rocha SF, Pitulescu ME, Bussmann J, Meyen D, Raz E, Adams RH, Siekmann AF. 2014. Arteries are formed by vein-derived endothelial tip cells. Nat Commun 5:5758. doi:10.1038/ncomms6758

    1. Colletotrichum fungi infect a wide diversity of monocot and eudicot hosts, causing plant diseases on almost all economically important crops worldwide. In addition to its economic impact, Colletotrichum is a suitable model for the study of gene family evolution on a fine scale to uncover events in the genome that are associated with the evolution of biological characters important for host interactions. Here we present the genome sequences of 30 Colletotrichum species, 18 of them newly sequenced, covering the taxonomic diversity within the genus. A time-calibrated tree revealed that the Colletotrichum ancestor diverged in the late Cretaceous around 70 million years ago (mya) in parallel with the diversification of flowering plants. We

      Reviewer 1: Jamie McGowan

      In this study, Baroncelli and colleagues carry out a comprehensive analysis of genomic evolution in Colletotrichum fungi, an important group of plant pathogens with diverse and economically significant hosts. Their comparative genomic and phylogenomics analyses are based on the genome sequences of 30 Colletotrichum species spanning the diversity of the genus, including pathogens of dicots, monocots, and both dicots and monocots. This includes 18 genome sequences that are newly reported in this study. They also perform comparative transcriptomic analyses of 4 Colletotrichum species (2 dicot pathogens and 2 monocot pathogens) on different carbon sources. Overall, I thought the manuscript was very well written and technically sound. The results should be of interest to a broad audience, particularly to those interested in fungal evolutionary genomics and plant pathology. I only have a few minor comments.

      Minor comments:

      (1) Lines 50 - 51: "The plant cell wall (PCW) consists of many different polysaccharides that are attached not only to each other through a variety of linkages providing the main strength and structure for the PCW". I found this confusing - is the sentence incomplete?

      (2) Line 66: "Some Colletotrichum species show…" I think there should be a couple of introductory sentences about Colletotrichum before this.

      (3) Figure 1: It would be informative to label which genomes were sequenced with PacBio versus just Illumina.

      (4) Lines 254 - 255: "As no other enrichment was identified we performed a manual annotation of genes identified in Figure 3D". I don't think it is clear here what manual annotation this is referring to.

      (5) One area where I felt the analysis was lacking was the lack of analyses on genome repeat content. The authors highlight the large variation in genome sizes within Colletotrichum species (~44 Mb vs ~90 Mb) and show in Figure 1 that this correlates with increased non-coding DNA. It would have been interesting to determine if this is driven by the proliferation of particular repeat families.

      (6) Another concern is the inconsistent use of genome annotation methods. 12 of the genomes reported in this study were annotated using the JGI annotation pipeline, whereas the other 6 were annotated using the MAKER pipeline. Several studies (e.g., Weisman et al., 2022 - Current Biology) show that inconsistent genome annotation methods can inflate the number of observed lineage specific genes. The authors may wish to comment on this or demonstrate that this isn't an issue in their study (e.g., by aligning lineage specific proteins against the other genome assemblies).

    1. Structural variants (SVs) play a significant role in speciation and adaptation in many species, yet few studies have explored the prevalence and impact of different categories of SVs. We conducted a comparative analysis of long-read assembled reference genomes of closely related Eucalyptus species to identify candidate SVs potentially influencing speciation and adaptation. Interspecies SVs can be either fixed differences, or polymorphic in one or both species. To describe SV patterns, we employed short-read whole-genome sequencing on over 600 individuals of E. melliodora and E. sideroxylon, along with recent high quality genome assemblies. We aligned reads and genotyped interspecies SVs predicted between species reference genomes. Our results revealed that 49,756 of 58,025 and 39,536 of 47,064 interspecies SVs could be typed with short reads, in E. melliodora and E. sideroxylon

      Reviewer 1: Jakob Butler

      Ferguson et al have performed a thorough analysis of two species of Eucalyptus, quantifying the extent of structural variation between assembled genomes of the species and determining how prevalent those variations are across a selection of wild material. I believe this study is of sufficient quality for publication in GigaScience, if some minor inconsistencies and grammatical issues are addressed, and a few supporting analyses are performed.

      The major changes I would like to see include the addition of a syri plot of the complete set of SVs between E. melliodora and E. sideroxylon. I believe this, along with correcting the scale on the plots of recombination in Figure S6/S7 would allow for a better comparison of how recombination rate is interacting with the SVs. I would also suggest a more formal test of enrichment for COG terms, to better support the statements of "enrichment" in the discussion.

      Suggested changes by line:

      Line 142 - This section is quite short, I would either merge this section into the Genome scaffolding (and annotation) section, or expand on the results of the gene annotation.
      Line 182 - (Supplementary Figure S4)
      Line 183 (and throughout) - Please be consistent with your references to tables and figures.
      Line 186 - delete comma after 28.63%
      Line 194 - These are density plots rather than histograms
      Figure 4 - Both axes are labelled as PC1
      Line 217 (page 10, line numbers are doubled up) - This seems repetitive, perhaps "…especially as they may also represent divergent sequences".
      Line 221 (page 11) - Please insert "and" before polymorphic translocations
      Line 223 - You have stated that those not successfully genotyped in both species are private or artefacts earlier in the paragraph, please reduce the repetition.
      Figure 6 - I don't find this figure particularly informative (and somewhat confusing to interpret). I think showing the percentages of each different SV in a visual form implies a level of equivalence in genomic impact, which is difficult to reconcile with the raw difference in numbers. I think a supplemental table with the focus on the percentages would illustrate the point better.
      Line 246 - There is no mention in the methods about what r threshold was used to declare a pair "correlated", please state it here or in the methods.
      Line 265 - This line was confusing to interpret. A suggested alteration: "significant value. After attempting to functionally annotating all genes across the genome and placing them within COG categories, 247 of the total 281 gene candidates in SSPs were annotated. These genes were enriched for...."
      Line 266 - I would like to see a formal enrichment analysis rather than "increased/decreased association", so we could have a clearer picture of which gene functions are truly over/underrepresented in SSPs. You could subsequently limit Figure 8 to those that show a difference.
      Line 275 - The grammar of this title is a bit off, perhaps "Effect of syntenic, rearranged, unaligned regions and genes on recombination rates"
      Line 276 - This is the first mention of p, please define it as recombination rate
      Line 283 - The supplemental Figure S6 and S7 seem to have regions of heightened recombination, but this is difficult to interpret and compare with the current variable axis scales. Please make these consistent. I would also like to see the syri graph of the two aligned genomes, as this would allow for a visual comparison of SV regions with recombination rate.
      Line 290 - How were p-values adjusted?
      Line 294 - More information about this 'significantly' higher recombination rate would be good, either in the figure or further expanded in the text.
      Line 307 - Italics for species names (repeated in Figure 10 and Figure 11 caption)
      Line 310 - Similar problem to line 275
      Figure 10 - Having Figure 9b repeated in Figure 10 and Figure 11 is unnecessary.
      Line 336 - Vertical lines show average FST, not p
      Line 341 - Similar problem to line 275
      Line 356 - translocations should be plural
      Line 367 - Vertical lines show average SNP density, not p
      Line 391 - This is the first mention of barrier loci, please define
      Line 413 - As mentioned above, I would recommend a formal enrichment test to support this statement
      Line 428 - Grammar is poor here, please correct
      Line 490 - Please make this a complete sentence
      Line 499 - Please state how the Hi-C map was manually edited, and what informed the position of those edits.
      Line 508 - Please provide an example of how well your LAI score of ~18 compares. The LAI paper seems to intimate that 10 is low quality?
      Line 513 - Missing bracket for version number
      Line 536 - Syntenic rather than synteny
      Line 717 - Formatting error in references
      Supp table S3-S4-S5 - Space between E. and sideroxylon

  3. Jun 2024
    1. However, by examining the bacteriome in detail, we can obtain much more information about its composition and function than diversity alone can tell us. Based on the taxonomic constitution of our samples, Proteobacteria and Actinobacteria phyla were clearly dominant both in fish skin mucus and water samples. The dominance of the Proteobacteria phylum is not an uncommon observation in fish external mucus samples1,3,5,6,8,11,21,62,63, however, differences between fish species have been observed for the other phyla1,11,62,63. Moreover, significant within-species variability in dominant phyla has been described64, and variability within individuals related to body sites should be noted12. The microbiome can be an important indicator of various pathological conditions, which has already been described in fish, for example, in the case of the gastrointestinal tract65. In this regard, the Bacteroidota phylum may be interesting, which has been highlighted as a marker for eutrophication9,66. Understanding the changes in the composition of the bacteriome or even the microbiome during different pathological conditions can be an important step in understanding and potentially diagnosing disease processes. Our results are therefore in line with the dominance of the Proteobacteria phylum observed in other fish species, but direct comparison with C. carpio is not possible due to the lack of available data. Of course, our observations on the bacteriome composition of our samples are also limited by their paramount host genome contamination, which reduced the coverage of bacterial genomes of interest in the sequencing reaction.

      Since you have the resolution to go below phylum, I think it would be interesting to focus on that more in the discussion.

    1. 17) The just man is the freest of anyone from anxiety; but the unjust man is perpetually haunted by it.

      I found this passage disturbing and I do not necessarily agree with it. I think that because we have two different people with two different moral compasses, their views on the world are polar and there is a struggle in comparing them. This "unjust" person has an opposite view of anxiety, punishment, power, fear, etc... because they are "morally wrong," and may not experience the same emotional spectrum as a person who always does the right thing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study presents a valuable tool for searching molecular dynamics simulation data, making such data sets accessible for open science. The authors provide convincing evidence that it is possible to identify useful molecular dynamics simulation data sets and their analysis can produce valuable information.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Tiemann et al. have undertaken an original study on the availability of molecular dynamics (MD) simulation datasets across the Internet. There is a widespread belief that extensive, well-curated MD datasets would enable the development of novel classes of AI models for structural biology. However, currently, there is no standard for sharing MD datasets. As generating MD datasets is energy-intensive, it is also important to facilitate the reuse of MD datasets to minimize energy consumption. Developing a universally accepted standard for depositing and curating MD datasets is a huge undertaking. The study by Tiemann et al. will be very valuable in informing policy developments toward this goal.

      Strengths:

      The study presents an original approach to addressing a growing concern in the field. It is clear that adopting a more collaborative approach could significantly enhance the impact of MD simulations in modern molecular sciences.

      The timing of the work is appropriate, given the current interest in developing AI models for describing biomolecular dynamics.

      Weaknesses:

      The study primarily focuses on one major MD engine (GROMACS), although this limitation is not significant considering the proof-of-concept nature of the study.

      We thank the reviewer for his/her comments. Moving forward, our plan includes expanding this research to encompass other MD engines used in biomolecular simulations and materials sciences, such as NAMD, Charmm, Amber, LAMMPS, etc. However, this requires parsing associated files to supplement the sparse metadata generally available for the related datasets.

      Reviewer #2 (Public Review):

      Summary:

      Molecular dynamics (MD) data is deposited in public, non-specialist repositories. This work starts from the premise that these data are a valuable resource as they could be used by other researchers to extract additional insights from these simulations; it could also potentially be used as training data for ML/AI approaches. The problem is that mining these data is difficult because they are not easy to find and work with. The primary goal of the authors was to discover and index these difficult-to-find MD datasets, which they call the "dark matter of the MD universe" (in contrast to data sets held in specialist databases).

      The authors developed a search strategy that avoided the use of ill-defined metadata but instead relied on the knowledge of the restricted set of file formats used in MD simulations as a true marker for the data they were looking for. Detection of MD data marked a data set as relevant with a follow-up indexing strategy of all associated content. This "explore-and-expand" strategy allowed the authors for the first time to provide a realistic census of the MD data in non-specialist repositories.
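
      The two-step strategy described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Dataset` class, the `explore` and `expand` helpers, the toy catalog and the (incomplete) extension list are all hypothetical, and a real pipeline would query the repositories rather than an in-memory list.

```python
from dataclasses import dataclass, field

# Assumption: an illustrative, incomplete set of GROMACS-related extensions
# used as markers for MD data.
MD_EXTENSIONS = {".xtc", ".trr", ".tpr", ".mdp", ".gro", ".top", ".edr"}


@dataclass
class Dataset:
    """Hypothetical stand-in for one repository record and its file names."""
    dataset_id: str
    files: list = field(default_factory=list)


def extension(name):
    """Return the lower-cased extension of a file name ('' if none)."""
    dot = name.rfind(".")
    return name[dot:].lower() if dot != -1 else ""


def explore(datasets):
    """Explore step: keep datasets holding at least one MD-specific file."""
    return [
        d for d in datasets
        if any(extension(f) in MD_EXTENSIONS for f in d.files)
    ]


def expand(hits):
    """Expand step: index every file of each retained dataset."""
    return {d.dataset_id: sorted(d.files) for d in hits}


# Toy catalog; names are made up for the example.
catalog = [
    Dataset("ds-1", ["traj.xtc", "README.md", "analysis.py"]),
    Dataset("ds-2", ["photo.png", "notes.txt"]),
    Dataset("ds-3", ["md.mdp", "conf.gro"]),
]
index = expand(explore(catalog))
```

      In this toy run only `ds-1` and `ds-3` are retained, and the index for `ds-1` also lists its README and script, which is the point of the expand step: files that carry no MD-specific extension are still catalogued once the dataset is known to be relevant.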

      As a proof of principle, they analyzed a subset of the data (primarily related to simulations with the popular Gromacs MD package) to summarize the types of simulated systems (primarily biomolecular systems) and commonly used simulation settings.

      Based on their experience they propose best practices for metadata provision to make MD data FAIR (findable, accessible, interoperable, reusable).

      A prototype search engine that works on the indexed datasets is made publicly available. All data and code are made freely available as open source/open data.

      Strengths:

      The novel search strategy is based on relevant data to identify full datasets instead of relying on metadata and thus is likely to have many true positives and few false positives.

      The paper provides a first glimpse at the potential hidden treasures of MD simulations and force field parametrizations of molecules.

      Analysis of parameter settings of MD simulations from how researchers *actually* run simulations can provide valuable feedback to MD code developers for how to document/educate users. This approach is much better than analyzing what authors write in the Methods sections.

      The authors make a prototype search engine available.

      The guidelines for FAIR MD data are based on experience gained from trying to make sense of the data.

      Weaknesses:

      So far the work is a proof-of-concept that focuses on MD data produced by Gromacs (which was prevalent among all indexed and identified packages).

      As discussed in the manuscript, some types of biomolecules are likely underrepresented because different communities have different preferences for force fields/MD codes (for example: carbohydrates with AMBER/GLYCAM using AMBER MD instead of Gromacs).

      Materials sciences seem to be severely under-represented --- commonly used codes in this area such as LAMMPS are not even detected, and only very few examples could be identified. As it is, the paper primarily provides an insight into the *biomolecular* MD simulation world.

      The authors succeed in providing a first realistic view on what MD data is available in public repositories. In particular, their explore-expand approach has the potential to be customized for all kinds of specialist simulation data, whereby specific artifacts are used as fiducial markers instead of metadata. The more detailed analysis is limited to Gromacs simulations and primarily biomolecular simulations (even though MD is also widely used in other fields such as the materials sciences). This restricted view may simply be correlated with the user community of Gromacs and hopefully, follow-up studies from this work will shed more light on this shortcoming.

      The study quantified the number of trajectories currently held in structured databases as ~10k vs ~30k in generalist repositories. To go beyond the proof-of-principle analysis it would be interesting to analyze the data in specialist repositories in the same way as the one in the generalist ones, especially as there are now efforts underway to create a database for MD simulations (Grant 'Molecular dynamics simulation for biology and chemistry research' to establish MDDB, DOI 10.3030/101094651). One should note that structured databases do not invalidate the approach pioneered in this work; if anything they are orthogonal to each other and both will likely play an important role in growing the usefulness of MD simulations in the future.

      We thank the reviewer for his/her comments. As mentioned to Reviewer 1, we intend to extend this work to other MD engines in the near future to go beyond Gromacs and even biomolecular simulations. Furthermore, as the reviewer has mentioned the value of accessing and indexing specialized MD databases such as MDDB, MemprotMD, GPCRmd, NMRLipids, ATLAS, and others, one of our next steps is indeed to continue expanding the MDverse catalog of MD data. This indexing may also increase the visibility and adoption of these specific databases.

      Reviewer #3 (Public Review):

      Molecular dynamics (MD) simulations nowadays are an essential element of structural biology investigations, complementing experiments and aiding their interpretation by revealing transient processes or details (such as the effects of glycosylation on the SARS-CoV-2 spike protein, for example; Casalino et al. ACS Cent. Sci. 2020; 6, 10, 1722-1734 https://doi.org/10.1021/acscentsci.0c01056) that cannot be observed directly. MD simulations can allow for the calculation of thermodynamic, kinetic, and other properties and the prediction of biological or chemical activity. MD simulations can now serve as "computational assays" (Huggins et al. WIREs Comput Mol Sci. 2019; 9:e1393. https://doi.org/10.1002/wcms.1393).

      Conceptually, MD simulations have played a crucial role in developing the understanding that the dynamics and conformational behaviour of biological macromolecules are essential to their function, and are shaped by evolution. Atomistic simulations range up to the billion atom scale with exascale resources (e.g. simulations of SARS-CoV-2 in a respiratory aerosol. Dommer et al. The International Journal of High Performance Computing Applications. 2023; 37:28-44. doi:10.1177/10943420221128233), while coarse-grained models allow simulations on even larger length- and timescales. Simulations with combined quantum mechanics/molecular mechanics (QM/MM) methods can investigate biochemical reactivity, and overcome limitations of empirical forcefields (Cui et al. J. Phys. Chem. B 2021; 125, 689 https://doi.org/10.1021/acs.jpcb.0c09898).

      MD simulations generate large amounts of data (e.g. structures along the MD trajectory) and increasingly, e.g. because of funder mandates for open science, these data are deposited in publicly accessible repositories. There is real potential to learn from these data en masse, not only to understand biomolecular dynamics but also to explore methodological issues. Deposition of data is haphazard and lags far behind experimental structural biology, however, and it is also hard to answer the apparently simple question of "what is out there?". This is the question that Tiemann et al explore in this nice and important work, focusing on simulations run with the widely used GROMACS package. They develop a search strategy and identify almost 2,000 datasets from Zenodo, Figshare and Open Science Framework. This provides a very useful resource. For these datasets, they analyse features of the simulations (e.g. atomistic or coarse-grained), which provides a useful snapshot of current simulation approaches. The analysis is presented clearly and discussed insightfully. They also present a search engine to explore MD data, the MDverse data explorer, which promises to be a very useful tool.

      As the authors state: "Eventually, front-end solutions such as the MDverse data explorer tool can evolve being more user-friendly by interfacing the structures and dynamics with interactive 3D molecular viewers". This will make MD simulations accessible to non-specialists and researchers in other areas. I would envisage that this will also include approaches using interactive virtual reality for an immersive exploration of structure and dynamics, and virtual collaboration (e.g. O'Connor et al., Sci. Adv. 4, eaat2731 (2018). DOI:10.1126/sciadv.aat2731).

      The need to share data effectively, and to compare simulations and test models, was illustrated clearly in the COVID-19 pandemic, which also demonstrated a willingness and commitment to data sharing across the international community (e.g. Amaro and Mulholland, J. Chem. Inf. Model. 2020, 60, 6, 2653-2656 https://doi.org/10.1021/acs.jcim.0c00319; Computing in Science & Engineering 2020, 22, 30-36 doi: 10.1109/MCSE.2020.3024155). There are important lessons to learn here, for simulations to be reproducible and reliable, for rapid testing, for exploiting data with machine learning, and for linking to data from other approaches. Tiemann et al. discuss how to develop these links, providing good perspectives and suggestions.

      I agree completely with the statement of the authors that "Even if MD data represents only 1 % of the total volume of data stored in Zenodo, we believe it is our responsibility, as a community, to develop a better sharing and reuse of MD simulation files - and it will neither have to be particularly cumbersome nor expensive. To this end, we are proposing two solutions. First, improve practices for sharing and depositing MD data in data repositories. Second, improve the FAIRness of already available MD data notably by improving the quality of the current metadata."

      This nicely states the challenge to the biomolecular simulation community. There is a clear need for standards for MD data and associated metadata. This will also help with the development of standards of best practice in simulations. The authors provide useful and detailed recommendations for MD metadata. These recommendations should contribute to discussions on the development of standards by researchers, funders, and publishers. Community organizations (such as CCP-BioSim and HECBioSim in the UK, BioExcel, CECAM, MolSSI, learned societies etc) have an important part to play in these developments, which are vital for the future of biomolecular simulation.

      We thank the reviewer for his/her comments. Beyond the points mentioned to Reviewers 1 and 2, as the reviewer suggested, it would be of great interest to combine innovative and immersive approaches to visualize and possibly interact with the data collected. This is indeed more and more amenable thanks to technologies such as WebGL and programs such as Mol*, or even - as also pointed out by the reviewer - through virtual reality, for example with the mentioned Narupa framework or with the UnityMol software. For a comprehensive review on MD trajectory visualization and associated challenges, we refer to our recent review article https://doi.org/10.3389/fbinf.2024.1356659.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Some minor text editing would improve the readability of the manuscript.

      It would be very useful if the authors could share their perspectives on the best and most efficient approach to sharing datasets and code associated with a publication. My concern lies in the fact that Github, which is currently the dominant platform for sharing code, is not well-suited for hosting large MD datasets. As a result, researchers often need to adopt a workflow where code is shared on Github and datasets are stored elsewhere (e.g., Zenodo). While this is feasible, it adds extra work. Ideally, a transparent process could be developed to seamlessly share code and datasets linked to a study through a unified interface.

      We thank the reviewer for this excellent suggestion. To our knowledge, there is as yet no easy framework for jointly storing and sharing code and data linked to a scientific publication. Code can of course be submitted to “generic” databases along with the data, but at present these do not offer features such as collaborative work and change tracking to the extent that GitHub does.

      Although GitHub is indeed a suitable platform to deposit code, we strongly advise researchers to archive their code in Software Heritage. In addition to preserving source code, Software Heritage provides a unique identifier called SWHID that unambiguously makes reference to a specific version of the source code.

      So far, it is the responsibility of the scientific publication authors to link datasets and source codes (whether in GitHub or Software Heritage) in their paper, but also to make the reverse link from the data and code sharing platforms to the paper after publication.

      As mentioned by the reviewer, a unified interface that could ease this process would significantly contribute to FAIR-ness in MD.

      Reviewer #2 (Recommendations For The Authors):

      L180: I am not aware that TRR files contain energy terms as stated here, my understanding was that EDR files primarily served that purpose.

      “…available in one dataset. Interestingly, we found 1,406 .trr files, which contain trajectory but also additional information such as velocities, energy of the system, etc. While the file is especially useful in terms of reusability, the large size (can go up to several 100GB) limits its deposition in most…”

      Indeed, our formulation was ambiguous. The EDR files contain the detailed information on energies, whereas TRR files contain numerous values from the trajectory such as coordinates, velocities, forces and to some extent also energies (https://manual.gromacs.org/current/reference-manual/file-formats.html#trr).

      L207: The text states that the total time was not available from XTC files, only the number of frames. However, XTC files record time stamps in addition to frame numbers. As long as these times are in the Gromacs standard of picoseconds, the simulation time ought to be available from XTCs.

      “…systems and the number of frames available in the files (Fig. 3-B). Of note, the frames do not directly translate to the simulation runtime - more information deposited in other files (e.g. .mdp files) is needed to determine the complete runtime of the simulation. The system was up…”.

      Thank you for the useful comment, we removed this sentence. We now mention that studying the simulation time would be of interest in the future, especially when we will perform an exhaustive analysis of XTC files.

      “Of note, as .xtc files also contain time stamps, it would be interesting to study the relationship between the time and the number of frames to get useful information about the sampling. Nevertheless, this analysis would be possible only for unbiased MD simulations. So, we would need to decipher if the .xtc file is coming from biased or unbiased simulations, which may not be trivial.”
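      The relationship between frame count and time stamps mentioned in this added sentence can be sketched as follows. This is only an illustration: the list of times is hypothetical and stands in for the per-frame time stamps (in ps) that a trajectory reader would return, and no specific reader API is assumed.

```python
def summarize_sampling(frame_times_ps):
    """Return (n_frames, time span in ps, sorted distinct frame intervals in ps)."""
    n_frames = len(frame_times_ps)
    if n_frames < 2:
        # A single frame (or none) carries no information about sampling.
        return n_frames, 0.0, []
    span_ps = frame_times_ps[-1] - frame_times_ps[0]
    # Collect the distinct gaps between consecutive frames (rounded to absorb float noise).
    intervals = sorted(
        {round(later - earlier, 6) for earlier, later in zip(frame_times_ps, frame_times_ps[1:])}
    )
    return n_frames, span_ps, intervals


# Hypothetical time stamps for five frames saved every 10 ps.
times = [0.0, 10.0, 20.0, 30.0, 40.0]
print(summarize_sampling(times))
```

      A single distinct interval indicates regular sampling. The sketch says nothing about whether the simulation was biased or unbiased, which remains the hard part noted above.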

      Analysis of MDP files: Were these standard equilibrium MD or can you distinguish biased MD or free energy calculations?

      Currently we do not distinguish between biased and unbiased MD, but in the future we may attempt to do so, e.g. by correlating it with standard equilibration force-fields/parameters, timesteps or similar. Nevertheless, a true distinction will remain challenging.

      L336: typo: pikes -> spikes (or peaks?)

      “…simulations of Lennard-Jones models (Jeon et al., 2016). Interestingly, we noticed the appearance of several pikes at 400K, 600K and 800K, which were not present before the end of the year 2022. These peaks correspond to the same study related to the stability of hydrated crystals (Dybeck et al., 2023). Overall, this analysis revealed that a wide range of temperatures have been explored,…”

      Thank you. We have corrected this typo.

      Make clear how multiple versions of data sets are handled, e.g., if v1, v2, and v3 of a dataset are provided in Zenodo then which one is counted or are all counted?

      We collected only the latest version of each dataset, as exposed by default by the Zenodo API. To reflect this, we added the following sentence to the Methods and Materials section, Initial data collection sub-section:

      “By default, the last version of the datasets was collected.”

      L248 Analysis of GRO files seems fairly narrow because PDB files are very often used for exactly the same purpose, even in the context of Gromacs simulations, not the least because it is familiar to structural biologists that may be interested in representative MD snapshots. Despite all the shortcomings of abusing the PDB format for MD, it is an attempt at increased interoperability. Perhaps the authors can make sure that readers understand that choosing GRO for analysis may give a somewhat skewed picture, even within Gromacs simulations.

      Thanks for this comment. We collected about 12,000 PDB files that could indeed be output from Gromacs simulations and easily be shared due to the universality of this format, but that could as well come from different sources (like other MD packages or the PDB database itself). We purposely decided to limit our study to files strictly associated with the Gromacs package, like MDP and XTC file types. However, we will extend our survey to all other structure-like formats and especially the PDB file type. We reflected this purpose in the following sentence (after line 281):

      “Beyond .gro files, we would like to analyze the ensemble of the ~12,000 .pdb files extracted in this study (see Figure 2-B) to better characterize the types of molecular structures deposited.”

      A simple template metadata file would be welcome (e.g., served from a GitHub/GitLab repository so that it can be improved with community input).

      Thank you for this suggestion, with which we fundamentally agree. However, generating such a file is a major task. We believe that creating a metadata file template requires far-reaching considerations, is therefore beyond the scope of this paper, and should not be decided by a small group of researchers. Indeed, this topic requires a large consensus of different stakeholders, from users to MD program developers and journal editors. It would be especially useful to organize dedicated workshops with representatives of all these communities to tackle this specific issue, as mentioned by Reviewer 3 in their public review. As a basis for this discussion, we humbly proposed at the end of this manuscript a few non-constraining guidelines based on our experience retrieving the data.

      To emphasize this statement, we added the following sentence at the end of the “Guidelines for better sharing of MD simulation data” section (line 420):

      “Converging on a set of metadata and format requires a large consensus of different stakeholders from users, to MD program developers, and journal editors. It would be especially useful to organize specific workshops with representatives of all these communities to collectively tackle this specific issue.”

      In "Data and code availability" it would be good to specify licenses in addition to stating "open source". Thank you for pointing out that GitLab/GitHub are not archives and that everyone should be strongly encouraged to submit data to stable archival repositories.

      We added the corresponding licenses for code and data in the “Data and code availability” section.

      Reviewer #3 (Recommendations For The Authors)

      The paper is well written, with very few typographical or other minor errors.

      Minor points:

      Line 468-9 "can evolve being more user-friendly" should be "can evolve to being more user-friendly", I think.

      Thank you, we have changed the wording accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study reports on the packing of molecules in cellular compartments, such as actin-based protrusions. The study provides solid evidence for parameters that enable the building of a biophysical model of filopodia, which is required to gain a complete understanding of these important actin-based structures. Some areas of the manuscript require further clarification.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript proposes an alternative method by SDS-PAGE calibration of Halo-Myo10 signals to quantify myosin molecules at specific subcellular locations, in this specific case filopodia, in epifluorescence datasets compared to the more laborious and troublesome single molecule approaches. Based on these preliminary estimates, the authors developed further their analysis and discussed different scenarios regarding myosin 10 working models to explain intracellular diffusion and targeting to filopodia.

      Strengths:

      Overall, the paper is elegantly written and the data analysis is appropriately presented.

      Weaknesses:

      While the methodology is intriguing in its descriptive potential and could be the beginning of an interesting story, a good portion of the paper is dedicated to the discussion of hypothetical working mechanisms to explain myosin diffusion, localization, and decoration of filopodial actin that is not accompanied by the mandatory gain/loss of function studies required to sustain these claims.

      To be fair, the detailed mechanisms that we raise related to diffusion, localization, and decoration are based on extensive work by others. Many prior papers use domain deletions of Myo10 and fall in the category of gain/loss-of-function studies. It is true that we have not repeated those extensive studies, but it seems appropriate to connect with and cite their work where appropriate.

      Reviewer #2 (Public Review):

      Summary:

      The paper sought to determine the number of myosin 10 molecules per cell and localized to filopodia, where they are known to be involved in formation, transport within, and dynamics of these important actin-based protrusions. The authors used a novel method to determine the number of molecules per cell. First, they expressed HALO tagged Myo10 in U2OS cells and generated cell lysates of a certain number of cells and detected Myo10 after SDS-PAGE, with fluorescence and a stain-free method. They used a purified HALO tagged standard protein to generate a standard curve which allowed for determining Myo10 concentration in cell lysates and thus an estimate of the number of Myo10 molecules per cell. They also examined the fluorescence intensity in fixed cell images to determine the average fluorescence intensity per Myo10 molecule, which allowed the number of Myo10 molecules per region of the cell to be determined. They found a relatively small fraction of Myo10 (6%) localizes to filopodia. There are hundreds of Myo10 in each filopodium, which suggests some filopodia have more Myo10 than actin binding sites. Thus, there may be crowding of Myo10 at the tips, which could impact transport, the morphology at the tips, and dynamics of the protrusions themselves. Overall, the study forms the basis for a novel technique to estimate the number of molecules per cell and their localization to actin-based structures. The implications are broad also for being able to understand the role of myosins in actin protrusions, which is important for cancer metastasis and wound healing.

      Strengths:

      The paper addresses an important fundamental biological question about how many molecular motors are localized to a specific cellular compartment and how that may relate to other aspects of the compartment such as the actin cytoskeleton and the membrane. The paper demonstrates a method of estimating the number of myosin molecules per cell using the fluorescently labeled HALO tag and SDS-PAGE analysis. There are several important conclusions from this work in that it estimates the number of Myo10 molecules localized to different regions of the filopodia and the minimum number required for filopodia formation. The authors also establish a correlation between the number of filopodia-localized Myo10 molecules and the number of filopodia in the cell. Only a small percentage of Myo10 is tip-localized relative to the total amount in the cell, suggesting Myo10 has to be activated to enter the filopodia compartment. The localization of Myo10 is log-normal, which suggests that clustering of Myo10 is a feature of this motor.

      Weaknesses:

      One main critique of this work is that the Myo10 was overexpressed. Thus, the amount in the cell body compared to the filopodia is difficult to compare to physiological conditions. The amount in the filopodia was relatively small - 100s of molecules per filopodia so this result is still interesting regardless of the overexpression. However, the overexpression should be addressed in the limitations.

      This is a reasonable perspective and we now note this caveat in the Limitations section so that readers will take note. Our goal here was to understand a system in which Myo10 is the limiting reagent for filopodia, rather than a native system that expresses high Myo10 on its own. Because U2OS cells do not express detectable levels of Myo10 (see below), the natural perturbation here is overexpressing Myo10 to stimulate filopodial growth.

      The authors have not addressed the potential for variability in transfection efficiency. The authors could examine the average fluorescence intensity per cell and if similar this may address this concern.

      Indeed, cells are heterogeneous and will naturally express different levels of Myo10 not only due to transfection efficiency, but also due to their state (cell cycle stage, motile behavior, and more). In fact, we measure the transfection efficiency of each bioreplicate and account for it in our calibration procedure. We also measure the fluorescence intensity per cell, which lets us calculate the total Myo10s per cell and the cell-to-cell variability. These Myo10 distributions across cells are shown in Fig. 1D-E.

      We note here an error that we made in applying this transfection efficiency correction in the first submission. When we obtain the total Myo10 molecules by SDS-PAGE, we should divide by the total number of transfected cells. However, due to an operator precedence error, the transfection efficiency appeared in the numerator rather than the denominator. We have now corrected this error, which has the effect of increasing the number of molecules in all of our measurements. The effect of this correction has strengthened one of the paper’s main conclusions, that Myo10 is frequently overloaded at filopodial tips.
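      The kind of precedence slip described above can be illustrated with a minimal Python sketch. The variable names and numbers are hypothetical and are not taken from the study's analysis code; the point is only that `/` and `*` group left to right.

```python
# Hypothetical inputs, chosen only for illustration.
total_molecules = 1.0e9        # Myo10 molecules in the lysate, from the SDS-PAGE standard curve
cells_loaded = 1.0e5           # all cells that contributed to the lysate
transfection_efficiency = 0.5  # fraction of those cells expressing Halo-Myo10


def molecules_per_cell_buggy(total, cells, efficiency):
    # "/" and "*" have equal precedence and group left to right,
    # so this evaluates as (total / cells) * efficiency.
    return total / cells * efficiency


def molecules_per_cell_fixed(total, cells, efficiency):
    # Divide by the number of transfected cells, i.e. cells * efficiency.
    return total / (cells * efficiency)


buggy = molecules_per_cell_buggy(total_molecules, cells_loaded, transfection_efficiency)
fixed = molecules_per_cell_fixed(total_molecules, cells_loaded, transfection_efficiency)
print(buggy, fixed)
```

      Because the efficiency is a fraction below 1, the corrected expression is always larger than the buggy one (by a factor of 1/efficiency squared), which is consistent with the correction increasing the molecule counts.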

      The SDS PAGE method of estimating the number of molecules is quite interesting. I really like this idea. However, I feel there are a few more things to consider. The fraction of HALO tag standard and Myo10 labeled with the HALO tagged ligand is not determined directly. It is suggested that since excess HALO tagged ligand was added we can assume nearly 100% labeling. If the HALO tag standard protein is purified it should be feasible to determine the fraction of HALO tagged standard that is labeled by examining the absorbance of the protein at 280 and fluorophore at its appropriate wavelength.

      This is a fair point raised by the reviewer, and we have now measured a labeling efficiency of 90% in Supplementary Figure 2A-C. We have adjusted all values according to this labeling efficiency.

      The fraction of HALO tagged Myo10 labeled may be more challenging to determine, since it is in a cell lysate, but there may be some potential approaches (e.g. mass spec, HPLC).

      As noted, this value is considerably more challenging. Instead, we determined conditions under which labeling in cells is saturated. We have now stained with a concentration range for both fixed and live cell samples. Saturation occurs with ~0.5 μM HaloTag ligand-TMR in fixed/permeabilized cells and in live cells (Supplementary Figure 2D-E). This comparison of live cells vs. permeabilized cells allows us to say that the intact plasma membrane is not limiting labeling under these conditions.

      In Figure 1B, the stain free gel bands look relatively clean. The Myo10 is from cell lysates so it is surprising that there are not more bands. I am not surprised that the bands in the TMR fluorescence gel are clean, and I agree the fluorescence is the best way to quantitate.

      Figure 1B shows the focused view at high MW, and there is not much above Myo10. The full gel lanes shown in Supp. Fig. 1C show the expected number of bands from a cell lysate.

      In Figure 3C, the number of Myo10 molecules needed to initiate a filopodium was estimated. I wonder if the authors could have looked at live cell movies to determine that these events started with a puncta of Myo10 at the edge of the cell, and then went on to form a filopodia that elongated from the cell. How was the number of Myo10 molecules that were involved in the initiation determined? Please clarify the assumptions in making this conclusion.

      We thank the reviewer (and the other reviewers) for this excellent suggestion. We have now carried out these live cell experiments. These experiments were quite challenging, because we needed to collect snapshots of ~50 cells to measure the mean fluorescence intensity of transfected cells and then acquire movies of several cells for analysis. The U2OS cells were also highly temperature-sensitive and would retract their filopodia without objective heating.

      We have now analyzed filopodial initiation events and measured considerably more Myo10 at the first signs of accumulation, in the hundreds of molecules. The dimmer spots that we measured in the first draft were likely unrelated to filopodial initiation, and we have corrected the discussion on this point.

      We now also track further growth from a stable filopodial tip (the phased-elongation mechanism from Ikebe and coworkers) and find approximately 500 molecules bud off in those events. We also track filopodial elongation rates as a function of Myo10 numbers. We have added additional live cell imaging sections that include these results.

      It is stated in the discussion that the amount of Myo10 in the filopodia exceeds the number of actin binding sites. However, since Myo10 contains membrane binding motifs and has been shown to interact with the membrane, it should be pointed out that the excess Myo10 at the tips may be interacting with the membrane and not actin, which may prevent traffic jams.

      This is also an excellent point to consider, and we have expanded the relevant discussion along these lines. We agree that the Myo10 at the filopodial tip is likely membrane-bound. We now estimate the 2D membrane area occupied by Myo10, and find that it reaches nearly full packing in many cases (under a number of assumptions that we spell out more fully in the manuscript).

      Reviewer #3 (Public Review):

      Summary:

      The unconventional myosin Myo10 (aka myosin X) is essential for filopodia formation in a number of mammalian cells. There is a good deal of interest in its role in filopodia formation and function. The manuscript describes a careful, quantitative analysis of Myo10 molecules in U2OS cells, a widely used model for studying filopodia, how many are present in the cytosol versus filopodia and the distribution of filopodia and molecules along the cell edge. Rigorous quantification of Myo10 protein amounts in a cell and cellular compartment is critical for ultimately deciphering the cellular mechanism of Myo10 action as well as understanding the molecular composition of a Myo10-generated filopodium.

      Consistent with what is seen in images of Myo10 localization in many papers, the vast majority of Myo10 is in the cell body with only a small percentage (approximately 5%) present in filopodia puncta. Interestingly, Myo10 is not uniformly distributed along the cell edge, but rather it is unevenly localized along the cell edge with one region preferentially extending filopodia, presumably via localized activation of Myo10 motors. Calculation of total molecules present in puncta based on measurement of puncta size and measured Halo-Myo10 signal intensity shows that the concentration of motor present can vary from 3 - 225 μM. Based on an estimation of available actin binding sites, it is possible that Myo10 can be present in excess over these binding sites.

      Strengths:

      The work represents an important first step towards defining the molecular stoichiometry of filopodial tip proteins. The observed range of Myo10 molecules at the tip suggests that it can accommodate a fairly wide range of Myo10 motors. There is great value in studies such as this and the approach taken by the authors gives one good confidence that the numbers obtained are in the right range.

      Weaknesses:

      One caveat (see below) is that these numbers are obtained for overexpressing cells and the relevance to native levels of Myo10 in a cell is unclear.

      A similar concern was raised by Reviewer 2; please see above.

      An interesting aspect of the work is quantification of the fraction of Myo10 molecules in the cytosol versus in filopodia tips showing that the vast majority of motors are inactive in the cytosol, as is seen in images of cells. This has implications for thinking about how cells maintain this large population in the off-state and what is the mechanism of motor activation. One question raised by this work is the distinction between cytosolic Myo10 and the population found at the ‘cell edge’ and the filopodia tip. The cortical population of Myo10 is partially activated, so to speak, as it is targeted to the cortex/membrane and presumably ready to go. Providing quantification of this population of motors, that one might think of as being in a waiting room, could provide additional insight into a potential step-by-step pathway where recruitment or binding to the cortical region/plasma membrane is not by itself sufficient for activation.

      As mentioned in our response to Reviewer 2, we have now carried out quantitation in live cells to capture Myo10 transitions from cell body into filopodial movement. We attempted to identify this membrane-bound population of motors in our new live cell experiments but were unable to make convincing measurements. Notably, we see no noticeable enrichment of Myo10 at the cortex relative to the cytosol. Although we believe there is a membrane-bound waiting room (akin to the 3D-2D-1D mechanism of Molloy and Peckham), we suspect that the 2D population is diffusing too rapidly to be detected under our imaging conditions.

      Specific comments:

      (1) It is not obvious whether the analysis of numbers of Myo10 molecules in a cell that is ectopically overexpressing Myo10 is relevant for wild type cells. It would appear to be a significant excess based on the total protein stained blot shown in Fig S1E where a prominent band the size of tagged Myo10 seen in the transfected sample is almost absent in the WT control lane.

      Even “wildtype” cells vary considerably in their Myo10 expression levels. For example, melanoma cells often heavily upregulate Myo10, while these U2OS cells produce nearly none (Supplementary Figure 1E). Thus, there is no single, widely acceptable target for Myo10 expression in wildtype cells.

      Please note that the new Supplementary Figure 1E is a Myo10 Western blot, not total protein staining as before.

      Ideally, and ultimately an important approach, would be to work with a cell line expressing endogenously tagged Myo10 via genome engineering. This can be complicated in transformed cells that often have chromosomal duplications.

      Indeed, we chose U2OS cells for this work because they do not express detectable levels of Myo10, and thus we can avoid all of these complications. Here we can examine how Myo10 levels control filopodial production through ectopic expression.

      However, even though there is an excess of Myo10 it would appear that activation is still under some type of control as the cytosolic pool is quite large and its localization to the cell edge is not uniform. But it is difficult to gauge whether the number of molecules in the filopodium is the same as would be seen in untransfected cells. Myo10 can readily walk up a filopodium and if excess numbers of this motor are activated they would accumulate in the tip in large numbers, possibly creating a bulge, and indeed it does appear that some tips are unusually large. Then how would that relate to the normal condition?

      As noted above, the normal condition depends on the cellular system. However, endogenous Myo10 also accumulates in bulges at filopodial tips, so this is not a phenotype unique to Myo10 overexpression. For example, the images from Figure 1 of the Berg and Cheney (2002) citation show bulges from endogenous Myo10 in endothelial cells.

      (2) Measurements of the localization of Myo10 focuses in large part on ‘Myo10 punctae’. While it seems reasonable to presume that these are filopodia tips, the authors should provide readers with a clear definition of a puncta. Is it only filopodia tips, which seems to be the case? Does it include initiation sites at the cell membrane that often appear as punctae?

      We define puncta as any clusters/spots of Myo10 signal detected by segmentation, not limited to any location within the surface-attached filopodia. We exclude puncta that appear in the cell interior (~5 of which appear in Fig. 1A). These are likely dorsal filopodia, but there are few of these compared to the surface attached filopodia of U2OS cells. In Figure 2, “puncta” includes all Myo10 clusters along the filopodia shaft, though a majority happen to be tip-localized (please see Supplementary Figure 4B). We have edited the main text for clarification.

      Along those lines, the position of dim punctae along the length of a filopodium is measured (Fig 3D). The findings suggest that a given filopodium can have more than one puncta which seems at odds if a puncta is a filopodia tip. How frequently is a filopodium with two puncta seen? It would be helpful if the authors provided an example image showing the dim puncta that are not present at the tip.

      We have now provided an example image of dim puncta along filopodia in Supplementary Figure 4C.

      (3) The concentration of actin available to Myo10 is calculated based on the deduction from Nagy et al (2010) that only 4/13 of the actin monomers in a helical turn are accessible to the Myo10 motor (discussion on pg 9; Fig S4). Subsequent work (Ropars et al, 2016) has shown that the heads of the antiparallel Myo10 dimer are flattened, but the neck is rather flexible, meaning that the motor can have a variable reach (36 - 52 nm). Wouldn’t this mean that more actin could be accessible to the Myo10 motor than is calculated here?

      Although we see why the reviewer might believe otherwise, the 4/13 fraction of accessible actin holds. This fraction is obtained from consideration of the fascin-actin bundle structure alone, independent of the reach of any particular myosin motor. Every repeating layer of 13 actin subunits (or 36 nm) has 4 accessible myosin binding-sites. The remaining 9 sites are rejected because a single myosin motor domain will have a steric clash with a neighboring actin filament in the bundle. A myosin with an exceptionally long reach might reach the next 13 subunit layer, but that layer also has only 4 binding sites. Thus, we can calculate the number of binding sites per unit length along the filopodium. This number would hold for a dimeric myosin with any reach, including myosin-5 or myosin-2.
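      The counting argument above reduces to simple arithmetic, sketched below under stated assumptions. Only the figure of 4 accessible sites per 13-subunit (36 nm) repeating layer comes from this response; the function name and the example length are hypothetical, and the sketch counts sites for one run of consecutive layers along the bundle axis.

```python
ACCESSIBLE_SITES_PER_LAYER = 4  # from the text: 4 of 13 subunits per layer are accessible
LAYER_LENGTH_NM = 36.0          # from the text: one 13-subunit layer spans about 36 nm


def accessible_sites_along_length(length_nm):
    """Accessible myosin-binding sites contributed by consecutive layers over length_nm.

    Note that no motor "reach" parameter appears: a longer reach only moves the
    motor to another layer, which again offers 4 accessible sites.
    """
    sites_per_nm = ACCESSIBLE_SITES_PER_LAYER / LAYER_LENGTH_NM
    return sites_per_nm * length_nm


# Hypothetical example: a 360 nm stretch spans 10 layers.
print(accessible_sites_along_length(360.0))
```

      The density of sites per unit length is fixed by the bundle geometry alone, which is why the same number would apply to a dimeric myosin with any reach.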

      (4) Quantification of numbers of Myo10 molecules in filopodial puncta (Fig 3C) leads the authors to conclude that ‘only ten or fewer Myo10 molecules are necessary for filopodia initiation’ (pg 7, top). While this is reasonable based on the assumption that the formation of a puncta ultimately results from an initiation event, little is known about initiation events and without direct observation of coalescence of Myo10 at the cell edge that leads to formation of a filopodium, this seems rather speculative.

      As noted above, we have now performed the necessary live cell imaging of filopodial nucleation events and have updated our conclusions accordingly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have made a series of comments that might help the authors improve their manuscript:

      - A full calibration of the methodology would require testing a wider range of protein amounts, to exhaustively detect the dynamic range of the technique. The authors acknowledge in the discussion that “Furthermore, our estimates of molecules are predicated on the calibration curve of the Halo Standard Protein on the SDS-PAGE gels, which is likely the highest source of error on our molecule counts”. A good way of convincing a nasty reviewer is to provide a calibration with more than 3 reference points. At least this will help exclude from the analysis cells where Myo10 estimates are not in the linear regime of detection.

      We completely agree with the reviewer’s suggestion to build a robust calibration curve. The SDS gel shown in Figure 1C originally contained 4 reference points, but the highest HaloTag standard protein point oversaturated the detector at the set exposure in the TMR channel and was omitted. We have now re-run the SDS gel to include a HaloTag standard protein curve comprising 5 points, alongside all three bioreplicates from the fixed cell experiments and all three bioreplicates from the live cell experiments (updated in Figure 1B-C). We had saved frozen lysates from the original fixed cell work, so we were able to reanalyze our data with the new set of standards. The Myo10 quantities are consistent, but with much tighter CIs from the standard curve.
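      A standard curve of this kind can be sketched as a least-squares line through the standards, which is then inverted for the unknown lanes. The snippet below is only an illustration: the five standard amounts, the band intensities, and the function names are hypothetical, and a simple unweighted linear fit is assumed rather than taken from the manuscript's Methods.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_intensity(intensity, slope, intercept, x_min, x_max):
    """Invert the standard curve; reject bands outside the calibrated range."""
    amount = (intensity - intercept) / slope
    if not (x_min <= amount <= x_max):
        raise ValueError("band falls outside the range covered by the standards")
    return amount


# Hypothetical five-point curve: ng of HaloTag standard vs. TMR band intensity.
standard_ng = [1.0, 2.0, 4.0, 8.0, 16.0]
standard_signal = [105.0, 205.0, 405.0, 805.0, 1605.0]

slope, intercept = fit_line(standard_ng, standard_signal)
unknown_ng = amount_from_intensity(
    505.0, slope, intercept, min(standard_ng), max(standard_ng)
)
print(slope, intercept, unknown_ng)
```

      Rejecting bands that fall outside the span of the standards is one way to keep the analysis within the calibrated range, in the spirit of the reviewer's comment about the linear regime of detection.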

      - As already said, this methodology is intriguing; however, a correlative validation with a conventional SMLM approach to establish the reliability of the method would be ideal.

      Unfortunately, single molecule approaches for validation are impractical for us. Due to the relatively high magnification of our TIRF microscope and the large spread area of the U2OS cells, single cells typically extend beyond the field of view. We acknowledge the benefits of SMLM quantitative techniques and other approaches cited in the introduction section. To avoid use of special tools/instruments, we offer our methodology, based on the Pollard group’s quantitative Western blotting of GFP, as a simpler alternative accessible to anyone.

      - TMR is a small ligand likely interacting also with Halo in its denatured state. However, to clear any doubts a parallel Native-PAGE investigation should be included, or if existing a specific reference should be provided.

      Perhaps there is a misunderstanding here. One of the key advantages of the HaloTag labeling system is that the engineered dehalogenase is covalently modified by the ligand (the TMR-ligand is a suicide substrate). This means that the TMR remains bound even under denaturing conditions, which allows its detection in SDS-PAGE. Native gels are unnecessary here.

      - Moreover, SDS-PAGE is run at alkaline pH, have the authors considered these points when designing the methodology? Fluorescence images were taken in PBS, which has a different pH. Could the authors, or the literature, exclude these aspects as potential pitfalls in the methodology? Also temperature is affecting fluorescence emission, but it is easier to control with certain tolerance in the room-temperature regime.

      Our method does not compare fluorescence values that cross the experimental systems (SDS-PAGE vs. microscopy). Cellular proteins and HaloTag protein standards are compared in a single setting of SDS-PAGE to obtain the average number of Myo10s per transfected cell. Likewise, all measurements on intact (live or fixed) cells are conducted in that single setting to obtain average fluorescence per cell. Thus, there is no issue with the different buffers or temperatures affecting fluorescence emission.

      - The authors should test their approach also with truncation variants of Myosin10 (for instance lacking the PH or motor domain). This is a classical approach that might prove the potential of the technique when altering the capacity of the protein to interact with a main binding partner. Also, treatments that induced filopodia formation might prove useful (i.e., hypotonic media induce filopodia formation in some fibroblast cell lines in our hands).

      The reviewer raises interesting suggestions that we aim to address in future experiments, but truncation variants and environmental perturbations are beyond the focus of the current manuscript. Here, we report on the otherwise unperturbed state when we add exogenous full-length Myo10 to the U2OS cells. But indeed, experiments with Myo10 domain truncations, PI3K and PTEN inhibition, and cargo protein / activating cofactor knock-downs (among others) are on our drawing board.

      - Most of the mechanisms hypothesized in the discussion are sound and plausible. However, the authors have chosen an experimental model where transient transfection of exogenous Myo10 in U2OS is performed. This approach poses two main and fundamental questions that are not resolved by the data provided:

      A) how do different expression levels affect the Myo10 counting?

      Our counting procedure does not assume uniform expression across a population of cells– quite the opposite, in fact. We directly measure Myo10 expression levels on a cell-by-cell basis with microscopy, once we know the number of molecules in our total pool (see the Methods for details). As an example of the final output, Figs. 1D and 1E show the total number of Myo10 molecules per cell for fixed and live cells, respectively.

      B) how does endogenous, unlabeled Myo10 compromise the reliability of the counts? The authors claimed “U2OS cells express low levels of Myo10, so there is a small population of unlabeled endogenous Myo10 unaddressed by this paper”. As presented, the low level of endogenous Myo10 sounds like an arbitrary parameter, and no data are presented that can limit, if not exclude, this bias in the analysis. Producing data in a genetically modified cell line with a HaloTag on the endogenous protein would represent a much cleaner system. Alternatively, the authors should look for Myo10 KO cell lines in which they can back-transfect their Halo-tagged Myo10 construct in a more consistent framework, focusing on cells with low-to-mid levels of expression.

      We agree, this is an important point to nail down (and is often neglected in the literature). We have now measured the endogenous Myo10 levels in U2OS cells by Western blotting and found that it is undetectable compared to our HaloTagged construct expression. Please see Supp. Fig 1E. Thus, for all intents and purposes, every Myo10 molecule in these experiments came from our expression plasmid. Accordingly, we have removed this caveat from the paper.

      Minor points

      - Figure 1B. To help the reader SDS-PAGE gels annotations should be clearer already from the figure.

      We have updated the annotations for clarity.

      - Methods should be organized in sections. As it stands, it is hard for the reader to look for technical details.

      We have expanded and added subsections to the Methods as requested.

      - The good practice of indicating the gene and transcript entry numbers and the primer used to amplify and clone into the backbone vectors is getting lost in many papers. I would strongly encourage the authors to add this information to the methods.

      We have included the gene entries to the methods and will include a full FASTA file of the coding sequence as supplementary information to avoid any ambiguity here.

      The authors write “It is unclear how myosins navigate to the right place at the right time, but our results support an important interplay between Myo10 and the actin network.” It is a bit scholastic to say that Myo10 and actin have an important interplay, they are major binding partners. What is the new knowledge contained in this sentence?

      Agreed– we have deleted the sentence in question.

      Reviewer #2 (Recommendations For The Authors):

      The authors should address all the weaknesses indicated in the public review.

      There were a few other places that require clarification.

      On page 4, the last paragraph. It is stated that the targeting of Myo10 was reported/proposed based on previous work (ref 31). The next few sentences are not referenced and thus likely refer to ref 31. The authors did not measure the parameters discussed in these sentences, so it is important to clarify that they are referring to previous work and not the current study.

      Indeed, the next few sentences still refer to old reference 31, so we have now edited the paragraph for clarity.

      On page 7, the reference to Figure 3A indicates a trend of higher Myo10 correlating with more filopodia. However, the reference to Figure 3B indicates that total intracellular Myo10 only weakly correlates with more filopodia, and the x-axis on Figure 3B is filopodia molecules, not the intracellular Myo10. Please clarify.

      We appreciate the reviewer for catching our mistake. Those plots are now in Fig. 2 and have been edited accordingly.

      Reviewer #3 (Recommendations For The Authors):

      The Discussion of results at the end of each section is rather brief and could be expanded on a bit more.

      Previously, we were operating under the constraints of an eLife Short Report. We have now expanded the discussion for a full article.

      The authors mention that actin filaments at the tips of filopodia could be frayed, citing Medalia et al, 2007 (ref 40). That paper describes an early cryoEM analysis of filopodia from the amoeba Dictyostelium. EM images of mammalian filopodia tips, e.g. Svitkina et al, 2003, JCB, do not show quite the same organization of actin as seen in the Dictyostelium filopodia tips. However, recent work from the Bershadsky lab, Li et al, 2023, presents a few cryoEM images of tips of left-bent filopodia that are tightly adhered to a substrate and there it looks like actin filaments become disorganized in tips, along with membrane bulging. The authors should consider expanding discussion of the filopodia tips to take into account what is known for mammalian filopodia.

      We thank the reviewer for bringing these enlightening papers to our attention. We have now included these citations in the discussion.

      Fig 1D - The x-axis is a bit odd, it goes from 0 then to 2.5e+06 with no indication of the bin size. Can this be re-labelled or the scale displayed a bit differently?

      We have double-checked the axis breaks, which are large because the underlying values are large. We have also provided the bin size as requested for all histograms.

      Fig 4A - What is the bin size for the histogram?

      As above, we have now updated the figure legends (now in Fig. 3) to include the bin size.

      Methods -

      - Please provide an accession number for the Myo10 nucleotide sequence used for this work as there are at least two known isoforms.

      Thank you for noting this. We are using the full-length, not the headless isoform. We have now updated the Methods accordingly.

      - No mention is made of the SDS sample buffer used, was that also added to the sample?

      We have now updated the Methods accordingly.

      - How are samples boiled at 70 deg C? Do the authors actually mean ‘heated’?

      Indeed. We have now corrected “boiled” to “heated.”

      - Could the authors please briefly explain the connected component analysis used to identify filopodia?

      We have now updated the Methods accordingly.

      - The intensity of filopodia was determined by dividing tip intensity by the total bioreplicate sum of intensities then multiplying it by the total pool, if this reviewer understands correctly. It sounds like intensities are being averaged across a whole cell population instead of cell-by-cell. Is that correct? If so, can the authors please provide the underlying rationale for this? If not, then please better describe what was actually done.

      We apologize for the confusion. Intensities are being averaged (summed) across a whole cell population, but importantly that step is only used to obtain a scale factor that converts the fluorescence signal at the microscope to the number of molecules. We then use that scale factor for all cells imaged in the bioreplicate, to both 1) find the total Myo10 in that cell, and 2) find the total amount of that Myo10 in any given location within that cell.

      To further clarify, each bioreplicate has a known total number of Myo10 molecules associated with the number of cells loaded onto the SDS gel. From the SDS gel, we have an average number of Myo10 molecules per positively transfected cell. If 50 cell images are analyzed, then there is a Myo10 ‘total pool’ of (50 cells) * (average Myo10 molecules/cell). The fluorescence signal intensities in microscopy were summed for all cells within the bioreplicate (50 cells in this example). However, due to variation in expression, not every cell has the same signal intensity when imaged under the same conditions. It would be inaccurate to assume each cell contains the average Myo10 molecules/cell. Therefore, to get the number of molecules within a given Myo10 cell (or punctum), the summed cell (punctum) intensity was divided by the bioreplicate fluorescence signal intensity sum and multiplied by ‘total pool.’
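
      The conversion described above can be written out as a short calculation. The sketch below is illustrative only: the function name, the intensity values, and the average of 1,000,000 molecules per cell are hypothetical and are not taken from the manuscript or from the authors' scripts.

```python
def molecules_from_intensity(cell_intensities, avg_molecules_per_cell):
    """Convert summed fluorescence per cell into estimated molecule counts.

    cell_intensities: summed microscopy intensity of each analyzed cell
        in one bioreplicate (arbitrary units).
    avg_molecules_per_cell: average molecules per transfected cell,
        as obtained from the SDS-PAGE calibration.
    """
    if not cell_intensities:
        raise ValueError("need at least one cell")
    # 'Total pool' = (number of cells imaged) * (average molecules per cell)
    total_pool = len(cell_intensities) * avg_molecules_per_cell
    intensity_sum = sum(cell_intensities)
    if intensity_sum <= 0:
        raise ValueError("summed intensity must be positive")
    # Scale factor: molecules per unit of fluorescence intensity
    scale = total_pool / intensity_sum
    counts = [intensity * scale for intensity in cell_intensities]
    return scale, counts


# Hypothetical bioreplicate with three cells of unequal expression
scale, counts = molecules_from_intensity([100.0, 300.0, 600.0], 1_000_000)

# The same scale factor applies to any sub-region, e.g. one punctum
punctum_molecules = 12.5 * scale
```

      Because the scale factor is defined per bioreplicate, the per-cell estimates always sum to the total pool, while individual cells keep their own expression level.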

      - The authors quantify Myo10 protein amounts by western blotting using Halo tag fluorescence, a method that should provide good accuracy. The results depend on the transfection efficiency and it is rarely the case that it is 100%. The authors state that they use a ‘value correction for positively transfected cells’ (pg 11). It is likely that there was a range of expression levels in the cells, how was a cut-off for classifying a cell as non-expressing determined or set?

      As described in the Methods, “microscopy was used to count the percentage of transfected cells from ~105-190 randomly surveyed cells per bioreplicate.” Cells were labeled and located with DAPI. If no TMR signal could be visually detected by microscopy, then the cell was deemed to be non-Myo10 expressing. We did not set a cutoff fluorescence value, as untransfected cells have no detectable signal. Please see Supplementary Figure 1F for examples.
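
      One way to picture the correction for transfection efficiency is sketched below. The formula is an assumption about how such a correction is usually applied (dividing the gel-derived total by the number of expressing cells only); the function name and all numbers are hypothetical and do not come from the manuscript.

```python
def molecules_per_transfected_cell(total_molecules, cells_loaded,
                                   n_positive, n_surveyed):
    """Average molecules per expressing cell, corrected for efficiency.

    total_molecules: molecules in the gel lane, from the protein standards.
    cells_loaded: number of cells whose lysate was loaded in that lane.
    n_positive / n_surveyed: cells with visible signal out of the cells
        surveyed by microscopy in the same bioreplicate.
    """
    if n_surveyed <= 0 or n_positive <= 0:
        raise ValueError("need surveyed cells and at least one positive cell")
    if n_positive > n_surveyed:
        raise ValueError("positives cannot exceed surveyed cells")
    # Estimated number of expressing cells among those loaded on the gel
    transfected_cells = cells_loaded * n_positive / n_surveyed
    return total_molecules / transfected_cells


# Hypothetical example: 60 of 150 surveyed cells were positive (40%)
average = molecules_per_transfected_cell(6e10, 100_000, 60, 150)
```

      Without the correction, the same lane would be averaged over all 100,000 loaded cells and the per-cell number would be underestimated by the inverse of the transfection efficiency.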

      - “In-house Python scripts” are used for image analysis. Will these be made publicly available?

      Yes, we will package these up on GitHub.

    1. Author response:

      a) that the investigation is very interesting and inventive, and has the potential to reveal some novel insights.

      We thank the reviewers and are excited to improve upon the manuscript through their suggestions.

      b) that the problem of temporal autocorrelation in the fMRI and behavioral data has not been dealt with clearly and convincingly

      We agree that convincingly accounting for fMRI temporal autocorrelation is important to our claims. To reduce its effects, we used field-standard methods: prewhitening and autocorrelation modeling with SPM’s FAST algorithm (shown by Olszowy et al. 2019 to be superior to SPM’s default setting), as well as a high-pass filter with a 128 s cutoff. There is still some first-order autocorrelation structure present across voxels in the left hippocampal beta series: across participants there is slightly positive autocorrelation between the betas of decision trials on successive trials, which decays to ~0 at subsequent lags. We note that our task is a narrative, and some patterns over time are expected; instead of attempting to fully eliminate all temporal structure in the data, we aim to show that the temporal distance between trials is unlikely to explain our effects.
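
      For readers unfamiliar with the quantity being described, the lag-k autocorrelation of a trial-wise beta series can be computed as in the sketch below. This is the standard sample estimator applied to a single hypothetical series; the response does not state which estimator was used, nor how values were pooled across voxels and participants.

```python
def lag_autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at a given positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("series is constant")
    # Covariance between the series and a copy of itself shifted by `lag`
    covariance = sum((series[t] - mean) * (series[t + lag] - mean)
                     for t in range(n - lag))
    return covariance / variance


# Hypothetical beta series that alternates sign on every trial
betas = [1, -1, 1, -1, 1, -1, 1, -1]
r1 = lag_autocorrelation(betas, 1)   # strongly negative
r2 = lag_autocorrelation(betas, 2)   # strongly positive
```

      A profile like the one reported above would show a small positive value at lag 1 and values near zero at longer lags.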

      In the within versus between social dimension representational similarity analysis, the average temporal distance between trials is the same within and between dimensions. The clustering analysis is a between subject analysis about individual differences–and the same overall temporal structure is experienced by all participants.

      The trajectory analysis does not focus on consecutive trials across characters, but rather on consecutive trials within characters, where the time gap between successive trials is relatively large and highly variable. An average of over a minute of time elapses between successive decision trials for a given character (versus ~20 seconds across characters), which is on average almost 11 narrative slides and 3 decision trials. Across characters, the temporal gap between decision trials ranges from 12 seconds to more than 10 minutes, reducing the likelihood that temporal autocorrelation drives character-related estimates. We also highlight the shuffled choices control model, which shares the same temporal autocorrelation structure as the model of interest but had significantly poorer social location decoding–a strong indication that temporal autocorrelation alone can’t explain these results. For each participant, we shuffled their choices and re-computed trajectories that preserved the origin and end locations but produced different locations along the way. Our model decoded location significantly better than this null model, and this difference in performance can't be explained by differences in temporal autocorrelation in the neural or behavioral data.
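
      The logic of the shuffled-choices control can be illustrated with a toy example. The sketch assumes that a character's trajectory is the running sum of two-dimensional steps produced by successive choices; that assumption, the step values, and the function names are illustrative and are not taken from the authors' code.

```python
import random


def trajectory(steps, origin=(0, 0)):
    """Cumulative 2D locations visited after each choice."""
    x, y = origin
    path = [(x, y)]
    for dx, dy in steps:
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def shuffled_trajectory(steps, rng):
    """Null trajectory: same choices, random order."""
    shuffled = list(steps)
    rng.shuffle(shuffled)
    return trajectory(shuffled)


# Hypothetical choices for one character: (dimension 1, dimension 2) steps
steps = [(1, 0), (0, 1), (-1, 0), (0, 1), (1, 0), (0, -1)]
observed = trajectory(steps)
null = shuffled_trajectory(steps, random.Random(0))
```

      Because addition is order-independent, every shuffle starts at the origin and ends at the observed end location; only the intermediate locations differ, which is the property the control relies on.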

      In the revision, we will further address this concern. For example, we will report more details on the task structure to aid in interpretation and will more precisely characterize the temporal autocorrelation profile. Where appropriate, we will also improve on and/or add more control analyses that preserve the autocorrelation structure.

      c) that a number of important interesting questions have not been addressed: Are the differences between social partners encoded in the hippocampus? Are the social dimensions encoded in a consistent manner across social partners?

      We believe that we should be able to decode other interesting task- and relationship-related features from the hippocampal patterns, as suggested by the reviewers. In the revision, we will attempt several such analyses, while taking care to control for temporal autocorrelation.

      d) that the cluster analysis in the brain-behavior correlation analysis is not well motivated or validated and should be clarified.

      We agree with the reviewers that this clustering analysis should be better described and validated. We aimed to ask whether less diverse and distinctive cognitive representations of the relationship trajectories relate to smaller real-world social networks. This question of impoverished cognitive maps was first raised by Edward Tolman; we think it is relevant here, as well. In the revision, we will clarify its motivations and implications, and better evaluate it for its robustness. Here, we address a few comments made by the reviewers.

      Reviewer 2 noted that other analyses could be used to ask whether social cognitive map complexity relates to real-world social network complexity. While the proposed alternatives are interesting (e.g., correlating decoding accuracy with social network size), we believe these analyses ask different questions. The current co-clustering analysis was intended to estimate map complexity jointly from the behavioral and neural signatures of the social map across characters. In contrast, the spline location decoding is within character; the accuracy of this decoding does not say much about representations across characters. And although we think character decoding is an interesting possible addition to this manuscript, its accuracy may reflect other aspects of the relationships, beyond just spatial representation. Thus, we will provide a clearer and better validated version of the current analysis to address this question.

      We would also like to clarify that we did not collect the Social Network Index questionnaire in the Initial sample; as such these results are more tentative than the other analyses, due to the inability to confirm them in a separate sample. Reviewer 2 also suggests that a single outlier could drive this effect; but estimating the effect with robust regression also returns a right-tailed p < 0.05, showing that the relationship is robust to outliers.

      References

      Olszowy, W., Aston, J., Rua, C. & Williams, G.B. Accurate autocorrelation modeling substantially improves fMRI reliability. Nature Communications. (2019).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides important new insights into how multisensory information is processed in the lateral cortex of the inferior colliculus, a poorly understood part of the auditory midbrain. By developing new imaging techniques that provide the first optical access to the lateral cortex in a living animal, the authors provide convincing in vivo evidence that this region contains separate subregions that can be distinguished by their sensory inputs and neurochemical profiles, as suggested by previous anatomical and in vitro studies. Additional information and analyses are needed, however, to allow readers to fully appreciate what was done, and the comparison of multisensory interactions between awake and anesthetized mice would benefit from being explored in more detail.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors provide a characterisation of auditory responses (tones, noise, and amplitude-modulated sounds) and bimodal (somatosensory-auditory) responses and interactions in the higher-order lateral cortex (LC) of the inferior colliculus (IC) and compare these characteristics with the higher-order dorsal cortex (DC) of the IC - in awake and anaesthetised mice. Dan Llano's group has previously identified GABAergic patches (modules) in the LC distinctly receiving inputs from somatosensory structures, surrounded by matrix regions receiving inputs from the auditory cortex. They here use 2P calcium imaging combined with an implanted prism to - for the first time - get functional optical access to these subregions (modules and matrix) in the lateral cortex of IC in vivo, in order to also characterise the functional difference in these subparts of LC. They find that both DC and LC of both awake and anaesthetised mice appear to be more responsive to more complex sounds (amplitude-modulated noise) compared to pure tones and that under anesthesia the matrix of LC is more modulated by specific frequency and temporal content compared to the GABAergic modules in LC. However, while both LC and DC appear to have low-frequency preferences, this preference for low frequencies is more pronounced in DC. Furthermore, in both awake and anesthetized mice, somatosensory inputs are capable of driving responses on their own in the modules of LC, but very little (possibly not at all) in the matrix. However, bimodal interactions may be different in the awake and anesthetized states in LC, which warrants deeper investigation by the authors: They find, under anesthesia, more bimodal enhancement in modules of LC compared to the matrix of LC and bimodal suppression dominating the matrix of LC. In contrast, under awake conditions bimodal enhancement is almost exclusively found in the matrix of LC, and bimodal suppression dominates both matrix and modules of LC.

      The paper provides new information about how subregions with different inputs and neurochemical profiles in the higher-order auditory midbrain process auditory and multisensory information, and is useful for the auditory and multisensory circuits neuroscience community.

      Strengths:

      The major strength of this study is undoubtedly the fact that the authors for the first time provide optical access to a subcortical region (the lateral cortex of the inferior colliculus, i.e. higher-order auditory midbrain) which we know (from previous work by the same group) has optically identifiable subdivisions with unique inputs and neurotransmitter release, and plays a central role in auditory and multisensory processing. A description of basic auditory and multisensory properties of this structure is therefore very useful for understanding auditory processing and multisensory interactions in subcortical circuits.

      Weaknesses:

      I have divided my comments about weaknesses and improvements into major and minor comments, all of which I believe are addressable by the authors to provide a clearer picture of their characterisation of the higher-order auditory midbrain.

      Major comment:

      (1) The differences between multisensory interactions in LC in anaesthetised and awake preparations appear to be qualitatively different, though the authors claim they are similar (see also minor comment related to figure 10H for further explanation of what I mean). However, the findings in awake and anaesthetised conditions are summarised differently, and plotting of similar findings in the awake figures and anaesthetised figures are different - and different statistics are used for the same comparisons. This makes it very difficult to assess how multisensory integration in LC is different under awake and anaesthetised conditions. I suggest that the authors plot (and test with similar statistics) the summary plots in Figure 8 (i.e. Figure 8H-K) for awake data in Figure 10, and also make similar plots to Figures 10G-H for anaesthetised data. This will help the readers understand the differences between bimodal stimulation effects on awake and anaesthetised preparations - which in its current form, looks very distinct. In general, it is unclear to me why the awake data related to Figures 9 and 10 is presented in a different way for similar comparisons. Please streamline the presentation of results for anaesthetised and awake results to aid the comparison of results in different states, and explicitly state and discuss differences under awake and anaesthetised conditions.

      We thank the reviewer for the valuable suggestion. We highlighted only the similarities between the data obtained from anesthetized and awake preparations to indicate that the technique can be reproduced in awake animals for future assessment. Those similarities were identified by comparing modules vs matrix or LC vs DC within each experimental setup (awake vs anesthetized). Therefore, the statistics were chosen differently for each setup based on the number of subjects (n) within each experimental preparation. However, we agree with the reviewer’s comment that there are differences between the anesthetized and awake data. To examine these differences, we ran the same statistics for Figure 5 (tonotopy of LC vs. DC-anesthetized animals) and Figure 9 (tonotopy of LC vs DC-awake animals). In addition, we added a new figure after Figure 9 to separate the statistical analysis from the maps. Accordingly, Figures 4 and 5 (maps and analysis, respectively -anesthetized animals) now match Figures 9 and 10 (maps and analysis, respectively – awake animals). We also did the same for Figures 7 (microprism imaging of the LC - anesthetized animals) and 8 (imaging of the LC from the dorsal surface - anesthetized animals), as well as Figure 11 or old Figure 10 (microprism imaging of the LC - awake animals), to address the similarities and differences of the multisensory data between awake and anesthetized animals. We edited the text accordingly in the results and discussion sections.

      (2) The claim about the degree of tonotopy in LC and DC should be aided by summary statistics to understand the degree to which tonotopy is actually present. For example, the authors could demonstrate that it is not possible/or is possible to predict above chance a cell's BF based on the group of other cells in the area. This will help understand to what degree the tonotopy is topographic vs salt and pepper. Also, it would be good to know if the GABAergic modules have a higher propensity of particular BFs or tonotopic structure compared to matrix regions in LC, and also if general tuning properties (e.g. tuning width) are different from the matrix cells and the ones in DC.

      Thank you for the reviewer’s suggestion. We have examined the tonotopy of LC and DC using two regression models (linear and quadratic polynomial) between the BFs of the cells and their location on the anatomical axis. Tonotopy is therefore indicated by a significant regression fit with a high R² between the BFs of the cells and their location within each structure. For the DC, there was a significant regression fit between the BFs of the cells and their locations over the rostromedial to caudolateral axis. Additionally, the R² of the quadratic polynomial fit was higher than that of the linear fit, which indicates a nonlinear distribution of cells based on their BFs and is consistent with the presence of high-low-high tuning over the DC surface. Given that the microprism cannot image the whole area of the LC, and that it images a slightly different area in each animal, it was very difficult to get a consistent map for the LC as well as a solid conclusion about the LC tonotopy. However, we have examined the regression fit between the BFs of cells and their location along the four main anatomical axes of the field of view obtained from each animal (dorsal to ventral), (rostral to caudal), (dorsocaudal to ventrorostral), (dorsorostral to ventrocaudal). Unlike the DC, the LC imaged via microprism showed a lower R² for both linear and quadratic regression, mostly in the dorsoventral axis. We show the fitting curves of these regressions in Figure 4-figure supplement 1 (anesthetized data) and Figure 9-figure supplement 1 (awake data). Despite the inconsistent tonotopy of the LC imaged via microprism, the modules were found to have a higher median BF (10 kHz) than the matrix (7.1 kHz), which was consistent across the anesthetized and awake animals. We have added these results in the corresponding spot in the results section (lines 193-197 and 361-364).
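
      The comparison between a linear and a quadratic fit can be sketched as follows. The positions and BF values are invented to mimic a high-low-high arrangement, the function names are hypothetical, and the significance testing mentioned in the response is not reproduced here.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    m = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * m
    for row in reversed(range(m)):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def r_squared(xs, ys, degree):
    """Coefficient of determination of a polynomial fit of given degree."""
    coeffs = polyfit(xs, ys, degree)
    preds = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical cells: position along one axis vs. BF (arbitrary log units)
position = [-3, -2, -1, 0, 1, 2, 3]
best_freq = [9.1, 3.9, 1.2, 0.1, 0.8, 4.2, 8.9]
r2_linear = r_squared(position, best_freq, 1)
r2_quadratic = r_squared(position, best_freq, 2)
```

      With least-squares fitting, a quadratic model can never have a lower R² than the nested linear model, so it is the size of the improvement (here from near 0 to near 1) that points to a high-low-high arrangement rather than the sign of the difference alone.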

      We have examined the tuning width using the binarized receptive field sum (RFS) method, in which each neuron is given a value of 1 if it responds to a single frequency (narrow RF), and this value increases if the neuron responds to more neighboring frequencies (wide RF). We did this calculation across all the sound levels. Both DC and LC of the anesthetized animals had a higher RFS mean and median than those of awake animals, given that ketamine is known to broaden the RF. However, in both preparations (anesthetized and awake), the DC had a higher RFS mean than the LC, which could be consistent with the finding that the DC had a relatively lower SMI than the LC. To show these new data, we made a new Figure 10-figure supplement 1, and we edited the text accordingly [lines 372-379 & 527-531].
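
      A minimal sketch of a binarized receptive field sum is given below. It assumes that each frequency-level combination is scored 1 when the response exceeds a threshold and that the scores are summed over all frequencies and levels; the threshold, the response values, and the function name are hypothetical, and the authors' actual responsiveness criterion may differ.

```python
def receptive_field_sum(responses, threshold):
    """Count frequency-level combinations with a supra-threshold response.

    responses: list of rows, one per sound level; each row holds the
        response amplitude to each tested frequency.
    threshold: amplitude above which a response counts as present.
    """
    return sum(1 for level in responses for amp in level if amp > threshold)


# Hypothetical cells: 3 sound levels (rows) x 5 frequencies (columns)
narrow_cell = [
    [0.0, 0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 1.1, 0.0, 0.0],
    [0.0, 0.1, 1.4, 0.1, 0.0],
]
broad_cell = [
    [0.0, 0.6, 0.9, 0.7, 0.0],
    [0.6, 0.9, 1.2, 0.8, 0.6],
    [0.9, 1.1, 1.5, 1.2, 0.8],
]
rfs_narrow = receptive_field_sum(narrow_cell, 0.5)
rfs_broad = receptive_field_sum(broad_cell, 0.5)
```

      Under this scoring, a cell that responds to one frequency at every level scores the number of levels tested, and broader tuning raises the score.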

      (3) Throughout the paper more information needs to be given about the number of cells, sessions, and animals used in each panel, and what level was used as n in the statistical tests. For example, in Figure 4 I can not tell if the 4 mice shown for LC imaging are the only 4 mice imaged, and used in the Figure 4E summary or if these are just examples. In general, throughout the paper, it is currently not possible to assess how many cells, sessions, and animals the data shown comes from.

      Thank you for the reviewer’s comment. We do apologize for not adding this information. We added all the information regarding the size of the statistical subjects (number of cells or number of animals used) for every test outcome. To keep the flow of the text, we added the details of the statistical tests in the legends of the figures.

      (4) Throughout the paper, to better understand the summary maps and plots, it would be helpful to see example responses of the different components investigated. For example, given that module cells appear to have more auditory offset responses, it would be helpful to see what the bimodal, sound-only, and somatosensory responses look like in example cells in LC modules. This also goes for just general examples of what the responses to auditory and somatosensory inputs look like in DC vs LC. In general example plots of what the responses actually look like are needed to better understand what is being summarised.

      Thank you for the reviewer’s comment and suggestion. We modified Figure 6 and the text accordingly to include all the significant examples of cells discussed throughout the work.

      Reviewer #2 (Public Review):

      Summary:

      The study describes differences in responses to sounds and whisker deflections as well as combinations of these stimuli in different neurochemically defined subsections of the lateral and dorsal cortex of the inferior colliculus in anesthetised and awake mice.

      Strengths:

      The main achievement of the work lies in obtaining the data in the first place as this required establishing and refining a challenging surgical procedure to insert a prism that enabled the authors to visualise the lateral surface of the inferior colliculus. Using this approach, the authors were then able to provide the first functional comparison of neural responses inside and outside of the GABA-rich modules of the lateral cortex. The strongest and most interesting aspects of the results, in my opinion, concern the interactions of auditory and somatosensory stimulation. For instance, the authors find that a) somatosensory-responses are strongest inside the modules and b) somatosensory-auditory suppression is stronger in the matrix than in the modules. This suggests that, while somatosensory inputs preferentially target the GABA-rich modules, they do not exclusively target GABAergic neurons within the modules (given that the authors record exclusively from excitatory neurons we wouldn't expect to see somatosensory responses if they targeted exclusively GABAergic neurons), and that the GABAergic neurons of the modules (consistent with previous work) preferentially impact neurons outside the modules, i.e. via long-range connections.

      Weaknesses:

      While the findings are of interest to the subfield they have only rather limited implications beyond it. The writing is not as precise as it could be. Consequently, the manuscript is unclear in some places. For instance, the text is somewhat confusing as to whether there is a difference in the pattern (modules vs matrix) of somatosensory-auditory suppression between anesthetized and awake animals. Furthermore, there are aspects of the results which are potentially very interesting but have not been explored. For example, there is a remarkable degree of clustering of response properties evident in many of the maps included in the paper. Taking Figure 7 for instance, rather than a salt and pepper organization we can see auditory responsive neurons clumped together and non-responsive neurons clumped together and in the panels below we can see off-responsive neurons forming clusters (although it is not easy to make out the magenta dots against the black background). This degree of clustering seems much stronger than expected and deserves further attention.

      Thank you for the reviewer’s comment. We do apologize if some areas in the manuscript were imprecisely written. For anesthetized and awake data, we have only emphasized the similarities between the two setups to show the ability to use microprism in awake animals for future assessment. To highlight the differences between anesthetized and awake animals, we have now run uniform statistics for all the data collected from both setups. Accordingly, we have edited Figures 4 and 5 (tonotopy-anesthetized) to match Figures 9 and new Figure 10 (tonotopy-awake). Also, we edited Figures 7 and 8 (multisensory- anesthetized) to match Figure 11 or old Figure 10 (multisensory- awake). We edited the text accordingly in the results section and discussed the possible differences between anesthetized and awake data in the discussion section [lines 521-553].

      We agree with the reviewer’s comment that the cells were topographically clustered based on their responses. Some of these clusters include the somatosensory responsive cells, which were located mostly in the modules (Figures 7D and 8E). Also, the auditory responsive cells with offset responses were clustered mostly in the modules (Figures 7C and 8F). Accordingly, we have edited the text to emphasize this finding.

      We noticed also that some responsive cells to the tested stimulations were surrounded by nonresponsive cells. By comparing the response of the cells to different stimuli we found that while Figures 7 and 11 (old Figure 10) showed only the response of the cells to auditory stimulation (unmodulated broadband noise at 80 dB) and somatosensory stimulation (whisker deflection), some nonresponsive cells to these specific stimulations were found to be responsive to pure tones of different frequencies and amplitudes. As an indicator of the cells' viability, we additionally examined the spontaneous activity of the nonresponsive cells across different data sets. We note that spontaneous activity was rare for all cells even among the responsive cells to sound or somatosensory stimulations. This finding could be related to the possibility that the 2P imaging of calcium signals may not be sensitive enough to track spontaneous activity that may originate from single spikes. However, in some data sets, we have found that the cells that did not respond to any tested stimuli showed spontaneous activity when no stimulation was given indicating the viability of those cells. We have addressed the activity of the non-responsive cells in the text along with a new Figure 11-figure supplement 1.

      We changed the magenta into a green color to be suitable for the dark background. Also, we have completely changed the color palette of all of our images to be suitable for color-blind readers as suggested by reviewer 1.

      Reviewer #3 (Public Review):

      The lateral cortex of the inferior colliculus (LC) is a region of the auditory midbrain noted for receiving both auditory and somatosensory input. Anatomical studies have established that somatosensory input primarily impinges on "modular" regions of the LC, which are characterized by high densities of GABAergic neurons, while auditory input is more prominent in the "matrix" regions that surround the modules. However, how auditory and somatosensory stimuli shape activity, both individually and when combined, in the modular and matrix regions of the LC has remained unknown.

      The major obstacle to progress has been the location of the LC on the lateral edge of the inferior colliculus where it cannot be accessed in vivo using conventional imaging approaches. The authors overcame this obstacle by developing methods to implant a microprism adjacent to the LC. By redirecting light from the lateral surface of the LC to the dorsal surface of the microprism, the microprism enabled two-photon imaging of the LC via a dorsal approach in anesthetized and awake mice. Then, by crossing GAD-67-GFP mice with Thy1-jRGECO1a mice, the authors showed that they could identify LC modules in vivo using GFP fluorescence while assessing neural responses to auditory, somatosensory, and multimodal stimuli using Ca2+ imaging. Critically, the authors also validated the accuracy of the microprism technique by directly comparing results obtained with a microprism to data collected using conventional imaging of the dorsal-most LC modules, which are directly visible on the dorsal IC surface, finding good correlations between the approaches.

      Through this innovative combination of techniques, the authors found that matrix neurons were more sensitive to auditory stimuli than modular neurons, modular neurons were more sensitive to somatosensory stimuli than matrix neurons, and bimodal, auditory-somatosensory stimuli were more likely to suppress activity in matrix neurons and enhance activity in modular neurons. Interestingly, despite their higher sensitivity to somatosensory stimuli than matrix neurons, modular neurons in the anesthetized prep were far more responsive to auditory stimuli than somatosensory stimuli (albeit with a tendency to have offset responses to sounds). This suggests that modular neurons should not be thought of as primarily representing somatosensory input, but rather as being more prone to having their auditory responses modified by somatosensory input. However, this trend was reversed in the awake prep, where modular neurons became more responsive to somatosensory stimuli than auditory stimuli. Thus, to this reviewer, the most intriguing result of the present study is the dramatic extent to which neural responses in the LC changed in the awake preparation. While this is not entirely unexpected, the magnitude and stimulus specificity of the changes caused by anesthesia highlight the extent to which higher-level sensory processing is affected by anesthesia and strongly suggest that future studies of LC function should be conducted in awake animals.

      Together, the results of this study expand our understanding of the functional roles of matrix and module neurons by showing that responses in LC subregions are more complicated than might have been expected based on anatomy alone. The development of the microprism technique for imaging the LC will be a boon to the field, finally enabling much-needed studies of LC function in vivo. The experiments were well-designed and well-controlled, and the limitations of two-photon imaging for tracking neural activity are acknowledged. Appropriate statistical tests were used. There are three main issues the authors should address, but otherwise, this study represents an important advance in the field.

      (1) Please address whether the Thy1 mouse evenly expresses jRGECO1a in all LC neurons. It is known that these mice express jRGECO1a in subsets of neurons in the cerebral cortex, and similar biases in the LC could have biased the results here.

      Thank you for the reviewer’s comment. In the work published by Dana, et al, the expression of jRGECO1a in all Thy1 mouse lines was determined by the brightness of the jRGECO1a in the soma. Given that some cells do not show a detected level of jRGECO1a fluorescence until activated, the difference in expression shown in different brain regions could be related to the level of neuronal activity at the time of sample processing and not the expression levels of the indicator itself. To the best of our knowledge, there is no antibody for jRGECO1a, which can be used for detecting the expression levels of the indicator regardless of the neuronal activity. To test the hypothesis that DC and LC have different levels of jRGECO1a, we examined the expression levels of jRGECO1a after we perfused the mice with high potassium saline to elicit a general neuronal depolarization in the whole brain. Then we immunostained against NeuN (the neuronal marker) to quantify the percentage of the neurons expressing jRGECO1a to the total number of neurons (indicated by NeuN). To have a fair comparison, we restricted our analysis to include the areas imaged only by 2P as some regions were not accessible by microprism such as the deep ventral regions of the LC. There is a similar % of cells expressing jRGECO1a in DC and LC. As expected, the neurons expressing jRGECO1a were only nonGABAergic cells. We addressed these findings in the new Figure 3-figure Supplement 1 as well as the corresponding text in the results [lines 178-184] and methods sections [lines 878-892].

      (2) I suggest adding a paragraph or two to the discussion to address the large differences observed between the anesthetized and awake preparations. For example, somatosensory responses in the modules increased dramatically from 14.4% in the anesthetized prep to 63.6% in the awake prep. At the same time, auditory responses decreased from 52.1% to 22%. (Numbers for anesthetized prep include auditory responses and somatosensory + auditory responses.). In addition, the tonotopy of the DC shifted in the awake condition. These are intriguing changes that are not entirely expected from the switch to an awake prep and therefore warrant discussion.

      Thank you for the reviewer’s comment. To determine if differences exist between anesthetized and awake data, we have now used the same statistics and edited Figures 4, 5, 7, 8, 9, and 10 as well as added a new Figure 11. Accordingly, we have edited the results section and added a paragraph addressing the possible differences between the two preparations in the Discussion section [lines 521-553].

      (3) For somatosensory stimuli, the authors used whisker deflection, but based on the anatomy, this is presumably not the only somatosensory stimulus that affects LC. The authors could help readers place the present results in a broader context by discussing how other somatosensory stimuli might come into play. For example, might a larger percentage of modular neurons be activated by somatosensory stimuli if more diverse stimuli were used?

      We agree with the reviewer’s point. Indeed, the modules are receiving different inputs from different somatosensory sources such as somatosensory cortex and dorsal column nuclei, which could indicate that the activity of the cells in the modular areas could be evoked by different types of somatosensory stimulations, which is an open area for future studies. We have discussed this point in the revised Discussion section [lines 516-520].

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 3H: The lateral surface seems quite damaged by the prism. An example slice of the imaging area of each mouse would help the reader better understand the extent of damage the prism leaves in the area of interest.

      Thank you for the reviewer’s comment. We have already included such images in Figures 4A, 7A, and 9A to present the field of view of all prism experiments. However, we need to clarify the point of tissue damage. The insertion of the microprism may be associated with some tissue damage as a result of making the pocket for the microprism to be inserted, but it is not possible to get neuronal signals from a damaged field of view. Therefore, we do not believe that there is tissue damage to the parts of the LC imaged by microprism. However, there may be some areas where the microprism is not in direct contact with the LC surface. These areas are located mostly in the periphery of the field of view, and they are completely black as they are out of focus (i.e., the left side of Figure 3B). The right side of Figure 3B as well as Figure 3A have some black areas, which represent the vasculature, where there are no red signals because of the lack of jRGECO1a expression in those areas.

      (2) In relation to the data shown in Figure 4E it is claimed that LC is tuned to higher frequencies (lines 195-196). However, the majority of cells appear to be tuned to frequencies below 14kHz (with a median of 7.5 kHz), which is quite low for the mouse. I assume that the authors mean frequencies that are relatively higher than the DC, but it is worth mentioning in the text that the BFs found in the LC are quite low-frequency responses for the mouse.

      Thank you for the reviewer’s comment, which we agree with. We edited this part by acknowledging that around 50% of the LC cells had a low-frequency bias to 5 and 7.1 kHz. Then we mentioned that most of the LC cells are tuned to relatively higher frequencies than those of the DC [lines 215-218].

      (3) Figure 5A-C: Is it the tone-responsive cells plus an additional ~22% of cells that respond to AM, or are there also cells that respond to tones that do not respond to AM. Please break down to which degree the tone and AM responsive cells are overlapping.

      Thank you for the reviewer’s comment and suggestion. We broke down the responsive cells into cells responsive only to pure tone (tone selective cells or Tone-sel) or to only AM-noise (noise selective cells or Noise-sel) as well as cells responding to both sounds (nonselective cells or Non-sel). We examined the fractions of these categories of cells in both LC and DC within all responsive neurons. Accordingly, we have edited Figure 5A-C as well as the text [lines 229-243].

      (4) Figure 5D. It is unclear to me how a cell is classified as SMI or TMI responsive after computing the SMI or TMI for each cell. What statistic was used to determine if the cell was responsive or not?

      Thank you for the reviewer’s comment. We do apologize for the confusion caused by Figures 5D and E. These figures do not show the values of SMI or TMI, respectively. Rather, the figures show the percentage of the spectrally or temporally modulated cells, respectively. At each sound level, the cells were categorized into two main types. The spectrally modulated cells are those responsive to pure tones or unmodulated noise, so they can detect the spectral features of the sound (old Figure 5D or new Figure 5E). The temporally modulated cells are those responsive to AM-noise, so they can detect the temporal features of the sound of complex spectra like the broadband noise (old Figure 5E or new Figure 5F). To clear this confusion, we removed the words SMI and TMI from the figures, and then we renamed the x-axis label into “% of spectrally modulated cells” and “% of temporally modulated cells” for Figures 5D (new 5E) and E (new 5F), respectively.

      (5) Figure 5 D, E: Is the decrease in SMI and TMI modulated cells in the modules a result of simply lower sensitivity to sounds (i.e. higher response thresholds)? If a cell responds to neither tone, AM, or noise it will have a low SMI and TMI index. If this is the case that affects the interpretation, as it is then not a decrease in sensitivity to spectral or temporal modulation, but instead a difference in overall sound sensitivity.

      Thank you for the reviewer’s comment. We apologize for the confusion about Figures 5E and D, which did not show the SMI and TMI values. Rather, they show the percentage of spectrally or temporally modulated cells, respectively, as explained in our previous response. Therefore, Figure 5D shows the percentage of cells that can detect the spectral features of sound, while Figure 5E shows the percentage of cells that can detect the temporal features of sounds of complex spectra like broadband noise. Accordingly, Figures 5D and E show the sensitivity to different features of sound and not the overall sound sensitivity.

      (6) Figure 7 and 8: What is the false positive rate expected of the responsive cells using the correlation cell flagging criteria? Especially given that the fraction of cells responsive to somatosensory stimulation in LC (matrix) is 0.88% and 1.3% in DC, it is important to know what the expected false positive rate is in order to be able to state that there are actually somatosensory responses there or if this is what you would expect from false positives given the inclusion test used. Please provide an estimate of the false positive rate given your inclusion test and show that the rate found is statistically significantly above that level - and show this rate with a line in Figure 7 H, I.

      Thank you for the reviewer’s comment. To test the efficiency of the correlation method to determine the responsive cells, we initially ran an ROC curve comparing the automated method to a blinded human interpretation. The AUC of the ROC curve was 0.88. This high AUC value indicates that the correlation method tends to rank a randomly chosen responsive cell above a randomly chosen nonresponsive cell. At the correlation coefficient (0.4), which was the cutoff value to determine the responsive cells for somatosensory stimulation, the specificity was 87%, the sensitivity was 72%, the positive predictive value was 73%, and the negative predictive value was 86%. Although the above percentages indicate the efficiency of the correlation method, we excluded all the false responsive cells from the analysis. Therefore, the fractions of cells in the graphs are the true responsive cells with no contamination of the non-responsive cells. We also modified Figures 7H and I to match the other data sets obtained from awake animals. Therefore, Figures 7H and I no longer show the average of the responsive cells. Instead, they show the % of different fractions of responsive cells within each cellular motif (modules and matrix). Accordingly, we believe that there is no need to include a rate line on the graph. We added the section describing the validation part to the methods section [lines 808-815].
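
      As an illustration of how the four percentages quoted above relate to a single cutoff, the following minimal sketch computes sensitivity, specificity, and the two predictive values from per-cell correlation coefficients and a human rater's labels. The function name, the input format, and the choice to flag a cell when its coefficient is at or above the cutoff are assumptions made for this example; they are not taken from the manuscript's analysis code.

```python
def confusion_metrics(scores, labels, cutoff=0.4):
    """Summarize how a correlation cutoff agrees with human labels.

    scores: per-cell correlation coefficients (hypothetical inputs).
    labels: True where a blinded rater judged the cell responsive.
    A cell is flagged when score >= cutoff (an assumption of this sketch).
    """
    tp = fp = tn = fn = 0
    for score, is_responsive in zip(scores, labels):
        flagged = score >= cutoff
        if flagged and is_responsive:
            tp += 1
        elif flagged and not is_responsive:
            fp += 1
        elif not flagged and is_responsive:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Avoid a ZeroDivisionError when a class is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # flagged among truly responsive
        "specificity": ratio(tn, tn + fp),  # unflagged among nonresponsive
        "ppv": ratio(tp, tp + fp),          # truly responsive among flagged
        "npv": ratio(tn, tn + fn),          # nonresponsive among unflagged
    }
```

      One minus the specificity is the false positive rate that the reviewer asked about, so a table of this kind is enough to compare a small observed fraction of responsive cells against the rate expected from misclassification alone.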

      (7) Figure 7: Please clarify what is meant by a cell responding to 'both responding to somatosensory and auditory stimulation'. Does it mean that the cell has responses to both auditory and somatosensory stimulation when presented individually or if it responds to both presented together? If it is the former, I don't understand how the number to both can be higher than the number of somatosensory alone (as both requires it also to respond to somatosensory alone). If it is the latter (combined auditory and somatosensory) then it seems that somatosensory inputs remove the responsiveness of most cells that were otherwise responsive to auditory alone (e.g. in the module while 42% respond to sound alone, combined stimulation would leave only 10% of cells responsive). Please clarify what exactly the authors are plotting and stating here.

      Thank you for the reviewer’s comment. The responsive cells in Figure 7 are divided into three categories. Each category has a completely different group of cells. The first category is for the cells responding only to auditory stimulation (auditory-selective cells or Aud-sel). The second category is for the cells that respond only to somatosensory stimulation (somatosensory selective cells or Som-sel). The third category is for the cells that respond to both auditory and somatosensory stimulations when both stimulations are presented individually (auditory/somatosensory nonselective cells or Aud/Som-nonsel). Accordingly, the number of cells may be different across all these categories. We have clarified this part in the text [lines 299-303]. We have modified Figures 7, 8, and 11 (old Figure 10) to match the data from anesthetized and awake animals, so Figures 7H and I now show the collective % of the cells from all animals within modules vs matrix.
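
      The three mutually exclusive categories described above can be written down as a small decision rule. The sketch below is only an illustration: it assumes that each cell has already been tested separately with the auditory and the somatosensory stimulus, and the function names are hypothetical.

```python
from collections import Counter


def categorize_cell(responds_to_sound, responds_to_whisker):
    """Assign a cell to exactly one response category.

    Both arguments are booleans from tests in which each stimulus
    was presented on its own (not the combined bimodal stimulus).
    """
    if responds_to_sound and responds_to_whisker:
        return "Aud/Som-nonsel"
    if responds_to_sound:
        return "Aud-sel"
    if responds_to_whisker:
        return "Som-sel"
    return "nonresponsive"


def category_percentages(cells):
    """Return the % of responsive cells in each category.

    cells: iterable of (responds_to_sound, responds_to_whisker) pairs.
    Nonresponsive cells are excluded from the denominator.
    """
    counts = Counter(categorize_cell(a, s) for a, s in cells)
    counts.pop("nonresponsive", None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}
```

      Because a cell that responds to both stimuli is counted only as Aud/Som-nonsel, that group can be larger than the Som-sel group, which is the point the reviewer found confusing.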

      (8) Why are the inferential statistics used in Figure 9F (chi-square test) and Figure 5A-C (t-test) when it tests the same thing (the only difference is one is anaesthetised data and the other awake)? Indeed, all Figure 9 and 10 (awake data figures) plots use chi-square tests to test differences in percentages instead of t-tests used in earlier (anaesthetised data figures) plots to test differences in percentages between groups. Please clarify the reason for this change in statistics used for similar comparisons.

      Thank you for the reviewer’s comment. Imaging the LC via microprism from awake animals confirmed the ability to run this technique with no interference to the ambulatory functions of the animals. Therefore, the main goal was to highlight the similarities between the data obtained from awake and anesthetized setups by highlighting the comparison between the LC and DC or between modules and matrix within each preparation (anesthetized vs awake). Accordingly, the statistics used to run these comparisons were chosen based on the number of the tested animals at each setup (7 anesthetized animals and 3 awake animals for prism insertion). The low number of animals used for awake data made us use the number of cells collectively from all animals instead of the number of animals, so we used the Chi-square test to examine the differences in percentages.
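
      For readers unfamiliar with testing a difference in percentages on pooled cell counts, a chi-square test on a 2x2 table can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the Pearson statistic without continuity correction, which the response does not specify, and the function name and argument layout are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Example layout: rows are regions (modules, matrix) and columns are
    (responsive, nonresponsive) cell counts pooled across animals.
    No continuity correction is applied (an assumption of this sketch).
    Returns (chi2, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one cell")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

      Note that pooling cells across animals treats each cell as an independent observation, which is the trade-off that comes with a small number of animals.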

      (9) Figure 10H: The main text describes the results shown here as similar to what was seen in anaesthetised animals. But it looks to me like the results in awake animals are qualitatively different from the multisensory interaction seen in anaesthetised animals. In anaesthetised animals the authors find that there is a higher chance of auditory responses being enhanced by somatosensory inputs when cells are in the modules compared to in the matrix. However, in awake data, this relationship is flipped, with more bimodal enhancement found in the matrix compared to the modules. Furthermore, almost all cells in the modules are suppressed by combined somatosensory input which looks like it is different from what is found in anaesthestised mice and what is described in the discussion: 'we observed that combined auditory-somatosensory stimulation generally suppressed neural responses to auditory stimuli and that this suppression was most prominent in the LC matrix'.

      Thank you for the reviewer’s comment. Our statement was meant to show how the data obtained from awake and anesthetized animals were generally similar. However, we agree that the statement may not be suitable due to the possible differences between awake and anesthetized animals. To address a fair comparison between the anesthetized and awake preparations, we ran similar statistics and graphs for Figures 7, 8, and 11 (old Figure 10). Given that the areas occupied by modules and matrix are different across animals due to the irregular shape of the modules, we chose to run a chi-square test for all the data to quantify the collective % of responding cells within modules vs matrix from all tested animals for each experimental setup (anesthetized vs awake). The anesthetized and awake animals similarly showed that modules and matrix had higher fractions of auditory responsive cells. However, matrix had more cells responding to auditory stimulations than modules, while modules had more cells responding to somatosensory stimulation than matrix. In contrast, while the anesthetized animals showed higher fractions of offset auditory-responsive cells, which were mostly clustered in the modules, the offset auditory-responsive cells were very rare in awake animals (6 cells/one animal).

      Based on the fractions of cells with suppressed or enhanced auditory response induced by bimodal stimulation, the data obtained from anesthetized and awake animals showed that the auditory response in the matrix was suppressed more than enhanced by bimodal stimulation. In contrast, modules had different profiles across the experimental setups and locations. For instance, the modules imaged via microprism in the anesthetized and awake animals showed suppressed more than enhanced auditory responses, but modules imaged from the dorsal surface in anesthetized animals showed enhanced more than suppressed auditory responses. Additionally, modules had less suppressed and more enhanced auditory responses compared to matrix in the anesthetized animals regardless of the location of the modules (microprism or dorsal surface). Yet, modules from awake animals had more suppressed and less enhanced auditory responses compared to matrix. We have addressed these differences in the results and discussion section.

      Additional minor comments that I think the authors could use to aid their manuscript clarity:

      (1) The figure colour selection - especially in Figures 7 and 8 - is really hard to tell apart. Please choose more distinct colours, and a colour scheme that is appropriate for colour blind readers.

      Thank you for the reviewer’s suggestion. We have noticed that the magenta color assigned for the cells with offset responses was very difficult to distinguish from the black background. We have changed the magenta color to green to be different from the color of other cells. Using Photoshop, we chose a color scheme that is suitable for color-blind readers in all our maps.

      (2) The sentence in lines 331-334 should be rephrased for clarity.

      Thank you for the reviewer’s suggestion. We have rephrased the statement for clarity [lines 364-371].

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in the public review the strong clustering evident in some of the maps (some of which may be related to module/matrix differences but certainly not all of it) seems worth scrutinizing further. Would we expect such a strong spatial segregation of auditory responsive and non-responsive neurons? Would we expect response properties (e.g. off-responsiveness) other than frequency tuning to show evidence of a topographic arrangement in the IC? In addressing this it would, of course, be important to rule out that this clustering is not down to some trivial experimental variables and truly reflects functional organization. For instance, are the patches of non-responsive neurons found in parts of the field of view with poor visibility, poor labelling, etc which may explain why it is difficult to pick up responses there? Are the neurons in non-responsive areas otherwise active (i.e. do they show spontaneous activity) or could they be 'dead'? Could the way neuropil signals are dealt with play a role here (it is weighted by 0.4 which strikes me as quite low)? In relation to this, I am also wondering to what extent the extreme overrepresentation (Figure 4) of neurons with a BF of 5kHz (some of this is, of course, down to the fact that the lower end of the frequency range was 5kHz and that the step size was 0.5 octaves), especially in the DC, is to be interpreted.

      Thank you for the reviewer’s comment. Before analysis, the ROIs of all cells were set around the cell bodies using the jRGECO1a signals as a reference, so all cells (responsive and nonresponsive) were collected from areas of good visibility of jRGECO1a signals. In other words, no cells were collected from regions having poor jRGECO1a signals. In Figures 7, 8, and 11 (old Figure 10), the cells showed response either only to unmodulated broadband noise at 80 dB as an auditory stimulus or to whisker deflection with specific speed and power as a somatosensory stimulus. Given that the two stimuli above had specific parameters, the remaining non-responsive cells may respond to auditory or somatosensory stimulations with other features. For instance, some nonresponsive cells to the unmodulated broadband noise were responding to pure tones with different amplitudes and frequencies or to different AM-noise with different amplitudes and modulation frequencies. Also, these nonresponsive cells may not respond to any of our tested stimuli and may respond to other sensory stimulations. Some of the non-responsive cells showed spontaneous activity when no stimulations were presented. However, we cannot rule out the possibility that some of these nonresponsive cells may not be viable. We have addressed the clustering properties in the revised version of the manuscript in the corresponding spots of the results and discussion sections. We have added a new supplementary figure (Figure 11-figure supplement 1) to show how the nonresponsive cells to the unmodulated noise may respond to other types of sound and to show the spontaneous activity of some non-responsive cells.

      For the neuropil, previous reports used the contamination factor (r) in a range of 0.3-0.7 (we referenced these studies in the method section [line 776]) based on the tissue or cells imaged, the vasculature, and the objective used for imaging. Therefore, we optimized the contamination factor (r) to be 0.4 through a preliminary analysis based on the tissue we image (LC), and the objective used (16x with NA = 0.8 and 3 mm as a working distance).
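
      To make the role of the contamination factor concrete, the sketch below applies the commonly used subtraction, in which the corrected trace is the ROI fluorescence minus r times the surrounding neuropil fluorescence. That this exact formula was used here is an assumption of the example, since the response states only the value of r; the function and variable names are hypothetical.

```python
def subtract_neuropil(roi_trace, neuropil_trace, r=0.4):
    """Return the neuropil-corrected trace F_roi - r * F_neuropil.

    roi_trace: raw fluorescence samples from the cell body ROI.
    neuropil_trace: fluorescence from the surrounding neuropil,
        sampled at the same time points.
    r: contamination factor (0.4 is the value quoted in the response).
    """
    if len(roi_trace) != len(neuropil_trace):
        raise ValueError("traces must have the same number of samples")
    return [f_roi - r * f_np for f_roi, f_np in zip(roi_trace, neuropil_trace)]
```

      A larger r removes more of the shared background signal, so a low value such as 0.4 leaves more neuropil contribution in each trace, which is the concern the reviewer raised.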

      We agree that there is an overrepresentation of 5 kHz as the best tuning frequency for DC cells. The previous report (A. B. Wong & Borst, 2019) showed a large zone of the DC where cells were tuned to (2-8 kHz). Given that 5kHz was the lowest tested frequency in our experiment, we think that the low-frequency bias of the DC surface is consistent between studies. This finding also could be supported by the electrophysiology data obtained by spanning the recording electrodes through the IC tissue along the dorsoventral axis. In those experiments, the cells were tuned to lower frequencies at the dorsal surface of the IC.

      We have changed the magenta-colored cells to green ones, so it will be easier to identify the cells. As required by another reviewer, we changed the color palettes of some images and cellular maps to be suitable for color-blind readers.

      The manuscript would benefit from more precise language in a number of places, especially in the results section.

      Line 220/221, for instance: "... a significant fraction of cells that did not respond to pure tones did respond to AM-noise" Strictly speaking, this sentence suggests that you considered here only the subset of neurons that did not respond to pure tones and then ran a test on that subset. The test that was done seems to suggest though that the authors tested whether the percentage of responsive cells was greater for pure tones or for AM noise.

      Thank you for the reviewer’s comment. We do apologize for the confusion. In the revised manuscript, we categorized the cells according to their response into cells responding to pure tone only (tone-selective cells or Tone-sel), AM-noise only (noise-selective cells or Noise-sel), and to both pure tone and AM-noise (nonselective cells or Non-sel). We have modified Figure 5 accordingly. We did the same thing for the data obtained from awake animals and showed that in a new figure to easily match the analysis done for the anesthetized animals.

      Please refer to the figure panels in the text in consecutive order. 2B, for instance, is mentioned after 2H.

      Thank you for the reviewer’s comment. Throughout the paper, we kept the consecutive order of the figure panels within each figure to be in a smooth flow with the text. Yet, figure 2 was just the only exception for a good reason. Figure 2 is a complex one that includes many panels to show a parallel comparison between LC imaged via microprism and DC through single photon images, two-photon images, validating laser lesioning, and histology. Accordingly, we navigated many panels of the figure to efficiently highlight the aspects of this comparison. We prefer to keep Figure 2 as one figure with its current format to show this parallel comparison between LC and DC.

      The legend for Figure 2 could be clearer. For instance, there are two descriptions for panel D. Line 1009: "(C-E)" [i.e. C, D, E] and line 1010: "(D and F)".

      Thank you for the reviewer’s comment. It should be C and E, not C-E. We have fixed the mistake [line 1224].

      Line 275: What does 'with no preference' mean?

      Thank you for the reviewer’s comment. We do apologize for the confusion. There are three categories of cells. Some cells respond only to auditory stimulation, while others respond to only somatosensory stimulation. However, there is another group of cells that respond nonselectively to auditory and somatosensory stimulations or Aud/Som-nonsel cells. We edited the sentence to be clearer [lines 303-304].

      Line 281 (and other places): What does 'normalized against modules' mean?

      Thank you for the reviewer’s comment. This normalization was done by dividing the number of responsive cells of the same response type in the matrix by that in the modules. Therefore, the value taken by modules was always 1 and the value taken by the matrix was expressed relative to 1. Accordingly, the value for matrix could be > 1 if matrix had more cells than modules. In contrast, the value of matrix would be < 1 if matrix had fewer cells than modules. In the revised version, we used this normalization method to make the revised Figures 5C and 10C to describe the cell fractions responding to pure tone only, AM-noise only, or to both stimuli in the matrix vs modules.
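
      The normalization described above amounts to a single division, sketched here for one response type. The function name and the returned dictionary are hypothetical and serve only to illustrate the arithmetic.

```python
def normalize_against_modules(matrix_count, modules_count):
    """Express the matrix cell count relative to the modules count.

    Both counts refer to cells of the same response type. The modules
    value is 1 by construction; the matrix value is above 1 when the
    matrix has more such cells and below 1 when it has fewer.
    """
    if modules_count == 0:
        raise ValueError("modules_count must be nonzero to normalize")
    return {"modules": 1.0, "matrix": matrix_count / modules_count}
```

      For example, 30 matrix cells against 20 module cells gives a matrix value of 1.5, while 10 against 20 gives 0.5.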

      Sentence starting on line 288. I don't find that point to be as obvious from the figures as the sentences seem to suggest. Are we to compare magenta points (auditory off cells) from 7C with green points in 7F?

      Thank you for the reviewer’s comment. We came to this conclusion based on our visual comparison of magenta points (now green in the revised version to increase the visibility) representing the auditory offset cells in Figure 7C and the green points in Figure 7F representing the cells responding to both somatosensory and auditory stimulations. In the revised manuscript, we statistically examined whether the percentages of onset and offset auditory responses differ within the cells responsive to both somatosensory and auditory stimulations in the modules vs matrix. We have found that most of the cells responding to both somatosensory and auditory stimulations inside the modules had offset auditory responses, which could indicate a level of multisensory integration between somatosensory input and the offset auditory responses in these cells. We have added the statistical results to the revised manuscript to address this effect [lines 312-317].

      Lines 300-302: "These data suggest that the module/matrix system permits preservation of distinct multimodal response properties in the face of massive integration of inputs in the LC". First, I'm not quite sure what that sentence means. Second, it would be more appropriate for the discussion. Third, the fact that we are more likely to find response enhancement in the modules than in the matrix is nicely consistent with the idea (supported by work from the senior author's lab and others) that excitatory somatosensory input predominantly targets neurons in the modules (which is why we see mostly response enhancement in the modules) and that this input targets GABAergic neurons which then project to and inhibit neurons both outside and inside of their module. Therefore, I would recommend that the authors replace the aforementioned sentence with one that interprets these results in light of what we know about this somatosensory-auditory circuitry.

      Thank you for the reviewer’s comment. Despite the massive multimodal inputs that the LC receives from auditory and nonauditory regions, the module/matrix system is a platform for distinct multimodal responses, as indicated by more somatosensory-responsive cells in the modules versus more auditory-responsive cells in the matrix, which matches the anatomical differences reported previously. We edited the sentence in light of the comparison between the data obtained from awake and anesthetized animals and moved it to the discussion section [lines 503-506].

      The term 'LC imaged via microprism' is used dozens of times throughout the manuscript. Replacing it with a suitable acronym or initialism could improve the flow of the text and would make some of the sentences less cumbersome.

      Thank you for the reviewer’s suggestion. We changed the term “LC imaged via microprism” into LC (microprism) throughout the revised manuscript.

      5A-C: It is unclear what is being compared here. What are the Ns? Different animals?

      Thank you for the reviewer’s comment. We do apologize for this missing information. We have added the number of subjects used in every statistical test in each corresponding figure legend.

      5G: minus symbol missing on the y-axis.

      Thank you for the reviewer’s comment. We gladly have fixed that.

      Figure 6: Are these examples or population averages?

      Thank you for the reviewer’s question. Every figure panel of the old Figure 6 represents a single trace of an example cell. However, we modified Figure 6 to include more examples of cells showing different responses complying with another reviewer’s suggestion. Each panel of the new Figure 6 represents the average response of 5 stimulations of the corresponding stimulus type. We preferred to show the average signal because it was the one used for the subsequent analysis.

      How are module borders defined?

      Thank you for the reviewer’s question. The modules were defined based on the intensity of the green channel, which shows the expression of the GFP signal. Module boundaries were drawn where regions of high GFP signal met regions of low GFP signal. This step was done before data analysis to avoid any bias.

      7JKL: How are these to be interpreted? Does panel 7K, for instance, indicate that the fraction of neurons showing 'on' responses was roughly twice as large in the matrix than in the modules and that the fraction of neurons showing 'off' responses was roughly 10 times larger in the modules than in the matrix (the mean seems to be at about 1/10).

      Thank you for the reviewer’s comment. The data represented by Figures 7J-L defined the normalization of the number of cells of the same response type in the matrix against the modules. This normalization was done per animal, and then the data of the matrix were plotted against the normalization line at 1 representing the modules. The matrix will be claimed to have more cells than modules if the median of the matrix values > 1. In contrast, the matrix will be claimed to have fewer cells than the modules if the median of the matrix values < 1. Finally, if the median of matrix values = 1, this means there is no difference between matrix and modules. However, to match the data obtained from anesthetized animals (Figures 7 and 8) with those obtained from awake animals (Figure 11 or old Figure 10), we ran all data through the Chi-square test in the revised manuscript. Therefore, the format of Figures 7K-L was changed in the revised manuscript, so they became new Figures 7I-K.
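
      The per-animal normalization described above can be sketched as follows. This is only an illustration of the stated logic (matrix values divided by module values per animal, then the median compared with the reference line at 1). The function names and the cell counts are hypothetical, and the response does not state whether raw counts or fractions were normalized.

```python
from statistics import median

def matrix_to_module_ratios(counts_per_animal):
    """Return one matrix/modules ratio per animal for a single response type.

    counts_per_animal: list of (n_matrix, n_modules) tuples, one per animal.
    Animals with no module cells of this type are skipped (an assumption).
    """
    return [n_matrix / n_modules
            for n_matrix, n_modules in counts_per_animal
            if n_modules > 0]

def interpret(ratios):
    """Compare the median ratio with the module reference line at 1."""
    m = median(ratios)
    if m > 1:
        return "matrix > modules"
    if m < 1:
        return "matrix < modules"
    return "no difference"

# Hypothetical counts for three animals: (cells in matrix, cells in modules)
counts = [(20, 10), (15, 10), (9, 12)]
ratios = matrix_to_module_ratios(counts)
print(ratios, interpret(ratios))
```

      Using the median rather than the mean keeps a single animal with an extreme ratio from deciding the outcome, which is presumably why the comparison is phrased in terms of the median.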

      10A suggests that significantly more than half the neurons shown here are not auditory responsive. Perhaps I am misinterpreting something here but isn't that in contrast to what is shown in panel 9F?

      Thank you for the reviewer’s comment. The data shown in Figure 10A (or revised Figure 11A) represents the cellular response to only one stimulus (broadband noise at 80 dB with no modulation frequency), while Figure 9F (revised 10B) represents the cells responding to varieties of auditory stimulations of different combinations of frequencies and amplitudes (pure tones) as well as to AM-noise of different amplitudes and modulation frequencies. Accordingly, the old Figure 9F or revised Figure 10B shows different cell types based on their responses. For instance, some cells respond only to pure tone. Others respond only to AM-noise or to both pure tones and AM-noise. This may also support that the nonresponsive cells in Figure 10A (revised 11A) can respond to other types of sound features.

      The way I understood panels 7L and 8K there were more suppressed neurons in the matrix than in the modules (line 296: "cells in the modules had a higher odds of having an enhancement response to bimodal stimulation than matrix, while cells in the matrix had a higher odds of having a suppressive response to bimodal stimulation"). Now, panel 10F indicates that in awake mice there is a greater proportion of suppressed neurons in the modules than in the matrix. I may very well have overlooked or misread something but I may not be the only reader confused by this so please clarify.

      Thank you for the reviewer’s comment. We do apologize for this confusion. The ambiguity between Figures 7 and 8 (anesthetized animals) as well as Figure 10 (awake animals) comes from the fact that different statistics have been used for each preparation. In the revised version, we have fixed that by running the same statistics for all the data, and we accordingly revised Figures 7, 8, and 10 (new Figure 11). In brief, the matrix preserves a higher percentage of cells with suppressed auditory responses than those with enhanced auditory responses induced by bimodal stimulation in all conditions (anesthetized vs awake). In contrast, modules act differently across all tested conditions. While modules had more cells with enhanced auditory responses induced by bimodal interaction in anesthetized animals, they had more cells with suppressed response in awake animals indicating that modules could be sensitive to the effect of anesthesia compared to matrix. We addressed this effect in the discussion of the revised manuscript [lines 521-553].

      Line 438: ...as early AS...

      Thank you for the reviewer’s comment. We gladly fixed that [line 512].  

      Reviewer #3 (Recommendations For The Authors):

      My minor recommendations for the authors are as follows:

      (1) The text can be a bit difficult to follow in places. This is partly due to the convoluted nature of the results, but I suggest a careful read-through to look for opportunities to improve the prose. In particular, there is a tendency to use long sentences and long paragraphs. For example, the third paragraph of the introduction runs for almost fifty lines.

      Thank you for the reviewer’s comment and suggestion. We have fixed that.

      (2) This might be due to journal compression, but some of the bar graphs in the figures are difficult to read. For example, the individual data points, especially when filled with striped background colors get lost. Axes can become invisible, like the y-axis in 7L, and portions of bars, like in 7F, are not always rendered correctly. Error bars are sometimes hidden behind data points, as in 5C. Increasing line thickness and shifting individual data points away from error bars might help with this.

      Thank you for the reviewer’s comment and suggestion. We made all the data points with black color and filled circles to make the data points visible. We put all the data points behind the main columns, so they don’t block the error bars. We have fixed figures 7 and 5.

      (3) Throughout the manuscript, the authors use a higher SMI to indicate a preference of cells for auditory stimuli with "greater spectral... complexity" (e.g., lines 219 and 372). I find this interpretation a bit challenging since SMI compares a neuron's preference for tones over noise, and to me, tones seem like the least spectrally complex of all auditory stimuli. Perhaps some clarification of what the authors mean by this would help. For example, is the assumption that a neuron that prefers tones over noise is, either directly or indirectly, receiving input sculpted by inhibitory processes?

      Thank you for the reviewer’s comment. In general, higher SMI values indicate an increase in the preference of the cells to respond to pure tones than noise with no modulation (less spectral complexity). We will clarify this statement throughout the manuscript. However, the SMI value was not mentioned in lines 219 and 372. The statement mentioned in line 219 describes the revised figure 5C (old 5B), where more cells in matrix specifically respond to AM-noise compared to modules, which indicates the preference of the matrix to respond to sounds of greater spectral and temporal complexity. The statement in 372 in the discussion section refers to the finding in revised figures 5E and F (old 5D and E). In the revised figure 5E or old 5D, the data show that matrix has more cells responding to pure tones or noise with no modulation than modules, so matrix has a lower threshold to detect the spectral features of sound (revised figure 5E or old 5D). In the revised figure 5F or old 5E, the data show that matrix has more cells responding to AM-noise than modules, which indicates that matrix functions more to process the temporal features of sound. As explained above, all findings were related to the percentage of cells responding to specific sound stimuli and not the exact SMI values. We have revised the figures accordingly by removing the terms SMI and TMI from the figures, and we have clarified that in the text.

      (4) Lines 250-253: How does a decrease in SMI correspond to "an increase in pure tone responsiveness?" Doesn't a decrease suggest the opposite?

      Thank you for the reviewer’s comment, which we agree with. We do apologize for that. We have fixed this statement [lines 275-277] and any related findings accordingly.

      (5) Line 304: Add "imaged via microprism" or similar after "response profiles with the LC.".

      Thank you for the reviewer’s suggestion. We have fixed that. However, we changed the term “LC imaged via microprism” into “LC(microprism)” for simplicity as suggested by another reviewer [line 330].

      (6) Figure 5A and C: Both plots show that more neurons responded to AM-noise than tones, but it would be interesting to know how much the tone-responsive and AM-noise responsive populations overlapped. Were all tone-responsive neurons also responsive to AM-noise?

      Thank you for the reviewer’s comment. We have categorized the cells based on their response to pure tone only, AM-only, and both pure tone and AM-noise when each stimulus is presented individually. We have modified Figures 5A and C, and they are now Figures 5B and D.

      (7) Figure 5G: Missing negative sign before "0.5.".

      Thank you for the reviewer’s suggestion. We gladly have fixed that. However, old Figure 5G became a revised Figure 5H.  

      (8) Figure 7 legend, Line 1102: Missing period after "(C and E)".

      Thank you for the reviewer’s suggestion. We think that the period should be placed before (C and E) at the end of “respectively”. The parentheses refer to the statements after them. We gladly fixed that. [line 1394]

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This study presents valuable findings as it shows that sleep rhythm formation and memory capabilities depend on a balanced and rich diet in fly larvae. The evidence supporting the claims of the authors is convincing with rigorous behavioral assays and state-of-the-art genetic manipulations. The work will be of interest to researchers working on sleep and memory. 

      Public Reviews: 

      Summary: 

      This manuscript investigates how energetic demands affect the sleep-wake cycle in Drosophila larvae. L2 stage larvae do not show a sleep rhythm or long-term memory (LTM), whereas L3 larvae do. The authors manipulate food content to provide insufficient nutrition, which leads to more feeding, no LTM, and no sleep even in older larvae. Similarly, activation of NPF neurons suppresses sleep rhythm. Furthermore, they try to induce a sleep-like state using pharmacology or genetic manipulations in L2 larvae, which can mimic some of the L3 behaviours. A key experimental finding is that activation of DN1a neurons activates the downstream DH44 neurons, as assayed by GCaMP calcium imaging. This occurs only in third instar and not in second instar, in keeping with the development of sleep-wake and feeding separation. The authors also show that glucose metabolic genes are required in Dh44 neurons to develop sleep rhythm and that DH44 neurons respond differently in malnutrition or younger larvae. 

      Strengths: 

      Previous studies from the same lab have shown that sleep is required for LTM formation in the larvae, and that this requires DN1a and DH44 neurons. The current work builds upon this observation and addresses in more detail when and how this might develop. The authors show that low-quality food exposure and enhanced feeding during the larval stage of Drosophila affect the formation of sleep rhythm and long-term memory. This suggests that the development of sleep and LTM is only possible under well-fed and balanced nutrition in fly larvae. Non-sleep larvae were fed in low-sugar conditions and indeed, the authors also find glucose metabolic genes to be required for a proper sleep rhythm. The paper presents precise genetic manipulations of individual classes of neurons in fly larvae followed by careful behavioural analysis. The authors also combine thermogenetic or peptide bath application experiments with direct calcium imaging of specific neurons. 

      Weaknesses: 

      The authors tried to induce sleep in younger L2 larvae, however the behavioral results suggest that they were not able to induce proper sleep behaviour as in normal L3 larvae. Thus, they cannot show that sleep during L2 stage would be sufficient to form LTM. 

      We agree that the experiments with Gaboxadol feeding in L2 did not perfectly mimic L3 sleep behaviors. However, genetic induction of sleep in L2 was effective in increasing sleep duration and depth similar to that observed in normal L3. As noted below in response to specific reviewer comments, because gaboxadol feeding is standard in the field for adult sleep induction, we prefer to still include this data in the manuscript for transparency. Moreover, the gaboxadol manipulation did cause a significant decrease in arousal threshold compared to control larvae. Together these approaches support the hypothesis that sleeping more/more deeply is not sufficient to promote LTM in L2.

      The authors suggest that larval Dh44 neurons may integrate "information about the nutritional environment through the direct sensing of glucose levels to modulate sleep-wake rhythm development". They identify glucose metabolism genes (e.g., Glut1) in the downstream DH44 neurons as being required for the organization of the sleep-wake-feeding rhythm, and they describe CCHa signaling from DN1a neurons to the DH44 cells via the receptor. However, how these are connected is not well explained. Do the authors think that the nutrient sensing is only occurring in the DH44 neurons and not in DN1a or other neurons? Would not knocking down glucose metabolism in any neuron lead to a functional defect? What is the evidence that Dh44 neurons are specific sensors of nutritional state? For example, do the authors think that e.g. the overexpression of Glut1 in Dh44 neurons, a manipulation that can increase transport of glucose into cells, would rescue the effects of low-sugar food? 

      We thank the reviewer for these suggestions and have added the experiment proposed. We found that knockdown of Hex-C in DN1a neurons did not disrupt sleep-wake rhythms (Fig. S4G-I), suggesting that Dh44 neurons are specialized in requiring glucose metabolism to drive sleep-wake rhythms. We have also added further clarification in the text regarding the existing evidence that Dh44 neurons act as nutrient sensors.

      Some of the genetic controls seem to be inconsistent suggesting some genetic background effects. In Figure 2B, npf-gal4 flies without the UAS show no significant circadian change in sleep duration, whereas UAS-TrpA flies do. The genetic control data in Figure 2D are also inconsistent. Npf-Gal4 seems to have some effect by itself without the UAS. The same is not seen with R76G11-Gal4. Suppl Fig 2: Naïve OCT and AM preference in L3 expressing various combinations of the transgenes show significant differences. npf-Gal4 alone seems to influence preference. 

      The sleep duration and bout number/length data are highly variable. 

      All experiments are performed in an isogenized background, so the variability seen in genetic controls likely reflects the stochastic nature of behavioral experiments. Indeed, adult sleep data also show a great deal of variability within the same genetic background (PMID: 29228366). We agree it is an important point, and we attempt to minimize variability as much as possible with backcrossing of flies and tight control of environmental conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Low sugar exposure and activation of NPF neurons might not induce the same behavioral changes. LS exposure does not enhance mouth hook movements, but overall food intake. NPF activation seems to enhance mouth hook movements, but the data for food intake is not shown. This information would be necessary to compare the two different manipulations. 

      We thank the reviewer for this suggestion. However, we elected not to perform food intake experiments with the NPF activation experiments. Since we are not directly comparing the low sugar and NPF manipulations to each other, we think that both experiments together support the conclusion that immature food acquisition strategies (whether food intake or feeding rate) limit LTM performance. 

      The authors write that the larval feeding assays run for 4 hours, can they explain why that long? Larvae should already have processed food within 4 hours, so that the measurement would not include all eaten food.

      We clarified the rationale for doing 4 hour feeding assays in the results section. We did 4 hours on blue dyed food because initial experiments of 1 hour with control L3 at CT1-4 were difficult to interpret. The measurement does not include all of the eaten food in the 4 hours but does reflect more long-term changes in food intake.

      Sleep induction with Gaboxadol seems to not really work - sleep duration, bout number and length are not enhanced, and arousal threshold is only slightly lower. Thus, the authors should not use this data as an example for inducing sleep behaviour. 

      We agree this approach did not have a large effect in larvae. However, because gaboxadol feeding is standard in the field for adult sleep induction, we prefer to still include this data in the manuscript for transparency. Moreover, the Gaboxadol manipulation did cause a mild (but significant) decrease in arousal threshold compared to control larvae. Gaboxadol feeding also caused a significant decrease in total body weight compared to control larvae indicating that even slightly deeper sleep could be detrimental to younger animals.

      Activation of R76G11 with TrpA1 seems to work better for inducing sleep like behaviour. However, the authors describe that they permanently activated neurons. To induce a "normal" sleep pattern, the authors might try to only activate these neurons during the normal enhanced sleep time in L3 (CT13?) and not during the whole day. This might also allow larvae to eat during day time and gain more weight. 

      We apologize that this point was not clearer, but we did do acute activation of R76G11(+) neurons, as proposed by the reviewer. We have clarified the text to make this point.

      It would be interesting to see how larvae fed with high sucrose and low protein diet would behave in this assay. Do the authors suggest that sugar is most important for the development of sleep behaviour or that it is a combination of sugar and protein that might be required? 

      We agree that feeding larvae a high sucrose and low protein diet would be interesting. However, we initially tried a low protein diet and observed significant developmental delays. Therefore, we are concerned that developmental defects on a high sucrose and low protein diet would confound behavioral results. Additionally, the Dh44 manipulations (glucose & GCN2 signaling) suggest that sugar is the most important for the development of sleep behaviors.

      Reviewer #3 (Recommendations For The Authors): 

      The authors could discuss whether the interaction between DN1a clock neurons and Dh44 neurons is mediated synaptically or by volume transmission following the extracellular release of the CCHa1 neuropeptide. They write that "the development of Dh44 neuronal competency to receive clock-driven cues" and that "DN1a clock neurons anatomically and functionally connect to Dh44" but a discussion about volume vs. synaptic signalling would be of interest. 

      We thank the reviewer for this suggestion. We revised the discussion to address this point.

      line 223 " demonstrating that post-synaptic processes likely". It would be interesting to read a discussion on whether it is known if these are postsynaptic or peptide-mediated volume effects? 

      We added additional text to the discussion to address these points.

      - The authors may want to include a schematic of the circuit and its position in the general anatomy of the fly larva. 

      We thank the reviewer for this suggestion. We have added a model figure to Fig. S6.

      "Dh44 neurons act through glucose metabolic genes" - consider rewording e.g. require glucose metabolic genes 

      We revised the text.

      - line 45 "Early in development, young animals must obtain enough nutrients to ensure proper growth" - this is too general, many animals do not feed in early life-cycle stages (e.g. lecithotrophic development), consider rewording 

      We revised the text to be more specific.

      - line 90 "however, L3 at CT1 consume more than L3 at CT12 (Figure S1A)" - typo CT13, also consider rewording to match the structure of the sentence before 'however, L3 consumed more at CT1 than at CT13' 

      We revised the text to fix this error.

      - Line 111 "and loss of deep sleep" - how is deep sleep defined and measured in the larvae? It is not clear from the data or the text. 

      We revised the text to define deep sleep in the results section. We also have a description of how arousal threshold is calculated in the methods.

      - In Figure 3B and G the individual data points are not shown 

      We did not show individual data points for those graphs because we are plotting the average percentage of 4 biological replicates.

      Typo: 

      Figure 1 legend "F, n= n=100-172 " 

      We revised the text to fix this typo.

    1. Reviewer #2 (Public Review):

      This manuscript is motivated by the question of what mechanisms cause overyielding in mixed-species communities relative to the corresponding monocultures. This is an important and timely question, given that the ultimate biological reasons for such biodiversity effects are not fully understood.

      As a starting point, the authors discuss the so-called "additive partitioning" (AP) method proposed by Loreau & Hector in 2001. The AP is the result of a mathematical rearrangement of the definition of overyielding, written in terms of relative yields (RY) of species in mixtures relative to monocultures. One term, the so-called complementarity effect (CE), is proportional to the average RY deviations from the null expectations that plants of both species "do the same" in monocultures and mixtures. The other term, the selection effect (SE), captures how these RY deviations are related to monoculture productivity. Overall, CE measures whether relative biomass gains differ from zero when averaged across all community members, and SE measures whether the "relative advantage" species have in the mixture is related to their productivity. In extreme cases, when all species benefit, CE becomes positive. When large species have large relative productivity increases, SE becomes positive. This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0).
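
      The rearrangement described above can be made concrete with a short sketch. It follows the standard form of the additive partition, net effect = N · mean(ΔRY) · mean(M) + N · cov(ΔRY, M), where M is monoculture yield and ΔRY is the deviation of observed from expected relative yield. The function name and the two-species yields are hypothetical and chosen only to illustrate the arithmetic.

```python
from statistics import mean

def additive_partition(mono, mix, expected_ry):
    """Split the net biodiversity effect into CE and SE.

    mono:        monoculture yield of each species (M_i)
    mix:         observed yield of each species in the mixture
    expected_ry: expected relative yield of each species (e.g. planted proportion)
    """
    n = len(mono)
    # Deviation of observed relative yield from its null expectation
    delta_ry = [y / m - e for y, m, e in zip(mix, mono, expected_ry)]
    mean_dry = mean(delta_ry)
    mean_m = mean(mono)
    # Population covariance (divide by N), so that CE + SE equals the net effect
    cov = sum((d - mean_dry) * (m - mean_m)
              for d, m in zip(delta_ry, mono)) / n
    ce = n * mean_dry * mean_m   # complementarity effect
    se = n * cov                 # selection effect
    return ce, se

# Hypothetical two-species example, each planted at half density
mono = [100.0, 200.0]
mix = [70.0, 120.0]
ce, se = additive_partition(mono, mix, [0.5, 0.5])
net = sum(mix) - sum(e * m for e, m in zip([0.5, 0.5], mono))
print(ce, se, net)  # CE + SE matches the net effect up to rounding
```

      In this example both species exceed their expected relative yield, so CE is positive, while the less productive species gains relatively more, so SE is negative. Neither number says which biological mechanism produced the pattern, which is the point made in the next paragraph.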

      However, it is very important to understand that CE and SE capture the "statistical structure" of RY that underlies overyielding. Specifically, CE and SE are not the ultimate biological mechanisms that drive overyielding, and never were meant to be. CE also does not describe niche complementarity. Interpreting CE and SE as directly quantifying niche complementarity or resource competition, is simply wrong, although it sometimes is done. The criticism of the AP method thus in large part seems unwarranted. The alternative methods the authors discuss (lines 108-123) are based on very similar principles.

      The authors now set out to develop a method that aims at linking response patterns to "more true" biological mechanisms.

      Assuming that "competitive dominance" is key to understanding mixture productivity, because "competitive interactions are the predominant type of interspecific relationships in plants", the authors introduce "partial density" monocultures, i.e. monocultures that have the same planting density for a species as in a mixture. The idea is that using these partial density monocultures as a reference would allow for isolating the effect of competition by the surrounding "species matrix".

      The authors argue that "To separate effects of competitive interactions from those of other species interactions, we would need the hypothesis that constituent species share an identical niche but differ in growth and competitive ability (i.e., absence of positive/negative interactions)." - I think the term interaction is not correctly used here, because clearly competition is an interaction, but the point made here is that this would be a zero-sum game.

      The authors use the ratio of productivity of partial density and full-density monocultures, divided by planting density, as a measure of "competitive growth response" (abbreviated as MG). This is the extra growth a plant individual produces when intraspecific competition is reduced.

      Here, I see two issues: first, this rests on the assumption that there is only "one mode" of competition if two species use the same resources, which may not be true, because intraspecific and interspecific competition may differ. Of course, one can argue that then somehow "niches" are different, but such a niche definition would be very broad and go beyond the "resource set" perspective the authors adopt. Second, this value will heavily depend on timing and the relationship between maximum initial growth rates and competitive abilities at high stand densities.

      The authors then progress to define relative competitive ability (RC), and this time simply use monoculture biomass as a measure of competitive ability. To standardize this biomass, they express it as the difference from the mean of the other species and then divide by the maximum monoculture biomass of all species.
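
      As a reading aid, the two quantities can be sketched as below. This is my interpretation of the definitions as summarized in this review, not the manuscript's own code: MG is taken as (partial-density yield / full-density yield) / planting proportion, and RC as (monoculture biomass minus the mean of the other species) / maximum monoculture biomass. Function names and numbers are hypothetical.

```python
from statistics import mean

def competitive_growth_response(partial_yield, full_yield, planting_proportion):
    """MG: yield ratio of partial- to full-density monoculture, per unit planting density."""
    return (partial_yield / full_yield) / planting_proportion

def relative_competitive_ability(mono):
    """RC for each species: difference from the mean of the other species,
    scaled by the maximum monoculture biomass across all species."""
    max_m = max(mono)
    rc = []
    for i, m in enumerate(mono):
        others = mono[:i] + mono[i + 1:]
        rc.append((m - mean(others)) / max_m)
    return rc

# Hypothetical values: a species yielding 60 at half density and 100 at full density
print(competitive_growth_response(60.0, 100.0, 0.5))
# Hypothetical monoculture biomasses for three species
print(relative_competitive_ability([100.0, 200.0, 150.0]))
# Changing only the largest value rescales every species' RC
print(relative_competitive_ability([100.0, 400.0, 150.0]))
```

      The last two calls illustrate the scaling concern raised below: changing a single monoculture value (the maximum) alters RC for every species, including those whose own biomass did not change.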

      I have two concerns here: first, if competitive ability is the capability of a species to preempt resources from a pool also accessed by another species, as the authors argued before, then this seems wrong because one would expect that a species can simply be more productive because it has a broader niche space that it exploits. This contradicts the very narrow perspective on competitive ability the authors have adopted. This also is difficult to reconcile with the idea that specialist species with a narrow niche would outcompete generalist species with a broad niche. Second, I am concerned by the mathematical form. Standardizing by the maximum makes the scaling dependent on a single value.

      As a final step, the authors calculate a "competitive expectation" for a species' biomass in the mixture, by scaling deviations from the expected yield by the product MG ⨯ RC. This would mean a species does better in a mixture when (1) it benefits most from a conspecific density reduction, and (2) has a relatively high biomass.

      Put simply, the assumption would be that if a species is productive in monoculture (high RC), it effectively does not "see" the competitors and then grows like it would be the sole species in the community, i.e. like in the partial density monoculture.

      Overall, I am not very convinced by the proposed method.

      (1) The proposed method seems not very systematic but rather "ad hoc". It also is much less a partitioning method than the AP method because the other term is simply the difference. It would be good if the authors investigated the mathematical form of this remainder and explored its properties. When does complementarity occur? Would it capture complementarity and facilitation?

      (2) The justification for the calculation of MG and RC does not seem to follow the very strict assumptions of what competition (in the absence of complementarity) is. See my specific comments above.

      (3) Overall, the manuscript is hard to read. This is in part a problem of terminology and presentation, and it would be good to use more systematic terms for "response patterns" and "biological mechanisms".

      Examples:

      - on line 30, the authors write that CE is used to measure "positive" interactions and SE to measure "competitive interactions", and later name "positive" and "negative" interactions "mechanisms of species interactions". Here the authors first use "positive interaction" as any type of effect that results in a community-level biomass gain, but then they use "interaction" with reference to specific biological mechanisms (e.g. one species might attract a parasite that infests another species, which in turn may cause further changes that modify the growth of the first and other species).

      - on line 70, the authors state that "positive interaction" increases productivity relative to the null expectation, but it is clear that an interaction can have "negative" consequences for one interaction partner and "positive" ones for the other. Therefore, "positive" and "negative" interactions, when defined in this way, cannot be directly linked to "resource partitioning" and "facilitation", and "species interference" as the authors do. Also, these categories of mechanisms are still simple. For example, how do biotic interactions with enemies classify, see above?

      - line 145: "Under the null hypothesis, species in the mixture are assumed to be competitively equivalent (i.e., absence of interspecific interactions)". This is wrong. The assumption is that there are interspecific interactions, but that these are the same as the intraspecific ones. Weirdly, what follows is a description of the AP method, which does not belong here. This paragraph would better be moved to the introduction where the AP method is mentioned. Or omitted, since it is basically a repetition of the original Loreau & Hector paper.

      Other points:

      - line 66: community productivity, not ecosystem productivity.

      - line 68: community average responses are with respect to relative yields - this is important!

      - line 64: what are "species effects of species interactions" ?

      - line 90: here "competitive" and "productive" are mixed up, and it is important to state that "suffers more" refers to relative changes, not yield changes.

      - line 92: "positive effect of competitive dominance": I don't understand what is meant here.

    1. Reviewer #1 (Public Review):

      Summary:

      This paper uses a model of binge alcohol consumption in mice to examine how the behaviour and its control by a pathway from the anterior insular cortex (AIC) to the dorsolateral striatum (DLS) may differ between males and females. Photometry is used to measure the activity of AIC terminals in the DLS when animals are drinking, and this activity seems to correspond to drink bouts in males but not females. The effects appear to be lateralized, with inputs to the left DLS being of particular interest.

      Strengths:

      Increasing alcohol intake in females is of concern and the consequences for substance use disorder and brain health are not fully understood, so this is an area that needs further study. The attempt to link fine-grained drinking behaviour with neural activity has the potential to enrich our understanding of the neural basis of behaviour, beyond what can be gleaned from coarser measures of volumes consumed etc.

      Weaknesses:

      The introduction to the drinking in the dark (DID) paradigm is rather narrow in scope (starting line 47). This would be improved if the authors framed it in the context of other common intermittent access paradigms and gave due credit to the important studies and authors responsible for innovation in this area (particularly the approach introduced by Wise, 1973, and returned to popular use by Simms et al., 2010, and related papers; e.g., Wise RA (1973). Voluntary ethanol intake in rats following exposure to ethanol on various schedules. Psychopharmacologia 29: 203-210; Simms, J., Bito-Onon, J., Chatterjee, S. et al. Long-Evans Rats Acquire Operant Self-Administration of 20% Ethanol Without Sucrose Fading. Neuropsychopharmacol 35, 1453-1463 (2010).) The original drinking in the dark demonstrations should also be referenced (Rhodes et al., 2005). Line 154: Theile & Navarro 2014 is a review and not the original demonstration.

      When sex differences in alcohol intake are described, more care should be taken to be clear about whether this is in terms of volume (e.g. ml) or blood alcohol levels (BAC, or at least g/kg as a proxy measure). This distinction was often lost when lick responses were being considered. If licking is similar (assuming a single lick from a male and female brings in a similar volume?), this might mean males and females consume similar volumes, but females due to their smaller size would become more intoxicated so the implications of these details need far closer consideration. What is described as identical in one measure, is not in another.

      No conclusions regarding the photometry results can be drawn based on the histology provided. Localization and quantification of viral expression are required at a minimum to verify the efficacy of the dual virus approach (the panel in Supplementary Figure 1 is very small and doesn't allow terminals to be seen, and there is no quantification). Whether these might differ by sex is also necessary before we can be confident about any sex differences in neural activity.

      While the authors have some previous data on the AIC to DLS pathway, there are many brain regions and pathways impacted by alcohol and so the focus on this one in particular was not strongly justified. Since photometry is really an observational method, it's important to note that no causal link between activity in the pathway and drinking has been established here.

      It would be helpful if the authors could further explain whether their modified lickometers actually measure individual licks. While in some systems contact with the tongue closes a circuit which is recorded, the interruption of a photobeam was used here. It's not clear to me whether the nose close to the spout would be sufficient to interrupt that beam, or whether a tongue protrusion is required. This detail is important for understanding how the photometry data is linked to behaviour. The temporal resolution of the GCaMP signal is likely not good enough to capture individual licks, but I think more caution or detail in the discussion of the correspondence of these events is required.

      Even if the pattern of drinking differs between males and females, the use of the word "strategy" implies a cognitive process that was never described or measured.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      #1) Summary: The transport of effector proteins across membranes from the producing bacterium into a target cell is at the core of bacterial secretion systems. How an additional layer in the form of a capsule affects effector export and the susceptibility towards effector import is not fully understood. Here, Flaugnatti and colleagues combined bacterial genetics with phenotypic assays and electron microscopy to demonstrate a dual role of a bacterial capsule in preventing T6SS-mediated effector export and promoting protection from effector import by another bacterium's T6SS. The wide variety of methods used, complementation of the mutants, and validation of the findings across strains strengthen the authors' conclusions. Although the main conclusions seem straightforward, the authors unravel the unexpected complexity underlying these phenotypes with strong mechanistic work. In brief, a capsule-deficient mutant (∆itrA) is shown to assemble its T6SS similar to the WT, yet secretes more Hcp than the WT and is better in T6SS-mediated killing of other bacteria. A capsule-overproducing mutant (∆bfmS) shows both a partial deficiency in T6SS assembly and an additional reduction in exported Hcp, and is worse in T6SS-mediated killing than the WT. A mutant with a capsule similar to WT and deficient in cell sensing (∆tslA) forms the fewest T6SS apparatuses and is yet better in T6SS-mediated killing than the overcapsulated mutant. Together, these data show an effect of the capsule on (i) T6SS apparatus assembly, (ii) effector export, (iii) effector import, and (iv) the need for clearance of accumulating non-secreted Hcp by ClpXP. The work on a clinical isolate of Acinetobacter baumannii and the data on an impaired T6SS activity on other cells by antibiotic-induced capsulation is a strong demonstration of the work's clinical relevance in addition to the findings' conceptual novelty.

      In my view, the manuscript is outstanding, with very high quality of experimental data, very well written text and very clear presentation of the data in figures. A few minor comments and suggestions follow that I think would strengthen the manuscript.

      Authors’ reply #1: We thank the reviewer for their enthusiasm.


      Major comment:

      #2) OPTIONAL: Fig. 4c/l. 320: Having an indirect effect of an antibiotic on T6SS activity by antibiotic-induced capsule formation is very intriguing and contributes to the clinical relevance of the overall findings. When I saw the data in Fig. 4c, the graph instantaneously reminded me of the panel in Fig. 2a, where a similar phenotype is observed by changing the predator:prey ratio in the absence of any antibiotic. The authors themselves comment on the possibility of antibiotic-induced, reduced predator growth (and thereby a change in predator:prey ratio) as one factor impacting the phenotype here. I am wondering if this data could be strengthened or better disentangled to test more precisely if it is the antibiotic-induced capsule formation per se that affects T6SS-mediated killing by A. baumannii in the presence of antibiotics. Would it help to take the bfmS mutant along as a control for direct comparison to see if antibiotic-induced capsule formation of the WT to similar levels of the mutant results in the same killing phenotype? Would it help to test for T6SS-mediated killing in the presence and absence of antibiotics at multiple predator:prey ratios? Could the effect of the antibiotic on A. baumannii growth be measured and considered when choosing the ratio at which the bacteria are mixed?

      Authors’ reply #2: The point raised by the reviewer is very important. As we have stated in the manuscript, inducing capsule production with antibiotics impacts the growth of A. baumannii and could therefore change the predator-prey ratio, potentially affecting the observed phenotype. However, the antibiotic is expected to equally impact the non-encapsulated ΔitrA strain, yet this strain maintains very strong T6SS killing activity in the presence of chloramphenicol. Thus, we do not believe the predator-prey ratio is causing the observed effect. To address this point more directly, we nonetheless propose to: i) repeat the experiments with different predator-prey ratios (1:1, 2:1, and 5:1), and ii) include a bfmS mutant as a control.

      Minor comments:

      #3) Figure 1D, l. 155: I might have missed this, but do the authors happen to have the numbers of E. cloacae as well? This would strengthen the claim that A. baumannii survives because E. cloacae is being killed.

      Authors’ reply #3: The reviewer is correct; we did not include the survival of E. cloacae in the initial manuscript due to technical reasons (counter-selection of E. cloacae). However, we propose to repeat the experiment using an E. cloacae strain carrying a plasmid conferring kanamycin resistance. This will allow us to counter-select E. cloacae after contact with the A. baumannii predator to determine if E. cloacae is killed by A. baumannii in a T6SS-dependent manner.


      #4) Figure 2: I suggest writing out the species name of the prey in the box with the ratio. With E. cloacae being referred to in the previous figure and starting with the same letters as E. coli, I wasn't sure at first sight what E. c. refers to.

      Authors’ reply #4: We appreciate the comment and will revise the figure as suggested.

      #5) use of the term "T6SS activity" throughout the manuscript (e.g. l. 182, l. 187). I leave this up to the authors. To me, it seems like an umbrella term for the initial observation and I see that such a term can be very handy for the writing. I just would like to mention that the use of the term was not always intuitive to me and sometimes even a bit misleading. For example, l. 182 refers to "increased T6SS activity". As a reader, I only know about 'T6SS activity on other cells' or 'a T6SS-mediated effect on other cells' at this point. T6SS apparatus assembly/firing activity is tested for specifically later and it turns out to differ between mutants. By the time the term is used in the discussion, it captures multiple nuanced phenotypes described by then. The more precise definition of the term in l. 200 helped to capture what exactly is meant by the authors.

      Authors’ reply #5: We propose rephrasing the sentences to include the term "T6SS-secretion activity" when referring to Hcp secretion assays and "T6SS-mediated killing activity" when referring to killing experiments.

      #6) l. 198-199 "Collectively, our findings indicate that CPS does not hinder the secretion process of the T6SS or the consequent elimination of competing cells". I might be missing something, but I cannot entirely follow this sentence. Didn't the authors just show that the CPS does hinder T6SS-mediated elimination of competing cells in panel 2A, and that less Hcp is secreted by the encapsulated WT than by the non-encapsulated mutant in panel 2B?

      Authors’ reply #6: We thank the reviewer for this comment. We realize that the sentence wasn’t well phrased, resulting in confusion. What we meant was that the T6SS is functional regarding its T6SS-mediated killing and secretion in the WT strain, while we also showed that the non-capsulated strain kills and secretes more T6SS material in the supernatant. Thus, there seems to be a balance between capsule production and T6SS activity in the WT. We will revise the sentence to better reflect this meaning.

      #7) l. 224, typo, "in"

      Authors’ reply #7: We will correct this typo. Thank you.


      #8) Two connected comments: l. 338, Just a thought, I am wondering about the title of the section. After reading it a second time, I think it is technically correct. When reading it first, I was a bit confused when getting to the data because apparatus assembly is impaired in the capsule-overproducing strain and although "preserved", doesn't the data indicate that there is less T6SS assembly in the bfmS mutant and that this might be because of less cell sensing and isn't this a main point that there is a difference in apparatus assembly in the capsule overproducing strain compared to WT (other than no difference in apparatus assembly in the strain without capsule)? To me it seems not fully acknowledged as a finding in the interpretation of the data that fewer cells of the bfmS mutant have a T6SS apparatus. Isn't that interesting? A title along the lines of "Capsule-overproducing strain has preserved sensory function and assembles fewer T6SS apparatuses" would have been more intuitive for me. l. 352, In case I didn't miss a reference to this data earlier in the manuscript, I am wondering if it would be worth mentioning the finding on the reduced apparatus assembly of the bfmS mutant earlier, together with Figure 3 already. At least a sentence that mentions already that there is more coming later. When I got to this line in the manuscript and read the findings on the apparatus assembly, I first needed to go back to figure 3 and look at the data there again in light of this finding. It is mentioned here on the side but I think very important for the interpretation of the phenotypic data of the bfmS mutant shown earlier, isn't it? The tslA mutant is used beautifully here.

      Authors’ reply #8: We thank the reviewer for the suggestion and the kind comment about the beautiful usage of the tslA mutant. We will modify the title of the corresponding paragraph as suggested to make it more intuitive.

      Regarding the comment about mentioning the T6SS apparatus assembly defect in the bfmS mutant earlier, we respectfully disagree. While we agree that this point is important and can partially explain the difference in killing activity, we believe that showing it together with the tslA mutant (Figure 5) makes more sense and is easier for the reader to understand.

      #9) Discussion: optional comment. On the one hand, I like the concise discussion. On the other hand, I see more potential here for bringing it all together (potentially at the expense of shortening some of the introduction). I think the subtleties of the findings are complex. For example, I could envision a graphical summary with a working model on all the effects of a capsule on the T6SS and its potential clinical relevance making the study accessible to even more readers.

      Authors’ reply #9: In the revised manuscript, we will include a graphical summary/model.


      Significance

      #10) General assessment: I consider the story very strong in terms of novelty, experimental approaches used, quality of the data, quality of the writing and figures of the manuscript. In my view, the aspects that could be improved are optional/minor and concern only one figure and some phrasing.

      Advance: I see major advance in the findings (i, mechanistic) on the mechanism of how the capsule interferes with T6SS, (ii, fundamental) on the discovery of ClpXP degrading Hcp, and (iii, clinical) on the meaning of antibiotic treatment for the T6SS of this clinically relevant and often multi-drug resistant bacterial species, which strongly complements existing work on the T6SS and antibiotics in A. baumannii (e.g. of the Feldman group). As the authors write themselves, the starting points of the study of a capsule protecting from a T6SS and the effect of a T6SS on other cells being negatively impacted by a capsule were known, although not studied in one species and not understood mechanistically.

      Audience: I see the results being of interest to a broad audience in the fields of bacteria-bacteria interactions, Acinetobacter baumannii, type VI secretion, antimicrobial resistance, and bacterial capsules.

      Authors’ reply #10: We once again thank the reviewer and highly appreciate their positive and constructive feedback on our work. We hope the reviewer will be satisfied with the revised version of our manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      #11) In the manuscript by Flaugnatti et al., the authors provide clear evidence of the interplay between capsule outer coat production and the Type VI secretion system (T6SS) in Acinetobacter baumannii. The authors demonstrate that the presence of the capsule or the activity of the T6SS enhances survival against attacking bacteria. However, they also show that in their model bacterium, the (over)production of the capsule likely hinders T6SS dynamics, thereby reducing overall killing efficiency. Additionally, they reveal that the amount of the T6SS component Hcp is regulated in cells that can no longer assemble and/or secrete via the T6SS, presumably by the ClpXP protease. Overall, the experiments are well designed, and most conclusions are supported by the data and appropriate controls. I have however some suggestions that could further strengthen the manuscript prior to publication.

      Authors’ reply #11: We are grateful for the reviewer’s enthusiasm and will implement their comments and suggestions in the revised version of the manuscript.


      Major comments:

      #12) Line 164. The authors use E. coli as prey to test the T6SS activity of A. baumannii. Why not directly use the E. cloacae strain (with or without T6SS) for this purpose? This would provide direct evidence that A. baumannii uses its T6SS to kill E. cloacae, thus confirming the authors' conclusions in this section.

      Authors’ reply #12: We thank the reviewer for this comment. We used E. coli to assess the functionality of the T6SS in different strains of A. baumannii, as it is commonly done in the T6SS field. However, as suggested by reviewer 1 (see comment #3) and in response to this query, we will also provide survival data of E. cloacae in the revised manuscript using a plasmid-carrying E. cloacae derivative that allows direct selection.

      #13) In Figure 2, the authors show that a non-capsulated strain kills more effectively and secretes more than a WT, but has a similar number of T6SS. They suggest in their conclusion that "the observed increase in T6SS activity in the non-capsulated strain suggests a compensatory mechanism for the absence of the protective capsule layer." This conclusion implies the presence of an "active" regulatory mechanism that would increase the number of successful T6SS firing events, which has not been demonstrated. Could it not simply be that the capsule blocks some shots that cannot penetrate and are therefore ineffective? This hypothesis is mentioned in lines 204-208. The authors should clarify the conclusion of this section. Given the challenge this may pose in A. baumannii, I suggest that the authors quantify the assembly/firing dynamics of the T6SS under WT and ΔitrA conditions. This would help distinguish between the two hypotheses explaining better firing in non-capsulated cells: i.e., if the number of assembled T6SS is the same in both strains (Fig 2C & 2D), do non-capsulated cells assemble/fire faster, indicating an adaptation in regulation, or do we observe the same dynamics, suggesting a simple physical barrier blocking the passage of certain T6SS firing events?

      Authors’ reply #13: We realize that the sentence, and more specifically the word "compensatory," might have been misleading and thank the reviewer for bringing this to our attention. What we meant to convey is that there is a balance between capsule production and T6SS activity; if disturbed, the balance shifts in one direction or the other. Specifically, there is more protection through the production of a thicker capsule (e.g., in the ∆bfmS mutant or under sub-MIC conditions of antibiotics, regulated by the Bfm system, as mentioned in the text) or more T6SS activity when less capsule is present (e.g., in the ΔitrA mutant, which we propose is caused by the lack of steric hindrance). We will rephrase this sentence in the revised manuscript to better convey this message.

      Regarding the quantification of T6SS dynamic assembly/firing events between the capsulated (WT) and non-capsulated (ΔitrA) strains, we do not think this is required for this study, as the amount of secreted Hcp reflects the overall activity of the system. Importantly, we also do not have the technical means to quantify assembly/firing rates under Biosafety 2 conditions, as this requires specialized microscopes with very fast acquisition options (see, for instance, Basler, Pilhofer et al., 2012, Nature). Indeed, very few labs in the T6SS field have been able to measure such rates.

      #14) Line 428-429. It is mentioned that the deletion of lon does not have a notable effect. However, I observe that the absence of Lon alone causes a more rapid degradation of Hcp in the cells compared to the WT strain (Fig 7B). How do the authors explain that the absence of this protease (whether under conditions of Hcp accumulation or not) increases the degradation of this protein in the cell? This explanation should be included in the manuscript.

      Authors’ reply #14: That’s a fair point. We didn’t address this point further, as the deletion of lon didn’t resolve the issue of why Hcp is degraded. In fact, the opposite seems to be the case, as there is less Hcp in the ∆lon strain compared to the WT. While this observation is not directly relevant to the question of why Hcp is degraded late during growth in secretion-impaired strains, we will properly mention it in the revised manuscript.

      Please also note that a strong growth defect of a ΔlonΔclpXP double mutant impaired further investigation in this direction.

      Minor comments:

      #15) Throughout the manuscript, the authors use the term "predator" to refer to A. baumannii. Predation is a specific phenomenon that involves killing for nourishment. To my knowledge, the T6SS has never been shown to be a predation weapon but rather a weapon for interbacterial competition, which is a different concept. If this has not been demonstrated in A. baumannii, the authors should replace the term "predator" with "attacker" (or an equivalent term) to clarify the context.

      Authors’ reply #15: We thank the reviewer for this comment. The term “predator,” as highlighted by the reviewer, typically implies killing for nourishment/cellular products. In the context of T6SS, it facilitates the killing of competitors, releasing DNA into the environment that can subsequently be acquired through natural competence for transformation, as observed in species like Vibrio cholerae (our work by Borgeaud et al., 2015, Science) or other Acinetobacter species such as Acinetobacter baylyi (Ringel et al., 2017, Cell Rep.; Cooper et al., 2017, eLife). The acquisition of DNA reflects the killing for cellular products of the prey. As most A. baumannii strains are also naturally competent, this justifies the usage of the predator and prey nomenclature.

      Apart from this fact, it seems to be a matter of nomenclature, with many papers in the field using one term or the other. Yet, ultimately, this doesn’t change any of the scientific findings. Therefore, to satisfy the reviewer, we will change “predator” to “attacker” throughout the revised manuscript.

      #16) Line 274. Since the authors stated that in the Wzc mutant, the capsule is "predominantly found in the supernatant and only loosely attached to the cell," this result is not unexpected. This finding is also consistent with the previous results from Fig. 3A & B, which show sensitivity to complement-mediated killing and the weak amount of (ab)normal CPS produced in that strain, further confirmed by Fig. 3E.

      Authors’ reply #16: We fully agree with the reviewer’s suggestion and will remove the statement.

      #17) Line 299. The authors speculate that "... T6SS may deploy through gaps akin to arrow-slits in the capsule's mesh...". However, this is very unlikely since a WT strain kills (Fig. 3C) and secretes (Fig. 2B & 3D) less effectively than the itrA mutant, suggesting that the T6SS is not assembled in the "right places" devoid of CPS; otherwise, we would expect similar T6SS activity. Based on the results in Fig. 2 (and my earlier comment), this implies that A. baumannii assembles its T6SS randomly, and in the presence of the capsule, its shots would need to be in the right place to penetrate the envelope and reach the target. Could the authors comment on this point and provide a model figure to better visualize the interplay between the capsule and T6SS under the three major conditions: WT, non-capsulated, and capsule overproduction?

      Authors’ reply #17: We thank the reviewer and agree with their comment. We discussed the hypothesis of T6SS deployment through gaps, drawing a parallel to what was proposed for biofilm and T6SS in V. cholerae (Toska et al., 2018, PNAS). However, as mentioned earlier, we believe that the effect of the capsule on T6SS activity is primarily due to steric hindrance, which increases the distance between the T6SS apparatus and the prey cell. To clarify our findings further, we will include a model summarizing our results, as requested by reviewer 1 (see comment #9).


      #18) In Fig. 5A, the microscopy panels should be adjusted to the same dynamic range as the WT (which represents a true signal), which does not appear to be the case for the tslA mutant panel, for instance. The image gives the impression of a large amount of free TssB-msfGFP in the cytoplasm. However, this effect is due to the dynamic range being adjusted to display a signal. This observation is consistent with the fact that the amount of TssB-msfGFP protein is identical across all strains (Fig. S2F).

      Authors’ reply #18: We will adjust the images to the range of the WT in the revised manuscript, as suggested. However, regardless of how these images are presented, the enumeration of T6SS structures will remain unchanged, which was the sole point of this experiment.


      #19) Unless I am mistaken, the authors do not comment on the fact that in a ΔbfmS strain, the number of T6SS is halved compared to a WT or ΔitrA strain. If capsule overproduction only partially limits the TslA-dependent T6SS assembly, how can this result be explained? Is it related to the degradation of Hcp in this background, which ultimately limits the formation of T6SS? If so, it would be interesting to mention this connection in the section "Prolonged secretion inhibition triggers Hcp degradation".

      Authors’ reply #19: We did mention that the T6SS assembly of the ΔbfmS mutant is reduced compared to the WT (or ΔitrA), likely due to the defect in sensing the prey (lines 369-374 and 468-472 of the initial manuscript). However, we will revise the sentence to improve clarity in the revised version of the manuscript.

      Significance

      #20) This work is highly intriguing as it not only delves into the specific mechanisms involved but also connects fundamental elements in bacterial competition, i.e., the necessity for self-protection and aggression for survival. The manuscript offers valuable insights into cellular dynamics at a microscale level and prompts new inquiries into the regulation of these systems on a population scale. The work is well-done and the writing is also clear. I am convinced that this work represents another significant step towards understanding bacterial mechanisms and will undoubtedly spark considerable interest in the field.

      Authors’ reply #20: We sincerely thank reviewer #2 for their constructive inputs, which will improve our manuscript.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      #21) The manuscript by Flaugnatti et al. investigates the relationship between functions of the T6SS in A. baumannii and production of capsular polysaccharide. The manuscript argues that (1) capsule protects A. baumannii against T6SS-mediated attack by other bacteria, (2) capsule also interferes with the bacterium's own T6SS activity, and (3) the T6SS inner tube protein Hcp is regulated by degradation by ClpXP. The main critiques concern the first two conclusions, which seem to be based solely on the use of a mutant that has a confounding effect, as described below. The third claim could be strengthened by further exploring the results of overexpressing Hcp and by determining whether there is a fitness benefit to Hcp regulation.

      Authors’ reply #21: We thank reviewer #3 for their relevant input. We will conduct additional experiments based on their comments, and these will be incorporated into the revised manuscript.


      Main items:

      #22) Throughout the paper, an itrA deletion mutant is used as the capsule-deficient strain and conclusions are drawn about the role of capsule based on this mutant. However, itrA deletion also eliminates the protein O-glycosylation pathway (Lees-Miller et al., 2013), a potential confounder. Analysis of mutants specifically deficient in the high-molecular weight capsule but not protein glycosylation, and/or mutants in the protein O-glycosylation enzyme, should be incorporated into the study to enhance the ability to make conclusions about the role of the capsule.

      Authors’ reply #22: Fair point. We thank the reviewer for this important suggestion. To distinguish between the O-glycosylation pathway and capsule production, we will generate a ∆pglL strain (specific to O-glycosylation), as suggested, and will repeat the key experiments (similar to Fig. 2A and 2B). We are almost done with the engineering of this mutant strain and therefore don’t expect any major delays.

      #23) Evidence could be provided to support the idea raised in lines 482-483 that T6SS component accumulation is toxic ("degradation [of T6SS components] could serve as a strategy to alleviate proteotoxic stress..."). For example, growth curves of ∆clpXP strains with and without hcp could be analyzed, to determine how degrading Hcp is helping the bacteria.

      Authors’ reply #23: We will perform growth curves of ΔclpXP strains with and without hcp, as suggested by the reviewer. However, we are uncertain whether we will be able to observe differences between these strains, as the conditions under which such degradation is significant may be challenging to replicate under standard laboratory conditions.

      #24) The possible ClpXP recognition sequence identified at the C terminus of Hcp is interesting: does overexpression of an Hcp variant lacking or altered in this motif change its protein levels compared to WT Hcp?

      Authors’ reply #24: We thank the reviewer for this suggestion. We are in the process of performing the suggested experiment and will include the data in the manuscript.

      Minor items:

      #25) A better explanation could be provided for why overexpressing hcp in WT but not in ∆hcp leads to increased Hcp protein levels. There is a statement about Hcp being regulated post-transcriptionally, possibly by degradation (lines 422-423), but would that not also result in regulation in the WT strain?

      Authors’ reply #25: The reviewer is absolutely correct here. Despite careful genetic engineering, we believe that the hcp mutant used may have a polar effect, causing Hcp accumulation only in the ∆hcp + p-hcp strain but not in the WT + p-hcp strain, which remains capable of secretion. The ∆hcp strain therefore mimics the secretion-impaired tssB mutant. We will clarify this in the revised manuscript.

      #26) *An untreated control is needed in Fig. 4B. *

      __ Authors’ reply #26: __The untreated samples were shown in all previous figures. However, we understand the reviewer's point and will repeat the experiment with the untreated control included in the same experiment.

      #27) *line 179: please clarify "reflecting better invading bacteria" *

      __ Authors’ reply #27: __We appreciate the reviewer mentioning this oversight. We meant to compare this to a situation where a bacterium invades an already existing community, resulting in a predator-prey ratio below 1. We will clarify this further in the revised manuscript.

      #28) *line 351: consider rewording the statement that ∆tslA results in decreased T6SS assembly and activity using the tssB-msfGFP microscopy assay; it is not clear that activity is measured in this assay. *

      __ Authors’ reply #28: __The reviewer is correct. We will revise the sentence accordingly to better reflect the T6SS assembly.

      #29) *lines 260-265: This experiment could use clarifying, but it would seem that it requires analysis of the secreted capsule levels in the tssB mutant to show it does not produce extracellular capsule to the same extent that ∆bfmS does. *

      __ Authors’ reply #29: __We thank the reviewer for the suggestion and will include these experimental data in the revised manuscript.

      #30) *Fig. 6C and 7A labelling could be improved to avoid potential confusion that the bar graphs are quantifying the western blot. E.g., could add a corresponding vertical label to the Western data, or consider changing "relative expression of hcp" to something reflecting analysis of transcript levels. *

      __ Authors’ reply #30: __We will improve this figure by splitting the qPCR and Western blot data into independent panels. This will eliminate any confusion.


      #31) lines 416-417 and Fig. 7A: states that "hcp mRNA levels increased significantly", but more careful wording could be used because the WT's transcript change is not significant after overexpression (though it is significant in ∆hcp).

      __ Authors’ reply #31: __Point well taken. We will improve the sentence (and Figure) to make its meaning unambiguous.

      #32) lines 479-480 states that in secretion-impaired strains accumulation of Hcp is mitigated by ClpXP; while this was shown for ∆tssB, was this also the case for ∆bfmS?

      __ Authors’ reply #32: __This is indeed an interesting suggestion. We are in the process of generating the double mutant ∆bfmS∆clpXP and will include the experimental results in the revised manuscript.


      Significance

      #33) *The strengths of the study are the focus on a clinically significant pathogen, the potential novel roles for the important capsule virulence factor of A. baumannii, and the identification of novel points of control of the T6SS. The analyses of T6SS function are thorough and carefully performed. *

      __ Authors’ reply #33: __We thank the reviewer for their comments, which we believe will significantly strengthen our work, particularly regarding the capsule aspect.

    1. Author response:

      eLife assessment

      This valuable study uses single-cell transcriptomics to explore the mouse vomeronasal organ and represents an advance that enhances our understanding of neural diversity within this sensory system. Findings suggest a unique endoplasmic reticulum (ER) structure in Gnao1 neurons and allow for the synthesis of a developmental trajectory from stem cells to mature vomeronasal sensory neurons. Convincing methods, data, and analyses broadly support the claims, although experiments supporting the main ER-related claim are incomplete and lack quantification of co-expression and statistics on labeling intensity or coverage. Adding these data would greatly strengthen the conclusions of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Devakinandan and colleagues present a manuscript analyzing single-cell RNA-sequencing data from the mouse vomeronasal organ. The main advances in this manuscript are to identify and verify the differential expression of genes that distinguish apical and basal vomeronasal neurons. The authors also identify the enriched expression of ER-related genes in Gnao1 neurons, which they verify with in situ hybridizations and immunostaining, and also explore via electron microscopy. Finally, the results of this manuscript are presented in an online R shiny app. Overall, these data are a useful resource to the community. I have a few concerns about the manuscript, which I've listed below.

      General Concerns:

      (1) The authors mention that they were unable to identify the cells in cluster 13. This cluster looks similar to the "secretory VSN" subtype described in a recent preprint from C. Ron Yu's lab (10.1101/2024.02.22.581574). The authors could try comparing or integrating their data with this dataset (or that in Katreddi et al. 2022) to see if this is a common cell type across datasets (or arises from a specific type of cell doublets). In situ hybridizations for some of the marker genes for this cluster could also highlight where in the VNO these cells reside.

      Cluster 13 (Obp2a+) cells identified in our study share gene expression markers with the “putative secretory” cells described in the Hills et al. manuscript. By the time that manuscript became publicly available, our own manuscript had already been finalized and communicated. We welcome the suggestion to integrate the data, which we will attempt in our revision.

      (2) I found the UMAPs for the neurons somewhat difficult to interpret. Unlike Katreddi et al. 2022 or Hills et al. 2024, it's tricky to follow the developmental trajectories of the cells in the UMAP space. Perhaps the authors could try re-embedding the data using gene sets that don't include the receptors? It would also be interesting to see if the neuron clusters still cluster by receptor-type even when the receptors are excluded from the gene sets used for clustering. Plots relating the original clusters to the neuronal clusters, or dot plots showing marker gene expression for the neuronal clusters might both be useful. For example, right now it's difficult to interpret clusters like n8-13.

      We will re-present the UMAPs to make the developmental trajectory clearer. How neuron clusters are affected by the inclusion or exclusion of receptors is an interesting question that we will address in our revision, along with showing markers of each neuronal cluster, as suggested by the reviewer.

      Reviewer #2 (Public Review):

      Summary:

      The study focuses on the vomeronasal organ, the peripheral chemosensory organ of the accessory olfactory system, by employing single-cell transcriptomics. The author analyzed the mouse vomeronasal organ, identifying diverse cell types through their unique gene expression patterns. Developmental gene expression analysis revealed that two classes of sensory neurons diverge in their maturation from common progenitors, marked by specific transient and persistent transcription factors. A comparative study between major neuronal subtypes, which differ in their G-protein sensory receptor families and G-protein subunits (Gnai2 and Gnao1, respectively), highlighted a higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. Moreover, distinct differences in ER content and ultrastructure suggest some intriguing roles of ER in Gnao1-positive vomeronasal neurons. This work is likely to provide useful data for the community and is conceptually novel with the unique role of ER in a subset of vomeronasal neurons. This reviewer has some minor concerns and some suggestions to improve the manuscript.

      Strengths:

      (1) The study identified diverse cell types based on unique gene expression patterns, using single-cell transcriptomics.

      (2) The analysis suggests that two classes of sensory neurons diverge during maturation from common progenitors, characterized by specific transient and persistent transcription factors.

      (3) A comparative study highlighted differences in Gnai2- and Gnao1-positive sensory neurons.

      (4) Higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons.

      (5) Distinct differences in ER content and ultrastructure suggest unique roles of ER in Gnao1-positive vomeronasal neurons.

      (6) The research provides conceptually novel findings on the unique role of ER in a subset of vomeronasal neurons, offering valuable insights to the community.

      Weaknesses:

      (1) The connection between observations from sc RNA-seq and EM is unclear.

      (2) The lack of quantification for the ER phenotype is a concern.

      We would like to point out that the connection between scRNA-seq and EM was made in our experiments that investigated the localization of ER proteins via IHC (Figure 5). The intriguing observation that the levels of a number of ER luminal and membrane proteins were higher in Gnao1 than in Gnai2 neurons led us to hypothesize a differential ER content or ultrastructure, which was verified by EM. Quantification of the ER phenotype would definitely strengthen our observations, and we will add it to our revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Devakinandan and colleagues have undertaken a thorough characterization of the cell types of the mouse vomeronasal organ, focusing on the vomeronasal sensory neurons (VSNs). VSNs are known to arise from a common pool of progenitors that differentiate into two distinct populations characterized by the expression of either the G protein subunit Gnao1 or Gnai2. Using single-cell RNA sequencing followed by unsupervised clustering of the transcriptome data, the authors identified three Gnai2+ VSN subtypes and a single Gnao1+ VSN type. To study VSN developmental trajectories, Devakinandan and colleagues took advantage of the constant renewal of the neuronal VSN pool, which allowed them to harvest all maturation states. All neurons were re-clustered and a pseudotime analysis was performed. The analysis revealed the emergence of two pools of Gap43+ clusters from a common lineage, which differentiate into many subclusters of mature Gnao1+ and Gnai2+ VSNs. By comparing the transcriptomes of these two pools of immature VSNs, the authors identified a number of differentially expressed transcription factors in addition to known markers. Next, by comparing the transcriptomes of mature Gnao1+ and Gnai2+ VSNs, the authors report the enrichment of ER-related genes in Gnao1+ VSNs. Using electron microscopy, they found that this enrichment was associated with specific ER morphology in Gnao1+ neurons. Finally, the authors characterized chemosensory receptor expression and co-expression (as well as H2-Mv proteins) in mature VSNs, which recapitulated known patterns.

      Strengths:

      The data presented here provide new and interesting perspectives on the distinguishing features between Gnao1+ and Gnai2+ VSNs. These features include newly identified markers, such as transcription factors, as well as an unsuspected ER-related peculiarity in Gnao1+ neurons, consisting of a hypertrophic ER and an enrichment in ER-related genes. In addition, the authors provide a comprehensive picture of specific co-expression patterns of V2R chemoreceptors and H2-Mv genes.

      Importantly, the authors provide a browser (scVNOexplorer) for anyone to explore the data, including gene expression and co-expression, number and proportion of cells, with a variety of graphical tools (violin plots, feature plots, dot plots, ...).

      Weaknesses:

      The study still requires refined analyses of the data and rigorous quantification to support the main claims.

      The method description for filtering and clustering single-cell RNA-sequencing data is incomplete. The Seurat package has many available pipelines for single-cell RNA-seq analysis, with a significant impact on the output data. How did the authors pre-process and normalize the data? Was the pipeline used with default settings? What batch correction method was applied to the data to mitigate possible sampling or technical effects? Moreover, the authors do not describe how cell and gene filtering was performed.

      The data in Figure 7-Supplement 3 show that one-sixth of the V1Rs do not express any chemoreceptor, while over a hundred cells express more than one chemoreceptor. Do these cells have unusually high or low numbers of genes or counts? To exclude the possibility of a technical artifact in these observations, the authors should describe how they dealt with putative doublet cells or debris.

      Surprisingly, some clusters are characterized by the expression of specific chemoreceptors (VRs). Have these been used for clustering? If so, clustering should be repeated after excluding these receptors.

      The identification of the VSN types should be consistent across the different analyses and validated. The data presented in Figure 1 lists four mature VSN types, whereas the re-clustering of neurons presented in Figure 3 leads to a different subdivision. At present, it remains unclear whether these clusters reflect the biology of the system or are due to over-clustering of the data, and therefore correspond to either noise or arbitrary splitting of continua. Clusters should be merged if they do not correspond to discrete categories of cells, and correspondence should be established between the different clustering analyses. To validate the detected clusters as cell types, markers characteristic of each of these populations can be evaluated by ISH or IHC.

      There is a lack of quantification of imaging data, which provides little support for the ER-related main claim. Quantification of co-expression and statistics on labeling intensity or coverage would greatly strengthen the conclusions and the title of the paper.

      scRNA-seq data analysis methods: We agree with the reviewer and will elaborate on the various criteria, parameters and methods in our revision. As described above, our revised manuscript will include an analysis of how inclusion or exclusion of VRs affects cell clusters, as well as quantification of the ER phenotype. We will also address the reviewer’s concern about over-clustering.

      We think that the cells expressing zero as well as two V1Rs are real and cannot be attributed to debris or doublets for the following reasons:

      a) Cells expressing no V1Rs are not necessarily debris because they express other neuronal markers at the same level as cells that express one or two V1Rs. Higher expression threshold values used in our analysis may have somewhat increased the proportion of cells with zero V1Rs. We will modify figure 7-supplement 3c to add another group showing Gnai2 level in cells expressing zero V1Rs.

      b) Cells co-expressing V1R genes: We listed the frequency of cells co-expressing V1R gene combinations in Supplementary table - 8. Among 134 cells that express two V1Rs, 44 cells express Vmn1r85+Vmn1r86, 21 express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, 6 express Vmn1r168+Vmn1r177, and so on. Doublets are generally a random combination of two cells. Here, each specific co-expression combination is represented by multiple cells, which is highly unlikely to arise by chance. Some of the co-expression combinations were identified earlier and verified experimentally in Lee et al., 2019 and Hills et al. Furthermore, Figure 7-supplement 3c shows that the level of Gnai2 expression is comparable across cells expressing one or two V1Rs. If the cells expressing two V1Rs were doublets, we would expect their level of Gnai2 to be higher than in cells expressing a single V1R. We will elaborate on this in our revised manuscript.
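      The chance argument in (b) can be made concrete with a rough calculation. The sketch below is illustrative only and is not the authors' analysis: it takes the pair counts quoted above and asks how likely each count would be if all 134 two-receptor cells were random doublets. It assumes a hypothetical number of equally abundant V1R genes (`N_RECEPTORS_ASSUMED`, a value not given in the response) and uses a simple binomial model; real receptor frequencies are uneven, so the numbers indicate scale only.

```python
import math

# Counts quoted in the response (Supplementary table 8 of the manuscript).
CELLS_WITH_TWO_V1RS = 134
PAIR_COUNTS = {
    "Vmn1r85+Vmn1r86": 44,
    "Vmn1r184+Vmn1r185": 21,
    "Vmn1r56+Vmn1r57": 13,
    "Vmn1r168+Vmn1r177": 6,
}

# ASSUMPTION (not from the source): number of distinct, equally abundant
# V1R genes. Chosen only to illustrate the scale of the argument.
N_RECEPTORS_ASSUMED = 100


def random_pair_probability(n_receptors):
    """Probability that a random doublet of two cells carrying different
    receptors shows one specific unordered receptor pair, assuming all
    receptors are equally abundant."""
    return 1.0 / math.comb(n_receptors, 2)


def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    largest = max(logs)
    return math.exp(largest) * sum(math.exp(v - largest) for v in logs)


if __name__ == "__main__":
    p = random_pair_probability(N_RECEPTORS_ASSUMED)
    for pair, count in PAIR_COUNTS.items():
        tail = binom_tail(count, CELLS_WITH_TWO_V1RS, p)
        print(f"{pair}: {count} cells, P(>= {count} by chance) = {tail:.3g}")
```

      Under these assumptions even the least frequent listed pair would be very improbable as a product of random doublets. A more faithful version would replace the equal-abundance assumption with the observed per-receptor cell frequencies.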

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We thank all three reviewers for their insightful comments. Based on this feedback, we have performed additional experiments, and revised our manuscript. Below, we address each comment and describe the revisions.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Ponomarova et al. showed that a neomorphic idh-1 mutation results in increased levels of cellular D-2HG. The authors compared the high D-2HG phenotypes with those of a D-2HG dehydrogenase mutant and identified vitamin B12-dependent vulnerability differences. Downregulating genes of the glycine cleavage system, which produces one-carbon donor units, exacerbates the phenotypes, while adding one-carbon donors suppresses them. They concluded that the idh-1neo mutation imposes a dependency on the one-carbon pool. The manuscript is very interesting, but I think it should be modified to be clearer for broad audiences.

      Concerns: The authors mention a number of examples for metabolic changes of D-2HG in the first paragraph of introduction. I think that a metabolic map explaining the changes helps readers to understand the questions proposed by the authors.

      Thank you for this suggestion. A figure illustrating the contributing factors in D-2HG metabolism has been added to the manuscript (Figure 1A).

      The authors say that D-2HG affects carcinogenesis in many ways, citing previous works. They should say a higher concentration of D-2HG does affect carcinogenesis or not in dhgd loss of function, if they assume the concentration is most important for carcinogenesis.

      Thank you for pointing this out. We have added this information in lines 70-72 of the revised manuscript: "Increased levels of D-2HG caused by the inhibition of D-2-hydroxyglutarate dehydrogenase activity have also been associated with different cancers (PMID: 29339485, PMID: 34296423, PMID: 35007759)."

      Line 110, mode should be read as model, I guess.

      Thank you - we have corrected this error.

      In Figure 4C, concentrations of formate are shown as 0, 20, 40, 80, 160 mM. Is this correct? Such high concentrations of substrate change the osmotic pressure of the medium. Also, a high concentration of formic acid is toxic to animals. Considering that the concentration of vitamin B12 was 64 nM, I wonder whether the concentration unit of formate should also be nM.

      We confirm that we supplemented the media with formate in the millimolar range. The highest doses of supplemented formate somewhat slowed the development of P0 animals, but they consistently produced viable progeny. To clarify this we have added the following line to the text on lines 184-187: "The highest doses of supplemented formate somewhat slowed the development of P0 animals, but restored the survival of idh-1neo embryos to wild-type levels on a regular diet of E. coli OP50 as well as the diet of RNAi-competent E. coli HT115."

      Additionally, the use of sodium formate ensured that the pH of the media remained unchanged.

      I could not understand how embryonic and larval lethality reflect the same mechanisms as animal carcinogenesis. Could you explain the logical link between the lethal mutation and carcinogenesis? Or do the two phenotypes share only a part of the metabolic changes?

      Thank you for this suggestion. We have added this in lines 242-246 of the Discussion:

      "While our results have focused on how the neomorphic idh-1 mutation affects the developing embryo, proliferating cancer cells also have been shown to have increased demand for 1C units, for instance, to synthesize nucleosides (33)(PMID: 24657017). Thus, we can speculate that cancers with mutated IDH1 may be increasingly sensitive to depletion of the 1C pool, also."

      Vitamin B12 is an essential substance, and deficiency in humans results in severe diseases. Is the lethal phenotype caused by treatment of idh-1neo mutants comparable to humans? Is the concentration of vitamin B12 similar in humans?

      The daily dose of human vitamin B12 (cobalamin) in supplements can reach 12.5 µg per kg (PMID: 18606874), while we supplement the media fed to worms with approximately 55 µg cobalamin per kg (64 nM adenosylcobalamin). No known adverse effects are associated with excessive intake of vitamin B12 by healthy individuals; therefore, no tolerable upper intake level has been set (PMID: 23193625). However, the impact of vitamin B12 on patients with IDH1neo-positive cancers has not been studied.

      Reviewer #1 (Significance (Required)):

      I think that the manuscript is interesting and may lead an important progress of this field. However, in general, metabolic disorders are difficult to understand for the people outside the speciality. The authors should explain carefully the structure/property, pathways, enzyme functions, and concentration effects of substances of interest.

      See above, we hope these edits are sufficient.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Increased levels of the metabolite D-2HG (derived from alpha-KG) are associated with multiple disorders. In a previous study, the authors showed that in C. elegans dhgd-1 deletion mutants, embryonic lethality resulting from the accumulation of D-2HG is caused by a lack of ketone bodies. In this study, the authors generated a new model of D-2HG accumulation in C. elegans, idh-1neo, in order to further understand how D-2HG exerts its toxic effects in different contexts. This allele mimics neomorphic mutations of human IDH1 that lead to abnormal D-2HG production from alpha-KG. Interestingly, the authors find that idh-1neo mutants are distinct from the previously reported animals lacking the D-2HG dehydrogenase dhgd-1. Specifically, while vitamin B12 rescues the embryonic lethality in dhgd-1 deletion animals, it enhances the lethality of idh-1neo animals. Through an elegant genetic screen, and complementation studies with specific metabolites, they provide compelling evidence that this vitamin B12-dependent enhancement is due to depletion of the 1C pool. Specifically, a reverse genetic screen revealed that inactivation of components of the 1C-producing glycine cleavage system (GCS) results in embryonic lethality in idh-1neo, but not wildtype animals. Complementation studies with specific metabolites show that replenishing C groups is sufficient to reverse embryonic lethality.

      This is a very clear, well written paper. Experiments are well controlled and executed, figures are of the highest quality and conclusions are convincing. Prior studies are appropriately referenced. No additional experiments are required by this reviewer.

      Minor points

      1) In Figure 2A, could the authors explain how beta-alanine (increased) is different from alanine (decreased)? As a non-specialist, this is not clear to me.

      Thank you for pointing this out. We added this explanation to the figure legend (lines 510-512).

      2) Did the authors test whether inactivation of the lipoamide dehydrogenase (dld-1) has the same effect as the other identified components of the GCS?

      The dld-1 RNAi clone was present in the metabolic library that we screened but was not identified as a "hit." We have added the following in lines 164-168 of the revised manuscript: "Two other GCS genes, gcsh-2 and dld-1 were not identified as 'hits'. gcsh-2 is associated with the same reaction as gcsh-1, indicating that the latter encodes an active enzyme (30). dld-1 functions in other metabolic processes, particularly in lactate/pyruvate metabolism, and confers embryonic lethality when knocked down in wild type animals (31)".

      **Referees cross-commenting**

      Comments to Reviewer #3: 1/ The authors treat the idh-1neo worms with vitamin B12 to reduce 3HP concentrations. The authors should consider conducting experiments to reduce 3HP by other means also. This would help establish a causal relationship between the D-2HG accumulation and observed phenotypes.

      The authors show that adding vitamin B12 to the diet of the idh-1neo significantly increased their D-2HG levels. Furthermore, dhgd-1 RNAi drives a further increase in D-2HG in idh-1neo animals and led to 100% penetrant embryonic lethality among the F1 generation of idh-1neo animals. Together I think this provided strong evidence for a causal relationship between the D-2HG accumulation and observed phenotypes. Further characterizing these phenotypes would be interesting but is beyond the scope of this paper.

      4/ The authors should clarify whether it is really vitamin B12 or any other metabolite from the bacteria (like methionine) that is bringing about the phenotypes. Have they tested metabolically inactive bacteria?

      The authors show that supplementing B12-treated idh-1neo animals with formate (another 1C donor) restored the survival of idh-1neo embryos, supporting a role for B12 in depletion of the 1C pool. They also show that suppressing Met/SAM cycle genes in idh-1neo prevents 1C depletion and restores availability of 1C units. So the evidence that 1C unit depletion is at the core of the observed phenotypes is pretty convincing.

      7/ The authors should conduct metabolomic profiling to examine changes in metabolic pathways, including 1C, glycine metabolism, glucose metabolism etc, in idh-1neo animals subjected to GCS gene knockdown, and vitamin B12 supplementation.

      It is not clear how these experiments would add to this story; they would open up another line of research.

      8/ The audience will be limited to the field although the study pertains to an oncometabolite. The study value would have improved if the authors had included cancer cell data. Also, the phenotype studied has not been mechanistically linked to the oncometabolite function, making the study academic in nature.

      The interest of this study is that it is carried out in an organismal context.

      Reviewer #2 (Significance (Required)):

      As a geneticist with a general interest in metabolomics, I find this an elegant study that offers new insight into how IDH-1 and -2 neomorphic mutations affect metabolic rewiring in the context of a whole animal. Although similarities are observed between idh-1neo mutants and animals lacking the D-2HG dehydrogenase dhgd-1, both of which have increased levels of the metabolite D-2HG, specific metabolic differences are observed. The identification of 1C unit deficiency as a driver of lethality in idh-1neo mutants is highly significant given the central importance of 1C metabolism. This study should therefore be of interest to a wide audience.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Ponomarova et al. present a short follow-up of their previous study to elucidate the role of an oncogenic variant of idh-1 that increases the 3HP levels, similar to the ∆dhgd-1 mutant. Using a combination of metabolomics and genetics, they show that the defect in idh-1neo worms on a high vitamin B12 diet is the draining of the 1C pool, distinct from the mechanisms of lethality observed in the ∆dhgd-1 mutant. While the findings are interesting, there is a lack of mechanistic understanding of the basis of the phenotype observed. Moreover, the authors do not establish the link between the oncometabolite, which should support uncontrolled cell division, and the observed phenotype. Some control experiments are missing and should be included in the revised manuscript. The comments on the manuscript are as follows, in no particular order:

      1. The authors treat the idh-1neo worms with vitamin B12 to reduce 3HP concentrations. The authors should consider conducting experiments to reduce 3HP by other means also. This would help establish a causal relationship between the D-2HG accumulation and observed phenotypes.

      To further examine the link between 3HP and idh-1neo embryonic lethality, we targeted hphd-1 by RNAi, which increases 3HP levels (Ponomarova et al., 2023). Hphd-1 knockdown did not induce lethality in the wild-type or exacerbate lethality in idh-1neo animals (Figure S3), further demonstrating that lack of 3HP degradation is not linked to this phenotype (lines 143-145).

      Also, see cross-comments from Reviewer #2 above.

      The authors should investigate the functional impact of HPHD-1 inhibition on 3-hydroxypropionate levels and D-2HG accumulation by RNAi knockdown of HPHD-1 in idh-1neo animals.

      We have now performed the suggested experiment; please see our response to comment 1 above.

      The authors do not clearly mention which diet was used in some of their experiments. This is important since the two diets used (OP50 and HT115) differ in their vitamin B12 content, and thus could have different consequences.

      We added this information in figures, figure legends, and lines 259-260 of the revised manuscript.

      The authors should clarify whether it is really vitamin B12 or any other metabolite from the bacteria (like methionine) that is bringing about the phenotypes. Have they tested metabolically inactive bacteria?

      The reviewer correctly points out that bacterial metabolism may play a role in the effects exerted by vitamin B12. We have not tested metabolically inactivated bacteria, however, our RNAi experiments (Figure 4E) demonstrate that supplemented vitamin B12 acts through the Met/SAM cycle in idh-1neo animals. Please also see cross-comments from Reviewer #2.

      The authors consistently use 64 nM of Vitamin B12. Will the hphd-1 mutant and the idh-1neo mutant have different vitamin B12 thresholds for the observed phenotypes?

      Thank you for raising this interesting point. While 64 nM vitamin B12 virtually eliminates 3HP accumulation in idh-1 animals (Figure 2D), we have not tested if this dose is sufficient to eliminate 3HP accumulation in hphd-1 mutant. However, potential differences in 3HP levels in idh-1neo and hphd-1 animals treated with vitamin B12 would not contradict our conclusion that 3HP is not the cause of embryonic lethality in idh-1neo mutant animals.

      Figure 3b: HT115 has inherently high levels of vitamin B12 so the RNAi effect of genes should be seen on the OP50 diet supplemented with B12.

      Despite reports of elevated B12 levels in E. coli HT115, vitamin B12-induced embryonic lethality of idh-1neo on a diet of OP50 is more severe than on a diet of HT115 bacteria (Figure 4C). Therefore, it may be harder to quantify the synthetic lethal interaction of idh-1neo with GCS RNAi knockdown using OP50 strains (which would need to be created).

      The authors should conduct metabolomic profiling to examine changes in metabolic pathways, including 1C, glycine metabolism, glucose metabolism etc, in idh-1neo animals subjected to GCS gene knockdown, and vitamin B12 supplementation.

      While these results would be interesting and further our understanding of metabolic changes that occur in idh-1neo mutant animals we think they are beyond the scope of the manuscript. Also, please see cross-comments from Reviewer #2.

      Perform rescue experiments using different one-carbon donors (e.g., formate, serine) to restore embryonic viability in idh-1neo mutants under conditions of vitamin B12-induced stress. Quantify the efficacy of these interventions using developmental assays.

      In addition to formate rescue experiments (Figure 4C), we supplemented idh-1neo animals with serine (Figure 4D and S7). Similar to formate, serine supplementation resulted in the rescue of idh-1neo embryonic lethality on an E. coli OP50 diet (lines 187-189). The lack of rescue on an HT115 diet could be due to HT115 bacteria containing more glycine (Gao et al., 2017), which might limit the efficiency of serine conversion to glycine needed for 1C unit production.

      Provide experimental evidence to show that idh-1neo animals possess an alternative source of energy.

      We have previously found that diminished production of ketone bodies in ∆dhgd-1 mutants causes embryonic lethality that can be rescued by exogenous supplementation of ketone body 3-hydroxybutyrate (Ponomarova et al., 2023). In contrast to dhgd-1 mutants, idh-1neo embryonic lethality fails to respond to supplemented 3-hydroxybutyrate (Figure S4), indicating the lethality associated with the idh-1neo mutation is caused by a different mechanism, i.e., a depletion in 1C-units.

      The authors use vitamin B12 to inhibit the shunt pathway (line 127). They should explore alternate strategies to do the same, like gene knockdown.

      Please see our response to comment 1 above where we discuss RNAi knock-down of the shunt pathway gene, hphd-1.

      It is not clear why the authors did not follow up with the other phenotypes of the idh-1neo that were visible without the Vitamin B12 supplementation. They should follow up with this and also other phenotypes to explore the broader physiological consequences of D-2HG accumulation.

      We agree that the other physiological consequences of D-2HG accumulation are interesting, and we plan to investigate them in our future studies.

      The authors should include control experiments without supplementation of vitamin B12, ketone bodies etc. in each of their figures.

      We thank the reviewer for this suggestion. We have added these data (Figures S5, S6, S7, and S8).

      The authors posit that the idh-1neo depletes the 1C pool leading to the observed lethality. So, when they supply formate to replenish it, they rescue the lethality of the B12-treated worms. Similar results are obtained by knocking down the enzymes. So where are the 1C units going? Understanding this will provide the much-needed mechanistic understanding to this study.

      We appreciate this insightful comment and have expanded our discussion to elaborate on this issue (lines 224-227): "We propose that a lack of 1C units in idh-1neo can impede pyrimidine biosynthesis via thymidylate synthase tyms-1, which uses 1C units to generate dTMP. Supporting this hypothesis, RNAi of tyms-1 causes embryonic lethality (36-38)."

      It may be important to measure the D-2HG levels in the mitochondria vs the cytosol.

      While this is an interesting point, we think that this line of inquiry is beyond the scope of this work (and is technically challenging).

      The idh-1neo is an oncometabolite. The authors do not show any data to indicate whether this mutant has any defect in cell division/cell cycle in the somatic tissue or germline.

      In this study we primarily focused on the molecular changes in the metabolic network that occur in idh-1neo mutant animals, which we think is an important advance in understanding the basis for how this mutation affects IDH function. Additional phenotypic outcomes of these perturbed metabolic processes will be the basis of future studies.

      Reviewer #3 (Significance (Required)):

      The audience will be limited to the field although the study pertains to an oncometabolite. The study value would have improved if the authors had included cancer cell data. Also, the phenotype studied has not been mechanistically linked to the oncometabolite function, making the study academic in nature.

      While we agree that the link between idh-1neo, 2HG production, and oncometabolite function has not been directly shown, we think that our study adds a molecular understanding of the metabolic changes that occur in relation to idh-1neo function, which is important for future studies of how this mutation affects carcinogenesis. Also, please see cross-comments from Reviewer #2.

      In addition, we specified statistical significance in Figure 2, described statistical tests used (lines 361-363) and corrected a few grammatical errors throughout the text.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The manuscript by Sejour et al. tests the "translational ramp" model described previously by Tuller et al. in S. cerevisiae. The authors use bioinformatics and reporter-based experimental approaches to test whether "rare codons" in the first 40 codons of gene coding sequences increase translation efficiency and regulate the abundance of translation products in yeast cells. The authors conclude, using a new set of reporters and bioinformatics analyses, that the "translation ramp" model lacks support. The strength of bioinformatic evidence and experimental analyses (even very limited) of the rare codons insertion in the reporter make a compelling case for the authors claims. However, the major weakness of the manuscript is that the authors do not take into account other models that previously disputed the "rare or slow codon" model of Tuller et al. and overstate their own results, which are rather limited. This remains the weak part of the manuscript even in the revised form.

      We are glad the reviewer thinks our evidence makes “a compelling case for the authors claims”. This was our main aim, and we are satisfied with this.

      The reviewer believes the major weakness of the manuscript is that we do not take into account other models and do not (see below) cite numerous other relevant papers. The reviewer made essentially the same criticism at the first review, at which time we looked quite hard for papers generally meeting the reviewer’s description. We found a few, which we incorporated here. Still, we did not find the body of evidence whose existence the reviewer implies. We are citing every study we know to be relevant, though of course we will have inadvertently missed some, given the huge body of literature. After the first round of review, we wrote “the reviewer did not give specific references, and, though we looked, we weren’t always sure which papers the reviewer had in mind.” We hoped the reviewer would provide citations. But only two citations are provided here, both to A. Kochetov, and these don’t seem central to the reviewer’s points.

      The studies that authors do not mention argue with "translation ramp" model and show more thorough analyses of translation initiation to elongation transition as well as early elongation "slow down" in ribosome profiling data. Moreover several studies have used bioinformatical analyses to point out the evolution of N-terminal sequences in multiple model organisms including yeast, focusing on either upstream ORFs (uORFs) or already annotated ORFs. The authors did not mention multiple of these studies in their revised manuscript and did not comment on their own results in the context of these previous studies.

      Mostly, we do not know to what papers the reviewer is referring. This may be our failing, but it would have helped if the reviewer had cited one of them. There are papers discussing the evolution of N-terminal sequences, but as far as we know, these do not discuss translation speed or codon usage. Of course, we may have missed some papers.

      As such the authors approach to data presentation, writing and data discussion makes the manuscript rather biased, focused on criticizing Tuller et al. study and short on discussing multiple other possible reasons for slow translation elongation at the beginning of the protein synthesis. This all together makes the manuscript at the end very limited.

      We think the reviewer may be considering our paper as being generally about translation speeds, whereas in our minds, it is not. This difference in views as to what the paper is “about” is perhaps causing friction. To us, it is indeed a limited paper. We are narrowly focused on the finding of Tuller that there is an enrichment of rare, slow codons at the 5’ end of genes, and we have sought an explanation of this particular fact. This is not a paper about rates of translation generally—it is a limited paper about the reason for the 5’ enrichment of rare, slow codons.

      To expand on this, the encoded slow 5’ translation due to rare, slow codons (of Tuller et al.) is a small effect (1% to 3%). The possible unencoded slow 5’ translation of unknown mechanism discussed by some other papers (e.g., Weinberg et al. 2016, Shah et al. 2013) is a much larger effect (50% or more). Just from the different magnitudes, it seems likely these are different phenomena. And yet, despite the small size of the encoded effect, it is for some reason this paper by Tuller et al. that has captured the attention of the literature: as we point out below, Tuller et al. has been cited over 900 times. Partly because of the wide and continuing influence of this paper, it is worth specifically and narrowly addressing its findings.

      Reviewer #2 (Public Review):

      Tuller et al. first made the curious observation, that the first ∼30-50 codons in most organisms are encoded by scarce tRNAs and appear to be translated slower than the rest of the coding sequences (CDS). They speculated that this has evolved to pace ribosomes on CDS and prevent ribosome collisions during elongation - the "Ramp" hypothesis. Various aspects of this hypothesis, both factual and in terms of interpreting the results, have been challenged ever since. Sejour et al. present compelling results confirming the slower translation of the first ~40 codons in S. cerevisiae but providing an alternative explanation for this phenomenon. Specifically, they show that the higher amino acid sequence divergence of N-terminal ends of proteins and accompanying lower purifying selection (perhaps the result of de novo evolution) is sufficient to explain the prevalence of rare slow codons in these regions. These results are an important contribution in understanding how aspects of the evolution of protein coding regions can affect translation efficiency on these sequences and directly challenge the "Ramp" hypothesis proposed by Tuller et al.

      I believe the data is presented clearly and the results generally justify the conclusions.

      We thank the reviewer for his/her attention to the manuscript, and for his/her comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As mentioned in the public review major weakness of the manuscript is the lack of analyses for confounding effects, overstatements of the results (using single amino acid sequence reporter) and the lack of discussion of previous work that argues against Tuller et al model. In my previous review I mentioned multiple other studies that addressed "slow codons" model in more detail.

      No, the reviewer did not cite any specific studies.

      While some of these studies are mentioned in the revised manuscript, authors are still rather biased and selective in their discussions. I should also point out that previous studies, that authors fail again to mention, were focused on either translation initiation, initiation to elongation transition or early elongation effects in relation to mRNA sequence, structure, codons as well as amino acid sequence. Also additional studies with bioinformatic analyses of N-terminal conservation and existence of start sites at the beginning of the protein sequences in multiple model organisms were also omitted.

      Again, we do not know to what papers the reviewer is referring. But this sounds like a lot. Our paper is aimed at a specific, narrow topic: Why is there an excess of rare, slow codons in the 5’ region of genes? We are not trying to make general statements about all things affecting and affected by translation speed, we are just trying to explain the excess of rare, slow codons.

      In general manuscript seems to be too much focused-on discussion of Tuller's paper . . .

      Yes, we are focused on the Tuller findings, the excess of rare slow codons in 5’ regions.

      . . . and arguing with the model that was already shown by multiple other studies to be limited and not correct.

      We find it unsatisfactory that the reviewer states in a public review that there are multiple other studies showing that the Tuller model is not correct, and yet does not cite any of them. Furthermore, for the reviewer to say that Tuller et al. is “not correct” is too sweeping. The core finding of Tuller et al. was the excess of rare, slow codons in the 5’ regions of genes. We confirm this; we believe it is correct; we are not aware of any literature disputing this. Then, Tuller interpreted this as an adaptation to promote translational efficiency. On the interpretation, we disagree with Tuller. But if one is to disagree with this interpretation, one needs an alternative explanation of the fact of the excess rare, slow codons. Providing such an alternative explanation, and doing an experiment to distinguish the explanations, is our contribution. We are not aware of any other paper making our interpretation.

      There are of course many papers that discuss various aspects of translation at the 5’ ends of genes, and we do cite quite a few such papers in our manuscript, though certainly not all. But papers of this general kind do not, and cannot, show that Tuller et al. is “not correct”. As far as we know, no paper provides an alternative explanation for the rare slow codons, and no paper does an experiment to modulate translation speed and look at the effect on gene expression. Notably, the slow translation phenomenon associated with the rare codons found by Tuller et al. is a very small effect—a change of about 1% to 3% of translation speed. Some other papers on translation speed are dealing with possible changes in the range of 50% or more. These are presumably some other phenomenon (if indeed they are even real changes in translation speed), and, whether they are true or not, the results and interpretations of Tuller et al. could still be true or not. Of course, if we knew of some previous paper showing the Tuller paper is not correct, we should and would cite it.

      To expand on the current view of Tuller in the literature, Tuller et al. has been cited 956 times according to Google Scholar. This makes it an extremely influential paper. After finding Tuller et al. in Entrez PubMed, one can look under “Cited by” and see the five most recent papers that cite Tuller et al. The five papers given on May 23, 2024 were Bharti . . . Ignatova 2024; Uddin 2024; Khandia . . . Choudhary 2024; Love and Nair 2024; and Oelschlaeger 2024. We went through these five most recent papers that cite Tuller et al., and asked, did these authors cite the Tuller results as fully correct, or did they mention any doubts about the results? All five of the papers cited the Tuller results as fully correct, with no mention of any kind of doubt. For instance, Khandia et al. (2024) state “The slow “ramp” present at 5’ end of mRNA forms an optimal and robust means to reduce ribosomal traffic jams, thus minimizing the cost of protein expression [40]”, while Oelschlaeger (2024) states “Slow translation ramps have also been described elsewhere and proposed to prevent traffic jams along the mRNA [51,52,53].” Although Uddin (2024) cited Tuller as fully correct, Uddin seemed to think (it is a little unclear) that Tuller found an enrichment of highly-used codons, opposite to the actual finding. The multiple contrary studies mentioned by the reviewer do not seem to have been very influential.

      There are papers containing skepticism about the Tuller interpretation, and also papers with results that are difficult to reconcile in a common-sense way with the Tuller interpretation. But skepticism, and a difficulty to reconcile with common sense, are far from a demonstration that a paper is incorrect. Indeed, Tuller et al. may have been published in Cell, and may be so highly cited, exactly because the findings are counter-intuitive, colliding with common sense. Our contribution is to find a common-sense interpretation of the surprising but correct underlying fact of the 5’ enrichment of rare, slow codons.

      Having written that in the previous review, I have to admit that Sejour et al manuscript in the main text has a minimal amount of novelty with experimental evidence, the conclusions are based on three reporters with and without stalling/collision sequence with the same amino acid sequence and varying codons. Some more novelty is seen in bioinformatic analyses of multiple yeast sequences and sequence conservation at the N-termini of proteins. However, even this part of the manuscript is not discussed fully and with correct comparison to previous studies. Authors, based on my previous comments discuss further experimental shortcomings in their new and "expanded" discussion but the use of a single reporter in this case cannot relate to all differences that may be coming from ORFs seen in complete yeast transcriptome. There are multiple studies that used more reporters with more than one amino-acid and mRNA sequence as well as with similar variation of the rare or common codons. The handwaving argument about the influence of all other mechanisms that can arise from different start sites, RNA structure, peptide interaction with exit channel, peptidyl-tRNA drop-off, eIF3 complex initiation-elongation association, etc., is just pointing up to a manuscript that is more about bashing up Tuller's model and old paper than trying to make a concise story about their own results and discuss their study in plethora of studies that indicated multiple other models for slow early elongation.

      We don’t understand why the reviewer is so grudging.

      Discussion of the ribosome's collisions and potential impact of such scenario in the author's manuscript is left completely without citation, even though such work has relevant results to the author's conclusions and Tuller's model.

      This is not true. We cite Dao Duc and Song (2018) “The impact of ribosomal interference, codon usage, and exit tunnel interactions on translation elongation rate variation.” PLoS Genet 14, and Tesina, . . . and Green (2020) “Molecular mechanism of translational stalling by inhibitory codon combinations and Poly(A) tracts.” EMBO J., which are two excellent papers on this subject. We also cite Gamble et al. (2016), who found the underlying result, but at that time did not attribute it to ribosome collisions.

      Previous studies (not cited) for example clearly indicate how the length from stalling sequence to start codon is related to ribosome collisions. Moreover such studies are pointing out differences in initiation vs elongation rates that may impact ribosome collisions and protein expression. Both of these topics would be very valuable in discussions of evolutionary changes in the current yeast ORFs. Not to mention that authors do not really discuss also possibilities for differences in 5'UTRs and uORFs in relation to downstream ORFs sequence and codon composition.

      It is not clear to us that such papers are highly relevant to the issue on which we are working.

      The argument about whether cycloheximide or not is doing 5' ribosome slowdown (lines 425-443) is just rambling about Weinberg's paper from 2016 without any real conclusion. In this section authors are just throwing down hypothesis that were more clearly explained in Weinberg's manuscript or shown experimentally in studies done after the Weinberg et al. paper was published.

      Earlier, the reviewer had the criticism that “The studies that authors do not mention argue with "translation ramp" model and show more thorough analyses of translation initiation to elongation transition as well as early elongation "slow down" in ribosome profiling data.” The main study we know of dealing with issues like these is that of Weinberg et al. 2016. In our opinion, this is a thoughtful paper on these issues. But now, at this point, the reviewer seems to criticize the fact that we do extensively cite results from Weinberg et al. It is true that there is no ultimate conclusion, but why there is no conclusion is a little bit interesting. Weinberg et al. show that even in studies that do not use cycloheximide as the first step in ribosome profiling, there is some left-over high density of ribosomes near 5’ ends. But all these ribosome profiling experiments do use cycloheximide at a later step in the procedure. Until someone does a ribosome profiling experiment without the use of any cycloheximide at any step, there will be no firm conclusion. This is not our fault—and also not the issue we are writing about. And the reason this paragraph is in the manuscript at all is that the reviewer (we thought) had asked for something like this in the first review.

      At the end, even in the limited novelty of evolutionary arguments about non-existing N-terminal conservation of codons or amino acids they fail to cite and discuss previous work by Kochetov (BioEssays, 2008 and NAR, 2011) which have additional explanation on evolution of N-terminal sequences in yeast, human or Drosophila.

      These two papers of Dr. Kochetov’s have some relevance and we now cite them. These are the only papers cited by the reviewer in his/her two reviews.

      Probably the reviewer would have preferred a paper on a different subject.


      The following is the authors’ response to the original reviews.

      Response to Reviewers:

      We thank the reviewers for their comments, and their evident close reading of the manuscript. Generally, we agree with the reviewers on the strengths and weaknesses of our manuscript. Our revised manuscript has a more extensive discussion of alternative explanations for initial high ribosome density as seen by ribosome profiling, and more specifically points out the limitations of our work.

      As a preface to specific responses to the reviewers, we will say that we could divide observations of slow initial translation into two categories, which we will call “encoded slow codons”, and “increased ribosome density”. With respect to the first category, Tuller et al. documented initial “encoded slow codons”, that is, there is a statistical excess of rare, slowly-translated codons at the 5’ ends of genes. Although the size of this effect is small, statistical significance is extremely high, and the existence of this enrichment is not in any doubt. At first sight, this appears to be a strong indication of a preference for slow initial translation. In our opinion, our main contribution is to show that there is an alternative explanation for this initial enrichment of rare, slow codons—that they are a spandrel, a consequence of sequence plasticity at the 5’ (and 3’) ends of genes. The reviewers seem to generally agree with this, and we are not aware that any other work has provided an explanation for the 5’ enrichment of rare codons.
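      As an illustration only, the kind of comparison behind the "encoded slow codons" observation can be sketched in a few lines of Python. This is not the analysis used by Tuller et al. or in our manuscript: the function names, the 40-codon window default, and in particular the definition of "rare" (the least-used quarter of observed codons, ignoring which amino acid they encode) are hypothetical simplifications chosen to keep the example short.

```python
from collections import Counter


def split_codons(cds):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rare_codon_set(cds_list, fraction=0.25):
    """Hypothetical definition of 'rare': the least-used `fraction` of observed codons."""
    counts = Counter(c for cds in cds_list for c in split_codons(cds))
    # Sort by usage, breaking ties alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda c: (counts[c], c))
    n_rare = max(1, int(len(ranked) * fraction))
    return set(ranked[:n_rare])


def rare_fraction(codons, rare):
    """Fraction of the given codons that belong to the rare set."""
    return sum(c in rare for c in codons) / len(codons) if codons else 0.0


def ramp_enrichment(cds_list, rare, window=40):
    """Return (rare fraction in the first `window` codons, rare fraction in the rest)."""
    head, tail = [], []
    for cds in cds_list:
        codons = split_codons(cds)
        head.extend(codons[:window])
        tail.extend(codons[window:])
    return rare_fraction(head, rare), rare_fraction(tail, rare)
```

      A 5’ enrichment would show up as the first returned value exceeding the second. A real analysis would define rarity relative to synonymous codons or tRNA availability and would test statistical significance, neither of which this sketch attempts.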

      The second category of observations pertaining to slow initial translation is “increased ribosome density”. Early ribosome profiling studies used cycloheximide to arrest cell growth, and these studies showed a higher density of ribosomes near the 5’ end of genes than elsewhere. This high initial ribosome density helped motivate the paper of Tuller et al., though their finding of “encoded slow codons” could explain only a very small part of the increased ribosome density. More modern ribosome profiling studies do not use cycloheximide as the first step in arresting translation, and in these studies, the density of ribosomes near the 5’ end of genes is greatly reduced. And yet, there remains, even in the absence of cycloheximide at the first step, a significantly increased density of ribosomes near the 5’ end (e.g., Weinberg et al., 2016). (However, most or all of these studies do use cycloheximide at a later step in the protocol, and the possibility of a cycloheximide artefact is difficult to exclude.) Some of the reviewer’s concerns are that we do not explain the increased 5’ ribosome density seen by ribosome profiling. We agree; but we feel it is not the main point of our manuscript. In revision, we more extensively discuss other work on increased ribosome density, and more explicitly point out the limitations of our manuscript in this regard. We also note, though, that increased ribosome density is not a direct measure of translation speed—it can have other causes.

      Specific Responses.

      Reviewer 1 was concerned that we did not more fully discuss other work on possible reasons for slow initial translation. We discuss such work more extensively in our revision. However, as far as we know, none of this work proposes a reason for the 5’ enrichment of rare, slow codons, and this is the main point of our paper. Furthermore, it is not completely clear that there is any slow initial translation. The increase in ribosome density seen in flash-freeze ribosome profiling could be an artefact of the use of cycloheximide at the thaw step of the protocols; or it could be a real measure of high ribosome density that occurs for some other reason than slow translation (e.g., ribosomes might have low processivity at the 5’ end).

      Reviewer 1 was also concerned about confounding effects in our reporter gene analysis of the effects of different codons on efficiency of translation. We have two comments. First, it is important to remember that although we changed codons in our reporters, we did not change any amino acids. We changed codons only to synonymous codons. Thus at least one of the reviewer’s possible confounding effects—interactions of the nascent peptide chain with the exit channel of the ribosome—does not apply. However, of course, the mRNA nucleotide sequence is altered, and this could cause a change in mRNA structure or abundance, which could matter. We agree this is a limitation to our approach. However, to fully address it, we feel it would be necessary to examine a really large number of quite different sequences, which is beyond the scope of this work. Furthermore, mRNAs with low secondary structure at the 5’ end probably have relatively high rates of initiation, and also relatively high rates of elongation, and it might be quite difficult to disentangle these. But in neither case is there an argument that slow initial translation is efficient. Accurate measurement of mRNA levels would be helpful, but would not disentangle rates of initiation from rates of elongation as causes of changes in expression.

      Reviewer 2 was concerned that the conservation scores for the 5’ 40 amino acids, and the 3’ 40 amino acids were similar, but slow translation was only statistically significant for the 5’ 40 amino acids. As we say in the manuscript, we are also puzzled by this. We note that 3’ translation is statistically slow, if one looks over the last 100 amino acids. Our best effort at an explanation is a sort of reverse-Tuller explanation: that in the last 40 amino acids, the new slow codons created by genome plasticity are fairly quickly removed by purifying selection, but that in the first 40 amino acids, for genes that need to be expressed at low levels, purifying selection against slow codons is reduced, because poor translation is actually advantageous for these genes. To expand on this a bit, we feel that the 5000 or so proteins of the proteome have to be expressed in the correct stoichiometric ratios, and that poor translation can be a useful tool to help achieve this. In this explanation, slow translation at the 5’ end is bad for translation (in agreement with our reporter experiments), but can be good for the organism, when it occurs in front of a gene that needs to be expressed poorly. In Tuller’s interpretation, by contrast, slow translation at the 5’ end is good for translation.

      Reviewer 2 wondered whether the N-terminal fusion peptide affects GFP fluorescence in our reporter. This specific reporter, with this N-terminus, has been characterized by Dean and Grayhack (2012), and by Gamble et al. (2016), and the idea that a super-folder GFP reporter is not greatly affected by N-terminal fusions is based on the work of Pedelacq (2006). None of these papers show whether this N-terminal fusion might have some effect, but together, they provide good reason to think that any effect would be small. These citations have been added.

    1. Author response:

      Reviewer #1 (Public Review):

      Abbasi et al. assess in this MEG study the directed connectivity of both cortical and subcortical regions during continuous speech production and perception. The authors observed bidirectional connectivity patterns between speech-related cortical areas as well as subcortical areas in production and perception. Interestingly, they found in speaking low-frequency connectivity from subcortical (the right cerebellum) to cortical (left superior temporal) areas, while connectivity from the cortical to subcortical areas was in the high frequencies. In listening a similar cortico-subcortical connectivity pattern was observed for the low frequencies, but the reversed connectivity in the higher frequencies was absent.

      The work by Abbasi and colleagues addresses a relevant, novel topic, namely understanding the brain dynamics between speaking and listening. This is important because traditionally production and perception of speech and language are investigated in a modality-specific manner. To have a more complete understanding of the neurobiology underlying these different speech behaviors, it is key to also understand their similarities and differences. Furthermore, to do so, the authors utilize state-of-the-art directed connectivity analyses on MEG measurements, providing a quite detailed profile of cortical and subcortical interactions for the production and perception of speech. Importantly, and perhaps most interesting in my opinion, is that the authors find evidence for frequency-specific directed connectivity, which is (partially) different between speaking and listening. This could suggest that both speech behaviors rely (to some extent) on similar cortico-cortical and cortico-subcortical networks, but different frequency-specific dynamics.

      These elements mentioned above (investigation of both production and perception, both cortico-cortical and cortico-subcortical connectivity is considered, and observing frequency-specific connectivity profiles within and between speech behaviors), make for important novel contributions to the field. Notwithstanding these strengths, I find that they are especially centered on methodology and functional anatomical description, but that precise theoretical contributions for neurobiological and cognitive models of speech are less transparent. This is in part because the study compares speech production and perception in general, but no psychophysical or psycholinguistic manipulations are considered. I also have some critical questions about the design which may pose some confounds in interpreting the data, especially with regard to comparing production and perception.

      (1) While the cortico-cortical and cortico-subcortical connectivity profiles highlighted in this study and the depth of the analyses are impressive, what these data mean for models of speech processing remains on the surface. This is in part due, I believe, to the fact that the authors have decided to explore speaking and listening in general, without targeting specific manipulations that help elucidate which aspects of speech processing are relevant for the particular connectivity profiles they have uncovered. For example, is the frequency-specific directed connectivity driven by low-level psychophysical attributes of the speech or by more cognitive linguistic properties? Does it relate to the monitoring of speech, timing information, and updating of sensory predictions? Without manipulations trying to target one or several of these components, as some of the referenced work has done (e.g., Floegel et al., 2020; Stockert et al., 2021; Todorović et al., 2023), it is difficult to draw concrete conclusions as to which representations and/or processes of speech are reflected by the connectivity profiles. An additional disadvantage of not having manipulations within each speech behavior is that it makes the comparison between listening and speaking harder. That is, speaking and listening have marked input-output differences which likely will dominate any comparison between them. These physically driven differences (or similarities for that matter; see below) can be strongly reduced by instead exploring the same manipulations/variables between speaking and listening. If possible (if not to consider for future work), it may be interesting to score psychophysical (e.g., acoustic properties) or psycholinguistic (e.g., lexical frequency) information of the speech and see whether and how the frequency-specific connectivity profiles are affected by it.

      We thank the reviewer for pointing this out. The current study is indeed part of a larger project investigating the role of the internal forward model in speech perception and production. In the original, more comprehensive study, we also included a masked condition where participants produced speech as usual, but their auditory perception was masked. This allowed us to examine how the internal forward model behaves when it doesn't receive the expected sensory consequences of generated speech. However, for the current study, we focused solely on data from the speaking and listening conditions due to its specific research question. We agree that further manipulations would be interesting. However, for this study our focus was on natural speech and we avoided other manipulations (beyond masked speech) so that we can have sufficiently long recording time for the main speaking and listening conditions.

      (2) Recent studies comparing the production and perception of language may be relevant to the current study and add some theoretical weight since their data and interpretations for the comparisons between production and perception fit quite well with the observations in the current work. These studies highlight that language processes between production and perception, specifically lexical and phonetic processing (Fairs et al., 2021), and syntactic processing (Giglio et al., 2024), may rely on the same neural representations, but are differentiated in their (temporal) dynamics upon those shared representations. This is relevant because it dispenses with the classical notion in neurobiological models of language where production and perception rely on (partially) dissociable networks (e.g., Price, 2010). Rather those data suggest shared networks where different language behaviors are dissociated in their dynamics. The speech results in this study nicely fit and extend those studies and their theoretical implications.

      We thank the reviewer for the suggestion and we will include these references and the points made by the reviewer in our revised manuscript.

      (3) The authors align the frequency-selective connectivity between the right cerebellum and left temporal speech areas with recent studies demonstrating a role for the right cerebellum for the internal modelling in speech production and monitoring (e.g., Stockert et al., 2021; Todorović et al., 2023). This link is indeed interesting, but it does seem relevant to point out that at a more specific scale, it does not concern the exact same regions between those studies and the current study. That is, in the current study the frequency-specific connectivity with temporal regions concerns lobule VI in the right cerebellum, while in the referenced work it concerns Crus I/II. The distinction seems relevant since Crus I/II has been linked to the internal modelling of more cognitive behavior, while lobule VI seems more motor-related and/or contextual-related (e.g., D'Mello et al., 2020; Runnqvist et al., 2021; Runnqvist, 2023).

      We thank the reviewer for their insightful comment. The reference was intended to provide evidence for the role of the cerebellum in internal modelling in speech. We do not claim that we have the spatial resolution with MEG to reliably spatially resolve specific parts of the cerebellum.

      (4) On the methodological side, my main concern is that for the listening condition, the authors have chosen to play back the speech produced by the participants in the production condition. Both the fixed order as well as hearing one's own speech as listening condition may produce confounds in data interpretation, especially with regard to the comparison between speech production and perception. Could order effects impact the observed connectivity profiles, and how would this impact the comparison between speaking and listening? In particular, I am thinking of repetition effects present in the listening condition as well as prediction, which will be much more elevated for the listening condition than the speaking condition. The fact that it also concerns their own voice furthermore adds to the possible predictability confound (e.g., Heinks-Maldonado et al., 2005). In addition, listening to one's speech which just before has been articulated may, potentially strategically even, enhance inner speech and "mouthing" in the participants, hereby thus engaging the production mechanism. Similarly, during production, the participants already hear their own voice (which serves as input in the subsequent listening condition). Taken together, both similarities or differences between speaking and listening connectivity may have been due to or influenced by these order effects, and the fact that the different speech behaviors are to some extent present in both conditions.

      This is a valid point raised by the reviewer. By listening to their own previously produced speech, our participants might have anticipated and predicted the sentences more easily. However, when designing our experiment, we took several steps to lower the chance of this anticipation. First, participants were measured in separate sessions for the speech production and perception tasks, and there was always an interval of several days between the two conditions. Second, our questions were mainly about common, general topics. Consequently, participants may not have remembered their answers completely.

      Importantly, using the same stimulus material for speaking and listening guaranteed that there was no difference in the low-level features of the material for both conditions that could have affected the results of our statistical comparison.

      Due to bone conduction, hearing one’s unaltered own speech from a recording may seem foreign and could lead to unwanted emotional reactions e.g. embarrassment, so participants were asked whether they heard their own voice in a recording already (e.g. from a self-recorded voice-message in WhatsApp) which most of them confirmed. Participants were also informed that they were going to hear themselves during the measurement to further reduce unwanted psychophysiological responses.

      (5) The ability of the authors to analyze the spatiotemporal dynamics during continuous speech is a potentially important feat of this study, given that one of the reasons that speech production is much less investigated compared to perception concerns motor and movement artifacts due to articulation (e.g., Strijkers et al., 2010). Two questions did spring to mind when reading the authors' articulation artifact correction procedure: If I understood correctly, the approach comes from Abbasi et al. (2021) and is based on signal space projection (SSP) as used for eye movement corrections, which the authors successfully applied to speech production. However, in that study, it concerned the repeated production of three syllables, while here it concerns continuous speech of full words embedded in discourse. The articulation and muscular variance will be much higher in the current study compared to three syllables (or compared to eye movements which produce much more stable movement potentials compared to an entire discourse). Given this, I can imagine that corrections of the signal in the speaking condition were likely substantial and one may wonder (1) how much signal relevant to speech production behavior is lost?; (2) similar corrections are not necessary for perception, so how would this marked difference in signal processing affect the comparability between the modalities?

      One of the results of our previous study (Abbasi et al., 2021) was that the artefact correction was not specific to individual syllables but generalised across syllables. Also, the repeated production of syllables was associated with substantial movements of the articulators mimicking those observed during naturalistic speaking. We therefore believe that the artefact rejection is effective during speaking. We also checked this by investigating speech related coherence in brain parcels in spatial proximity to the articulators. In our previous study we also show that the correction method retains neural activity to a very large degree. We are therefore confident that speaking and listening conditions can be compared and that the loss of true signals from correcting the speaking data will be minor.

      References:

      • Abbasi, O., Steingräber, N., & Gross, J. (2021). Correcting MEG artifacts caused by overt speech. Frontiers in Neuroscience, 15, 682419.

      • D'Mello, A. M., Gabrieli, J. D., & Nee, D. E. (2020). Evidence for hierarchical cognitive control in the human cerebellum. Current Biology, 30(10), 1881-1892.

      • Fairs, A., Michelas, A., Dufour, S., & Strijkers, K. (2021). The same ultra-rapid parallel brain dynamics underpin the production and perception of speech. Cerebral Cortex Communications, 2(3), tgab040.

      • Floegel, M., Fuchs, S., & Kell, C. A. (2020). Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control. Nature Communications, 11(1), 2839.

      • Giglio, L., Ostarek, M., Sharoh, D., & Hagoort, P. (2024). Diverging neural dynamics for syntactic structure building in naturalistic speaking and listening. Proceedings of the National Academy of Sciences, 121(11), e2310766121.

      • Heinks‐Maldonado, T. H., Mathalon, D. H., Gray, M., & Ford, J. M. (2005). Fine‐tuning of auditory cortex during speech production. Psychophysiology, 42(2), 180-190.

      • Price, C. J. (2010). The anatomy of language: a review of 100 fMRI studies published in 2009. Annals of the New York Academy of Sciences, 1191(1), 62-88.

      • Runnqvist, E., Chanoine, V., Strijkers, K., Pattamadilok, C., Bonnard, M., Nazarian, B., ... & Alario, F. X. (2021). Cerebellar and cortical correlates of internal and external speech error monitoring. Cerebral Cortex Communications, 2(2), tgab038.

      • Runnqvist, E. (2023). Self-monitoring: The neurocognitive basis of error monitoring in language production. In Language production (pp. 168-190). Routledge.

      • Stockert, A., Schwartze, M., Poeppel, D., Anwander, A., & Kotz, S. A. (2021). Temporo-cerebellar connectivity underlies timing constraints in audition. eLife, 10, e67303.

      • Strijkers, K., Costa, A., & Thierry, G. (2010). Tracking lexical access in speech production: electrophysiological correlates of word frequency and cognate effects. Cerebral Cortex, 20(4), 912-928.

      • Todorović, S., Anton, J. L., Sein, J., Nazarian, B., Chanoine, V., Rauchbauer, B., ... & Runnqvist, E. (2023). Cortico-cerebellar monitoring of speech sequence production. Neurobiology of Language, 1-21.

      Reviewer #2 (Public Review):

      Summary:

      The authors re-analyse MEG data from a speech production and perception study and extend their previous Granger causality analysis to a larger number of cortical-cortical and in particular cortical-subcortical connections. Regions of interest were defined by means of a meta-analysis using Neurosynth.org and connectivity patterns were determined by calculating directed influence asymmetry indices from the Granger causality analysis results for each pair of brain regions. Abbasi et al. report feedforward signals communicated via fast rhythms and feedback signals via slow rhythms below 40 Hz, particularly during speaking. The authors highlight one of these connections between the right cerebellum lobule VI and auditory association area A5, where in addition the connection strength correlates negatively with the strength of speech tracking in the theta band during speaking (significant before multiple comparison correction). Results are interpreted within a framework of active inference by minimising prediction errors.
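
      For readers unfamiliar with the directed influence asymmetry index (DAI) mentioned in this summary, the following minimal sketch shows one commonly used normalised form, in which the difference between the two Granger causality (GC) directions is divided by their sum at each frequency. This is an assumption for illustration: the exact formula used in the manuscript is not stated here, and the function name, the frequency bins, and the GC values are hypothetical and are not data from the study.

```python
def directed_asymmetry_index(gc_ab, gc_ba):
    """Return a per-frequency DAI for the region pair (A, B).

    Assumed definition (not confirmed by the manuscript):
        DAI(f) = (GC_A->B(f) - GC_B->A(f)) / (GC_A->B(f) + GC_B->A(f))
    Positive values mean A->B dominates, negative values mean B->A dominates.
    """
    dai = []
    for forward, backward in zip(gc_ab, gc_ba):
        total = forward + backward
        # Bins with no GC in either direction carry no asymmetry.
        dai.append((forward - backward) / total if total > 0 else 0.0)
    return dai


# Hypothetical GC spectra at five frequency bins (illustration only).
freqs_hz = [4, 10, 20, 40, 80]
gc_a_to_b = [0.75, 0.5, 0.25, 0.125, 0.0]
gc_b_to_a = [0.25, 0.5, 0.75, 0.375, 0.0]

dai = directed_asymmetry_index(gc_a_to_b, gc_b_to_a)
for freq, value in zip(freqs_hz, dai):
    print(f"{freq} Hz: DAI = {value:+.2f}")
```

      Because the index is a ratio, it is bounded between -1 and 1 but becomes unstable when both GC values are close to zero, which is one way to read the reviewer's question below about computing DAIs on connections below the bias level.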

      While I find investigating the role of cortical-subcortical connections in speech production and perception interesting and relevant to the field, I am not yet convinced that the methods employed are fully suitable to this endeavour or that the results provide sufficient evidence to make the strong claim of dissociation of bottom-up and top-down information flow during speaking in distinct frequency bands.

      Strengths:

      The investigation of electrophysiological cortical-subcortical connections in speech production and perception is interesting and relevant to the field. The authors analyse a valuable dataset, where they spent a considerable amount of effort to correct for speech production-related artefacts. Overall, the manuscript is well-written and clearly structured.

      Weaknesses:

      The description of the multivariate Granger causality analysis did not allow me to fully grasp how the analysis was performed and I hence struggled to evaluate its appropriateness. Knowing that (1) filtered Granger causality is prone to false positives and (2) recent work demonstrates that significant Granger causality can simply arise from frequency-specific activity being present in the source but not the target area without functional relevance for communication (Schneider et al. 2021) raises doubts about the validity of the results, in particular with respect to their frequency specificity. These doubts are reinforced by what I perceive as an overemphasis on results that support the assumption of specific frequencies for feedforward and top-down connections, while findings not aligning with this hypothesis appear to be underreported. Furthermore, the authors report some main findings that I found difficult to reconcile with the data presented in the figures. Overall, I feel the conclusions with respect to frequency-specific bottom-up and top-down information flow need to be moderated and that some of the reported findings need to be checked and if necessary corrected.

      Major points

      (1) I think more details on the multivariate GC approach are needed. I found the reference to Schaum et al., 2021 not sufficient to understand what has been done in this paper. Some questions that remained for me are:

      (i) Does multivariate here refer to the use of the authors' three components per parcel or to the conditioning on the remaining twelve sources? I think the latter is implied when citing Schaum et al., but I'm not sure this is what was done here?

      If it was not: how can we account for spurious results based on indirect effects?

      Yes, multivariate refers to the three components.

      (ii) Did the authors check whether the GC of the source-target pairs was reliably above the bias level (as Schaum et al. did for each condition separately)? If not, can they argue why they think that their results would still be valid? Does it make sense to compute DAIs on connections that were below the bias level? Should the data be re-analysed to take this concern into account?

      We performed statistics on DAI and believe that this is a valid approach. We argue that random GC effects would not survive our cluster-corrected statistics.

      (iii) You may consider citing the paper that introduced the non-parametric GC analysis (which Schaum et al. then went on to apply): Dhamala M, Rangarajan G, Ding M. Analyzing Information Flow in Brain Networks with Nonparametric Granger Causality. Neuroimage. 2008; 41(2):354-362. https://doi.org/10.1016/j.neuroimage.2008.02.020

      Thanks, we will add this reference in the revised version.

      (2) GC has been discouraged for filtered data as it gives rise to false positives due to phase distortions and the ineffectiveness of filtering in the information-theoretic setting as reducing the power of a signal does not reduce the information contained in it (Florin et al., 2010; Barnett and Seth, 2011; Weber et al. 2017; Pinzuti et al., 2020 - who also suggest an approach that would circumvent those filter-related issues). With this in mind, I am wondering whether the strong frequency-specific claims in this work still hold.

      This must be a misunderstanding. We are aware of the problem with GC on filtered data. But GC was here computed on broadband data and not in individual frequency bands.

      (3) I found it difficult to reconcile some statements in the manuscript with the data presented in the figures:

      (i) Most notably, the considerable number of feedforward connections from A5 and STS that project to areas further up the hierarchy at slower rhythms (e.g. L-A5 to R-PEF, R-Crus2, L-CB6, L-Tha, L-FOP and L-STS to R-PEF, L-FOP, L-TOPJ or R-A5 as well as R-STS both to R-Crus2, L-CB6, L-Th) contradict the authors' main message that 'feedback signals were communicated via slow rhythms below 40 Hz, whereas feedforward signals were communicated via faster rhythms'. I struggled to recognise a principled approach that determined which connections were highlighted and reported and which ones were not.

      (ii) "Our analysis also revealed robust connectivity between the right cerebellum and the left parietal cortex, evident in both speaking and listening conditions, with stronger connectivity observed during speaking. Notably, Figure 4 depicts a prominent frequency peak in the alpha band, illustrating the specific frequency range through which information flows from the cerebellum to the parietal areas." There are two peaks discernible in Figure 4, one notably lower than the alpha band (rather theta or even delta), the other at around 30 Hz. Nevertheless, the authors report and discuss a peak in the alpha band.

      (iii) In the abstract: "Notably, high-frequency connectivity was absent during the listening condition." and p.9 "In contrast with what we reported for the speaking condition, during listening, there is only a significant connectivity in low frequency to the left temporal area but not a reverse connection in the high frequencies."

      While Fig. 4 shows significant connectivity from R-CB6 to A5 in the gamma frequency range for the speaking, but not for the listening condition, interpreting comparisons between two effects without directly comparing them is a common statistical mistake (Makin and Orban de Xivry). The spectrally-resolved connectivity in the two conditions actually looks remarkably similar and I would thus refrain from highlighting this statement and indicate clearly that there were no significant differences between the two conditions.

      (iv) "This result indicates that in low frequencies, the sensory-motor area and cerebellum predominantly transmit information, while in higher frequencies, they are more involved in receiving it."

      I don't think that this statement holds in its generality: L-CB6 and R-3b both show strong output at high frequencies, particularly in the speaking condition. While they seem to transmit information mainly to areas outside A5 and STS these effects are strong and should be discussed.

      We appreciate the reviewer's thoughtful comments. We acknowledge that not all connectivity patterns strictly adhere to the initial observation regarding feedback and feedforward communication. It's true that our primary focus was on interactions between brain regions known to be crucial for speech prediction, including auditory, somatosensory, and cerebellar areas. However, we also presented connectivity patterns across other regions to provide a more comprehensive picture of the speech network. We believe this broader perspective can be valuable for future research directions.

      Regarding the reviewer's observation about the alpha band peak in Figure 4, we agree that a closer examination reveals the connectivity from right cerebellum to the left parietal is in a wider low frequency range. We will refrain from solely emphasizing the alpha band and acknowledge the potential contribution of lower frequencies to cerebellar-parietal communication.

      We also appreciate the reviewer highlighting the need for a more nuanced interpretation of the listening condition connectivity compared to the speaking condition. The reviewer is correct in pointing out that while Figure 4 suggests a high-frequency connectivity from L-A5 to R-CB only in the speaking condition, a direct statistical comparison between conditions might not reveal a significant difference. We will revise the manuscript to clarify this point.
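
      The direct comparison the reviewer asks for can be illustrated with a minimal sketch, assuming one connectivity value per participant and condition and using a paired sign-flip permutation test. This is not the cluster-based procedure used in the manuscript: the function name, the number of permutations, and the example values are hypothetical and chosen only to show that the test operates on the speaking-minus-listening difference itself rather than on two separate significance decisions.

```python
import random
from statistics import mean


def paired_sign_flip_test(cond_a, cond_b, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation test on the mean paired difference.

    Under the null hypothesis of no condition difference, the sign of each
    participant's difference (a - b) is exchangeable, so signs are flipped
    at random to build the null distribution of the mean difference.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    observed = mean(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical per-participant gamma-band connectivity (illustration only).
speaking = [0.21, 0.18, 0.25, 0.19, 0.23, 0.17, 0.22, 0.20]
listening = [0.19, 0.20, 0.22, 0.18, 0.24, 0.16, 0.21, 0.19]

mean_diff, p_value = paired_sign_flip_test(speaking, listening)
print(f"mean difference = {mean_diff:.4f}, p = {p_value:.3f}")
```

      Only if such a test on the difference is significant would a statement that an effect is present in one condition but absent in the other be supported.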

      Finally, a closer examination of Figure 3 revealed that the light purple and dark green edges in the speaking condition for R-CB6 and L-3b suggest outgoing connections at low frequencies, while other colored edges indicate information reception at high frequencies. We acknowledge that exceptions to this directional pattern might exist and warrant further investigation in future studies.

      (4) "However, definitive conclusions should be drawn with caution given recent studies raising concerns about the notion that top-down and bottom-up signals can only be transmitted via separate frequency channels (Ferro et al., 2021; Schneider et al., 2021; Vinck et al., 2023)."

      I appreciate this note of caution and think it would be useful if it were spelled out to the reader why this is the case so that they would be better able to grasp the main concerns here. For example, Schneider et al. make a strong point that we expect to find Granger-causality with a peak in a specific frequency band for areas that are anatomically connected when the sending area shows stronger activity in that band than the receiving one, simply because of the coherence of a signal with its own linear projection onto the other area. The direction of a Granger causal connection would in that case only indicate that one area shows stronger activity than the other in the given frequency band. I am wondering to what degree the reported connectivity pattern can be traced back to regional differences in frequency-specific source strength or to differences in source strength across the two conditions.

      This is indeed an important point. That is why we are discussing our results with great caution and specifically point the reader to the relevant literature. We are indeed thinking about a future study where we investigate this connectivity using other connectivity metrics and a detailed consideration of power.

      Reviewer #3 (Public Review):

      In the current paper, Abbasi et al. aimed to characterize and compare the patterns of functional connectivity across frequency bands (1 Hz - 90 Hz) between regions of a speech network derived from an online meta-analysis tool (Neurosynth.org) during speech production and perception. The authors present evidence for complex neural dynamics from which they highlight directional connectivity from the right cerebellum to left superior temporal areas in lower frequency bands (up to beta) and between the same regions in the opposite direction in the (lower) high gamma range (60-90 Hz). Abbasi et al. interpret their findings within the predictive coding framework, with the cerebellum and other "higher-order" (motor) regions transmitting top-down sensory predictions to "lower-order" (sensory) regions in the lower frequencies and prediction errors flowing in the opposite direction (i.e., bottom-up) from those sensory regions in the gamma band. They also report a negative correlation between the strength of this top-down functional connectivity and the alignment of superior temporal regions to the syllable rate of one's speech.

      Strengths:

      (1) The comprehensive characterization of functional connectivity during speaking and listening to speech may be valuable as a first step toward understanding the neural dynamics involved.

      (2) The inclusion of subcortical regions and connectivity profiles up to 90Hz using MEG is interesting and relatively novel.

      (3) The analysis pipeline is generally adequate for the exploratory nature of the work.

      Weaknesses:

      (1) The work is framed as a test of the predictive coding theory as it applies to speech production and perception, but the methodological approach is not suited to this endeavor.

      We agree that we cannot provide definite evidence for predictive coding in speech production and perception and we believe that we do not make that claim in the manuscript. However, our results are largely consistent with what can be expected based on predictive coding theory.

      (2) Because of their theoretical framework, the authors readily attribute roles or hierarchy to brain regions (e.g., higher- vs lower-order) and cognitive functions to observed connectivity patterns (e.g., feedforward vs feedback, predictions vs prediction errors) that cannot be determined from the data. Thus, many of the authors' claims are unsupported.

      We will revise the manuscript to more clearly differentiate our results (e.g. directed Granger-Causality from A to B) from their interpretation (potentially indicating feedforward or feedback signals).

      (3) The authors' theoretical stance seems to influence the presentation of the results, which may inadvertently misrepresent the (otherwise perfectly valid; cf. Abbasi et al., 2023) exploratory nature of the study. Thus, results about specific regions are often highlighted in figures (e.g., Figure 2 top row) and text without clear reasons.

      Our connectograms reveal a multitude of results that we hope are interesting to the community. At the same time, the wealth of findings poses a problem for describing them. We did not see a better way than to highlight specific connections of interest.

      (4) Some of the key findings (e.g., connectivity in opposite directions in distinct frequency bands) feature in a previous publication and are, therefore, interesting but not novel.

      We actually see this as a strength of the current manuscript. The computation of connectivity is here extended to a much larger sample of brain areas. It is reassuring to see that the previously reported results generalise to other brain areas.

      (5) The quantitative comparison between speech production and perception is interesting but insufficiently motivated.

      We thank the reviewer for this comment. We have addressed this in detail in our responses to points (1) and (4) of Reviewer #1.

      (6) Details about the Neurosynth meta-analysis and subsequent selection of brain regions for the functional connectivity analyses are incomplete. Moreover, the use of the term 'Speech' in Neurosynth seems inappropriate (i.e., includes irrelevant works, yielding questionable results). The approach of using separate meta-analyses for 'Speech production' and 'Speech perception' taken by Abbasi et al. (2023) seems more principled. This approach would result, for example, in the inclusion of brain areas such as M1 and the BG that are relevant for speech production.

      We agree that there are inherent limitations in automated meta-analysis tools such as Neurosynth. Papers are used in the meta-analysis that might not be directly relevant. However, Neurosynth has proven its usefulness over many years and has been used in many studies. We also agree that our selection of brain areas is not complete. But Granger Causality analysis of every pair of ROIs leads to complex results and we had to limit our selection of areas.

      (7) The results involving subcortical regions are central to the paper, but no steps are taken to address the challenges involved in the analysis of subcortical activity using MEG. Additional methodological detail and analyses would be required to make these results more compelling. For example, it would be important to know what the coverage of the MEG system is, what head model was used for the source localization of cerebellar activity, and if specific preprocessing or additional analyses were performed to ensure that the localized subcortical activity (in particular) is valid.

      There is a large body of evidence demonstrating that MEG can record signals from deep brain areas such as thalamus and cerebellum (e.g., Attal & Schwarz, 2013; Andersen et al., 2020; Piastra et al., 2020; Schnitzler et al., 2009). These and other studies provide evidence that state-of-the-art recording (with multichannel SQUID systems) and analysis is sufficient to allow reconstruction of subcortical areas. However, spatial resolution is clearly reduced for these deep areas. We will add a statement in the revised manuscript to acknowledge this limitation.

      (8) The results and methods are often detailed with important omissions (a speech-brain coupling analysis section is missing) and imprecisions (e.g., re: Figure 5; the Connectivity Analysis section is copy-pasted from their previous work), which makes it difficult to understand what is being examined and how. (It is also not good practice to refer the reader to previous publications for basic methodological details, for example, about the experimental paradigm and key analyses.) Conversely, some methodological details are given, e.g., the acquisition of EMG data, without further explanation of how those data were used in the current paper.

      We will revise the relevant sections of the manuscript.

      (9) The examination of gamma functional connectivity in the 60 - 90 Hz range could be better motivated. Although some citations involving short-range connectivity in these frequencies are given (e.g., within the visual system), a more compelling argument for looking at this frequency range for longer-range connectivity may be required.

      Given previous evidence of connectivity in the gamma band we think that it would be a weakness to exclude this frequency band from analysis.

      (10) The choice of source localization method (linearly constrained minimum variance) could be explained, particularly given that other methods (e.g. dynamic imaging of coherent sources) were specifically designed and might potentially be a better alternative for the types of analyses performed in the study.

      Both LCMV and DICS are beamforming methods. We used LCMV because we wanted to use Granger Causality, which requires broadband signals. DICS would only provide frequency-specific, band-limited signals.

      (11) The mGC analysis needs to be more comprehensively detailed for the reader to be able to assess what is being reported and the strength of the evidence. Relatedly, first-level statistics (e.g., via estimation of the noise level) would make the mGC and DAI results more compelling.

      We perform group-level cluster-based statistics on mGC while correcting for multiple comparisons across frequency bands and brain parcels and report only significant results. This is an established approach that is routinely used in this type of study.

      (12) Considering the exploratory nature of the study, it is essential for other researchers to continue investigating and validating the results presented in the current manuscript. Thus, it is concerning that data and scripts are not fully and openly available. Data need not be in its raw state to be shared and useful, which circumvents the stated data privacy concerns.

      We acknowledge the reviewer's concern regarding the full availability of the dataset. Due to privacy limitations on the collected data, we are unable to share it publicly at this time. However, to promote transparency and enable further exploration, we have provided the script used for data analysis and an example dataset. This example dataset should provide a clear understanding of the data structure and variables used in the analysis. Additionally, we are happy to share the complete dataset upon request from research teams interested in performing in-depth secondary analyses.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Jones et al. report on a potential role for fam83fa in zebrafish hatching, radiation response and autophagy. The authors are commended for generating multiple KO lines and maternal-zygotic embryos for analysis. However, important controls are lacking and the data is circumstantial throughout with very little mechanistic insight into the precise roles, if any, of fam83f in these processes.

      We thank the reviewer for recognizing the strengths of our manuscript, and highlighting areas we might improve. Please see the specific comments below addressing the points raised. In respect of mechanistic insight, while we agree that our manuscript does not provide this, it was not intended to. Rather, we aim to communicate our descriptive findings on the role of Fam83fa in vivo, providing data for follow-up studies by other researchers into the mechanistic role of Fam83fa.

      1. Validation of the KO phenotypes (hatching, IR sensitivity) requires rescue with WT fam83fa WT mRNA, but not 1-500 or fam83fb mRNA.

      We thank the reviewer for raising the issue of rescue experiments. Such experiments are frequently used in knock-down experiments, where non-specificity may be a problem, but they are used more rarely in genetic knock-outs, where the gene defect is well defined. In the case of Fam83fa, a particular difficulty is that overexpression of fam83fa itself causes a p53-mediated DNA damage response (DDR) (Salama et al., 2019). Moreover, we have shown by both qRT-PCR and western blotting that injection of fam83fa mRNA into zebrafish embryos (the traditional technique by which rescue experiments are performed) induces a p53-mediated DDR. As a result, it would be very difficult to interpret the results of any rescue experiment, because one would have to be absolutely certain that levels of fam83fa re-expression recapitulate and do not exceed endogenous levels. As a tool for specificity, we therefore used more than one fam83fa-/- mutant line, carrying a different genomic mutation, and validated that the same phenotype was present in both. We are happy to provide the qRT-PCR and western blot data confirming the results of fam83fa mRNA injection, if required. We have included an additional section into the manuscript detailing this issue.

      While the hatching phenotype (Fig 3) is convincing, there is no data on HG development in the null embryos. Does the HG develop normally in the absence of fam83fb? If so, this would support the authors conclusions that the role of fam83fb is functional rather than developmental (indirect effect). In situs as in Fig.1 might be helpful here.

      Thank you to the reviewer for this helpful suggestion. We agree that we did not investigate whether the hatching gland develops normally in the MZ-fam83fa-/- mutant embryos. No gross morphological differences were observed that led us to investigate this, although we agree it is an interesting question for a future project. In terms of functional vs developmental effects, we are confident that MZ-fam83fa-/- mutant embryos develop at a normal temporal rate, as evidenced by the machine learning based classifier used to assess temporal developmental trajectory (Figure S3 and Jones et al., 2022, 2024). This strongly suggests that the effect of fam83fa KO is functional rather than indirect and caused by (for example) developmental delay.

      While the IR sensitivity phenotype (Fig S4) is convincing, IR-induced cell death/apoptosis was not analyzed. There is a large literature describing straightforward assays for cell death/apoptosis detection in zebrafish with assays such as acridine orange or TUNEL labeling, or active casp3 whole-mount IF. Is IR-induced cell death enhanced in fam83fa KOs?

      We thank the reviewer for their positive comments and agree that investigating the nature of the cell death occurring following IR would be very interesting. We did make use of both acridine orange and TUNEL labeling following injection of fam83fa mRNA (see 1 above), and whilst the assays themselves were relatively straightforward, due to technical issues the quantification of fluorescence intensity was not. Similarly, we suspect that a significant degree of necrosis is also occurring, which further complicates the issue of data interpretation from both these approaches. We do, however, think this is an important avenue of questioning, and hope that other researchers will explore the mechanism of IR induced cell death in the MZ-fam83fa-/- mutants in the future.

      Similarly, there are multiple tools to assay autophagy in zebrafish (e.g., Moss et al., Histochem Cell Biol 2020, PMC7609422; Mathai et al., Cells 2017, PMC5617967). Is autophagy affected in the KOs, with or without IR? These experiments might directly implicate fam83fa in autophagy.

      We agree that there are exciting tools with which to assay autophagy in zebrafish, and although we considered some of these, including caudal fin regeneration, we deemed these experiments to be beyond the descriptive scope of this paper, given the time and resources available to us. We hope that other researchers will use our data as a basis for investigating the role of Fam83fa in autophagy further, using assays such as these suggested by the reviewer.

      Figure 4: Isn't there a slight reduction in p53 induction at 10 hours?

      Although the western blot in Figure 4A gives this impression, this is probably due to loading variability (see the anti-β-actin loading control band). Moreover, over three independent experiments (Figure 4B), this apparent difference is not statistically significant. Taken together with other evidence that the p53-mediated DNA damage response is not affected in MZ-fam83fa-/- mutants, we are confident there is no detectable change in the level of stabilized p53 in the MZ-fam83fa-/- mutants compared to WT.

      Given the widely documented, dominant role of p53 in zebrafish IR-sensitivity, the authors should test if the IR sensitivity of fam83fa KO animals is p53-dependent, ideally via a cross into p53 null, but at least via injection of p53 morpholinos.

      We agree that p53 is widely documented as playing an essential role in the IR induced DNA damage response in zebrafish. All our experiments suggest there is no difference between the levels of p53 (protein or mRNA) or any of the p53-induced downstream effectors (that we tested) in MZ-fam83fa-/- mutants compared to WT embryos. This was true whether or not the embryos were subjected to genotoxic stressors, including IR treatment. We therefore conclude that the increased sensitivity phenotype we observe as a result of loss of Fam83fa is not caused by a change in p53 activity, at least not as part of the DNA damage response.

      Do autophagy inhibitors phenocopy the hatching and IR-sensitivity defects of fam83fa embryos? Do the inhibitors exacerbate the mutant phenotypes or synergize with M or Z mutant phenotypes? (I may have missed this but do M and Z fam83fa null embryos have any phenotype? Or do the phenotypes only manifest in MZ embryos?)

      This is an excellent question, and indeed one we attempted to address. We tried to optimize several autophagy inhibitors including bafilomycin A1, chloroquine and wortmannin, as well as the proteasomal inhibitor MG132. In addition, we tried to optimize the autophagy promoters Torin1 and rapamycin. Unfortunately, we regularly saw global effects in zebrafish embryos that were difficult to characterize and control by dosage. At the same time, we were also working to confirm the specific effects of these drugs on autophagy using p62 and LC3-I and LC3-II western blots, which themselves were difficult to optimize. We attempted to optimize these experiments for 6 months before the COVID lockdown occurred, at which point they were abandoned. We would be delighted for future researchers to continue these experiments, as we are now unable to pursue this further due to closure of the Smith lab, but we agree that these are very pertinent questions. We hope the descriptive data provided in our paper will prompt other researchers in the autophagy field to further explore the role of Fam83fa in autophagy. In response to the zygotic phenotype question, this was something we did not investigate. As there was no immediately apparent phenotype in the zygotic generation, for ease of screening larger numbers of embryos we proceeded immediately to the maternal-zygotic (MZ) generation.

      Reviewer #1 (Significance (Required)):

      The role of Fam83f is not known. This study in zebrafish might be the first to clarify the function of this protein in vivo.

      We thank the reviewer for this positive insight, and we agree that our work is the first to do so in vivo.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Fam83f is one of the proteins about which little is known. The authors Jones et al., tried to shed light on Fam83f function by knocking out the gene in zebrafish. Here they found that fam83 is expressed in the hatching gland and that larvae without Fam83f hatch significantly earlier than wild-type animals. The authors furthermore investigated the response of fam83f knock-out animals to DNA damage and found increased sensitivity to ionizing radiation and MMS. In order to find out more about Fam83f function in the DNA damage response, the authors performed RNA-seq after employing DNA damage and here they saw upregulation of several autophagy/lysosome-associated proteins and downregulation of some phosphatidylinositol-3-phosphate binding proteins, among others. Finally, the authors found that Fam83f is targeted to the lysosome. The manuscript is overall well written and clear in its general statement.

      We thank the reviewer for their encouraging comments.

      In the manuscript, the authors describe the investigation of several aspects of Fam83f function and particularly the role in hatching seems to be important for Fam83f as the gene is strongly expressed in the hatching gland and its absence leads to a clear and considerable earlier hatching. Unfortunately, all aspects of Fam83f function that are described in the manuscript are investigated very superficially, the conclusions are not supported by data and important controls are lacking. As such, the RNA-seq results are not confirmed by qRT-PCR, the role of the Fam83f LIR domain is not confirmed by co-IPs and it has not been investigated whether the presence of Fam83f in lysosomes is due to its degradation or whether it has a function in this cellular compartment.

      We thank the reviewer for their input and will address each point raised below:

      • All aspects of Fam83f function are investigated superficially.

      We agree that we have not provided an in-depth analysis of the mechanistic role of Fam83fa. It was because there were so many roles that we decided to make this paper rather descriptive in nature, hoping that the observations will prove useful to other researchers who may wish to define the mechanistic roles of Fam83fa more deeply. Even without in-depth investigation, our findings are previously unreported and the phenotypes we report are clear. We have amended our manuscript to make it apparent that this paper is intended to be descriptive in nature, and we hope this addresses this issue.

      • Important controls are lacking - RNA-seq results are not confirmed by qRT-PCR

      We thank the reviewer for their comment. We did not include qRT-PCR data as a control for the RNA-seq data because 1) each RNA-seq experiment was repeated on three biological replicates across three independent experiments and 2) we conducted RNA-seq on two different MZ-fam83fa-/- mutant lines and only considered genes that were mis-regulated in both mutants. Taken together, we considered this to be sufficient validation for the manuscript. However, we also performed confirmatory qRT-PCR for several of the differentially expressed genes identified, including the three main PI(3)P binding genes. We have now included these data in the supplementary information as an additional control - see Figure S6G which is now also referred to in the main text, and additional primer sequences have been added to Table S1.
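
      The filtering step described above (keeping only genes that were mis-regulated in both mutant lines) can be illustrated with a minimal sketch. The function name and the gene identifiers are hypothetical placeholders and are not taken from the manuscript; the sketch also assumes that "shared" means membership in both gene lists, without checking the direction of change.

```python
def shared_degs(*deg_sets):
    """Return the genes called differentially expressed in every mutant line.

    Assumption: each argument is a collection of gene identifiers that passed
    the significance threshold in one line; direction (up/down) is ignored.
    """
    if not deg_sets:
        return set()
    return set.intersection(*(set(s) for s in deg_sets))


# Hypothetical placeholder gene lists for two knockout lines.
ko1_degs = {"gene_a", "gene_b", "gene_c"}
ko2_degs = {"gene_b", "gene_c", "gene_d"}

common = shared_degs(ko1_degs, ko2_degs)
print(sorted(common))  # prints ['gene_b', 'gene_c']
```

      Genes appearing in only one line (here `gene_a` and `gene_d`) are dropped, which is the conservative behavior the response relies on as a form of validation.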

      • The role of the Fam83f LIR domain is not confirmed by co-IPs

      We agree with the reviewer that this is an important experiment, and we worked closely with Dr Brian Ludwig and Dr Karen Vousden (The Francis Crick Institute) to test this. We tried to express zebrafish Atg8 and Gabarap (the two main ATG8 proteins that bind to LIR domains) but were unable to express sufficient levels of protein to perform the co-IPs. The text in the manuscript has now been amended to reflect that this experiment is required to confirm the role of the putative LIR domain in Fam83fa.

      • It has not been investigated whether the presence of Fam83f in lysosomes is due to its degradation or whether it has a function in this cellular compartment

      Whilst we agree with the reviewer that this is an important question, we did not intend this paper to expand beyond a descriptive role of the observations we made following the loss of Fam83fa in vivo. These are important questions to follow up on to determine the mechanism of action of Fam83fa, and we hope that other researchers will pursue these avenues of investigation following the publication of our observations.

      Also, there is no leading concept in the manuscript. Starting from a role in hatching, the authors go to the DNA damage response and finally to the presence of Fam83f in lysosomes. How are these different aspects linked? Is the presence of Fam83f in lysosomes important for the suppression of hatching and how does Fam83f delay this process? (One would have wished that the authors would not have been that broad and were more focused on a particular aspect which then could have been investigated in depth.)

      We agree with the reviewer that the paper gives a broad overview of our observations and does not examine the underlying mechanisms in detail. However, we believe that descriptive papers such as this, where observations following genetic perturbation are reported, are equally important, providing as they do important foundational data for other researchers to take forward. We do postulate on the links between the hatching, DNA damage and lysosomal phenotypes we observe in the discussion section, and we have expanded on this following the reviewers' comments, to make our hypothesized link between these phenomena clearer.

      Specific comments: - All materials should be described in material and methods including the antibodies that have been used

      The antibodies used together with concentrations and catalog numbers are now in Materials and Methods

      • Abbreviations should be explained

      The manuscript has been revised to ensure all abbreviations are explained. We thank the reviewer for bringing this oversight to our attention.

      • Figure 4A: Levels of p53 should also be shown for untreated fam83f -/-KO1 and KO2 animals

      The authors thank the reviewers for raising this point. Extracts from untreated MZ-fam83fa-/- KO1 and KO2 embryos were not included on this particular blot, as p53 was observed to be undetectable in all embryos, across all our experiments (WT and both mutants) unless genotoxic stress was applied. No quantification could therefore be performed as the expression level was essentially zero. However, we have now included an example p53 western blot in Supplemental Figure 5A, which shows WT, MZ-fam83fa-/- KO1 and MZ-fam83fa-/- KO2 untreated blots for p53 (all undetectable) alongside treated embryos (detected).

      • Some references are missing (e.g. page 17, lane 320/321: As this group of cells arises....)

      This citation and reference have now been added; thank you to the reviewer for highlighting this omission.

      • Lane 369: The authors write about 4 KO lines but only two are shown in the figure.

      We thank the reviewer for this observation. In Figure 2B only KO1 and KO2 schematic diagrams are shown for simplicity (as these are the lines taken forward for further investigation). We have now amended the manuscript text to make this clear.

      • Lane 374/375: The NMD is not proven

      Absolutely - we have now revised the text to change this sentence accordingly and thank the reviewer for noting this.

      • Lane 380: how can RNA levels of fam83fa be upregulated when the gene has been knocked out? Why are these genes only upregulated in KO1? How relevant is this?

      This was a typographical error, and we are very grateful to the reviewer for picking up on this. It should have read 'fam83fb'. As nonsense-mediated decay and associated transcriptional adaptation have been previously reported in zebrafish, this finding may be of considerable interest to the community. It is a side observation, and not necessarily directly related to the role of Fam83fa in vivo, but we felt it important to include. Indeed, as a result of this observation we have recently shared our MZ-fam83fa-/- lines with another group who are planning to investigate precisely this question - why are fam83fb and fam83g only upregulated in KO1?

      • Figure 3C is not mentioned in the text and lacks any labelling

      Figure 3C is now clearly referred to in the text and a label added to the figure.

      • Lane 434/435: all relevant data should be shown (can be done as supplementary figure)

      We have now amended this to include an additional supplemental figure (Figure S5A).

      • Lane 434: The reference to the figure seems to be incorrect (5A4A)

      Amended accordingly - thank you for pointing out this mistake.

      • Figure 4C and 4D: what is the difference?

      Thank you to the reviewer for noticing this omission. These data are from t1 (+2hrs) and t2 (+10hrs) and have now been labelled accordingly.

      • S5C and S5D: why are there 3 clusters?

      We thank the reviewer for raising this as it has provided us with an opportunity to present our data more clearly. There are 3 clusters that represent the combination of the two first principal components, which are time and treatment. Therefore, the clusters represent i) untreated at t1, ii) treated at t1 and iii) treated at t2. However, having two plots with different color schemes made this confusing/misleading. We have now replaced the two PCA plots with one that is colored and labelled accordingly with the 3 aforementioned clusters.

      • Lane 495 to 505: What does this mean that the GO analysis shows upregulation and downregulation of endopeptidases and why "in contrast"?

      We thank the reviewer for this comment, and we agree that this paragraph was misleading/confusing. This has now been rewritten in the main text, clarifying that endopeptidases were consistently upregulated at both timepoints.

      Reviewer #2 (Significance (Required)):

      The strength of the manuscript is certainly that it provides insight into Fam83f function as there is not much known about Fam83f.

      We thank the reviewer for the positive comment, and we agree that very little is known about this highly conserved protein.

      This study is probably most interesting for people in the zebrafish and related fields as the authors convincingly show the expression of Fam83f in the hatching gland and also the earlier hatching in the absence of the protein is very clear.

      Thank you for the positive feedback.

      The weakness of the study is clearly that it does not provide an in-depth analysis. As such, it shows that Fam83f is involved in hatching and can delay the process but it remains elusive how this is achieved. Likewise, the investigation into the DNA damage response remains very superficial and does not prove a specific role for Fam83f in the DNA damage response or whether the increased sensitivity is more unspecifically caused by the absence of a gene or eventually even connected to the earlier hatching.

      Please refer to responses above (and changes made to the manuscript) clarifying that this study is intended to be descriptive, and provides important foundational data for further in-depth mechanistic studies by other researchers interested in the role of Fam83fa in vivo.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In their manuscript "Zebrafish reveal new roles for Fam83f in hatching and the DNA damage-mediated autophagic response", Jones et al. provide an interesting exploration for the function of a poorly studied protein, Fam83f in embryonic development. Using the zebrafish as a model organism, the study combines loss-of-function genetics, phenotypic analysis and RNA-sequencing to characterize and explore the result of Fam83f loss. Upon critical review of the manuscript and the results we offer suggestions to improve the manuscript (see 'minor technical issues'). Additionally, we would like to highlight a weakness of the study in making the connection between Fam83f to the observed phenotype (increased sensitivity to DNA damage), see 'major issues'.

      Major issues:

      Most of our concern stems from relatively incomplete connection of the loss of fam83f to increased sensitivity to DNA-damage and lysosome function.

      Please refer to comments above and changes made to the manuscript to clarify this is a descriptive paper that is not intended to provide in-depth mechanistic insight into the role of Fam83fa.

      Is the increased sensitivity in fam83f KO embryos a direct effect to fam83f loss? A rescue experiment (by introduction of Fam83fa mRNA into their KO2 fish line) in the presence of ionizing radiation would help us understand the functional role of this protein in this process. Furthermore, can overexpression of any of the down-regulated genes involved in lysosome function restore the early hatching phenotype or the sensitivity to DNA damage?

      Fam83fa rescue experiments would be very difficult to interpret - please see comments above and the corresponding changes to our manuscript.

      In terms of over-expressing some of the downregulated genes identified in the RNA-seq and qRT-PCR to see if the phenotype can be rescued, we feel these are excellent suggestions and we hope other researchers in future will attempt such experiments.

      Minor technical issues:

      -Methods line 203, clarify how many embryos were used per sample for RNA-seq (this was only described as 15 embryos in the main body results text).

      Text has been amended to clarify this. We thank the reviewer for noticing this oversight.

      -Comment about the expansion of fam83f orthologs in mammals (8) as opposed to only 2 in zebrafish

      We apologize for any confusion: mammals do not have 8 fam83f orthologs. Mammals and zebrafish have 8 FAM83 genes (FAM83A-FAM83H). Zebrafish, unlike mammals, have genome duplication and although mammals have only one FAM83F gene, zebrafish have two: Fam83fa and Fam83fb. We trust this clarifies this issue and believe this to be clear in our main text. However, we are happy to make any suggested amendments should the reviewer consider our wording confusing.

      -Supplementary figure 1C: please include representative images of secondary axis formation in fam83fa overexpressed Xenopus embryos.

      We have not included any images as these are already published in our related paper on FAM83F (Dunbar et al., 2020) which we refer to in the figure legend text. No additional images were captured specifically for this publication.

      -Provide more information about the mis-regulated genes in the RNA-seq analysis, how many are up or down regulated? Perhaps a better plot than a Venn diagram can be an MA-plot with the Venn diagram moved to a supplementary figure.

      The Venn diagrams in Figure 5A-C are to illustrate the number of differentially expressed genes that are shared between KO1 and KO2 (whether up or down regulated), and only those that are common to both lines are taken forward. Following the reviewer's comments, we have now displayed the behavior of the common genes across all replicates in one heatmap, with the data normalized to the WT untreated samples, and the normalized variance stabilized count indicates whether a gene is up or down regulated across each of the replicates and conditions. We believe this addresses the reviewer's comment as these data are now displayed in a more direct way and the genes that are consistently up or downregulated across all replicates (and indeed those that are not) can be clearly seen. We thank the reviewer for raising this and improving our data representation.

      -A better comparison of mis-regulated genes in the fam83f knockouts would be a comparison of KO2 and perhaps KO3, as the compensatory effects in KO1 can lead to additional indirect effect on the transcriptome. We understand the time and cost involved in this experiment and suggest that the differential gene expression analysis be performed individually on up or down regulated genes from KO2, or a comparison of such analysis will be provided with the differential gene expression analysis that was performed on shared mis-regulated genes between KO1 and KO2.

      The reviewer raises an excellent point. At the time of experimental design, we were concerned that omitting KO1 in favor of another line (e.g. KO3) would bias our results by excluding potentially important data. Similarly, as transcriptional adaptation occurs in a sequence specific manner, and the phenotype was present in KO1 regardless, we didn't want to exclude these data. However, with hindsight, we agree that it may have been prudent to exclude KO1 on this basis, and we may have seen an increased concordance of differentially expressed genes (DEGs) between KO2 and KO3. However, this is not possible to repeat now due to the Smith lab closing, and our documented findings are valid and important regardless. We acknowledge however that, with hindsight, what the reviewer suggests may have been better experimental design.

      -Can you confirm with the RNA-seq analysis that fam83g is upregulated in KO1 as opposed to KO2? (i.e. can the compensatory analysis you have observed with qRT-PCR be confirmed with the RNA-seq data?)

      This is an excellent question, and we thank the reviewer for raising this. fam83fb passed our threshold for significance to be deemed as differentially expressed (upregulated) in KO1 only, in accordance with our qRT-PCR data. fam83g did not pass the significance threshold, but perhaps this is not surprising as both fam83fb and fam83g are expressed at particularly low levels to start with and would probably require much greater sequencing depth to be detected.

      Reviewer #3 (Significance (Required)):

      There is fundamental value in clarifying the in vivo function of poorly characterized protein-coding genes. This study fills a gap in the literature, but the broader conceptual impact is limited. The authors do a thorough job at generating and characterizing CRISPR/Cas9 mediated knock-out zebrafish animals. It is further commended that the authors do a meticulous job in a quantitative description of the resulting phenotype. This is a thorough study, with the only major concern being the lack of rescue experiments that would be needed to substantiate the role of fam83f in sensitivity to DNA damage and lysosome function.

      We thank the reviewer for their comments and trust we have addressed the issues concerned with the changes described above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Perampalam et al. describe novel methods for genome-wide CRISPR screening to identify and validate genes essential for HGSOC spheroid viability. In this study, they report that Netrin signaling is essential for maintaining disseminated cancer spheroid survival, wherein overexpression of Netrin pathway genes increases tumor burden in a xenograft model of ovarian cancer. They also show that high netrin expression correlates with poor survival outcomes in ovarian cancer patients. The study provides insights into the biology of netrin signaling in DTC cluster survival and warrants development of therapies to block netrin signaling for treating serous ovarian cancer.

      Strengths:

      - The study identifies Netrin signaling to be important in disseminated cancer spheroid survival

      - A Novel GO-CRISPR methodology was used to find key genes and pathways essential for disseminated cancer cell survival

      Thanks for the endorsement of our work and its importance to metastasis in ovarian cancer.

      Weaknesses:

      - The term dormancy is not fully validated and requires additional confirmation to claim the importance of Netrin signaling in "dormant" cancer survival.

      - Findings shown in the study largely relate to cancer dissemination and DTS survival rather than cancer dormancy.

      Much of the validation of dormancy and cell cycle arrest in HGSOC spheroids, as well as the culture model, has been published previously and hence was not repeated here. I think this reviewer will appreciate the updated citations and explanations to better illustrate the state of knowledge. We have also added new experiments that further emphasize the dormant state of spheroid cells in culture and xenografts, as well as patient derived spheroids used in this study.

      Reviewer #1 (Recommendations for Authors):

      (1) It is unclear what spheroid/adherent enrichment ratio is and how it ties into genes affecting cell viability. Why is an ER below 1 the criteria for selecting survival genes?

      Our screen uses the ‘guide only’ comparison in each culture condition to establish a gene score under that specific condition.  A low adherent score captures genes that are essential under standard culture conditions where cells are proliferating and this can include genes needed for proliferation or other basic functions in cell physiology.  A low spheroid score identifies the genes that are most depleted in suspension when cells are growth arrested and this is an indication of cell death in this condition.  Since gene knock outs are first established in adherent proliferating conditions, essential genes under these conditions will already start to become depleted from the population before suspension culture.  By selecting genes with a ratio of <1 we can identify those that are most relevant to dormant suspension culture conditions.  Ultimately, the lowest enrichment ratio scores represent genes whose loss of function is dispensable in the initial adherent condition, but critical for survival in suspension and this is what we aimed to identify. We’ve updated Figure 1B to illustrate this and we’ve updated the explanation of the enrichment ratio on page 6, lines 144 to 147 of the results.
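
      The selection logic described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the example gene names and the numeric scores are hypothetical, and the sketch assumes each gene has a positive score per condition in which a lower value means stronger depletion. It is not the scoring procedure used in the screen itself.

```python
def enrichment_ratio(spheroid_score, adherent_score):
    """Spheroid/adherent ratio for one gene (assumes positive scores)."""
    if adherent_score <= 0:
        raise ValueError("adherent score must be positive")
    return spheroid_score / adherent_score


def select_candidates(scores, threshold=1.0):
    """Return (gene, ratio) pairs with ratio below threshold, lowest first.

    scores: dict mapping gene -> (spheroid_score, adherent_score).
    """
    ratios = {
        gene: enrichment_ratio(spheroid, adherent)
        for gene, (spheroid, adherent) in scores.items()
    }
    hits = [(gene, ratio) for gene, ratio in ratios.items() if ratio < threshold]
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical scores: (spheroid, adherent). Lower means more depleted.
example_scores = {
    "GENE_A": (0.2, 0.9),  # dispensable when adherent, depleted in suspension
    "GENE_B": (0.8, 0.8),  # similar in both conditions
    "GENE_C": (0.5, 0.4),  # more depleted in adherent culture
}

for gene, ratio in select_candidates(example_scores):
    print(gene, round(ratio, 3))
```

      With these placeholder values only `GENE_A` falls below a ratio of 1, matching the idea that the lowest ratios mark genes that are dispensable in adherent culture but needed for survival in suspension.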

      (2) The WB for phospho-p38 in figure 1A for OVCAR8 line does not show increased phosphorylation in the spheroid relative to the adherent. If anything, phospho-p38 appears to be reduced in the spheroid. Can the authors provide a better western blot?

      We’ve updated this blot with a longer exposure, see Figure 1A.  Phosphorylation levels of p38 are essentially unchanged in OVCAR8 cells in suspension culture, although the overall levels of p38 may be slightly reduced in dormant culture conditions.

      (3) How did the authors confirm dormancy apart from western blot for phospho-ERK vs phospho-p38? Authors should add EdU/BrdU staining and/or Ki67 staining to confirm dormancy.

      Previous publications that appear as citations 7, 10, and 33 in the reference list established the growth arrest state of these cells in suspension culture.  This included measuring other known markers of dormancy and quiescence such as p27, p130, and reduced cyclin/cdk activity and 3H-thymidine incorporation. In addition, other associated characteristics of dormancy such as EMT and catabolic metabolism have been demonstrated in these culture conditions (see citation 11 and Rafehi et al. Endocr. Relat. Cancer 23;147-59).  We’ve added these additional citations to our descriptions of dormant spheroid culture to better clarify the status of these cells in our experiments (see page 6, lines 126-28).  To ensure that cells are growth arrested in the experiments shown in this paper, we have updated Figure 1A to include blots of p130 and Ki67 to further emphasize that spheroid cells are not proliferating as the quiescence marker (p130) is high and the proliferative marker (Ki67) is lost in suspension culture.

      (4) Can the authors report spheroid volume over time in culture? How was viability measured?

      We’ve updated the methods (see page 27, line 574) to better highlight the description of cell survival that answers both of these questions. At the end of each experimental time point in both the screen and viability assays, we captured live cells by replating on adherent plasticware. We fixed and stained with crystal violet and photographed plates to illustrate the sizes of spheroids (shown in Fig. 2 Supplement 1E, Fig. 6C, and 7D). We subsequently extracted the dye and measured it spectrophotometrically to compare the biomass of viable cells between experiments quantitatively, irrespective of the relatively random shapes of spheroids. In a previous paper (10), we found that reattachment and staining in this manner matches traditional viability assays such as CellTiter-Glo. Furthermore, biomass never increases and diminishes gradually over time in culture, consistent with the non-proliferative state of these experiments. Fig. 2 Supplement 1E cross-checks the equivalence of viability and reattached biomass measurements, and shows that biomass is lost over time, by comparing crystal violet staining of reattached cells with CellTiter-Glo for DYRK1A knock out cells over time in culture. In addition, we include a comparison of crystal violet staining of reattached spheroids with trypan blue dye exclusion in Fig. 5G and H. In both cases, reattachment and more direct viability assays lead to the same conclusion: Netrin signaling supports viability in dormant culture.

      (5) Please show survival significance of Netrin signaling genes in recurrence/relapse free survival to claim importance in cancer dormancy.

      See Fig. 7 Supplement 1C where we include the recurrence free survival data. High expressors of Netrin-1 and -3 also have a numerically shorter progression free survival, but the difference is not statistically significant. Netrin-1 overexpression alone is also shown, and it shows shorter survival with a P-value of 0.0735. Elevated survival of dormant cells in a residual disease state is expected to increase the chance of relapse and shorten this interval. Thus, these data are consistent with our model, but lack statistical significance.

      There are many alternative ways to interpret what shorter progression free survival, or overall survival, may mean biologically. Since survival of dormant cells is but one of them, we also added new data to experimentally investigate the role of endogenous Netrin signaling in dormant residual disease in Fig. 6 and described on page 12, lines 266-87.  We used xenograft experiments to show OVCAR8 spheroids form and withdraw from the cell cycle equivalently to suspension culture following intraperitoneal injection.  Furthermore, loss of Netrin signaling due to receptor deletions compromises survival during this early window before disseminated lesions form.  This argues that Netrin signaling contributes to survival during this window of dormancy.  In addition, mice engrafted with mutant cells experience prolonged survival when Netrin signaling is blocked.  Together, these experiments further argue that Netrin signaling supports survival in the dormant, non-proliferative phase, and leads to reduced survival of mice.

      (6) The authors show IHC staining of patient ascites derived HGSOC spheroids. However, no marker for dormancy is shown in these spheroids. Adding Ki67 staining or phospho-ERK vs phospho-p38 would be necessary to confirm cancer dormancy.

      We have added new staining for Ki67 and p130 that compares these markers in HGSOC tumors where Ki67 is high and p130 is low with ascites derived spheroids where staining is the opposite. Importantly, expression of p130 is linked to cellular quiescence and is not found to accumulate in the nucleus of cells that are just transiting through G1.  This confirms that the ascites derived spheroids are dormant.  See Fig. 4A-E and described on page 9, lines 201-7.

      (7) Overall, the findings are interesting in the context of cancer dissemination. There is not enough evidence for cancer dormancy and the importance of Netrin signaling in the survival of cancer dormancy. Overexpression of Netrin increases phosphorylation of ERK, leading one to expect an increase in proliferation. This suggests that Netrin breaks cancer cells out of dormancy, into a proliferative state.

      We have found that the discovery of Netrin activation of MEK-ERK in growth arrested cells is counterintuitive to many cancer researchers.  However, this axis exists in other paradigms of Netrin signaling in axon outgrowth that are not proliferation related (see citation 26, Forcet et al. Nature 417; 443-7 as an example).  We have added Fig. 5D and descriptions on page 11, lines 244-52 to better clarify that Netrins cannot induce cell proliferation through ERK.  Addition of recombinant Netrin-1 can only induce ERK phosphorylation in suspension culture conditions and not in quiescent adherent conditions.  The small magnitude of ERK phosphorylation induced by Netrin-1 in suspension, compared to treating adherent, quiescent cells with the same concentration of mitogenic EGF, further emphasizes that this is not a proliferative signal.  Lastly, the new xenograft experiment in Fig. 6A-D (described on page 12, lines 266-81) demonstrates the growth arrested context in which Netrin signaling in dormant spheroids supports viability.

      (8) If authors wish to claim cancer dormancy as the premise of their study, additional confirmatory experiments are required to support their claims. Alternatively, based on the current findings of the study, it would be best to change the premise of the article to Netrin signaling in cancer dissemination and survival of disseminated cancer spheroids rather than cancer dormancy.

      I expect that this reviewer will agree that we have added more than sufficient explanations of background work on HGSOC spheroid dormancy from the literature, as well as new experiments that address their questions about dormancy in our experiments.

      Reviewer #2 (Public Review):

      Summary:

      In this article, the authors employed modified CRISPR screens ["guide-only (GO)-CRISPR"] in the attempt to identify the genes which may mediate cancer cell dormancy in the high grade serous ovarian cancer (HGSOC) spheroid culture models. Using this approach, they observed that abrogation of several of the components of the netrin (e.g., DCC, UNC5Hs) and MAPK pathways compromise the survival of non-proliferative ovarian cancer cells. This strategy was complemented by the RNAseq approach which revealed that a number of the components of the netrin pathway are upregulated in non-proliferative ovarian cancer cells and that their overexpression is lost upon disruption of DYRK1A kinase that has been previously demonstrated to play a major role in survival of these cells. Perampalam et al. then employed a battery of cell biology approaches to support the model whereby the Netrin signaling governs the MEK-ERK axis to support survival of non-proliferative ovarian cancer cells. Moreover, the authors show that overexpression of Netrins 1 and 3 bolsters dissemination of ovarian cancer cells in the xenograft mouse model, while also providing evidence that high levels of the aforementioned factors are associated with poor prognosis of HGSOC patients.

      Strengths:

      Overall it was thought that this study is of potentially broad interest in as much as it provides previously unappreciated insights into the potential molecular underpinnings of cancer cell dormancy, which has been associated with therapy resistance, disease dissemination, and relapse as well as poor prognosis. Notwithstanding the potential limitations of cellular models in mimicking cancer cell dormancy, it was thought that the authors provided sufficient support for their model that netrin signaling drives survival of non-proliferating ovarian cancer cells and their dissemination. Collectively, it was thought that these findings hold a promise to significantly contribute to the understanding of the molecular mechanisms of cancer cell dormancy and in the long term may provide a molecular basis to address this emerging major issue in the clinical practice.

      Thanks for the kind words about the importance of our work in the broader challenges of cancer treatment.

      Weaknesses:

      Several issues were observed regarding methodology and data interpretation. The major concerns were related to the reliability of modelling cancer cell dormancy. To this end, it was relatively hard to appreciate how the employed spheroid model allows to distinguish between dormant and e.g., quiescent or even senescent cells. This was in contrast to solid evidence that netrin signaling stimulates abdominal dissemination of ovarian cancer cells in the mouse xenograft and their survival in organoid culture. Moreover, the role of ERK in mediating the effects of netrin signaling in the context of the survival of non-proliferative ovarian cancer cells was found to be somewhat underdeveloped.

      Experiments previously published in citation 7 show that growth arrest in patient ascites derived spheroids is fully reversible and that argued against non-proliferative spheroids being a form of senescence and moved this work into the dormancy field.  We have added extensive new support for our model systems and data to address the counterintuitive aspects of MEK-ERK signaling in survival instead of proliferation. 

      Reviewer #1 Recommendations for Authors

      (1) A better characterization of the spheroid model may be warranted, including staining for the markers of quiescence and senescence (including combining these markers with staining for the components of the netrin pathway)

      See Figure 1A and page 6, lines 126-36 where we have added blots for Ki67 and p130 to better emphasize the arrested proliferative state of cells in our screening conditions.  We have also added these same controls for patient ascites-derived spheroids in Figure 4 and described on page 9, lines 203-7.  One realization from this CRISPR screen, and others in our lab, is that it identifies functionally important aspects of cell physiology and not necessarily ones that are easily explored using commercially available antibodies.  Netrin-1 and -3 staining of patient derived spheroids in Fig. 4, as well as cell line spheroids stained in Fig. 4 Supplement 1 further support the relevance of this pathway in dormant cancer cells because Netrins are expressed in the right place at the right time.  The Netrin-1 stimulation experiments in Fig. 5C were originally carried out to probe HGSOC cells for functionality of Netrin receptors since we couldn’t reliably detect them by blotting or staining with available antibodies.  This demonstrates that this pathway is active in the various HGSOC cell lines we’ve used and specifically, using OVCAR8 cells, we show it is only active in suspension culture conditions.

      (2) In figure 1A it appears that total p38 levels are reduced in some cell lines in spheroid vs. adherent culture. The authors should comment on this.

      These blots have been updated to be more clear.  Overall p38 levels may be reduced in some cell lines and when compared with activation levels of phosphorylated p38 it suggests the fraction of activated p38 is higher. OVCAR8 cells may be an exception where the overall activity level remains approximately the same.

      (3) The authors should perhaps provide a clearer rationale for choosing to focus on the netrin signaling vs. e.g., GPCR signaling, and consider more explicit defining of "primary" vs. "tertiary" categories in Reactome gene set analysis.

      We’ve updated Fig. 1E and the text on page 7, lines 161-5 to illustrate which gene categories identified in the screen belong to which tiers of Reactome categories. It better visualizes why we have investigated the Axon guidance pathway that includes Netrin: it is a highly specific signaling pathway that scores similarly to the broader and less specific categories at the very top of the list. As an aside, the GPCR signaling and GPCR downstream signaling categories have proven to be fairly intractable.  As best we can tell, the GPCR downstream signaling category is full of MAPK family members and likely represents some redundancy with MAPK further down.

      (4) In figure 3A-C, including factors whose expression did not appear to change between adherent and suspension conditions may be warranted as the internal control. Figure 3D-F may benefit from some sort of quantification.

      The mRNA expression levels are normalized to GAPDH as an internal control. We have updated this figure and re-plotted it as fold change relative to adherent culture cells with statistical comparisons to indicate which are significantly upregulated in suspension culture.

      The IHC experiments are now in Fig. 4D-F and show positive staining for Netrin-1 and -3.  Netrin-3 is easiest to see, while Netrin-1 is trickier because the difference with the no primary antibody control isn’t intensity, but the tint of the DAB stain.  We had to counter stain the patient spheroids with Hematoxylin in order for the slide scanner to find the best focal plane and make image registration between sections possible.  This unfortunately makes the Netrin-1 staining rather subtle.  For cell line spheroids in Fig. 4, Supplement 1 we didn’t need the slide scanner and show negative controls without counter stain that are much more convincing of Netrin-1 detection and reassure us that our staining detects the intended target.  We’ve updated the labels in Fig. 4 and Fig. 4, Supplement 1 for this to be more intuitive.  Unfortunately, relying on the tint of the DAB stain leaves this as a qualitative experiment.

      - In figure 4C-E the authors show that Netrin-1 stimulation induces ERK phosphorylation whereby it is argued that this is a "low-level" stimulation of ERK signaling required for the survival of ovarian cells in the suspension. This is however hard to appreciate, and it was thought that having adherent cells in parallel would be helpful to gauge whether this indeed is a "low level" ERK activity. Moreover, the authors should likely include downstream substrates of ERK (e.g., RSKs) as well as p38 in these experiments. The control experiments for the effects of PD184352 on ERK phosphorylation also appear to be warranted. Finally, performing the experiments with PD184352 in the presence of Netrin-1 stimulation would also be advantageous.

      We have added a new Netrin-1 stimulation experiment in Fig. 4D (described on page 11, lines 244-52) that shows that Netrins can only activate very low levels of ERK phosphorylation in suspension when proliferation is arrested. Netrin-1 stimulation of quiescent adherent cells, where stimulation of proliferation is possible, shows that Netrins are unable to activate ERK phosphorylation in this condition.  In contrast, we also stimulate quiescent adherent OVCAR8 cells with an equal concentration of EGF (a known mitogen) to offer high level ERK phosphorylation as a side by side comparison.  I think that this offers clear evidence that Netrin signaling is inconsistent with inducing cell proliferation.  We’ve also updated citations in the introduction to include citation 26 that offers a previously reported paradigm of Netrin-ERK signaling in axon outgrowth that is a non-cancer, non-proliferative context to remind readers that Netrins utilize MEK-ERK differently.

      We highlight Netrin-MEK-ERK signaling as key to survival for a number of reasons.  First, Netrin signaling in this paradigm does not fit the dependence receptor paradigm where loss of Netrin receptors protects against cell death.  Fig. 5B rules this out as receptor loss never offers a survival advantage, but clearly receptor deletions compromise survival in suspension culture.  Second, positive Netrin signaling is known to support survival by inactivating phosphorylation of DAPK1.  We’ve added this experiment as Fig. 5 Supplement 1D and show that loss of Netrin receptors doesn’t reduce DAPK1 phosphorylation in a time course of suspension culture.  Consequently, we conclude this isn’t the survival signal either.  Since MEK and ERK family members scored in our screen, we investigated their role in survival.  We now show two different MEK inhibitors with different inhibitory mechanisms to confirm that MEK inhibition induces cell death. In addition to the previous PD184352 inhibitor in our first submission, we’ve added Trametinib as well and this is shown in Fig. 5G.  Since it is surprising that MEK inhibition can kill instead of just arresting proliferation, we’ve also added another cell death assay in which we show trypan blue dye exclusion as a second look at survival.  This is now Fig. 5H.  Lastly, we include Trametinib inhibition of ERK phosphorylation in these assays in Fig. 5I.  While we leave open what takes place downstream of ERK, our model in Fig. 5J offers a very detailed look at the components upstream.

      - Does inhibition of ERK prevent the abdominal spread of ovarian cancer cells? The authors may feel that this is out of the scope of the study, which I would agree with, but then the claims regarding ERK being the major mediator of the effects of netrin signaling should be perhaps slightly toned down.

      We agree that loss of function xenograft experiments will enhance our discovery of Netrin’s role in dormancy and metastasis.  We have added a new Fig. 6 that uses xenografts with Netrin receptor deficient OVCAR8 cells (UNC5 4KO).  It demonstrates that two weeks following IP engraftment we can isolate spheroids from abdominal washes and that cells have entered a state of reduced proliferation as determined by lowered expression of Ki67 as well as other proliferation inducing genes.  In the case of UNC5 4KO cells, there is significant attrition of these cells as determined by recovering spheroids in adherent culture (Fig. 6C) and by Alu PCR to detect human cells in abdominal washes (Fig. 6D).  Lastly, xenografts of UNC5 4KO cells cause much less aggressive disease and significantly extend survival of these mice (Fig. 6E,F).  This is not exactly the experiment that the reviewer is asking for, but it is a clear indication that Netrin signaling supports survival in a xenograft model of dormancy.

      - Notwithstanding that this could be deduced from figures 6D and F, it would be helpful if the number of mice used in each experimental group is clearly annotated in the corresponding figure legends. Moreover, indicating the precise statistical tests that were used in the figures would be helpful (e.g., specifying whether anova is one-way, two-way, or?)

      We have added labels to what is now Fig. 8B to indicate the number of animals used for each genotype of cells.  We have also updated figure legends to include more details of statistical tests used in each instance.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The majority of the conclusions are well supported by strong experimental evidence. The only area where that is not fully the case is the role of Pak1 as a downstream effector of FoxG1-FoxO6 and its effects on macropinocytosis. To further strengthen this claim, the authors should demonstrate that ablation of Pak1 can rescue the functional consequences of forced FoxO6 expression and whether overexpression of Pak1 rescues quiescence exit in FoxO6 knockout.

      Thank you to the reviewer for these helpful suggestions. To investigate the effects of Pak1 ablation, and therefore more directly the link between FOXG1 and FoxO6 and macropinocytosis, we tested the published Pak1 inhibitor IPA-3. Unfortunately, to distinguish the role of Pak1 in quiescence exit and macropinocytosis, we would need a dosage of IPA-3 that is efficacious but does not affect cell proliferation. It was not possible to optimise such a dosage: 10 µM has been shown to be efficacious at inhibiting Pak1 (Verma et al, 2020; Wong et al, 2013), but even at 2.5 µM we see significant cell death in our cells. This is potentially due to pleiotropic roles for Pak1.

      Also, it is not feasible to overexpress Pak1 in the FoxO6 KO cells with inducible FOXG1. To ensure we are investigating quiescence exit this would need to be in an inducible manner; however, re-transfecting cells using the PiggyBac system would potentially alter FOXG1 transgene levels by excising the existing transgene.

      As shown in Figure S3, we do not observe clear vacuole formation in F6 (FOXG1-inducible) cells upon Dox addition. As detailed in the discussion, we hypothesise that FoxO6-induced macropinocytosis could represent a stalled state, with other pathways downstream of FOXG1 necessary to be activated concomitantly to ensure cell cycle re-entry, e.g., through increased pinocytic flux that cannot be assessed within our experimental timeframes. Indeed, active Pak1 has been found to modulate pinocytic cycling, enhancing both FITC-dextran uptake and efflux (Dharmawardhane et al, 2000). We therefore would not hypothesise that high Pak1 levels alone would be sufficient to drive quiescence exit.

      Alternatively, the macropinocytosis observed may be a metabolic stress response because of the hyperactivation of signalling pathways upon FoxO6 overexpression. Hyperactivation of Ras signalling, canonical Wnt and PI3K signalling have all been shown to play roles in inducing macropinocytosis (Overmeyer et al, 2008; Tejeda-Muñoz et al, 2019; Recouvreux & Commisso, 2017).

      We believe the observed macropinocytosis phenotype upon FoxO6 overexpression, and the changes in Pak1 expression upon FoxO6 loss or FOXG1 induction, provide interesting insights into the function of this underexplored FoxO family member. However, currently we are unable to demonstrate a direct link between these processes and have therefore modified the text to reflect this (see lines 292-4, 330-3, 365-8).

      • The manuscript stresses the role of NSC quiescence exit in GBM and demonstrates that FoxG1 KO reduces FoxO6 levels in a murine GBM cell line but a BMP4-mediated quiescence and dox-induced FoxG1 over-expression or an abolishment of cell cycle re-entry thereof by reduced FoxO6 levels in the case of FoxG1 KO is lacking. But this would significantly substantiate the relevance of the findings.

      Mouse GBM cells have elevated levels of FoxG1 and have been shown to be refractory to BMP4-mediated quiescence entry, maintaining colony formation following BMP treatment (Bulstrode et al, 2017). It is therefore challenging to specifically investigate cell cycle re-entry/quiescence exit using these mouse GBM cells, or indeed any GBM cell line, due to their inability to respond fully to BMP cues (Caren et al, 2015). It has also been shown by Bulstrode et al, 2017 that Foxg1 null mouse neural stem cells show an increased propensity to exit cycle in response to BMP treatment, and reduced colony formation on return to EGF/FGF-2 growth factors. FOXG1 null cell lines therefore show a reduced response to BMP cues, making it difficult to explore quiescence exit per se. To navigate this, we instead investigated Dox-induced FOXG1 overexpression in FoxO6 WT and KO mouse NS cells, which display similar quiescence characteristics upon BMP treatment (Figure 4).

      • In the introduction and discussion, FoxO6 is mentioned for its oncogenic roles in various cancers but no reference to GBM specifically is cited. It feels like a missed opportunity to not show evidence of this in the IENS cell line that has reduced levels of FoxO6; is there an effect in their proliferative capacity? What are the expression levels of Pak1 following FoxG1 KO in IENS cells?

      Thank you for the helpful suggestion. It is indeed true that the literature on FoxO6 in GBM is lacking, which explains the absence of citations on this. On investigating expression of the proliferation marker Ki67 in these cells, we found no significant difference in expression, now shown in Figure 1H. This is in keeping with previous findings of our lab (Bulstrode et al, 2017) which show that FOXG1 is dispensable for the maintenance of continued NSC or GSC proliferation in vitro. We investigated the expression levels of Pak1 following FOXG1 KO in IENS and found a decrease in both KO lines compared to parental cells (updated Figure 6F).

      As explained in our discussion, these data suggest that Foxg1/FoxO6/Pak1 are not functionally important in sustaining GSC/NSC proliferation, as shown by the lack of proliferation defects upon Foxg1 or FoxO6 deletion (Bulstrode et al, 2017), but impact regulatory transitions, as cells prepare to exit quiescence into the proliferative radial-glia like state.

      Minor comments

      - Fig1A shows 4 and 2-fold respectively for the two mouse NSC lines, not 17 and 4-fold increase as written on manuscript, please adjust accordingly.

      The qRT-PCR data are presented as log2(fold change) or - ddCt, where this value equals zero for the calibrator sample, as indicated in the figure legends and axes. The data are presented in this way to enable accurate visualisation of up- and down-regulation of gene expression. Data are stated as ‘fold increase’ in the text for ease of reading, which we have clarified in the text and figure legends (e.g. lines 154 and 176).

      • Fig2G manuscript reports a 235-fold upregulation, but graph looks more like a 7 or 8-fold as shown on Fig1A for the F6 NSC line. I would recommend checking the fold changes reported throughout the paper.

      See previous comment above. The qRT-PCR data are presented as log2(fold change) or - ddCt, where this value equals zero for the calibrator, as indicated in the figure legends and axes. The data are presented in this way to enable accurate visualisation of up- and down-regulation of gene expression. Data are stated as ‘fold increase’ in the text for ease of reading, which we have clarified in the text and figure legends (e.g. lines 154 and 176).
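      The relationship between the plotted values and the fold changes quoted in the text can be checked with a short calculation. The sketch below assumes the standard 2^(-ddCt) convention, under which the plotted value (-ddCt) equals log2 of the linear fold change; the function names are illustrative, and the inputs are the fold changes mentioned in the reviewer's comments.

```python
import math


def fold_change_from_ddct(ddct):
    """Linear fold change, assuming the standard 2^(-ddCt) convention
    (i.e. roughly 100% amplification efficiency)."""
    return 2.0 ** (-ddct)


def log2_fold_change(fold):
    """Value plotted on a log2(fold change) axis for a linear fold change."""
    return math.log2(fold)


# Fold changes quoted in the comments above, converted to the plotted scale.
for fold in (17, 4, 235):
    print(f"{fold}-fold increase -> log2(fold change) = {log2_fold_change(fold):.2f}")

# The reverse direction: a bar at 2 on the -ddCt axis is a 4-fold increase.
print(fold_change_from_ddct(-2))
```

      Under this assumption, a 17-fold and a 4-fold increase plot at roughly 4.1 and 2 on the log2 axis, and a 235-fold increase plots at about 7.9, which is consistent with the bar heights the reviewer read from the graphs.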

      • The manuscript describes the increase of FOXG1 after BMP4-induced cell cycle exit as compared to non-BMP4 treated cells (p.8 first paragraph), but I am wondering if this expression is rather compared to dox negative and not vs BMP4 negative treatment.

      Data are presented relative to the non-BMP treated (EGF/FGF-2) control throughout the manuscript for consistency. This is to enable changes in expression between -Dox and +Dox to be visualised throughout the quiescence-exit time course relative to the initial starting population in EGF/FGF-2 growth media, prior to BMP treatment.

      • In Fig2G it is interesting that FoxO6 is upregulated in BMP4 treated throughout the experiment with highest values at day10 post treatment. At the same time, non-BMP4 treated cells keep decreasing their FoxO6 levels dramatically but there is no mention or reference to this effect.

      In Figure 2G, all cells have been treated with BMP4, prior to return to growth media (EGF/FGF) with or without Dox. It is true that in the +Dox condition with FOXG1 induction, FoxO6 levels continue to increase up to Day 10, perhaps reflective of the expansion of a highly proliferative radial glia-like population.

      • Fig2 would benefit from a western blot like Fig1D where FoxG1 and FoxO6-HA protein levels are also shown in dox-treated comparing BMP4-treated vs non-treated.

      Due to the lack of specific FoxO6 antibodies and the absence of a FoxO6-HA tag in this cell line, it is not possible to perform protein analysis of FoxO6 levels in this figure as for Figure 1D.

      • The colonies in Fig3E should be quantified, as their ability to form neurospheres seems somewhat compromised upon FoxO6 KO. Fig3B and 3F could perhaps be consolidated into one panel in the interest of space and presentation.

      Good suggestion. We have now consolidated Fig 3B and 3F into one panel (now Figure 3F) as suggested by the reviewer. We performed additional replicates for Figure 3E to quantify the colony formation efficiency. This showed a small, non-significant decrease in colony forming ability in the KO cells (Figure 3E). Importantly, the FoxO6 null cells do form colonies, and our results show that FoxO6 is not essential for proliferation or colony formation of NSCs in EGF/FGF-2. This therefore does not account for the complete loss in colony formation we see in the FoxO6 KO cells upon FOXG1 induction.

      • Fig4A shows vs "parental" non-BMP on y axis but wouldn't this show fold change of dox+ parental vs parental. The authors should clarify this.

      All samples in Figure 4A are compared to parental cells in EGF/FGF-2, i.e. non-BMP treated, as the calibrator sample where log2(fold change) equals zero. We chose to set a single calibrator sample for all data (parental and FoxO6 KO cells included) to allow us to compare changes in FOXG1 transgene across the entire experiment.

      • Perhaps the authors can add a non-BMP4 treated count of % FOXG1 positive cells to Fig4C for reference.

      As shown in Figure 4A, both parental and FoxO6 KO cells show similar, i.e. negligible, FOXG1 transgene expression without Dox, compared to the parental non-BMP4 treated control, therefore negligible FOXG1-V5 positive cells are seen by ICC. We have edited Figure 4A to include a non-BMP treated and BMP-treated control to show the negligible FOXG1-V5 expression by qPCR as controls.

      • The sentence mentioning Fig5D for the first time (p.10 third paragraph) needs rephrasing for clarity and should also call out Fig5C for the mCherry expression live cell imaging data where appropriate. Fig5D does not appear to be live imaging as implied by the text. If vacuole formation is observed already as early as 10-11h after Dox induction, then it should be shown somewhere in Fig5. Vacuole formation is shown with a higher magnification image inset only in the 22h timepoint image. I think Fig5E should be more substantiated with some sort of quantification, e.g. % of vacuoles positive for EEA1 and/or LAMP1.

      We apologise for this. The first reference to Figure 5D on line 234 should refer to Figure 5C; this has now been corrected in the text. Vacuoles are visible in Figure 5C panel 10 h 30 min; however, to make this clearer we have also supplied an accompanying movie of the live imaging (Movie 1). The imaging in Fig 5E has not been quantified, as this imaging was performed with the purpose of confirming that the vacuole structures seen are not simply enlarged lysosomes, due to their similarity in appearance to those published elsewhere (Ramosaj et al, 2021; Leeman et al, 2018). Instead, we have provided Western blotting data in Figure S5E to support the conclusion that there is no clear increase in EEA1 or LAMP1 (early endosomal or lysosomal) expression upon FoxO6-HA induction.

      - Could the authors comment on the lack of proliferative advantage of the FoxO6 overexpression. FigS3 shows Edu staining, but there is no proliferation assay in either Fig5 or S3. What would be the effect of FoxO6 overexpression on BMP4-mediated quiescence with or without FoxG1 over-expression?

      Induction of FoxO6-HA overexpression does not provide a proliferative advantage to the cells. Looking at individual cells, those with high FoxO6-HA levels seem to associate with EdU negativity. In Figure S3 we provide a quantitative EdU incorporation assay as a proliferation assay (quantification of the number of cells cycling, and therefore incorporating EdU, within a 24h pulse period). Quantification of the EdU staining in Figure S3G is provided in Figure S3H. We have now clarified this in the text on page 11, lines 263-4.

      Unfortunately, due to transgene overexpression using the PiggyBac transposon method, it is not feasible to overexpress FoxO6 and FOXG1 in the same cell line, as re-transfecting cells using the PiggyBac system would potentially alter FOXG1 transgene levels and make results difficult to interpret. Given the association of vacuolated cells with EdU negativity, we predict that FoxO6 overexpression would not give an advantage for quiescence exit. Indeed, BMP-treated cells with FoxO6 overexpression show a decrease in EdU positivity, as shown in Figure S3H. As discussed in the text, we hypothesise that cells with FoxO6 overexpression are in a stalled state, potentially due to signalling hyperactivation. While this may not be physiological, it gives us clues as to the function and downstream targets of FoxO6, which remain uncharacterised.

      *- Can the authors clarify if there is a proliferation change in F6 cells in Fig6F as in Fig2F? Fig6F shows Pak1 is already upregulated in quiescent NSCs, what are the expression levels of Pak1 in FoxO6 -/- ANS4 cells upon FoxG1-mediated quiescence exit as shown in Fig4? Is there a particular reason why the F6 cell line data is shown only up to day2 post Dox-induction rather than d4 or d10? For consistency with the rest of similar experimental data this timeline should be extended. Does Pak1 remain elevated, plateaus or keeps reducing further post day2? *

      The data in (previous) Figure 6F are from the same assay and cell line as presented in Figure 2, but at an early timepoint (Day 2) during the quiescence exit assay. We have provided in the panel qRT-PCR analysis of Ki67 to show that cells begin to show increased proliferation at this timepoint. Because we hypothesise that Pak1 is required at an early transition point, we decided to analyse its expression at an earlier timepoint than in Figure 2. We have also repeated this at D10 (data below), showing that Pak1 levels continue to increase with time, along with FoxO6 and the proliferative marker Ki67. Due to technical issues with variable FOXG1 transgene levels, we were unable to analyse Pak1 expression levels in FoxO6+/- ANS4 cells upon FOXG1-mediated quiescence exit.

      *Reviewer #1 (Significance (Required)): *

      The study provides a conceptual advance for exit from stem cell quiescence. There is strong evidence provided for murine neural stem cells, but the link to GBM cancer stem cells is less developed (but perhaps this is the subject of a separate manuscript).

      While FoxG1 is a known regulator of neurodevelopment and glioblastoma, the functions of FoxO6 have not been studied in the context of neural stem cells. In my view, this study should be of high interest to audiences in both neurodevelopment and cancer research. * Expertise: glioblastoma, cancer stem cells, neurodevelopment *

      We have edited the text and title to clarify that neural stem cells are used here as a model for GSCs with high levels of FOXG1 (e.g. lines 36 and 69).


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      *Major comments: *

      -The choice of NSCs as a main experimental model to understand the effects of FoxG1 and FoxO6 is not fully justified. The authors had previously shown that FoxG1 is expressed at very low levels in NSCs (Fig. 1A in Bulstrode et al. 2017). FoxO6 also seems to be barely expressed in NSCs (Fig. 1 of the current manuscript) and, in addition, its levels seem to go further down as cells exit quiescence (-Dox line in Fig. 2H). Therefore, these two genes do not seem to play an important role in the normal exit from quiescence of NSCs, with FoxO6 only affecting FoxG1 overexpression-induced exit from quiescence. * * *If the aim is to mimic a GBM-like state by FoxG1 overexpression, this should be made much clearer in the text, including title and abstract. In that case, the authors should also show a direct comparison of the levels of FoxG1 in GBM and upon Dox-induced overexpression in NSCs. *

      We agree with this criticism and suggestion to fix this. It is indeed our aim to mimic a GBM-like state by inducing FOXG1 overexpression and we should have made that more explicit. All experiments are performed in the context of high FOXG1 level. Like Foxg1, FoxO6’s homeostatic roles may be subtle in adulthood, and mostly involved in neural plasticity (Yu et al, 2019). This is in keeping with our finding that basal FoxO6 levels are low in adult NSCs and not required for sustained proliferation but are important for cell state transitions. If the FoxO6 levels activated by elevated FOXG1 represent an acquired dependency of GBM, there may be a therapeutic window to target this pathway. However, given the poorly understood roles of FoxO6, further work is needed to determine its specific value as a therapeutic target. We have modified the title and the text to make this clearer. This is also stated in the first paragraph of the results section on page 7 (line 148).

      We have provided below a Western Blot (Bulstrode, 2016) in which FOXG1 levels in F6 cells induced with Dox (1000 ng/ml, the dosage used) are compared with those in the GBM cell lines G7 and G144 and the normal NS cell line U5. This shows that the FOXG1 levels induced are significantly higher than those found in normal neural stem cells (mouse or human). This model has been previously used and published in Bulstrode et al, 2017, upon which this manuscript expands.

      *-While the authors state that they aim to study NSC quiescence, they use a protocol that is closer to modelling astrocytic differentiation. In fact, in their previous work, they use this very same protocol (removal of growth factors and addition of BMP) to study the role of FoxG1 and Sox2 on astrocyte de-differentiation (Bulstrode et al. 2017). While there is arguably no perfect in vitro model of NSC quiescence, the current standard in the field is treatment with both BMP and FGF for 48 to 72 hours (e.g.: Mira et al., 2010, Martynoga et al., 2013, Knobloch et al., 2017, Leeman et al., 2020). BMP alone is regarded as a pro-astrocytic differentiation cue, and 24 hours might not be enough for NSCs to fully commit to either differentiation or quiescence. Therefore, either the claims in the paper are changed to match the astrocytic differentiation model, or a standard quiescence protocol should be used throughout to confirm the findings also apply to the exit from quiescence of NSCs. *

      We agree with the reviewer that there is indeed no perfect in vitro model of NSC quiescence and thank the reviewer for this useful discussion. Coincident with this project, this was an active area of research from our laboratory as explored by Marques-Torrejon et al, 2021 (Nature Comms). After 24 h BMP4 treatment, we found that adult mouse NS cells: exit cell cycle, are growth factor unresponsive, obtain an astrocytic morphology, upregulate astrocytic markers such as Gfap and Aqp4, and downregulate radial glia/NS cell markers such as Nestin and Olig2 (Figure 3).

      We therefore initially viewed them as terminally differentiated. However, the exact state of these cells is difficult to define due to the lack of definitive markers and transcriptional differences that can distinguish terminally differentiated GFAP-expressing astrocytes from quiescent type B SVZ NS cells (which also express GFAP) (Bulstrode et al, 2017; Doetsch et al, 1999; Codega et al, 2014). Findings from our laboratory later suggested some NS cell markers are maintained following BMP4 treatment and these cells can be forced back into cycle with combined Wnt/EGF signalling, or FGF/BMP signalling (Marques-Torrejon et al 2021). This suggests in vitro NS cells may lie along a continuous spectrum of states from dormant quiescent, activated quiescent (primed for cell cycle re-entry) to actively proliferating, similar to that observed in vivo in the mouse SVZ (Dulken et al, 2017). Indeed, after 24 h BMP4 treatment, we observe a minimal level of colony formation in no Dox controls following 10 days of exposure to the growth factors EGF/FGF-2 (Figure 2D-F).

      These non-cycling BMP4-induced astrocytic cells might therefore be better viewed as dormant quiescent NSCs, hence our reference as quiescent NSCs. The assay conditions used in this manuscript differ to those of Marques-Torrejon et al, in terms of density and length of BMP4 treatment; it is therefore likely that our BMP-treated cells are at different stages along the continuum between dormancy and primed quiescent states. Importantly, regardless of the exact cell type induced by 24 h BMP4 treatment, we have considered the changes induced by FOXG1 overexpression, in comparison to the effect of NS cell media alone.

      *-The FoxO6-induced vacuole formation in NSCs is a very interesting finding. However, so far it was only observed upon FoxO6 overexpression. To claim vacuolization is required for quiescence exit, the authors should show whether this phenomenon is also observed upon normal exit from quiescence and FoxG1-induced reactivation of NSCs. From the author's own data, Pak1 (which induces vacuolization) is unlikely to reactivate NSCs, as its expression is highest in BMP-treated cells (Figure 6F). The authors should show whether some vacuolization is present at these stage in NSCs and if not, discuss the possible interplay between Pak1 and FoxO6 in vacuole formation and quiescence exit. *

      As detailed in the discussion, we hypothesise that FoxO6-induced macropinocytosis could represent a stalled state, with other pathways downstream of FOXG1 needing to be activated concomitantly to ensure cell cycle re-entry, e.g., through increased pinocytic flux that cannot be assessed within our experimental timeframes. Indeed, active Pak1 has been found to modulate pinocytic cycling, enhancing both FITC-dextran uptake and efflux (Dharmawardhane et al, 2000). Alternatively, the macropinocytosis observed may be a metabolic stress response caused by hyperactivation of signalling pathways upon FoxO6 overexpression. Hyperactivation of Ras signalling, canonical Wnt and PI3K signalling have all been shown to play roles in inducing macropinocytosis (Overmeyer et al, 2008; Tejeda-Muñoz et al, 2019; Recouvreux & Commisso, 2017).

      We do not see clear evidence of vacuoles in FOXG1-induced reactivation of NSCs, which supports the view that the macropinocytosis seen upon FoxO6 overexpression is a stalled state or due to hyperactivation. While this may not be physiological, it gives us clues as to the function and downstream targets of FoxO6, which remain uncharacterised (such as a link of FoxO6 and FOXG1 with Pak1-related pathways). Demonstrating a requirement for vacuolisation in quiescence exit is outwith the scope of this manuscript and therefore we are careful not to claim this. We have modified the text to clarify this.

      As the reviewer noted, it is interesting that Pak1 is highest in BMP-treated cells; it seems that BMP signalling itself is triggering elevated Pak1 levels, likely as cells undergo extensive cell shape changes during the transition from proliferation to quiescence. However, in EGF/FGF-2, Pak1 levels decrease, and our data suggest that FOXG1/FoxO6 are required to increase or maintain Pak1, potentially to again enable the cell shape/metabolic changes required on quiescence exit. We have added to the text to expand upon this observation on page 14 (lines 330-333).

      *-Finally, the data on the regulation of Pak1 expression by FoxO6 is insufficient to draw any strong conclusions. Downregulation of Pak1 in FoxO6 cells is not enough evidence to claim a direct regulation. The authors should show whether Pak1 levels are increased after FoxO6 overexpression and whether FoxG1 is downregulated in FoxO6 KO NSCs (indirectly affecting Pak1 expression). *

      We have performed qRT-PCR analysis of Foxg1 expression in FoxO6 KO NSCs and see no consistent difference in expression, indicating this is not indirectly affecting Pak1 expression (see below, 1). We have also investigated Pak1 levels upon FoxO6 overexpression, over a time course following Dox addition (see below, 2). Interestingly, when FoxO6 is overexpressed, Pak1 is not clearly upregulated at any time-point. It may be that as Pak1 is already expressed in the -Dox controls, due to its roles in a variety of cellular functions, that the levels are saturated already. It is clear that Pak1 expression decreases upon FoxO6 loss in EGF/FGF (without coincident Foxg1 downregulation) and in F6 cells, higher FOXG1 correlates with higher Pak1 in EGF/FGF. Together with the induction of macropinocytosis upon FoxO6 overexpression, these data provide interesting insights into the potential pathways downstream of Foxo6 in controlling quiescence exit, directly or indirectly related to Pak1 signalling. We have modified the text to reflect this on page 14 (lines 330-333).

      Minor comments: * Please state in the main text that NSCs are derived from the SVZ. *

      This has been added to the text on page 7 (line 149) and is in the methods ‘Cell Culture’ section.

      Reviewer #2 (Significance (Required)):

      As I said before, I find this work tackles a very important question, how is the exit from quiescence controlled in NSCs. This manuscript will be of interest to researchers in the fields of adult stem cell biology and adult neurogenesis. While my expertise lies mostly on NSC biology, this work is of potential great interest for the cancer field, particularly for brain cancer research. Elucidating the mechanisms GBM cells use to exit quiescence is crucial in order to avoid the relapse of this aggressive form of brain cancer. To increase the relevance of the work to the cancer community, some of the key findings should be reproduced with GBM cells. It would be particularly important to show whether Pak1 induced vacuolization and macropinocytosis can be observed in GBM cells.

      As detailed in the discussion, we hypothesise that FoxO6-induced macropinocytosis could represent a stalled state, with other pathways downstream of FOXG1 needing to be activated concomitantly to ensure cell cycle re-entry, e.g., through increased pinocytic flux that cannot be assessed within our experimental timeframes. Alternatively, the macropinocytosis observed may be a metabolic stress response caused by hyperactivation of signalling pathways upon FoxO6 overexpression. Hyperactivation of Ras signalling, canonical Wnt and PI3K signalling have all been shown to play roles in inducing macropinocytosis (Overmeyer et al, 2008; Tejeda-Muñoz et al, 2019; Recouvreux & Commisso, 2017). We do not see clear evidence of vacuoles in FOXG1-induced reactivation of NSCs, which supports the view that the macropinocytosis seen upon FoxO6 overexpression is a stalled state or due to hyperactivation. We therefore do not think macropinocytosis per se would be observed in quiescence exit of GBM cells; indeed, a normal form of macropinocytosis-induced cell death called methuosis has been observed in GBM cells with hyperactivated Ras signalling (Overmeyer et al, 2008). However, this phenotype still gives us clues as to the function of FoxO6 in quiescence exit in GSCs and the downstream signalling pathways it may regulate, such as Pak1-related signalling (discussed on lines 330-3 and 366-9).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: * The overall objective of the paper is to investigate the mechanisms by which co-option of the activity of developmental master lineage regulators by cancer cells allows them to gain fitness. To answer this question, they focus on FOXG1. This TF acts during the specification of the telencephalon. Its expression can be increased in Glioblastoma (GBM) and, more importantly for the paper, FOXG1 has previously been shown to promote exit from quiescence of glioblastoma stem cells (GSCs) and non-transformed neural stem cells (NSCs). In a previous screen, the authors identified FoxO6 as a potential direct target gene of FOXG1. In this paper, they showed that with the gain of expression for FOXG1 in NSCs and loss of FOXG1 in GSCs, FoxO6 is increased or decreased, respectively. Loss of FoxO6 in NSCs does not alter their cell cycle or cell shape and specification. Yet, loss of FoxO6 in NSCs blocks FOXG1-mediated exit from quiescence. To understand the mechanisms, they decided to overexpress FoxO6 in NSCs and demonstrated that the cells undergo macropinocytosis, a process by which cells can engulf large amounts of nutrients from the external medium. It remains to be determined whether this macropinocytosis occurs in cells overexpressing FOXG1 and GSCs. The authors provide a first answer by showing that overexpression of FOXG1 induces not only FoxO6 but also the expression of PAK1, one of the key kinases that regulates the membrane engulfment of macropinocytosis in NSCs. In GSC lines, the decrease of FOXO6 decreases PAK1 levels. *

      Major comments: * The paper describes interesting and convincing results (number of cell lines, repeated experiments seem sufficient) but it is difficult to reconcile them all in a single model, and this diminishes the impact of the study. Epistatic interactions between FoxG1, FoxO6, PAK1 and macropinocytosis are not always studied in the same cell models. Whether FOXG1-induced exit from quiescence of NSCs is dependent on a FOXG1-->FOXO6-->PAK1-->Macropinocytosis axis remains to be demonstrated. Whether such an axis operates in tumor cells also remains to be fully assessed. In particular, if FoxO6 overexpression in NSCs can induce macropinocytosis, is this cellular process induced by FoxO6 downstream of FOXG1 activity during NSC quiescence exit? Is PAK1 a relay of FoxO6? Experiments looking at macropinocytosis and the involvement of PAK1 in the cell models of Figure 4 will definitely help to bridge the different results. *

      We thank the reviewer for this useful insight and discussion for future work.

      To directly investigate the effects of Pak1 ablation, and therefore more directly the link between FOXG1 and FoxO6 and macropinocytosis, we tested the published Pak1 inhibitor IPA-3. Unfortunately, to distinguish the role of Pak1 in quiescence exit and macropinocytosis, we would need a dosage of IPA-3 that is efficacious but does not affect cell proliferation. It was not possible to optimise such a dosage: a dosage of 10 µM is shown to be efficacious at inhibiting Pak1 (Verma et al, 2020; Wong et al, 2013); however, even at 2.5 µM we see significant cell death in our cells. This is potentially due to the variety of cellular functions Pak1 is involved in. Conversely, it is not feasible to overexpress Pak1 in the FoxO6 KO cells with inducible FOXG1. To ensure we are investigating quiescence exit, this would need to be done in an inducible manner; however, re-transfecting cells using the PiggyBac system would potentially alter FOXG1 transgene levels (through excision of the existing transgene) and therefore make results difficult to interpret.

      We hypothesise that FoxO6-induced macropinocytosis could represent a stalled state, with other pathways downstream of FOXG1 needing to be activated concomitantly to ensure cell cycle re-entry, e.g., through increased pinocytic flux that cannot be assessed within our experimental timeframes (as detailed in the text discussion). Alternatively, the macropinocytosis observed may be a metabolic stress response caused by hyperactivation of signalling pathways upon FoxO6 overexpression. Hyperactivation of Ras signalling, canonical Wnt and PI3K signalling have all been shown to play roles in inducing macropinocytosis (Overmeyer et al, 2008; Tejeda-Muñoz et al, 2019; Recouvreux & Commisso, 2017). We do not see clear evidence of vacuoles in FOXG1-induced reactivation of NSCs, which supports the view that the macropinocytosis seen upon FoxO6 overexpression is a stalled state or due to hyperactivation and therefore not a physiological process in quiescence exit. We therefore do not think macropinocytosis per se would be observed in quiescence exit of GBM cells; indeed, a normal form of macropinocytosis-induced cell death called methuosis has been observed in GBM cells with hyperactivated Ras signalling (Overmeyer et al, 2008).

      However, we believe the observed macropinocytosis phenotype upon FoxO6 overexpression, and the changes in Pak1 expression upon FoxO6 loss or FOXG1 induction, provide interesting insights into the function of this underexplored FoxO family member in GSCs and the downstream signalling pathways it may control, such as Pak1-related signalling. We have modified the text to reflect the limitations of our current data and discuss this (lines 330-3 and 366-9).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1 (Public Review):

      He et al. investigate the requirement and function of Blimp1 (encoded by Prdm1) in murine NK cells and ILC1. Employing a conditional knockout mouse model (Prdm1flox x Ncr1cre), the authors describe impaired abundance and maturation of Prdm1-deficient NK cells and ILC1 in different tissues. Blimp1-deficient NK cells have reduced expression of cytotoxic molecules (Gzmb, Prf1) and, in some instances, Ifng production, and Prdm1flox x Ncr1cre mice show impaired tumor control in experimental metastasis models. Using single-cell RNA sequencing analysis, the authors propose that Prdm1 regulates JunB expression and NK cell maturation. Based on in silico analyses, the authors suggest manifold intercellular communication between NK/ILC1 and macrophages. Without following up on any of these potentially interesting suggestions, the authors conclude their study reiterating that Prdm1 regulates IFNg-production of tumor-infiltrating NK cells and ILC1. Many of the reported functions of Blimp1 in NK cells have previously been identified using a mixed-chimera strategy comparing Prdm1 WT and KO NK cells (Kallies et al., Blood 2011). Here, the authors expand on these findings using a conditional model to delete Prdm1 in NK/ILC1 and single-cell sequencing and provide a more refined analysis of the functions of Blimp1 in these cells. Cell-chat analysis suggests close interactions of Blimp-dependent NK/ILC1 subsets with hepatic macrophages, but these suggestions are not followed up by experiments. Potentially interesting differences in the macrophage compartment of Ncr1-Cre x Prdm1-fl/fl mice are suggested by the scRNA-Seq data but are not validated e.g. by FACS. The study falls short in providing new mechanistic insights. Nevertheless, it is an interesting confirmation of "old" suggestions in a more refined setting, and the provided single-cell mRNA-Seq data represents a potentially valuable resource for the community. 
      There are some control analyses that are required to support the conclusions of the authors, and I have a few suggestions that would help to improve the manuscript.

      We sincerely appreciate your careful review and insightful feedback on our manuscript. We have carefully considered your comments and present the results of new experiments conducted in response to your suggestions. Please find the detailed responses below.

      Major comments

      Comment 1: The authors do not control for the potential effects of Cre expression. Expression of Cre from within the Ncr1 locus (using the mouse model established by Narni-Mancinelli et al.) has significant effects on NK cells and especially ILC1s (reducing their frequency and absolute numbers and altering their functionality). The authors should characterize the Ncr1cre mice used here (developed by Shanghai Model Organism Center) in this regard and should use proper controls (Ncr1Cre+ Prdm1wt/wt as control for Ncr1Cre+ Prdm1fl/fl, instead of WT littermates) for all of their key data, e.g. those depicted in Fig 1FG, 2ADFH, 7D, S2,3,4.

      Response 1: This is a very insightful question that has posed a challenge for many researchers, including us, engaged in conditional knockout studies. The expression of Cre and the insertion of loxP sequences both have the potential to influence gene expression, because the region where loxP is inserted may contain regulatory sequences for the gene of interest. Ncr1-Cre is a frequently used transgenic mouse model in our laboratory. In our prior research, we also had concerns about the possible impact of Cre on NKp46 expression, which could lead to a decline in NK cell function. Therefore, in our previous studies focused on Smad4 expression in NK cells, we conducted similar experiments. In Figure 6 of our published paper in the Journal of Clinical Investigation (Wang et al., J Clin Invest, 2018), we compared NKp46-iCreTgfbr2fl/flSmad4fl/WT with NKp46-iCreTgfbr2fl/flSmad4fl/fl. Although the primary purpose was to establish Smad4's independence from TGF-β, this also allows for a comparison between Smad4fl/fl and Smad4fl/WT in the presence of Cre. In the critical phenotype we assessed, NKp46-iCreTgfbr2fl/flSmad4fl/fl (compared with NKp46-iCreTgfbr2fl/flSmad4fl/WT) exhibited the same phenotype as NKp46-iCreSmad4fl/fl (compared with NKp46WTSmad4fl/fl). This suggests that Cre's influence on NK cells may be within a reasonable and controllable range. Furthermore, compared with the decrease in Ncr1 expression caused by Cre, the reduction in the expression levels of genes targeted by loxP-mediated knockout, such as Prdm1 in this study (Figure 1 E), is more significant. Therefore, with the current techniques and research methods, we believe that the data provided in this study can support the role of Prdm1 in NK cells.

      Comment 2: Several of the phenotypic findings on NK cells have been described before by Kallies et al. in 2011 (Ref 29), although using a different genetic Prdm1-ablation model (Prdm1-GFP/GFP knockin/knockout model). This study reported impaired NK cell maturation, reduced Gzmb expression, impaired in vivo cytotoxicity against subcutaneous RMA-S cells, impaired in vitro proliferation, comparable in vitro killing, increase in BM NK cell numbers. The authors should discuss/mention this more prominently in their manuscript, and highlight where they confirm or refine these previous findings, and where they actually provide new information.

      Response 2: We appreciate your valuable suggestions. The article you referred to, published in Blood, is indeed an excellent work. While we had cited this article, our discussion regarding its specific content was limited. Based on your advice, we have made revisions and included the following content in our discussion section (page 24; line 489-493):

      “In a study involving systemic knockout combined with competitive transplantation, it was found that Prdm1 promotes NK cell maturation and the expression of Gzmb. On the contrary, the same study also found that NK cells with Prdm1 deficiency exhibit heightened proliferation, increased survival, enhanced migratory abilities towards tumors, and greater cytotoxicity against subcutaneously implanted RMAS tumors (31).”.

      Comment 3: What is the reason to refer to the enriched cluster in Blimp1-deficient NK cells as "Junbhi"? There is no follow-up for a function of Junb, and there are many other genes upregulated in these cells. Most critically, these cells seem to represent exactly the c-Kithi cells that Kallies et al. already showed and discussed in their paper. The authors should stain for Kit, and also refer to this. Also, MacKay et al. performed Blimp1-Chip-Seq (in T cells), maybe it would be interesting to check to which of the identified DEGs Blimp1 can bind.

      Response 3: We appreciate the suggestion from the reviewer. We think that a gene that supports the development of lymphocytes does not necessarily positively regulate their function. For example, JunB is essential for T cell development but can also induce T cell exhaustion (Lynn et al., Nature. 2019). Therefore, while Prdm1 has been shown to promote NK cell development, it cannot be assumed that it always positively regulates NK cell function, especially for anti-cancer immune surveillance. In this respect, we tried to find a driving factor of the impaired anti-tumor ability of Prdm1ΔNcr1 NK cells. Although many other genes are upregulated in this cluster (e.g. Kit), JunB is of greater interest to us because of its potential to regulate NK cell functions in cancer, whereas c-Kit is more likely a marker of NK cell maturation, which has been well demonstrated by Kallies et al. and other studies. Our previous studies also showed that the expression of c-Kit was decreased in mature NK cells compared with immature NK cells (Wang et al., J Clin Invest, 2018).

      We did not perform follow-up experiments on Junb because we could not find suitable surface markers with which to investigate the function of the Junbhi cNK cluster. Using intracellular markers would more likely amount to an analysis of gene expression patterns, which are already well described in our RNA-seq data. As described above, our study did not aim to further investigate the role of Prdm1 in NK cell maturation, as c-Kit expression was upregulated in Prdm1-knockout NK cells and correlated with NK cell maturation, which has been validated by Kallies et al.

      We also have discussed the potential DEGs that could be bound and regulated by Prdm1 in our revised manuscript (page 27-28; line 561-571):

      “Prdm1 and Hobit directly bound and repressed Tcf7 (18), which encoded TCF-1, a TF binding and limiting the activity of Gzmb regulatory element (69). Gzmb has been demonstrated directly bound and activated by Junb in NK cells, which suggested Gzmb expression regulated by multiple Prdm1/Hobit downstream signals (26). In human T cells, binding motif of JUNB was enriched in the binding sites of PRDM1 (70), indicating the essential role of PRDM1-JUNB axis during NK cell and T cell development. In NK cells deficient in Prdm1 expression, we noted a decrease in Gzmb levels alongside with an elevation in Junb expression. This indicates that Prdm1 not only facilitates the expression of Gzmb in NK cells but also suppresses Junb expression. Given that Junb is recognized as a positive regulator of Gzmb (71), this presents a complex interplay that seems contradictory. Therefore, it is imperative to develop a theoretical framework to comprehensively understand and interpret this paradoxical relationship.”.

      Comment 4: cNK cells are considered circulating cells, that transiently pass through the liver.

      Previous studies have suggested almost identical gene expression patterns in hepatic and splenic NK cells. In functional tests, they often "perform" identically. I am therefore a bit surprised that the authors find a differential dependency of Blimp1 for the IFNg production of splenic (no role of Blimp1) versus hepatic (Blimp1 regulating IFNg production) NK cells (Fig S3). Do the authors have any suggestions on that? The analyses are performed by 12+4h stimulations with IL12/18, which could involve the effects of altered bystander cells (as suggested by Figure 6). Therefore, these analyses should be provided upon standard 4h stimulations with IL12/18 and also with PMA/I under BFA. Note: liver and splenic cNK cells look quite different in the chosen histograms in Figures 7 A, B, C, yet there is massive variability in these analyses - is there any systematic/technical problem?

      Response 4: We appreciate the valuable suggestion from the reviewer. Studies have suggested that, at the gene expression or transcriptomic level, liver NK cells exhibit more similarity to splenic NK cells while displaying greater divergence from liver ILC1s. However, we do not think that splenic NK cells or peripheral blood NK cells (which are more abundant in circulation) are entirely indistinguishable from liver NK cells. Notably, there are substantial differences in their maturity levels, with liver NK cells being more mature. Since we are examining the protein levels, a 4-hour stimulation period may not fully capture these distinctions. Even when considering the potential impact of bystander cells, the experimental design specifically targets Prdm1 knockout within NK cells, ensuring that the study accurately elucidates the role of Prdm1 in NK cells. For each experiment, we have implemented control measures, and any variances observed in the figures may be attributed to individual variations among the animals. It is also possible that the MFI values measured by flow cytometry exhibit larger variations than a percentage.

      Comment 5: Figure 4 H/I - In contrast to NK cells in Fig 4E, F, the KO and WT ILC1s seem to co-cluster largely. Authors should validate differentially expressed genes. How strong is the effect of Blimp1 in ILC1s? Or is Blimp1 a critical TF driving effector differentiation in NK cells, while it has only subtle effects in ILC1 (these may be regulated by Hobit?)? This seems an interesting finding that should at least be discussed. For these types of small differences in ILC1, FACS confirmation analyses should be performed and findings be reevaluated using Cre-expressing controls (see above).

      Response 5: We appreciate the reviewer's suggestion. As requested, we analyzed the DEGs in liver cNK cells and ILC1s from our scRNA-seq data (revised Supplemental Figure 8, A and B). There were only a few informative DEGs in ILC1s compared with cNK cells. It is likely that Prdm1 has a more essential effect on the transcriptional program of cNK cells, while it plays a more important role in maintaining the homeostasis of the ILC1 population. We have discussed these points to better inform readers (page 27; line 554-561):

      “Previous studies have identified Hobit and Prdm1 as central regulators instructing tissue-dependent programs and retention of diverse tissue-resident lymphocytes (18, 51, 53). Liver ILC1s required Hobit, but not necessary for cNK cells (6). Expression of Prdm1 was remarkably higher in cNK cells versus ILC1s (18). While in our study, cNK cells and liver ILC1s reduced simultaneously in Prdm1ΔNcr1 mice, and even more significant in ILC1s. This indicates that while Prdm1 is expressed at lower levels in ILC1s, its role in preserving the quantity of ILC1s may be more crucial. Thus, Prdm1 and Hobit may have parallel program in instructing ILC1s functional development and maturation.”. 

      We could not find a suitable surface marker with which to evaluate the changes in ILC1s, as most of the changes involve intracellular markers.

      Comment 6: The authors describe and discuss some of Figure 1 and 2 data as if Blimp1 would be involved in alternative NK versus ILC1 fates, but there is no evidence for this.

      Response 6: There is no evidence that Prdm1 alters the fate decision of progenitors towards liver cNK cells or ILC1s. Although some studies have reported conversion between cNK cells and ILC1s in special contexts, it is widely accepted that liver cNK cells and ILC1s originate from different progenitors. While we observed changes in the proportions of liver cNK cells and ILC1s in Prdm1 KO mice, we still lack sufficient evidence to support the relative independence of NK and ILC1 development, as well as evidence to indicate that Prdm1 is exclusively responsible for NK and ILC1.

      Regarding the changes in NK and ILC1 proportions after Prdm1 KO, we believe that both NK and ILC1 cells require Prdm1 to maintain their populations, with ILC1 possibly requiring it to a greater extent. This is the reason for the altered balance between NK and ILC1 cells following Prdm1 KO. We wish to clarify this point to prevent any misconceptions among readers. To address this, we have added the following content to the discussion section (page 25; line 509-516):

      “Furthermore, although both liver NK cells and liver ILC1s require Prdm1 to maintain their quantity, liver ILC1s demonstrate a more pronounced dependency on Prdm1. However, it is currently widely believed that liver NK cells and liver ILC1s originate from different progenitors. It is worth noting that while we observed changes in the NK and ILC1 proportions after Prdm1 knockout, our data does not support the hypothesis that Prdm1 affects progenitor differentiation decisions, thereby influencing the fate selection of NK and ILC1. Further research may be needed to elucidate how Prdm1 regulates the balance between NK cells and ILC1s.”.

      Comment 7: There are several recent studies suggesting a role for Hobit, homologue of Blimp1, in NK cells and in ILC1, and in the control of liver metastases. The authors should discuss similar and unique functions of Hobit and Blimp1, also in the regulation of gene expression patterns, and should refer to these studies.

      Response 7: We would like to express our gratitude to the reviewer for your insightful comments, which bring forth a critical perspective. In accordance with the reviewer's suggestion, we have updated our discussion to include the diverse functions guided by Hobit and Prdm1 in regulating the development and function of cNK cells and ILC1s (page 27; line 554-561):

      “Previous studies have identified Hobit and Prdm1 as central regulators instructing tissue-dependent programs and retention of diverse tissue-resident lymphocytes (18, 51, 53). Liver ILC1s required Hobit, but not necessary for cNK cells (6). Expression of Prdm1 was remarkably higher in cNK cells versus ILC1s (18). While in our study, cNK cells and liver ILC1s reduced simultaneously in Prdm1ΔNcr1 mice, and even more significant in ILC1s. This indicates that while Prdm1 is expressed at lower levels in ILC1s, its role in preserving the quantity of ILC1s may be more crucial. Thus, Prdm1 and Hobit may have parallel program in instructing ILC1s functional development and maturation.”.

      As shown in Supplemental Figure 8, we analyzed two published scRNA-seq datasets generated with HobitKO mice and integrated the DEGs in cNK cells and ILC1s with our data. We observed overlapping DEGs between Prdm1ΔNcr1 and HobitKO cNK cells and ILC1s, such as Junb, Tcf7, Gzmb, and Prf1 (Supplemental Figure 8), indicating similar regulatory networks of Prdm1 and Hobit. These data are now described on page 19; lines 386-395:

      “We also compared the gene expression patterns between Prdm1 and Hobit (homologue of Blimp1) with two published scRNA-seq data (51, 53). Following the knockout of Hobit, the DEGs were primarily identified within ILC1s. Conversely, after the knockout of Prdm1, a greater number of DEGs were observed in cNK cells. This indicates that Prdm1 likely possesses a broader range of target genes within cNK cells, whereas Hobit appears to have a more pronounced impact on gene expression within ILC1s (Supplemental Figure 8, C-F). There are some overlaps between the downstream transcriptional profile of Prdm1 and Hobit in liver cNK cells and ILC1s (Supplemental Figure 8, G and H), such as Junb, Fosb, Tcf7, Kit, Gzmb, Prf1, and Cxcr6 was simultaneously upregulated or downregulated in both Prdm1ΔNcr1 and HobitKO liver cNK cells or ILC1s, indicating the similar regulatory networks of Prdm1 and Hobit.”.

      Comment 8: Figure 4: The authors should discuss (and cross-validate) their liver gene expression analyses in the context of published datasets of NK and ILC1, such as the ones by Lopez et al, Friedrich et al, Ducimetiere et al and Yomogida et al.

      Response 8: We thank the reviewer for raising this important point. To address this question, we have now analyzed the gene expression of liver cNK cells and ILC1s in two of the published datasets mentioned above, also in the context of Hobit knockout. We compared gene expression across clusters and describe the results in our revised manuscript (page 19; lines 386-395):

      “We also compared the gene expression patterns between Prdm1 and Hobit (homologue of Blimp1) with two published scRNA-seq data (51, 53). Following the knockout of Hobit, the DEGs were primarily identified within ILC1s. Conversely, after the knockout of Prdm1, a greater number of DEGs were observed in cNK cells. This indicates that Prdm1 likely possesses a broader range of target genes within cNK cells, whereas Hobit appears to have a more pronounced impact on gene expression within ILC1s (Supplemental Figure 8, C-F). There are some overlaps between the downstream transcriptional profile of Prdm1 and Hobit in liver cNK cells and ILC1s (Supplemental Figure 8, G and H), such as Junb, Fosb, Tcf7, Kit, Gzmb, Prf1, and Cxcr6 was simultaneously upregulated or downregulated in both Prdm1ΔNcr1 and HobitKO liver cNK cells or ILC1s, indicating the similar regulatory networks of Prdm1 and Hobit.”.

      Recommendations For The Authors:

      Comment 9: The use of a paired t-test analysis when comparing cells/groups from different mice is not correct. Instead, the authors should consider using e.g. an unpaired t-test and re-test the indicated significance (e.g. Figure 1F, Figure 2H).

      Response 9: We thank the reviewer for this comment. Because we used littermates for the experiments and compared them side by side, we consider the paired t-test acceptable. We nevertheless reanalyzed the significance of the results in Figure 1F and Figure 2H using an unpaired t-test. For Figure 1F, the statistical significance with the unpaired t-test was the same as with the paired t-test. However, in Figure 2H, the reduction in IFN-γ production did not reach statistical significance with the unpaired t-test (Supplemental Figure 12B). This may be attributable to variation between different littermates, but the trend is still consistent with our conclusion. We believe that a paired t-test between littermates is also meaningful. We therefore kept both statistical methods to ensure a thorough evaluation.
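
      To illustrate why the two tests can disagree on littermate data, the following is a minimal sketch, not the analysis used in the study. The values are hypothetical and the function names `paired_t` and `unpaired_t` are illustrative assumptions; only the t statistics and degrees of freedom are computed.

```python
import math
import statistics


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # The test is run on within-pair differences, so between-pair
    # (e.g. between-litter) variation cancels out.
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def unpaired_t(a, b):
    """Return (t statistic, degrees of freedom) for Student's two-sample t-test."""
    na, nb = len(a), len(b)
    # Pooled variance assumes equal variances in the two groups.
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb)
    )
    return t, na + nb - 2


# Hypothetical % IFN-gamma+ values; index i in each list is one littermate pair.
wt = [40.0, 55.0, 30.0, 62.0, 48.0]
ko = [35.0, 49.0, 26.0, 55.0, 44.0]

t_paired, df_paired = paired_t(wt, ko)
t_unpaired, df_unpaired = unpaired_t(wt, ko)
print(f"paired:   t = {t_paired:.3f}, df = {df_paired}")
print(f"unpaired: t = {t_unpaired:.3f}, df = {df_unpaired}")
```

      In this made-up example the within-pair differences are small and consistent while the spread between pairs is large, so the paired statistic is much larger than the unpaired one. Whether pairing is justified depends on whether the samples were truly matched by design.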

      Comment 10: In several instances, it is unclear whether data are pooled or representative (and if so, of how many analyses). This information needs to be provided for all analyses. 

      Response 10: We apologize for the lack of detail and have now provided this information in our figure legends.

      For example, we deleted the numbers in the original histograms to avoid confusion about whether the data are pooled or representative (e.g. original Figure 7 A-C; revised Figure 7 A-C). Furthermore, we added the word “representative” to the figure legends of all flow cytometric plots to better inform readers (e.g. original Figure 2, D and F; revised Figure 2, B and D).

      Comment 11: In the title and abstract authors use "type 1 ILCs" for both NK cells and ILC1, and it is difficult to understand which phenotypes correspond to cNK cells versus ILC1. Most of the analyses clearly separate these two different cell types. I would appreciate a lot being more accurate in the abstract, and describing cNK and ILC1 phenotypes in a clear way.

      Response 11: We are really sorry for our inaccurate descriptions. Following Spits et al. (Nature Reviews Immunology, 2013) and other related studies, we have now adopted a more appropriate nomenclature: “cNK cells” for conventional NK cells, “ILC1s” for type 1 innate lymphoid cells, and “group 1 ILCs” as the collective name for cNK cells and ILC1s.

      The definition of these cells is given in the introduction (page 4, line 52-53; line 58-62):

      “Group 1 ILCs consist of cNK cells and ILC1s (1, 2), with distinct developmental trajectories and effect molecules (3).”, “In a state of homeostasis, liver group 1 ILCs (CD45+CD3-NK1.1+NKp46+) can be discriminated into cNK cells and ILC1s by the differential expression of CD49a and CD49b (2): cNK cells are marked by the expression of CD49b, while liver ILC1s exhibit a distinctive positivity for CD49a. Tumor Necrosis Factor Related Apoptosis Inducing Ligand (TRAIL) is also expressed on liver ILC1s, but not on cNK cells (10, 11).”. 

      We also describe the cNK and ILC1 phenotypes in our scRNA-seq data, as shown on page 13; line 259-261:

      “cNK cells expressed high levels of Itga2 (CD49b) and Eomes, while ILC1s had high levels expression of Itga1 (CD49a) and Tnfsf10 (Supplemental Figure 5, F and G).”.

      Comment 12: In the abstract authors state "The present study unveiled a novel regulatory mechanism of Prdm1 in liver Type 1 ILCs, showing promising potential for developing innovative immune therapy strategies against liver cancer." - maybe authors should discuss how their findings could be used for therapeutic approaches?

      Response 12: We appreciate comments from the reviewer. As there hasn't been a clear consensus on the role of Prdm1 in NK cells prior to this, some studies have suggested that Prdm1 can inhibit cytokine secretion by NK cells. Particularly, Kallies et al. in their 2011 article in Blood found that Prdm1 might suppress NK cell anti-tumor activity. Hence, there hasn't been any immunotherapy targeting Prdm1 in NK cells for cancer treatment. Our research demonstrates the enhancing role of Prdm1 in NK cell anti-tumor activity, providing theoretical support for NK cell therapy targeting Prdm1. 

      We added the following content to the discussion section (page 29; line 605-609): 

      “Further research may provide deeper insight into the role of PRDM1 in the anti-tumor function of human NK cells, enabling a more direct investigation of its application in cancer therapies. Given its important role in preserving liver cNK cells and ILC1s functional heterogeneity, enhancing Prdm1 function in human NK cells could potentially be a strategy to promote NK cell-based immunotherapy for cancer.”.

      Comment 13: The authors should explain or interpret their data a bit more (e.g. what is the consequence of GSEA enriched in negative regulation of Il6 production? (Fig. 3D)  do NK cells produce Il6 (Figure 3)? What's the impact of Il17 signaling in NK/ILC1 (Figure 5). Do the authors suggest JunB-driven metabolic reprogramming (Suppl. Fig 6D-F?).

      Response 13: We appreciate the reviewer's comments. The question of IL-6 production in NK cells was also raised by another reviewer. We checked the GSEA results and found no informative genes related to IL-6 production in NK cells. Following the suggestion of the other reviewer (Response to Reviewer 2, Comment 14), it may be prudent to omit this figure.

      The IL-17 signaling signature points to plasticity of ILC1s, which may originate from the differentiation of ILC3s. We added more discussion of this point (page 17; line 341-344):

      “Several ILC3 signature genes, such as Rora, Tmem176a, and Tmem176b (45), highly expressed in this cluster (Supplemental Figure 7D). Considering the close relationship between IL-17 mediated immunity response and ILC3 (1, 46), it is plausible that Il7rhi ILC1 cluster may be attributed, at least in part, to potential plasticity between ILC1 and ILC3 subsets.”.

      The decreased mitochondrial function may have more relevance to NK cell exhaustion in tumors. Our data suggest that the elevated expression of JunB in NK cells may predispose them to exhaustion. Currently, our hypothesis regarding the promotion of NK cell exhaustion by high JunB expression is based on the observed correlation between JunB expression levels and exhaustion phenotypes (at the gene expression and IFN-γ secretion levels) and the findings in reference 67 (Lynn et al., Nature, 2019), where JunB was found to promote T cell exhaustion. However, we have not demonstrated causation between high JunB expression and exhaustion in NK cells. We propose that in NK cells, especially mature NK cells, excessive JunB expression may make them more sensitive to exhaustion inducers. Nevertheless, further research is needed to confirm this. To clarify this, we added the following content in the discussion section (page 26; line 537-543): 

      “While our current data is not sufficient to definitively classify these cells as exhausted NK cells, it supports that a subpopulation, referred to Junbhi cluster, demonstrates an exhaustion-like phenotype.

      The significant increase in this cell population following Prdm1 knockout in NK cells may potentially be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether the excessive expression of JunB in NK cells is also a contributing factor to their exhaustion, similar to T cells(65), requires further investigation.”.

      Comment 14: Ref 25 and Ref 57 are the same publication?

      Response 14: We are really sorry for our careless mistake. We have checked all the references and corrected the formatting.

      Comment 15: Figure 1, E - The method description of RT-PCR is missing. I apologize if I have overlooked this information.

      Response 15: We have now added the description of RT-PCR in our revised method section (page 31; line 638-644):

      “RNA was extracted from FACS-sorted NK cells or splenocytes using RNASimple Total RNA Kit (TIANGEN Biotech, 4992858) and subsequently reverse transcribed to cDNA with SuperScript VILO Master Mix (Thermo Fisher Scientific, 11755050) according to manufacturer’s instructions. qPCR was performed with SYBR Green Mix (Thermo Fisher Scientific, A25742) and CFX Opus 96 Real-Time PCR System (Bio-Rad). The relative mRNA expression level was calculated using 2^(-ΔΔCt) method. Primer sequences: Prdm1: 5’-CAGAAACACTACTTGGTACA-3’; 5’-GATTGCTTGTGCTGCTAA-3’.”
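
      For readers unfamiliar with the relative quantification mentioned above, the calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `relative_expression` and all Ct values are hypothetical, and the reference (housekeeping) gene is not named in the text, so it appears here only as a generic "reference".

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference), computed per condition
    ddCt = dCt(sample) - dCt(control)
    fold = 2 ** (-ddCt)
    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values (not from the study):
# knockout sample: target Ct 28, reference Ct 18  -> dCt = 10
# control sample:  target Ct 25, reference Ct 18  -> dCt = 7
fold_change = relative_expression(28.0, 18.0, 25.0, 18.0)
print(fold_change)  # ddCt = 3, so the target is at 2^-3 of the control level
```

      A fold change below 1 indicates lower target expression in the sample than in the control after normalization to the reference gene.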

      Comment 16: Figure 1, F - The NKp46+CD3- gate for the liver seems to cut the population, not all cells are included.

      Response 16: We appreciate the reviewer’s comment and apologize for our carelessness. We expanded our data with more samples and reanalyzed them with a more convincing gating strategy. We have now updated our figures (revised Figure 1G; revised Supplemental Figure 2A). Several changes have occurred in the data and conclusions, and we have revised the corresponding content in our manuscript accordingly.

      The original text is:

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage of cNK cells (CD3-NK1.1+NKp46+) among lymphocytes was decreased in all of these tissues except bone marrow and lymph nodes (Figure 1F; Supplemental Figure 2A). However, no significant difference was observed in the percentage of cNK cells among bone marrow-derived lymphocytes between Prdm1ΔNcr1 and Prdm1+/+ mice. The absolute number of cNK cells in blood, lung, liver, and spleen also decreased in Prdm1ΔNcr1 mice (Figure 1F; Supplemental Figure 2A). Only a slight decrease in the number of cNK cells was observed in the lymph nodes of Prdm1ΔNcr1 mice, which did not reach statistical significance either (Supplemental Figure 2A). In contrast, the absolute number of cNK cells in Prdm1fl/fl mice bone marrow is moderately higher than Prdm1ΔNcr1 mice (Figure 1F).”

      The revised text is (page 8; line 142-146):

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage and absolute number of NK cells (CD45+CD3-NK1.1+NKp46+) among lymphocytes was decreased in all of these tissues, whereas increased number of NK cells were observed in bone marrow (Figure 1G; Supplemental Figure 2A).”

      Comment 17: Figure 1, The y-axis labeling of lung CD3-NKp46+ cells (x10^3) is not correct.

      Response 17: We are really sorry for our carelessness. We have now checked the labels and made sure they are correct.

      Comment 18: Figure 1, The statistical significance of absolute numbers of NKp46+ cells in the bone marrow should be reviewed.

      Response 18: We expanded our data with more samples and reanalyzed them with a more convincing gating strategy. We observed a significant increase in the number of bone marrow NK cells in our updated data. These changes are now described in our revised manuscript.

      The original text is: 

      “However, no significant difference was observed in the percentage of cNK cells among bone marrow-derived lymphocytes between Prdm1ΔNcr1 and Prdm1+/+ mice”, “In contrast, the absolute number of cNK cells in Prdm1fl/fl mice bone marrow is moderately higher than Prdm1ΔNcr1 mice (Figure 1F).”

      The revised text is (page 8; line 142-146):

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage and absolute number of NK cells (CD45+CD3-NK1.1+NKp46+) among lymphocytes was decreased in all of these tissues, whereas increased number of NK cells were observed in bone marrow (Figure 1G; Supplemental Figure 2A).”

      Comment 19: Figure 1, G - CD27 and CD11b are used to define maturation stages within NK cells. Here the authors are analyzing group 1 ILC instead (containing both NK cells and ILC1, especially in the liver). It would be better to pre-gate on Eomes+ or CD49b+ NK cells for this analysis.

      Response 19: We apologize for the lack of detail in this analysis. We pre-gated on CD49b+ NK cells for the analysis of maturation stages. We have now added this statement to our revised manuscript and figure legend (page 8; line 149-151):

      “The maturation of cNK cells (gated by CD45+CD3-NK1.1+NKp46+CD49b+) from blood, bone marrow, lung, liver, spleen, and lymph nodes were assessed, based on the expression of CD11b and CD27.”.

      Comment 20: Supplementary Figure 1, A - The NKp46+CD3- gate seems to cut the population, not all cells are included. y-axis labeling of spleen CD3-NKp46+ cells (%) is not correct.

      Response 20: Thank you. We have corrected these errors, as shown in our revised Supplemental Figure 2A.

      Comment 21: Figure 2, D-G - Did the authors analyse the ILC1/NK compartment of the tumor? What is the abundance and phenotype of these cells dependent on Prdm1 expression? Proper Cre-controls should be used (see above).

      Response 21: We appreciate the suggestions from the reviewer. As requested, we have now added an analysis of the cNK/ILC1 populations in the tumor context. The changes in the proportions of cNK cells and ILC1s in Prdm1ΔNcr1 mice were similar to those in the tumor-free condition, while the numbers of both cNK cells and ILC1s decreased in tumor-bearing liver (revised Figure 7D). This content has been added to our revised manuscript (page 23; line 479-481):

      “The proportion changes of cNK cells and ILC1s in Prdm1ΔNcr1 mice was similar with the no tumor-burden condition, while the number of both cNK cells and ILC1s have significant decreased in tumor-bearing liver (Figure 7D).”.

      The reason why we did not use Cre-controls is described in our response to Comment 1.

      Comment 22: Figure 2, H - Prdm1-deficient NK and ILC1 produce less Ifng in response to in vitro stimulations with Il-12 and /or Il-18, and bulk Seq analysis (Fig 3F) shows reduced Il12rb2 expression. Does the expression of cytokine receptors correlate with the maturation of NK cells? This could be analyzed from the single-cell RNA-seq dataset. The statistical significance of %Ifng after Il12/Il18 stimulation should be revisited (see above).

      Response 22: We thank the reviewer for the suggestions. To address this question, we explored the expression of IL-12 and IL-18 receptors in cNK and ILC1 clusters. Within cNK clusters, Il12rb2, Il18r1 and Il18rap were highly expressed in the Prf1hi and Cxcr3hi cNK clusters (revised Supplemental Figure 6H), indicating that IL-18 receptor expression correlated with NK cell maturation. In ILC1s, these receptors were mostly expressed in the Il7rhi and Gzmbhi ILC1 clusters (revised Supplemental Figure 7C). The significant decrease in Il18r1 expression in Prdm1ΔNcr1 cNK cells and ILC1s may be associated with the impaired ability to produce IFN-γ. We have now added this analysis (page 18; line 364-368):

      “Within cNK cells, Il12rb2, Il18r1 and Il18rap was highly expressed in Prf1hi and Cxcr3hi cNK clusters (Supplemental Figure 6I), indicating the IL-18 receptor expression correlated with the NK cell maturation. While in ILC1, these receptors mostly expressed on Il7rhi and Gzmbhi ILC1 clusters (Supplemental Figure 7D). Significant decreased of Il18r1 expression in Prdm1ΔNcr1 cNK cells and ILC1s may associated with the impaired ability to produce IFN-γ.”.

      The unpaired t-test of IFN-γ production is displayed in revised Supplemental Figure 12B. The difference in IFN-γ production in the original Figure 2H was not significant when analyzed using an unpaired t-test. However, significance was observed with an unpaired t-test in tumor-bearing liver cNK cells and ILC1s, specifically under IL-12/IL-18 stimulation, as depicted in the original Figure 7E. These variations may be attributed to differences among different littermates. Despite these variations, the trend remains consistent with our overall conclusions. We believe that a paired t-test between littermates is also meaningful. We therefore kept both statistical methods to ensure a thorough evaluation.

      Comment 23: Figure 3, A-E - For bulk sequencing analysis, splenic CD3-NK1.1+NKp46+ were isolated. This population also contains ILC1 in the spleen (e.g. Flommersfeld et al.), although much less abundant compared to NK cells, and compared to the liver compartment. However, have the authors tested the abundance of splenic ILC1 in Prdm1-deficient mice, which may impact the gene expression data? In line with this the detection of altered Cxcr6 expression in Figure F, which is usually expressed by ILC1 rather than NK cells, may indicate an alteration in ILC1 numbers. The authors should validate the altered expression of CXCR6, Itga1, and Cx3cr1 on NK cells by flow cytometry.

      Response 23: We have cited the work of Flommersfeld et al. in our manuscript and have expanded our Results section to include the following information (page 19; line 377-385):

      “Previous research found that spleen NK cells could be divided into three distinct groups based on their expression levels of CD27, CD62L, CD49a, and CD49b (52). CD27+CD62L- NK cells have remarkable high expression of Batf3, while it was only barely expressed in CD27+CD62L+ and CD27-CD62L+ NK cells (52). Based the sequencing data published by Flommersfeld et al., (GSE180978), a notable negative correlation was observed between the expression levels of Prdm1 and Batf3 (Supplemental Figure 8I). On top of that, our findings unveiled the negative regulatory influence of Prdm1 on Batf3 within both spleen and liver NK cells. This discovery highlights a potential upstream mechanism that may influence the hemostasis of the spleen NK cell subpopulations through Batf3.”.

      We validated the expression of CD49a (Itga1) and CX3CR1 in liver cNK cells and ILC1s, as described in our revised manuscript (page 9; line 170-174, page 14; line 231-233):

      “Increased CD49a expression was also observed in Prdm1ΔNcr1 liver ILC1s, while it showed decreased expression in NKp46+ cells in the liver, bone marrow, and lymph nodes (Supplemental Figure 2, F and G).”, “The percentage of CX3CR1+ cNK cells was significantly decreased in multiple tissues of Prdm1ΔNcr1 mice, while the proportion of CX3CR1+ ILC1 was increased in the liver (Figure 3F).”

      Comment 24: Figure 3, F - Tnfsf26: which gene is this? is this a typo? Is a function of this gene in NK cells reported? Altered Batf3 expression suggests an impact on ILC1-like NK cells (Flommersfeld et al).

      Response 24: We are very sorry for our mistakes. We have removed Tnfrsf26 from the heatmap.

      Comment 25: Figure 3, G-J refer to Kallies data?! 

      Response 25: Kallies et al. reported reduced GzmB expression in Blimp1gfp/gfp mice. Going beyond that study, we further analyzed GzmB and perforin expression at different maturation stages of NK cells. The reduced GzmB expression is not solely due to the less mature phenotype of Prdm1-deficient NK cells, which highlights the role of Prdm1 in regulating NK cell function. We therefore added this content to the revised manuscript (page 12; line 233-242):

      “Lower GZMB and PRF1 production was observed in Prdm1-deficient splenic cNK cells, liver cNK cells and ILC1s (Figure 3, H-K; Supplemental Figure 4, A-I). Notably, the proportion of GZMB+ and PRF1+ cNK cells was decreased among almost all of the maturation stages of cNK cells (Figure 3, J and K). The relative mean fluorescent intensities (MFIs) of GZMB and PRF1 consistently show a reduction across all developmental stages in PrdmΔNcr1 NK cells (Supplemental Figure 4, H and I). Yet, no statistical difference of PRF1 was found within the CD11b-CD27+ and CD11b+CD27+ subsets, likely due to the relatively lower perforin levels in these populations (Supplemental Figure 4I). These findings suggest that Prdm1 may directly influence cytotoxic molecule in NK cells, rather than impacting their anti-tumor abilities solely by affecting the maturation phenotype of Prdm1-deficient NK cells.”

      In Discussion section (Kallies’s work is cited here in revised manuscript) (page 24; line 500-502):

      “Our results not only confirmed a decrease in cytotoxic molecules in Prdm1-deficient NK cells (31) but also showed that the reduction in Gzmb and perforin is not solely attributable to the diminished maturation of these cells.”

      Comment 26: Figure 3, G, I - How do the authors explain the high variability of GzmB and Prf1 in Prdm1+/+ cells? 2 samples have comparable values to Prdm1-deficient cells.

      Response 26: This may be due to the inherent differences in MFI among different samples. In the revised version, we have added data on percentages, which exhibit much less variability (Figure 3, H and I). The MFIs of GZMB and PRF1 are moved to supplemental Figure 4 E and F.

      Comment 27: Did the authors test the mice for potential germline recombination of the floxed allele, which has been suggested as a potential problem of Ncr1cre?

      Response 27: We appreciate the insightful comments provided by the reviewer, and this is a really good question. In Prdm1fl/fl mice, germline recombination typically results in a systemic knockout of Prdm1, which can lead to embryonic lethality. Given that mice were successfully born in the current study, it is highly unlikely that germline recombination of Prdm1 occurred due to leaky expression of Cre.

      To confirm this issue, we isolated splenocytes and assessed Prdm1 expression using qPCR. We observed no significant difference in Prdm1 expression between splenocytes from Prdm1+/+ and Prdm1ΔNcr1 mice (revised Figure 1F). This also indicated that germline recombination issues are unlikely to be present in the Prdm1ΔNcr1 mice.

      Comment 28: Histograms do not show MFI

      Response 28: We appreciate the comments provided by the reviewer. The MFI value was omitted.

      Comment 29: Supplementary Figure 4, B - FACS plot labelling: Typo, Histograms do not show MFI.

      Response 29: We sincerely thank the reviewer for careful reading. The typo in this figure was corrected. The MFI is omitted.

      Comment 30: Figure 4, A - What are the cells in the red cluster in the middle of the UMAP, do they belong to B cells? Why do they cluster so separately? It is interesting, but also surprising that NK and ILC1 cluster map so far apart from each other (rather with CD8 or B cells? or NKT cells) - do the authors have any comments?

      Response 30: We sincerely apologize for the mistakes in labeling a group of cells in our previous analysis. Upon a thorough re-evaluation, we have corrected the labels of several cell clusters that were previously misidentified. The revised heatmap (revised Supplemental Figure 5C) represents the marker genes for each cluster. Additionally, in our updated analysis (revised Figure 4A), we have included clusters for Epithelial cells, CD4+ T cells, NKT cells, and Kupffer cells. Please note, the red cluster identified in the center of the original heatmap corresponds to the CD4+ T cells.

      We checked the markers of the cNK cell and ILC1 clusters and confirmed that they are labeled correctly, as Ncr1 and Klrb1c (NK1.1) were highly expressed in these clusters compared with the others (revised Supplemental Figure 5E).

      Comment 31: Does Junb expression correlate with the maturation stages of NK cells?

      Response 31: Our previous research indicated that during the maturation process of NK cells, there was a decrease in the expression levels of Junb (negative correlation), whereas there was an increase in the expression levels of Prdm1 (Wang et al., J Clin Invest, 2018; Supplemental Figure 5c and Supplemental Figure 11).

      Comment 32: The authors may consider validating their scRNA-seq data (e.g. by FACS analysis for highlighted markers, eg. cKit, Tcf7, Gzma, Cxcr3).

      Response 32: We appreciate the suggestion from the reviewer. We validated several marker genes, including Gzmb, Prf1, and Cx3cr1 by FACS, as shown in the revised Figure 3 F-K. Currently, FACS cannot distinguish liver NK cells into as many distinct clusters as can be achieved through scRNAseq analysis. However, we expect that as technology progresses, we will be able to enhance our validation of the scRNA-seq data.

      Comment 33: It is a bit unclear to me why authors refer to Cxcr3hi NK cells as tissue-resident. This is based on Cxcr3 and Ccr2 expression. To make this statement, a much more detailed analysis would be required. How are CD69, CD49a, or CXCR6 expression of these cells?

      Response 34: We appreciate the suggestion from the reviewer. The primary reason for classifying this specific cluster of NK cells as tissue-resident comes from the differentially expressed genes (DEGs) and Gene Ontology (GO) analysis, which indicate significant chemokine receptor activity within this cluster.

      To support this statement more clearly, we checked the expression of the above markers. Only Cd69 was expressed in cNK clusters, and it was highly expressed in Junbhi and Cxcr3hi cNK cells (revised Supplemental Figure 6D). We also used the top 30 DEGs in ILC1s versus cNK cells to calculate a module score in all cNK clusters; Cxcr3hi cNK cells had the highest score among these clusters (revised Supplemental Figure 6D). This part has been updated in our manuscript (page 15; line 298-308):

      “Expression of tissue-resident markers Cd69 was also highly expressed in this clusters (Supplemental Figure 6D). The enrichment of chemokine receptors in the genes upregulated in the Cxcr3hi cluster implying a greater likelihood of this cluster being tissue-resident compared with other cNK cell clusters (Figure 4H). To further confirmed tissue-resident properties of this clusters, we calculated the module score based on top30 DEGs in ILC1 versus cNK clusters, including Cxcr6, Itga1, Cd160, Cd226, etc. Cxcr3hi cNK clusters have the highest score among all cNK clusters (Supplemental Figure 6H), indicating the similarity with liver ILC1s. In the tumor microenvironment, reports indicated that NK cells could transform into ILC1s (25). If this conversion of cNK cells into ILC1s also occurred under normal physiological conditions, then Cxcr3hi cNK cell cluster might be the most susceptible to such transformation.”

      Comment 35: The authors suggest that Prdm1 regulates chemokine receptor expression. An alternative explanation could be that this is an indirect effect of altering the abundance of NK cell subsets.

      Response 35: We apologize for the lack of detail in these figures. The input cell number for each genotype has now been added to the following figure legends.

      Figure 4F, “Proportions of cNK cells among total cNK cells (left; 211 cells in Prdm1+/+, and 141 cells in Prdm1ΔNcr1) and within clusters (right).”; Figure 5C, “Proportions of ILC1s among total ILC1s in different genotypes (left; 114 cells in Prdm1+/+, and 63 cells in Prdm1ΔNcr1) and within each cluster (right).”; Figure 6C, “Proportions of MDMs and KCs among total macrophages in different genotypes (510 cells in Prdm1+/+, and 624 cells in Prdm1ΔNcr1).”

      To minimize the effects of discrepancies in input numbers between samples with different genotypes, we represented the relative proportions of each cluster within its specific genotype (e.g. Supplemental Figure 6B; Supplemental Figure 7B; Supplemental Figure 9B).
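
      As an illustration only (this is not the code used for the analysis), the per-genotype normalization described above can be sketched as follows. The cluster names "A", "B", and "C" and the per-cluster counts are hypothetical; only the totals of 211 and 141 cNK cells are taken from the Figure 4F legend.

```python
from collections import Counter


def relative_proportions(labels):
    """Return each cluster's share of the cells within one genotype."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


# Hypothetical cluster assignments; only the totals (211 and 141) come from the legend.
wt_labels = ["A"] * 100 + ["B"] * 61 + ["C"] * 50   # 211 Prdm1+/+ cells
ko_labels = ["A"] * 40 + ["B"] * 71 + ["C"] * 30    # 141 Prdm1ΔNcr1 cells

wt = relative_proportions(wt_labels)
ko = relative_proportions(ko_labels)

# Proportions are comparable across genotypes even though input numbers differ.
for cluster in sorted(wt):
    print(f"{cluster}: WT {wt[cluster]:.1%}  KO {ko.get(cluster, 0.0):.1%}")
```

      Because each genotype is divided by its own total, a cluster that holds fewer cells in absolute terms can still make up a larger share of its genotype.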

      Comment 36: Supplementary Figures 6 and 7, A - The formatting of gene annotations does not fit the heat maps (the gene names on the last rows are missing).

      Response 36: We apologize for our careless mistakes. We have now addressed these mistakes.

      Comment 37: Supplementary Figures 6 and 7, What is the consequence of compromised mitochondrial function? Increase apoptosis?

      Response 37: In our experiments, we did not find that Prdm1 affects the apoptosis of NK cells. Conversely, previous studies have found that Prdm1 might inhibit the proliferation of NK cells (Kucuk et al., PNAS, 2011). We acknowledge that there is ongoing debate regarding the precise definition of NK cell exhaustion. In our experiments, no changes were detected in the expression levels of surface markers (TIGIT) associated with exhaustion on NK cells following the knockout of Prdm1. However, we did note a significant reduction in the cytokine secretion capacity and tumor control efficacy of NK cells after Prdm1 knockout. We therefore suggest that the consequence of compromised mitochondrial function might be increased exhaustion. As we mentioned in the discussion (line 482-483), mitochondrial fragmentation has been shown to be closely associated with NK cell exhaustion in tumors (Zheng et al., Nature Immunology, 2019). Although the evidence is not sufficient to define Prdm1ΔNcr1 NK cells as exhausted, our data suggest that compromised mitochondrial function is, at least in part, associated with the exhaustion-like phenotype of Prdm1ΔNcr1 NK cells in cancer.

      We have discussed these points in our revised manuscript (page 26; line 529-543): 

      “Mitochondria are pivotal organelles crucial for cellular metabolism. Disruptions in mitochondrial function have been linked to T Cell exhaustion, attributed to glycolytic reprogramming (66). Similarly, mitochondrial fragmentation has been closely associated with NK cell exhaustion (67).

      However, the concept of NK cell exhaustion isn't as firmly established as it is for T cells. Exhausted NK cells should primarily exhibit diminished functions. This is characterized by a diminished ability to destroy tumor cells, a reduced capability to activate other components of the immune system, and compromised proliferation and survival rates. Additionally, this reduced functionality is associated with a decline in the expression of molecules responsible for cytotoxic activity, lower production of IFN-γ, and metabolic disturbances that may arise from mitochondrial dysfunction. While our current data is not sufficient to definitively classify these cells as exhausted NK cells, it supports that a subpopulation, referred to Junbhi cluster, demonstrates an exhaustion-like phenotype. The significant increase in this cell population following Prdm1 knockout in NK cells may potentially be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether the excessive expression of JunB in NK cells is also a contributing factor to their exhaustion, similar to T cells(65), requires further investigation.”.

      Comment 38: Figure 5, Describing the scRNA Seq data, the authors are switching a lot between Figure 4 and Figure 5. Maybe a reorganization of the Figures (Figure 4: NK cell; Figure 5: ILC1) could help.

      Response 38: We appreciate the reviewer’s suggestion. We have now reorganized Figures 4 and 5.

      Comment 39: Figure 5, We suggest naming one of the ILC1 clusters "Gzmbhi" to keep it consistent with the FACS data.

      Response 39: We agree with this excellent suggestion and have now renamed the “Gzmahi” ILC1 cluster the “Gzmbhi” ILC1 cluster.

      Comment 40: Figure 5, C - How was the JunB score derived (which genes were used)?

      Response 40: The JunB score was calculated from the expression of the marker genes of the Junbhi cNK cluster (DEGs in the Junbhi cNK cluster compared with the other clusters, as shown in revised Supplemental Figure 6A). The score was calculated using the “AddModuleScore” function of the Seurat R package.
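
      For readers unfamiliar with module scores, the general idea can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation used for the analysis: the function module_score, its parameters, and the toy data are hypothetical, and the binning and control-gene sampling only approximate the general approach of subtracting the average expression of expression-matched control genes from the average expression of the gene set.

```python
import numpy as np


def module_score(expr, gene_names, gene_set, n_bins=5, n_ctrl=5, seed=0):
    """Simplified module score (hypothetical helper, for illustration only).

    expr: cells x genes array of normalized expression values.
    Returns, per cell, the mean expression of the gene set minus the mean
    expression of control genes drawn from the same average-expression bins.
    """
    rng = np.random.default_rng(seed)
    gene_names = list(gene_names)
    idx = np.array([gene_names.index(g) for g in gene_set])

    # Rank genes by average expression and split the ranks into equal-sized bins.
    n_genes = expr.shape[1]
    order = np.argsort(expr.mean(axis=0))
    bin_of = np.empty(n_genes, dtype=int)
    bin_of[order] = np.arange(n_genes) * n_bins // n_genes

    # For each signature gene, sample control genes from the same bin.
    ctrl = []
    for g in idx:
        pool = np.setdiff1d(np.flatnonzero(bin_of == bin_of[g]), idx)
        if pool.size:
            ctrl.extend(rng.choice(pool, size=min(n_ctrl, pool.size), replace=False))
    ctrl = np.array(ctrl, dtype=int)

    ctrl_mean = expr[:, ctrl].mean(axis=1) if ctrl.size else 0.0
    return expr[:, idx].mean(axis=1) - ctrl_mean


# Toy data: 20 cells x 50 genes; the first 10 cells over-express a
# hypothetical three-gene signature (g0, g1, g2).
rng = np.random.default_rng(1)
genes = [f"g{i}" for i in range(50)]
expr = rng.poisson(1.0, size=(20, 50)).astype(float)
expr[:10, :3] += 5.0

scores = module_score(expr, genes, ["g0", "g1", "g2"])
print(scores[:10].mean() > scores[10:].mean())
```

      Subtracting expression-matched control genes is intended to keep the score from simply reflecting overall expression level or sequencing depth in a given cell.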

      Comment 41: Figure 5, G, I - The authors highlight Il17 signaling pathway, what is the impact of Il17 on NK/ILC1? Did the authors check for ILC3 (Rorc expression) within the ILC1 cluster?

      Response 41: The enrichment of the IL-17 signaling pathway in Il7rhi ILC1s suggests that this cluster may include ILC1s that originate from the conversion of Rorγt+ ILC3s. Although Rorc expression was undetectable in all ILC1 clusters, we found that several ILC3 marker genes were highly expressed in this cluster (e.g. Rora, Tmem176a, Tmem176b), according to published ILC3 transcriptomes (Robinette et al., Nature Immunology, 2015).

      We have added this content to our revised manuscript (page 17; line 341-344):

      “Several ILC3 signature genes, such as Rora, Tmem176a, and Tmem176b (45), highly expressed in this cluster (Supplemental Figure 7D). Considering the close relationship between IL-17 mediated immunity response and ILC3 (1, 46), it is plausible that Il7rhi ILC1 cluster may be attributed, at least in part, to potential plasticity between ILC1 and ILC3 subsets.”.

      Comment 42: Figure 5, The authors detect more Ly49E+ cytotoxic ILC1 in Prdm1fl Ncr1cre mice.

      How does this observation fit to the reduced cytotoxicity of NK cells?

      Response 42: The proportion of Klrahi ILC1s was increased, while the proportion of Gzmbhi ILC1s was decreased in Prdm1ΔNcr1 mice. Moreover, the total number of cells in the three ILC1 clusters was reduced in Prdm1ΔNcr1 mice.

      Comment 43: Line 350/351: Citation required.

      Response 43: We added the respective references (references 55 and 56).

      Comment 44: Figure 6, The Cell-chat analysis provides interesting suggestions, but none are experimentally addressed. It is also difficult to evaluate these analyses: are any of the Mac subsets altered in frequency or phenotype in either genotype? This could be analyzed from the single-cell data in Fig 4. At the very least, flow cytometric validation of predicted shifts in the Mac compartment should be confirmed.

      Response 44: We are grateful for these valuable suggestions. As requested, we analyzed macrophages and validated some of the scRNA-seq data by flow cytometry. We have rewritten this part to include the analysis of the altered proportions of the two macrophage clusters (Kupffer cells and monocyte-derived macrophages) (page 20-21; line 399-436):

      “The scRNA sequencing analysis identified two well-established subpopulations of liver macrophages: the resident Kupffer Cells (KCs) and the Monocyte-Derived Macrophages (MDMs) (Figure 6, A-C; Supplemental Figure 9A). When comparing the total proportion of macrophages within the immune cell population of the liver between WT and Prdm1ΔNcr1 mice, there is an increase in Prdm1ΔNcr1 mice (Figure 6C). To confirm these findings, we utilized flow cytometry to define macrophages, including both KCs and MDMs, gating by CD45+Ly6G-F4/80+CD11b+ (Figure 6D).

      Our analysis showed that, following the deletion of Prdm1 in Group 1 ILCs, there is a significant increase in both the proportion and number of macrophages in the liver (Figure 6D).

      According to the transcriptional profile, liver macrophages further clustered and were labeled as “Ly6c2hi”; “Cxcl2hi”; “Ear2hi” MDMs, and “Mrc1hi”; “C1qhi” KCs (Figure 6A, Supplemental Figure 9, A-E). Increased proportion of MDMs and KCs was observed in Prdm1ΔNcr1 cells (Supplemental Figure 9B). Within MDMs clusters, Ly6c2hi MDMs mainly compose of Prdm1+/+ cells, while Prdm1ΔNcr1 cells concentrated in Cxcl2hi cluster (Figure 6C). The scRNA-seq data reveal that following Prdm1 knockout in NKp46+ cells, there is a decrease in the proportion of KCs within the macrophage population, while the proportion of MDMs increases (Figure 6D). CX3CR1, a chemokine receptor, is extensively utilized to distinguish KCs and MDMs within macrophages. Cells expressing CX3CR1 are identified as MDMs, whereas those without CX3CR1 expression are categorized as KCs (56). Employing flow cytometry and leveraging CX3CR1 expression, we assessed the ratios of KCs and MDMs. However, diverging from the scRNA-seq findings, flow cytometry indicates that post-Prdm1 knockout in group 1 ILCs, there is a minor increase in the proportion of KCs within the total liver macrophages, and a decrease in the proportion of MDMs (Figure 6D; Supplemental Figure 9B). This discrepancy could stem from the different bases of classification: scRNA-seq defines KCs based on gene expression profiles, whereas flow cytometry differentiates between KCs and MDMs using the single surface marker, CX3CR1. Analysis of the macrophage subsets identified by scRNA-seq reveals that, while MDM clusters generally show high CX3CR1 expression, there exists a subset within MDMs, labeled Mrc1hi, that also exhibits high levels of CX3CR1 (Supplemental Figure 9C). Consequently, if flow cytometry solely employs CX3CR1 for differentiating between KCs and MDMs, it could result in disparities when compared to scRNA-seq outcomes.
Both KCs and MDMs has significantly increased in Prdm1ΔNcr1 mice, which was consist with the scRNA-seq data (Supplemental Figure 9, B and F). Despite the decrease in the proportion of Ly6c2hi MDMs in Prdm1ΔNcr1 mice, the expression levels of Ly6c2 exhibited minimal variation between WT and Prdm1ΔNcr1 mice (Supplemental Figure 9D). Intriguingly, within certain cellular subsets, notably the Ear2hi cluster, the Ly6c2 expression levels in KO mice were found to be higher than those in WT mice. Additionally, we employed flow cytometry to examine Ly6C expression within the macrophages. Similar with the scRNA-seq findings, there were no notable differences in Ly6C expression levels between WT and KO mice (Figure 6E; Supplemental Figure 9G).”.

      The changes in the macrophage compartment indicate a potential influence of functional NK cells on macrophages. We have revised these parts in our results and discussion (line 590-601). Further analysis of macrophages would be worthwhile but goes beyond the scope of this manuscript, and it will be a direction of our future work.

      Comment 45: Figure 6, C1qhi Mac only are few cells/events, and interactions (or cells?) seem to be gone in the Prdm1-floxed mice. Is that true? Does it make sense to perform cell-chat analysis on so few cells?

      Response 45: We have now added KCs to the cell-chat analysis; the cluster in question belongs to the C1qhi KCs. We have revised the analysis of the corresponding parts in our manuscript (page 20-21; line 408-428):

      “According to the transcriptional profile, liver macrophages further clustered and were labeled as “Ly6c2hi”; “Cxcl2hi”; “Ear2hi” MDMs, and “Mrc1hi”; “C1qhi” KCs (Figure 6A, Supplemental Figure 9, A-E). Increased proportion of MDMs and KCs was observed in Prdm1ΔNcr1 cells (Supplemental Figure 9B). Within MDMs clusters, Ly6c2hi MDMs mainly compose of Prdm1+/+ cells, while Prdm1ΔNcr1 cells concentrated in Cxcl2hi cluster (Figure 6C). The scRNA-seq data reveal that following Prdm1 knockout in NKp46+ cells, there is a decrease in the proportion of KCs within the macrophage population, while the proportion of MDMs increases (Figure 6D). CX3CR1, a chemokine receptor, is extensively utilized to distinguish KCs and MDMs within macrophages. Cells expressing CX3CR1 are identified as MDMs, whereas those without CX3CR1 expression are categorized as KCs (56). Employing flow cytometry and leveraging CX3CR1 expression, we assessed the ratios of KCs and MDMs. However, diverging from the scRNA-seq findings, flow cytometry indicates that post-Prdm1 knockout in group 1 ILCs, there is a minor increase in the proportion of KCs within the total liver macrophages, and a decrease in the proportion of MDMs (Figure 6D; Supplemental Figure 9B). This discrepancy could stem from the different bases of classification: scRNA-seq defines KCs based on gene expression profiles, whereas flow cytometry differentiates between KCs and MDMs using the single surface marker, CX3CR1. Analysis of the macrophage subsets identified by scRNA-seq reveals that, while MDM clusters generally show high CX3CR1 expression, there exists a subset within MDMs, labeled Mrc1hi, that also exhibits high levels of CX3CR1 (Supplemental Figure 9C). Consequently, if flow cytometry solely employs CX3CR1 for differentiating between KCs and MDMs, it could result in disparities when compared to scRNA-seq outcomes.”.

      Comment 46: Figure 6, C - Here the interactions of both Mac+ILC1 and Mac+NK are shown together. It would be interesting to separate this analysis (also Suppl. Fig 9A-B) into comparisons of Mac+ILC1 vs Mac1+NK from WT or Prdm1fl Ncr1 mice.

      Response 46: As requested, we re-analyzed this part for each genotype, as shown in Supplemental Figure 10. These data are now described in the manuscript (page 22; line 445-447):

      “The reduction of interaction mostly occurred in the cross-talk of ILC1-MDM and ILC1-KC, whereas no difference was observed in cNK-MDM and cNK-KC interaction (Supplemental Figure 10, A-H)”

      Comment 47: Supplementary Figure 9, A, B - Is this analysis using WT and Prdm1fl Ncr1cre dataset together? 

      Response 47: Yes, we used the WT and Prdm1ΔNcr1 data together. As requested above, we have now separated this analysis by genotype (WT or Prdm1ΔNcr1 mice). These data are now described in the manuscript (page 22; line 445-460):

      “The reduction of interaction mostly occurred in the cross-talk of ILC1-MDM and ILC1-KC, whereas no difference was observed in cNK-MDM and cNK-KC interaction (Supplemental Figure 10, A-H). A reduction in the interaction of ligand-receptor, such as Mif-CD74, Cxcl16-Cxcr6, and Cxcl10-Cxcr3 was observed in Prdm1ΔNcr1 mice compared to Prdm1+/+ mice (Supplemental Figure 11). Compared to Prdm1+/+ mice, the information flow of CXCL and MIF pathways significantly decreased in Prdm1ΔNcr1 mice (Figure 6, H and I; Supplemental Figure 10, B, D, F, and H). These pathways play a crucial role in facilitating macrophage migration. The CXCL signaling was sent from Ly6c2hi Cxcl2hi MDMs and C1qhi KC, targeting all ILC1 clusters and Cxcr3hi cNK cell clusters (Figure 6J). Of note, although the population of Cxcl2hi macrophage primarily comprised cells from Prdm1ΔNcr1 mice, the interaction within the CXCL pathway between macrophages and group 1 ILCs was obviously less than Prdm1+/+ sample (Figure 6J). These changes could be linked to a decreased population of ILC1s and Cxcr3hi cNK cell cluster in Prdm1ΔNcr1 mice, implying that the homeostasis of Cxcl2hi macrophages required sufficient signals from cNK cells and ILC1s. The impaired CXCL-CXCR interactions might subsequently lead to reduced recruitment and activation of group 1 ILCs and macrophages within the tumor microenvironment.”.

      Comment 48: Figure 7, A-C -What is the consequence/interpretation of reduced Mitotracker staining? Any metabolic assays performed? The definition of NK cell "exhaustion" is unclear, is reduced IFNg enough for that? Is the concept of NK cell exhaustion clearly established? Only shortly touched upon in the discussion, the rationale for suggesting an exhausted phenotype, should be explained.

      Response 48: MitoTracker was used to assess mitochondrial mass. Reduced staining indicates compromised mitochondrial function, which is associated with mitochondrial fragmentation.

      We believe that the exhaustion of NK cells is not as well-established a concept as it is for T cells. The purpose of detecting mitochondria in this study is to provide evidence for the relationship between Prdm1 and the exhaustion of NK cells. In the discussion section, we have added the following content (page 26; line 529-543):

      “Mitochondria are pivotal organelles crucial for cellular metabolism. Disruptions in mitochondrial function have been linked to T Cell exhaustion, attributed to glycolytic reprogramming (66). Similarly, mitochondrial fragmentation has been closely associated with NK cell exhaustion (67).

      However, the concept of NK cell exhaustion isn't as firmly established as it is for T cells. Exhausted NK cells should primarily exhibit diminished functions. This is characterized by a diminished ability to destroy tumor cells, a reduced capability to activate other components of the immune system, and compromised proliferation and survival rates. Additionally, this reduced functionality is associated with a decline in the expression of molecules responsible for cytotoxic activity, lower production of IFN-γ, and metabolic disturbances that may arise from mitochondrial dysfunction. While our current data is not sufficient to definitively classify these cells as exhausted NK cells, it supports that a subpopulation, referred to Junbhi cluster, demonstrates an exhaustion-like phenotype. The significant increase in this cell population following Prdm1 knockout in NK cells may potentially be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether the excessive expression of JunB in NK cells is also a contributing factor to their exhaustion, similar to T cells(65), requires further investigation.”.

      Comment 49: Figure 7, x-axis labelling (MFI) of histograms is not correct. Do bar graphs and FACS plots show the same data? Does the number in the FACS plots indicate the MFI? If so, the FACS plots do not show representative samples?

      Response 49: We appreciate the valuable comments provided by the reviewer. In the revised Figure 7, the MFI values have been removed. Bar graphs now display summary data from FACS histograms.

      A representative sample close to the group's mean value was chosen for display in the histograms.

      Comment 50: Figure 7, D - How are these data different from Figure 2H? Why is it now called "exhaustion", but not in 2H? Is the detected IFNg only driven by ex vivo stimulation with Il12/Il18? As above, a "standard" 4h assay should also be provided to allow better interpretation of potential differences. In the introduction, the authors cite the Ducimetiere study (Ref 5) highlighting "the primary function of ILC1 in suppressing the seeding of metastatic tumor cells in liver tissue". Thus, it would be interesting to test Ifng production by liver ILC1 and NK cells ex vivo at early time points of tumor inoculation.

      Response 50: Tumors grow and proliferate within tissues, which is one of the major causes of lymphocyte exhaustion. This part of the current study aims to investigate whether Prdm1 helps NK cells or ILC1s resist the exhaustion induced by malignant tumors. Specifically, we seek to ascertain whether the absence of Prdm1 renders NK cells or ILC1s more susceptible to exhaustion within the tumor microenvironment. Therefore, we consider a reduced capacity to secrete IFN-γ upon IL-12/IL-18 stimulation to be one indicative aspect of exhaustion. It is important to emphasize that this assessment serves as only one piece of evidence, not the sole determinant. Overnight stimulation is a conventional method for studying NK cells and has been widely used across different laboratories, including our lab (e.g. Bream et al., Blood, 2003; Yu et al., Immunity, 2006; Wang et al., J Clin Invest, 2018). We also wish to clarify that our approach does not involve stimulation with tumor cells to evaluate the IFN-γ secretion capacity of NK cells or ILC1s.

      Reviewer 2 (Public Review):

      Summary:

      This study offers a significant advancement in understanding liver innate lymphoid cell (ILC) biology by elucidating the role of the transcription factor Prdm1. It shows that Prdm1 is crucial in maintaining the balance between conventional natural killer (cNK) cells and ILC1s in the liver, with knockout models revealing a vital role in cancer defense mechanisms. Despite not affecting direct cytotoxicity, Prdm1 deficiency leads to increased cancer metastasis and reduced secretion of key molecules like IFN-γ, pointing to its importance in immune regulation. The use of single-cell RNA sequencing further underscores Prdm1's role in cellular communication within the liver's immune milieu. This study is a robust contribution to the field, providing insights that could inform new immunotherapy approaches for liver cancer.

      Strengths:

      The study's strength lies in its comprehensive approach, combining the specificity of Prdm1 conditional deletion in Ncr1-cre mice with integrative omics analyses and cutting-edge cytometry to delineate Prdm1's role in liver Type 1 ILC biology and its functional implications in tumor immunity. This multifaceted strategy not only clarifies Prdm1's influence on ILC composition and maturation but also conveys potential therapeutic insights for liver cancer immunotherapy.

      We sincerely appreciate your interest in and critical assessment of our manuscript. We have carefully read your comments and suggestions and are truly grateful for your expert guidance. We have worked to address each of your concerns, and we provide a point-by-point response below.

      Weakness

      Comment 1: A notable weakness of the study is the limited scope of in vivo disease models, primarily relying on the B16F10 melanoma model, which may not fully capture the complex behavior of Type 1 ILCs across diverse cancer types. Furthermore, the absence of direct human data, such as the effects of PRDM1 deletion in human NK cells or stem cells during their differentiation into NK and ILC1, leaves a gap in translating these findings to clinical settings.

      Response 1: We appreciate the reviewer for raising these important points, which we see as a unique opportunity for future work to transform our understanding of Prdm1 and its targets as opposed to a weakness of the present study. 

      In our revised manuscript, we have discussed these limitations of our study (page 29; line 602-609):

      “While our findings underscore the importance of Prdm1 in liver cNK cells and ILC1s tumor immune surveillance, it does not be validated in human NK cells, whereas previous studies have found that PRDM1 might inhibit the proliferation and function of human NK cells (33, 73). Furthermore, we not provided an in-depth evaluation in multiple tumor models. Further research may provide deeper insight into the role of PRDM1 in the anti-tumor function of human NK cells, enabling a more direct investigation of its application in cancer therapies. Given its important role in preserving liver cNK cells and ILC1s functional heterogeneity, enhancing Prdm1 function in human NK cells could potentially be a strategy to promote NK cell-based immunotherapy for cancer.”.

      Recommendations For The Authors:

      (Introduction) 

      Comment 2: Reference 1 appears slightly misplaced. You might find the nomenclature discussion in Spits et al., Nature Reviews Immunology, 2013, more appropriate.

      Response 2: We apologize for our inaccurate descriptions. Following Spits et al. (Nature Reviews Immunology, 2013) and other related studies, we have now adopted a more appropriate nomenclature: “conventional NK cells” are abbreviated as “cNK cells”, “type 1 innate lymphoid cells” as “ILC1s”, and “group 1 ILCs” is used as the collective name for cNK cells and ILC1s.

      These cell types are defined in the introduction (page 4, line 52-53; line 58-62):

      “Group 1 ILCs consist of cNK cells and ILC1s (1, 2), with distinct developmental trajectories and effect molecules (3).”, “In a state of homeostasis, liver group 1 ILCs (CD45+CD3-NK1.1+NKp46+) can be discriminated into cNK cells and ILC1s by the differential expression of CD49a and CD49b (2): cNK cells are marked by the expression of CD49b, while liver ILC1s exhibit a distinctive positivity for CD49a. Tumor Necrosis Factor Related Apoptosis Inducing Ligand (TRAIL) is also expressed on liver ILC1s, but not on cNK cells (10, 11).”. 

      We also describe cNK and ILC1 phenotypes in our scRNA-seq data, as shown in page 13; line 259-261: 

      “cNK cells expressed high levels of Itga2 (CD49b) and Eomes, while ILC1s had high levels expression of Itga1 (CD49a) and Tnfsf10 (Supplemental Figure 5, F and G).”.

      Comment 3: It has come to my attention that Reference 9 has been retracted. I recommend removing this citation to maintain the integrity of your references (https://doi.org/10.1182/blood.2023022801).

      Response 3: We thank the reviewer for this comment and have now removed this citation.

      Comment 4: For a more comprehensive context around reference 15, consider citing Thierry Walzer's work (https://rupress.org/jem/article/211/3/563/41636/T-bet-and-Eomes-instruct-thedevelopment-of-two) which aligns closely with your discussion.

      Response 4: We agree with the reviewer’s suggestion and have added this citation in our introduction (page 4; line 64-66):

      “Liver environment facilitated T-bet expression in the early stage of NK cells development, which results in Eomes repression. The repression of T-bet is required for Eomes+ NK cells (17).”.

      (Results) 

      Comment 5: The NK cell signature referenced in 32 has been questioned for its reliability as discussed by Cursons et al., CRI 2019 (https://pubmed.ncbi.nlm.nih.gov/31088844/). Reanalysis of data in Figure 1 B/C and Supplementary Figure 1 with the refined NK cell signature from Curson's work would be advantageous.

      Response 5: We thank the reviewer for this comment. As requested, we reanalyzed our data using the refined NK cell signature from Cursons et al. (revised Figure 1 A-C; revised Supplemental Figure 1). Of note, the difference in overall survival of liver cancer (LIHC) patients only reached statistical significance when patients with high and low expression of the refined PRDM1-NK signature were compared using a median cutoff (Figure 1, A-C). The overall survival analysis comparing the top and bottom quartiles of refined PRDM1-NK signature expression was moved to Supplemental Figure 1, G-I.

      The original text is: “Examination of 363 liver hepatocellular carcinoma (LIHC) patient samples from The Cancer Genome Atlas (TCGA) revealed a positive correlation between the expression of NK cell-associated genes (NCR1, NCR3, KLRB1, CD160, and PRF1) (32) and PRDM1 expression (Figure 1A). Patients with top and bottom quartiles of NK-PRDM1 signature expression were chosen for survival analysis (Figure 1B). Notably, patients with the NK-PRDM1hi signature had better overall survival compared to the these with NK-PRDM1lo signature (Figure 1C). Similar results were also found in skin cutaneous melanoma (SKCM, n=454) and lung adenocarcinoma (LUAD, n=497) patients (Supplemental Figure 1, A-F). These data suggested that PRDM1 in NK cells might be essential for immune surveillance in some solid tumors, including liver cancer. These findings prompted us to investigate the impact and mechanism of PRDM1 in NK cells and ILC1 within the context of liver cancer.”

      We have rewritten this part in our revised manuscript (page 7; line 119-132): 

      “Examination of 363 liver hepatocellular carcinoma (LIHC) patient samples from The Cancer Genome Atlas (TCGA) revealed a positive correlation between the expression of NK cell-associated genes (34) (NCR1, KLRB1, CD160, PRF1, etc.) and PRDM1 expression (Figure 1A). The patients are ordered from highest to lowest based on the expression of NK-Prdm1 for survival analysis (Figure 1B). Notably, patients exhibiting higher levels of NK-PRDM1 expression (above the median) experienced better survival outcomes compared to those with lower levels of NK-PRDM1 expression (below the median) (Figure 1C). Similar results were also found in skin cutaneous melanoma (SKCM, n=454) and lung adenocarcinoma (LUAD, n=497) patients (Supplemental Figure 1, A-F). Patients within the highest quartile of NK-PRDM1 signature expression demonstrated enhanced overall survival, a result that achieved statistical significance in LUAD and SKCM patients (Supplemental Figure 1, G-I). These data suggested that PRDM1 in NK cells might be essential for immune surveillance in solid tumors, including liver cancer, and prompted us to investigate the function and mechanism of PRDM1 in NK cells and ILC1 within the context of liver cancer.”.
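
      To make the median-cutoff comparison concrete, the following is a minimal sketch under stated assumptions, not the pipeline used for the TCGA analysis. The helper names split_by_median and kaplan_meier and all scores, follow-up times, and event indicators are hypothetical; a real analysis would also include a significance test such as the log-rank test, which is omitted here.

```python
from statistics import median


def split_by_median(scores):
    """Label each patient 'high' if the signature score is above the cohort median, else 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical cohort of six patients.
scores = [0.2, 1.5, 0.9, 2.1, 0.4, 1.8]   # signature expression
times = [5, 30, 12, 40, 8, 25]            # months of follow-up
events = [1, 0, 1, 1, 1, 0]               # 1 = death, 0 = censored

groups = split_by_median(scores)
curves = {}
for label in ("high", "low"):
    t = [x for x, g in zip(times, groups) if g == label]
    e = [x for x, g in zip(events, groups) if g == label]
    curves[label] = kaplan_meier(t, e)
    print(label, curves[label])
```

      A median cutoff places every patient in one of the two groups, whereas a quartile comparison uses only the top and bottom 25% of patients, so the two approaches can differ in statistical power.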

      Comment 6: The origin of the Ncr1-cre mice utilised should be clarified; is this the line developed by Eric Vivier? (https://www.pnas.org/doi/10.1073/pnas.1112064108).

      Response 6: We did not use the line developed by Eric Vivier; our Ncr1-cre mice were purchased from Shanghai Model Organisms Center, Inc. We describe this in our Methods section (pages 29-30; lines 612-614): 

      “Prdm1fl/fl mice were purchased from The Jackson Laboratory. Ncr1-iCre and B2m-/- mice were purchased from Shanghai Model Organisms Center, Inc. Six- to twelve-week-old littermates were used for the experiment.”

      Comment 7: Considering the known reduction of Ncr1 expression in Ncr1-cre mice and its implications, it is recommended to repeat the B16F10 experiments with the correct control, Ncr1cre/+ Prdm1+/+.

      Response 7: This is an excellent question, and it has been raised by another reviewer and comprehensively answered (Reviewer 1, Comment 1). The answer is below: 

      The expression of Cre and the insertion of loxP sequences both have the potential to influence gene expression, because the region where loxP is inserted may contain regulatory sequences for the gene of interest. Ncr1-Cre is a frequently used transgenic mouse model in our laboratory. In our prior research, we also had concerns about the possible impact of Cre on NKp46 expression, which could lead to a decline in NK cell function. Therefore, in our previous studies focused on Smad4 expression in NK cells, we conducted similar experiments. In Figure 6 of our published paper in the Journal of Clinical Investigation (Wang et al., J Clin Invest, 2018), we compared NKp46-iCreTgfbr2fl/flSmad4fl/WT with NKp46-iCreTgfbr2fl/flSmad4fl/fl. Although the primary purpose was to establish Smad4's independence from TGF-β, this design also allows a comparison between Smad4fl/fl and Smad4fl/WT in the presence of Cre. In the critical phenotype we assessed, NKp46-iCreTgfbr2fl/flSmad4fl/fl (compared with NKp46-iCreTgfbr2fl/flSmad4fl/WT) exhibited the same phenotype as NKp46-iCreSmad4fl/fl (compared with NKp46WTSmad4fl/fl). This suggests that Cre's influence on NK cells may be within a reasonable and controllable range. Furthermore, compared with the decrease in Ncr1 expression caused by Cre, the reduction in the expression of genes targeted by loxP-mediated knockout, such as Prdm1 in this study (Figure 1E), is more pronounced. Therefore, with the current techniques and research methods, we believe that the data provided in this study can support the role of Prdm1 in NK cells.

      Comment 8: The proportion of ILC1 in wild-type mouse livers is notably higher than standard references. Could you confirm whether liver perfusion was performed before analysis? This procedure was not clearly detailed in the methods section.

      Response 8: We apologize for not providing enough detail on this point in our original Methods. We performed liver perfusion before analysis. This has now been clarified in the Methods section of the revised text (pages 30-31; lines 630-636): 

      “Mice were perfused with 1× PBS by portal vein puncture before harvesting tissues. Liver and lung was digested with 0.05% collagenase II for 30 minutes and filtered through 70 µm cell strainers, and mononuclear cells were isolated after subjected to density gradient using 30% and 70% percoll. Spleen were also removed and pressed through 70 µm filterers to obtain splenocytes. Peripheral blood mononuclear cells were obtained from peripheral blood after lysis of red blood cells (Biolegend, 420301). Flushing femurs and mechanical disruption of inguinal lymph nodes were performed to obtain cells from bone marrow and lymph nodes.”.

      The lymphocyte proportions in mice from different laboratories may exhibit slight variations, possibly due to genetic background disparities. To minimize the influence of genetic backgrounds, paired littermates were used in the current study, wherein one is Prdm1 WT and the other has the Prdm1 gene knocked out in NK cells.

      Comment 9: There appears to be inconsistency in reference formatting; for instance, Ref 39 does not match the formatting of other references. A thorough review of your citation format is suggested.

      Response 9: We apologize for the inadvertent errors and have reviewed the citation format.

      Comment 10: The information in Figures 2B and C may be better suited to the supplementary section as it does not significantly contribute to the main text.

      Response 10: We agree with the reviewer’s suggestion and these are now moved to supplementary figures (Supplemental Figure 2).

      Comment 11: The citation of reference 40 could be strengthened by including Sathe et al., 2014, which directly pertains to your findings (https://www.nature.com/articles/ncomms5539).

      Response 11: We added the suggested reference.

      Comment 12: Can the findings presented in Figure 2D/F be replicated using alternative models? This would substantiate the versatility of your results.

      Response 12: The predominant in vivo tumor model for NK cells is currently based on B16F10 melanoma cells. These melanoma cells, with their low expression of MHC-I molecules, evade T cell-mediated immune surveillance, rendering them ideal targets for NK cells. Typically, this experimental melanoma metastasis assay involves tail vein injection, followed by detection of nodules in the lungs. To align with our investigation of liver-resident cNK and ILC1, we introduced splenic injection (via the portal vein) and evaluated melanoma metastasis in the liver to reflect the anti-tumor capabilities of liver group 1 ILCs. We also explored subcutaneous tumor models, but we believe they may not be well suited to assessing Prdm1's role in cNK cells, particularly liver-resident NK cells and ILC1. While we have experimented with models using mouse liver tumor cells such as Hepa 1-6, we found them less stable than B16F10 and less conducive to quantification. Should more suitable models or cell lines emerge, we remain open to exploring them in future research.

      Comment 13: The absence of in vitro killing assessments against B16F10 and YAC-1 leaves a gap in the NK cell characterisation which would be valuable to address.

      Response 13: Isolating NK cells for ex vivo cytotoxicity assays typically requires stimulation with high concentrations of IL-2. Under such high IL-2 stimulation, many intracellular differences that contribute to differences in cytotoxicity, such as changes in transcription factors, are often masked. Another issue is that current ex vivo NK cell cytotoxicity assays often isolate NK cells only from the spleen. Liver-resident NK cells, on the other hand, are limited in both quantity and available isolation methods, making it challenging to conduct ex vivo cytotoxicity assays effectively. If more sensitive detection methods become available, we will incorporate ex vivo data into our future research.

      Comment 14: The suggestion that NK cells produce IL-6 is indeed a bold one, and without additional validation through intracellular cytokine detection or ELISA, it may be prudent to omit these claims.

      Response 14: We have checked the GSEA results and found no meaningful genes related to IL-6 production. Therefore, we have removed this figure.

      Comment 15: The lack of fluorescence minus one (FMO) controls in Figure 3 and Supplementary Figure 4 is noted; including these would enhance the validity of your gating strategies.

      Response 15: As requested, we have added FMO controls to the aforementioned figures.

      Comment 16: There seems to be a minor mix-up in referring to Figure 4A in the scRNAseq results section, perhaps it was intended to refer to Figure 3A?

      Response 16: We have corrected this part (line 247). We also double-checked and corrected the inaccuracies in the references to the figures. We apologize for the inadvertent errors.

      Comment 17: The rich datasets generated from bulk and scRNAseq are commendable. However, I urge you to make these datasets publicly accessible with a GEO accession number.

      Response 17: We appreciate the reviewer's suggestion. We plan to upload our datasets with the final version of our manuscript, as eLife policy also requires.

      Comment 18: Figure 4K is insightful, yet a similar analysis of the ILC1 cluster could provide a more rounded understanding.

      Response 18: We thank the reviewer for the comment. We now provide a similar analysis of ILC1s, as shown in revised Figure 5H.

      Comment 19: The metabolic RNA signatures featured in Supplementary Figure 6 are intriguing and warrant further validation, perhaps through Seahorse analysis. Such validation could merit their inclusion in the main figures.

      Response 19: This is a very good suggestion. Currently, our data offer only limited indications in this context. We have chosen to validate some aspects of Prdm1's influence on cytotoxicity molecules. As for Prdm1's impact on other aspects of NK cells, such as metabolic functions, we may explore this further in future research. Additionally, we hope that by publishing our findings, laboratories worldwide can draw insights for their own studies and conduct relevant research based on these data.

      Comment 20: It is difficult to discern whether the cells depicted in Figure 7D are truly tumor-infiltrating ILC1 or NK cells that have adopted ILC1-like characteristics. Intravenous injection of CD45-PE could clarify this distinction, and if they are the latter, it may be more appropriate to refer to them as ILC1-like cells.

      Response 20: We completely agree with the reviewer's suggestion that "tumor-infiltrating lymphocytes" may not be accurate for the current experiment. Therefore, in the revised manuscript, we have changed it to "liver cNK or ILC1 from tumor-bearing livers."

    1. Author response:

      The following is the authors’ response to the previous reviews.

      All of the reviewers indicate that their major concerns have been adequately addressed, but they each have a few comments that the authors should consider before submitting a final version (without further review) for publication. For example, a statement about the sex of the mice used in the studies and whether any differences were noted if both sexes were used. The idea that the loss of glutamate transport might affect NA loading into vesicles is also worth considering. Finally, the authors might want to mention that the role of neuropeptide release from NA neurons needs further examination. 

      As noted in the previously submitted revision, all experiments included both males and females, and this was addressed in our re-submission. In our analysis of breathing and metabolism, sex was included as a factor and no significant phenotypic difference was observed (the statement of no sex difference is in lines 451-456). For the fate map and in situ experiments, although the group size is small, we did not see obvious differences between females and males in the expression patterns of the three glutamate transporters (lines 347-350). All the anatomical and phenotypic data in this manuscript are presented as combined graphs (figure 1, figure 1 supplement 1, figure 2, figure 2 supplement 2, figure 4,5,6,7), and we have labeled our data points by sex (female data are pink and male data are blue).

      The possibility that loss of Vglut2 might affect NA release has been added in the discussion (line 485-491) of the current revision. Dopamine Beta Hydroxylase (DBH) converts dopamine to noradrenaline in the vesicles, thus, glutamate may not directly affect noradrenaline loading into vesicles. However, since loss of Vglut2 reduced dopamine release in subsets of dopaminergic neurons, it remains possible that glutamate affects dopamine loading in NA neurons and in turn perturbs DA to NA conversion in the vesicle by DBH and subsequent noradrenaline release. Future work could examine this hypothesis using fast-scan cyclic voltammetry (FSCV) or microdialysis.

      The further examination of the role of neuropeptide release from NA neurons is mentioned in the discussion (line 491-494 and line 497-499 of the pre).

      eLife assessment

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments provide compelling evidence that conditional deletion of vesicular glutamate transporters from noradrenergic neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. This study provides an important contribution to our understanding of how noradrenergic neurons regulate respiratory homeostasis in conscious adult mice. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments show that conditional deletion of Vglut2 in NA neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. Their observations challenge the importance of glutamatergic signaling from Vglut2 expressing NA neurons in normal respiratory homeostasis in conscious adult mice. 

      Strengths:

      The comprehensive Vglut1, Vglut2, and Vglut3 co-expression profiles in the central noradrenergic system and the combined measurements of breathing and oxygen consumption are two major strengths of this study. Observations from these experiments provide previously undescribed insights into (1) expression patterns for subtypes of the vesicular glutamate transporter protein in the noradrenergic system and (2) the dispensable nature of Vglut2-dependent glutamate signaling from noradrenergic neurons to breathing responses to physiologically relevant gas challenges in adult conscious mice. 

      Weaknesses:

      Although the cellular expression profiles for the vesicular glutamate transporters are provided, the study does not document that glutamatergic-based signaling originating from noradrenergic neurons is evident at the cellular level under normal, hypoxic, and/or hypercapnic conditions. The authors effectively recognize this issue and appropriately discuss their findings in this context. 

      We thank the reviewer for the positive evaluation of our work.

      Reviewer #2 (Public Review):

      The authors characterized the recombinase-based cumulative fate maps for vesicular glutamate transporters (Vglut1, Vglut2 and Vglut3) expression and compared those maps to their real-time expression profiles in central NA neurons by RNA in situ hybridization in adult mice. Authors have revealed a new and intriguing expression pattern for Vglut2, along with an entirely uncharted co-expression domain for Vglut3 within central noradrenergic neurons. Interestingly, and in contrast to previous studies, the authors demonstrated that glutamatergic signaling in central noradrenergic neurons does not exert any influence on breathing and metabolic control either under normoxic/normocapnic conditions or after chemoreflex stimulation. Also, they showed for the first time the Vglut3-expressing NA population in C2/A2 nuclei. In addition, they were also able to demonstrate Vglut2 expression in anterior NA populations, such as LC neurons, by using more refined techniques, unlike previous studies. 

      A major strength of the study is the use of a set of techniques to investigate the participation of NA-based glutamatergic signaling in breathing and metabolic control. The authors provided a full characterization of the recombinase-based cumulative fate maps for Vglut transporters. They performed real-time mRNA expression of Vglut transporters in central NA neurons of adult mice. Further, they evaluated the effect of knocking down Vglut2 expression in NA neurons using a DBH-Cre; Vglut2cKO mice on breathing and control in unanesthetized mice. Finally, they injected the AAV virus containing Cre-dependent Td tomato into LC of v-Glut2 Cre mice to verify the VGlut2 expression in LC-NA neurons. A very positive aspect of the article is that the authors combined ventilation with metabolic measurements. This integration holds particular significance, especially when delving into the exploration of respiratory chemosensitivity. Furthermore, the sample size of the experiments is excellent.

      Despite the clear strengths of the paper, some weaknesses exist. It is not clear in the manuscript if the experiments were performed in males and females and if the data were combined. I believe that the study would have benefited from a more comprehensive analysis exploring the sex-specific differences. The reason I think this is particularly relevant is the developmental disorders mentioned by the authors, such as SIDS and Rett syndrome, which could potentially arise from disruptions in central noradrenergic (NA) function, exhibit varying degrees of sex predominance. Moreover, some of the noradrenergic cell groups are sexually dimorphic. For instance, female Wistar rats exhibit a larger LC size and more LC-NA neurons than male subjects (Pinos et al., 2001; Garcia-Falgueras et al., 2005). More recently, a detailed transcriptional profiling investigation has unveiled the identities of over 3,000 genes in the LC. This revelation has highlighted significant sexual dimorphisms, with more than 100 genes exhibiting differential expression within LC-NA neurons at the transcript level. Furthermore, this investigation has convincingly showcased that these distinct gene expression patterns have the capacity to elicit disparate behavioral responses between sexes (Mulvey et al., 2018).

      Therefore, the authors should compare the fate maps, Vglut transporters in males and females, at least considering LC-NA neurons. Even in the absence of identified sex differences, this information retains significant importance. 

      An important point well raised by the authors is that although suggestive, these experiments do not definitively rule out that NA-Vglut2 based glutamatergic signaling has a role in breathing control. Subsequent experiments will be necessary to validate this hypothesis. 

      An improvement could be made in terms of measuring body temperature. Opting for implanted sensors over rectal probes would circumvent the need to open the chamber, thereby preventing alterations in gas composition during respiratory measurements. Further, what happens to body temperature phenotype in these animals under different gas exposures? These data should be included in the Tables. 

      Is it plausible that another neurotransmitter within NA neurons might be released in higher amounts in DBH-Cre; Vglut2 cKO mice to compensate for the deficiency in glutamate and prevent changes in ventilation? 

      Continuing along the same line of inquiry is there a possibility that Vglut2 cKO from NA neurons not only eliminates glutamate release but also reduces NA release? A similar mechanism was previously found in VGLUT2 cKO from DA neurons in previous studies (Alsio et al., 2011; Fortin et al., 2012; Hnasko et al., 2010). Additionally, does glutamate play a role in the vesicular loading of NA? Therefore, could the lack of effect on breathing be explained by the lack of noradrenaline and not glutamate? 

      We thank the reviewer for the positive evaluation and further suggestions. Please see our response in “Author Response” to the previous version of Reviewer #2 (Public review).

      Reviewer #4 (Public Review): 

      Summary:

      Although previous research suggested that noradrenergic glutamatergic signaling could influence respiratory control, the work performed by Chang and colleagues reveals that the excitatory marker Vglut2 is dynamically and widely expressed throughout the central noradrenergic system but is not crucial for baseline breathing or for the hypercapnic and hypoxic ventilatory responses. The central point that will make a significant change in the field is how NA-glutamate transmission may influence breathing control and the dysfunction of NA neurons in respiratory disorders. 

      Strengths:

      There are several strengths such as the comprehensive analysis of Vglut1, Vglut2, and Vglut3 expression in the central noradrenergic system and the combined measurements of breathing parameters in conscious unrestrained mice. 

      Other considerations:

      These results strongly suggest that glutamate may not be necessary for modulating breathing under normal conditions or even when faced with high levels of carbon dioxide (hypercapnia) or low oxygen levels (hypoxia). This finding is unexpected, considering many studies have underscored glutamate's vital role in respiratory regulation, more so than catecholamines. This leads us to question the significance of catecholamines in controlling respiration. Moreover, if glutamate is not essential for this function, we need to explore its role in other physiological processes such as sympathetic nerve activity (SNA), thermoregulation, and sensory physiology. 

      We thank the reviewer for the positive evaluation and further suggestions. The potential role of noradrenergic-derived glutamate in other processes, which is beyond the scope of this study, should be addressed in the future.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      All of my concerns were effectively resolved, leading me to accept the paper. However, I suggest that the authors consider investing in a more reliable system for measuring body temperature, as accurate measurements of this parameter are crucial for whole body plethysmography. 

      Thank you for the suggestion. The real-time measurement of body temperature is a goal in future studies.

      Reviewer #4 (Recommendations For The Authors):

      Because I am reviewing a revised version, I believe the authors have addressed most, if not all, of the concerns already raised by three reviewers. In my understanding, the authors achieved their aims and the conclusions are fully supported by the results. The impact of this work on the respiratory field is significant and is likely to advance the field. The methods and data utilized, which combine standard techniques with genetic tools, will be highly beneficial to the research community. 

      In my understanding I still have one concern that if glutamate is not critical, then what is? Could we potentially disable the noradrenergic (NA) system while preserving glutamate functionality to determine if the NA system is indeed crucial for respiratory physiology? This approach might provide clearer insights into the mechanisms underlying respiratory control. 

      We agree that there remain several exciting questions about the respective roles of noradrenaline, glutamate, and neuropeptides such as Neuropeptide Y (NPY) and galanin. We are currently devising strategies to address the respective and combinatorial roles of all these candidates in breathing control. Most simply, we can conditionally mutagenize each of them in the central noradrenergic system in an acute manner using DBH-CreER mice to determine whether any of them is critical to respiratory control, with the advantage of minimizing developmental compensatory events.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors evaluated a novel eIF2B activator, DNL343, in two mouse models representing different forms of the integrated stress response (ISR). They first assessed the pharmacokinetics of DNL343, demonstrating its ability to cross the blood-brain barrier and exhibit good bioavailability. In an acute ISR model induced by optic nerve crush (ONC) injury, DNL343 treatment reduced ISR-induced transcriptional changes and neuronal loss, demonstrating neuroprotective effects. Next, the authors generated an eIF2B loss-of-function mouse model by knocking in disease-causing Eif2b5 variants. The model presents a chronic ISR and mimics vanishing white matter disease (VWMD). DNL343 treatment from the pre-symptomatic stage improved body weight and motor functions, corrected transcriptional changes, and reversed proteomic and metabolomic alterations in the brain and cerebrospinal fluid. DNL343 treatment initiated at an advanced disease stage also showed positive effects, restoring body weight gain, suppressing ISR, reducing neurodegeneration biomarkers, and extending lifespan. These findings highlight DNL343 as an effective ISR inhibitor with potential applications in treating VWMD and other neurodegenerative disorders involving ISR.

      Strengths:

      The study's findings regarding the novel compound DNL343 offer significant promise in addressing VWMD, a condition currently lacking disease-modifying treatment. DNL343 directly targets eIF2B, the disease-causing complex in VWMD, and demonstrates notable efficacy in reversing the integrated stress response (ISR) and mitigating neurodegeneration in a VWMD mouse model. These results raise hope for the potential application of DNL343 in VWMD treatment, a development eagerly anticipated by patients and the VWMD research community. Moreover, the study hints at the broader potential of DNL343 in treating other ISR-related neurodegenerative disorders, such as amyotrophic lateral sclerosis, a prospect that holds broader interest. Additionally, the study's identification of potential biomarkers for VWMD represents a notable strength, potentially leading to improved disease progression assessment pending further confirmation in future research.

      Weaknesses:

      There are a couple of notable concerns in this study. Firstly, while the in vivo evidence strongly supports the efficacy of DNL343 in mitigating ISR and neurodegeneration, there is a lack of direct biochemical evidence to confirm its activity in eIF2B activation. Secondly, the potential for cardiovascular toxicity, which has been reported for a related eIF2B activator in a canine model (as mentioned in the manuscript), has not been evaluated for DNL343 in this study. This data gap regarding toxicity could be crucial for informing the future development of DNL343 for potential human use. Further investigation into these areas would be valuable for a comprehensive understanding of the compound's mechanisms and safety profile.

      We thank the reviewer for the thoughtful feedback and an opportunity to provide further clarification. To address the first question regarding biochemical evidence of the mechanism of action of DNL343, we agree that additional data is helpful to interpreting the results presented in this manuscript. We now include a citation to Craig et al (Craig, R.A., 2nd, J. De Vicente, A.A. Estrada, J.A. Feng, K.W. Lexa, M.J. Canet, W.E. Dowdle, R.I. Erickson, B.N. Flores, P.C.G. Haddick, L.A. Kane, J.W. Lewcock, N.J. Moerke, S.B. Poda, Z. Sweeney, R.H. Takahashi, V. Tong, J. Wang, E. Yulyaningsih, H. Solanoy, K. Scearce-Levie, P.E. Sanchez, L. Tang, M. Xu, R. Zhang and M. Osipov (2024). "Discovery of DNL343: A Potent, Selective, and Brain-Penetrant eIF2B Activator Designed for the Treatment of Neurodegenerative Diseases." J Med Chem.) which includes the full details on the discovery and characterization of DNL343.

      On the question of cardiovascular toxicity observed with previous eIF2B-activating compounds, Craig et al. also provide evidence in a non-human primate (cynomolgus monkey) model that DNL343 dosing did not result in QT prolongation or any functional cardiac changes. We have also completed Phase 1 (NCT04268784) and Phase 1B double-blind (NCT05006352) trials in healthy participants and participants with ALS, respectively; these trials are referenced on page 4, lines 102-103. The safety profile observed in these clinical studies supported further development of DNL343 for ALS in the Healey Platform trial (NCT04297683, Regimen G).

      Reviewer #2 (Public Review):

      Summary:

      The authors developed DNL343, a CNS-penetrant small molecule integrated stress response (ISR) inhibitor, to treat neurodegenerative diseases caused by ISR.

      Strengths:

      DNL343 is an investigational CNS-penetrant small molecule integrated stress response (ISR) inhibitor designed to activate the eukaryotic initiation factor 2B (eIF2B) and suppress aberrant ISR activation. The therapeutic efficacy of DNL343 has been extensively characterized in two animal models. Importantly, plasma biomarkers of neuroinflammation and neurodegeneration can be reversed with DNL343 treatment. Remarkably, several of these biomarkers show differential levels in CSF and plasma from patients with vanishing white matter disease (VWMD) upon DNL343 treatment. Overall, this is a very exciting study to target ISR for therapeutic interventions.

      Weaknesses:

      My main questions center around the characterization of DNL343.

      (1) Is there any biochemical evidence showing DNL343 activates eIF2B, such as binding assays or in vitro biochemical activity assays? A conference presentation was cited - "Osipov, M. (2022). Discovery of DNL343: a Potent Selective and Brain-penetrant eIF2B Activator Designed for the Treatment of Neurodegenerative Diseases. Medicinal Chemistry Gordon Research Conference. New London, NH." However, there needs to be public information about this presentation.

      Information from this presentation and more details on the discovery and characterization of DNL343 can be found in Craig et al J Med Chem (2024) and this citation has been replaced.

      (2) How was the selectivity of DNL343 demonstrated? What are the off-targets of DNL343, in particular when DNL343 is administered at a high dose? Thermal proteome profiling or photoaffinity labeling experiments could be considered.

      Please see Craig et al J Med Chem (2024) for full details. In brief, there were no significant off target effects observed for DNL343 in a Cerep panel.

      (3) What are the total drug concentrations in the brain and plasma? What are the unbound ratios?

      Following a single oral dose of DNL343 in mice, unbound brain-to-unbound plasma exposure ratios (Kp,uu) of 0.8 to 1.1 were observed, indicating high CNS penetrance. This was further supported by CSF-to-unbound plasma exposure ratios of 0.9 in the same mouse study. CNS penetrance was also confirmed in rats and NHP by CSF-to-unbound plasma ratios near unity, as reported in Craig et al J Med Chem (2024).
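
For readers less familiar with the Kp,uu metric quoted above, the following minimal sketch shows the standard definition: the unbound exposure in brain divided by the unbound exposure in plasma. It is an illustration only, not the calculation performed in Craig et al.; the function name, parameter names, and the use of total exposure multiplied by fraction unbound are assumptions made for this sketch.

```python
def kp_uu(brain_total, fu_brain, plasma_total, fu_plasma):
    """Unbound brain-to-unbound plasma exposure ratio (Kp,uu).

    brain_total, plasma_total: total exposures (e.g. AUC or concentration)
        measured in the same units.
    fu_brain, fu_plasma: fractions unbound in brain tissue and in plasma,
        each in the interval (0, 1].
    """
    for name, fu in (("fu_brain", fu_brain), ("fu_plasma", fu_plasma)):
        if not 0 < fu <= 1:
            raise ValueError(f"{name} must be in (0, 1], got {fu}")
    unbound_plasma = plasma_total * fu_plasma
    if unbound_plasma <= 0:
        raise ValueError("unbound plasma exposure must be positive")
    return (brain_total * fu_brain) / unbound_plasma
```

A value near 1, as reported in the response above, is what one would expect when the unbound drug equilibrates freely between brain and plasma; values well below 1 typically point to efflux or restricted entry.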

      (4) If DNL343 is given intravenously, what are the concentrations in the brain and plasma after 5 minutes and 1 hour or longer time points? In other words, does DNL343 cross BBB through passive diffusion or an active process?

      Unbound brain-to-unbound plasma exposure ratios following a single oral dose in the mouse were 0.8 to 1.1 and showed no time dependence. These measurements were made prior to, near, and following plasma tmax of DNL343, indicating unbound DNL343 crosses the BBB through passive diffusion and rapidly reached equilibrium between the brain and systemic circulation. Details can be found in Craig et al J Med Chem (2024).

      (5) What is the complete PK profile of DNL343 for intravenous and oral dosing?

      DNL343 administered orally to mice as a suspension formulation showed plasma PK consistent with prolonged absorption with tmax ranging from 3 to 4 h, and a terminal elimination half-life (t1/2) of ~10 h. Details can be found in Craig et al J Med Chem (2024).
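
As a rough guide to what a terminal half-life of about 10 h implies, the sketch below computes the fraction of drug remaining during the terminal phase. It assumes simple mono-exponential (first-order) elimination and ignores the prolonged absorption phase described above, so it is an approximation for intuition rather than a model of the reported PK profile; the function names are hypothetical.

```python
import math


def elimination_rate_constant(t_half_h):
    """First-order elimination rate constant (per hour) from half-life."""
    if t_half_h <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / t_half_h


def fraction_remaining(elapsed_h, t_half_h):
    """Fraction of the starting concentration left after elapsed_h hours,
    assuming mono-exponential decay with the given terminal half-life."""
    if elapsed_h < 0:
        raise ValueError("elapsed time must be non-negative")
    return math.exp(-elimination_rate_constant(t_half_h) * elapsed_h)
```

Under these assumptions, one half-life leaves half of the starting concentration and two half-lives leave a quarter.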

      (6) Are there any major drug metabolites that could be of concern?

      DNL343 metabolism is through Phase 1 biotransformation pathways. None of the in vivo circulating metabolites show potency towards eIF2B activation. Given that none of these metabolites are of concern, we believe this information is beyond the scope of the current manuscript.

      Reviewer #3 (Public Review):

      Summary:

      ISR contributes to the pathogenesis of multiple neurodegenerative diseases, such as ALS, FTD, VWMD, etc. Targeting ISR is a promising avenue for potential therapeutics. However, previously identified ways to target ISR present some challenges. PERK inhibitors suppress ISR by inhibiting eIF2alpha phosphorylation and cause pancreatic toxicity in mice. In order to bypass eIF2alpha, previous studies have identified ISR suppressors that target eIF2B, such as ISRIB and 2BAct. These molecules suppress neurodegeneration but do not cause detrimental effects in mouse models. However, ISRIB is water-insoluble, and 2BAct causes cardiovascular complications in dogs, preventing their use in clinics. Here, the authors showed that DNL343, a new ISR inhibitor targeting eIF2B, suppresses neurodegeneration in mouse models. Combined with their previous results of a clinical phase I trial showing the safety of DNL343, these findings suggest the promise of DNL343 as a potential drug for neurodegenerative diseases in which ISR contributes to pathogenesis.

      Strengths:

      The finding is important and has disease implications, and the conclusion is not surprising.

      Weaknesses:

      The experimental design and data are hard to comprehend for an audience with a basic research background. This reviewer suggests that the authors design experiments and interpret data in the same way as previous studies on ISRIB and 2BAct (e.g., Wong et al; eLife, 2019).

      We thank this reviewer for their feedback and recognition that DNL343 has promising potential as a treatment for neurodegenerative diseases. While our studies share some similarities with Wong et al., eLife (2019) and Abbink et al., ACTN (2019), our study design is intentionally distinct (e.g., inclusion of both prevention and treatment dosing paradigms, and determination of the dose-response impact of drug treatment across biomarkers), which necessitates tailored data visualization to communicate our findings effectively. However, we understand the importance of clarity for a broader audience, and to this end we have made a number of changes to the data figures, in particular to the data from omics experiments in Figures 3 and 5. We have also provided additional supplemental tables to aid data interpretation. We hope these changes cater both to audiences familiar with previous work and to those with a less specialized background.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Demyelination is a significant pathological feature in the VWMD mouse model. The authors should clarify whether they observed similar demyelination in their study and if DNL343 had any impact on reversing this demyelination. These findings are crucial for assessing the compound's effectiveness in mitigating neurodegeneration.

      Demyelination is indeed an important feature in the eIF2B LOF (VWMD) mouse model. Because this phenotype, and the ability of this MOA to rescue the histological phenotype (Wong et al; eLife, 2019, cited in introduction), are very well characterized, and because we were limited by the size and number of mouse tissues, we prioritized non-histological targeted and unbiased analyses aimed at identifying translatable biomarkers. Nonetheless, the totality of our data, in different mouse models and cell types, strongly supports DNL343 as a potent ISR inhibitor that is effective in attenuating neurodegeneration:

      · In the optic nerve crush model, DNL343 dose-dependently reduced retinal cell degeneration

      · In the VWMD mouse model, DNL343 attenuated the increase in a plasma biomarker of neurodegeneration, neurofilament-light, which corresponded to normalization in motor function.

      · Metabolomic and lipidomic analyses in the VWMD mouse model brain showed increases in oxysterols, such as 7-ketocholesterol, and cholesterol esters and these lipids are associated with demyelination (Nugent et al, 2020). DNL343 treatment attenuated the levels of these oxysterols, indicating decreased demyelination.

      · When initiated at an advanced disease stage, reversal of plasma biomarkers of neurodegeneration (Nf-L) and neuroinflammation (GFAP) by DNL343 in this model was accompanied by extension of the lifespan, which is otherwise shortened as the mutant animals succumb to disease.

      These data highlight the potential therapeutic benefits of DNL343 in the broader context of ISR-mediated neurodegeneration which can include but may not be limited to VWMD.

      (2) Figure 6 presents several biomarkers with significantly increased levels in VWMD mice and patient biofluids. However, these biomarkers are not reflected in the brain proteomics data presented in Figure 3. The discrepancy between these findings should be addressed and discussed in the manuscript to provide a more comprehensive understanding.

      The proteins shown in Figure 6 were not detected by TMT proteomics in the CSF. In the brain, only GFAP was detected, and its overall abundance in tissue was similar in both genetic groups. Cytokines such as TIMP1 and MCP1 are usually present at low abundance and are therefore challenging to detect with the broad discovery proteomics method applied in this study. Antibody-based immunoassays are better suited than mass-spectrometry-based proteomics to measuring low-abundance proteins specifically, while mass-spectrometry-based methods offer a wider dynamic range for detecting more highly abundant proteins. Differences in detection sensitivity between immunoassays and mass spectrometry assays have been noted previously (Petrera et al, J Proteome Res, 2021). We have added new text to address this point in the revised manuscript (page 7, lines 274-277).

      (3) Figure 7 discusses the effects of DNL343 treatment initiated at an advanced disease stage. Since the 4-week treatment did not rescue performance in the balance beam test (as shown in Figure 6A), it is important to clarify if a 20-week treatment had any impact on this parameter.

      This reviewer raised an important question that we were unfortunately unable to test. When the balance beam training was administered after 8 (out of 20) weeks of dosing, most animals of both wild-type and mutant genotypes struggled to remain on or maintain balance on the beam and were unable to progress to traversing it, making the assay unsuccessful in this cohort. This impairment appeared to be driven by distinct factors in the two genotypes: age-associated obesity in wild-type animals and severe motor impairment in the eIF2B HOM mice, irrespective of treatment. While it is possible that other less demanding and more sensitive assays could reveal more nuanced differences, this, and our earlier data (Figure 4G-I), suggest that DNL343 could prevent but not reverse functional deterioration. This is in line with our understanding of the DNL343 mechanism of action, which does not include neuronal regeneration, a therapeutic effect that is likely required for functional recuperation. We have added this point to the manuscript (page 8, lines 319-326).

      Additionally, considering the significant increase in Gdf15 levels in the disease model, it would be valuable to know if DNL343 treatment affected Gdf15 levels. If these assays were conducted, reporting the data would greatly assist in evaluating the compound's efficacy when administered at an advanced disease stage.

      We were not able to measure GDF15 levels in the 20-week study because the limited plasma samples collected in-life were dedicated to assessing biomarkers of neurodegeneration (Figure 7E-F). However, data from our 4-week treatment study, which was initiated at a similar age range to the 20-week treatment study (19-26 and 24-33 weeks of age, respectively), showed that DNL343 was able to reduce GDF15 levels in the brain (mRNA and protein) and CSF (protein) (Supplemental Figure 5A-C), suggesting that DNL343 reduces ISR activation at an advanced disease stage in the model. We expect that the reduction observed at 4 weeks of treatment would persist for the duration of the extended treatment in the 20-week cohort.

      (4) A minor point. In Figures 5A, 5C, and 5E, it appears that the red-colored group should likely be labeled as "HOM 0 mg/kg" instead of "HOM 3 mg/kg".

      This has been amended, thank you.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      (1) The cellular function of DNL343 needs to be clarified. The authors claim that it activates eIF2B, but no cellular or molecular evidence is provided. Does it bind to eIF2B? Does it not affect eIF2alpha phosphorylation? Does it restore translation upon stress that causes eIF2alpha phosphorylation? Does it suppress stress granule assembly? The authors cited Sun, Tsai et al. 2023 and Osipov et al., 2022. However, these citations are conference abstracts with no published figures available for review.

      We agree that additional data outlining the biochemical evidence of the mechanism of action of DNL343 was needed. We now include a citation to Craig et al J Med Chem (2024) that includes the full details on the discovery and molecular characterization of DNL343.

      (2) It needs to be clarified how the authors selected the ISR marker genes. ISR genes are more than those selected. How about others? How did the authors measure the mRNA levels, bulk RNA-seq or RT-PCR? If the former, have the authors verified their results using RT-PCR? Have the authors measured the protein levels for nerve crush experiments (by both proteomic and individual protein analyses)? Also, no statistical analyses were found for the heat maps.

      The ISR marker genes were selected by a combination of experimental and literature data. Transcriptomics analysis of the eIF2B HOM brains was conducted using untargeted RNAseq (Supplemental Figure 1B). Here, we found an enrichment of transcripts previously reported to be ISR dependent, namely Atf4, Chac1, Ddit3, Eif4ebp1, Ppp1r15a (Larhammar et al., 2017), Atf3, Asns, Mthfd2, Psat1, Sesn2, Slc1a5, Slc7a5, Slc7a11, Trib3 (Wong et al., 2019, Abbink et al., 2019).  These transcripts were assayed using targeted qPCR in the eIF2B HOM brains, spleen and PBMC (Supplemental Figure 1A, C, D) and in the retinas from the ONC experiments (Figure 2C). We have further clarified the analysis method for the gene expression data in the figure legends.

      We did not interrogate the proteome of the retina in the ONC model. Our study in this model was intended as a proof-of-concept evaluation of DNL343 effects in this acute ISR-dependent model of neurodegeneration. To this end, we performed gene expression (Figure 2C) and immunofluorescence analyses (Figure 2D-F). Each of these analyses was conducted using dedicated whole retinas; conducting additional protein analyses would necessitate a separate cohort of animals.

      We believe that heatmaps provide the best visualization of the data, particularly the dose dependent effects of DNL343 on multiple genes, but we understand the value for also providing statistical analyses. To address this, we provide additional Supplemental tables to show the outcome of statistical analyses undertaken. Statistical data relating to Figure 2C can be found on new Supplemental Tables 1 & 2; those relating to Supplemental Figures 1A, C, and D on new Supplemental Tables 3, 5, 6, respectively; that from Figure 4D on new Supplemental Table 8, and that from Figure 7D on new Supplemental Table 11.

      (3) Both the authors and Wong et al. (eLife, 2019) performed transcriptomic analyses on HOM mice. How do the authors compare the two data sets? Are they the same?

      In this work, a transcriptomic approach was applied to confirm induction of the ISR in our in vivo model. While the data are not identical, all of the top annotated genes shown in Supplemental Figure 1B were also deemed significant by Wong and coworkers (Bayes factor > 10). More importantly, as explained in our response to question #2 from Reviewer 3, the ISR genes highlighted in Supplemental Figure 1B were also confirmed in two other studies (Larhammar et al., 2017, Abbink et al., 2019). These data support our interpretation that eIF2B HOM mice have an elevated ISR relative to WT mice. We have added new text to line 164 on page 5 to clarify this point.

      (4) Can the authors interpret their omic data using volcano plots for HOM rescue experiments, as Wong et al. did in eLife 2019? Heat maps with statistical analyses are more straightforward to comprehend. Can the authors verify some of these data using RT-PCR, Western blot, etc.?

      We added pathway interpretation to Figures 3 and 5 to highlight key biological processes altered in the brain, and the cellular compartment of origin of CSF proteins changed in eIF2B HOM mice at baseline and following treatment with DNL343. Our treatment design employed multiple dose levels, so summarizing the data as volcano plots would have required many figures, whereas a single heat map captures the same information more easily. However, to provide additional quantitative information, we have now added supplementary tables showing the full statistical analysis for all heat maps for added clarity and transparency.

      We demonstrated 100% correlation between the select genes we examined by qPCR in Supplemental Figure 1A and those identified in brain by RNA-seq. In addition, the reliability of RNA-seq data has previously been examined in great detail (Everaert et al, Sci Rep 2017); that study found ~85% concordance between RNA-seq and qPCR data, and the discordant genes tended to have < 2 log2FC and to be present in low abundance. Given that the top core ISR genes identified in our study have >2 log2FC and have been verified by other independent labs (Larhammar et al., 2017, Abbink et al., 2019, Wong et al., 2019), we do not think that there is a rationale for technical confirmation of the RNA-seq data.

      The risk of mis-annotating proteins in the TMT data was further mitigated by removing proteins with coverage < 20% and fewer than 8 unique peptides detected, and by setting the protein annotation FDR to < 1%.
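
      The filtering step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `ProteinHit` fields, the function name, and the example hits are hypothetical. It also assumes the stricter reading of the criteria, in which a protein is dropped if it fails any one threshold, and it treats the FDR as a per-protein value.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    name: str
    coverage_pct: float     # sequence coverage, in percent
    unique_peptides: int    # number of unique peptides detected
    fdr: float              # protein-level annotation FDR, as a fraction


def passes_filter(hit, min_coverage=20.0, min_peptides=8, max_fdr=0.01):
    """Return True if the hit meets all three thresholds (assumed reading)."""
    return (
        hit.coverage_pct >= min_coverage
        and hit.unique_peptides >= min_peptides
        and hit.fdr < max_fdr
    )


# Hypothetical hits, for illustration only.
hits = [
    ProteinHit("protein_A", coverage_pct=45.0, unique_peptides=12, fdr=0.001),
    ProteinHit("protein_B", coverage_pct=15.0, unique_peptides=12, fdr=0.001),
    ProteinHit("protein_C", coverage_pct=45.0, unique_peptides=3, fdr=0.001),
    ProteinHit("protein_D", coverage_pct=45.0, unique_peptides=12, fdr=0.05),
]
kept = [h.name for h in hits if passes_filter(h)]
```

      If the intended rule is instead to drop a protein only when it fails both the coverage and the peptide-count thresholds, the first `and` in `passes_filter` would become `or`.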

      Additionally, TMT-labelling-based proteomics offers a wider dynamic range and greater sensitivity than western blotting. Validating TMT logFC data by western blot, which is less quantitative and has a narrower dynamic range of detection, may not be very informative. Furthermore, the similar trends in key ISR genes and proteins shown in Figures 4D and 5A (e.g., PSAT, SLC7A11, SLC7A5) provide additional support for the authenticity of the proteins identified in this work.

      Also, for Figures 4E and F, it is assumed that each line represents an individual animal, but why their body weight gains are so different for the wild type? Can the authors plot the mean and s.e.m.? Also, there are no data about neurodegeneration. The authors need to show microscopy images, count the numbers, and assess the morphology of nerve cells.

      The large spread in body weight gain in our wild-type mice reflects the normal variability of this endpoint, which can be influenced by sex and age. Both factors are present in our cohorts, as animals of both sexes were included and there was a 7-week age range (10-17 weeks of age at dosing start). Each line in Figures 4E-F does represent data sampled from an individual animal over time. We chose to represent the data this way for transparency and have provided an additional visualization (new Supplemental Figure 3) showing both body weight gain and plasma Nf-L levels as mean ± SEM, as requested by this reviewer.
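
      For reference, the mean ± SEM summary requested by the reviewer collapses the per-animal values in a group at one time point into two numbers. A minimal sketch follows, using hypothetical values rather than study data; the function name is an assumption.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate SEM")
    mean = statistics.fmean(values)
    # SEM = sample standard deviation / sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical body weight gains (g) for one group at one time point.
gains = [2.0, 4.0, 6.0, 8.0]
mean, sem = mean_sem(gains)
```

      Because SEM shrinks with the square root of group size, it describes the precision of the group mean and not the animal-to-animal spread that the individual lines show.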

      In this study we chose to use a clinically relevant biomarker of neurodegeneration, plasma neurofilament light chain (NfL) (Figure 4F). This allowed us to prioritize the tissue samples from these studies for comprehensive unbiased analyses and a more complete characterization of the phenotype of these eIF2B LoF mice. NfL is recognized as a sensitive measure of neuronal/axonal damage regardless of cause (Gaetani et al., 2018, Khalil et al., 2018). Elevated plasma (and CSF) NfL levels have been demonstrated across neurodegenerative conditions such as Alzheimer’s disease (Giacomucci et al., 2022), multiple sclerosis (Ferreira-Atuesta et al., 2021), and ALS (Huang et al., 2018).

      (5) How ISR is connected to metabolomic changes? Can the authors explain it?

      The ISR caused significant increases in the transcript and protein abundances of amino acid transporters and serine/glycine/1-carbon metabolism enzymes, as highlighted in Figure 3A and C and lines 237-255 of the main text. Similar patterns were also observed in prior published studies (Larhammar et al., 2017, Abbink et al., 2019, Wong et al., 2019). Consistent with these changes, we observed increased levels of alanine (transported by SLC3A2, SLC7A11, SLC7A3) and decreased cystathionine levels (associated with increased expression of CTH). ATF4 is one of the main orchestrators of the ISR to stress (e.g., amino acid deprivation), and it is required for expression of amino acid transporters and of the enzymes required for synthesis of non-essential amino acids (PMID: 28494858). ATF4 increases cellular amino acid uptake and delivers the amino acids needed for synthesis of the proteins and glutathione required for survival.

      We also observed prominent changes in CE in eIF2B HOM mice and their normalization with DNL343 treatment, as shown in Figure 5C. We checked for changes in the expression of CEL, CES1, LCAT, LIPA, SOAT1, and NCEH1, proteins involved in CE metabolism, and did not detect any changes in protein or RNA abundance. This suggests that rapid demyelination is a more likely trigger for CE accumulation, as reported in FTD-GRN (Marian OC et al., 2023 Acta Neuropathol Commun 11, 52) and in experimental demyelination models (Nugent AA et al., 2020 Neuron). We have added new text to the discussion section of the manuscript (page 9, lines 408-411) to discuss how these results relate to each other.

      (6) It is hard to understand the biomarker part. The authors said "potential translational biomarkers are elevated..." Do the authors mean they are elevated so they can be potential biomarkers? If their levels are unchanged (e.g., TIMP-1), how can they be biomarkers? Also, this part needs a conclusion/summary. Also, what does "reversed biomarkers..." mean?

      We have modified the text to clarify this and have included a concluding sentence for this section of the results (page 7, lines 297-299). In assessing whether a given protein could be a potential translational biomarker for human disease, we evaluated whether the following two conditions were met: (1) gene expression or protein levels of the biomarker are increased or decreased in the brain or biofluids (CSF or plasma) of Eif2b5 R191H homozygote mice relative to wild-type controls, and this change is modulated or normalized by administration of DNL343; and (2) protein levels in biofluids from VWMD patients differ from those in healthy controls in the same direction as in the mouse model. GDF-15, GFAP, and NfL meet these criteria, but TIMP-1 and MCP-1 do not.
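
      The two conditions amount to a simple decision rule, sketched below. The function name, the +1/-1/0 encoding of direction, and the example markers are hypothetical and are not taken from the study data.

```python
def is_translational_biomarker(mouse_change, normalized_by_drug, patient_change):
    """Apply the two criteria described above.

    mouse_change / patient_change: +1 (increased), -1 (decreased) or 0
    (no difference) relative to wild-type mice / healthy controls.
    normalized_by_drug: True if treatment moves the mouse level back
    toward wild-type.
    """
    criterion_1 = mouse_change != 0 and normalized_by_drug
    criterion_2 = patient_change != 0 and patient_change == mouse_change
    return criterion_1 and criterion_2


# Hypothetical markers; the encodings are illustrative, not study data.
candidates = {
    "marker_up_in_both": (+1, True, +1),
    "marker_unchanged_in_patients": (+1, True, 0),
    "marker_not_normalized": (+1, False, +1),
}
selected = [name for name, args in candidates.items()
            if is_translational_biomarker(*args)]
```

      Under this rule a marker that is changed in mice but unchanged in patient biofluids, or changed in the opposite direction, is excluded even if the drug normalizes it in the model.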

      Minor concerns:

      (1) Please explain which multiple comparison tests the authors used.

      This information has been further clarified in the figure legends.

      (2) Administrating the drug at an advanced stage led to a trend of NfL reduction but did not rescue function. Can the authors discuss what this means?

      Further elaboration and discussion about this finding have been added to the results section on page 8, line 319-325.

      (3) For statistical analyses on the bar graphs, it would be better if the authors labeled the comparison pairs on the graphs.

      We agree that labelling comparisons in bar graphs could aid the readership and have added this modification. Additionally, comparisons are indicated in the figure legend.

      (4) The authors need to state clearly that 2BAct's cardiovascular toxicity was observed in dogs, not mice. The current study does not exclude similar DNL343 toxicity. However, previous clinical trials suggest that DNL343 may be safe for humans.

      The specification that cardiovascular toxicity was observed in dogs has been added (page 3, line 101), thank you. We now include a citation to Craig et al J Med Chem (2024), which provides evidence in a non-human primate (cynomolgus monkey) model that DNL343 dosing did not result in QT prolongation or any functional cardiac changes. We have also completed Phase 1 (NCT04268784) and double-blind Phase 1B (NCT05006352) trials in healthy participants and participants with ALS, respectively, and now include reference to these trials on page 4, lines 102-104. The safety profile observed in these clinical studies supported further development of DNL343 for ALS in the Healey Platform trial (NCT04297683, Regimen G).

    1. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Last and colleagues describe Ais, an open-source software package for the semi-automated segmentation of cryo-electron tomography (cryo-ET) maps. Specifically, Ais provides a graphical user interface (GUI) for the manual segmentation and annotation of specific features of interest. These manual annotations are then used as input ground-truth data for training a convolutional neural network (CNN) model, which can then be used for automatic segmentation. Ais provides the option of several CNNs so that users can compare their performance on their structures of interest in order to determine the CNN that best suits their needs. Additionally, pre-trained models can be uploaded and shared to an online database.

      Algorithms are also provided to characterize "model interactions", which allow users to define heuristic rules for how the different segmentations interact. For instance, a membrane-adjacent protein can be given a rule that it must lie within a certain distance of a membrane segmentation. Such rules can help reduce false positives; in the case above, false positives predicted away from membranes are eliminated.
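
      As a reader's illustration of this kind of rule (not Ais's actual implementation, which is not described here), a proximity constraint can be sketched as a brute-force distance filter. The function name and toy coordinates are hypothetical.

```python
import math


def filter_by_proximity(candidates, membrane, max_distance):
    """Keep candidate voxels that lie within max_distance of any membrane voxel.

    candidates, membrane: iterables of (z, y, x) coordinates.
    max_distance: Euclidean cutoff, in voxels.
    """
    membrane = list(membrane)
    kept = []
    for point in candidates:
        if any(math.dist(point, m) <= max_distance for m in membrane):
            kept.append(point)
    return kept


# Hypothetical toy data: a short membrane segment and three protein predictions.
membrane = [(0, 0, x) for x in range(5)]
predictions = [(0, 2, 1), (0, 3, 4), (0, 10, 2)]
kept = filter_by_proximity(predictions, membrane, max_distance=3.0)
```

      The pairwise search is only practical for small inputs; on full tomograms a distance transform of the membrane mask would presumably be the more usual choice.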

      The authors then show how Ais can be used for particle picking and subsequent subtomogram averaging and for the segmentation of cellular tomograms for visual analysis. For subtomogram averaging, they used a previously published dataset and compared the averages of their automated picking with the published manual picking. Analysis of cellular tomogram segmentation was primarily visual.

      Strengths:

      CNN-based segmentation of cryo-ET data is a rapidly developing area of research, as it promises substantially faster results than manual segmentation as well as the possibility of higher accuracy. However, this field is still very much in development, and the overall performance of these approaches, even across different algorithms, still leaves much to be desired. In this context, I think Ais is an interesting package, as it aims to provide both new and experienced users with streamlined approaches for manual annotation, access to a number of CNNs, and methods to refine the outputs of CNN models against each other. I think this can be quite useful for users, particularly as these methods develop.

      Weaknesses:

      Whilst overall I am enthusiastic about this manuscript, I still have a number of comments:

      On page 5, paragraph 1, there is a discussion on human judgement of these results. I think a more detailed discussion is required here, as from looking at the figures, I don't know that I agree with the authors' statement that Pix2pix is better. I acknowledge that this is extremely subjective, which is the problem. I think that a manual segmentation should also be shown in a figure so that the reader has a better way to gauge the performance of the automated segmentation.

      On page 7, the authors mention terms such as "emit" and "absorb" but never properly define them, such that I feel like I'm guessing at their meaning. Precise definitions of these terms should be provided.

      For Figure 3, it's unclear if the parent models shown (particularly the carbon model) are binary or not. The figure looks to be grey values, which would imply that it's the visualization of some prediction score. If so, how is this thresholded? This can also be made clearer in the text.

      Figure 3D was produced in ChimeraX using the hide dust function. I think some discussion on the nature of this "dust" is in order, e.g. how much is there and how large does it need to be to be considered dust? Given that these segmentations can be used for particle picking, this seems like it may be a major contributor to false positives.

      Page 9 contains the following sentence: "After selecting these values, we then launched a batch particle picking process to determine lists of particle coordinates based on the segmented volumes." Given how important this is, I feel like this requires significant description, e.g. how are densities thresholded, how are centers determined, and what if there are overlapping segmentations?
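
      To make the questions concrete, one generic way of turning a segmented volume into coordinates is to threshold it, group voxels into connected blobs, discard small blobs, and take each blob's centroid. The sketch below illustrates that generic approach only. It is an assumption, not a description of what Ais does, and the function name, parameters, and toy volume are hypothetical.

```python
from collections import deque


def pick_particles(volume, threshold, min_voxels=1):
    """Return centroids (z, y, x) of connected blobs with value >= threshold.

    volume: nested lists indexed as volume[z][y][x].
    Blobs are 6-connected; blobs smaller than min_voxels are discarded.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seen = set()
    centroids = []
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if (z, y, x) in seen or volume[z][y][x] < threshold:
                    continue
                # Breadth-first search collects one connected blob.
                blob = []
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                while queue:
                    cz, cy, cx = queue.popleft()
                    blob.append((cz, cy, cx))
                    for dz, dy, dx in steps:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and n not in seen and volume[n[0]][n[1]][n[2]] >= threshold:
                            seen.add(n)
                            queue.append(n)
                if len(blob) >= min_voxels:
                    centroids.append(tuple(sum(c) / len(blob) for c in zip(*blob)))
    return centroids


# Hypothetical 1 x 3 x 5 score map with two blobs; values are illustrative.
volume = [[
    [0.9, 0.8, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]]
picks = pick_particles(volume, threshold=0.5, min_voxels=2)
```

      In this scheme overlapping or touching particles merge into a single blob and yield one centroid, which is exactly the kind of behaviour a fuller description in the manuscript would need to address.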

      The FSC shown in Figure S6 for the auto-picked maps is concerning. First, a horizontal line at FSC = 0 should be added. It seems that starting at a frequency of ~0.045, the FSC of the autopicked map increases above zero and stays there. Since this is not present in the FSC of the manually picked averages, this suggests the automatic approach is also finding some sort of consistent features. This needs to be discussed.

      Page 11 contains the statement "the segmented volumes found no immediately apparent false positive predictions of these pores". This is quite subjective and I don't know that I agree with this assessment. Unless the authors decide to quantify this through subtomogram classification, I don't think this statement is appropriate.

      In the methods, the authors note that particle picking is explained in detail in the online documentation. Given that this is a key feature of this software, such an explanation should be in the manuscript.