  1. Apr 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma, Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes within the nutrient-poor intracellular niche of the macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays an important role in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates the regulation of zinc homeostasis in Salmonella. The impact of this work is questionable. There are already many studies reporting Salmonella-effector interactions, and while this adds to that knowledge it is not a significant advance over previous studies. While the authors are investigating an interesting question, the work has two important weaknesses; if addressed, the conclusions of this work and broader relevance to bacterial pathogenesis would be enhanced.

      Strengths:

      This reviewer appreciates the multidisciplinary nature of the work. The overall presentation of the figure graphics is clear and organized.

      Weaknesses:

      First, this study is very light on mechanistic investigations, even though a mechanism is proposed. Zinc homeostasis in cells, and its roles in bacterial infections, are complex processes with many players. The authors have not thoroughly investigated the mechanisms underlying the roles of B-Ala and panD in impacting STM infection such that other factors cannot be ruled out. Defining the cellular content of Zn2+ in STM in vivo would be one such route. Without further mechanistic studies, the possibility cannot be ruled out that the authors have simply deleted two important genes and seen an infection defect - this may not relate directly to Zn2+ acquisition.

      Thank you for your patient and thoughtful reading, as well as the constructive comments and advice regarding our manuscript. We have revised the manuscript based on your comments and suggestions.

      You are correct that this work has not thoroughly investigated the mechanisms underlying the roles of β-alanine, panD, and zinc in impacting Salmonella infection. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (Δ_panD_), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Meanwhile, we concur that additional, unknown mechanisms are involved in the virulence regulation by β-alanine in Salmonella. Our findings indicate that the double mutant Δ_panD_Δ_znuA_, which can neither synthesize β-alanine nor take up zinc, is more attenuated than the single mutant Δ_znuA_ (Figures 5D and 6B). This suggests that the contribution of β-alanine to Salmonella's virulence is only partially dependent on zinc acquisition. We have revised the related descriptions throughout the manuscript for clarity (lines 31, 304, 341, 1056, 1068).

      Second, the authors hint at their newly described mechanism/pathway being important for disease and possibly a target for therapeutics. This claim is not justified given that they have employed a single STM strain, which was isolated from chickens and is not even a clinical isolate. The authors could enhance the impact of their findings and relevance to human disease by demonstrating it occurs in human clinical isolates and possibly other serovars. Further, the use of mouse macrophage as a model, and mice, have limited translatability to human STM infections.

      We thank you for your comments and advice on our manuscript and are delighted to accept them. Salmonella Typhimurium causes systemic disease in mice, which is similar to the symptoms of typhoid fever in humans and has been widely used to explore the pathogenesis of Salmonella. Based on your comment, we have now performed additional experiments to confirm several key points of our findings in another typical Salmonella serovar, Salmonella enterica serovar Typhi, which is a human-limited serovar and the cause of typhoid fever in humans (PLoS Pathog. 2012, 8(10):e1002933).

      We constructed the panD mutant strain (ΔpanD) in the S. Typhi strain Ty2 and subsequently compared the replication of ΔpanD with that of the Ty2 wild-type in the human THP-1 monocyte-like cell line (ATCC TIB-22) using gentamicin protection assays. The results showed that the replication of ΔpanD in THP-1 cells was reduced by 2.6-fold at 20 h post-infection compared to the Ty2 wild-type strain (P < 0.01) (Figure 2-figure supplement 3), suggesting that panD also facilitates S. Typhi replication in human macrophages and may be involved in the systemic infection of S. Typhi in humans. This result has been included in the revised manuscript (lines 203-210).

      Based on these results, we speculate that PanD may serve as a potential target for treating Salmonella infection.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28. Latin phrases like de novo should be italicized.

      Thank you for your careful review. We have revised the manuscript thoroughly (Lines 28, 65, 77, 106, 171, 173, 214, 1002, 1023, 1078).

      (2) Line 45. 'survival' typo.

      We have corrected it in the revised manuscript (Line 45).

      (3) Line 57. What evidence or prior work supports the SCV of macrophages in a nutrient-poor environment? Citation needed.

      The relevant reference has now been added (lines 62-63).

      (4) Lines 65-68. If an 'increasing number of studies have focused' on this topic, please cite them here.

      The relevant reference has now been added (lines 72-73).

      (5) Lines 69-71. Citations are needed for these claims.

      The relevant reference has now been added (lines 76-77, 79-80).

      (6) Line 76-77. Citation needed for this claim.

      The relevant reference has now been added (lines 84, 86).

      (7) Line 116-122, and Figure 1C, and Figure 1 legend. An important claim in this work is that the amino acid content of the macrophage cytoplasm is different +/- STM infection. The authors need to explain this result more carefully and define their acronyms. What is VIP, Log2 FC, etc.? What do the colors in Figure 1C mean? They are not defined. If possible, it would be more approachable to list these as molar concentrations, weight/cell, or number of molecules/cell. The authors should calculate an effect size for each of these data to help assess if the differences are meaningful. Without this information, and a clearer explanation of what these data are, it is difficult to evaluate the authors' claim that "8 [amino acids] showed significant differences in abundance."

      Thank you for the comment. The full names of VIP (Variable Importance in the Projection) and FC (fold change) have been included in the revised manuscript. In Figure 1C of the original manuscript, pink represents the content of amino acids that increased following Salmonella infection, whereas blue signifies the content of amino acids that decreased after Salmonella infection.

      Based on your suggestion, we have revised Figure 1C (now Figure 1C, D in the revised manuscript) and the content of amino acids is now expressed as weight per cell (ng/10<sup>7</sup> cells). The legend has been updated accordingly (lines 993-997).

      (8) Line 134-138. Additional controls are required for this experiment. By adding a nutrient (B-Ala) you have increased the nutrient availability and growth potential of the bacteria. This may not relate to anything special to B-Ala. Perhaps the addition of another amino acid, or sugar, would have a similar impact. Further, this result would be more compelling if the authors demonstrated a dose-dependent effect of B-Ala addition.

      Thank you for the comment. To further confirm that host-derived β-alanine can promote intracellular Salmonella replication, we have added varying concentrations of β-alanine (0.5, 1, 2, and 4 mM) to the culture medium (RPMI) of RAW264.7 cells. Subsequently, we infected these cells with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Our observations indicate that the addition of 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the increase in Salmonella intracellular replication was dose-dependent, as illustrated in the revised Figure 1E. These findings suggest that host-derived β-alanine facilitates Salmonella replication inside macrophages. We have included these results in the revised manuscript (lines 141-149).

      (9) Lines 181-184, and Figure 2E. In addition to the fold-change replication data, here and elsewhere the authors should provide raw CFU counts for data transparency.

      Thank you for bringing this to our attention. In this work, we have used “fold intracellular replication” (20 h intracellular bacterial CFU/2 h intracellular bacterial CFU) to illustrate the differences in intracellular replication of different Salmonella strains in macrophages. The term “fold intracellular replication” is commonly employed in recently published reports (e.g., FEMS Microbiol Lett. 2024, 9;371:fnae067; mBio. 2024, 15(7):e0112824; Front Microbiol. 2024, 14:1340143). To ensure data transparency, we have included the raw CFU counts in the source data file.
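
      The ratio described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the strain labels, and the CFU counts are hypothetical placeholders and are not data from the study.

```python
def fold_replication(cfu_2h: float, cfu_20h: float) -> float:
    """Fold intracellular replication: CFU at 20 h divided by CFU at 2 h."""
    if cfu_2h <= 0:
        raise ValueError("the 2 h CFU count must be positive")
    return cfu_20h / cfu_2h


# Hypothetical (2 h CFU, 20 h CFU) pairs, for illustration only.
example_counts = {
    "strain_A": (2.0e4, 4.0e5),
    "strain_B": (2.0e4, 1.0e5),
}

folds = {
    name: fold_replication(cfu_2h, cfu_20h)
    for name, (cfu_2h, cfu_20h) in example_counts.items()
}

for name, fold in folds.items():
    print(f"{name}: {fold:.1f}-fold")
```

      Because the ratio normalizes each strain to its own 2 h count, reporting the raw CFU counts alongside it lets a reader check whether a difference in fold replication reflects the numerator, the denominator, or both.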

      (10) Line 197. Why employ i.p. injection of STM? As a non-typhoidal serovar, STM infection is enteric, and so i.p. injection seems very artificial if the goal is to understand the role B-Ala synthesis in disease.

      Thank you for the comment. Salmonella can induce gastroenteritis or systemic infection, which are associated with its capacity to invade intestinal epithelial cells and replicate within macrophages, respectively. In this study, using gentamicin protection assays and immunofluorescence analysis, we demonstrated that β-alanine is crucial for Salmonella replication inside macrophages. Since replication in macrophages is a key determinant of systemic Salmonella infection, we hypothesized that β-alanine also affects Salmonella systemic infection in vivo. Intraperitoneal (i.p.) injection enables Salmonella to disseminate directly to systemic sites via the lymphatic system and the bloodstream, bypassing the need for intestinal invasion (Microbiol Res. 2023, 275:127460; Int Immunopharmacol. 2016, 31:233-8). Thus, we conducted the mouse infection assays via i.p. injection to ascertain whether β-alanine affects systemic Salmonella infection. We have included this description in the revised manuscript to enhance clarity (lines 217-221).

      Whether β-alanine influences Salmonella invasion of intestinal epithelial cells and intestinal colonization has not been investigated in this work; this issue will be explored in our future studies.

      (11) Line 207-214 and Figure 3. If the hypothesis is that B-Ala mediates STM survival/virulence through enhancing metabolism in the SCV and intracellular niche, why did the authors not investigate/enumerate STM in this niche in their in vivo studies?

      Thank you for the comment. Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (Δ_panD_), and complemented strain (c_panD_) within the macrophages of the mouse liver. The findings indicated that the number of Δ_panD_ in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and the complementation of Δ_panD_ increased the bacterial count in each liver macrophage to the level of WT (refer to Figure 3E in the revised manuscript). These results have been included in the revised manuscript (lines 234-239).

      (12) Figure 4B - the down genes label is cut off.

      Thank you for your careful review. We have corrected it in the revised Figure 4B.

      (13) Line 260-265. SPI-2 needs to be defined and introduced, as do other terms here, to make the work approachable to non-STM specialists.

      The introduction of SPI-2 has been added to the revised manuscript (lines 290-292).

      (14) Line 300-301. Additional experiments are needed to support the claim that "data indicate that β-alanine promotes in vivo virulence of Salmonella, partially by increasing the expression of zinc transporter genes." Gene up- or down-regulation does not necessarily have any meaningful impact on function or activity. The authors here need an assay that confirms that the function of znuA is disrupted, such as examining the cell Zn2+ content in vivo at different levels of B-Ala exposure and/or panD activity. Moreover, more Zn2+ is not necessarily beneficial for STM, at levels too high zinc can exert cell toxicity. So, the authors have a correlation but no data supporting this mechanism explains their observations of virulence and infection. How much Zn2+ is ideal for STM growth?

      Thank you for the comment. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-κB activation and impairing NF-κB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, the efficient acquisition of zinc may play a crucial role in the survival and replication of Salmonella within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella utilizes the high-affinity ZnuABC zinc transporter to maximize zinc availability within host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages.

      You are correct that more zinc is not necessarily beneficial for Salmonella, as excessive zinc can inhibit the growth of Salmonella. Considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and does not reach very high concentrations during Salmonella's growth within macrophages. We have included a discussion on this matter in the revised manuscript (lines 459-466).

      (15) Figure 6B. Related to the above, these data would be more compelling with higher n and a dose-dependent response demonstrated for Zn2+ addition. This is a central point of the manuscript, and effectively what the authors propose as the underlying mechanism, and it should be more robustly substantiated.

      Thank you for the comment. As stated in the previous response, we were unable to directly assess the bacterial zinc concentration during Salmonella growth within macrophages. Instead, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. Moreover, considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and does not reach very high concentrations during Salmonella's growth within macrophages.

      Reviewer #2 (Public review):

      Summary:

      Salmonella exploits host- and bacteria-derived β-alanine to efficiently replicate in host macrophages and cause systemic disease. β-alanine executes this by increasing the expression of zinc transporter genes and therefore the uptake of zinc by intracellular Salmonella.

      Strengths:

      The experiments designed are thorough and the claims made are directly related to the outcome of the experiments. No overreaching claims were made.

      Weaknesses:

      A little deeper insight was expected, particularly towards the mechanistic aspects. For example, zinc transport was found to be the cause of the b-alanine-mediated effect on Salmonella intracellular replication. It would have been very interesting to see which are the governing factors that may get activated or inhibited due to Zn accumulation that supports such intracellular replication.

      We appreciate your review and advice. To further investigate the mechanisms by which β-alanine, panD, and zinc influence Salmonella infection, we have conducted additional experiments as suggested. For instance, we examined the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (Δ_panD_). This approach indirectly reflects zinc acquisition by intracellular Salmonella, as it is challenging to isolate sufficient amounts of the bacteria from infected cells or tissues for zinc concentration measurement. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared to that in WT-infected counterparts (Figures 5E and 6A). This suggests that the panD gene and β-alanine are crucial for Salmonella to absorb zinc from host cells. This new information has been included in the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-κB activation and impairing NF-κB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, efficient zinc uptake could be crucial for Salmonella survival and replication within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella exploits the high-affinity ZnuABC zinc transporter to maximize zinc availability in host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages. We have addressed this issue in the revised manuscript (lines 459-466).

      Reviewer #2 (Recommendations for the authors):

      A few general clarifications and suggested experiments:

      (1) Metabolome analysis: Salmonella can itself produce b-alanine. Given that it is isolated from infected cells where salmonella has scavenged b-alanine from host cytosol as well as produced it, how b-alanine levels went down in metabolome analysis is confusing.

      Thank you for the comment. Targeted metabolic profiling was conducted as outlined in a recently published paper by our group (Nat Commun. 2021, 12(1):879). To prevent delays and changes in metabolite concentrations during the separation of bacterial contents from macrophages, we determined the combined metabolite concentrations directly from infected cells and Salmonella. We observed that each Salmonella cell contained only 0.01%-0.02% of the concentration of each corresponding combined metabolite. Approximately 94% of the infected macrophages contained no more than ten bacteria at 8 hours post-infection, confirming that the combined metabolites were predominantly from the host. We have included an explanation of this issue in the Methods section (lines 557-560).

      (2) What is the basal level of b-alanine produced by macrophages? How was 1 mM conc. chosen?

      According to our results, the content of β-alanine in uninfected RAW264.7 cells is 26-33 μM/10<sup>7</sup> cells (700-900 ng/10<sup>7</sup> cells). The 1 mM concentration was chosen based on a published report (Appl Microbiol Biotechnol. 2004, 65(5):576-82).
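
      As a rough cross-check of the mass figures quoted above, the sketch below converts nanograms of β-alanine into nanomoles using its molar mass (C3H7NO2, about 89.09 g/mol). The function name is hypothetical, and the conversion stops at an amount per 10^7 cells: expressing it as a molar concentration would additionally require the sample or extraction volume, which is not stated in this response.

```python
BETA_ALANINE_MOLAR_MASS = 89.09  # g/mol for C3H7NO2


def ng_to_nmol(mass_ng: float, molar_mass: float = BETA_ALANINE_MOLAR_MASS) -> float:
    """Convert a mass in nanograms to an amount in nanomoles.

    ng divided by g/mol gives nmol, because both mass units scale by 1e-9.
    """
    return mass_ng / molar_mass


# Mass range quoted in the response (ng per 10^7 cells).
for mass_ng in (700.0, 900.0):
    print(f"{mass_ng:.0f} ng -> {ng_to_nmol(mass_ng):.1f} nmol per 10^7 cells")
```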

      Additionally, we have supplemented the culture medium (RPMI) of RAW264.7 cells with 0.5, 1, 2, and 4 mM β-alanine and subsequently infected them with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Our observations revealed that supplementation with 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the addition of β-alanine to the infected cells resulted in a dose-dependent increase in Salmonella intracellular replication, as depicted in Figure 1E. These findings further support the notion that host-derived β-alanine facilitates Salmonella replication within macrophages. These data have been incorporated into the revised manuscript (lines 141-149).

      (3) The antimicrobial activity of macrophages preventing the growth of intracellular Salmonella will primarily be governed by genes such as GBPs, defensins, nitric oxide, etc. The expression of these genes should be tested rather than cytokines which are secreted with little effect on intracellular Salmonella.

      Thank you for the suggestion. We have investigated the levels of ROS (reactive oxygen species) and RNS (reactive nitrogen species) in Salmonella-infected RAW264.7 cells, both in the presence and absence of 1 mM β-alanine. The results indicated that β-alanine did not affect the ROS and RNS levels in RAW264.7 cells (Figure 1-figure supplement 1), suggesting that β-alanine does not influence the antimicrobial activity of macrophages. We have included these results in the revised manuscript (lines 150-153).

      (4) For animal experiments, how many times was the experiment repeated? Can the animal experiment be done with b-alanine supplementation and panD mutant? Can the liver be stained to detect the bacteria?

      Thank you for the comment.

      i) Mouse infection assays were conducted twice, with at least 2 mice (n ≥ 2) in each injection group. The combined data from the two experiments were used for statistical analysis. This information has been added to the revised manuscript (lines 678-681).

      ii) As suggested, mice infected with the panD mutant (Δ_panD_) were administered β-alanine (500 mg/kg/day, Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37) orally on a daily basis. On the third day post-infection, the bacterial burden in the liver and spleen and the body weight of the infected mice were measured. The results indicated that administering β-alanine to mice did not affect the bacterial burden of Δ_panD_ in the liver and spleen, nor did it influence the body weight of the infected mice (please refer to Author response image 1 below). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly converted into carnosine in mice, and the free β-alanine, particularly that which enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      Author response image 1.

      iii) Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (Δ_panD_), and complemented strain (c_panD_) within the macrophages of the mouse liver. The findings indicate that the number of Δ_panD_ in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and the complementation of Δ_panD_ increased the bacterial count in each liver macrophage to the level of WT (Figure 3E in the revised manuscript). These results have been included in the revised manuscript (lines 234-239).

      Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting due to its life within a compact compartment, known in the Salmonella field as the SCV (Salmonella-containing vacuole). The SCV is a tight-fitting vacuole in which the acquisition of nutrients is a key factor for Salmonella. Among many nutrients, the authors focused on beta-alanine. It is also known from many other studies that Salmonella requires beta-alanine. The authors performed in vitro RAW macrophage infection assays and in vivo mouse infection assays to examine the life of Salmonella in the presence of beta-alanine. They concluded that beta-alanine modulates the expression of many genes, including zinc transporters, which are required for pathogenesis.

      Strengths:

      This study made a couple of knockouts in Salmonella and did a transcriptomic investigation to understand the global gene expression pattern.

      Weaknesses:

      The following questions are unanswered:

      (1) It is not clear how the exogenous beta-alanine is taken up by macrophages.

      We thank the reviewer for the question. It has been reported that β-alanine is transported into eukaryotic cells via the TauT (SLC6A6) and PAT1 (SLC36A1) transporters (Acta Physiol (Oxf). 2015, 213(1):191-212; Am J Physiol Cell Physiol. 2020 Apr 1;318(4):C777-C786; Biochim Biophys Acta. 1994, 1194(1):44-52).

      (2) It is not clear how the Beta-alanine from the cytosol of the macrophage enters the SCV.

      According to published reports, translocation of SPI2 effector proteins induces the formation of specific tubular membrane compartments that extend from the SCV, known as Salmonella-induced filaments (SIFs) (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology (Reading). 2012, 158(Pt 5):1147-1161). The membranes and lumens of both SIFs and SCVs form a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). We hypothesize that β-alanine may enter SCVs from the cytoplasm of macrophages via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (3) It is not clear how the beta-alanine from SCV enters the bacterial cytosol.

      Thank you for the question. We have attempted to identify the transporter of β-alanine in Salmonella, but we found that the CycA transporter, which transports β-alanine in Escherichia coli, does not function in the same manner in Salmonella, despite Salmonella being closely related to E. coli.

      BasC is a bacterial LAT (L-Amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of Pseudomonas, Acinetobacter, and Aeromonas, etc. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (4) There is no clarity on the utilization of exogenous beta-alanine of the host and the de novo synthesis of beta-alanine by panD of Salmonella.

      Thank you for the comment. Our findings indicated that β-alanine levels were reduced in Salmonella-infected RAW264.7 cells. Furthermore, the addition of β-alanine to the culture medium (RPMI) of RAW264.7 cells significantly enhanced Salmonella replication, suggesting that the intracellular Salmonella utilize host-derived β-alanine for their growth. However, to date, we have not identified the transporter responsible for the uptake of exogenous β-alanine into the Salmonella cytosol.

      Moreover, we have discovered that the replication of the Salmonella panD mutant within macrophages and its virulence in mice are significantly reduced compared to the wild type (WT), indicating that the de novo synthesis of β-alanine is crucial for Salmonella's intracellular replication and virulence.

      These results indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages.

      Reviewer #3 (Recommendations for the authors):

      Cite this paper from 1985, which talks about the role of beta-alanine in Salmonella infection: J Gen Microbiol. 1985 May;131(5):1083-90. doi: 10.1099/00221287-131-5-1083. A Salmonella typhimurium strain defective in uracil catabolism and beta-alanine synthesis. T P West, T W Traut, M S Shanley, G A O'Donovan.

      We have now cited this paper in the revised manuscript (lines 82-83).

      (2) BasC can be important for beta-alanine transport. The CycA transporter was not found to be involved in beta-alanine transport. However, it is important to find out which transporter is required for the uptake of beta-alanine.

      Thank you for pointing it out. We agree that it is important to determine which transporter is necessary for the uptake of β-alanine in Salmonella. BasC is a bacterial LAT (L-Amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of Pseudomonas, Acinetobacter, and Aeromonas, etc. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (3) Bacteria being quite stringent with its energy resources, it is unlikely that it will use de novo synthesis if the host resources are available. Only if the host resources are depleted, can it turn on the de novo synthesis involving panD. What is the status of fold-replication of panD mutant in the presence of exogenous addition of beta-alanine?

      Thank you for the comment. The addition of 1 to 4 mM of β-alanine increased the replication of the panD mutant (ΔpanD) in RAW264.7 cells by 1.7- to 3.1-fold. This increase in Salmonella intracellular replication was dose-dependent, as shown in Figure 2H of the revised manuscript, further illustrating that host-derived β-alanine promotes Salmonella replication inside macrophages.

      We agree that bacteria are quite stringent with their energy resources. The results of this work indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages. We speculate that Salmonella relies on a large amount of β-alanine to efficiently replicate in macrophages, thereby highlighting the importance of β-alanine for Salmonella intracellular growth. We have discussed this issue in the revised manuscript (lines 392-396).

      (4) The 100% survival of animals infected with the panD mutant is somewhat concerning. What happens when beta-alanine is fed to mice infected with the panD mutant?

      Thank you for the comment. As suggested, mice infected with the panD mutant (ΔpanD) were administered β-alanine (500 mg/kg/day, as reported in Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37) orally on a daily basis. On the third day post-infection, the bacterial load in the liver and spleen, as well as the body weight of the infected mice, were measured. The results indicated that administering β-alanine did not affect the bacterial load of ΔpanD in the liver and spleen, nor did it influence the body weight of the infected mice (refer to Author response image 1). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly converted into carnosine in mice, and the free β-alanine, particularly that which enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      (5) How does beta-alanine from the macrophage cytosol enter the SCV?

      Thank you for pointing it out. According to published reports, the translocation of SPI2 effectors triggers the formation of specialized tubular membrane compartments, known as Salmonella-induced filaments (SIFs), which extend from the SCV (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology. 2012, 158:1147-1161). The membranes and lumens of SIFs and SCVs create a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). Consequently, it is plausible that β-alanine enters SCVs from the macrophage cytoplasm via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (6) It would be essential to dissect the role of exogenous beta-alanine and the use of de novo synthesized beta-alanine.

      We agree that it is essential to dissect the role of exogenous β-alanine and the use of de novo synthesized β-alanine. Our results indicate that Salmonella-infected macrophages exhibited lower levels of β-alanine compared to mock-infected macrophages. Furthermore, β-alanine supplementation in the cell medium enhanced Salmonella replication within macrophages in a dose-dependent manner, revealing that Salmonella utilizes host-derived β-alanine to promote intracellular replication. Additionally, a deficiency in the biosynthesis of β-alanine, resulting from mutation of the rate-limiting gene panD, led to reduced Salmonella replication in macrophages and systemic infection in mice. This suggests that Salmonella also employs bacterial-derived β-alanine to enhance intracellular replication and pathogenicity.

      We sought to identify the main transporters responsible for β-alanine uptake in Salmonella. Unfortunately, we have not yet found the transporter. We will address this issue in our future work.

    1. eLife Assessment

      By taking advantage of noise in gene expression, this important study introduces a new approach for detecting directed causal interactions between two genes without perturbing either. The main theoretical result is supported by a proof. Preliminary simulations and experiments on small circuits are solid, but further investigations are needed to demonstrate the broad applicability and scalability of the method.

    2. Reviewer #2 (Public Review):

      Summary:

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality.

      The authors benchmark their approach experimentally in several synthetic circuits. In 4 positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in 2 of the 4 positive control circuits. The authors constructed 16 negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter, or simply the cellular growth rate. The proposed method detected a causal effect in two of the 16 negative controls, which the authors argue is perhaps not a false positive, but due to an unexpected causal effect. Overall, the data support the potential value of the proposed approach.

      Strengths:

      The idea of a "no-causality control" in the context of detecting directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations.

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations.

      The authors have improved the clarity and completeness of their proof compared to a previous version of the manuscript.

      Limitations:

      The authors themselves clearly outline the primary limitations of the study: The experimental benchmark is a proof of principle, and limited to synthetic circuits involving a handful of genes expressed on plasmids in E. coli. As acknowledged in the Discussion, negative controls were chosen based on the absence of known interactions, rather than perturbation experiments. Further work is needed to establish that this technique applies to other organisms and to biological networks involving a wider variety of genes and cellular functions. It seems to me that this paper's objective is not to delineate the technique's practical domain of validity, but rather to motivate this future work, and I think it succeeds in that.

      Might your new "Proposed additional tests" subsection be better housed under Discussion rather than Results?

      I may have missed this, but it doesn't look like you ran simulation benchmarks of your bootstrap-based test for checking whether the normalized covariances are equal. It would be useful to see in simulations how the true and false positive rates of that test vary with the usual suspects like sample size and noise strengths.

      It looks like you estimated the uncertainty for eta_xz and eta_yz separately. Can you get the joint distribution? If you can do that, my intuition is you might be able to improve the power of the test (and maybe detect positive control #3?). For instance, if you can get your bootstraps for eta_xz and eta_yz together, could you just use a paired t-test to check for equality of means?
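      One way to read the joint-bootstrap suggestion above is sketched below. It assumes the normalized covariance has the form eta_ab = Cov(a, b) / (<a><b>) and that each cell contributes one (x, y, z) triple; the function names, the synthetic data, and the percentile interval (used here in place of the paired t-test mentioned above) are illustrative assumptions, not the authors' analysis pipeline. Resampling whole cells keeps the two estimates paired, so sampling noise shared by eta_xz and eta_yz cancels in their difference.

```python
import random
from statistics import fmean

def normalized_cov(a, b):
    """Cov(a, b) / (<a><b>): the form of eta assumed in this sketch."""
    ma, mb = fmean(a), fmean(b)
    cov = fmean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])
    return cov / (ma * mb)

def paired_bootstrap_diff(cells, n_boot=200, seed=0):
    """95% percentile interval for eta_xz - eta_yz, resampling whole cells."""
    rng = random.Random(seed)
    n = len(cells)
    diffs = []
    for _ in range(n_boot):
        # Draw (x, y, z) triples together so both estimates use the same cells.
        sample = [cells[rng.randrange(n)] for _ in range(n)]
        x, y, z = zip(*sample)
        diffs.append(normalized_cov(x, z) - normalized_cov(y, z))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]

def synthetic_cells(n, coupling, seed=1):
    """Hypothetical snapshot data: x and y share an upstream factor u,
    and z depends on x only when coupling != 0."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n):
        u = rng.gauss(10.0, 1.0)
        x = u + rng.gauss(0.0, 1.0)
        y = u + rng.gauss(0.0, 1.0)
        z = 20.0 + u + coupling * x + rng.gauss(0.0, 1.0)
        cells.append((x, y, z))
    return cells

null_ci = paired_bootstrap_diff(synthetic_cells(2000, coupling=0.0))
causal_ci = paired_bootstrap_diff(synthetic_cells(2000, coupling=1.0))
print("no X->Z link:", null_ci)
print("X->Z link:   ", causal_ci)
```

      An interval that excludes zero would be read as a violation of the equality; whether this gains power on the real data is untested here.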

      The proof is a lot better, and it's great that you nailed down the requirement on the decay of beta, but the proof is still confusing in some places:

      - On pg 29, it says "That is, dividing the right equation in Eq. 5.8 with alpha, we write the ..." but the next equation doesn't obviously have anything to do with Eq. 5.8, and instead (I think) it comes from Eq 5.5. This could be clarified.

      - Later on page 29, you write "We now evoke the requirement that the averages xt and yt are stationary", but then you just repeat Eq. 5.11 and set it to zero. Clearly you needed the limit condition to set Eq. 5.11 to zero, but it's not clear what you're using stationarity for. I mean, if you needed stationarity for 5.11 presumably you would have referenced it at that step.

      It could be helpful for readers if you could spell out the practical implications of the theorem's assumptions (other than the no-causality requirement) by discussing examples of setups where it would or wouldn't hold.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. 

      We thank the referee for this summary of our work. 

      Strengths: 

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. 

      We thank the referee for highlighting the potential value of our proposed method.

      Weaknesses: 

      The practical utility of this method may not be straightforward and potentially be quite difficult to execute. Additionally, further investigations are needed to provide evidence of the broad applicability of the method to naturally occurring systems and its scalability beyond the simple circuit in which it is experimentally demonstrated. 

      We agree with these two points and have rewritten the manuscript, in particular highlighting the considerable future work that remains to be done to establish the broad applicability and scalability of our method.

      In the rewritten manuscript we explicitly spell out potential practical issues, and we explicitly state that our presented proof-of-principle feasibility study does not guarantee that our method will work in systems beyond the narrowly sampled test circuits. This helps readers clearly distinguish what we claim to have done from what remains to be done. The rewritten parts and additional clarifications are:

      Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 10).

      Reviewer #2 (Public Review): 

      Summary: 

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality. 

      The authors benchmark their approach experimentally in several synthetic circuits. In four positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in two or three of the positive control circuits. The authors constructed sixteen negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter or simply the cellular growth rate. The proposed method detected a causal effect in one of the eight negative controls, which the authors argue is not a false positive, but due to an unexpected causal effect. Overall, the data support the practical usefulness of the proposed approach. 

      We thank the referee for their summary of our work.

      Strengths: 

      The idea of a "no-causality control" in the context of detecting directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations. 

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations. 

      We thank the referee for summarizing the value of our work. 

      Caveats: 

      The term "causally" is used in the main-text statement of the central theorem (Eq 2) without a definition of this term. This makes it difficult to fully understand the statement of the paper's central theorem without diving into the supplement.  

      We thank the referee for this suggestion. In the revised manuscript we now define causal effects right before the statement of the main theorem of the main text (p. 2). We have also added a definition of the causal network arrows in the caption of Fig. 1 to help readers better understand our central claim.

      The basic argument of theorem 1 appears to rely on establishing that x(t) and y(t) are independent of their initial conditions. Yet, there appear to be some scenarios where this property breaks down: 

      (1) Theorem 1 does not seem to hold in the edge case where R=beta=W=0, meaning that the components of interest do not vary with time, or perhaps vary in time only due to measurement noise. In this case x(t), y(t), and z(t) depend on x(0), y(0), and z(0). Since the distributions of x(0), y(0), and z(0) are unspecified, a counterexample to the theorem may be readily constructed by manipulating the covariance matrix of x(0), y(0), and z(0). 

      (2) A similar problem may occur when transition probabilities decay with time. For example, suppose that again R=0 and X are degraded by a protease (B), but this protease is subject to its own first-order degradation. The deterministic version of this situation can be written, for example, dx/dt=-bx and db/dt=-b. In this system, x(t) approaches x(0)exp(-b(0)) for large t. Thus, as above, x(t) depends on x(0). If similar dynamics apply to the Y and Z genes, we can make all genes depend on their initial conditions, thus producing a pathology analogous to the above example. 
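      The deterministic example in (2) can be checked numerically. The sketch below integrates dx/dt = -bx, db/dt = -b with a plain forward-Euler scheme and compares the result with the closed-form limit x(0)exp(-b(0)); the function names, step size, and parameter values are illustrative assumptions.

```python
import math

def simulate_decaying_protease(x0, b0, t_end=30.0, dt=1e-3):
    """Forward-Euler integration of dx/dt = -b*x, db/dt = -b."""
    x, b = x0, b0
    for _ in range(round(t_end / dt)):
        x, b = x - b * x * dt, b - b * dt
    return x, b

def long_time_limit(x0, b0):
    """Since b(t) = b0*exp(-t), x(t) = x0*exp(-b0*(1 - exp(-t))) -> x0*exp(-b0)."""
    return x0 * math.exp(-b0)

# Two initial conditions with the same b(0): the final x still scales with x(0).
results = {x0: simulate_decaying_protease(x0, b0=2.0) for x0 in (1.0, 5.0)}
for x0, (x_end, b_end) in results.items():
    print(f"x0={x0}: simulated {x_end:.4f}, closed form {long_time_limit(x0, 2.0):.4f}")
```

      Because the protease level decays to zero, x stops changing before it can forget its starting value, which is the dependence on x(0) described above.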

      The reviewer does not know when such examples may occur in (bio)physical systems. Nevertheless, since one of the advantages of mathematics is the ability to correctly identify the domain of validity for a claim, the present work would be strengthened by "building a fence" around these edge cases, either by identifying the comprehensive set of such edge cases and explicitly prohibiting them in a stated assumption set, or by pointing out how the existing assumptions already exclude them.  

      We thank the referee for bringing to our attention these edge cases that indeed violate our theorem as stated. In the revised manuscript we have “built a fence” around these edge cases by adding two requirements to the premise of our theorem: First, we have added the requirement that the degradation rate does not decay to zero for any possible realization. That is, if beta(t) is the degradation rate of X and Y for a particular cell over time, then taking the time average of beta(t) over all time must be non-zero. Second, we have added the requirement that the system has evolved for enough time such that the dual reporter averages <x> and <y>, along with the covariances Cov(x, z_{k}) and Cov(y, z_{k}) have reached a time-independent stationary state.  

      With these requirements, no assumptions need to be made about the initial conditions of the system, because any differences in the initial conditions will decay away as the system reaches stationarity. For instance, the referee’s example (1) is not possible with these requirements because beta(t) can no longer remain zero. Additionally, example (2) is no longer possible because the time average of the degradation rate would be zero, which is no longer allowed (i.e., the integral from 0 to T of b(0)exp(-t) dt, divided by T, goes to 0 as T goes to infinity). 
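      The time average for example (2) can be written out explicitly. This is a short worked step using b(t) = b(0)e^{-t} from the referee's example; the overbar notation for the time average is introduced here for illustration only.

```latex
\bar{b}
  = \lim_{T\to\infty} \frac{1}{T}\int_0^T b(0)\,e^{-t}\,dt
  = \lim_{T\to\infty} \frac{b(0)\left(1 - e^{-T}\right)}{T}
  = 0 .
```

      The numerator stays bounded by b(0) while the denominator grows without limit, so the average vanishes and the example falls outside the revised premise.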

      Note that adding the condition that degradation cannot decay to exactly zero does not reduce the biological applicability of the theorem. But, as the referee correctly points out, any mathematical theorem needs to be accurately stated and stand on its own regardless of whether biological systems could realize particular edge cases. Also note that the requirement that the cellular ensemble has reached a time-independent distribution of cell-to-cell variability can be (approximately) verified experimentally by taking snapshots of ensemble variability at two sufficiently separated moments in time. 

      In response to the referee’s comment, we have added the above requirements when stating the theorem in the main text. We have also added the requirement of non-decay of the degradation rate to the definition of the system in SI Sec. 4, along with the stationarity requirement in theorem 1 in SI Sec 5. We have also added mathematical details to the proof of the invariant in SI Sec 5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. They propose and experimentally demonstrate the utility of this idea with a synthetic reporter system in bacteria. 

      The paper is well written and clearly outlines the principle, the mathematical invariant relationship both to give the reader an intuitive understanding of why the relationship must be true and in their mathematical derivation of the proof of Theorem 1. 

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. However, the practical utility of this method may not be straightforward and potentially be quite difficult to execute. We think this work could offer a platform to advance the field of network inference, but would encourage the authors to address the following comments. 

      We thank the reviewer for the positive comments on readability, summarizing the value of our work, as well as the critical comments below that helped us improve the manuscript.

      Major comments: 

      (1) Although the invariant identity seems theoretically sound, the data from synthetic engineered circuits in this manuscript do not support that the invariant holds for natural causal relations between genes in wild-type cells. In all the positive control synthetic circuits (numbers 1 to 4) the target gene Z i.e. RFP was always on the plasmid, and in circuit #4 there was an additional endogenous copy. The authors recapitulate the X-to-Z causality in circuits 1, 2, and 3 but not 4. Ultimately, the utility of this method lies in the ability to capture causality from endogenous correlations, this observation suggests that the method might not be useful for that task. 

      We thank the referee for their careful reading of our synthetic circuits and sincerely apologize for an error in our description of circuit #4 in the schematic of Table S2 of the supplement. We incorrectly stated that this circuit contained a chromosomally expressed RFP. In fact, in circuit #4 RFP was only on the plasmid just like in the circuits #1-3. We have corrected the schematic in the revised manuscript and have verified that the other circuits are correctly depicted.

      In the revised manuscript, we now explicitly spell out that all our “positive control” test cases had the genes of interest expressed on plasmids, and that we have not shown that our method successfully detected causal interactions in a chromosomally encoded gene regulatory circuit, see additional statements in Sec. “Causally connected genes that break the invariant” on p. 6. 

      In the absence of any explicit experimental evidence, it is then important to consider whether chromosomally encoded circuits are expected to cause problems for our method which is based on a fluctuation test. Due to plasmid copy number fluctuations, X and Z will fluctuate significantly more when expressed on plasmids than when expressed chromosomally. However, because this additional variability is shared between X and Z it does not help our analysis which relies on stochastic differences in X and Z expression due to “intrinsic noise” effects downstream of copy number fluctuations. The additional “extrinsic noise” fluctuations due to plasmid copy number variability would wash out violations of Eq. (2) rather than amplify them. If anything, we thus expect our test cases to have been harder to analyze than endogenous fluctuations. This theoretical expectation is indeed borne out by numerical test cases presented in the revised supplement where plasmid copy fluctuations severely reduced the violations of Eq. 2, see new additional SI Sec. 15. 

      Additionally, the case of the outlier circuit (number 12) suggests that exogenous expression of certain genes may lead to an imbalance of natural stoichiometry and lead to indirect effects on target genes which can be misinterpreted as causal relations. Knocking out the endogenous copy may potentially ameliorate this issue but that remains to be tested. 

      We agree with the referee that the expression of exogenous genetic reporters can potentially affect cellular physiology and lead to undesired effects. In the revised manuscript we now explicitly spell out that the metabolic burden or the phototoxicity of introducing fluorescent proteins could in principle cause artificial interactions that do not correspond to the natural gene regulatory network, see Sec. “Proposed additional tests” on p. 8.

      However, it is also important to consider that the test circuit #12 represents a synthetic circuit with genes that were expressed at extremely high levels (discussed in 3rd paragraph of Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8), which led to the presumed cellular burden. Arguably, natural systems would not typically exhibit such high expression levels, but importantly even if they did, our method does not necessarily rely on fluorescently tagged proteins but can, in principle, also be applied to other methods such as transcript counting through sequencing or in-situ hybridization of fluorescent probes.  

      Ultimately, the value of this manuscript will be greatly elevated if the authors successfully demonstrate the recapitulation of some known naturally existing causal and non-causal relations. For this, the authors can choose any endogenous gene Z that is causally controlled by gene X. The gene X can be on the exogenous plasmid along with the reporter and the shared promoter. Same for another gene Z' which is not causally controlled by gene X. Potentially a knockout of endogenous X may be required but it might depend  on what genes are chosen. 

      If the authors think the above experiments are outside the scope of this manuscript, they should at least address these issues and comment on how this method could be effectively used by other labs to deduce causal relations between their favorite genes. 

      Because a full analysis of naturally occurring gene interactions was beyond the scope of our work, we agree with the referee’s suggestion to add a section to discuss the limitations of our experimental results. In the revised manuscript we reiterate that additional investigations are needed to show that the method works to detect causal interactions between endogenous genes, see Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 9). In the original manuscript we explicitly spelled out how other researchers can potentially carry out this further work in the subsections titled “Transcriptional dual reporters” (p. 3) and “Translational dual reporters” (p. 3). In the revised manuscript, we have added a section “Proposed additional tests” (p. 8) in which we propose an experiment analogous to the one proposed by the referee above, involving an endogenous gene circuit found in E. coli, as an example to test our invariant. 

      (2) For a theoretical exposition that is convincing, we suggest the authors simulate a larger network (for instance, a network with >10 nodes), like the one shown schematically in Figure 1, and demonstrate that the invariant relationship holds for the causally disconnected entities, but is violated for the causally related entities. It would also be interesting to see if any quantification for the causal distance between "X" and the different causally related entities could be inferred.  

      We thank the referee for this suggestion. We have added SI Sec. 14 where we present simulation results of a larger network with 10 nodes. We find that all of the components not affected by X satisfy Eq. (2) as they must. However, it is important to consider that we have analytically proven the invariant of Eq. (2) for all possible systems. It provably applies equally to networks with 5, 100, or 10,000 components. The main purpose of the simulations presented in Fig. (2) is to illustrate our results and to show that correlation coefficients do not satisfy such an invariant. However, they are not used as a proof of our mathematical statements.

      We thank the referee for the interesting suggestion of quantifying a “causal distance”. Unfortunately, the degree to which Eq. (2) is violated cannot directly equate to an absolute measure for the “causal distance” of an interaction. This is because both the strength of the interaction and the size of the stochastic fluctuations in X affect the degree to which Eq. (2) is violated. The distance from the line should thus be interpreted as a lower bound on the causal effect from X to Z because we do not know the magnitude of stochastic effects inherent to the expression of the dual reporters X and Y. While the dual reporters X and Y are identically regulated, they will differ due to stochastic fluctuations. Propagation of these fluctuations from X to Z is what creates an asymmetry between the normalized covariances. In the most extreme example, if X and Y do not exhibit any stochastic fluctuations we have x(t)=y(t) for all times and Eq. (2) will not be violated even in the presence of a strong causal link from X to Z.
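      The dependence on both interaction strength and intrinsic noise can be illustrated with a deliberately simple static toy model. This is not the stochastic reaction network analyzed in the manuscript: it is a linear snapshot model chosen for illustration, and it assumes the normalized covariance has the form eta_ab = Cov(a, b) / (<a><b>). All names and parameter values are hypothetical. In this toy model the asymmetry eta_xz - eta_yz is proportional to the coupling times the variance of the intrinsic noise in X.

```python
import random
from statistics import fmean

def eta(a, b):
    """Normalized covariance Cov(a, b) / (<a><b>), as assumed in this sketch."""
    ma, mb = fmean(a), fmean(b)
    return fmean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)]) / (ma * mb)

def asymmetry(coupling, intrinsic_sd, n=20000, seed=0):
    """eta_xz - eta_yz for a static linear toy model of a dual-reporter circuit."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        u = rng.gauss(10.0, 1.0)               # shared upstream signal
        x = u + rng.gauss(0.0, intrinsic_sd)   # active component X
        y = u + rng.gauss(0.0, intrinsic_sd)   # passive reporter Y
        z = 20.0 + u + coupling * x + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return eta(xs, zs) - eta(ys, zs)

no_link = asymmetry(coupling=0.0, intrinsic_sd=1.0)       # no X -> Z effect
weak_noise = asymmetry(coupling=1.0, intrinsic_sd=1.0)    # link, modest noise
strong_noise = asymmetry(coupling=1.0, intrinsic_sd=2.0)  # same link, more noise
no_noise = asymmetry(coupling=1.0, intrinsic_sd=0.0)      # link, but x == y
print(no_link, weak_noise, strong_noise, no_noise)
```

      The same coupling gives a larger asymmetry when the intrinsic noise is larger and none at all when x and y are identical, which is why the size of a violation alone does not fix an absolute causal strength.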

      However, it might be possible to infer a relative causal distance to compare causal interactions within cells.

      That is, in a given network, the normalized covariances between X, Y and two other components of interest Z1, Z2 that are affected by X can be compared. If the asymmetry between (η_xz1, η_yz1) is larger than the asymmetry between (η_xz2, η_yz2), then we might be able to conclude that X affects Z1 with a stronger interaction than the interaction from X to Z2, because here the intrinsic fluctuations in X are the same in both cases. 

      In response to the referee’s comment and to test the idea of a relative causal distance, we have simulated a larger network made of 10 components. In this network, X affects a cascade of components called Z8, Z9, and Z10, see the additional SI Sec. 14. Here the idea of a causal distance can be defined as the distance down the cascade: Z8 is closest to X and so has the largest causal strength, whereas Z10 has the weakest. Indeed, simulating this system we find that the asymmetry between η_xz8 and η_yz8 is the largest whereas that between η_xz10 and η_yz10 is the smallest. We also find that all of the components not affected by X have normalized covariances that satisfy Eq. (2). This result suggests that the relative causal distance or strength in a network could potentially be estimated from the degree of the violations of Eq. (2). 

      However, we note that these are preliminary results. In the case of the specific regulatory cascade now considered in SI Sec. 14, the idea of a causal distance can be well defined. Once feedback is introduced into the system, this definition may no longer make sense. For instance, consider the same network that we simulate in SI Sec. 14, but where the most downstream component in the cascade, Z10, feeds back and affects X and Y. In such a circuit it is unclear whether Z8 or Z10 is “causally closer” to X. A more thorough theoretical analysis, equipped with a more universal quantitative definition for causal distance or strength, would be needed to deduce what information can be inferred from the relative distances in the violations of Eq. (2). While this defines an interesting research question, answering it goes beyond the scope of the current manuscript. 

      Minor comments: 

      - The method relies on the gene X and the reporter Y having the same control which would result in similar dynamics. The authors do not quantitatively compare the YFP and CFP expression if this indeed holds for the synthetic circuits. It would be useful to know how much deviation between the two can be tolerated while not affecting the outcome. 

      We thank the referee for their comment. The invariant of Eq. (2) is indeed guaranteed to hold only when the transcription rate of Y is proportional to that of X. How much levels of X and Y covary depends on the stochastic effects intrinsic to the expression of the dual reporters as well as how similar the transcriptional control of X and Y is. The stochastic difference between X and Y is exactly what we exploit. 

      However, in the limit of high YFP and CFP levels, intrinsic fluctuations that cause stochastic expression differences between X and Y become negligible and we can directly infer whether they are indeed tightly co-regulated from time-traces: Below, we show two single cell traces taken with our experimental setup in which the YFP and CFP fluorescence trajectories are almost exactly proportional. Both of these traces are from circuit #10 as defined in Table S4. 

      Author response image 1.

      We chose the above traces because they showed the highest correlation between YFP and CFP levels. Other traces for lower expression levels have lower correlations due to effects of intrinsic noise (see Tables S2-S4). However, the existence of one trace in which YFP is almost perfectly proportional to CFP throughout can only occur if the YFP and CFP genes are under the same control. And, since the control of the YFP and CFP genes is identical in all of our synthetic circuits (with the same promoters and plasmid positions), these data strongly suggest that our dual reporters are tightly co-regulated in all the synthetic circuits. Moreover, the negative control experiments presented in Fig. 3E provide a natural consistency check that the YFP and CFP are under the same control and satisfy Eq. (1).

      We agree that it would be useful to know how much the X and Y production rates can differ for Eq. (2) to hold. Importantly, our proven theorem already allows for the rates to differ by an unspecified proportionality constant. In response to the referee’s comment we have derived a more general condition under which our approach holds. In the newly added SI Sec. 7 we prove that Eq. (2) holds also when rates differ as long as the difference is stochastic in nature with an average of zero. We also prove that Eq. (2) holds in the face of multiplicative noise that is independent of the X and Y production rates.

      However, the production rates of X and Y cannot differ in all ways. Some types of differences between the X and Y production rates can lead to deviations of Eq. (2) even when there is no causal interaction. To highlight this, we added the results of simulations of a toy model in which the X and Y production rates differ by an additive noise term that does not average to zero, see Fig. S19B of the newly added SI Sec. 7.

      - The invariant should potentially hold true for any biological species that are causally related e.g. protein-protein interactions. Also, this method could potentially find many applications in eukaryotic cells. Although it's outside the scope of current work to experimentally demonstrate such applications, the authors should comment on experimental strategies to apply this method to overcome potential pitfalls (e.g. presence of enhancers in eukaryotic cells). 

      We thank the referee for this suggestion. We agree that there are potential pitfalls that could come into play when our proposed approach is applied to more complex systems such as eukaryotic gene expression. In response to the referee’s comment, we have added an explicit discussion of these potential pitfalls in the discussion section “Limitations of this study” (see p. 10). 

      In particular, in eukaryotes there are many genes for which promoter sequences may not be the sole factor determining transcription rates. Other factors that can be involved in gene regulation include the presence of enhancers, epigenetic modifications, and bursts in gene expression, to name a few. We thus propose a few strategies, which include positioning the passive reporter at a similar gene locus as the gene of interest, measuring the gene regulation activities of the gene of interest and its passive reporter using a separate method, and exploiting the invariant with a third gene, for which it is known that there is no causal interaction, as a consistency check. In addition, we include a new section in the SI (SI Sec. 8) which shows that the invariant holds in the face of many types of bursty gene expression dynamics.

      However, the above is not a comprehensive list. Some of the issues the referee mentions are serious and may not be straightforward to overcome. We now spell this out explicitly in the revised manuscript (p. 10). 

      - In the legend of Fig. 1, the sentence "Data points here are for..." is missing a few words, or needs to be rephrased. 

      We thank the referee for this comment. We have rewritten the figure caption, which now reads “Data points are numerical simulations of specific example networks (see SI for details) to illustrate the analytically proven theorem of Eq. 2.”

      - Fig. 2 talks about the uncertainties associated with each point on the scatter plots. However, it is difficult to understand the quantification in such a plot. It would be great to have a plot quantifying the uncertainties in the invariant relation for the different topologies studied, specifically in order to understand if one topology is consistently deviating more from the x=y line than the other topologies studied here.  

      We thank the referee for this suggestion. In the supplement of the revised manuscript we have added supplemental Figs. S3, S4, and S5 to separately quantify the uncertainty of the different processes plotted in Fig. 2 and have added a new section (SI Sec. 11) to discuss the processes simulated in Fig. 2 in more detail. In short, each simulated process generated at most ~5% outliers when considering 95% confidence intervals (with the maximum percentage being 5.01% for process 5, see Fig. S5). These outliers were then re-simulated over a larger number of simulations to reduce the sampling error, which resulted in 0% outliers (see Sec. “Confidence intervals for finite sampling error” in Materials and Methods on p. 11). Some simulated processes generated larger percentage errors in the normalized covariances than others, but this is expected as different processes have different dynamics which will result in different degrees of sampling of the underlying distributions.

      Note that the invariant of Eq. 2 is analytically proven for all tested topologies, as none of the topologies include a causal effect from X to Z. Any deviation of the numerical data from the straight-line prediction of Eq. 2 (right column in Fig. 2C) is due to the finite sampling of a stochastic process, in which the true covariance is estimated from the sample covariance. Any given parameter set was simulated several times, which allowed us to estimate the sampling error from differences between repeated samples. In the additional SI figures we now quantify this error for the different topologies. 

      In addition to the above changes we want to highlight that the purpose of the simulations presented in Fig. (2) is not to prove our statements or explore the behavior of different topologies. The purpose of the data presented in the right column of Fig. 2C is to illustrate the theoretical invariant and act as a numerical sanity check of our analytically proven result. In contrast, the data in the left column of Fig 2C illustrates that the correlations do not satisfy an invariant like Eq. 2 which applies to covariances but not correlations.  

      - The legend for Fig. 3 seems to end abruptly. There likely needs to be more.  

      We thank the referee for catching this mistake. We have corrected the accidentally truncated figure caption of Fig. 3.

      - There is a typo in equation (5.3) on page 23 of supplementary material, there should be x instead of y in the degradation equation of x. 

      We thank the referee for catching this mistake which has been corrected in the revised manuscript.

      - In the supplemental material, to understand the unexpected novel discovery of causality, Figure S5 is presented. However, this doesn't give the context for other negative controls designed, and the effect of rfp dynamics (which can be seen in the plots both in the main paper and the supplement) in the growth rate of cells in those constructs. As a baseline, it would be nice to have those figures.  

      We thank the referee for this suggestion. We have now included representative RFP traces with the growth rates for other negative control circuits, see Fig. S10. In addition, we have now included the cross correlation functions between RFP and growth rate in these negative control circuits, see Fig. S10A. While in all cases, RFP and growth rate are negatively correlated, the outlier circuit exhibits the largest negative correlation.

      The comparison suggested by the referee thus highlights that, in isolation, a negative correlation between RFP and growth rate is only weak evidence for our hypothesized causal interaction, because negative correlations can result from growth rate affecting volume dilution and thus RFP concentration. Crucially, we thus additionally considered the overall variability of growth rate and found that the outlier circuit has the largest growth rate variability, which is indicative of something affecting the growth rate of those cells, see Fig. S10B. Comparing the magnitude of RFP variability against other strains requires constraining the comparison group to other synthetic circuits that have RFP located on the chromosome rather than on a plasmid. This is why we compare the CV of the outlier with the CV of circuit #5, which corresponds to the “regular” repressilator (i.e., the outlier circuit without the endogenous lacI gene). As an additional comparison, we computed the CV for a strain of E. coli that does not contain a synthetic plasmid at all, but still contains the RFP gene on the chromosome. We find the CVs in the outlier circuit to be larger than in these two additional circuits, suggesting that the outlier circuit causes additional fluctuations in RFP and growth rate. We now spell this out explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8).

      The referee is correct that the above arguments are only circumstantial evidence, but they do show that the data is consistent with a plausible explanation of the hypothesized causal interaction. Our main evidence for an RpoS mediated stress response that explains the deviations from Eq. 2 in the outlier circuit is the perturbation experiment in which the deviation disappears for the RpoS knockout strain. We now spell out this argument explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8).

      Reviewer #2 (Recommendations For The Authors): 

      The proof of theorem 1 relies on an earlier result, lemma 1. Lemma 1 only guarantees the existence of a "dummy" system that satisfies the separation requirement and preserves the dynamics of X and Y. However, in principle, it may be possible to maintain the dynamics of X and Y while still changing the relationship between Cov(X,Zk) and Cov(Y,Zk). This could occur if the dynamics of Zk differ in a particular way between the original system and the dummy system. So lemma 1 needs to be a little stronger- it needs  to mention that the dynamics of Zk are preserved, or something along these lines. The proof of lemma 1 appears to contain the necessary ingredients for what is actually needed, but this should be clarified. 

      We agree with the referee that this is an important distinction. Lemma 1 does in fact guarantee that any component Zk that is not affected by X and Y will have the same dynamics in the “dummy” system. However, as the referee points out, this is not stated in the lemma statement nor in the proof of the lemma. In response to the referee’s comment, we have made it clear in the lemma statement that the Zk dynamics are preserved in the “dummy” system, and we have also added details to the proof to show that this is the case, see Lemma 1 on p. 27 of the SI. 

      Readers who are familiar with chemical reaction diagrams, but not birth-death process diagrams may waste some time trying to interpret Equation 1 as a chemical reaction diagram with some sort of rate constant as a label on each arrow (I did this). It may be helpful to either provide a self-contained definition of the notation used, or mention a source where the necessary definitions can be found. 

      We agree with the referee. In the revised manuscript we have added a description of the notation used below Equation 1 of the main text, see p. 2. The notational overloading of the “arrow notation” is a perennial problem in the field and we thank the referee for reminding us of the need to clarify what the arrows mean in our diagrams.

      It would be helpful if the authors could propose a rule for deciding whether dependence is detected or not. As it stands presently, the output of the approach seems to be a chart like that in Figure 3D where you show eta_xz and eta_yz with confidence interval bars and the reader must visually assess whether the points more-or-less fall on the line of unity. It would be better to have some systematic procedure for making a "yes or no" call as to whether a causal link was detected or not. Having a systematic detection rule would allow you to make a call as to whether dependence in circuit 3 was detected or not. It would also allow you or a future effort to evaluate the true positive rate of the approach in simulated settings. 

      We thank the referee for this suggestion. In the revised manuscript we have added an explicit rule for detecting causality using the invariant of Eq. (2). Specifically, Eq. (2) can be re-written as r = 1, where r is the covariability ratio r = etaXZ/etaYZ. Given the 95% confidence interval for the experimentally determined covariability ratio r, we say that there is a causal interaction if the confidence interval does not overlap with the value r = 1. 

      This corresponds to a null hypothesis test at the 2.5% significance level. The reason that it is at 2.5% significance and not 5% significance is as follows. Let’s say we measure a covariability ratio of r_m, and the 95% confidence interval is [r_m - e_m, r_m + e_m] for some error e_m. Without loss of generality, let’s say that r_m > 1 (the same applies if r_m < 1). This means that Prob(r < r_m - e_m) = 2.5% and Prob(r > r_m + e_m) = 2.5%, where r is the actual value of the covariability ratio. Under the null hypothesis that there is no causal interaction, we set r = 1. However, we now have Prob(1 > r_m + e_m) = 0, because we know that r_m > 1 and so we must have r_m + e_m > 1. The probability that the value of 1 falls outside the error bars is therefore 2.5% under the null hypothesis. 

      This proposed rule is the same rule that we used to detect statistical outliers in our simulations, where we found a “false positive” rate of 2.3% over 6522 simulated systems due to statistical sampling error (as discussed in the Materials and Methods section). In response to the referee’s suggestion, we have added the section “A rule for detecting causality in the face of measurement uncertainty” (p. 4). We also apply the rule to the experimental data and find that the rule detects 2/4 causal interactions in Fig. 3D. We have clarified this in the Fig. 3D caption, in the main text, and we have added a figure in the SI (Fig. S2) where we apply the null hypothesis test on the measured covariability ratios. 

      Note that whether the third interaction is “detected” or not depends on the cut-off value used. We picked the most common 95% rule to be consistent with traditional statistical approaches. With this rule one of the data points lies right at the cusp of detection, but ultimately falls into the “undetected” category if a strictly binary answer is sought under the above rule. 
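
      A decision rule of this kind can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code used for the manuscript: the normalized covariance is assumed here to be Cov(a, b)/(<a><b>), the confidence interval is obtained by bootstrapping over cells (an assumption; any other 95% interval could be substituted), and the function names are hypothetical. A causal interaction is flagged only when the interval for r excludes 1.

```python
import numpy as np

def normalized_cov(a, b):
    """Normalized covariance Cov(a, b) / (<a><b>) (assumed definition of eta)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.mean((a - a.mean()) * (b - b.mean())) / (a.mean() * b.mean())

def covariability_ratio_ci(x, y, z, n_boot=2000, seed=0):
    """Bootstrap 95% interval for r = eta_xz / eta_yz (hypothetical helper).

    x, y, z are per-cell measurements of the gene of interest, its passive
    reporter, and the candidate downstream component.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    rng = np.random.default_rng(seed)
    n = len(x)
    ratios = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cells with replacement
        ratios[i] = normalized_cov(x[idx], z[idx]) / normalized_cov(y[idx], z[idx])
    ci_low, ci_high = np.percentile(ratios, [2.5, 97.5])
    return ci_low, ci_high

def causal_link_detected(ci_low, ci_high):
    """Flag a causal interaction only when the interval excludes r = 1."""
    return not (ci_low <= 1.0 <= ci_high)
```

      With such a helper, the true and false positive rates of the rule could in principle be estimated by applying it to simulated systems with and without a causal link.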

      It would be helpful to mention what happens when the abundance of a species hits zero. Specifically, there are two ways to interpret the arrow from X to X+d with a W on top: 

      Interpretation (1): 

      P(X+d | X) = W if X+d ≥ 0
      P(X+d | X) = 0 if X_i+d_i < 0 for at least one i 

      Interpretation (2): 

      P(X+d | X) = W regardless of whether X+d < 0
      W = 0 whenever X_i < d_i for at least one i 

      Interpretation (1) corresponds to a graph where the states are indexed on the non-negative integers. Interpretation (2) corresponds to a graph where the states are indexed on the integers (positive or negative), and W is responsible for enforcing the non-negativity of mass. I believe you need the second interpretation because the first interpretation leads to problems with your definition of causality. For example, consider the reaction: 

      (Na, K) -- 0.1 --> (Na-1, K+1) 

      This could occur if Na and K are the intracellular concentrations of sodium and potassium ions in a cell that has an ATP-driven sodium-potassium exchanger whose rate is limited by the frequency with which extracellular potassium ions happen to flow by. Per the definition of causality found in the appendix, Na has no causal effect on K since Na does not show up in the reaction rate term. However, under interpretation (1), Na clearly has a causal effect on K according to a reasonable definition of causality because if Na=0, then the reaction cannot proceed, whereas if Na>0 then it can. However, under interpretation (2), the reaction above cannot exist and so this scenario is excluded. 

      We thank the referee for this comment that helped us clarify the meaning of arrows with propensities. In short, interpretation (2) corresponds to the definition of our stochastic systems. This is consistent with the standard notation used for the chemical master equation. As the referee points out, because molecular abundances cannot be negative, any biochemical system must then have the property that the propensity of a reaction must be equal to zero when the system is in a state in which an occurrence of that reaction would take one of the abundances to negative numbers. Stochastic networks that do not have this property cannot correspond to biochemical reaction networks.

      In the revised manuscript, we now spell this out explicitly to avoid any confusion, see SI page 25.
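
      The property described above (a propensity must vanish in any state from which the reaction would produce a negative abundance) can be checked mechanically on a small grid of states. The sketch below is illustrative only: the function name, the grid size, and the “gated” variant of the referee’s Na/K example are hypothetical and are not taken from the manuscript.

```python
from itertools import product

def violates_nonnegativity(propensity, step, max_count=5):
    """List states x (each count in 0..max_count) where the reaction has a
    non-zero propensity even though x + step has a negative component."""
    bad = []
    for x in product(range(max_count + 1), repeat=len(step)):
        nxt = [xi + di for xi, di in zip(x, step)]
        if min(nxt) < 0 and propensity(x) != 0:
            bad.append(x)
    return bad

# Referee's example: (Na, K) -> (Na-1, K+1) with a constant rate of 0.1.
def constant_rate(state):
    return 0.1

# Illustrative variant whose rate drops to zero once no Na is left.
def gated_rate(state):
    return 0.1 if state[0] > 0 else 0.0

step = (-1, +1)  # change in (Na, K) per reaction event
```

      Under these assumptions, the constant-rate reaction is flagged in every state with Na = 0, whereas the gated variant passes the check.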

      Furthermore, we now discuss the referee’s example in which the rate of exchanging Na for K through an ion exchanger is approximately independent of the intracellular Na concentration. Because molecular abundances cannot become negative, the rate cannot be truly constant: at sufficiently low concentrations it must decrease until it becomes exactly zero for zero molecules. 

      Importantly, agreement with Eq. (2) does not imply that there is no causal effect from X to Zk. It is the deviation from Eq. (2) that implies the existence of a causal effect from X to Zk. Therefore, although the above referee’s example would constitute a causal interaction in our framework, it would not lead to a deviation of Eq. (2) because the fluctuations in Na (which we exploit) do not propagate to K. From a practical point of view, our method thus detects whether changing X over the observed range affects the production and degradation rates of Zk. 

      In the course of setting up the negative control benchmark circuits, a perturbation-based causal validation would be nice. For instance, first, verify that X does not affect Z by intervening on X (e.g. changing its copy number or putting it under the control of an inducible promoter), and ensuring that Z's activity is not affected by such interventions upon X. This approach would help to adjudicate questions of whether the negative control circuits actually have an unknown causal link. The existing benchmark is already reasonably solid in my view, and I do not know how feasible this would be with the authors' setup, but I think that a perturbation-based validation could in principle be the gold standard benchmark.  

      We agree that additional perturbation-based validation tests on all of the negative control circuits would improve the evidence that our method worked as advertised. While such experiments are beyond the scope of our current work, we now explicitly point out the benefits of such additional controls in the revised Discussion.

      Below is a series of comments about typography, mostly about section 4 of the supplement. 

      We thank the referee for their careful reading and highlighting those mistakes.

      At the bottom of page 21, Z_aff is defined as the set of components that are affected by X. However, later Z_aff seems to refer to components affected by X or Y. For instance, in the proof of lemma 1, it is written "However, because a is part of z_aff, the {ak} variables must be affected by X and/or Y." 

      We thank the referee for catching this mistake. We have changed the definition of Z_aff throughout the supplement to refer to components affected by X or Y. If it can be experimentally ensured that Y is a passive reporter (i.e., it does not affect other components in the cell), then the theorem can only be violated if X affects Z. 

      In the equation following Eq 5.2, W_k and d_k should be W_i and d_i ?  

      Yes, the referee is correct. In the revised manuscript we have corrected W_k and d_k to W_i and d_i. 

      In Eq 5.3 in the lower-left transition diagram, I think a "y" should be an "x". 

      Yes, the referee is correct. In the revised manuscript we have fixed this typo.

      In the master equation above Eq 5.5, the "R" terms for the y reactions are missing the alpha term, and I think two of the beta terms need to be multiplied by x and y respectively.  

      The referee is correct. In the revised manuscript we have fixed this typo.

      The notation of Eq 5.8, where z_k(t) is the conditional expectation of z_kt, is strange and difficult to follow. Why does z_k(t) not get a bar over it like its counterparts for x, y, R, and beta? The bars, although not a perfect solution, do help.  

      We agree with the referee’s comment and have added further explanations to define the averages in question, see SI p. 28. In short, when we condition on the history of the components not affected by X or Y, we in effect condition on the time trajectories of z_{k} (when it is part of the components not affected by X and/or Y) and beta (since it only depends on the components not affected by X or Y). We thus previously did not include the bars when taking the averages of these components in the conditional space because the conditioning in effect sets their time-trajectories (so they become deterministic functions of time). In the revised manuscript we now also denote these conditional expectations with bars and we have added comments to the proof to clarify their definition.

      I think it would be helpful to show how the relationship <x>=<y>/alpha is obtained from Eq 5.5.  

      We agree with this suggestion and have added the derivations, see Eqs. (5.9) - (5.13) in the revised SI. 

      In the main text, the legend of Fig 3 cuts off mid-sentence.  

      We thank the referee for catching this mistake which has been fixed in the revised manuscript.

    1. eLife Assessment

      This important study provides compelling data from in vitro models and patient-derived samples to demonstrate how modulation of GSK3 activity can reprogram macrophages, revealing potential therapeutic applications in inflammatory diseases such as severe COVID-19. The study stands out for its clear and systematic presentation, convincing experimental approach, and the relevance of its findings to the field of immunology.

    2. Reviewer #1 (Public review):

      The manuscript by Rios et al. investigates the potential of GSK3 inhibition to reprogram human macrophages, exploring its therapeutic implications in conditions like severe COVID-19. The authors present convincing evidence that GSK3 inhibition shifts macrophage phenotypes from pro-inflammatory to anti-inflammatory states, thus highlighting the GSK3-MAFB axis as a potential therapeutic target. Using both GM-CSF- and M-CSF-dependent monocyte-derived macrophages as model systems, the study provides extensive transcriptional, phenotypic, and functional characterizations of these reprogrammed cells. The authors further extend their findings to human alveolar macrophages derived from patient samples, demonstrating the clinical relevance of GSK3 inhibition in macrophage biology.

      The experimental design is sound, leveraging techniques such as RNA-seq, flow cytometry, and bioenergetic profiling to generate a comprehensive dataset. The study's integration of multiple model systems and human samples strengthens its impact and relevance. The findings not only offer insights into macrophage plasticity but also propose novel therapeutic strategies for macrophage reprogramming in inflammatory diseases.

      Strengths:

      (1) Robust Experimental Design: The use of both in vitro and ex vivo models adds depth to the findings, making the conclusions applicable to both experimental and clinical settings.

      (2) Thorough Data Analysis: The extensive use of RNA-seq and gene set enrichment analysis (GSEA) provides a clear transcriptional signature of the reprogrammed macrophages.

      (3) Relevance to Severe COVID-19: The study's focus on macrophage reprogramming in the context of severe COVID-19 adds clinical significance, especially given the relevance of macrophage-driven inflammation in this disease.

      Weaknesses:

      There are no significant weaknesses in the study.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      The manuscript by Rios et al. investigates the potential of GSK3 inhibition to reprogram human macrophages, exploring its therapeutic implications in conditions like severe COVID-19. The authors present convincing evidence that GSK3 inhibition shifts macrophage phenotypes from pro-inflammatory to anti-inflammatory states, thus highlighting the GSK3-MAFB axis as a potential therapeutic target. Using both GM-CSF- and M-CSF-dependent monocyte-derived macrophages as model systems, the study provides extensive transcriptional, phenotypic, and functional characterizations of these reprogrammed cells. The authors further extend their findings to human alveolar macrophages derived from patient samples, demonstrating the clinical relevance of GSK3 inhibition in macrophage biology.

      The experimental design is sound, leveraging techniques such as RNA-seq, flow cytometry, and bioenergetic profiling to generate a comprehensive dataset. The study's integration of multiple model systems and human samples strengthens its impact and relevance. The findings not only offer insights into macrophage plasticity but also propose novel therapeutic strategies for macrophage reprogramming in inflammatory diseases.

      Strengths:

      (1) Robust Experimental Design: The use of both in vitro and ex vivo models adds depth to the findings, making the conclusions applicable to both experimental and clinical settings.

      (2) Thorough Data Analysis: The extensive use of RNA-seq and gene set enrichment analysis (GSEA) provides a clear transcriptional signature of the reprogrammed macrophages.

      (3) Relevance to Severe COVID-19: The study's focus on macrophage reprogramming in the context of severe COVID-19 adds clinical significance, especially given the relevance of macrophage-driven inflammation in this disease.

      Weaknesses:

      There are no significant weaknesses in the study, though some minor points could be addressed for clarity and completeness, as outlined in the recommendations below.

      Many thanks for these comments. Please find below the response to the specific recommendations.

      Recommendations for the authors:

      (1) In lines 263-266, the term "MoMac-VERSE" and its associated clusters are introduced without sufficient explanation. The authors should provide additional clarification on what these clusters represent and how they were derived.

      We have revised the text according to the reviewer’s suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Qhi_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (2) In line 283, the reference labeled "2227" appears incorrect. It seems to be a formatting issue, and it might refer to references 22-27. Please verify and correct.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (3) In line 353, the reference is incorrect. Please review and ensure that all references are properly cited throughout the manuscript.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (4) In line 368, one of the patient samples shows a decreased IL-10 response after CHIR treatment. The authors should acknowledge the heterogeneity in the primary cell responses and adjust the conclusion accordingly to reflect this variability.

      We have modified the text following the reviewer’s comment, and acknowledge the heterogeneity in the production of IL-10 after GSK3 inhibition in the three analyzed samples. The modified text now states: "Consistent with these findings, CHIR-AMØ exhibited higher expression of MAFB (Figure 6F) whose increase correlated with an augmented secretion of Legumain, CCL2 and IL-10 (Figure 6G), although the latter was only seen in two samples, probably reflecting heterogeneity in primary cell responses."

      (5) Figure 7B: the UMAP shows 4 populations, but according to the visualization in the sup fig 3, there should be many more clusters. How do the authors explain this? Are these patient-specific clusters? Also, IMs can be separated into at least subpopulations. Can the authors plot also bona fide macrophage markers expressed by all subpopulations?

      To clarify this whole issue, and to avoid a misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000; and % of mitochondrial genes < 15%]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.
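
      For illustration, the filtering thresholds quoted above can be written as a simple boolean mask over per-cell quality metrics. This is a minimal sketch assuming the three metrics are available as plain arrays; the function and argument names are hypothetical, and it is not the pipeline used for the manuscript.

```python
import numpy as np

def qc_mask(n_feature, n_count, pct_mito):
    """Return True for cells passing the quoted thresholds:
    200 < nFeature < 6000, nCount > 1000, mitochondrial genes < 15%."""
    n_feature = np.asarray(n_feature)
    n_count = np.asarray(n_count)
    pct_mito = np.asarray(pct_mito)
    return (
        (n_feature > 200)
        & (n_feature < 6000)
        & (n_count > 1000)
        & (pct_mito < 15)
    )

# Example: only the second cell passes all three criteria.
keep = qc_mask(
    n_feature=[150, 3000, 3000, 7000, 3000],
    n_count=[5000, 5000, 500, 5000, 5000],
    pct_mito=[5, 5, 5, 5, 20],
)
```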

      Addressing the first question, the UMAPs in old Figure 7B and old Supplementary Figure 3B had a different number of clusters because old Figure 7B was derived from old Supplementary Figure 3B after grouping macrophage clusters according to the expression of previously defined markers, in order to limit the weight of donor-specific clusters. Specifically, the macrophage clusters from old Figure 7B were re-grouped according to the differential expression of:

      - FCN1 (including cluster 4, 7 and 12 from Figure 7B): Infiltrating monocytes.

      - FABP4 and TYMS-negative (including clusters 0, 2, 5 and 13 from Figure 7B), or MARCO and INHBA (cluster 9 from Figure 7B) or PPARG (cluster 11 from Figure 7B): Alveolar macrophages (AMØ).

      - TYMS, MKI67, TOP2A and NUSAP1 (cluster 15 from Figure 7B): Proliferating AMØ.

      - LYVE1 or RNASE1 or LGMN (including clusters 1, 3, 6, 8, 10 and 14 from Figure 7B): Interstitial Macrophages (IMØ).
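
      The re-grouping listed above amounts to a lookup from cluster number to population. A minimal sketch, using only the cluster numbers and labels given in the list (the variable and function names are hypothetical):

```python
# Population -> original cluster numbers, as listed above.
GROUPS = {
    "Infiltrating monocytes": [4, 7, 12],
    "AMØ": [0, 2, 5, 13, 9, 11],
    "Proliferating AMØ": [15],
    "IMØ": [1, 3, 6, 8, 10, 14],
}

# Invert to a cluster -> population lookup table.
CLUSTER_TO_GROUP = {
    cluster: group for group, clusters in GROUPS.items() for cluster in clusters
}

def regroup(cluster_labels):
    """Map per-cell cluster numbers to the four re-grouped populations."""
    return [CLUSTER_TO_GROUP[c] for c in cluster_labels]
```

      Under this mapping every one of the sixteen original clusters (0 to 15) is assigned to exactly one of the four populations.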

      As the reviewer suggested, this type of UMAP plot yielded a large number of donor-specific clusters. To avoid such a misleading representation, we have now plotted UMAPs after running scVI reduction in every case. The new plots are now shown in new Figure 7A, new Figure 7B, new Supplementary Figure 3 (containing the information of the 21310 single-cell transcriptomes from GSE128033) and the novel Supplementary Figure 4 (with the information of the single-cell transcriptomes from human lung macrophages from GSE128033).

      Finally, to address the last issue, we have now plotted the expression of genes used for macrophage definition (CD163, FABP4, LYVE1, FCN1), as well as proliferation-associated genes (TYMS, MKI67, TOP2A, NUSAP1) and other bona fide macrophage marker genes (SPI1, FOLR2) in Supplementary Figure 4C.

      (6) statistics should be indicated in every figure legend and for every subfigure where applicable.

      We have now included the specific statistical procedure applied for each Figure and panel.

      Reviewer 2 (Public review):

      The study by Rios and colleagues provides the scientific community with a compelling exploration of macrophage plasticity and its potential as a therapeutic target. By focusing on the GSK3-MAFB axis, the authors present a strong case for macrophage reprogramming as a strategy to combat inflammatory and fibrotic diseases, including severe COVID-19. Using a robust and comprehensive methodology, the study conducts broad transcriptomic and functional analyses and offers valuable mechanistic insights while highlighting its clinical relevance.

      Strengths:

      Well performed and analyzed

      Weaknesses:

      Additional analyses, including mechanistic studies, would increase the value of the study

      In an effort to address the comment of the reviewer, we have performed more detailed analysis of the kinetics and dose-response effects of GSK3 inhibition, which are now provided as new Supplementary Figure 3A.

      Regarding additional mechanistic studies, we decided to explore the relationship between inactive GSK3β and MAFB levels at the early stages of M-CSF- or GM-CSF-driven monocyte-to-macrophage differentiation. These experiments, performed in three independent monocyte preparations, indicated that, 48 hours into differentiation, M-CSF promoted a large increase in MAFB expression and a slight (albeit significant) rise in inactive GSK3β (p-Ser9-GSK3β) (compared to either untreated or GM-CSF-treated monocytes), further supporting the macrophage re-programming effect of GSK3. However, since the M-CSF-promoted increase in MAFB levels was much more robust than the enhancement in inactive GSK3β, we hypothesize that proteasomal degradation of MAFB might also differ between M-CSF- (M-MØ) and GM-CSF-dependent (GM-MØ) monocyte-derived macrophages.

      Author response image 1.

      Total GSK3β, p-Ser9-GSK3β and MAFB levels in three preparations of freshly purified monocytes either unstimulated (-) or stimulated with M-CSF (10 ng/ml) or GM-CSF (1,000 U/ml) at different time points, as determined by Western blot (upper panel). Vinculin protein levels were determined as protein loading control. Mean ± SEM of the GSK3β/Vinculin, p-Ser9-GSK3β/Vinculin, and MAFB/Vinculin protein ratios from the three independent experiments are shown (lower panel) (paired Student’s t test: *, p<0.05; ****, p<0.001).

      Based on this finding, we then determined proteasome activity in fully differentiated M-CSF- and GM-CSF-dependent monocyte-derived macrophages. Use of the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) in M-MØ and GM-MØ, either untreated or exposed to the proteasome inhibitor MG132, revealed that immunoproteasomal and proteasomal activity is significantly stronger in GM-MØ than in M-MØ, as demonstrated in assays for chymotrypsin-like (ANW) and branched amino acid preferring (PAL) activity (immunoproteasome), and trypsin-like (KQL) activity (both proteasome and immunoproteasome). This result suggested that, indeed, immunoproteasomal activity might contribute to the differential expression of MAFB in M-MØ and GM-MØ.

      Author response image 2.

      Immunoproteasome activity in M-MØ and GM-MØ, either untreated or exposed to MG132, as determined using the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) on the three indicated peptides (upper panel).  Mean ± SEM of three independent experiments are shown (paired Student’s t test: *, p<0.05) (lower panel).

      Consequently, we next set up experiments to assess whether the proteasome inhibitor MG132 was capable of enhancing the expression of MAFB-dependent genes in GM-MØ. Preliminary results of GM-MØ exposure to MG132 for 6 hours indicated an increase in the expression of MAFB protein and the MAFB-dependent genes LGMN and IL10, as well as a reduction in the expression of the GM-MØ-specific gene CD1C.

      Author response image 3.

      A. Schematic representation of the exposure of MG132 to GM-MØ for 6 hours. B. MAFB protein levels in four independent preparations of GM-MØ exposed to either DMSO (DMSO-GM-MØ) or the proteasome inhibitor MG132 (MG132-GM-MØ) for 6 hours, as determined by Western blot (left panel). GAPDH protein levels were determined as protein loading control. Mean ± SEM of the MAFB/GAPDH protein ratios from the four independent experiments are shown (right panel) (paired Student’s t test: ***, p<0.005). C. Relative mRNA levels of the indicated genes in DMSO-GM-MØ and MG132-GM-MØ, as determined by RT-PCR on seven independent samples (paired Student’s t test: ***, p<0.005; ****, p<0.001).

      Unfortunately, this proteasome inhibitor (MG-132) caused a great reduction in cell viability after 6-8 hours. Since a similar decrease in cell viability was observed upon analysis with the ONX-0914 immunoproteasome inhibitor, we could not proceed any further with this approach.

      Given the reviewer's suggestion to add mechanistic insights to the manuscript, we are now providing these results (and the corresponding figures) only for the reviewer's information and to make clear our attempts to comply with his/her request.

      Recommendations for the authors:

      The results are of interest, and only some minor issues need to be addressed to strengthen the conclusions of the study.

      We gratefully thank the reviewer for his/her comments. 

      (1) This study employs a single dose of 10 μM of the GSK3 inhibitor CHIR-99021 for 48 hours, which is reasonable for in vitro studies. However, further investigation into the effect of different doses and exposure times could provide additional insight into optimal dosing and durability of reprogramming effects. In addition, would alternative GSK3 inhibitors have comparable effects?

      Following the reviewer's suggestion, we have performed a kinetics and dose-response analysis of the effects of CHIR-99021, using MAFB protein levels as a readout. This experiment is now shown in new Supplementary Figure 1A, which replaces the old Supplementary Figure 1A panel where a shorter kinetics was presented. Results of this new experiment indicate a maximal effect of 10µM CHIR-99021, and that the effect of the inhibitor becomes maximal 24-48 hours after treatment. The text has been modified accordingly, and it now states: "Kinetics and dose-response analysis of the effects of CHIR-99021 on MAFB expression showed that maximal protein levels were achieved after a 24-48 hour exposure to 10µM CHIR-99021 (Supplementary Figure 1A), conditions that were used hereafter."

      Regarding the use of alternative GSK3 inhibitors, we had already provided that information in Supplementary Figure 1B, where the effects of SB-216763 (10 µM) or LiCl (10 mM) were evaluated. The huge reversal of the Tyr216/Ser9 GSK3β phosphorylation ratio observed with CHIR-99021 was not seen with other GSK3 inhibitors, as indicated in the text. In any event, we believe that the relevance of this result with SB-216763 or LiCl is minimized by the results generated after siRNA-mediated GSK3 knockdown (shown in Figure 4), that completely reproduced the effects seen with CHIR-99021.

      (2) Why in the "reanalysis of single cell RNAseq data" section, the authors use Seurat v5 (R) but then change to python, and the other way around?

      As indicated in the documentation for Integrative Analysis in Seurat v5 (https://satijalab.org/seurat/articles/seurat5_integration), scVIIntegration requires the reticulate package, which allows us to run a Python environment from within R.

      (3) When the authors refer to the clusters enriched in MoMacVERSE, they use the labels of the clusters (for example #2 or #3). I would suggest using the annotations described in the original paper, to link it to the bibliography published through the labels established in the paper.

      We have revised the text according to the reviewer's suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Qhi_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (4) In line 309. Is there any significance on the "having a stronger effect"?

      We apologize for the misleading sentence. The phrase has been modified for better clarity, and the text now states: "Like CHIR-99021, silencing of both GSK3A and GSK3B augmented the expression of MAFB, with the simultaneous silencing of both GSK3A and GSK3B genes having a stronger effect (Figure 4B), and modulated the expression of 329 genes (Figure 4C,D)."

      (5) In line 337, "(22)(27)", are these references?

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (6) In the single-cell reanalysis, could you please provide integration QC plots? It would be interesting to have them in the paper.

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000; and % of mitochondrial genes < 15 %]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.

      As requested by the reviewer, we are now providing the QC plots for the re-analysis in the new Supplementary Figures 3 and 4.

    1. eLife Assessment

      This important work presents a stochastic branching process model of tumour-immune coevolution, incorporating stochastic antigenic mutation accumulation and escape within the cancer cell population. The authors then used this model to investigate how tumour-immune interactions influence tumour outcome and the summary statistics of bulk and single-cell sequencing data from a tumour. The evidence is currently incomplete: statistical comparisons between the observed mutational burden distribution and theoretical predictions in the absence of immune selection should be carried out. Conclusions should be tested extensively for robustness/sensitivity to parameters.

    2. Reviewer #1 (Public review):

      Summary:

      The topic of tumor-immune co-evolution is an important, understudied topic with, as the authors noted, a general dearth of good models in this space. The authors have made important progress on the topic by introducing a stochastic branching process model of antigenicity/immunogenicity and measuring the proportion of simulated tumors that go extinct. The model is extensively explored, and the authors provide some nice theoretical results in addition to simulated results.

      Major comments

      The text in lines 183-191 is intuitively and nicely explained. However, I am not sure all of it follows from the figure panels in Figure 2. For example, the authors refer to a mutation that has a large immunogenicity, but it's not shown how many mutations, or the relative size of the mutations in Figure 2. The same comment holds true for the claim that spikes also arise for mutations with low antigenicity.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors developed a model of tumour-immune dynamics, incorporating stochastic antigenic mutation accumulation and escape within the cancer cell population. They then used this model to investigate how tumour-immune interactions influence tumour outcome and summary statistics of sequencing data.

      Strengths:

      This novel modeling framework addresses an important and timely topic. The authors consider the useful question of how bulk and single-cell sequencing may provide insights into the tumour-immune interactions and selection processes.

      Weaknesses:

      One set of conclusions presented in the paper is the presence of cyclic dynamics between effector/cancer cells, antigenicity, and immunogenicity. However, these conclusions are supported in the manuscript by two sample trajectories of stochastic simulations, and these provide mixed support for the conclusions (i.e. the phasing asynchrony described in the text does not seem to apply to Figure 2C). Similarly, the authors also find immune selection effects on the shape of the mutational burden in Figure 5 D/H using a qualitative comparison between the distributions and theoretical predictions in the absence of immune response. However the discrepancy appears quite small in panel D, and there are no quantitative comparisons provided to evaluate the significance. An analysis of the robustness of all the conclusions to parameter variation is missing. Lastly, the role of the Appendix results in the main messages of the paper is unclear.

    1. eLife Assessment

      This valuable study shows that locomotion-related modulations in the mouse visual cortex are not uniform but primarily affect neurons in muscarinic receptor-negative patches, which receive projections from specific cortical areas. While the evidence is mostly solid, some uncertainties remain regarding the link between anatomical data and functional measurements. The study should be of interest to neuroscientists interested in state modulation of cortical function.

    2. Reviewer #1 (Public review):

      Processing in the primary visual cortex (V1) of mice is not only based on sensory inputs but also strongly modulated by locomotion. In this study, Meier et al. ask whether neurons that are modulated by locomotion form clusters in V1. Their work is based on previous studies from their lab establishing a modularity in the organization of primary visual cortex based on M2-muscarinic-acetylcholine-receptor-positive patches and interpatches (Ji et al. 2015, D'Souza et al. 2019). In these studies, they have highlighted the clustering of specific visual pathways and inhibition. In the current study, they extend this modularity to motor inputs, confirming a clustering of locomotion modulated neurons but also show that these clusters overlap with the M2-negative interpatches of layer 1. Finally, they establish a blueprint for visual processing streams in V1, segregating projections to and from lateral visual areas (LM, AL, and RL) from projections to and from the lateral areas, including the visual area PM, the retrosplenial cortex (RSP), and the secondary motor area (MOs).

      Conceptually, this study provides an important finding in the organization of locomotion-related signaling in primary visual cortex, which clearly has substantial implications for sensory processing in visual cortex. While the anatomical data are solid, the link to physiology is incomplete. In conclusion, there are numerous issues that leave the main findings in some doubt, so the authors have some work to do before I find this story convincing.

      Major issues:

      (1) The major results in this study rely on proper quantification of neuronal responses during resting and running. Recently, it has been reported that hemodynamic occlusion can strongly influence measurements of fluorescent changes using two-photon imaging (Yogesh et al. 2025, doi.org/10.1101/2024.10.29.620650). Since it is unclear whether there is an inherent bias in vasculature and hemodynamic occlusion in M2 patches and interpatches, a quantification of the effect of hemodynamic occlusion would be necessary. This control would ideally be done using mice with GFP expression to test if there is still a clustering of locomotion-modulated neurons that overlaps with M2-negative interpatches. Alternatively, the authors should at the very least quantify the vascularization in M2 patches and interpatches.

      (2) To assess the effects, the authors use a correlation analysis for many of their findings (e.g., Figures 2b,c, 4j,k, ...). This, however, is inappropriate to assess the significance of the results. I suggest redoing all statistics with hierarchical bootstrap sampling (Saravanan et al. 2020, PMID: 33644783) or similar.

      (3) The authors use two different measures to assess whether and to what extent a neuron is locomotion sensitive, the LMI and "locomotion-responsive". While the LMI is defined based on recording in the light and dark (Figure 2), the "locomotion-responsiveness" is defined only in the dark (Figure 3a,c,d). The link between the two measures should be clarified.

      a) Additionally, Figure 2b shows higher average LMI for interpatches, but the locomotion-responsive fraction is similar in interpatches and patches (relative number of pairs in Figure 3c and Figure 3d). How do the authors explain this discrepancy?

      b) How is the LMI calculated - based on the average or the maximum response over stimuli? One particular stimulus? If the LMI is defined for each stimulus separately, what is plotted in Figure 2b?

      (4) In the last panels of Figures 4-7, the authors analyze the alignment of cell bodies with the M2 patches. While in superficial layers it might be straightforward to align the cell body locations with the M2 patches and interpatches in layer 1, this alignment does not appear to be trivial for deeper layers. The authors should provide additional material to convince the reader of the proper alignment.

      (5) Related to point 4 above - Given the importance of a proper alignment of M2 patches with the in vivo imaging, the in vivo - ex vivo alignment should be more convincing than Figure 1 C-E. Measuring M2 patches in vivo (as the authors have tried to do) would have provided more solid evidence. Have the authors tried to remove the dura for their in vivo imaging to increase signal-to-noise? In any case, more examples of proper alignment are necessary.

      (6) The authors state that locomotion selectively affects M2-/M2- pairs based on Figure 3c. However, to make this claim, there should be a significant difference between the correlation of stimulus-driven noise of M2-/M2- locomotion-responsive pairs and M2-/M2- locomotion-unresponsive pairs, AND no significant difference in the same analysis for M2+/M2+ pairs (i.e., testing the differences between the bars in Figure 3c and Figure 3d).
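
      As an illustration of the hierarchical bootstrap suggested in point (2), the following is a minimal Python sketch under stated assumptions: data are nested as neurons within mice, the statistic is the mean, and resampling is done first over mice and then over neurons within each resampled mouse. The function names, the mouse labels and the LMI values are hypothetical, and the sketch is not the procedure of Saravanan et al. in full.

```python
import random
import statistics

def hierarchical_bootstrap_means(data, n_boot=2000, seed=0):
    """Bootstrap the grand mean of nested data.

    data maps an animal ID to a list of per-neuron values. Each iteration
    resamples animals with replacement, then resamples neurons with
    replacement within every resampled animal.
    """
    rng = random.Random(seed)
    animals = list(data)
    boot_means = []
    for _ in range(n_boot):
        values = []
        for animal in rng.choices(animals, k=len(animals)):
            neurons = data[animal]
            values.extend(rng.choices(neurons, k=len(neurons)))
        boot_means.append(statistics.fmean(values))
    return boot_means

def percentile_interval(samples, alpha=0.05):
    """Simple percentile interval from a list of bootstrap samples."""
    ordered = sorted(samples)
    low = ordered[int((alpha / 2) * len(ordered))]
    high = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return low, high

# Hypothetical per-neuron LMI values, grouped by mouse.
lmi_by_mouse = {
    "mouse_1": [0.21, 0.18, 0.25, 0.30],
    "mouse_2": [0.10, 0.12, 0.08],
    "mouse_3": [0.35, 0.28, 0.31, 0.40, 0.33],
}

boot_means = hierarchical_bootstrap_means(lmi_by_mouse, n_boot=1000, seed=1)
low, high = percentile_interval(boot_means)
print(f"95% interval for the mean LMI: [{low:.3f}, {high:.3f}]")
```

      Resampling at the level of the mouse as well as the neuron keeps between-animal variability in the interval, which pooling all neurons into a single correlation would ignore.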

    3. Reviewer #2 (Public review):

      Summary:

      Meier et al. explore the variability of locomotion-related modulations in mouse area V1. They present 4 major findings: V1 L2/3 neurons beneath M2- interpatches are more strongly locomotion-modulated than those beneath M2+ patches, while V1 L2/3 neurons are more strongly orientation tuned. They then use viral tracing to examine the relationship of M2- interpatches and M2+ patches with inputs from and outputs to HVOs, MO, RSP, and LP, and find evidence for different closed-loop subnetworks within L1; these relationships, however, are more complicated for cell bodies in L2/3. Finally, they also describe an overlap between M2- interpatches and SOM+ dendrites/axons.

      Strengths:

      The strength of the manuscript is the detailed anatomical quantification of closed-loop connectivity, and the description of the organizing principles of M2- interpatches and M2+ patches.

      Weaknesses:

      The major weakness of the manuscript is the lack of a direct connection between the functional and the anatomical data, and the somewhat puzzling effects observed in the analysis of noise correlations. The former issue might be alleviated by modelling, where the authors could explore the space of possibilities that could explain the functional data based on the anatomical connectivity. Some control analyses could be done, for the comparison of noise correlations.

    4. Reviewer #3 (Public review):

      The authors build on the large body of their previous research, which showed that the mouse primary visual cortex is organised into two types of clusters, M2+ and M2-, which exhibit distinct input patterns from thalamus and higher visual cortical areas and distinct visual tuning preferences. The current study reveals a like-to-like projection from within-cluster neurons to the areas that provide feedback projections and, furthermore, that neurons in the M2- clusters are more strongly affected by non-visual signals about the locomotion of the animal.

      The study adds fundamental insights to our understanding of the principles of cortical organisation and computation, specifically how the cortex integrates sensory and action-related signals.

      While the tracing data are very convincing, data analysis should be strengthened to support the claims:

      (1) The locomotion modulation index (LMI) compares the mean activity during running and not running but does not seem to account for differences between visual stimuli, so that the LMI could be influenced by the neuron's visual tuning rather than its sensitivity to locomotion, e.g. if the mouse was running more when the neuron's preferred stimulus was presented. Trials should first be averaged per stimulus, and then across stimuli. Alternatively, only the preferred stimulus could be considered.

      The significance test (unpaired t-test) suffers from the same flaw. Instead an ANOVA (with stimulus parameter as factor) would resolve the problem, or testing whether fitting the data with two tuning curves (one per locomotion state) or a single curve results in a lower error (using cross-validation).

      Given that there is evidence that specific visual stimuli can induce more or less running in mice, this issue is very important to account for behavioural differences across stimuli.

      (2) All bars in Figure 2b show a lower LMI than the reported mean LMI of 0.19. This should be checked.

      (3) Correlation tests: Pearson correlation is only meaningful when applied to continuous data. A more suitable test for discrete data like the M2 patch quantile is a rank test like Kendall's coefficient of rank correlation. This applies to data in Figure 2b,c, 4j,k, Figure 2 - Supplement 2,1a, etc.

      (4) How OSI was determined should be clarified. Specifically, were R_pref and R_ortho the mean responses to the two opposite movement directions? Similarly, how was the half-width at half-maximum of orientation determined? From the fits in Figure 2a, it looks like the widths of both Gaussians can be different.

      (5) The correlation measures in Figure 3 would greatly benefit from additional analyses to help interpretation of the results.

      a) Correlations between neurons typically increase with increasing firing rates (e.g., de la Rocha J, Doiron B, Shea-Brown E, Josić K, Reyes A. 2007. Correlation between neural spike trains increases with firing rate. Nature 448:802-6. doi:10.1038/nature06028). Could the higher correlations in M2+ pairs (Figure 3a) be explained by higher firing rates in M2+ compared to M2- neurons?

      b) To determine correlations in Figure 3a, trials during locomotion and stationarity were pooled. As locomotion impacts the firing rate of the neurons, it would be helpful to separate correlations between the two states, locomotion vs stationarity, so the measures reflect something closer to "noise correlations" rather than tuning to locomotion.

      c) Similarly, in Figure 3b, I wonder whether the large correlations in M2- pairs are driven by locomotion rather than functional connectivity. As suggested in b, a better test of noise correlations would be to account for locomotion, i.e., separate trials by stimulus identity and locomotion state. To prevent conditions with few trials from having greater weight in the overall noise correlations, I suggest the authors first z-score responses per condition, then determine noise correlations across all trials (as explained in Renart et al., 2010).

      d) Correlations in Figure 3a,b should be tested with an ANOVA and a control for multiple tests.

      (6) In plots like Figure 4j-l, it would be very informative to show individual measures (per ROI and mouse) in addition to mean +- SEM. As the counts are low (<10) it wouldn't obstruct the plot.

      (7) The caption of Figure 4l says that most retrogradely labelled cells are located in L2/3. However, the plot only shows data from L2/3 and a single section of L4, so one cannot compare it to other layers. Can the authors corroborate the claim with data from other layers?

      (8) Methods: The authors should provide more details on the visual stimuli: What was the background on which gratings were presented? How long was the inter-stimulus interval? What was presented during the inter-stimulus interval? How large were gratings used to map tuning to SF, TF, and orientation?
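
      Two of the analysis suggestions above, averaging per stimulus before computing the LMI (point 1) and using a rank correlation for the patch quantiles (point 3), can be sketched in a few lines of Python. This is a hedged illustration and not the authors' analysis: the LMI is assumed here to be (run - still) / (run + still), which the review does not state, and all trial data, quantiles and function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean

def lmi(run, still):
    """Assumed index definition: (run - still) / (run + still)."""
    return (run - still) / (run + still)

def pooled_lmi(trials):
    """Pool all trials per locomotion state, ignoring stimulus identity."""
    run = fmean(t["resp"] for t in trials if t["running"])
    still = fmean(t["resp"] for t in trials if not t["running"])
    return lmi(run, still)

def stimulus_balanced_lmi(trials):
    """Average within each stimulus first, then across stimuli."""
    by_stim = defaultdict(lambda: {True: [], False: []})
    for t in trials:
        by_stim[t["stim"]][t["running"]].append(t["resp"])
    run_means, still_means = [], []
    for states in by_stim.values():
        if states[True] and states[False]:  # stimulus must occur in both states
            run_means.append(fmean(states[True]))
            still_means.append(fmean(states[False]))
    return lmi(fmean(run_means), fmean(still_means))

def kendall_tau_b(x, y):
    """Kendall's tau-b, counting concordant, discordant and tied pairs."""
    conc = disc = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / sqrt((conc + disc + ties_x) * (conc + disc + ties_y))

def make_trials(stim, resp, n_run, n_still):
    """Hypothetical trials with a fixed response, i.e. no locomotion effect."""
    return ([{"stim": stim, "resp": resp, "running": True}] * n_run
            + [{"stim": stim, "resp": resp, "running": False}] * n_still)

# The mouse runs more during the preferred stimulus, but responses do not
# depend on running at all.
trials = make_trials("preferred", 10.0, 3, 1) + make_trials("orthogonal", 2.0, 1, 3)
print(pooled_lmi(trials), stimulus_balanced_lmi(trials))

# Hypothetical M2 quantiles and mean LMI per quantile.
quantile = [1, 2, 3, 4, 5, 6]
mean_lmi = [0.30, 0.25, 0.27, 0.15, 0.12, 0.05]
print(kendall_tau_b(quantile, mean_lmi))
```

      In the toy trials the neuron's response is identical across locomotion states, yet the pooled index is 1/3 because running coincides with the preferred stimulus; the stimulus-balanced index is 0, which is the confound point (1) describes.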

    1. eLife Assessment

      The findings are important and intriguing, with theoretical or practical implications beyond a single subfield. The computational methods employed are clever and sophisticated and the strength of evidence is convincing. Many of the methodological concerns raised after the first round of review were addressed in the revised version, although all three reviewers also highlighted that the exploratory nature of the paper and the lack of clarity regarding the hypotheses make it hard to assess the impact of the results on existing theories.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use a sophisticated task design and Bayesian computational modeling to test their hypothesis that information generalization (operationalized as a combination of self-insertion and social contagion) in social situations is disrupted in Borderline Personality Disorder. Their main finding relates to the observation that two different models best fit the two tested groups: While the model assuming both self-insertion and social contagion to be present when estimating others' social value preferences fit the control group best, a model assuming neither of these processes provided the best fit to BPD participants.

      Strengths:

      The revisions have substantially strengthened the paper and the manuscript is much clearer and easier to follow now. The strengths of the presented work lie in the sophisticated task design and the thorough investigation of their theory by use of mechanistic computational models to elucidate social decision-making and learning processes in BPD.

      Weaknesses:

      Some critical concerns remain after the first revision, particularly regarding the use of causal language and the clarity of the hypotheses and results, specified in the points below.

      (1) The authors frequently refer to their predictions and theory as being causal, both in the manuscript and in their response to reviewers. However, causal inference requires careful experimental design, not just statistical prediction. For example, the claim that "algorithmic differences between those with BPD and matched healthy controls" are "causal" in my opinion is not warranted by the data, as the study does not employ experimental manipulations or interventions which might predictably affect parameter values. Even if model parameters can be seen as valid proxies to latent mechanisms, this does not automatically mean that such mechanisms cause the clinical distinction between BPD and CON, they could plausibly also refer to the effects of therapy or medication. I recommend that such causal language, also implicit to expressions like "parameter influences on explicit intentional attributions", is toned down throughout the manuscript.

      (2) Although the authors have now outlined the study's aims much more clearly, there still is a lack of clarity with respect to the authors' specific hypotheses. I understand that their primary predictions about disruptions to self-other generalisation processes underlying BPD are embedded in the four main models that are tested, but it is still unclear what specific hypotheses the authors had about group differences with respect to the tested models. I recommend the authors specify this in the introduction rather than referring to prior work where the same hypotheses may have been mentioned.

      (3) Caveats should also be added about the exploratory nature of the many parameter group comparisons. If there are any predictions about group differences that can be made based on prior literature, the authors should make such links clear.

      (4) I'm not sure I understand why the authors, after adding multiple comparison correction, now list two kinds of p-values. To me, this is misleading and defeats the point of multiple comparison corrections; I therefore recommend they report the FDR-adjusted p-values only. Likewise, if a corrected p-value is greater than 0.05 this should not be interpreted as a result.

      (5) Can the authors please elaborate why the algorithm proposed to be employed by BPD is more 'entropic', especially given both their self-priors and posteriors about partners' preferences tended to be more precise than the ones used by CON? As far as I understand, there's nothing in the data to suggest BPD predictions should be more uncertain. In fact, this leads me to wonder, similarly to what another reviewer has already suggested, whether BPD participants generate self-referential priors over others in the same way CON participants do, they are just less favourable (i.e., in relation to oneself, but always less prosocial) - I think there is currently no model that would incorporate this possibility? It should at least be possible to explore this by checking if there is any statistical relationship between the estimated $\theta_{ppt}^{m}$ and $p(\theta_{par} \mid D^{0})$.
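
      To make the concern in point (4) concrete, the following is a minimal Python sketch of FDR adjustment, assuming the Benjamini-Hochberg procedure (the review does not say which FDR method the authors used). The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical uncorrected p-values
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}  significant at 0.05: {q < 0.05}")
```

      In this toy example three raw p-values fall below 0.05 but only one adjusted value does, which shows how reporting both kinds side by side can invite readers to interpret results that do not survive correction.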

      "To note, social contagion under M3 was highly correlated with contagion under M1 (see Fig S11). This provides some preliminary evidence that trauma impacts beliefs about individualism directly, whereas trauma and persecutory beliefs impact beliefs about prosociality through impaired trait mentalising" - I don't understand what the authors mean by this, can they please elaborate and add some explanation to the main text?

    3. Reviewer #2 (Public review):

      Summary:

      The paper investigates social decision-making, and how this changes after observing the behaviour of other people, in borderline personality disorder. The paper employs a task including three phases, the first where participants make decisions on how to allocate rewards to oneself and to a virtual partner, the second where they observe the same task performed by someone else, and a third phase equivalent to phase one, but with a new partner. Using sophisticated computational modelling to analyse choice data, the study reports that borderline participants (versus controls) are more certain about their preferences in phase one, used more neutral priors and are less flexible during phase two, and are less influenced by partners in phase three.

      Strengths:

      The topic is interesting and important, and the findings are potentially intriguing. The computational methods employed are clever and sophisticated, at the cutting edge of research in the field.

      Weaknesses:

      The paper is not based on specific empirical hypotheses formulated at the outset, but, rather, it uses an exploratory approach. Indeed, the task is not chosen in order to tackle specific empirical hypotheses. This, in my view, is a limitation since the introduction reads a bit vague and it is not always clear which gaps in the literature the paper aims to fill. As a further consequence, it is not always clear how the findings speak to previous theories on the topic.

    4. Reviewer #3 (Public review):

      In this paper, the authors use a three-phase economic game to examine the tendency to engage in prosocial versus competitive exchanges with three anonymous partners. In particular, they consider individual differences in the tendency to infer about others' tendencies based on one's preferences and to update one's preferences based on observations of others' behavior. The study includes a sample of individuals diagnosed with borderline personality disorder and a matched sample of psychiatrically healthy control participants.

      On the whole, the experimental design is well-suited to the questions and the computational model analyses are thorough, including modern model-fitting procedures. I particularly appreciated the clear exposition regarding model parameterization and the descriptive Table 2 for qualitative model comparison. In the revised manuscript, the authors now provide a more thorough treatment of examining group differences in computational parameters given that the best-fitting model differed by group. They also examine the connection of their task and findings to related research focusing on self-other representation and mentalization (e.g., Story et al., 2024).

      The authors note that the task does not encourage competition and instead captures individual differences in the motivation to allocate rewards to oneself and others in an interdependent setting. The paper could have been strengthened by clarifying how the Social Value Orientation framework can be used to interpret the motivations and behavior of BPD versus CON participants on the task. Although the authors note that their approach makes "clear and transparent a priori predictions," the paper could be improved by providing a clear and consolidated statement of these predictions so that the results could be interpreted vis-a-vis any a priori hypotheses.

      Finally, the authors have amended their individual difference analyses to examine psychometric measures such as the CTQ alongside computational model parameter estimate differences. I appreciate that these analyses are described as exploratory. The approach of using a partial correlation network with bootstrapping (and permutation) was interesting, but the logic of the analysis was not clearly stated. In particular, there are large group (Table 1: CON vs. BPD) differences in the measures introduced into this network. As a result, it is hard to understand whether any partial correlations are driven primarily by mean differences in severity (correlations tend to be inflated in extreme groups designs due to the absence of observation in middle of scales forming each bivariate distribution). I would have found these exploratory analyses more revealing if group membership was controlled for.

    5. Author response:

      The following is the authors’ response to the original reviews

      Response to the Editors’ Comments

      Thank you for this summary of the reviews and recommendations for corrections. We respond to each in turn, and have documented each correction with specific examples contained within our response to reviewers below.

      ‘They all recommend to clarify the link between hypotheses and analyses, ground them more clearly in, and conduct critical comparisons with existing literature, and address a potential multiple comparison problem.’

      We have restructured our introduction to include the relevant literature outlined by the reviewers, and to more clearly ground the goals of our model and broader analysis. We have additionally corrected for multiple comparisons within our exploratory associative analyses. We have also signposted exploratory tests more clearly.

      ‘Furthermore, R1 also recommends to include a formal external validation of how the model parameters relate to participant behaviour, to correct an unjustified claim of causality between childhood adversity and separation of self, and to clarify role of therapy received by patients.’

      We have now tempered our language in the abstract which unintentionally implied causality in the associative analysis between childhood trauma and other-to-self generalisation. To note, in the sense that our models provide causal explanations for behaviour across all three phases of the task, we argue that our model comparison provides some causal evidence for algorithmic biases within the BPD phenotype. We have included further details of the exclusion and inclusion criteria of the BPD participants within the methods.

      ‘R2 specifically recommends to clarify, in the introduction, the specific aim of the paper, what is known already, and the approach to addressing it.’

      We have more thoroughly outlined the current state of the art concerning behavioural and computational approaches to self insertion and social contagion, in health and within BPD. We have linked these more clearly to the aims of the work.

      ‘R2 also makes various additional recommendations regarding clarification of missing information about model comparison, fit statistics and group comparison of parameters from different models.’

      Our model comparison approach and algorithm are outlined within the original paper for Hierarchical Bayesian Model comparison (Piray et al., 2019). We have outlined the concepts of this approach in the methods. We have now additionally improved clarity by placing descriptions of this approach more obviously in the results, and added points of greater detail in the methods, such as which statistics for comparison we extracted on the group and individual level.

      In addition, in response to the need for greater comparison of parameters from different models, we have also hierarchically force-fitted the full suite of models (M1-M4) to all participants. We report all group differences from each model individually – assuming their explanation of the data - in Table S2. We have also demonstrated strong associations between parameters of equivalent meaning from different models to support our claims in Fig S11. Finally, we show minimal distortion to parameter estimates in between-group analysis when models are either fitted hierarchically to the entire population, or group wise (Figure S10).

      ‘R3 additionally recommends to clarify the clinical and cognitive process relevance of the experiment, and to consider the importance of the Phase 2 findings.’

      We have now included greater reference to the assumptions in the social value orientation paradigm we use in the introduction. We have also responded to the specific point about the shift in central tendencies in phase 2 from the BPD group, noting that, while BPD participants do indeed get more relatively competitive vs. CON participants, they remain strikingly neutral with respect to the overall statespace. Importantly, model M4 does not preclude more competitive distributions existing.

      ‘Critically, they also share a concern about analyzing parameter estimates fit separately to two groups, when the best-fitting model is not shared. They propose to resolve this by considering a model that can encompass the full dynamics of the entire sample.’

      We have hierarchically force-fitted the full suite of models (M1-M4) to all participants to allow for comparison between parameters within each model assumption. We report all group differences from each model individually – assuming their explanation of the data - in Table S2 and Table S3. We have also demonstrated strong associations between parameters of equivalent meaning from different models to support our claims in Fig S11. We also show minimal distortion to parameter estimates in between-group analysis when models are either fitted hierarchically to the entire population, or group wise (Figure S10).

      Within model M1 and M2, the parameters quantify the degree to which participants believe their partner to be different from themselves. Under M1 and M2 model assumptions, BPD participants have meaningfully larger versus CON (Fig S10), which supports the notion that a new central tendency may be more parsimonious in phase 2 (as in the case of the optimal model for BPD, M4). We also show strong correlations across models between under M1 and M2, and the shift in central tendencies of beliefs between phase 1 and 2 under M3 and M4. This supports our primary comparison, and shows that even under non-dominant model assumptions, parameters demonstrate that BPD participants expect their partner’s relative reward preferences to be vastly different from themselves versus CON.

      ‘A final important point concerns the psychometric individual difference analyses which seem to be conducted on the full sample without considering the group structure.’

      We have now more clearly focused our psychometric analysis. We control for multiple comparisons, and compare parameters across the same model (M3) when assessing the relationship between paranoia, trauma, trait mentalising, and social contagion. We have relegated all other exploratory analyses to the supplementary material and noted where p values survive correction using False Discovery Rate.
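
      As an illustration of what it means for p values to "survive" False Discovery Rate correction, the sketch below applies the Benjamini-Hochberg step-up procedure. This is an assumption on two counts: the response does not state which FDR procedure was used, and the p values shown are hypothetical placeholders rather than results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values (not taken from the manuscript).
p_raw = [0.001, 0.02, 0.03, 0.20]
p_adj = benjamini_hochberg(p_raw)

# A test "survives" correction if its adjusted p-value stays below alpha.
alpha = 0.05
survives = [p <= alpha for p in p_adj]
```

      Under this procedure each adjusted value is compared against the usual threshold, so the first three hypothetical tests would be reported as surviving correction and the fourth would not.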

      Reviewer 1:

      ‘The manuscript's primary weakness relates to the number of comparisons conducted and a lack of clarity in how those comparisons relate to the authors' hypotheses. The authors specify a primary prediction about disruption to information generalization in social decision making & learning processes, and it is clear from the text how their 4 main models are supposed to test this hypothesis. With regards to any further analyses however (such as the correlations between multiple clinical scales and eight different model parameters, but also individual parameter comparisons between groups), this is less clear. I recommend the authors clearly link each test to a hypothesis by specifying, for each analysis, what their specific expectations for conducted comparisons are, so a reader can assess whether the results are/aren't in line with predictions. The number of conducted tests relating to a specific hypothesis also determines whether multiple comparison corrections are warranted or not. If comparisons are exploratory in nature, this should be explicitly stated.’

      We have now corrected for multiple comparisons when examining the relationship between psychometric findings and parameters, using partial correlations and bootstrapping for robustness. These latter analyses were indeed not preregistered, and so we have more clearly signposted that these tests were exploratory. We chose to focus on the influence of psychometrics of interest on social contagion under model M3 given that this model explained a reasonable minority of behaviour in each group. We have now fully edited this section in the main text in response, and relegated all other correlations to the supplementary materials.

      ‘Furthermore, the authors present some measures for external validation of the models, including comparison between reaction times and belief shifts, and correlations between model predicted accuracy and behavioural accuracy/total scores. However it would be great to see some more formal external validation of how the model parameters relate to participant behaviour, e.g., the correlation between the number of pro-social choices and ß-values, or the correlation between the change in absolute number of pro-social choices and the change in ß. From comparing the behavioural and computational results it looks like they would correlate highly, but it would be nice to see this formally confirmed.’

      We have included this further examination within the Generative Accuracy and Recovery section:

      ‘We also assessed the relationship (Pearson rs) between modelled participant preference parameters in phase 1 and actual choice behaviour: was negatively correlated with prosocial versus competitive choices (r=-0.77, p<0.001) and individualistic versus competitive choices (r=-0.59, p<0.001); was positively correlated with individualistic versus competitive choices (r=0.53, p<0.001) and negatively correlated with prosocial versus individualistic choices (r=-0.69, p<0.001).’

      ‘The statement in the abstract that 'Overall, the findings provide a clear explanation of how self-other generalisation constrains and assists learning, how childhood adversity disrupts this through separation of internalised beliefs' makes an unjustified claim of causality between childhood adversity and separation of self - and other beliefs, although the authors only present correlations. I recommend this should be rephrased to reflect the correlational nature of the results.’

      Sorry – this was unfortunate wording: we did not intend to imply causation with our second clause in the sentence mentioned. We have amended the language to make it clear this relationship is associative:

      ‘Overall, the findings provide a clear explanation of how self-other generalisation constrains and assists learning, how childhood adversity is associated with separation of internalised beliefs, and makes clear causal predictions about the mechanisms of social information generalisation under uncertainty.’

      ‘Currently, from the discussion the findings seem relevant in explaining certain aberrant social learning and -decision making processes in BPD. However, I would like to see a more thorough discussion about the practical relevance of their findings in light of their observation of comparable prediction accuracy between the two groups.’

      We have included a new paragraph in the discussion to address this:

      ‘Notably, despite differing strategies, those with BPD achieved similar accuracy to CON participants in predicting their partners. All participants were more concerned with relative versus absolute reward; only those with BPD changed their strategy based on this focus. Practically this difference in BPD is captured either through disintegrated priors with a new median (M4) or very noisy, but integrated priors over partners (M1) if we assume M1 can account for the full population. In either case, the algorithm underlying the computational goal for BPD participants is far higher in entropy and emphasises a less stable or reliable process of inference. In future work, it would be important to assess this mechanism alongside momentary assessments of mood to understand whether more entropic learning processes contribute to distressing mood fluctuation.’

      ‘Relatedly, the authors mention that a primary focus of mentalization based therapy for BPD is 'restoring a stable sense of self' and 'differentiating the self from the other'. These goals are very reminiscent of the findings of the current study that individuals with BPD show lower uncertainty over their own and relative reward preferences, and that they are less susceptible to social contagion. Could the observed group differences therefore be a result of therapy rather than adverse early life experiences?’

      This is something that we wish to explore in further work. While verbal and model descriptions appear parsimonious, this is not straightforward. As we see, clinical observation and phenomenological dynamics may not necessarily match in an intuitive way to parameters of interest. It may be that compartmentalisation of self and other – as we see in BPD participants within our data – may counter-intuitively express as a less stable self. The evolutionary mechanisms that make social insertion and contagion enduring may also be the same that foster trust and learning.

      ‘Regarding partner similarity: It was unclear to me why the authors chose partners that were 50% similar when it would be at least equally interesting to investigate self-insertion and social contagion with those that are more than 50% different to ourselves? Do the authors have any assumptions or even data that shows the results still hold for situations with lower than 50% similarity?’

      While our task algorithm had a high probability to match individuals who were approximately 50% different with respect to their observed behaviour, there was variation either side of this value. The value of 50% median difference was chosen for two reasons: 1. we wanted to ensure participants had to learn about their partner to some degree relative to their own preferences and 2. we did not want to induce extreme over- or under-familiarity given the (now replicated) relationship between participant-partner similarity and intentional attributions (see below). Nevertheless, we did have some variation around the 50% median. Figure 3A in the top left panel demonstrates this fluctuation in participant-partner similarity and the figure legend further describes this distribution (mean = 49%, sd = 12%). In future work we want to more closely manipulate the median similarity between participants and partners to understand how this facilitates or inhibits learning and generalisation.

      There is some analysis of the relationship between degrees of similarity and behaviour. In the third paragraph of page 15 we report the influence of participant-partner similarity on reaction times. In prior work (Barnby et al., 2022; Cognition) we had shown that similarity was associated with reduced attributions of harm about a partner, irrespective of their true parameters (e.g. whether they were prosocial/competitive). We replicate this previous finding with a double dissociation illustrated in Figure 4, showing that greater discrepancies in participant-partner prosociality increases explicit harmful intent attributions (but not self-interest), and discrepancies in participant-partner individualism reduces explicit self-interest attributions (but not harmful intent). We have made these clearer in our results structure, and included FDR correction values for multiple comparisons.

      The methods section is rather dense and at least I found it difficult to keep track of the many different findings. I recommend the authors reduce the density by moving some of the secondary analyses in the supplementary materials, or alternatively, to provide an overall summary of all presented findings at the end of the Results section.

      We have now moved several of our exploratory findings into the supplementary materials, notably the analysis of participant-partner similarity on reaction times (Fig S9), as well as the uncorrected correlation between parameters (Fig S7).

      Fig 2C) and Discussion p. 21: What do the authors mean by 'more sensitive updates'? more sensitive to what?

      We have now edited the wording to specify ‘more belief updating’ rather than ‘sensitive’ to be clearer in our language.

      P14 bottom: please specify what is meant by axial differences.

      We have changed this to ‘preference type’ rather than using the term ‘axial’.

      It may be helpful to have Supplementary Figure 1 in the main text.

      Thank you for this suggestion. Given the volume of information in the main text we hope that it is acceptable for Figure S1 to remain in the supplementary materials.

      Figure 3D bottom panel: what is the difference between left and right plots? Should one of them be alpha not beta?

      The left and right plots are of the change in standard deviation (left) and central tendency (right) of participant preference change between phase 1 and 3. This is currently noted in the figure legend, but we have added some text to be clearer that this is over prosocial-competitive beliefs specifically. We chose to use this belief as an example given the centrality of prosocial-competitive beliefs in the learning process in Figure 2. We also noticed a small labelling error in the bottom panels of 3D which should have noted that each plot was either with respect to the precision or mean-shift in beliefs during phase 3.

      ‘The relationship between uncertainty over the self and uncertainty over the other with respect to the change in the precision (left) and median-shift (right) in phase 3 prosocial-competitive beliefs.’

      Supplementary Figure 4: The prior presented does not look neutral to me, but rather right-leaning, so competitive, and therefore does indeed look like it was influenced by the self-model? If I am mistaken please could the authors explain why.

      This example distribution is taken from a single BPD participant. In this case, indeed, the prior is somewhat right-shifted. However, on a group level, priors over the partner were closely centred around 0 (see reported statistics in paragraph 2 under the heading ‘Phase 2 – BPD Participants Use Disintegrated and Neutral Priors’). Nevertheless, we understand how this may come across as misleading. For clarity we have expanded upon Figure S4 to include the phase 1 and prior phase 2 distributions for the entire BPD population for both prosocial and individualistic beliefs. This further demonstrates that those with BPD held surprisingly neutral beliefs over the expectations about their partners’ prosociality, but had minor shifts between their own individualistic preferences and the expected individualistic preferences of their partners. This is also visible in Figure S2.

      Reviewer 2:

      ‘There are two major weaknesses. First, the paper lacks focus and clarity. The introduction is rather vague and, after reading it, I remained confused about the paper's aims. Rather than relying on specific predictions, the analysis is exploratory. This implies that it is hard to keep track, and to understand the significance, of the many findings that are reported.’

      Thank you for this opportunity to be clearer in our framing of the paper. While the model makes specific causal predictions with respect to behavioural dynamics conditional on algorithmic differences, our other analyses were indeed exploratory. We did not preregister this work, but given the intriguing findings we now intend to preregister our future analyses.

      We have made our introduction clearer with respect to the aims of the paper:

      ‘Our present work sought to achieve two primary goals: 1. extend prior causal computational theories to formalise the interrelation between self-insertion and social contagion within an economic paradigm, the Intentions Game, and 2. test how a diagnosis of BPD may relate to deficits in these forms of generalisation. We propose a computational theory with testable predictions to begin addressing this question. To foreshadow our results, we found that healthy participants employ a mixed process of self-insertion and contagion to predict and align with the beliefs of their partners. In contrast, individuals with BPD exhibit distinct, disintegrated representations of self and other, despite showing similar average accuracy in their learning about partners. Our model and data suggest that the previously observed computational characteristics in BPD, such as reduced self-anchoring during ambiguous learning and a relative impermeability of the self, arise from the failure of information about others to transfer to and inform the self. By integrating separate computational findings, we provide a foundational model and a concise, dynamic paradigm to investigate uncertainty, generalization, and regulation in social interactions.’

      ‘Second, although the computational approach employed is clever and sophisticated, there is important information missing about model comparison which ultimately makes some of the results hard to assess from the perspective of the reader.’

      Our model comparison employed state-of-the-art random-effects Bayesian model comparison (Piray et al., 2019; PLOS Comp. Biol.). It initially fits each individual to each model using Laplace approximation, and subsequently ‘races’ each model against each other on the group level and individual level through hierarchical constraints and random-effect considerations. We included this in the methods but have now expanded on the description of how we compared models:

      In the results -

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019; see Methods for more information). We report individual and group-level model responsibility, in addition to protected exceedance probabilities between-groups to assess model dominance.’

      We added to our existing description in the methods –

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019). During fitting we added a small noise floor to distributions (2.22e<sup>-16</sup>) before normalisation for numerical stability. Parameters were estimated using the HBI in untransformed space drawing from broad priors (μ<sub>M</sub> = 0, σ<sup>2</sup><sub>M</sub> = 6.5; where M = {M1, M2, M3, M4}). This process was run independently for each group. Parameters were transformed into model-relevant space for analysis. All models and hierarchical fitting were implemented in Matlab (Version R2022B). All other analyses were conducted in R (version 4.3.3; arm64 build) running on Mac OS (Ventura 13.0). We extracted individual and group level responsibilities, as well as the protected exceedance probability to assess model dominance per group.’
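
      The noise floor described above can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab code: the function name `normalise_with_floor` and the example weights are hypothetical, and the floor is assumed to be double-precision machine epsilon, which matches the reported value of 2.22e-16.

```python
import math
import sys

# Double-precision machine epsilon, approximately 2.22e-16.
EPS = sys.float_info.epsilon


def normalise_with_floor(weights, floor=EPS):
    """Add a small floor to every entry, then rescale so the entries sum to 1."""
    floored = [w + floor for w in weights]
    total = sum(floored)
    return [w / total for w in floored]


# Hypothetical unnormalised distribution containing an exact zero.
probs = normalise_with_floor([0.0, 0.5, 1.5])

# Without the floor, log(0) would raise an error; with it, every log is finite.
log_probs = [math.log(p) for p in probs]

# An all-zero input no longer causes division by zero; it becomes uniform.
uniform = normalise_with_floor([0.0, 0.0, 0.0])
```

      The floor is small enough to leave well-supported values essentially unchanged while keeping log-probabilities and the normalising constant well defined.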

      (1) P3, third paragraph: please define self-insertion

      We have now more clearly defined this in the prior paragraph when introducing concepts.

      ‘To reduce uncertainty about others, theories of the relational self (Anderson & Chen, 2002) suggest that people have available to them an extensive and well-grounded representation of themselves, leading to a readily accessible initial belief (Allport, 1924; Krueger & Clement, 1994) that can be projected or integrated when learning about others (self-insertion).’

      (2) Introduction: the specific aim of the paper should be clarified - at the moment, it is rather vague. The authors write: "However, critical questions remain: How do humans adjudicate between self-insertion and contagion during interaction to manage interpersonal generalization? Does the uncertainty in self-other beliefs affect their generalizability? How can disruptions in interpersonal exchange during sensitive developmental periods (e.g., childhood maltreatment) inform models of psychiatric disorders?". Which of these questions is the focus of the paper? And how does the paper aim at addressing it?

      (3) Relatedly, from the introduction it is not clear whether the goal is to develop a theory of self-insertion and social contagion and test it empirically, or whether it is to study these processes in BPD, or both (or something else). Clarifying which specific question(s) is addressed is important (also clarifying what we already know about that specific question, and how the paper aims at elucidating that specific question).

      We have now included our specific aims of the paper. We note this in the above response to the reviewer's general comments.

      (4) "Computational models have probed social processes in BPD, linking the BPD phenotype to a potential over-reliance on social versus internal cues (Henco et al., 2020), 'splitting' of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others' irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Previous studies have typically overlooked how self and other are represented in tandem, prompting further investigation into why any of these BPD phenotypes manifest." Not clear what the link between the first and second sentence is. Does it mean that previous computational models have focused exclusively on how other people are represented in BPD, and not on how the self is represented? Please spell this out.

      Thank you for the opportunity to be clearer in our language. We have now spelled out our point more precisely, and included some extra relevant literature helpfully pointed out by another reviewer.

      ‘Computational models have probed social processes in BPD, although almost exclusively during observational learning. The BPD phenotype has been associated with a potential over-reliance on social versus internal cues (Henco et al., 2020), ‘splitting’ of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others’ irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Associative models have also been adapted to characterize ‘leaky’ self-other reinforcement learning (Ereira et al., 2018), finding that those with BPD overgeneralize (leak updates) about themselves to others (Story et al., 2024). Altogether, there is currently a gap in the direct causal link between insertion, contagion, and learning (in)stability.’

      (5) P5, first paragraph. The description of the task used in phase 1 should be more detailed. The essential information for understanding the task is missing.

      We have updated this section to point toward Figure 1 and the Methods where the details of the task are more clearly outlined. We hope that it is acceptable not to explain the full task at this point for brevity and to not interrupt the flow of the results.

      ‘Detailed descriptions of the task can be found in the methods section and Figure 1.’

      (6) P5, second paragraph: briefly state how the Psychometric data were acquired (e.g., self-report).

      We have now clarified this in the text.

      ‘All participants also self-reported their trait paranoia, childhood trauma, trust beliefs, and trait mentalizing (see methods).’

      (7) "For example, a participant could make prosocial (self=5; other=5) versus individualistic (self=10; other=5) choices, or prosocial (self=10; other=10) versus competitive (self=10; other=5) choices". Not sure what criteria are used for distinguishing between individualistic and competitive - they look the same?

      Sorry. This paragraph was not clear that the issue is that the interpretation of the choice depends on both members of the pair of options. Here, in one pair {(self=5,other=5) vs (self=10,other=5)}, it is highly pro-social for the self to choose (5,5), sacrificing 5 points for the sake of equality. In the second pair {(self=10,other=10) vs (self=10,other=5)}, it is highly competitive to choose (10,5), denying the other 5 points at no benefit to the self. We have clarified this:

      ‘We analyzed the ‘types’ of choices participants made in each phase (Supplementary Table 1). The interpretation of a participant’s choice depends on both values in a choice. For example, a participant could make prosocial (self=5; other=5) versus individualistic (self=10; other=5) choices, or prosocial (self=10; other=10) versus competitive (self=10; other=5) choices. There were 12 of each pair in phases 1 and 3 (individualistic vs. prosocial; prosocial vs. competitive; individualistic vs. competitive).’  
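
      The arithmetic behind the two example pairs can be made explicit with a short sketch. The helper name `cost_of_choice` is hypothetical; options are written as (self, other) tuples using the point values quoted above, and the sketch only computes what a choice gives up relative to the alternative rather than reproducing the task's labelling scheme.

```python
def cost_of_choice(chosen, rejected):
    """Return (change to self, change to other) from picking `chosen` over `rejected`.

    Each option is a (self_points, other_points) tuple.
    """
    return (chosen[0] - rejected[0], chosen[1] - rejected[1])


# Pair 1: picking (5, 5) over (10, 5) costs the self 5 points and leaves
# the other unchanged, so it reads as a prosocial choice.
pair1 = cost_of_choice((5, 5), (10, 5))

# Pair 2: picking (10, 5) over (10, 10) costs the self nothing but denies
# the other 5 points, so it reads as a competitive choice.
pair2 = cost_of_choice((10, 5), (10, 10))
```

      The same option, (self=10; other=5), is the self-interested alternative in the first pair and the competitive one in the second, which is why a choice can only be interpreted relative to the option it was paired with.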

      (8) "In phase 1, both CON and BPD participants made prosocial choices over competitive choices with similar frequency (CON=9.67[3.62]; BPD=9.60[3.57])" please report t-test - the same applies also various times below.

      We have now included the t test statistics with each instance.

      ‘In phase 3, both CON and BPD participants continued to make equally frequent prosocial versus competitive choices (CON=9.15[3.91]; BPD=9.38[3.31]; t=-0.54, p=0.59); CON participants continued to make significantly less prosocial versus individualistic choices (CON=2.03[3.45]; BPD=3.78 [4.16]; t=2.31, p=0.02). Both groups chose equally frequent individualistic versus competitive choices (CON=10.91[2.40]; BPD=10.18[2.72]; t=-0.49, p=0.62).’

      (9) P 9: "Models M2 and M3 allow for either self-insertion or social contagion to occur independently" what's the difference between M2 and M3?

      Model M2 hypothesises that participants use their own self representation as priors when learning about the other in phase 2, but are not influenced by their partner. M3 hypothesises that participants form an uncoupled prior (no self-insertion) about their partner in phase 2, and their choices in phase 3 are influenced by observing their partner in phase 2 (social contagion). In Figure 1 we illustrate the difference between M2 and M3. In Table 1 we specifically report the parameterisation differences between M2 and M3. We have also now included a correlational analysis of parameters between models to demonstrate the relationship between model parameters of equivalent value between models (Fig S11). We have also force fitted all models (M1-M4) to the data independently and reported group differences within each (see Table S2 and Table S3).

      (10) P 9, last paragraph: I did not understand the description of the Beta model.

      The beta model is outlined in detail in Table 1. We have also clarified the description of the beta model on page 9:

      ‘The ‘Beta model’ is equivalent to M1 in its causal architecture (both self-insertion and social contagion are hypothesized to occur) but differs in richness: it accommodates the possibility that participants might only consider a single dimension of relative reward allocation, which is typically emphasized in previous studies (e.g., Hula et al., 2018).’

      (11) P 9: I wonder whether one could think about more intuitive labels for the models, rather than M1, M2 etc.. This is just a suggestion, as I am not sure a short label would be feasible here.

      Thank you for this suggestion. We apologise that the labelling is not very intuitive. The problem is that given the various terms we use to explain the different processes of generalisation that might occur between self and other, and given that each model is a different combination of each, we felt that numbering them was a lesser evil. We hope that the reader will be able to reference both Figure 1 and Table 1 to get a good feel for how the models and their causal implications differ.

      (12) Model comparison: the information about what was done for model comparison is scant, and little about fit statistics is reported. At the moment, it is hard for a reader to assess the results of the model comparison analysis.

      Model comparison and fitting were conducted using simultaneous hierarchical fitting and random-effects comparison. This is employed through the HBI package (Piray et al., 2019) where the assumptions and fitting procedures are outlined in great detail. In short, our comparison allows for individual and group-level hierarchical fitting and comparison. This overcomes the issue of interdependence between and within model fitting within a population, which is often estimated separately.

      We have outlined this in the methods, although we appreciate that we do not touch upon it until the reader reaches that point. We have added a clarification statement on page 9 to rectify this:

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019; see Methods for more information). We report individual and group-level model responsibility, in addition to protected exceedance probabilities between-groups to assess model dominance.’

      (13) P 14, first paragraph: "BPD participants were also more certain about both types of preference" what are the two types of preferences?

      The two types of preferences are relative (prosocial-competitive) and absolute (individualistic) reward utility. These are expressed as b and a respectively. We have expanded the sentence in question to make this clearer:

      ‘BPD participants were also more certain about both self-preferences for absolute and relative reward ( = -0.89, 95%HDI: -1.01, -0.75; = -0.32, 95%HDI: -0.60, -0.04) versus CON participants (Figure 2B).’

      (14) "Parameter Associations with Reported Trauma, Paranoia, and Attributed Intent" the results reported here are intriguing, but not fully convincing as there is the problem of multiple comparisons. The combinations between parameters and scales are rather numerous. I suggest to correct for multiple comparisons and to flag only the findings that survive correction.

      We have now corrected this and controlled for multiple comparisons through partial correlation analysis, bootstrapping assessment for robustness, permutation testing, and False Discovery Rate correction. We only report those that survive bootstrapping and permutation testing, reporting both corrected (p[fdr]) and uncorrected (p) significance.
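      For readers unfamiliar with the correction step, a minimal sketch follows. It assumes the Benjamini-Hochberg procedure, which is a common choice for FDR control but is not named in this response; the function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for illustration only (not taken from the paper).
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

      An association would then be flagged only if its adjusted value falls below the chosen threshold, alongside the uncorrected p-value.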

      (15) Results page 14 and page 15. The authors compare the various parameters between groups. I would assume that these parameters come from M1 for controls and from M4 for BPD? Please clarify if this is indeed the case. If it is the case, I am not sure this is appropriate. To my knowledge, it is appropriate to compare parameters between groups only if the same model is fit to both groups. If two different models are fit to each group, then the parameters are not comparable, as the parameters have, so to speak, different "meaning" in two models. Now, I want to stress that my knowledge on this matter may be limited, and that the authors' approach may be sound. However, to be reassured that the approach is indeed sound, I would appreciate a clarification on this point and a reference to relevant sources about this approach.

      This is an important point. First, we confirmed all our main conclusions about parameter differences using the maximal model M1 to fit all the participants. We added Supplementary Table 2 to report the outcome of this analysis. Second, we did the same for parameters across all models M1-M4, fitting each to participants without comparison. This is particularly relevant for M3, since at least a minority of participants of both groups were best explained by this model. We report these analyses in Fig S11.

      Since M4 is nested within M1, we argue that this comparison is still meaningful, and note explanations in the text for why the effects noted between groups may occur given the differences in their causal meaning, for example in the results under phase 2 analyses:

      ‘Belief updating in phase 2 was less flexible in BPD participants. Median change in beliefs (from priors to posteriors) about a partner’s preferences was lower versus CON ( = -5.53, 95%HDI: -7.20, -4.00; = -10.02, 95%HDI: -12.81, -7.30). Posterior beliefs about partner were more precise in BPD versus CON ( = -0.94, 95%HDI: -1.50, -0.45;  = -0.70, 95%HDI: -1.20, -0.25).  This is unsurprising given the disintegrated priors of the BPD group in M4, meaning they need to ‘travel less’ in state space. Nevertheless, even under assumptions of M1 and M2 for both groups, BPD showed smaller posterior median changes versus CON in phase 2 (see Table T2). These results converge to suggest those with BPD form rigid posterior beliefs.’

      (16) "We built and tested a theory of interpersonal generalization in a population of matched participants" this sentence seems to be unwarranted, as there is no theory in the paper (actually, as it is now, the paper looks rather exploratory)

      We thank the reviewer for their perspective. Formal models can be used as a theoretical statement on the causal algorithmic process underlying decision making and choice behaviour; the development of formal models is an essential theoretical tool for precision and falsification (Haslbeck et al., 2022). In this sense, we have built several competing formal theories that test, using causal architectures, whether the latent distribution(s) that generate one’s choices generalise into one’s predictions about another person, and simultaneously whether one’s latent distribution(s) that represent beliefs about another person are used to inform future choices.

      Reviewer 3:

      ‘My broad question about the experiment (in terms of its clinical and cognitive process relevance): Does the task encourage competition or give participants a reason to take advantage of others? I don't think it does, so it would be useful to clarify the normative account for prosociality in the introduction (e.g., some of Robin Dunbar's work).’

      We agree that our paradigm does not encourage competition. We use a reward structure that requires participants to exceed a particular threshold before earning rewards, but there is no competitive element to this, in that points earned or not earned by partners have no bearing on the outcomes for the participant. This is important given the consideration of recursive properties that arise through mixed-motive games; we wanted to focus purely on observational learning in phase 2, and repercussion-free choices made by participants in phase 1 and 3, meaning the choices of participants, and the decisions of a partner, are theoretically in line with self-preferences irrespective of the judgement of others. We have included a clearer statement of the structure of this type of task, and more clearly cited the origin for its structure (Murphy & Ackerman, 2011):

      ‘Our present work sought to achieve two primary goals. 1. Extend prior causal computational theories to formalise and test the interrelation between self-insertion and social contagion on learning and behaviour to better probe interpersonal generalisation in health, and 2., Test whether previous computational findings of social learning changes in BPD can be explained by infractions to self-other generalisation. We accomplish these goals by using a dynamic, sequential social value economic paradigm, the Intentions Game, building upon a Social Value Orientation Framework (Murphy & Ackerman, 2011) that assumes motivational variation in joint reward allocation.’

      Given the introduction's structure as it stands, we felt providing another paragraph on the normative assumptions of such a game was outside the scope of this article.

      ‘The finding that individuals with BPD do not engage in self-other generalization on this task of social intentions is novel and potentially clinically relevant. The authors find that BPD participants' tendency to be prosocial when splitting points with a partner does not transfer into their expectations of how a partner will treat them in a task where they are the passive recipient of points chosen by the partner. In the discussion, the authors reasonably focus on model differences between groups (Bayesian model comparison), yet I thought this finding -- BPD participants not assuming prosocial tendencies in phase 2 while CON participant did -- merited greater attention. Although the BPD group was close to 0 on the \beta prior in Phase 2, their difference from CON is still in the direction of being more mistrustful (or at least not assuming prosociality). This may line up with broader clinical literature on mistrustfulness and attributions of malevolence in the BPD literature (e.g., a 1992 paper by Nigg et al. in Journal of Abnormal Psychology). My broad point is to consider further the Phase 2 findings in terms of the clinical interpretation of the shift in \beta relative to controls.’

      This is an important point, which we contextualize within the parameterisation of our utility model. While the shift toward 0 in the BPD participants is indeed more competitive, as the reviewer notes, it is surprisingly centred closely around 0, with only a slight bias to be prosocial (mean = -0.47;  = -6.10, 95%HDI: -7.60, -4.60). Charitably we might argue that BPD participants are expecting more competitive preferences from their partner. However even so, given their variance around their priors in phase 2, they are uncertain or unconfident about this. We take a more conservative approach in the paper and say that given the tight proximity to 0 and the variance of their group priors, they are likely to be ‘hedging their bets’ on whether their partner is going to be prosocial or competitive. While the movement from phase 1 to 2 is indeed in the competitive direction it still lands in neutral territory. Model M4 does not preclude central tendencies at the start of Phase 2 being more in the competitive direction.

      ‘First, the authors note that they have "proposed a theory with testable predictions" (p. 4 but also elsewhere) but they do not state any clear predictions in the introduction, nor do they consider what sort of patterns will be observed in the BPD group in view of extant clinical and computational literature. Rather, the paper seems to be somewhat exploratory, largely looking at group differences (BPD vs. CON) on all of the shared computational parameters and additional indices such as belief updating and reaction times. Given this, I would suggest that the authors make stronger connections between extant research on intention representation in BPD and their framework (model and paradigm). In particular, the authors do not address related findings from Ereira (2020) and Story (2024) finding that in a false belief task that BPD participants *overgeneralize* from self to other. A critical comparison of this work to the present study, including an examination of the two tasks differ in the processes they measure, is important.’

      Thank you for this opportunity to include more of the important work that has preceded the present manuscript. Prior work has tended to focus on either descriptive explanations of self-other generalisation (e.g. through the use of RW type models) or has focused on observational learning instability in the absence of a causal model from where initial self-other beliefs may arise. While the prior work cited by the reviewer [Ereira (2020; Nat. Comms.) and Story (2024; Trans. Psych.)] does examine the inter-trial updating between self-other, it does not integrate a self model into a self’s belief about an other prior to observation. Rather, it focuses almost exclusively on prediction error ‘leakage’ generated during learning about individual reward (i.e. one-sided reward). These findings are important, but lie in a slightly different domain. They also do not cut against ours, and in fact, we argue in the discussion that the sort of learning instability described above and splitting (as we cite from Story et al., 2024; Psych. Rev.) may result from a lack of self anchoring typical of CON participants. Nevertheless we agree these works provide an important premise to contrast and set the groundwork for our present analysis and have included them in the framing of our introduction, as well as contrasting them to our data in the discussion.

      In the introduction:

      ‘The BPD phenotype has been associated with a potential over-reliance on social versus internal cues (Henco et al., 2020), ‘splitting’ of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others’ irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Associative models have also been adapted to characterize  ‘leaky’ self-other reinforcement learning (Ereira et al., 2018), finding that those with BPD overgeneralize (leak updates) about themselves to others (Story et al., 2024). Altogether, there is currently a gap in the direct causal link between insertion, contagion, and learning (in)stability.’

      In the discussion:

      ‘Disruptions in self-to-other generalization provide an explanation for previous computational findings related to task-based mentalizing in BPD. Studies tracking observational mentalizing reveal that individuals with BPD, compared to those without, place greater emphasis on social over internal reward cues when learning (Henco et al., 2020; Fineberg et al., 2018). Those with BPD have been shown to exhibit reduced belief adaptation (Siegel et al., 2020) along with ‘splitting’ of latent social representations (Story et al., 2024a). BPD is also shown to be associated with overgeneralisation in self-to-other belief updates about individual outcomes when using a one-sided reward structure (where participant responses had no bearing on outcomes for the partner; Story et al., 2024b). Our analyses show that those with BPD are equal to controls in their generalisation of absolute reward (outcomes that only affect one player) but disintegrate beliefs about relative reward (outcomes that affect both players) through adoption of a new, neutral belief. We interpret this together in two ways: 1. There is a strong concern about social relativity when those with BPD form beliefs about others, 2. The absence of constrained self-insertion about relative outcomes may predispose to brittle or ‘split’ beliefs. In other words, those with BPD assume ambiguity about the social relativity preferences of another (i.e. how prosocial or punitive) and are quicker to settle on an explanation to resolve this. Although self-insertion may be counter-intuitive to rational belief formation, it has important implications for sustaining adaptive, trusting social bonds via information moderation.’

      ‘In addition, perhaps it is fairer to note more explicitly the exploratory nature of this work. Although the analyses are thorough, many of them are not argued for a priori (e.g., rate of belief updating in Figure 2C) and the reader amasses many individual findings that need to be synthesized.’

      We have now noted the primary goals of our work in the introduction, and have included caveats about the exploratory nature of our analyses. We would note that our model is in effect a causal combination of prior work cited within the introduction (Barnby et al., 2022; Moutoussis et al., 2016). This renders our computational models in effect a causal theory to test, although we agree that our dissection of the results is exploratory. We have more clearly signposted this:

      ‘Our present work sought to achieve two primary goals. 1. Extend prior causal computational theories to formalise and test the interrelation between self-insertion and social contagion on learning and behaviour to better probe interpersonal generalisation in health, and 2., Test whether previous computational findings of social learning changes in BPD can be explained by infractions to self-other generalisation. We accomplish these goals by using a dynamic, sequential economic paradigm, the Intentions Game, building upon a Social Value Orientation Framework (Murphy & Ackerman, 2011) that assumes innate motivational variation in joint reward allocation.’

      ‘Second, in the discussion, the authors are too quick to generalize to broad clinical phenomena in BPD that are not directly connected to the task at hand. For example, on p. 22: "Those with a diagnosis of BPD also show reduced permeability in generalising from other to self. While prior research has predominantly focused on how those with BPD use information to form impressions, it has not typically examined whether these impressions affect the self." Here, it's not self-representation per se (typically, identity or one's view of oneself), but instead cooperation and prosocial tendencies in an economic context. It is important to clarify what clinical phenomena may be closely related to the task and which are more distal and perhaps should not be approached here.’

      Thank you for this important point. We agree that social value orientation, and particularly in this economically-assessed form, is but one aspect of the self, and we did not test any others. A version of the social contagion phenomenon is also present in other aspects of the self in intertemporal (Moutoussis et al., 2016), economic (Suzuki et al., 2016) and moral preferences (Yu et al., 2021). It would be most interesting to attempt to correlate the degrees of insertion and contagion across the different tasks.

      We take seriously the wider concern that behaviour in our tasks based on economic preferences may not have clinical validity. This issue is central in the whole field of computational psychiatry, much of which is based on generalizing from tasks like ours, and discussing correlations with psychometric measures. We hope that it is acceptable to leave such discussions to the many reviews on computational psychiatry (Montague et al., 2012; Hitchcock et al., 2022; Huys et al., 2016). Here, we have just put a caveat in the discussion:

      ‘Finally, a limitation may be that behaviour in tasks based on economic preferences may not have clinical validity. This issue is central to the field of computational psychiatry, much of which is based on generalising from tasks like that within this paper and discussing correlations with psychometric measures. Extrapolating  economic tasks into the real world has been the topic of discussion for the many reviews on computational psychiatry (e.g. Montague et al., 2012; Hitchcock et al., 2022; Huys et al., 2016). We note a strength of this work is the use of model comparison to understand causal algorithmic differences between those with BPD and matched healthy controls. Nevertheless, we wish to further pursue how latent characteristics captured in our models may directly relate to real-world affective change.’

      ‘On a more technical level, I had two primary concerns. First, although the authors consider alternative models within a hierarchical Bayesian framework, some challenges arise when one analyzes parameter estimates fit separately to two groups, particularly when the best-fitting model is not shared. In particular, although the authors conduct a model confusion analysis, they do not as far as I could tell (and apologies if I missed it) demonstrate that the dynamics of one model are nested within the other. Given that M4 has free parameters governing the expectations on the absolute and relative reward preferences in Phase 2, is it necessarily the case that the shared parameters between M1 and M4 can be interpreted on the same scale? Relatedly, group-specific model fitting has virtues when one believes there to be two distinct populations, but there is also a risk of overfitting potentially irrelevant sample characteristics when parameters are fit group by group.

      To resolve these issues, I saw one straightforward solution (though in modeling, my experience is that what seems straightforward on first glance may not be so upon further investigation). M1 assumes that participants' own preferences (posterior central tendency) in Phase 1 directly transfer to priors in Phase 2, but presumably the degree of transfer could vary somewhat without meriting an entirely new model (i.e., the authors currently place this question in terms of model selection, not within-model parameter variation). I would suggest that the authors consider a model parameterization fit to the full dataset (both groups) that contains free parameters capturing the *deviations* in the priors relative to the preceding phase's posterior. That is, the free parameters $\bar{\alpha}_{par}^m$ and $\bar{\beta}_{par}^m$ govern the central tendency of the Phase 2 prior parameter distributions directly, but could be reparametrized as deviations from Phase 1 $\theta^m_{ppt}$ parameters in an additive form. This allows for a single model to be fit all participants that encompasses the dynamics of interest such that between-group parameter comparisons are not biased by the strong assumptions imposed by M1 (that phase 1 preferences and phase 2 observations directly transfer to priors). In the case of controls, we would expect these deviation parameters to be centred on 0 insofar as the current M1 fit them best, whereas for BPD participants should have significant deviations from earlier-phase posteriors (e.g., the shift in \beta toward prior neutrality in phase 2 compared to one's own prosociality in phase 1). 
I think it's still valid for the authors to argue for stronger model constraints for Bayesian model comparison, as they do now, but inferences regarding parameter estimates should ideally be based on a model that can encompass the full dynamics of the entire sample, with simpler dynamics (like posterior -> prior transfer) being captured by near-zero parameter estimates.’
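      The reparameterisation proposed in the comment above can be written out as a short sketch. The deviation symbols $\delta_{\alpha}$ and $\delta_{\beta}$, and the use of $\alpha_{ppt}^m$ and $\beta_{ppt}^m$ for the Phase 1 central tendencies contained in $\theta^m_{ppt}$, are notation assumed here for illustration; they do not appear in the manuscript.

$$
\begin{aligned}
\bar{\alpha}_{par}^{m} &= \alpha_{ppt}^{m} + \delta_{\alpha}, \\
\bar{\beta}_{par}^{m} &= \beta_{ppt}^{m} + \delta_{\beta}.
\end{aligned}
$$

      Under this form, direct posterior-to-prior transfer (as assumed by M1) corresponds to $\delta_{\alpha} = \delta_{\beta} = 0$, and a shift toward prior neutrality over relative reward would appear as a non-zero $\delta_{\beta}$.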

      Thank you for the chance to be clearer in our modelling. In particular, the suggestion to include a model that can be fit to all participants with the equivalent of the likes of partial social insertion, to check if the results stand, can actually be accomplished through our existing models. That is, the parameter that governs the flexibility over beliefs in phase 2 under models M1 (dominant for CON participants) and M2 parameterises the degree to which participants think their partner may be different from themselves. Thus, forcibly fitting M1 and M2 hierarchically to all participants, and then separately to BPD and CON participants, can quantify the issue raised: if BPD participants indeed distinguish partners as vastly different from themselves enough to warrant a new central tendency, should be quantitatively higher in BPD vs CON participants under M1 and M2.

      We therefore tested this, reporting the distributional differences between for BPD and CON participants under M1, both when fitted together as a population and as separate groups. As is higher for BPD participants under both conditions for M1 and M2 it supports our claim and will add more context for the comparison - may be large enough in BPD that a new central tendency to anchor beliefs is a more parsimonious explanation.

      We cross-checked this result by assessing the discrepancy between the participant’s and assumed partner’s central tendencies for both prosocial and individualistic preferences via best-fitting model M4 for the BPD group. We thereby examined whether belief disintegration is uniform across preferences (relative vs absolute reward) or whether one tendency was shifted dramatically more than another. We found that beliefs over prosocial-competitive preferences were dramatically shifted, whereas those over individualistic preferences were not.

      We have added the following to the main text results to explain this:

      Model Comparison:

      ‘We found that CON participants were best fit at the group level by M1 (Frequency = 0.59, Protected Exceedance Probability = 0.98), whereas BPD participants were best fit by M4 (Frequency = 0.54, Protected Exceedance Probability = 0.86; Figure 2A). We first analyse the results of these separate fits. Later, in order to assuage concerns about drawing inferences from different models, we examined the relationships between the relevant parameters when we forced all participants to be fit to each of the models (in a hierarchical manner, separated by group). In sum, our model comparison is supported by convergence in parameter values when comparisons are meaningful. We refer to both types of analysis below.’

      Phase 1:

      ‘These differences were replicated when considering parameters between groups when we fit all participants to the same models (M1-M4; see Table S2).’

      Phase 2:

      ‘To check that these conclusions about self-insertion did not depend on the different models, we found that only under M1 and M2 were consistently larger in BPD versus CON. This supports the notion that new central tendencies for BPD participants in phase 2 were required, driven by expectations about a partner’s relative reward. (see Fig S10 & Table S2). and parameters under assumptions of M1 and M2 were strongly correlated with median change in belief between phase 1 and 2 under M3 and M4, suggesting convergence in outcome (Fig S11).’

      ‘Furthermore, even under assumptions of M1-M4 for both groups, BPD showed smaller posterior median changes versus CON in phase 2 (see Table T2). These results converge to suggest those with BPD form rigid posterior beliefs.’

      ‘Assessing this same relationship under M1- and M2-only assumptions reveals a replication of this group effect for absolute reward, but the effect is reversed for relative reward (see Table S3). This accords with the context of each model, where under M1 and M2, BPD participants had larger phase 2 prior flexibility over relative reward (leading to larger initial surprise), which was better accounted for by a new central tendency under M4 during model comparison. When comparing both groups under M1-M4 informational surprise over absolute reward was consistently restricted in BPD (Table S3), suggesting a diminished weight of this preference when forming beliefs about an other.’

      Phase 3

      ‘In the dominant model for the BPD group—M4—participants are not influenced in their phase 3 choices following exposure to their partner in phase 2. To further confirm this we also analysed absolute change in median participant beliefs between phase 1 and 3 under the assumption that M1 and M3 was the dominant model for both groups (that allow for contagion to occur). This analysis aligns with our primary model comparison using M1 for CON and M4 for BPD  (Figure 2C). CON participants altered their median beliefs between phase 1 and 3 more than BPD participants (M1: linear estimate = 0.67, 95%CI: 0.16, 1.19; t = 2.57, p = 0.011; M3: linear estimate = 1.75, 95%CI: 0.73, 2.79; t = 3.36, p < 0.001). Relative reward was overall more susceptible to contagion versus absolute reward (M1: linear estimate = 1.40, 95%CI: 0.88, 1.92; t = 5.34, p<0.001; M3: linear estimate = 2.60, 95%CI: 1.57, 3.63; t = 4.98, p < 0.001). There was an interaction between group and belief type under M3 but not M1 (M3: linear estimate = 2.13, 95%CI: 0.09, 4.18, t = 2.06, p=0.041). There was only a main effect of belief type on precision under M3 (linear estimate = 0.47, 95%CI: 0.07, 0.87, t = 2.34, p = 0.02); relative reward preferences became more precise across the board. Derived model estimates of preference change between phase 1 and 3 strongly correlated between M1 and M3 along both belief types (see Table S2 and Fig S11).’

      ‘My second concern pertains to the psychometric individual difference analyses. These were not clearly justified in the introduction, though I agree that they could offer potentially meaningful insight into which scales may be most related to model parameters of interest. So, perhaps these should be earmarked as exploratory and/or more clearly argued for. Crucially, however, these analyses appear to have been conducted on the full sample without considering the group structure. Indeed, many of the scales on which there are sizable group differences are also those that show correlations with psychometric scales. So, in essence, it is unclear whether most of these analyses are simply recapitulating the between-group tests reported earlier in the paper or offer additional insights. I think it's hard to have one's cake and eat it, too, in this regard and would suggest the authors review Preacher et al. 2005, Psychological Methods for additional detail. One solution might be to always include group as a binary covariate in the symptom dimension-parameter analyses, essentially partialing the correlations for group status. I remain skeptical regarding whether there is additional signal in these analyses, but such controls could convince the reader. Nevertheless, without such adjustments, I would caution against any transdiagnostic interpretations such as this one in the Highlights: "Higher reported childhood trauma, paranoia, and poorer trait mentalizing all diminish other-to-self information transfer irrespective of diagnosis." Since many of these analyses relate to scales on which the groups differ, the transdiagnostic relevance remains to be demonstrated.’
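      The concern raised above, that a scale-parameter correlation may merely recapitulate a group difference, can be illustrated with a first-order partial correlation that controls for a binary group covariate. The sketch below uses purely synthetic numbers and hypothetical variable names; it is not the analysis reported in the manuscript.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after partialling out a single covariate z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Synthetic data: both variables differ by group but are otherwise unrelated.
group = [0, 0, 0, 0, 1, 1, 1, 1]            # hypothetical 0 = CON, 1 = BPD
noise_a = [1, -1, 1, -1, 1, -1, 1, -1]
noise_b = [1, 1, -1, -1, 1, 1, -1, -1]
scale_score = [3 * g + a for g, a in zip(group, noise_a)]
model_param = [3 * g + b for g, b in zip(group, noise_b)]

print(pearson(scale_score, model_param))              # sizeable raw correlation
print(partial_corr(scale_score, model_param, group))  # close to zero once group is controlled
```

      In this constructed example the raw correlation is driven entirely by the group difference, which is why it vanishes once group status is partialled out.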

      We have restructured the psychometric section to ensure transparency and clarity in our analysis. Namely, in response to these comments and those of the other reviewers, we have opted to remove the parameter analyses that aimed to cross-correlate psychometric scores with latent parameters from different models: as the reviewer points out, we do not have parity between dominant models for each group to warrant this, and fitting the same model to both groups artificially makes the parameters qualitatively different. Instead we have opted to focus on social contagion, or rather restrictions on , between phases 1 and 3 explained by M3. This provides us with an opportunity to examine social contagion on the whole population level isolated from self-insertion biases. We performed bootstrapping (1000 reps) and permutation testing (1000 reps) to assess the stability and significance of each edge in the partial correlation network, and then applied FDR correction (p[fdr]), thus controlling for multiple comparisons. We note that while we focused on M3 to isolate the effect across the population, social contagion across both relative and absolute reward under M3 strongly correlated with social contagion under M1 (see Fig S11).

      ‘We explored whether social contagion may be restricted as a result of trauma, paranoia, and less effective trait mentalizing under the assumption of M3 for all participants (where everyone is able to be influenced by their partner). To note, social contagion under M3 was highly correlated with contagion under M1 (see Fig S11). We conducted partial correlation analysis to estimate relationships conditional on all other associations and retained all that survived bootstrapping (1000 reps), permutation testing (1000 reps), and subsequent FDR correction. Persecution and CTQ scores were both moderately associated with MZQ scores (RGPTSB r = 0.41, 95%CI: 0.23, 0.60, p = 0.004, p[fdr]=0.043; CTQ r = 0.354, 95%CI: 0.13, 0.56, p=0.019, p[fdr]=0.02). MZQ scores were in turn moderately and negatively associated with shifts in prosocial-competitive preferences between phase 1 and 3 (r = -0.26, 95%CI: -0.46, -0.06, p=0.026, p[fdr]=0.043). CTQ scores were also directly and negatively associated with shifts in individualistic preferences (r = -0.24, 95%CI: -0.44, -0.13, p=0.052, p[fdr]=0.065). This provides some preliminary evidence that trauma impacts beliefs about individualism directly, whereas trauma and persecutory beliefs impact beliefs about prosociality through impaired mentalising (Figure 4A).’
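
      For readers unfamiliar with this kind of pipeline, the following is a minimal sketch of how partial correlations, a permutation test, bootstrap intervals, and Benjamini-Hochberg FDR correction can be combined. It is not the analysis code used in the study: the simulated variables (`ctq`, `persecution`, `mzq`, `shift`), the sample size, and the choice to shuffle every column independently in the permutation step are all assumptions made purely for illustration.

```python
import numpy as np


def partial_corr(data):
    """Partial correlation matrix, obtained from the inverse correlation matrix."""
    precision = np.linalg.inv(np.corrcoef(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def permutation_pvalues(data, n_perm=1000, seed=0):
    """Two-sided permutation p-value for each edge (upper triangle)."""
    rng = np.random.default_rng(seed)
    n_vars = data.shape[1]
    iu = np.triu_indices(n_vars, k=1)
    observed = partial_corr(data)[iu]
    exceed = np.zeros(observed.shape)
    for _ in range(n_perm):
        # Assumption: shuffle each column independently to break all associations.
        shuffled = np.column_stack(
            [rng.permutation(data[:, j]) for j in range(n_vars)]
        )
        exceed += np.abs(partial_corr(shuffled)[iu]) >= np.abs(observed)
    return observed, (exceed + 1) / (n_perm + 1)


def bootstrap_ci(data, n_boot=1000, seed=1):
    """Percentile 95% bootstrap interval for each edge."""
    rng = np.random.default_rng(seed)
    n, n_vars = data.shape
    iu = np.triu_indices(n_vars, k=1)
    draws = np.empty((n_boot, len(iu[0])))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample participants with replacement
        draws[b] = partial_corr(data[idx])[iu]
    return np.percentile(draws, [2.5, 97.5], axis=0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


# Hypothetical simulated scores for 100 participants (not study data).
rng = np.random.default_rng(42)
n = 100
ctq = rng.normal(size=n)
persecution = rng.normal(size=n)
mzq = 0.4 * ctq + 0.4 * persecution + rng.normal(size=n)
shift = -0.3 * mzq + rng.normal(size=n)
data = np.column_stack([ctq, persecution, mzq, shift])

r_obs, p_raw = permutation_pvalues(data)
ci = bootstrap_ci(data)
p_fdr = benjamini_hochberg(p_raw)
for r, lo, hi, p, q in zip(r_obs, ci[0], ci[1], p_raw, p_fdr):
    print(f"r = {r:+.2f}, 95%CI: {lo:+.2f}, {hi:+.2f}, p = {p:.3f}, p[fdr] = {q:.3f}")
```

      In this sketch an edge would be retained only if its bootstrap interval excludes zero and its FDR-adjusted permutation p-value falls below the chosen threshold; the exact retention rule applied in the manuscript may differ.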

      (1) As far as I could tell, the authors didn't provide an explanation of this finding on page 5: "However, CON participants made significantly fewer prosocial choices when individualistic choices were available" While one shouldn't be forced to interpret every finding, the paper is already in that direction and I found this finding to be potentially relevant to the BPD-control comparison.

      Thank you for this observation. This sentence reports the fact that CON participants were effectively more selfish than BPD participants. This is captured by the lower value reported in Figure 2, and suggests that CON participants were more focused on absolute value (acting in a more ‘economically rational’ manner) than BPD participants. This fits in with the fourth paragraph of our discussion, where we discuss prior work that demonstrates a heightened social focus in those with BPD. Indeed, the finding the reviewer highlights further emphasises the point that those with BPD are much more sensitive to, and motivated to choose, options concerning relative reward than are CON participants. The text in the discussion reads:

      ‘We also observe this in self-generated participant choice behaviour, where CON participants were more concerned over absolute reward versus their BPD counterparts, suggesting a heightened focus on relative vs. absolute reward in those with BPD.’

      (2) The adaptive algorithm for adjusting partner behavior in Phase 2 was clever and effective. Did the authors conduct a manipulation check to demonstrate that the matching resulted in approximately 50% difference between one's behavior in Phase 1 and the partner in Phase 2? Perhaps Supplementary Figure suffices, but I wondered about a simpler metric.

      Thanks for this point. We highlight this in Figure 3B and in its legend, although we appreciate the panel is quite small and may be missed. We have now highlighted this manipulation check more clearly in the behavioural analysis section of the main text:

      ‘Server matching between participant and partner in phase 2 was successful, with participants being approximately 50% different to their partners with respect to the choices each would have made on each trial in phase 2 (mean similarity=0.49, SD=0.12).’
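
      As an illustration of the simpler metric the reviewer asks about, the sketch below computes a per-participant similarity score as the proportion of phase 2 trials on which participant and partner would have made the same choice. The function name and the example choice vectors are hypothetical, and the coding of choices as 1 or 2 is an assumption; the study's own computation may differ in detail.

```python
import numpy as np


def choice_similarity(participant_choices, partner_choices):
    """Proportion of trials on which both agents would pick the same option."""
    a = np.asarray(participant_choices)
    b = np.asarray(partner_choices)
    if a.shape != b.shape:
        raise ValueError("choice vectors must have the same length")
    return float(np.mean(a == b))


# Hypothetical choices over 8 trials (1 = option 1, 2 = option 2).
participant = [1, 1, 2, 2, 1, 2, 1, 2]
partner = [1, 2, 2, 1, 1, 1, 1, 1]
print(choice_similarity(participant, partner))
```

      Averaging this score across participants, and taking its standard deviation, would give summary values of the kind quoted above.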

      (3) The resolution of point-range plots in Figure 4 was grainy. Perhaps it's not so in the separate figure file, but I'd suggest checking.

      Apologies. We have now updated and reorganised the figure to improve clarity.

      (4) p. 21: Suggest changing to "different" as opposed to "opposite" since the strategies are not truly opposing: "but employed opposite strategies."

      We have amended this.

      (5) p. 21: I found this sentence unclear, particularly the idea of "similar updating regime." I'd suggest clarifying: "In phase 2, CON participants exhibited greater belief sensitivity to new information during observational learning, eventually adopting a similar updating regime to those with BPD."

      We have clarified this statement:

      ‘In observational learning in phase 2, CON participants initially updated their beliefs in response to new information more quickly than those with BPD, but eventually converged to a similar rate of updating.’

      (6) p. 23: The content regarding psychosis seemed out of place, particularly as the concluding remark. I'd suggest keeping the focus on the clinical population under investigation. If you'd like to mention the paradigm's relevance to psychosis (which I think could be omitted), perhaps include this as a future direction when describing the paradigm's strengths above.

      We agree the paragraph is somewhat speculative. We have omitted it in aid of keeping the messaging succinct and to the point.

      (7) p. 24: Was BPD diagnosis assessed using unstructured clinical interview? Although psychosis was exclusionary, what about recent manic or hypomanic episodes or Bipolar diagnosis? A bit more detail about BPD sample ascertainment would be useful, including any instruments used to make a diagnosis and information about whether you measured inter-rater agreement.

      Participants diagnosed with BPD were recruited from specialist personality disorder services across various London NHS mental health trusts. The diagnosis of BPD was established by trained assessors at the clinical services and confirmed using the Structured Clinical Interview for DSM-IV (SCID-II) (First et al., 1997). Individuals with a history of psychotic episodes, severe learning disability or neurological illness/trauma were excluded. We have now included this extra detail within our methods in the paper:

      ‘The majority of BPD participants were recruited through referrals by psychiatrists, psychotherapists, and trainee clinical psychologists within personality disorder services across 9 NHS Foundation Trusts in London, and 3 NHS Foundation Trusts across England (Devon, Merseyside, Cambridgeshire). Four BPD participants were also recruited by self-referral through the UCLH website, where the study was advertised. To be included in the study, all participants needed to have, or meet criteria for, a primary diagnosis of BPD (or emotionally-unstable personality disorder or complex emotional needs) based on a professional clinical assessment conducted by the referring NHS trust (for self-referrals, the presence of a recent diagnosis was ascertained through thorough discussion with the participant, whereby two of the four also provided clinical notes). The patient participants also had to be under the care of the referring trust or have a general practitioner whose details they were willing to provide. Individuals with psychotic or mood disorders, recent acute psychotic episodes, severe learning disability, or current or past neurological disorders were not eligible for participation and were therefore not referred by the clinical trusts.’

    1. eLife Assessment

      This work provides important insights into mucosal antibody responses against SARS-CoV-2 following intranasal immunization by characterizing a large number of monoclonal antibodies at both mucosal and non-mucosal sites. The evidence supporting the claims is solid. The demonstrated in vitro antiviral activity of antibodies characterized provides a rationale for developing mucosal vaccines, especially if confirmed in vivo and benchmarked against antibodies generated following intramuscular vaccination.

    2. Reviewer #2 (Public review):

      Summary:

      This research demonstrates the breadth of IgA response as determined by isolating individual antigen-specific B cells and generating mAbs in mice following intranasal immunization of mice with SARS-CoV2 Spike protein. The findings show that some IgA mAb can neutralize the virus, but many do not. Notably, immunization with Wuhan S protein generates a weak response to the omicron variant.

      Strengths:

      Detailed analysis characterizing individual B cells with the generation of mAbs demonstrates the response's breadth and diversity of IgA responses and the ability to generate systemic immune responses.

      Comments on Revision:

      I have re-reviewed the paper and the responses to my and the other reviewers' comments. I feel the authors have adequately addressed them.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Despite evidence suggesting the benefits of neutralizing mucosa-derived IgA in the upper airway in protection against the SARS-CoV-2 virus, all currently approved vaccines are administered intramuscularly, which mainly induces systemic IgG. Waki et al. aimed to characterize the benefits of intranasal vaccination at the molecular level by isolating B cell clones from nasal tissue. The authors found that Spike-specific plasma cells isolated from the spleen of vaccinated mice showed significant clonal overlap with Spike-specific plasma cells isolated from nasal tissue. Interestingly, they could not detect any spike-specific plasma cells in the bone marrow or Peyer's patches, indicating that these nose-derived cells did not necessarily home to and reside in these locations, although the Peyer's patch is not a typical plasma cell niche - rather the lamina propria of the gut would have been a better place to look. Furthermore, they found that multimerization improves the antibody/antigen binding when the antibody is of low or intermediate affinity, but that high-affinity monomeric antibodies do not benefit from multimerization. Lastly, the authors used a competitive ELISA assay to show that multimerization could improve the neutralizing capacity of these antibodies.

      The strength of this paper is the cloning of multiple IgA from the nasal mucosae (n=99) and the periphery (n=114) post-SARS-CoV-2 i.n. vaccination to examine the clonal relationship of this IgA with other sites, including the spleen. This analysis provides novel insights into the nature of the mucosal antibody response at the site where the host would encounter the virus, and whether this IgA response disseminates to other tissues.

      There were also some weaknesses: 

      (1) The finding that multimerization improves binding and neutralization is not surprising as this was observed before by Wang and Nussenzweig for anti-SARS-CoV-2 IgA (authors should cite Enhanced SARS-CoV-2 neutralization by dimeric IgA. Wang et al., Sci. Transl. Med 2021, 13:eabf1555).

      We have cited the paper, and the relevant sentence has been modified as follows (line 51-53); Recent studies have demonstrated that multimeric IgA is more effective and provides greater cross-protection than IgG and M-IgA (Okuya et al., 2020b) (Asahi et al., 2002) (Dhakal et al., 2018) (Asahi-Ozaki et al., 2004) (Wang et al., 2021).

      In addition, as far as I can tell we cannot ascertain the purity of fractions from the size exclusion chromatography thus I wasn't sure whether the input material used in Fig. 4 was a mixed population of dimer/trimer/tetramer?  

      The S-IgAs used in the SPR analysis in Fig. 4 consist of a mixture of dimers, trimers, and tetramers. The observed values indicate the average affinity of the S-IgAs. Please refer to the revised version (line 278-280).

      (2) The flow cytometric assessment of the IgA+ clones from the nasal mucosae was difficult to interpret (Fig. 1B). It was hard for me to tell what they were gating on and subsequently analyzing without an IgA-negative population for reference. 

      We have updated FACS plots to illustrate the presence of IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in Fig. 1B legend. Please find the relevant statements (line 115-120).

      (3) While the i.n. study itself is large and challenging, it would have been interesting to compare an i.m. route and examine the breadth of SARS-CoV-2 variant S1 binding for IgGs as in Fig. 2A. Are the IgA responses derived from the mucosae of greater breadth than systemic IgG responses? Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      I appreciate your consideration. Recent reports indicate that some M-IgA monomers possess neutralizing activity that is equivalent to or less than that of IgGs. However, the opposite phenomenon has also been observed. These results suggest that the Fc does not merely correlate with the degree of increase in antibody reactivity or functionality. We believe the discrepancies in previous studies are due to variations in the binding modes between the epitope and paratope of each antibody clone. Nevertheless, oligomerization enhances the functionality of most monomeric antibody clones, suggesting that the multivalent S-IgA enables a mode of action that is challenging to achieve with a monomeric antibody. Please refer to the revised version (line 399-403).

      Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      We have summarized the characteristics of the four types of nasal IgAs in Fig.7 and in the Discussion. Please refer to the revised version (line 405-422).

      Reviewer #2 (Public Review): 

      Summary: 

      This research demonstrates the breadth of IgA response as determined by isolating individual antigen-specific B cells and generating mAbs in mice following intranasal immunization of mice with SARS-CoV2 Spike protein. The findings show that some IgA mAb can neutralize the virus, but many do not. Notably, immunization with Wuhan S protein generates a weak response to the omicron variant.

      Strengths: 

      Detailed analysis characterizing individual B cells with the generation of mAbs demonstrates the response's breadth and diversity of IgA responses and the ability to generate systemic immune responses. 

      Weaknesses: 

      The data presentation needs clarity, and results show mAb ability to inhibit SARS-CoV2 in vitro. How IgA functions in vivo is uncertain. 

      We conducted an additional experiment using a hamster model and confirmed that S-IgAs can protect against SARS-CoV-2 infection. Please refer to the revised version (line 349-373 and 431-438).

      Reviewer #1 (Recommendations For The Authors): 

      (1) Figure 1A shows antibody titers in nasal lavage fluid and serum of mice post intranasal vaccination with SARS-CoV-2 Spike protein. The Y-axis of this figure is labeled as "U/mg" however these units are not clearly defined. 

      The antibody titers are expressed as optical density (OD450) value per total protein in nasal lavage fluids or serum. Please find the relevant statements (line 113-114).

      Furthermore, what do antibody titers in the nasal lavage fluid and serum look like post-intramuscular vaccination with the same vaccine and dose? Comparison of titers to the intramuscular route as well as to the PBS control would make this data more impactful. 

      We appreciate your consideration. We have not conducted experiments comparing the effects of intramuscular and intranasal administration using the same dosage and adjuvant. Cholera toxin has primarily been used as an adjuvant for nasal immunization, but it is seldom applied for intramuscular injection. We are interested in its impact on the immune compartment when using cholera toxin as an adjuvant for intramuscular injection. We plan to conduct further experiments in the future.

      Lastly, in Figure 1B, the detection of nasal IgG is not shown even though the authors assess nasally-derived IgG in the spleen further into the study.  

      Since the number of lymphocytes that can be collected from the nasal mucosa is limited, there is an insufficient capacity to isolate IgG+ plasma cells after collecting IgA+ plasma cells. Therefore, conducting such an experiment on mice is technically challenging. A larger animal, such as rats, will be necessary to perform this experiment. Further investigation is needed to determine whether antigen-specific IgG+ plasma cells, sharing V-(D)-J with nasal IgA, can be detected in the nasal mucosa.

      (2) There appears to be something amiss with the IgA stain. It is smushed up against the X-axis. Better flow cytometry profiles should be shown. Likewise in Supplemental Fig. 1A, their IgA stain appears to not be working. This must be addressed using positive and negative controls. 

      We have updated the FACS plots to show the IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in the Fig. 1B legend. Please find the relevant statements on line 115-120.

      (3) We do not know the purity of the samples that were subjected to SPR and since the legend of Fig. 4 is partially incorrect, it was difficult to know how this experiment was done. 

      The S-IgA used in the SPR analysis shown in Figure 4 is a mixture of dimers, trimers, and tetramers, and the observed values are believed to reflect the affinity of the S-IgA in the nasal mucosa. Please refer to the revised version (line 278-280).

      (4) Fig. 5 results need to compare with some of the well-characterized mAb (IgG) to understand the biological significance of these neutralizing titres. 

      We have summarized the characteristics of the four types of nasal IgA in Fig. 7 and in the Discussion. Please refer to the revised version (line 405-422).

      Communication of results: 

      (1) Authors could improve the communication of their results by introducing the vaccination protocol in the results section accompanied by a diagram of the vaccination strategy (nature of the Ag, route, and frequency). This could be Fig. 1A.

      A schematic diagram of the vaccination protocol is presented in Fig.1.

      (2) Care should be taken with some of the terminology. Intranasal is the accepted term but authors sometimes use "internasal". The term "immunosuppression" on page 2 could be misleading as it means something different to other audiences. The distinction when speaking about "protection from harmful pathogens" should be made between protection against infection (ie sterilizing immunity) vs protection against disease (ie morbidity and mortality). Instead of "nose", one should say "nasal". Nose-related could be rephrased as "potentially nasal-derived". P.5, line 2 didn't make sense: "IgG+ plasma cells that express nose-related IgA"...

      In many places, Spike is missing its "e".

      We have made the correction accordingly.

      (3) Page 3: The lumping of the human and animal SARS-CoV-2 intranasal studies together is a bit misleading. Very little has worked for intranasal vaccination against SARS-CoV-2 in humans at this point in time (although hopefully that will change soon!). Authors should specify which studies were done in animals and which were done in humans. 

      The manuscript has been revised to include two citations on line 73-75 (Ewer et al., 2021 and Zhu et al., 2023).

      (4) What is ER-tracker? It comes out of nowhere and should be explained why it was used to the reader (as well as why they used the other markers) to sort for Spike-specific PC. 

      ER-Tracker is a fluorescent dye that is highly selective for the endoplasmic reticulum of living cells. Because plasma cells have an expanded endoplasmic reticulum for properly folding and secreting large quantities of antibodies, using ER-Tracker along with anti-CD138 facilitates the isolation of plasma cells from lymphocytes without the need for additional antibodies. Please refer to the revised version for details (line 130-134).

    1. eLife Assessment

      This study uses C. elegans to investigate how the Calcium/Calmodulin-dependent kinase CMK-1 regulates adaptation to thermo-nociceptive stimuli. The authors use compelling approaches to identify Calcineurin as a phosphorylation target of CMK-1 and to investigate the relationship between CMK-1 and Calcineurin using gain and loss of function genetic and pharmacological methods. The findings of this study are valuable as they show that CMK-1 and Calcineurin act in separate neurons in an antagonistic and complex manner to regulate thermo-nociceptive adaptation, and these results may be relevant for understanding some chronic human pain conditions.

    2. Reviewer #1 (Public review):

      Summary:

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring rate of heat evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.

      Conclusions: cmk-1 and tax-6 act in separate habituation processes primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.

      Strengths:

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-6 demonstrate a complex interaction.

      A major concern about this manuscript was the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 does use the words 'habituation' or 'habituation-like' 10 times; however, it uses 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused/conflated; however, they are very different. Sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus; therefore, it can even occur outside of the nervous system. Habituation is a learning process where the nervous system responds less to a repeated stimulus, despite the nervous system (or at least part of it) still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine if habituation or sensory adaptation is occurring. These experiments will allow the results to be interpreted with clarity; without them, it isn't clear what biological process is actually being studied. The authors have accepted this distinction and now correctly call the process adaptation.

      While there was originally some discrepancy between the two in vitro phosphorylation experiments and the in silico predictions, the revision has cleared up the issues.

      Figure 3-S1: This model has been adjusted to more closely fit the data.

      The authors have expanded the discussion about the significance of the sites of cmk-1 and tax-6 function in the neural circuit.

    3. Reviewer #2 (Public review):

      Summary:

      The reduction in a response to a specific stimulus after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans; however, the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie adaptation to a form of noxious stimulus based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive adaptation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive adaptation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive adaptation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermo-nociceptive adaptation. The authors propose a model based on their findings, illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate adaptation to thermo-nociceptive stimuli in a complex manner.

      Strengths:

      - Given the conservation of adaptation across phylogeny, identifying genes and mechanisms that underlie nociceptive adaptation in C. elegans may be relevant for understanding chronic pain in humans.
      - The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.
      - The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and adaptation is elegant.
      - The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron specific promoters in the nematode is a clear strength of the genetic model system.

      Weaknesses:

      - The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive adaptation, thus the physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene, and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.  

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring the rate of heat-evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.  

      Conclusions: cmk-1 and tax-6 act in separate habituation processes, primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.  

      Strengths:  

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-6 demonstrate a complex interaction.

      Thanks a lot for these positive remarks.

      Weaknesses:  

      The major concern about this manuscript is the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 use the words 'habituation' or 'habituation-like' 10 times; however, they use 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused/conflated; however, they are very different. Sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus; therefore, it can even occur outside of the nervous system. Habituation is a learning process where the nervous system responds less to a repeated stimulus, despite the nervous system (or at least part of it) still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to the fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where the application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine if habituation or sensory adaptation is occurring. These experiments will allow the results to be interpreted with clarity; without them, it isn't clear what biological process is actually being studied.

      Thanks for the comment. As this reviewer points out, “adaptation” and “habituation” are often conflated. Many scientists (maybe not the majority, though) use a less stringent definition of the word habituation than the one presented by this reviewer. More particularly, the term habituation is used in human pain research to refer solely to the reduction of response to repeated stimuli, in the absence of a detailed assessment of the more stringent criteria mentioned here (see, e.g., PMID: 22337205; PMID: 18947923; PMID: 17258858; PMID: 20685171; PMID: 15978487). In addition to the practice in pain research, the main reason why we steered toward ‘habituation’ from our previous publication is that it immediately conveys the idea of a response reduction, whereas ‘adaptation’ could in principle be either an up-regulation or a down-regulation of the response (again, based on various definitions). But we agree that using the word “habituation” came at the cost of creating confusion about the exact nature of the process, for those considering the stricter definition of the word “habituation” and those not in the narrower field of pain research. In the revised manuscript, we have thus changed this terminology to “adaptation”. Also following suggestions from Reviewer 2, we have strengthened the description of the protocol in the Results section and clarified why the adaptation phenomenon is not a ‘thermal damage’ effect or ‘fatigue’ effect in the neuro-muscular circuit controlling reversal. One of the most convincing pieces of evidence that it cannot be solely explained by “damage” or “exhaustion” is simply the existence of non-adapting mutants (like cmk-1(lf)) or pharmacological treatments (Cyclosporin A) that block the adaptation effect and enable worms to continuously reverse for hours without any problems.

      While the discrepancy between the in vitro phosphorylation experiments and the in silico predictions was discussed, the substantial discrepancy (over 85% of the substrates in the smaller in vitro dataset were not identified in the larger dataset) between the two different in vitro datasets was not discussed. This is surprising, as these approaches were quite similar, and it may indicate a measure of unreliability in the in vitro datasets (or high false negative rates).

      Thanks for the comment. This is an important aspect which we now more extensively cover in the Discussion section.

      The strong consistency of the CMK-1 recognition consensus sequences across the two in vitro datasets speaks against the unreliability of the analyses. Instead, there are a few points to highlight that explain the somewhat low degree of overlap between the two datasets, which indeed relate to the false negative rates, as this reviewer suggests.

      (1) In the peptide library analysis, trypsin cleavage prior to kinase treatment will leave a charged N- or C-terminus and, in addition, remove part of the protein context required for efficient kinase recognition. This will have a variable effect across the different substrates in the peptide library, depending on the distance between the cleavage site and the phosphosite, but will not affect the native protein library. This effect increases the false negative rate in the peptide library.

      (2) The number and distribution of “available substrate phosphosites” diverge in the two libraries. Indeed, the peptide library is expected to contain a markedly larger diversity of potential CMK-1 substrate sites than the protein library (because the trypsin digestion will reveal substrates that are normally buried in a native protein), but the depth of MS analysis is the same for the two libraries. In somewhat simplistic terms, the peptide-library analysis is prone to be saturated with abundant phosphorylated peptides, which prevents the detection of all phosphosites. If the peptide analysis could have been made deeper, we would probably have increased the overlap (at the cost of increasing the number of false positives too).

      (3) We have chosen quite strict criteria and applied them separately to define each hit list; therefore, we know we have many false negatives in each list, which will naturally reduce the expected overlap.
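
      The three points above share one quantitative consequence: when each hit list misses many true substrates, a small overlap between the lists is expected even if both are reliable. A minimal sketch of that arithmetic follows, under assumptions that are ours for illustration only: both screens draw from the same pool of true substrates, detections are independent, and false positives are ignored. The function name and the sensitivity values are hypothetical and are not taken from the study.

```python
def expected_overlap(sens_a: float, sens_b: float) -> dict:
    """Expected overlap between two hit lists drawn from the same true substrates.

    sens_a, sens_b: hypothetical per-screen sensitivities (1 - false negative rate).
    Assumes independent detection and no false positives (illustrative only).
    """
    if not (0.0 < sens_a <= 1.0 and 0.0 < sens_b <= 1.0):
        raise ValueError("sensitivities must lie in (0, 1]")
    both = sens_a * sens_b  # fraction of true substrates found by both screens
    return {
        # Share of list A that is also recovered in list B (equals sens_b).
        "fraction_of_a_also_in_b": both / sens_a,
        # Share of list B that is also recovered in list A (equals sens_a).
        "fraction_of_b_also_in_a": both / sens_b,
        # Overlap relative to the union of the two lists.
        "jaccard": both / (sens_a + sens_b - both),
    }


if __name__ == "__main__":
    # Hypothetical sensitivities chosen only to illustrate the effect.
    for name, value in expected_overlap(0.5, 0.15).items():
        print(f"{name}: {value:.3f}")
```

      Under these assumptions, the share of one list recovered in the other is simply the sensitivity of the other screen, so strict hit criteria in either screen directly lower the expected overlap.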

      We have now extended the discussion of the limited overlap between the two datasets in a dedicated paragraph in the Discussion. We also clarify that we tend to place more trust in the protein-library dataset (since substrates are in a configuration closer to that in vivo), with those hits also present in the peptide dataset (like TAX-6) being the most convincing, as they could be validated in a second type of experiment.

      Additionally, the rationale for, and distinction between, the two separate in vitro experiments is not made clear.  

      We reasoned that both substrate types have their own benefits and limitations (as discussed in the manuscript), so there was added value in running both. We propose that the subset of targets present in both datasets is the most solid list of candidates. We have reinforced this point in the discussion.

      Line 207: After reporting that both tax-6 and cnb-1 mutants have high spontaneous reversals, it is not made clear why cnb-1 is not further explored in the paper. Additionally, this spontaneous reversal data should be in a supplementary figure.  

      We kept the focus of the article primarily on TAX-6, because it was identified as a CMK-1 target in vitro; CNB-1 was not. Moreover, we didn’t have cnb-1(gf) mutants with which to pursue the analysis, and the constitutively high reversal rate of cnb-1(lf) prevented any further follow-up. We have added a supplementary file to present the spontaneous reversal rates.

      Figure 3 -S1: This model doesn't explain why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement (presumably by reducing the inhibition by tax-6) but the +cyclo A group (inhibited tax-6) showed weaker response decrement, as here there is even further weakened inhibition of tax-6 on this process. Also, the cmk-1(lf) +cyclo A group is labeled as constitutive habituation, however, this doesn't appear to be the case in Figure 3 (seems like a similar initial level and response decrement phenotype to wildtype).  

      Thanks a lot for the comment. We are glad that the presentation of our complex dataset was clear enough to bring the reader to that level of detailed reflection and interpretation on the proposed model. To address the two points raised in this reviewer comment, we made modifications to the model presentation and provide additional clarifications below, where we use the term adaptation instead of habituation (as in the revised Figure):

      Regarding the first point, “why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement … but the +cyclo A group showed weaker response decrement”: this is a very good point, which cannot be easily explained if all the branches (arrows) in the model have the same weight or work as ON/OFF switches. We tried to convey the relative importance of each regulatory effect via the thickness of the arrow lines (which we have now clarified in the legend in the revised ms). The main ‘quantitative’ nuances to take into consideration here originate from two assumptions of the model (which we have clarified in the revised ms):

      Assumption 1: the inhibitory effect of TAX-6 on the CMK-1 anti-adaptation branch and the inhibitory effect of TAX-6 on the CMK-1 pro-adaptation branch are not of the same magnitude (we have further enhanced the line thickness differences in the revised model, top left panel for wild type).

      Assumption 2: the two antagonistic direct effects of CMK-1 on adaptation are not of the same magnitude, most strikingly in the context of CMK-1(gf) mutants.

      In our model, the cyclosporin A treatment alone (bottom left panel) causes a strong boost on the CMK-1 inhibitory branch and a less marked boost on the CMK-1 activator branch (following assumption 1). This causes an imbalance between the two antagonistic direct CMK-1-dependent drives, which reduces (but doesn’t fully block) adaptation. Indeed, we don’t observe a total block of adaptation with cyclosporin A in wild type, the effect being significantly milder than the totally non-adapting phenotypes seen, e.g., in TAX-6(gf) mutants. From there, the question is what happens in the CMK-1(gf) background that would mask the anti-adaptation effect of cyclosporin A. Here assumption 2 is relevant: the CMK-1(gf) pro-adaptation direct branch is always prevalent and tips the regulation toward faster adaptation (the role of TAX-6, and ipso facto that of cyclosporin A, becoming negligible in the CMK-1(gf) background).

      Regarding the second point, “the cmk-1(lf) +cyclo A group is labeled as constitutive habituation”. We regret a confusing word choice in the first version of the manuscript; we intended to mean “normal habituation phenotype” but in the joint absence of antagonistic CMK-1 and TAX-6 regulatory signaling (so the regulation is not like in wild-type, but the phenotype ends up like in wild type). We have modified the label to “normal adaptation” and left a note in the legend that an apparently normal adaptation phenotype seems to be the default situation when the two antagonistic regulatory pathways are shut off.

      More discussion of the significance of the sites of cmk-1 and tax-6 function in the neural circuit should take place. Additionally, incorporating the suspected loci of cmk-1 and tax-6 in the neural circuit into the model would be interesting (using proper hypothetical language). For example, as it seems like AFD is not required for the naïve reversal response but just its reduction, cmk-1 activity in AFD might be generating inhibition of the reversal response by AFD. It certainly would be understandable if this isn't workable, given extrasynaptic signaling and other unknowns, but it potentially could also be helpful in generating a working model for these complex interactions. For example, cmk-1 induces AIZ inhibition of AVA (AIZ is electrically coupled to AFD), and tax-6 reduces RIM activation of AVA (these neurons are also electrically coupled according to the diagram). RIM is also a neuropeptide-rich neuron, so this could allow it to interact with the cmk-1-related process(es) in AFD. Some discussion of possibilities like this could be informative.

      Thanks for the comment. These hypothetical inter-cellular communication pathways are indeed nice possibilities. On the other hand, we could envision several additional pathways. While RIM is indeed a neuropeptide-rich neuron, all these neurons actually express neuropeptides. Following this helpful suggestion, we have slightly expanded the discussion of hypothetical cellular pathways that can be modulated downstream of CMK-1 in AFD. We also slightly lengthened the discussion to mention hypothetical post-synaptic targets of TAX-6 within interneurons based on the literature.

      Provide an explanation for why some of the experiments in Figure 4 have such a high N, compared to other experiments.  

      The conditions with the highest n correspond to conditions that we have also used as ‘control’ conditions for other types of experiments in the lab and as part of side projects, and whose data could be pooled for the present article. We have been working with cmk-1(lf) and tax-6(gf) mutants for many years, and their robust non-adapting phenotype was a reference point and a quality control when analyzing other non-adapting mutants.

      Because the loss of function and gain of function mutations in cmk-1 have a similar effect, it is likely that this thermosensory plasticity phenotype is sensitive to levels of cmk-1 activity. Therefore, it is not surprising that the cmk-1 promoter failed to rescue very well as these plasmid-driven rescues often result in overexpression. Given this and that the cmk-1p rescue itself was so modest, these rescue experiments are not entirely convincing (and very hard to interpret; for example, is the AFD rescue or the ASER rescue more complete? The ASER one is actually closer to the cmk-1p rescue). Given the sensitivity to cmk-1 activity levels, a degradation strategy would be more likely to deliver clear results (or perhaps even the overactivation approach used for tax-6).  

      Thanks for the comment. We respectfully disagree with this reviewer’s statement that “the loss of function and gain of function mutations in cmk-1 have a similar effect”. We suspect a confusion here, because our data clearly show that these two mutant types have opposite phenotypes. That being said, we interpret the weak rescue effect with cmk-1p as a probable result of overexpression or incomplete/imbalanced expression across neurons (as the promoter used might not include all the relevant regulatory regions). We dedicated considerable effort to establishing an endogenous CMK-1::degron knock-in for tissue-specific auxin-induced degradation (AID), but we were unfortunately not able to obtain consistent results. The only useful data regarding the CMK-1 place of action are therefore the cell-specific rescue data already included in the report.

      Reviewer #2 (Public review):  

      Summary:  

      The reduction in a response to a specific stimulus after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans, however, the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie habituation to a form of noxious stimuli based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive habituation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive habituation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive habituation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermo-nociceptive habituation. The authors propose a model based on their findings illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate habituation to thermo-nociceptive stimuli in a complex manner.

      Strengths:  

      (1) Given the conservation of habituation across phylogeny, identifying genes and mechanisms that underlie nociceptive habituation in C. elegans may be relevant for understanding chronic pain in humans.  

      (2) The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.  

      (3) The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and habituation is elegant.  

      (4) The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron-specific promoters in the nematode is a clear strength of the genetic model system.  

      Thanks a lot for these positive remarks.

      Weaknesses:  

      (1) The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive habituation which disrupts the logical flow of the manuscript.  

      We understand this point and we have carefully considered (and reconsidered) the way to articulate the report. However, we could not present the story much differently, as we would have had no justification to investigate the role of TAX-6 and its interaction with CMK-1 had we not first identified it as a phospho-target in vitro. Carefully considering this point, we found that the abstract of the first manuscript version was probably too cursory and liable to raise wrong expectations among readers. We have thus extensively revised the abstract to clarify this point. Furthermore, we have reinforced this point in the last paragraph of the Introduction and in the concluding paragraph of the Discussion.

      (2) The physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

      We do agree and have explicitly mentioned this aspect in the abstract, at the end of the introduction, and in the discussion section.

      (3) It is not clear if Calcineurin is already a known substrate of CaM Kinases in other systems or if this finding is new.  

      We are not aware of any study showing that Calcineurin is a direct target of CaM kinase I. However, it was found to be a substrate of CaM kinase II as well as of other kinases, as we explicitly presented in the discussion section. We have added text stating that, to our knowledge, Calcineurin has not so far been reported to be a CaM kinase I substrate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):  

      (1) The authors might consider reorganizing the results, so that the substrate phosphorylation analysis follows the cmk-1 habituation data, as it may not be clear to the reader why you are looking for substrates downstream of cmk-1 at that point. Or the authors could mention the previous habituation data for cmk-1 at the beginning of the results.  

      Thank you. This is something that we considered while (re-)writing. However, we prefer to keep the CMK-1 data side by side with the TAX-6 data in the Results section. Nevertheless, we have modified the last paragraph of the Introduction to improve the transition and to justify the specific interest of searching for CMK-1 targets in the context of the present study.

      (2) Line 209: 'controls' is too strong a word. 'regulates' would be better, and it should be stated that this is for 'spontaneous reversal behavior'.  

      Thank you. This was modified.

      (3) Line 359: we suspect that these reflect functional enrichments.  

      We don’t see what exactly would be wrong with the original sentence. The proposed change (if it is a proposed change) would completely obliterate the intended meaning of our sentence. We rewrote the sentence to be as clear as possible, as follows: “Even if we cannot rule out an actual inclination of the CaM kinase pathway to regulate these processes, we suspect that these GO term enrichments rather reflect an analytical bias toward abundant proteins.”

      (4) Line 563: In this subsection, it is not made clear when the T0 and T60 heat pulses are given, in relation to the 20s ISI heat pulses given for 60 minutes. Are they the first and last pulse, or given some time before or after this train of heat pulses?  

      Thanks for spotting this poor description, which we have improved in the revised manuscript. The heat pulse recordings are made immediately before and immediately after the 60 min of repeated stimulation. After the T0 heat pulse recording, there is a period of about 30 s (post-stimulus recording plus transfer from the recording device (INFERNO) to the habituation device (ThermINATOR)). For the T60 acquisition, there is a lag of about 50 s between the last ‘habituation’ stimulus and the recording stimulus (time needed to move the plate between the habituation device and the recording device, plus 40 s of baseline reversal recording in the absence of heat stimuli).

      Reviewer #2 (Recommendations for the authors):  

      (1) There appears to be little to no connection between the phosphorylation site discovered in Calcineurin (S443) and the behavioral phenotypes being studied. What is the thermo-nociceptive response if phosphorylation of S443 in Calcineurin is blocked (using a S443A mutation) and/or combined with CMK-1 gain of function?  

      Thanks for the suggestion. The suggested analysis is complicated by several factors. First, the tax-6(lf) mutant is not directly suitable for rescue analysis (until we have identified a way to restore baseline reversal), so we cannot use a S443A-carrying rescue transgene. Second, the truncated TAX-6(GF) mutant lacks the C-terminal part, including S443, so we cannot introduce S443A in this context. The remaining approach would be to modify the endogenous locus. This again is complicated by the fact that S443 exists in two different isoforms (with conserved RxxS motifs in two different alternative exons). It will be very difficult to perform these experiments until we know more about the expression pattern and function of the respective isoforms. This is work in progress, but this analysis will need to await a future publication.

      (2) The authors should state clearly if Calcineurin is a novel substrate of CaM Kinase or if this is already known in the field.  

      We have added text stating that, to our knowledge, Calcineurin has not so far been reported to be a CaM kinase I substrate.

      (3) The logical flow of the manuscript could be improved given that CMK-1 and Calcineurin appear to act in different cells to regulate nociceptive habituation.  

      As detailed above, we have considered this point carefully and modified the introduction and the abstract. The discussion about the two places of action was also improved.

      (4) More detail about the experimental methods used for the heat-evoked reversals should be included in the Results section.  

      Thanks for the suggestion. We have improved the description in the Methods section and expanded the partial description in the Results section, so that readers can hopefully proceed without needing to go back and forth to the Methods.

      (5) Check for typos. For example: line 197 - fix typo "...to a series repeated heat stimulation...".  

      Thank you. We have carefully read the revised manuscript to correct remaining typos.

    1. eLife Assessment

      The authors developed a methodology to graft antigenic surface loops on influenza virus neuraminidases. The hybrid proteins retained the structure of the neuraminidase scaffold and the antigenicity of the grafted loops. This fundamental work should help in developing novel neuraminidase constructs for use in influenza virus vaccines. The paper presents compelling evidence supporting the conclusions arrived at by the authors.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript described a structure-guided approach to graft important antigenic loops of the neuraminidase to a homotypic but heterologous NA. This approach allows the generation of well-expressed and thermostable recombinant proteins with antigenic epitopes of choice to some extent. The loop-grafted NA was designated hybrid.

      Strengths:

      The hybrid NA appeared to be more structurally stable than the loop-donor protein while acquiring its antigenicity. This approach is of value when developing a subunit NA vaccine that is difficult to express, as antigenic loops could potentially be grafted onto a stable NA scaffold to transfer strain-specific antigenicity.

    3. Reviewer #2 (Public review):

      In their manuscript, Rijal and colleagues describe a 'loop grafting' strategy to enhance expression levels and stability of recombinant neuraminidase. The work is interesting and important.

      Major points from first round of review:

      (1) The authors overstress the importance of the epitopes covered by the loops they use and play down the importance of antibodies binding to the side, the edges, or the underside of the NA. A number of papers describing those mAbs are also not included.

      (2) The rationale regarding the PR8 hybrid is not well described and should be described better.

      (3) Figure 3B and 6C: This should be given as numbers (quantified), not as '+'.

      (4) Figure 5A and 7A: Negative controls are missing.

      (5) The authors claim that they generate stable tetramers. Judging from SDS-PAGE provided in Supplementary Figure 3B (BS3-crosslinked), many different species are present including monomers, dimers, tetramers, and degradation products of tetramers. In line 7 for example there are at least 5 bands.

      [Editors' note: the authors have appropriately responded to and addressed these points.]

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript described a structure-guided approach to graft important antigenic loops of the neuraminidase to a homotypic but heterologous NA. This approach allows the generation of well-expressed and thermostable recombinant proteins with antigenic epitopes of choice to some extent. The loop-grafted NA was designated hybrid.

      Strengths:

      The hybrid NA appeared to be more structurally stable than the loop-donor protein while acquiring its antigenicity. This approach is of value when developing a subunit NA vaccine that is difficult to express, as antigenic loops could potentially be grafted onto a stable NA scaffold to transfer strain-specific antigenicity.

      Weaknesses:

      However, major revisions are needed to better organize the text and figures and to clarify a number of points. There are a few cases in which a later figure was described first, data in the figures were not sufficiently described, or where there were mismatched references to figures.

      More importantly, the hybrid proteins did not show any of the advantages over the loop-donor protein in the format of VLP vaccine in mouse studies, so it's not clear why such an approach is needed to begin with if the original protein is doing fine.

      We thank the reviewer for their helpful comments. We have incorporated feedback from the reviewers to improve the manuscript. Please see our point-by-point response.

      The purpose of loop-grafting between H5N1/2021 (a high-expressor) and the PR8 virus was not to improve the expression of PR8 NA, which is already well expressed. Instead, the loop-grafting and the in vivo experiments were done to show loop-specific protection following a lethal PR8 virus challenge.

      Reviewer #2 (Public review):

      In their manuscript, Rijal and colleagues describe a 'loop grafting' strategy to enhance expression levels and stability of recombinant neuraminidase. The work is interesting and important, but there are several points that need the authors' attention.

      Major points

      (1) The authors overstress the importance of the epitopes covered by the loops they use and play down the importance of antibodies binding to the side, the edges, or the underside of the NA. A number of papers describing those mAbs are also not included.

      We have discussed the distribution of epitopes on the NA molecule in the Discussion section "The distribution of epitopes in neuraminidase" (new line number 350). In Supplementary Figures 1 and 2, we have compiled the epitopes reported for polyclonal sera and mAbs via escape virus selection or crystal structure studies. There are 45 example residues from escape virus selection, and we found that approximately 90% of the epitopes are located within the top loops (Loops 01 and Loops 23, which include the lateral sides and edges of NA). We have also included the epitopes of the underside mAbs NDS.1 and NDS.3 in Supplementary Figure 2. Some of the interactions formed by these mAbs are also within the L01 and L23 loops. All relevant references are cited in Supplementary Figures 1 and 2.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      (2) The rationale regarding the PR8 hybrid is not well described and should be described better.

      We described the rationale for the PR8 hybrid (new lines 247-250). For clarity, we have added the following sentence within the section "Loop transfer between two distant N1 NAs:...."

      (new lines 255-258):

      "mSN1 showed sufficient cross-reactivity to N1/09 to protect mice against virus challenge. Therefore, we performed loop transfer between mSN1 and PR8N1, which differ by 18 residues within the L01 and L23 loops and show no or minimal cross-reactivity, to assess the loop-specific protection."

      (3) Figure 3B and 6C: This should be given as numbers (quantified), not as '+'.

      We have included the numerical data in Supplementary Figure 6. The data are presented in a semi-quantitative manner for simplicity. To improve clarity, we have now added the following sentence to the Figure 3c legend: "Refer to Supplementary Figure 6 for binding titration data".

      (4) Figure 5A and 7A: Negative controls are missing.

      A pool of Empty VLP sera was included as a negative control, showing no inhibition at 1:40 dilution. In the figure legends, we have stated "Pooled sera to unconjugated mi3 VLP was negative control and showed no inhibition at 1:40 dilution (not included in the graphs)"

      (5) The authors claim that they generate stable tetramers. Judging from SDS-PAGE provided in Supplementary Figure 3B (BS3-crosslinked), many different species are present including monomers, dimers, tetramers, and degradation products of tetramers. In line 7 for example there are at least 5 bands.

      The tetrameric conformation of the soluble proteins is evidenced by the size-exclusion chromatograms shown in Figures 3a and 6b. The BS3-crosslinked SDS-PAGE data are only suggestive, indicating that the protein is a tetramer if a band appears at ~250 kDa. However, depending on the reaction conditions, lower molecular weight bands may also be observed if crosslinking is incomplete.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      - Description of Figure 2 on page 3 should go before Figure 3 lines 87-105 or swap the order of the two figures.

      We have moved lines 91-96, which refer to Figure 3, to appear after Figure 2.

      - Figure 3a, an EC50 should be calculated for both NA activity assays.

      Figure 3a has been updated to include the EC50 and AUC (Area under curve) values for both NA activity assays. The same update has also been made for Figure 6b.

      - Line 150, I'm not sure it's appropriate to cite a manuscript that was in preparation but not published. I'm referring to the two mAbs AG7C and AF9C that were claimed to bind to the L01 and L23 loops but not.

      We have changed the "manuscript in preparation" to "personal communication with Dr. Yan Wu, Capital Medical University".

      - The description in Figure 4a is lacking.

      We have added a detailed description for Figure 4a.

      - Figure 4c, sufficient description is needed. For example, the cavity should be outlined and annotated, what is the role of Val149? Why the first monomer is assigned a number of II and the second monomer with a number of I.

      We have added a detailed description for Figure 4c and amended the figure as per the reviewer’s suggestions.

      - Figure 5a, in addition to ELLA data to mSN1 and N1/09, ELLA data to N1/19 should also be measured and shown. Figure S7, please show IC50 instead of curves for better comparison.

      We included IC50 values for mSN1 and N1/09 as we intended to associate the loops with protection. Graphs for N1/19 have not been reported, but the IC50 titres from pooled sera are shown in Supplementary Figure 7 as a representation. Owing to the limited serum volume obtained from tail-vein bleeds, these assays were performed using pooled sera, which represent the total response (as established in a number of experiments).

      - Line 234-238, the author made a statement about the data shown in Figure 7b "These results mirrored several studies in the literature which showed that immunization with the 2009 N1 could provide at least partial protection in mice and ferrets to the avian H5N1 challenge". The data did not reflect that. In Figure 5b, mSN1 protects as well as other proteins. In fact, there was no advantage of N109 and N109 hybrid over mSN1 in protection against the homologous H1N109. Although higher levels of NAI antibodies were induced with the homologous protein in Figure 5a. The protection could be contributed by non-NAI antibodies, so the authors should measure binding antibodies. The author may increase the challenge dose from 200 LD50 to 1000 LD50 to see a difference due to the strong immunogenicity of the nanoparticles vaccine plus addavax. Otherwise, it looks like loop grafting is not necessary as heterologous NA could broadly protect.

      We agree that mSN1, despite its low NAI titres, was as protective as homologous NA or its hybrid NA against H1N1/09 virus challenge at 200 LD50. There may be additional protective components, including non-NAI antibodies, in the homologous groups that contributed to the protection.

      We assessed sera binding to H1N1/2009 and found that the binding antibody levels were also lower in the mSN1 group. The corresponding graph has now been added in Figure S7d. It was difficult to determine the NAI titre required to confer protection in this experiment. For this reason, we later chose PR8 as the challenge virus to demonstrate loop-specific protection.

      We are uncertain whether a 1000 LD50 challenge would have helped establish a correlation between protection and NAI IC50 titres, as the dose used is already lethal for DBA/2 mice.

      - Why would the authors separate work with N1/09 and N1/19 from PR8 N1? To this reviewer's understanding, they are all the same strategies with increasing numbers of dissimilar residues from N1/09 (12) to N1/19 (16) and to PR8 (18). They are all characterized by the same approaches in vitro and in vivo.

      We had two different goals for making hybrids with N1/09 and PR8 N1, therefore, we have presented these results separately.

      (1) For N1/09 and N1/19, we showed that loop-grafting improved protein yield and stability. Additionally, we showed that the N1/09 hybrid can be as protective as the homologous protein.

      (2) PR8 N1 is a high-yielding protein, so loop grafting did not significantly increase its yield. However, the PR8 virus challenge confirmed loop-specific protection.

      - For in vivo study testing the PR8 construct, although PR8 and PR8 hybrid protect better than the heterologous mSN1, the hybrid again did not show any advantages over the PR8 original proteins.

      That's correct: the PR8 hybrid was not advantageous over the original PR8 protein. However, the purpose of this experiment was to demonstrate loop-specific protection. The PR8 hybrid (PR8 loops - mS scaffold) protected 6/6 mice, whereas the mS hybrid (mS loops - PR8 scaffold) provided no protection.

      - Line 243-249, lack of reference to figures.

      References to Supplementary Figure 7b,c and Figure 2 have been added.

      - What was the reason that the challenge was done with 200 LD50 for 2009 H1N1 and 1000 LD50 for PR8?

      Viruses were titrated in the BALB/c strain for PR8 virus and the DBA/2 strain for X-179A (H1N1/2009) virus. These doses were selected based on their lethality and the time required to reach the endpoint (~20% weight loss) post-infection, which is 5-6 days. Most studies in the literature have used 10 LD50 or higher; thus the virus doses we used are relatively high.

      - Line 268, there is no Figure 5C.

      This was a mistake and has been corrected to Figure 6c.

      - Line 275 what are the readers supposed to see in supplementary Figure 5a? There is not enough description for the referred figures.

      A sentence has been added to Fig S5a description, to make a point about recognition of the NA scaffold by mAb CD6. "Binding by mAb CD6 is predominantly scaffold dependent and occurs across two protomers"

      - The discussion is very long and some of it is not relevant to the study. For example, the role of the tetramerization domain and the basis for structurally stable tetramer formation, were not the focuses of this study.

      We felt it was important to discuss the tetramerisation domain and the basis for stable tetramer formation. A previous study by Ellis et al. used the VASP tetramerisation domain and introduced multiple NA interface mutations to achieve a more stable closed conformation. In contrast, NA proteins used in our study required the tetrabrachion tetramerisation domain to form a properly assembled tetramer.

      In lines 382-383, there is one unfinished sentence.

      This is corrected.

      The definition of the loops is also confusing. Line 381, the author stated that in the N1/19 hybrid design, residue N200S, could have been considered as part of the loop B2L23, and was it not?

      The designation of loop ends should not be rigid but rather based on multiple factors such as their proximity to antigenic epitopes, charge, and hydrophobicity. This is discussed in the "Definition of loops" section.

      - Figure 1a and Figure S2, please provide sufficient descriptions, what do the blocks in different colors mean?

      We have updated the Figure 1a legend to indicate the colours.

      The descriptions for Figures S1 and S2 have also been revised for clarity.

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Line 37: Should be 'Influenza virus neuraminidase'.

      This is corrected.

      (2) Line 65: https://pubmed.ncbi.nlm.nih.gov/35446141/, https://pubmed.ncbi.nlm.nih.gov/33568453/ and https://pubmed.ncbi.nlm.nih.gov/28827718/ indicate that protective mAbs bind all over the NA head domain.

      We have discussed the epitopes on the NA head in detail in the section "The distribution of epitopes on Neuraminidase". In Supplementary Figures 1 and 2, we compiled several studies, including those on polyclonal sera and mAbs epitopes, emphasizing that loops 01 and 23 are the predominant antibody targets (~90%). Some antibodies also bind to the underside of NA. We have discussed and referenced these studies accordingly.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      The first reference has been included in both our discussion and Supplementary figure 1.

      The NA epitopes discussed in the second reference have also been incorporated into our discussion and Supplementary figures 1 and 2. Note that the E258K mutation generated on the NA underside was not relevant to mAbs and was generated randomly by passaging of H3N2 A/New York/PV190/2017 virus.

      The third reference pertains to murine mAbs against influenza B virus NA.

      (3) Lines 71, 72, and throughout: 'et al.' should be in italics.

      All "et al." have been italicised.

      (4) Many abbreviations are not defined including CHO, SDS-PAGE, MUNANA, mi3, HEPES, BSA, TPCK, MWCO, HRP, PBS, TMB, TCID50, LD50, MES, PEG, PGA, MME, PGA-LM.

      The text has been amended to define these abbreviations.

      (5) Line 209: Shouldn't this be ID50 instead of IC50? Also, it is not defined.

      IC50 has been defined.

      (6) Line 210, line 346, line 581-582: No need to capitalize letters at the beginning of words mid-sentence.

      This is amended.

      (7) Line 227: Is 2009 H1N1 NA meant?

      This has been changed to "H1N1/2009 neuraminidase"

      (8) Line 310: Is this really quantitatively true? (see major comment 1).

      Based on the compilation of epitopes from published NA mAbs and polyclonal sera (via escape mutagenesis and NA-Fabs crystal structures), it is accurate to state that the protective epitopes are primarily located within loops 01 and 23.

      Please also refer to our response to minor point 2. 

      (9) Line 352 and throughout the manuscript: 'in vitro' should be in italics.

      This is amended.

      (10) Line 355: https://pubmed.ncbi.nlm.nih.gov/35446141/, https://pubmed.ncbi.nlm.nih.gov/33568453/ and https://pubmed.ncbi.nlm.nih.gov/28827718/ should be included here.

      Studies reporting epitopes on Influenza A neuraminidase have been compiled in Supplementary Figures 1 and 2 and cited appropriately.

      (11) Line 365: https://pubmed.ncbi.nlm.nih.gov/35446141/ and https://pubmed.ncbi.nlm.nih.gov/33568453/ also describe epitopes on the underside of the NA.

      Please refer to the above response to point 10.

      (12) Line 365: Reference https://pubmed.ncbi.nlm.nih.gov/37506693/ is missing here.

      The reference has been added.

      (13) Line 369-371: Is it really a minority?

      In terms of the protective response, the majority of the antibody response is directed towards loops 01 and 23, which form the top antigenic surface. The term 'lateral' is used in some literature to describe NA mAb epitopes; loops 01 and 23 also encompass the lateral regions.

      To clarify this, we have added the following sentence to the Discussion section - "The distribution of epitopes on neuraminidase"

      "It is important to note that loops 01 and 23 include a portion of epitopes that have been described in the literature as side, lateral, or underside (see mAbs NDS.1, NDS.3, and CD6 in Supplementary Fig. 2)"

      Additionally, in our studies in mice, we showed that protection is mediated by antibodies targeting the loops (Figure 7). We are uncertain about the binding response to the NA underside, but the NA-inhibiting and protective response to the underside appears to be minimal.

      Furthermore, Lederhof et al. showed that among the 'underside' mAbs, NDS.1 protected mice against virus challenge, whereas NDS.3 did not. In our analysis (Supplementary Figure 2), NDS.1 makes eight-residue contacts with B4L01 and B5L01, whereas NDS.3 makes five-residue contacts with B3L01 and B4L01.

      (14) Line 530: The A in ELLA already stands for assay.

      This is corrected.

    1. eLife Assessment

      This is an important study that examines the role of TFAM, a protein that helps maintain mtDNA, in mtDNA mutator mice. With convincing evidence, the authors have demonstrated that TFAM's counteractive role in mtDNA mutator mice is tissue-specific. The study does a thorough job of assessing the impact of modulating TFAM levels in a polg mutator mouse model of aging. The authors have thoroughly addressed all the points raised during the first round of review.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Kremer et al. characterizes the tissue-specific responses to changes in TFAM levels and mtDNA copy number in prematurely aging mice (polg mutator model). The authors find that overexpression of TFAM can have beneficial or detrimental effects depending on the tissue type. For instance, increased TFAM levels increase mtDNA copy number in the spleen and improve spleen homeostasis but do not elevate mtDNA copy number in the liver and impair mtDNA expression. Similarly, the consequences of reduced TFAM expression are tissue-specific. Reduced TFAM levels improve brown adipocyte tissue function while other tissues are unaffected. The authors conclude that these tissue-specific responses to altered TFAM levels demonstrate that there are tissue-specific endogenous compensatory mechanisms in response to the continuous mutagenesis produced in the prematurely aging mice model, including upregulation of TFAM expression, elevated mtDNA copy number, and altered mtDNA gene expression. Thus, the impact of genetically manipulating global TFAM expression is limited and there must be other determinants of mtDNA copy number under pathological conditions beyond TFAM.

      Strengths:

      Overall, this is an interesting study. It does a good job of demonstrating that given the multi-functional role of TFAM, the outcome of manipulating its activity is complex.

      Weaknesses:

      No major weaknesses noted. The authors have adopted all our suggestions to improve the clarity of the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Kremer et al. investigates the impact of modulation of expression of TFAM, a key protein involved in mitochondrial DNA (mtDNA) packaging and expression, in mtDNA mutator mice, which carry random mtDNA mutations. While previous research suggested that increasing TFAM could counteract the pathological effects of mtDNA mutations, this study reveals that the effects of TFAM modulation are tissue-specific. These findings highlight the complexity of mtDNA copy number regulation and gene expression, emphasizing that TFAM alone is not the sole determinant of mtDNA levels in contexts where oxidative phosphorylation is impaired. Other factors likely play a significant role, underscoring the need for nuanced approaches when targeting TFAM for therapeutic interventions.

      Strengths:

      The data presented in the manuscript are of high quality and support the major conclusions.

      Comments on revisions:

      The authors have thoroughly addressed all the points raised during the first round of review. Their revisions effectively clarify key aspects of the manuscript, and the additional data and explanations have significantly improved the overall quality of the work. I believe the manuscript is now well-prepared for publication.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This manuscript by Kremer et al. characterizes the tissue-specific responses to changes in TFAM levels and mtDNA copy number in prematurely aging mice (polg mutator model). The authors find that overexpression of TFAM can have beneficial or detrimental effects depending on the tissue type. For instance, increased TFAM levels increase mtDNA copy number in the spleen and improve spleen homeostasis but do not elevate mtDNA copy number in the liver and impair mtDNA expression.

      Similarly, the consequences of reduced TFAM expression are tissue-specific. Reduced TFAM levels improve brown adipocyte tissue function while other tissues are unaffected. The authors conclude that these tissue-specific responses to altered TFAM levels demonstrate that there are tissue-specific endogenous compensatory mechanisms in response to the continuous mutagenesis produced in the prematurely aging mice model, including upregulation of TFAM expression, elevated mtDNA copy number, and altered mtDNA gene expression. Thus, the impact of genetically manipulating global TFAM expression is limited and there must be other determinants of mtDNA copy number under pathological conditions beyond TFAM. 

      Strengths: 

      Overall, this is an interesting study. It does a good job of demonstrating that given the multi-functional role of TFAM, the outcome of manipulating its activity is complex. 

      Weaknesses: 

      No major weaknesses were noted. We have minor suggestions for improving the clarity of the manuscript that are detailed in the "recommendations for the authors" section. 

      We thank the reviewer for the suggestions and addressed them as described in the "recommendations for the authors" section.

      Reviewer #2 (Public review): 

      Summary: 

      This study by Kremer et al. investigates the impact of modulation of expression of TFAM, a key protein involved in mitochondrial DNA (mtDNA) packaging and expression, in mtDNA mutator mice, which carry random mtDNA mutations. While previous research suggested that increasing TFAM could counteract the pathological effects of mtDNA mutations, this study reveals that the effects of TFAM modulation are tissue-specific. These findings highlight the complexity of mtDNA copy number regulation and gene expression, emphasizing that TFAM alone is not the sole determinant of mtDNA levels in contexts where oxidative phosphorylation is impaired. Other factors likely play a significant role, underscoring the need for nuanced approaches when targeting TFAM for therapeutic interventions. 

      Strengths: 

      The data presented in the manuscript is of high quality and supports major conclusions. 

      Weaknesses: 

      The statistical methods used are not clearly described, and some marked nonsignificant results appear visually significant, which raises concerns about data analysis. 

      Data presentation requires improvement. 

      We thank the reviewer for the comments. We updated the text in the Materials and Methods section to state the statistical methods and improved the figures as described in detail in the "recommendations for the authors" section.

      Recommendations for the authors:

      (1) Please include testis data in Figure 2 given previous work by authors showing that elevated mtDNA copy number can improve testis function. It would be interesting to compare the changes in mtDNA copy number in testis to these other tissues.

      We measured mtDNA copy number in testis using the CytB probe and added it as Supplementary figure 2 A.

      (2) The clarity of Table 1 could be improved. It is difficult to know whether the changes in the TFAM to mtDNA ratio are driven by changes in TFAM levels or mtDNA copy number. A suggestion is to include the TFAM and mtDNA values in parenthesis next to each listed ratio.

      We updated Table 1 and included the values of the normalized TFAM and mtDNA levels in parentheses.

      (3) The authors should consider showing TFAM western blot data in Figure 1.

      We thank the reviewer for the suggestion but would like to keep the TFAM western blot data with the other western blot data for the respective tissue.

      (4) The graphs for qPCR data (e.g. Figure 2) show mRNA or mtDNA levels relative to the control, which is always set to 1. Why, then, does the control group display error bars?

      For the normalization of the data to the WT group, we first calculate the average of the values from all the samples of the WT group. We then divide all values from the samples of all groups, including the WT group, by that average value. By doing so, we set the average value of the WT group to 1 and express all values from all samples of all groups, including the WT group, relative to this average value. Differences between the samples of the WT group are hence retained and allow for error calculations and the display of error bars.  
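      The normalization described above can be illustrated with a minimal sketch. The function name, the group labels, and the numbers are hypothetical and chosen only for illustration; they are not data from the study, and this is not the authors' analysis code.

```python
from statistics import mean, stdev


def normalize_to_control(groups, control="WT"):
    """Divide every sample in every group by the mean of the control group."""
    control_mean = mean(groups[control])
    return {
        name: [value / control_mean for value in values]
        for name, values in groups.items()
    }


# Hypothetical raw qPCR values (arbitrary units), for illustration only.
raw = {
    "WT": [2.0, 4.0, 6.0],
    "mutant": [6.0, 8.0, 10.0],
}

normalized = normalize_to_control(raw)

# The control mean becomes 1 by construction ...
print(mean(normalized["WT"]))   # 1.0
# ... but the spread between individual control samples is retained,
# which is why the control bar can still carry error bars.
print(stdev(normalized["WT"]))  # 0.5
```

      Because each control sample is divided by the group average rather than by itself, only the average is pinned to 1 and the sample-to-sample variation survives the normalization.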

      (5) Page 3 second sentence to the last: overexpression of TFAM leads to...? Did the author mean mtDNA?

      We updated the text to “Heterozygous knockout of Tfam in wild-type mice results in ~50% decrease of mtDNA levels, whereas moderate overexpression of Tfam leads to ~50% increase in mtDNA levels25,26”

      (6) The sentence "In summary, mtDNA copy number regulation is more complex than previously assumed and the TFAM-to-mtDNA ratio seems to be finely tuned in a tissue-specific manner" - not clear who assumed (references?) and based on what data, please rephrase.

      We updated the text and it now reads “In summary, mtDNA copy number regulation is more complex than suggested by previous studies23–27 and the TFAM-to-mtDNA ratio seems to be finely tuned in a tissue-specific manner.”

      (7) The significant increase in complex II activity under TFAM overexpression (Figure 3) warrants additional discussion.

      We updated the Results section and it now reads “We detected increased levels of the complex II subunit Succinate Dehydrogenase Complex Iron Sulfur Subunit B (SDHB). Complex II is exclusively nuclear encoded and a compensatory increase upon impaired mitochondrial gene expression has been observed before32.

      We proceeded to measure the enzyme activities of individual OXPHOS complexes in liver mitochondria (Fig. 3C). The complex I and complex IV activities were reduced to about 50% in Polg-/mut; Tfam+/+ mice in comparison with wild-type mice (Fig. 3C). However, we did not see any further alteration of the reduced enzyme activities induced by TFAM overexpression or reduced TFAM expression (Fig. 3C). Interestingly, we detected a significant increase in complex II and complex II + complex III activity upon TFAM overexpression, which can partially be explained by the increased complex II protein levels we observed in Polg-/mut; Tfam+/OE mice (Fig. 3, B and C).”

      (8) The statistical methods used should be explicitly stated. Some results marked as non-significant appear visually significant, for example, mt-Cytb in Figure 2C, Supplementary Figure 2B).

      We updated the text in the Materials and Methods section to state the statistical methods and it now reads “Statistical analysis and generation of graphs were performed with GraphPad Prism v9 software except for quantitative mass spectrometry data which was analyzed and plotted using R as described above. Statistical comparisons were performed using one-way analysis of variance (ANOVA), and post hoc analysis was conducted with Dunnett’s multiple comparisons test. Values of P < 0.05 were considered statistically significant.”
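      For readers unfamiliar with the test named above, the following is a minimal sketch of how a one-way ANOVA F statistic is computed. It is not the authors' analysis, which was run in GraphPad Prism; the function name is hypothetical, and the sketch stops at the F statistic. It does not compute a P value or the Dunnett's post hoc comparisons, which require distribution functions beyond the Python standard library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample lists."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of samples

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three groups, for illustration only.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f_stat, df_b, df_w)
```

      A large F indicates that the differences between group means are large relative to the scatter within groups; the post hoc test then asks which individual groups differ from the control.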

      Minor points: 

      (1) Replace numerical indications of significance with asterisks for consistency.

      We replaced all numerical indications of significance with asterisks.

      (2) Abbreviations SKM and BAT are not defined.

      We removed the mention of SKM (skeletal muscle) as the data from this tissue were not included. The Introduction reads “In contrast, in brown adipose tissue (BAT), a decrease in TFAM levels normalized Uncoupling protein 1 (Ucp1) expression.”

      (3) Use uniform scales across bar graphs in Figure 2 to improve clarity.

      We updated Figure 2 to have uniform scales.

      (4) Remove or increase the transparency of data points in Figure 1A to make group averages more discernible.

      We removed the data points in Figure 1A.

      (5) Add a Y-axis title to Figure 1C.

      We added the Y-axis title “Heart / body weight” to Figure 1C.

      (6) Size of the font used in some figures (4?) is not appropriate.

      We increased the font size for the figures.

      (7) All figure legend titles need work. Insert "expression" after TFAM in the Figure 2 title; change the title to "Modulation of TFAM expression..." in Figure 4.

      The figure legends now read as follows:

      “Figure 2: Modulation of TFAM expression affects mtDNA copy number in a tissue-specific manner.”

      “Figure 4: Alteration of TFAM expression does not affect the heart phenotype of mtDNA mutator mice.”

    1. eLife Assessment

      This important paper describes the regulatory pathway of rRNA synthesis by Meioc-Piwil1 in germ cell differentiation in zebrafish. Using molecular genetic and cytological approaches, the authors provide convincing evidence that Meioc antagonizes Piwil1, which downregulates 45S pre-rRNA synthesis through heterochromatin formation, for spermatocyte differentiation. The results will be of use to researchers in the field of germ cell/meiosis as well as RNA biosynthesis and chromatin.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper Kawasaki et al describe a regulatory role for the PIWI/piRNA pathway in rRNA regulation in Zebrafish. This regulatory role was uncovered through a screen for gonadogenesis defective mutants, which identified a mutation in the meioc gene, a coiled-coil germ granule protein. Loss of this gene leads to redistribution of Piwil1 from germ granules to the nucleolus, resulting in silencing of rRNA transcription.

      Strengths:

      Most of the experimental data provided in this paper is compelling. It is clear that in the absence of meioc, PiwiL1 translocates into the nucleolus and results in downregulation of rRNA transcription. The genetic compensation of meioc mutant phenotypes (both organismal and molecular) through reduction in PiwiL1 levels is evidence for a direct role for PiwiL1 in mediating the phenotypes of the meioc mutant.

      Weaknesses:

      Questions remain on the mechanistic details by which PiwiL1 mediates rRNA downregulation, and whether this is a function of Piwi in an unperturbed/wildtype setting. There is certainly some evidence provided in support of a natural function for piwi in regulating rRNA transcription (figure 5A+5B). However, the de-enrichment of H3K9me3 in the heterozygote (Figure 6F) is very modest and in my opinion not convincingly different relative to the control provided. It is certainly possible that PiwiL1 is regulating levels through cleavage of nascent transcripts. Another aspect I found confounding here is the reduction in rRNA small RNAs in the meioc mutant; I would have assumed that the interaction of PiwiL1 with the rRNA is mediated through small RNAs but the reduction in numbers does not support this model. But perhaps it is simply a redistribution of small RNAs that is occurring. Finally, the ability to reduce PiwiL1 in the nucleolus through polI inhibition with actD and BMH-21 is surprising. What drives the accumulation of PiwiL1 in the nucleolus then if in the meioc mutant there is less transcription anyway?

      Despite the weaknesses outlined, overall I find this paper to be solid and valuable, providing evidence for a consistent link between PIWI systems and ribosomal biogenesis. Their results are likely to be of interest to people in the community, and provide tools for further elucidating the reasons for this link.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors report that Meioc is required to upregulate rRNA transcription and promote differentiation of spermatogonial stem cells in zebrafish. The authors show that upregulated protein synthesis is required to support spermatogonial stem cells' differentiation into multi-celled cysts of spermatogonia. Coiled coil protein Meioc is required for this upregulated protein synthesis and for increasing rRNA transcription, such that the Meioc knockout accumulates 1-2 cell spermatogonia and fails to produce cysts with more than 8 spermatogonia. The Meioc knockout exhibits continued transcriptional repression of rDNA. Meioc interacts with and sequesters Piwil1 to the cytoplasm. Loss of Meioc increases Piwil1 localization to the nucleolus, where Piwil1 interacts with transcriptional silencers that repress rRNA transcription.

      Strengths:

      This is a fundamental study that expands our understanding of how ribosome biogenesis contributes to differentiation and demonstrates that zebrafish Meioc plays a role in this process during spermatogenesis. This work also expands our evolutionary understanding of Meioc and Ythdc2's molecular roles in germline differentiation. In mouse, the Meioc knockout phenocopies the Ythdc2 knockout, and studies thus far have indicated that Meioc and Ythdc2 act together to regulate germline differentiation. Here, in zebrafish, Meioc has acquired a Ythdc2-independent function. This study also identifies a new role for Piwil1 in directing transcriptional silencing of rDNA.

      Comments on revisions:

      Major and minor concerns were addressed in the revision.

    4. Reviewer #3 (Public review):

      Summary:

      The paper describes the molecular pathway to regulate germ cell differentiation in zebrafish through ribosomal RNA biogenesis. Meioc sequesters Piwil1, a Piwi homolog, which suppresses the transcription of the 45S pre-rDNA by the formation of heterochromatin, to the perinuclear bodies.

      Strong points:

      The authors nicely provided the molecular evidence on the antagonism of Meioc to Piwil1 in the rRNA synthesis, which is supported by the genetic evidence that the inability of the meioc mutant to enter meiosis is suppressed by the piwil1 heterozygosity. The authors nicely address my previous points.

      Weak points:

      Although the authors made an effort to revise the text, there are still some points in it that they need to check. Some of them are shown in "Minor points" below. I am sorry that some of them should have been pointed out in my previous review.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Kawasaki et al describe a regulatory role for the PIWI/piRNA pathway in rRNA regulation in Zebrafish. This regulatory role was uncovered through a screen for gonadogenesis defective mutants, which identified a mutation in the meioc gene, a coiled-coil germ granule protein. Loss of this gene leads to redistribution of Piwil1 from germ granules to the nucleolus, resulting in silencing of rRNA transcription.

      Strengths:

      Most of the experimental data provided in this paper is compelling. It is clear that in the absence of meioc, PiwiL1 translocates into the nucleolus and results in downregulation of rRNA transcription. The genetic compensation of meioc mutant phenotypes (both organismal and molecular) through reduction in PiwiL1 levels is evidence for a direct role for PiwiL1 in mediating the phenotypes of the meioc mutant.

      Weaknesses:

      Questions remain on the mechanistic details by which PiwiL1 mediates rRNA downregulation, and whether this is a function of Piwi in an unperturbed/wildtype setting. There is certainly some evidence provided in support of a natural function for piwi in regulating rRNA transcription (figure 5A+5B). However, the de-enrichment of H3K9me3 in the heterozygote (Figure 6F) is very modest and in my opinion not convincingly different relative to the control provided. It is certainly possible that PiwiL1 is regulating levels through cleavage of nascent transcripts. Another aspect I found confounding here is the reduction in rRNA small RNAs in the meioc mutant; I would have assumed that the interaction of PiwiL1 with the rRNA is mediated through small RNAs but the reduction in numbers does not support this model. But perhaps it is simply a redistribution of small RNAs that is occurring. Finally, the ability to reduce PiwiL1 in the nucleolus through polI inhibition with actD and BMH-21 is surprising. What drives the accumulation of PiwiL1 in the nucleolus then if in the meioc mutant there is less transcription anyway?

      Despite the weaknesses outlined, overall I find this paper to be solid and valuable, providing evidence for a consistent link between PIWI systems and ribosomal biogenesis. Their results are likely to be of interest to people in the community, and provide tools for further elucidating the reasons for this link.

      The amount of cytoplasmic rRNA in piwi+/- was increased by 26% on average (Figure 5A+5B), the H3K9 ChIP-qPCR signal was decreased by about 26% (Figure 6F), and the Piwil1 ChIP-qPCR signal was decreased by 35% (Figure 6G), so we do not think there is a big discrepancy. On the other hand, the H3K9 ChIP-qPCR signal in meioc<sup>mo/mo</sup> was increased by about 130% (Figure 6F), while that of Piwil1 was increased by 50%, so there may be a mechanism by which Meioc regulates H3K9 that is not mediated by Piwil1. As for what drives the accumulation of Piwil1 in the nucleolus, although we have found that Piwil1 has affinity for rRNA (Fig. 6A), we do not know what recruits it. Significant increases in the 18-35 nt small RNAs of 18S and 28S rRNAs and R2 were not detected in meioc<sup>mo/mo</sup> testes enriched for 1-8 cell spermatogonia, compared with meioc<sup>+/mo</sup> testes. The nucleolar localization of Piwil1 was revealed in this study and will be a new topic for future research.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors report that Meioc is required to upregulate rRNA transcription and promote differentiation of spermatogonial stem cells in zebrafish. The authors show that upregulated protein synthesis is required to support spermatogonial stem cells' differentiation into multi-celled cysts of spermatogonia. Coiled coil protein Meioc is required for this upregulated protein synthesis and for increasing rRNA transcription, such that the Meioc knockout accumulates 1-2 cell spermatogonia and fails to produce cysts with more than 8 spermatogonia. The Meioc knockout exhibits continued transcriptional repression of rDNA. Meioc interacts with and sequesters Piwil1 to the cytoplasm. Loss of Meioc increases Piwil1 localization to the nucleolus, where Piwil1 interacts with transcriptional silencers that repress rRNA transcription.

      Strengths:

      This is a fundamental study that expands our understanding of how ribosome biogenesis contributes to differentiation and demonstrates that zebrafish Meioc plays a role in this process during spermatogenesis. This work also expands our evolutionary understanding of Meioc and Ythdc2's molecular roles in germline differentiation. In mouse, the Meioc knockout phenocopies the Ythdc2 knockout, and studies thus far have indicated that Meioc and Ythdc2 act together to regulate germline differentiation. Here, in zebrafish, Meioc has acquired a Ythdc2-independent function. This study also identifies a new role for Piwil1 in directing transcriptional silencing of rDNA.

      Weaknesses:

      There are limited details on the stem cell-enriched hyperplastic testes used as a tool for mass spec experiments, and additional information is needed to fully evaluate the mass spec results. What mutation do these testes carry? Does this protein interact with Meioc in the wildtype testes? How could this mutation affect the results from the Meioc immunoprecipitation?

      Stem cell-enriched hyperplastic testes came from wild-type adult sox17::GFP transgenic zebrafish. Sperm were found in these hyperplastic testes, and when stem cells were transplanted, they self-renewed and differentiated into sperm. It is not known if the hyperplasias develop due to a genetic variant in the line. We added the following comment in L201-204.

      “The SSC-enriched hyperplastic testes, which are occasionally found in adult wildtype zebrafish, contain cells at all stages of spermatogenesis. Hyperplasia-derived SSCs self-renewed and differentiated in transplants of aggregates mixed with normal testicular cells.”

      Reviewer #3 (Public review):

      Summary:

      The paper describes the molecular pathway to regulate germ cell differentiation in zebrafish through ribosomal RNA biogenesis. Meioc sequesters Piwil1, a Piwi homolog, which suppresses the transcription of the 45S pre-rDNA by the formation of heterochromatin, to the perinuclear bodies. The key results are solid and useful to researchers in the field of germ cell/meiosis as well as RNA biosynthesis and chromatin.

      Strengths:

      The authors nicely provided molecular evidence for the antagonism of Meioc to Piwil1 in rRNA synthesis, which is supported by the genetic evidence that the inability of the meioc mutant to enter meiosis is suppressed by piwil1 heterozygosity.

      Weaknesses:

      (1) Although the paper provides very convincing evidence for the authors' claim, the scientific contents are poorly written and incorrectly described. As a result, it is hard to read the text. Checking by scientific experts would be highly recommended. For example, on line 38, "the global translation activity is generally [inhibited]", is incorrect and, rather, a sentence like "the activity is lowered relative to other cells" is more appropriate here. See minor points for more examples.

      Thank you for pointing that out. We corrected the passages indicated.

      (2) In some figures, it is hard for readers outside of zebrafish meiosis to evaluate the results without more explanation and drawing.

      We refined Figure 1A and added an explanation of SSCs, sox17::egfp-positive cells, and the SSC-enriched hyperplastic testis in L155-158.

      (3) Figure 1E, F, cycloheximide experiments: Please mention the toxicity of the concentration of the drug in cell proliferation and viability.

      When testicular tissue culture was performed at 0.1, 1, 10, 100, 250, and 500 mM, abnormally strong OP-puro signals, including in nuclei, were found in cells at 10 mM or more. We added these results in Supplemental Figure S2G. In addition, at 1 mM, growth was perturbed in fast-growing ≥32-cell cysts of spermatogonia, but not in 1-4-cell spermatogonia, as described in L127-130.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I don't have any recommendations for improvement. While I have outlined some of the weaknesses of the paper above, I don't see addressing these questions as pertinent for publication of this paper.

      Reviewer #2 (Recommendations for the authors):

      (1) The manuscript uses the terms 1-2 cell spermatogonia, GSC, and SSC throughout the figures and text. For example, 1-2 cell spermatogonia is used in Figure 1C, GSC is used in Figure 1F, and SSC is used in Figure 1 legend. The use of all three terms without definitions as to how they each relate with one another is confusing, particularly to those outside the zebrafish spermatogenesis field. It would be best to only use one term if the three terms are used interchangeably or to define each term if they represent different populations.

      GSC is a writing mistake. In this study, sox17-positive cells, which have been confirmed to self-renew and differentiate (Kawasaki et al., 2016), are considered SSCs. On the other hand, a comparison of meioc and ythdc2 mutants revealed differences in the composition of each cyst, so we describe the number of cysts confirmed. We added new data that 1-2 cell spermatogonia are sox17-positive in Supplemental Figure S3 (L157-158).

      (2) Figure 1B: What does the "SC" label represent in these figure panels?

      We added the explanation in the Figure legend.

      (3) Fig 7B and S7B show incongruent results, and the text implies that Fig S7B data better reflects in vivo biology. It is not clear how the authors interpret the different results between 7B and S7B.

      Thank you for pointing that out. Fig 7A and 7B were obtained by isolating sox17-positive cells. Because it was difficult to detect nucleoli in the isolated cells, probably due to the isolation procedure, we added S7B, which was analyzed in sectioned tissues. As this reviewer pointed out, S7B reflects the in vivo state better, so we changed S7B to 7B and 7B to S7B.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1) For general readers, it is nice to add a scheme of zebrafish spermatogenesis (lines 77-78) together with Figure 1A.

      As mentioned above, we refined Figure 1A.

      (2) Line 28, silence: the word "silence" is too strong here since rDNA is transcribed in some levels to ensure the cell survival.

      Thank you for your comment. We changed "silence" to "maintain low levels."

      (3) Line 60, YTDHC2: Please explain more about what protein YTDHC2 is.

      We added a description of Ythdc2 in the introduction.

      (4) Line 69, Piwil1: Please explain more about what protein Piwil1 is.

      We added a description of Piwil1 in the introduction.

      (5) Figure 1B, sperm: Please show clearly which sperms are in this figure using arrows etc.

      We represented sperm using arrowheads in Fig 1B.

      (6) Figure 1C, SC: Please show what SC is in the legend.

      We added the explanation in the Figure legend.

      (7) Line 83, meiotic markers: should be "meiotic prophase I markers".

      Thank you for pointing out the inaccurate description. We revised it.

      (8) Line 84, phosphor-histone H3: Should be "histone H3 phospho-S10 "

      We revised it.

      (9) Figure S1A, PH3: Please add PH3 is "histone H3 phospho-S10 ".

      We revised it.

      (10) Figure S1A, moto+/-: this heterozygous mutant showed an increased apoptosis. If so, please mention this in the text. If not, please remove the data.

      Thank you for pointing that out. The heterozygous mutant did not increase apoptosis, so we removed the data.

      (11) Line 88, no females developed: This means all the mutants are male. If so, what does Figure S1B show? Are these cells spermatocytes? Is "no oocytes developed" correct here?

      All meioc<sup>mo/mo</sup> zebrafish were males, and the meioc<sup>mo/mo</sup> cells in Fig. S1B are spermatogonia. No spermatocytes or oocytes were observed. To show this, we added "no oocytes" in L90.

      (12) Line 89, initial stages: What do the initial stages mean here? Please explain.

      The “initial stages” was changed to the pachytene stage.

      (13) Figure S1C: The mouse Meioc rectangle lacks its right portion. Please explain in the main text that the two mutations encode a truncated protein.

      We apologize. The portion was lost during preparation of the manuscript, and we have corrected it. In addition, we added a description of the protein truncation in L100-101.

      (14) Line 99: Please explain what "GRCz11" is.

      GRCz11 refers to the version of the zebrafish reference genome assembly. We added this.

      (15) Figure S2A: Dotted lines are cysts. If so, please mention it in the legend.

      We corrected the figure legend.

      (16) Figure S2B and C, B1-4, C1-7: Rather use spermatogonia etc. as a caption here.

      We corrected the figure and figure legend.

      (17) Line 113, hereafter, wildtype: Should be "wild type" or "wild-type".

      We corrected them.

      (18) Figure 1C: Please indicate what dotted lines mean here.

      We added “Dotted lines; 1-2 cell spermatogonia.”

      (19) Line 113, de novo: Please italicize it.

      We corrected it.

      (20) Line 113-116: Figure 1D shows two populations in the protein synthesis (low and high) in the 1-2-cell stage. Please mention this in the text.

      We added a mention of the two populations.

      (21) Line 121, in vitro: Please italicize it.

      We corrected it.

      (22) Line 138-139, Figure 2A: Please indicate the two populations in rRNA concentration (low and high) at the 1-2-cell stage. What percentage of cells falls in each population?

      We added a mention of the two populations and the percentage of cells in each.

      (23) Figure 2B, cytes: Please explain the rRNA expression in spermatocytes (cytes) in the text.

      We added a description of the decrease in rRNA signal intensity in spermatocytes.

      (24) Figure 2A, lines 147, low signals: Figure 2A did not show big differences between wild type and the mutant. What did the authors mean here? Lower levels of rRNAs in the mutant than in wild type. If so, please write the text in that way.

      We think that it is important to note that we were unable to find cells with upregulated rRNA signals, and therefore changed to “could not find cells with high signals of rRNAs and Rpl15 in meioc<sup>mo/mo</sup> spermatogonia”.

      (25) Figure 2E: Please add a schematic figure of a copy of rDNA locus such as Fig. S3A right.

      We added a schema of rDNA locus and primer sites such as Figure S3A right (now Figure 2F) in Figure 2E.

      (26) Figure S3A: This Figure should be in the main Figure. The quantification of Northern blots should be shown as a graph with statistical analysis.

      We added the quantification and moved the panel to the main figures (Figure 2F).

      (27) Figure 4A: Please show single-color images (red or green) with merged ones.

      We added single-color images in the Figure 4A.

      (28) Line 198, Piwil1: Please explain what Piwil1 is briefly.

      We are sorry, but we could not quite understand the meaning of this comment. To show that Piwil1 is located in the nucleolus, we indicated it as (Figure 4A, arrowhead) in L209.

      (29) Line 198, Ddx4-positive: What is "Ddx4-positive"? Explain it for readers.

      Ddx4 is a marker for germinal granules, and the description was changed to reflect this.

      (30) Line 209, Fig. S4D-G: Please mention the method of the detection of piRNA briefly.

      We have described that we have sequenced small RNAs of 18-35 nt. Accordingly, we changed the term piRNA to small RNA.

      (31) Line 217: Please mention piwil1 homozygous mutant are inviable.

      We added that piwil1-/- are viable in L231.

    1. eLife Assessment

      This study provides new insights into the expression profile of ILCs that demonstrate a history of RAG expression. It examines in part the potential intrinsic regulation of RAG expression and seeks to understand how the epigenetic state of ILCs is established, although a full understanding of intrinsic factors is only partially supported. The work provides a convincing and important molecular dataset, and strengthens our understanding of intrinsic regulation, and would be of interest more broadly to cell biologists seeking to understand immune cell development.

    2. Reviewer #1 (Public Review):

      The study starts with the notion that in an AD-like disease model, ILC2s in the Rag1 knock-out were expanded and contained relatively more IL-5+ and IL-13+ ILC2s. This was confirmed in the Rag2 knock-out mouse model.

      By using a chimeric mouse model in which wild-type knock-out splenocytes were injected into irradiated Rag1 knock-out mice, it was shown that even though the adaptive lymphocyte compartment was restored, there were increased AD-like symptoms and increased ILC2 expansion and activity. Moreover, in the reverse chimeric model, i.e. injecting a mix of wild-type and Rag1 knock-out splenocytes into irradiated wild-type animals, it was shown that the Rag1 knock-out ILC2s expanded more and were more active. Therefore, the authors could conclude that the RAG1 mediated effects were ILC2 cell-intrinsic.

      Subsequent fate-mapping experiments using the Rag1Cre;reporter mouse model showed that there were indeed RAGnaïve and RAGexp ILC2 populations within naïve mice. Lastly, the authors performed multi-omic profiling, using single-cell RNA sequencing and ATAC-sequencing, in which a specific gene expression profile was associated with ILC2. These included well-known genes but the authors notably also found expression of Ccl1 and Ccr8 within the ILC2. The authors confirmed their earlier observations that in the RAGexp ILC2 population, the Th2 regulome was more suppressed, i.e. more closed, compared to the RAGnaïve population, indicative of the suppressive function of RAG on ILC2 activity. I do agree with the authors' notion that the main weakness was that this study lacks the mechanism by which RAG regulates these changes in ILC2s.

      The manuscript is very well written and easy to follow, and the compelling conclusions are well supported by the data. The experiments are meticulously designed and presented. I wish to commend the authors for the study's quality.

    3. Reviewer #2 (Public Review):

      Summary:

      The study by Ver Heul et al., investigates the consequences of RAG expression for type 2 innate lymphoid cell (ILC2) function. RAG expression is essential for the generation of the receptors expressed by B and T cells and their subsequent development. Innate lymphocytes, which arise from the same initial progenitor populations, are in part defined by their ability to develop in the absence of RAG expression. However, it has been described in multiple studies that a significant proportion of innate lymphocytes show a history of Rag expression. In compelling studies several years ago, members of this research team revealed that early Rag expression during the development of Natural Killer cells (Karo et al., Cell 2014), the first described innate lymphocyte, had functional consequences.

      Here, the authors revisit this topic, a worthwhile endeavour given the broad history of Rag expression within all ILCs and the common use of RAG-deficient mice to specifically assess ILC function. Focusing on ILC2s and utilising state-of-the-art approaches, the authors sought to understand whether early expression of Rag during ILC2 development had consequences for activity, fitness, or function. Having identified cell-intrinsic effects in vivo, the authors investigated the causes of this, identifying epigenetic changes associated with the accessibility of genes associated with core ILC2 functions.

      The manuscript is well written and does an excellent job of supporting the reader through reasonably complex transcriptional and epigenetic analyses, with considerate use of explanatory diagrams. Overall I think that the conclusions are fair, the topic is thought-provoking, and the research is likely of broad immunological interest. I think that the extent of functional data and mechanistic insight is appropriate.

      Strengths:

      - The logical and stepwise use of mouse models to first demonstrate the impact on ILC2 function in vivo and a cell-intrinsic role. Initial analyses show enhanced cytokine production by ILC2 from RAG-deficient mice. Then through two different chimeric mice (including BM chimeras), the authors convincingly show this is cell intrinsic and not simply as a result of lymphopenia. This is important given other studies implicating enhanced ILC function in RAG-/- mice reflect altered competition for resources (e.g. cytokines).

      - Use of Rag expression fate mapping to support analyses of how cells were impacted - this enables a robust platform supporting subsequent analyses of the consequences of Rag expression for ILC2.

      - Use of snRNA-seq supports gene expression and chromatin accessibility studies - these reveal clear differences in the data sets consistent with altered ILC2 function.

      - Convincing evidence of epigenetic changes associated with loci strongly linked to ILC2 function. This forms a detailed analysis that potentially helps explain some of the altered ILC2 functions observed in ex vivo stimulation assays.

      - Provision of a wealth of expression data and bioinformatics analyses that can serve as valuable resources to the field.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The study starts with the notion that in an AD-like disease model, ILC2s in the Rag1 knockout were expanded and contained relatively more IL-5<sup>+</sup> and IL-13<sup>+</sup> ILC2s. This was confirmed in the Rag2 knock-out mouse model.

      By using a chimeric mouse model in which wild-type knock-out splenocytes were injected into irradiated Rag1 knock-out mice, it was shown that even though the adaptive lymphocyte compartment was restored, there were increased AD-like symptoms and increased ILC2 expansion and activity. Moreover, in the reverse chimeric model, i.e. injecting a mix of wild-type and Rag1 knock-out splenocytes into irradiated wild-type animals, it was shown that the Rag1 knock-out ILC2s expanded more and were more active. Therefore, the authors could conclude that the RAG1 mediated effects were ILC2 cell-intrinsic.

      Subsequent fate-mapping experiments using the Rag1Cre;reporter mouse model showed that there were indeed RAGnaïve and RAGexp ILC2 populations within naïve mice. Lastly, the authors performed multi-omic profiling, using single-cell RNA sequencing and ATAC-sequencing, in which a specific gene expression profile was associated with ILC2. These included well-known genes but the authors notably also found expression of Ccl1 and Ccr8 within the ILC2. The authors confirmed their earlier observations that in the RAGexp ILC2 population, the Th2 regulome was more suppressed, i.e. more closed, compared to the RAGnaïve population, indicative of the suppressive function of RAG on ILC2 activity. I do agree with the authors' notion that the main weakness was that this study lacks the mechanism by which RAG regulates these changes in ILC2s.

      The manuscript is very well written and easy to follow, and the compelling conclusions are well supported by the data. The experiments are meticulously designed and presented. I wish to commend the authors for the study's quality.

      Even though the study is compelling and well supported by the presented data, some additional context could increase the significance:

      (1) The presence of the RAGnaïve and RAGexp ILC2 populations raises some questions on the (different?) origin of these populations. It is known that there are different waves of ILC2 origin (most notably shown in the Schneider et al Immunity 2019 publication, PMID 31128962). I believe it would be very interesting to further discuss or possibly show if there are different origins for these two ILC populations.

      Several publications describe the presence and origin of ILC2s in/from the thymus (PMIDs 33432227, 24155745). Could the authors discuss whether there might be a common origin for the RAGexp ILC2 and Th2 cells from a thymic lineage? If true that the two populations would be derived from different populations, e.g. being the embryonic (possibly RAGnaïve) vs. adult bone marrow/thymus (possibly RAGexp), this would show a unique functional difference between the embryonic derived ILC2 vs. adult ILC2.

      We agree with the Reviewer that our findings raise important questions about ILC ontogeny. These are areas of ongoing investigation for us, and it is our hope this study may inform further investigation by others as well.

      Regarding the Schneider et al study, we have considered the possibility that RAG expression may mark a particular wave of ILC2 origin. In that study, the authors used a tamoxifen-based inducible Cre strategy in their experiments to precisely time the lineage tracing of a reporter from the Rosa26 locus. Those lineage tracing mice would overlap genetically with the RAG lineage tracing mice we used in our current study, thus performing combined timed migration fate mapping and RAG fate mapping experiments would require creating novel mouse strains.

      Similarly, the possible influence of the thymic or bone marrow environment on RAG expression in ILCs is an exciting possibility. Perhaps there are signals common to those environments that can influence all developing lymphocytes, including not only T and B cells but also ILCs, with one consequence being induction of RAG expression. While assessing levels of RAG-experienced ILCs in these tissues using our lineage tracing mouse may hint at these possibilities, conclusive evidence would require more precise control over the timing of RAG lineage tracing than our current reagents allow (e.g. to control for induction in those environments vs migration of previously fate-mapped cells to those environments).

      To answer these questions directly, we are developing orthogonal lineage tracing mouse strains, which can report on both timing of ILC development and RAG expression, but these mice are not available yet. Given the limitations of our currently available reagents, we were careful to focus our manuscript on the skin phenotype and the more descriptive aspects of the RAG-induced phenotype. We have elaborated on these important questions and referenced all the studies noted by the Reviewer in the Discussion section as areas of future inquiry on lines 421-433.  

      (2) On line 104 & Figures 1C/G etc. the authors describe that in the RAG knock-out ILC2 are relatively more abundant in the lineage negative fraction. On line 108 they further briefly mentioned that this observation is an indication of enhanced ILC2 expansion. Since the study includes an extensive multi-omics analysis, could the authors discuss whether they have seen a correlation of RAG expression in ILC2 with regulation of genes associated with proliferation, which could explain this phenomenon?

      We thank the Reviewer for pointing out this opportunity to further correlate our functional and multiomic findings. To address this, we first looked deeper into our prior analyses and found that among the pathways enriched in GSEA analysis of differentially expressed genes (DEGs) between RAG<sup>+</sup> and RAG<sup>-</sup> ILC2s, one of the pathways suppressed in RAG<sup>+</sup> ILC2s was “GOBP_EPITHELIAL_CELL_PROLIFERATION” (Author response image 1). There are a few other gene sets present in other databases such as MSigDB with terms including “proliferation,” but these are often highly specific to a particular cell type and experimental or disease condition (e.g. tissue-specific cancers). We did not find any of these enriched in our GSEA analysis.

      Author response image 1.

      GSEA plot of GOBP epithelial proliferation pathway in RAG-experienced vs RAG-naïve ILC2s.

      The ability to predict cellular proliferation states from transcriptomic data is an area of active research, and there does not appear to be any universally accepted method to do this reliably. We found two recent studies (PMIDs 34762642; 36201535) that identified novel “proliferation signatures.” Since these gene sets are not present in any curated database, we repeated our GSEA analysis using a customized database with the addition of these gene sets. However, we did not find enrichment of these sets in our RAG<sup>+</sup> vs RAG<sup>-</sup> ILC2 DEG list. We also applied our GPL strategy integrating analysis of our epigenomic data to the proliferation signature genes, but we did not see any clear trend. Likewise, our GSEA analysis did not identify any enrichment for apoptotic signatures as a potential mechanism by which RAG may suppress ILC2s.

      Notwithstanding the limitations of inferring ILC2 proliferation states from transcriptomic and epigenomic data, our experimental data suggest RAG exerts a suppressive effect on ILC2 proliferation. To formally test the hypothesis that RAG suppresses proliferation in the most rigorous way, we feel new mouse strains are needed that allow simultaneous RAG fate mapping and temporally restricted fate mapping. We elaborate on this in new additions to the discussion on lines 421-433.

      Reviewer #2 (Public Review):

      Summary:

      The study by Ver Heul et al., investigates the consequences of RAG expression for type 2 innate lymphoid cell (ILC2) function. RAG expression is essential for the generation of the receptors expressed by B and T cells and their subsequent development. Innate lymphocytes, which arise from the same initial progenitor populations, are in part defined by their ability to develop in the absence of RAG expression. However, it has been described in multiple studies that a significant proportion of innate lymphocytes show a history of Rag expression. In compelling studies several years ago, members of this research team revealed that early Rag expression during the development of Natural Killer cells (Karo et al., Cell 2014), the first described innate lymphocyte, had functional consequences.

      Here, the authors revisit this topic, a worthwhile endeavour given the broad history of Rag expression within all ILCs and the common use of RAG-deficient mice to specifically assess ILC function. Focusing on ILC2s and utilising state-of-the-art approaches, the authors sought to understand whether early expression of Rag during ILC2 development had consequences for activity, fitness, or function. Having identified cell-intrinsic effects in vivo, the authors investigated the causes of this, identifying epigenetic changes associated with the accessibility of genes associated with core ILC2 functions.

      The manuscript is well written and does an excellent job of supporting the reader through reasonably complex transcriptional and epigenetic analyses, with considerate use of explanatory diagrams. Overall I think that the conclusions are fair, the topic is thought-provoking, and the research is likely of broad immunological interest. I think that the extent of functional data and mechanistic insight is appropriate.

      Strengths:

      - The logical and stepwise use of mouse models to first demonstrate the impact on ILC2 function in vivo and a cell-intrinsic role. Initial analyses show enhanced cytokine production by ILC2 from RAG-deficient mice. Then through two different chimeric mice (including BM chimeras), the authors convincingly show this is cell intrinsic and not simply as a result of lymphopenia. This is important given other studies implicating enhanced ILC function in RAG-/- mice reflect altered competition for resources (e.g. cytokines).

      - Use of Rag expression fate mapping to support analyses of how cells were impacted - this enables a robust platform supporting subsequent analyses of the consequences of Rag expression for ILC2.

      - Use of snRNA-seq supports gene expression and chromatin accessibility studies - these reveal clear differences in the data sets consistent with altered ILC2 function.

      - Convincing evidence of epigenetic changes associated with loci strongly linked to ILC2 function. This forms a detailed analysis that potentially helps explain some of the altered ILC2 functions observed in ex vivo stimulation assays.

      - Provision of a wealth of expression data and bioinformatics analyses that can serve as valuable resources to the field.

      We appreciate the strengths noted by the Reviewer for our study. We would like to especially highlight the last point about our single cell dataset and provision of supplemental data tables. Although our study is focused on AD-like skin disease and skin draining lymph nodes, we hope that our findings can serve as a valuable resource for future investigation into mechanisms of RAG modulation of ILC2s in other tissues and disease states.  

      Weaknesses:

      - Lack of insight into precisely how early RAG expression mediates its effects, although I think this is beyond the scale of this current manuscript. Really this is the fundamental next question from the data provided here.

      We thank the Reviewer for their recognition of the context of our current work and its future implications. We aimed to present compelling new observations within the scope of what our current data can substantiate. We believe answering the next fundamental question of the mechanisms by which RAG mediates its effects in ILC2s will require development of novel reagents. We are actively pursuing this, and we look forward to others building on our findings as well.

      - The epigenetic analyses provide evidence of differences in the state of chromatin, but there is no data on what may be interacting or binding at these sites, impeding understanding of what this means mechanistically.

      We thank the Reviewer for pointing out this aspect of the epigenomic data analysis and the opportunity to expand the scope of our manuscript. We performed additional analyses of our data to identify DNA binding motifs and infer potential transcription factors that may be driving the effects of a history of RAG expression that we observed. We hope that these additional data, analyses, and interpretation add meaningful insight for our readers.

      We first performed the analysis for the entire dataset and validated that the analysis yielded results consistent with prior studies (e.g. finding EOMES binding motifs as a marker in NK cells). Then, we examined the differences in RAG fate-mapped ILC2s. These analyses are in new Figure S10 and discussed on lines 277-316.  

      We also performed an analysis specifically on the Th2 locus, given the effects of RAG on type 2 cytokine expression. These analyses are in new Figure S12 and discussed on lines 366-378.

      - Focus on ILC2 from skin-draining lymph nodes rather than the principal site of ILC2 activity itself (the skin). This may well reflect the ease at which cells can be isolated from different tissues.

      We appreciate the Reviewer’s insight into the limitations of our study. Difficulties in isolating ILC2s from the skin were indeed a constraint in our study. In particular, we were unable to isolate enough ILC2s from the skin for stimulation and cytokine staining. Given that one of our main hypotheses was that RAG affects ILC2 function, we focused our studies on skin draining lymph nodes, which allowed measurement of the two main ILC2 functional cytokines, IL-5 and IL-13, as readouts in the key steady state and AD-like disease experiments.

      - Comparison with ILC2 from other sites would have helped to substantiate findings and compensate for the reliance on data on ILC2 from skin-draining lymph nodes, which are not usually assessed amongst ILC2 populations.

      We agree with the Reviewer that a broader survey of the RAG-mediated phenotype in other tissues and by extension other disease models would strengthen the generalizability of our observations. Indeed, we did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s (Author response image 2), including the skin and lung, and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host- and donor-derived CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated (Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant (Author response image 2B,D,F,H,J).

      Notwithstanding these results, given that we unexpectedly observed enhanced AD-like inflammation in the MC903 model in Rag1 KO mice, we concentrated our later experiments and analyses on defining the differences in skin draining ILC2s modulated by RAG. Our subsequent findings in the skin provoke many new hypotheses about the role of RAG in ILC2s in other tissues, and our tissue survey in the BM chimera provides additional rationale to pursue similar studies in disease models in other tissues. While this is an emerging area of investigation in our lab, we opted to focus this manuscript on our findings related to the AD-like disease model. We have ongoing studies to investigate other tissues, and we are still in the early stages of developing disease models to expand on these findings. However, if the reviewer feels strongly this additional data should be included in the manuscript, we are happy to add it. Considering the complexity of the data and concepts in the manuscript, we hoped to keep it focused to where we have strong molecular, cellular, and phenotypic outcomes.

      Author response image 2.

      Comparison of immune reconstitution and ILC2 donor proportions in different tissues from BM chimeras. Equal quantities of bone marrow cells from Rag1<sup>-/-</sup> (CD45.2, CD90.2) and WT (CD45.2, CD90.1) C57Bl/6J donor mice were used to reconstitute the immune systems of irradiated recipient WT (CD45.1) C57Bl/6J mice. The proportion of live cells that are donor-derived (CD45.2), host-derived (CD45.1), or parenchymal (CD45<sup>-</sup>) [above] and the proportion of ILC2s that are from Rag1<sup>-/-</sup> (CD90.2) or WT (CD90.1) donors [below] for A,B) skin, C,D) sdLN, E,F) lung, G,H) spleen, and I,J) mLN.

      - The studies of how ILC2 are impacted are a little limited, focused exclusively on IL-13 and IL-5 cytokine expression.

      We agree with the reviewer that our functional readout on IL-5 and IL-13 is relatively narrow. However, this focused experimental design was based on several considerations. First, IL-5 and IL-13 are widely recognized as major ILC2 effector molecules (Vivier et al, 2018, PMID 30142344). Second, in the MC903 model of AD-like disease, we have previously shown a clear correlation between ILC2s, levels of IL-5 and IL-13, and disease severity as measured by ear thickness (Kim et al, 2013, PMID 23363980). Depletion of ILC2s led to decreased levels of IL-13 and IL-5 and correspondingly reduced ear inflammation. However, while ILC2s are also recognized to produce other effector molecules such as IL-9 and Amphiregulin, which are likely involved in human atopic dermatitis (Namkung et al, 2011, PMID 21371865; Rojahn et al, 2020, PMID 32344053), there is currently no evidence linking these effectors to disease severity in the MC903 model. Third, IL-13 is emerging as a key cytokine driving atopic dermatitis in humans (Tsoi et al, 2019, PMID 30641038). Drugs targeting the IL-4/IL-13 receptor (dupilumab), or IL-13 itself (tralokinumab, lebrikizumab), have shown clear efficacy in treating atopic dermatitis. Interestingly, drugs targeting more upstream molecules, like TSLP (tezepelumab) or IL-33 (etokimab), have failed in atopic dermatitis. Taken together, these findings from both mouse and human studies suggest IL-13 is a critical therapeutic target, and thus functional readout, in determining the clinical implications of type 2 immune activation in atopic dermatitis.

      Aside from effector molecules, other readouts such as surface receptors may be of interest in understanding the mechanism of how RAG influences ILC2 function. For example, IL-18 has been shown to be an important co-stimulatory molecule along with TSLP in driving production of IL-13 by cutaneous ILC2s (Ricardo-Gonzalez et al, 2018, PMID 30201992). Our multiomic analysis showed decreased IL-18 receptor regulome activity in RAG-experienced ILC2s, which may be a mechanism by which RAG suppresses IL-13 production. Ultimately, in that study the role of IL-18 in enhancing MC903-induced inflammation through ILC2s was via increased production of IL-13, which was one of our major functional readouts. To clearly define mechanisms like these will require generation of new mice to interrogate RAG status in the context of tissue-specific knockout of other genes, such as the IL-18 receptor. We plan to perform these types of experiments in follow-up studies. Notwithstanding this, we have now included additional discussion on lines 476-508 to highlight why understanding how RAG impacts other regulatory and effector pathways would be an interesting area of future inquiry.

      Reviewer #3 (Public Review):

      In this study, Ver Heul et al. investigate the role of RAG expression in ILC2 functions. While RAG genes are not required for the development of ILCs, previous studies have reported a history of expression in these cells. The authors aim to determine the potential consequences of this expression in mature cells. They demonstrate that ILC2s from RAG1 or RAG2 deficient mice exhibit increased expression of IL-5 and IL-13 and suggest that these cells are expanded in the absence of RAG expression. However, it is unclear whether this effect is due to a direct impact of RAG genes or a consequence of the lack of T and B cells in this condition. This ambiguity represents a key issue with this study: distinguishing the direct effects of RAG genes from the indirect consequences of a lymphopenic environment.

      The authors focus their study on ILC2s found in the skin-draining lymph nodes, omitting analysis of tissues where ILC2s are more enriched, such as the gut, lungs, and fat tissue. This approach is surprising given the goal of evaluating the role of RAG genes in ILC2s across different tissues. The study shows that ILC2s derived from RAG-/- mice are more activated than those from WT mice, and RAG-deficient mice show increased inflammation in an atopic dermatitis (AD)-like disease model. The authors use an elegant model to distinguish ILC2s with a history of RAG expression from those that never expressed RAG genes. However, this model is currently limited to transcriptional and epigenomic analyses, which suggest that RAG genes suppress the type 2 regulome at the Th2 locus in ILC2s.

      We agree with the Reviewer that understanding the role of RAG in ILC2s across different tissues is an important goal. One of the primary inspirations for our paper was the clinical paradox that patients with Omenn syndrome, despite having profound adaptive T cell deficiency, develop AD with much greater penetrance than in the general population. Thus, there was always an appreciation for the likelihood that skin ILC2s have a unique proclivity towards the development of AD-like disease. Notwithstanding this, given the profound differences that can be found in ILC2s based on their tissue residence and disease state (as the Reviewer also points out below), we focused our investigations on characterizing the skin draining lymph nodes to better define factors underlying our initial observations of enhanced AD-like disease in Rag1<sup>-/-</sup> mice. While our findings in skin provoke the hypothesis that similar effects may be observed in other tissues and influence corresponding disease states, we were cautious not to suggest this may be the case by reporting surveys of other tissues without development of additional disease models to formally test these hypotheses. We present this manuscript now as a short, skin-focused study, rather than delaying publication to expand its scope. Truthfully, this project started in 2015 and has undergone many delays with the hopes of newer technologies and reagents coming to add greater clarity. We hope our study will enable others to pursue the goal of understanding the broader effects of RAG in ILC2s, and potentially other innate lymphoid lineages as well.

      We did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s (Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated (Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant (Author response image 2B,D,F,H,J). However, given the lack of correlation to disease readouts in other organ systems, we chose not to include these data in our manuscript. If the Reviewer feels these data should be included, we would be happy to include them as a supplemental figure.

      The authors report a higher frequency of ILC2s in RAG-/- mice in skin-draining lymph nodes, which is expected as these mice lack T and B cells, leading to ILC expansion. Previous studies have reported hyper-activation of ILCs in RAG-deficient mice, suggesting that this is not necessarily an intrinsic phenomenon. For example, RAG-/- mice exhibit hyperphosphorylation of STAT3 in the gut, leading to hyperactivation of ILC3s. This study does not currently provide conclusive evidence of an intrinsic role of RAG genes in the hyperactivation of ILC2s. The splenocyte chimera model is artificial and does not reflect a normal environment in tissues other than the spleen. Similarly, the mixed BM model does not demonstrate an intrinsic role of RAG genes, as RAG1-/- BM cells cannot contribute to the B and T cell pool, leading to an expected expansion of ILC2s. As the data are currently presented, it is expected that a proportion of IL-5-producing cells will come from the RAG1-/- BM.

      The Reviewer raises an important point about the potential cell-intrinsic roles of RAG vs the many cell-extrinsic explanations that could affect ILC2 populations, with the most striking being the lack of T and B cells in RAG knockout mice. It is well-established that splenocyte transfer into T and B cell-deficient mice reconstitutes T cell-mediated effects (such as the T cell transfer colitis model pioneered by Powrie and others), and we were careful in our interpretation of the splenocyte chimera experiment to conclude only that lack of Tregs was unlikely to explain the enhanced AD-like disease in T (and B) cell-deficient mice.

      We agree with the Reviewer that the Rag1<sup>-/-</sup> BM will not contribute to the B and T cell pool. However, BM from the WT mice would be expected to contribute to development of the adaptive lymphocyte pool. Indeed, we found that most of the CD45<sup>+</sup> immune cells in the spleens of BM chimera mice were donor-derived (Author response image 3A), and total levels of B cells and T cells showed reconstitution in a pattern similar to control spleens from donor WT mice, while spleens from donor Rag1<sup>-/-</sup> mice expectedly had essentially no detectable adaptive lymphocytes (Author response image 3B-D). From this, we concluded the BM chimera experiment was successful in establishing an immune environment with the presence of adaptive lymphocytes, and the differences in ILC2 proportions we observed were in the context of developing alongside a normal number of B and T lymphocytes. Notwithstanding the potential role of the adaptive lymphocyte compartment in shaping ILC2 development, since we transplanted equal amounts of WT and Rag1<sup>-/-</sup> BM into the same recipient environment, we are not able to explain how cell-extrinsic effects alone would account for the unequal numbers of WT vs Rag1<sup>-/-</sup> ILC2s we observed after immune reconstitution.

      Author response image 3.

      Comparison of immune reconstitution in BM chimeras to controls. Equal quantities of bone marrow cells from Rag1<sup>-/-</sup> (CD45.2) and WT (CD45.2) C57Bl/6J donor mice were used to reconstitute the immune systems of irradiated recipient WT (CD45.1) C57Bl/6J mice. A) Number of WT recipient CD45.1+ immune cells in the spleens of recipient mice compared to number of donor CD45.2+ cells (WT and Rag1<sup>-/-</sup>) normalized to 100,000 live cells. Comparison of numbers of B cells, CD4+ T cells, and CD8+ T cells in spleens of B) BM chimera mice, C) control WT mice and D) control Rag1<sup>-/-</sup> mice.

      We also subsequently found transcriptional and epigenomic differences in RAG-experienced ILC2s compared to RAG-naïve ILC2s. Critically, these differences were present in ILC2s from the same mice that had developed normally within an intact immune system, rather than in the setting of a BM transplant or a defective immune background such as in Rag1<sup>-/-</sup> mice.

      We recognize that there are almost certainly cell-extrinsic factors affecting ILC2s in Rag1<sup>-/-</sup> mice due to lack of B and T cells, and that BM chimeras are not perfect substitutes for simulating normal hematopoietic development. However, the presence of cell-extrinsic effects does not negate the potential contribution of cell-intrinsic factors as well, and we respectfully stand by our conclusion that our data support a role, however significant, for cell-intrinsic effects of RAG in ILC2s.

      Finally, the Reviewer mentions the interesting observation that gut ILC3s exhibit hyperphosphorylation of STAT3 in Rag1<sup>-/-</sup> mice compared to WT as an example of cell-extrinsic effects of RAG deficiency (we assume this is in reference to Mao et al, 2018, PMID 29364878 and subsequent work). We now reference this paper and have included additional discussion on how our observations of ILC2s may be generalizable to not only other organ systems, but also other ILC subsets, limitations on these generalizations, and future directions on lines 477-520.

      Overall, the level of analysis could be improved. Total cell numbers are not presented, the response of other immune cells to IL-5 and IL-13 (except the eosinophils in the splenocyte chimera mice) is not analyzed, and the analysis is limited to skin-draining lymph nodes.

      We thank the Reviewer for the suggestions to add rigor to our analysis. ILC2 populations are relatively rare, and we designed our experiments to assess frequencies, rather than absolute numbers. We did not utilize counting beads, so our counts may not be comparable between samples. We have added additional data for absolute cell counts normalized to 100,000 live cells for each experiment (see below for a summary of new panels in each figure). Our new data on total cell numbers are consistent with the initial observations regarding frequency of ILC2s we reported from our experiments. For the BM chimera experiments, we presented the proportions of ILC2s, and IL-5 and IL-13 positive ILC2s, by donor source, as this is the critical question of the experiment. Notwithstanding our analysis by proportion, we found that the frequency of Rag1<sup>-/-</sup> ILC2s, IL-5<sup>+</sup> cells, or IL-13<sup>+</sup> cells within Lin- population was also significantly increased. While our initial submission included only the proportions for clarity and simplicity, we now include frequency and absolute numbers in new panels for more critical appraisal of our data by readers.
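      As an illustration only (this is not the analysis code used in the study), the normalization described above, in which a gated event count is rescaled to a fixed denominator of 100,000 live cells, can be sketched as follows. The function name and the example numbers are hypothetical.

```python
def normalize_per_live_cells(event_count, live_cell_count, scale=100_000):
    """Rescale a gated event count to a fixed number of live cells.

    event_count: events in the gate of interest (e.g. ILC2s) in one sample.
    live_cell_count: live-cell events acquired in the same sample.
    scale: common denominator; 100,000 live cells is the value named in the text.
    """
    if live_cell_count <= 0:
        raise ValueError("live_cell_count must be positive")
    return event_count * scale / live_cell_count


# Hypothetical sample: 150 ILC2 events among 600,000 live-cell events.
print(normalize_per_live_cells(150, 600_000))  # 25.0
```

      Because the denominator is the number of live events acquired rather than a bead-calibrated volume, values normalized this way are comparable within an experiment but are not true absolute counts.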

      In New Figure 1, we added new panels for ILC2 cell number in both the AD-like disease experiment (C) and in steady state (H).

      In New Figure S2, we added a panel for ILC2 cell number in steady state (B).

      In Figure 2 and associated supplemental data in Figure S4, we added several more panels. For the splenocyte chimera, we added a panel for ILC2 cell number in New Figure 2C.

      We incorporated multiple new panels in New Figure S4 to address the need for more data to be shown for the BM chimera (also requested by Reviewer #2). These included total cell counts and frequency for ILC2 (New Figure S4F,G), and IL-5<sup>+</sup> (New Figure S4I,K) and IL-13<sup>+</sup> (New Figure S4J,L) ILCs in addition to the proportions originally presented in Figure 2.  

      In terms of the limited analysis of other tissues, our initial observation of enhanced AD-like disease in Rag1<sup>-/-</sup> compared to WT mice built on our prior work elucidating the role of ILC2s in the MC903 model of AD-like disease in mice and AD in humans (Kim et al, 2013, PMID 23363980). Consequently, we focused on the skin to further develop our understanding of the role of RAG1 in this model. As in our prior studies, technical limitations in obtaining sufficient numbers of ILC2s from the skin itself for ex vivo stimulation to assess effector cytokine levels required performing these experiments in the skin draining lymph nodes.

      We agree that IL-5 and IL-13 are major mediators of type 2 pathology and studying their effects on immune cells is an important area of inquiry, particularly since there are multiple drugs available or in development targeting these pathways. However, our goal was not to study what was happening downstream of increased cytokine production from ILC2s, but instead to understand what was different about RAG-deficient or RAG-naïve ILC2s themselves that drives their expansion and production of effector cytokines compared to RAG-sufficient or RAG-experienced ILC2s. By utilizing the same MC903 model in which we previously showed a critical role for ILC2s in driving IL-5 and IL-13 production and subsequent inflammation in the skin, we were able to instead focus on defining the cell-intrinsic aspects of RAG function in ILC2s.

      The authors have a promising model in which they can track ILC2s that have expressed RAG or not. They need to perform a comprehensive characterization of ILC2s in these mice, which develop in a normal environment with T and B cells. Approximately 50% of the ILC2s have a history of RAG expression. It would be valuable to know whether these cells differ from ILC2s that never expressed RAG, in terms of proliferation and expression of IL-5 and IL-13. These analyses should be conducted in different tissues, as ILC2s adapt their phenotype and transcriptional landscape to their environment. Additionally, the authors should perform their AD-like disease model in these mice.

      We agree with the Reviewer (and a similar comment from Reviewer #2) that a broader survey of the RAG-mediated phenotype in other tissues and by extension other disease models would strengthen the generalizability of our observations. Indeed, we did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s (Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated (Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant (Author response image 2B,D,F,H,J). We omitted these analyses to maintain the focus on the skin, but we will be happy to add these data to the manuscript if the Reviewer feels this figure would be helpful.

      Notwithstanding these results, given that we unexpectedly observed enhanced AD-like inflammation in the MC903 model in Rag1 KO mice, we concentrated our later experiments and analyses on defining the differences in skin draining ILC2s modulated by RAG. Our subsequent findings in the skin provoke many new hypotheses about the role of RAG in ILC2s in other tissues, and our tissue survey in the BM chimera provides additional rationale to pursue similar studies in disease models in other tissues. While this is an emerging area of investigation in our lab, we opted to focus this manuscript on our findings related to the AD-like disease model. We have ongoing studies to investigate other tissues, and we are still in the early stages of developing disease models to expand on these findings. However, if the reviewer feels strongly this additional data should be included in the manuscript, we are happy to add it. Considering the complexity of the data and concepts in the manuscript, we hoped to keep it focused to where we have strong molecular, cellular, and phenotypic outcomes. We elaborate on the implications of our work for future studies, including limitations of our study and currently available reagents and need for new mouse strains to rigorously answer these questions, on lines 476-508.

      The authors provide a valuable dataset of single-nuclei RNA sequencing (snRNA-seq) and ATAC sequencing (snATAC-seq) from RAGexp (RAG fate map-positive) and RAGnaïve (RAG fate map-negative) ILC2s. This elegant approach demonstrates that ILC2s with a history of RAG expression are epigenomically suppressed. However, key genes such as IL-5 and IL-13 do not appear to be differentially regulated between RAGexp and RAGnaïve ILC2s according to Table S5. Although the authors show that the regulome activity of IL-5 and IL-13 is decreased in RAGexp ILC2s, how do the authors explain that these genes are not differentially expressed between the RAGexp and RAGnaïve ILC2? I think that it is important to validate this in vivo.

      We thank the Reviewer for highlighting the value and possible elegance of our data. The Reviewer brings up an important issue that we grappled with in this study and that highlights a major technical limitation of single cell sequencing studies. Genes for secreted factors such as cytokines are often transcribed at low levels and are poorly detected in transcriptomic studies. This is particularly true in single cell studies with lower sequencing depth. Various efforts have been made to overcome these issues such as computational approaches to estimate missing data (e.g. van Dijk et al, 2018, PMID 29961576; Huang et al, 2018, PMID 29941873), or recent use of cytokine reporter mice and dial-out PCR to enhance key cytokine signals in sequenced ILCs (Bielecki et al, 2021, PMID 33536623). We did not utilize computational methods to avoid the risk of introducing artifacts into the data, and we did not perform our study in cytokine reporter mice. Thus, cytokines were poorly detected in our transcriptomic data, as evidenced by lack of identification of cytokines as markers for specific clusters (e.g. IL-5 for ILC2s) or significant differential expression between RAG-naïve and RAG-experienced ILC2s.

      However, the multiomic features of our data allowed a synergistic analysis to identify effects on cytokines. For example, transcripts for IL-4 and IL-5 were not detected at a high enough level to qualify as marker genes of the ILC2 cluster in the gene expression (GEX) assay but were identified as markers for the ILC2 cluster in the ATAC-seq data in the differentially accessible chromatin (DA) assay. Using the combined RNA-seq and ATAC-seq gene to peak links (GPL) analyses, many GPLs were identified in the Th2 locus for ILC2s, including for IL-13, which was not identified as a marker for ILC2s by any of the assays alone. Thus, our combined analysis took advantage of the potential of multiomic datasets to overcome a general weakness inherent to most scRNAseq datasets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Line 168; Reference 23 also showed expression in the NK cells, please add this reference to reference 24.

      We thank the reviewer for catching this oversight, and we have corrected it in the revised manuscript.

      - Please add the full names for GPL and sdLN in the text of the manuscript when first using these abbreviations. They are now only explained in the legends.

      We reviewed the manuscript text and found that we defined sdLNs for the first time on line 104. We defined GPLs for the first time on line 248. We believe these definitions are placed appropriately near the first references to the corresponding figures/analysis, but if the Reviewer believes we should move these definitions earlier, we are happy to do so.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest that the following reanalyses would improve the clarity of the data:

      - Can ILC2 numbers, rather than frequency, be used (e.g. in Figure 1C, S2B, and so on). This would substantiate the data that currently relies on percentages.

      This was a weakness also noted by Reviewer #3. We have added data on ILC2 numbers for each experiment as outlined below:

      In New Figure 1, we added new panels for ILC2 cell number in both the AD-like disease experiment (C) and in steady state (H).

      In New Figure S2, we added a panel for ILC2 cell number in steady state (B).

      In Figure 2 and associated supplemental data in Figure S4, we added several more panels. For the splenocyte chimera, we added a panel for ILC2 cell number in New Figure 2C.

      We incorporated multiple new panels in New Figure S4 to address the need for more data to be shown for the BM chimera (also requested by Reviewer #3). These included total cell counts and frequency for ILC2 (New Figure S4F,G), and IL-5<sup>+</sup> (New Figure S4I,K) and IL-13<sup>+</sup> (New Figure S4J,L) ILCs in addition to the proportions originally presented in Figure 2.

      - Can the authors provide data on IL-33R expression on sdLN ILC2s? Expression of ST-2 (IL-33R) does vary between ILC2 populations and is impacted by the digestion of tissue. All of the data provided here requires ILC2 to be IL-33R<sup>+</sup>. In the control samples, the ILC2 compartment is very scarce - in LNs, ILC2s are rare. The gating strategy with limited resolution of positive and negative cells in the lineage gate doesn't help this analysis.

      The Reviewer raises a valid point regarding the IL-33R marker and ILC2s. We designed our initial experiments to be consistent with our earlier observations of skin ILC2s, which were defined as CD45<sup>+</sup>Lin-CD90+CD25+IL-33R+, and the scarcity of skin draining lymph node ILC2s at steady state was consistent with our prior findings (Kim et al, 2013, PMID 23363980). We can include MFI data on IL-33R expression in these cells if the reviewer feels strongly that this would add to the manuscript, but we did not include other ILC2-specific markers in these experiments that would give us an alternative total ILC2 count to calculate frequency of IL-33R<sup>+</sup> ILC2s, which would also make the context of the IL-33R MFI difficult to interpret.

      Other studies defining tissue-specific expression patterns in ILC2s have called into question whether IL-33R is a reliable marker to define skin ILC2s (Ricardo-Gonzalez et al, 2018, PMID 30201992). However, there is evidence for region-specific expression of IL-33R (Kobayashi et al, 2019, PMID 30712873), with ILC2s in the subcutis expressing high levels of IL-33R and both IL-5 and IL-13, while ILC2s in the epidermis and dermis have low levels of IL-33R and IL-5 expression. In contrast to the Kobayashi et al study, Ricardo-Gonzalez et al sequenced ILC2s from whole skin, thus the region-specific expression patterns were not preserved, and the lower expression of IL-33R in the epidermis and dermis may have diluted the signal from the ILC2s in the subcutis. These may also be the ILC2s most likely to drain into the lymph nodes, which is the tissue on which we focused our analyses (consistent with our prior work in Kim et al, 2013).

      - In Figure 2 (related to 2H, 2I) can flow plots of the IL-5 versus IL-13 gated on either CD90.1+CD45.2+ or CD90.2+CD45.2+ ILC2 be shown? I.e. gate on the ILC2s and show cytokine expression, rather than the proportion of donor IL5/13. The proportion of donor ILC2 is shown to be significantly higher in 2G. Therefore gating on the cells of interest and showing on a cellular basis their ability to produce the cytokines would better make the point I think.

      We agree that this is important additional data to include. We have added flow plots of sdLN ILC2s from the BM chimera divided by donor genotype showing IL-5 and IL-13 expression in New Figure S4H.

      I assume the authors have looked and there is no obvious data, but does analysis of transcription factor consensus binding sequences in the open chromatin provide any new insight?

      The Reviewer also commented on this in the public review. As copied from our response above:

      We found that the most enriched sites in the ILC2 gene loci contained the consensus sequence GGGCGG (or its reverse complement), a motif recognized by a variety of zinc finger transcription factors (TFs). Our analyses predicted the KLF family of zinc finger TFs as most likely to be enriched at the identified open chromatin regions. To infer which KLFs might be occupying these sites in the RAG-experienced or RAG-naïve cells, we also assessed the expression levels of these identified TFs. Interestingly, KLF2 and KLF6 are more highly expressed in RAG-experienced ILC2s. KLF6 is a tumor suppressor (PMID: 11752579), and both KLF6 and KLF2 were recently shown to be markers of “quiescent-like” ILCs (PMID: 33536623). Further, upon analysis of the Th2 locus, the (A/T)GATA(A/G) consensus site (or reverse complement) was enriched in identified open chromatin at that locus. The algorithm predicted multiple TFs from the GATA family as possible binding partners, but expression analysis showed only GATA3 was highly expressed in ILC2s, consistent with what would be predicted from prior studies (PMID: 9160750).

      We have added this data in new Figure S10 and new Figure S12, with corresponding text in the Results section on lines 277-316 and lines 366-378.
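      To make concrete what "consensus sequence (or its reverse complement)" means for the two motifs named above, here is a minimal sketch that scans a DNA string on both strands. It is an illustration under stated assumptions, not the motif-enrichment algorithm used in the study: the helper names and the test sequence are hypothetical, and only the IUPAC codes needed for these two motifs are supported (W = A/T, R = A/G, Y = C/T), so (A/T)GATA(A/G) is written as WGATAR.

```python
import re

# Minimal IUPAC subset: W = A or T, R = A or G, Y = C or T.
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T",
               "W": "[AT]", "R": "[AG]", "Y": "[CT]"}
IUPAC_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
                    "W": "W", "R": "Y", "Y": "R"}


def reverse_complement(motif):
    """Reverse-complement a motif written in the IUPAC subset above."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(motif))


def find_sites(sequence, motif):
    """Return sorted (start, strand) pairs for motif matches on either strand.

    Start positions are 0-based and refer to the given (forward) sequence.
    """
    hits = set()
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        regex = "".join(IUPAC_REGEX[base] for base in pattern)
        # A zero-width lookahead reports overlapping matches as well.
        for match in re.finditer(f"(?={regex})", sequence):
            hits.add((match.start(), strand))
    return sorted(hits)


# Hypothetical sequence containing one site of each kind.
seq = "TTGGGCGGAACCGCCCTTAGATAAGG"
print(reverse_complement("GGGCGG"))  # CCGCCC
print(find_sites(seq, "GGGCGG"))     # [(2, '+'), (10, '-')]
print(find_sites(seq, "WGATAR"))     # [(18, '+')]
```

      A real enrichment analysis would additionally compare site frequency against a background set of regions; this sketch only locates candidate sites.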

      In terms of phrasing and presentation:

      - It would help to provide some explanation of why all analyses focus on the draining LNs rather than the actual site of inflammation (the ear skin). I do not think it appropriate to ask for data on this as this would require extensive further experimentation, but there should be some discussion on this topic. This feels relevant given that the skin is the site of inflammatory insult and ILC2 is present here. How the ILC2 compartment in the skin-draining lymph nodes relates to those in the skin is not completely clear, particularly given the prevailing dogma that ILC2 are tissue-resident.

      Given limitations of assessing cytokine production of the relatively rare population of skin-resident ILC2s, we focused on the skin-draining lymph nodes (sdLN). Our findings in the current manuscript are consistent with our prior work in Kim et al, 2013 (PMID 23363980), and more recently in Tamari et al, 2024 (PMID 38134932), which demonstrated correlation of increased ILC2s in sdLN with increased skin inflammation in the MC903 model. Similarly, Dutton et al (PMID 31152090) have demonstrated expansion of the sdLN ILC2 pool in response to MC903-induced AD-like inflammation in mice. We elaborate on the implications of our work for future studies, including limitations of our study (including the focus on the sdLN), and currently available reagents and need for new mouse strains to rigorously answer these questions, on lines 476-508.

      - I think the authors should explicitly state that cytokine production is assessed after ex vivo restimulation (e.g. Lines 112-113).

      We have added this statement to the revised text.

      - I also think that it would help to be consistent with axis scales where analyses are comparable (e.g. Figure 1D vs Figure 1H).

      We agree with the Reviewer and we have adjusted the axes for consistency. The data remains unchanged, but axes are slightly adjusted in New Figure 1 (D&I, E&J, F&K) and New Figure S2 (C-E match New Figure 1 D-F). This same axis scaling scheme is carried forward to New Figure 2 (D-E) and New Figure S4 (G,K,L). New data on cell counts is also included per request by Reviewers 2 and 3 (see above). However, we found results for total cells, including ILC2s (New Figure 1C,H, New Figure S2B, New Figure 2C, New Figure S4F), were consistent within experiments, but not between experiments, likely representing issues with normalizing counts (we did not include counting beads for more accurate total counts). Thus, the y-axes in those panels are not consistent between experiments/figures.

      We feel reporting the proportion of WT vs Rag1<sup>-/-</sup> donor cells for the BM chimera is most illustrative of the effect of RAG and have kept it in the main New Figure 2, but for the BM chimera experiment panels we also include the total counts of IL-5<sup>+</sup> and IL-13<sup>+</sup> ILC2s (New Figure S4I,J).

    1. eLife Assessment

      This important revised manuscript presents compelling findings by delineating two molecularly distinct liver cancer subtypes through comprehensive multi-omics integration and constructing a rigorously validated prognostic model. The authors have strengthened the analytical framework and validation across multiple datasets, including single-cell RNA sequencing. The evidence remains robust, with enhanced methodological clarity and expanded validation in both internal and independent cohorts. The revisions have improved the study's rigor and translational relevance.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to classify hepatocellular carcinoma (HCC) patients into distinct subtypes using a comprehensive multi-omics approach. They employed an innovative consensus clustering method that integrates multiple omics data types, including mRNA, lncRNA, miRNA, DNA methylation, and somatic mutations. The study further sought to validate these subtypes by developing prognostic models using machine learning algorithms and extending the findings through single-cell RNA sequencing (scRNA-seq) to explore the cellular mechanisms driving subtype-specific prognostic differences.

      Strengths:

      (1) Comprehensive Data Integration: The study's integration of various omics data provides a well-rounded view of the molecular characteristics underlying HCC. This multi-omics approach is a significant strength, as it allows for a more accurate and detailed classification of cancer subtypes.

      (2) Innovative Methodology: The use of a consensus clustering approach that combines results from 10 different clustering algorithms is a notable methodological advancement. This approach reduces the bias that can result from relying on a single clustering method, enhancing the robustness of the findings.

      (3) Machine Learning-Based Prognostic Modeling: The authors rigorously apply a wide array of machine learning algorithms to develop and validate prognostic models, testing 101 different algorithm combinations. This comprehensive approach underscores the study's commitment to identifying the most predictive models, which is a considerable strength.

      (4) Validation Across Multiple Cohorts: The external validation of findings in independent cohorts is a critical strength, as it increases the generalizability and reliability of the results. This step is essential for demonstrating the clinical relevance of the proposed subtypes and prognostic models.

      Weaknesses:

      (1) Inconsistent Storyline:<br /> Despite the extensive data mining and rigorous methodologies, the manuscript suffers from a lack of a coherent and consistent narrative. The transition between different sections, particularly from multi-omics data integration to single-cell validation, feels disjointed. A clearer articulation of how each analysis ties into the overall research question would improve the manuscript.

      (2) Questionable Relevance of Immune Cell Activity Analysis:<br /> The evaluation of immune cell activities within the cancer cell model raises concerns about its meaningfulness. The methods used to assess immune function in the tumor microenvironment may not be fully appropriate, potentially limiting the insights gained from this part of the study.

      (3) Incomplete Single-Cell RNA-Seq Validation:<br /> The validation of the findings using single-cell RNA-seq data appears insufficient to fully support the study's claims. While the authors make an effort to extend their findings to the single-cell level, the analysis lacks depth. A more comprehensive validation is necessary to substantiate the robustness of the identified subtypes.

      (4) Figures and Visualizations:<br /> Several figures in the manuscript are missing necessary information, which affects the clarity of the results. For instance, the pathways in Figure 3A could be clustered to enhance interpretability, the blue bar in Figure 4A is unexplained, and Figure 4B is not discussed in the text. Additionally, the figure legend in Figure 7C lacks detail, and many figure descriptions merely repeat the captions without providing deeper insights.

      (5) Appraisal of the Study's Aims and Results<br /> The authors have set out to achieve an ambitious goal of classifying HCC patients into distinct prognostic subtypes and validating these findings through both bulk and single-cell analyses. While the methodologies employed are innovative and the data integration comprehensive, the study falls short in fully achieving its aims due to inconsistencies in the narrative and incomplete validation. The results partially support the conclusions, but the lack of coherence and depth in certain areas limits the overall impact of the study.

      (6) Impact on the Field<br /> If the identified weaknesses are addressed, this study has the potential to significantly impact the field of HCC research. The multi-omics approach combined with machine learning is a powerful framework that could set a new standard for cancer subtype classification. However, the current state of the manuscript leaves some uncertainty regarding the practical applicability of the findings, particularly in clinical settings.

      (7) Additional Context<br /> For readers and researchers, this study offers a valuable look into the potential of integrating multi-omics data with machine learning to improve cancer classification and prognostication. However, readers should be aware of the noted weaknesses, particularly the need for more consistent narrative development and comprehensive validation of the methods. Addressing these issues could greatly enhance the study's utility and relevance to the community.

      Comments on revisions:

      The authors have addressed the reviewers' concerns effectively.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      (1) Storyline and Narrative Flow:

      Consider revising the manuscript to create a more coherent and consistent narrative. Clarify how each section of the study-particularly the transition from multi-omics data integration to single-cell RNA-seq validation-contributes to the overall research question. This will help readers better understand the logical flow of the study.

      We thank the reviewer for this suggestion, which has highlighted the deficiencies in this area, and we have made appropriate modifications:

      We have revised the text, including the transitions between sections of the Results and the stated objective and role of each analysis, to improve coherence and clarify the purpose of each analysis. We believe this will help readers better understand the main content of the entire text.

      (2) Immune Cell Activity Analysis:

      Reevaluate the methods used to assess immune cell activities within the context of the tumor microenvironment. Consider providing additional justification for the relevance of using the cancer cell model for this analysis. If necessary, explore alternative methods or models that might offer more meaningful insights into immune-tumor interactions.

      We thank the reviewer for this suggestion, which has highlighted the deficiencies in this area, and we have made appropriate modifications:

      Using RNA-Bulk data, we evaluated the tumor immune microenvironment through various methods to assess immune infiltration levels and responses to immunotherapy. We found that the results were largely consistent with those presented in the manuscript, providing strong support for our viewpoints. We also acknowledge the limitations of findings from bioinformatics analysis. In our upcoming research, we plan to develop organoid models with gene expression patterns of both CS1 and CS2 subtypes, using these models as a foundation for studying the tumor immune microenvironment.

      (3) Single-Cell RNA-Seq Validation:

      Expand the validation of your findings using single-cell RNA-seq data. This could include more in-depth analyses that explore the heterogeneity within the subtypes and confirm the robustness of your classification method at the single-cell level. This would strengthen the support for your claims about the relevance of the identified subtypes.

      We thank the reviewer for this suggestion, which has highlighted the deficiencies in this area, and we have made appropriate modifications:

      In this manuscript, we employed the NTP algorithm to classify malignant cells identified by the CopyKAT algorithm using characteristic genes of CS1 and CS2 subtypes. This approach is similar to the method previously used to classify patients in the ICGC cohort with the same subtype genes. We consider this classification method valid.

      After classifying the malignant cells, we performed metabolic and cell communication analyses on the CS1 and CS2 subtype cells, revealing significant differences in biological pathways enriched by differential genes, metabolic levels, and cell signaling patterns. These differences align with variations observed in prior classifications and analyses based on RNA-Bulk data.

      We also acknowledge that validating the classification method solely with the single-cell dataset from this study is insufficient. We analyzed GSE202642 using the same processes and methods as GSE229772, finding that the results were generally consistent, indicating that our classification method exhibits a degree of robustness at the single-cell level.

      (4) Methodological Justification:

      Provide a more detailed rationale for the selection of machine learning algorithms and integration strategies used in the study. Explain why the chosen methods are particularly well-suited for this research, and discuss any potential limitations they might have.

      We thank the reviewer for this suggestion, which has highlighted the deficiencies in this area, and we have made appropriate modifications:

      We have updated the methodology section to enhance readers' understanding of the fundamental principles involved. This analysis has two key features: first, it combines 10 machine learning algorithms to generate 101 models and ultimately selects the prognostic prediction model with the highest C-index from these 101 models; second, it utilizes the LOOCV method to analyze the training and validation sets. Compared to the conventional method of randomly dividing the training and validation sets by a fixed ratio, this approach significantly reduces the bias and randomness introduced by the splitting process. Therefore, we believe this analysis can leverage the characteristic genes of the CS1 and CS2 subtypes, combined with existing clinical data from public databases, to yield results that are more accurate and reliable than the commonly used prognostic models in previous literature, such as Cox regression and Lasso regression, as well as other individual algorithms. While this analysis presents advantages over some previous modeling methods, it is essential to recognize that it remains based on analyses conducted using public databases, which may obscure certain factors that might be clinically relevant to patient prognosis due to the mathematical logic of the algorithms.
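
      The selection step described above can be illustrated with a minimal sketch. It assumes Harrell's concordance index as the C-index and ranks candidate models by their mean C-index over a training and a validation cohort. The cohorts, risk scores, and model names are made-up toy values rather than data from the study, and the LOOCV step is not reproduced here.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable patient pairs in which the
    patient with the shorter observed survival has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs whose shorter time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    return concordant / comparable


# Toy cohorts (hypothetical): survival times and event flags (1 = event observed).
cohorts = {
    "train": {"times": [5, 10, 15, 20], "events": [1, 1, 0, 1]},
    "validation": {"times": [3, 8, 12], "events": [1, 0, 1]},
}

# Hypothetical risk scores produced by two candidate models.
predicted_risks = {
    "model_a": {"train": [0.9, 0.7, 0.4, 0.1], "validation": [0.8, 0.3, 0.5]},
    "model_b": {"train": [0.2, 0.8, 0.5, 0.6], "validation": [0.1, 0.9, 0.4]},
}


def mean_c_index(model):
    """Average the C-index of one model over all cohorts."""
    scores = [
        concordance_index(c["times"], c["events"], predicted_risks[model][name])
        for name, c in cohorts.items()
    ]
    return sum(scores) / len(scores)


mean_scores = {model: mean_c_index(model) for model in predicted_risks}
best_model = max(mean_scores, key=mean_scores.get)
print(best_model, mean_scores)
```

      Averaging over cohorts, rather than ranking on the training cohort alone, penalises a model that fits the training data well but orders patients poorly in the validation cohort.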

      (5) Figures and Visualizations:

      Improve the clarity of your figures by addressing the following:

      a) Figure 3A: Cluster the pathways to make the comparisons clearer and more meaningful.

      b) Figure 4A: Clearly explain the significance of the blue bar.

      c) Figure 4B: Ensure this figure is discussed in the main text to justify its inclusion.

      d) Figure 7C: Enhance the figure legend to provide more informative details.

      Additionally, ensure that figure descriptions go beyond the captions and provide detailed explanations that help the reader understand the significance of each figure.

      We thank the reviewer for this suggestion, which has highlighted the deficiencies in this area, and we have made appropriate modifications:

      Figure 3A: We clustered the samples based on CS1 and CS2 subtypes and displayed the immune-related cell scores of each sample as a heatmap.

      Figure 4A: The blue bars in the figure represent the average C-index of this algorithm combination in the training dataset TCGA and the validation dataset ICGC, which we have supplemented in the corresponding sections of the text.

      Figure 4B: We described this figure in the results section, which primarily aims to validate whether our prognostic prediction model can predict patient outcomes in the TCGA cohort. The results showed that after performing prognostic risk scoring on patients based on the prediction model and categorizing them into high-risk and low-risk groups, the two groups exhibited significant prognostic differences, with the high-risk group showing worse outcomes compared to the low-risk group. This indicates that our prognostic prediction model can effectively distinguish the prognostic risk differences among patients in the TCGA-LIHC cohort. We also discussed these findings in the discussion section.

      Figure 7C: We used both point color and size to visualize the levels of metabolic scores, resulting in two dimensions in the legend, which actually represent the same information. Therefore, we removed the results that used point size to indicate the levels of metabolic scores.

      (6) Supplementary Materials:

      Consider including more detailed supplementary materials that provide additional validation data, extended methodological descriptions, and any other information that would support the robustness of your findings.

      We thank the reviewer for this suggestion, which has highlighted the deficiencies in this area, and we have made appropriate modifications:

      In the subsequent version of the record, we will upload the important results obtained during the research to GitHub, and in this revision, we have updated some figures that may better explain the results or the robustness of the findings as supplementary materials.

      (7) Recent Literature:

      a) Incorporate more recent studies in your discussion, especially those related to HCC subtypes and the application of machine learning in oncology. This will provide a more current context for your work and help position your findings within the broader field.

      We thank the reviewer for this suggestion, which has highlighted the deficiencies in this area, and we have made appropriate modifications:

      We have reviewed several studies related to HCC subtype classification and the application of machine learning in this field. In the discussion section, we summarize the significance and limitations of these studies. Additionally, we discuss the characteristics of our study in comparison to previous research in this field.

      (8) Data and Code Availability:

      Ensure that all data, code, and materials used in your study are made available in line with eLife's policies. Provide clear links to repositories where readers can access the data and code used in your analyses.

      We thank the reviewer for this suggestion, which has highlighted the deficiencies in this area, and we have made appropriate modifications:

      We have examined the relevant data, code, and materials. We confirm that we have indicated the sources of the data and tools used in the analysis within the manuscript. Moreover, these data and tools are accessible via the websites or references we have provided.

      Reviewer #2 (Recommendations for the authors):

      (1) While the computational findings are robust, further experimental validation of the two subtypes, particularly the role of the MIF signaling pathway, would strengthen the biological relevance of the findings. In vitro or in vivo validation could confirm the proposed mechanisms and their influence on patient prognosis.

      We thank the reviewer for this suggestion, which has highlighted the deficiencies in this area, and we have made appropriate modifications:

      We intend to verify our findings in future studies using tumor cell line models and animal models. We aim to identify and intervene with key molecules in the MIF signaling pathway. We will investigate how the MIF signaling pathway affects tumor sensitivity to treatment in both cell line and animal models, along with the underlying mechanisms.

      (2) Consider testing the model on additional independent cohorts beyond the TCGA and ICGC datasets to further demonstrate its generalizability and applicability across different patient populations.

      We thank the reviewer for this suggestion, which has highlighted the deficiencies in this area, and we have made appropriate modifications:

      We analyzed the GSE14520 dataset from the GEO database, which includes a cohort of 209 HCC patients with corresponding RNA sequencing data. We validated the prognostic model obtained in this study using this cohort and found that the model effectively separates patients into high-risk and low-risk prognostic categories, with a significant prognostic difference between the two groups. This is consistent with the results we obtained previously.

      (3) Review the manuscript for long or complex sentences, which can be broken down into shorter, more readable parts.

      We have made revisions to the long and complex sentences in the manuscript without compromising its academic integrity and rationality, with the hope that this will help readers better understand the content of this study.

      During the revision process, in addition to addressing the reviewer comments, we conducted a thorough review of the analysis. In the course of this review, we identified a few errors in the data usage and have since corrected the relevant data and figures:

      Figure 4: Due to space constraints, we adjusted the composition of the figures after incorporating the validation results from the GSE14520 dataset.

      Figure 5A: We rechecked the regression coefficients included in the model, updated several more recent prognostic models, and calculated the C-index for 20 prognostic models in the TCGA and ICGC cohorts using a method consistent with previous studies.

      Figure 5C-D: We adjusted the clarity of the figures.

      Figure 8: We reclassified the selected malignant cells and updated the subtype results. Subsequently, based on the repeatedly confirmed typing results, we comprehensively updated the analysis results of the subsequent cell communication network construction, ensuring that the entire analysis process remains consistent with previous findings. We also adjusted the composition of the figure and presented the images that could not be conveniently merged due to space constraints as Figure 9.

    1. eLife Assessment

      In this important work, a quantitative analysis method for three-dimensional morphogenetic processes during embryonic development is introduced. The proposed method is a pipeline combining several methods, allowing quantitative analysis of developmental processes without cell segmentation and tracking. Upon application of their method, the authors obtain convincing evidence that ascidian gastrulation is a two-step process. This work should be of interest to a broad range of developmental biologists who aim to obtain a quantitative understanding of morphogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors propose a new method to quantitatively assess morphogenetic processes during organismal development. They apply their method to ascidian morphogenesis and thus find that gastrulation is a two-step process.

      The method applies to morphogenetic changes of surfaces. It consists of the following steps: first, surface deformations are quantified based on microscopy images without requiring cellular segmentation and tracking. This is achieved by mapping, at each time point, a polygonal mesh initially defined on a sphere to the surface of the embryo. The mapped vertices of this polygonal mesh then serve as (Lagrangian) markers for the embryonic surface. From these, one can infer the deformation of the surface, which can be expressed in terms of the strain tensor at each point of the surface. Changes in the strain tensor give the strain rate, which captures the morphogenetic processes. Second, at each time point, the strain rate field is decomposed in terms of spherical harmonics. Finally, the evolution of the weights of the various spherical harmonics in the decomposition is analysed via a wavelet analysis. The authors apply their workflow to ascidian development between 4 and 8.7 hpf. From their analysis they find clear indications for gastrulation and neurulation and identify two sub-phases of gastrulation, namely, endoderm invagination and 'blastopore closure'.
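
      The spherical harmonic step summarised above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes orthonormal real spherical harmonics up to degree 1 (sign conventions vary between references), a scalar field given as a function of polar angle and azimuth, and a simple midpoint quadrature. The names `real_sph_harm`, `project`, and `example_field` are hypothetical.

```python
import math


def real_sph_harm(l, m, theta, phi):
    """Orthonormal real spherical harmonics for l <= 1.
    theta is the polar angle in [0, pi], phi the azimuth in [0, 2*pi)."""
    if (l, m) == (0, 0):
        return 0.5 / math.sqrt(math.pi)
    norm = math.sqrt(3.0 / (4.0 * math.pi))
    if (l, m) == (1, 0):
        return norm * math.cos(theta)
    if (l, m) == (1, 1):
        return norm * math.sin(theta) * math.cos(phi)
    if (l, m) == (1, -1):
        return norm * math.sin(theta) * math.sin(phi)
    raise ValueError("only l <= 1 is implemented in this sketch")


def project(field, l, m, n_theta=90, n_phi=180):
    """Weight of Y_lm in `field`: the integral of field * Y_lm over the
    sphere, approximated by midpoint quadrature on a latitude-longitude grid."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += (field(theta, phi)
                      * real_sph_harm(l, m, theta, phi)
                      * math.sin(theta))
    return total * d_theta * d_phi


def example_field(theta, phi):
    """Toy scalar field built from known weights, standing in for a
    strain rate field sampled on the sphere at one time point."""
    return (2.0 * real_sph_harm(0, 0, theta, phi)
            + 0.5 * real_sph_harm(1, 0, theta, phi)
            - 0.25 * real_sph_harm(1, 1, theta, phi))


modes = [(0, 0), (1, -1), (1, 0), (1, 1)]
coefficients = {lm: project(example_field, *lm) for lm in modes}
for lm, value in coefficients.items():
    print(lm, round(value, 3))
```

      Repeating the projection at each time point would give one time series per mode, which is the kind of signal the subsequent wavelet analysis operates on.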

      Strengths:

      The combination of various tools allows the authors to obtain a quantitative description of the developing embryo without the necessity of identifying fiducial markers. Visual inspection shows that their method works well. Furthermore, this quantification then allows for an unbiased identification of different morphogenetic phases.

      Weaknesses:

      At times, the explanation of the method is hard to follow, unless the reader is already familiar with concepts like level-set methods or wavelet transforms. Furthermore, the software for performing the determination of Lagrangian markers or the subsequent spectral analysis does not seem to be available to the readers.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors proposed a method to quantitatively analyze 3D live imaging data of early developing embryos, using the ascidian development as an example. For this purpose, the previously proposed level set method was used to computationally track the temporal evolution of reference points introduced on the embryo surface. Then, from the obtained three-dimensional trajectories, the velocity field was obtained, from which the strain rate field was computed. The strain rate field was analyzed using spherical harmonics.

      In this paper, the authors focused on the modes with lower order with real coefficients. The time evolution of these modes was analyzed using wavelet transforms. The results obtained by the pipeline reflected the developmental stages of ascidian embryos.

      Strengths:

      In this way, this manuscript proposes a pipeline of analyses combining various methods. The strength of this method lies in its ability to quantitatively analyze the deformation of the entire embryo without the requirement for cellular segmentation and tracking.

      Weaknesses:

      The mathematics behind this method is not straightforward to understand. The value of this method will be understood as analyses of real data using this method accumulate.

      Comments on revised version:

      I have reviewed the revised manuscript and the reply from the authors. All concerns have been addressed appropriately.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      (1) Figure 2 is mentioned before Figure 1

      We thank the reviewer for pointing this out; this was a mistake. The reference to Figure 2 should have been to Figure 1. This has been corrected in the manuscript.

      (2) Figure 1c: red is used to indicate cell junctions on raw data, but also the error.

      The color red is used to indicate cell junctions on raw data on figure 1c left, while it is used to indicate the error on figure 1c right.

      The Lagrangian error can be negative right? This is not reflected by the error scale which goes from 0% to 100%

      A negative Lagrangian error would mean that the distance between real and simulated cellular junctions decreased over time. We effectively treat this case as if there was no displacement, and the error is hence 0%.

      Why do you measure the error in percent?

      The error is measured in percentages because it is relative to the apical length of a cell.
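
      One way to read the two answers above is sketched below. The exact formula is not given in this response, so this is an assumption: the error is taken as the growth in distance between a real and a simulated junction, clamped at zero and expressed as a percentage of the cell's apical length. The function name and arguments are hypothetical.

```python
def lagrangian_error_percent(dist_initial, dist_current, apical_length):
    """Drift between a real and a simulated junction, as a percentage of
    the apical length of the cell. A decrease in distance counts as 0%."""
    drift = dist_current - dist_initial
    return 100.0 * max(0.0, drift) / apical_length


# The marker drifted 2 units away from the junction of a cell of apical length 10.
print(lagrangian_error_percent(1.0, 3.0, 10.0))
# The marker moved closer to the junction, so the error is clamped to zero.
print(lagrangian_error_percent(3.0, 1.0, 10.0))
```

      Normalising by apical length makes the error comparable across cells of different sizes, which is consistent with reporting it in percent.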

      (3) Figure 2: The distinction between pink and red in e_2(t) is very difficult. What do the lines indicate?

      The lines indicate the directions of the eigenvectors of the strain rate tensor at every material particle of the embryo.

      (4) L156 "per unit length": Rather per unit time?

      We thank the reviewer for pointing this out and apologize for this mistake. "per unit length" has been changed to "per unit time".

      (5) L159 "Eigen vectors in this sense": is there another sense?

      "In this sense" referred to the geometric description of eigenvectors. The phrase has been removed.

      (6) L164 "magnitude of the rate of change underwent by a particle at the surface of the embryo in the three orthogonal spatial directions of most significant rate of change."

      Would a decomposition in two directions within the surface's tangent plane and one perpendicular to it not be better?

      We also decomposed the strain rate tensor, as suggested, into two directions within the surface's tangent plane and one perpendicular to it, but did not notice any tangible differences in the overall analysis, especially after derivation of the scalar field.

      (7) L174 "morphological activity": I think this notion is never defined

      By morphological activity we mean any noticeable shape changes.

      (8) L177: I did not quite understand this part

      This part tries to convey that the scalar strain rate field evidences coordinated cell behaviors by highlighting wide regions of red that traverse cell boundaries (e.g. fig.2b, $t=5.48hpf$). At the same time, the strain rate field preserves cell boundaries, highlighted by bands of red at cellular intersections, when coordinated cell behaviors are not preponderant (e.g. fig.2b, $t=4hpf$).

      (9) Ll 194 "Unsurprisingly, these functions play an important role in many branches of science including quantum mechanics and geophysics Knaack and Stenflo (2005); Dahlen and Tromp (2021)." Does this really help in understanding spherical harmonics?

      This comment was made with the aim of showing the reader that spherical harmonics have proved to be useful in other fields. Although it does not help in understanding spherical harmonics, it establishes that they can be effective.

      (10) Figure 3a: I do not find this panel particularly helpful. What does the color indicate? What are the prefactors of the spherical harmonics?

      This panel showcases the restriction of the strain rate scalar field to the spherical harmonics with the l and m specified. Each material particle of the embryo surface at the time  is colored with respect to the value of . The values are computed according to equation 2 and are showcased in figure 3c.

      (11) L 265: Please define "scalogram" as opposed to a spectrogram.

      Scalograms are the result of wavelet transforms applied to a signal. Although spectrogram can specifically refer to the spectrum of frequencies resulting for example from a Fourier transform, the term can also be used in a broader sense to designate any time-frequency representation. In the context of this paper, we used it interchangeably with scalogram. We have changed all occurrences of spectrogram to scalogram in the revised manuscript.
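
      To make the definition above concrete, here is a minimal sketch of a scalogram computed by direct convolution with a complex Morlet-type wavelet. The wavelet family used in the manuscript is not stated in this response, so the wavelet, its parameter `omega0`, the test signal, and the scales are all assumptions chosen for illustration.

```python
import cmath
import math


def morlet(t, scale, omega0=6.0):
    """Unnormalised complex Morlet-type wavelet, dilated by `scale`."""
    x = t / scale
    return cmath.exp(1j * omega0 * x) * math.exp(-0.5 * x * x) / math.sqrt(scale)


def scalogram(signal, scales, dt=1.0):
    """Magnitude of the wavelet transform: one row per scale, one column
    per time sample. Direct O(n^2) summation, adequate for short signals."""
    n = len(signal)
    rows = []
    for scale in scales:
        row = []
        for k in range(n):
            acc = 0j
            for i in range(n):
                acc += signal[i] * morlet((i - k) * dt, scale).conjugate()
            row.append(abs(acc) * dt)
        rows.append(row)
    return rows


# Toy time series: a sinusoid with a period of 8 samples.
signal = [math.sin(2.0 * math.pi * k / 8.0) for k in range(128)]
scales = [2.0, 4.0, 8.0, 16.0]
power = scalogram(signal, scales)

# Scale with the strongest response at the centre of the record.
center = len(signal) // 2
best_scale = max(scales, key=lambda s: power[scales.index(s)][center])
print(best_scale)
```

      Each row of `power` shows how strongly one scale is present at each time, which is what distinguishes a scalogram from a single global frequency spectrum.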

      (12) L 299 "the analysis was carried out the 64-cell stage.": Probably 'the analysis was carried out at the 64-cell stage'

      We thank the reviewer for pointing this out. The manuscript was revised to reflect the suggested change.

      (13) L 340 "Another outstanding advantage over traditional is": Something seems to be missing in this sentence.

      We thank the reviewer for pointing this out. We have modified the sentence in the revised manuscript. It now reads “Another outstanding advantage of our workflow over traditional methods is that our workflow is able to compress the story of the development ... ”.

      (14) Ll 357 "on the one hand, the overall spatial resolution of the raw data, on the other hand, the induced computational complexity.": Is there something missing in this sentence

      The sentence tries to convey the idea that in implementing our method, there is a compromise to be made between the choice of the number of particles on the constructed mesh and the computational complexity induced by this choice. There is also a compromise to be made between this choice of the number of particles and the spatial resolution of the original dataset.

      Reviewer 2:

      (1) The authors should clearly state to which data this method has been applied in this paper. Also, to what kind of data can this method be applied? For instance, should the embryo surface be segmented?

      The method has been applied to 3D+time imaging data of ascidian embryonic development hosted on the morphonet (morphonet.org) platform. The data on the morphonet platform comes in two formats: closed surface meshes of segmented cells spatially organized into the embryo, and 3D voxelated images of the embryo. The method was first designed for the former format and then extended to the latter. There is no requirement for the embryo surface to be segmented.

      (2) In this paper, it is essential to understand the way that the authors introduced the Lagrangian markers on the surface of the embryo. However, understanding the method solely based on the description in the main text was difficult. I recommend providing a detailed explanation of the methodology including equations in the main text for clarity.

      We believe that adding mathematical details of the method into the text will cloud the text and make it more difficult to understand. Interested readers can refer to the supplementary material for detailed explanation of the method.

      (3) In eq.(1) of the supplementary information, d(x,S_2(t)) could be a distance function between S_1 and S_2 although it was not stated. How was the distance function between the surfaces defined?

      What was meant here was d(x,S_1(t)), where x is a point of S_2(t); d(x,S_1(t)) refers to the distance between the point x and S_1(t). The definition of the distance function has been clarified in the supplementary information.

      (4) In the section on the level set scheme of supplementary information, the derivation of eq.(4) from eq.(3) was not clear.

      We added an intermediary equation for clarification.

      (5) Why is a reference shape S_1(0) absent at t=0?

      A reference shape S_1(0) is absent at t=0 precisely because that is what we are trying to achieve: construct an evolving Lagrangian surface S_2(t) matching S_1(t) at all times.

      (6) In Figure 2(a), it is unclear what was plotted. What do the colors mean? A color bar should be provided.

      The caption of the figure describes the colors: “a) Heatmap of the eigenvector fields of the strain rate tensor. Each row represents a vector field distinguished by a distinct root color (yellow, pink, white). The gradient from the root color to red represents increasing magnitudes of the strain rate tensor.”

      (7) With an appropriate transformation, it would be possible to create a 2D map from a 3D representation shown in for instance Figure 2. Such a 2D representation would be more tractable for looking at the overall activities.

      We thank the reviewer for pointing this out. In Figure 4b of the supplementary information, we provide a 2D projection of the scalar strain rate field.

      (8) The strain rate is a second-order tensor that contains rich information. In this paper, the information in the tensor has been compressed into a scalar field by taking the square root of the sum of the squares of the eigenvalues. However, such a representation may not distinguish important events such as stretching and compression of the tissue. The authors should provide appropriate arguments regarding the limitations of this analysis.

      The tensor form of the strain rate field indeed carries more information than the scalar eigenvalue field derived from it. However, our objective in this project was not to exhaust the richness of the strain rate tensor field but rather to provide a proof of concept that our global approach to studying morphogenesis can unveil sufficiently rich information on the dynamical processes at play. Although outside the scope of this project, a more thorough exploration of the strain rate tensor field could be the object of future investigations.
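      The limitation raised by the reviewer can be illustrated with a minimal sketch. It assumes a symmetric 2x2 tensor for simplicity, and the function names and numerical values are hypothetical (they are not taken from the paper). The scalar described in the comment, the square root of the sum of the squared eigenvalues, is identical for a pure stretch and a pure compression of equal magnitude, whereas the trace retains the sign.

```python
import math

def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the symmetric 2x2 tensor [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def scalar_strain_rate(a, b, d):
    """Square root of the sum of the squared eigenvalues."""
    low, high = eigenvalues_sym2(a, b, d)
    return math.sqrt(low ** 2 + high ** 2)

# Hypothetical tensors (illustrative values, not data from the paper).
stretch = (0.2, 0.0, 0.0)     # extension along the first axis
compress = (-0.2, 0.0, 0.0)   # contraction along the same axis

s_stretch = scalar_strain_rate(*stretch)
s_compress = scalar_strain_rate(*compress)

# The trace (sum of eigenvalues) keeps the sign that the scalar discards.
trace_stretch = sum(eigenvalues_sym2(*stretch))
trace_compress = sum(eigenvalues_sym2(*compress))
```

      In this sketch the two scalars coincide while the traces have opposite signs, which is one way a future analysis could separate stretching from compression.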

      (9) The authors claimed that similarities emerge between the spatiotemporal distribution of morphogenesis processes in the previous works and the heatmaps in this work. Some concrete data should be provided to support this claim.

      All claims have been backed with references to previous works. For instance, for the two middle panels on the lower row of Figure 2b (5.48hpf, 6.97hpf), we explained that the concentration of red corresponds, respectively, to endoderm invagination during gastrulation and to zippering during neurulation [we cited Hashimoto et al. (2015)]. Here, we relied on visual inspection to spot the similarities. The rest of the paper provides substantial and robust additional support for these claims using spectral decomposition in space and time.

      (10) The authors also claimed that "A notable by-product of this scalar field is the evidencing of the duality of the embryo as both a sum of parts constituted of cells and an emerging entity in itself: the strain rate field clearly discriminates between spatiotemporal locations where isolated single cell behaviours are preponderant and those where coordinated cell behaviours dominate." The authors should provide specific examples and analysis to support this argument.

      Here, we relied on visual inspection to make this claim. This whole section of the paper, “Strain rate field describes ascidian morphogenesis”, was about computing, plotting and observing the strain rate field.

      However, specific examples were provided. This paragraph was building towards this statement, and the evidence was scattered through the paragraph. We have now revised the sentence to ensure that we highlight specific examples:

      “A notable by-product of this scalar field is the evidencing of the duality of the embryo as both a sum of parts constituted of cells and an emerging entity in itself: the strain rate field clearly discriminates between spatiotemporal locations where isolated single cell behaviours are preponderant (e.g. fig.2b, $t=4hpb$) and those where coordinated cell behaviours dominate (e.g. fig.2b, $t=5.48hpb$).”

      (11) The authors should provide the details of the analysis method used in Figure 3b, including relevant equations. In particular, it would be helpful to clarify the differences that cause the observed differences between Figure 3b and Figure 3c.

      Figure 3b was introduced with the sentence “In analogy to Principal Components Analysis, we measure the average variance ratio over time of each harmonic with respect to the original signal (Fig.3b)”, which explains the origin of the variance ratio values used in Figure 3b. We have now added the mathematical expression to clarify this further.
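      As a rough illustration of what such a time-averaged variance ratio could look like, the sketch below takes each harmonic's share of the total squared coefficient amplitude at every time point and then averages these shares over time. This is an assumption about the form of the measure, not the expression added to the manuscript; the function name and the coefficient values are hypothetical, and every time point is assumed to have non-zero total amplitude.

```python
def mean_variance_ratio(coefficients):
    """Average over time of each harmonic's share of the total squared amplitude.

    coefficients: list of time points, each a list of harmonic coefficients.
    """
    n_harmonics = len(coefficients[0])
    totals = [0.0] * n_harmonics
    for frame in coefficients:
        power = [abs(c) ** 2 for c in frame]
        frame_total = sum(power)  # assumed non-zero
        for k, p in enumerate(power):
            totals[k] += p / frame_total
    return [t / len(coefficients) for t in totals]

# Hypothetical coefficients for three harmonics at two time points.
example = [[3.0, 1.0, 0.0], [2.0, 0.0, 2.0]]
ratios = mean_variance_ratio(example)
```

      By construction the ratios sum to one, so a value such as the 64.4% reported for Y_00 would read directly as that harmonic's average share of the signal under this definition.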

      (12) The authors found that the variance ratio of Y_00 was 64.4%. Y_00 is a sphere, indicating that most of the activity can be explained by a uniform activity. Which actual biological process explains this symmetrical activity?

      The reviewer makes a good point, which also gave us a lot to think about during the analysis. Observing that the contribution of Y00 peaks during synchronous divisions, which are interestingly restricted to the animal pole, we conjecture that localized morphological ripples can be felt throughout the embryo.

      (13) The contribution of other spherical harmonics than Y_00 and Y_10 should be shown.

      The other spherical harmonics each contributed less than 1%, and we did not consider it important to include them in the main figure. We will add them as supplementary material.

    1. eLife Assessment

      This important manuscript by Genzoni et al. reports the striking discovery of a regulatory role for trophic eggs in ant caste determination. Prior to this study, trophic eggs were widely assumed to play only a nutritional role in the colony, but this compelling study shows that trophic eggs can suppress queen development, and therefore regulate caste determination in specific social contexts.

    2. Reviewer #2 (Public review):

      The revised manuscript by Genzoni et al. reports the striking discovery of a regulatory role for trophic eggs. Prior to this study, trophic eggs were widely assumed to play a nutritional role in the colony, but this study shows that trophic eggs can suppress queen development, and therefore, can play a role in regulating caste determination in specific social contexts. In this revised version of the manuscript, the authors have addressed many of the concerns raised in the first version regarding the lack of sufficient information and context in the Introduction and Discussion. I have several (mostly minor) comments I would like the authors to address:

      Comments:

      (1) The authors' experimental design is based on the comparison of a larva-only (control) versus larva+3 trophic eggs (treatment). The authors convincingly show that the larva plus 3 trophic eggs treatment has an inhibitory effect versus larva-only control. However, the authors should have also done a treatment composed of larva + 3 viable eggs to determine if the inhibitory effect observed on queens is specific to trophic eggs or whether it is an inhibitory effect of all eggs. This has important mechanistic consequences, because if the inhibitory effect is specific to trophic eggs, it means there are specific inhibitory factors deposited in trophic eggs during oogenesis and the differences observed between trophic versus viable eggs are meaningful beyond just nutritional differences. If the inhibitory effect is a property of all eggs, then the inhibitory factor is dumped into all eggs and the differences observed between trophic and viable eggs are related to something else. In all cases, this reviewer is not necessarily asking that they perform this additional treatment, but the authors have to be clear in the text that they cannot claim that the inhibitory effect is specific to trophic eggs alone without doing this experiment.

      (2) The other untested assumption the authors are making is that queen-laid trophic eggs would behave the same as worker-laid trophic eggs. This is apparent in the Discussion (line 422). They should instead highlight the interesting question of whether worker-laid trophic eggs would be similar in composition and have the same effect on caste as queen-laid eggs.

      (3) To this reviewer, they are missing a crucial explanation in the discussion. As far as this reviewer knows, young queens produce a higher proportion of trophic eggs than older queens, meaning that trophic egg production decreases with age of the queen. This raises the possibility that trophic eggs may, in part, function to prevent the production of more virgin queens in young and immature colonies with small colony sizes. This would allow colonies to invest in producing more workers at a time when rapidly expanding the colony is crucial in young colonies' life. Production of trophic eggs, therefore, may have a dual function: one for nutrition and larval survival, and one in suppressing queen development in immature young colonies. It can be said then that trophic eggs can regulate / influence caste determination in specific social / life history contexts of the colony, rather than only proposing that trophic eggs are a constant attempt by the queen to manipulate her offspring. I prefer the superorganism explanation, but readers should at least hear explanations at the individual and superorganism scales as a way of explaining the authors' discovery that trophic eggs suppress further queen development.

      (4) Why did the authors change the wording from caste "determination" to caste "differentiation." Determination is more appropriate because the trophic eggs do not affect morphogenesis of queens or workers, but rather the developmental switch between queens and workers.

      (5) Khila and Abouheif (2008) is listed in the References but not cited in the text.

      (6) On Line 70-81: "...may play a role in the regulation of body size" - I think the authors are trying to be broad in their language here since one study showed trophic eggs increased worker size but didn't induce queens, but this statement implies that the hypothesis is that trophic eggs act via body size to affect caste. Since the authors don't measure body size changes, only binary caste outcome, this is not the best way to set up the question. Could instead just conclude that previous work shows an effect on both caste and body size.

      (7) Paragraph beginning line 432: this paragraph seems out of place, not well connected to previous parts of discussion. It introduces the term "egg cannibalism" without defining it - not clear if this is meant as a synonym for eating of trophic eggs, or broader (i.e., eating viable eggs also). Could either remove the paragraph, or better set up the context that egg-eating behaviour is common in ants, could have evolved for worker policing reasons and/or for nutritional exchange, trophic eggs (and potentially co-option of trophic eggs for caste determination functions) presumably evolved in this context of existing egg-eating behaviour.

      (8) Line 41: Should read 'play an important part'.

      (9) Line 51: The food that was given is listed, but there is no information about the quantity of food given.

      (10) Line 74: The paragraph states that queens were isolated for 16 hours per day. However, it lacks a clear reason for this specific duration. Why 16 hours? Could this isolation period have impacted egg quality or larval development?

      (11) Line 76: The eggs were collected every 8 hours and then held for 10 days until hatching. This is a very long time for eggs to be held outside of the normal colony environment. This could have a large impact on the viability of the eggs, and the resulting larvae.

      (12) Line 78: twice "that" in "suggested that that the larger castes"

      (13) Lines 96-97: the following sentence is unclear: "The question mark indicates that it is unclear whether about the evidence for the production trophic eggs by queens and workers"

      (14) Line 209: By simply stating "binomial GLMM," the authors are leaving out a crucial piece of information. Readers cannot fully understand how the model was fitted or how the coefficients should be interpreted without knowing the link function. Therefore, the critique is that for complete and replicable science, the link function must be reported.
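      To illustrate why the link function matters for interpretation, the sketch below assumes a logit link, which is the canonical link for a binomial model and the default of R's binomial family. The link actually used in the manuscript is not stated, and the coefficient values are hypothetical. Under this link, coefficients live on the log-odds scale, so a fixed effect exponentiates to an odds ratio rather than to a difference in probabilities.

```python
import math

def inverse_logit(eta):
    """Map a linear predictor on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fixed effects (illustrative values, not estimates from the study).
intercept = 0.0          # log-odds of the outcome in the control group
treatment_effect = -1.0  # change in log-odds under the treatment

p_control = inverse_logit(intercept)
p_treatment = inverse_logit(intercept + treatment_effect)
odds_ratio = math.exp(treatment_effect)
```

      With a different link (for example probit or complementary log-log), the same coefficient would map to a different probability, which is why the link needs to be reported alongside the model.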

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosus have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether miRNA, proteins, triglycerides, glycogen levels in trophic eggs is what suppresses queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldober 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite as was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context for the study. We now provide more information on the life history of ants and explain under what situations queens and workers may produce trophic eggs. We also mention that some ants, such as Crematogaster smithi, have a special caste of “large workers” which are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Gobin and Ito (2000), in which the authors show that trophic eggs may play an important role in food distribution within the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We now state at the very beginning of the Material and Method section that the queens were collected in populations known not to have dependent-lineage (genetic caste determining) mechanisms of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not with the other Pogonomyrmex species. For some genera the species were also not listed in alphabetic order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      COMMENTS ON INTRODUCTION:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosus specifically. This is important because some colonies of Pogonomyrmex rugosus have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that the figure would be easier to read if the information on whether queens and workers produce trophic eggs were shown on the branches of the tree. However, placing the information on the branches would suggest that the “trait” evolved on that part of the tree. As we do not know exactly when worker or queen production of trophic eggs evolved, we prefer to keep the figure as it is.

      Finally, we have also removed the 'none' column from the figure, as suggested by the reviewer, and now discuss in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii: Masuko 2003).

      COMMENTS ON MATERIALS AND METHODS:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs because under natural conditions 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar to that under natural conditions.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens was isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, the proportion of viable and trophic eggs can be reliably determined only by isolating queens. Importantly, the main aim of these experiments was not to determine the proportion of viable and trophic eggs precisely, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes, this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up, and as soon as we disturb the nest, a worker takes them all and runs away with them). Moreover, it is possible that workers preferentially eat one type of egg, which would have required removing the eggs as soon as the queens laid them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to say what effect the disturbance had on the number and type of eggs laid. But again, our aim was not to determine these values precisely but to determine whether hibernation had an effect on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced into these colonies, and all first instar larvae were introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Any trophic eggs left over from the source colony should have been eaten within the first few days after the recipient colonies were created. However, even if some trophic eggs had remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggest that the mechanism is in fact a difference in the abundance of trophic eggs laid by queens. There is currently no information on when exactly caste determination occurs during development.

      COMMENTS ON RESULTS:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid”
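      One generic way to ask whether two egg types are laid in clusters rather than in random order is a Wald-Wolfowitz runs test, sketched below. This is offered only as an illustration of the question; it is not claimed to be the analysis used in the manuscript, and the function name and example sequence are hypothetical. The z-score relies on a normal approximation, so an exact or permutation version may be preferable for short sequences.

```python
import math

def runs_test(sequence):
    """Wald-Wolfowitz runs test for a two-category sequence.

    Returns (observed runs, expected runs, z-score); a strongly negative
    z-score indicates fewer runs than expected, i.e. clustering.
    """
    labels = sorted(set(sequence))
    if len(labels) != 2:
        raise ValueError("sequence must contain exactly two categories")
    n1 = sequence.count(labels[0])
    n2 = sequence.count(labels[1])
    n = n1 + n2
    # A new run starts wherever two neighbouring items differ.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return runs, expected, (runs - expected) / math.sqrt(variance)

# Hypothetical laying sequence: V = viable egg, T = trophic egg.
clustered = list("VVVVVTTTTTVVVVVTTTTT")
runs, expected, z = runs_test(clustered)
```

      For this made-up sequence the observed number of runs falls well below the expectation under random ordering, which is the pattern that laying eggs in batches would produce.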

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information in the main text that while viable eggs showed embryonic development at 25 and 65 hours (Fig. 2B, C), there was no such development for trophic eggs (Fig. 2E, F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      The survival is given for the larvae without considering caste. We have changed the X axis of panel B and reworded the figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and more easy to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egg-laying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No, the data were not shown, and we do not have sufficiently good data to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that 4 colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2,1,2,1, see supplementary Table 2). So, it is likely that it is just by chance that these colonies produced only gynes.

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did that because there are many different miRNA species, as reflected by the fact that there is not just one size peak but several. This is why we looked at the size distribution.

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we no longer have colonies in the laboratory, so this is not possible.

      COMMENTS ON DISCUSSION:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two mentioned studies there were other factors that co-varied with variation in the quantity of trophic eggs. We mentioned that and suggested that it would be useful to conduct experimental manipulation of the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We made some suggestions for experiments that should be performed in the future.

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph to make suggestions of future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. eLife Assessment

      This paper introduces an important theoretical method for characterizing symmetries of cells in biological tissues by capturing their real shape, and it sets results in contrast to related methods. The robustness of the paper's method to correctly capture dynamic and geometric changes that the cells may undergo is determined by convincing computational models, but the experimental support is incomplete and would benefit from better experimental imaging with higher quality and extended analysis. This would not only support the advantage of this method, but also strengthen its application to biological systems.

    2. Reviewer #1 (Public review):

      Summary:

      The authors' stated aim is to introduce so-called Minkowski tensors to characterize and quantify the shape of cells in tissues. The authors introduce Minkowski tensors and then define the p-atic order q<sub>p</sub>, where p is an integer, as a cell shape measure. They also introduce a previously defined measure of p-atic order in the form of the parameter γ<sub>p</sub>. The authors compute q<sub>p</sub> for data obtained by simulating an active vertex model and a multiphase field model, where they focus on p=2 and p=6 - nematic and hexatic order - as the two values of highest biological relevance. Based on their analysis, the authors claim that q<sub>2</sub> and q<sub>6</sub> are independent, that there is no crossover for the coarse-grained quantities, that the comparison of q<sub>p</sub> for different values of p is not meaningful, and determine the dependence of the mean values of q<sub>2</sub> and q<sub>6</sub> on cell activity and deformability. They then apply their method to data from MDCK monolayers and argue that the γ<sub>p</sub> "fail to capture the nuances of irregular cell shapes".

      Strength:

      The work presents a set of parameters that are useful for analyzing cell shape.

      Weaknesses:

      The main weakness of the manuscript is that the points that the authors make are not sufficiently elaborated or supported by the data. Although they start out with Minkowski tensors, they eventually only consider the parameters q<sub>p</sub>, which can be defined without any recourse to Minkowski tensors. Also, I dare to doubt that the average reader will benefit from the introduction to Minkowski tensors as it remains abstract and does not really go beyond repeating definitions. Eventually, for me, the work boils down to the statement that when you want to characterize (2d) cell shape, then it is better to take the whole cell contour instead of only the positions of the vertices of a polygon that approximates the full cell shape. By the way, for polygons, the q<sub>p</sub> and γ<sub>p</sub> should convey the same information as the vertex positions contain the whole geometric information.

      Some statements made about the values of q<sub>p</sub> are not supported by the data. For example, an independence of values of q<sub>2</sub> and q<sub>6</sub> cannot be inferred from Figure 7. Actually, Figure 8 points to some dependence between these values as the peaks of the pdfs move in the opposite direction as deformability and activity are changed. Figure 1 suggests that in general, larger cells have lower values of q<sub>p</sub> for all p. Some more serious quantification should be obtained here.
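
      One way such a quantification could be approached is a rank correlation between per-cell values of q<sub>2</sub> and q<sub>6</sub>, together with a permutation test. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the inputs are assumed to be paired per-cell values, and a correlation near zero rules out only a monotonic relationship, not dependence in general.

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed |Spearman rho|."""
    rng = random.Random(seed)
    observed = abs(spearman(a, b))
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(a, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

      With real data, `a` and `b` would hold the q<sub>2</sub> and q<sub>6</sub> values of the same cells; the number of permutations and the seed are arbitrary choices.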

      The presented experimental data on MDCK cells is anecdotal.

    3. Reviewer #2 (Public review):

      Summary:

      Orientational symmetries of cells and tissues play an important role in describing processes in development and disease, and the methods used to investigate them rely on the detection of cell shape. In this interesting and very timely manuscript by Lea Happel et al., Minkowski tensors are introduced to study the orientational symmetries of cells and set in comparison to existing shape descriptors, such as the shape function introduced by Armengol-Collado et al., which captures the orientational symmetry by the vertex positions of the polygonal shape of the cell. As an advantage, the Minkowski tensors consider the real cell shape with its arbitrary curvature of the cortex. Using computational models, such as the active vertex model and the multiphase field model, as well as experimental support with MDCK monolayers, the authors find that the orientational symmetries are independent of one another, as well as that they are dependent on the activity and deformability of the cells, resulting in a monotonic trend. A trend that has not been observed for the hexatic symmetry using the shape function. Together with the lack of hexatic-nematic crossover at the tissue scale, the authors suggest a reconsideration of findings from other shape descriptors. Taken together, the Minkowski tensors set a framework to investigate orientational symmetries at a single cell scale and how they may interplay in biological tissues.

      Strengths:

      The authors introduce the Minkowski tensors, which capture the p-atic orders of cells in tissues, considering their real shape instead of a polygonal approximation as reported for other shape descriptors in the literature. Thus, they do not depend on the vertex positions of the cells nor on the number of neighboring cells. The Minkowski tensors capture the dependence of the p-atic orders on the cell activity and deformability in a monotonic manner, which makes them a robust tool for quantifying p-atic orders at a single-cell scale, especially for rounded cells. The robustness has been tested by comparing the results of two computational model systems that simulate cell monolayers and whose results have been extended with experimental data. The Minkowski tensors have been used to explore the role of cell-cell adhesion and density in epithelial cells and have shown similar results to the shape function, a polygonal shape descriptor.

      Weaknesses:

      The authors point out the importance of studying the orientational order in biological systems. However, the current version of the manuscript lacks statistical information, a description of analysis methods, and experimental support. This support is needed to (i) strengthen the results of the two computational models and (ii) give weight to the authors' strong claim against other widely accepted shape descriptors capturing p-atic orders. The Minkowski tensors, which consider the real cell shapes, are reported to be a better method to investigate the p-atic orders of cells than the shape function introduced by Armengol-Collado et al. While there may be differences in the reported results coming from the two different approaches, both approaches show similar trends. As it stands, there is no substantiated discussion as to why one method would be better than the other. The shape function, γ<sub>6</sub>, may not be monotonic for great changes in cell activity and deformability, hinting at a potential weakness. In contrast to the shape function and results by Armengol-Collado et al. and Eckert et al., the coarse-grained Minkowski tensors do not capture the hexatic-nematic crossover at the tissue scale, applied here only to computational models. The cells simulated in the computational models have a similar size and the monolayer has a nearly regular pattern, which does not reflect the density variance in biological tissues. To strengthen the authors' claim that there is no crossover at the tissue scale, experimental verification is essential. Further, the robustness of the Minkowski tensors seems to rely on determining the p-atic orders on the shape of individual cells in the tissue. However, when applying the shape descriptor to experimental systems, the p-atic orders are very low, perhaps too low for comparisons between different p-atic orders with meaningful conclusions.

    4. Reviewer #3 (Public review):

      Happel et al. submit an article entitled “Quantifying the shape of cells - from Minkowski tensors to p-atic order”. The paper reports a quantitative p-atic method, established in physics, to extract cell shapes in experiments using phase contrast images of MDCK cells and in simulations (vertex model and phase fields). The rationale for the quantification based on an adaptation of Minkowski tensors is documented, as is the detailed extraction of the distributions and plots quantifying shapes, with an emphasis on changes in cell shapes and their importance in epithelial dynamics.

      Higher rank tensors are considered, as well as representations with intuitive meanings, the q<sub>i</sub> orders, and their potential correlations or absence of correlations, for example between q<sub>2</sub> and q<sub>6</sub>, together with statements about nematic and hexatic orders. A strong body of evidence is already reported in the papers of Armengol et al., quoted substantially in the paper, and the authors insist on an improvement thanks to the Minkowski tensors approach to challenge the former statements about crossovers and correlations.

      Although the approach seems to present advantages, the paper does not appear sufficiently novel. Beyond the Armengol et al. paper, the advantages of this approach compared to, for example, the shear decomposition (from MPI-PKS Dresden) or the approach based on links joining a centroid to its neighbours (MSC/Curie Paris) are not made clear.

    5. Author response:

      We thank the editors and the reviewers for their valuable comments. In response to these suggestions, we will add rigorous statistical measures and extend the experimental support of our findings in a revised version. Indeed, as we will show, doing so strengthens all the main claims. Specifically:

      Concerning Reviewer 1:

      - It is important to emphasise that the advantage of deriving shape measures q<sub>p</sub> from Minkowski tensors is their robustness and stability, which are well established from extensive, rigorous mathematical analyses. Introducing q<sub>p</sub> without this connection to revised Minkowski tensors would not allow us to claim this stability property for the considered measures.

      - Even though for a polygon the vertex positions contain the whole geometric information, using q<sub>p</sub> and γ<sub>p</sub> leads to different results, see Fig. 6 for an example.

      - We wholeheartedly agree that our statement on independence of values of q<sub>2</sub> and q<sub>6</sub> can be extended and more quantitatively established by rigorous statistical measures. This is exactly what we will do in the revised version, not only providing statistical measures on the presented data, but also extending our analyses to the published data from Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779. As we shall show, these analyses further strengthen this claim, unequivocally establishing the independence of q<sub>2</sub> and q<sub>6</sub> in two different models (active vertex model and multiphase-field model), as well as two different sets of experiments (the ones in the original manuscript, and the published one from Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779).
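
      The second point above can be illustrated with a minimal sketch, which is not the authors' code. It assumes that, for a polygon, q<sub>p</sub> reduces to an edge-length-weighted sum over outward edge normals, |Σ L<sub>k</sub> exp(ipθ<sub>k</sub>)| / Σ L<sub>k</sub>, and that γ<sub>p</sub> is a vertex-based shape function, |Σ |r<sub>v</sub>|<sup>p</sup> exp(ipφ<sub>v</sub>)| / Σ |r<sub>v</sub>|<sup>p</sup>, measured from the polygon centre (approximated here by the vertex mean). Both definitions and the function names are assumptions for illustration; the exact normalisations in the manuscript may differ.

```python
import cmath
import math


def q_p(vertices, p):
    """Edge-based measure (assumed definition): edges weighted by length,
    each contributing the p-fold phase of its outward normal direction."""
    n = len(vertices)
    psi_p = 0j
    perimeter = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        # Outward normal of a counter-clockwise polygon edge is (dy, -dx).
        theta = math.atan2(-dx, dy)
        psi_p += length * cmath.exp(1j * p * theta)
        perimeter += length
    return abs(psi_p) / perimeter


def gamma_p(vertices, p):
    """Vertex-based measure (assumed definition): vertices weighted by
    |r|^p, each contributing the p-fold phase of its polar angle."""
    n = len(vertices)
    cx = sum(x for x, _ in vertices) / n
    cy = sum(y for _, y in vertices) / n
    numerator = 0j
    denominator = 0.0
    for x, y in vertices:
        r = math.hypot(x - cx, y - cy)
        phi = math.atan2(y - cy, x - cx)
        numerator += r ** p * cmath.exp(1j * p * phi)
        denominator += r ** p
    return abs(numerator) / denominator


# Counter-clockwise test shapes: a 2:1 rectangle and a regular hexagon.
rectangle = [(-1.0, -0.5), (1.0, -0.5), (1.0, 0.5), (-1.0, 0.5)]
hexagon = [(math.cos(math.pi * k / 3), math.sin(math.pi * k / 3)) for k in range(6)]

print("rectangle:", q_p(rectangle, 2), gamma_p(rectangle, 2))
print("hexagon:  ", q_p(hexagon, 6), gamma_p(hexagon, 6))
```

      Under these assumed definitions the 2:1 rectangle gives q<sub>2</sub> = 1/3 but γ<sub>2</sub> = 0.6, whereas the regular hexagon gives q<sub>6</sub> = γ<sub>6</sub> = 1, so the two measures can take different values even though the vertices fix the whole polygon.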

      Concerning Reviewer 2:

      To fully address this point, we have extended our analyses to explore the published data of Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779. As we shall show in the revised manuscript, the crossover between nematic and hexatic is specific to the use of γ<sub>p</sub> for characterizing the shape and coarse-graining of the associated order. Using q<sub>p</sub> as the shape measure, this crossover disappears. Therefore, these analyses concretely demonstrate that the crossover is not a robust physical feature of the system and is dependent on the method used to define shape characteristics.

      Concerning Reviewer 3:

      We respectfully note a misunderstanding from the referee: the briefly mentioned approaches of other groups do not measure shape but connections between cells. Conceptually, these approaches are therefore related to bond order parameters. We already comment at the end of the section introducing Minkowski tensors that bond order parameters cannot quantify the shape of a cell. The same argument also holds for other such approaches. In our revised version we will further clarify this distinction, to avoid any confusion or misinterpretation.

    1. eLife Assessment

      In this manuscript, the authors analyse the nanoscale localisation of α5β1 and αVβ3 integrins in integrin adhesion complexes (IAC) by dual-colour STORM and assess the spatial organisation at the nano and mesoscale of their main adaptors (paxillin, talin and vinculin). This is an important work that provides detailed analyses that reveal how elements of these complex structures are really organised at the nanoscale, an essential perspective for a better understanding of how IACs function and regulate mechanotransduction processes. The evidence presented is solid, with super-resolution imaging experiments conducted using a single, validated methodology and subsequent computational modelling that enabled a quantitative assessment of the resulting data.

    2. Reviewer #1 (Public review):

      Summary:

      In recent years, it has become increasingly evident how beautifully intricate IAC are at the nanoscale. Studies like the one presented here that shed light on the precise inner organisation of IAC are thus quite important and relevant in order to obtain a better in-depth understanding of IAC functioning and the contribution of different integrin subtypes to cell adhesive and mechanotransductive processes.

      Interestingly, the authors found a distinct localisation of α5β1 and αVβ3 integrin nanoclusters within focal adhesion of human fibroblasts, with α5β1 integrin nanoclusters being at the periphery of IAC and αVβ3 integrin nanoclusters randomly distributed. Furthermore, a surprisingly high percentage of inactive integrins within IAC and relatively low spatial integrin colocalisation with adaptor proteins has been shown.

      Strengths:

      This is a very thoroughly performed STORM-based assessment of the nanodistribution of α5β1 and αVβ3 nanoclusters within IAC (and outside). The image quality is outstanding, and the authors have meticulously executed the experiments and the image analyses.

      Weaknesses:

      The only weakness is maybe that the manuscript remains descriptive. However, the high quality of the "description" of the nano-organisation of IAC by this scrupulous study is really important to better understand the inner workings of IAC. It provides a very solid foundation to look deeper into the (patho)physiological implications of this organisation, see recommendations (which are rather suggestions in this case).

    3. Reviewer #2 (Public review):

      Summary:

      In this study, dual-color super-resolution microscopy analysis was performed to study the co-operation between integrins and focal adhesion proteins in human fibroblast cells. The study focused on two integrins which have been previously found to be mainly responsible for focal adhesions, namely α5β1 and αvβ3.

      Specifically, the study tried to shed light on the nanoclustering of integrins in focal adhesions.

      In the current study, more integrin nanoclusters were observed in focal adhesions compared to other cell-matrix adhesion structures. The study revealed that both α5β1 and αvβ3 form nanoclusters, and those appear segregated from each other. While αvβ3 nanoclusters organize randomly inside focal adhesions regardless of their activation state, α5β1 nanoclusters, and particularly the nanoclusters containing β1-integrin in active conformation, preferentially organized at the edges of focal adhesions. The nanoclusters formed by each integrin were similar in size.

      Cytoplasmic adapter proteins appeared less in nanocluster assemblies, suggesting that integrin nanoclusters are also forming without the studied cytoplasmic adapter proteins (talin, vinculin, paxillin). Active integrins were identified with the help of conformation-specific antibodies, and this enabled the study of the colocalization between integrins and their cytoplasmic adapter proteins. This analysis revealed that activated integrins are strongly engaged with adapter proteins.

      Strengths:

      The study stems from the thorough computational modelling of the nanoclusters, which enables quantification of the behavior of the clusters, including their mesoscale distribution.

      The study strengthens the view that α5β1 and αvβ3 have specific functions in focal adhesions, α5β1 nanoclusters localizing preferentially on focal adhesion edges. The study also revealed that nanoclusters localized at the edges of focal adhesion were enriched for talin and paxillin but not for vinculin.

      Analysis of adaptor protein nanoclusters (paxillin, talin, and vinculin) revealed that all adapter protein nanoclusters studied here close to active β1 nanoclusters are enriched on the focal adhesion edge region, whereas integrin adaptor nanoclusters far from active β1 appear to be more uniformly distributed.

      Importantly, the current study suggests that integrin subtype-specific nanoclusters are not only present at an early stage of adhesion formation, but integrin nanoclusters remain segregated from each other also in mature focal adhesions, maintaining their sizes and number of molecules.

      Interestingly, the study revealed that selected cytoplasmic adaptors (paxillin, talin, and vinculin), also form nanoclusters of similar size and number of single molecule localizations as the integrins, regardless of whether they locate inside or outside focal adhesions. The adapter nanoclusters are enriched in the focal adhesion "belt", colocalizing with the active α5β1 integrin nanoclusters.

      Weaknesses:

      The current study is highly dependent on the antibodies. It is possible that antibodies containing two binding sites for antigen influence the nanoscale organization (and also activation) of the receptors. Control experiments to study the possible contribution of antibodies to the measured outcome should be performed to verify the main findings. One possible approach could be to use available fluorescently tagged integrins. Alternatively, integrins (or adapter proteins) could be tagged with a small ligand and detected using a monovalent binder.

      Only a limited number of integrin adapter proteins were investigated. Given the high number of identified adapter proteins, this is an understandable choice. However, it would be fascinating to understand if the nanoclusters of inactive integrins are dominantly bound with a certain adapter protein, such as tensin.

    4. Reviewer #3 (Public review):

      Summary:

      In their study, the authors reveal using dual-color super-resolution STORM microscopy modality and immunolabeling in fixed adherent cells, that β1 and β3 integrins as well as adaptors (paxillin, talin and vinculin) are all organized in nanoclusters of similar size (50nm) and molecular density (20 copy number) inside FAs but also outside. Using activity-specific immunolabeling of β1 and β3 integrins, they revealed that active integrin subpopulations were both clustered but in distinct exclusive nano-aggregates in agreement with Spiess et al. (2018). Once more, the "active" integrin nanoclusters displayed similar properties in terms of size and molecular density, suggesting that molecular organization in nanoclusters is an intrinsic property of integrins in plasma membrane multimerizing independently of their location (inside or outside FAs), their level of activation, or their connection to the cytoskeleton. Then the authors followed up by analyzing at the mesoscale how these "universal" nanoclustered adhesive units are distributed spatially. Inspecting the surface density of nanoclusters revealed that the density of integrin nanoclusters in FAs was 5x larger, compared to integrin nanoclusters outside adhesions. Interestingly, whereas the density of total integrin nanoclusters was 2-4x larger than adaptor nanoclusters, the density of "active" integrin nanoclusters stoichiometrically matches that of talin and vinculin nanoclusters, and was slightly outnumbered by paxillin nanoclusters. These findings suggest that inside FAs, among the total number of integrin nanoclusters, the subset of "active" integrin nanoclusters could be engaged with "adaptor" nanoclusters on a 1:1 ratio. 
      Using analysis of the nearest neighbor distance (NND) between distinct integrin clusters and each of the adaptors, the authors report that they found negligible spatial colocalization of integrins with these adaptor proteins and that spatial segregation is essentially determined by the density of nanoclusters within the FAs. As the authors reported that α5β1 and αvβ3 do not intermix at the nanoscale, they finally highlighted how distinct α5β1 and αvβ3 nanoclusters are differently organized and segregated inside FAs. Adapting the NND analysis in order to inspect how far the nanoclusters are from the edges of the FAs they are located in, the authors revealed that α5β1 but not αvβ3 integrin nanoclusters are enriched on FA edges and that a similar FA edge-enriched distribution for "active" α5β1 and adaptor protein nanoclusters was found for talin and paxillin but not vinculin. The latter results suggest that FA edges could constitute multiprotein hubs for enhanced colocalization and activation for α5β1 integrin nanoclusters and adaptors such as talin and paxillin. Unfortunately, NND analysis could not confirm this enhanced colocalization hypothesis.
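
      As a rough illustration of the kind of NND comparison described above (not the authors' pipeline), the sketch below computes, for each nanocluster centroid of one species, the distance to the nearest centroid of another species, and compares the mean with that obtained when the second species is placed uniformly at random at the same number density. The rectangular region, the coordinates, and the function names are assumptions for illustration; real focal adhesions would need an irregular mask.

```python
import math
import random


def nearest_neighbour_distances(points_a, points_b):
    """For each point in A, the distance to its nearest point in B."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def mean_nnd_under_random_placement(points_a, n_b, width, height,
                                    n_trials=200, seed=0):
    """Mean A-to-B NND when B is replaced by n_b uniformly random
    positions inside a width x height rectangle (a simplifying assumption)."""
    rng = random.Random(seed)
    trial_means = []
    for _ in range(n_trials):
        random_b = [(rng.uniform(0, width), rng.uniform(0, height))
                    for _ in range(n_b)]
        nnd = nearest_neighbour_distances(points_a, random_b)
        trial_means.append(sum(nnd) / len(nnd))
    return sum(trial_means) / len(trial_means)


# Hypothetical centroid coordinates in nm, for illustration only.
integrin_centroids = [(10.0, 10.0), (50.0, 40.0), (80.0, 20.0)]
adaptor_centroids = [(12.0, 11.0), (48.0, 43.0), (85.0, 22.0)]

observed = nearest_neighbour_distances(integrin_centroids, adaptor_centroids)
observed_mean = sum(observed) / len(observed)
random_mean = mean_nnd_under_random_placement(
    integrin_centroids, len(adaptor_centroids), width=100.0, height=50.0)
print(observed_mean, random_mean)
```

      An observed mean NND close to the randomised value would be consistent with spacing set mainly by nanocluster density, whereas a much smaller observed value would point to genuine colocalization.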

      General Assessment:

      While the study presents some valuable findings, it currently reads as a compilation of intriguing but preliminary observations derived primarily from a single methodology (dual-color STORM and DBSCAN clustering analysis). As the initial findings often lack confirmation through additional data analysis (such as the NND analysis the authors used), there's a critical necessity to bolster the methodological approach. This should involve replicating the main findings using alternative single-molecule super-resolution techniques (such as quantitative DNA-PAINT) or employing different clustering analytical tools (such as Voronoi tessellation). Furthermore, the manuscript feels incomplete, focusing solely on describing molecular organization without offering substantial insights into how these observations correlate with the regulation, activation, and functionality of integrins at the cellular level.

      The manuscript presents extensive datasets and utilizes methodologies in which the investigators demonstrate expertise. Nevertheless, there's uncertainty regarding the novelty and broad appeal of the findings. For instance, the observation of integrin nanoclustering has been previously reported in several publications (e.g., Changede et al., Dev Cell 2015; Spiess et al., JCB 2018; Fujiwara et al., JCB 2023). Similarly, the accumulation of specific proteins at the periphery of FAs has been documented elsewhere (e.g., Sun et al., NCB 2016; Stubb et al., NatComm 2019; Nunes-Vicente TCB 2023), as well as the differential dynamic organization of α5β1 and αvβ3 integrins inside FAs (e.g., Rossier et al., NCB 2012). Beyond the universal organization of adhesive proteins, there's a need to identify novel insights that significantly advance the field. One potential avenue could involve pinpointing the molecular determinant controlling the FA edge enrichment of active α5β1 integrins and talin nanoclusters. For instance, could there be an interplay between α5β1 and αvβ3 integrin nanoclusters that becomes visible in the organisation of one when the other is suppressed by deletion (KO) or depletion (siRNA)? Also, could KANK, which also exhibits enrichment and regulates talin activity (e.g., Sun et al., NCB 2016), play a role in this process? Identifying the molecular players that regulate, even partially, the mesoscale organization of nanoclusters of proteins would really benefit the breadth of this manuscript.

      Echoing the previous concern, the manuscript describes a novel and rather surprising finding related to molecular clustering of adhesion proteins. The fact that nanoclusters exhibit uniform size and molecular density regardless of the protein type, location, or activation level is indeed surprising and raises many questions about the methodology used to assess molecular clustering. I feel that the description and characterization of integrin nanoclusters appear incomplete and need to be expanded by comparing different analytical strategies for protein clustering. Furthermore, a shortcoming of the manuscript in its current form concerns the quantification of integrin numbers inside the observed nanoclusters. I agree that the path from optical microscopy to protein stoichiometry quantification is hard and full of pitfalls. But the authors do not fully address these issues, which are extremely important when discussing protein nanoclustering. This quantitative aspect should be discussed.

      First, it is crucial for the authors to carefully examine and discuss in their manuscript whether there are any potential biases or limitations in the experimental techniques (dual-color STORM) or data analysis methods employed (DBSCAN). Second, the authors should provide control samples, which are absent from the current manuscript, to demonstrate the sensitivity and dynamic range of their experimental strategy.

      In the STORM images displayed in Figure S1, the authors highlighted localization clusters detected by DBSCAN as a signature for integrin nanoclusters. But the authors do not discuss the localization spots that were not detected by DBSCAN. Could they be individual integrins? If so, should they also be considered as useful information? This brings me to another related technical question about how DBSCAN handles the case where fluorescent molecules are blinking. This is important as multiple emissions by a single fluorophore could be detected as a nanocluster of several molecules, which would be an artefact due to the photophysics of the fluorophore. Could the authors comment on these points?
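
      Both questions can be made concrete with a minimal, self-contained DBSCAN sketch. It is an illustration only: the `eps` and `min_pts` values and the coordinates are hypothetical and are not the parameters used in the manuscript. Points that do not belong to any dense neighbourhood receive the noise label -1, and a single fluorophore that blinks several times within the localization precision is reported as a cluster, which is the artefact raised above.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, NOISE (-1) for points
    that are neither core points nor within eps of a core point."""
    labels = [None] * len(points)

    def neighbours(i):
        # Neighbourhood includes the point itself, as in the usual definition.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb_j)
    return labels


# Hypothetical localizations in nm: one fluorophore near the origin detected
# five times (blinking), plus two isolated single detections.
localizations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0),
                 (100.0, 100.0), (200.0, 50.0)]
labels = dbscan(localizations, eps=5.0, min_pts=3)
print(labels)
```

      In this toy example the five repeated detections of one emitter form a cluster, while the two isolated detections are labelled as noise, so a calibration of the number of localizations per single molecule would be needed to tell repeated blinking from true multimers.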

      Also, using isolated and stochastically physisorbed fluorophores (Ab coupled with activator/reporter pairs used in this study) on glass helped define the signature in STORM of a single isolated molecule. To obtain the signature of clustered fluorophores, the authors could use anti-donkey antibodies to cross-link those STORM-specifically labeled Ab as a means to artificially obtain clustered fluorophores. Ultimately, to avoid the bias effect of the glass surfaces on the photophysics of fluorophores and be in the same imaging conditions as for the described nanoclusters, the authors should use model systems composed of multimers of GFP vs. single GFP, immunolabeled with a GFP-binding monoclonal antibody. This will permit evaluation of the cluster signature obtained with DBSCAN analysis of STORM data for single vs. multimers of known stoichiometry. This would constitute an indisputable molecular stoichiometry ruler.

      Due to the surprising finding of the nanoclusters' "universality", it is imperative for the authors to validate the findings through complementary methodologies and analytical tools. This should involve replication of results using alternative super-resolution techniques (quantitative DNA-PAINT) and exploring different clustering algorithms (Voronoi tessellation) to ensure the robustness and reliability of the observations.

    5. Author response:

      As a short response to the public reviews, we would like to outline the following planned revisions:

      (1) Address the antibody concerns as indicated by reviewer 1

      (2) Assess the role of tensin (and possibly KANK), as suggested by reviewers 2 and 3, respectively.

      (3) Validate our main experimental findings using alternative super-resolution approaches, including STED to avoid potential blinking artefacts associated with standard STORM, and most likely DNA-PAINT as a more quantitative technique, as suggested by reviewer 3.

      (4) Implement alternative analytical strategies to DBSCAN, including Voronoi tessellation as suggested by reviewer 3.

      (5) Expand the discussion of the main findings of our work and their biological significance.

    1. eLife Assessment

      This paper presents a method for detecting Naegleria fowleri infection, which is almost always fatal, using small RNA from blood. This could be an important advance since early detection might improve treatment outcomes. The mouse work is methodologically solid, but only a very small number of human samples were available for human validation.

    2. Reviewer #1 (Public review):

      Summary:

      Early and accurate diagnosis is critical to treating N. fowleri infections, which often lead to death within 2 weeks of exposure. Current methods, which rely on sampling cerebrospinal fluid, are invasive, slow, and sometimes unreliable. Therefore, there is a need for a new diagnostic method. Russell et al. address this need by identifying small RNAs secreted by Naegleria fowleri (Figure 1) that are detectable by RT-qPCR in multiple biological fluids including blood and urine. SmallRNA-1 and smallRNA-2 were detectable in plasma samples of mice experimentally infected with 6 different N. fowleri strains, and were not detected in uninfected mouse or human samples (Figure 4). Further, smallRNA-1 is detectable in the urine of experimentally infected mice as early as 24 hours post-infection (Figure 5). The study culminates with testing human samples (obtained from the CDC) from patients with confirmed N. fowleri infections; smallRNA-1 was detectable in cerebrospinal fluid in 6 out of 6 samples (Figure 6B), and in whole blood from 2 out of 2 samples (Figure 6C). These results suggest that smallRNA-1 could be a valuable diagnostic marker for N. fowleri infection, detectable in cerebrospinal fluid, blood, or potentially urine.

      Strengths:

      This study investigates an important problem, and comes to a potential solution with a new diagnostic test for N. fowleri infection that is fast, less invasive than current methods, and seems robust to multiple N. fowleri strains. The work in mice is convincing that smallRNA-1 is detectable in blood and urine early in infection. Analysis of patient blood samples suggests that whole blood (but not plasma) could be tested for smallRNA-1 to diagnose N. fowleri infections.

      Weaknesses:

      (1) There are not many N. fowleri cases, so the authors were limited in the human samples available for testing. It is difficult to know how robust this biomarker is in whole blood (only 2 samples were tested, both had detectable smallRNA-1), serum (1 out of 1 sample tested negative), or human urine (presumably there is no material available for testing). This limitation is openly discussed in the last paragraph of the discussion section.

      (2) There seems to be some noise in the data for uninfected samples (Figures 4B-C, 5B, and 6C), especially for those with serum (Figure 2E). While this is often orders of magnitude lower than the positive results, it does raise questions about false positives, especially early in infection when diagnosis would be the most useful. A few additional uninfected human samples may be helpful.

    3. Reviewer #2 (Public review):

      Summary:

      The authors sought to develop a rapid and non-invasive diagnostic method for primary amoebic meningoencephalitis (PAM), a highly fatal disease caused by Naegleria fowleri. Due to the challenges of early diagnosis, they investigated extracellular vesicles (EVs) from N. fowleri, identifying small RNA biomarkers. They developed an RT-qPCR assay to detect these biomarkers in various biofluids.

      Strengths:

      (1) This study has a clear methodological approach, which allows for the reproducibility of the experiments.

      (2) Early and Non-Invasive Diagnosis - The identification of a small RNA biomarker that can be detected in urine, plasma, and cerebrospinal fluid (CSF) provides a non-invasive diagnostic approach, which is crucial for improving early detection of PAM.

      (3) High Sensitivity and Rapid Detection - The RT-qPCR assay developed in the study is highly sensitive, detecting the biomarker in 100% of CSF samples from human PAM cases and in mouse urine as early as 24 hours post-infection. Additionally, the test can be completed in ~3 hours, making it feasible for clinical use.

      (4) Potential for Disease Monitoring - Since the biomarker is detectable throughout the course of infection, it could be used not only for early diagnosis but also for tracking disease progression and monitoring treatment efficacy.

      (5) Strong Experimental Validation - The study demonstrates biomarker detection across multiple sample types (CSF, urine, whole blood, plasma) in both animal models and human cases, providing robust evidence for its clinical relevance.

      (6) Addresses a Critical Unmet Need - With a >97% case fatality rate, PAM urgently requires improved diagnostics. This study provides one of the first viable liquid biopsy-based diagnostic approaches, potentially transforming how PAM is detected and managed.

      Weaknesses:

      (1) Limited Human Sample Size - While the biomarker was detected in 100% of CSF samples from human PAM cases, the number of human samples analyzed (n=6 for CSF) is relatively small. A larger cohort is needed to validate its diagnostic reliability across diverse populations.

      (2) Lack of Pre-Symptomatic or Early-Stage Human Data - Although the biomarker was detected in mouse urine as early as 24 hours post-infection, there is no data on whether it can be reliably detected before symptoms appear in humans, which is crucial for early diagnosis and treatment initiation.

      (3) Plasma Detection Challenges - While the biomarker was detected in whole blood, it was not detected in human plasma, which could limit the ease of clinical implementation since plasma-based diagnostics are more common. Further investigation is needed to understand why it is absent in plasma and whether alternative blood-based approaches (e.g., whole blood assays) could be optimized.

    1. eLife Assessment

      The study investigates an emerging research field: the interaction between sleep and development. The authors use Drosophila larval sleep as a study model and provide valuable insight into how neuropeptide circuitry controls larval sleep. Using a broad range of behavioural and imaging methods and analyses, the authors propose a sleep-regulatory neural pathway, Hugin-PK2-Dilps, in the Drosophila neurosecretory centre IPC. However, the evidence that supports this pathway is incomplete - in particular, the methodology in sleep measurement and the specificity at each step of the Hugin-PK2-Dilps pathway require further clarifying experiments or explanation.

    2. Reviewer #1 (Public review):

      Summary:

      The study investigates how neuropeptidergic signaling affects sleep regulation in Drosophila larvae. The authors first conduct a screen of CRISPR knock-out lines of genes encoding enzymes or receptors for neuropeptides and monoamines. As a result of this screen, the authors follow up on one hit, the hugin receptor, PK2-R1. They use genetic approaches, including mutants and targeted manipulations of PK2-R1 activity in insulin-producing cells (IPCs) to increase total sleep amounts in 2nd instar larvae. Similarly, dilp3 and dilp5 null mutants and genetic silencing of IPCs show increases in sleep. The authors also show that hugin mutants and thermogenetic/optogenetic activation of hugin-expressing neurons caused reductions in sleep. Furthermore, they show through imaging-based approaches that hugin-expressing neurons activate IPCs. A key finding is that wash-on of hugin peptides, Hug-γ and PK-2, in ex vivo brain preparations activates larval IPCs, as assayed by CRTC::GFP imaging. The authors then examine how the PK2-R1, hugin, and IPC manipulations affect adult sleep. Finally, the authors examine how Ca2+ responses through CRTC::GFP imaging in adult IPCs are influenced by the wash-on of hugin peptides. The conclusions of this paper are somewhat well supported by data, but some aspects of the experimental approach and sleep analysis need to be clarified and extended.

      Strengths:

      (1) This paper builds on previously published studies that examine Drosophila larval sleep regulation. Through the power of Drosophila genetics, this study yields additional insights into what role neuropeptides play in the regulation of Drosophila larval sleep.

      (2) This study utilizes several diverse approaches to examine larval and adult sleep regulation, neural activity, and circuit connections. The impressive array of distinct analyses provides new insight into how Drosophila sleep-wake circuitry is regulated across the lifespan.

      (3) The imaging approaches used to examine IPC activation upon hugin manipulation (either thermogenetic activation or wash-on of peptides) demonstrate a powerful approach for examining how changes in neuropeptidergic signaling affect downstream neurons. These experiments involve precise manipulations as the authors use both in vivo and ex vivo conditions to observe an effect on IPC activity.

      Weaknesses:

      Although the paper does have some strengths in principle, these strengths are not fully supported by the experimental approaches used by the authors. In particular:

      (1) The authors show total sleep amount over an 18-hour period for all the measures of 2nd instar larval sleep throughout the paper. However, published studies have shown that sleep changes over the course of 2nd instar development, so more precise time windows are necessary for the analyses in this study.

      (2) Previously published reports of sleep metrics in both Drosophila larvae and adults include the average number of sleep episodes (bout number) and the average length of sleep episodes (bout length). Neither of these metrics is included in the paper for either the larval sleep or adult sleep data. Not including these metrics makes it difficult for readers to compare the findings in this study to previously published papers in the established Drosophila sleep literature.

      (3) Because Drosophila adult and larval sleep is based on locomotion, the authors need to show the activity values for the experiments supporting their key conclusions. They do show travel distances in Figure 2 - Figure Supplement 1; however, it is not clear how these distances were calculated or how the distances relate to the overall activity of individual larvae during sleep experiments. It is also concerning that inactivation of the PK2-R1-expressing neurons causes a reduction in locomotion speed. This could partially explain the increase in sleep that they observe.

      (4) The authors rely on homozygous mutant larvae and adult flies to support many of their conclusions. They also rely on Gal4 lines with fairly broad expression in the Drosophila brain to support their conclusions. Adding more precise tissue-specific manipulations, including thermogenetic activation and inhibition of smaller populations of neurons in the study would be needed to increase confidence in the presented results. Similarly, demonstrating that larval development and feeding are not affected by the broad manipulations would strengthen the conclusions.

      (5) Many of the experiments presented in this study would benefit from genetic and temperature controls. These controls would increase confidence in the presented results.

      (6) The authors claim that their findings in larvae uncover the circuit basis for larval sleep regulation. However, there is very little comparison to published studies demonstrating that neuropeptides like Dh44 regulate larval sleep. Because hugin-expressing neurons have been shown to be downstream of Dh44 neurons, the authors need to include this as part of their discussion. The authors also do not explain why other neuropeptides in the initial screen are not pursued in the study. Given the effect that these manipulations have on larval sleep in their initial screen, it seems likely that other neuropeptidergic circuits regulate larval sleep.
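      The bout metrics requested in weakness (2) above can be illustrated with a minimal sketch. It assumes each animal's behaviour has already been scored as a binary per-sample trace (1 = asleep, 0 = awake) at a fixed sampling interval; the function names, the example trace, and the 6-second interval are hypothetical and are not taken from the manuscript.

```python
from itertools import groupby


def sleep_bouts(is_asleep):
    """Return the length, in samples, of each run of consecutive sleep samples."""
    return [sum(1 for _ in run) for asleep, run in groupby(is_asleep) if asleep]


def bout_metrics(is_asleep, seconds_per_sample=1.0):
    """Summarise a binary sleep trace as bout number, mean bout length, and total sleep."""
    bouts = sleep_bouts(is_asleep)
    n_bouts = len(bouts)
    total_s = sum(bouts) * seconds_per_sample
    mean_s = total_s / n_bouts if n_bouts else 0.0
    return {
        "bout_number": n_bouts,
        "mean_bout_length_s": mean_s,
        "total_sleep_s": total_s,
    }


# Hypothetical trace: two sleep bouts lasting 2 and 4 samples.
trace = [0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
metrics = bout_metrics(trace, seconds_per_sample=6.0)
print(metrics)
```

      Reporting these values alongside total sleep would show whether a manipulation changes how often sleep is initiated or how long it is maintained, a distinction that total sleep alone cannot make.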

    3. Reviewer #2 (Public review):

      Summary:

      This study examines larval sleep patterns and compares them to sleep regulation in adult flies. The authors demonstrate hallmark sleep characteristics in larvae, including sleep rebound and increased arousal thresholds. Through genetic and behavioral analyses, they identify PK2-R1 as a key receptor involved in sleep modulation, likely via the HuginPC-IPC signaling pathway. Loss of PK2-R1 results in increased sleep, which aligns with previous findings in hugin knockout mutants. While the study presents significant contributions to the field, further investigation is needed to address discrepancies with earlier research and strengthen mechanistic claims.

      Strengths:

      (1) The study explores a relatively understudied aspect of sleep regulation, focusing on larval development.

      (2) The use of an automated behavioral measurement system ensures precise quantification of sleep patterns.

      (3) The findings provide strong genetic and behavioral evidence supporting the role of the HuginPC-IPC pathway in sleep regulation.

      (4) The study has broader implications for understanding the evolution and functional divergence of sleep circuits.

      Weaknesses:

      (1) The manuscript does not sufficiently discuss previous studies, particularly concerning hugin mutants and their metabolic effects.

      (2) The specificity of IPC secretion mechanisms is unclear, particularly regarding potential indirect effects on Dilp2.

      (3) Alternative circuits, such as the HuginPC-DH44 pathway, require further consideration.

      (4) Functional connectivity between HuginPC neurons and IPCs is not directly validated.

      (5) Developmental differences in sleep regulatory mechanisms are not thoroughly examined.

    4. Reviewer #3 (Public review):

      Summary:

      Sleep affects cognition and metabolism, evolving throughout development. In mammals, infants have fast sleep-wake cycles that stabilize in adults via circadian regulation. In this study, the authors performed a genetic screen for neurotransmitters/peptides regulating sleep and identified the neuropeptide Hugin and its receptor PK2-R1 as essential components for sleep in Drosophila larvae. They showed that IPCs express PK2-R1 and silencing IPCs resulted in a significant increase in the sleep amount, which was consistent with the effect they observed in PK2-R1 knock-out mutants. They also showed that Hugin peptides, secreted by a subset of Hugin neurons (Hug-PC), activate IPCs through the PK2-R1 receptor. This activation prompts IPCs to release insulin-like peptides (Dilps), which are implicated in the modulation of sleep. They showed that Hugin peptides induce a PK2-R1 dependent calcium (Ca²⁺) increase in IPCs, which they linked to the release of Dilp3, showing a connection between Hugin signaling to IPCs, Dilp3 release, and sleep regulation. Additionally, the activation of Hug-PC neurons reduced sleep amounts, while silencing them had the opposite effect. In contrast to the larval stage, the Hugin/PK2-R1 axis was not critical for sleep regulation in Drosophila adults, suggesting that this neuropeptidergic circuitry has divergent roles in sleep regulation across different stages of development.

      Strengths:

      This study used an updated system for sleep quantification in Drosophila larvae, and this method allowed precise measurement of larval sleep patterns which is essential for the understanding of sleep regulation.

      The authors performed an unbiased genetic screen and successfully identified novel regulators of larval sleep, Hugin and its receptor PK2-R1, making a substantial contribution to the understanding of neuropeptidergic control of sleep regulation.

      They clearly demonstrated the mechanism by which Hugin-expressing neurons modulate sleep: activation of IPCs via PK2-R1, as shown by Ca2+ responses.

      Based on the demonstrated activation of PK2-R1 by the human Hugin orthologue Neuromedin U, research on human sleep disorders may benefit from the discoveries from Drosophila since sleep-regulating mechanisms are conserved across species.

      Weaknesses:

      The study primarily focused on sleep regulation in Drosophila larvae, showing that the Hugin/PK2-R1 axis is critical for larval sleep but not necessary for adult sleep. The effects of the Hugin axis in the adult are, however, incompletely explained and somewhat inconsistent. PK2-R1 knockout adults also display increased sleep, as does HugPC silencing, at least for daytime sleep. The difference lies in Dilp3/5 mutant animals showing decreased sleep and IPCs seemingly responding with reduced Dilp3 release to PK-2 treatment (Figure 6). It seems difficult to reconcile the authors' conclusions regarding this point without additional data. It could be argued that PK2-R1 still regulates adult sleep, but not via Hugin and IPCs/Dilps.

      Another issue might be that the authors show relative sleep levels for adults using Trikinetics monitoring. From the methods, it is not clear if the authors backcrossed their line to an isogenic wild-type background to normalize for line-specific effects on sleep. Thus, it is likely that each line has differences in total sleep time due to background effects, e.g., their Kir2.1 control line showed reduced sleep relative to the compared genotypes. This might limit the conclusions on the role of Hugin/PK2-R1 on adult sleep.

    1. eLife Assessment

      This valuable study presents an alternative platform for nanobody discovery using phage-displayed synthetic libraries. The evidence supporting the platform, which is used to isolate and validate nanobodies targeting Drosophila secreted proteins, is compelling. By making this library openly accessible, the authors provide an excellent resource to the wider scientific community. The detailed protocol used in this manuscript, associated with various methods for nanobody screening, provides an alternative and reliable platform for nanobody discovery.

    2. Reviewer #1 (Public review):

      Summary:

      Using highly specific antibody reagents for biological research is of prime importance. In the past few years, novel approaches have been proposed to gain easier access to such reagents. This manuscript describes an important step forward toward the rapid and widespread isolation of antibody reagents. Via the refinement and improvement of previous approaches, the Perrimon lab describes a novel phage-displayed synthetic library for nanobody isolation. They used the library to isolate nanobodies targeting Drosophila secreted proteins. They used these nanobodies in immunostainings and immunoblottings, as well as in tissue immunostainings and live cell assays (by tethering the antigens on the cell surface).

      Since the library is made freely available, it will contribute to gaining access to better research reagents for non-profit use, an important step towards the democratisation of science.

      Strengths:

      (1) New design for a phage-displayed library of high content.

      (2) Isolation of valuable novel tools.

      (3) Detailed description of the methods such that they can be used by many other labs.

      Weaknesses:

      My comments largely concentrate on the representation of the data in the different Figures.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors propose an alternative platform for nanobody discovery using a phage-displayed synthetic library. The authors relied on DNA templates originally created by McMahon et al. (2018) to build the yeast-displayed synthetic library. To validate their platform, the authors screened for nanobodies against 8 Drosophila secreted proteins. Nanobody screening has been performed with phage-displayed nanobody libraries followed by an enzyme-linked immunosorbent assay (ELISA) to validate positive hits. Nanobodies with higher affinity have been tested for immunostaining and immunoblotting applications using Drosophila adult guts and hemolymph, respectively.

      Strengths:

      The authors presented a detailed protocol with various and complementary approaches to select nanobodies and test their application for immunostaining and immunoblotting experiments. Data are convincing and the manuscript is well-written, clear, and easy to read.

      Weaknesses:

      Of the eight Drosophila secreted proteins selected to screen for nanobodies, the authors failed to identify nanobodies for three. While the authors mentioned potential improvements of the protocol in the discussion, none of them have been tested in this manuscript.

      The same comment applies to the experiments using membrane-tethered forms of the antigens to test the affinity of nanobodies identified by ELISA. Many nanobodies fail to recognize the antigens. While the authors suggested a low affinity of these nanobodies for their antigens, this hypothesis has not been tested in the manuscript.

      Improving the protocol at each step for nanobody selection would greatly increase the success rate for the discovery of nanobodies with high affinity.

    1. eLife Assessment

      This valuable study determines the functional requirements for localization and activity of S. cerevisiae septin-associated kinases using in vivo imaging, in vitro and in vivo protein-protein interaction assays, and an instructive in vivo "tethering" approach. In addition to confirming previous results, the study offers evidence that the septin-associated kinases may directly interact with the contractile ring machinery. Although the experiments appear to have been conducted correctly, the quantitative analysis of some experiments is incomplete and should be improved to strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors wanted to better understand how the various septin-associated kinases contribute to septin organization and function in budding yeast. This question has been recently addressed by similar kinds of studies, but there are still some open questions, particularly regarding the extent to which the kinases may interact with and/or modify components of the contractile ring that drives cytokinesis.

      Strengths:

      This study uses sensitive imaging with good temporal and spatial resolution to monitor the localization of various proteins in living cells. Particularly informative is the use of a GFP/GFP-binding-protein "tethering" approach to ask if the requirement for one protein can be bypassed by physically tethering another protein to a third protein. Results from a yeast two-hybrid assay for measuring protein-protein interactions in vivo are buttressed by direct in vitro binding assays using purified proteins, which is important given the likelihood of "bridging" interactions between yeast proteins in the two-hybrid approach. The authors' conclusions are quite well supported by the data.

      Weaknesses:

      A control for non-specific binding is missing from the in vitro binding assay. The figures suffer sometimes from the very small text in the labels, which obscures understanding. Ultimately, while the study provides some interesting and novel insights, we still don't understand which phosphorylation events on which proteins are important for the events occurring at the molecular level, so the advance in knowledge is somewhat incremental.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Bhojappa et al. provide insights into the function of septin-related kinases Elm1, Gin4, Hsl1, and Kcc4 in septin organization and actomyosin ring (AMR) structure and constriction. Their findings are both corroborative of and complementary to previous related studies.

      First, the authors provide a comparative analysis of the dynamic localization of these kinases at the bud neck, as well as a comparative analysis of defects in septin localization, splitting dynamics, AMR constriction rates, and cell morphology in kinase-deficient cells. They find that septin localization and splitting kinetics, as well as AMR constriction rates, are significantly perturbed in elm1∆ and gin4∆ mutants but remain largely unaffected in hsl1∆ and kcc4∆. A similar trend is observed in terms of cell morphology and viability.

      Next, the authors focus on elm1∆ and gin4∆ cells, demonstrating that the residence time of the F-BAR protein Hof1 is significantly increased and defective in these mutants. Using yeast two-hybrid (Y2H) and in vitro binding assays, they show that the KA1 domain of Gin4 interacts with the F-BAR domain of Hof1, which may explain the cytokinesis-related functions of Elm1 and Gin4. Supporting this, they find that Gin4's role in septin localization, AMR constriction kinetics, and Hof1 bud neck localization is kinase-independent.

      The authors then conduct a series of artificial tethering experiments, given that the bud neck localization of these kinases is mostly interdependent. They first demonstrate that artificially tethering Gin4 to the bud neck rescues the morphology defects of elm1∆ cells, with the strongest rescue observed when Gin4 was forced to interact with Hsl1, an effect that was also kinase-independent. Additionally, artificial tethering of Hsl1 to the bud neck restores the morphology of elm1∆ cells in a KA1 domain-dependent manner, suggesting that Hsl1 functions downstream of Elm1 to maintain normal cell morphology. Consistently, artificial tethering of Elm1 to the bud neck in gin4∆ cells rescues morphology defects, as well as defects in Myo1 localization and AMR constriction, but only in the presence of full-length Hsl1. The rescue fails in the absence of Hsl1 or when using a version of Hsl1 lacking the KA1 domain, which supports the role of Hsl1 downstream of Elm1 in cytokinesis.

      Strengths:

      Altogether, this study offers valuable insights into the mode of cytokinesis regulation mediated by the septin-related kinases, mainly Elm1, Gin4, and Hsl1, and would be an important contribution to the field of septins and cytokinesis after addressing current weaknesses.

      Weaknesses:

      (1) When assessing rescue of the elm1∆ phenotype, it needs to become clearer whether only morphology or also cytokinesis and septin organization are rescued.

      (2) The quantification of the microscopy data does not always match up with the example images, and it's not always clear how the authors quantitatively analyzed their data.

      (3) The forced tethering data are key to the paper, but the lack of a summarizing table makes it difficult to grasp the full picture.

      (4) Novel results and those confirming earlier results could be better distinguished.

    4. Reviewer #3 (Public review):

      Summary:

      The study by Bhojappa et al. brings new and interesting elements about the stability of the septin ring and the crosstalk between septin and actomyosin ring assemblies. The study focuses on the four kinases associated with the septin ring, Elm1p, Gin4p, Hsl1p, and Kcc4p. Elm1p and Gin4p show strong knock-out phenotypes, whereas Hsl1p and Kcc4p show weak knock-out phenotypes. The Elm1p/Kcc4p and Gin4p/Hsl1p pairs show similar timing at the bud neck. While these kinases share redundant functions, Gin4 appears to have a unique interaction with the BAR domain protein Hof1, revealing a novel direct interaction between the septin and actomyosin rings. Interestingly, the kinase activity of Gin4 is not required for its role in septin organisation and AMR constriction. The last part of the manuscript shows an original protein tethering protocol used to show that Hsl1 and its membrane binding ability are required for phenotype rescue of gin4-null cells.

      Strengths:

      The combination of genetics, cell imaging, and biochemical characterization of protein-protein interactions is attractive.

      Weaknesses:

      (1) Imaging and data analysis is the main weakness of this manuscript. The authors must avoid manual counting and selection when easy analysis software can be used to limit bias. Instead of presenting unclear statistics of "percentage phenotypes", they need to define clear metrics to offer meaningful phenotype analysis.

      (2) This manuscript examines a very complex mechanism with four kinases of overlapping function using new data and existing literature. A clearer picture/model at the end of the manuscript that synthesizes the current knowledge would be beneficial.

    1. eLife Assessment

      This important study presents single-unit activity collected during model-based (MB) and model-free (MF) reinforcement learning in non-human primates. The dataset was carefully collected, and the statistical analyses, including the modeling, are rigorous. The evidence convincingly supports different roles for particular cortical and subcortical areas in representing key variables during reinforcement learning.

    2. Reviewer #1 (Public review):

      Summary:

      Using single-unit recording in 4 regions of non-human primate brains, the authors tested whether these regions encode computational variables related to model-based and model-free reinforcement learning strategies. While some of the variables seem to be encoded by all regions, there is clear evidence for stronger encoding of model-based information in the anterior cingulate cortex and caudate.

      Strengths:

      The analyses are thorough, the writing is clear, and the work is well-motivated by prior theory and empirical studies.

      Weaknesses:

      My comments here are quite minor.

      The correlation between transition and reward coefficients is interesting, but I'm a little worried that this might be an artifact. I suspect that reward probability is higher after common transitions, due to the fact that animals are choosing actions they think will lead to higher reward. This suggests that the coefficients might be inevitably correlated by virtue of the task design and the fact that all regions are sensitive to reward. Can the authors rule out this possibility (e.g., by simulation)?
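      The simulation suggested above could be sketched along the following lines. This is a toy illustration only: it does not reproduce the authors' task or analysis, and every parameter (transition probability, reward probabilities, the agent's choice accuracy) is an assumption chosen for clarity. The agent usually picks the first-stage action whose common transition leads to the better second-stage state, and the sketch asks whether reward rate then differs between common and rare transitions.

```python
import random


def simulate(n_trials=20000, p_common=0.7, p_informed=0.9, seed=0):
    """Toy two-step task: return reward rates after common and rare transitions."""
    rng = random.Random(seed)
    p_reward = [0.8, 0.2]  # assumed reward probability of each second-stage state
    counts = {"common": [0, 0], "rare": [0, 0]}  # [rewarded trials, total trials]
    for _ in range(n_trials):
        better_state = 0 if p_reward[0] > p_reward[1] else 1
        # Action a commonly leads to state a; the agent is usually well informed.
        action = better_state if rng.random() < p_informed else 1 - better_state
        is_common = rng.random() < p_common
        state = action if is_common else 1 - action
        rewarded = rng.random() < p_reward[state]
        key = "common" if is_common else "rare"
        counts[key][0] += int(rewarded)
        counts[key][1] += 1
    return {key: won / total for key, (won, total) in counts.items()}


rates = simulate()
print(rates)
```

      Under these assumptions reward is more frequent after common than after rare transitions, so transition type and reward are correlated by construction. Fitting the same regression to such simulated data, or to shuffled neural data, would indicate how much of the coefficient correlation could arise from task structure alone.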

      The explore/exploit section seems somewhat randomly tacked on. Is this really relevant? If yes, then I think it needs to be integrated more coherently.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate single-neuron activity in rhesus macaques during model-based (MB) and model-free (MF) reinforcement learning (RL). Using a well-established two-step choice task, they analyze neural correlates of MB and MF learning across four brain regions: the anterior cingulate cortex (ACC), dorsolateral PFC (DLPFC), caudate, and putamen. The study provides strong evidence that these regions encode distinct RL-related signals, with ACC playing a dominant role in MB learning and caudate updating value representations after rare transitions. The authors apply rigorous statistical analyses to characterize neural encoding at both population and single-neuron levels.

      Strengths:

      (1) The research fills a gap in the literature, which has been limited in directly dissociating MB vs. MF learning at the single unit level and across brain areas known to be involved in reinforcement learning. This study advances our understanding of how different brain regions are involved in RL computations.

      (2) The study used a two-step choice task (Miranda et al., 2020), which was previously established for distinguishing MB and MF reinforcement learning strategies.

      (3) The use of multiple brain regions (ACC, DLPFC, caudate, and putamen) in the study enabled comparisons across cortical and subcortical structures.

      (4) The study used multiple GLMs, population-level encoding analyses, and decoding approaches. With each analysis, they conducted the appropriate controls for multiple comparisons and described their methods clearly.

      (5) They implemented control regressors to account for neural drift and temporal autocorrelation.

      (6) The authors showed evidence for three main findings:

      a) ACC as the strongest encoder of MB variables from the four areas, which emphasizes its role in tracking transition structures and reward-based learning. The ACC also showed sustained representation of feedback that went into the next trial.

      b) ACC was the only area to represent both MB and MF value representations.

      c) The caudate selectively updates value representations when rare transitions occur, supporting its role in MB updating.

      (7) The findings support the idea that MB and MF reinforcement learning operate in parallel rather than strictly competing.

      (8) The paper also discusses how MB computations could be an extension of sophisticated MF strategies.

      Weaknesses:

      (1) There is limited evidence for a causal relationship between neural activity and behavior. The authors cite previous lesion studies, but causality between neural encoding in ACC, caudate, and putamen and behavioral reliance on MB or MF learning is not established.

      (2) There is a heavy emphasis on ACC versus other areas, but it is unclear how much of this signal drives behavior relative to the caudate.

      (3) The role of the putamen is somewhat underexplored here.

      (4) The authors mention the monkeys were overtrained before recording, which might have led to a bias in the MB versus MF strategy.

      (5) The GLM3 model combines MB and MF value estimates but does not clearly mention how hyperparameters were optimized to prevent overfitting. While the hybrid model explains behavior well, it does not clarify whether MB/MF weighting changes dynamically over time.

      (6) It was unclear from the task description whether the images used changed periodically or how the transition effect (e.g., in Figure 3) could be disambiguated from a visual response to the pair of cues.

    1. eLife Assessment

      This is an important study that connects the polymerase-associated factor 1 complex (Paf1C) with Histone 2B monoubiquitination and the expression of genes key to virulence in Cryptococcus neoformans. The provided information is convincing and has the potential to open several opportunities to further understand the basic biology of this significant human fungal pathogen.

    2. Reviewer #1 (Public review):

      In the manuscript entitled "Rtf1 HMD domain facilitates global histone H2B monoubiquitination and regulates morphogenesis and virulence in the meningitis-causing pathogen Cryptococcus neoformans" by Jiang et al., the authors employ a combination of molecular genetics and biochemical approaches, along with phenotypic evaluations and animal models, to identify the conserved subunit of the Paf1 complex (Paf1C), Rtf1, and functionally characterize its critical roles in mediating H2B monoubiquitination (H2Bub1) and the consequent regulation of gene expression, fungal development, and virulence traits in C. deneoformans or C. neoformans. Specifically, the authors found that the histone modification domain (HMD) of Rtf1 is sufficient to promote H2B monoubiquitination (H2Bub1) and the expression of genes related to fungal mating and filamentation, and restores the fungal morphogenesis and pathogenicity defects caused by RTF1 deletion. These findings highlight the critical contribution of Rtf1's HMD to epigenetic regulation and cryptococcal virulence. This work will be of interest to fungal biologists and medical mycologists, particularly those studying fungal epigenetic regulation and fungal morphogenesis.

      Comments on revisions:

      The revised manuscript addresses all my previous concerns satisfactorily.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Rtf1 HMD domain facilitates global histone H2B monoubiquitination and regulates morphogenesis and virulence in the meningitis-causing pathogen Cryptococcus neoformans" by Jiang et al., the authors employ a combination of molecular genetics and biochemical approaches, along with phenotypic evaluations and animal models, to identify the conserved subunit of the Paf1 complex (Paf1C), Rtf1, and functionally characterize its critical roles in mediating H2B monoubiquitination (H2Bub1) and the consequent regulation of gene expression, fungal development, and virulence traits in C. deneoformans or C. neoformans. Specifically, the authors found that the histone modification domain (HMD) of Rtf1 is sufficient to promote H2B monoubiquitination (H2Bub1) and the expression of genes related to fungal mating and filamentation, and restores the fungal morphogenesis and pathogenicity defects caused by RTF1 deletion.

      Strengths:

      The manuscript is well-written and presents the findings in a clear manner. The findings are interesting and contribute to a better understanding of Rtf1-mediated epigenetic regulation of fungal morphogenesis and pathogenicity in a major human fungal pathogen, and potentially in other fungal species, as well.

      Weaknesses:

      A major limitation of this study is the absence of genome-wide information on Rtf1-mediated H2B monoubiquitination (H2Bub1), as well as a lack of detail regarding the function of the Plus3 domain. Although overexpression of HMD in the rtf1Δ mutant restored global H2Bub1 levels, it did not rescue certain critical biological functions, such as growth at 39 °C and melanin production (Figure 4C-D). This suggests that the precise positioning of H2Bub1 is essential for Rtf1's function. A comprehensive epigenetic landscape of H2Bub1 in the presence of HMD or full-length Rtf1 would elucidate potential mechanisms and shed light on the function of the Plus3 domain.

      We thank the reviewer (and other reviewers) for this excellent suggestion. We have conducted CUT&Tag assays with the WT, the _rtf1_Δ mutant, and complemented strains expressing either full-length Rtf1 or only the HMD domain, cultured at 30 and 39 °C. We indeed found that the epigenetic landscape of H2Bub1 differs between strains carrying the HMD and strains carrying full-length Rtf1. These results strongly suggest that the distribution of H2Bub1 is regulated by Rtf1, and that H2B modifications at specific loci in the chromosome may contribute to thermal tolerance in C. neoformans. These new findings from the CUT&Tag assays shed light on the mechanism of thermal tolerance; however, we decided not to include these results in the current manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to determine the role of Rtf1 in Cryptococcal biology, and demonstrate that Rtf1 acts independently of the Paf1 complex to exert regulation of Histone H2B monoubiquitylation (H2Bub1). The biological impact of the loss of H2Bub1 was observed in defects in morphogenesis, reduced production of virulence factors, and reduced pathogenic potential in animal models of cryptococcal infection.

      Strengths:

      The molecular data is quite compelling, demonstrating that the Rtf1-dependent functions require only this histone modifying domain of Rtf1, and are dependent on nuclear localization. A specific point mutation in a residue conserved with the Rtf1 protein in the model yeast demonstrates the conservation of that residue in H2Bub1 modification. Interestingly, whereas expression of the HMD alone suppressed the virulence defect of the rtf1 deletion mutant, it did not suppress defects in virulence factor production.

      Weaknesses:

      The authors use two different species of Cryptococcus to investigate the biological effect of Rtf1 deletion. The work on morphogenesis utilized C. deneoformans, which is well-known to be a robust mating strain. The virulence work was performed in the C. neoformans H99 background, which is a highly pathogenic isolate. The study would be more complete if each of these processes were assessed in the other strain to understand if these biological effects are conserved across the two species of Cryptococcus. H99 is not as robust in morphogenesis, but reproducible results assessing mating and filamentation in this strain have been performed. Similarly, C. deneoformans does produce capsule and melanin.

      We thank the reviewer for the suggestion. We have conducted assays to quantify both capsule and melanin production in both the C. neoformans and C. deneoformans strain backgrounds. We found that capsule production was affected in the same pattern in these two serotypes. Interestingly, we found that cell size was significantly affected by deletion of RTF1 in both serotypes. In addition, melanin production was reduced by the deletion of RTF1 in both serotypes; however, complementation with Plus3 or mutated alleles of HMD gave different phenotypes in these two serotypes. These new findings are included in Figure 4 of the revised manuscript.

      There are some concerns with the conclusions related to capsule induction. The images reported in Figure B are purported to be grown under capsule-inducing conditions, yet the H99 panel is not representative of the induced capsule for this strain. Given the lack of a baseline of induction, it is difficult to determine if any of the strains may be defective in capsule induction. Quantification of a population of cells with replicates will also help to visualize the capsular diversity in each strain population.

      We thank the reviewer for raising this concern. We have tested capsule production under capsule-inducing conditions on 10% fetal bovine serum (FBS) agar medium [1]. Under these conditions, the capsule layers surrounding the cells were obvious. We also included a non-capsule-producing control in our assay to aid visualization of the capsule. In addition, we quantified the ratio between the diameters of the capsule layer and the cell body to show the capsular diversity in each strain population. The results are included in Figure 4 of the revised manuscript.
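
      As a rough illustration of the per-cell quantification described above, the ratio can be computed and summarized per strain as in the sketch below. This is not the authors' analysis code. It assumes the ratio is taken as whole-cell diameter (capsule included) over cell-body diameter, which the response does not spell out; the function names, strain labels, and measurement values are hypothetical placeholders, not data from the study.

```python
from statistics import mean, stdev


def capsule_ratio(total_diameter_um, cell_body_diameter_um):
    """Ratio of whole-cell diameter (capsule included) to cell-body diameter.

    Assumption: this is the ratio meant in the response; a value of 1.0
    would correspond to a cell with no visible capsule.
    """
    if cell_body_diameter_um <= 0:
        raise ValueError("cell body diameter must be positive")
    if total_diameter_um < cell_body_diameter_um:
        raise ValueError("total diameter cannot be smaller than the cell body")
    return total_diameter_um / cell_body_diameter_um


def summarize(measurements):
    """Summarize capsule ratios per strain.

    measurements: dict mapping a strain label to a list of
    (total_diameter_um, cell_body_diameter_um) tuples, one per cell.
    """
    summary = {}
    for strain, cells in measurements.items():
        ratios = [capsule_ratio(total, body) for total, body in cells]
        summary[strain] = {
            "n": len(ratios),
            "mean": mean(ratios),
            # Sample standard deviation needs at least two cells.
            "sd": stdev(ratios) if len(ratios) > 1 else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "strain_A": [(10.0, 5.0), (12.0, 6.0)],
        "strain_B": [(6.0, 5.0), (6.6, 5.5)],
    }
    for strain, stats in summarize(example).items():
        print(strain, stats)
```

      Reporting the per-strain spread alongside the mean is what makes the population-level capsular diversity visible, which is the point the reviewer raised.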

      The authors demonstrate that for specific mating-related genes, the expression of the HMD recapitulated the wild-type expression pattern. The RNA-seq experiments were performed under mating conditions, suggesting specificity under this condition. The authors raise the point in the discussion that there may be differences in Rtf1 deposition on chromatin in H99, and under conditions of pathogenesis. The data that overexpression of HMD restores H2Bub1 by western is quite compelling, but does not address at which promoters H2Bub1 is modulating expression under pathogenesis conditions, and when full-length Rtf1 is present vs. only the HMD.

      We thank the reviewer for raising these concerns. Please see our response to Reviewer #1.

      Reviewer #3 (Public Review):

      Summary:

      In this very comprehensive study, the authors examine the effects of deletion and mutation of the Paf1C protein Rtf1 gene on chromatin structure, filamentation, and virulence in Cryptococcus.

      Strengths:

      The experiments are well presented and the interpretation of the data is convincing.

      Weaknesses:

      Yet, one can be frustrated by the lack of experiments that attempt to directly correlate the change in chromatin structure with the expression of a particular gene and the observed phenotype. For example, the authors observed a strong defect in the expression of ZNF2, a known regulator of filamentation, mating, and virulence, in the rtf1 mutant. Can this defect explain the observed phenotypes associated with the RTF1 mutation? Is the observed defect in melanin production associated with altered expression of laccase genes and altered chromatin structure at this locus?

      We completely agree with the reviewer. We have conducted a CUT&Tag assay and checked Rtf1-mediated H2Bub1 at these particular gene loci. We found that the distribution of H2Bub1 at the promoter region of ZNF2 and the gene body of the laccase-encoding gene varied, possibly due to the RTF1 mutation. We would like to save these preliminary findings for another story and not include them in this manuscript, as we mentioned in the response to Reviewer #1.

      (1) Jang, E.-H., et al., Unraveling Capsule Biosynthesis and Signaling Networks in Cryptococcus neoformans. Microbiology Spectrum, 2022. 10(6): p. e02866-22.

    1. eLife Assessment

      Goswami and colleagues used rod-specific Gls1 (the gene encoding glutaminase 1) knockout mice to investigate the role of GLS1 in photoreceptor health when GLS1 was deleted from developing or adult photoreceptor cells. This study is fundamental as it shows the critical role of glutamine catabolism in photoreceptor cell health using in vivo model systems. The evidence supporting the authors' claims is compelling. The studies add new insight into how specific metabolites support vision.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show for the first time that deleting GLS from rod photoreceptors results in the rapid death of these cells. The death of photoreceptor cells could result from loss of synaptic activity because of a decrease in glutamate, as has been shown in neurons, changes in redox balance, or nutrient deprivation.

      Strengths:

      The strength of this manuscript is that the authors show a similar phenotype in the mice when Gls was knocked out early in rod development or in the adult rod. They showed that rapid cell death is through apoptosis, and there is an increase in the expression of genes responsive to oxidative stress.

      Comments on revisions:

      The authors addressed all of my concerns in their responses to reviewers.

    3. Reviewer #2 (Public review):

      Summary:

      Photoreceptor neurons are crucial for vision, and discovering pathways necessary for photoreceptor health and survival can open new avenues for therapeutics. Studies have shown that metabolic dysfunction can cause photoreceptor degeneration and vision loss, but the metabolic pathways maintaining photoreceptor health are not well understood. This is a fundamental study that shows that glutamine catabolism is critical for photoreceptor cell health using in vivo model systems.

      Strengths:

      The data are compelling, and the consideration of potential confounding factors (such as glutaminase 2 expression) and additional experiments to examine the synaptic connectivity and inner retina added strength to this work. The authors were also careful not to overstate their claims, but to provide solid conclusions that fit the results and data provided in their study. The findings linking asparagine supplementation and the inhibition of the integrated stress response to glutamine catabolism within the rod photoreceptor cell are intriguing and innovative. Overall, the authors provide convincing data to highlight that photoreceptors utilize various fuel sources to meet their metabolic needs, and that glutamine is critical to these cells for their biomass, redox balance, function and survival.

    4. Reviewer #3 (Public review):

      Summary:

      The authors explored the role of GLS, a glutaminase, which is an enzyme that catalyzes the conversion of glutamine to glutamate, in rod photoreceptor function and survival. The loss of GLS was found to cause rapid autonomous death of rod photoreceptors.

      Strengths:

      Interesting and novel phenotype. Two types of cre-lines were rigorously used to knock out the Gls gene in rods. Both of the conditional knockouts led to a similar phenotype, i.e. rod death. Histology and ERG were carefully done to characterize the loss of rods over specific ages. A necessary metabolomic study was performed and appreciated. Some rescue experiments were performed and revealed possible mechanisms.

      Weaknesses:

      No major weaknesses. The mechanism of GLS-loss-induced rod death could be followed up in the future, and the same goes for GLS's role in cones. The authors have addressed all minor points raised by this reviewer.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show for the first time that deleting GLS from rod photoreceptors results in the rapid death of these cells. The death of photoreceptor cells could result from loss of synaptic activity because of a decrease in glutamate, as has been shown in neurons, changes in redox balance, or nutrient deprivation.

      Strengths:

      The strength of this manuscript is that the authors show a similar phenotype in the mice when Gls was knocked out early in rod development or in the adult rod. They showed that rapid cell death is through apoptosis, and there is an increase in the expression of genes responsive to oxidative stress.

      We thank the reviewer for their time reviewing the manuscript and their comments regarding the potential mechanism(s) by which rod photoreceptors rapidly degenerate upon knockout of GLS.

      Weaknesses:

      In this manuscript, the authors show a "metabolic dependency of photoreceptors on glutamine catabolism in vivo". However, there is a potential bias in their thinking that glutamine metabolism in rods is similar to cancer cells where it feeds into the TCA cycle. They should consider that as in neurons, GLS1 activity provides glutamate for synaptic transmission. The modest rescue shown by providing α-ketoglutarate in the drinking water suggests that glutamine isn't a key metabolic substrate for rods when glucose is plentiful. The ERG studies performed on the iCre-Glsflox/flox mice showed a large decrease in the scotopic b wave at saturating flashes which could indicate a decrease in glutamate at the rod synapse as stated by the authors. While EM micrographs of wt and iCre-Glsflox/flox mice were shown for the outer retina at p14, the synapse of the rods needs to be examined by EM.

      We agree with the reviewer that in the presence of sufficient glucose, it appears a lack of GLS-driven glutamine (Gln) catabolism does not drastically alter the levels of TCA cycle metabolites or mitochondrial function as we demonstrated in Figure 4, and supplementation with alpha-ketoglutarate improved outer nuclear layer thickness by only a small amount as observed in Figure 5e. Hence, as we stated in the Results and Discussion, at least in the mouse where Gls is selectively deleted from rod photoreceptors by crossing Gls<sup>fl/fl</sup> mice with Rho-Cre mice (Gls<sup>fl/fl</sup>; Rho-Cre<sup>+</sup>, cKO), Gln’s role in supporting the TCA cycle is not the major mechanism by which rod photoreceptors utilize Gln to suppress apoptosis.

      With regards to GLS-driven Gln catabolism providing glutamate (Glu) for synaptic transmission, we again agree with the reviewer that Glu is an important excitatory neurotransmitter, but it is also a key metabolite necessary for the synthesis of glutathione, amino acids, and proteins. As noted and discussed at length in the manuscript, a lack of GLS-driven Gln catabolism in rod photoreceptors leads to reduced levels of oxidized glutathione (Figure 4D) possibly signaling an overall reduction in the biosynthesis of glutathione as Glu is directly and indirectly responsible for its synthesis. Furthermore, Gln and GLS-derived Glu play a central role in the biosynthesis of several nonessential amino acids and proteins. To this end, we see a reduction in the level of Glu, which is the product of the GLS reaction and further confirms the loss of GLS function. We also noted a significant decrease in aspartate (Asp), which can be constructed from the carbons and nitrogens of Gln as discussed at length in the manuscript (Figure 6A). Finally, we noted a significant decrease in global protein synthesis in the cKO retina as compared to the wild-type animal as well (Figure 6E). Therefore, the data suggest that GLS-driven Gln catabolism is critical for amino acid metabolism and protein synthesis and to some degree redox balance; although, the small but statistically significant changes in oxidized glutathione, NADP/NADPH, and redox gene expression may not fully account for the rapid and complete photoreceptor degeneration observed. Future studies are necessary to shed light on the role of redox imbalance in this novel transgenic mouse model.

      Glu also plays a role in synaptic transmission, and we considered this scenario as described in Figure 1 – figure supplement 5. Here, the synaptic connectivity between photoreceptors and the inner retina did not demonstrate significant differences in the labeling of photoreceptor synaptic membranes in the outer plexiform layer nor alterations in the labeling of a key protein (Bassoon) in ribbon synapses. These data suggest that the synaptic connectivity between photoreceptors and second-order neurons was unaltered at P14 in the cKO retina, which is the time just prior to rapid photoreceptor degeneration when Glu was shown to be decreased (Figure 6A).

      With regard to the ERG changes noted in Figure 2, we agree with the reviewer that a large decrease was noted in the scotopic b-wave at P21 and P42 in the cKO. We also agree that, to obtain greater insight into these ERG changes, the ribbon synapse in EM images can be examined. The EM images shown in Figure 1 – figure supplement 4 are from P21, which coincides with the age at which the ERG changes were first noted and when significant photoreceptor degeneration has already occurred. These images were utilized to assess the ribbon synapse for the revised version of the manuscript. As now shown in Figure 1 – figure supplement 4D, ribbon synapses are intact in WT animals as denoted by the yellow boxes. Similarly, the ribbons (yellow arrows) appear structurally intact in the photoreceptors that remain in the P21 cKO retina. These results are in accordance with the lack of significant differences in the labeling of photoreceptor synaptic membranes in the outer plexiform layer as well as the lack of alterations in the labeling of a key protein (Bassoon) in ribbon synapses (Figure 1 – figure supplement 5A and B). While we cannot fully rule out that the decrease in glutamate is altering synaptic transmission, our structural data suggest the synapses remain intact. These data have been added to the revised manuscript.

      However, an even larger reduction in the scotopic a-wave was noted at these ages as well. In animal models that disrupt photoreceptor synaptic function (Dick et al. Neuron. 2003; Johnson et al. J Neuroscience. 2007; Haeseleer et al. Nature Neuroscience. 2004; Chang et al. Vis Neurosci. 2006), a more negative ERG pattern is typically observed with the b-wave altered to a much larger degree than the a-wave. Additionally, in these models that disrupt photoreceptor synaptic transmission, the overall structure of the retina with respect to thickness is maintained (Dick et al. Neuron. 2003) or noted to have modest changes in the outer plexiform layer within the first two months of age with the outer nuclear layer not significantly altered until 8-10 months of age (Haeseleer et al. Nature Neuroscience. 2004). In contrast, a rapid decline in the outer nuclear layer thickness was observed in the cKO retina after P14 likely contributing to the ERG changes noted in Figure 2. Also, Gln is catabolized to Glu primarily by GLS as suggested by the approximately 50% reduction in Glu levels in the cKO retina (Figure 6A), but other enzymes are also capable of catabolizing Gln to Glu, so Glu levels in the rod photoreceptors are unlikely to be zero. Coupling this with the fact that rods are equipped with a self-sufficient Glu recollecting system at their synaptic terminals (Hasegawa et al. Neuron. 2006; Winkler et al. Vis Neurosci. 1999) and that GLS activity is at least two-fold higher in the photoreceptor inner segments, which support energy production and metabolism, than any other layer in the retina (Ross et al. Brain Res. 1987) suggests that altered synaptic transmission secondary to reduced levels of Glu likely does not account in full for the rapid and robust photoreceptor degeneration observed in the cKO retina.

      The authors note that the outer segments are shorter but they do not address whether there is a decrease in the number of cones.

      We have adjusted Figure 2E by removing the GLS staining to better highlight the secondary degeneration of cone outer segments, the main point of the Figure, as we had already shown that GLS was cleanly knocked out of rod photoreceptors in Figure 1. Furthermore, qualitatively the number of cones appears the same at P14, P21, and P42 between the WT and cKO, which is consistent with other retinal degeneration models, like rd1 and rd10, where cones do not begin to die until all the rods have degenerated (Xue et al. eLife. 2021).

      Rod-specific Gls KO mice with an inducible promoter were generated by crossing Pde6g-CreERT2 mice with mice homozygous for either the WT or floxed Gls allele (IND-cKO). In Figure 3 the authors document by western blots and antibody labeling that GLS1 expression is lost in the IND-cKO 10 days post tamoxifen. OCT images show a decrease in the thickness of the outer nuclear layer between 17 and 38 days post-TAM. ERGs should be performed on the animals at 10 and 30 days post TAM, before and after major structural changes in rod photoreceptor cells, to determine if changes in light-stimulated responses are observed. These studies could help to parse out the cause of photoreceptor cell death.

      We agree with the reviewer that the IND-cKO is a useful tool to help parse out the cause of photoreceptor cell death in this model as well as shed light on the role of GLS-driven Gln catabolism in photoreceptor synaptic transmission as discussed at length above. Hence, ERG analyses were performed 10 days post TAM, before major structural changes in the ONL are observed. Interestingly, ERG demonstrated statistically significant reductions in the IND-cKO scotopic a- and b-waves as compared to the WT 10 days post TAM. Similarly, photopic ERG demonstrated statistically significant decreases in the b-wave of the IND-cKO retina. These data suggest that GLS-driven Gln catabolism plays a significant role not only in rod photoreceptor survival but in their function as well. These data have been added to Figure 3H-I and discussed in the corresponding manuscript text.

      To this end, as discussed below and added to Figure 6 – figure supplement 1, amino acid levels, including glutamate (Glu), are already reduced 10 days post TAM. Reductions in the level of Glu may impact synaptic transmission and, as a result, the scotopic b-wave. However, as noted above, altered synaptic transmission secondary to reduced levels of Glu likely does not account in full for the rapid and robust photoreceptor degeneration observed in the cKO retina, as the b-wave to a-wave ratio is not significantly altered in the IND-cKO retina as compared to the WT retina, suggesting that loss of GLS-driven Gln catabolism impairs both to a similar degree.
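
      The reasoning behind the b-wave to a-wave ratio can be made concrete with a small sketch. It is illustrative only and is not taken from the manuscript: the function name and the amplitude values are hypothetical placeholders, and the a-wave is treated as a negative deflection whose magnitude is used. A proportional loss of both waves leaves the ratio unchanged, whereas a selective loss of the b-wave (the "negative ERG" pattern expected from a purely synaptic defect) lowers it.

```python
def b_to_a_ratio(a_wave_uv, b_wave_uv):
    """Ratio of b-wave to a-wave amplitude magnitudes (both in microvolts).

    The a-wave is conventionally a negative deflection, so magnitudes are
    compared rather than signed values.
    """
    a_magnitude = abs(a_wave_uv)
    if a_magnitude == 0:
        raise ValueError("a-wave amplitude must be non-zero")
    return abs(b_wave_uv) / a_magnitude


if __name__ == "__main__":
    # Placeholder amplitudes for illustration only, not measured data.
    baseline = b_to_a_ratio(-200.0, 400.0)           # reference response
    proportional_loss = b_to_a_ratio(-100.0, 200.0)  # both waves halved
    selective_b_loss = b_to_a_ratio(-200.0, 100.0)   # b-wave reduced alone

    print(baseline, proportional_loss, selective_b_loss)  # 2.0 2.0 0.5
```

      An unchanged ratio alongside reduced absolute amplitudes therefore points toward a deficit that already affects the photoreceptor response itself, rather than one confined to transmission at the synapse.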

      Additionally, Pde6g is expressed by rods to a significant degree but also by cones (GSE63473, scRNAseq data). Therefore, GLS is likely knocked out of both rods and cones in the IND-cKO mouse, which is in accordance with the immunofluorescence image in Figure 3B where GLS is not observed in rod or cone inner segments, unlike in Figure 1B where GLS remains in cones. Hence, the reduction in the photopic b-wave may indicate that loss of GLS-driven Gln catabolism in cones impairs synaptic transmission. As noted in our reply to reviewer #3’s comments, we have generated mice lacking GLS in cone photoreceptors specifically and are currently elucidating the role of GLS in cone photoreceptor metabolism, function, and survival. These results will be published in a separate manuscript.

      The studies in Figure 4 were all performed on iCre-Glsflox/flox and control mice at p14, why weren't the IND-cKO mice used for these studies since the findings would not be confounded by development?

      To gain further insight into the role of GLS-driven Gln catabolism in the maintenance of rod photoreceptors as compared to their development/maturation, we conducted a targeted metabolomic analysis on IND-cKO and WT retinas 10 days post TAM. For the purpose of this manuscript, we have included data regarding changes in amino acid levels in Figure 6 – figure supplement 1. Specifically, levels of glutamate, aspartate and asparagine are all significantly decreased in the IND-cKO retina prior to PR degeneration, which demonstrates that similar to the GLS cKO mouse (i.e. iCre-Gls flox/flox), GLS-driven Gln catabolism is critical for amino acid biosynthesis in mature rod PRs as well.

      In all rescue studies, the endpoint was an ONL thickness, which only addressed rod cell death. The authors should also determine whether there are small improvements in the ERG, which would distinguish the role of GLS in preventing oxidative stress.

      Optical coherence tomography (OCT) provides a sensitive in vivo method to detect small changes in retinal thickness without potential artifacts incurred through histological processing. Considering the Gls cKO retina demonstrates significant and rapid photoreceptor degeneration, we wanted to assess pathways that may be critical to photoreceptor survival downstream of GLS-driven Gln catabolism using rescue experiments with pharmacologic treatment or metabolite supplementation. That said, disruption of GLS-driven Gln catabolism may also significantly alter rod photoreceptor function beyond that which is secondary to photoreceptor cell death as we have demonstrated in the IND-cKO animal for the revised version of this manuscript and discussed in a response above. Therefore, the IND-cKO model provides a unique tool to assess the impact of rescue studies on photoreceptor function as the functional changes occur prior to significant degeneration. Also, unlike the GLS cKO mouse (i.e. iCre-Gls flox/flox) where photoreceptor degeneration starts very early, impairing our ability to capture reliable and robust ERG measurements, the IND-cKO mice are older at the time of functional changes allowing for robust ERG measurements. While the rate of photoreceptor degeneration in both mouse models is similar and the levels of key amino acids are altered similarly in both models, the mechanisms of cell death in developing/maturing photoreceptors may be different than that in mature photoreceptors. Hence, before we can assess if similar rescue experiments impact photoreceptor function via ERG in the IND-cKO mouse, we need to thoroughly examine how these photoreceptors are dying. These experiments and results will be published in a separate manuscript in the future.

      Reviewer #2 (Public Review):

      Summary:

      Photoreceptor neurons are crucial for vision, and discovering pathways necessary for photoreceptor health and survival can open new avenues for therapeutics. Studies have shown that metabolic dysfunction can cause photoreceptor degeneration and vision loss, but the metabolic pathways maintaining photoreceptor health are not well understood. This is a fundamental study that shows that glutamine catabolism is critical for photoreceptor cell health using in vivo model systems.

      Strengths:

      The data are compelling, and the consideration of potential confounding factors (such as glutaminase 2 expression) and additional experiments to examine the synaptic connectivity and inner retina added strength to this work. The authors were also careful not to overstate their claims, but to provide solid conclusions that fit the results and data provided in their study. The findings linking asparagine supplementation and the inhibition of the integrated stress response to glutamine catabolism within the rod photoreceptor cell are intriguing and innovative. Overall, the authors provide convincing data to highlight that photoreceptors utilize various fuel sources to meet their metabolic needs, and that glutamine is critical to these cells for their biomass, redox balance, function, and survival.

      We greatly appreciate the reviewer’s thoughtful comments and time spent reviewing this manuscript.

      Weaknesses:

      Recent studies have explored the metabolic "crosstalk" that exists within the mammalian retina, where metabolites are transferred between the various retinal cells and the retinal pigment epithelium. It would be of interest to test whether the conditional knockout mice have changes in metabolism (via qPCR such as shown in Figure 4 - Supplemental Figure 1) within the retinal pigment epithelium that may be contributing to the authors' findings in the neural retina. Additionally, the authors have very compelling data to show that inhibition of eIF2a or supplementation with asparagine can delay photoreceptor death via OCT measurements in their conditional knockout mouse model (Figure 6G, H). However, does inhibition of eIF2a or asparagine adversely impact the WT retina? It would also be impactful to know whether this has a prolonged effect, or if it is short-term, as this would provide strength to potential therapeutic targeting of these pathways to maintain photoreceptor health.

      We agree with the reviewer that metabolic communication in the outer retina is crucial to the function and survival of both photoreceptors and RPE. Therefore, we have performed qRT-PCR on eyecups from cKO and WT mice at P14, prior to photoreceptor degeneration. These data, now included in Figure 4 – figure supplement 2, show no significant changes in genes related to glycolysis, pyruvate metabolism and the TCA cycle in eyecups from cKO mice compared to WT mice at P14. The only exception is a significant decrease in Pdk4 in cKO mouse eyecups compared to WT, which was not observed in retina samples.

      Additionally, we have added data demonstrating that systemic treatment with ISRIB does not adversely impact the anatomy of the wild-type retina. Specifically, we performed OCT after 21 days of ISRIB treatment via intraperitoneal delivery in WT mice and show that total retinal, ONL and inner segment/outer segment thicknesses are unchanged compared to vehicle. These data are now included in Figure 6 – figure supplement 2A. We have also included data to suggest that the effect of ISRIB extends beyond P21 in the cKO mouse. These data, presented in Figure 6 – figure supplement 2B, show that at P28, ISRIB continues to produce a statistically significant increase in ONL thickness compared to vehicle in cKO animals.

      Reviewer #3 (Public Review):

      Summary:

      The authors explored the role of GLS, a glutaminase, which is an enzyme that catalyzes the conversion of glutamine to glutamate, in rod photoreceptor function and survival. The loss of GLS was found to cause rapid autonomous death of rod photoreceptors.

      Strengths:

      Interesting and novel phenotype. Two types of cre-lines were rigorously used to knockout the Gls gene in rods. Both of the conditional knockouts led to a similar phenotype, i.e. rod death. Histology and ERG were carefully done to characterize the loss of rods over specific ages. A necessary metabolomic study was performed and appreciated. Some rescue experiments were performed and revealed possible mechanisms.

      We thank the reviewer for their comments and appreciation of the methods utilized herein to address the role of GLS-driven Gln catabolism in rod photoreceptors.

      Weaknesses:

      No major weaknesses were identified. The mechanism of GLS-loss-induced rod death seems not fully elucidated by this study but could be followed up in the future, and the same for GLS's role in cones.

      We agree with the reviewer that the downstream metabolic and molecular mechanisms by which Gln catabolism impacts rod photoreceptor health are not fully elucidated. Defining these mechanisms will advance our understanding of photoreceptor metabolism and identify therapeutic targets promoting photoreceptor resistance to stress. Future studies are underway to uncover these mechanisms. Additionally, while outside the scope of the current manuscript, we have generated mice lacking GLS in cone photoreceptors specifically and are currently elucidating the role of GLS in cone photoreceptor metabolism, function, and survival. These results will be published in a separate manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) The results could start at line 135, but the first paragraph isn't necessary. The data is published and could be referred to in the introduction.

      We appreciate the reviewer’s suggestion to shorten the beginning of the Results section; however, we believe the supplementary data described in these lines confirm the scRNAseq gene expression data, while adding GLS expression and localization data within the retina. The scRNAseq data and their publication were noted in the introduction, so we removed the sentence in lines 117-119 that restates these results to shorten this section. We also reduced redundancy by removing an introductory sentence to the second Results paragraph.

      (2) "However, like other metabolically-demanding cells, recent work has demonstrated that PRs have the flexibility to utilize fuel sources beyond glucose to meet their metabolic needs (Adler et al., 2014; Du, Cleghorn, Contreras, Linton, et al., 2013; Grenell et al., 2019; Joyal et al., 2016; Xu et al., 2020)." The paper by Daniele et al. demonstrated that glucose is essential for maintaining the viability of rod photoreceptor cells.

      We thank the reviewer for highlighting published literature that we regrettably overlooked. The reference for Daniele et al. has now been included.

      (3) "Single-cell RNA sequencing data has demonstrated that Gls is expressed throughout the human and mouse retina and much greater than Gls2 (Voigt et al., 2020)." The authors should indicate the specific databases searched in Spectacle.

      We appreciate the reviewer’s attention to detail and have now included the references in the Introduction for GSE63473 from Macosko et al. and GSE142449 from Voigt et al., which were the databases we used in Spectacle to assess Gls levels in the mouse and human retina, respectively.

      References:

      (1) Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM, Trombetta JJ, Weitz DA, Sanes JR, Shalek AK, Regev A, McCarroll SA. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015 May 21;161(5):1202-1214. doi: 10.1016/j.cell.2015.05.002. PMID: 26000488; PMCID: PMC4481139.

      (2) Voigt AP, Binkley E, Flamme-Wiese MJ, Zeng S, DeLuca AP, Scheetz TE, Tucker BA, Mullins RF, Stone EM. Single-Cell RNA Sequencing in Human Retinal Degeneration Reveals Distinct Glial Cell Populations. Cells. 2020 Feb 13;9(2):438. doi: 10.3390/cells9020438. PMID: 32069977; PMCID: PMC7072666.

      (4) The immunolabeling in Figure 2 looks like the images are overexposed, and the Gls antibody is labeling the outer segment, not just the inner segment of photoreceptors.

      We thank the reviewer for their comments regarding our immunofluorescence data. There was background staining of the outer segment in both the WT and cKO retina, with decreased GLS staining in the inner segment of the cKO rod photoreceptors at P14, demonstrating loss of GLS in rod photoreceptors similar to Figure 1B. For Figure 2E, we have provided adjusted images with PNA staining only that better represent the secondary cone degeneration that occurs in the rod photoreceptor-specific Gls cKO, which is the take-home point of Figure 2E.

      (5) The authors could use a glutamate antibody to compare it to Gls KO mice as done in Davanger, S., Ottersen, O.P. and Storm-Mathisen, J. (1991), Glutamate, GABA, and glycine in the human retina: An immunocytochemical investigation. J. Comp. Neurol., 311: 483-494. https://doi.org/10.1002/cne.903110404

      We appreciate the reviewer’s suggestion to assess glutamate levels in the wild-type and Gls KO retina via antibody labeling. Our targeted metabolomics studies in Figure 6A provide quantitative evidence that glutamate, the product of the GLS-catalyzed reaction, is decreased as one would expect in the Gls KO retina. The antibody would add to these data by providing the localization of glutamate in the retina. With a rod photoreceptor-specific genetic KO, we would expect glutamate levels to be decreased in these cells. The antibody may also show that glutamate is not only decreased in the rod photoreceptor inner segment, where GLS predominates, but also in the synaptic terminal in accordance with the reviewer’s concerns regarding the impact of GLS KO on synaptic transmission. We have addressed this concern at length above, adding TEM images of the ribbon synapses in the GLS KO retina, and ERG analyses from the IND-cKO animals prior to significant degeneration. In the end, we agree with the reviewer that reduced Glu levels in the GLS cKO retina may impact synaptic transmission to a degree, but the synapses remain intact based on immunofluorescence and TEM analyses and a negative ERG pattern is not observed in the GLS cKO (i.e. iCre-Gls flox/flox) or IND-cKO mouse. As noted above, the structure of the retina in models that disrupt photoreceptor synaptic transmission is maintained (Dick et al. Neuron. 2003) or noted to have modest changes within the first two months of age with the outer nuclear layer not significantly altered until 8-10 months of age (Haeseleer et al. Nature Neuroscience. 2004). So, the impact of the reduced Glu levels on synaptic transmission in the GLS KO retina is unlikely to account in full for the rapid and profound photoreceptor degeneration observed. That said, the IND-cKO mouse, which allows us to assess photoreceptor function prior to significant degeneration unlike the GLS cKO mouse (i.e. iCre-Gls flox/flox), demonstrates GLS-driven Gln catabolism plays a significant role in photoreceptor function but still does not demonstrate a negative ERG pattern. Therefore, assessing Glu localization in this mouse model 10 days post TAM will be informative as to how GLS-driven Gln catabolism impacts photoreceptor function prior to degeneration. The IND-cKO mouse model is currently being extensively characterized for future publication.

      Reviewer #2 (Recommendations For The Authors):

      Main Concerns:

      (1) The authors checked for Gls2 compensation at P14 in the mouse retina. However, this data would be more compelling with an additional timepoint, particularly at P21 which is used in many of their figures throughout the study.

      We thank the reviewer for their suggestion. Figure 1-figure supplement 1D demonstrates no change in Gls2 gene expression at P14 between the WT and cKO retina. With regards to the reviewer’s concern, in Figure 1-figure supplement 1E of the original submission, we demonstrate that the expression of GLS2 is not increased in the cKO retina at P21 via immunofluorescence.

      (2) Recent studies have explored the metabolic "crosstalk" that exists within the mammalian retina, where metabolites are transferred between the various retinal cells and the retinal pigment epithelium. It would be compelling to see whether the cKO mice have changes in metabolism (via qPCR such as shown in Supplementary Figure 1 for Figure 4) within the RPE that may be contributing to their findings in the neural retina. Additionally, mention of this crosstalk and how it may impact their results should be added to the discussion.

      We appreciate the reviewer’s concern for metabolism changes in the RPE of Gls cKO mice. In agreement with reviewer 2, we performed qRT-PCR on eyecups from cKO and WT mice at P14, prior to photoreceptor degeneration. These data, now included in Figure 4 – figure supplement 2, show no significant changes in genes related to glycolysis, pyruvate metabolism and the TCA cycle in eyecups from cKO mice compared to WT mice at P14. The only exception is a significant decrease in Pdk4 in cKO mouse eyecups compared to WT, which was not observed in retina samples.

      (3) The authors use a tamoxifen-inducible cKO model to support their findings in developed rods. However, in Figure 3A it appears that this model has a greater reduction in GLS compared to the Rho-cre mouse model. Can the authors discuss this? Is this cre more efficient at targeting rods or is it leaky and may have affected other retinal cells?

      We thank the reviewer for pointing out this interesting result associated with using the Pde6g-Cre-ERT2 mouse line. Pde6g is expressed by rods to a significant degree but also by cones (GSE63473, scRNAseq data). Therefore, the IND-cKO mouse likely knocks out GLS from both rods and cones upon TAM induction. To this end, the immunofluorescence image in Figure 3B shows GLS is knocked out in both rod and cone inner segments, unlike in Figure 1B where GLS remains in cones when using the rod photoreceptor-specific, Gls<sup>fl/fl</sup> Rho-Cre<sup>+</sup> mouse. As such, as the astute reviewer noted, the fact that Western blot demonstrates greater reduction in GLS protein content fits with the protein being knocked out of both rods and cones. We have added this note about the mouse model in the corresponding text.

      (4) The authors have very compelling data to show that inhibition of eIF2a can delay photoreceptor death via OCT measurements in their cKO mouse model (Figure 6G). However, does ISRIB adversely impact the WT retina? WT vehicle and ISRIB should be shown. It would also be compelling to know whether this has a prolonged effect, or if it is short-term (i.e. would the effect still be present at P42)?

      We appreciate the reviewer’s comments regarding antagonizing the effects of p-eIF2a to prolong photoreceptor survival in the Gls cKO retina. As described above, we have data demonstrating systemic treatment with ISRIB does not adversely impact the anatomy of the wild-type retina (Figure 6-figure supplement 2A). Specifically, we treated WT animals with daily intraperitoneal ISRIB starting at P5 and performed OCT at P21 to show that total retinal, ONL and the inner segment/outer segment thickness is unchanged compared to vehicle-treated WT animals. Additionally, we have included data demonstrating that the photoreceptor neuroprotective effect of ISRIB treatment in the Gls cKO mouse extends beyond P21 (Figure 6-figure supplement 2B).

      (5) For Figure 6H, same as point #4.

      While we have not specifically assessed potential retinal toxicity secondary to systemic Asn supplementation, oral Asn supplementation (up to 100 mg/kg/day) was provided to patients for 24 months and found to be well-tolerated (PMID: 31123592). Allometric scaling of this dose to the mouse would yield a mouse dose of 1234 mg/kg/day, which is much greater than the 200 mg/kg/day dose provided here (PMID: 27057123). Additionally, a 90-day toxicity study of Asn in rats demonstrated a no observed adverse effect level of 1.62 g/kg bodyweight/day in males and 1.73 g/kg bodyweight/day in females (PMID: 18508175). The lower dose in that study equates to a mouse dose of 3.2 g/kg bodyweight/day, well above the mouse dose utilized in this report. As such, future studies should focus on a dose-response relationship with Asn supplementation, and as the reviewer suggested, determining the duration of effect with Asn supplementation.
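      The dose conversions quoted in this response can be checked with a short calculation. The sketch below is illustrative only: it assumes the commonly tabulated body-surface-area conversion factors (Km of 37 for human, 6 for rat, 3 for mouse), which are not stated in the response itself, and the function name `convert_dose` is hypothetical. Small differences from the quoted 1234 mg/kg/day would be expected from rounding of the conversion factor.

```python
# Body-surface-area conversion factors (Km, in kg/m^2).
# ASSUMPTION: standard tabulated values; the response does not list them.
KM = {"human": 37.0, "rat": 6.0, "mouse": 3.0}


def convert_dose(dose_mg_per_kg, source, target):
    """Scale a mg/kg dose from one species to another by the ratio of Km factors."""
    return dose_mg_per_kg * KM[source] / KM[target]


# Human oral Asn dose of 100 mg/kg/day expressed as a mouse dose.
human_to_mouse = convert_dose(100.0, "human", "mouse")

# Rat no-observed-adverse-effect level of 1.62 g/kg/day expressed as a mouse dose.
rat_to_mouse = convert_dose(1620.0, "rat", "mouse")

print(f"human 100 mg/kg/day  -> mouse {human_to_mouse:.0f} mg/kg/day")
print(f"rat 1620 mg/kg/day   -> mouse {rat_to_mouse:.0f} mg/kg/day")
```

      Under these assumptions both converted doses come out several-fold above the 200 mg/kg/day used in the study, which is the comparison the response relies on.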

      (6) Some of the results section belongs in the introduction or discussion and can be moved.

      We have addressed the reviewer’s concern by moving some of the results to the discussion and removing statements in the results that were either noted in the Introduction or conferred in the Discussion.

      Minor Concerns:

      (1) Scale bar mentions in the figure legends use plural when only one is present, or in some cases are missing. A scale bar should be added to the OCT images if possible.

      We appreciate the reviewer’s attention to detail, and information regarding scale bars has been updated in the figure legends.

      (2) For Figures 1I and J, the sample size changes when J is a quantification of I. Please correct.

      We have corrected the sample size to be consistent between Figures 1I and J.

      (3) In Figure 1 - Figure Supplement 3 the P42 timepoint is not mentioned in the legend. Please correct.

      We have now included the P42 timepoint in the legend for Figure 1 – Figure Supplement 3 as well as the manuscript text.

      (4) In Figure 1 - Figure Supplement 5 the wrong P value is mentioned in the legend. Please correct.

      We have corrected the P value in the legend for Figure 1 – Figure Supplement 5.

      (5) Can the authors double-check their ERG light intensity settings? They seem high. Please confirm if they are correct.

      We appreciate the reviewer’s concern for ERG light intensity settings and have confirmed the settings used in the study were 32 cd*s/m<sup>2</sup> and 100 cd*s/m<sup>2</sup> for scotopic and photopic ERG recordings, respectively.

      (6) The legend key in Figure 2A would be more helpful if the axis were present by the representative traces.

      We thank the reviewer for the suggestion of adding axes to the ERG traces. Figure 2A has been updated to reflect this modification.

      (7) Can the authors check that the error bars are present in Figure 5E?

      We appreciate the reviewer’s concern for error bars in Figure 5E, which are included in the figure. The standard error in this experiment is so small that the symbols overlap with the error bars.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      (1) Figure 6: ISRIB seems to give the most dramatic rescue of cKO GLS in P21 rods. Does it completely prevent rod death? i.e. What's the ONL thickness of P21 WT control? What's the ISRIB rescue of an older cKO animal, say P35?

      The ONL thickness of P21 WT control is on average 0.06 mm (Figure 1E), while the ONL thickness of the Gls cKO retina with ISRIB treatment at P21 is on average 0.044 mm. Therefore, rod death is not completely prevented with ISRIB but rather, rod photoreceptor survival is prolonged. As noted above, we have provided data to demonstrate that the photoreceptor neuroprotective effect of ISRIB lasts beyond P21 (Figure 6-figure supplement 2B).

      (2) What's the mechanistic link between ISR and GLS beyond current speculation? Does GLS have other unknown functions beyond converting glutamine to glutamate? Any novel insights from GLS protein structure?

      We thank the reviewer for this thoughtful question. It is certainly possible that GLS has other functions outside of its role in glutaminolysis. It is well known that other metabolic enzymes have moonlighting functions including hexokinase 2, which has been shown to be important in preventing intrinsic apoptosis through blocking the binding of pro-apoptotic proteins to the mitochondria. While not directly related to ISR, a single report suggests GLS functions non-canonically in Gln-deprived states, promoting mitochondrial fusion to suppress ROS production (PMID: 29934617). Investigating the moonlighting functions of metabolic enzymes is part of our ongoing research program and GLS is included in these studies.

      (3) Just curious about GLS cKO in cones. Any similar phenotype?

      We appreciate the reviewer’s curiosity regarding Gls cKO in cones. This study is currently ongoing, with a poster presented at ARVO 2024 (Subramanya et al; Glutaminase-driven glutamine catabolism supports cone photoreceptor metabolism, function, and structure. Invest. Ophthalmol. Vis. Sci. 2024;65(7):193) and a manuscript in preparation. As discussed above, GLS knockout in cones likely impacts their function, in accordance with the data presented at ARVO 2024.

      Recommendations for improving the writing and presentation.

      (1) In the Discussion, lines 458-466, it's incorrect to compare the importance of glucose metabolism to GLS-dependent pathway to photoreceptors in this way. An alternative explanation: glucose metabolism is so important that the system has many redundancies, e.g. HK1 exists in addition to HK2, thus single gene KO leads to no phenotype. The only fair comparison is nutrient deprivation, e.g. taking out glucose or glutamine from retina explants (Punzo et al., 2009).

      The reviewer makes an excellent point. While we do not see an upregulation of GLS2 in the retina or rod PRs upon GLS knockout (Figure 1-figure supplement 1 D and E), loss of Gls in rod PRs does alter the expression of many metabolism-related genes (Figure 4-figure supplement 1).  We alluded to these data and the reviewer’s point in the second paragraph of the discussion: “In any of these transgenic mouse models, PRs may use other transporters to take up fatty acids or glucose or rewire their metabolism to maintain metabolic homeostasis and stave off degeneration (Subramanya et al., 2023; Wubben et al., 2017). Our data show that any metabolic reprogramming that is occurring in the cKO mouse retina appears unable to significantly circumvent the significant and rapid PR degeneration suggesting the importance of Gln catabolism in rod PRs. Furthermore, inducing GLS knockdown in mature PRs also demonstrated rapid PR degeneration (Figure 3).”

      In the revised article, we have amended these sentences to include the importance of metabolic redundancies. “In any of these transgenic mouse models, PRs may use other transporters to take up fatty acids or glucose, rewire their metabolism, or utilize metabolic redundancies to maintain metabolic homeostasis and stave off degeneration (Subramanya et al., 2023; Wubben et al., 2017). Our data show that any metabolic reprogramming that is occurring in the cKO mouse retina appears unable to significantly circumvent the significant and rapid PR degeneration suggesting the importance of Gln catabolism in rod PRs. Furthermore, inducing GLS knockdown in mature PRs also demonstrated rapid PR degeneration (Figure 3).”

      (2) Please discuss the mosaic activity of Rho-cre used in this study, as described in the original study (Le et al 2006). Line 221 (Li et al 2005) seems to be a different Rho-Cre created by a different group. Please make sure the citation is correct and consistent.

      We apologize for the confusion and have corrected the reference on line 221 to Le et al, 2006. The reviewer is correct that the original report (Le et al. 2006) demonstrated a mosaic of Cre-mediated recombination in rod photoreceptors and rod bipolar cells in the mouse line that had the shorter (0.2 kb) mouse opsin promoter-controlled Cre. In contrast, this same report showed only Cre-mediated recombination in rod photoreceptors in another line that utilized a long (4.1 kb) mouse opsin promoter-controlled Cre. We have published using this latter promoter-controlled Cre recombinase in at least 5 different mouse models (Wubben et al. 2017; Weh et al. 2020; Weh et al. 2023; Subramanya et al. 2023; the current report), and in all these models, we observe clear and consistent knockout by immunofluorescence only in rod photoreceptors with residual protein in cones and no significant change in protein expression in the INL where bipolar cells reside. Western blots confirm the reduction in protein expression.

      (3) The authors should provide representative images of retina cross-sections for key rescue data (Figure 6G&H).

      As requested by Reviewer 3, representative histology images of retina cross-sections for the ISRIB and Asn rescue experiments in Gls cKO mice at P21 are now included in the manuscript in Figure 6 – figure supplement 3.

      Minor corrections to the text and figures.

      (1) Spell out Gln in the Abstract when used for the first time.

      We have included glutamine (Gln) in the abstract upon first use.

      (2) Line 433, Figure 6G should be 6H.

      Thank you for the correction; the manuscript has been updated.

    1. eLife Assessment

      This fundamental study provides a critical challenge to a great many studies of the neural correlates of consciousness that were based on post hoc sorting of reported awareness experience. The evidence supporting this criticism is compelling, based on simulations and decoding analysis of EEG data. The results will be of interest not only to psychologists and neuroscientists but also to philosophers who work on addressing mind-body relationships.

    2. Reviewer #2 (Public review):

      Summary:

      The study investigates the potential influence of the response criterion on neural decoding accuracy in consciousness and unconsciousness, utilizing either simulated data or reanalyzing experimental data with post-hoc sorting data.

      Strengths:

      When comparing the neural decoding performance of Target versus NonTarget with or without post-hoc sorting based on subject reports, it is evident that response criterion can influence the results. This was observed in simulated data as well as in two experiments that manipulated the subject response criterion to be either more liberal or more conservative. One experiment involved a two-level response (seen vs unseen), while the other included a more detailed four-level response (ranging from 0 for no experience to 3 for a clear experience). The findings consistently indicated that adopting a more conservative response criterion could enhance neural decoding performance, whether in conscious or unconscious states, depending on the sensitivity or overall response threshold.

      The uneven distribution of trials for Target (75%) and NonTarget (25%) was identified as a potential weakness in the initial review of this study. Nevertheless, we support the authors' assertion that their analysis methodology validates comparing liberal and conservative approaches. Future investigations could further explore differences between liberal and conservative on different ratios of Target vs NonTarget, particularly when the proportion of Target matches or falls below that of NonTarget.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The study aimed to investigate the significant impact of criterion placement on the validity of neural measures of consciousness, examining how different standards for classifying a stimulus as 'seen' or 'unseen' can influence the interpretation of neural data. They conducted simulations and EEG experiments to demonstrate that the Perceptual Awareness Scale, a widely used tool in consciousness research, may not effectively mitigate criterion-related confounds, suggesting that even with the PAS, neural measures can be compromised by how criteria are set. Their study challenged existing paradigms by showing that the construct validity of neural measures of conscious and unconscious processing is threatened by criterion placement, and they provided practical recommendations for improving experimental designs in the field. The authors' work contributes to a deeper understanding of the nature of conscious and unconscious processing and addresses methodological concerns by exploring the pervasive influence of criterion placement on neural measures of consciousness and discussing alternative paradigms that might offer solutions to the criterion problem.

      The study effectively demonstrates that the placement of criteria for determining whether a stimulus is 'seen' or 'unseen' significantly impacts the validity of neural measures of consciousness. The authors found that conservative criteria tend to inflate effect sizes, while liberal criteria reduce them, leading to potentially misleading conclusions about conscious and unconscious processing. The authors employed robust simulations and EEG experiments to demonstrate the effects of criterion placement, ensuring that the findings are well-supported by empirical evidence. The results from both experiments confirm the predicted confounding effects of criterion placement on neural measures of unconscious and conscious processing.

      The results are consistent with their hypotheses and contribute meaningfully to the field of consciousness research.

      We would like to thank reviewer 1 for their positive words and for taking the time to evaluate our manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study investigates the potential influence of the response criterion on neural decoding accuracy in consciousness and unconsciousness, utilizing either simulated data or reanalyzing experimental data with post-hoc sorting data.

      Strengths:

      When comparing the neural decoding performance of Target versus NonTarget with or without post-hoc sorting based on subject reports, it is evident that response criterion can influence the results. This was observed in simulated data as well as in two experiments that manipulated subject response criterion to be either more liberal or more conservative. One experiment involved a two-level response (seen vs unseen), while the other included a more detailed four-level response (ranging from 0 for no experience to 3 for a clear experience). The findings consistently indicated that adopting a more conservative response criterion could enhance neural decoding performance, whether in conscious or unconscious states, depending on the sensitivity or overall response threshold.

      Weaknesses:

      (1) In the realm of research methodology, conducting post-hoc sorting based on subject reports raises an issue. This operation leads to an imbalance in the number of trials between the two conditions (Target and NonTarget) during the decoding process. Such trial number disparity introduces bias during decoding, likely contributing to fluctuations in neural decoding performance. This potential confounding factor significantly impacts the interpretation of research findings. The trial number imbalance may cause models to exhibit a bias towards the category with more trials during the learning process, leading to misjudgments of neural signal differences between the two conditions and failing to accurately reflect the distinctions in brain neural activity between target and non-target states. Therefore, it is recommended that the authors extensively discuss this confounding factor in their paper. They should analyze in detail how this factor could influence the interpretation of results, such as potentially exaggerating or diminishing certain effects, and whether measures are necessary to correct the bias induced by this imbalance to ensure the reliability and validity of the research conclusions.

      We would like to thank reviewer 2 for their positive words and for taking the time to evaluate our manuscript. In response to this asserted weakness, we would like to point out that the issue of trial imbalances was already comprehensively addressed in the manuscript. No trial imbalances are present in the analyzed data for any of the conditions, so that none of our reported results could have been impacted by this. This was done through the following set of measures:

      (1) Training data (method section): “a linear discriminant analytic (LDA) classifier was trained for each participant using all trials from all sessions (3 sessions in Experiment 1, 2 sessions in Experiment 2) to discriminate target from no-target trials based on EEG data, irrespective of seen/unseen responses and irrespective of the response criterion. To maximize signal-to-noise ratio, we applied a leave-one-person-out cross validated decoding scheme by using all classifiers from all participants except the participant that was being tested (separately for Experiment 1 and for Experiment 2). This leave-one-person-out cross validation procedure maximized the available data for training without requiring k-folding on subsets of cells with low response counts, so that all test sets were classified by the same fully independent classifiers. A single time series of classification performance across time was obtained for every participant (every testing set) by averaging classification performance across all classifiers that tested that set (see Methods and supplementary Figure S2 for details).”

      This leave-one-person-out cross validation scheme made sure that no trial selection needed to be performed to analyze conservative or liberal conditions. Both conditions were classified using the same classifier, consisting of all data from the other participants.

      (2) Testing data (methods section): “To ensure that differences resulting from post hoc sorting could not be explained by differences in signal-to-noise ratio resulting from disparities in trial counts in the testing set, we equated trial counts between the liberal and conservative condition within each participant by randomly selecting the same number of trials from overrepresented cells (for Experiment 1, this was done at the level of ‘seen’ and ‘unseen’ responses, for experiment 2 the trial counts were equated at each of the PAS levels, see methods for details). As a result, response-contingent conditions in the liberal and conservative conditions had identical input for all classification analyses. Although different trial counts in the testing set might affect the precision with which AUC is estimated in a decoding analysis, it does not affect the size of AUC itself. Trial count equation was merely performed to make sure the liberal and conservative condition were as comparable as possible.”

      Indeed, we also report at the end of this section that running the same analyses without selecting trials in the test set yielded qualitatively identical results: “Analyzing the data without equating trial counts resulted in qualitatively identical results.”

      To remove any lack of clarity about this, we now also briefly report in the beginning of the discussion section that the results cannot be explained by unequal trial counts:

      “We found that in both experiments, criterion shifts modulated effect size in neural measures of ‘unconscious’ (unseen) and/or ‘conscious’ (seen) processing, and that this happens even though the conservative and liberal condition used the same independent training data (identical classifiers), and even though the trial counts in the test sets were equated for the conservative and liberal condition.”
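      The trial-count equation described above can be sketched as follows. This is only an illustrative sketch, not the authors' analysis code: the function name `equate_trial_counts`, its arguments, and the use of NumPy are assumptions made for illustration. For each response category (for example ‘seen’/‘unseen’, or a PAS level), it randomly keeps the same number of trials from the liberal and the conservative condition by subsampling the overrepresented cell.

```python
import numpy as np


def equate_trial_counts(responses_liberal, responses_conservative, seed=0):
    """Hypothetical helper: return index arrays into each condition such that
    every response category has the same number of trials in both conditions.

    Categories that occur in only one condition are dropped, and at least one
    shared category is assumed to exist.
    """
    rng = np.random.default_rng(seed)
    lib = np.asarray(responses_liberal)
    con = np.asarray(responses_conservative)

    keep_lib, keep_con = [], []
    for category in np.intersect1d(lib, con):
        idx_lib = np.flatnonzero(lib == category)
        idx_con = np.flatnonzero(con == category)
        # Subsample the overrepresented cell down to the smaller cell size.
        n = min(idx_lib.size, idx_con.size)
        keep_lib.append(rng.choice(idx_lib, size=n, replace=False))
        keep_con.append(rng.choice(idx_con, size=n, replace=False))

    return np.sort(np.concatenate(keep_lib)), np.sort(np.concatenate(keep_con))


if __name__ == "__main__":
    liberal = ["seen"] * 5 + ["unseen"] * 2
    conservative = ["seen"] * 3 + ["unseen"] * 4
    idx_liberal, idx_conservative = equate_trial_counts(liberal, conservative)
    print(idx_liberal, idx_conservative)
```

      Because the subsampling is random, repeating it with different seeds and averaging the resulting decoding scores would be one way to reduce dependence on a particular draw; whether the authors did so is not stated in the response.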

      Reviewer #3 (Public review):

      Summary:

      Fahrenfort et al. investigate how liberal or conservative criterion placement in a detection task affects the construct validity of neural measures of unconscious cognition and conscious processing. Participants identified instances of "seen" or "unseen" in a detection task, a method known as post hoc sorting. Simulation data convincingly demonstrate that, counterintuitively, a conservative criterion inflates effect sizes of neural measures compared to a liberal criterion. While the impact of criterion shifts on effect size is suggested by signal detection theory, this study is the first to address this explicitly within the consciousness literature. Decoding analysis of data from two EEG experiments further shows that different criteria lead to differential effects on classifier performance in post hoc sorting. The findings underscore the pervasive influence of experimental design and participant reports on neural measures of consciousness, revealing that criterion placement poses a critical challenge for researchers.

      Strengths and Weaknesses

      One of the strengths of this study is the inclusion of the Perceptual Awareness Scale (PAS), which allows participants to provide more nuanced responses regarding their perceptual experiences. This approach ensures that responses at the lowest awareness level (selection 0) are made only when trials are genuinely unseen. This methodological choice is important as it helps prevent the overestimation of unconscious processing, enhancing the validity of the findings.

      The authors also do a commendable job in the discussion by addressing alternative paradigms, such as wagering paradigms, as a possible remedy to the criterion problem (Peters & Lau, 2015; Dienes & Seth, 2010). Their consideration of these alternatives provides a balanced view and strengthens the overall discussion.

      Our initial review identified a lack of measures of variance as one potential weakness of this work. However we agree with the authors' response that plotting individual datapoints for each condition is indeed a good visualization of variance within a dataset.

      Impact of the Work:

      This study effectively demonstrates a phenomenon that, while understood within the context of signal detection theory, has been largely unexplored within the consciousness literature. Subjective measures may not reliably capture the construct they aim to measure due to criterion confounds. Future research on neural measures of consciousness should account for this issue, and no-report measures may be necessary until the criterion problem is resolved.

      We thank reviewer 3 for their positive words and for taking the time to evaluate our manuscript.

    1. eLife Assessment

      This is a valuable study describing how rhabdomyosarcoma fusion-oncogenes, VGLL2-NCOA2 and TEAD1-NCOA2, function at the genomic, transcriptional, and proteomic levels in multiple systems. The experimental data is convincing, supporting a model in which these fusion-oncogenes leverage TEAD transcriptional signatures independent of YAP/TAZ. This work offers new mechanistic insights into oncogenic gene fusion events and reveals potential therapeutic strategies for the treatment of rhabdomyosarcomas.

    2. Reviewer #1 (Public review):

      Guo, Hue et al., is focused on understanding the epigenetic activity and functional dependencies for two different fusions found in spindle cell rhabdomyosarcoma, VGLL2::NCOA2 and TEAD1::NCOA2. They use a variety of models and methods; specifically, ectopic expression of the fusions in human 293T cells to perform RNAseq (both fusions), CUT&RUN (VGLL2::NCOA2) and BioID mass spec (both fusions). These data identify that the VGLL2::NCOA2 fusion has peaks that are enriched for TEAD motifs. Further, CBP/p300 CUT&RUN support an enrichment of binding sites and three TEAD targets in VGLL2::NCOA2 and TEAD1::NCOA2 expressing cells. They also functionally evaluate genetic and chemical dependencies (TEAD inhibition), and found this was only effective for the VGLL2::NCOA2 fusion, and not for TEAD1::NCOA2. Using complementary biochemical approaches, they suggest (with other supporting data) the fusions regulate TEAD transcriptional outputs via a YAP/TAZ independent mechanism. Further, they expand into a C2C12 myoblast model and show that TEAD1::NCOA2 is transforming in colony formation assays and in mouse allograft. These strategies for TEAD1-NCOA2 are consistent with previously published strategies using VGLL2::NCOA2. Importantly, they show that a CBP/p300 (a binding partner found in their BioID mass spec) small molecule inhibitor suppresses tumor formation using this mouse allograft model, and that the tumors are less proliferative, and have a reduction in transcription of three TEAD target genes. They complement in vivo data with biochemical approaches, and suggest this interface with p300 (for VGLL2::NCOA2) is through the NCOA2 fusion partner, as Co-IP in HEK293T with a mutant fusion that does not contain NCOA2 loses the association with endogenous p300. The data is interesting and suggests new biology for these fusion-oncogenes. However, the choice of 293T may limit the broad applicability of the findings.

      Strikingly, in 293T there was more transcriptional overlap with the VGLL2-NCOA2 fusion with the YAP5SA mutant than with TEAD1-NCOA2. Further, there is an additional opportunity to directly compare transcriptional profiles in 293T to the human disease and in the mouse allograft system to directly compare and discuss VGLL2-NCOA2 and TEAD1-NCOA2 histological differences or how A485 treatment may change the histology. Overall, the breadth of methods used in this study, and comparison of the two fusion-oncogenes' biology is of interest to the fusion-oncogene, pediatric sarcoma, and epigenetic therapeutic targeting fields.

    3. Reviewer #2 (Public review):

      In the manuscript entitled "VGLL2 and TEAD1 fusion proteins drive YAP/TAZ-independent transcription and tumorigenesis by engaging p300", Gu et al. investigated two Hippo pathway-related gene fusion events (i.e., VGLL2-NCOA2, TEAD1-NCOA2) in spindle cell rhabdomyosarcoma (scRMS). They demonstrate that these fusion proteins activate Hippo downstream gene transcription independently of YAP/TAZ. Using BioID-based mass spectrometry analysis, the authors identify histone acetyltransferase CBP/p300 as a specific binding protein for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Pharmacologically targeting p300 inhibits the fusion proteins-induced Hippo downstream gene transcription and tumorigenesis.

      Overall, this work provides novel mechanistic insights into scRMS-associated gene fusions in tumorigenesis and reveals potential therapeutic targets for cancer treatment. The manuscript is well-written and easy to follow. Below are a few comments based on the revised study.

      (1) While the study majorly focuses on Hippo downstream gene transcription, a significant portion of genes regulated by the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins are non-Hippo downstream genes (Fig. 3). Further characterization of how both Hippo and non-Hippo downstream genes contribute to fusion proteins-induced oncogenesis would enhance our understanding of scRMS etiology.

      (2) A potential limitation of this study is the reliance on overexpression approaches to investigate VGLL2-NCOA2 and TEAD1-NCOA2 fusion genes, which may not fully reflect pathological conditions in scRMS patients. Despite this, the significant study offers valuable mechanistic insights into fusion genes-induced scRMS and provides molecular foundation for developing targeted therapies.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      (1) The rationale for performing genomics, transcriptional, and proteomics work in 293T cells is not discussed. Further, there are no functional readouts mentioned in the 293T cells with expression of the fusion-oncogenes. Did these cells have any phenotypes associated with fusion-oncogene expression (proliferation differences, morphological changes, colony formation capacity)? Further, how similar are the gene expression signatures from RNA-seq to rhabdomyosarcoma? This would help the reader interpret how similar these cell models are to human disease.

      We appreciate the reviewer’s comments and understand the limitation of HEK293T cell culture. HEK293T cells were used as a surrogate system that enabled us to systematically examine and compare the transcriptional activation mechanisms between VGLL2-NCOA2/TEAD1-NCOA2 and YAP/TAZ. HEK293T cells have previously been used as a model system to study the signaling and transcriptional mechanisms of the Hippo/YAP pathway (1,2). Our data also showed that the ectopic expression of VGLL2-NCOA2 and TEAD1-NCOA2 in HEK293 cells can promote proliferation (Figure 1-figure supplement 1B), consistent with their potential oncogenic function.

      (2) TEAD1::NCOA2 fusion-oncogene model was not credentialed past H&E, and expression of Desmin. Is the transcriptional signature in C2C12 or 293T similar to a rhabdomyosarcoma gene signature?

      We understand the reviewer’s concern. VGLL2-NCOA2 in vivo tumorigenesis model generated by C2C12 cell orthotopic transplantation has recently been reported, and it exhibits similar characteristics with zebrafish transgenic tumors as well as human scRMS samples that carry the VGLL2-NCOA2 fusion (3). Due to the similar transcriptional and oncogenic mechanisms employed by both VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins, we expect that the TEAD1-NCOA2 dependent C2C12 transplantation model will closely resemble that induced by VGLL2-NCOA2.

      (3) For the fusion-oncogenes, did the HA, FLAG, or V5 tag impact fusion-oncogene activity? Was the tag on the 3' or 5' of the fusion? This was not discussed in the methods.

      To address the reviewer’s concern, we carefully compared the transcriptional activity of the fusion proteins with the HA tag at the 5’ end or FLAG and V5 tag at the 3’ end. We found that neither the tag type nor its location significantly affects the ability of VGLL2-NCOA2 and TEAD1-NCOA2 to induce downstream gene transcription, measured by qPCR. The data is summarized in Figure 1-figure supplement 1 G-H.

      (4) Generally, the lack of details in the figures, figure legends, and methods make the data difficult to interpret. A few examples are below:

      a. Individual data points are not shown for figure bar plots (how many technical or biological replicates are present and how many times was the experiment repeated?).

      As requested, we have added the individual data points to the bar plots. The Method section now includes information on the number of biological replicates and the times the experiments were repeated.

      b. What exons were included in the fusion-oncogenes from VGLL2 and NCOA2 or TEAD1 and NCOA2?

      We have now included the exon structure organization of VGLL2-NCOA2 or TEAD1-NCOA2 fusions in Figure 1-figure supplement 1A.

      c. For how long were the colony formation experiments performed? Two weeks?

      We have included more detailed information about the colony formation assay in the Methods section.

      d. In Figure 2D, what concentration of CP1 was used and for how long?

      The CP1 concentration and treatment duration information has now been included in the figure legend and Methods section.

      e. How was A485 resuspended for cell culture and mouse experiments, what is the percentage of DMSO?

      The Methods section now includes detailed information on how A485 is prepared for in vitro and in vivo experiments.

      f. How many replicates were done for RNA-seq, CUT&RUN, and ATACseq experiments?

      RNA-seq was done with three biological replicates and CUT&RUN and ATAC-seq were performed with two biological replicates. This information is now included in the Methods section for clarification.

      Reviewer #2 (Public Review):

      In the manuscript entitled "VGLL2 and TEAD1 fusion proteins drive YAP/TAZ-independent transcription and tumorigenesis by engaging p300", Gu et al. studied two Hippo pathway-related gene fusion events (i.e., VGLL2-NCOA2, TEAD1-NCOA2) in spindle cell rhabdomyosarcoma (scRMS) and showed that their fusion proteins can activate Hippo downstream gene transcription independent of YAP/TAZ. Using the BioID-based mass spectrometry analysis, the authors revealed histone acetyltransferase CBP/p300 as specific binding proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Pharmacologically targeting p300 inhibited the fusion proteins-induced Hippo downstream gene transcription and tumorigenic events.

      Overall, this study provides mechanistic insights into the scRMS-associated gene fusions in tumorigenesis and reveals potential therapeutic targets for cancer treatment. The manuscript is well-written and easy to follow.

      Here, several suggestions are made for the authors to improve their study.

      Main points

      (1) The authors majorly focused on the Hippo downstream gene transcription in this study, while a significant portion of genes regulated by the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins are non-Hippo downstream genes (Figure 3). The authors should investigate whether the altered Hippo pathway transcription is essential for VGLL2-NCOA2 and TEAD1-NCOA2-induced cell transformation and tumorigenesis. Specifically, they should test if treatment with the TEAD inhibitor can reverse the cell transformation and tumorigenesis caused by VGLL2-NCOA2 but not TEAD1-NCOA2. In addition, it is important to examine whether YAP-5SA expression can rescue the inhibitory effects of A485 on VGLL2-NCOA2 and TEAD1-NCOA2-induced colony formation and tumor growth. This will help clarify whether Hippo downstream gene transcription is important for the oncogenic activities of these two fusion proteins.

      We thank the reviewer for the comments. Although we have not tested the small-molecule TEAD inhibitor on VGLL2-NCOA2- or TEAD1-NCOA2-induced cell transformation and tumorigenesis, we expect that TEAD inhibition will block VGLL2-NCOA2- but not TEAD1-NCOA2-induced oncogenic activity. This is because TEAD1-NCOA2 does not contain the auto-palmitoylation sites and the hydrophobic pocket in the C-terminal YAP-binding domain of TEAD1 that the TEAD small-molecule inhibitor occupies (4). We also appreciate the reviewer’s suggestion of YAP5SA rescue experiments. However, due to its strong oncogenic activity, YAP5SA itself can induce robust downstream transcription and cell transformation with or without A485 treatment, as shown in Figure 5. Thus, this experiment is unlikely to address whether non-Hippo downstream genes induced by the fusions are important for cell transformation and tumorigenesis. Because of the distinct nature of transcriptional and chromatin landscapes controlled by VGLL2-NCOA2/TEAD1-NCOA2 and YAP, we speculate that both Hippo and non-Hippo-related downstream genes contribute to the oncogenic activation and tumor phenotypes induced by the fusion proteins.

      (2) Rationale for selecting CBP/p300 for functional studies needs to be provided. The BioID-MS experiment identified many interacting proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins (Table S4). The authors should explain the scoring system used to identify the high-interacting proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Were CBP/p300 among the top candidates on the list? Providing this information will help justify the focus on CBP/p300 and validate their importance in this study.

      We appreciate the reviewer’s point. CBP/P300 is among the top hits in our proteomics screens of both VGLL2-NCOA2 and TEAD1-NCOA2. Our focus on CBP/P300 is mainly due to the well-established interactions between CBP/P300 and the NCOA family transcriptional co-activators, in which the CBP/P300-NCOA complex plays a central role in mediating nuclear receptor-induced transcriptional activation (5). In addition, our data is consistent with another recurrent VGLL2 fusion identified in scRMS, VGLL2-CITED2 (6), which has a C-terminal fusion partner from CITED2, a known CBP/P300-interacting protein (7).

      (3) p300 was revealed as a key driver for the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins-induced transcriptome alteration and tumorigenesis. To strengthen the point, the authors should identify the p300 binding region on VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Mutants with defects in p300 binding/recruitment should be generated and included as a control in the related q-PCR and tumorigenic studies. This work will help confirm the crucial role of p300 in mediating the oncogenic effects of these two fusion proteins.

      We thank the reviewer for the suggestion. We have performed additional co-immunoprecipitation experiments using a deletion mutant form of VGLL2-NCOA2 and demonstrated that the C-terminal NCOA2 part of the fusion is responsible for mediating the interaction between the fusion protein and CBP/P300. These results are now included in the new Figure 5A and are consistent with the reported structural analysis of the CBP/P300-NCOA complex (8). In addition, our new data showed the inability of the VGLL2-NCOA2 ∆NCOA2 mutant to induce gene transcription (Figure 1-figure supplement 1D). Furthermore, our data using the small-molecule CBP/P300 inhibitor clearly demonstrated that CBP/P300 is required to mediate cell transformation and tumorigenesis induced by the two fusion proteins in vitro and in vivo (Figures 5 and 6).

      (4) Another major issue is the overexpression system extensively used in this study. It is important to determine whether the VGLL2-NCOA2 and TEAD1-NCOA2 fusion genes are also amplified in cancer. If not, the expression levels of the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins should be adjusted to endogenous levels to assess their oncogenic effects on gene transcription and tumorigenesis. This approach would make the study more relevant to the pathological conditions observed in scRMS cancer patients.

      We appreciate the reviewer’s input and acknowledge the limitation of the HEK293T and C2C12 cell-based models that rely on ectopic expression of VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. It is currently unclear whether the VGLL2-NCOA2 and TEAD1-NCOA2 fusion genes are also amplified in sarcoma. As mentioned before, these surrogate cell culture systems allowed us to systematically compare the transcriptional regulation by the fusion proteins and YAP/TAZ and elucidate the molecular mechanism underlying the Hippo/YAP-independent oncogenic transformation induced by VGLL2-NCOA2 and TEAD1-NCOA2.

      References:

      (1) Genes Dev. 2007 Nov 1;21(21):2747-61. doi: 10.1101/gad.1602907. Inactivation of YAP oncoprotein by the Hippo pathway is involved in cell contact inhibition and tissue growth control

      (2) Genes Dev. 2010 Jan 1;24(1):72-85. doi: 10.1101/gad.1843810. A coordinated phosphorylation by Lats and CK1 regulates YAP stability through SCF(beta-TRCP)

      (3) VGLL2-NCOA2 leverages developmental programs for pediatric sarcomagenesis. Watson S, LaVigne CA, Xu L, Surdez D, Cyrta J, Calderon D, Cannon MV, Kent MR, Cell Rep. 2023 Jan 31;42(1):112013.

      (4) Lats1/2 Sustain Intestinal Stem Cells and Wnt Activation through TEAD-Dependent and Independent Transcription. Cell Stem Cell. 2020 May 7;26(5):675-692.e8.

      (5) Yi, P., Yu, X., Wang, Z., and O’Malley, B.W. (2021). Steroid receptor-coregulator transcriptional complexes: new insights from CryoEM. Essays Biochem. 65, 857–866.

      (6) A Molecular Study of Pediatric Spindle and Sclerosing Rhabdomyosarcoma: Identification of Novel and Recurrent VGLL2-related Fusions in Infantile Cases. Am J Surg Pathol . 2016 Feb;40(2):224-35. doi: 10.1097/

      (7) CITED2 and the modulation of the hypoxic response in cancer. Fernandes MT, Calado SM, Mendes-Silva L, Bragança J. World J Clin Oncol. 2020 May 24;11(5):260-274.

      (8) Yu, X., Yi, P., Hamilton, R.A., Shen, H., Chen, M., Foulds, C.E., Mancini, M.A., Ludtke, S.J., Wang, Z., and O’Malley, B.W. (2020). Structural insights of transcriptionally active, full-length Androgen receptor coactivator complexes. Mol. Cell 79, 812–823.e4.

    1. eLife Assessment

      This important study substantially expands observations of HERV expression in the clinical settings. The evidence provided by the authors that HERV activity is an underlying etiological factor in ME/CFS and fibromyalgia is compelling and suggests further investigation into mechanisms. This work will be of broad interest to clinicians and researchers alike.

    2. Reviewer #1 (Public review):

      Summary:

      Giménez-Orenga et al. investigate the origin and pathophysiology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia (FM). Using RNA microarrays, the authors compare the expression profiles and evaluate the biomarker potential of human endogenous retroviruses (HERV) in these two conditions. Altogether, the authors show that HERV expression is distinct between ME/CFS and FM patients, and HERV dysregulation is associated with higher symptom intensity in ME/CFS. HERV expression in ME/CFS patients is associated with impaired immune function and higher estimated levels of plasma cells and resting CD4 memory T cells. This work provides interesting insights into the pathophysiology of ME/CFS and FM, creating opportunities for several follow-up studies.

      Strengths:

      (1) Overall, the data is convincing and supports the authors' claims. The manuscript is clear and easy to understand, and the methods are generally well-detailed. It was quite enjoyable to read.

      (2) The authors combined several unbiased approaches to analyse HERV expression in ME/CFS and FM. The tools, thresholds, and statistical models used all seem appropriate to answer their biological questions.

      (3) The authors propose an interesting alternative to diagnosing these two conditions. Transcriptomic analysis of blood samples using an RNA microarray could allow a minimally invasive and reproducible way of diagnosing ME/CFS and FM.

      Weakness:

      (1) While this work makes several intriguing observations, some results will need to be validated in future studies using experimental approaches.

    3. Reviewer #2 (Public review):

      Summary:

      Giménez-Orenga carried out this study to assess whether human endogenous retroviruses (HERVs) could be used to improve the diagnosis of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Fibromyalgia (FM). To this end, they used the HERV-V3 array developed previously, to characterize the genome-wide changes in expression of HERVs in patients suffering from ME/CFS, FM or both, compared to controls. In turn, they present a useful repertoire of HERVs that might characterize ME/CFS and FM. For the most part, the paper is written in a manner that allows a natural understanding of the workflow and analyses carried out, making it compelling. The figures and additional tables present solid support for the findings. However, some statements made by the authors seem incomplete and would benefit from a more thorough literature review. Overall, this work will be of interest to the medical community seeking to better understand the co-occurrence of these pathologies, hinting at a novel angle by integrating HERVs, which are often overlooked, into their assessment.

      Strengths:

      - The work is well-presented, allowing the reader to understand the overall workflow and how the specific aims contribute to filling the knowledge gap in the field.

      - The analyses carried out to understand the potential impact on gene expression mediated by HERVs are in line with previous works, making it solid and robust in the context of this study.

      Weaknesses:

      - The authors claim to obtain genome-wide HERV expression profiles. However, the array used was developed using hg19, while the genomic analyses in this work are carried out using a liftover to hg38. It would improve the statement and findings to include a comparison of the differences in HERVs available in hg38, and how this could impact the "genome-wide" findings.

      - The authors at some points are not thorough with the cited literature. Two examples are:

      (1) In lines 396-397 the authors say "the MLT1, usually found enriched near DE genes (Bogdan et al., 2020)". I checked the work by Bogdan, and they studied bacterial infection. A single work on a specific topic is not sufficient to support the statement that MLT1 is "usually" in close vicinity to differentially expressed genes. More works are needed to support this.

      (2) After the previous statement, the authors go on to mention "contributing to the coding of conserved lncRNAs (Ramsay et al., 2017)". First, lnc = long non-coding, so this doesn't make sense. Second, in the work by Ramsay they mention "that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved", which is different from what the authors in this study are trying to convey. Again, additional work and a rephrasing might help to support this idea.

      - When presenting the clusters, the authors overlook the fact that cluster 4 is clearly control-specific, and fail to discuss what this means. Could this subset of HERV be used as bona fide markers of healthy individuals in the context of these diseases? Are they associated with DE genes? What could be the impact of such associations?

      Appraisals on aims:

      The authors set specific questions and presented the results to successfully answer them. The evidence is solid, with some weaknesses discussed above that will methodologically strengthen the work.

      Likely impact of work on the field:

      This work will be of interest to the medical community looking for novel ways to improve clinical diagnosis. Although future works with a greater population size, and more robust techniques such as RNA-Seq, are needed, this is the first step in presenting a novel way to distinguish these pathologies.

      It would be of great benefit to the community to provide a table/spreadsheet indicating the specific genomic locations of the HERVs specific to each condition. This will allow proper provenance for future researchers interested in expanding on this knowledge, as these genomic coordinates will be independent of the technique used (as was the array used here).

      Comments on revisions:

      When addressing the comments made in the previous round, there are some answers that lack substance and don't seem to be incorporated in the manuscript. For example, the authors say:

      Authors' response: This is an important point. However, the low number of probes (less than 100) that were excluded from our analysis by lack of correspondence with hg38 among the 1,290,800 probesets was interpreted as insignificant for "genome-wide" claims. An aspect that will be explained in the revised version of this manuscript.

      I checked the revised manuscript with tracked changes, and there doesn't seem to be an updated explanation to this. In which lines is this explained?

      For the other response:

      Authors' response: Using control DE HERV as bona fide markers of healthy individuals seems like an interesting possibility worth exploring. Control DE HERV (cluster 4) associate with DE genes involved in apoptosis, T cell activation and cell-cell adhesion (modules 1 and 6). The impact of which deserves further study.

      I couldn't find an updated mention of this in the discussion.

      Another point that I raised was regarding the decision to use an FDR of 0.1 instead of 0.05. The authors only speculate about the impacts in their answer, while I believe that this could have been rigorously addressed. Since this was done in R, and DE analyses are relatively fast, I don't see a reason why this part was not repeated and discussed accordingly.

      For other analyses, there doesn't seem to be a problem with using 0.05 as the threshold. Examples are the "Overrepresentation functional analysis" and the "Statistical analysis" parts of the methods, where the authors say "we used a Fisher exact test to calculate p-value, considering enriched in the provided list if an adjusted p-value (FDR) was less than 0.05".

      Just to make this point clear: I'm not asking the authors to repeat all the work using the 0.05 FDR threshold, but rather that they are aware and conscious about the impact of this, and give an idea to the audience on how it would change the DE numbers. This would put in perspective the findings to any future reader.
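      The comparison requested here, namely how the number of DE features changes between an FDR threshold of 0.05 and 0.1, can be sketched as follows. This is a minimal illustration under stated assumptions: it assumes the adjusted p-values are Benjamini-Hochberg FDR values (the review does not state which correction the authors used), it is written in Python with NumPy rather than the R workflow mentioned above, and the function names `benjamini_hochberg` and `count_significant` as well as the example p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest rank downwards.
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


def count_significant(p_values, thresholds=(0.05, 0.1)):
    """Count features whose adjusted p-value falls below each threshold."""
    adjusted = benjamini_hochberg(p_values)
    return {t: int(np.sum(adjusted < t)) for t in thresholds}


if __name__ == "__main__":
    # Made-up raw p-values, for illustration only.
    example_p = [0.001, 0.008, 0.04, 0.06, 0.5]
    print(count_significant(example_p))
```

      Reporting such counts side by side for both thresholds would give readers a sense of how sensitive the DE numbers are to the choice of cutoff, without repeating the downstream analyses.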

      I think that most of the other answers to both my previous concerns and the other reviewer's concerns are ok. My last outstanding concern is that the probe coordinates apparently can't be shared, which considerably undermines the reproducibility of this study and its use by future researchers, who won't be able to compare their results to this study.

    4. Reviewer #3 (Public review):

      Summary:

      The authors find that HERV expression patterns can be used as new criteria for differential diagnosis of FM and ME/CFS and patient subtyping. The data are based on transcriptome analysis by microarray for HERVs using patient blood samples, followed by differential expression of ERVs and bioinformatic analyses. This is a standard and solid data processing pipeline, and the results are well presented and support the authors' claim.

      Strengths:

      It provides an innovative diagnostic approach using ERV profiles to subtype patients and distinguish FM and ME/CFS.

      Comments on revisions:

      This is a revised manuscript which addresses the comments well.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Giménez-Orenga et al. investigate the origin and pathophysiology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia (FM). Using RNA microarrays, the authors compare the expression profiles and evaluate the biomarker potential of human endogenous retroviruses (HERV) in these two conditions. Altogether, the authors show that HERV expression is distinct between ME/CFS and FM patients, and HERV dysregulation is associated with higher symptom intensity in ME/CFS. HERV expression in ME/CFS patients is associated with impaired immune function and higher estimated levels of plasma cells and resting CD4 memory T cells. This work provides interesting insights into the pathophysiology of ME/CFS and FM, creating opportunities for several follow-up studies.

      Strengths:

      (1) Overall, the data is convincing and supports the authors' claims. The manuscript is clear and easy to understand, and the methods are generally well-detailed. It was quite enjoyable to read.

      (2) The authors combined several unbiased approaches to analyse HERV expression in ME/CFS and FM. The tools, thresholds, and statistical models used all seem appropriate to answer their biological questions.

      (3) The authors propose an interesting alternative to diagnosing these two conditions. Transcriptomic analysis of blood samples using an RNA microarray could allow a minimally invasive and reproducible way of diagnosing ME/CFS and FM.

      Weaknesses:

      (1) The cohort analysed in this study was phenotyped by a single clinician. As ME/CFS and FM are diagnosed based on unspecific symptoms and are frequently misdiagnosed, this raises the question of whether the results can be generalised to external cohorts.

      Thank you for your comment. Surely the study of larger cohorts will determine the external validity of these results in a clinical scenario. However, this pilot study, the first of its kind, was designed to maximize homogeneity across participants, which seemed primarily ensured by studying females only and by having a single experienced observer make the diagnoses.

      (2) The analyses performed to unravel the causes and effects of HERV expression in ME/CFS and FM are solely based on sequencing data. Experimental approaches could be used to validate some of the transcriptomic observations.

      Certainly, experimental approaches may add robustness to the implication of HERVs in ME/CFS. We do intend to take this avenue in future work to build on the findings presented here. However, the limited knowledge of HERV-mediated physiological functions may make it difficult to obtain prompt results on the causes and effects of HERV expression in ME/CFS and FM.

      Reviewer #2 (Public review):

      Summary:

      Giménez-Orenga et al. carried out this study to assess whether human endogenous retroviruses (HERVs) could be used to improve the diagnosis of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Fibromyalgia (FM). To this end, they used the previously developed HERV-V3 array to characterize the genome-wide changes in the expression of HERVs in patients suffering from ME/CFS, FM, or both, compared to controls. In turn, they present a useful repertoire of HERVs that might characterize ME/CFS and FM. For the most part, the paper is written in a manner that allows a natural understanding of the workflow and analyses carried out, making it compelling. The figures and additional tables present solid support for the findings. However, some statements made by the authors seem incomplete and would benefit from a more thorough literature review. Overall, this work will be of interest to the medical community seeking a better understanding of the co-occurrence of these pathologies, hinting at a novel angle by integrating HERVs, which are often overlooked, into their assessment.

      Strengths:

      (1) The work is well-presented, allowing the reader to understand the overall workflow and how the specific aims contribute to filling the knowledge gap in the field.

      (2) The analyses carried out to understand the potential impact on gene expression mediated by HERVs are in line with previous works, making it solid and robust in the context of this study.

      Weaknesses:

      (1) The authors claim to obtain genome-wide HERV expression profiles. However, the array used was developed using hg19, while the genomic analyses in this work are carried out using a liftover to hg38. It would improve the statement and findings to include a comparison of the differences in HERVs available in hg38, and how this could impact the "genome-wide" findings.

      This is an important point. However, the low number of probes (less than 100) excluded from our analysis for lack of correspondence with hg38, out of 1,290,800 probesets, was considered too small to affect the "genome-wide" claims. This will be explained in the revised version of the manuscript.
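
      To put the excluded fraction in perspective, the arithmetic can be sketched as follows. This is only an illustration: the value 100 is an assumed upper bound taken from "less than 100" (the exact count is not stated), the response counts probes against probesets, and the variable names are hypothetical.

```python
# Upper bound on the share of the array lost in the hg19 -> hg38 liftover.
# ASSUMPTION: 100 stands in for "less than 100"; the exact count is not given.
total_probesets = 1_290_800
max_excluded = 100

excluded_fraction = max_excluded / total_probesets

# Report the bound as a percentage of the full array.
print(f"Excluded: < {excluded_fraction:.4%} of probesets")
```

      Under these assumptions the excluded share stays below one hundredth of a percent of the array.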

      (2) The authors in some points are not thorough with the cited literature. Two examples are:

      a) Lines 396-397 the authors say "the MLT1, usually found enriched near DE genes (Bogdan et al., 2020)". I checked the work by Bogdan, and they studied bacterial infection. A single work in a specific topic is not sufficient to support the statement that MLT1 is "usually" in close vicinity to differentially expressed genes. More works are needed to support this.

      b) After the previous statement, the authors go on to mention "contributing to the coding of conserved lncRNAs (Ramsay et al., 2017)". First, lnc = long non-coding, so this doesn't make sense. Second, in the work by Ramsay they mention "that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved", which is different from what the authors in this study are trying to convey. Again, additional work and a rephrasing might help to support this idea.

      Certainly, these two sentences need rephrasing to better adjust to current evidence.

      Revised sentences can now be found in lines 397-402

      (3) When presenting the clusters, the authors overlook the fact that cluster 4 is clearly control-specific, and fail to discuss what this means. Could this subset of HERV be used as bona fide markers of healthy individuals in the context of these diseases? Are they associated with DE genes? What could be the impact of such associations?

      Using control DE HERV as bona fide markers of healthy individuals seems like an interesting possibility worth exploring. Control DE HERV (cluster 4) associate with DE genes involved in apoptosis, T cell activation and cell-cell adhesion (modules 1 and 6). The impact of these associations deserves further study.

      Appraisals on aims:

      The authors set specific questions and presented the results to successfully answer them. The evidence is solid, with some weaknesses discussed above that will methodologically strengthen the work.

      Likely impact of work on the field:

      This work will be of interest to the medical community looking for novel ways to improve clinical diagnosis. Although future works with a greater population size, and more robust techniques such as RNA-Seq, are needed, this is the first step in presenting a novel way to distinguish these pathologies.

      It would be of great benefit to the community to provide a table/spreadsheet indicating the specific genomic locations of the HERVs specific to each condition. This will allow proper provenance for future researchers interested in expanding on this knowledge, as these genomic coordinates will be independent of the technique used (as was the array used here).

      We agree with the reviewer that sharing the genomic locations of DE HERVs in these pathologies would help others build on these findings. Unfortunately, we do not hold the rights to share probe coordinates from this custom HERV-V3 microarray, which we used under an MTA with its developer.

      Reviewer #3 (Public review):

      The authors find that HERV expression patterns can be used as new criteria for differential diagnosis of FM and ME/CFS and patient subtyping. The data are based on transcriptome analysis by microarray for HERVs using patient blood samples, followed by differential expression of ERVs and bioinformatic analyses. This is a standard and solid data processing pipeline, and the results are well presented and support the authors' claim.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations/questions:

      (1) The authors point towards the biomarker potential of HERV expression signatures. In line with this, it would be important to test if they can predict the correct pathology for patients using the expression of DE HERVs. Additionally, as a single clinician annotated the cohort analysed in this study, it would be interesting to validate the signatures identified in this work by reanalysing publicly available transcriptomic data from independent studies.

      Thank you for the suggestion. We plan to conduct this analysis and have added the following statement to the manuscript (lines 482-483): “Given the limited sample size in our cohort, validation of the findings in extended cohorts is a must.”

      (2) The authors suggest that an epigenetic mechanism causes the dysregulated HERV expression in ME/CFS patients. However, in Fig.1A, HERV expression profiles of co-diagnosed patients are more similar to healthy controls than patients with either condition. How could the co-morbidity of FM "rescue" the phenotype of ME/CFS?

      Thank you for the insightful comment. It is notable that co-diagnosed patients exhibit HERV expression profiles more similar to those of healthy controls than to either FM's or ME/CFS's. These findings may suggest a distinct underlying pathomechanism for this patient group, supporting the identification of a novel nosologic entity, as discussed in lines 372-374 of the manuscript.

      (3) Abundant evidence in the literature links HERV dysregulation with the production of RNA:DNA hybrids and dsRNAs and viral mimicry. The authors found that ME/CFS subgroup 2, which exhibits the most important HERV dysregulation, is also associated with decreased signatures of pathogen detection. It would be interesting to quantify the abundance of DNA:RNA hybrids and dsRNAs in PBMCs of ME/CFS and FM patients as well as healthy controls. It would be interesting to discuss how downregulation of pathogen detection pathways could be a mechanism in ME/CFS patients to avoid viral mimicry and potential links with inflammation in this disease.

      Certainly, HERVs can influence disease pathophysiology by generating RNA:DNA hybrids and dsRNA. However, microarray data does not allow this analysis. Future actions to investigate the underlying mechanisms of differentially expressed HERVs could investigate this interesting possibility.

      (4) Another intriguing result is how overexpression of Module 3 in ME/CFS subgroup 2 is associated with higher levels of plasma cells. The authors hypothesize that the changes in immune cell abundances reflect previous viral infections, but another possibility would be immune activation against HERVs. Are there protein-coding sequences (gag, pro, pol, env) amongst the HERV sequences of module 3? If so, it would be interesting to validate HERV protein expression in these samples. Additionally, blood samples of ME/CFS patients and healthy controls should be analysed in flow cytometry to describe the abundance and phenotype of immune cells precisely.

      Thank you for your insightful comments. In fact, we identified three HERV elements with protein-coding regions whose functional relevance remains uncertain. They present an interesting avenue for future investigation, particularly regarding immune activation.

      Minor comments:

      (1) On lines 170-172, it is unclear to me how Figure 1E is linked to the text.

      We have added a line better explaining Fig. 1E: “Top 10 contributing HERVs to principal components PC1 and PC2 are shown” (lines 171-172).

      (2) Figure S2: grouping or colouring the plots based on the cluster to which HERVs were assigned could facilitate the understanding of the figure.

      We appreciate the suggestion to enhance the clarity of the figures. However, this color-coding cannot be implemented, as a family is not exclusively assigned to a single cluster.

      (3) How are the 4 HERV clusters of Figure 2 and the 8 modules of Figure 3 related to the clusters identified by hierarchical clustering in Figure 1? More details should be provided in the text (Results and Methods sections), and figures to illustrate the clustering strategy should be added if needed.

      To enhance clarity, we have included the following explanation in the results section (lines 244-251): “To uncover potentially affected physiologic functions linked to DE HERV, we examined how DE HERVs and DE genes with similar expression patterns grouped together in modules based on their intrinsic relationships by their hierarchical co-clustering (Fig. 3). Then, the functional significance of these modules was assessed by gene ontology (GO) analysis of the DE genes within each module. The hierarchical clustering analysis resulted in the identification of eight distinct modules, each characterized by unique combinations of DE HERV and DE gene patterns across all four study groups (Fig. 3)”.

      (4) Related to Figure 4, are there HERV sequences in module 3 located near genes important for plasma cells and/or resting CD4 memory T cells?

      Thank you for your insightful comment. However, gene relevance for plasma cells and/or resting CD4 memory T cells may depend on multiple factors in addition to cell type and subtypes and, therefore, the analysis may not be straightforward.

      Reviewer #2 (Recommendations for the authors):

      In Figure 1, the heatmap scale goes from -4 to 4. This should reflect at least the numbers on the lowest and highest end of the scale.

      Thank you for bringing this to our attention. The scale was correct; however, when arranging the panels, the numbers were not properly positioned. The figure has now been updated with the corrected version.

      Figure 2F and G, percentages are shown as decimal numbers up to 1.00, while it should be 100%, and so on.

      We also replaced this figure, changing the numbers to fit percentages.

      It would be interesting to know how the results change using FDR of 0.05. I'm not familiar with microarray thresholds, but in RNA-Seq, 0.1 is rarely used, with 0.05 being the standard. Could it be that a more stringent result better distinguishes the pathologies?

      Applying a more stringent threshold, such as FDR 0.05, may remove sequences that, while not strongly differentially expressed, may still be important for distinguishing between these pathologies. Therefore, we decided to also include DE tendencies (FDR<0.1) in this first-of-its-kind study. Findings will need validation in enlarged cohorts.
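
      The trade-off between the two thresholds can be illustrated with a minimal sketch. It assumes Benjamini-Hochberg adjustment, which is a common FDR procedure but is not confirmed here as the one used in the study, and the p-values are hypothetical, not study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# HYPOTHETICAL p-values for six probesets (illustrative only).
p_values = [0.001, 0.008, 0.020, 0.045, 0.060, 0.500]
q_values = benjamini_hochberg(p_values)

# Count how many probesets survive each FDR cut-off.
retained = {
    threshold: sum(q < threshold for q in q_values)
    for threshold in (0.05, 0.1)
}
for threshold, count in retained.items():
    print(f"FDR < {threshold}: {count} probesets retained")
```

      In this toy example the stricter cut-off discards two probesets whose adjusted values fall between 0.05 and 0.1, which is the kind of "DE tendency" the response refers to.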

    1. eLife Assessment

      The study by Power and colleagues is important as elucidating the dynamic immune responses to photoreceptor damage in vivo potentiates future work in the field to better understand the disease process. The evidence supporting the authors' claims is compelling. The current manuscript would further benefit from including limitations/future improvements in the discussion or conclusion, exploring neutrophil recruitment under different degrees of photoreceptor loss (mild to severe).

    2. Reviewer #2 (Public review):

      Summary:

      This study uses in vivo multimodal high-resolution imaging to track how microglia and neutrophils respond to light-induced retinal injury from soon after injury to 2 months post-injury. The in vivo imaging finding was subsequently verified by ex vivo study. The results suggest that despite the highly active microglia at the injury site, neutrophils were not recruited in response to acute light-induced retinal injury.

      Strengths:

      An extremely thorough examination of the cellular-level immune activity at the injury site. In vivo imaging observations being verified using ex vivo techniques is a strong plus.

      Weaknesses:

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized. Update: Modifications have been made throughout, which has made the manuscript easier to follow.

      Study weakness: though the finding prompts more questions and future studies, the findings discussed in this paper are potentially important for us to understand how the immune cells respond differently to different severity levels of injury. The study also demonstrated an imaging technology which may help us better understand cellular activity in living tissue during earlier time points.

      Comments on revisions:

      I appreciate the thorough clarification and re-organization by the authors, and the messages in the manuscript are now more apparent. I recommend also briefly discussing limitations/future improvements in the discussion or conclusion.

    3. Reviewer #3 (Public review):

      Summary

      This work investigated the immune response in the murine retina after focal laser lesions. These lesions are made with close to 2 orders of magnitude lower laser power than the more prevalent choroidal neovascularization model of laser ablation. Histology and OCT together show that the laser insult is localized to the photoreceptors and spares the inner retina, the vasculature and the pigment epithelium. As early as 1 day after injury, a loss of cell bodies in the outer nuclear layer is observed. This is accompanied by strong microglial proliferation at the site of injury in the outer retina where microglia do not typically reside. The injury did not seem to result in the extravasation of neutrophils from the capillary network, constituting one of the main findings of the paper. The demonstrated paradigm of studying the immune response and potentially retinal remodeling in the future in vivo is valuable and would appeal to a broad audience in visual neuroscience.

      Strengths

      Adaptive optics imaging of murine retina is cutting edge and enables non-destructive visualization of fluorescently labeled cells in the milieu of retinal injury. As may be obvious, this in vivo approach is a benefit for studying fast and dynamic immune processes on a local time scale - minutes and hours, and also for the longer days-to-months follow-up of retinal remodeling as demonstrated in the article. In certain cases, the in vivo findings are corroborated with histology.

      The analysis is sound and accompanied by stunning video and static imagery. A few different sets of mouse models are used: a) two different mouse lines, each with a fluorescent tag for neutrophils and microglia, and b) two different models of inflammation, endotoxin-induced uveitis (EIU) and laser ablation, to study differences in the immune interaction.

      One of the major advances in this article is the development of the laser ablation model for 'mild' retinal damage as an alternative to the more severe neovascularization models. This model would potentially allow for controlling the size, depth and severity of the laser injury opening interesting avenues for future study.

      The time-course, 2D and 3D spatial activation pattern of microglial activation are striking and provide an unprecedented view of the retinal response to mild injury.

      Weaknesses

      Generalization of the (lack of) neutrophil response to photoreceptor loss - there is ample evidence in the literature that neutrophils are heavily recruited in response to severe retinal damage that includes photoreceptor loss. Why the same was not observed in this article remains an open question. One could hypothesize that neutrophil recruitment might indeed occur under conditions that are more in line with the more extreme damage models, for example, with a stronger and global ablation (substantially more photoreceptor loss over a larger area). This parameter space is unwieldy and too large to address the question conclusively in the current article, i.e. how much photoreceptor loss leads to neutrophil recruitment? By the same token, the strong and general conclusion in the title - Photoreceptor loss does not recruit neutrophils - cannot be made until an exhaustive exploration is made of the same parameter space. A scaling back may help here, to reflect the specific, mild form of laser damage explored here, for instance - Mild photoreceptor loss does not recruit neutrophils despite...

      EIU model - The EIU model was used as a positive control for neutrophil extravasation. Prior work with flow cytometry has shown a substantial increase in neutrophil counts in the EIU model. Yet, in all, the entire article shows exactly 2 examples in vivo and 3 ex vivo (Figure 7) of extravasated neutrophils from the EIU model (n = 2 mice). The general conclusion made about neutrophil recruitment (or lack thereof) is built partly upon this positive control experiment. But these limited examples, especially in the case where literature reports a preponderance of extravasated neutrophils, raise a question on the paradigm(s) used to evaluate this effect in the mild laser damage model.

      Overall, the strengths outweigh the weaknesses, provided the conclusions/interpretations are reconsidered.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate the interaction between tissue-resident immune cells (microglia) and circulating systemic neutrophils in response to acute, focal retinal injury. They induced retinal lesions using 488 nm light to ablate photoreceptor (PR) outer segments, then utilized various imaging techniques (AOSLO, SLO, and OCT) to study the dynamics of fluorescent microglia and neutrophils in mice over time. Their findings revealed that while microglia showed a dynamic response and migrated to the injury site within a day, neutrophils were not recruited to the area despite being nearby. Post-mortem confocal microscopy confirmed these in vivo results. The study concluded that microglial activation does not recruit neutrophils in response to acute, focal photoreceptor loss, a scenario common in many retinal diseases.

      Strengths:

      The primary strength of this manuscript lies in the techniques employed.

      In this study, the authors utilized advanced Adaptive Optics Scanning Laser Ophthalmoscopy (AOSLO) to document immune cell interactions in the retina accurately. AOSLO's micron-level resolution and enhanced contrast, achieved through near-infrared (NIR) light and phase-contrast techniques, allowed visualization of individual immune cells without extrinsic dyes. This method combined confocal reflectance, phase-contrast, and fluorescence modalities to reveal various cell types simultaneously. Confocal AOSLO tracked cellular changes with less than 6 μm axial resolution, while phase-contrast AOSLO provided detailed views of vascular walls, blood cells, and immune cells. Fluorescence imaging enabled the study of labeled cells and dyes throughout the retina. These techniques, integrated with conventional histology and Optical Coherence Tomography (OCT), offered a comprehensive platform to visualize immune cell dynamics during retinal inflammation and injury.

      Thank you!

      Weaknesses:

      One significant weakness of the manuscript is the use of Cx3cr1GFP mice to specifically track GFP-expressing microglia. While this model is valuable for identifying resident phagocytic cells when the blood-retinal barrier (BRB) is intact, it is important to note that recruited macrophages also express the same marker following BRB breakdown. This overlap complicates the interpretation of results and makes it difficult to distinguish between the contributions of microglia and infiltrating macrophages, a point that is not addressed in the manuscript.

      We agree that greater emphasis is required that CX3CR1 mice exhibit fluorescence in not only microglia, but also other cells of macrophage origin including monocytes, perivascular macrophages and some hyalocytes.

      Through the advantages of in vivo AOSLO, however, we are able to establish that CX3CR1 cells are present within the tissue before the laser lesion is placed. This suggests they are tissue resident. We agree that it is possible that at later time points (days-weeks), systemic macrophages and/or monocytes may participate. Lack of rolling/crawling cells suggest they are not systemic. We elaborate on this point in a new section in the discussion:

      P29 L534-541:

      “CX3CR1-GFP mice exhibit fluorescence not only in microglia

      We recognize that the CX3CR1-GFP model can also label systemic cells such as monocytes/macrophages77. While it is possible these cells could infiltrate the retina in response to the lesion, we find it unlikely since there was no indication of the leukocyte extravasation cascade (rolling/crawling/stalled cells) within the nearest retinal vasculature. In addition to microglia, retinal perivascular macrophages and hyalocytes also exhibit GFP fluorescence and thus these cells may also contribute toward damage resolution.”

      Another major concern is the time point chosen for analyzing the neutrophil response. The authors assess neutrophil activity 24 hours after injury, which may be too late to capture the initial inflammatory response. This delayed assessment could overlook crucial early dynamics that occur shortly after injury, potentially impacting the overall findings and conclusions of the study.

      The power of in vivo imaging makes these early assessments possible. Therefore, we have taken the reviewer’s concern on board and conducted an additional experiment that examines whether neutrophils are seen in the window of time between the lesion and 24 hours. In a newly examined mouse, we find that within 3.5 hours post-lesion, neutrophils do not extravasate adjacent to the lesion site (see new “figure 8 – figure supplement 1”).

      Also see accompanying video (new “figure 8 – video 3”) for an example of nearby neutrophils flowing through OPL capillaries just microns away from the lesion site. Neutrophils are clearly contained within the vasculature and exhibit dynamics consistent with healthy retinal tissue. While it remains possible that the lesion may increase leukocyte stalling within the nearest capillaries, we are unable to confirm or deny this with a single experiment. We now submit this evidence as a new supplementary figure following the reviewer’s suggestion.

      Reviewer #2 (Public review):

      Summary:

      This study uses in vivo multimodal high-resolution imaging to track how microglia and neutrophils respond to light-induced retinal injury from soon after injury to 2 months post-injury. The in vivo imaging finding was subsequently verified by an ex vivo study. The results suggest that despite the highly active microglia at the injury site, neutrophils were not recruited in response to acute light-induced retinal injury.

      Strengths:

      An extremely thorough examination of the cellular-level immune activity at the injury site. In vivo imaging observations being verified using ex vivo techniques is a strong plus.

      We appreciate this recognition and hope that the reviewer considers the weaknesses below in the context of the paper’s identified strengths.

      Weaknesses:

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized.

      We agree and have taken the following steps to address this:

      (1) Paper has been shortened overall by 8%

      (2) We reorganized the following sections:

      a. Introduction: shortened

      b. Methods: merged section “Ex vivo confocal image processing” with “Ex vivo confocal imaging”.

      c. Results: most sections shortened, others simplified for concision

      d. Discussion: most sections shortened, removed “Microglial/neutrophil discrimination using label-free phase contrast”

      e. Figure references reorganized in order of their appearance.

      Study weakness: though the finding prompts more questions and future studies, the findings discussed in this paper are potentially important for us to understand how the immune cells respond differently to different severity levels of injury.

      On the heels of this burgeoning technology, we consider this report among the first studies of its kind. We are hopeful that it forms the foundation of many further investigations to come. We expect a rich parameter space to be explored with future studies including investigation of other time points, other injuries of varying degree and other immune cell populations (along with their interactions with each other). Each has the potential to reveal the complexities of the ocular immune system in action.

      Reviewer #3 (Public review):

      Summary:

      This work investigated the immune response in the murine retina after focal laser lesions. These lesions are made with close to 2 orders of magnitude lower laser power than the more prevalent choroidal neovascularization model of laser ablation. Histology and OCT together show that the laser insult is localized to the photoreceptors and spares the inner retina, the vasculature, and the pigment epithelium. As early as 1-day after injury, a loss of cell bodies in the outer nuclear layer is observed. This is accompanied by strong microglial proliferation at the site of injury in the outer retina where microglia do not typically reside. The injury did not seem to result in the extravasation of neutrophils from the capillary network constituting one of the main findings of the paper. The demonstrated paradigm of studying the immune response and potentially retinal remodeling in the future in vivo is valuable and would appeal to a broad audience in visual neuroscience. However, there are some issues with the conclusions drawn from the data and analysis that can be addressed to further bolster the manuscript.

      Strengths:

      Adaptive optics imaging of the murine retina is cutting edge and enables non-destructive visualization of fluorescently labeled cells in the milieu of retinal injury. As may be obvious, this in vivo approach is beneficial for studying fast and dynamic immune processes on a local time scale - minutes and hours, and also for the longer days-to-months follow-up of retinal remodeling as demonstrated in the article. In certain cases, the in vivo findings are corroborated with histology.

      Thank you!

      The analysis is sound and accompanied by stunning video and static imagery. A few different sets of mouse models are used: (a) two different mouse lines, each with a fluorescent tag for neutrophils and microglia, and (b) two different models of inflammation, endotoxin-induced uveitis (EIU) and laser ablation, to study differences in the immune interaction.

      Thank you!

      One of the major advances in this article is the development of the laser ablation model for 'mild' retinal damage as an alternative to the more severe neovascularization models. While not directly shown in the article, this model would potentially allow for controlling the size, depth, and severity of the laser injury opening interesting avenues for future study.

      We agree that there is an established community that is invested in developing titrated dosimetry for light damage models. As the reviewer recognizes, this parameter space is exceptionally large; therefore, we controlled this parameter by choosing a single wavelength that is commonly used in ophthalmoscopy (488 nm) and a fixed duration and exposure regime that created reproducible, mild damage to photoreceptors. At this titration we created a mild lesion that spares the retina above and below.

      Weaknesses:

      (1) It is unclear based on the current data/study to what extent the mild laser damage phenotype is generalizable to disease phenotypes. The outer nuclear cell loss of 28% and a complete recovery in 2 months would seem quite mild, thus the generalizability in terms of immune-mediated response in the face of retinal remodeling is not certain, specifically whether the key finding regarding the lack of neutrophil recruitment will be maintained with a stronger laser ablation.

      It seems the concern here is whether our finding is generalizable to other damage regimes, especially more severe ones. While speculative, we would suspect that it is not generalizable across different lesions of greater severity. For example, puncturing Bruch’s membrane is an example of a more severe phenotype that is often encountered in laser damage. However, this creates a complicated model that not only induces inflammation, but also compromises BRB integrity and promotes CNV. The parameter space to be tested in the reviewer’s question is quite vast, and we have therefore tried to summarize the generalizability within our manuscript in

      P31 L586-588 “There are limitations on how generalizable this mild damage to more severe damage or disease phenotypes, but this acute damage model can begin to provide clues about how immune cells interact in response to PR loss. In this laser lesion model, we ablate 27% of the PRs in a 50 µm region.”

      (2) Mice numbers and associated statistics are insufficient to draw strong conclusions in the paper on the activity of neutrophils, some examples are below:

      a) 2 catchup mice and 2 positive control EAU mice are used to draw inferences about immune-mediated activity in response to injury. If the goal was to show 'feasibility' of imaging these mouse models for the purposes of tracking specific cell type behavior, the case is sufficiently made and already published by the authors earlier. It is possible that a larger sample size would alter the conclusion.

      We would like to highlight that the total number of mice studied in this report was 28 (18 in-vivo imaging, 10 ex-vivo histology, >40 lesions total). While power analysis is challenging as these are the first studies of their kind, we underscore that in vivo imaging allows those same mice to be studied multiple times longitudinally. This is not possible with traditional histology. Therefore, in vivo imaging not only reveals the temporal progression (unlike histology), but also increases the number of observations beyond a simple count of the “number of mice”.

      The goal of the study was not one of feasibility. The goal was to address a specific question in ocular biology: “do resident CX3CR1 cells recruit neutrophils in early, regional retinal injury”

      The low numbers that the reviewer points to are not the primary data of the paper but rather supportive control data. Moreover, we refocus attention on the fact that our study was performed on 28 mice across multiple modalities, each of which corroborates a common finding that is central to the paper: neutrophils do not appear to be recruited despite a strong microglial response.

      b) There are only 2 examples of extravasated neutrophils in the entire article, shown in the positive control EAU model. With the rare extravasation events of these cells and their high-speed motility, the chance of observing their exit from the vasculature is likely low overall, therefore the general conclusions made about their recruitment or lack thereof are not justified by these limited examples shown.

      The spirit of the challenge raised is that seeing nothing is not proof that nothing occurred. Said more commonly, “absence of evidence is not evidence of absence”- a quote often attributed to Carl Sagan. Yet we push back on this conjecture, as we have shown, not only with cutting-edge in vivo imaging but also with ample histological controls and multiple transgenic animals (and corroborating IHC antibodies), that in none of these imaging modalities, at none of the time points we evaluated, did neutrophils aggregate or extravasate in response to photoreceptor ablation.

      Reviewer adds: “the chance of observing their exit from the vasculature is likely low overall…”

      This is the reason that we specifically chose a focal lesion model to increase any possible chance of imaging a rare event. The focal lesion provides both a time and a location for “where” to look. Small 50 micrometer lesions were sufficient to drive a strong local microglial response (figures 5,6,9). This was evidence that local inflammatory cues were present. Yet despite this activation, neutrophils were not recruited to this location. We emphasize that this is a strength of our approach over other pan-retinal damage models that may indeed miss the rare extravasation events that are geographically sparse and happen over hours.

      c) In Figure 3, the 3-day time point post laser injury shows an 18% reduction in the density of ONL nuclei (p-value of 0.17 compared to baseline). In the case of neutrophils, it is noted that "Control locations (n = 2 mice, 4 z-stacks) had 15 ± 8 neutrophils per sq.mm of retina whereas lesioned locations (n = 2 mice, 4 z-stacks) had 23 ± 5 neutrophils per sq.mm of retina (Figure 10b). The difference between control and lesioned groups was not statistically significant (p = 0.19)." These data both come from histology. While the p-values - 0.17 and 0.19 - are similar, in the first case a reduction in ONL cell density is concluded while in the latter, no difference in neutrophil density is inferred in the lesioned case compared to control. Why is there a difference in the interpretation where the same statistical test and methodology are used in both cases? Besides this statistical nuance, is there an alternate possibility that there is an increased, albeit statistically insignificant, concentration of circulating neutrophils in the lesioned model? The increase is nearly 50% (15 ± 8 vs. 23 ± 5 neutrophils per sq.mm) and the reader may wonder if a larger animal number might skew the statistic towards significance.

      The statistics and p-values depend on the analysis strategy. As described in the methods, we used a predetermined 50 micron cylinder for our counting analysis, chosen to roughly approximate the average lesion size. However, recall that the damage is created along a single axis (a line projected on the retina); it is therefore possible that the analysis region is too generous to capture the exceptionally local damage.

      While the reviewer is focused on the nuance of statistics, we would like to refocus the conversation on our data showing that very few neutrophils were observed at all (105 cells from 8 locations, p-value reported). Missed in the above critique is that all neutrophils were contained within capillaries (Fig 10). We found no examples of extravasated neutrophils. This is the major finding and is supported by our in vivo as well as ex vivo confirmation.
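
      The reviewer's question in (c), whether a larger sample might move the neutrophil comparison towards significance, can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only and is not the analysis used in the manuscript. It assumes that the "±" values quoted in the review are standard deviations, that each of the 4 z-stacks per group counts as one independent observation, and that a normal approximation is acceptable. None of these assumptions is stated in the source, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def relative_increase(control_mean, lesion_mean):
    """Fractional change of the lesioned mean relative to the control mean."""
    return (lesion_mean - control_mean) / control_mean


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a, var_b = sd_a ** 2 / n_a, sd_b ** 2 / n_b
    t = (mean_b - mean_a) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


def n_per_group(delta, sd_a, sd_b, alpha=0.05, power=0.80):
    """Observations per group needed to detect a mean difference `delta`
    with a two-sided test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil((z_alpha + z_power) ** 2 * (sd_a ** 2 + sd_b ** 2) / delta ** 2)


# Values quoted in the review: 15 ± 8 (control) vs 23 ± 5 (lesioned), 4 z-stacks each.
# ASSUMPTION: "±" is a standard deviation and each z-stack is one observation.
increase = relative_increase(15, 23)
t_stat, dof = welch_t(15, 8, 4, 23, 5, 4)
needed = n_per_group(delta=23 - 15, sd_a=8, sd_b=5)

print(f"relative increase: {increase:.0%}")
print(f"Welch t = {t_stat:.2f} on {dof:.1f} degrees of freedom")
print(f"observations per group for 80% power: {needed}")
```

      Under these assumptions, a difference of this size would need about 11 observations per group to be detected with 80% power, so four z-stacks per group would be underpowered for it. The normal approximation tends to understate the requirement at small sample sizes, and the calculation says nothing about where the counted neutrophils were located, which is the authors' central point.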

      (2) The conclusions on the relative activity of neutrophils and microglia come from separate animals. The reader may wonder why simultaneous imaging of microglia and neutrophils is not shown in either the EAU mice or the fluorescently labeled catchup mice where the non-labeled cell type could possibly be imaged with phase-contrast as has been shown by the authors previously. One might suspect that the microglia dynamics are not substantially altered in these mice compared to the CX3CR1-GFP mice subjected to laser lesions, but for future applicability of this paradigm of in vivo imaging assessment of the laser damage model, including documenting the repeatability of the laser damage model and the immune cell behavior, acquiring these data in the same animals would be critical.

      A double fluorescent mouse (neutrophils and microglia) is a logical next step of this research. In fact, we have now crossed these transgenic mice and are studying this double-labeled mouse in a second manuscript in preparation. However, for this study, it was imperative that the fluorescent imaging light was kept at low levels so as not to contribute to or alter the lesion phenotype and accompanying immune response. Imaging two fluorescent channels to simultaneously view neutrophils and microglia in the same animal would have required at least 2X the visible light exposure for imaging. The imaging light levels used in the current study were carefully examined in our previous publications so as not to create additional light damage (Joseph et al 2021).

      (3) Along the same lines as above, the phase contrast ONL images at time points from 3-day to 2-month post laser injury are not shown and the absence of this data is not addressed. These missing data pertain only to the in vivo imaging mouse model; the same time points are covered by histology, which adequately conveys the time-course of cell loss in the ONL.

      The ocular preparation used for the phase contrast data in figure 2 unfortunately developed an anesthesia-induced cataract that precluded adequate image quality. This is not uncommon in long-term mouse ocular imaging preparations (Feng et al 2023). Instead, we chose to include the phase-contrast data for baseline and 1 day, which show the visually compelling contrast between intact and disrupted ONL and demonstrate that the damage is not only focal but also clearly disrupts the somatic layers of the photoreceptors.

      It is suggested that the reason be elaborated for the exclusion of this data and the simultaneous imaging of microglia and neutrophils mentioned above.

      We agree and we have included the reason for the “not acquired” data within the figure 2 legend:

      “Phase contrast data was not acquired for time points 3 days-2 months due to development of cataract which obscured the phase contrast signal”

      Also, it would be valuable to further qualify and check the claims in the Discussion that "ex vivo analysis confirms in vivo findings" and "Microglial/neutrophil discrimination using label-free phase contrast"

      We maintain that ex vivo analysis corroborates and, in many cases, confirms our in vivo findings. We feel this is a strength of our manuscript rather than a qualifier. A) Damage localization is visible with OCT and confocal/phase contrast AOSLO in a region that matches the DAPI loss we see ex vivo. B) Disruption of the ONL seen with in vivo AOSLO is of the same size, shape and location as the ONL damage quantified ex vivo. C) No damage or disruption was seen in locations above the lesion with OCT or AOSLO, which matches our finding that only the ONL shows loss of nuclei whereas other more superficial layers are spared. D) Microglial localization is found both in vivo and ex vivo. E) Neutrophil aggregation or extravasation was seen neither in vivo nor ex vivo. Given the evidence above, we contend that this synergistic and complementary approach corroborates the experimental data with two ways of studying this tissue.

      We agree that the claims made in the section entitled “Microglial/neutrophil discrimination using label-free phase contrast” are not strongly supported by the phase-contrast imaging presented in this paper. Accordingly, we have since removed this section based on reviewer suggestion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Based on the title and abstract, the main focus of the manuscript appears to be the immune response. However, most of the manuscript is dedicated to the authors' imaging technique. Additionally, several important concerns regarding the investigation of the immune response in the retina need to be addressed.

      We understand that the emphasis may appear to be on the imaging technique. However, because AOSLO is not a widely used technology, we are committed to explaining the technique so that it builds both awareness of and confidence in the way this exciting new data is acquired.

      (2) The authors indicate '1 day post-injury' as a timeframe spanning between 18 and 28 hours post-injury. This is a rather wide window of time, which could potentially affect the analysis. It is necessary to demonstrate that there is no significant difference in the immune response, particularly in terms of microglial morphology and branch orientation, between 18 and 28 hours post-injury.

      We agree that a finer time scale may offer even greater insight into the natural history of the inflammatory response. However, we feel that our chosen time points go above and beyond the temporal precision that is offered by other investigations, especially considering the novel multi-modal imaging performed here. Studies using finer temporal sampling are poised for future investigation.

      (3) The authors should consider using additional markers or complementary techniques to differentiate between microglia and recruited macrophages, such as incorporating immunohistochemistry with P2RY12, a specific marker for microglia that helps distinguish them from macrophages, and CD68 or F4/80, markers for recruited macrophages. It is also crucial for the authors to include a discussion addressing the limitations of using Cx3cr1GFP mice and the potential impact on result interpretation. It is fundamental to validate the findings and clarify the roles of microglia and macrophages.

      The wonder of current IHC is that there are myriad antibodies and labels that “could” be used. We used what we felt were the most compelling for this stage of early investigation. We look forward to studies that employ this wider range of labels. See our response to reviewer 1’s first comment above for addressing the limitations of using Cx3CR1 mice.

      (4) Analyzing neutrophil responses at 24 hours post-injury may be too late to capture the critical early dynamics of inflammation. By this time, the initial recruitment and activation phases of neutrophils may have already peaked or begun to resolve, potentially missing key insights into the immediate immune response. The authors should conduct additional analysis of neutrophil responses at earlier time points post-injury, such as 6 or 12 hours. Including these time points would provide a more comprehensive and conclusive analysis of the neutrophil response, helping to delineate the progression of inflammation and its implications for subsequent healing processes.

      This point has been addressed above. Briefly, we have now included a new experiment (and figure + video) that shows no neutrophil extravasation at earlier time points. We thank the reviewer for this helpful suggestion.

      Reviewer #2 (Recommendations for the authors):

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized.

      (1) There was a lengthy description and verification of light-induced injury and longitudinal tracking of healing, which I believe can be further cleaned up and made more succinct.

      We have cleaned up and reorganized the manuscript (see above response for details), reducing its length by 8%.

      (2) The intention/goal of the paper can be further strengthened. On page 33: "to what extent do neutrophils respond to acute neural loss in the retina?" This particular statement is so clear and really brings out the purpose of this study, and it will be great to see something like this in the opening statement.

      We thank the reviewer for this excellent suggestion. We have modified the final paragraph of the introduction to strengthen our study’s intention.

      P4 L45-47: Here, we ask the question: “To what extent do microglia/neutrophils respond to acute neural loss in the retina?” To begin unraveling the complexities in this response, we deploy a deep retinal laser ablation model.

      (3) The figures are not mentioned in the manuscript in the order they were numbered. It makes it extremely challenging to follow along. The methods/results sections started with Figure 1, then on to Figure 4, then back to Figures 2 and 3, etc. This reviewer recommends re-organizing figures and their order of appearance so the contents of the figures are referred to in the paragraph in the most efficient and clear manner.

      We have re-organized the appearance of figure references throughout the paper.

      (4) Figure 2: phase contrast was not acquired on days 3, 7, and 2 months. Please briefly explain the reason in the caption.

      Addressed above.

      (5) Figure 4 OPL layer, the area highlighted in a dashed circle was meant to demonstrate that perfusion was intact, but I cannot see the flow in the highlighted area very well at day 7 and 2 months (especially 2 months). Please explain.

      Perfusion maps are often difficult to interpret as a static image. Therefore, we have additionally provided the raw video data (“OPL_vasculature_7d” and “OPL_vasculature_2mo”) which helps visualize active perfusion. To the reviewer’s point, videos reveal that RBC motion is maintained in the capillaries of this location.

      (6) While there's a thorough discussion of the biological impact of the finding, the uniqueness of the imaging technique can be better highlighted. Immune response toward injury is highly dynamic and is often the first step of wound healing. To observe such dynamic events longitudinally in the living eye at the cellular level, it requires a special imaging technique such as the type addressed here. The author can better address the technical uniqueness of studying this type of biological event for readers less familiar with AOSLO.

      We agree and following the reviewer’s suggestion have further emphasized the advance in the current manuscript in two additional places:

      (1) Within the introduction

      P3-4 L21-42: “A missed window of interaction is highly problematic in histological study where a single time point reveals a snapshot of the temporally complex immune response, which changes dynamically over time. Here, we use in vivo imaging to overcome these constraints.

      Documenting immune cell interactions in the retina over time has been challenged by insufficient resolution and contrast to visualize single cells in the living eye. The microscopic size of immune cells requires exceptional resolution for detection. Recently, advances in AOSLO imaging have provided micron-level resolution and enhanced contrast for imaging individual immune cells in the retina and without requiring extrinsic dyes(7,23). AOSLO provides multi-modal information from confocal reflectance, phase-contrast and fluorescence modalities, which can reveal a variety of cell types simultaneously in the living eye. Here, we used confocal AOSLO to track changes in reflectance at cellular scale. Phase-contrast AOSLO provides detail on highly translucent retinal structures such as vascular wall, single blood cells(27–29), PR somata(30), and is well-suited to image resident and systemic immune cells.(7,23) Fluorescence AOSLO provides the ability to study fluorescently-labeled cells(25,31,32) and exogenous dyes(27,33) throughout the living retina. These modalities used in combination have recently provided detailed images of the retinal response to a model of human uveitis.(23,34) Together, these innovations now provide a platform to visualize, for the first time, the dynamic interplay between many immune cell types, each with a unique role in tissue inflammation.”

      (2) Within the discussion

      P34-35 L656-662 “Beyond the context of this specific finding, we share this work with the excitement that AOSLO cellular level imaging may reveal the interaction of multiple immune cell types in the living retina. By using fluorophores associated with specific immune cell populations, the complex dynamics that orchestrate the immune response may be examined in this specialized tissue. This work and future studies may reveal further insights to the interactions of single immune cells in the living body in a non-invasive way.”

      Reviewer #3 (Recommendations for the authors):

      Some other comments:

      (1) The reader may wonder why if all findings are confirmed by histology would an in vivo imaging model be needed. This does not need a generalized explanation given the typical virtues of an in vivo model, but perhaps the authors may want to amplify their findings in the current context, for example, those on the shorter minutes to hours timescales (Figure 2, Supplement 1) that would have been resource and time intensive, and likely impossible, to gather via histology alone.

      The reviewer appropriately underscores the utility of in vivo imaging above histology-only investigation. In response, we have added text in the introduction to emphasize the nuanced but important value of both longitudinal imaging and dynamic imaging, which is not possible with conventional histology (e.g. blood perfusion status, immune cell interactions etc.).

      P3-4 L21-42 (these points also addressed in response to reviewer #2 above)

      (2) A few questions and comments on the laser ablation model

      - It is alluded to in the Discussion in Lines 519-521 that the procedure is highly reproducible (95%) but the associated data for this repeatability metric is not shown.

      We agree that the criterion for determining a “successful lesion” requires further elaboration. Therefore, we have now included the criteria for successful lesions in the methods as well as discussion (in bullet below):

      Methods:

      P9-10 L129-133: “This protocol produced a hyper-reflective phenotype in the >40 locations across 28 mice. In rare cases, the exposure yielded no hyper-reflective lesion and were often in mice with high retinal motion, where the light dosage was spread over a larger retinal area. These locations were not included in the in-vivo or histological analysis.”

      - The methods state that a 24 x 1-micron line is focused on the retina, but all lesions seem to appear elliptical where the major to minor axis ratio is a lot smaller than this intended size. One wonders what leads to this discrepancy.

      We expect that this observation is related to the response above. We have added the following:

      Discussion:

      P27 L497-505: “The damage took on an elliptical form, likely due to: 1) Eye motion from respiration and heart rate which spreads the light over a larger integrative area (rather than line). 2) The impact of focal light scatter. 3) A micron-thin line imparting damage on cells that are many microns across manifesting as an ellipse. The majority of light exposures produced lesions of this elliptical shape. In a few conditions, for the reasons described above, the exposure failed to produce a strong, focal damage phenotype. To improve lesion reproducibility, future experiments should control for subtle eye motion affecting light damage, especially for long exposures.”

      (3) Lastly, a thickening is noted in the ONL after laser injury that seems to cause a thinning of the INL as well (Figure 3) which may increase the apparent INL nuclei density.

      The reviewer’s careful eye finds local swelling after injury. However, despite swelling, the segregation between INL and ONL was maintained in all days we examined. Thus, no ONL cells were included in INL counts (see figure 3A & 3D).

      Also, the ONL - inner (panel B) seems to show a little reduction in cell density in the same elliptical shape as the outer ONL in panel C.

      We agree with this observation, and it was one of the reasons we included this detailed analysis of both the inner and outer half of the ONL. Our finding is that there is more prominent loss of nuclei in the outer half of the ONL. While the mechanism for this is not understood, we felt it was an important finding to include, and it further shows the axial specificity of the light damage we are inducing (especially at the day 1 observation).

      Lastly, the reduction in nuclear density is visually obvious in the ONL at the 1 and 3-day time points but the p-statistic does not seem to convey this. One may consider performing the analysis on panel F on a smaller region surrounding the lesion to more reliably reveal these effects.

      Related to the response above, the ONL shows a persistence of nuclei in the upper (inner) half of that layer, whereas the outer half shows a visible reduction. Therefore, we expect that the reviewer is correct that a statistical analysis that considers just the outer half of the ONL would likely show strong statistical significance. The challenge, however, is that our analysis strategy counted all cells within a 50 micron diameter cylinder through the entirety of the ONL (meaning strong loss in the outer half was attenuated by weak loss in the inner half). A more detailed sub-layer analysis is difficult because the notable retinal remodeling over days to weeks makes it hard to use layers within the ONL as reliable landmarks for the requested analysis.
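
      The attenuation described above is simple arithmetic: when nuclei are counted through the full ONL, the measured loss is the baseline-weighted average of the losses in the two halves. The sketch below illustrates this with made-up numbers. The baseline counts and per-half loss fractions are hypothetical and are not taken from the manuscript.

```python
def pooled_loss(baseline_inner, baseline_outer, loss_inner, loss_outer):
    """Fractional nuclear loss measured over the whole layer, given the
    baseline counts and fractional losses of its inner and outer halves."""
    lost = baseline_inner * loss_inner + baseline_outer * loss_outer
    return lost / (baseline_inner + baseline_outer)


# HYPOTHETICAL example: equal baseline counts in each half,
# 5% loss in the inner half and 45% loss in the outer half.
whole_layer = pooled_loss(100, 100, loss_inner=0.05, loss_outer=0.45)
print(f"whole-layer loss: {whole_layer:.0%}")
```

      In this made-up case a 45% loss confined to the outer half appears as a 25% loss over the whole layer, which is the sense in which a full-thickness count can understate a sub-layer effect.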

      (4) In Figure 6, the NIR confocal image and fluorescent microglia seem to share the same shape, starting from the OPL and posterior to it. This is particularly evident in the 3 and 7-day time points in the ONL and ONL/IS images. This departs from lines 567-577 where the claim is made that the hyperreflective phenotype in NIR images does not emerge from the microglia and neutrophils. This discrepancy should be clarified. It may be so that the hyperreflective phenotype as observed by Figure 2 at shorter timescales is not related to the microglia but the locus of hyper-reflections changes at longer time scales to involve the microglia as well as in Figure 6. One potential clue/speculation of the common shapes/size in confocal hyper-reflectance and fluorescent microglia of Figure 6 comes from Figure 9 where the microglia seem to engulf the photoreceptor phagosomes in the DAPI stains. It is possible that the hyper-reflections arise from the phagosomes but their co-localization with microglia seems to demonstrate a shared size/shape. As an addendum to the first point, such correlations are a power of the in vivo model and impossible to achieve in histology.

      The reviewer shows a deep understanding of our data. We agree with many of the points, but for the purpose of the paper many of the above offerings are speculative and we have chosen not to elaborate on these points as it is not definitive from the data. Instead, we direct the reader to an important finding that within hours, the hyper-reflective phenotype is seen in both OCT and AOSLO, whereas microglial somas/processes have not yet migrated into the hyper-reflective region. We have now emphasized this point in the discussion section:

      P29-30 L543-552: “A common speculation is that the increased backscatter may arise from local inflammatory cells that activate or move into the damage location. In our data, confocal AOSLO and OCT revealed a hyperreflective band at the OPL and ONL after 488 nm light exposure (Figure 2a, b). We found that the hyperreflective bands appeared within 30 minutes after the laser injury, preceding any detectable microglial migration toward the damage location (Figure 2 – figure supplement 1 and Figure 6 – figure supplement 1). We thus conclude that the initial hyperreflective phenotype is not caused by microglial cell activity or aggregation.”

    1. eLife Assessment

      This important work presents a self-supervised method for the segmentation of 3D cells in fluorescent microscopy images, conveniently packaged as a Napari plugin and tested on an annotated dataset. The segmentation method is solid and compares favorably to other learning-based methods and Otsu thresholding on four datasets, offering the possibility of eliminating time-consuming data labeling to speed up quantitative analysis. This work will be of interest to a wide variety of laboratories analysing fluorescently labeled images.

    2. Reviewer #1 (Public review):

      The manuscript now compares the WNet3D quantitatively against other methods on all four datasets:

      Figure 1b shows results on the mouse cortex dataset, comparing StarDist, CellPose, SegResNet, SwinUNetR against self-supervised (or learning-free methods) WNet3D and Otsu thresholding.

      Figure 2b shows results on an unnamed dataset (presumably the mouse cortex dataset), comparing StarDist, CellPose, SegResNet, SwinUNetR with different levels of training data against WNet3D.

      Figure 3 shows results on three datasets (Platynereis-ISH-Nuclei-CBG, Platynereis-Nuclei-CBG, and Mouse-Skull-Nuclei-CBG), comparing StarDist, CellPose against WNet3D and Otsu thresholding.

      It is unclear whether the Otsu thresholding baseline was given the same post-processing as the WNet3D. Figure 1b shows two versions for WNet3D ("WNet3D - No artifacts" and "WNet3D"), but only one for Otsu thresholding. Given that post-processing (or artifact removal) seems to have a substantial impact on accuracy, the authors should clarify whether the Otsu thresholding results were treated in the same way and if Otsu thresholding was not post-processed. Figure 2a would also benefit from including the thresholding results (with and without artifact removal).
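
      To illustrate the comparison being requested, the sketch below applies Otsu thresholding to a small 3D volume and then removes objects outside a size range, which is how Reviewer #2 describes the artifact-removal step. It is a minimal pure-Python illustration and is not the CellSeg3D implementation. The function names, the use of 6-connectivity, the integer intensity range and the size bounds are all assumptions made for the example.

```python
from collections import deque

# 6-connected neighbourhood in (z, y, x); an assumption for this sketch.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def otsu_threshold(values, levels=256):
    """Threshold maximizing between-class variance for integer intensities
    in [0, levels). Voxels strictly above the result are foreground."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        if w0 == 0:
            continue
        if w0 == total:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / (total - w0)
        var = w0 * (total - w0) * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def label_components(mask):
    """Group a set of (z, y, x) foreground voxels into connected components."""
    remaining = set(mask)
    components = []
    while remaining:
        seed = remaining.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBORS:
                n = (z + dz, y + dy, x + dx)
                if n in remaining:
                    remaining.remove(n)
                    comp.add(n)
                    queue.append(n)
        components.append(comp)
    return components


def segment(volume, min_size, max_size):
    """Otsu threshold, then keep only objects with min_size <= voxels <= max_size."""
    voxels = {(z, y, x): v
              for z, plane in enumerate(volume)
              for y, row in enumerate(plane)
              for x, v in enumerate(row)}
    t = otsu_threshold(list(voxels.values()))
    objects = label_components({p for p, v in voxels.items() if v > t})
    kept = [c for c in objects if min_size <= len(c) <= max_size]
    return objects, kept


# Toy volume: dim background, one 2x2x2 bright "cell", one isolated bright voxel.
volume = [[[10] * 5 for _ in range(5)] for _ in range(2)]
for z in range(2):
    for y in range(2):
        for x in range(2):
            volume[z][y][x] = 200
volume[0][4][4] = 200

all_objects, kept = segment(volume, min_size=2, max_size=20)  # hypothetical bounds
print(len(all_objects), "objects before filtering,", len(kept), "after")
```

      Reporting scores for both `all_objects` and `kept` would give the "Otsu thresholding" and "Otsu thresholding - No artifacts" pair that mirrors the two WNet3D entries in Figure 1b.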

    3. Reviewer #2 (Public review):

      The authors have now addressed the most important points, and they include more comprehensive evaluation of their method and comparisons to other approaches for multiple datasets.

      Some points would benefit from clarification:

      - Figure 1B now compares "Otsu thresholding", "WNet 3D - No artifacts" and "WNet 3d". Why don't you also report the score for "Otsu thresholding - No Artifacts"? To my understanding this is a post-processing operation to remove small and very large objects, so it could easily be applied to the Otsu thresholding. Given the good results for Otsu thresholding alone (quite close F1-score to WNet 3d), it seems like DL might not really be necessary at all for this dataset and including "Otsu thresholding - No artifacts" would enable evaluating this point.

      - CellPose and StarDist perform poorly in all the experiments performed by the authors. In almost all cases they underperform Otsu thresholding, which is in most cases on par with the WNet results (except for "Mouse Skull Nuclei CBG"). This is surprising and contradicts the collective expertise of the community: good CellPose and StarDist models can be trained for the 3D instance segmentation tasks studied here. Perhaps these methods were not trained in an optimal way. It seems unlikely that much better CellPose or StarDist models cannot be obtained for these tasks (current versions are on par or much worse than Otsu!), as I have applied both of these models successfully in similar settings. Specifically, it seems unlikely that the developers of CellPose or StarDist would obtain similarly poor scores on the same data (note I am not one of the developers).

      The current experiments still highlight an interesting aspect: the problem of training / fine-tuning these methods correctly on new data and the technical challenges associated with this. But the reported results should by no means be taken as a fair assessment of the capabilities of StarDist or CellPose.

      Please note that I did not have time to test the Napari plugin again, so I did not evaluate whether it improved in usability.

    4. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This work presents a valuable self-supervised method for the segmentation of 3D cells in microscopy images, alongside an implementation as a Napari plugin and an annotated dataset. While the Napari plugin is readily applicable and promises to eliminate time consuming data labeling to speed up quantitative analysis, there is incomplete evidence to support the claim that the segmentation method generalizes to other light-sheet microscopy image datasets beyond the two specific ones used here.

      Technical Note: We showed the utility of CellSeg3D in the first submission and in our revision on 5 distinct datasets; 4 of which we showed F1-Score performance on. We do not know which “two datasets” are referenced. We also already showed this is not limited to LSM, but was used on confocal images; we already limited our scope and changed the title in the last rebuttal, but just so it’s clear, we also benchmark on two non-LSM datasets.

      In this revision, we have now additionally extended our benchmarking of Cellpose and StarDist to all 4 benchmark datasets, where our WNet3D (our novel contribution of a self-supervised model) outperforms or matches these supervised baselines. Moreover, we perform rigorous testing of our model’s generalization by training on one dataset and testing generalization to the other 3; we believe this is on par with (or beyond) what most cell segmentation papers do, thus we hope that “incomplete” can now be updated.

      Public Reviews:

      Reviewer #1 (Public review):

      This work presents a self-supervised method for the segmentation of 3D cells in microscopy images, an annotated dataset, as well as a napari plugin. While the napari plugin is potentially useful, there is insufficient evidence in the manuscript to support the claim that the proposed method is able to segment cells in other light-sheet microscopy image datasets than the two specific ones used here.

      Thank you again for your time. We already benchmarked the performance of WNet3D (our 3D SSL contribution) on four datasets; thus, we do not know which two you refer to. Moreover, we have now additionally benchmarked Cellpose and StarDist on all four so readers can see that on all datasets, WNet3D outperforms or matches these supervised methods.

      I acknowledge that the revision is now more upfront about the scope of this work. However, my main point still stands: even with the slight modifications to the title, this paper suggests to present a general method for self-supervised 3D cell segmentation in light-sheet microscopy data. This claim is simply not backed up.

      We respectfully disagree; we benchmark on four 3D datasets: three curated by others and used in leading ML conference proceedings, and one that we provide that is a new ground truth 3D dataset - the first of its kind - on mesoSPIM-acquired brain data. We believe benchmarking on four datasets is on par with (or beyond) current best practices in the field. For example, Cellpose curated one dataset and tested on held-out test data on this one dataset (https://www.nature.com/articles/s41592-020-01018-x) and benchmarked against StarDist and Mask R-CNN (two models). StarDist (Star-convex Polyhedra for 3D Object Detection and Segmentation in Microscopy) benchmarked on two datasets and against two models, IFT-Watershed and 3D U-Net. Thus, we feel our benchmarking on more models and more datasets is sufficient to claim our model and associated code are of interest to readers and to support our claims (for comparison, Cellpose’s title is “Cellpose: a generalist algorithm for cellular segmentation”, which is much broader than our claim).

      I still think the authors should spell out the assumptions that underlie their method early on (cells need to be well separated and clearly distinguishable from background). A subordinate clause like "often in cleared neural tissue" does not serve this purpose. First, it implies that the method is also suitable for non-cleared tissue (which would have to be shown). Second, this statement does not convey the crucial assumptions of well separated cells and clear foreground/background differences that the method is presumably relying on.

      We have now expanded the manuscript quite significantly. To be clear, we did show our method works on non-cleared tissue; the Mouse Skull, 3D platynereis-Nuclei, and 3D platynereis-ISH-Nuclei datasets are not cleared tissue, and were not all acquired with LSM, but rather with confocal microscopy. We attempted to make that more clear in the main text.

      Additionally, we do not believe the cells need to be well separated or the background perfectly clean. We removed statements like "often in cleared neural tissue", expanded the benchmarking, and added a new demo figure for the readers to judge. As in the last rebuttal, we provide video evidence (https://www.youtube.com/watch?v=U2a9IbiO7nE) of WNet3D working on the Mouse Skull dataset, which is densely packed and hard for a human to segment, and linked this directly in the figure caption.

      We have re-written the main manuscript in an attempt to clarify the limitations, including a dedicated “limitations” section. Thank you for the suggestion.

      It does appear that the proposed method works very well on the two investigated datasets, compared to other pre-trained or fine-tuned models. However, it still remains unclear whether this is because of the proposed method or the properties of those specific datasets (namely: well isolated cells that are easily distinguished from the background). I disagree with the authors that a comparison to non-learning methods "is unnecessary and beyond the scope of this work". In my opinion, this is exactly what is needed to prove that CellSeg3D's performance cannot be matched with simple image processing.

      We want to stress again that we benchmarked WNet3D on four datasets, not two. We have now additionally added benchmarking with Cellpose, StarDist, and a non-deep learning method as requested (see new Figures 1 and 3).

      As I mentioned in the original review, it appears that thresholding followed by connected component analysis already produces competitive segmentations. I am confused about the authors' reply stating that "[this] is not the case, as all the other leading methods we fairly benchmark cannot solve the task without deep learning". The methods against which CellSeg3D is compared are CellPose and StarDist, both are deep-learning based methods.

      That those methods do not perform well on this dataset does not imply that a simpler method (like thresholding) would not lead to competitive results. Again, I strongly suggest the authors include a simple, non-learning based baseline method in their analysis, e.g.:

      -  comparison to thresholding (with the same post-processing as the proposed method)

      -  comparison to a normalized cut segmentation (with the same post-processing as the proposed method)

      We added a non-deep learning based approach, namely, a direct comparison to thresholding with the same post hoc approach we use to go from semantic to instance segmentation. WNet3D (and other deep learning approaches) perform favorably (see Figures 2 and 3).
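
      For readers unfamiliar with this kind of baseline, the following is a minimal, dependency-free sketch of "thresholding followed by connected component analysis" on a 3D volume. It is an illustration only, not the code used in the manuscript: the function name `threshold_and_label`, the fixed threshold, the 6-connectivity, and the toy intensities are all assumptions made here. In practice one would more likely use a library routine such as `scipy.ndimage.label`.

```python
from collections import deque


def threshold_and_label(volume, threshold):
    """Binarize a 3D volume and label its 6-connected foreground components.

    `volume` is a nested list indexed as volume[z][y][x]. Returns a label
    volume of the same shape (0 = background) and the number of components.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * width for _ in range(height)] for _ in range(depth)]
    # Face-sharing neighbours only (6-connectivity); an assumption of this sketch.
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                # Skip background voxels and voxels that already have a label.
                if volume[z][y][x] <= threshold or labels[z][y][x]:
                    continue
                count += 1
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                # Breadth-first flood fill of the current component.
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in neighbors:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                                and volume[nz][ny][nx] > threshold
                                and not labels[nz][ny][nx]):
                            labels[nz][ny][nx] = count
                            queue.append((nz, ny, nx))
    return labels, count


# Toy 2x3x3 volume (hypothetical intensities) with two separated bright blobs.
toy = [
    [[0.9, 0.8, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.7]],
    [[0.9, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.6]],
]
labels, n_objects = threshold_and_label(toy, threshold=0.5)
print(n_objects)
```

      Such a baseline separates objects only where background voxels lie between them, which is why its behavior on crowded data is the point under discussion.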

      Regarding my feedback about the napari plugin, I apologize if I was not clear. The plugin "works" as far as I tested it (i.e., it can be installed and used without errors). However, I was not able to recreate a segmentation on the provided dataset using the plugin alone (see my comments in the original review). I used the current master as available at the time of the original review and default settings in the plugin.

      We updated the plugin and code for the revision at your request to make this possible directly in the napari GUI in addition to our scripts and Jupyter Notebooks (please see main and/or `pip install --upgrade napari-cellseg3d`; the current version is 0.2.1). Of course, this means the original submission code (May 2024) will not have this in the GUI, so you would need to update in order to test it. Alternatively, you can see the demo video we now provide for ease: https://www.youtube.com/watch?v=U2a9IbiO7nE (we understand testing code takes a lot of time and commitment).

      We greatly thank the reviewer for their time, and we hope our clarifications, new benchmarking, and re-write of the paper now enable them to change their assessment from incomplete to a more favorable and reflective eLife adjective.

      Reviewer #2 (Public review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      -  The idea behind the self-supervised learning loss is interesting.

      -  It provides a new annotated dataset for an important segmentation problem.

      -  The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      -  The comparison to other methods on the provided dataset is extensive and experiments are reproducible via public notebooks.

      Weaknesses:

      The experiments presented by the authors support the core claims made in the paper. However, they do not convincingly prove that the method is applicable to segmentation problems with more complex morphologies or more crowded cells/nuclei.

      Major weaknesses:

      (1) The method only provides functionality for semantic segmentation outputs and instance segmentation is obtained by morphological post-processing. This approach is well known to be of limited use for segmentation of crowded objects with complex morphology. This is the main reason for prediction of additional channels such as in StarDist or CellPose. The experiments do not convincingly show that this limitation can be overcome as model comparisons are only done on a single dataset with well separated nuclei with simple morphology. Note that the method and dataset are still a valuable contribution with this limitation, which is somewhat addressed in the conclusion. However, I find that the presentation is still too favorable in terms of the presentation of practical applications of the method, see next points for details.

      Thank you for noting the method's strengths and core features. Regarding weaknesses, we have revised the manuscript again and added direct benchmarking now on four datasets and a fifth “worked example” (https://www.youtube.com/watch?v=3UOvvpKxEAo&t=4s) in a new Figure 4.

      We also re-wrote the paper to more thoroughly present the work (previously we adhered to the “Brief Communication” eLife format), and added an explicit note in the results about model assumptions.

      (2) The experimental set-up for the additional datasets seems to be unrealistic as hyperparameters for instance segmentation are derived from a grid search and it is unclear how a new user could find good parameters in the plugin without having access to already annotated ground-truth data or an extensive knowledge of the underlying implementations.

      We agree that of course with any self-supervised method the user will need a sense of what a good outcome looks like; that is why we provide Google Colab Notebooks (https://github.com/AdaptiveMotorControlLab/CellSeg3D/tree/main/notebooks) and the napari-plugin GUI for extensive visualization and even the ability to manually correct small subsets of the data and refine the WNet3D model.

      We attempted to make this more clear with a new Figure 2 and by adding functionality directly into the plugin (such as the grid search). But we believe this “trade-off” for SSL approaches over very labor-intensive 3D labeling is often worth it; annotators are also biased, so extensive checking of any GT data is equally required.

      We also added the “grid search” functionality in the GUI (please `pip install --upgrade napari-cellseg3d`; the latest v0.2.1) to supplement the previously shared Notebook (https://github.com/C-Achard/cellseg3d-figures/blob/main/thresholds_opti/find_best_thresholds.ipynb) and added a new YouTube video: https://www.youtube.com/watch?v=xYbYqL1KDYE.
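
      To make the grid search under discussion concrete, here is a minimal sketch of selecting a binarization threshold by maximizing F1 against a small annotated subset. It is not the plugin's or the notebook's implementation: the function names, the candidate thresholds, the toy values, and the use of a voxel-wise F1 (rather than an instance-level F1) are assumptions made for illustration.

```python
def f1_score(pred, truth):
    """Voxel-wise F1 between two equal-length binary sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def grid_search_threshold(probabilities, truth, candidates):
    """Return the candidate threshold with the highest F1, and that F1."""
    best_threshold, best_f1 = None, -1.0
    for threshold in candidates:
        pred = [p > threshold for p in probabilities]
        score = f1_score(pred, truth)
        # Strict comparison keeps the first (lowest) threshold on ties.
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold, best_f1


# Hypothetical flattened model outputs and matching annotations.
probs = [0.05, 0.20, 0.45, 0.55, 0.80, 0.95]
truth = [0, 0, 0, 1, 1, 1]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
best_t, best_score = grid_search_threshold(probs, truth, candidates)
print(best_t, best_score)
```

      Note that this procedure presupposes some annotated voxels to score against, which is the practical constraint raised in the review; without them, the threshold has to be chosen by visual inspection.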

      (3) Obtaining segmentation results of similar quality as reported in the experiments within the napari plugin was not possible for me. I tried this on the "MouseSkull" dataset that was also used for the additional results in the paper.

      Again, we are sorry this did not work for you, but we added new functionality in the GUI and made a demo video (https://www.youtube.com/watch?v=U2a9IbiO7nE); you can either update your CellSeg3D code or watch the video to see how we obtained these results.

      Here, I could not find settings in the "Utilities->Convert to instance labels" widget that yielded good segmentation quality and it is unclear to me how a new user could find good parameter settings. In more detail, I cannot use the "Voronoi-Otsu" method due to installation issues that are prohibitive for a non expert user and the "Watershed" segmentation method yields a strong oversegmentation.

      Sorry to hear of the installation issue with Voronoi-Otsu; we updated the documentation and the GUI to hopefully make this easier to install. While we do not claim this code is for beginners, we do aim to be a welcoming community, thus we provide support on GitHub, extensive docs, videos, the GUI, and Google Colab Notebooks to help users get started.

      Comments on revised version

      Many of my comments were addressed well:

      -  It is now clear that the results are reproducible as they are well documented in the provided notebooks, which are now much more prominently referenced in the text.

      Thanks!

      -  My concerns about an unfair evaluation compared to CellPose and StarDist were addressed. It is now clear that the experiments on the mesoSPIM dataset are extensive and give an adequate comparison of the methods.

      Thank you; to note we additionally added benchmarking of Cellpose and StarDist on the three additional datasets (for R1), but hopefully this serves to also increase your confidence in our approach.

      -  Several other minor points like reporting of the evaluation metric are addressed.

      I have changed my assessment of the experimental evidence to incomplete/solid and updated the review accordingly. Note that some of my main concerns with the usability of the method for segmentation tasks with more complex morphology / more crowded cells and with the napari plugin still persist. The main points are (also mentioned in Weaknesses, but here with reference to the rebuttal letter):

      - Method comparison on datasets with more complex morphology etc. are missing. I disagree that it is enough to do this on one dataset for a good method comparison.

      We benchmarked WNet3D (our contribution) on four datasets, and to aid the readers we additionally now added Cellpose and StarDist benchmarking on all four. WNet3D performs favorably, even on the crowded and complex Mouse Skull data. See the new Figure 3 as well as the associated video: https://www.youtube.com/watch?v=U2a9IbiO7nE&t=1s.

      -  The current presentation still implies that CellSeg3d **and the napari plugin** work well for a dataset with complex nucleus morphology like the Mouse Skull dataset. But I could not get this to work with the napari plugin, see next points.

      - First, deriving hyperparameters via grid search may lead to over-optimistic evaluation results. How would a user find these parameters without having access to ground-truth? Did you do any experiments on the robustness of the parameters?

      -  In my own experiments I could not do this with the plugin. I tried this again, but ran into the same problems as last time: pyClesperanto does not work for me. The solution you link requires updating openCL drivers and the accepted solution in the forum post is "switch to a different workstation".

      We apologize for the confusion here; the accepted solution (not accepted by us) was user specific, as they switched workstations and it worked, so that was their solution. Other comments actually solved the issue as well. For ease, this package can be installed on Google Colab (here is the link from our repo: https://colab.research.google.com/github/AdaptiveMotorControlLab/CellSeg3d/blob/main/notebooks/Colab_inference_demo.ipynb), where pyClesperanto can be installed without issue via `!pip install pyclesperanto-prototype`.

      This a) goes beyond the time I can invest for a review and b) is unrealistic to expect computationally inexperienced users to manage. Then I tried with the "watershed" segmentation, but this yields a strong oversegmentation no matter what I try, which is consistent with the predictions that look like a slightly denoised version of the input images and not like a proper foreground-background segmentation. With respect to the video you provide: I would like to see how a user can do this in the plugin without having prior knowledge of good parameters or just pasting code, which is again not what you would expect a computationally inexperienced user to do.

      We agree with the reviewer that the user needs domain knowledge, but we never claim our method was for inexperienced users. Our main goal was to show a new computer vision method with self-supervised learning (WNet3D) that works on LSM and confocal data for cell nuclei. To this end, we made you a demo video to show how a user can visually perform a thresholding check https://www.youtube.com/watch?v=xYbYqL1KDYE&t=5s, and we added all of these new utilities to the GUI, thanks for the suggestion. Otherwise, the threshold can also be done in a Notebook (as previously noted).

      I acknowledge that some of these points are addressed in the limitations, but the text still implies that it is possible to get good segmentation results for such segmentation problems: "we believe that our self-supervised semantic segmentation model could be applied to more challenging data as long as the above limitations are taken into account." From my point of view the evidence for this is still lacking and would need to be provided by addressing the points raised above for me to further raise the Incomplete/solid rating, especially showing how this can be done with the napari plugin. As an alternative, I would also consider raising it if the claims are further reduced and acknowledge that the current version of the method is only a good method for well separated nuclei.

      We hope our new benchmarking and clear demo on four datasets help improve your confidence in the evidence for our approach. We also refined our overall text and hope our contributions, the limitations, and the advantages are now more clear.

      I understand that this may be frustrating, but please put yourself in the role of a new reader of this work: the impression that is made is that this is a method that can solve 3D segmentation tasks in light-sheet microscopy with unsupervised learning. This would be a really big achievement! The wording in the limitation section sounds like strategic disclaimers that imply that it is still possible to do this, just that it wasn't tested enough.

      But, to the best of my assessment, the current version of the method only enables the more narrow case of well separated nuclei with a simple morphology. This is still a quite meaningful achievement, but more limited than the initial impression. So either the experimental evidence needs to be improved, including a demonstration how to achieve this in practice, including without deriving parameters via grid-search and in the plugin, or the claim needs to be meaningfully toned down.

      Thanks for raising this point; we do think that WNet3D and the associated CellSeg3D package, which aims to continue integrating state-of-the-art models, are a non-trivial step forward. Have we completely solved the problem? Certainly not, but given the limited 3D cell segmentation tools that exist, we hope this, coupled with our novel 3D dataset, pushes the field forward. We do not show that it works only on the narrow, well-separated use case, but rather that it works even better than supervised models on the very challenging Mouse Skull benchmark. Given we now show evidence that we outperform or match supervised algorithms with an unsupervised approach, we respectfully do think this is a noteworthy achievement. Thank you for your time in assessing our work.

    1. eLife Assessment

      This important work advances our understanding of the aging trajectory and heterogeneity of hippocampal microglia. The authors provide an in-depth characterization of microglia in young and old mice as well as at intermediate time points, which reveals the existence of intermediate states characterized by a distinct transcriptional signature. The experimental approach is solid, especially with the validation of scRNA-seq findings with other methods. The study should be of interest to neuroimmunologists and biologists interested in aging.

    2. Reviewer #2 (Public review):

      Summary:

      The goal of the paper was to trace the transitions hippocampal microglia undergo during aging. ScRNA-seq analysis allowed the authors to predict a trajectory and hypothesize about possible molecular checkpoints, which set the pace of microglial aging. For example, TGFb1 was predicted as a molecule slowing down the microglial aging path and, indeed, loss of TGFb1 in microglia led to premature microglial aging, which was associated with premature loss of cognitive ability. The authors also used the parabiosis model to show how peripheral, blood-derived signals from the old organism can "push" microglia forward on the aging path.

      Strengths:

      A major strength and uniqueness of this work is the in-depth single-cell dataset, which may be a useful resource for the community, as well as the data showing what happens to young microglia in heterochronic parabiosis setting and upon loss of TGFb in their environment.

      Weaknesses:

      All weaknesses were addressed during revision.

      Overall:

      In general, I think the authors did a good job following the initial observations and devised clever ways to test the emerging hypotheses. The resulting data are an important addition to what we know about microglial aging and can be fruitfully used by other researchers, e.g. those working on microglia in a disease context.

      Comments on revisions:

      All my comments were addressed.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      To gain further insight into the dynamics of microglial aging in the hippocampus, the authors used a bioinformatics method known as "pseudotime" or "trajectory inference" to understand how cells may progress through different functional states, as defined by cellular transcriptome (15,16). These bioinformatics approaches can reveal key patterns in scRNAseq / snRNAseq datasets and, in the present study, the authors conclude that a "stress response" module characterized by expression of TGFb1 represents a key "checkpoint" in microglial aging in midlife, after which the cells can move along distinct transcriptional trajectories as aging progresses. This is an intriguing possibility. However, pseudotime analyses need to be validated via additional bioinformatics as well as follow-up experiments. Indeed, Heumos et al, in their Nature Genetics "Expert Guidelines" Review, emphasize that "inferred trajectories might not necessarily have biological meaning." They recommend that "when the expected topology is unknown, trajectories and downstream hypotheses should be confirmed by multiple trajectory inference methods using different underlying assumptions."(15) Numerous algorithms are available for trajectory inference (e.g. Monocle, PAGA, Slingshot, RaceID/StemID, among many others) and their performance and suitability depends on the individual dataset and nature of the trajectories that are to be inferred. It is recommended to use dynGuidelines(16) for the selection of optimal pseudotime analysis methods. In the present manuscript, the authors do not provide any justification for their use of Monocle 3 over other trajectory inference approaches, nor do they employ a secondary trajectory inference method to confirm observations made with Monocle 3. Finally, follow-up validation experiments that the authors carry out have their own limitations and caveats (see below). 
      Hence, while the microglial aging trajectories identified by this study are intriguing, they remain hypothetical trajectories that need to be proven with additional follow-up experiments.

      We thank the reviewer for their suggestion. We have used the dynGuidelines kindly provided by the reviewer to select an additional trajectory inference tool to analyze our data. We selected SCORPIUS based on the structure of our data. The tool has provided additional support that microglia progress from a homeostatic state (Cx3cr1, Mef2c) to the induction of stress genes (Hspa1, Atf3) at an intermediate point during aging progression. Furthermore, we observe a concordant increase in ribosomal protein genes at a time point in the pseudotime analysis immediately prior to activation of inflammation-related genes (Il1b, Cst7). These additional analyses support the main findings of our original pseudotime analysis and have been added to the manuscript as Figure S3C,D. Additionally, in the statistical test that uncovers differentially expressed genes along the pseudotime trajectory in this analysis, we find that Tgfb1 is one of the genes that is differentially expressed, with peak expression at an intermediate timepoint along the pseudotime trajectory. Furthermore, we have done some preliminary trajectory analysis with Slingshot (Street et al, BMC Genomics, PMID: 29914354) that found a similar trajectory with analogous gene expression patterns and dynamic expression of Tgfb1.
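
      One simple way to quantify whether two trajectory inference methods agree is to compare the per-cell pseudotime values they assign using a rank correlation. The sketch below is illustrative only and is not part of the authors' analysis: the function names and the toy pseudotime values are hypothetical, and the zero-variance case (all cells tied) is deliberately not handled. Because the direction of pseudotime is arbitrary, the absolute value of the correlation is usually what matters.

```python
def ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical pseudotime values for the same five cells from two methods.
pseudotime_method_a = [0.0, 0.1, 0.4, 0.5, 0.9]
pseudotime_method_b = [0.2, 0.1, 0.5, 0.6, 1.0]
rho = spearman(pseudotime_method_a, pseudotime_method_b)
print(round(abs(rho), 3))
```

      A high absolute correlation indicates that the two methods order cells similarly; it does not by itself establish that the ordering is biologically meaningful.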

      To follow up on the idea that TGFb1 signaling in microglia plays a key role in determining microglial aging trajectories, the authors use RNAscope to show that TGFb1 levels in microglia peak in middle age. They also treat primary LPS-activated microglia with TGFb1 and show that this restores expression of microglial homeostatic gene expression and dampens expression of stress response and, potentially, inflammatory genes. Finally, they utilize transgenic approaches to delete TGFb1 from microglia around 8-10mo of age and scRNAseq to show that homeostatic signatures are lost and inflammatory signatures are gained. Hence, findings in this study support the idea that TGFb1 can strongly regulate microglial phenotype. Loss of TGFb1 signaling to microglia in adulthood has already been shown to cause decreased microglial morphological complexity and upregulation of genes typically associated with microglial responses to CNS insults(17-19). TGFb1 signaling to microglia has also been implicated in microglial responses to disease and manipulations to increase this signaling can improve disease progression in some cases(19). In this light, the findings in the present study are largely confirmatory of previous findings in the literature. They also fall short of unequivocally demonstrating that TGFb1 signaling acts as a "checkpoint" for determining subsequent microglial aging trajectory. To show this clearly, one would need to perturb TGFb1 signaling around 12mo of age and carry out sequencing (bulkRNAseq or scRNAseq) of microglia at 18mo and 24mo. Such experiments could directly demonstrate whether the whole microglial population has been diverted to the TGFb1-low aging trajectory (that progresses through a translational burst state to an inflammation state as proposed). 
      Future development of tools to tag TGFb1 high or low microglia could also enable fate tracing type experiments to directly show whether the TGFb1 state in middle age predicts cell state at later phases of aging.

      We apologize for the use of the term “checkpoint” when referring to the role of Tgfb1 in microglial aging. Instead, our model posits that Tgfb1 expression increases in response to the early insults of the aging process in an attempt to return microglia to homeostasis. Therefore, this would predict that increasing TGFB1 levels after an insult would decrease activation and age-related progression of microglia, which we demonstrate in vitro (Figure 3). Alternatively, the loss of TGFB1 should prevent microglia from returning to a homeostatic state after an age-related stressor, and thus increase the number of microglia in activated states. We observe this increase in activated microglia in our middle-aged microglia-specific Tgfb1 knockout mouse model. Furthermore, the haploinsufficiency of Tgfb1 at this age indicates that TGFB1 signaling in microglia is sensitive to relative levels of Tgfb1. The transient increase in Tgfb1 expression further suggests that the threshold for TGFB1 signaling is dynamic. Finally, RNA-Seq analysis of both in vitro TGFB1-supplemented microglia and in vivo Tgfb1-depleted microglia highlights that TGFB1 alters the aging microglia transcriptome. Combined, these results provide evidence that Tgfb1 modulates advancement of microglia through an aging continuum.

      The present study would also like to draw links between features of microglial aging in the hippocampus and a decline in hippocampal-dependent cognition during aging. To this end, they carry out behavioral testing in 8-10mo old mice that have undergone microglial-specific TGFb1 deletion and find deficits in novel object recognition and contextual fear conditioning. While this provides compelling evidence that TGFb1 signaling in microglia can impact hippocampus-dependent cognition in midlife, it does not demonstrate that this signaling accelerates or modulates cognitive decline (see below). Age-associated cognitive decline refers to cognitive deficits that emerge as a result of the normative brain aging process (20-21). For a cognitive deficit to be considered age-associated cognitive decline, it must be shown that the cognitive operation under study was intact at some point earlier in the adult lifespan. This requires longitudinal study designs that determine whether a manipulation impacts the relationship between brain status and cognition as animals age (22-24). Alternatively, cross-sectional studies with adequate sample sizes can be used to sample the variability in cognitive outcomes at different points of the adult lifespan (22-24) and show that this is altered by a particular manipulation. For this specific study, one would ideally demonstrate that hippocampal-based learning/memory was intact at some point in the lifespan of mice with microglial TGFb1 KO but that this manipulation accelerated or exacerbated the emergence of deficits in hippocampal-dependent learning/memory during aging. In the absence of these types of data, the authors should tone down their claims that they have identified a cellular and molecular mechanism that contributes to cognitive decline.

      We agree with the reviewer that to adequately demonstrate an age-dependent effect of microglia-derived TGFB1 on cognition it is necessary to perturb microglial TGFB1 at young and mature ages and assess the age-dependent effect on cognition. To address this, we have now performed a complementary behavioral study utilizing the Tmem119-CreER mouse model to drive the microglia-specific excision of Tgfb1 in two separate cohorts of mice – one young (2-3 months) and one mature (7-8 months) – followed by cognitive testing. Using the novel object recognition test, we find that young mice of all genotypes (WT, Tgfb1 Het and Tgfb1 cKO) retain the ability to recognize the novel object (as determined by having a significant preference in exploring the novel object). In contrast, only the WT mature mice demonstrate a preference for the novel object, while the Tgfb1 Het and Tgfb1 cKO mice show no preference for the novel object. These behavioral data demonstrate an age-dependent necessity for microglia-specific TGFB1 in maintaining proper hippocampal-dependent memory and are now included in the manuscript as revised Figure 4I-J. We have also included additional behavioral tests (Y-Maze and open field) that did not show any difference between the genotypes as Figure S6D-G. Unfortunately, we were unable to perform the fear conditioning testing, as our apparatus broke during this time. Together, these results reveal that there is an age-dependent necessity for microglia-derived TGFB1 for hippocampal-dependent cognitive function.

      A final point of clarification for the reader pertains to the mining of previously generated data sets within this study. The language in the results section, methods, and figure legends causes confusion about which experiments were actually carried out in this study versus previous studies. Some of the language makes it sound as though parabiosis experiments and experiments using mouse models of Alzheimer's Disease were carried out in this study. However, parabiosis and AD mouse model experiments were executed in previous studies (25,26), and in the present study, RNAseq datasets were accessed for targeted data mining. It is fantastic to see further mining of datasets that already exist in the field. However, descriptions in the results and methods sections need to make it crystal clear that this is what was done.

      The reviewer makes an excellent point. While we referenced the public dataset in the original manuscript, the citation style of superscripted numbers diminishes our ability to adequately reference the datasets. Therefore, we have added the names of the first authors (Palovics for the parabiosis dataset and Sala Frigerio for the Alzheimer’s Disease dataset) to all the instances in the results and figure legends when we refer to these datasets.

      Additional recommendations:

      Major comments.

      (1) There is some ambiguity surrounding how to interpret the microglial TGFb1 knockout that seems incompatible with viewing this molecule as a "checkpoint" in microglial aging. TGFb1 is believed to be primarily produced by microglia. Secreted TGFb1 is then detected by microglial TGFbR2. Are the microglia that have high levels of TGFb1 in middle age signaling to themselves (autocrine signaling)? Or contributing to a local milieu that impacts multiple neighbor microglia (paracrine signaling)? The authors could presumably look in their own dataset to evaluate microglial capacity to detect TGFb1 via its receptors.

      We thank the reviewer for this insightful suggestion. We have undertaken analysis of our dataset to assess whether Tgfb1 acts through autocrine or paracrine signaling. To do so, we reanalyzed our microglia aging scRNA-Seq dataset leveraging the variation in microglia Tgfb1 expression to probe the relative activity of TGFB1. Specifically, we partitioned microglia into quartiles based on their Tgfb1 expression, and subsequently investigated the expression of TGFB signaling effectors and targets. High expression of downstream TGFB signaling pathway components in microglia with high Tgfb1 expression would point to autocrine mechanisms while, alternatively, high expression of downstream TGFB signaling pathway components in microglia with low Tgfb1 expression would point to paracrine mechanisms. We observed highest expression of TGFB signaling pathway components and targets in microglia with the highest expression of Tgfb1. These data suggest that Tgfb1 acts through an autocrine mechanism. These results have been added to our manuscript as Figure S4E-G. Additionally, while our manuscript was under review, a paper by Bedolla et al (Nature Communications 2024; PMID: 38906887) was published that investigated the role of Tgfb1 in adult microglia. This paper utilized orthogonal techniques – sparse microglia-specific Tgfb1 knockout and IHC - to also suggest that microglia utilize autocrine Tgfb1 signaling. Together, these complementary data provide strong evidence that Tgfb1 acts through an autocrine mechanism in adult microglia.
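
      The quartile analysis described above can be sketched as follows. This is a minimal illustration on synthetic data, not the pipeline used for Figure S4E-G: the function name, the column names `effector_1` and `target_1`, and the toy "autocrine" data generator are all assumptions introduced for the example.

```python
import numpy as np
import pandas as pd

def signaling_by_ligand_quartile(expr, ligand, pathway_genes):
    """Mean pathway score per quartile of ligand expression.

    expr: cells x genes table of normalized expression (hypothetical input).
    """
    # Rank first so ties (e.g. many zero counts) cannot create duplicate bin edges.
    ranks = expr[ligand].rank(method="first")
    quartile = pd.qcut(ranks, 4, labels=["Q1", "Q2", "Q3", "Q4"])
    # Simple pathway score: average expression of the chosen genes per cell.
    score = expr[pathway_genes].mean(axis=1)
    return score.groupby(quartile, observed=True).mean()

# Toy data in which downstream genes track each cell's own ligand level,
# i.e. the pattern that would be read as autocrine signaling.
rng = np.random.default_rng(0)
n_cells = 400
tgfb1 = rng.gamma(2.0, 1.0, n_cells)
expr = pd.DataFrame({
    "Tgfb1": tgfb1,
    "effector_1": 0.5 * tgfb1 + rng.normal(0, 0.3, n_cells),  # placeholder gene
    "target_1": 0.3 * tgfb1 + rng.normal(0, 0.3, n_cells),    # placeholder gene
})
by_quartile = signaling_by_ligand_quartile(expr, "Tgfb1", ["effector_1", "target_1"])
print(by_quartile)
```

      Under a paracrine scenario the score would instead be flat, or highest in the low-expression quartiles, so the comparison between Q1 and Q4 is the quantity of interest.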

      (2) Conclusions of the study rest on the assumption that microglial inflammatory responses are a central driver of cognitive decline. They assume that manipulations that increase microglial progression into an inflammatory state will negatively impact cognitive function. Although there are certainly a lot of data in the field that inflammatory factors can impact synaptic function, additional experiments would be required to unequivocally demonstrate that a "TGFb1 dependent" progression of microglia to an inflammatory state underlies any observed changes in cognition. For example, in the context of microglial TGFb1 deletion, can NSAIDs or blockers of soluble TNFa (e.g. XENP345), or blockers of SPP1, etc. rescue behavior? Can microglial depletion in this context rescue behavior? Assuming behavior was carried out in the same microglial TGFb1 KO mice that were used for microglial scRNAseq, they could also carry out linear regression-type analyses to link microglial inflammatory status to the behavioral performance of individual mice. In the absence of additional evidence of this sort, the authors should tone down claims about mechanistic relationships between microglial state and cognitive performance.

      We thank the reviewer for realizing that the link between cognition and inflammation in our paper is speculative. Therefore, we have taken the reviewer’s advice and toned down the claims linking inflammation to cognition in our manuscript. Instead, we connect the disruption in cognition to what is observed in our data, a loss of microglia homeostasis and a shift in the microglia aging trajectories.

      Additional Recommendations:

      Minor comments:

      (1) Ideally at some point in the results or discussion, the authors should acknowledge that the hippocampus has highly distinct sub-regions and that microglia show different functions and properties across these sub-regions (e.g. microglia in hilus and subgranular zone vs microglia in stratum radiatum, vs microglia immediately adjacent to or embedded within stratum pyrimidale). Do expression levels of TGFb1 and microglial aging trajectories vary across sub-regions? To what extent can this account for heterogeneity of aging trajectories observed in microglial aging within the hippocampus?

      We are interested in how microglia heterogeneity during aging is influenced by the specific functions, and thus microenvironments within the hippocampus. Therefore, we have expanded our IHC analysis of microglia to determine how the microenvironment influences microglia phenotypes by looking at several different regions of the hippocampus. We have included this regional analysis as Figure S2 in the manuscript. This analysis has revealed region-specific effects on microglia activation during aging.

      (2) For immunohistochemistry data, it is not particularly convincing to see one example of one cell from each condition. Generally, an accepted approach in the field is to present lower magnification images accompanied by zoom panels for several cells from each field of view. This reassures the reader that specific cells haven't simply been "cherry-picked" to support a particular conclusion.

      To allay the concerns of the reviewer that cells haven’t been “cherry-picked”, we have provided low magnification images for the aging CD68 and NFκB stains in Supplemental Figure S2.

      (3) In immunohistochemistry data, have measures been taken to ensure that observed signals are not simply autofluorescence that becomes prominent in tissues with aging? (i.e. use of trueblack or photoquenching of tissue prior to staining) See PMID 37923732

      We agree that autofluorescence, at least partially due to the accumulation of lipofuscin, becomes prominent in certain regions and cells of the hippocampus during aging. This most prominently occurs in the microglia of the hilus. This autofluorescence has a particular subcellular distribution, as it is localized to lyso-endosomal bodies. The microglia activation marker CD68 is also localized to lysosomes. A previous publication by Burns et al (eLife; PMID: 32579115) identified autofluorescent microglia (AF+) with unique molecular profiles that accumulate with age. They posited that these AF+ microglia resembled other microglia subsets that have pronounced storage compartments, such as the pro-inflammatory lipid droplet-containing microglia that accumulate with age reported by Marschallinger et al (Nature; PMID: 31959936). As such, autofluorescence present in microglia potentially represents distinctive and functional states of microglia. Our CD68 immunostaining accumulates with age, which could overlap with autofluorescent storage bodies. Thus, we performed a complementary CD68 immunostaining in an independent cohort of young (3 months) and aged (24 months) mice with the autofluorescence quencher TrueBlack, and found that the staining pattern and accumulation of CD68 microglia with age persisted as previously observed after use of this quencher (see Author response image 1). Images are IBA1 (cyan) and CD68 (yellow) with the molecular layer (ML), granule cell (GC), and hilus illustrated and corresponding quantification provided (Two-way ANOVA with Sidak’s multiple comparisons test; ***P<0.001; ****P<0.0001).

      We would like to note that the subcellular localization of the other immunostainings included in the manuscript was distinct from CD68, and not likely to be associated with the autofluorescent storage bodies. Additionally, our RNAScope staining for Tgfb1 did not show an accumulation with age, but rather a transient increase at 12 months of age, which indicates that the interpretation of the RNAScope stain for Tgfb1 was not unduly influenced by autofluorescence.

      Author response image 1.

      (4) Ideally, more care is needed with the language used to describe microglial state during aging. The terms "dystrophic," "dysfunctional," and "inflammatory" all carry their own implications and assumptions. Many changes exhibited by microglia during aging can initially be adaptive or protective, particularly during middle age. Without additional experiments to show that specific microglial attributes during aging are actively detrimental to the tissue and additional experiments to show that microglia have ceased to be capable of engaging in many of their normal actions to support tissue homeostasis, the authors should exercise caution in using terms like dysfunctional.

      We appreciate the reviewer’s suggestion. To allay the concerns of the reviewer about the multiple implications of terms such as “dysfunctional” and “inflammatory”, we have tried to replace them throughout the text with more specific terms.

      Reviewer #2:

      That said, given what we recently learned about microglia isolation for RNA-seq analysis, there is a danger that some of the observations are a result not of age, but of cell stress from sample preparation (enzymatic digestion 10min at 37C; e.g. PMID: 35260865). Changes in cell state distribution along aging were made based on scRNA-seq and were not corroborated by any other method, such as imaging of cluster-specific marker expression in microglia at different ages. This analysis would allow confirming the scRNA-seq data and would also give us an idea of where the subsets are present within the hippocampus, and whether there is any interesting distribution of cell states (e.g. some are present closer to stem cells?). Since TGFb is thought to be crucial to microglia biology, it would be valuable to include more analysis of the mice with microglia-specific Tgfb deletion e.g. what was the efficiency of recombination in microglia? Did their numbers change after induction of Tgfb deletion in Cx3cr1-creERT2::Tgfb-flox mice?

      We thank the reviewer for their comment regarding potential ex vivo transcriptional alterations with the approaches used in our study. We performed our aging microglia scRNA-Seq characterization prior to the release of Marsh et al (Nature Neuroscience; PMID: 35260865), which revealed the potential transcriptional artefacts induced by isolation. That being said, we took great care to minimize the amount of time samples were subjected to enzymatic digestion (15 minutes) and kept cells at 4C during the remainder of the isolation. Furthermore, we performed all isolations simultaneously, so that transcriptional changes induced by the isolation would be present across all ages and should not be observed during our analysis unless indicative of a true age-related change. Additionally, we have corroborated changes in cell state distribution across ages using several markers (Tgfb1 and KLF2 for the intermediate stress state, S6 for the translation state, and NFKB and CD68 for activation states). In the revised manuscript, we have added additional hippocampal subregion analysis of several IHC immunostains to provide spatial insights into the microglia aging process (Figure S2). This analysis reveals unique spatial dynamics of microglia aging. For example, as the reviewer foresaw, we found that the granule cell layer (the location of adult hippocampal neurogenesis) had a more pronounced age-associated progression of microglial activation than several other regions. A subset of regions had minimal levels of activation during aging, such as the molecular layer and the stratum radiatum of the CA1 (inner CA1 in the manuscript) – regions enriched in synaptic terminals. Furthermore, this analysis highlights the susceptibility of microglia aging to microenvironmental influences.

      Regarding the temporally controlled microglia-specific genetic KO mouse model used in our original submission, the Cx3cr1-CreER allele selected (B6.129P2(Cg)-Cx3cr1tm2.1(cre/ERT2)Litt/WganJ) has been reported to have very high recombination efficiency (~94% in Parkhurst et al (Cell; PMID: 24360280)), and we used a tamoxifen induction protocol very similar to Faust et al. (Cell Reports; PMID: 37635351) that achieved ~98% recombination (they injected 100mg/kg for 5 days, while we injected 90mg/kg for 5 days). We analyzed our scRNA-Seq data for the expression of Tgfb1 and found that the knockout mice had a 67% reduction in cells expressing higher levels of Tgfb1 (see panel A in Author response image 2). This is likely a large underestimate of the recombination efficiency, as exon 3 is floxed and residual nonfunctional transcripts could be present, given nonsense-mediated decay is not realized in a number of knockout lines (Lindner et al, Methods, PMID: 33838271). We likely achieved a much higher excision efficiency. We would like to highlight that our data indicating increased microglia activation after tamoxifen treatment (Figure S5A) and the involvement of autonomous signaling (Figure S4E-G) are consistent with recently published work by Bedolla et al (Nature Communications; PMID: 38906887). Additionally, as part of the revision process, we have now corroborated our behavioral data using an independent temporally controlled microglia-specific KO mouse model - Tmem119-CreER::Tgfb1 knockout mice (Figure 4I-K). We performed qPCR on sorted microglia to determine RNA levels in wildtype and knockout mice. Relative levels of Tgfb1 and exon 3 of Tgfb1 (the floxed exon) on technical replicates of 3 pooled samples indicated overall loss of Tgfb1 expression, as well as undetectable levels of exon 3 as normalized to Actb (see panel B in Author response image 2).

      Author response image 2.

      With respect to the effects of aging and Tgfb1 on microglia density, we find a slight region-specific increase in microglia density with age (see Author response image 3). The density of Iba1 cells across hippocampal regions was analyzed at 3 and 24 months of age (see panel A in Author response image 3) and along an aging continuum at 3, 6, 12, 18, and 24 months (see panel B in Author response image 3). These data are also included in the revised manuscript (Figure S2D-F).

      Author response image 3.

      Deletion of Tgfb1 also had region-specific effects on microglia. While there was no difference in microglia density between wildtype and heterozygous mice, there was a significant increase in microglia density in the hilus and molecular layers in knockout mice (see Author response image 4); these data are also included in the revised manuscript (Figure S5A). These data indicate that there are subtle region-specific increases in microglia density with age, as well as following the deletion of Tgfb1 from microglia of mature mice.

      Author response image 4.

      Additional Recommendations:

      (1) The problem of possible digestion artifacts in scRNA-seq should be at least addressed in the discussion as a caveat in data interpretation. Staining for unique cluster markers in undigested tissue would solve the problem. It can be done with microscopy or using flow cytometry, but for this microglia, isolation should be done with no enzymes or with Actinomycin (PMID: 35260865).

      The ex vivo activation signature uncovered by Marsh et al. (Nature Neuroscience; PMID: 35260865) arises from the digestion methods used to isolate microglia. We took the utmost care in processing our microglia identically within experiments, which should minimize the amount of uneven ex vivo activation of microglia. This is borne out by the structure of our single-cell sequencing data. Unlike Marsh et al., who observe a unique cluster after addition of their inhibitors, we do not see any clusters unique to a single condition, suggesting that any influence of ex vivo activation was evenly distributed.

      Importantly, as suggested by the reviewer, we have complemented our scRNA-Seq analysis by corroborating several markers for various stages of microglia aging progression using RNAScope and IHC in intact tissue. Specifically, the transient age-dependent increase in Tgfb1 high microglia was confirmed using RNAScope (Figure 3B), the age-related increase in ribosomal high microglia was confirmed using S6 immunostaining (Figure 3I), and the increase of various markers of age-associated activation (C1q, CD68 and NFkB) was confirmed using immunostaining (Figure 1F and Figure S2D-I). Additionally, we have also performed immunostainings for KLF2 and confirmed peak microglia expression at 18 months of age with lower levels at 24 months of age (Figure 2H).

      (2) The figures of GO and violin plots are not easy to follow sometimes... what are the data points in the violin plots, maybe worth showing them as points? For the GO, e.g. in 3D, 3J, including a short description of the figure could help, e.g. in Figure 1. it was clear.

      We chose not to include the datapoints in the violin plots for aesthetic purposes. Each violin plot would have had hundreds of points that would have made the plots very busy and hidden the structure of the distribution. In Author response image 5 we show the violin plot in Figure 2M with (panel A) and without (panel B) individual points. In a small format, the points overlap and become jumbled together. Therefore, we chose to present the violin plots without points for clarity on the data structure. As for the gene ontology plots in Figure 3, we have updated the descriptions in both the text and figure legends to provide clarification on what they represent.

      Author response image 5.

      (3) I'm very curious to see the mechanism of action of "aged" microglia in the TGFb-depletion model. Is it creating hostile conditions for stem cells, or we have increased synapse loss? Something else?

      We thank the reviewer for their insightful questions. We would like to note that during the revision process of our manuscript, a complementary study was published reporting that the loss of microglia-derived Tgfb1 leads to an aberrant increase in the density of dendritic spines in the CA1 region of the hippocampus (Bedolla et al, Nature Communications, PMID: 38906887). The data from Bedolla et al show neurons in the CA1 sparsely labeled with an mGreenLantern-expressing virus in mice that had Tgfb1 deleted from microglia using the Cx3cr1-CreERT driver (Figure 7U,V). Additionally, McNamara et al (Nature; PMID: 36517604) demonstrated that microglia-derived Tgfb1 signaling regulates myelin integrity during development and several studies have revealed links between Tgfb1 signaling and altered neurogenesis (e.g., He et al, Nature, PMID: 24859199 and Dias et al, Neuron, PMID: 25467979). Together, this growing body of work indicates that microglia-derived TGFB1 regulates myelination, neurogenesis and synaptic plasticity, which have all been shown to play a role in cognition.

    1. eLife Assessment

      This useful study examines the neural activity in the motor cortex as a monkey reaches to intercept moving targets, focusing on how tuned single neurons contribute to an interesting overall population geometry. The presented results and analyses are solid, though the investigation of this novel task could be strengthened by clarifying the assumptions behind the single neuron analyses, and further analyses of the neural population activity and its relation to different features of behaviour.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the question of how task-relevant sensory information affects activity in motor cortex. The authors use various approaches to address this question, looking at single units and population activity. They find that there are three subtypes of modulation by sensory information at the single unit level. Population analyses reveal that sensory information affects the neural activity orthogonally to motor output. The authors then compare both single unit and population activity to computational models to investigate how encoding of sensory information at the single-unit level is coordinated in a network. They find that an RNN that displays similar orbital dynamics and sensory modulation to motor cortex also contains nodes that are modulated similarly to the three subtypes identified by the single unit analysis.

      Strengths:

      The strengths of this study lie in the population analyses and the approach of comparing single-unit encoding to population dynamics. In particular, the analysis in Figure 3 is very elegant and informative about the effect of sensory information on motor cortical activity. The task is also well designed to suit the questions being asked and well controlled.

      It is commendable that the authors compare single-unit to population modulation. The addition of the RNN model and perturbations strengthen the conclusion that the subtypes of individual units all contribute to the population dynamics.

      Weaknesses:

      The main weaknesses of the study lie in the categorization of the single units into PD shift, gain and addition types. The single units exhibit clear mixed selectivity, as the authors highlight. Therefore, the subsequent analyses looking only at the individual classes in the RNN are a little limited. Another weakness of the paper is that the choice of windows for analyses is not properly justified and the dependence of the results on the time windows chosen for single unit analyses is not assessed. This is particularly pertinent because tuning curves are known to rotate during movements (Sergio et al. 2005 Journal of Neurophysiology).

      This study uses insights from single-unit analysis to inform mechanistic models of these population dynamics, which is a powerful approach, but is dependent on the validity of the single-cell analysis, which I have expanded on below.

      I have clarified some of the areas that would benefit from further analysis below:

      Task:

      The task is well designed, although it would have benefited from perhaps one more target speed (for each direction). One monkey appears to have experienced one more target speed than the others (seen in Figure 3C). It would have been nice to have this data for all monkeys, although, of course, unfeasible given that the study has been concluded.

      Single unit analyses:

      The choice of the three categories (PD shift, gain, addition) is not fully justified. It would be nice to see whether these three main categories are confirmed by unsupervised methods.

      The decoder analyses in Figure 2 provide evidence that target speed modulation may change over the trial. Therefore, it is important to see how the window considered for the firing rate in Figure 1 (currently 100ms pre - 100ms post movement onset) affects the results. Whilst it is of course understandable that a window must be chosen and will always be slightly arbitrary, using different windows and comparing the results of two or three different sizes or timed windows would be more convincing that the results are not dependent on this particular window.
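
      One way to run the window check suggested above is sketched here. It is a minimal illustration on noise-free synthetic data, not the authors' analysis: the cosine-tuning fit, the function names, the candidate windows, and the toy neuron whose preferred direction rotates linearly in time are all assumptions made for the example.

```python
import numpy as np

def preferred_direction(rates, theta):
    """Least-squares cosine fit; returns the preferred direction in radians."""
    X = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    _, b_cos, b_sin = np.linalg.lstsq(X, rates, rcond=None)[0]
    return float(np.arctan2(b_sin, b_cos))

def pd_across_windows(binned, theta, t, windows):
    """binned: trials x time bins of firing rate; t: bin centres in ms (0 = movement onset)."""
    result = {}
    for start, stop in windows:
        mask = (t >= start) & (t < stop)
        # Average the rate inside the window, then refit the tuning curve.
        result[(start, stop)] = preferred_direction(binned[:, mask].mean(axis=1), theta)
    return result

# Toy neuron: 8 reach directions x 5 repeats, PD drifting by 0.002 rad per ms.
theta = np.tile(np.arange(8) * np.pi / 4, 5)
t = np.arange(-195, 200, 10)
pd_t = 0.5 + 0.002 * t
binned = 10 + 5 * np.cos(theta[:, None] - pd_t[None, :])

windows = [(-200, 0), (-100, 100), (0, 200)]
pd_by_window = pd_across_windows(binned, theta, t, windows)
for window, pd_value in pd_by_window.items():
    print(window, round(pd_value, 3))
```

      If the fitted preferred direction (or the gain and offset terms, which could be returned in the same way) changes materially between windows, the classification of a unit would depend on the window chosen.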

      RNN:

      Mixed selectivity is not analysed in the RNN, which would help to compare the model to the real data where mixed selectivity is common. The CCA and Procrustes analysis are a good start to validate the claim of similarity between RNN and neural dynamics, rather than allowing comparisons to be dominated by geometric similarities that may be features of the task. However, some of the disparity values for the Procrustes analysis are quite high, albeit below that of the shuffle. Maybe a comment about this in the text should be included. There is also an absence of alternate models to compare the perturbation model results to.
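
      To make the comparison between a Procrustes disparity and its shuffle baseline concrete, a minimal sketch follows. It uses `scipy.spatial.procrustes` on synthetic ring-shaped data; the shuffle scheme (permuting condition labels of one dataset) is an assumption and may differ from the null used in the manuscript.

```python
import numpy as np
from scipy.spatial import procrustes

def disparity_with_shuffle(a, b, n_shuffles=200, seed=0):
    """Procrustes disparity between two (conditions x dims) matrices, plus a
    null distribution from shuffling the condition order of b (assumed null)."""
    rng = np.random.default_rng(seed)
    _, _, observed = procrustes(a, b)
    null = np.array([
        procrustes(a, b[rng.permutation(len(b))])[2]
        for _ in range(n_shuffles)
    ])
    return observed, null

# Synthetic example: a tilted ring and a rotated, scaled, noisy copy of it.
rng = np.random.default_rng(1)
phi = np.linspace(0, 2 * np.pi, 16, endpoint=False)
ring = np.column_stack([np.cos(phi), np.sin(phi), 0.3 * np.cos(phi)])
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
model = 2.0 * ring @ rotation + rng.normal(0, 0.05, ring.shape)

observed, null = disparity_with_shuffle(ring, model)
p_value = (np.sum(null <= observed) + 1) / (len(null) + 1)
print(f"disparity={observed:.4f}, shuffle median={np.median(null):.4f}, p={p_value:.3f}")
```

      Because the disparity is normalized to lie between 0 and 1, a value that is below the shuffle distribution but still far from 0 indicates only partial geometric agreement, which is the case the comment above asks the authors to discuss.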

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Zhang et al. examine neural activity in motor cortex as monkeys make reaches in a novel target interception task. Zhang et al. begin by examining the single neuron tuning properties across different moving target conditions, finding several classes of neurons: those that shift their preferred direction, those that change their modulation gain, and those that shift their baseline firing rates. The authors go on to find an interesting, tilted ring structure of the neural population activity, depending on the target speed, and find that 1) the reach direction has consistent positioning around the ring, and 2) the tilt of the ring is highly predictive of the target movement speed. The authors then model the neural activity with a single neuron representational model and a recurrent neural network model, concluding that this population structure requires a mixture of the three types of single neurons described at the beginning of the manuscript.

      Strengths:

      I find the task the authors present here to be novel and exciting. It slots nicely into an overall trend to break away from simple reach-to-static-target tasks to better characterize the breadth of how motor cortex generates movements. I also appreciate the movement from single neuron characterization to population activity exploration, which generally serves to anchor the results and make them concrete. Further, the orbital ring structure of population activity is fascinating, and the modeling work at the end serves as a useful baseline control to see how it might arise.

      Weaknesses:

      While I find the behavioral task presented here to be excitingly novel, I find the presented analyses and results to be far less interesting than they could be. Key to this, I think, is that the authors are examining this task and related neural activity primarily with a single-neuron representational lens. This would be fine as an initial analysis, since the population activity is of course composed of individual neurons, but the field seems to have largely moved towards a more abstract "computation through dynamics" framework that has, in the last several years, provided much more understanding of motor control than the representational framework has. As the manuscript stands now, I'm not entirely sure what interpretation to take away from the representational conclusions the authors made (i.e. the fact that the orbital population geometry arises from a mixture of different tuning types). As such, by the end of the manuscript, I'm not sure I understand any better how motor cortex or its neural geometry might be contributing to the execution of this novel task.

      Main Comments:

      My main suggestions to the authors revolve around bringing in the computation-through-dynamics framework to strengthen their population results. The authors cite the Vyas et al. review paper on the subject, so I believe they are aware of this framework. I have three suggestions for improving or adding to the population results:

      (1) Examination of delay period activity: one of the most interesting aspects of the task was the fact that the monkey had a random-length delay period before he could move to intercept the target. Presumably, the monkey had to prepare to intercept at any time between 400 and 800 ms, which means that there may be some interesting preparatory activity dynamics during this period. For example, after 400ms, does the preparatory activity rotate with the target such that once the go cue happens, the correct interception can be executed? There is some analysis of the delay period population activity in the supplement, but it doesn't quite get at the question of how the interception movement is prepared. This is perhaps the most interesting question that can be asked with this experiment, and it's one that I think may be quite novel for the field--it is a shame that it isn't discussed.

      (2) Supervised examination of population structure via potent and null spaces: simply examining the first three principal components revealed an orbital structure, with a seemingly conserved motor output space and a dimension orthogonal to it that relates to the visual input. However, the authors don't push this insight any further. One way to do that would be to find the "potent space" of motor cortical activity by regression to the arm movement and examine how the tilted rings look in that space. Presumably, then, the null space should contain information about the target movement. The ring tilt will likely be evident if the authors look at the highest variance neural dimension orthogonal to the potent space (the "null space")--this is akin to PC3 in the current figures, but it would be nice to see what comes out when you look in the data for it.

      The authors attempt this sort of analysis in the supplement, alongside their dPCA results, but the results seem misinterpreted. The authors do identify one kind of output-potent space using the reach direction components of dPCA, and the reach directions are indeed aligned here. However, they then go on to interpret the target-velocity space as the output-null space, orthogonal to the potent space. There are two problems with this. 1) The target-velocity space is not necessarily orthogonal to the reach-direction space. This is a key aspect of dPCA--while the individual components within a particular marginalization space are orthogonal, the marginalization spaces themselves are not necessarily orthogonal unless they are forced to be (which the authors don't mention doing). 2) Even if the target-velocity space were orthogonal to the reach-direction space, it would not comprise the whole output-null space--such a null space would also include dimensions of neural population activity that have target-velocity/reach-direction interaction, which the authors show is a major component of neural population variance. Incidentally, the dPCA analysis the authors present shows what I would expect from their unsupervised results, but as it is written, the dPCA results are interpreted in a strange or potentially misleading way.

      (3) RNN perturbations: as it's currently written, the RNN modeling has promise, but the perturbations performed don't provide me with much insight. I think this is because the authors are trying to use the RNN to interpret the single neuron tuning, but it's unclear to me what was learned from perturbing the connectivity between what seems to me almost arbitrary groups of neurons. It seems to me that a better perturbation might be to move the neural state before the movement onset to see how it changes the output. For example, the authors could move the neural state from one tilted ring to another to see if the virtual hand then reaches a completely different (yet predictable) target. Moreover, if the authors can more clearly characterize the preparatory movement, perhaps perturbations in the delay period would provide even more insight into how the interception might be prepared.
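
      The potent/null decomposition proposed in suggestion (2) can be sketched as follows. This is an illustrative outline on synthetic data, not an analysis of the recorded population: the function names, the linear readout, and the toy generative model (hand velocity and target speed embedded along orthogonal neural directions) are assumptions made for the example.

```python
import numpy as np

def potent_null_split(X, Y):
    """X: samples x neurons (mean-centred); Y: samples x output dims (mean-centred).
    Returns orthonormal bases (neurons x k) for the potent and null spaces."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # linear readout, neurons x outputs
    Q, _ = np.linalg.qr(W, mode="complete")     # full orthonormal basis of neuron space
    k = W.shape[1]
    return Q[:, :k], Q[:, k:]                   # span(W) and its orthogonal complement

def top_null_dimension(X, null):
    """Highest-variance direction of the activity restricted to the null space."""
    Z = X @ null
    _, _, Vt = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
    return null @ Vt[0]                         # unit vector in neuron space

# Toy population: 2 output dimensions plus 1 target-speed dimension in 20 neurons.
rng = np.random.default_rng(0)
n_samples, n_neurons = 500, 20
hand_vel = rng.normal(size=(n_samples, 2))
target_speed = rng.normal(size=n_samples)
basis, _ = np.linalg.qr(rng.normal(size=(n_neurons, 3)))
X = 3 * hand_vel @ basis[:, :2].T + 3 * np.outer(target_speed, basis[:, 2])
X += rng.normal(0, 0.1, X.shape)
X -= X.mean(axis=0)
Y = hand_vel - hand_vel.mean(axis=0)

potent, null = potent_null_split(X, Y)
direction = top_null_dimension(X, null)
r = np.corrcoef(X @ direction, target_speed)[0, 1]
print(f"|correlation of top null dimension with target speed| = {abs(r):.3f}")
```

      Unlike the dPCA marginalizations discussed above, the two subspaces here are orthogonal by construction, and the null space contains every dimension outside the readout, including any interaction terms.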

    4. Reviewer #3 (Public review):

      Summary:

      This experimental study investigates the influence of sensory information on neural population activity in M1 during a delayed reaching task. In the experiment, monkeys are trained to perform a delayed interception reach task, in which the goal is to intercept a potentially moving target.

      This paradigm allows the authors to investigate how, given a fixed reach end point (which is assumed to correspond to a fixed motor output), the sensory information regarding the target motion is encoded in neural activity.

      At the level of single neurons, the authors find that target motion modulates the activity in three main ways: gain modulation (scaling of the neural activity depending on the target direction), shift (shift of the preferred direction of neurons tuned to reach direction), or addition (offset to the neural activity).

      At the level of the neural population, target motion information was largely encoded along the 3rd PC of the neural activity, leading to a tilt of the manifold along which reach direction was encoded that was proportional to target speed. The tilt of the neural manifold was found to be largely driven by the variation of activity of the population of gain modulated neurons.

      Finally, the authors study the behaviour of an RNN trained to generate the correct hand velocity given the sensory input and reach direction. The RNN units are found to similarly exhibit mixed selectivity to the sensory information, and the geometry of the “neural population” resembles that observed in the monkeys.

      Overall, the experiment is well set up to address the question of how sensory information that is directly relevant to the behaviour but does not lead to a direct change in behavioural output modulates motor cortical activity.

      The finding that sensory information modulates the neural activity in M1 during motor preparation and execution is nontrivial, given that this modulation of the activity must occur in the nullspace of the movement.

      The authors provide analyses at both the single neuron and the population level, leading to a relatively complete characterization of the effect of the target motion on neural activity.

      Additionally, they start exploring the link between the population geometry and the mixed selectivity of the single neurons in their RNN model. While they could be extended in future work, the analyses of the RNN provide a good starting point to address how exactly the task setup and constraints on the network shape the single neuron selectivity and the population geometry.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addresses the question of how task-relevant sensory information affects activity in the motor cortex. The authors use various approaches to address this question, looking at single units and population activity. They find that there are three subtypes of modulation by sensory information at the single unit level. Population analyses reveal that sensory information affects the neural activity orthogonally to motor output. The authors then compare both single unit and population activity to computational models to investigate how encoding of sensory information at the single unit level is coordinated in a network. They find that an RNN that displays similar orbital dynamics and sensory modulation to the motor cortex also contains nodes that are modulated similarly to the three subtypes identified by the single unit analysis.

      Strengths:

      The strengths of this study lie in the population analyses and the approach of comparing single-unit encoding to population dynamics. In particular, the analysis in Figure 3 is very elegant and informative about the effect of sensory information on motor cortical activity.

      The task is also well designed to suit the questions being asked and well controlled.

      We appreciate these kind comments.

      It is commendable that the authors compare single units to population modulation. The addition of the RNN model and perturbations strengthens the conclusion that the subtypes of individual units all contribute to the population dynamics. However, the subtypes (PD shift, gain, and addition) are not sufficiently justified. The authors also do not address that single units exhibit mixed modulation, but RNN units are not treated as such.

      We’re sorry that we didn’t provide sufficient grounds to introduce the subtypes. We have updated this in the revised manuscript, in Lines 102-104 as:

      “We determined these modulations on the basis of the classical cosine tuning model (Georgopoulos et al., 1982) and several previous studies (Bremner and Andersen, 2012; Pesaran et al., 2010; Sergio et al., 2005).”

      In our study, we applied the subtype analysis as a criterion to identify the modulation in neuron populations, rather than sorting neurons into exclusively different cell types.

      Weaknesses:

      The main weaknesses of the study lie in the categorization of the single units into PD shift, gain, and addition types. The single units exhibit clear mixed selectivity, as the authors highlight. Therefore, the subsequent analyses looking only at the individual classes in the RNN are a little limited. Another weakness of the paper is that the choice of windows for analyses is not properly justified and the dependence of the results on the time windows chosen for single-unit analyses is not assessed. This is particularly pertinent because tuning curves are known to rotate during movements (Sergio et al. 2005 Journal of Neurophysiology).

      In our study, mixed selectivity, specifically the target-motion modulation of reach-direction tuning, is a significant feature of single neurons. We categorized the neurons into three subclasses not to claim that they are distinct cell types, but to distinguish patterns of target-motion modulation. To further characterize these three patterns, we also investigated their interaction by perturbing connection weights in the RNN.

      Yes, it is important to consider the role of rotating tuning curves in neural dynamics during interception. In our case, we examined the population neural state with sliding windows and focused on the period around movement onset (MO) because of the unexpected ring-like structure and the highest decoding accuracy of transferred decoders (Figure S7C). The single-unit analyses were then performed.

      This paper shows sensory information can affect motor cortical activity whilst not affecting motor output. However, it is not the first to do so and fails to cite other papers that have investigated sensory modulation of the motor cortex (Stavisky et al. 2017 Neuron, Pruszynski et al. 2011 Nature, Omrani et al. 2016 eLife). These studies should be mentioned in the Introduction to capture better the context around the present study. It would also be beneficial to add a discussion of how the results compare to the findings from these other works.

      Thanks for the reminder. We have cited these relevant studies in the updated manuscript in Lines 422-426 as:

      “To further clarify, the discussing target-motion effect is different from the sensory modulation in action selection (Cisek and Kalaska, 2005), motor planning (Pesaran et al., 2006), visual replay and somatosensory feedback (Pruszynski et al., 2011; Stavisky et al., 2017; Suway and Schwartz, 2019; Tkach et al., 2007), because it occurred around movement onset and in predictive control trial-by-trial.”

      This study also uses insights from single-unit analysis to inform mechanistic models of these population dynamics, which is a powerful approach, but is dependent on the validity of the single-cell analysis, which I have expanded on below.

      I have clarified some of the areas that would benefit from further analysis below:

      (1) Task:

      The task is well designed, although it would have benefited from perhaps one more target speed (for each direction). One monkey appears to have experienced one more target speed than the others (seen in Figure 3C). It would have been nice to have this data for all monkeys.

      A great suggestion; however, it is hardly feasible as the Utah arrays have already been removed.

      (2) Single unit analyses:

      In some analyses, the effects of target speed look more driven by target movement direction (e.g. Figures 1D and E). To confirm target speed is the main modulator, it would be good to compare how much more variance is explained by models including speed rather than just direction. More target speeds may have been helpful here too.

      A nice suggestion. The goodness of fit of the simple model (movement direction only) is much worse than that of the more complex models (which include target speed). We’ve updated the results in the revised manuscript in Lines 119-122, as “We found that the adjusted R2 of a full model (0.55 ± 0.24, mean ± sd.) can be higher than that of the PD shift (0.47 ± 0.24), gain (0.46 ± 0.22), additive (0.41 ± 0.26), and simple models (only reach direction, 0.34 ± 0.25) for three monkeys (1162 neurons, ranksum test, one-tailed, p<0.01, Figure S5).”
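
      To make the kind of nested-model comparison discussed above concrete, the following is a minimal sketch on synthetic data, not the authors' analysis code. It assumes a cosine tuning model fit by ordinary least squares and compares only a "simple" model (reach direction only) with an "additive" model (a separate baseline per target-motion condition). The function names, the five conditions, and the simulated firing rates are all hypothetical.

```python
import numpy as np

def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R^2 for a fit with n_predictors regressors (intercept excluded)."""
    n = y.size
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def fit_predict(X, y):
    """Ordinary least-squares fit; returns the fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def compare_models(rate, theta, cond, n_cond):
    """Compare a direction-only cosine model with an additive-offset model.

    b1*cos(theta - PD) is linear in cos(theta) and sin(theta), so both
    models can be fit with linear regression.
    """
    cos_sin = np.column_stack([np.cos(theta), np.sin(theta)])
    X_simple = np.column_stack([np.ones_like(theta), cos_sin])
    # One-hot condition columns give each condition its own baseline.
    X_additive = np.column_stack([np.eye(n_cond)[cond], cos_sin])
    return {
        "simple": adjusted_r2(rate, fit_predict(X_simple, rate), 2),
        "additive": adjusted_r2(rate, fit_predict(X_additive, rate), n_cond + 1),
    }

# Synthetic neuron (assumption): cosine tuning plus a condition-dependent offset.
rng = np.random.default_rng(0)
n_trials, n_cond = 400, 5
theta = rng.uniform(0.0, 2.0 * np.pi, n_trials)   # reach direction (rad)
cond = rng.integers(0, n_cond, n_trials)          # target-motion condition index
offsets = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])   # hypothetical additive modulation
rate = 10.0 + offsets[cond] + 5.0 * np.cos(theta - 0.8) + rng.normal(0.0, 1.0, n_trials)

scores = compare_models(rate, theta, cond, n_cond)
print(scores)
```

      Adjusted R2 penalizes the extra parameters of the richer model, so a higher value for the additive model here reflects the simulated offset rather than the larger parameter count alone. Gain and PD-shift models are nonlinear in their parameters and would need a nonlinear fit, which this sketch leaves out.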

      The choice of the three categories (PD shift, gain, addition) is not completely justified in a satisfactory way. It would be nice to see whether these three main categories are confirmed by unsupervised methods.

      A good point. It is a pity that we haven’t found an appropriate unsupervised method.

      The decoder analyses in Figure 2 provide evidence that target speed modulation may change over the trial. Therefore, it is important to see how the window considered for the firing rate in Figure 1 (currently 100ms pre - 100ms post movement onset) affects the results.

      Thanks for the suggestion and close reading. Because the movement onset (MO) is the key time point of this study, we colored this time period in Figure 1 to highlight the perimovement neuronal activity.

      (3) Decoder:

      One feature of the task is that the reach endpoints tile the entire perimeter of the target circle (Figure 1B). However, this feature is not exploited for much of the single-unit analyses. This is most notable in Figure 2, where the use of an SVM limits the decoding to discrete values (the endpoints are divided into 8 categories). Using continuous decoding of hand kinematics would be more appropriate for this task.

      This is a very reasonable suggestion. In the revised manuscript, we’ve updated the continuous decoding results with support vector regression (SVR) in Figure S7A and in Lines 170-173 as:

      “These results were stable on the data of the other two monkeys and the pseudopopulation of all three monkeys (Figure S6) and reconfirmed by the continuous decoding results with support vector regressions (Figure S7A), suggesting that target motion information existed in M1 throughout almost the entire trial.”

      (4) RNN:

      Mixed selectivity is not analysed in the RNN, which would help to compare the model to the real data where mixed selectivity is common. Furthermore, it would be informative to compare the neural data to the RNN activity using canonical correlation or Procrustes analyses. These would help validate the claim of similarity between RNN and neural dynamics, rather than allowing comparisons to be dominated by geometric similarities that may be features of the task. There is also an absence of alternate models to compare the perturbation model results to.

      Thank you for these helpful suggestions. We have performed a decoding analysis on the RNN units and added the results in Figure S12A and Lines 333-334 as: “First, from the decoding result, target motion information existed in nodes’ population dynamics shortly after TO (Figure S12A).”

      We have also included the results of canonical correlation analysis and Procrustes analysis in Table S2 and Lines 340-342 as: “We then performed canonical component analysis (CCA) and Procrustes analysis (Table S2; see Methods), the results also indicated the similarity between network dynamics and neural dynamics.”
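
      As an illustration of the two similarity measures named above, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the two datasets are matrices with matching rows (for example, condition-by-time samples) and, for Procrustes, the same number of columns. The function names and the random test data are hypothetical.

```python
import numpy as np

def canonical_correlations(A, B):
    """Canonical correlations between the column spaces of A and B.

    Uses the QR-based formulation: after centering, the singular values of
    Qa.T @ Qb are the canonical correlations. Assumes full column rank.
    """
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    Qa, _ = np.linalg.qr(A0)
    Qb, _ = np.linalg.qr(B0)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def procrustes_disparity(A, B):
    """Residual after optimally translating, scaling, and rotating B onto A.

    Both matrices are centered and scaled to unit Frobenius norm; the optimal
    orthogonal map (reflections allowed) then leaves 1 - (sum of singular
    values of A0.T @ B0)^2. Zero means identical shape, values near one mean
    no shared shape.
    """
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    A0 = A0 / np.linalg.norm(A0)
    B0 = B0 / np.linalg.norm(B0)
    s = np.linalg.svd(A0.T @ B0, compute_uv=False)
    return 1.0 - s.sum() ** 2

# Hypothetical data: B is a rotated, scaled, shifted copy of A; C is unrelated.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
B = 2.5 * A @ R + 1.0
C = rng.normal(size=(100, 3))

print(canonical_correlations(A, B), procrustes_disparity(A, B))
print(canonical_correlations(A, C), procrustes_disparity(A, C))
```

      Both measures are insensitive to rotation, so a ring-like geometry imposed by the task can yield high similarity on its own. This is the reviewer's reason for asking for a comparison beyond visual resemblance, and one reason to include shuffled or alternative-model baselines.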

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Zhang et al. examine neural activity in the motor cortex as monkeys make reaches in a novel target interception task. Zhang et al. begin by examining the single neuron tuning properties across different moving target conditions, finding several classes of neurons: those that shift their preferred direction, those that change their modulation gain, and those that shift their baseline firing rates. The authors go on to find an interesting, tilted ring structure of the neural population activity, depending on the target speed, and find that (1) the reach direction has consistent positioning around the ring, and (2) the tilt of the ring is highly predictive of the target movement speed. The authors then model the neural activity with a single neuron representational model and a recurrent neural network model, concluding that this population structure requires a mixture of the three types of single neurons described at the beginning of the manuscript.

      Strengths:

      I find the task the authors present here to be novel and exciting. It slots nicely into an overall trend to break away from a simple reach-to-static-target task to better characterize the breadth of how the motor cortex generates movements. I also appreciate the movement from single neuron characterization to population activity exploration, which generally serves to anchor the results and make them concrete. Further, the orbital ring structure of population activity is fascinating, and the modeling work at the end serves as a useful baseline control to see how it might arise.

      Thank you for your recognition of our work.

      Weaknesses:

      While I find the behavioral task presented here to be excitingly novel, I find the presented analyses and results to be far less interesting than they could be. Key to this, I think, is that the authors are examining this task and related neural activity primarily with a single-neuron representational lens. This would be fine as an initial analysis since the population activity is of course composed of individual neurons, but the field seems to have largely moved towards a more abstract "computation through dynamics" framework that has, in the last several years, provided much more understanding of motor control than the representational framework has. As the manuscript stands now, I'm not entirely sure what interpretation to take away from the representational conclusions the authors made (i.e. the fact that the orbital population geometry arises from a mixture of different tuning types). As such, by the end of the manuscript, I'm not sure I understand any better how the motor cortex or its neural geometry might be contributing to the execution of this novel task.

      This paper shows sensory modulation of motor tuning in single units and in the neural population during the motor execution period. It is a pity that the findings were confined to certain time windows. We are still working on this task and hope to address this in follow-up work.

      Main Comments:

      My main suggestions to the authors revolve around bringing in the computation-through-dynamics framework to strengthen their population results. The authors cite the Vyas et al. review paper on the subject, so I believe they are aware of this framework. I have three suggestions for improving or adding to the population results:

      (1) Examination of delay period activity: one of the most interesting aspects of the task was the fact that the monkey had a random-length delay period before he could move to intercept the target. Presumably, the monkey had to prepare to intercept at any time between 400 and 800 ms, which means that there may be some interesting preparatory activity dynamics during this period. For example, after 400ms, does the preparatory activity rotate with the target such that once the go cue happens, the correct interception can be executed? There is some analysis of the delay period population activity in the supplement, but it doesn't quite get at the question of how the interception movement is prepared. This is perhaps the most interesting question that can be asked with this experiment, and it's one that I think may be quite novel for the field--it is a shame that it isn't discussed.

      It’s a great idea! We are on the way, and it seems promising.

      (2) Supervised examination of population structure via potent and null spaces: simply examining the first three principal components revealed an orbital structure, with a seemingly conserved motor output space and a dimension orthogonal to it that relates to the visual input. However, the authors don't push this insight any further. One way to do that would be to find the "potent space" of motor cortical activity by regression to the arm movement and examine how the tilted rings look in that space (this is actually fairly easy to see in the reach direction components of the dPCA plot in the supplement--the rings will be highly aligned in this space). Presumably, then, the null space should contain information about the target movement. dPCA shows that there's not a single dimension that clearly delineates target speed, but the ring tilt is likely evident if the authors look at the highest variance neural dimension orthogonal to the potent space (the "null space")--this is akin to PC3 in the current figures, but it would be nice to see what comes out when you look in the data for it.

      Thank you for this nice suggestion. While it was feasible to identify potent subspaces encoding reach direction and null spaces for target-velocity modulation, as suggested by the reviewer, the challenge remained that unsupervised methods were insufficient to isolate a pure target-velocity subspace from numerous possible candidates due to the small variance of target-velocity information. Although dPCA components can be used to construct orthogonal subspaces for individual task variables, we found that the target-velocity information remained highly entangled with reach-direction representation. More details can be found in Figure S8C and its caption as below:

      “We used dPCA components with different features to construct three subspaces (same data in A, reach-direction space #3, #4, #5; target-velocity space #10, #15, #17; interaction space #6, #11, #12), and we projected trial-averaged data into these orthogonal subspaces using different colormaps. This approach allowed us to obtain a “potent subspace” coding reach direction and a “null space” for target velocity. The results showed that the reach-direction subspace effectively represented the reach direction. However, while the target-velocity subspace encoded the target velocity information, it still contained reach-direction clusters within each target-velocity condition, corroborating the results of the addition model in the main text (Figure 4). The interaction subspace revealed that multiple reach-direction rings were nested within each other, similar to the findings from the gain model (Figure 3 & 4). The interaction subspace also captured more variance than target-velocity subspace, consistent with our PCA results, suggesting the target-velocity modulation primarily coexists with reach-direction coding. Furthermore, we explored alternative methods to verify whether orthogonal subspaces could effectively separate the reach direction and target velocity. We could easily identify the reach-direction subspace, but its orthogonal subspace was relatively large, and the target-velocity information exhibited only small variance, making it difficult to isolate a subspace that purely encodes target velocity.”
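
      For readers unfamiliar with the regression-based definition of output-potent and output-null spaces raised in this exchange, the following minimal sketch shows one common construction. It is an illustration under stated assumptions, not the analysis performed in the manuscript: it assumes a linear readout from neural activity to hand velocity fit by least squares, and the function name and simulated data are hypothetical.

```python
import numpy as np

def potent_null_bases(X, V):
    """Split neural state space into output-potent and output-null parts.

    X: (samples, neurons) neural activity; V: (samples, k) hand velocity.
    A linear readout W (neurons x k) is fit by least squares. The potent
    space is the column space of W; the null space is its orthogonal
    complement, so activity confined to it does not change the readout.
    """
    Xc = X - X.mean(axis=0)
    Vc = V - V.mean(axis=0)
    W, *_ = np.linalg.lstsq(Xc, Vc, rcond=None)
    U, s, _ = np.linalg.svd(W, full_matrices=True)
    k = int(np.sum(s > 1e-10 * s.max()))  # numerical rank of the readout
    return W, U[:, :k], U[:, k:]

# Hypothetical data: 10 "neurons", 2-D velocity generated by a linear readout.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
V = X @ rng.normal(size=(10, 2)) + 0.1 * rng.normal(size=(200, 2))

W, potent, null = potent_null_bases(X, V)
Xc = X - X.mean(axis=0)
potent_var = np.var(Xc @ potent, axis=0).sum()
null_var = np.var(Xc @ null, axis=0).sum()
print(potent.shape, null.shape, potent_var, null_var)
```

      In this construction the null space has many more dimensions than the potent space (eight versus two here), which mirrors the authors' remark that the orthogonal complement of the reach-direction subspace is large and that a low-variance signal inside it is hard to isolate.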

      (3) RNN perturbations: as it's currently written, the RNN modeling has promise, but the perturbations performed don't provide me with much insight. I think this is because the authors are trying to use the RNN to interpret the single neuron tuning, but it's unclear to me what was learned from perturbing the connectivity between what seems to me almost arbitrary groups of neurons (especially considering that 43% of nodes were unclassifiable). It seems to me that a better perturbation might be to move the neural state before the movement onset to see how it changes the output. For example, the authors could move the neural state from one tilted ring to another to see if the virtual hand then reaches a completely different (yet predictable) target. Moreover, if the authors can more clearly characterize the preparatory movement, perhaps perturbations in the delay period would provide even more insight into how the interception might be prepared.

      We are sorry that we did not clarify the definition of the “none” type, which can be misleading. The 43% of unclassifiable nodes include inactive ones; when only active (task-related) nodes are included, the ratio of unclassifiable nodes is much lower. We recomputed the ratios with only active units and have updated Table 1. By perturbing the connectivity, we intended to explore the interaction between different modulations.

      Thank you for the great advice. We considered moving neural states from one ring to another without changing the directional cluster. However, we found that this perturbation design might not be fully developed: since the top two PCs are highly correlated with movement direction, such a move—similar to exchanging two states within the same cluster but under different target-motion conditions—would presumably not affect the behavior.

      Reviewer #3 (Public Review):

      Summary:

      This experimental study investigates the influence of sensory information on neural population activity in M1 during a delayed reaching task. In the experiment, monkeys are trained to perform a delayed interception reach task, in which the goal is to intercept a potentially moving target.

      This paradigm allows the authors to investigate how, given a fixed reach endpoint (which is assumed to correspond to a fixed motor output), the sensory information regarding the target motion is encoded in neural activity.

      At the level of single neurons, the authors found that target motion modulates the activity in three main ways: gain modulation (scaling of the neural activity depending on the target direction), shift (shift of the preferred direction of neurons tuned to reach direction), or addition (offset to the neural activity).

      At the level of the neural population, target motion information was largely encoded along the 3rd PC of the neural activity, leading to a tilt of the manifold along which reach direction was encoded that was proportional to the target speed. The tilt of the neural manifold was found to be largely driven by the variation of activity of the population of gain-modulated neurons.

      Finally, the authors studied the behaviour of an RNN trained to generate the correct hand velocity given the sensory input and reach direction. The RNN units were found to similarly exhibit mixed selectivity to the sensory information, and the geometry of the “neural population” resembled that observed in the monkeys.

      Strengths:

      - The experiment is well set up to address the question of how sensory information that is directly relevant to the behaviour but does not lead to a direct change in behavioural output modulates motor cortical activity.

      - The finding that sensory information modulates the neural activity in M1 during motor preparation and execution is nontrivial, given that this modulation of the activity must occur in the nullspace of the movement.

      - The paper gives a complete picture of the effect of the target motion on neural activity, by including analyses at the single neuron level as well as at the population level. Additionally, the authors link those two levels of representation by highlighting how gain modulation contributes to shaping the population representation.

      Thank you for your recognition.

      Weaknesses:

      - One of the main premises of the paper is the fact that the motor output for a given reach point is preserved across different target motions. However, as the authors briefly mention in the conclusion, they did not record muscle activity during the task, but only hand velocity, making it impossible to directly verify how preserved muscle patterns were across movements. While the authors highlight that they did not see any difference in their results when resampling the data to control for similar hand velocities across conditions, this seems like an important potential caveat of the paper whose implications should be discussed further or highlighted earlier in the paper.

      Thanks for the suggestion. We’ve highlighted the resampling results as an important control in the revised manuscript in Figure S11 and Lines 257-260 as:

      “To eliminate hand-speed effect, we resampled trials to construct a new dataset with similar distributions of hand speed in each target-motion condition and found similar orbital neural geometry. Moreover, the target-motion gain model provided a better explanation compared to the hand-speed gain model (Figure S11).”
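
      The resampling control quoted above can be sketched as follows. This is a minimal illustration assuming a simple histogram-matching scheme, which the manuscript excerpt does not specify: trials are binned by hand speed and each condition is subsampled to the smallest per-bin count. The function name, bin count, and simulated speeds are hypothetical.

```python
import numpy as np

def match_speed_distributions(speed, cond, n_bins=10, seed=0):
    """Return trial indices with matched hand-speed histograms across conditions.

    Within each speed bin, every condition is randomly subsampled (without
    replacement) to the smallest trial count found in that bin.
    """
    rng = np.random.default_rng(seed)
    edges = np.linspace(speed.min(), speed.max(), n_bins + 1)
    # digitize returns 1..n_bins+1; shift to 0-based and fold the max into the last bin
    bin_idx = np.clip(np.digitize(speed, edges) - 1, 0, n_bins - 1)
    conds = np.unique(cond)
    keep = []
    for b in range(n_bins):
        per_cond = [np.flatnonzero((bin_idx == b) & (cond == c)) for c in conds]
        n_keep = min(len(ix) for ix in per_cond)
        if n_keep == 0:
            continue  # a bin missing from any condition is dropped entirely
        for ix in per_cond:
            keep.extend(rng.choice(ix, size=n_keep, replace=False))
    return np.sort(np.array(keep, dtype=int))

# Hypothetical data: two target-motion conditions with different mean hand speed.
rng = np.random.default_rng(0)
speed = np.concatenate([rng.normal(1.0, 0.2, 500), rng.normal(1.3, 0.2, 500)])
cond = np.repeat([0, 1], 500)

keep = match_speed_distributions(speed, cond)
for c in (0, 1):
    sel = keep[cond[keep] == c]
    print(c, sel.size, speed[sel].mean())
```

      The matched dataset is smaller than the original, and the match is only as fine as the bin width, so narrower bins trade residual speed differences against the number of trials retained.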

      - The main takeaway of the RNN analysis is not fully clear. The authors find that an RNN trained given a sensory input representing a moving target displays modulation to target motion that resembles what is seen in real data. This is interesting, but the authors do not dissect why this representation arises, and how robust it is to various task design choices. For instance, it appears that the network should be able to solve the task using only the motion intention input, which contains the reach endpoint information. If the target motion input is not used for the task, it is not obvious why the RNN units would be modulated by this input (especially as this modulation must lie in the nullspace of the movement hand velocity if the velocity depends only on the reach endpoint). It would thus be important to see alternative models compared to true neural activity, in addition to the model currently included in the paper. Besides, for the model in the paper, it would therefore be interesting to study further how the details of the network setup (eg initial spectral radius of the connectivity, weight regularization, or using only the target position input) affect the modulation by the motion input, as well as the trained population geometry and the relative ratios of modulated cells after training.

      Great suggestions. In the revised manuscript, we’ve added the results of three alternative models in Table S4 and Lines 355-365 as below:

      “We also tested three alternative network models: (1) only receives motor intention and a GO-signal; (2) only receives target location and a GO-signal; (3) initialized with sparse connection (sparsity=0.1); the unmentioned settings and training strategies were as the same as those for original models (Table S4; see Methods). The results showed that the three modulations could emerge in these models as well, but with obviously distinctive distributions. In (1), the ring-like structure became overlapped rings parallel to the PC1-PC2 plane or barrel-like structure instead; in (2), the target-motion related tilting tendency of the neural states remained, but the projection of the neural states on the PC1-PC2 plane was distorted and the reach-direction clusters dispersed. These implies that both motor intention and target location seem to be needed for the proposed ring-like structure. The initialization of connection weights of the hidden layer can influence the network’s performance and neural state structure, even so, the ring-like structure”

      - Additionally, it is unclear what insights are gained from the perturbations to the network connectivity the authors perform, as it is generally expected that modulating the connectivity will degrade task performance and the geometry of the responses. If the authors wish to make claims about the role of the subpopulations, it could be interesting to test whether similar connectivity patterns develop in networks that are not initialized with an all-to-all random connectivity or to use ablation experiments to investigate whether the presence of multiple types of modulations confers any sort of robustness to the network.

      Thank you for these great suggestions. By perturbations, we intended to explore the contribution of interaction between certain subpopulations. We’ve included the ablation experiments in the updated manuscript in Table S3 and Lines 344-346 as below: “The ablation experiments showed that losing any kind of modulation nodes would largely deteriorate the performance, and those nodes merely with PD-shift modulation could mostly impact the neural state structure (Table S3).”

      - The results suggest that the observed changes in motor cortical activity with target velocity result from M1 activity receiving an input that encodes the velocity information. This also appears to be the assumption in the RNN model. However, even though the input shown to the animal during preparation is indeed a continuously moving target, it appears that the only relevant quantity to the actual movement is the final endpoint of the reach. While this would have to be a function of the target velocity, one could imagine that the computation of where the monkeys should reach might be performed upstream of the motor cortex, in which case the actual target velocity would become irrelevant to the final motor output. This makes the results of the paper very interesting, but it would be nice if the authors could discuss further when one might expect to see modulation by sensory information that does not directly affect motor output in M1, and where those inputs may come from. It may also be interesting to discuss how the findings relate to previous work that has found behaviourally irrelevant information is being filtered out from M1 (for instance, Russo et al, Neuron 2020 found that in monkeys performing a cycling task, context can be decoded from SMA but not from M1, and Wang et al, Nature Communications 2019 found that perceptual information could not be decoded from PMd)?

      How and where sensory information modulates M1 are very interesting, open questions. In the revised manuscript, we discuss these in Lines 435-446, as below: “It would be interesting to explore whether other motor areas also allow sensory modulation during flexible interception. The functional differences between M1 and other areas lead to uncertain speculations. Although M1 has pre-movement activity, it is more related to task variables and motor outputs. Recently, a cycling task sets a good example that the supplementary motor area (SMA) encodes context information and the entire movement (Russo et al., 2020), while M1 preferably relates to cycling velocity (Saxena et al., 2022). The dorsal premotor area (PMd) has been reported to capture potential action selection and task probability, while M1 not (Cisek and Kalaska, 2005; Glaser et al., 2018; Wang et al., 2019). If the neural dynamics of other frontal motor areas are revealed, we might be able to tell whether the orbital neural geometry of mixed selectivity is unique in M1, or it is just inherited from upstream areas like PMd. Either outcome would provide us some insights into understanding the interaction between M1 and other frontal motor areas in motor planning.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      At times the writing was a little hard to parse. It could benefit from being fleshed out a bit to link sentences together better.

      There are a few grammatical errors, such as:

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact each other, so the PD-shift nodes should not be neglected."

      should be

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact WITH each other, so the PDshift nodes should not be neglected."

      The discussion could also be more extensive to benefit non-experts in the field.

      Thank you. We have proofread and polished the updated manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Other comments:

      - The authors mention mixed selectivity a few times, but Table 1 doesn't have a column for mixed selective neurons--this seems like an important oversight. Likewise, it would be good to see an example of a "mixed" neuron.

      - The structure of the writing in the results section often talked about the supplementary results before the main results - this seems backwards. If the supplementary results are important enough to come before the main figures, then they should not be supplementary. Otherwise, if the results are truly supplementary, they should come after the main results are discussed.

      - Line 305: Authors say "most" RNN units could be classified, and this is technically true, but only barely, according to Table 1. It might be good to put the actual percentage here in the text.

      - Figure 5a: typo ("Motion intention" rather than "Motor")

      - I couldn't find any mention of code or data availability in the manuscript.

      - There were a number of lines that didn't make much sense to me and should probably be rewritten or expanded on:

      - Lines 167-168: "These results qualitatively imply the interaction as that target speeds..."

      - Lines 178-179: "However, these neural trajectories were not yet the ideal description, because they were shaped mostly by time."

      - Lines 187-188: "...suggesting that target motion affects M1 neural dynamics via a topologically invariant transformation."

      - Lines 224-226: "Note that here we performed an linear transformation on all resulting neural state points to make the ellipse of the static condition orthogonal to the z-axis for better visualization." Does this mean that the z-axis is not PC 3 anymore?

      - Lines 272-274: "These simulations suggest that the existence of PD-shift and additive modulation would not disrupt the neural geometry that is primarily driven by gain modulation; rather it is possible that these three modulations support each other in a mixed population."

      Thank you for these detailed suggestions. By “mixed selectivity”, we mean the joint tuning to both target motion and movement. In this sense, the target-motion modulated neurons (regardless of the modulation type) have mixed selectivity. The term “motor intention” follows Mazzoni et al., 1996, Journal of Neurophysiology. We also revised the manuscript for better readability.

      We have updated the data and code availability in Data availability as below:

      “The example experimental datasets and relevant analysis code have been deposited in Mendeley Data at https://data.mendeley.com/datasets/8gngr6tphf. The RNN relevant code and example model datasets are available at https://github.com/yunchenyc/RNN_ringlike_structure.”

      Reviewer #3 (Recommendations For The Authors):

      Minor typos:

      Line 153: “there were”

      Line 301: “network was trained to generate”

      Line 318: “interact with each other”

      Suggested reformulations :

      Line 310: “tilting angles followed a pattern similar to that seen in the data”

      Line 187: the claim of a “topologically invariant transformation” seems strong as the analysis is quite qualitative.

      Suggested changes to the paper (aside from those mentioned in the main review): It could be nice to show behaviour in a main figure panel early on in the paper. This could help with the task description (as it would directly show how the trials are separated based on endpoint) and could allow for discussing the potential caveats of the assumption that behaviour is preserved.

      Thank you. We have corrected these typos and writing problems. As a similar task design has been reported previously, we ultimately decided not to provide extra figures or videos. Still, we thank the reviewer for this nice suggestion.

    1. eLife Assessment

      This valuable manuscript describes the immunogenicity of a bead-on-a-string immunogen that allows the inclusion of multiple HA subtypes. The evidence to support the claims is convincing, and more importantly, this approach could be adapted to other vaccine platforms.

    2. Reviewer #2 (Public review):

      Summary:

      The authors describe a "beads-on-a-string" (BOAS) immunogen, where they link, using a non-flexible glycine linker, up to eight distinct hemagglutinin (HA) head domains from circulating and non-circulating influenzas and assess their immunogenicity. They also display some of their immunogens on ferritin NP and compare the immunogenicity. They conclude that this new platform can be useful to elicit robust immune responses to multiple influenza subtypes using one immunogen and that it can also be used for other viral proteins.

      Strengths:

      The paper is clearly written. While the use of flexible linkers has been used many times, this particular approach (linking different HA subtypes in the same construct resembling adding beads on a string, as the authors describe their display platform) is novel and could be of interest.

      Comments on revisions:

      The authors have addressed most comments. Some mistakes/issues remain:

      TI should be defined earlier on line 61 not on line 196

      No legend for Figure 3E - it looks like this is where the authors did the first immunization with the "mix" to compare to the BOAs but strangely they do not mention this in the response to reviewers letter and only mention fig 6G and 7

      Maybe add "mix" to the title of Figure 3?

      In Figure 6G they do show the response to the mix but do not mention it in the immunizations for that figure. Also weird because obviously the mix is not a NP while this figure addresses NP format.

      Line 796 - pseudo viruses

      The authors should add some clarification in the paper as they did in response to reviewers.

    3. Reviewer #3 (Public review):

      This work describes the tandem linkage of influenza hemagglutinin (HA) receptor binding domains of diverse subtypes to create 'beads on a string' (BOAS) immunogens. They show that these immunogens elicit ELISA binding titers against full-length HA trimers in mice, as well as varying degrees of vaccine mismatched responses and neutralization titers. They also compare these to BOAS conjugated on ferritin nanoparticles and find that this did not largely improve immune responses. This work offers a new type of vaccine platform for influenza vaccines, and this could be useful for further studies on the effects of conformation and immunodominance on the resulting immune response. 

      Overall, the central claims of immunogenicity in a murine model of the BOAS immunogens described here are supported by the data. 

      Strengths included the adaptability of the approach to include several, diverse subtypes of HAs. The determination of an optimal composition of strains in the 5-BOAS that overall yielded the best immune responses was an interesting finding and one that could also be adapted to other vaccine platforms. Lastly, as the authors discuss, the ease of translation to an mRNA vaccine is indeed a strength of this platform. 

      One interesting and counter-intuitive result is the high levels of neutralization titers seen to vaccine-mismatched, group 2 H7 in the 5-BOAS group that differs from the 4-BOAS with the addition of a group 1 H5 RBD. At the same time, no H5 neutralization titers were observed for any of the BOAS immunogens, yet they were seen for the BOAS-NP. Uncovering where these immune responses are being directed and why these discrepancies are being observed would be informative future work. 

      There are a few caveats in the data that should be noted: 

      (1) 20 ug is a pretty high dose for a mouse and the majority of the serology presented is after 3 doses at 20 ug. By comparison, 0.5-5 ug is a more typical range (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380945/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980174/). Also, the authors state that 20 ug per immunogen was used, including for the BOAS-NP group, which would mean that the BOAS-NP group was given a lower gram dose of HA RBD relative to the BOAS groups.

      (2) Serum was pooled from all animals per group for neutralization assays, instead of testing individual animals. This could mean that a single animal with higher immune responses than the rest in the group could dominate the signal and potentially skew the interpretation of this data. 

      (3) In Figure S2, it looks like an apparent increase in MW by changing the order of strains here, which may be due to differences in glycosylation. Further analysis would be needed to determine if there are discrepancies in glycosylation amongst the BOAS immunogens and how those differ from native HAs. 

      Comments on revisions:

      The authors have addressed all concerns upon revision.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript by Thornlow Lamson et al., the authors develop a "beads-on-a-string" or BOAS strategy to link diverse hemagglutinin head domains, to elicit broadly protective antibody responses. The authors are able to generate varying formulations and lengths of the BOAS, and immunization of mice shows induction of antibodies against a broad range of influenza subtypes. However, several major concerns are raised, including the stability of the BOAS, the use of only 3 mice for most immunization experiments, and the absence of important controls and analyses addressing how the BOAS alone, rather than the inclusion of diverse heads, impacts humoral immunity.

      Strengths:

      Vaccine strategy is new and exciting.

      Analyses were performed to support conclusions and improve paper quality.

      Weaknesses:

      Controls for how different hemagglutinin heads impact immunity versus the multivalency of the BOAS.

      Only 3 mice were used for most experiments.

      There were limited details on size exclusion data.

      We appreciate the reviewer’s comments and have made the following changes to the manuscript.

      (1) We recognize that deconvoluting the effect of including a diverse set of HA heads and multivalency in the BOAS immunogens is necessary to understand the impact on antigenicity. Therefore, we now include a cocktail of the identical eight HA heads used in the 8-mer and BOAS nanoparticle (NP) as an additional control group. While we observed similar HA binding titers relative to the 8-mer and BOAS NP groups, sera elicited in the cocktail group were unable to neutralize any of the viruses tested; multivalency thus appears to be important for eliciting neutralizing responses.

      (2) We increased the sample size by repeated immunizations with n=5 mice, for a total of n=8 mice across two independent experiments.

      (3) We expanded the details on size exclusion data to include:

      a) extended chromatograms from Figure 2C as Supplemental Figure 3.

      b) additional details in the materials and methods section (lines 370-372):

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a "beads-on-a-string" (BOAS) immunogen, where they link, using a non-flexible glycine linker, up to eight distinct hemagglutinin (HA) head domains from circulating and non-circulating influenzas and assess their immunogenicity. They also display some of their immunogens on ferritin NP and compare the immunogenicity. They conclude that this new platform can be useful to elicit robust immune responses to multiple influenza subtypes using one immunogen and that it can also be used for other viral proteins.

      Strengths:

      The paper is clearly written. While the use of flexible linkers has been used many times, this particular approach (linking different HA subtypes in the same construct resembling adding beads on a string, as the authors describe their display platform) is novel and could be of interest.

      Weaknesses:

      The authors did not compare to individual HAs immunized as cocktails and did not compare to other mosaic NPs published earlier. It is thus difficult to assess how their BOAS compare.

      Other weaknesses include the missing rationale as to why these subtypes were chosen and the missing explanation of why there are different sizes of the HA1 construct (apart from expression). Have the authors tried other lengths? Have they expressed all of them as FL HA1?

      We appreciate the reviewer’s comments. We responded to the concerns below and modified the manuscript accordingly.

      (1) We recognize that including a “cocktail” control is important to understand how the multivalency present in a single immunogen affects the immune response. We now include an additional control group comprising a mixture of the same eight HA heads used in the 8-mer and the BOAS nanoparticle (NP). While this cocktail elicited similar HA binding titers relative to the 8-mer and BOAS NP immunogens (Fig. 6G), there was no detectable neutralization of any of the viruses tested (Fig. 7).

      (2) In the introduction we reference other multivalent display platforms but acknowledge that distinct differences in their immunogen design platforms make direct comparisons to ours difficult, which is ultimately why we did not use them as comparators for our in vivo studies. Perhaps most directly relevant to our BOAS platform is the mosaic HA NP from Kanekiyo et al. (PMID 30742080). Here, HA heads, with similar boundaries to ours, were selected from historical H1N1 strains. These NPs, however, were significantly less antigenically diverse relative to our BOAS NPs as they did not include any group 2 (e.g., H7, H9) or B influenza HAs; restricting their multivalent display to group 1 H1N1s was likely an important factor in how they were able to achieve broad, neutralizing H1N1 responses. Additionally, Cohen et al. (PMID 33661993) used similarly antigenically distinct HAs in their mosaic NP, though these included full-length HAs with the conserved stem region, which likely has a significant impact on the elicited cross-reactive responses observed. Lastly, we reference Hills et al. (PMID 38710880), where the authors designed similar NPs with four tandemly-linked betacoronavirus receptor binding domains (RBDs) to make “quartets”. In contrast to our observations, the authors observed increased binding and neutralization titers following conjugation to protein-based NPs. We acknowledge potential differences between the studies, such as the antigen and larger VLP NP, that could lead to the different observed outcomes.

      (3) We intended to highlight the “plug-and-play” nature of the BOAS platform; theoretically any HA subtype could be interchanged into the BOAS. To that end, our rationale for selecting the HA subtypes in our proof-of-principle immunogen was to include an antigenically diverse set of circulating and non-circulating HAs that we could ultimately characterize with previously published subtype-specific antibodies that were also conformation-specific. These diagnostic antibodies could then confirm the presence and conformational integrity of each component. We intentionally did not include HA subtypes for which we did not have a conformation-specific antibody.

      The different sizes of the HA head domains were determined exclusively by expression of the recombinant protein. We have not attempted expression of full-length HA1 domains. Furthermore, we have not attempted to express the full-length HA (inclusive of HA1 and HA2) in our BOAS platform. The primary reason was to avoid including the conserved stem region of HA2, which may distract from the HA1 epitopes (e.g., receptor binding site, lateral patch) that can be engaged by broadly neutralizing antibodies. Additionally, the full-length HA is inherently trimeric and may not be as amenable to our BOAS platform as the monomeric HA1 head domain.

      Reviewer #3 (Public Review):

      This work describes the tandem linkage of influenza hemagglutinin (HA) receptor binding domains of diverse subtypes to create 'beads on a string' (BOAS) immunogens. They show that these immunogens elicit ELISA binding titers against full-length HA trimers in mice, as well as varying degrees of vaccine mismatched responses and neutralization titers. They also compare these to BOAS conjugated on ferritin nanoparticles and find that this did not largely improve immune responses. This work offers a new type of vaccine platform for influenza vaccines, and this could be useful for further studies on the effects of conformation and immunodominance on the resulting immune response.

      Overall, the central claims of immunogenicity in a murine model of the BOAS immunogens described here are supported by the data.

      Strengths included the adaptability of the approach to include several, diverse subtypes of HAs. The determination of the optimal composition of strains in the 5-BOAS that overall yielded the best immune responses was an interesting finding and one that could also be adapted to other vaccine platforms. Lastly, as the authors discuss, the ease of translation to an mRNA vaccine is indeed a strength of this platform.

      One interesting and counter-intuitive result is the high levels of neutralization titers seen in vaccine-mismatched, group 2 H7 in the 5-BOAS group that differs from the 4-BOAS with the addition of a group 1 H5 RBD. At the same time, no H5 neutralization titers were observed for any of the BOAS immunogens, yet they were seen for the BOAS-NP. Uncovering where these immune responses are being directed and why these discrepancies are being observed would constitute informative future work.

      There are a few caveats in the data that should be noted:

      (1) 20 ug is a pretty high dose for a mouse and the majority of the serology presented is after 3 doses at 20 ug. By comparison, 0.5-5 ug is a more typical range (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380945/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980174/). Also, the authors state that 20 ug per immunogen was used, including for the BOAS-NP group, which would mean that the BOAS-NP group was given a lower gram dose of HA RBD relative to the BOAS groups.

      We agree that this is on the “upper end” of recombinant protein dose. While we did not do a dose-response, we now include serum analyses after a single prime. The overall trends and reactivity to matched and mis-matched BOAS components remained similar across d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which reinforces our observation that the multivalency of the HA heads is necessary for eliciting robust serum responses to each component. These data are included in Supplemental Figure 5, and we’ve modified the text (lines 185-187) to include:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      Additionally, we acknowledge that there is a size discrepancy between the BOAS NP and the largest BOAS, leading to an approximately 15-fold difference on a per mole basis of the BOAS immunogen. The smallest and largest BOAS also differ by ~2.5-fold on a per mole basis; this could favor the overall amount of the smaller immunogens. However, because vaccine doses are typically calculated on a mg per kg basis, we did not calculate on a molar basis for this study. Any promising immunogens will be evaluated in a dose-response study to optimize elicited responses.
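
      To make the mass-versus-molar point concrete, the sketch below converts a fixed mass dose into moles for two immunogens of different size. The per-head mass (`HYPOTHETICAL_HEAD_KDA`) and the function names are illustrative assumptions, not values or methods reported by the authors, and linkers and glycans are ignored.

```python
# Illustrative only: the per-head mass is a hypothetical placeholder,
# not a molecular weight reported in the manuscript or the response.
HYPOTHETICAL_HEAD_KDA = 25.0  # assumed mass of one HA head domain, in kDa


def nmol_from_mass_dose(dose_ug: float, mw_kda: float) -> float:
    """Convert a mass dose in micrograms to nanomoles.

    1 kDa = 1e3 g/mol and 1 ug = 1e-6 g, so ug / kDa = 1e-9 mol = 1 nmol.
    """
    return dose_ug / mw_kda


def molar_fold_difference(mw_small_kda: float, mw_large_kda: float) -> float:
    """Molar excess of the smaller immunogen when both receive the same mass dose."""
    return mw_large_kda / mw_small_kda


dose_ug = 20.0                        # mass dose stated in the review
mw_3mer = 3 * HYPOTHETICAL_HEAD_KDA   # assumed mass of a 3-head BOAS
mw_8mer = 8 * HYPOTHETICAL_HEAD_KDA   # assumed mass of an 8-head BOAS

print(f"3mer: {nmol_from_mass_dose(dose_ug, mw_3mer):.3f} nmol")
print(f"8mer: {nmol_from_mass_dose(dose_ug, mw_8mer):.3f} nmol")
print(f"fold difference: {molar_fold_difference(mw_3mer, mw_8mer):.2f}")
```

      Under these assumptions the fold difference depends only on the ratio of molecular weights (here 8/3), which is of the same order as the ~2.5-fold figure quoted above; the exact value depends on the true construct masses.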

      (2) Serum was pooled from all animals per group for neutralization assays, instead of testing individual animals. This could mean that a single animal with higher immune responses than the rest in the group could dominate the signal and potentially skew the interpretation of this data.

      We repeated the neutralization assays with data points for individual mice. There does appear to be variability in the immune response between mice. This is most noticeable for responses to the H5 component. We are currently assessing what properties of our BOAS immunogen might contribute to the variability across individual mice.

      (3) In Figure S2, it looks like an apparent increase in MW by changing the order of strains here, which may be due to differences in glycosylation. Further analysis would be needed to determine if there are discrepancies in glycosylation amongst the BOAS immunogens and how those differ from native HAs.

      There does appear to be a relatively small difference in MW between the two BOAS configurations shown in Figure S2. This could be due to differences in glycosylation, as the reviewer points out, and in future studies, we intend to assess the influence of native glycosylation on antibody responses elicited by our BOAS immunogens.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Concerns

      (1) From Figure 2D-E, it looks like BOAS are forming clusters, rather than a straight line. Do these form aggregates over time? Both at 4 degrees over a few days or after freeze-thaw cycle(s)? It is unclear from the SEC methods how long after purification this was performed and stability should be considered.

      Due to the inherent flexibility of the Gly-Ser linker between each component, we do not anticipate that any rigidity would be imposed resulting in a “straight line”. Nevertheless, we appreciate the reviewer’s concern about the long-term stability of the BOAS immunogens. To address this, we include 1) the extended chromatograms from Figure 2C as Supplemental Figure 3 to show any aggregates present, 2) traces from up to 48 hours post-IMAC, and 3) chromatograms following a freeze-thaw cycle. Post-IMAC purification, there is a minor peak (<10% of total peak height) at ~9 mL corresponding to aggregation. Note that we excluded this aggregated fraction from immunizations. Following a freeze-thaw cycle, the BOAS maintain a homogeneous peak immediately (<24 hrs) after thawing, with no significant (<10%) aggregation or degradation peak. However, after ~1 week at 4C post-freeze-thaw, additional peaks in the chromatogram correspond to degradation of the BOAS.

      We modified the materials and methods section to state (lines 370-372):

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      We commented on BOAS stability in the results section (lines 142-148):

      “Following SEC, affinity tags were removed with HRV-3C protease; cleaved tags, uncleaved BOAS, and His-tagged enzyme were removed using cobalt affinity resin and snap frozen in liquid nitrogen before immunizations. BOAS maintained monodispersity upon thawing, though over time, degradation was observed following longer term (>1 week) storage at 4C (Supplemental Figure 3). This degradation became more significant as BOAS increased in length (Supplemental Figure 3).”

      We also included in the discussion (lines 277-279):

      “Notably, for longer BOAS we observed degradation following longer term storage at 4C, which may reflect their overall stability.”

      (2) Figures 3-4 and 6-7, to make conclusions off of 3 mice per group is inappropriate. A sample size calculation should have been conducted and the appropriate number of mice tested. In addition, two independent mouse experiments should always be performed. Moreover, the reliability of the statistical tests performed seems unlikely, given the very small sample size.

      We agree that additional mice are necessary to make assessments regarding immunogenicity and cross-reactivity differences between the immunogens. To address this, we repeated the immunization with 5 additional mice, for a total of n=8 mice over two independent experiments. We incorporated these data into Figure 3B-D, as well as an additional Figure 3E (see below). We also now report the log-transformed endpoint titer (EPT) values rather than reciprocal EC50 values and added clarity to statistical analyses used. We have added the following lines to the methods section

      lines 427-431:

      “Serum endpoint titer (EPT) were determined using a non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to determine the dilution at which dilution the blank-subtracted 450nm absorbance value intersect a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log transformed values of the EPT dilution.”

      lines 406-408:

      “C57BL/6 mice (Jackson Laboratory) (n=8 per group for 3-, 4-, 5-, 6-, 7-, and 8mer cohorts; n=5 for BOAS NP, NP, and mix cohorts) were immunized with 20µg of BOAS immunogens of varying length and adjuvanted with 50% Sigmas Adjuvant for a total of 100µL of inoculum.”

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays were determined using Prism (GraphPad Prism v10.2.3). ELISAs comparing serum reactivity and microneutralization and comparing >2 samples were analyzed using a Kruskal-Wallis test with Dunn’s post-hoc test to correct for multiple comparisons. Multiple comparisons were made between each possible combination or relative to a control group, where indicated. ELISAs comparing two samples were analyzed using a Mann-Whitney test. Significance was assigned with the following: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”
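
      As a rough illustration of the statistics described in the quoted methods, the sketch below computes the Kruskal-Wallis H statistic (with the standard tie correction) and maps a p-value to the asterisk notation listed above. The authors used GraphPad Prism; this standalone code, its function names, and the "ns" label for non-significant results are assumptions for illustration only. It does not compute p-values or Dunn's post-hoc comparisons.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def significance_stars(p):
    """Map a p-value to the asterisk notation quoted in the methods."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # assumed label; the source only says "non-significant"


# Made-up titers for three hypothetical cohorts (not data from the study).
example_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(kruskal_wallis_h(example_groups))
print(significance_stars(0.03))
```

      H is compared against a chi-squared distribution with (number of groups - 1) degrees of freedom to obtain a p-value; with small cohorts such as n=5 that approximation is coarse, which is one reason to rely on a dedicated statistics package for the actual analysis.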

      (3) One critical control that is missing is a homogenous BOAS, for example, just linking one H1 on a BOAS. Does oligomerization and increasing avidity alone improve humoral immunity?

      We agree that this is an interesting point. To address the impact of oligomerization and avidity on humoral immunity, we now include an additional control with a cocktail of the HA heads used in the 8mer. We have incorporated this into Figure 3A, 3D and 3E, Figure 6G, and Figure 7.

      Additionally, we have added the following lines in the manuscript:

      lines 38-40:

      “Finally, vaccination with a mixture of the same HA head domains is not sufficient to elicit the same neutralization profile as the BOAS immunogens or nanoparticles.”

      lines 105-106:

      “Additionally, we showed that a mixture of the same HA head components was not sufficient to recapitulate the neutralizing responses elicited by the BOAS or BOAS NP.”

      lines 169-172:

      “To determine immunogenicity of each BOAS immunogen, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A). We compared these BOAS to a control group immunized with a mixture of the eight HA heads present in the 8mer.”

      lines 265-267:

      “There were qualitatively immunodominant HAs, notably H4 and H9, and these were relatively consistent across BOAS in which they were a component. This effect was reduced in the mix cohort.”

      (4) While some cross-reactivity is likely (Figure 6G), there is considerable loss of binding when there is a mismatch. Of the antibodies induced, how much of this is strain-specific? For example, how well do serum antibodies bind to a pre-2009 H1?

      We agree with the reviewer that there is a considerable loss of binding when there is a mismatched HA component. To better understand this and incorporate a mismatched strain into our analysis of the 8mer and BOAS NP, we looked at serum binding titers to a pre-2009 H1, H1/Solomon Islands/2006, and an antigenically distinct H3, H3/Hong Kong/1968. We have incorporated this data into Figures 3D, 3E, 6F and 6G. We observed relatively high titers against both a mismatched H1 and H3, indicating that the BOAS maintain high titers against subtype-specific strains that are conserved over considerable antigenic distance. However, this was similar in the mixture group, indicating that this may not be specific to oligomerization of BOAS immunogens.

      We added the following to the methods section:

      lines 357-361

      “Head subdomains from these HAs were used in the BOAS immunogens, and full-length soluble ectodomain (FLsE) trimers were used in ELISAs. Additional H1 (H1/A/Solomon Islands/3/2006) and H3 (H3/A/Hong Kong/1/1968) FLsEs were used in ELISAs as mismatched, antigenically distinct HAs for all BOAS.”

      Minor Concerns

      (1) Line 44-46, the deaths per year are almost exclusively due to seasonal influenza outbreaks caused by antigenically drifted viruses in humans, not those spilling over from avian sp. and swine. For accuracy, please adjust this sentence.

      We have adjusted lines 45-48 to say “This is largely a consequence of viral evolution and antigenic drift as it circulates seasonally within humans and ultimately impacts vaccine effectiveness. Additionally, the chance for spillover events from animal reservoirs (e.g., avian, swine) is increasing as population and connectivity also increase.”

      (2) Figure 4D-E, provide a legend for what the symbols indicate, or simply just put the symbol next to either the homology score and % serum competition labels on the y-axis.

      We have included a legend in Figures 4D,E to distinguish between homology score and % serum competition.

      (3) I am a bit confused by the data presented in Figure 7. The figure legend says the two symbols represent technical replicates. How? Is one technical replicate of all the mice in a group averaged and that's what's graphed? If so, this is not standard practice. I would encourage the authors to show the average technical replicates of each animal, which is standard.

      We thank the reviewer for their suggestion, and we have revised Figure 7 such that each symbol represents a single animal for n=5 animals. We have also adjusted the figure caption to the following:

      “Figure 7: Microneutralization titers to matched and mis-matched virus- Microneutralization of matched and mis-matched psuedoviruses: H1N1 (green, top left), H3N2 (orange, top right), H5N1 (yellow, bottom left), and H7N9 viruses (pink, bottom right) with d42 serum. Solid bars below each plot indicate a matched sub-type, and striped bars indicate a mis-matched subtype (i.e. not present in the BOAS). NP negative controls were used to determine threshold for neutralization. Upper and lower dashed lines represent the first dilution (1:32) (for H1N1, H3N2, and H5N1) or neutralization average with negative control NP serum (H7N9), and the last serum dilution (1:32,768), respectively, and points at the dashed lines indicate IC50s at or outside the limit of detection. Individual points indicate IC50 values from individual mice from each cohort (n=5). The mean is denoted by a bar and error bars are +/- 1 s.d., * = p<0.05 as determined by a Kruskal-Wallis test with Dunn’s multiple comparison post hoc test relative to the mix group.”

      (4) Paragraphs 298-313, multiple studies are referred to but not referenced.

      We have added the following references to this section:

      (38) Kanekiyo, M. et al. Self-assembling influenza nanoparticle vaccines elicit broadly neutralizing H1N1 antibodies. Nature 498, 102–106 (2013).

      (48) Hills, R. A. et al. Proactive vaccination using multiviral Quartet Nanocages to elicit broad anti-coronavirus responses. Nat. Nanotechnol. 1–8 (2024) doi:10.1038/s41565-024-01655-9.

      (65) Jardine, J. et al. Rational HIV immunogen design to target specific germline B cell receptors. Science 340, 711–716 (2013).

      (66) Tokatlian, T. et al. Innate immune recognition of glycans targets HIV nanoparticle immunogens to germinal centers. Science 363, 649–654 (2019).

      (67) Kato, Y. et al. Multifaceted Effects of Antigen Valency on B Cell Response Composition and Differentiation In Vivo. Immunity 53, 548-563.e8 (2020).

      (68) Marcandalli, J. et al. Induction of Potent Neutralizing Antibody Responses by a Designed Protein Nanoparticle Vaccine for Respiratory Syncytial Virus. Cell 176, 1420-1431.e17 (2019).

      (69) Bruun, T. U. J., Andersson, A.-M. C., Draper, S. J. & Howarth, M. Engineering a Rugged Nanoscaffold To Enhance Plug-and-Display Vaccination. ACS Nano 12, 8855–8866 (2018).

      (70) Kraft, J. C. et al. Antigen- and scaffold-specific antibody responses to protein nanoparticle immunogens. Cell Reports Medicine 100780 (2022) doi:10.1016/j.xcrm.2022.100780.

      Reviewer #2 (Recommendations For The Authors):

      Can the authors define "detectable titers"?

      Maybe add a threshold value of reciprocal EC on the figure for each plot.

      We recognize the reviewer’s concern with reporting serum titers in this way, and we now report titers as endpoint titers (EPT) with a dotted line for the first detectable dilution (1:50). We have also adjusted the methods section to reflect this change:

      (lines 427-431)

      “Serum endpoint titers (EPT) were determined using a non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to find the dilution at which the blank-subtracted 450nm absorbance value intersects a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log-transformed values of the EPT dilution.”
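
      As a rough illustration of the endpoint-titer calculation quoted above, the sketch below inverts a four-parameter logistic curve to find where absorbance crosses the 0.1 threshold. The parameter names (`bottom`, `top`, `midpoint`, `hill`) and the example values are assumptions chosen for illustration, not fitted values from the study, and the curve-fitting step itself is not reproduced.

```python
import math


def four_pl(x, bottom, top, midpoint, hill):
    """Four-parameter logistic response at serum concentration x (x > 0)."""
    return bottom + (top - bottom) / (1.0 + (midpoint / x) ** hill)


def endpoint_concentration(threshold, bottom, top, midpoint, hill):
    """Concentration at which the 4PL curve equals `threshold`."""
    if not (bottom < threshold < top):
        raise ValueError("threshold must lie strictly between bottom and top")
    # Solve threshold = four_pl(x) for x.
    ratio = (top - bottom) / (threshold - bottom) - 1.0
    return midpoint / ratio ** (1.0 / hill)


# Hypothetical curve: half-maximal signal at a 1:1000 dilution.
conc = endpoint_concentration(0.1, bottom=0.0, top=2.0, midpoint=1e-3, hill=1.0)
endpoint_dilution = 1.0 / conc          # reciprocal dilution at the threshold
log_ept = math.log10(endpoint_dilution)  # log-transformed titer
```

      Treating x as serum concentration (the reciprocal of the dilution factor) matches the quoted methods text; a fit parameterized directly in dilution would need the inverse adjusted accordingly.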

      It also appears that not all X-mer elicits an immune response against matched HA, e.g. for the 7 and 8 -mer. Not sure why the authors do not mention this. It could be due to too many HAs, not sure.

      We apologize for the confusion, and agree that our original method of reporting EC50 values does not reflect weak but present binding titers. Upon further analysis with additional mice as well as adjusting our method of reporting titers, it is easier to see in Figure 3D that all X-mer BOAS do indeed elicit detectable binding titers to matched HA components.

      It will be nice to add a conclusion to the cross-reactivity - again it appears that past 6-mer there has been a loss in cross-reactivity even though there are more subtypes on the BOAS.

      Also, the TI seemed to be the more conserved epitope targeted here.

      (Of note these two are mentioned in the discussion)

      We have updated the results section to include the following:

      (lines 281-294)

      “Based on the immunogenicity of the various BOAS and their ability to elicit neutralizing responses, it may not be necessary to maximize the number of HA heads into a single immunogen. Indeed, it qualitatively appears that the intermediate 4-, 5-, and 6mer BOAS were the most immunogenic and this length may be sufficient to effectively engage and crosslink BCR for potent stimulation. These BOAS also had similar or improved binding cross-reactivity to mis-matched HAs as compared to longer 7- or 8mer BOAS. Notably, the 3mer BOAS elicited detectable cross-reactive binding titers to H4 and H5 mismatched HAs in all mice. This observed cross-reactivity could be due to sequence conservation between the HAs, as H3 and H4 share ~51% sequence identity, and H1 and H2 share ~46% and ~62% overall sequence identity with H5, respectively (Supplemental Figure 6). Additionally, the degree of surface conservation decreased considerably beyond the 5mer as more antigenically distinct HAs were added to the BOAS. These data suggest that both antigenic distance between HA components and BOAS length play a key role in eliciting cross-reactive antibody responses, and further studies are necessary to optimize BOAS valency and antigenic distance for a desired response.”

      Figure 5E, the authors could indicate which subtype each mab is specific to for those who are not HA experts. (They have them color-coded but it is hard to see because very small).

      The authors also do not explain why 3E5 does not bind well to H1, H2, H3, H4 4-mer BOA, etc...

      We apologize for the lack of clarity in this figure. We updated Figure 5E to indicate the subtype each antibody is specific for, and the figure caption now lists the antibodies with their subtype and targeted epitope.

      Minor

      Figure 1B zoom looks like the line is hidden to the structure - should come in front

      We adjusted the figure accordingly.

      Line 127 - whether the order

      Corrected

      What is the rationale for thinking that a different order will lead to a different expression and antigenic results?

      We thank the reviewer for this question. We did not necessarily anticipate a difference in protein expression based on BOAS order. We did, however, want to verify that our platform was indeed a “plug-and-play” platform and that we could readily exchange components and order. We also hypothesize that a different order may in fact lead to different antigenic results. We think that the conformation of the BOAS as well as physical and antigenic distance of HA components may influence cross-linking efficiency of BCRs and lead to different antigenic results with different levels of cross-reactivity. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern, could impact which HAs are in proximity to each other or could be potentially shielded in certain conformations, and thus could affect antigenic results. We expand on this rationale in the discussion in lines 310-314:

      “Further studies with different combinations of HAs could aid in understanding how length and composition influence epitope focusing. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern, could impact which HAs are in close proximity to one another or could be potentially shielded in certain conformations, and thus could affect antigenic results.”

      Maybe list HA#1 HA#2 HA#3 instead of HA1, HA2, HA3 to make sure it is not confounded with HA2 and HA2

      We agree that this may be confusing for readers, and have adjusted Figure 1C to show HA#1, HA#2, etc.

      For nsEM, do the authors have 2D classes and even 3D reconstructions? Line 148-149: maybe or just because there are more HAs.

      We did not obtain 2D classes or 3D reconstructions of these BOAS. However, we do agree with the reviewer that the collapsed/rosette structure of the 8mer BOAS may be a consequence of the additional HA heads as well as the flexible Gly-Ser linkers between the components. We have added clarity to our statement in the discussion to read:

      lines 154-156:

      “This is likely a consequence of the flexible GSS linker separating the individual HA head components as well as the addition of significantly more HA head components to the construct.”

      Line 153 " interface-directed" - what does this mean?

      We apologize for any confusion; we intend for “interface-directed” to refer to antibodies that engage the trimer interface (TI) epitope between HA protomers. We have adjusted the manuscript to use the same terminology throughout, i.e. trimer interface or its abbreviation, TI.

      For Figure 2 F - do you have a negative control? Usually one does not determine an ELISA KD, it is not very accurate but shows binding in terms of OD value.

      We did include a negative control, MEDI8852, a stem-directed antibody, though it was not shown in the figure because we observed no binding, as expected. This negative control antibody was also used in Figure 5E for characterizing the BOAS NPs, and also shows no binding. We recognize that in an ELISA the KD is an equilibrium measurement and we do not report kinetic measurements as determined by a method such as bio-layer interferometry (BLI), and have thus adjusted the figure caption to denote the values as “apparent KD values”.

      Line 169 - reads strangely, "BOAS-elicited serum, regardless of its length, reacted
      The length is the one of the Immunogen, not the serum

      We agree that this statement is unclear, and we have modified the sentence to read:

      lines 177-178:

      “Each of the BOAS, regardless of its length, elicited binding titers to all matched full-length HAs representing individual components (Figure 3D).”

      What is the adjuvant used (add in results)?

      We used Sigma adjuvant for all immunizations, and have included this information in the results section:

      lines 169-171:

      “To determine immunogenicity of each BOAS, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A).”

      This information is also included in the methods section in lines 406-412.

      Line 178 - remove " across"

      We have removed the word “across” in this sentence and replaced it with “on” (line 194).

      Trimer- interface, and interface epitopes are used exchangeably - maybe keep it as trimer interface to be more precise

      As stated above, we have adjusted the manuscript to use the same term throughout, i.e., trimer interface or its abbreviation, TI.

      Line 221 - no figure 6H (6G?)

      We apologize for this typo and have corrected it to Figure 6G (line 231).

      Reviewer #3 (Recommendations For The Authors):

      (1) Since 20 ug x3 doses is quite a high amount of vaccine, differences between immunogens may become blurred. Thus, it may be informative to compare post-prime serology for all immunogens or select immunogens to compare to the post-3rd dose data.

      We agree with the reviewer that this is on the upper end of vaccine dose and thus we explored the serum responses after a single boost. The overall trends and reactivity to matched and mis-matched BOAS components remained similar across d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which bolsters our claim that the presentation of the HA heads is important for eliciting strong serum responses to all components. We have included these data in Supplemental Figure 5, and have acknowledged this in the text:

      lines 185-187:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      (2) Significance statistics for all immunogenicity data should be added and discussed; it is particularly absent in Figures 3D and 7.

      We have added statistical analyses to Figure 3 and Figure 7 to reflect changes in immunogenicity. We have also added the following to the methods section:

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays was determined using either a Mann-Whitney test or a Kruskal-Wallis test with Dunn’s post-hoc test in Prism (GraphPad Prism v10.2.3) to correct for multiple comparisons. Multiple comparisons were made between each possible combination or relative to a control group, where indicated. Significance was assigned with the following: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”
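
      To make the quoted statistics concrete, the sketch below computes a Kruskal-Wallis H statistic by hand and maps a p-value onto the star notation listed above. It is a simplified illustration, not a substitute for the Prism analysis: it assumes no tied values, omits Dunn’s post-hoc test, and the p-value helper is valid only for three groups (chi-square approximation with two degrees of freedom). The function names and the `"ns"` label are hypothetical.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic; this sketch assumes no tied values."""
    pooled = sorted(v for g in groups for v in g)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # ranks start at 1
    n_total = len(pooled)
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)


def p_value_three_groups(h):
    """Chi-square survival function with 2 degrees of freedom (3 groups only)."""
    return math.exp(-h / 2.0)


def significance_stars(p):
    """Map a p-value to the star notation used in the quoted methods."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    for cutoff, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < cutoff:
            return label
    return "ns"  # hypothetical label for a non-significant difference


# Hypothetical titers for three fully separated groups of three animals.
h = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
stars = significance_stars(p_value_three_groups(h))
```

      With small groups the chi-square approximation is coarse, so exact or software-reported p-values may differ from this estimate.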

      (3) Figure 2F: the figure has K03.12 listed for the H3-specific mAb and in the main text, but the caption says 3E5 - is the 3E5 in the caption a typo? 3E5 is listed for the competition ELISAs as an RBS mAb, but its binding site is distal to the RBS at residues 165-170 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787348/), H7.167 binds in the RBS periphery and not directly within the RBS, and the epitope for P2-D9 is undetermined/not presented. This could mean that there is actually a higher proportion of RBS-directed antibodies than what is determined from this serum competition data. Also, reference to these as 'RBS-directed' in the serum competition methods section should be revised for accuracy.

      We sincerely apologize for this error and the resulting confusion. 3E5 in the caption is incorrect and should be K03.12 (https://www.rcsb.org/structure/5W08) and does engage the receptor binding site. We also apologize for the oversight that H7.167 is in the RBS periphery and not directly in the RBS. The additional P2-D9 in the panel of RBS-directed antibodies was also in error, as we do not believe it is RBS-directed, but is indeed H4 specific. We also included a reference to the paper and immunogen that elicited this antibody. We agree that this indicates that there could be a higher proportion of RBS-directed antibodies in the serum and have modified the text in the results and methods sections to read:

      lines 300-306:

      “Notably, this proportion is approximate, as at the time of reporting, antibodies that bind the receptor binding site of all components were not available. RBS-directed antibodies to the H4 and H9 component were not available, and the RBS-directed antibodies used targeting the other HA components have different footprints around the periphery of the RBS. Additionally, there are currently no reported influenza B TI-directed antibodies in the literature. Therefore, this may be an underestimate of the serum proportion focused to the conserved RBS and TI epitopes.”

      lines 435-439:

      “Following blocking with BSA in PBS-T, blocking solution was discarded and 40µL of either DPBS (no competition control), a cocktail of humanized antibodies targeting the RBS and periphery (5J8, 2G1, K03.12, H5.3, H7.167, H1209), a cocktail of humanized TI-directed antibodies (S5V2-29, D1 H1-17/H3-14, D2 H1-1/H3-1), or a negative control antibody (MEDI8852) were added at a concentration of 100µg/mL per antibody.”

      (4) Only nsEM data is shown for the 3-BOAS and 8-BOAS, where differences in morphology were seen between these longer and shorter proteins. Including nsEM images for all BOAS immunogens may show trends in morphology or organization that could correlate with immune responses, e.g. if the 5-BOAS also forms a higher proportion of rosette-like structures, while the the 4-BOAS is still a mix between extended and rosette-like, this could be a factor in the better immune responses seen for 5-BOAS.

      We appreciate the reviewer’s suggestion for further analysis of morphology between the intermediate BOAS sizes. We agree that the relationship between BOAS length and morphology should be explored more in depth, and we intend to do so in future studies and to also vary linker length and rigidity.

    1. eLife Assessment

      This important study introduces a novel split-belt treadmill learning task to reveal distinct and parallel learning sub-components of gait adaptation: slow and gradual error-based perceptual realignment, and a more deliberate and flexible "stimulus-response" style learning process. The behavioural results convincingly support the presence of a non-error-based learning process during continuous movements, and the computational modelling provides comprehensive further evidence for establishing this learning process. These results will be of interest for the broader motor learning community.

    2. Reviewer #1 (Public review):

      Summary:

      Rossi et al. asked whether gait adaptation is solely a matter of slow perceptual realignment or if it also involves fast/flexible stimulus-response mapping mechanisms. To test this, they conducted a series of split-belt treadmill experiments with ramped perturbations, revealing behavior indicative of a flexible, automatic stimulus-response mapping mechanism.

      Strengths:

      (1) The study includes a perceptual test of leg speed, which correlates with the perceptual realignment component of motor aftereffects. This indicates that changes in motor performance are not fully accounted for by perceptual realignment.

      (2) The study evaluates the possible contributions of explicit strategy using a framework (Tsay et al., 2024) and provides evidence for minimal strategy involvement in split-belt adaptation through subjective reports.

      (3) The study incorporates qualitatively distinct, hypothesis-driven models of adaptation and proposes a new framework that integrates these mechanisms. Relatedly, the study considers a range of alternative models, demonstrating that perceptual recalibration and remapping uniquely explain the patterns of behavior and aftereffects, ruling out models that focus solely on a single process (e.g., PReMo, PEA, memory of errors, optimal feedback control) and others that do not incorporate remapping (dual rate state space models).

    3. Reviewer #2 (Public review):

      Recent findings in the field of motor learning have pointed to the combined action of multiple mechanisms that potentially contribute to changes in motor output during adaptation. A nearly ubiquitous motor learning process occurs via the trial-by-trial compensation of motor errors, often attributed to cerebellar-dependent updating. This error-based learning process is slow and largely unconscious. Additional learning processes that are rapid (e.g., explicit strategy-based compensation) have been described in discrete movements like goal-directed reaching adaptation. However, the role of rapid motor updating during continuous movements such as walking has been either underexplored or inconsistent with those found during adaptation of discrete movements. Indeed, previous results have largely discounted the role of explicit strategy-based mechanisms for locomotor learning. In the current manuscript, Rossi et al. provide convincing evidence for a previously unknown rapid updating mechanism for locomotor adaptation. Unlike the now well-studied explicit strategies employed during reaching movements, the authors demonstrate that this stimulus-response mapping process is largely unconscious. The authors show that in approximately half of subjects, the mapping process appears to be memory based while the remainder of subjects appear to perform structural learning of the task design. The participants that learned using a structural approach had the capability to rapidly generalize to previously unexplored regions of the perturbation space.

      One result that will likely be particularly important to the field of motor learning is the authors' quite convincing correlation between the magnitude of proprioceptive recalibration and the magnitude of error-based updating. This result beautifully parallels results in other motor learning tasks and appears to provide a robust marker for the magnitude of the mapping process (by means of subtracting off the contribution of error-based motor learning). This is a fascinating result with implications for the motor learning field well beyond the current study.

      A major strength of this manuscript is the large sample size across experiments and the extent of replication performed by the authors in multiple control experiments.

      Finally, I commend the authors on extending their original observations via Experiment 2. While it seems that participants use a range of mapping mechanisms (or indeed a combination of multiple mapping mechanisms), future experiments may be able to tease apart why some subjects use memory versus structural mapping. A future ability to push subjects to learn structurally-based mapping rules has the potential to inform rehabilitation strategies.

      Overall, the manuscript is well written, the results are clear, and the data and analyses are convincing.

      Strengths:

      (1) Convincing behavioral data supporting the existence of multiple learning processes during split-belt adaptation. Further convincing correlations tying the extent of forward-model based adaptation with proprioceptive recalibration.

      (2) The authors test a veritable "zoo" of prior motor learning models to show that these models do not account for their behavioral results.

      (3) The authors develop a convincing alternative model (PM-ReMap) that appears to account for their behavioral results by explicitly modeling forward-model based adaptation in parallel with goal remapping.

    4. Reviewer #3 (Public review):

      Summary:

      In this work, Rossi et al. use a novel split-belt treadmill learning task to reveal distinct sub-components of gait adaptation. The task involved following a standard adaptation phase with a "ramp-down" phase that helped them dissociate implicit recalibration and more deliberate SR map learning. Combined with modeling and re-analysis of previous studies, the authors show multiple lines of evidence that both processes run simultaneously, with implicit learning saturating based on intrinsic learning constraints and SR learning showing sensitivity to a "perceptual" error. These results offer a parallel with work in reaching adaptation showing both explicit and implicit processes contributing to behavior; however, in the case of gait adaptation the deliberate learning component does not appear to be strategic but is instead a more implicit SR learning process.

      The authors have done a commendable job responding to my comments and critiques. I have updated the S/W below to reflect that.

      Strengths:

      - The task design is very clever and the "ramp down" phase offers a novel way to attempt to dissociate competing models of multiple processes in gait adaptation

      - The analyses are thorough, as is the re-analysis of multiple previous data sets; the expanded modeling analyses are strong

      - The querying of perception of the different relative belt speeds is a very nice addition, allowing the authors to connect different learning components with error perception

      - The conceptual framework is compelling, highlighting parallels with work in reaching but also emphasizing differences, especially w/r/t SR learning versus strategic behaviors. Thus the discovery of an SR learning process in gait adaptation would be both novel and also help conjoin different siloed subfields of motor learning research.

      Weaknesses:

      - The expanded modeling analyses are useful although the SR process still seems somewhat mysterious (is it explicit/implicit? how exactly is it interacting with re-calibration?); however, understanding this system more could be a fruitful topic for future work

      - The sample size for the individual difference analysis is somewhat modest

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Rossi et al. asked whether gait adaptation is solely a matter of slow perceptual realignment or if it also involves fast/flexible stimulus-response mapping mechanisms. To test this, they conducted a series of split-belt treadmill experiments with ramped perturbations, revealing behavior indicative of a flexible, automatic stimulus-response mapping mechanism.

      Strengths:

      (1) The study includes a perceptual test of leg speed, which correlates with the perceptual realignment component of motor aftereffects. This indicates that there are motor performances that are not accounted for by perceptual re-alignment.

      (2) The study incorporates qualitatively distinct, hypothesis-driven models of adaptation and proposes a new framework that integrates these various mechanisms.

      Weaknesses:

      (1) The study could benefit from considering other alternative models. As the authors noted in their discussion, while the descriptive models explain some patterns of behaviour/aftereffects, they don't currently account for how these mechanisms influence the initial learning process itself.

      (1a) For example, the pattern of gait asymmetry might differ for perceptual realignment (a smooth, gradual process), structural learning (more erratic, involving hypothesis testing/reasoning to understand the perturbation, see (Tsay et al. 2024) for a recent review on Reasoning), and stimulus-response mapping (possibly through a reinforcement-based trial-and-error approach). If not formally doing a model comparison, the manuscript might benefit from clearly laying out the behavioural predictions for how these different processes shape initial learning.

      (1b) Related to the above, the authors noted that the absence of difference during initial learning suggests that the differences in Experiment 2 in the ramp-up phase are driven by two distinct processes: structural learning and memory-based processes. If the assumptions about initial learning are not clear, the logic of this conclusion is hard to follow.

      Thank you for this insightful comment. We agree that considering alternative models and clarifying their potential contributions to the initial learning process would enhance the manuscript. We performed additional analyses and revised the text to outline how the mechanisms of adaptation in our study align with the framework described by Tsay et al. (2024) regarding the initial learning process and other features of adaptation.

      First, we referenced the Tsay et al. framework in the Introduction and Discussion to highlight parallels between their description of implicit adaptation and our forward model recalibration mechanism (producing motor changes and perceptual realignment). Specifically, the features defining recalibration in our study – gradual, trial-by-trial adjustments, rigid learning that leads to aftereffects, and limited contribution to generalization – align with those described by Tsay et al.

      Second, we used the description provided by Tsay et al. to test the presence of explicit strategies in our study. We specifically test for the criteria of reportability and intentionality, corroborating the finding that our stimulus response mapping mechanism differs from explicit strategies.

      “A recent framework for motor learning by Tsay et al. defines explicit strategies as motor plans that are both intentional and reportable (Tsay et al., 2024). Within this framework, Tsay et al. clarify that "intentional" means participants deliberately perform the motor plan, while "reportable" means they are able to clearly articulate it.” (Experiment 2 Results, lines 515-518).

      “…the motor adjustments reported by participants consistently fail to meet the criteria for explicit strategies as outlined by Tsay et al.: reportability and intentionality (Tsay et al., 2024).” (Discussion, lines 657-660).

      Third, we interpreted the operation of stimulus-response mapping within the Tsay theoretical framework for the three stages of motor learning: 1) “reasoning” to acquire new action–outcome relationships, 2) “refinement” of the motor action parameters, and 3) “retrieval” of learnt motor actions based on contextual cues. We note that the definition of these stages closely aligns with our definition for stimulus response mapping mechanisms. Moreover, according to Tsay’s definition, both implicit and explicit learning mechanisms can involve similar reasoning and retrieval processes. This shared operational basis may explain why our stimulus-response mapping mechanism exhibits some characteristics associated with explicit strategies, such as flexibility and generalizability.

      We performed a new analysis to evaluate the prediction of Tsay’s framework that, if walking adaptation includes a stimulus-response mapping mechanism following these three stages of motor learning, the learning process would initially be erratic and would then stabilize as learning progresses. We assessed within-participant residual variance in step length asymmetry around a double exponential model fit during adaptation, testing the prediction that this variability would decrease between the start and end of adaptation. Experiment 1 results confirmed this prediction, showing a significant reduction in variability as adaptation progressed.

      “We finally tested whether the pattern of motor variability during adaptation aligns with predictions for learning new stimulus response maps. In contrast to recalibration, mapping mechanisms are predicted to be highly variable and erratic during early learning, and stabilize as learning progresses (Tsay et al., 2024). Consistent with these predictions, the step length asymmetry residual variance (around a double exponential fit) decreased significantly between the start and end of adaptation (residual variance at start minus end of adaptation = 0.005 [0.004, 0.007], mean [CI]; SI Appendix, Fig. S3). These control analyses corroborate the hypothesis that the “no aftereffects” region of the Ramp Down reflects the operation of a mapping mechanism.”

      (Experiment 1 Results, lines 187-194; Methods, lines 1040-1050).
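
      The variability measure described above can be sketched as follows: given a double exponential fit to step length asymmetry, compute the variance of the residuals in an early and a late window and take the difference. This is a minimal sketch under stated assumptions; the window length, the parameter names, and the use of population variance of the residuals are illustrative choices, not the manuscript's exact procedure, and the fit itself is taken as given.

```python
import math
from statistics import pvariance


def double_exponential(stride, a_fast, tau_fast, a_slow, tau_slow, plateau):
    """Double exponential adaptation curve evaluated at a stride index."""
    return (plateau
            + a_fast * math.exp(-stride / tau_fast)
            + a_slow * math.exp(-stride / tau_slow))


def residual_variance(observed, fitted):
    """Variance of the residuals (observed minus fitted) over a window."""
    return pvariance([o - f for o, f in zip(observed, fitted)])


def variance_change(observed, fitted, window):
    """Residual variance over the first `window` strides minus the last `window`."""
    start = residual_variance(observed[:window], fitted[:window])
    end = residual_variance(observed[-window:], fitted[-window:])
    return start - end
```

      A positive value from `variance_change` would indicate that behavior was more variable around the fit early in adaptation than at the end, which is the direction the quoted result reports.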

      Moreover, Experiment 2 results demonstrated that the pattern of variability (its magnitude and decay in adaptation) did not differ between participants using memory-based versus structure-based stimulus-response mapping mechanisms. These findings suggest that both types of mapping operate according to Tsay’s stages of motor learning.

      “Furthermore, the pattern of step length asymmetry variability was similar between the subgroups (structure – memory difference in residual variance relative to double exponential during initial adaptation = -0.0052 [0.0161, 0.0044], adaptation plateau = -0.0007 [-0.0021, 0.0003], difference in variance decay = -0.0045 [-0.0155, 0.0052], mean [CI]; SI Appendix, Fig. S16). This confirms that the distinct performance clusters in the Ramp Up & Down task are not driven by natural variations in learning ability, such as differences in learning speed or variability. Rather, these findings indicate that the subgroups employ different types of mapping mechanisms, which perform similarly during initial learning but differ fundamentally in how they encode, retrieve, and generalize relationships between perturbations and Δ motor outputs.” (Experiment 2 Results, lines 503-511).

      “Both memory- and structure-based operations of mapping align with Tsay et al.’s framework for motor learning: first, action–outcome relationships are learned through exploration; second, motor control policies are refined to optimize rewards or costs, such as reducing error; and finally, learned mappings or policies are retrieved based on contextual cues (Tsay et al., 2024). Consistent with the proposed stages of exploration followed by refinement, we found that motor behavior during adaptation was initially erratic but became less variable at later stages of learning. Similarly, consistent with the retrieval stage, the generalization observed in the ramp tasks indicates that learned motor outputs are flexibly retrieved based on belt speed cues.” (Discussion, lines 701-708).

      Finally, we addressed the prediction outlined by Tsay et al. that repeated exposure to perturbations attenuates the magnitude of forward model recalibration, with savings being driven by stimulus-response mapping mechanisms. While we could not directly test savings for the primary perturbation used during adaptation, we were able to indirectly evaluate savings for a different perturbation through analyses of our control experiments combined with previous results from Leech et al. (Leech et al., 2018). Specifically, we examined how motor aftereffects and perceptual realignment evolved across repeated iterations of the speed-matching task post-adaptation in Ascending groups. Each task began with the right leg stationary and the left leg moving at 0.5 m/s – a configuration corresponding to a perturbation of -0.5 m/s, which is opposite in direction to the adaptation perturbation. By analyzing repeated exposures to this -0.5 m/s perturbation across iterations, we gained insights into the learning dynamics associated with this perturbation and the effect of repeated exposures on motor aftereffects and perceptual realignment. Consistent with predictions from Tsay et al., our results combined with Leech et al. demonstrate that, with repeated exposures to the same perturbation, perceptual realignment decays while the contribution of stimulus-response mapping to aftereffect savings is enhanced. We present this analysis and interpretation in Control Experiments Results, lines 429-442; Figure 8B; Table S7; and Discussion lines 709-753.

      (1c) The authors could also test a variant of the dual-rate state-space model with two perceptual realignment processes where the constraints on retention and learning rate are relaxed. This model would be a stronger test for two perceptual re-alignment processes: one that is flexible and another that is rigid, without mandating that one be fast learning and fast forgetting, and the other be slow learning and slow forgetting.

      We tested multiple variants of the suggested models and confirmed that they cannot capture the motor behavior observed in our Ramp Down task. We include Author response image 1 with the model fits, Author response table 1 with the BIC statistics, and the model equations below. Only the recalibration + mapping model captures the matching-then-divergent behavior of the Δ motor output, corroborating our interpretation that state-space based models cannot capture the mapping mechanism (see Discussion, “Implications for models of adaptation”). Furthermore, all tested variants fit the data significantly worse than the recalibration + mapping model according to the BIC statistic.
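
      A minimal sketch of how a BIC difference of this kind can be computed, assuming Gaussian residuals and a least-squares fit. The function names, residual values, and parameter counts are illustrative assumptions and are not values from the analysis reported here.

```python
import math


def bic_least_squares(residuals, n_params):
    """BIC for a least-squares fit with Gaussian residuals (up to an additive constant).

    Assumes the residual sum of squares is strictly positive.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)


def bic_difference(residuals_a, n_params_a, residuals_b, n_params_b):
    """BIC of model A minus BIC of model B; positive values favor model B."""
    return bic_least_squares(residuals_a, n_params_a) - bic_least_squares(
        residuals_b, n_params_b
    )


# Hypothetical residuals: model A has more parameters and larger errors than model B.
residuals_a = [0.2] * 10
residuals_b = [0.1] * 10
difference = bic_difference(residuals_a, 4, residuals_b, 2)
```

      With identical residuals, the difference reduces to the penalty term alone, that is, the difference in parameter counts multiplied by the log of the number of data points.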

      Model fits:

      Author response image 1.

      Statistical results:

      Author response table 1.

      Model definitions:

      • DualStateRelaxed: same equations as the original Dual State, but no constraints dictating the relative relationship between the parameters

      • DualStateRelaxedV2: same equations as the original Dual State, but no constraints dictating the relative relationship between the parameters, and “loose” parameter bounds (parameters can take values between -10 and 10).

      • PremoOriginalRelaxed: PReMo with two states (see below), no constraints dictating the relative relationship between the parameters

      • PremoOriginalRelaxedV2: PReMo with two states (see below), no constraints dictating the relative relationship between the parameters, and “loose” parameter bounds (parameters can take values between -10 and 10).
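
      The relaxed dual-state variants listed above can be illustrated with a minimal sketch. It assumes the widely used two-state formulation in which a fast and a slow state are updated from the same error; the exact equations and sign conventions in the manuscript may differ, and all parameter values below are hypothetical.

```python
def simulate_two_state(perturbation, a_fast, b_fast, a_slow, b_slow):
    """Simulate a two-state model and return the motor output on each stride.

    a_* are retention factors and b_* are learning rates (assumed names).
    """
    x_fast, x_slow = 0.0, 0.0
    output = []
    for p in perturbation:
        x = x_fast + x_slow          # total motor output on this stride
        output.append(x)
        error = p - x                # error driving both states
        x_fast = a_fast * x_fast + b_fast * error
        x_slow = a_slow * x_slow + b_slow * error
    return output


def satisfies_original_constraints(a_fast, b_fast, a_slow, b_slow):
    """Constraints of the constrained model: slow state retains more, fast state learns more."""
    return a_slow > a_fast and b_fast > b_slow


# Hypothetical constant perturbation and parameter values.
output = simulate_two_state([1.0] * 200, a_fast=0.8, b_fast=0.2, a_slow=0.99, b_slow=0.02)
```

      In a relaxed variant, a fit would simply skip the `satisfies_original_constraints` check, so neither state is required to be the "fast" or "slow" one.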

      PReMo with two states – the remaining equations are the same as the original PReMo (see Methods):

      (2) The authors claim that stimulus-response mapping operates outside of explicit/deliberate control. While this could be true, the survey questions may have limitations that could be more clearly acknowledged.

      (2a) Specifically, asking participants at the end of the experiments to recall their strategies may suffer from memory biases (e.g., participants may be biased by recent events and forget the explicit strategies they used early in the experiment) and be susceptible to the framing of the questions (e.g., participants may be unsure what the experimenter is asking or how to verbalize their own strategy). Moreover, it is not clear what category of explicit strategies one might enact here, which dictates what might be considered "relevant" and "accurate".

      (2b) The concept of perceptual realignment also suggests that participants are somewhat aware of the treadmill's changing conditions; therefore, as a thought experiment, if the authors had asked participants throughout the experiment whether they were trying different strategies, would they predict that some behaviour is under deliberate control?

      We have expanded the discussion to explicitly acknowledge that our testing methodology for assessing explicit strategies may have limitations, recognizing the factors mentioned by the reviewer. Moreover, as mentioned in response to comment (1), we leveraged the framework from Tsay et al., 2024 and its definition of explicit strategies to ensure a robust and consistent approach in interpreting the survey responses.

      We revised the Experiment 2 Results section, lines 515-518, to specify that we are evaluating the presence of explicit strategies according to the criteria of intentionality and reportability:

      “A recent framework for motor learning by Tsay et al. defines explicit strategies as motor plans that are both intentional and reportable (Tsay et al., 2024). Within this framework, Tsay et al. clarify that "intentional" means participants deliberately perform the motor plan, while "reportable" means they are able to clearly articulate it.”

      We then reorganized the Discussion to include a separate section “Mapping operates independently of explicit control”, lines 646-661, where we discuss limitations of the survey methodology and interpretation of the results according to Tsay et al., 2024:

      “Here, we show that explicit strategies are not systematically used to adapt step length asymmetry and Δ motor output: the participants in our study either did not know what they did, reported changes that did not actually occur or would not lead to symmetry. Only one person reported “leaning” on the left (slow) leg for as much time as possible, which is a relevant but incomplete description for how to walk with symmetry. Four reports mentioned pressure or weight, which may indirectly influence symmetry (Hirata et al., 2019; Lauzière et al., 2014), but they were vague and conflicting (e.g., “making heavy steps on the right foot” or “put more weight on my left foot”). All other responses were null, explicitly wrong or irrelevant, or overly generic, like wanting to “stay upright” and “not fall down”. We acknowledge that our testing methodology has limitations. First, it may introduce biases related to memory recall or framing of the questionnaire. Second, while it focuses on participants' intentional use of explicit strategies to control walking, it does not rule out the possibility of passive awareness of motor adjustments or treadmill configurations. Despite these limitations, the motor adjustments reported by participants consistently fail to meet the criteria for explicit strategies as outlined by Tsay et al.: reportability and intentionality (Tsay et al., 2024). Together with existing literature, this supports the interpretation that stimulus-response mapping operates automatically.”

      We also made the following addition to the “Limitations” section of the Discussion (lines 917-919):

      “While mapping differs from explicit strategies as they are currently defined, we still lack a comprehensive framework to capture the varying levels and nuanced characteristics of intentionality and awareness of different mechanisms (Tsay et al., 2024).”

      We finally note that “Unlike explicit strategies, which are rapidly acquired and diminish over time, this mapping mechanism exhibits prolonged learning beyond 15 minutes, with a rate comparable to recalibration” (Discussion, lines 632-634).

      (3) The distinction between structural and memory-based differences in the two subgroups was based on the notion that memory-based strategies increase asymmetry. However, an alternative explanation could be that unfamiliar perturbations, due to the ramping up, trigger a surprise signal that leads to greater asymmetry due to reactive corrections to prevent one's fall - not because participants are generalizing from previously learned representations (e.g., (Iturralde & Torres-Oviedo, 2019)).

      We agree that reactive corrections could contribute to the walking pattern in response to split-belt perturbations, as detailed by Iturralde & Torres-Oviedo, 2019. We also acknowledge that reactive corrections are rapid, flexible, feedback-driven, and automatic – characteristics that make them appear similar to stimulus-response mapping. However, a detailed evaluation of our results suggests that the behaviors observed in the ramp tasks cannot be fully explained by reactive corrections. Reactive corrections occur almost immediately, quickly adjusting the walking pattern to reduce error and improve stability. This excludes the possibility that what we identified as stimulus-response mapping could instead be reactive corrections, because the stimulus-response mapping observed in our study is acquired slowly at a rate comparable to recalibration. It also excludes the possibility that the increased asymmetry in the Ramp Up & Down could be due to reactive corrections, because these would operate alongside mapping to help reduce asymmetry rather than exacerbate it.

      We made substantial revisions to the Discussion and included the section “Stimulus-response mapping is flexible but requires learning” to explain this interpretation (lines 595-622):

      “The mapping mechanism observed in our study aligns with the corrective responses described by Iturralde and Torres-Oviedo, which operate relative to a recalibrated "new normal" rather than relying solely on environmental cues (Iturralde and Torres-Oviedo, 2019). Accordingly, our findings suggest a tandem architecture: forward model recalibration adjusts the nervous system's "normal state," while stimulus-response mapping computes motor adjustments relative to this "new normal." This architecture explains the sharp transition from flexible to rigid motor adjustments observed in our Ramp Down task. The transition occurs at the configuration perceived as "equal speeds" (~0.5 m/s speed difference) because this corresponds to the recalibrated “new normal”.

      In the first half of the Ramp Down, participants adequately modulated their walking pattern to accommodate the gradually diminishing perturbation, achieving symmetric step lengths. Due to the recalibrated “new normal”, perturbations within this range are perceived as congruent with the direction of adaptation but reduced in magnitude. This allows the mapping mechanism to flexibly modulate the walking pattern by using motor adjustments previously learned during adaptation. Importantly, the short duration of the Ramp Down task rules out the possibility that the observed modulation may instead reflect washout, as confirmed by the fact that the aftereffects measured post-Ramp-Down were comparable to previous work (Kambic et al., 2023; Reisman et al., 2005).

      In the second half of the Ramp Down, aftereffects emerged as participants failed to accommodate perturbations smaller than the recalibrated “new normal”. These perturbations were perceived as opposite to the adaptation perturbation and, therefore, novel. Accordingly, the mapping mechanism responded as it would to a newly introduced perturbation, rather than leveraging previously learned adjustments (Iturralde and Torres-Oviedo, 2019). Due to the rapid nature of the Ramp Down, the mapping mechanism lacked sufficient time to learn the novel motor adjustments required for these perturbations – a process that typically takes several minutes, as shown by our baseline ramp tasks and control experiments. As mapping-related learning was negligible, the rigid recalibration adjustments dominated during this phase. Consequently, the walking pattern did not change to accommodate the gradually diminishing perturbation, leading to the emergence of aftereffects.”
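
      The matching-then-divergent pattern described in the passage above can be illustrated with a minimal piecewise sketch. The functional form is inferred from the verbal description (mapping treated as zero once the perturbation falls below the recalibrated "new normal") and is not the manuscript's fitted equations; the function name, the assumption that perturbation and Δ motor output share the same units, and all numerical values are hypothetical.

```python
def delta_motor_output(perturbation, recalibration):
    """Sketch of Δ motor output as rigid recalibration plus flexible mapping.

    Mapping contributes only when the perturbation exceeds the recalibrated
    "new normal"; below it, mapping is approximated as zero.
    """
    mapping = max(perturbation - recalibration, 0.0)
    return recalibration + mapping


# Hypothetical descending perturbation values with a "new normal" of 0.5.
perturbations = [1.0, 0.75, 0.5, 0.25, 0.0]
outputs = [delta_motor_output(p, recalibration=0.5) for p in perturbations]

# Aftereffect on each step: motor output that is not matched by the perturbation.
aftereffects = [u - p for u, p in zip(outputs, perturbations)]
```

      In this sketch the output tracks the perturbation during the first half of the ramp (no aftereffect) and stays at the recalibration level during the second half, so the aftereffect grows as the perturbation shrinks.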

      (4) Further contextualization: Recognizing the differences in dependent variables (reaching position vs. leg speed/symmetry in walking), could the Proprioceptive/Perceptual Re-alignment model also apply to gait adaptation (Tsay et al., 2022; Zhang et al., 2024)? Recent reaching studies show a similar link between perception and action during motor adaptation (Tsay et al., 2021) and have proposed a model aligning with the authors' correlations between perception and action. The core signal driving implicit adaptation is the discrepancy between perceived and desired limb position, integrating forward model predictions with proprioceptive/visual feedback.

      We appreciate the reviewer’s suggestion and agree that the Proprioceptive Re-alignment model (PReMo) and Perceptual Error Adaptation model (PEA) offer valuable insights into the relationship between perception and motor adaptation. To explore whether these frameworks apply to gait adaptation, we conducted an extensive modeling analysis. This is shown in Figure 5 and Supplementary Figures S7-S8, and is detailed in the text of Experiment 1 Results section “Modelling analysis for perceptual realignment” (lines 327–375), Methods section “Proprioceptive re-alignment model (PReMo)” (lines 1181-1221), Methods section “Perceptual Error Adaptation model (PEA)” (lines 1222-1247), Methods section “Perceptuomotor recalibration + mapping (PM-ReMap)” (lines 1248-1286), and SI Appendix section “Evaluation and development of perceptual models.” (lines 99-237).

      First, we evaluated how PReMo and PEA models fitted our Ramp Down data. We translated the original variables to walking adaptation variables using a conceptual equivalence explained by one of the features explored by Tsay et al. (2022). Specifically, the manuscript provides guidance on extending the PReMo model from visuomotor adaptation in response to visual-proprioceptive discrepancies, to force-field adaptation in response to mechanical perturbations – which share conceptual similarities with split-belt treadmill perturbations. The manuscript also discusses that, if vision is removed, the proprioceptive shift decays back to zero according to a decay parameter. This description entails that proprioceptive shift cannot increase or develop in the absence of vision. We applied the models to split-belt adaptation in accordance with this information, as described in the SI Appendix: “PReMo variables equivalents for walking adaptation”. As reported in Experiment 1 Results “Modelling analysis for perceptual realignment” (lines 327–375) and Figure 5, neither PReMo nor PEA adequately captured the key features of our Ramp Down data: “The models could not capture the matching-then-divergent behavior of Δ motor output, performing significantly worse than the recalibration + mapping model (PReMo minus recalibration+mapping BIC difference = 24.591 [16.483, 32.037], PEA minus recalibration+mapping BIC difference = 6.834 [1.779, 12.130], mean [CI]). Furthermore, they could not capture the perceptual realignment and instead predicted that the right leg would feel faster than the left throughout the entire Ramp Down”.

      Second, we used simulations to confirm that PReMo and PEA cannot account for the perceptual realignment observed in our study, and to understand why. At adaptation plateau, PReMo predicts that perceived and actual step length asymmetry converge, as shown in Fig. S7A, top, and as detailed in the SI Appendix “Original PReMo simulations”. We found that this is because PReMo assumes that perceptual realignment arises specifically from mismatches between different sensory modalities. This assumption works for paradigms that introduce an actual mismatch between sensory modalities, such as visuomotor adaptation paradigms with a mismatch between vision and proprioception. This assumption also works for paradigms that indirectly introduce a mismatch between integrated sensory information from different sensory modalities. In force-field adaptation, both proprioceptive and visual inputs are present and realistic, but when these inputs are integrated with sensory predictions, the resulting integrated visual estimate is mismatched compared to the integrated proprioceptive estimate. In contrast, the assumption that perceptual realignment arises from mismatches between sensory modalities does not work for paradigms that involve a single sensory modality. Split-belt adaptation only involves proprioception as no visual feedback is given, and perceptual realignment arises from discrepancies between predicted and actual motor outcomes, rather than between integrated sensory modalities.

      To overcome this limitation, we reinterpreted the variables of the PReMo model, while keeping the original equations, to account for realignment driven by mismatches of the same nature as the perturbation driving adaptation. As reported in the SI Appendix “Iterative simulations for the development of PM-ReMap”, the simulation (Fig. S7A, middle row) “showed perceptual realignment at adaptation plateau, addressing a limitation of the original model. However, it failed to account for the Ramp Down perceptual results, inaccurately predicting that belt speeds feel equal when they are actually equal (Fig. S7A, middle row, perceived perturbation decays alongside actual perturbation and converges to zero at the end of the Ramp Down). […] This occurs because, under the retained PReMo equations, β_p and β_v change immediately and are proportional to the difference between and on each trial, so that they ramp down to zero in parallel with the perturbation”.

      We also noted that the simulations of the original and reinterpreted PReMo models could not support the operation of the mapping mechanism observed in the Ramp Down (Fig. S7B). We describe that “This occurs because the overall motor output x_p, which includes both recalibration and mapping mechanisms, changes gradually according to the learning rate 𝐾. Consequently, changes in 𝐺 take many trials to be fully reflected in x_p. Hence, we found complementary limitations where PReMo assumes perceptual realignment changes immediately while mapping adjustments develop gradually – but the opposite is true in our data”.

      We therefore modified the PReMo equations and developed a new model, called perceptuomotor recalibration + mapping (PM-ReMap), that addresses these limitations and is able to capture our Ramp Down motor and perceptual results. As described in the SI Appendix “Iterative simulations for the development of PM-ReMap”, “we introduced an update equation for β_p so that it changes gradually trial-by-trial according to the learning rate 𝐾. We then removed the learning rate from the update equation for x_p so that it integrates two distinct types of changes: 1) the gradual changes in driven by β_p and representing the recalibration mechanism, and 2) the immediate changes in 𝐺 – representing the mapping mechanism”. The final equations of the PM-ReMap model are as follows:

      As reported in Experiment 1 Results, “Modelling analysis for perceptual realignment”, and as shown in Fig. 5C, “the PM-ReMap model captured the Δ motor output in the Ramp Down with performance comparable to that of the recalibration + mapping model (BIC difference = 2.381 [-0.739, 5.147], mean [CI]). It also captured perceptual realignment, predicting that some intermediate belt speed difference in the Ramp Down is perceived as “equal speeds” (, Fig. 5C)”. We also found that the estimated aligned with the empirical measurement of the PSE in the Ramp Down both at group and individual level: “At group level, was comparable to the upper bound of compensation_perceptual (difference = -7 [-15, 1]%, mean [CI]), but significantly larger than the lower bound (difference = 19 [8, 31]%, mean [CI]). Furthermore, we found a significant correlation between individual participants’ and their upper bound of compensation_perceptual (r=0.63, p=0.003), but not their lower bound (r=0.30, p=0.203). Both sets of results are consistent with those observed for the recalibration + mapping model”.

      Based on these findings, we summarize that PM-ReMap “extends the recalibration + mapping model by incorporating the ability to account for forgetting – typical of state space models – while still effectively capturing both recalibration and mapping mechanisms. However, performance of the PM-ReMap model does not exceed that of the simpler recalibration + mapping model, suggesting that forgetting and unlearning do not have a substantial impact on the Ramp Down”.

      Reviewer #2 (Public review):

      Recent findings in the field of motor learning have pointed to the combined action of multiple mechanisms that potentially contribute to changes in motor output during adaptation. A nearly ubiquitous motor learning process occurs via the trial-by-trial compensation of motor errors, often attributed to cerebellar-dependent updating. This error-based learning process is slow and largely unconscious. Additional learning processes that are rapid (e.g., explicit strategy-based compensation) have been described in discrete movements like goal-directed reaching adaptation. However, the role of rapid motor updating during continuous movements such as walking has been either under-explored or inconsistent with those found during the adaptation of discrete movements. Indeed, previous results have largely discounted the role of explicit strategy-based mechanisms for locomotor learning. In the current manuscript, Rossi et al. provide convincing evidence for a previously unknown rapid updating mechanism for locomotor adaptation. Unlike the now well-studied explicit strategies employed during reaching movements, the authors demonstrate that this stimulus-response mapping process is largely unconscious. The authors show that in approximately half of subjects, the mapping process appears to be memory-based while the remainder of subjects appear to perform structural learning of the task design. The participants that learned using a structural approach had the capability to rapidly generalize to previously unexplored regions of the perturbation space.

      One result that will likely be particularly important to the field of motor learning is the authors' quite convincing correlation between the magnitude of proprioceptive recalibration and the magnitude of error-based updating. This result beautifully parallels results in other motor learning tasks and appears to provide a robust marker for the magnitude of the mapping process (by means of subtracting off the contribution of error-based motor learning). This is a fascinating result with implications for the motor learning field well beyond the current study.

      A major strength of this manuscript is the large sample size across experiments and the extent of replication performed by the authors in multiple control experiments.

      Finally, I commend the authors on extending their original observations via Experiment 2. While it seems that participants use a range of mapping mechanisms (or indeed a combination of multiple mapping mechanisms), future experiments may be able to tease apart why some subjects use memory versus structural mapping. A future ability to push subjects to learn structurally-based mapping rules has the potential to inform rehabilitation strategies.

      Overall, the manuscript is well written, the results are clear, and the data and analyses are convincing. The manuscript's weaknesses are minor, mostly related to the presentation of the results and modeling.

      Weaknesses:

      The overall weaknesses in the manuscript are minor and can likely be addressed with textual changes.

      (1) A key aspect of the experimental design is the speed of the "ramp down" following the adaptation period. If the ramp-down is too slow, then no after-effects would be expected even in the alternative recalibration-only/error-based-only hypothesis. How did the authors determine the appropriate rate of ramp-down? Do alternative choices of ramp-down rates result in step length asymmetry measures that are consistent with the mapping hypothesis?

      We thank the reviewer for their insightful comment regarding the rate of the Ramp Down following the adaptation period and its potential impact on aftereffects under different hypotheses. We added a detailed explanation for how we determined the Ramp Down design, including analyses of previous work, to the SI Appendix, “Ramp Down design”, lines 22-98. We also describe the primary points in the main Methods section, “Ramp Tasks”, lines 978-991:

      As described in SI Appendix, “Ramp Down design”, the Ramp Down task was specifically designed to measure the pattern of aftereffects in a way that ensured reliable and robust measurements with sufficient resolution across speeds, and that minimized washout to prevent confounding the results. To balance time constraints with a measurement resolution adequate for capturing perceptual realignment, we used 0.05 m/s speed decrements, matching the perceptual sensitivity estimated from our re-analysis of the baseline data from Leech et al. (Leech et al., 2018a). To obtain robust motor aftereffect measurements, we collected three strides at each speed condition, as averaging over three strides represents the minimum standard for consistent and reliable aftereffect estimates in split-belt adaptation (typically used in catch trials) (Leech et al., 2018a; Rossi et al., 2019; Vazquez et al., 2015). To minimize unwanted washout by forgetting and/or unlearning, we did not pause the treadmill between adaptation and the post-adaptation ramp tasks, and ensured the Ramp Down was relatively quick, lasting approximately 80 seconds on average. Of note, the Ramp Down design ensures that even in cases of partial forgetting, the emergence pattern of aftereffects remains consistent with the underlying hypotheses.

      In the SI Appendix, we explain that, while we did not test longer ramp-down durations directly, previous data suggest that durations of up to at least 4.5 minutes would yield step length asymmetry measures consistent with our results and the mapping hypothesis. Additionally, our control experiments replicated the behavior observed in the Ramp Down using speed match tasks lasting only 30 seconds, further supporting the robustness of our findings across varying durations.
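
      The ramp structure described above (0.05 m/s decrements, three strides per speed condition) can be sketched as a simple schedule generator. The function name is illustrative, and the starting belt speed difference of 1.0 m/s is a hypothetical value chosen for the example rather than a value stated in this response.

```python
def ramp_down_schedule(start_difference, decrement=0.05, strides_per_speed=3):
    """Return the belt speed difference (m/s) on each stride of a descending ramp."""
    n_levels = int(round(start_difference / decrement)) + 1
    schedule = []
    for level in range(n_levels):
        # Round to suppress floating-point drift in the repeated subtraction.
        speed_difference = round(start_difference - level * decrement, 10)
        schedule.extend([speed_difference] * strides_per_speed)
    return schedule


# Hypothetical starting difference of 1.0 m/s, ramping down to equal belt speeds.
schedule = ramp_down_schedule(1.0)
```

      Under these assumed values the ramp spans 21 speed conditions and 63 strides; a different starting difference would change both counts.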

      (2) Overall, the modeling as presented in Figure 3 (Equation 1-3) is a bit convoluted. To my mind, it would be far more useful if the authors reworked Equations 1-3 and Figure 3 (with potential changes to Figure 2) so that the motor output (u) is related to the stride rather than the magnitude of the perturbation. There should be an equation relating the forward model recalibration (i.e., Equation 1) to the fraction of the motor error on a given stride, something akin to u(k+1) = r * (u(k) - p(k)). This formulation is easier to understand and commonplace in other motor learning tasks (and likely what the authors actually fit given the Smith & Shadmehr citation and the derivations in the Supplemental Materials). Such a change would require that Figure 3's independent axes be changed to "stride," but this has the benefit of complementing the presentation that is already in Figure 5.

      We reworked these equations (now numbered 4-6, lines 207-209) so that the motor output u is related to stride k as suggested by the reviewer:

      We changed Figure 2 and Figure 3 accordingly, adding a “stride” x-axis to the Ramp Down data figure.

      Reviewer #2 (Recommendations for the authors):

      I think that some changes to the text/ordering could improve the manuscript's readability. In particular:

      (1) My feeling is that many of the equations presented in the Methods section should be moved to the Results section, particularly Equations 9-11. The introduction of these motor measures should likely precede Figure 1, as their definitions form the crux of Figure 1 and the subsequent analyses.

      (2) It is unclear to me why many of the analyses and discussion points have been relegated to Supplemental Material. I would significantly revise the manuscript to move much of the content from Supplemental Material to the Methods and Discussion (where appropriate). Even the Todorov and Herzfeld models can likely simply be referenced in the text without a need for their full description in the Supplemental material - as their implementations appear to this reviewer as consistent with those presented in the respective papers. Beyond the Supplementary Tables, my feeling is that nearly all of the content in Supplemental can either be simply cited (e.g. alternative model implementations) or directly incorporated into the main manuscript without compromising the readability of the manuscript.

      We reorganized the manuscript and SI Appendix substantially, moving content to the Results or other main text sections. The changes included those recommended by the reviewer:

      • We moved the equations describing step length asymmetry, perturbation, and Δ motor output (originally numbered Eq. 9-11) to the Results section (Experiment 1, “Motor paradigm and hypothesis”, lines 131-133, now numbered Eq. 1-3).

      • We moved Supplementary Methods to the main Methods section.

      • We moved the most relevant content of the Supplementary Discussion to the main Discussion, and removed the less relevant content altogether.

      • We moved the methods describing walking-adaptation specific implementation of the Todorov and Herzfeld models to the main Methods section and removed the portions that were identical to the original implementation.

      • We moved the control experiments to the main text (main Results and Methods sections).

      • We removed the SI Appendix section “Experiment 1 mechanisms characteristics”.

      Reviewer #3 (Public review):

      Summary:

      In this work, Rossi et al. use a novel split-belt treadmill learning task to reveal distinct sub-components of gait adaptation. The task involved following a standard adaptation phase with a "ramp-down" phase that helped them dissociate implicit recalibration and more deliberate SR map learning. Combined with modeling and re-analysis of previous studies, the authors show multiple lines of evidence that both processes run simultaneously, with implicit learning saturating based on intrinsic learning constraints and SR learning showing sensitivity to a "perceptual" error. These results offer a parallel with work in reaching adaptation showing both explicit and implicit processes contributing to behavior; however, in the case of gait adaptation the deliberate learning component does not appear to be strategic but is instead a more implicit SR learning process.

      Strengths:

      (1) The task design is very clever and the "ramp down" phase offers a novel way to attempt to dissociate competing models of multiple processes in gait adaptation.

      (2) The analyses are thorough, as is the re-analysis of multiple previous data sets.

      (3) The querying of perception of the different relative belt speeds is a very nice addition, allowing the authors to connect different learning components with error perception.

      (4) The conceptual framework is compelling, highlighting parallels with work in reaching but also emphasizing differences, especially w/r/t SR learning versus strategic behaviors. Thus the discovery of an SR learning process in gait adaptation would be both novel and also help conjoin different siloed subfields of motor learning research.

      Weaknesses:

      (1) The behavior in the ramp-down phase does indeed appear to support multiple learning processes. However, I may have missed something, but I have a fundamental worry about the specific modeling and framing of the "SR" learning process. If I correctly understand, the SR process learns by adjusting to perceived L/R belt speed differences (Figure 7). What is bugging me is why that process would not cause the SR system to still learn something in the later parts of the ramp-down phase when the perceived speed differences flip (Figure 4). I do believe this "blunted learning" is what the SR component is actually modeled with, given this quote in the caption to Figure 7: "When the perturbation is perceived to be opposite than adaptation, even if it is not, mapping is zero and the Δ motor output is constant, reflecting recalibration adjustments only." It seems a priori odd and perhaps a little arbitrary to me that a SR learning system would just stop working (go to zero) just because the perception flipped sign. Or for that matter "generalize" to a ramp-up (i.e., just learn a new SR mapping just like the system did at the beginning of the first perturbation). What am I missing that justifies this key assumption? Or is the model doing something else? (if so that should be more clearly described).

      We concur that this point was confusing, and we performed additional analyses and revised the text to improve clarity. Specifically, we clarify that the stimulus-response mapping does indeed still learn in the second portion of the Ramp Down, when the perceived speed differences flip. However, learning by the mapping mechanism proceeds slowly – at a rate comparable to that of forward model recalibration, taking several minutes. The duration of the task is relatively short, so that learning by the mapping mechanism is limited. We schematize the learning to be zero as an approximation. We have now included an additional modelling analysis (as part of our expanded perceptual modelling analyses), which shows there is no significant improvement in modelling performance when accounting for forgetting of recalibration or learning in the opposite direction by mapping in the second half of the ramp down, supporting this approximation. We explain this and other revisions in detail below.

      We include a Discussion section “Stimulus-response mapping is flexible but requires learning” where we improve our explanation of the operation of the mapping mechanism in the Ramp Down by leveraging the framework proposed by Iturralde and Torres-Oviedo, 2019. The section first explains that mapping operates relative to a new equilibrium corresponding to the current forward model calibration (lines 595-603):

      “The mapping mechanism observed in our study aligns with the corrective responses described by Iturralde and Torres-Oviedo, which operate relative to a recalibrated "new normal" rather than relying solely on environmental cues (Iturralde and Torres-Oviedo, 2019). Accordingly, our findings suggest a tandem architecture: forward model recalibration adjusts the nervous system's "normal state," while stimulus-response mapping computes motor adjustments relative to this "new normal." This architecture explains the sharp transition from flexible to rigid motor adjustments observed in our Ramp Down task. The transition occurs at the configuration perceived as "equal speeds" (~0.5 m/s speed difference) because this corresponds to the recalibrated “new normal”.”

      The following paragraph (lines 604-611) explains how this concept is reflected in the first half of the Ramp Down:

      “In the first half of the Ramp Down, participants adequately modulated their walking pattern to accommodate the gradually diminishing perturbation, achieving symmetric step lengths. Due to the recalibrated “new normal”, perturbations within this range are perceived as congruent with the direction of adaptation but reduced in magnitude. This allows the mapping mechanism to flexibly modulate the walking pattern by using motor adjustments previously learned during adaptation. Importantly, the rapid duration of the Ramp Down task rules out the possibility that the observed modulation may instead reflect washout, as confirmed by the fact the aftereffects measured post-Ramp-Down were comparable to previous work (Kambic et al., 2023; Reisman et al., 2005).”

      The last paragraph (lines 612–622) explains the second half of the Ramp Down in light of the equilibrium concept and of the slow learning rate of mapping:

      “In the second half of the Ramp Down, aftereffects emerged as participants failed to accommodate perturbations smaller than the recalibrated “new normal”. These perturbations were perceived as opposite to the adaptation perturbation and, therefore, novel. Accordingly, the mapping mechanism responded as it would to a newly introduced perturbation, rather than leveraging previously learned adjustments (Iturralde and Torres-Oviedo, 2019). Due to the rapid nature of the Ramp Down, the mapping mechanism lacked sufficient time to learn the novel motor adjustments required for these perturbations – a process that typically takes several minutes, as shown by our baseline ramp tasks and control experiments. As mapping-related learning was negligible, the rigid recalibration adjustments dominated during this phase. Consequently, the walking pattern did not change to accommodate the gradually diminishing perturbation, leading to the emergence of aftereffects.”
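
      As a minimal sketch of the approximation described above (not the fitted model from the manuscript), the Δ motor output in the Ramp Down can be written as a recalibration term plus a mapping term that is set to zero whenever the perturbation is perceived as opposite to adaptation. The function name, the 1.0 m/s starting perturbation, and the ramp values are illustrative assumptions; the 0.5 m/s recalibration follows the "~0.5 m/s speed difference" quoted above.

```python
def delta_motor_output(perturbation, recalibration):
    """Toy piecewise sketch of recalibration + mapping during the Ramp Down.

    Both arguments are belt-speed differences in m/s. This is an
    illustration of the zero-mapping approximation, not the authors' model.
    """
    # The perturbation is perceived relative to the recalibrated "new normal".
    perceived = perturbation - recalibration
    # Mapping adjusts flexibly only while the perceived perturbation has the
    # same direction as adaptation; otherwise it is approximated as zero.
    mapping = max(perceived, 0.0)
    return recalibration + mapping


RECALIBRATION = 0.5  # m/s, assumed "new normal"
ramp = [1.0, 0.75, 0.5, 0.25, 0.0]  # hypothetical Ramp Down perturbations (m/s)
outputs = [delta_motor_output(p, RECALIBRATION) for p in ramp]

for p, out in zip(ramp, outputs):
    # A positive mismatch corresponds to an emerging aftereffect.
    print(f"perturbation={p:.2f}  delta_motor_output={out:.2f}  mismatch={out - p:.2f}")
```

      In this sketch the output tracks the perturbation down to the recalibrated equilibrium and stays constant below it, which is the sharp transition from flexible to rigid adjustments described in the quoted paragraphs.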

      We also revised the Discussion section “Mapping operates as memory-based in some people, structure-based in others” to clarify the processes of interpolation and extrapolation (lines 689-700). This revision helps explain why mapping may generalize to a ramp-up faster than it learns a perturbation perceived in the opposite direction (when considered together with the explanation that mapping operates relative to the new recalibrated equilibrium). In the former case (generalizing to a ramp-up), a structure-based mapping can use the extrapolation computation: it leverages previous knowledge of which gait parameters should be modified and how – e.g., modulating the positioning of our right foot to be more forward on the treadmill – but must extrapolate the specific parameter values – e.g., how much farther forward. In the latter case (learning a perturbation perceived in the opposite direction), even a structure-based mapping would need to figure out which gait parameters to change completely anew – e.g., modulating the positioning of the foot in the opposite way, to be less forward, requires a different set of control policies.

      We mentioned above that this illustration of the mapping mechanism relies on the assumption that the additional learning of the mapping mechanism in the second half of the Ramp Down is negligible. As part of our revisions for the “Modelling analysis for perceptual realignment”, we developed a new model – the perceptuomotor recalibration + mapping model (PM-ReMap) – that extends the recalibration + mapping model by accounting for the possibility that Δ motor output is not constant in the second half of the Ramp Down (main points are at lines 355-275, and Figure 5; see response to Reviewer #1 (Public review), Comment 4, for a detailed explanation). We find that the performance of the PM-ReMap model does not exceed that of the simpler recalibration + mapping model, suggesting that the Δ motor output does not change substantially in the second half of the Ramp Down. Note that, if the Δ motor output decayed in this phase, it could be due to forgetting or unlearning of the recalibration mechanism, or to the mapping mechanism learning in the opposite direction than it did in adaptation. In the Results section, we focused on describing recalibration forgetting/unlearning for simplicity. However, in the Discussion section “Mapping may underly savings upon re-exposure to the same or different perturbation”, we explain in detail how the motor aftereffects also depend on the mapping mechanism learning in the opposite direction, as corroborated by our Control experiments and previous work. Therefore, the finding that the PM-ReMap model performance does not exceed that of the simpler recalibration + mapping model suggests that neither effect – recalibration forgetting/unlearning or opposite-direction learning by mapping – is significant, nor is their combined effect on the Δ motor output.

      (2) A more minor point, but given the sample size it is hard to be convinced about the individual difference analysis for structure learning (Figure 5). How clear is it that these two groups of subjects are fully separable and not on a continuum? The lack of clusters in another data set seems like a somewhat less than convincing control here.

      We performed an additional analysis – a silhouette analysis – to confirm the presence of these clusters in our data (Methods, lines 1070-1072). The results, reported in Experiment 2 Results, lines 487-490, confirmed that there is strong evidence for the presence of these clusters:

      “A silhouette analysis confirmed strong evidence for these clusters: the average silhouette score was 0.90, with 19 of 20 participants scoring above 0.7 – considered strong evidence – and one scoring between 0.5 and 0.7 – considered reasonable evidence (Dalmaijer et al., 2022; Kaufman and Rousseeuw, 1990; Rousseeuw, 1987).”
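
      For readers unfamiliar with the measure, a silhouette score compares each point's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as s = (b - a) / max(a, b). The sketch below is a plain-Python illustration on made-up one-dimensional values; the data, labels, and function name are hypothetical and are not the participants' data or the authors' analysis code. The 0.7 and 0.5 thresholds are the ones quoted above.

```python
from statistics import mean


def silhouette_scores(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b) for 1-D data."""
    scores = []
    for i, (x, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        same = [abs(x - y) for j, (y, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean([abs(x - y) for y, l in zip(points, labels) if l == other])
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical values forming two well-separated groups.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
labels = [0, 0, 0, 1, 1, 1]
scores = silhouette_scores(points, labels)

strong = sum(s > 0.7 for s in scores)            # "strong evidence"
reasonable = sum(0.5 < s <= 0.7 for s in scores)  # "reasonable evidence"
print(f"mean silhouette = {mean(scores):.2f}, strong = {strong}, reasonable = {reasonable}")
```

      Scores near 1 indicate points that sit well inside their own cluster, whereas scores near 0 would be expected if participants lay on a continuum.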

      Reviewer #3 (Recommendations for the authors):

      (1) I think there is far too much content pushed into the supplement. The other models and full model comparison should be in the main text, as should the re-analysis of previous data sets. Also, key discussion points should not be in the supplement either.

      We reorganized the manuscript and SI Appendix substantially, including the changes recommended by the reviewer. Please refer to our response to “Reviewer #2 - Recommendations for the authors” for a detailed explanation.

      (2) Line 649: in reaching the calibration system does respond to different error sizes; why not here?

      We apologize for the confusion. Similar to reaching adaptation, the recalibration in walking adaptation also scales based on the error size experienced in adaptation. What we meant to convey is that, once a calibration has been acquired in adaptation, the recalibration process is rigid in that it can only change gradually. So if we jump the perturbation to a different value, the original calibration is transiently used until the system has the time to recalibrate again. For example, if we jump abruptly from the adaptation perturbation to a perturbation of zero in postadaptation, the adaptation calibration persists resulting in aftereffects.

      We revised the manuscript to clarify these points. First, we explicitly report that forward model recalibration scales based on the error size experienced in adaptation:

      “We next compared Medium Descend and Small Abrupt (1m/s or 0.4m/s perturbation), and found that recalibration contributed significantly more for the smaller perturbation (larger compensation<sub>perceptual</sub> / compensation<sub>motor-total</sub> in Small Abrupt than Medium Descend, Fig. 8A middle and Table S6).” (Control experiments Results, lines 422-425)

      “the mapping described here shares some characteristics with explicit mechanisms, such as flexibility and modulation by error size” (Discussion, lines 630-631)

      Additionally, we leverage the framework proposed by Tsay et al., 2024, to improve our explanation of the characteristics of the different learning mechanisms. Please refer to our response to “Reviewer #1 (Public review)”, Comment (1).

      (3) It would be nice to see bar graphs showing model comparison results for each individual subject in the main text, and to see how many subjects are best fit by the SR+calibration model.

      We added the recommended bar graphs to Figure 3 and Figure 5.

      (4) Why exactly does the "perturbation" in Figure 3 have error bars?

      In walking adaptation, the perturbation that participants experienced is closely dictated by the treadmill belt speeds, but not exactly, because participants are free to move their feet as they like, so that their ankle movement may not always match the treadmill belts exactly. Therefore, we record the perturbation that is actually experienced by each participant’s feet using markers. We then display the mean and standard error of this perturbation.
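
      A minimal sketch of how such error bars arise, assuming one experienced-perturbation value per participant at a given time point: the numbers below are invented for illustration, and the perturbation measure itself (Eq. 1-3 in the manuscript) is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return the group mean and the standard error of the mean."""
    n = len(values)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical experienced perturbations (m/s) for five participants when the
# nominal belt-speed difference is 1.0 m/s.
experienced = [0.98, 1.02, 0.95, 1.01, 0.99]
group_mean, group_sem = mean_and_sem(experienced)
print(f"mean = {group_mean:.3f} m/s, SEM = {group_sem:.3f} m/s")
```

      Because each participant's feet follow the belts slightly differently, the group mean can deviate from the nominal belt-speed difference and the standard error is nonzero.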

      We moved the equation describing the perturbation measure from the Methods to the Experiment 1 Results (lines 131-133, Eq. 1-3). We believe this change will help the reader understand the measures depicted.

    1. eLife Assessment

      This study presents a useful reassessment of the potential role of dendritic cell-derived IL-27 p28 cytokine in the functional maturation of CD4+CD8- thymocytes, and CD4+ recent thymic emigrants. The evidence supporting the claims of the authors is solid and serves to reaffirm what has been previously described, with the overall advance in understanding the mechanism(s) responsible for the intrathymic functional programming of CD4+ T cells being limited.

    2. Reviewer #1 (Public Review):

      Summary:

      Zhang et al. demonstrate that CD4+ single positive (SP) thymocytes, CD4+ recent thymic emigrants (RTE), and CD4+ T naive (Tn) cells from Cd11c-p28-flox mice, which lack IL-27p28 selectively in Cd11c+ cells, exhibit a hyper-Th1 phenotype instead of the expected hyper Th2 phenotype. Using IL-27R-deficient mice, the authors confirm that this hyper-Th1 phenotype is due to IL-27 signaling via IL-27R, rather than the effects of monomeric IL-27p28. They also crossed Cd11c-p28-flox mice with autoimmune-prone Aire-deficient mice and showed that both T cell responses and tissue pathology are enhanced, suggesting that SP, RTE, and Tn cells from Cd11c-p28-flox mice are poised to become Th1 cells in response to self-antigens. Regarding mechanism, the authors demonstrate that SP, RTE, and Tn cells from Cd11c-p28-flox mice have reduced DNA methylation at the IFN-g and Tbx21 loci, indicating 'de-repression', along with enhanced histone tri-methylation at H3K4, indicating a 'permissive' transcriptional state. They also find evidence for enhanced STAT1 activity, which is relevant given the well-established role of STAT1 in promoting Th1 responses, and surprising given IL-27 is a potent STAT1 activator. This latter finding suggests that the Th1-inhibiting property of thymic IL-27 may not be due to direct effects on the T cells themselves.

      Strengths:

      Overall the data presented are high quality and the manuscript is well-reasoned and composed. The basic finding - that thymic IL-27 production limits the Th1 potential of SP, RTE, and Tn cells - is both unexpected and well described.

      Weaknesses from the original round of review:

      A credible mechanistic explanation, cellular or molecular, is lacking. The authors convincingly affirm the hyper-Th1 phenotype at epigenetic level but it remains unclear whether the observed changes reflect the capacity of IL-27 to directly elicit epigenetic remodeling in developing thymocytes or knock-on effects from other cell types which, in turn, elicit the epigenetic changes (presumably via cytokines). The authors propose that increased STAT1 activity is a driving force for the epigenetic changes and resultant hyper-Th1 phenotype. That conclusion is logical given the data at hand but the alternative hypothesis - that the hyper-STAT1 response is just a downstream consequence of the hyper-Th1 phenotype - remains equally likely. Thus, while the discovery of a new anti-inflammatory function for IL-27 within the thymus is compelling, further mechanistic studies are needed to advance the finding beyond phenomenology.

    3. Reviewer #2 (Public Review):

      Summary:

      Naïve CD4 T cells in CD11c-Cre p28-floxed mice express highly elevated levels of proinflammatory IFNg and the transcription factor T-bet. This phenotype turned out to be imposed by thymic dendritic cells (DCs) during CD4SP T cell development in the thymus [PMID: 23175475]. The current study affirms these observations, first, by developmentally mapping the IFNg dysregulation to newly generated thymic CD4SP cells [PMID: 23175475], second, by demonstrating increased STAT1 activation being associated with increased T-bet expression in CD11c-Cre p28-floxed CD4 T cells [PMID: 36109504], and lastly, by confirming IL-27 as the key cytokine in this process [PMID: 27469302]. The authors further demonstrate that such dysregulated cytokine expression is specific to the Th1 cytokine IFNg, without affecting the expression of the Th2 cytokine IL-4, thus proposing a role for thymic DC-derived p28 in shaping the cytokine response of newly generated CD4 helper T cells. Mechanistically, CD4SP cells of CD11c-Cre p28-floxed mice were found to display epigenetic changes in the Ifng and Tbx21 gene loci that were consistent with increased transcriptional activities of IFNg and T-bet mRNA expression. Moreover, in autoimmune Aire-deficiency settings, CD11c-Cre p28-floxed CD4 T cells still expressed significantly increased amounts of IFNg, exacerbating the autoimmune response and disease severity. Based on these results, the investigators propose a model where thymic DC-derived IL-27 is necessary to suppress IFNg expression by CD4SP cells and thus would impose a Th2-skewed predisposition of newly generated CD4 T cells in the thymus, potentially relevant in autoimmunity.

      Strengths:

      Experiments are well-designed and executed. The conclusions are convincing and supported by the experimental results.

      Weaknesses from the original round of review:

      The premise of the current study is confusing as it tries to use the CD11c-p28 floxed mouse model to explain the Th2-prone immune profile of newly generated CD4SP thymocytes. Instead, it would be more helpful to (1) give full credit to the original study which already described the proinflammatory IFNg+ phenotype of CD4 T cells in CD11c-p28 floxed mice to be mediated by thymic dendritic cells [PMID: 23175475], and then, (2) build on that to explain that this study is aimed to understand the molecular basis of the original finding.

      In its essence, this study mostly rediscovers and reaffirms previously reported findings, but with different tools. While the mapping of epigenetic changes in the IFNg and T-bet gene loci and the STAT1 gene signature in CD4SP cells are interesting, these are expected results, and they only reaffirm what would be assumed from the literature. Thus, there is only incremental gain in new insights and information on the role of DC-derived IL-27 in driving the Th1 phenotype of CD4SP cells in CD11c-p28 floxed mice.

      Altogether, the major issues of this study remain unresolved:

      (1) It is still unclear why the p28-deficiency in thymic dendritic cells would result in increased STAT1 activation in CD4SP cells. Based on their in vitro experiments with blocking anti-IFNg antibodies, the authors conclude that it is unlikely that the constitutive activation of STAT1 would be a secondary effect due to autocrine IFNg production by CD4SP cells. However, this possibility should be further tested with in vivo models, such as Ifng-deficient CD11c-p28 floxed mice. Alternatively, is this an indirect effect by other IFNg producers in the thymus, such as iNKT cells? It is necessary to explain what drives the STAT1 activation in CD11c-p28 floxed CD4SP cells in the first place.

      (2) It is also unclear whether CD4SP cells are the direct targets of IL-27 p28. The cell-intrinsic effects of IL-27 p28 signaling in CD4SP cells should be assessed and demonstrated, ideally by CD4SP-specific deletion of IL-27Ra, or by establishing bone marrow chimeras of IL-27Ra germline KO mice.

      [Editors' note: The resubmitted paper was minimally revised, and many of the initial concerns remain unresolved.]

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. demonstrate that CD4<sup>+</sup> single positive (SP) thymocytes, CD4<sup>+</sup> recent thymic emigrants (RTE), and CD4<sup>+</sup> T naive (Tn) cells from Cd11c-p28-flox mice, which lack IL-27p28 selectively in Cd11c+ cells, exhibit a hyper-Th1 phenotype instead of the expected hyper Th2 phenotype. Using IL-27R-deficient mice, the authors confirm that this hyper-Th1 phenotype is due to IL-27 signaling via IL-27R, rather than the effects of monomeric IL-27p28. They also crossed Cd11c-p28-flox mice with autoimmune-prone Aire-deficient mice and showed that both T cell responses and tissue pathology are enhanced, suggesting that SP, RTE, and Tn cells from Cd11c-p28-flox mice are poised to become Th1 cells in response to self-antigens. Regarding mechanism, the authors demonstrate that SP, RTE, and Tn cells from Cd11c-p28-flox mice have reduced DNA methylation at the IFN-g and Tbx21 loci, indicating 'de-repression', along with enhanced histone tri-methylation at H3K4, indicating a 'permissive' transcriptional state. They also find evidence for enhanced STAT1 activity, which is relevant given the well-established role of STAT1 in promoting Th1 responses, and surprising given IL-27 is a potent STAT1 activator. This latter finding suggests that the Th1-inhibiting property of thymic IL-27 may not be due to direct effects on the T cells themselves.

      Strengths:

      Overall the data presented are high quality and the manuscript is well-reasoned and composed. The basic finding - that thymic IL-27 production limits the Th1 potential of SP, RTE, and Tn cells - is both unexpected and well described.

      Weaknesses:

      A credible mechanistic explanation, cellular or molecular, is lacking. The authors convincingly affirm the hyper-Th1 phenotype at epigenetic level but it remains unclear whether the observed changes reflect the capacity of IL-27 to directly elicit epigenetic remodeling in developing thymocytes or knock-on effects from other cell types which, in turn, elicit the epigenetic changes (presumably via cytokines). The authors propose that increased STAT1 activity is a driving force for the epigenetic changes and resultant hyper-Th1 phenotype. That conclusion is logical given the data at hand but the alternative hypothesis - that the hyper-STAT1 response is just a downstream consequence of the hyper-Th1 phenotype - remains equally likely. Thus, while the discovery of a new anti-inflammatory function for IL-27 within the thymus is compelling, further mechanistic studies are needed to advance the finding beyond phenomenology.

      Thank you for your insightful comments and suggestions. We appreciate your feedback and have carefully considered the concerns raised regarding the mechanistic explanation of our findings. To address the issue of whether developing thymocytes are the direct targets of IL-27, we plan to conduct further studies using Cd4-IL-27ra knockout mice or mixed bone marrow chimeras consisting of wildtype and IL-27ra knockout cells. This approach will help us determine if IL-27 directly induces epigenetic remodeling in thymocytes or if the observed effects are secondary to influences from other cell types.

      Regarding the potential autocrine loop contributing to STAT1 hyperactivation, we have performed preliminary experiments by adding IFN-γ antibody to CD4<sup>+</sup> T cell cultures and observed no significant impact on STAT1 phosphorylation. If necessary, we will further investigate this possibility in vivo using Cd4-Ifng and CD11c-p28 double knockout mice.

      The detailed mechanisms underlying STAT1 hyperactivation remain to be elucidated. Recent studies have shown that IL-27p28 can act as an antagonist of gp130-mediated signaling. Structural analyses have also demonstrated that IL-27p28 interacts with EBI3 and the two receptor subunits IL-27Rα and gp130. Given these findings and the similar phenotypes observed in p28 and IL-27ra deficient mice, we speculate that the deficiency of either p28 or IL-27ra may increase the availability of gp130 for signaling by other cytokines. We will focus our future research on gp130-related cytokines to identify potential candidates that could lead to enhanced STAT1 activation in the absence of p28. Alternatively, the release of EBI3 in p28-deficient conditions may promote its interaction with other cytokine subunits. IL-35, which is composed of EBI3 and p35, is of particular interest given the involvement of IL-27Rα in its signaling pathway.

      To narrow down the candidate cytokines, we reanalyzed single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z). Our analysis revealed that thymic dendritic cells (DCs) were categorized into two distinct clusters, with both Il12a (p35, which forms IL-35 with EBI3) and Clcf1 (CLCF1) being upregulated in CD11c-cre p28<sup>f/f</sup> mice. In CD4 single-positive (SP) thymocytes, the expression levels of gp130 and IL-12Rβ2 (the receptor for IL-35) were comparable between knockout and wild-type mice. However, the mRNA levels of Lifr and Cntfr were low in CD4 SP thymocytes.

      Author response image 1.

      Single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> (KO) and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z).

      We have planned to assess the protein levels of IL-35 and CLCF1 in dendritic cells, as well as their respective receptors, to evaluate their effects on STAT1 phosphorylation in CD4<sup>+</sup> thymocytes from both wild-type and p28-deficient mice. Unfortunately, we have encountered challenges with the mouse breeding and anticipate that it will take approximately six months to obtain the appropriate genotype necessary to complete these experiments.

      Reviewer #2 (Public Review):

      Summary:

      Naïve CD4 T cells in CD11c-Cre p28-floxed mice express highly elevated levels of proinflammatory IFNg and the transcription factor T-bet. This phenotype turned out to be imposed by thymic dendritic cells (DCs) during CD4SP T cell development in the thymus [PMID: 23175475]. The current study affirms these observations, first, by developmentally mapping the IFNg dysregulation to newly generated thymic CD4SP cells [PMID: 23175475], second, by demonstrating increased STAT1 activation being associated with increased T-bet expression in CD11c-Cre p28-floxed CD4 T cells [PMID: 36109504], and lastly, by confirming IL-27 as the key cytokine in this process [PMID: 27469302]. The authors further demonstrate that such dysregulated cytokine expression is specific to the Th1 cytokine IFNg, without affecting the expression of the Th2 cytokine IL-4, thus proposing a role for thymic DC-derived p28 in shaping the cytokine response of newly generated CD4 helper T cells. Mechanistically, CD4SP cells of CD11c-Cre p28-floxed mice were found to display epigenetic changes in the Ifng and Tbx21 gene loci that were consistent with increased transcriptional activities of IFNg and T-bet mRNA expression. Moreover, in autoimmune Aire-deficiency settings, CD11c-Cre p28-floxed CD4 T cells still expressed significantly increased amounts of IFNg, exacerbating the autoimmune response and disease severity. Based on these results, the investigators propose a model where thymic DC-derived IL-27 is necessary to suppress IFNg expression by CD4SP cells and thus would impose a Th2-skewed predisposition of newly generated CD4 T cells in the thymus, potentially relevant in autoimmunity.

      Strengths:

      Experiments are well-designed and executed. The conclusions are convincing and supported by the experimental results.

      Weaknesses:

      The premise of the current study is confusing as it tries to use the CD11c-p28 floxed mouse model to explain the Th2-prone immune profile of newly generated CD4SP thymocytes. Instead, it would be more helpful to (1) give full credit to the original study which already described the proinflammatory IFNg+ phenotype of CD4 T cells in CD11c-p28 floxed mice to be mediated by thymic dendritic cells [PMID: 23175475], and then, (2) build on that to explain that this study is aimed to understand the molecular basis of the original finding.

      In its essence, this study mostly rediscovers and reaffirms previously reported findings, but with different tools. While the mapping of epigenetic changes in the IFNg and T-bet gene loci and the STAT1 gene signature in CD4SP cells are interesting, these are expected results, and they only reaffirm what would be assumed from the literature. Thus, there is only incremental gain in new insights and information on the role of DC-derived IL-27 in driving the Th1 phenotype of CD4SP cells in CD11c-p28 floxed mice.

      Thank you for your valuable comments and suggestions. We appreciate your input and have carefully reviewed the concerns raised regarding the premise and novelty of our study.

      Indeed, the current study builds upon the foundational work of Zhang et al. (PMID: 23175475), which first described the proinflammatory IFN-γ<sup>+</sup> phenotype of CD4 T cells in CD11c-p28 floxed mice mediated by thymic dendritic cells. We have cited this study multiple times in our manuscript to acknowledge its significance. Our goal was to expand on this original finding by exploring the functional bias of newly generated CD4<sup>+</sup> T cells, elucidating the mechanisms underlying the hyper-Th1 phenotype in the absence of thymic DC-derived IL-27, and exploring its relevance in the pathogenesis of autoimmunity.

      Our study revisits this phenomenon with a focus on the molecular and epigenetic changes that drive the Th1 bias in CD4SP cells. We demonstrated that the deletion of p28 in thymic dendritic cells leads to an unexpected hyperactivation of STAT1, which is associated with epigenetic modifications that favor Th1 differentiation. These findings provide a deeper understanding of the molecular basis behind the original observation of the Th1-skewed phenotype in CD11c-p28 floxed mice.

      However, as you pointed out, there is still a gap in understanding the precise link between p28 deficiency and STAT1 activation. We acknowledge that our study primarily reaffirms previously reported findings with different tools and approaches. While the mapping of epigenetic changes in the IFN-γ and T-bet gene loci and the STAT1 gene signature in CD4SP cells are interesting, they are indeed expected results based on the existing literature. This limits the novelty and incremental gain in new insights provided by our study.

      To address this gap and enhance the novelty of our findings, we plan to conduct further investigations to elucidate the detailed mechanisms connecting p28 deficiency to STAT1 hyperactivation. We will explore potential compensatory pathways or alternative signaling mechanisms that may contribute to the observed epigenetic changes and Th1 bias. Additionally, we will consider the broader impact of IL-27 deficiency on the thymic environment and its downstream effects on CD4<sup>+</sup> T cell differentiation.

      We appreciate your feedback and will work to strengthen the mechanistic underpinnings of our study. We believe that these additional efforts will provide a more comprehensive understanding of the role of DC-derived IL-27 in shaping the Th1 phenotype of CD4SP cells and contribute meaningful insights to the field.

      Altogether, the major issues of this study remain unresolved:

      (1) It is still unclear why the p28-deficiency in thymic dendritic cells would result in increased STAT1 activation in CD4SP cells. Based on their in vitro experiments with blocking anti-IFNg antibodies, the authors conclude that it is unlikely that the constitutive activation of STAT1 would be a secondary effect due to autocrine IFNg production by CD4SP cells. However, this possibility should be further tested with in vivo models, such as Ifng-deficient CD11c-p28 floxed mice. Alternatively, is this an indirect effect by other IFNg producers in the thymus, such as iNKT cells? It is necessary to explain what drives the STAT1 activation in CD11c-p28 floxed CD4SP cells in the first place.

      Thank you for your insightful suggestions. We appreciate your feedback and are committed to addressing the critical questions raised regarding the mechanisms underlying STAT1 activation in CD4SP cells in the context of p28 deficiency in thymic dendritic cells.

      To further investigate the potential autocrine loop for IFN-γ production, we will conduct in vivo studies using Cd4-Ifng and CD11c-p28 double knockout mice. This model will allow us to directly test whether IFN-γ produced by CD4SP cells themselves contributes to the observed STAT1 activation. Additionally, this approach will help exclude the possibility of indirect effects from other IFN-γ-producing cells in the thymus, such as invariant natural killer T (iNKT) cells, as suggested by the reviewer.

      As you correctly pointed out, a key unanswered question is what drives the initial STAT1 activation in CD4SP cells of CD11c-p28 floxed mice. Our current hypothesis is that p28 deficiency enhances the responsiveness of developing thymocytes to STAT1-activating cytokines. This hypothesis is supported by several lines of evidence:

      (1) Functional Antagonism: Recent studies have shown that IL-27p28 can act as an antagonist of gp130-mediated signaling. This suggests that in the absence of p28, the inhibitory effect of IL-27p28 on downstream signaling may be lost, leading to increased sensitivity to other cytokines that activate STAT1.

      (2) Structural Insights: Structural studies have demonstrated that IL-27p28 is centrally positioned within the complex formed with EBI3 and the two receptor subunits IL-27Rα and gp130. This positioning implies that p28 deficiency could disrupt the balance of cytokine signaling pathways involving these components.

      (3) Phenotypic Similarity: We have observed a similar hyper-Th1 phenotype in mice lacking either p28 or IL-27ra. This similarity suggests that the absence of p28 may lead to increased availability of gp130 for signaling by other cytokines, thereby enhancing STAT1 activation.

      Based on these considerations, we hypothesize that the deficiency of p28 results in a greater availability of gp130 to transduce signals from other cytokines, ultimately leading to enhanced STAT1 activation in CD4SP cells. To identify the specific cytokine(s) responsible for this effect, we will focus on gp130-related cytokines, as outlined in our response to Reviewer 1. This will involve reanalysis of single-cell RNA sequencing data and further experimental validation to pinpoint the candidate cytokines driving the observed STAT1 hyperactivation.

      We are confident that these additional studies will provide a clearer understanding of the mechanisms linking p28 deficiency in thymic dendritic cells to increased STAT1 activation in CD4SP cells. We appreciate your guidance and look forward to sharing our findings.

      (2) It is also unclear whether CD4SP cells are the direct targets of IL-27 p28. The cell-intrinsic effects of IL-27 p28 signaling in CD4SP cells should be assessed and demonstrated, ideally by CD4SP-specific deletion of IL-27Ra, or by establishing bone marrow chimeras of IL-27Ra germline KO mice.

      Thanks for the suggestions. Further studies will be performed to test whether developing thymocytes are the direct targets of IL-27 using Cd4-IL-27ra knockout mice or mixed bone marrow chimeras of wildtype and IL-27ra knockout cells. Unfortunately, we have encountered challenges with the mouse breeding and anticipate that it will take approximately six months to obtain the appropriate genotype necessary to complete these experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Is the hyper-STAT1 response seen in T cells from Cd11c-p28-flox mice due to increased availability of, and/or increased responsiveness to, STAT1-activating cytokines? Studies in which SP, RTE, and Tn cells are pulsed ex vivo with IL-27 and/or other STAT1-activating cytokines would address the latter (with STAT1 phosphorylation as the major readout). Given the ability of IL-27 to activate STAT3, this pathway should also be addressed. It would be of interest if STAT1 signaling is selectively impaired, as suggested by the work of Twohig et al. (doi: 10.1038/s41590-019-0350-0), which should be cited and discussed.

      Thank you for your insightful suggestions. We appreciate your input and are committed to addressing the critical questions raised regarding the mechanisms underlying the hyper-activation of STAT1 in T cells from Cd11c-p28-flox mice.

      The detailed mechanisms driving the hyper-activation of STAT1 remain to be fully elucidated. Recent studies have shown that IL-27p28 can act as an antagonist of gp130-mediated signaling. Structural analyses have also demonstrated that IL-27p28 interacts with EBI3 and the two receptor subunits IL-27Rα and gp130. Considering these findings and the similar phenotypes observed in p28 and IL-27ra deficient mice, we speculate that the deficiency of either p28 or IL-27ra may increase the availability of gp130 for signaling by other cytokines. This could potentially enhance the responsiveness of developing thymocytes to STAT1-activating cytokines. We will focus our future research on gp130-related cytokines to identify the candidate(s) responsible for the enhanced STAT1 activation in the absence of p28. Alternatively, the release of EBI3 in the absence of p28 may facilitate its coupling with other cytokine subunits. IL-35, which is composed of EBI3 and p35, is of particular interest given the involvement of IL-27Rα in its signaling pathway.

      To narrow down the candidate cytokines, we reanalyzed single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z). Our analysis revealed that thymic dendritic cells (DCs) were categorized into two distinct clusters, with both Il12a (p35, which forms IL-35 with EBI3) and Clcf1 (CLCF1) being upregulated in CD11c-cre p28<sup>f/f</sup> mice. In CD4 single-positive (SP) thymocytes, the expression levels of gp130 and IL-12Rβ2 (the receptor for IL-35) were comparable between knockout and wild-type mice. However, the mRNA levels of Lifr and Cntfr were low in CD4 SP thymocytes.

      Single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> (KO) and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z).

      We plan to assess the protein levels of IL-35 and CLCF1 in dendritic cells, as well as their respective receptors, and to evaluate their effects on STAT1 phosphorylation in CD4<sup>+</sup> thymocytes from both wild-type and p28-deficient mice. Unfortunately, we have encountered challenges with the mouse crosses and anticipate that it will take approximately six months to obtain the appropriate genotype necessary to complete these experiments.

      As you correctly noted, the ability of IL-27 to activate STAT3 signaling is an important consideration. We have carefully examined this pathway in our current study, and our results indicate that neither the total nor the phosphorylated levels of STAT3 and STAT4 were altered by IL-27p28 ablation (Figure 5B). This suggests that the impact is indeed specific to the STAT1 axis. We will also consider the possibility of selective impairment of STAT1 signaling, as suggested by the work of Twohig et al. (doi: 10.1038/s41590-019-0350-0), which we will cite and discuss in our revised manuscript.

      We appreciate your guidance and will work diligently to address these questions in our future studies. We look forward to sharing our findings and contributing to a deeper understanding of the role of IL-27 in the regulation of STAT1 activation in T cells.

      (2) It may be that the hyper-Th1 phenotype is not due to cell-intrinsic differences in STAT1 signaling (see Major Point 1) but rather to hyper-responsiveness to TCR + co-stimulation (as provided in the re-stim assays used throughout). This issue is particularly relevant for the ChIP studies, where the authors note that "...we chose to treat the cells with anti-CD3 and anti-CD28 for 3 days prior to the assay". Why not treat these cells ex vivo with STAT1-activating cytokines instead of anti-CD3/CD28? The current methodology makes it impossible to distinguish between enhanced TCR/CD28 and cytokine signaling, and ultimately does not address SP, RTE, and Tn cells (since they are now activated blasts).

      Thank you for raising this important point. We appreciate your feedback and fully recognize the limitations of our current methodology, which uses anti-CD3/CD28 stimulation for ChIP experiments. This approach indeed complicates the distinction between enhanced TCR/CD28 signaling and cytokine-mediated STAT1 activation, particularly in the context of SP, RTE, and Tn cells, which become activated blasts under these conditions.

      To address these concerns and provide more precise insights into the mechanisms underlying the hyper-Th1 phenotype, we are revising our experimental strategy. Specifically, we are shifting our focus to directly investigate the role of STAT1-activating cytokines in the absence of p28. Based on our previous analysis and re-evaluation of single-cell RNA sequencing data, we have identified IL-35 and CLCF1 as the most promising candidate cytokines.

      We are now planning to perform ChIP experiments using these cytokines directly, rather than relying on TCR + co-stimulation. This approach will allow us to more accurately evaluate the impact of these cytokines on STAT1 signaling in CD4<sup>+</sup> T cells. By treating cells ex vivo with IL-35 and CLCF1, we aim to elucidate whether the observed hyper-Th1 phenotype is driven by enhanced responsiveness to these cytokines, independent of TCR/CD28 signaling.

      We regret to inform you that we have encountered unforeseen challenges with mouse crosses, which have delayed our progress. As a result, we anticipate a delay of approximately six months to obtain the appropriate genotypes necessary to complete these experiments. We understand the importance of these revisions and are committed to overcoming these challenges to provide a more robust and accurate analysis.

      (3) Studies involving STAT1-deficient mice are necessary (ideally with STAT1 deficiency restricted to the T cell compartment). At a minimum, it must be confirmed that these phenocopy Cd11c-p28-flox mice in terms of SP, RTE, and Tn cells (and their Th1-like character). If a similar hyper-Th1 phenotype is not seen, then the attendant hyper STAT1 response can only be viewed as a red herring.

      Thank you for raising this important consideration. We acknowledge the significance of addressing the role of STAT1 specifically within the T cell compartment to validate the mechanisms underlying the hyper-Th1 phenotype observed in Cd11c-p28-flox mice.

      We agree that studies involving STAT1-deficient mice, particularly with STAT1 deficiency restricted to the T cell compartment, are essential to confirm whether the hyper-Th1 phenotype is directly driven by STAT1 hyperactivation in T cells. Ideally, such studies would help determine if STAT1 deficiency in T cells phenocopies the Cd11c-p28-flox mice, particularly in terms of the SP, RTE, and Tn cells and their Th1-like characteristics.

      Unfortunately, we currently face challenges in obtaining and breeding the appropriate STAT1 conditional knockout mice with T cell-specific deletion. This has limited our ability to conduct these experiments in a timely manner. However, we recognize the importance of these studies and are actively working to secure the necessary resources and models to address this critical question.

      We understand that without these experiments, any conclusions drawn about the role of STAT1 hyperactivation in driving the hyper-Th1 phenotype must be considered with caution. If a similar hyper-Th1 phenotype is not observed in STAT1-deficient T cells, then the hyper-STAT1 response may indeed be a secondary or compensatory effect rather than a primary driver.

      We are committed to pursuing these studies and will prioritize them in our future work. We will keep you informed of our progress and will update the manuscript with the results of these experiments once completed. We appreciate your patience and understanding as we work to address this important aspect of our research.

      (4) The authors mine their RNA-seq data using a STAT1 geneset sourced from studies involving IL-21 as the upstream stimulus. Why was this geneset chosen? It is true that IL-21 can activate STAT1, but STAT3 is typically viewed as its principal signaling pathway. There are many more appropriate genesets, especially from studies where T cells are cultured with traditional STAT1 stimuli (e.g. IL-27 in Hirahara et al., Immunity 2015, or interferons in Iwata et al., Immunity 2017; doi: 10.1016/j.immuni.2015.04.014, 10.1016/j.immuni.2017.05.005).

      Thank you for your insightful comments. We appreciate your attention to the choice of the STAT1 gene set in our RNA-seq analysis.

      Initially, we selected the STAT1 gene set from a study involving IL-21 stimulation (GSE63204) because IL-21 is known to activate STAT1, despite STAT3 being its principal signaling pathway. However, we acknowledge that this choice may not have been optimal given the context of our study, which focuses on the role of IL-27 and its impact on STAT1 signaling in T cells.

      We agree that gene sets derived from studies using more canonical STAT1 stimuli, such as IL-27 or interferons, would be more relevant for our analysis. In response to your suggestion, we have revised our approach and adopted a gene set from GSE65621, which compares STAT1-/- and wild-type CD4 T cells following IL-27 stimulation. This gene set is more aligned with the focus of our study and provides a more appropriate reference for identifying STAT1-activated genes.

      Our re-analysis revealed that 270 genes (FPKM > 1, log2FC > 2) were downregulated in STAT1-/- cells compared to wild-type cells, which we defined as STAT1-activated genes. Notably, approximately 40% of the upregulated differentially expressed genes (55 out of 137) in our dataset fell into the category of STAT1-activated genes, while none were classified as STAT1-suppressed genes (Figure 4B). Furthermore, Gene Set Enrichment Analysis (GSEA) demonstrated significant enrichment of STAT1-activated genes in the transcriptome of CD4 SP thymocytes from the knockout mice (NES = 1.67, nominal p-value = 10<sup>-16</sup>, Figure 4D).
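
      The gene-set definition and overlap described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the row layout, and the toy gene table are hypothetical, and only the thresholds (FPKM > 1, log2FC > 2) and the counts 55 and 137 come from the response.

```python
# Minimal sketch (assumed helper names and toy data, not the authors' code).

def select_activated(rows, min_fpkm=1.0, min_log2fc=2.0):
    """Return genes passing both thresholds.

    rows: iterable of (gene, wild_type_fpkm, log2fc), where log2fc is the
    wild-type over STAT1-/- log2 fold change (assumed layout).
    """
    return {gene for gene, fpkm, log2fc in rows
            if fpkm > min_fpkm and log2fc > min_log2fc}


def overlap_fraction(up_degs, reference_set):
    """Fraction of upregulated DEGs that fall inside the reference gene set."""
    up = set(up_degs)
    if not up:
        raise ValueError("up_degs must not be empty")
    return len(up & reference_set) / len(up)


# Hypothetical toy table: only GeneA and GeneD pass both thresholds.
toy_rows = [
    ("GeneA", 5.0, 3.1),
    ("GeneB", 0.4, 4.0),   # fails the FPKM filter
    ("GeneC", 12.0, 1.2),  # fails the log2FC filter
    ("GeneD", 2.5, 2.6),
]
activated = select_activated(toy_rows)
toy_fraction = overlap_fraction(["GeneA", "GeneC", "GeneD", "GeneE"], activated)

# Same ratio applied to the counts quoted in the response (55 of 137).
reported_fraction = 55 / 137
```

      A set intersection is enough here because the comparison is a simple membership test; an enrichment statistic such as GSEA additionally needs a ranked gene list and a background, which this sketch does not attempt.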

      These findings support our conclusion that IL-27p28 deficiency leads to enhanced STAT1 activity in CD4 SP thymocytes. We believe that using a more relevant gene set has strengthened our analysis and provided clearer insights into the molecular mechanisms underlying the observed phenotype.

      We have cited the relevant studies (Hirahara et al., Immunity 2015; Iwata et al., Immunity 2017) to provide context for our revised analysis and to acknowledge the importance of canonical STAT1 stimuli in T cell signaling. We appreciate your guidance and are confident that these revisions have improved the robustness and relevance of our findings.

      (5) Given the ability of IL-27 to activate STAT1 in T cells, it is surprising that SP, RTE, and Tn cells from Cd11c-p28-flox mice exhibit more STAT1 signaling than WT controls. If not IL-27, then what is the stimulus for this STAT1 activity? The authors rule out autocrine IFN-g production in vitro (not in vivo) but provide no further insight.

      Thank you for raising this important question. We appreciate your interest in understanding the source of enhanced STAT1 signaling in SP, RTE, and Tn cells from Cd11c-p28-flox mice, especially given the role of IL-27 in activating STAT1 in T cells. As previously discussed, we have identified IL-35 and CLCF1 as the most likely candidate cytokines driving the observed STAT1 activity in the absence of p28. These cytokines are of particular interest due to their potential to activate STAT1 and their relevance in the context of our study.

      To address the question of what drives the enhanced STAT1 signaling, we are planning to perform ChIP experiments using these cytokines directly. This approach will allow us to evaluate their impact on STAT1 signaling more precisely, without relying on TCR + co-stimulation. By treating cells ex vivo with IL-35 and CLCF1, we aim to determine whether these cytokines are responsible for the increased STAT1 activity observed in Cd11c-p28-flox mice.

      We acknowledge that ruling out autocrine IFN-γ production in vitro, as we have done, does not fully address the potential role of IFN-γ in vivo. Therefore, we are also considering additional in vivo experiments to further investigate this possibility. These studies will help us determine whether other sources of IFN-γ or other cytokines contribute to the observed STAT1 hyperactivation. Unfortunately, due to unforeseen challenges with mouse crosses, we anticipate a delay of approximately six months to obtain the appropriate genotypes necessary for these experiments. We are actively working to resolve these challenges and will update the manuscript with the results of these experiments upon completion.

      (6) The RNAseq data affirm that SP, RTE, and Tn cells from Cd11c-p28-flox mice exhibit more STAT1 signaling than WT controls. However, this does little to explain the attendant hyper-Th1 phenotype. Is there evidence that epigenetic machinery is deregulated (to account for changes in DNA/histone methylation)? Were IFN-g and Tbet among these few observed DEG? If so, then this should be highlighted. If not, then the authors must address why not. Are there clues as to why STAT1 signaling is exaggerated? Also, the hyper-STAT1 effect should be better described using more rigorous STAT1- and interferon-signature genesets (see the work of Virginia Pascual, Anne O'Garra).

      Thank you for your valuable feedback and suggestions. We appreciate your interest in understanding the mechanisms underlying the hyper-Th1 phenotype observed in Cd11c-p28-flox mice. Below, we address each of your points in detail:

      (1) Epigenetic Regulation:

      We have conducted a thorough analysis of the global levels of key histone modifications, including H3K4me3, H3K9me3, and H3K27me3, as well as the mRNA expression of the enzymes responsible for catalyzing these marks. Our results indicate that there are no significant differences in these histone modifications or the expression of the associated enzymes between Cd11c-p28<sup>f/f</sup> and wildtype mice (Figure 3-figure supplement 1A-C). This suggests that the enhanced STAT1 signaling is not a consequence of broad epigenetic deregulation. Instead, we hypothesize that the observed changes may be driven by more specific molecular mechanisms, such as cytokine signaling pathways.

      (2) IFN-γ and Tbx21 Expression:

      Regarding the expression of Th1-associated genes, our analysis revealed a modest induction of Ifng and Tbx21 (encoding T-bet) in the CD4SP population following TCR stimulation. However, the baseline expression levels of these genes were quite low in freshly isolated CD4SP cells. Specifically, Ifng was undetectable, and Tbx21 had an FPKM of 0.29 in wildtype mice compared to 1.05 in Cd11c-p28<sup>f/f</sup> mice. While these findings indicate some upregulation of Th1-associated genes, the overall expression levels remain relatively low, suggesting that additional factors may contribute to the hyper-Th1 phenotype.

      (3) STAT1 Signature Genesets:

      We have revised our analysis to incorporate more rigorous STAT1 and interferon-signature genesets, as suggested. We have adopted gene sets from well-established studies, including those by Virginia Pascual and Anne O'Garra, to provide a more comprehensive and accurate assessment of STAT1 signaling. This approach has enhanced our ability to identify and characterize the genes involved in the STAT1 pathway, providing clearer insights into the exaggerated STAT1 signaling observed in our model.
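
      For orientation, the baseline Tbx21 FPKM values quoted in point (2) above (0.29 in wildtype versus 1.05 in knockout) can be turned into a fold change with a few lines. This is a hedged sketch: the helper name is hypothetical, and no pseudocount is applied, so it does not cover an undetectable transcript such as Ifng.

```python
import math


def log2_fold_change(ko_fpkm, wt_fpkm):
    """log2(KO / WT); both values must be positive (no pseudocount is added)."""
    if ko_fpkm <= 0 or wt_fpkm <= 0:
        raise ValueError("FPKM values must be positive for a plain ratio")
    return math.log2(ko_fpkm / wt_fpkm)


# Values taken from the response: Tbx21 FPKM 1.05 (knockout) vs 0.29 (wildtype).
tbx21_fold = 1.05 / 0.29
tbx21_log2fc = log2_fold_change(1.05, 0.29)
```

      Ratios computed from FPKM values this close to zero are sensitive to noise, which is consistent with the authors' caution that overall expression remains low.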

      We appreciate your guidance and are committed to refining our analysis to provide a more detailed understanding of the mechanisms driving the hyper-Th1 phenotype in Cd11c-p28-flox mice. We will continue to explore the potential roles of cytokines such as IL-35 and CLCF1, as well as other factors that may contribute to the observed changes in STAT1 signaling and Th1 differentiation. We look forward to sharing our updated findings and further discussing these mechanisms in our revised manuscript.

      (7) Is the hyper-Th1 phenotype of SP, RTE, and Tn cells from Cd11c-p28-flox mice unique to the CD4 compartment? Are developing CD8<sup>+</sup> cells similarly prone to increased STAT1 signaling and IFN-g production?

      Thank you for raising this important point. Our data indeed suggest that the hyper-Th1 phenotype observed in SP, RTE, and Tn cells from Cd11c-p28<sup>f/f</sup> mice is unique to the CD4<sup>+</sup> T cell compartment. Specifically, we found that while CD4<sup>+</sup> SP cells from Cd11c-p28<sup>f/f</sup> mice exhibited a significant upregulation in IL-27 receptor expression (both IL27Ra and gp130) compared to wild-type (WT) mice, CD8<sup>+</sup> SP cells from the same genotype showed markedly lower expression of these receptor subunits (Figure 1C in Sci Rep. 2016 Jul 29;6:30448. DOI: 10.1038/srep30448). This finding is further supported by our observation that the phosphorylation levels of STAT1, STAT3, and STAT4, downstream targets of IL-27 signaling, were comparable between CD8 SP cells from Cd11c-p28<sup>f/f</sup> and WT mice (Author response image 2). Additionally, we observed no significant difference in IFN-γ and granzyme B production between naïve CD8 T cells isolated from the lymph nodes of the two genotypes (Author response image 2). Taken together, these results suggest that the enhanced Th1 differentiation and IFN-γ production seen in the CD4<sup>+</sup> T cell population from Cd11c-p28<sup>f/f</sup> mice is not recapitulated in the CD8<sup>+</sup> T cell lineage.

      Author response image 2.

      (A) Intracellular staining was performed with freshly isolated thymocytes from Cd11c-p28<sup>f/f</sup> mice and WT littermates using antibodies against phosphorylated STAT1 (Y701), STAT3 (Y705), and STAT4 (Y693). The mean fluorescence intensity (MFI) for CD8 SP cells from three independent experiments is shown (mean ± SD, n=3). (B) CD8<sup>+</sup> naive T cells were cultured under Th0 conditions for 3 days. The frequency of IFN-γ- and granzyme B-producing CD8<sup>+</sup> T cells was determined by intracellular staining. Representative dot plots (left) and quantification (right, mean ± SD, n=6).

      Minor points and questions

      (1) Line 84 - Villarino et al. and Pflanz et al. are mis-referenced. Neither involves Trypanosome studies. The former is on Toxoplasma infection and, thus, should be properly referenced in the following sentence.

      Thank you for pointing out this error. You are correct that the references to Villarino et al. and Pflanz et al. were misapplied in the context of Trypanosome studies. Villarino et al. focuses on Toxoplasma infection, and we appreciate your guidance to ensure accurate citation. We will correct this in the manuscript and properly cite the studies in their appropriate contexts. Thank you for your vigilance in maintaining the accuracy of our references.

      (2) T-bet protein should also be measured by cytometry

      We sincerely thank the reviewer for the valuable suggestion regarding the measurement of T-bet protein levels. In response to this comment, we have performed additional experiments to quantify T-bet protein expression using flow cytometry. The results of these analyses have been incorporated into the revised manuscript as Figure 1F.

      Reviewer #2 (Recommendations For The Authors):

      (1) When new mouse strains are generated in this study, there is no comment on whether there are any changes in the frequency or cell number of CD4 T cells. For instance, in Aire-deficient CD11c-p28 floxed mice, it should be noted whether CD4SP, naïve CD4, and CD4 RTE are all the same in frequency and number compared to their littermate controls. Also, is there any effect on the generation of these thymocytes?

      We sincerely thank the reviewer for raising this important point regarding the potential changes in the frequency and cell numbers of CD4<sup>+</sup> T cells in the newly generated mouse strains. In response to the reviewer’s question, we would like to clarify the following:

      (1) Impact of Aire deficiency on CD4<sup>+</sup> T Cells:

      As previously reported by us and others (Aging Dis. 2019, doi: 10.14336/AD.2018.0608; Science. 2002, doi: 10.1126/science.1075958), Aire deficiency does not significantly alter the overall number or frequency of CD4 single-positive (CD4SP) thymocytes, recent thymic emigrants (RTEs), or naïve CD4<sup>+</sup> T cells. However, it profoundly affects their composition and functional properties, leading to the escape of autoreactive T cells and subsequent autoimmune manifestations.

      (2) Observations in Cd11c-p28<sup>f/f</sup>Aire<sup>-/-</sup> mice:

      In our study, we observed that the number and frequency of CD4<sup>+</sup> T cells in the spleen and lymph nodes were comparable among Cd11c-p28<sup>f/f</sup>, Aire<sup>-/-</sup>, and Cd11c-p28<sup>f/f</sup>Aire<sup>-/-</sup> mice and WT controls. This suggests that the genetic modifications did not significantly impact the overall development or peripheral maintenance of CD4<sup>+</sup> T cells.

      Author response image 3.

      (3) Challenges in assessing RTEs in double knockout mice:

      To accurately assess RTEs in the double knockout mice, it would be necessary to cross these mice with Rag-GFP reporter mice, which specifically label RTEs. However, breeding the appropriate mouse strain for this analysis would require additional time and resources, which were beyond the scope of the current study.

      (2) There are a couple of typos throughout the manuscript. For example, line 91: IL-27Rα or line 313: phenotype.

      We apologize for the typographical errors. We have carefully reviewed the entire manuscript and corrected all identified mistakes, including those on line 91 (IL-27Rα) and line 305 (phenotype).

      (3) The authors should show each data point on their bar graphs.

      Thank you for the suggestion. We now show each data point on the bar graphs in the revised manuscript.

      (4) It should be noted from which organs the RTE and the naïve T cells were harvested.

      Thank you for the constructive suggestion. We isolated CD4<sup>+</sup> RTEs and mature naive CD4<sup>+</sup> T cells by sorting GFP<sup>+</sup>CD4<sup>+</sup>CD8<sup>-</sup>CD<sup>-</sup>NK1.1<sup>-</sup> cells (RTEs) and GFP<sup>-</sup>CD4<sup>+</sup>CD8<sup>-</sup>CD<sup>-</sup>CD44<sup>lo</sup> cells (naive T cells) from lymph nodes. This detail has been added to the manuscript on line 475.

    1. eLife Assessment

      This paper presents an important theoretical exploration of how a flexible protein domain with multiple DNA binding sites may simultaneously provide stability to the DNA-bound state and enable exploration of the DNA strand. The authors present compelling evidence that their findings have implications for the way intrinsically disordered regions (IDRs) of transcription factor (TF) proteins can enhance their ability to efficiently find their binding site on the DNA, from which they exert control over the transcription of their target gene. The paper concludes with a comparison of model predictions with experimental data, which gives further support to the proposed model.

    2. Reviewer #1 (Public review):

      Summary:

      The authors derive, from first principles, the rules that should guide the optimisation of transcription factors with intrinsically disordered regions (IDRs). The first part of the study defines the following principles for optimising binding affinity to the genome in the receiving region, called the "antenna": (i) reduce the distance on the genome between the target and the IDR binding sites; (ii) optimise the distance between the DNA-binding domain and the binding sites on the IDR to be as close as possible to the distance between their binding sites on the genome; (iii) match the number of IDR binding sites to the number of their targets, and tune this number to the binding strength, reducing it as the strength increases; (iv) keep the binding strength above a threshold that depends on the proportion of IDR binding sites in the antenna. The second part defines how the search time scales as a function of key parameters, such as the volume of the nucleus and the size of the antenna, derived as a combination of a 3D search for the antenna and 1D "octopusing" along the antenna. The third part focuses on validation: the current results are compared to binding probability data from a single experiment, and new experiments are proposed to further validate the model and to test designed transcription factors.

      Strengths:

      The strength of this work is that it provides simple, interpretable, and testable theoretical conclusions. This will allow the derived design principles to be understood, evaluated, and improved in the future. The theoretical derivations are rigorous. The authors provide a comparison to experiments and also propose new experiments to be performed in the future. This is of great value, since it will set the stage for and inspire new experimental techniques. Further, the field needs inspiration and motivation to develop these techniques, since they are required to benchmark the transcription factors designed with the methods presented in this paper, as well as to develop novel data-based or in vivo methods that would greatly benefit the field. As such, this paper is a fundamental contribution to the field.

      Weaknesses:

      The model assumption that the interaction between the transcription factor and the DNA outside of the antenna region is negligible is probably too strong for many or most transcription factors, particularly in organisms with a longer genome than yeast. The model presents many first principles to drive the design of transcription factors, but arguably other principles and mechanisms might also play a role by benefiting the search and binding process. Specifically: (i) a role of the IDR in complex formation and cooperativity between multiple transcription factors, (ii) the ability of the IDR to search in parallel based on multiple DNA binding sites spaced by disordered regions, (iii) affinity of the IDR for specific compartments in the nucleus, reducing the search time, etc. The paper would be improved by a discussion of alternative mechanisms.

    3. Reviewer #2 (Public review):

      Summary:

      This is an interesting theoretical exploration of how a flexible protein domain, which has multiple DNA-binding sites along it, affects the stability of the protein-DNA complex. It proposes a mechanism ("octopusing") by which a protein performs a random walk while bound to DNA, which simultaneously enables exploration of the DNA strand and stability of the bound state.

      Strengths:

      Stability of the protein-DNA bound state and the ability of the protein to perform 1D diffusion along the DNA are two properties of a transcription factor that are usually seen as being in opposition to each other. The octopusing mechanism is an elegant resolution of the puzzle of how both could be accommodated. This mechanism has interesting biological implications for the functional role of intrinsically disordered domains in transcription factor (TF) proteins. The authors show theoretically how these domains, if flexible and able to make multiple weak contacts with the DNA, can enhance the ability of TFs to efficiently find their binding site on the DNA, from which they exert control over the transcription of their target gene. The paper concludes with a comparison of model predictions with experimental data, which gives further support to the proposed model. Overall, this is an interesting and well-executed theoretical paper that proposes an interesting idea about the functional role of IDR domains in TFs.

      Weaknesses:

      IDR domains are assumed to be flexible, which I believe is not always the case. Also, I'm not sure how ubiquitous the assumed DNA binding sites for multiple subdomains along the IDR are. These assumptions nevertheless seem like interesting points of departure for further experiments.

    1. eLife Assessment

      This manuscript applies state-of-the-art techniques to define the cellular composition of the dorsal vagal complex in two rodent species (mice and rats). The result is an important resource that substantially advances our understanding of the dorsal vagal complex's role in the regulation of feeding and metabolism while also highlighting key differences between species. While most of the analyses in the manuscript provide convincing insight into the cellular architecture of the dorsal vagal complex, other aspects are incomplete and could be bolstered by additional evidence.

    2. Reviewer #1 (Public review):

      Summary:

      This paper uses state-of-the-art techniques to define the cellular composition and its complexity in two rodent species (mice and rats). The study is built on available datasets but extends those in a way that future research will be facilitated. The study will be of high impact for the study of metabolic control.

      Strengths:

      (1) The study is based on experiments that are combined with two exceptional data sets to provide compelling evidence for the cellular composition of the DVC.

      (2) The use of two rodent species is very useful.

      Weaknesses:

      There is no conceptual weakness, the performance of experiments is state-of-the-art, and the discussion of results is appropriate. One minor point that would further strengthen the data is a more distinct analysis of receptors that are characteristic of the different populations of neuronal and non-neuronal cells; this part could be improved. Currently, it is only briefly mentioned, e.g., line 585ff. See also lines 603ff; it is true that the previous studies lack some information about the neurotransmitter profile of cells, but combining all data sets should result in an analysis of the receptors as well, e.g. in the form of an easy-to-read table.

    3. Reviewer #2 (Public review):

      In this manuscript, Hes et al. present a comprehensive multi-species atlas of the dorsal vagal complex (DVC) using single-nucleus RNA sequencing, identifying over 180,000 cells and 123 cell types across five levels of granularity in mice and rats. Intriguingly, the analysis uncovered previously uncharacterized cell populations, including Kcnj3-expressing astrocytes, neurons co-expressing Th and Cck, and a population of leptin receptor-expressing neurons in the rat area postrema, which also express the progenitor marker Pdgfra. These findings suggest species-specific differences in appetite regulation. This study provides a valuable resource for investigating the intricate cellular landscape of the DVC and its role in metabolic control, with potential implications for refining obesity treatments targeting this hindbrain region.

      In line with previous work published by the PI, the topic is of clear scientific relevance, and the data presented in this manuscript are both novel and compelling. Additionally, the manuscript is well-structured, and the conclusions are robust and supported by the data. Overall, this study significantly enhances our understanding of the DVC and sheds light on key differences between rats and mice.

      I applaud the authors for the depth of their analysis. However, I have a few major concerns, comments, and suggestions that should be addressed.

      (1) If I understand the methodology correctly, mice were fasted overnight and then re-fed for 2 hours before being sacrificed (lines 91-92), which occurred 4 hours after the onset of the light phase (line 111). This means that the re-fed animals had access to, and consequently consumed, food at a time when they typically would not eat. While I completely recognize that every timepoint has its limitations, the strong influence of the circadian rhythm on DVC gene expression (highlighted by the work published by Lukasz Chrobok), and the fact that the timing of food/eating is a potent Zeitgeber, might have an impact on the analysis and should be mentioned as a potential limitation in the discussion (along with citing Dr Chrobok's work). Could this (i.e., eating during a time when the animals are not "primed by their own circadian clock to eat") potentially explain why the meal-related changes in gene expression were relatively small?

      (2) In the Materials and Methods section, LiCl is mentioned as one of the treatment conditions; however, very little corresponding data are presented or discussed. Please include these results and elaborate on the rationale for selecting LiCl over other anorectic compounds.

      (3) The number of animals used differs significantly between species, which the authors acknowledge as a limitation in the discussion. Since the authors took advantage of previously published mouse data sets (Ludwig and Dowsett data sets), I wonder if they could also compare/integrate any currently available rat data set to partially address the sample size disparity.

      (4) Dividing cells in AP vs NTS vs DMX clusters and analyzing potential species differences would significantly enhance the quality of the manuscript, given the partially diverse functions of these regions. This could be done by leveraging existing published datasets that employed spatial transcriptomics or more classical methodologies (e.g., PMID: 39171288, PMID: 39629676, PMID: 38092916). I would be interested to hear the authors' perspective on the feasibility of such an analysis.

      (5) Given the manuscript's focus on feeding and metabolism, I believe a more detailed description and comparison of the transcription profile of known receptors, neurotransmitters, and neuropeptides involved in food intake and energy homeostasis between mice and rats would add value. Adding a curated list of key genes related to feeding regulation would be particularly informative.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript from Hes et al. provides a compelling resource for single-nucleus RNA sequencing data, with an emphasis on facilitating the integration of future data sets across mouse and rat data sets.

      Strengths:

      There are also several interesting findings that are highlighted, even though without a functional assay the importance remains unclear. However, the manuscript properly addresses where conclusions are speculative.

      As with other snRNA seq datasets the manuscript demonstrates convincingly an increased level of complexity, while other neuronal populations like Cck and Th neurons were reproduced. Several recent findings from other groups are well addressed and put into a new context, e.g., DMV expression of AgRP (and Hcrt) was found to result from non-coding sequences, co-localization of Cck/Th was identified in a small subset. These statements are informative.

      The integration of rat data into the mouse data sets is excellent, and the comparison of cellular groups is very detailed, with interesting differences between mouse and rat data.

      All data sets are easily accessible and usable on open platforms, this will be an impactful resource for other researchers.

      Weaknesses:

      The data analysis seems incomplete. The title indicates the integration of mouse and rat data into a unified rodent dataset. But the discrepancy of animal numbers (30 mice vs. 2 rats) does not fit well with that focus.

      On the other hand, the mouse group is further separated into different treatments to study genetic changes that are associated with distinct energy states of fed/fasting/refeeding responses. Yet, this aspect is not addressed in depth.

      While the authors find transcriptional changes in all neuronal and non-neuronal cell types, which is interesting, the verification of known transcriptional changes (e.g., cFos) is unaddressed. cFos is a common gene upregulated with refeeding that was surprisingly not investigated, even though this should be a strong marker of proper meal-induced neuronal activation in the DMV. This is a missed opportunity either to verify the data set or to highlight important limitations if that had been attempted without success.

      Additional considerations:

      (1) The focus on transmitter classification is highlighted, but surprisingly, the well-accepted distinction of GABAergic neurons by Slc32a1 was not used; instead, Gad1 and Gad2 were used as GABAergic markers. While this may be proper for the DMV, given numerous findings that Gad1/2 are not proper markers for GABAergic neurons and are often co-expressed in glutamatergic populations, this confound should have been addressed to make a case for whether and why they would be proper markers in the DMV.

      (2) Figure S3 for anatomical localization of clusters is excellent, but several of the cluster gene names do not have a good signal in the DMV. Specifically, the mixed neurons that do not seem to have clear marker genes. What top markers (top 10?) were used to identify these anatomical locations? At least some examples should be shown for anatomical areas to support Figure S3.

      (3) Page 15, lines 410-411: "...could not find clusters sharing all markers with our neuronal classes...". Are the authors trying to say that the DMV has more diverse neurons than other brain sites? It seems not too unusual that the hypothalamus is different from the brainstem. Maybe this could be stated more clearly, and the importance of this could be clarified.

      (4) The finding of GIRK1 astrocytes is interesting, but the emphasis that this means these astrocytes are highly/more excitable is confusing. This was not experimentally addressed and should be put into context that astrocyte activation is very different from neuronal activation. This should be better clarified in the results and discussion.

      (5) The Pdgfra IHC as verification is great, but images are not very convincing in distinguishing the 2 (mouse) or 3 (rat) classes of cells. Why not compare Pdgfra and HuC/D co-localization by IHC and snRNAseq data (using the genes for HuC/D) in the mouse and in the rat? That would also clarify how specific HuC/D is for DMV neurons, or if it may also be expressed in non-neuronal populations.

    1. eLife Assessment

      This useful study presents computational analyses of over 5,000 predicted extant and ancestral nitrogenase structures. While the data and some analyses are solid, the study remains incomplete in demonstrating that the metrics used for comparing nitrogenase structures are statistically rigorous. The data generated in this study provide a vast resource that can serve as a starting point for functional studies of reconstructed and extant nitrogenases.

    2. Reviewer #1 (Public review):

      This was a clearly written manuscript that did an excellent job summarizing complex data. In this manuscript, Cuevas-Zuviría et al. use protein modeling to generate over 5,000 predicted structures of nitrogenase components, encompassing both extant and ancestral forms across different clades. The study highlights that key insertions define the various Nif groups. The authors also examined the structures of three ancestral nitrogenase variants that had been previously identified and experimentally tested. These ancestral forms were shown in earlier studies to exhibit reduced activity in Azotobacter vinelandii, a model diazotroph.

      This work provides a useful resource for studying nitrogenase evolution. However, its impact is somewhat limited due to a lack of evidence linking the observed structural differences to functional changes. For example, in the ancestral nitrogenase structures, only a small set of residues (lines 421-431) were identified as potentially affecting interactions between nitrogenase components. Why didn't the authors test whether reverting these residues to their extant counterparts could improve nitrogenase activity of the ancestral variants?

      Additionally, the paper feels somewhat disconnected. The predicted nitrogenase structures discussed in the first half of the manuscript were not well integrated with the findings from the ancestral structures. For instance, do the ancestral nitrogenase structures align with the predicted models? This comparison was never explicitly made and could have strengthened the study's conclusions.

    3. Reviewer #2 (Public review):

      This work aims to study the evolution of nitrogenases, understanding how their structure and function adapted to changes in the environment, including oxygen levels and changes in metal availability.

      The study predicts > 5000 structures of nitrogenases, corresponding to extant, ancestral, and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive undertaking that is certain to be a resource for the community. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes.

      The challenge with this study is that all (or nearly all) of the quantitative analyses presented are based on RMSD calculations, many of which are under 2 angstroms. For all intents and purposes, two structures with RMSD < 2 angstroms could be considered 'structurally identical'. A lot of insight generated is based on minuscule differences in RMSD, for which it is not clear that they are significantly different. The suggestion would be to find a way to evaluate the RMSD metric and determine whether these values, as obtained for structures being compared, are reliable. Some options are provided in earlier studies: PMID: 11514933, PMID: 17218333, PMID: 11420449, PMID: 8289285 (and others).

      It could also be valuable to focus more on site-specific RMSDs rather than Global RMSDs. The high conservation in the nitrogenases likely ensures that the global RMSDs will remain low across the family. Focusing on specific regions might reveal interesting differences between clades that are more informative regarding the evolution of structure in tandem with environment/time.
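
      To illustrate the contrast between global and site-specific RMSD raised here, the following is a minimal sketch on toy data. It assumes the two coordinate sets are already superimposed and matched atom by atom; the function name `rmsd`, the `indices` parameter, and the coordinates are hypothetical and are not taken from the manuscript under review.

```python
import math


def rmsd(coords_a, coords_b, indices=None):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superimposed. If `indices` is given,
    only those positions contribute (a "site-specific" RMSD).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    selected = list(range(len(coords_a))) if indices is None else list(indices)
    if not selected:
        raise ValueError("no positions selected")
    total = 0.0
    for i in selected:
        total += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[i]))
    return math.sqrt(total / len(selected))


# Toy data (hypothetical): 10 atoms on a line; atoms 3 and 4 of the model
# are displaced by 3 angstroms along x, all other atoms are identical.
ref = [(float(i), 0.0, 0.0) for i in range(10)]
model = [
    (x + 3.0, y, z) if i in (3, 4) else (x, y, z)
    for i, (x, y, z) in enumerate(ref)
]

global_rmsd = rmsd(ref, model)               # averaged over all 10 atoms
site_rmsd = rmsd(ref, model, indices=[3, 4])  # restricted to the displaced site
print(global_rmsd, site_rmsd)
```

      In this toy case the global value stays below 2 angstroms while the value restricted to the displaced site is larger, which is the kind of local difference a global average can hide.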

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This was a clearly written manuscript that did an excellent job summarizing complex data.

      In this manuscript, Cuevas-Zuviría et al. use protein modeling to generate over 5,000 predicted structures of nitrogenase components, encompassing both extant and ancestral forms across different clades. The study highlights that key insertions define the various Nif groups. The authors also examined the structures of three ancestral nitrogenase variants that had been previously identified and experimentally tested. These ancestral forms were shown in earlier studies to exhibit reduced activity in Azotobacter vinelandii, a model diazotroph. This work provides a useful resource for studying nitrogenase evolution.

      However, its impact is somewhat limited due to a lack of evidence linking the observed structural differences to functional changes. For example, in the ancestral nitrogenase structures, only a small set of residues (lines 421-431) were identified as potentially affecting interactions between nitrogenase components. Why didn't the authors test whether reverting these residues to their extant counterparts could improve nitrogenase activity of the ancestral variants?

      We thank the reviewer for their thoughtful comments. We acknowledge that our current study is primarily focused on a computational exploration of the structural differences in both extant and ancestral nitrogenase variants, which allowed us to generate a comprehensive structural dataset. Although we did not carry out experimental reversion tests in this study, we agree that directly assessing the functional consequences of reverting the specific residues (lines 420 to 429) to their extant counterparts is an important next step to elucidate their functional role. Indeed, these findings provide a valuable foundation for our future work, which is designed to include experimental characterization of these variants and further elucidate the role of critical residues in nitrogenase activity and evolution. We believe that these experiments will offer the direct functional validation that the reviewer has rightly pointed out, and we look forward to reporting on these results in a future study.

      Additionally, the paper feels somewhat disconnected. The predicted nitrogenase structures discussed in the first half of the manuscript were not well integrated with the findings from the ancestral structures. For instance, do the ancestral nitrogenase structures align with the predicted models? This comparison was never explicitly made and could have strengthened the study's conclusions.

      We thank the reviewer for this suggestion. Our original analysis (previously shown in Figure S9, now Figure S10) included insights into structural alignment comparisons. In response, we have reorganized the results section (lines 351-355) to explicitly address this comparison.

      Reviewer #2 (Public review):

      This work aims to study the evolution of nitrogenases, understanding how their structure and function adapted to changes in the environment, including oxygen levels and changes in metal availability. The study predicts > 5000 structures of nitrogenases, corresponding to extant, ancestral, and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive undertaking that is certain to be a resource for the community. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes.

      The challenge with this study is that all (or nearly all) of the quantitative analyses presented are based on RMSD calculations, many of which are under 2 angstroms. For all intents and purposes, two structures with RMSD < 2 angstroms could be considered 'structurally identical'. A lot of insight generated is based on minuscule differences in RMSD, for which it is not clear that they are significantly different. The suggestion would be to find a way to evaluate the RMSD metric and determine whether these values, as obtained for structures being compared, are reliable. Some options are provided in earlier studies: PMID: 11514933, PMID: 17218333, PMID: 11420449, PMID: 8289285 (and others). It could also be valuable to focus more on site-specific RMSDs rather than Global RMSDs. The high conservation in the nitrogenases likely ensures that the global RMSDs will remain low across the family. Focusing on specific regions might reveal interesting differences between clades that are more informative regarding the evolution of structure in tandem with environment/time.

      We thank the reviewer for their suggestions. We agree that while global RMSD values below 2 Å typically indicate high structural similarity, relying solely on these measures can mask subtle yet potentially functionally meaningful differences. Our aim was not to test for overall structural identity but rather to quantify fine-scale variations between highly conserved nitrogenase structures, including extant and ancestral variants. Nevertheless, in light of the reviewer’s suggestions, we have implemented an additional metric (rmsd_100) for a more nuanced comparison. The results of our additional analyses (Figure S3) align closely with our original results (Figure 2), supporting our decision to retain the un-normalized results in the main text. As an additional measure, we also computed site-specific RMSDs for the active site’s environments (Figure S6) to further delineate subtle structural variations.
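
      As a rough illustration of what a length-normalized RMSD can look like, the sketch below assumes that the normalized metric mentioned above follows the commonly used form of Carugo and Pettersen (2001), rmsd / (1 + ln(sqrt(N / 100))), where N is the number of aligned residues. The response does not state the exact formula used, so this is an assumption, and the function name `rmsd100` is hypothetical.

```python
import math


def rmsd100(rmsd_value, n_residues):
    """Length-normalized RMSD, assuming the Carugo and Pettersen (2001) form:

        rmsd100 = rmsd / (1 + ln(sqrt(N / 100)))

    The value is rescaled to what would be expected for 100 aligned residues.
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    denom = 1.0 + math.log(math.sqrt(n_residues / 100.0))
    if denom <= 0:
        # The normalization breaks down for very short alignments.
        raise ValueError("normalization undefined for very short alignments")
    return rmsd_value / denom


# Hypothetical values: the same raw RMSD for alignments of different length.
same_length = rmsd100(1.5, 100)  # N = 100 leaves the value unchanged
longer = rmsd100(1.5, 500)       # longer alignments are scaled down
shorter = rmsd100(1.5, 50)       # shorter alignments are scaled up
print(same_length, longer, shorter)
```

      Under this assumption, a raw RMSD from a long alignment is reduced and one from a short alignment is increased, so that structures of different sizes can be compared on a common scale.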

    1. eLife Assessment

      Examination of (a)periodic brain activity has gained particular interest in the last few years in the neuroscience fields relating to cognition, disorders, and brain states. Using large EEG/MEG datasets from younger and older adults, the current study provides compelling evidence that age-related differences in aperiodic EEG/MEG signals can be driven by cardiac rather than brain activity. Their findings have important implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac signals is essential.

    2. Reviewer #1 (Public review):

      Summary:

      The present study addresses whether physiological signals influence aperiodic brain activity with a focus on age-related changes. The authors report age effects on aperiodic cardiac activity derived from ECG in low and high-frequency ranges in roughly 2300 participants from four different sites. Slopes of the ECGs were associated with common heart variability measures, which, according to the authors, shows that ECG, even at higher frequencies, conveys meaningful information. Using temporal response functions on concurrent ECG and M/EEG time series, the authors demonstrate that cardiac activity is instantaneously reflected in neural recordings, even after applying ICA analysis to remove cardiac activity. This was more strongly the case for EEG than MEG data. Finally, spectral parameterization was done in large-scale resting-state MEG and ECG data in individuals between 18 and 88 years, and age effects were tested. A steepening of spectral slopes with age was observed, particularly for ECG and, to a lesser extent, in cleaned MEG data in most frequency ranges and sensors investigated. The authors conclude that commonly observed age effects on neural aperiodic activity can mainly be explained by cardiac activity.
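
      For readers unfamiliar with the spectral slope discussed in this summary, the following is a minimal sketch of one simple way to estimate it: an ordinary least-squares line fit to a power spectrum in log-log space over a chosen frequency range. This is a simplification for illustration only and is not the spectral parameterization algorithms used in the study; the function name `aperiodic_slope`, the synthetic spectrum, and the two fitting ranges are hypothetical.

```python
import math


def aperiodic_slope(freqs, power, fmin, fmax):
    """Slope of log10(power) versus log10(frequency) within [fmin, fmax].

    Plain least-squares fit; oscillatory peaks are not modeled or removed.
    """
    pts = [
        (math.log10(f), math.log10(p))
        for f, p in zip(freqs, power)
        if fmin <= f <= fmax and f > 0 and p > 0
    ]
    if len(pts) < 2:
        raise ValueError("need at least two points in the fitting range")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Synthetic, noise-free 1/f^1.5 spectrum from 0.5 to 145 Hz (hypothetical).
freqs = [0.5 * k for k in range(1, 291)]
power = [f ** -1.5 for f in freqs]

# Fit the same spectrum over two different (hypothetical) frequency ranges.
slopes = {
    fit_range: aperiodic_slope(freqs, power, *fit_range)
    for fit_range in [(1, 40), (30, 100)]
}
print(slopes)
```

      On real recordings the estimated slope can depend on the chosen fitting range, which is one reason the range of fitting choices examined in the study matters.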

      Strengths:

      Compared to previous investigations, the authors demonstrate effects of aging on the spectral slope in the currently largest MEG dataset with equal age distribution available. Their efforts of replicating observed effects in another large MEG dataset and considering potential confounding by ocular activity, head movements, or preprocessing methods are commendable and highly valuable to the community. This study also employs a wide range of fitting ranges and two commonly used algorithms for spectral parameterization of neural and cardiac activity, hence providing a comprehensive overview of the impact of methodological choices. The authors discuss their findings in-depth and give recommendations for the separation of physiological and neural sources of aperiodic activity.

      Weaknesses:

      While the study's aim is well-motivated and analyses rigorously conducted, it remains vague what is reflected in the ECG at higher frequency ranges that contributed to the confounding of the age effects in the neural data. However, the authors address this issue in their discussion.

    3. Reviewer #2 (Public review):

      As remains obvious from my previous reviews, I still consider this to be an important paper, and one that is final and publishable in its current state.

      In that previous review, I revealed my identity to help reassure the authors that I was doing my best to remain unbiased because I work in this area and some of the authors' results directly impact my prior research. I was genuinely excited to see the earlier preprint version of this paper when it first appeared. I get a lot of joy out of trying to - collectively, as a field - really understand the nature of our data, and I continue to commend the authors here for pushing at the sources of aperiodic activity!

      In their manuscript, Schmidt and colleagues provide a very compelling, convincing, thorough, and measured set of analyses. Previously I recommended that they push even further, and they added the current Figure 5 analysis of event-related changes in the ECG during working memory. In my opinion this result practically warrants a separate paper of its own!

      The literature analysis is very clever, and is expanded relative to any prior version I've seen.

      In my previous review, the broadest, most high-level comment I wanted to make was that the authors are correct. We (in my lab) have tried to be measured in our approach to talking about aperiodic analyses - including now measuring ECG when possible - because there are so many sources of aperiodic activity: neural, ECG, respiration, skin conductance, muscle activity, electrode impedances, room noise, electronics noise, etc. The authors discuss this all very clearly, and I commend them on that. We, as a field, should move more toward a model where we can account for all of those sources of noise together. (This was less of an action item, and more of an inclusion of a comment for the record.)

      I also very much appreciate the authors' excellent commentary regarding the physiological effects that pharmacological challenges such as propofol and ketamine also have on non-neural (autonomic) functions such as ECG. Previously I also asked them to discuss the possibility that, while their manuscript focuses on aperiodic activity, it is possible that the wealth of literature regarding age-related changes in "oscillatory" activity might be driven partly by age-related changes in neural (or non-neural, ECG-related) changes in aperiodic activity. They have included a nice discussion on this, and I'm excited about the possibilities for cognitive neuroscience as we move more in this direction.

      Finally, I previously asked for recommendations on how to proceed. The authors convinced me that we should care about how the ECG might impact our field potential measures, but how do I, as a relative novice, proceed? They now include three strong recommendations at the end of their manuscript that I find to be very helpful.

      As was obvious from my previous review, I consider this to be an important and impactful cautionary report that is incredibly well supported by multiple thorough analyses. The authors have done an excellent job responding to all my previous comments and concerns and, in my estimation, those of the previous reviewers as well.

    4. Reviewer #3 (Public review):

      Summary:

      Schmidt et al. aimed to provide an extremely comprehensive demonstration of the influence cardiac electromagnetic fields have on the relationship between age and the aperiodic slope measured from electroencephalographic (EEG) and magnetoencephalographic (MEG) data.

      Strengths:

      Schmidt et al. used a multiverse approach to show that the cardiac influence on this relationship is considerable, by testing a wide range of different analysis parameters (including extensive testing of different frequency ranges assessed to determine the aperiodic fit), algorithms (including different artifact reduction approaches and different aperiodic fitting algorithms), and multiple large datasets to provide conclusions that are robust to the vast majority of potential experimental variations.

      The study showed that across these different analytical variations, the cardiac contribution to aperiodic activity measured using EEG and MEG is considerable, and likely influences the relationship between aperiodic activity and age to a greater extent than the influence of neural activity.

      Their findings have significant implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac fields is essential.

      Weaknesses:

      The authors have addressed the weaknesses of their study in their manuscript. Most alternative explanations for their results have been explored to ensure their conclusions are robust and are not explained by unexplored confounds. Minor potential weaknesses are:

      (1) The number of electrodes used in the EEG analyses was on the lower side, and as such, the results do not confirm that the influence of ECG on the 1/f activity in the EEG is high even for higher density EEG montages where ICA may provide better performance at removing cardiac components (as noted by the authors). Having noted this potential weakness, I doubt the effects of cardiac activity can be completely mitigated with current methods, even in higher-density EEG montages.

      (2) Head movements were used as a proxy for muscle activity. However, this may imperfectly address the potential influence of muscle activity on the slope in the EEG activity. As such, remaining muscle artifacts may have affected some of the results, particularly those that included high frequency ranges in the aperiodic estimate. Perhaps if muscle activity were left in the EEG data, it could have disrupted the ability to detect a relationship between age and 1/f slope in a way that didn't disrupt the same relationship in the cardiac data. However, I doubt this would reverse the overall conclusions given the number of converging results, including in lower frequency bands. The authors also note this potential weakness and suggest how future research might address it.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Examination of (a)periodic brain activity has gained particular interest in the last few years in the neuroscience fields relating to cognition, disorders, and brain states. Using large EEG/MEG datasets from younger and older adults, the current study provides compelling evidence that age-related differences in aperiodic EEG/MEG signals can be driven by cardiac rather than brain activity. Their findings have important implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac signals is essential.

      We want to thank the editors for their assessment of our work and highlighting its importance for the understanding of aperiodic neural activity. Additionally, we want to thank the three present and four former reviewers (at a different journal) whose comments and ideas were critical in shaping this manuscript to its current form. We hope that this paper opens up many more questions that will guide us - as a field - to an improved understanding of how “cortical” and “cardiac” changes in aperiodic activity are linked and want to invite readers to engage with our work through eLife’s comment function.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The present study addresses whether physiological signals influence aperiodic brain activity with a focus on age-related changes. The authors report age effects on aperiodic cardiac activity derived from ECG in low and high-frequency ranges in roughly 2300 participants from four different sites. Slopes of the ECGs were associated with common heart variability measures, which, according to the authors, shows that ECG, even at higher frequencies, conveys meaningful information. Using temporal response functions on concurrent ECG and M/EEG time series, the authors demonstrate that cardiac activity is instantaneously reflected in neural recordings, even after applying ICA analysis to remove cardiac activity. This was more strongly the case for EEG than MEG data. Finally, spectral parameterization was done in large-scale resting-state MEG and ECG data in individuals between 18 and 88 years, and age effects were tested. A steepening of spectral slopes with age was observed particularly for ECG and, to a lesser extent, in cleaned MEG data in most frequency ranges and sensors investigated. The authors conclude that commonly observed age effects on neural aperiodic activity can mainly be explained by cardiac activity.

      Strengths:

      Compared to previous investigations, the authors demonstrate the effects of aging on the spectral slope in the currently largest MEG dataset with equal age distribution available. Their efforts of replicating observed effects in another large MEG dataset and considering potential confounding by ocular activity, head movements, or preprocessing methods are commendable and valuable to the community. This study also employs a wide range of fitting ranges and two commonly used algorithms for spectral parameterization of neural and cardiac activity, hence providing a comprehensive overview of the impact of methodological choices. Based on their findings, the authors give recommendations for the separation of physiological and neural sources of aperiodic activity.

      Weaknesses:

      While the aim of the study is well-motivated and analyses rigorously conducted, the overall structure of the manuscript, as it stands now, is partially misleading. Some of the described results are not well-embedded and lack discussion.

      We want to thank the reviewer for their comments focussed on improving the overall structure of the manuscript. We agree with their suggestions that some results could be more clearly contextualized and restructured the manuscript accordingly.

      Reviewer #2 (Public review):

      I previously reviewed this important and timely manuscript at a previous journal where, after two rounds of review, I recommended publication. Because eLife practices an open reviewing format, I will recapitulate some of my previous comments here, for the scientific record.

      In that previous review, I revealed my identity to help reassure the authors that I was doing my best to remain unbiased because I work in this area and some of the authors' results directly impact my prior research. I was genuinely excited to see the earlier preprint version of this paper when it first appeared. I get a lot of joy out of trying to - collectively, as a field - really understand the nature of our data, and I continue to commend the authors here for pushing at the sources of aperiodic activity!

      In their manuscript, Schmidt and colleagues provide a very compelling, convincing, thorough, and measured set of analyses. Previously I recommended that they push even further, and they added the current Figure 5 analysis of event-related changes in the ECG during working memory. In my opinion this result practically warrants a separate paper of its own!

      The literature analysis is very clever, and expanded upon from any other prior version I've seen.

      In my previous review, the broadest, most high-level comment I wanted to make was that the authors are correct. We (in my lab) have tried to be measured in our approach to talking about aperiodic analyses, including now measuring ECG whenever possible, because there are so many sources of aperiodic activity: neural, ECG, respiration, skin conductance, muscle activity, electrode impedances, room noise, electronics noise, etc. The authors discuss this all very clearly, and I commend them on that. We, as a field, should move more toward a model where we can account for all of those sources of noise together. (This was less of an action item, and more of an inclusion of a comment for the record.)

      I also very much appreciate the authors' excellent commentary regarding the physiological effects that pharmacological challenges such as propofol and ketamine also have on non-neural (autonomic) functions such as ECG. Previously I also asked them to discuss the possibility that, while their manuscript focuses on aperiodic activity, it is possible that the wealth of literature regarding age-related changes in "oscillatory" activity might be driven partly by age-related changes in neural (or non-neural, ECG-related) changes in aperiodic activity. They have included a nice discussion on this, and I'm excited about the possibilities for cognitive neuroscience as we move more in this direction.

      Finally, I previously asked for recommendations on how to proceed. The authors convinced me that we should care about how the ECG might impact our field potential measures, but how do I, as a relative novice, proceed? They now include three strong recommendations at the end of their manuscript that I find to be very helpful.

      As was obvious from my previous review, I consider this to be an important and impactful cautionary report that is incredibly well supported by multiple thorough analyses. The authors have done an excellent job responding to all my previous comments and concerns and, in my estimation, those of the previous reviewers as well.

      We want to thank the reviewer for agreeing to review our manuscript again and for recapitulating their previous comments and the progress the manuscript has made over the course of the last ~2 years. The reviewer's comments have been essential in shaping the manuscript into its current form. Their feedback has made the review process truly feel like a collaborative effort, focused on strengthening the manuscript and refining its conclusions and resulting recommendations.

      Reviewer #3 (Public review):

      Summary:

      Schmidt et al., aimed to provide an extremely comprehensive demonstration of the influence cardiac electromagnetic fields have on the relationship between age and the aperiodic slope measured from electroencephalographic (EEG) and magnetoencephalographic (MEG) data.

      Strengths:

      Schmidt et al., used a multiverse approach to show that the cardiac influence on this relationship is considerable, by testing a wide range of different analysis parameters (including extensive testing of different frequency ranges assessed to determine the aperiodic fit), algorithms (including different artifact reduction approaches and different aperiodic fitting algorithms), and multiple large datasets to provide conclusions that are robust to the vast majority of potential experimental variations.

      The study showed that across these different analytical variations, the cardiac contribution to aperiodic activity measured using EEG and MEG is considerable, and likely influences the relationship between aperiodic activity and age to a greater extent than the influence of neural activity.

      Their findings have significant implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac fields is essential.

      We want to thank the reviewer for their thorough engagement with our work and for the many great ideas raised both in the Weaknesses section and in the Recommendations for the authors below. Their suggestions have sparked many ideas on how to better separate peripheral from neurophysiological signals, and these are likely to greatly influence our future attempts to extract both cardiac and muscle activity from M/EEG recordings. We thank them for their input, time and effort!

      Weaknesses:

      Figure 4I: The regressions explained here seem to contain a very large number of potential predictors. Based on the way it is currently written, I'm assuming it includes all sensors for both the ECG component and ECG rejected conditions?

      I'm not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including these latent contributions to the full signal back into the same regression model. It seems that there could be some circularity or redundancy in doing so. Can the authors provide a justification for why this is a valid approach?

      After observing significant effects both in the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands we wanted to understand whether or not these age-related changes are statistically independent. To test this we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models we performed were therefore actually not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over channels on which we noticed significant effects in the ECG rejected and ECG components data respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section stating that: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub>s and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”  

      We hope this explanation resolves the misunderstanding.
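
      As an illustration of the two-predictor model described above (age ~ 1 + ECG rejected + ECG components), the following is a minimal sketch on synthetic data. It uses ordinary least squares as a simple stand-in for the regression models reported in the manuscript; the function names, the simulated slopes and the effect sizes are all hypothetical and serve only to show how each predictor's coefficient is estimated while accounting for the other.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_predictor_model(age, slope_rejected, slope_component):
    """Least-squares fit of age ~ 1 + slope_rejected + slope_component."""
    design = [[1.0, r, c] for r, c in zip(slope_rejected, slope_component)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, age)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Synthetic example (all values are assumptions): channel-averaged spectral
# slopes that are correlated with each other and both related to age.
rng = random.Random(0)
slope_component = [rng.gauss(-1.5, 0.3) for _ in range(500)]
slope_rejected = [0.5 * c + rng.gauss(-0.5, 0.3) for c in slope_component]
age = [50.0 - 10.0 * r - 20.0 * c + rng.gauss(0.0, 1.0)
       for r, c in zip(slope_rejected, slope_component)]

intercept, beta_rejected, beta_component = fit_two_predictor_model(
    age, slope_rejected, slope_component)
print(intercept, beta_rejected, beta_component)
```

      Because both predictors enter the same model, each coefficient reflects the association with age that remains after the other predictor has been taken into account, which is the sense in which the shared effects are tested for (in)dependence.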

      I'm not sure whether there is good evidence or rationale to support the statement in the discussion that the presence of the ECG signal in reference electrodes makes it more difficult to isolate independent ECG components. The ICA algorithm will still function to detect common voltage shifts from the ECG as statistically independent from other voltage shifts, even if they're spread across all electrodes due to the referencing montage. I would suggest there are other reasons why the ICA might lead to imperfect separation of the ECG component (assumption of the same number of source components as sensors, non-Gaussian assumption, assumption of independence of source activities).

      The inclusion of only 32 channels in the EEG data might also have reduced the performance of ICA, increasing the chances of imperfect component separation and the mixing of cardiac artifacts into the neural components, whereas the higher number of sensors in the MEG data would enable better component separation. This could explain the difference between EEG and MEG in the ability to clean the ECG artifact (and perhaps higher-density EEG recordings would not show the same issue).

      The reviewer makes a good argument that our initial assumption, namely that the presence of cardiac activity on the reference electrode influences the performance of the ICA, may be wrong. After rereading and rethinking the matter, we think that the reviewer is correct and that their explanations for why the ECG signal was not so easily separable from our EEG recordings are more plausible and better grounded in the literature than our initial suggestion. We therefore now highlight their view as a main reason why the ECG rejection was more challenging in EEG data. However, we also note that understanding the exact reason is probably an empirical question that demands further research. The manuscript now states that:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources. ”

      In addition to the inability to effectively clean the ECG artifact from EEG data, ICA and other component subtraction methods have also all been shown to distort neural activity in periods that aren't affected by the artifact due to the ubiquitous issue of imperfect component separation (https://doi.org/10.1101/2024.06.06.597688). As such, component subtraction-based (as well as regression-based) removal of the cardiac artifact might also distort the neural contributions to the aperiodic signal, so even methods to adequately address the cardiac artifact might not solve the problem explained in the study. This poses an additional potential confound to the "M/EEG without ECG" conditions.

      The reviewer is correct in stating that, if an “artifactual” signal is not always present but appears and disappears (e.g. eye blinks), neural activity may be distorted in periods where the “artifactual” signal is absent. However, while this plausibly presents a problem for ocular activity, there is no obvious reason to believe that this applies to cardiac activity. While the ECG signal is non-stationary in nature, it is remarkably more stable than eye movements in the healthy populations we analyzed (especially at rest). Accordingly, the cardiac “artifact” was consistently present across the entirety of the MEG recordings we visually inspected.

      Literature Analysis, Page 23: was there a method applied to address studies that report reducing artifacts in general, but are not specific to a single type of artifact? For example, there are automated methods for cleaning EEG data that use ICLabel (a machine learning algorithm) to delete "artifact" components. Within these studies, the cardiac artifact will not be mentioned specifically, but is included under "artifacts".

      The literature analysis was largely performed automatically and focussed solely on ECG related activity, as described in the methods section under Literature Analysis. If no ECG related terms were used in the context of artifact rejection, a study was flagged as not having removed cardiac activity. We could indeed have highlighted this better and apologize for the oversight. We now additionally link to these details, stating that:

      “However, an analysis of openly accessible M/EEG articles (N<sub>Articles</sub>=279; see Methods - Literature Analysis for further details) that investigate aperiodic activity revealed that only 17.1% of EEG studies explicitly mention that cardiac activity was removed and only 16.5% measure ECG (45.9% of MEG studies removed cardiac activity and 31.1% of MEG studies mention that ECG was measured; see Figure 1EF).”

      The reviewer makes a fair point that there is some uncertainty here, and our results probably present a lower bound of ECG handling in M/EEG research: when I manually rechecked the studies that were not initially flagged, it was often solely mentioned that “artifacts” were rejected. This information seemed too ambiguous to assume that cardiac activity was in fact accounted for. Again, this could have been stated more clearly in writing and we apologize for this oversight. This is now included in the methods section Literature Analysis, stating that:

      “All valid word contexts were then manually inspected by scanning the respective word context to ensure that the removal of “artifacts” was related specifically to cardiac and not e.g. ocular activity or the rejection of artifacts in general (without specifying which “artifactual” source was rejected in which case the manuscript was marked as invalid). This means that the results of our literature analysis likely present a lower bound for the rejection of cardiac activity in the M/EEG literature investigating aperiodic activity.”
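
      To make the automated part of this flagging step concrete, the following is a rough sketch of a word-context search. It is not the pipeline used for the literature analysis: the term list, the removal pattern and the window size are hypothetical choices made for illustration, and in the actual analysis every matched context was additionally inspected manually.

```python
import re

# Hypothetical term lists; the terms used in the actual analysis may differ.
CARDIAC_TERMS = ("ecg", "ekg", "cardiac", "heartbeat", "electrocardiogram")
REMOVAL_PATTERN = re.compile(r"\b(remov\w*|reject\w*|clean\w*)\b", re.IGNORECASE)


def removal_contexts(text, window=8):
    """Return the words surrounding every removal-related word in the text."""
    words = text.split()
    contexts = []
    for i, word in enumerate(words):
        if REMOVAL_PATTERN.search(word):
            start = max(0, i - window)
            contexts.append(" ".join(words[start:i + window + 1]))
    return contexts


def mentions_cardiac_removal(text, window=8):
    """Flag a text if a cardiac term occurs near a removal-related word."""
    return any(
        any(term in context.lower() for term in CARDIAC_TERMS)
        for context in removal_contexts(text, window)
    )


specific = "Independent components reflecting ECG and ocular activity were removed."
generic = "Artifacts were rejected using an automated pipeline."
print(mentions_cardiac_removal(specific), mentions_cardiac_removal(generic))
```

      A text that only reports the rejection of unspecified “artifacts” is not flagged by this sketch, which mirrors why the reported percentages are best read as a lower bound.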

      Statistical inferences, page 23: as far as I can tell, no methods to control for multiple comparisons were implemented. Many of the statistical comparisons were not independent (or even overlapped with similar analyses in the full analysis space to a large extent), so I wouldn't expect strong multiple comparison controls. But addressing this point to some extent would be useful (or clarifying how it has already been addressed if I've missed something).

      In the present study we tried to minimize the risk of type 1 errors by several means: A) weakly informative priors, B) robust regression models and C) a region of practical equivalence (ROPE; see Methods, Statistical Inference, for further information) to define meaningful effects.

      Weakly informative priors can lower the risk of type 1 errors arising from multiple testing by shrinking parameter estimates towards zero (see e.g. Lemoine, 2019). Robust regression models use a Student's t-distribution to describe the distribution of the data. This distribution features heavier tails, meaning it allocates more probability to extreme values, which in turn minimizes the influence of outliers. The ROPE criterion ensures that only effects exceeding a negligible size are considered meaningful, representing a strict and conservative approach to interpreting our findings (see Kruschke, 2018; Cohen, 1988).

      Furthermore, and more generally, we do not selectively report “significant” effects in situations in which multiple analyses were conducted on the same family of data (e.g. Figure 2 & 4). Instead, we provide joint inference across several plausible analysis options (akin to a specification curve analysis, Simonsohn, Simmons & Nelson 2020) to provide other researchers with an overview of how different analysis choices impact the association between cardiac and neural aperiodic activity.
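
      The ROPE criterion mentioned above can be sketched as follows. This is a generic illustration in the spirit of Kruschke (2018), comparing a highest density interval of posterior samples against a region of practical equivalence. The ROPE bounds of ±0.1, the 95% mass and the simulated posterior samples are assumptions made for this example and are not taken from the manuscript.

```python
import random


def highest_density_interval(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    width = max(1, int(round(mass * n)))  # number of samples inside the interval
    best = min(range(n - width + 1),
               key=lambda i: ordered[i + width - 1] - ordered[i])
    return ordered[best], ordered[best + width - 1]


def rope_decision(samples, rope=(-0.1, 0.1), mass=0.95):
    """Compare the HDI of a standardized effect with the ROPE."""
    low, high = highest_density_interval(samples, mass)
    if low > rope[1] or high < rope[0]:
        return "meaningful"   # HDI lies completely outside the ROPE
    if low >= rope[0] and high <= rope[1]:
        return "negligible"   # HDI lies completely inside the ROPE
    return "undecided"        # HDI and ROPE overlap partially


# Simulated posterior samples for three hypothetical standardized effects.
rng = random.Random(1)
clear_effect = [rng.gauss(0.4, 0.05) for _ in range(4000)]
null_effect = [rng.gauss(0.0, 0.02) for _ in range(4000)]
borderline_effect = [rng.gauss(0.1, 0.05) for _ in range(4000)]

for name, draws in [("clear", clear_effect), ("null", null_effect),
                    ("borderline", borderline_effect)]:
    print(name, rope_decision(draws))
```

      Under such a rule, an effect whose interval merely excludes zero but still overlaps the ROPE is not declared meaningful, which is what makes the criterion conservative.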

      Lemoine, N. P. (2019). Moving beyond noninformative priors: why and how to choose weakly informative priors in Bayesian analyses. Oikos, 128(7), 912-928.

      Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208-1214.

      Methods:

      Applying ICA components from 1Hz high pass filtered data back to the 0.1Hz filtered data leads to worse artifact cleaning performance, as the contribution of the artifact in the 0.1Hz to 1Hz frequency band is not addressed (see Bailey, N. W., Hill, A. T., Biabani, M., Murphy, O. W., Rogasch, N. C., McQueen, B., ... & Fitzgerald, P. B. (2023). RELAX part 2: A fully automated EEG data cleaning algorithm that is applicable to Event-Related-Potentials. Clinical Neurophysiology, result reported in the supplementary materials). This might explain some of the lower frequency slope results (which include a lower frequency limit <1Hz) in the EEG data - the EEG cleaning method is just not addressing the cardiac artifact in that frequency range (although it certainly wouldn't explain all of the results).

      We want to thank the reviewer for suggesting this interesting paper, showing that lower high-pass filters may be preferable to the more commonly used >1Hz high-pass filters for detection of ICA components that largely contain peripheral physiological activity. However, the results presented by Bailey et al. contradict the more commonly reported findings by other researchers that a >1Hz high-pass filter is actually preferable (e.g. Winkler et al. 2015; Dimigen, 2020 or Klug & Gramann, 2021) and recommendations in widely used packages for M/EEG analysis (e.g. https://mne.tools/1.8/generated/mne.preprocessing.ICA.html). Yet, the fact that there seems to be a discrepancy suggests that further research is needed to better understand which type of high-pass filtering is preferable in which situation. Furthermore, it is notable that all the findings for high-pass filtering in ICA component detection and removal that we are aware of relate to ocular activity. Given that ocular and cardiac activity have very different temporal and spectral patterns, it is probably worth further investigating whether the classic 1Hz high-pass filter is really also the best option for the detection and removal of cardiac activity. However, in our opinion this requires a dedicated investigation of its own.

      We therefore highlight this now in our manuscript stating that:

      “Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64) and even widely suggested settings e.g. high-pass filtering at 1Hz before fitting an ICA may not be universally applicable (see supplementary material of (64)).”

      I. Winkler, S. Debener, K.-R. Müller and M. Tangermann, "On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP," 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 2015, pp. 4101-4105, doi: 10.1109/EMBC.2015.7319296.

      Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117.

      Klug, M., & Gramann, K. (2021). Identifying key factors for improving ICA‐based decomposition of EEG data in mobile and stationary experiments. European Journal of Neuroscience, 54(12), 8406-8420.

      It looks like no methods were implemented to address muscle artifacts. These can affect the slope of EEG activity at higher frequencies. Perhaps the Riemannian Potato addressed these artifacts, but I suspect it wouldn't eliminate all muscle activity. As such, I would be concerned that remaining muscle artifacts affected some of the results, particularly those that included high frequency ranges in the aperiodic estimate. Perhaps if muscle activity were left in the EEG data, it could have disrupted the ability to detect a relationship between age and 1/f slope in a way that didn't disrupt the same relationship in the cardiac data (although I suspect it wouldn't reverse the overall conclusions given the number of converging results including in lower frequency bands). Is there a quick validity analysis the authors can implement to confirm muscle artifacts haven't negatively affected their results?

      I note that an analysis of head movement in the MEG is provided on page 32, but it would be more robust to show that removing ICA components reflecting muscle doesn't change the results. The results/conclusions of the following study might be useful for objectively detecting probable muscle artifact components: Fitzgibbon, S. P., DeLosAngeles, D., Lewis, T. W., Powers, D. M. W., Grummett, T. S., Whitham, E. M., ... & Pope, K. J. (2016). Automatic determination of EMG-contaminated components and validation of independent component analysis using EEG during pharmacologic paralysis. Clinical neurophysiology, 127(3), 1781-1793.

      We thank the reviewer for their suggestion. Muscle activity can indeed be a concern for the estimation of the spectral slope. This is precisely why we used head movements (as also noted by the reviewer) as a proxy for muscle activity. We also agree with the reviewer that this is not a perfect estimate. Additionally, the Riemannian Potato would probably only capture epochs that contain transient, but not persistent, patterns of muscle activity.

      The paper recommended by the reviewer contains a clever approach of using the steepness of the spectral slope (or lack thereof) as an indicator of whether or not an independent component (IC) is driven by muscle activity. In order to determine an optimal threshold, Fitzgibbon et al. compared paralyzed to temporarily non-paralyzed subjects. They determined an expected “EMG-free” threshold for their spectral slope on paralyzed subjects and used this as a benchmark to detect ICs that were contaminated by muscle activity in non-paralyzed subjects.

      This is a great idea, but unfortunately it goes well beyond what we are able to sensibly estimate with our data, for the following reasons. The authors estimated their optimal threshold on paralyzed subjects for EEG data and show that this is a feasible threshold to be applied across different recordings. So for EEG data it might be feasible, at least as a first shot, to use their threshold on our data. However, we are measuring MEG and, as alluded to in our discussion section under “Differences in aperiodic activity between magnetic and electric field recordings”, the spectral slope differs greatly between MEG and EEG recordings for non-trivial reasons. Furthermore, the spectral slope even seems to differ across different MEG devices. We noticed this when we initially tried to pool the data recorded in Salzburg with the Cambridge dataset. This means we would need to do a complete validation of this procedure for the MEG data recorded in Cambridge and in Salzburg, which is not feasible considering that we A) don’t have direct access to one of the recording sites and B) would, even if we had access, face substantial hurdles to get ethical approval for the experiment performed by Fitzgibbon et al.

      However, we think the approach brought forward by Fitzgibbon and colleagues is a clever way to remove muscle activity from EEG recordings whenever EMG was not directly recorded. We therefore now suggest in the Discussion section that, ideally, EMG should also be recorded, stating that:

      “It is worth noting that, apart from cardiac activity, muscle activity can also be captured in (non-)invasive recordings and may drastically influence measures of the spectral slope(72). To ensure that persistent muscle activity does not bias our results we used changes in head movement velocity as a control analysis (see Supplementary Figure S9). However, it should be noted that this is only a proxy for the presence of persistent muscle activity. Ideally, studies investigating aperiodic activity should also be complemented by measurements of EMG. Whenever such measurements are not available creative approaches that use the steepness of the spectral slope (or the lack thereof) as an indicator to detect whether or not e.g. an independent component is driven by muscle activity are promising(72,73). However, these approaches may require further validation to determine how well myographic aperiodic thresholds are transferable across the wide variety of different M/EEG devices.”
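
      The slope-threshold idea discussed above can be sketched in a few lines. The example below is a simplified illustration, not the procedure of Fitzgibbon et al. and not the spectral parameterization used in the manuscript: it estimates the spectral slope as a least-squares line in log-log space and flags components whose slope is flatter than a threshold. The threshold of -0.5, the 7 - 75 Hz fitting range, the component names and the synthetic spectra are all hypothetical, and, as noted above, any real threshold would need to be validated per device.

```python
import math


def spectral_slope(freqs, power, fmin, fmax):
    """Least-squares slope of log10(power) against log10(frequency)."""
    points = [(math.log10(f), math.log10(p))
              for f, p in zip(freqs, power) if fmin <= f <= fmax]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def flag_muscle_components(component_spectra, freqs, threshold,
                           fmin=7.0, fmax=75.0):
    """Return the components whose slope is flatter (larger) than threshold."""
    return [name for name, power in component_spectra.items()
            if spectral_slope(freqs, power, fmin, fmax) > threshold]


# Synthetic power spectra: a steep, "neural-like" component and a flat,
# "muscle-like" component (purely illustrative).
freqs = [float(f) for f in range(1, 101)]
component_spectra = {
    "ic_neural": [f ** -1.5 for f in freqs],
    "ic_muscle": [f ** -0.2 for f in freqs],
}

flagged = flag_muscle_components(component_spectra, freqs, threshold=-0.5)
print(flagged)
```

      The sign convention here is that power scales with frequency raised to the slope, so steeper spectra have more negative slopes and a flat, muscle-like spectrum has a slope close to zero.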

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) As outlined above, I recommend rephrasing the last section of the introduction to briefly summarize/introduce all main analysis steps undertaken in the study and why these were done (for example, it is only mentioned that the Cam-CAN dataset was used to study the impact of cardiac on MEG activity although the authors used a variety of different datasets). Similarly, I am missing an overview of all main findings in the context of the study goals in the discussion. I believe clarifying the structure of the paper would not only provide a red thread to the reader but also highlight the efforts/strength of the study as described above.

      This is a good call! As suggested by the reviewer, we now try to give a clearer overview of what was investigated and why. We do so both at the end of the introduction, stating that: “Using the publicly available Cam-CAN dataset(28,29), we find that the aperiodic signal measured using M/EEG originates from multiple physiological sources. In particular, significant portions of age-related changes in aperiodic activity –normally attributed to neural processes– can be better explained by cardiac activity. This observation holds across a wide range of processing options and control analyses (see Supplementary S1), and was replicable on a separate MEG dataset. However, the extent to which cardiac activity accounts for age-related changes in aperiodic activity varies with the investigated frequency range and recording site. Importantly, in some frequency ranges and sensor locations, age-related changes in neural aperiodic activity still prevail. But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging. In sum, our results highlight the complexity of aperiodic activity while cautioning against interpreting it as solely “neural” without considering physiological influences.”

      and at the beginning of the discussion section:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources (see Figure 1EF). Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64) and even widely suggested settings e.g. high-pass filtering at 1Hz before fitting an ICA may not be universally applicable (see supplementary material of (64)).”

      (2) I found it interesting that the spectral slopes of ECG activity at higher frequency ranges (> 10 Hz) seem mostly related to HRV measures such as fractal and time domain indices and less so with frequency-domain indices. Do the authors have an explanation for why this is the case? Also, the analysis of the HRV measures and their association with aperiodic ECG activity is not explained in any of the method sections.

      We apologize for the oversight in not mentioning the HRV analysis in more detail in our methods section. We added a subsection to the Methods section entitled ECG Processing - Heart rate variability analysis to further describe the HRV analyses.

      “ECG Processing - Heart rate variability analysis

      Heart rate variability (HRV) was computed using the NeuroKit2 toolbox, a high level tool for the analysis of physiological signals. First, the raw electrocardiogram (ECG) data were preprocessed by highpass filtering the signal at 0.5Hz using an infinite impulse response (IIR) Butterworth filter (order=5) and by smoothing the signal with a moving average kernel with the width of one period of 50Hz to remove the powerline noise (default settings of neurokit.ecg.ecg_clean). Afterwards, QRS complexes were detected based on the steepness of the absolute gradient of the ECG signal. Subsequently, R-Peaks were detected as local maxima in the QRS complexes (default settings of neurokit.ecg.ecg_peaks; see (98) for a validation of the algorithm). From the cleaned R-R intervals, 90 HRV indices were derived, encompassing time-domain, frequency-domain, and non-linear measures. Time-domain indices included standard metrics such as the mean and standard deviation of the normalized R-R intervals, the root mean square of successive differences, and other statistical descriptors of interbeat interval variability. Frequency-domain analyses were performed using power spectral density estimation, yielding for instance low frequency (0.04-0.15Hz) and high frequency (0.15-0.4Hz) power components. Additionally, non-linear dynamics were characterized through measures such as sample entropy, detrended fluctuation analysis and various Poincaré plot descriptors. All these measures were then related to the slopes of the low frequency (0.25 – 20 Hz) and high frequency (10 – 145 Hz) aperiodic spectrum of the raw ECG.”
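
      For readers less familiar with the time-domain indices named above, the following is a minimal hand-rolled sketch of three of them computed from R-peak times. It does not use NeuroKit2 and covers only a small fraction of the 90 indices; the function name and the example R-peak times are hypothetical.

```python
import math
import statistics


def time_domain_hrv(r_peak_times_s):
    """Compute mean R-R interval, SDNN and RMSSD from R-peak times in seconds."""
    # R-R intervals in milliseconds between successive R-peaks
    rr_ms = [1000.0 * (later - earlier)
             for earlier, later in zip(r_peak_times_s, r_peak_times_s[1:])]
    # Differences between successive R-R intervals
    successive = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return {
        "mean_rr_ms": statistics.fmean(rr_ms),
        "sdnn_ms": statistics.stdev(rr_ms),
        "rmssd_ms": math.sqrt(statistics.fmean(d * d for d in successive)),
    }


# Hypothetical R-peak times for a short stretch of ECG
r_peaks = [0.0, 0.8, 1.7, 2.5, 3.4]
indices = time_domain_hrv(r_peaks)
print(indices)
```

      These indices summarize beat-to-beat timing only, whereas the spectral slopes they were related to are estimated from the power spectrum of the raw ECG waveform.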

      With regard to the association between the ECG’s spectral slopes at high frequencies and frequency-domain indices of heart rate variability: common frequency-domain indices of heart rate variability fall in the range of 0.01-0.4Hz, which probably explains why we didn’t notice any association at higher frequency ranges (>10Hz).

      This is also stated in the related part of the results section:

      “In the higher frequency ranges (10 - 145 Hz) spectral slopes were most consistently related to fractal and time domain indices of heart rate variability, but not so much to frequency-domain indices assessing spectral power in frequency ranges < 0.4 Hz.”

      (3) Related to the previous point - what is being reflected in the ECG at higher frequency ranges, with regard to biological mechanisms? Results are being mentioned, but not further discussed. However, this point seems crucial because the age effects across the four datasets differ between low and high-frequency slope limits (Figure 2C).

      This is a great question that definitely requires further attention and investigation in general (see also Tereshchenko & Josephson, 2015). We investigated the change of the slope across frequency ranges that are typically captured in common ECG setups for adults (0.05 - 150Hz, Tereshchenko & Josephson, 2015; Kusayama, Wong, Liu et al. 2020). While most of the physiologically significant spectral information of an ECG recording rests between 1-50Hz (Clifford & Azuaje, 2006), meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz) that falls squarely within our spectral analysis window. Further meaningful information can be extracted at even higher frequencies (>100Hz). The exact physiological mechanisms underlying the so-called high-frequency QRS remain unclear (HF-QRS; see Tereshchenko & Josephson, 2015; Qiu et al. 2024 for a review discussing possible mechanisms). At the same time, the HF-QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range (Schlegel et al. 2004; Qiu et al. 2024). All optimism aside, it is also worth noting that ECG recordings at higher frequencies can capture skeletal muscle activity with an overlapping frequency range up to 400Hz (Kusayama, Wong, Liu et al. 2020). We now highlight all of this as an outstanding research question when introducing this analysis in the results section, stating that:

      “However, substantially less is known about aperiodic activity above 0.4Hz in the ECG. Yet, common ECG setups for adults capture activity at a broad bandwidth of 0.05 - 150Hz(33,34).

      Importantly, a lot of the physiological meaningful spectral information rests between 1-50Hz(35), similarly to M/EEG recordings. Furthermore, meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz(35)). However, that’s not all, as further meaningful information can be extracted at even higher frequencies (>100Hz). For instance, the so-called high-frequency QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range(36,37). Yet, the exact physiological mechanisms underlying the high-frequency QRS remain unclear (see (37) for a review discussing possible mechanisms). ”

      Tereshchenko, L. G., & Josephson, M. E. (2015). Frequency content and characteristics of ventricular conduction. Journal of electrocardiology, 48(6), 933-937.

      Kusayama, T., Wong, J., Liu, X., et al. (2020). Simultaneous noninvasive recording of electrocardiogram and skin sympathetic nerve activity (neuECG). Nature Protocols, 15, 1853–1877. https://doi.org/10.1038/s41596-020-0316-6

      Clifford, G. D., & Azuaje, F. (2006). Advanced methods and tools for ECG data analysis (Vol. 10). P. McSharry (Ed.). Boston: Artech house.

      Qiu, S., Liu, T., Zhan, Z., Li, X., Liu, X., Xin, X., ... & Xiu, J. (2024). Revisiting the diagnostic and prognostic significance of high-frequency QRS analysis in cardiovascular diseases: a comprehensive review. Postgraduate Medical Journal, qgae064.

      Schlegel, T. T., Kulecz, W. B., DePalma, J. L., Feiveson, A. H., Wilson, J. S., Rahman, M. A., & Bungo, M. W. (2004, March). Real-time 12-lead high-frequency QRS electrocardiography for enhanced detection of myocardial ischemia and coronary artery disease. In Mayo Clinic Proceedings (Vol. 79, No. 3, pp. 339-350). Elsevier.

      (4) Page 10: At first glance, it is not quite clear what is meant by "processing option" in the text. Please clarify.

      Thank you for catching this! Upon re-reading, this is indeed a bit unclear. We have now replaced “processing options” with “slope fits” to make it clearer that we are referring to the percentage of effects based on the different slope fits.

      (5) The authors mention previous findings on age effects on neural 1/f activity (References Nr 5,8,27,39) that seem contrary to their own findings such as e.g., the mostly steepening of the slopes with age. Also, the authors discuss thoroughly why spectral slopes derived from MEG signals may differ from EEG signals. I encourage the authors to have a closer look at these studies and elaborate a bit more on why these studies differ in their conclusions on the age effects. For example, Tröndle et al. (2022, Ref. 39) investigated neural activity in children and young adults, hence, focused on brain maturation, whereas the CamCAN set only considers the adult lifespan. In a similar vein, others report age effects on 1/f activity in much smaller samples as reported here (e.g., Voytek et al., 2015).

      I believe taking these points into account by briefly discussing them, would strengthen the authors' claims and provide a more fine-grained perspective on aging effects on 1/f.

      The reviewer is making a very important point, as age-related differences in (neuro-)physiological activity are not necessarily strictly comparable or entirely linear across different age cohorts (e.g. age-related changes in alpha center frequency). We therefore added the suggested points to the discussion section.

      “Differences in electric and magnetic field recordings aside, aperiodic activity may not change strictly linearly as we are ageing and studies looking at younger age groups (e.g. <22; (44)) may capture different aspects of aging (e.g. brain maturation), than those looking at older subjects (>18 years; our sample). A recent report even shows some first evidence of an interesting putatively non-linear relationship with age in the sensorimotor cortex for resting recordings(59)”

      (6) The analysis of the working memory paradigm as described in the outlook-section of the discussion comes as a bit of a surprise as it has not been introduced before. If the authors want to convey with this study that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity, I recommend introducing this analysis and the results earlier in the manuscript than only in the discussion to strengthen their message.

      The reviewer is correct. This analysis really comes a bit out of the blue. However, this was also exactly the intention for placing this analysis in the discussion. As the reviewer correctly noted, the aim was to suggest “that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity”. We placed this outlook directly after the discussion of “(neuro-)physiological origins of aperiodic activity”, where we highlight the potential challenges of interpreting drug induced changes to M/EEG recordings. So the aim was to get the reader to think about whether age is the only feature affected by cardiac activity and then directly present some evidence that this might go beyond age.

      However, we have reconsidered this approach based on the reviewer's comments. We moved that paragraph to the end of the results section and now introduce it at the end of the introduction, stating that:

      “But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging.”

      (7) The font in Figure 2 is a bit hard to read (especially in D). I recommend increasing the font sizes where necessary for better readability.

      We agree with the Reviewer and increased the font sizes accordingly.

      (8) Text in the discussion: Figure 3B on page 10 => shouldn't it be Figure 4?

      Thank you for catching this oversight. We have now corrected this mistake.

      (9) In the third section on page 10, the Figure labels seem to be confused. For example, Figure 4 E is supposed to show "steepening effects", which should be Figure 4B I believe.

      Please check the figure labels in this section to avoid confusion.

      Thank you for catching this oversight. We have now corrected this mistake.

      (10) Figure Legend 4 I), please check the figure labels in the text

      Thank you for catching this oversight. We have now corrected this mistake.

      Reviewer #3 (Recommendations for the authors):

      I have a number of suggestions for improving the manuscript, which I have divided by section in the following:

      ABSTRACT:

      I would suggest re-writing the first sentences to make it easier to read for non-expert readers: "The power of electrophysiologically measured cortical activity decays with an approximately 1/fX function. The slope of this decay (i.e. the spectral exponent, X) is modulated..."

      Thank you for the suggestion. We adjusted the sentence as suggested to make it easier for less technical readers to understand that “X” refers to the exponent.

      Including the age range that was studied in the abstract could be informative.

      Done as suggested.

      As an optional recommendation, I think it would increase the impact of the article if the authors note in the abstract that the current most commonly applied cardiac artifact reduction approaches don't resolve the issue for EEG data, likely due to an imperfect ability to separate the cardiac artifact from the neural activity with independent component analysis. This would highlight to the reader that they can't just expect to address these concerns by cleaning their data with typical cleaning methods.

      I think it would also be useful to convey in the abstract just how comprehensive the included analyses were (in terms of artifact reduction methods tested, different aperiodic algorithms and frequency ranges, and both MEG and EEG). Doing so would let the reader know just how robust the conclusions are likely to be.

      This is a brilliant idea! As suggested we added a sentence highlighting that simply performing an ICA may not be sufficient to separate cardiac contributions to M/EEG recordings and refer to the comprehensiveness of the performed analyses.

      INTRODUCTION:

      I would suggest re-writing the following sentence for readability: "In the past, aperiodic neural activity, other than periodic neural activity (local peaks that rise above the "power-law" distribution), was often treated as noise and simply removed from the signal"

      To something like: "In the past, aperiodic neural activity was often treated as noise and simply removed from the signal e.g. via pre-whitening, so that analyses could focus on periodic neural activity (local peaks that rise above the "power-law" distribution, which are typically thought to reflect neural oscillations).

      We are happy to follow that suggestion.

      Page 3: please provide the number of articles that were included in the examination of the percentage that remove cardiac activity, and note whether the included articles could be considered a comprehensive or nearly comprehensive list, or just a representative sample.

      We stated the exact number of articles in the methods section under Literature Analysis. However, we added it to the Introduction on page 3 as suggested by the reviewer. The selection of articles was done automatically, dependent on a list of pre-specified terms and exclusively focussed on articles that had terms related to aperiodic activity in their title (see Literature Analysis). Therefore, I would personally be hesitant in calling it a comprehensive or nearly comprehensive list of the general M/EEG literature as the analysis of aperiodic activity is still relatively niche compared to the more commonly investigated evoked potentials or oscillations. I think whether or not a reader perceives our analysis as comprehensive should be up to them to decide and does not reflect something I want to impose on them. This is exacerbated by the fact that the analysis of neural aperiodic activity has rapidly gained traction over the last years (see Figure 1D orange) and the literature analysis was performed almost 2 years ago and therefore, in my eyes, only represents a glimpse in the rapidly evolving field related to the analysis of aperiodic activity.

      Figure 1E-F: It's not completely clear that the "Cleaning Methods" part of the figure indicates just methods to clean the cardiac artifact (rather than any artifact). It also seems that ~40% of EEG studies do not apply any cleaning methods even from within the studies that do clean the cardiac artifact (if I've read the details correctly). This seems unlikely. Perhaps there should be a bar for "other methods", or "unspecified"? Having said that, I'm quite familiar with the EEG artifact reduction literature, and I would be very surprised if ~40% of studies cleaned the cardiac artifact using a different method to the methods listed in the bar graph, so I'm wondering if I've misunderstood the figure, or whether the data capture is incomplete / inaccurate (even though the conclusion that ICA is the most common method is almost certainly accurate).

      The cleaning is indeed focussed specifically on cardiac activity. This was also mentioned in the caption of Figure 1: “We were further interested in determining which artifact rejection approaches were most commonly used to remove cardiac activity, such as independent component analysis (ICA(22)), singular value decomposition (SVD(23)), signal space separation (SSS(24)), signal space projections (SSP(25)) and denoising source separation (DSS(26)).” and in the methods section under Literature Analysis. However, we adjusted Figure 1E-F to make it more obvious that the described cleaning methods relate only to the ECG. Aside from using blind source separation techniques such as ICA, a good number of studies mentioned that they cleaned their data based on visual inspection (which was not further considered). Furthermore, it should be noted that studies were only marked as having separated cardiac from neural activity when this was mentioned explicitly.

      RESULTS:

      Page 6: I would delete the "from a neurophysiological perspective" clause, which makes the sentence more difficult to read and isn't so accurate (frequencies 13-25Hz would probably more commonly be considered mid-range rather than low or high). Additionally, both frequency ranges include 15Hz, but the next sentence states that the ranges were selected to avoid the knee at 15Hz, which seems to be a contradiction. Could the authors explain in more detail how the split addresses the 15Hz knee?

      We removed the “from a neurophysiological perspective” clause as suggested. With regard to the “knee” at ~15Hz, I would like to refer the reviewer to Supplementary Figure S1. The knee frequency varies substantially across subjects, so splitting the data at a single exact frequency did not seem appropriate. Additionally, we found only spurious significant age-related variations in knee frequency (i.e. in only one of the 4 datasets; not shown).

      Furthermore, we wanted to better connect our findings to our MEG results in Figure 4 and also give the readers a holistic overview of how different frequency ranges in the aperiodic ECG would be affected by age. So to fulfill all of these objectives we decided to fit slopes with respective upper/lower bounds around a range of 5Hz above and below the average 15Hz Knee Frequency across datasets.

      The later parts of this same paragraph refer to a vast amount of different frequency ranges, but only the "low" and "high" frequency ranges were previously mentioned. Perhaps the explanation could be expanded to note that multiple lower and upper bounds were tested within each of these low and high frequency windows?

      This is a good catch; we adjusted the sentence as suggested. We now write: “.. slopes were fitted individually to each subject's power spectrum in several lower (0.25 – 20 Hz) and higher (10-145 Hz) frequency ranges.”
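      To illustrate what fitting slopes in separate lower and higher frequency ranges means in practice, the following is a minimal sketch and not the code used in the manuscript. It assumes a plain least-squares line fit in log-log space (rather than IRASA or FOOOF) and uses a synthetic spectrum with a hypothetical 15 Hz knee; the function name `fit_spectral_slope` and all values are illustrative.

```python
import math

def fit_spectral_slope(freqs, power, f_min, f_max):
    """Least-squares line fit in log-log space over [f_min, f_max].

    Returns (slope, intercept); a more negative slope means a steeper spectrum.
    """
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, power) if f_min <= f <= f_max]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic spectrum (assumption): flat below the knee, decaying above it.
freqs = [0.25 * i for i in range(1, 581)]          # 0.25 ... 145 Hz
knee_hz = 15.0                                     # hypothetical knee frequency
power = [1.0 / (1.0 + (f / knee_hz) ** 2) for f in freqs]

# The two ranges overlap around the knee, as in the ranges quoted above.
low_slope, _ = fit_spectral_slope(freqs, power, 0.25, 20.0)
high_slope, _ = fit_spectral_slope(freqs, power, 10.0, 145.0)
print(f"lower range slope: {low_slope:.2f}, higher range slope: {high_slope:.2f}")
```

      In this toy spectrum the fit in the higher range is steeper than in the lower range, which is why a single slope across the knee would describe neither range well.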

      The following two sentences seem to contradict each other: "Overall, spectral slopes in lower frequency ranges were more consistently related to heart rate variability indices(> 39.4% percent of all investigated indices)" and: "In the lower frequency range (0.25 - 20Hz), spectral slopes were consistently related to most measures of heart rate variability; i.e. significant effects were detected in all 4 datasets (see Figure 2D)." (39.4% is not "most").

      The reviewer is correct in stating that 39.4% is not most. However, the 39.4% is the lowest bound and only refers to 1 dataset. In the other 3 datasets the percentage of effects was above 64% which can be categorized as “most” i.e. above 50%. We agree that this was a bit ambiguous in the sentence so we added the other percentages as well as a reference to Figure 2D to make this point clearer.

      Figure 2D: it isn't clear what the percentages in the semi-circles reflect, nor why some semi-circles are more full circles while others are only quarter circles.

      The percentages in the semi-circles reflect the proportion of effects (marked in red) and null effects (marked in green) per dataset, averaged across the different measures of HRV. For some frequency ranges fewer effects were found, resulting in quarter circles instead of semi-circles.

      Page 8: I think the authors could make it more clear that one of the conditions they were testing was the ECG component of the EEG data (extracted by ICA then projected back into the scalp space for the temporal response function analysis).

      As suggested by the reviewer, we adjusted our wording and replaced the somewhat ambiguous “... projected back separately” with “... projected back into the sensor space”. We thank the reviewer for this recommendation, as it does indeed make the procedure easier to understand.

      “After pre-processing (see Methods) the data was split in three conditions using an ICA(22). Independent components that were correlated (at r > 0.4; see Methods: MEG/EEG Processing - pre-processing) with the ECG electrode were either not removed from the data (Figure 3ABCD - blue), removed from the data (Figure 2ABCD - orange) or projected back into the sensor space (Figure 3ABCD - green).”
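      The component selection step described in this passage can be sketched as follows. This is a simplified illustration under stated assumptions, not our analysis code: the components and the ECG trace are toy signals, the helper names are hypothetical, and we compare the absolute correlation against the threshold because the sign of an independent component is arbitrary.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def find_cardiac_components(components, ecg, threshold=0.4):
    """Indices of components whose |r| with the ECG trace exceeds threshold."""
    return [i for i, comp in enumerate(components)
            if abs(pearson_r(comp, ecg)) > threshold]

# Toy data (assumption): an R-peak-like spike train as the ECG trace.
n_samples = 500
ecg = [1.0 if t % 50 == 0 else 0.0 for t in range(n_samples)]
components = [
    [math.sin(0.3 * t) for t in range(n_samples)],                 # unrelated
    [-2.0 * ecg[t] + 0.1 * math.sin(0.3 * t) for t in range(n_samples)],  # cardiac, flipped sign
    [math.cos(0.3 * t) for t in range(n_samples)],                 # unrelated
]

cardiac_idx = find_cardiac_components(components, ecg, threshold=0.4)
print("components flagged as cardiac:", cardiac_idx)
```

      The flagged indices then define the three conditions: keep all components, drop the flagged ones, or project only the flagged ones back into the sensor space.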

      Figure 4A: standardized beta coefficients for the relationship between age and spectral slope could be noted to provide improved clarity (if I'm correct in assuming that is what they reflect).

      This was indeed shown in Figure 4A and noted in the color bar as “average beta (standardized)”. We do not specifically highlight this in the text, because the exact coefficients would depend on both the analyzed frequency range and the selected electrodes.

      Figure 4I: The regressions explained at this point seem to contain a very large number of potential predictors, as I'm assuming they include all sensors for both the ECG component and ECG rejected conditions? (if that is not the case, it could be explained in greater detail). I'm also not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including them back into the same regression model. It seems that there could be some circularity or redundancy in doing so. However, I'm not confident that this is an issue, so would appreciate the authors explaining why this is a valid approach (if that is the case).

      After observing significant effects both in the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands we wanted to understand whether or not these age-related changes are statistically independent. To test this we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models we performed were therefore actually not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over channels on which we noticed significant effects in the ECG rejected and ECG components data respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section stating that: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub>s and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”  

      We hope this explanation solves the previous misunderstanding.
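      For readers who prefer code to Wilkinson notation, a two-predictor model of the form age ~ 1 + ECG rejected + ECG components can be sketched as below. This is an illustrative ordinary least squares fit on synthetic numbers, not our data or analysis pipeline; the variable names and the coefficients used to generate the toy ages are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(predictors, outcome):
    """OLS with an intercept; predictors is a list of equal-length columns."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(outcome))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic per-subject slopes, averaged over channels (assumption).
slope_ecg_rejected = [-1.0 + 0.1 * ((i * 7) % 10) for i in range(20)]
slope_ecg_component = [-1.5 + 0.05 * ((i * 3) % 20) for i in range(20)]
# Toy ages built so that both predictors carry unique information.
age = [50.0 - 10.0 * r - 20.0 * c
       for r, c in zip(slope_ecg_rejected, slope_ecg_component)]

intercept, beta_rejected, beta_component = fit_ols(
    [slope_ecg_rejected, slope_ecg_component], age)
print(intercept, beta_rejected, beta_component)
```

      Because both predictors enter the same model, each coefficient reflects the association with age after accounting for the other predictor, which is the sense in which the two effects are tested for independence.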

      The explanation of results for relationships between spectral slopes and aging reported in Figure 4 refers to clusters of effects, but the statistical inference methods section doesn't explain how these clusters were determined.

      The wording of “cluster” was used to describe a “category” of effects e.g. null effects. We changed the wording from “cluster” to “category” to make this clearer stating now that: “This analysis, which is depicted in Figure 4, shows that over a broad amount of individual fitting ranges and sensors, aging resulted in a steepening of spectral slopes across conditions (see Figure 4E) with “steepening effects” observed in 25% of the processing options in MEG<sub>ECG not rejected</sub> , 0.5% in MEG<sub>ECG rejected</sub>, and 60% for MEG<sub>ECG components</sub>. The second largest category of effects were “null effects” in 13% of the options for MEG<sub>ECG not rejected</sub> , 30% in MEG<sub>ECG rejected</sub>, and 7% for MEG<sub>ECG components</sub>. ”

      Page 12: can the authors clarify whether these age related steepenings of the spectral slope in the MEG are when the data include the ECG contribution, or when the data exclude the ECG? (clarifying this seems critical to the message the authors are presenting).

      We apologize for not making this clearer. We now write: “This analysis also indicates that a vast majority of observed effects irrespective of condition (ECG components, ECG not rejected, ECG rejected) show a steepening of the spectral slope with age across sensors and frequency ranges.”

      Page 13: I think it would be useful to describe how much variance was explained by the MEG-ECG rejected vs MEG-ECG component conditions for a range of these analyses, so the reader also has an understanding of how much aperiodic neural activity might be influenced by age (vs if the effects are really driven mostly by changes in the ECG).

      With regard to the explained variance, I think that the very important question of how strongly age influences changes in aperiodic activity is a topic better suited for a meta-analysis, as effect sizes seem to vary largely depending on the sample. For EEG, for example, results in the literature were reported at r=-0.08 (Cesnaite et al. 2023), r=-0.26 (Cellier et al. 2021), r=-0.24/r=-0.28/r=-0.35 (Hill et al. 2022) and r=0.5/r=0.7 (Voytek et al. 2015). I would refer the reader/reviewer to the standardized beta coefficients depicted in Figure 4A as a measure of effect size in the current study.

      Cellier, D., Riddle, J., Petersen, I., & Hwang, K. (2021). The development of theta and alpha neural oscillations from ages 3 to 24 years. Developmental cognitive neuroscience, 50, 100969.

      Cesnaite, E., Steinfath, P., Idaji, M. J., Stephani, T., Kumral, D., Haufe, S., ... & Nikulin, V. V. (2023). Alterations in rhythmic and non‐rhythmic resting‐state EEG activity and their link to cognition in older age. NeuroImage, 268, 119810.

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076.

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38), 13257-13265.

      Also, if there are specific M/EEG sensors where the 1/f activity does relate strongly to age, it would be worth noting these, so future research could explore those sensors in more detail.

      I think it is difficult to make a clear claim about this for MEG data, as the exact location or type of the sensor may differ across manufacturers. Such a statement could more easily be made for source-projected data, or if EEG electrodes were available, where the locations would be standardized, e.g. according to the 10-20 system.

      DISCUSSION:

      Page 15: Please change the wording of the following sentence, as the way it is currently worded seems to suggest that the authors of the current manuscript have demonstrated this point (which I think is not the case): "The authors demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods."

      Apologies for the oversight! The reviewer is correct: this was shown not by us but by the authors of the cited manuscript. We corrected the sentence as suggested, stating now that:

      “Bénar et al. demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods.”

      Page 16: The authors mention the results can be sensitive to the application of SSS to clean the MEG data, but not ICA. I think it would be sensitive to the application of either SSS or ICA?

      This is correct and actually also supported by Figure S7, as differences in ICA thresholds affect also the detection of age-related effects. We therefore adjusted the related sentences stating now that:

      “ In case of the MEG signal this may include the application of Signal-Space-Separation algorithms (SSS(24,55)), different thresholds for ICA component detection (see Figure S7), high and low pass filtering, choices during spectral density estimation (window length/type etc.), different parametrization algorithms (e.g. IRASA vs FOOOF) and selection of frequency ranges for the aperiodic slope estimation.”

      It would be worth clarifying that the linked mastoid re-reference alone has been proposed to cancel out the ECG signal, rather than that a linked-mastoid re-reference improves the performance of the ICA separation (which could be inferred by the explanation as it's currently written).

      This is correct, and we adjusted the sentence accordingly, stating now that:

      “Previous work(12,56) has shown that a linked mastoid reference alone was particularly effective in reducing the impact of ECG related activity on aperiodic activity measured using EEG.”

      The issue of the number of EEG channels could probably just be noted as a potential limitation, as could the issue of neural activity being mixed into the ECG component (although this does pose a potential confound to the M/EEG without ECG condition, I suspect it wouldn't be critical).

      This is indeed a very fair point as a higher amount of electrodes would probably make it easier to better isolate ECG components in the EEG, which may be the reason why the separation did not work so well in our case. However, this is ultimately an empirical question so we highlighted it in the discussion section stating that: “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources. ”

      OUTLOOK:

      Page 19: Although there has been a recent trend to control for 1/f activity when examining oscillatory power, recent research suggests that this should only be implemented in specific circumstances, otherwise the correction causes more of a confound than the issue does. It might be worth considering this point with regards to the final recommendation in the Outlook section: Brake, N., Duc, F., Rokos, A., Arseneau, F., Shahiri, S., Khadra, A., & Plourde, G. (2024). A neurophysiological basis for aperiodic EEG and the background spectral trend. Nature Communications, 15(1), 1514.

      We want to thank the reviewer for recommending this very interesting paper! The authors of said paper present compelling evidence showing that, while peak detection above an aperiodic trend using methods like FOOOF or IRASA is a prerequisite to determine the presence of oscillatory activity, it’s not necessarily straightforward to determine which detrending approach should be applied to determine the actual power of an oscillation. Furthermore, the authors suggest that wrongfully detrending may cause larger errors than not detrending at all. We therefore added a sentence stating that: “However, whether or not periodic activity (after detection) should be detrended using approaches like FOOOF or IRASA still remains disputed, as incorrectly detrending the data may cause larger errors than not detrending at all(75).”

      RECOMMENDATIONS:

      Page 20: "measure and account for" seems like it's missing a word, can this be re-written so the meaning is more clear?

      Done as suggested. The sentence now states: “To better disentangle physiological and neural sources of aperiodic activity, we propose the following steps to (1) measure and (2) account for physiological influences.”

      I would re-phrase "doing an ICA" to "reducing cardiac artifacts using ICA" (this wording could be changed in other places also).

      I do not like to describe cardiac or ocular activity as artifactual per se. This is also why I used quotation marks whenever I mention the word “artifact” in association with the ECG or EOG. However, I do understand that the wording “doing an ICA” is a bit sloppy. We therefore reworded it throughout the manuscript to e.g. “separating cardiac from neural sources using an ICA” and “separating physiological from neural sources using an ICA”.

      I would additionally note that even if components are identified as unambiguously cardiac, it is still likely that neural activity is mixed in, and so either subtracting or leaving the component will both be an issue (https://doi.org/10.1101/2024.06.06.597688). As such, even perfect identification of whether components are cardiac or not would still mean the issue remains (and this issue is also consistent across a considerable range of component based methods). Furthermore, current methods including wavelet transforms on the ICA component still do not provide good separation of the artifact and neural activity.

      This is definitely a fair point and we also highlight this in our recommendations under 3 stating that:

      “However, separating physiological from neural sources using an ICA is no guarantee that peripheral physiological activity is fully removed from the cortical signal. Even more sophisticated ICA based methods that e.g. apply wavelet transforms on the ICA components may still not provide a good separation of peripheral physiological and neural activity(76,77). This turns the process of deciding whether or not an ICA component is e.g. either reflective of cardiac or neural activity into a challenging problem. For instance, when we only extract cardiac components using relatively high detection thresholds (e.g. r > 0.8), we might end up misclassifying residual cardiac activity as neural. In turn, we can’t always be sure that using lower thresholds won’t result in misinterpreting parts of the neural effects as cardiac. Both ways of analyzing the data can potentially result in misconceptions.”

      Castellanos, N. P., & Makarov, V. A. (2006). Recovering EEG brain signals: Artifact suppression with wavelet enhanced independent component analysis. Journal of neuroscience methods, 158(2), 300-312.

      Bailey, N. W., Hill, A. T., Godfrey, K., Perera, M. P. N., Rogasch, N. C., Fitzgibbon, B. M., & Fitzgerald, P. B. (2024). EEG is better when cleaning effectively targets artifacts. bioRxiv, 2024-06.

      METHODS:

      Pre-processing, page 24: I assume the symmetric setting of fastica was used (rather than the deflation setting), but this should be specified.

      Indeed, the reviewer is correct: we used the default setting of FastICA as implemented in MNE-Python, which calls the FastICA implementation in scikit-learn; this uses the “parallel” (symmetric) algorithm by default. We added this information to the text accordingly, stating that:

      “For extracting physiological “artifacts” from the data, 50 independent components were calculated using the fastica algorithm(22) (implemented in MNE-Python version 1.2; with the parallel/symmetric setting; note: 50 components were selected for MEG for computational reasons for the analysis of EEG data no threshold was applied).”

      Temporal response functions, page 26: can the authors please clarify whether the TRF is computed against the ECG signal for each electrode or sensor independently, or if all electrodes/sensors are included in the analysis concurrently? I'm assuming it was computed for each electrode and sensor separately, since the TRF was computed in both the forward and backward directions (perhaps the meaning of forward and backward could also be explained in more detail, i.e. using the ECG to predict the EEG signal, or using the EEG signal to predict the ECG signal?).

      A TRF can also be conceptualized as a multiple regression model over time lags. This means that we used all channels to compute the forward and backward models. In the case of the forward model, we predicted the signal of the M/EEG channels in a multivariate regression model using the ECG electrode as the predictor. In the case of the backward model, we predicted the ECG electrode based on the signal of all M/EEG channels. The forward model was used to depict the time window in which the ECG signal was encoded in the M/EEG recording; this appears at a time lag of 0, indicating volume conduction. The backward model was used to see how much information about the ECG was decodable when taking the information of all channels into account.

      We tried to further clarify this approach in the methods section stating that:

      “We calculated the same model in the forward direction (encoding model; i.e. predicting M/EEG data in a multivariate model from the ECG signal) and backward direction (decoding model; i.e. predicting the ECG signal using all M/EEG channels as predictors).”
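
      The forward/backward distinction described above can be illustrated with a small toy example. This is only a sketch on simulated data, not the authors' pipeline: the helper names (`lagged_design`, `fit_ridge`), the ridge penalty, the lag range, and the instantaneous mixing of a simulated "ECG" into eight channels are all assumptions made for illustration.

```python
import numpy as np

def lagged_design(x, lags):
    """Stack time-shifted copies of x (n_times, n_features) column-wise.

    For a positive lag, row t holds x[t - lag]; edges are zero-padded.
    """
    n_times = x.shape[0]
    cols = []
    for lag in lags:
        shifted = np.zeros_like(x)
        if lag > 0:
            shifted[lag:] = x[:n_times - lag]
        elif lag < 0:
            shifted[:lag] = x[-lag:]
        else:
            shifted[:] = x
        cols.append(shifted)
    return np.hstack(cols)

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression weights mapping X to Y."""
    XtX = X.T @ X
    return np.linalg.solve(XtX + alpha * np.eye(XtX.shape[0]), X.T @ Y)

# Simulated data (assumption): one ECG trace leaking instantaneously
# into 8 M/EEG channels, plus independent sensor noise.
rng = np.random.default_rng(0)
n_times, n_channels = 2000, 8
ecg = rng.standard_normal((n_times, 1))
mixing = rng.standard_normal((1, n_channels))
meeg = ecg @ mixing + 0.5 * rng.standard_normal((n_times, n_channels))

lags = list(range(-5, 6))

# Forward (encoding) model: ECG at several lags -> all M/EEG channels.
W_fwd = fit_ridge(lagged_design(ecg, lags), meeg)  # (n_lags, n_channels)
peak_lag = lags[int(np.argmax(np.abs(W_fwd).sum(axis=1)))]

# Backward (decoding) model: all M/EEG channels at several lags -> ECG.
X_bwd = lagged_design(meeg, lags)
W_bwd = fit_ridge(X_bwd, ecg)
ecg_hat = X_bwd @ W_bwd
# In-sample correlation only; a real analysis would cross-validate.
decoding_r = np.corrcoef(ecg_hat[:, 0], ecg[:, 0])[0, 1]
```

      In this toy setting the forward weights concentrate at lag 0 because the simulated leakage is instantaneous, which mirrors the volume-conduction interpretation given in the response; the backward model pools all channels to reconstruct the single ECG trace.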

      Page 27: the ECG data was fit using a knee, but it seems the EEG and MEG data was not.

      Does this difference pose any potential confound to the conclusions drawn? (Having said this, Figure S4 suggests perhaps a knee was tested in the M/EEG data, which should perhaps be explained in the text also.)

      This was indeed tested in a previous review round to ensure that our results are not dependent on the presence/absence of a knee in the data. We therefore added figure S4, but forgot to actually add a description in the text. We are sorry for this oversight and added a paragraph to S1 accordingly:

      “Using FOOOF(5), we also investigated the impact of different slope fitting options (fixed vs. knee model fits) on the aperiodic age relationship (see Supplementary Figure S4). The results that we obtained from these analyses using FOOOF offer converging evidence with our main analysis using IRASA.”
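
      For readers unfamiliar with the two fitting options, the following sketch writes out the aperiodic component in its fixed and knee forms, following the parameterization given in the FOOOF documentation (log10 power as a function of frequency). The function names and all parameter values are illustrative assumptions, not values taken from the manuscript.

```python
import numpy as np

def aperiodic_fixed(freqs, offset, exponent):
    """Fixed mode: log10 power falls linearly with log10 frequency."""
    return offset - exponent * np.log10(freqs)

def aperiodic_knee(freqs, offset, knee, exponent):
    """Knee mode: the knee term flattens the spectrum at low frequencies."""
    return offset - np.log10(knee + freqs ** exponent)

# Arbitrary illustrative values (assumptions).
freqs = np.linspace(0.5, 145, 500)
fixed = aperiodic_fixed(freqs, offset=1.0, exponent=1.5)
knee_zero = aperiodic_knee(freqs, offset=1.0, knee=0.0, exponent=1.5)
knee_ten = aperiodic_knee(freqs, offset=1.0, knee=10.0, exponent=1.5)
```

      With the knee parameter set to zero, the knee model reduces to the fixed model; with a non-zero knee, the two diverge mainly at low frequencies and converge again at high frequencies. This is why the presence or absence of a knee can matter for slope estimates over broad fitting ranges.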

      Page 32: my understanding of the result reported here is that cleaning with ICA provided better sensitivity to the effects of age on 1/f activity than cleaning with SSS. Is this accurate? I think this could also be reported in the main manuscript, as it will be useful to researchers considering how to clean their M/EEG data prior to analyzing 1/f activity.

      The reviewer is correct in stating that we overall detected slightly more “significant” effects when not additionally cleaning the data using SSS. However, I am a bit wary of recommending omitting the use of SSS maxfilter solely based on this information. It can very well be that the higher quantity of effects (when not employing SSS maxfilter) stems from other physiological sources (e.g. muscle activity) that are correlated with age and removed when applying SSS maxfiltering. I think that basing the decision of whether or not to apply maxfilter solely on the number or size of effects may not be the best idea. Instead, I think that the applicability of maxfilter for research questions related to aperiodic activity should be the topic of additional methodological research. We therefore now write in Text S1:

      “Considering that we detected fewer and weaker aperiodic effects when using SSS maxfilter, is it advisable to omit maxfilter when analyzing aperiodic signals? We don’t think that we can make such a judgment based on our current results. This is because it's unclear whether the reduction of effects stems from an additional removal of peripheral information (e.g. muscle activity, which may be correlated with aging) or is induced by the SSS maxfiltering procedure itself. As the use of maxfilter in detecting changes of aperiodic activity has not, to our knowledge, been the subject of analysis, we suggest that this should be the topic of additional methodological research.”

      Page 39, Figure S6 and Figure S8: Perhaps the caption could also briefly explain the difference between maxfilter set to false vs true? I might have missed it, but I didn't gain an understanding of what varying maxfilter would mean.

      Figure S6 shows the effect of ageing on the spectral slope averaged across all channels. The maxfilter set to false in AB) means that no maxfiltering using SSS was performed vs. in CD) where the data was additionally processed using the SSS maxfilter algorithm. We now describe this more clearly by writing in the caption:

      “Supplementary Figure S6: Age-related changes in aperiodic brain activity are most prominently explained by cardiac components irrespective of maxfiltering the data using signal space separation (SSS) or not. AC) Age was used to predict the spectral slope (fitted at 0.1-145 Hz) averaged across sensors at rest in three different conditions (ECG components not rejected [blue], ECG components rejected [orange], ECG components only [green]).”

    1. eLife Assessment

      This comprehensive study presents important findings that delineate how specific dopaminergic neurons (DANs) instruct aversive learning in Drosophila larvae exposed to high salt through an integration of behavioral experiments, imaging, and connectomic analysis. The work reveals how a numerically minimal circuit achieves remarkable functional complexity, with redundancies and synergies within the DL1 cluster that challenge our understanding of how few neurons generate learning behaviors. By establishing a framework for sensory-driven learning pathways, the study makes a compelling and substantial contribution to understanding associative conditioning while demonstrating conservation of learning mechanisms across Drosophila developmental stages.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines for individual neurons, the authors show that activation of the DAN-g1 is sufficient to mimic an aversive memory and it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron has only a partial phenotype. The authors use calcium imaging to show that the DAN-g1 is not the only DAN responding to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role for the associative memory. DAN-f1, which does not respond to salt, is able to lead to the formation of a memory (if optogenetically activated), but it is not necessary for the salt-odor memory formation in normal conditions. However, when silenced together with DAN-g1, it enhances the memory deficit of DAN-g1. Overall, this work brings evidence of a complex interaction between DL1 DANs in both the encoding of salt signals and their teaching role in associative learning, with none of them being individually necessary and sufficient for both functions.

      Strengths:

      Overall, the manuscript contributes interesting results that are useful to understand the organization and function of the dopaminergic system. The behavioral role of the specific DANs is assessed using specific driver lines, which allow their function to be tested individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association to it. Proper genetic controls are carried across the manuscript.

      Weaknesses:

      The authors use two different approaches to silence dopaminergic neurons: optogenetics and induction of apoptosis. The results are not always consistent, but the authors discuss these differences appropriately. In general, the optogenetic approach is more appropriate as developmental compensations are not of major interest for the question investigated.

      The physiological data would suggest the role of a certain subset of DANs in salt-odor association, but a different partially overlapping set is necessary in behavioral assays (with a partial phenotype). No manipulation completely abolishes the salt-odor association, leaving important open questions on the identity of the neural circuits involved in this behavior.

      The EM data analysis reveals a non-trivial organization of sensory inputs into DANs, but it is difficult to extrapolate a link to the functional data presented in the paper.

    3. Reviewer #2 (Public review):

      Summary:

      In this work the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act in a partially redundant manner, and that single cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, which was specific to salt and did not extend to the other aversive teaching stimuli tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs this represents a very comprehensive study linking the structural, functional and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows the cellular substrates and pathways of aversive learning to be defined down to the single cell level in a system without much redundancy. It therefore sets the basis to conduct even more sophisticated experiments and together with the neat connectome analysis opens the possibility to unravel different sensory processing pathways within the DL1 cluster and integration with the higher order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and balanced, putting their data in the appropriate context. The authors also implemented neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      Previous comments were fully addressed by the authors.

    4. Reviewer #3 (Public review):

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. But the authors go beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, are necessary for salt learning when either being ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for two of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines, they show that activation of the DAN-g1 is sufficient to mimic an aversive memory and it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron only partially affects the performance index. The authors use calcium imaging to show that the DAN-g1 is not the only one that responds to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role in the assays tested. DAN-f1, which does not respond to salt, is able to lead to the formation of memory (if optogenetically activated), but it is not necessary for the salt-odor memory formation in normal conditions. However, silencing of DAN-f1 together with DAN-g1, enhances the memory deficit of DAN-g1.

      Strengths:

      The paper therefore reveals that also in the Drosophila larva as in the adult, rewards and punishments are processed by exclusive sets of DANs and that a complex interaction between a subset of DANs mediates salt-odor association.

      Overall, the manuscript contributes valuable results that are useful for understanding the organization and function of the dopaminergic system. The behavioral role of the specific DANs is assessed using specific driver lines which allow for testing of their function individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association with it. Proper genetic controls are carried across the manuscript.

      Weaknesses:

      The authors use two different approaches to silence dopaminergic neurons: optogenetics and induction of apoptosis. The results are not always consistent, and the authors could improve the presentation and interpretation of the data. Specifically, optogenetics seems a better approach than apoptosis, which can affect the overall development of the system, but apoptosis experiments are used to set the grounds of the paper.

      The physiological data would suggest the role of a certain subset of DANs in salt-odor association, but a different partially overlapping set seems to be necessary. This should be better discussed and integrated into the authors' conclusion. The EM data analysis reveals a non-trivial organization of sensory inputs into DANs and it is hard to extrapolate a link to the functional data presented in the paper.

      We would like to thank reviewer 1 for the positive evaluation of our work and for the critical suggestions for improvement. In the new version of the manuscript, we have centralized the optogenetic results and moved some of the ablation experiments to the Supplement. We also discuss in detail the experimental differences in the results. In addition, we have softened our interpretation of the specificity of memory for salt. As a result, we now emphasize more the general role of DANs for aversive learning in the larva. These changes are now also summarized and explained more simply and clearly in the Discussion, along with a revised discussion of the EM data.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act redundantly, and that single-cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, which was specific to salt but no other aversive teaching stimuli were tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs, this represents a very comprehensive study linking the structural, functional, and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise, and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows us to define the cellular substrates and pathways of aversive learning down to the single-cell level in a system without much redundancy. It therefore sets the basis to conduct even more sophisticated experiments and together with the neat connectome analysis opens the possibility of unraveling different sensory processing pathways within the DL1 cluster and integration with the higher-order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and clearly discussed in the appropriate context. The authors also implement neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      While there is certainly room for further analysis in the future, the study is very complete as it stands. Suggestions for clarification are minor in nature.

      We would like to thank reviewer 2 for the positive evaluation of our work. In fact, follow-up work is already underway to further analyze the role of the individual DL1 DANs. We have addressed the constructive and detailed suggestions for improvement in our point-by-point responses in the “Recommendations for the authors” section.

      Reviewer #3 (Public Review):

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. However, the authors go far beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine, and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, are necessary for salt learning when either being ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for three of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters, and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

      Weaknesses:

      (1) The authors repeatedly claim that they found/proved salt-specific memories. I think this is problematic to some extent.

      (1a) With respect to the necessity of the DL-1 neurons for aversive memories, the authors' notion of salt-specificity relies on a significant reduction in salt memory after ablating DAN-f1 and g1, and the lack of such a reduction in quinine memory. However, Fig. 5K shows a quite suspicious trend of an impaired quinine memory which might have been significant with a higher sample size. I therefore think it is not fully clear yet whether DAN-f1 and DAN-g1 are really specifically necessary for salt learning, and the conclusions should be phrased carefully.

      (1b) With respect to the results of the optogenetic activation of DL-1 neurons, the authors conclude that specific salt memories were established because the aversive memories were observed in the presence of salt. However, this does not prove that the established memory is specific to salt - it could be an unspecific aversive memory that potentially could be observed in the presence of any other aversive stimuli. In the case of DAN-f1, the authors show that the neuron does not even get activated by salt, but is inhibited by sugar. Why should activation of such a neuron establish a specific salt memory? At the current state, the authors clearly showed that optogenetic activation of the neurons does induce aversive memories - the "content" of those memories, however, remains unknown.

      (2) In many figures (e.g. figures 4, 5, 6, supplementary figures S2, S3, S5), the same behavioural data of the effector control is plotted in several sub-figures. Were these experiments done in parallel? If not, the data should not be presented together with results not gathered in parallel. If yes, this should be clearly stated in the figure legends.

      We would also like to thank reviewer 3 for his positive assessment of our work. As already mentioned by reviewer 1, we understand the criticism that the salt specificity for which the individual DANs are coded is not always fully supported by the results of the work. We have therefore rewritten the relevant passages, which are also cited by the reviewer. We have also taken up the second point of criticism and incorporated it into our manuscript. As the control groups were always measured in parallel with the experimental animals, we can also present the data together in a sub-figure. We now state this clearly in the revised figure legends.

      Summary of recommendations to authors:

      Overall, the study is commendable for its systematic approach and solid methodology. Several weaknesses were identified, prompting the need for careful revisions of the manuscript:

      We thank the reviewers for the careful revision of our manuscript. In the subsequent sections, we aim to address their concerns as thoroughly as possible. A comprehensive one-to-one listing can be found below.

      (1) The authors should reconsider their assertion of uncovering a salt-specific memory, as the evidence does not conclusively demonstrate the exclusive necessity of DAN-f1 and DAN-g1 for salt learning. In particular, the optogenetic activation of DAN-f1 leads to plasticity but this might not be salt-specific. The precise nature of the memory content remains elusive, warranting a nuanced rephrasing of the conclusions.

      We only partially agree. It is true that optogenetic activation of DANs does not, by itself, allow us to comment on salt specificity. However, we used high salt concentrations during the test. Over the years, the Gerber lab has demonstrated in several papers that larvae recall an aversive odor-salt memory only if salt is present during the test (Gerber and Hendel, 2006; Niewalda et al. 2008; Schleyer et al. 2011; Schleyer et al. 2015): the US that was used has to be present during the test. Even at the same concentration, other aversive stimuli (e.g. bitter quinine) do not allow the larvae to recall this particular type of memory. So, if the optogenetic activation of DAN-f1 establishes a memory that can be recalled on salt, we argue that it has to encode aspects of the salt information. On the other hand, we see the necessity for salt learning only for DAN-g1. And although it is very unlikely (based on the current literature), we cannot fully exclude that the activation of DAN-f1 establishes a yet unknown type of memory that can also be recalled on a salt plate. Therefore, we partially agree and have accordingly rephrased the entire manuscript to avoid an over-interpretation of our data. Throughout the manuscript, we now avoid the term salt-specific memory and instead describe the type of memory as aversive memory.

      (2) A thorough examination or discussion about the potential influence of blue light aversion on behavioral observations is necessary to ensure a balanced interpretation of the findings.

      To address this point, every behavioral experiment that uses optogenetic blue light activation is run with the appropriate and mandatory controls. For blue light activation experiments, two genetic controls are used that either receive the same blue light treatment (effector control, w1118>UAS-ChR2XXL) or no blue light treatment (dark control, XY-split-Gal4>UAS-ChR2XXL). For blue light inactivation experiments, one group is added that has exactly the same genotype but did not receive food containing retinal. These experiments show that blue light exposure itself induces neither an aversive nor a positive memory and does not impair the establishment of odor-high salt memory. In addition, we used the most recently established transgenes available. ChR2XXL is very sensitive to blue light: only 220 lux (60 µW/cm²) were necessary to obtain stable results. In our hands, short-term exposure for up to 5 minutes at such low intensities does not induce blue light aversion. Following the advice of the reviewer, we also address this concern by adding several sentences to the related results and methods sections.

      (3) The authors should address the limitations associated with the use of rpr/hid for neuronal ablations, such as the effects of potential developmental compensation.

      We agree with this concern. It is well possible that the ablation experiments induce compensatory effects during larval development. Such an effect may be the reason for differences in phenotypes when comparing hid,rpr ablation with optogenetic inhibition. This is now part of the discussion. In addition, we evaluated if the ablation worked in our experiments. So far controls were missing that show that the expression of hid,rpr really leads to the ablation of DANs. We now added these experiments and clearly show anatomically that the DANs are ablated (related to figure 4-figure supplement 6).

      (4) While the connectome analysis offers valuable insights into the observed functions of specific DANs in relation to their extrinsic (sensory) and intrinsic (state) inputs, integrating this data more cohesively within the manuscript through careful rewriting would enhance the coherence of the study.

      We understand this concern. Therefore, the new version of our manuscript integrates the EM data more thoroughly into our interpretation of the results. We have rewritten the related parts throughout the manuscript and have completely revised the corresponding section in the results chapter.

      (5) More generally, the authors are encouraged to discuss internal discrepancies in the results of their functional manipulation experiments.

      Thank you for this suggestion. We recognize that we did not give the differing results enough space in the discussion. We have now changed this and address the concern comprehensively.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions for clarification and improvement of the manuscript:

      (1) The authors should discuss why the silencing experiment with TH-GAL4 (Fig. 1) does not abolish memory formation (I assume that the PI should go to zero). Does it mean that other non-TH neurons are involved in salt-odor memory formation? Are there other lines that completely abolish this type of learning?

      Thank you very much for highlighting this crucial point. Indeed, the functional intervention does not completely eliminate the memory. There could be several reasons, or a combination thereof, for this outcome. For instance, it's plausible that the UAS-GtACR2 effector doesn't entirely suppress the activity of dopaminergic neurons. Additionally, the memory may comprise different types, not all of which are linked to dopamine function. It's also noteworthy that TH-Gal4 doesn't encompass all dopaminergic neurons – even a neuron from the DL1 cluster is absent (as previously reported in Selcho et al., 2009). Considering we're utilizing high salt concentrations in this experiment, it's conceivable that non gustatory-driven memories are formed based solely on the systemic effects of salt (e.g., increased osmotic pressure). These possibilities are now acknowledged in the text.

      (2) The Rpr experiments in Fig. 4 do not lead to any phenotype and there is a general assumption that the system compensates during development. However, there is no demonstration that Rpr worked or that development compensated for that. What do we learn from these data? Would it make sense to move it to supplement to make the story more compact? In addition: the conclusion at L 236 "DL1.... Are not individually necessary" is later disproved by optogenetic silencing. Similarly, optogenetic silencing of f1+g1 is affecting 1X and 3X learning, but not when using Rpr. Moreover, Rpr did not give any phenotype in other data in the supplementary material. I'm not sure how valid these results are.

      We acknowledge this concern and have actively deliberated various options for restructuring the presented ablation data. Ultimately, we reached a consensus that relocating Figure 4 to the supplement is warranted. Furthermore, corresponding adjustments have been made in the text. This decision amplifies the significance of the optogenetic results. In addition, we also addressed the other part of the concern. We examined the efficacy of hid and rpr in our experiments. Indeed, we successfully ablated specific DANs, as illustrated in the new anatomical data presented in Figure 4- figure supplement 6, which strengthens the interpretation of the hid,rpr experiments.

      (3) In most figures that show data for 1X and 3X training, there is no difference between these two conditions (I would suggest moving one set as a supplement). When a difference appears (Fig.5A-D) the implications are not discussed properly. Is it known that some circuits are necessary for the 1X but not for the 3X protocol? Is that a reasonable finding? I would expect the opposite, but I might lack knowledge here. However, the optogenetic silencing of the same neurons in Figure 7 shows the same phenotype for 1X and 3X. Again, the validity of the Rpr experiments seems debatable.

      Different training protocols lead to different memory phases (STM and STM+ARM). We have shown that in the past in Widmann et al. 2016. Therefore, we are convinced that it makes sense to keep both data sets in the main manuscript. However, we agree that this was not properly introduced and discussed and therefore made the respective changes in the manuscript.

      (4) In Figure 3, it is unclear what the responses were tested against. Since they are so small and noisy there would be a need for a control. Moreover, in some cases, it looks like the DF/F is normalized to the wrong value: e.g. in DAN-c1 100mM, the activity in 0-10s is always above zero, and in pPAM with fructose is always below zero. This might not have any consequence on the results but should be adjusted.

      Thank you very much for your criticism, which we greatly appreciate. We have carefully re-examined the data and found a mistake in the normalization of the values. We made the necessary adjustments to the evaluation, as per your suggestions. The updated figures, figure legends, and results have been incorporated into the new version of the manuscript. As noted by the reviewer, these corrections have not altered the interpretation of the data or the primary responses of the various DANs.

      (5) In the abstract: "Optogenetic activation of DAN-f1 and DAN-g1 alone suffices to substitute for salt punishment... Each DAN encodes a different aspect of salt punishment". These sentences might be misleading and an overstatement: only DAN-g1 shows a clear role, while the function of the other DANs in the context of salt-odor learning remains obscure.

      We have reworded the respective part of the abstract accordingly, aiming to avoid any exaggeration.

      (6) The physiology is done in L1 larvae but behavior is tested in L3 larvae. There could be a change in this time that could explain the salt responses in c1 and d1 but no role in salt-odor learning?

      While we cannot dismiss the possibility of a developmental change from L1 to L3, a comparison of the anatomical data of the DL1 DANs from electron microscopy (EM) and light microscopy (LM) data indicates that their overall morphology remains consistent. However, it's important to note that this observation does not analyse the physiological aspects of these cells. Consequently, we have incorporated this concern into the discussion of the revised version of the manuscript.

      (7) The introduction needs some editing starting at L 129, as it ends with a discussion of a previously published EM data analysis. I would rather suggest stating which questions are addressed in this paper and which methods will be used and perhaps a hint on the results obtained.

      We understand the concern. We have added a concise paragraph to the conclusion of the introduction, highlighting the biological question, technical details, and a short hint on the acquired findings.

      (8) It is clear to me that the presentation of salt during the test is necessary for recall, however in L 166 I don't understand the explanation: how is the memory used in a beneficial way in the test? The salt is present everywhere and the odor cue is actually useless to escape it.

      Extensive research, exemplified by studies such as Schleyer et al. (2015) published in eLife, clearly demonstrates that the recall of odor-high salt memory occurs exclusively when tested on a high salt plate. Even when tested on a bitter quinine plate, the aversive memory is not recalled. This phenomenon is attributed to the triggering of motivation to recall the memory by the omnipresent abundance of the unconditioned stimulus (US) during the test, which in our case is high salt. Furthermore, the concentration of the stimulus plays a crucial role (Schleyer et al. 2011). The odor cue indicates where the situation could potentially be improved; however, if high salt is absent, this motivational drive diminishes as there is no memory present to enhance the already favorable situation. Additionally, the motivation to evade the omnipresent and unpleasant high salt stimulus persists throughout the entire 5-minute test period.

      (9) L288: the fact that f1 shows a phenotype in this experiment does not mean that it encodes a salt signal, indeed it does not respond to salt. It perhaps induces a plasticity that can be recalled by salt, but not necessarily linked to salt. The synergy between f1 and g1 in the salt assay was postulated based on exp with Rpr, but the validity of these experiments is dubious. I'm not sure there is sufficient evidence from Figures 6 and 7 to support a synergistic action between f1 and g1.

      It is true that DAN-f1 alone is not necessary for mediating a high salt teaching signal based on ablation, optogenetic inhibition and even physiology. However, optogenetic activation alone yields a memory when tested on a salt plate. Given the logic explained above, which is accepted in several publications, we would like to keep the statement, especially as joint optogenetic activation or inactivation with DAN-g1 gives rise to significantly higher or lower values (Figure 5E and F, Figure 6E and F in the new version). Nevertheless, we have modified the sentence. In the text we now describe these effects as “these results may suggest that DAN-f1 and DAN-g1 encode aspects of the natural aversive high salt teaching signal under the conditions that we tested”. We think that this is an appropriate and three-fold restricted statement. Therefore, we would like to keep it in this restricted version. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (10) I find the EM analysis hard to read. First of all, because of the two different graphical representations used in Fig. 8, wouldn't one be sufficient to make the point? Secondly, I could not grasp a take-home-message: what do we learn from the EM data? Do they explain any of the results? It seems to me that they don't provide an explanation of why some DL1 neurons respond to salt and others don't.

      We understand that the EM analysis is hard to read and have now carefully rewritten this part of the manuscript. See also general concern 4 above. The main take-home message is not to explain why some DL1 neurons respond to salt and others do not. This cannot be resolved because information on the salt-perceiving receptor cells is missing. Unfortunately, the EM volume lacks the peripheral nervous system, the first layer of salt information processing. However, our analysis clearly shows that the four DANs each have their own identity based on their connectivity. None of them is the same, but to a certain extent similarities exist. This nicely reflects the physiological and behavioral results. We have now clarified this in the results to ease understanding for the readership. In addition, we clearly state that we don’t address why some DL1 neurons respond to salt and others don’t.

      (11) Do the manipulations (activation and silencing) affect odor preference in the presence of salt? Did the authors test that the two odors do not drive different behaviors on the salty plate? Or did they only test the odor preference on plain agarose? Can we exclude a role for the DAN in driving multisensory-driven innate behavior?

      Innate odor preferences are not changed by the presence of salt or even other tastants (this work, but see also Schleyer et al. 2015, Figure 3, eLife). Even the naïve choice between two odors is the same when tested in the presence of different tastants (Schleyer et al. 2015, Figure 3, eLife). This shows, at least for the tested stimuli and conditions, which are similar to the ones we use, that there is no multisensory-driven innate odor-taste behavior. Consequently, to our knowledge, experiments such as the ones suggested by the reviewer have never been done in larval odor-taste learning studies. We therefore suggest that DAN activation has no effect on innate larval behavior. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (12) L 280: the authors generalize the conclusion to all DL1-DANs, but it does not apply to c1 and d1.

      Thanks for this comment. We deleted that sentence as suggested and thus no longer generalize the conclusion to all DL1 DANs.

      (13) L345: I do not see the described differences in Fig. 8F, presynaptic sites of both types seem to appear in rather broad regions: could the author try to clarify this?

      We understand that the anatomical description of the data is often hard to read, especially for readers who are not used to these kinds of figures. We have therefore modified the text to ease understanding and to clarify the difference in the labeled brain regions for a broad readership.

      (14) L373: the conclusion on c1 is unsupported by data: this neuron responds to both salt and fructose (Figure 3 ) while the conclusion is purely based on EM data analysis.

      The sentence is not a conclusion but a speculation and we also list the cell's response to positive and negative gustatory stimuli. Therefore, we do not understand exactly what the reviewer means here. However, we have tried to address the criticism and have revised the sentences.

      (15) L385: the data on d1 seem to be inconsistent with Eschbach 2020, but the authors do not discuss if this is due to the differential vs absolute training, or perhaps the presence of the US during the test (which does not seem to be there in Eschbach, 2020) - is the training protocol really responsible for this inconsistency? For f1 the data seem to be consistent across these studies. The authors should clarify how the exp in Fig 6 differs from Eschbach, 2020 and how one could interpret the differences.

      This concern is correct. We now discuss the difference in more detail. Eschbach et al. used CsChrimson as a genetic tool, a one-odor paradigm with 3 training cycles, and no gustatory cues in their approach. These differences are now discussed in the new version of the manuscript.

      (16) L460-475 A long part of this paragraph discusses the similarities between c1 and d1 and corresponding PPL1 neurons in the adult fly. However, c1 and d1 do not really show any phenotype in this paper, I'm not sure what we learn from this discussion and how much this paper can contribute to it. I would have wished for a discussion of how one could possibly reconcile the observed inconsistencies.

      Based on the comments of the different reviewers several paragraphs in the discussion were modified. We agree that the part on the larval-adult comparison is quite long. Thus we have shortened it as suggested by the reviewer.

      Minor corrections:

      L28 "resultant association" maybe resulting instead.

      L55 "animals derive benefit": remove derive.

      L78 "composing 12,000 neurons": composed of.

      L79 what is stable in a "stable behavioral assay"?

      L104: 2 times cluster.

      L122: "DL1 DANs are involved" in what?

      Fig. 1 please check subpanels labels, D repeats.

      L 362: "But how do individual neurons contribute to the teaching signal of the complete cluster?" I don't understand the question.

      L364 I did not hear before about the "labeled line hypothesis" in this context - could the author clarify?

      L368: edit "combinatorically".

      L390: "current suppression" maybe acute suppression.

      L 400 I'm not sure what is meant by "judicious functional configuration" and "redundancy". The functions of these cells are not redundant, and no straightforward prediction of their function can be done from their physiological response to salt.

      Thanks a lot for your detailed review of our manuscript. We welcome your well-taken concerns and have made the requested changes for all points that you have raised.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1 the reconstruction of pPAM and DL1 DANs shows the compartmentalized innervation of the larval MB. However, the images are a bit low in color contrast to appreciate the innervation well. In particular in panel B, it is hard to identify the innervated MB body structure. A schematic model of the larval MB and DAN innervation domains like in Fig. 2A would help to clarify the innervation pattern to the non-specialist.

      We understand this concern and have changed Figure 1 as suggested by the reviewer. A schematic model of the MB and DANs is now already presented in Figure 1 as well as in the corresponding supplemental figure.

      (2) Blue light itself can be aversive for larvae and thus interfere with the aversive learning paradigm. Does the given Illuminance (220 lux) used in these experiments affect the behavior and learning outcome?

      Yes, previously high intensities of blue light were necessary to trigger the first-generation optogenetic tools. The high-intensity blue light itself was able to establish an aversive memory (e.g. Rohwedder et al. 2016). Using second-generation optogenetic tools allowed us to strongly reduce the applied light intensity. We now use 220 lux (equal to 60 µW/cm<sup>2</sup>). Please note that all Gal4 and UAS controls in the manuscript are not significantly different from zero. The mild blue light stimulation therefore does not serve as a teaching signal and has neither an aversive nor an appetitive effect. Furthermore, we use this mild light intensity for several other behavioral paradigms (locomotion, feeding, naïve preferences) and have never seen an effect on behavior.

      (3) Fig.2: Except for MB054B-Gal4 only the MB expression pattern is shown for other lines. Is there any additional expression in other cells of the brain? In the legend in line 761, the reporter does not show endogenous expression, rather it is a fluorescent reporter signal labeling the mushroom body.

      The lines were initially identified in a screen of larval MB neurons done together with Jim Truman, Marta Zlatic and Bertram Gerber. In that screen, full brain scans were always analyzed. These images can be seen in Eschbach et al. 2020, extended figure 1. Neither in their evaluation nor in our anatomical evaluation (using a different protocol) was additional expression in brain cells detectable. We also modified the figure legend as suggested.

      (4) Fig.3: Precise n numbers per experiment should be stated in the figure legend.

      True, we now present n numbers per experiment whenever necessary.

      (5) Fig.4: Have the authors confirmed complete ablation of the targeted neuron using rpr/hid? Ablations can be highly incomplete depending on the onset and strength of Gal4 expression, leaving some functionality intact. While the ablation experiments are largely in line with the acute silencing of single DANs during high salt learning performed later on (Fig.7), there is potentially an interesting aspect of developmental compensation hidden in this data. Not a major point, but potentially interesting to check.

      We agree with this criticism. We had not tested whether the expression of hid,rpr in DL1 DANs really ablates them. We therefore did an additional experiment to show that. The new data are now presented as a supplemental figure (Figure 4- figure supplement 6). The result shows that expression of hid,rpr also ablates DL1 DANs, similar to earlier experiments where we used the same effectors to ablate serotonergic neurons (Huser et al., 2012, figure 5).

      (6) The performance index in Fig. 4 and 5 sometimes seems lower and the variability is higher than in some of the other experiments shown. Is this due to the high intrinsic variability of these particular experiments, or the background effects of the rpr/hid or splitGal4 lines?

      The general variability of these experiments is within the expected and known range. In these kinds of experiments there is always some variation due to several external factors (e.g. the time of year at which the experiment is run). It is therefore always important to measure control and experimental animals at the same time. This is what we did, and we only directly compare results within individual datasets, not between different datasets. Comparison between datasets is further hampered because the experiments of Figure 4 (now Figure 4- figure supplement 1) and Figure 5 (now Figure 4) differ in several parameters from the other learning experiments presented later in the text. Optogenetic activation uses blue light stimulation instead of “real world” high salt. Most often, direct activation of specific DANs in the brain is more stable than external high salt stimulation. Optogenetic inactivation also uses blue light stimulation as well as retinal-supplemented food. Both factors can affect the measurement. We therefore argue that, for each experiment, it is most often the particular parameters that affect the variability of the results rather than background effects of the rpr/hid and split-Gal4 lines.

      (7) Fig.7: This is a neat experiment showing the effects of acute silencing of individual DL1 DANs. As silencing DAN-f1/g1 does not result in complete suppression of aversive learning, it would be highly interesting to test (or speculate about) additive or modulatory effects by the other DANs. DAN-c1/d1 also respond to high salt but do not show a function on their own in these assays. I am aware that this is currently genetically not feasible. It would however be a nice future experiment.

      True, we screened intensively for DL1 cluster-specific driver lines that cover all 4 DL1 neurons or combinations other than the ones we tested. Unfortunately, we did not succeed in identifying them. Nevertheless, we will screen new genetic resources (e.g. Meissner et al., 2024, bioRxiv) to expand our approach in future experiments. Please also see our comment on concern 1 of reviewer 1 for further technical limitations and biological questions that could also explain the absence of complete suppression of high salt learning and memory. Some of these limitations are now also mentioned and discussed in the new version of the manuscript.

      (8) The discussion is excellent. I would just amend that larval DAN-c1, which has high interconnectivity within the larval CNS, is likely integrating state-dependent network changes, similar to the role of some DANs in innate and state-dependent preference behavior. This might contribute to modulating learned behavior depending on the present (acute) and previous environmental conditions.

      Thanks a lot for bringing this up. We rewrote this part and added a discussion on recent work on DAN-c1 function in larvae as well as results on DAN function in innate and state-dependent preference behavior.

      (9) Citation in line 1115 missing access information: "Schnitzer M, Huang C, Luo J, Je Woo S, Roitman L, et al. 2023. Dopamine signals integrate innate and learned valences to regulate memory dynamics. Research Square".

      Unfortunately this escaped our notice. The paper is now published in Nature: Huang, C., Luo, J., Woo, S.J. et al. Dopamine-mediated interactions between short- and long-term memory dynamics. Nature 634, 1141–1149 (2024). https://doi.org/10.1038/s41586-024-07819-w. We have now changed the citation. The new citation includes the missing access information.

      Reviewer #3 (Recommendations For The Authors):

      Regarding my issue about salt specificity in the public review, I want to make clear that I do not suggest additional experiments, but to be very careful in phrasing the conclusions, in particular whenever referring to the experiments with optogenetic activation. This includes presenting these experiments as "(salt) substitution" experiments - inferring that the optogenetic activation would substitute for a natural salt punishment. As important and interesting as the experiments are, they simply do not allow such an interpretation at this point.

      Results, line 140ff: When presenting the results regarding TH-Gal4 crossed to ChR2-XXL, please cite Schroll et al. 2006 who demonstrated the same results for the first time.

      Thanks for mentioning this. We now cite Schroll et al. 2006 here in the text of the manuscript.

      Figure 3: The subfigure labels (ABC) are missing.

      Unfortunately this escaped our notice. Thanks a lot – we have now corrected this mistake.

      Figure 5: For I and L, it reads "salt replaced with fru", but the sketch on the left shows salt in the test. I assume that fructose was not actually present in the test, and therefore the figure can be misleading. I suggest separate sketches. Also, I and L are not mentioned in the figure legend.

      True, this is rather confusing. Based on this well-taken concern we have changed the figure by adding a new and correct scheme for sugar reward learning that does not depict fructose during the test.

      Figure S1: The experimental sketches for E,F and G,H seem to be mixed up.

      We thank the reviewer for bringing this up. In the new version we corrected this mistake.

      Figure S5: There are three sub-figures labelled with B. Please correct.

      Again, thanks a lot. We made the suggested correction in Figure S5.

      Discussion, line 353ff: this and the following sentences can be read as if the authors have discovered the DL-1 neurons as aversive teaching mediators in this study. However, Eschbach et al. 2020 already demonstrated very similar results regarding the optogenetic activation of single DL-1 DANs. I suggest to rephrase and cite Eschbach et al. 2020 at this point.

      That is correct. Our focus was on the gustatory pathway. The original discovery was made by Eschbach et al. We have now corrected this in the discussion and clarified our contribution. It was never our intention to hide this work, as the laboratory was also involved. Nevertheless, this is a regrettable omission on our part.

      Line 385-387: this sentence is only correct with respect to Eschbach et al. 2020. Weiglein et al. 2021 used ChR2-XXL as an effector, but another training regimen.

      We understand this criticism. Therefore, we changed the sentence as suggested by the reviewer. See also our response on concern 15 of reviewer 1.

      Line 389ff: I do not understand this sentence. What is meant by persistent and current suppression of activity? If this refers to the behavioural experiments, it is misleading as in the hid, reaper experiments neurons are ablated and not suppressed in activity.

      We made the requested changes in the text. It is true that the ablation of a neuron throughout larval life is different from constantly blocking the output of a persisting neuron.

      Methods, line 615 ff: the performance index is said to be calculated as the difference between the two preferences, but the equation shows the average of the preferences.

      Thanks a lot. We are sorry for the confusion. We have carefully rewritten this part of the methods section to avoid any misunderstanding.
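
      The distinction the reviewer raises can be sketched as follows. This is only an illustration under an assumed convention, since the manuscript's actual equation is not reproduced in this response: each of two reciprocally trained groups yields a preference score, and the performance index is taken as half the difference between the two preferences, which is not the same as their mean. The function names and larval counts are hypothetical.

```python
def preference(n_odor_a: int, n_odor_b: int, n_total: int) -> float:
    """Preference for odor A over odor B: (#A - #B) / #total, in [-1, 1]."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_odor_a - n_odor_b) / n_total


def performance_index(pref_group_1: float, pref_group_2: float) -> float:
    """Assumed convention: half the DIFFERENCE of the reciprocal preferences."""
    return (pref_group_1 - pref_group_2) / 2


def mean_preference(pref_group_1: float, pref_group_2: float) -> float:
    """The AVERAGE of the two preferences, shown only for contrast."""
    return (pref_group_1 + pref_group_2) / 2


# Hypothetical counts: group 1 had odor A paired with high salt,
# group 2 had odor B paired with high salt (reciprocal training).
pref_1 = preference(n_odor_a=8, n_odor_b=24, n_total=32)   # avoids odor A
pref_2 = preference(n_odor_a=24, n_odor_b=8, n_total=32)   # avoids odor B

pi = performance_index(pref_1, pref_2)
avg = mean_preference(pref_1, pref_2)
print(f"PI = {pi}, mean preference = {avg}")
```

      With these mirror-image counts the difference-based index is negative (aversive memory) while the mean of the two preferences cancels to zero, which is why the wording and the equation need to agree.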

      When discussing the organization of the DL1 cluster, on several occasions I have the impression the authors use the terms "redundant" and "combinatorial" synonymously. I suggest to be more careful here. Redundancy implies that each DAN in principle can "do the job", whereas combinatorial coding implies that only a combination of DANs together can "do the job". If "the job" is establishing an aversive salt memory, the authors' results point to redundancy: no experimental manipulation totally abolished salt learning, implying that the non-manipulated neurons in each experiment sufficed to establish a memory; and several DANs, when individually activated, can establish an aversive memory, implying that each of them indeed can "do the job".

      Based on this concern we have rewritten the discussion as suggested to be more precise when talking about redundancy or combinatorial coding of the aversive teaching signal. Basically, we have removed all the combinatorial terms and replaced them by the term “redundancy”.

      The authors mix parametric and non-parametric statistical tests across the experiments dependent on whether the distribution of the data is normal or not. It would help readers if the authors would clearly state for which data which tests were used.

      We understand the criticism and now have added an additional supplemental file that includes all the information on the statistical tests applied and the distribution of the data.

    1. eLife Assessment

      This work presents a valuable approach based on a complex systems theoretical framework to characterize diet-host-microbe interactions and develop targeted bacteriotherapies through a three-phase workflow. Despite the partial support of the description and experimental setup of the 'complex systems theoretical approach,' the collected data are solid and advance our understanding of oxalate bacterial metabolism in microbial communities. This study will interest researchers working on gut microbiomes and the possible modulation of host-microbial interactions.

    2. Reviewer #2 (Public review):

      Summary:

      Using the well-studied oxalate-microbiome-host system, the authors propose a novel conceptual and experimental framework for developing targeted bacteriotherapies using a three-phase pre-clinical workflow. The third phase is based on a 'complex system theoretical approach' in which multi-omics technologies are combined in independent in vivo and in vitro models to successfully identify the most pertinent variables that influence specific phenotypes in diet-host-microbe systems. The innovation relies on the third phase since phase I and phase II are the dominant approaches everyone in the microbiome field uses.

      Strengths:

      The authors used a multidisciplinary approach which included 1] fecal transplant of two distinct microbial communities into Swiss-Webster mice (SWM) to characterize the host response (hepatic response-transcriptomics) and microbial activity (untargeted metabolomics of the stool samples) to different oxalate concentrations; 2] longitudinal analysis of the N. albigula gut microbiome composition in response to varying concentrations of oxalate by shotgun metagenomics, with deep bioinformatic analyses of the genomes assembled; and 3] development of synthetic microbial communities around oxalate metabolism and evaluation of these communities' activity in oxalate degradation in vivo.

      Weaknesses:

      This study presents a valuable finding on the oxalate-microbiome-host system using a multitude of approaches. Although the multidisciplinary approach allows for a unique perspective on the system and more robust conclusions, it is challenging for any authors to present all the data clearly and systematically in a conclusive way, especially when introducing unfamiliar concepts such as a complex systems theoretical approach.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study experimentally examined diet-microbe-host interactions through a complex systems framework, centered on dietary oxalate. Multiple, independent molecular, animal, and in vitro experimental models were introduced into this research. The authors found that microbiome composition influenced multiple oxalate-microbe-host interfaces. Oxalobacter formigenes was only effective against a poor oxalate-degrading microbiota background, which gives critical new insights into why clinical intervention trials with this species exhibit variable outcomes. Data suggest that, while heterogeneity in the microbiome impacts multiple diet-host-microbe interfaces, metabolic redundancy among diverse microorganisms in specific diet-microbe axes is a critical variable that may impact the efficacy of bacteriotherapies, which can help guide patient and probiotic selection criteria in probiotic clinical trials.

      Thank you. The main message of this research is that, through complex modelling, we believe we have identified the critical variable (metabolic redundancy) that is responsible for the efficacy of probiotics designed to reduce oxalate levels, thus allowing for improved patient selection in clinical trials. We also believe that this process and the critical features identified can be translated to other critical microbial functions such as short chain fatty acid synthesis, secondary bile acid synthesis, and others.

      Strengths:

      The paper has made significant progress in both the depth and breadth of scientific research by systematically comparing multiple experimental methods across multiple dimensions. Particularly through in-depth analysis from the enzymatic perspective, it has not only successfully identified several key strains and redundant genes, which is of great significance for understanding the functions of enzymes, the characteristics of strains, and the mechanisms of genes in microbial communities, but also provided a valuable reference for subsequent experimental design and theoretical research.

      More importantly, the establishment of a novel research approach to probiotics and gut microbiota in this paper represents a major contribution to the current research field. The proposal of this new approach not only breaks through the limitations of traditional research but also offers new perspectives and strategies for the screening, optimization of probiotics, and the regulation of gut microbiota balance. This holds potential significant value for improving human health and the prevention and treatment of related diseases.

      Thank you for the comments. We believe that the approach taken here, which contrasts with conventional reductionist techniques, will be critical for translating gut microbiome research into actionable therapeutic approaches.

      Weaknesses:

      While the study has excellently examined the overall changes in microbial community structure and the functions of individual bacteria, it lacks a focused investigation on the metabolic cross-feeding relationships between oxalate-degrading bacteria and related microorganisms, failing to provide a foundational microbial community or model for future research. Although this paper conducts a detailed study on oxalate metabolism, it would be beneficial to visually present the enrichment of different microbial community structures in metabolic pathways using graphical models.

      Thank you for this critique. In the current study, we broadly examined the response of the gut microbiota to dietary oxalate. Based on initial shotgun metagenomic results, we focused on specific taxa and metabolic functions. Through metagenomic and multiple culture-based studies, we quickly homed in on redundancy in oxalate-degrading function as a key feature for oxalate homeostasis. We believe that the defined microbial community we used for microbial transplants (particularly the taxonomic cohort) provides a strong, minimal community to explore oxalate homeostasis further. In fact, we are using this consortium in multiple follow-up studies to fully understand the cross-feeding that may occur among these microorganisms, as you suggest. We note that Figure 3 shows the change of species and metabolic pathways with oxalate exposure.

      Furthermore, the authors have done a commendable job in studying the roles of key bacteria. If the interactions and effects of upstream and downstream metabolically related bacteria could be integrated, it would provide readers with even more meaningful information. By illustrating how these bacteria interact within the metabolic network, readers can gain a deeper understanding of the complex ecological and functional relationships within microbial communities. Such an integrated approach would not only enhance the scientific value of the study but also facilitate future research in this area.

      Thank you. We note that, based on the collective data obtained in this study, redundancy in oxalate degradation is the critical feature that maintains oxalate homeostasis. However, we are interested in potential metabolic interactions between microbes in our defined community and are currently investigating these interactions extensively.

      Reviewer #2 (Public review):

      Summary:

      Using the well-studied oxalate-microbiome-host system, the authors propose a novel conceptual and experimental framework for developing targeted bacteriotherapies using a three-phase pre-clinical workflow. The third phase is based on a 'complex system theoretical approach' in which multi-omics technologies are combined in independent in vivo and in vitro models to successfully identify the most pertinent variables that influence specific phenotypes in diet-host-microbe systems. The innovation relies on the third phase since phase I and phase II are the dominant approaches everyone in the microbiome field uses.

      Thank you. As you note, the proposed phases I and II are the predominant approaches used. In fact, many clinical trials have been conducted to try to reduce urine oxalate in patients, based solely on mechanistic studies with Oxalobacter formigenes. As noted in our manuscript, only 43% of those studies resulted in the intended outcome, necessitating the approach we took in the current study. Our results suggest that the high rate of failure, despite well-established mechanisms, is due to insufficient patient selection that focused only on the presence or absence of O. formigenes, a species that normally exhibits very low prevalence and abundance in the human gut microbiota.

      Strengths:

      The authors used a multidisciplinary approach which included:

      (1) fecal transplant of two distinct microbial communities into Swiss-Webster mice (SWM) to characterize the host response (hepatic response-transcriptomics) and microbial activity (untargeted metabolomics of the stool samples) to different oxalate concentrations;

      (2) longitudinal analysis of the N. albigula gut microbiome composition in response to varying concentrations of oxalate by shotgun metagenomics, with deep bioinformatic analyses of the genomes assembled; and

      (3) development of synthetic microbial communities around oxalate metabolism and evaluation of these communities' activity in oxalate degradation in vivo.

      Thank you for these comments.  In the complex modelling approach, we focused on complete microbiota from host species known to have high and low capacities for oxalate tolerance, combined with targeting specific metabolic functions vs. specific taxa that may include unknown functions important for oxalate metabolism.  Further, we examined the influence of our target communities on oxalate metabolism through multiple in vitro and in vivo studies.

      Weaknesses:

      However, I have concerns about the frame the authors tried to provide for a 'complex system theoretical approach' and how the data are interpreted within this frame. Several of the conclusions the authors provide do not seem to have sufficient data to support them.

      Thank you.  We have tried to address these concerns by adding an exhaustive figure that broadly represents our complex modelling approach that includes potential complex system-based hypotheses, how they were tested, and the host-microbiome-oxalate interactions found in our study.

      Recommendations for the authors:  

      Reviewer #2 (Recommendations for the authors):

      Major Concerns

      (1) The authors argue about the importance of bringing 'Complex System Theory' to the microbiome field systematically and consistently. However, the authors fail to introduce this theory throughout the entire manuscript. For example, the authors tried to describe key elements and their nomenclature, such as nodes and fractal layers, in the first part of the results section. But the description is wordy and not precise. It would be more useful if the authors connected the model description with a visual representation, such as a figure. Unfortunately, these elements are not emphasized or carried across the results section and are not mentioned in the discussion section.

      We have now added a figure (Figure 7) that details this process extensively and ties each of our findings to the complex system model and nomenclature.  We have also reiterated how our results fit in the complex system model in the discussion.

      In addition, there is no straightforward approach to integrating multi-omics datasets to identify the variables that are determinants of the system. For example, Figure 1 focuses on the host response (hepatic activity) to oxalate exposure following fecal transplants into Swiss Webster mice; Figure 2 focuses on the effects of oxalate exposure on stool metabolic activity, not only microbial metabolic activity, following fecal transplants into Swiss Webster mice; and Figure 3 focuses on microbiome responses to different oxalate concentrations in Neotoma albigula. There is no "model" to really integrate the host, the microbiome activity, and the microbiome composition information. And, unfortunately, the data generated between experiments cannot be directly integrated; see major concern # 2.

      Thank you. We have clarified the experimental approach and how it applied to understanding the critical factors that maintain oxalate homeostasis. Specifically, Figure 1 established that the effect of oxalate on the host was dependent on the microbiota, rather than host genetics. Figure 2 established that the effect of oxalate on the gut microbiota was again dependent on the whole gut microbiota and that these oxalate-microbe effects also influenced oxalate-host effects, as shown through a direct multi-omic data integration. Once we established that the oxalate effects on host and microbiota were dependent on the whole microbiota composition, Figure 3 then sought to determine how oxalate impacted the gut microbiota, using our model of high oxalate tolerance (N. albigula). With the finding in Figure 3 that there were multiple genes attributed to the degradation of oxalate, or to acetogenic, methanogenic, and sulfate-reducing pathways, Figure 4 and relevant supplemental figures sought to quantify the redundancy of these pathways. After establishing a very high degree of redundancy, we used a culturomic approach to determine what environmental factors impacted oxalate metabolism and to evaluate oxalate metabolism using our defined, hypothesized communities of microorganisms. Finally, Figure 6 sought to validate our metagenomic, metabolomic, and culturomic results from multiple animal and in vitro models using targeted microbial transplants in mice. While we did have some direct multi-omic data integration (Figures 2 and 3), the process employed here sought to systematically determine which factors were most important for the oxalate-microbiota-host relationship, and then to use those results to design the subsequent experiments. We have added this description to the discussion, which helps to contextualize the complex system modelling approach we took here.

      Finally, the authors did not provide a novel variable that successfully influences oxalate degradation in the oxalate-microbiome-host system. The authors argue that "both resource availability and community composition impact oxalate metabolism," which we can currently infer from the failure of the clinical trials, and do not provide a clear intervention strategy to develop functional bacteriotherapy. The identification of composition as an important variable that was predictable without any multi-omics approach was highlighted by the development of synthetic microbial communities. Synthetic microbial communities are critical to characterizing complex microbiomes. Still, the authors did not explain how this strategy can be used in their theoretical framework (that is their goal), and these communities are not well introduced across the manuscript; see major concern # 4.

      As stated, it is clear from the failed clinical trials that we do not fully understand what microbial features dictate oxalate homeostasis.  We have specifically identified, through fecal transplant studies, that microbial composition is critical for oxalate homeostasis and that diverse oxalate-degrading bacteria exist.  However, ours is the first study that explicitly shows that it is this diversity that controls oxalate homeostasis.  This is specifically ascertained through the targeted microbial transplants in mice whereby O. formigenes was given alone or with different combinations of other microorganisms.  In other words, we were able to replicate both successful and failed studies by manipulating which specific species were introduced into animals.  This is unprecedented in the literature.

      (2) The authors provide several conclusions that are not completely supported by the data available. For example:

      (a) Lines 236-239: "Within the framework of complex systems, results show microbe-host cooperation whereby oxalate effectively processed within the SW-NALB gut microbiota reduced overall liver activity, indicative of a beneficial impact." - The authors did not provide data related to oxalate levels or oxalate processing for this dataset.

      While we did not specifically quantify oxalate degradation for this specific study, as cited in the text when describing this Swiss-Webster, Neotoma albigula system, we have previously published multiple animal studies explicitly showing that the N. albigula animals were highly effective oxalate degraders, a trait that is transferable to Swiss-Webster mice through fecal transplants. Since the gut microbiota’s impact on oxalate has been well established through experiments by our group, the purpose of these specific experiments was to look the other way and examine the effect of oxalate on the gut microbiota of these two animal models. In the referenced text, we again cited our studies showing that the SW-NALB system effectively degrades oxalate.

      (b) Lines 239-243: "Data also suggest that both the gut microbiota and the immune system are involved in oxalate remediation (redundancy), such that if oxalate cannot be neutralized in the gut microbiota or liver, then the molecule will be processed through host immune response mechanisms (fractality), in this case indicated through an overall increase in hepatic activity and specifically in mitochondrial activity." - The authors did not provide any evidence related to the immune system and oxalate metabolism.

      We corrected that statement as follows: “…in this case indicated through an overall increase in inflammatory cytokines with oxalate exposure combined with an ineffective oxalate-degrading microbiota (Figures S6a,b; S9a,b).” In other words, if the liver and gut microbiota cannot eliminate a toxin, then the immune system must deal with it through inflammatory pathways. Oxalate is a well-established, pro-inflammatory compound. Our data show that this is dependent on the gut microbiota.

      (c) Lines 250-252: "Following the diet trial, colon stool was collected post-necropsy and processed for untargeted metabolomics, which is a measure of total microbial metabolic output." - Although most metabolites in stool samples are indeed microbial, there are also host metabolites. So, it is not technically correct to relate the metabolomic analysis of stool samples to only microbial metabolic analysis. In addition, the authors discussed compounds such as alkaloids and cholesterol as microbial metabolites, although these compounds are more related to the diet and host, respectively.

      We have corrected this to state: “total metabolites present in stool from the diet, microbial activity, and host activity”

      (d) Lines 270-273. "Specifically, the SW-NALB mice exhibit hallmarks of homeostatic feedback with oxalate exposure to maintain a consistent metabolic output, defined by the relatively small, net negative, microbial metabolite-hepatic gene network compared to the large, net positive, network of SW-SW mice." - How do the authors define oxalate homeostasis? In addition, do the authors imply feedback between the liver and the microbiome in which the microbiome responds to a liver response related to oxalate levels? Or could the observation in Figure 1 be explained just by microbial consumption of oxalate that would reduce the impact of oxalate that arrives at the liver?

      Oxalate homeostasis is defined in that sentence: “relatively small, net negative, microbial metabolite-hepatic gene network compared to the large, net positive, network of SW-SW mice”. In other words, for SW-NALB mice, oxalate did not produce a considerable change to either microbial or hepatic metabolic activity. We did not really test the liver impact on gut microbiota and cannot speak to that. We believe, based on Figure 2 data, that it is not just the degradation of oxalate that explains the lack of change in hepatic activity in SW-NALB mice, but rather that the oxalate-induced shift in the gut microbiota metabolic activity broadly altered hepatic activity, as inferred from Figure 2c. We made this clearer in the results: “suggests that the oxalate-induced change in microbial metabolism is responsible for the change in hepatic activity”.

      (e) Lines 297-301: "The oxalate-dependent metagenomic divergence of the NALB gut microbiota (Figure 3), combined with the lack of change in the microbial metabolomic profile with oxalate exposure (Figure 2), suggest that oxalate stimulates taxonomically diverse, but metabolically redundant microorganisms, in support of maintaining homeostasis." - The authors cannot conclude anything about the relationship between taxonomic changes and microbial activity since the taxonomic data presented is for microbial enrichment in N. albigula, and the "microbial activity data" is from the fecal transplantation experiment in SWM. These are two completely different systems with two completely different experimental designs.

      We have shown very similar results, in that oxalate induces taxonomic divergence of the NALB gut microbiota, in multiple previous studies. The experiment in which a minimal, positive increase in microbial metabolites was seen with oxalate was based on the SW-NALB model, whereby Swiss-Webster mice have an NALB microbiota. We show throughout the manuscript that the impact of oxalate is very microbiota dependent, which supports our claim. However, the claim is hypothesis generating: that metabolic redundancy is important for oxalate homeostasis. We modified our statement to make all of this clearer.

      Related to microbial composition, the authors did not show data validating the efficiency of the fecal transplantations (allograft or xenograft) in the SWM after antibiotic treatment. They also did not show evidence of microbial composition dynamics in response to oxalate exposure.

      Again, the efficacy of fecal transplants, used in the way they were here, has been shown in multiple past studies by our group. In past studies, we have extensively characterized the microbiota from fecal transplants and which taxa were associated with oxalate levels. Therefore, that topic was not the focus of the current study, which instead focused on the oxalate impact on gut microbiota activity. Our past studies, referenced multiple times throughout the current manuscript, were used in large part to help determine which microbes to include in our taxonomic cohort, as described in the manuscript.

      (f) Lines 301-303: "Given that data came from the same hosts sampled longitudinally, these data also reflect a microbiota that is adaptive to oxalate exposure, which is another important characteristic of complex systems." - In their dataset, what is the evidence that the microbiota of N. albigula is adapted to oxalate exposure? Is the increase in genomes with pathways related to oxalate metabolism related to an increase of oxalate in the diet? If so, does exposure of the microbiota to a higher oxalate concentration decrease the systemic level of oxalate? In none of the experiments related to Figures 1 to 3 did the authors show a correlation of systemic oxalate levels with microbial composition, hepatic host response, or stool metabolism.

      Figure 3 explicitly shows the longitudinal impact of increasing levels of oxalate, with an increase in oxalate-degrading genes (Figure 3d). The specific samples selected for analysis here come from a previous study in which we explicitly quantified changes to the gut microbiota composition and both stool and urine oxalate for every time point listed in Figure 3a. This information is explicitly stated in the methods, coupled with the fact that “neither fecal nor urinary oxalate levels increased significantly.” Again, the effect of the gut microbiota on oxalate in these model systems has been extensively studied by our group and provides the foundation for the current study to look at the effect of oxalate on the gut microbiota and host.

      Considering my last two points, the authors do not present substantial evidence to support their hypothesis that oxalate stimulates taxonomically diverse, metabolically redundant communities.

      As stated above, that oxalate stimulates taxonomically diverse taxa was ascertained through multiple past studies, as well as the current study (Figure 3e). The metabolically redundant part is ascertained both through untargeted metabolomics (Figure 2a,b) and shotgun metagenomics (Figure 3c,d). Further evidence for the metabolic redundancy with oxalate comes from our culturomic approach, which showed that 14.58% of isolates could grow on oxalate as a carbon and energy source, in addition to the high proportion of isolates that could grow on other carbon and energy sources, at least much more than can be ascribed to a single species (Figure 5c). We made this clearer in the discussion.

      (g) Lines 330-335. "Additionally, the broad diversity of species that contain oxalate-related genes suggests that the distribution of metabolic genes is somewhat independent of the distribution of microbial species, which suggests that microbial genes exist in an autonomous fractal layer, to some degree. This hypothesis is supported by studies which show a high degree of horizontal gene transfer within the gut microbiota as a means of adaptation." - This conclusion is highly speculative, especially since the author did not do any analysis to directly evaluate a relationship between the oxalate metabolic pathways and the microbial species where these pathways are present.

      Figure 3c,d,e explicitly shows the metabolic pathways and species enriched by oxalate exposure.  Figure 4d, generated using the same data from Figure 3, explicitly shows the taxa that harbor oxalate-degrading genes.   

      (h) Lines 364-366. "Collectively, data show that both resource availability and community composition impacts oxalate metabolism, which helps to define the adaptive nature of the NALB gut microbiota." - The authors indeed showed evidence that community composition impacts oxalate metabolism. However, the authors did not show any evidence to directly evaluate the resource availability to impact oxalate metabolism.

      This is explicitly shown through in vitro community-based and single species assays varying multiple different carbon and energy sources to quantify changes to oxalate degradation (chosen based on shotgun metagenomic results; Figure 5a,b).

      (3) Lines 321-325. "Acetogenic genes were also present in 97.18% of genomes, dominated by acetate kinase and formate-tetrahydrofolate ligase (Figure S3A-C). Methanogenic genes were present in 100% of genomes, dominated by phosphoserine phosphatase, ATP-dependent 6-phosphofructokinase, and phosphate acetyltransferase (Figure S4A-C)." - The authors spent much time analyzing the adjacent pathways related to oxalate and oxalate-related products of oxalate metabolism. However, my understanding is that the genes used to analyze these pathways (formate metabolism, acetogenesis, methanogenesis), such as the ones named above, are not unique/specific for those pathways but participate in other "housekeeping" pathways. What is the relevance of these analyses when those genes are not unique/specific to the function/pathways that the authors describe? If I infer correctly, these bioinformatic analyses aim to evaluate the hypothesis of whether oxalate metabolism could be a social/cooperation metabolism and whether other species could participate in the metabolism of oxalate subproducts. However, these analyses did not explicitly evaluate this hypothesis.

      The reviewer is correct in that we aimed to evaluate the potential that oxalate metabolism could benefit from metabolic cooperation. The specific genes chosen for this analysis were those explicitly listed in the target metabolic pathways in KEGG, as described. However, while the analyses do show the strong potential that the CO2 and formate produced from oxalate degradation could be used in these other pathways, as intended, the genes can also be used in other metabolic pathways. We did, however, explicitly test the hypothesis that formate, produced from oxalate degradation, could be utilized by the gut microbiota. While the targeted transplants with the taxonomic cohort did not clearly show the use of formate in this way, those from the metabolic cohort did (Figures 6d and S8d). This question is still under investigation in our group.

      We have made it more clear that our genome analyses provide the potential for metabolic redundancy rather than definitive proof for metabolic redundancy, which was evaluated more extensively in other experiments from this study.

      (a) Lines 481-484. "Collectively, data offer strong support for the hypothesis that metabolic redundancy among diverse taxa, is the primary driver of oxalate homeostasis, rather than metabolic cooperation in which the by-products of oxalate degradation are used in downstream pathways such as acetogenesis, methanogenesis, and sulfate reduction." - Although the authors recognize that their data about the metabolic cooperation hypothesis is inconclusive, they never tested the hypothesis related to metabolic cooperation, as mentioned above. This is highly speculative.

      As stated above, the targeted microbial transplants to animals and in vitro studies (Figure 5e,f) did explicitly test the cooperation hypothesis, but the results did not support it and instead pointed much more strongly to metabolic redundancy.

      (4) Lines 355-359. "Cohorts, defined in the STAR methods, were used to delineate hypotheses that either carbon and energy substrates are sufficient to explain known effects of the oxalate-degrading microbial network or that additional aspects of taxa commonly stimulated by dietary oxalate are required to explain past results (taxa defined through previous meta-analysis of studies)." - The definition of the metabolic cohorts and the taxonomic cohorts should not be hidden in the material and methods section. It should be explicit and clearly explained in the main text. Related, the table presented in Figure 5D is exceptionally confusing and does not help to understand and differentiate between the metabolic and the taxonomic cohorts. The authors need to explicitly identify the synthetic communities used in each cohort and each group by their members and their characteristics in supplementary tables.

      In the sentences before those referenced, we state: “Culturomic data recapitulates molecular data to show a considerable amount of redundancy surrounding oxalate metabolism (Fig. 5C). Isolates generated from this assay were used for subsequent study (metabolic cohort; Figure 5D). Additionally, a second cohort was defined and commercially purchased based both on known metabolic functions and the proportion of studies that saw an increase in their taxonomic population with oxalate consumption (Fig. 5D; taxonomic cohort). Where possible, isolates from human sources were obtained.”  Figure 5d explicitly shows the specific species used in each cohort along with the groups they were in for transplant studies, the explicit metabolic pathways we were targeting, along with the % of studies that these species were associated with oxalate metabolism.  All of this information is both in the main text of the results and in the figure legends.  It is not hidden in the methods, but the methods do reiterate what was also placed in the results.   

      In Figures 5 and 6, the authors used the following groups with the corresponding nomenclature: 'Group 1, No_bact; Group 2, Ox; Group 3, Ox_form; Group 4, All; Group 5, No_ox'. Although the information related to these groups is present in the material and method section in lines 1139-1143, the authors also need to explicitly explain the groups and their nomenclature in the main text.

      Since this information is explicitly and succinctly given in the referenced figures, I believe that adding the same information in the text would be too redundant.

      Related to the development of the synthetic communities. How did the authors prepare the synthetic communities or 'cohort' for the in vitro experiments? 

      We added more information for the preparation of microbes and execution of the in vitro assays, as needed.  

      Also, it is unclear in the material and method section how the metabolic profile of each isolate was evaluated (Figure 5C). Related to the bacteria isolated from the culturomic assays, including Figure 5C and metabolic cohort, the authors indeed reported the isolation methodology in lines 1262-1275. However, there is no information about the sequencing of these isolates. The authors should present these isolates as a list (supplementary table) with their names, taxonomy, metabolic profile, and Genome ID if these genomes were submitted to NCBI.

      We added additional information for how metabolic cohort isolates were chosen and how they were taxonomically identified.  The taxonomy and substrate utilization of isolates are in Figure 5D.  We did not sequence the genomes of metabolic cohort bacteria.  However, the ATCC isolates, which comprise the taxonomic cohort, are publicly available.

      The authors presented the 248 metagenomic assemblies in Figure S1 in a circular chart in context with other genomes. However, the metagenomic assemblies should be presented in a table form, with their name, taxonomy, coverage, completeness, and Genome ID, if these genomes were submitted to NCBI.

      The information for the genomes submitted to the NCBI is provided in the data availability statement.  However, we added a table (Table S9) that includes the requested information.   

      (5) Lines 371-374: "To delineate hypotheses of metabolic redundancy or cooperation for mitigating the negative effects of oxalate on the gut microbiota and host, two independent diet trials were conducted with analogous microbial communities derived from the metabolic and taxonomic cohorts".

      Lines 494-496: "we and others have found that oxalate can differentially exhibit positive or negative effects on microbial growth and metabolism dependent on the species and environment present" - What is the evidence that oxalate has a negative effect on the gut microbiota? The authors clearly showed the negative effect of oxalate on the host. Although there are reports in the literature of oxalate consumers with a negative effect on the microbiome, such as Lactobacilli and Bifidobacteria, there is no evidence in this manuscript about a negative effect of oxalate on the microbiome, and there is not an experimental design to evaluate it.

      These data are presented in Figure 2A and B. As stated, oxalate led to a net reduction of 34 in total microbial metabolites produced, with a significant shift in the overall metabolome, indicative of metabolic inhibition. This is in comparison to the net gain of 9 metabolites, with no significant shift overall, in the mice with the NALB microbiota. The positive and negative effects of oxalate on the whole gut microbiota here are bolstered by previous studies on the effect of oxalate on pure cultures, as discussed and cited on lines 623-624.

      (6) Related to the last section, it is hard to really compare the results of the taxonomic cohort versus the metabolic cohort when the data of one cohort is in the main figure and the other in a supplementary figure. In addition, all the comparisons between the two cohorts seem to be qualitative. For any comparisons, the authors need to do a statistical comparison between the groups of the two cohorts.

      The comparison of the two sets of data is indeed qualitative. This is because these mouse models were run in separate experiments to test separate hypotheses (whether utilization of specific substrates is enough to improve oxalate metabolism, or whether specific taxa previously responsive to dietary oxalate were better, as stated in the manuscript). Given that these experimental models were tested separately, it would not be statistically valid to compare them directly, even though the experimental procedures were the same and the only difference was the transplanted bacteria. The experiments were split between a main and a supplemental figure out of necessity, given the very large amount of data and the many experimental mouse models run in this study overall.

      Minor Comments.

      (1) The authors should define 'antinutrients'. This term is not a familiar concept and could create confusion.

      This is defined in line 104 “molecules produced in plants to deter herbivory, disrupt homeostasis by targeting the function of the microbiome, host, or both”

      (2) The authors should explicitly describe the N. albigula, aka white-throated woodrat, system as early as possible in the Results section.

      We added statements to the second section of the Results describing the Swiss Webster and N. albigula gut microbiota as poor and effective oxalate degraders, respectively.

      (3) SW-SW mice exhibited an oxalate-dependent alteration of 219 hepatic genes, with a net increase in activity. In comparison, the SW-NALB mice exhibited an oxalate-dependent alteration of 21 genes with a net decrease in activity. However, the visual representation of the PCoA in Figure 1B showed that the most different samples are the SW-NALB 0% and 1.5%. Could you please explain this difference?

      In Figure 1B, the SW-NALB data are represented by the blue and black data points, which directly overlap with each other. The SW-SW data are the orange and purple data points, which exhibit very little overlap.

      (4) Is Table S7 the same as Table S6? If not, there is a missing supplementary table.

      These tables are different.  We ensured that both are present.

      (5) How did the authors test bacterial growth in in vivo studies (Figure 5B)?

      We added a statement to the culturomic section of the methods – we used media with or without oxalate and quantified colony-forming units.

      (6) A section of 16S rRNA metagenomics in the material and method section is not used across the main manuscript.

      These data are presented in figures S7 and S10, as stated in the results.  We added statements in the results to clarify that these figures show the 16S sequencing data.

      (7) Lines 506-511: "Collectively, data from the current and previous studies on the effect of oxalate exposure on the gut microbiota support the hypothesis that the gut microbiota serves as an adaptive organ in which specific, metabolically redundant microbes respond to and eliminate dietary components, for the benefit of themselves, but which can residually protect or harm host health depending on the dietary molecules and gut microbiota composition." - What is the benefit to bacteria in eliminating oxalate? This is highly speculative to this system.

      The benefit to bacteria is stated earlier in that paragraph – “In the current (Figs. 2B, 5B) and previous studies(33,34,64,65), we and others have found that oxalate can differentially exhibit positive or negative effects on microbial growth and metabolism dependent on the species and environment present.”

      (8) Lines 504 -506: "Importantly, the near-universal presence of formate metabolism genes suggest that formate may be an even greater source of ecological pressure (Figures S2-S5)."

      - Formate is primarily produced by fermentative anaerobic bacteria, such as Bacteroides, Clostridia, and certain species of Escherichia coli, since formate would be present in anaerobic communities independently of oxalate. How is formate an even greater source of ecological pressure?

      We added a statement about the toxicity of formate to both bacteria and mammalian hosts.

    1. eLife Assessment

      This intracranial EEG study presents important and convincing neural evidence supporting the high spatial specificity (receptive field) of visually driven alpha-band oscillation in human brains and its potential role in exogenous cuing attention. The work challenges the predominant view about the role of alpha-band oscillation in visual attention and advocates that stimulus-driven alpha suppression is precisely tuned and might contribute to exogenous spatial attention.

    2. Reviewer #1 (Public Review):

      In this study, the authors build upon previous research that utilized non-invasive EEG and MEG by analyzing intracranial human ECoG data with high spatial resolution. They employed a receptive field mapping task to infer the retinotopic organization of the human visual system. The results present compelling evidence that the spatial distribution of human alpha oscillations is highly specific and functionally relevant, as it provides information about the position of a stimulus within the visual field.

      Using state-of-the-art modeling approaches, the authors not only strengthen the existing evidence for the spatial specificity of the human dominant rhythm but also provide new quantification of its functional utility, specifically in terms of the size of the receptive field relative to the one estimated based on broad band activity.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, Yuasa et al. aimed to study the spatial resolution of modulations in alpha frequency oscillations (~10Hz) within the human occipital lobe. Specifically, the authors examined the receptive field (RF) tuning properties of alpha oscillations, using retinotopic mapping and invasive electroencephalogram (iEEG) recordings. The authors employ established approaches for population RF mapping, together with a careful approach to isolating and dissociating overlapping, but distinct, activities in the frequency domain. Whereby, the authors dissociate genuine changes in alpha oscillation amplitude from other superimposed changes occurring over a broadband range of the power spectrum. Together, the authors used this approach to test how spatially tuned estimated RFs were when based on alpha range activity, vs. broadband activities (focused on 70-180Hz). Consistent with a large body of work, the authors report clear evidence of spatially precise RFs based on changes in alpha range activity. However, the size of these RFs were far larger than those reliably estimated using broadband range activity at the same recording site. Overall, the work reflects a rigorous approach to a previously examined question, for which improved characterization leads to improved consistency in findings and some advance of prior work.

      Strengths:

      Overall, the authors take a careful and well-motivated approach to data analyses. The authors successfully test a clear question with a rigorous approach and provide strong supportive findings. Firstly, well-established methods are used for modeling population RFs. Secondly, the authors employ contemporary methods for dissociating unique changes in alpha power from superimposed and concomitant broadband frequency range changes. This is an important confound in estimating changes in alpha power not employed in prior studies. The authors show this approach produces more consistent and robust findings than standard band-filtering approaches. As noted below, this approach may also account for more subtle differences when compared to prior work studying similar effects.

      Original Weaknesses:

      - Theoretical framing: The authors frame their study as testing between two alternative views on the organization, and putative functions, of occipital alpha oscillations: i) alpha oscillation amplitude reflects broad shifts in arousal state, with large spatial coherence and uniformity across cortex; ii) alpha oscillation amplitude reflects more specific perceptual processes and can be modulated at local spatial scales. However, in the introduction this framing seems mostly focused on comparing some of the first observations of alpha with more contemporary observations. Therefore, I read their introduction to more reflect the progress in studying alpha oscillations from Berger's initial observations to the present. I am not aware of a modern alternative in the literature that posits alpha to lack spatially specific modulations. I also note this framing isn't particularly returned to in the discussion. A second important variable here is the spatial scale of measurement. It follows that EEG based studies will capture changes in alpha activity up to the limits of spatial resolution of the method (i.e. limited in ability to map RFs). This methodological distinction isn't as clearly mentioned in the introduction, but is part of the author's motivation. Finally, as noted below, there are several studies in the literature specifically addressing the authors question, but they are not discussed in the introduction.

      - Prior studies: There are important findings in the literature preceding the author's work that are not sufficiently highlighted or cited. In general terms, the spatio-temporal properties of the EEG/iEEG spectrum are well known (i.e. that changes in high frequency activity are more focal than changes in lower frequencies). Therefore, the observations of spatially larger RFs for alpha activities is highly predicted. Specifically, prior work has examined the impact of using different frequency ranges to estimate RF properties, for example ECoG studies in the macaque by Takura et al. NeuroImage (2016) [PubMed: 26363347], as well as prior ECoG work by the author's team of collaborators (Harvey et al., NeuroImage (2013) [PubMed: 23085107]), as well as more recent findings from other groups (Luo et al., (2022) BioRxiv: https://doi.org/10.1101/2022.08.28.505627). Also, a related literature exists for invasively examining RF mapping in the time-voltage domain, which provides some insight into the author's findings (as this signal will be dominated by low-frequency effects). The authors should provide a more modern framing of our current understanding of the spatial organization of the EEG/iEEG spectrum, including prior studies examining these properties within the context of visual cortex and RF mapping. Finally, I do note that the author's approach to these questions do reflect an important test of prior findings, via an improved approach to RF characterization and iEEG frequency isolation, which suggests some important differences with prior work.

      - Statistical testing: The authors employ many important controls in their processing of data. However, for many results there is only a qualitative description or summary metric. It appears very little statistical testing was performed to establish reported differences. Related to this point, the iEEG data is highly nested, with multiple electrodes (observations) coming from each subject, how was this nesting addressed to avoid bias?

      [Editors' note: the authors have addressed the original concerns.]

    4. Reviewer #3 (Public Review):

      Summary:

      This study tackles the important subject of sensory driven suppression of alpha oscillations using a unique intracranial dataset in human patients. Using a model-based approach to separate changes in alpha oscillations from broadband power changes, the authors try to demonstrate that alpha suppression is spatially tuned, with similar center location as high broadband power changes, but much larger receptive field. They also point to interesting differences between low-order (V1-V3) and higher-order (dorsolateral) visual cortex. While I find some of the methodology convincing, I also find significant parts of the data analysis, statistics and their presentation incomplete. Thus, I find that some of the main claims are not sufficiently supported. If these aspects could be improved upon, this study could potentially serve as an important contribution to the literature with implications for invasive and non-invasive electrophysiological studies in humans.

      Strengths:

      The study utilizes a unique dataset (ECOG & high-density ECOG) to elucidate an important phenomenon of visually driven alpha suppression. The central question is important and the general approach is sound. The manuscript is clearly written and the methods are generally described transparently (and with reference to the corresponding code used to generate them). The model-based approach for separating alpha from broadband power changes is especially convincing and well-motivated. The link to exogenous attention behavioral findings (figure 8) is also very interesting. Overall, the main claims are potentially important, but they need to be further substantiated (see weaknesses).

      Original Weaknesses:

      I have three major concerns:

      (1) Low N / no single subject results/statistics: The crucial results of Figure 4,5 hang on 53 electrodes from four patients (Table 2). Almost half of these electrodes (25/53) are from a single subject. Data and statistical analysis seem to just pool all electrodes, as if these were statistically independent, and without taking into account subject-specific variability. The mean effect per each patient was not described in text or presented in figures. Therefore, it is impossible to know if the results could be skewed by a single unrepresentative patient. This is crucial for readers to be able to assess the robustness of the results. N of subjects should also be explicitly specified next to each result.

      (2) Separation between V1-V3 and dorsolateral electrodes: Out of 53 electrodes, 27 were doubly assigned as both V1-V3 and dorsolateral (Table 2, Figures 4,5). That means that out of 35 V1-V3 electrodes, 27 might actually be dorsolateral. This problem is exacerbated by the low N; for example, all 20 electrodes in patient 8 assigned as V1-V3 might as well be dorsolateral. This double assignment didn't make sense to me and I wasn't convinced by the authors' reasoning. I think it needlessly inflates the N for comparing the two groups and casts doubt on the robustness of these analyses.

      (3) Alpha pRFs are larger than broadband pRFs: first, as broadband pRF models were on average better fit to the data than alpha pRF models (dark bars in Supp Fig 3, top row), I wonder if this could entirely explain the larger alpha pRFs (i.e. worse fits lead to larger pRFs). There was no analysis to rule out this possibility. Second, examining the entire 2.4 section closely, there wasn't any formal statistical test to back up any of the claims (not a single p-value is mentioned). It is crucial in my opinion to support each of the main claims of the paper with formal statistical testing.

      [Editors' note: the authors have addressed the original concerns.]

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary

      In this study, the authors build upon previous research that utilized non-invasive EEG and MEG by analyzing intracranial human ECoG data with high spatial resolution. They employed a receptive field mapping task to infer the retinotopic organization of the human visual system. The results present compelling evidence that the spatial distribution of human alpha oscillations is highly specific and functionally relevant, as it provides information about the position of a stimulus within the visual field.

      Using state-of-the-art modeling approaches, the authors not only strengthen the existing evidence for the spatial specificity of the human dominant rhythm but also provide new quantification of its functional utility, specifically in terms of the size of the receptive field relative to the one estimated based on broad band activity.

      We thank the reviewer for their positive summary.

      Weakness 1.1

      The present manuscript currently omits the complementary view that the retinotopic map of the visual system might be related to eye movement control. Previous research in non-human primates using microelectrode stimulation has clearly shown that neuronal circuits in the visual system possess motor properties (e.g. Schiller and Stryker 1972, Schiller and Tehovnik 2001). More recent work utilizing Utah arrays, receptive field mapping, and electrical stimulation further supports this perspective, demonstrating that the retinotopic map functions as a motor map. In other words, neurons within a specific area responding to a particular stimulus location also trigger eye movements towards that location when electrically stimulated (e.g. Chen et al. 2020).

      Similarly, recent studies in humans have established a link between the retinotopic variation of human alpha oscillations and eye movements (e.g., Quax et al. 2019, Popov et al. 2021, Celli et al. 2022, Liu et al. 2023, Popov et al. 2023). Therefore, it would be valuable to discuss and acknowledge this complementary perspective on the functional relevance of the presented evidence in the discussion section.

      The reviewer notes that we do not discuss the oculomotor system and alpha oscillations. We agree that the literature relating eye movements and alpha oscillations is relevant.

      At the Reviewer’s suggestion, we added a paragraph on this topic to the first section of the Discussion (section 3.1, “Other studies have proposed … “).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Yuasa et al. aimed to study the spatial resolution of modulations in alpha frequency oscillations (~10Hz) within the human occipital lobe. Specifically, the authors examined the receptive field (RF) tuning properties of alpha oscillations, using retinotopic mapping and invasive electroencephalogram (iEEG) recordings. The authors employ established approaches for population RF mapping, together with a careful approach to isolating and dissociating overlapping, but distinct, activities in the frequency domain. Whereby, the authors dissociate genuine changes in alpha oscillation amplitude from other superimposed changes occurring over a broadband range of the power spectrum. Together, the authors used this approach to test how spatially tuned estimated RFs were when based on alpha range activity, vs. broadband activities (focused on 70-180Hz). Consistent with a large body of work, the authors report clear evidence of spatially precise RFs based on changes in alpha range activity. However, the size of these RFs were far larger than those reliably estimated using broadband range activity at the same recording site. Overall, the work reflects a rigorous approach to a previously examined question, for which improved characterization leads to improved consistency in findings and some advance of prior work.

      We thank the reviewer for the summary.

      Strengths:

      Overall, the authors take a careful and well-motivated approach to data analyses. The authors successfully test a clear question with a rigorous approach and provide strong supportive findings. Firstly, well-established methods are used for modeling population RFs. Secondly, the authors employ contemporary methods for dissociating unique changes in alpha power from superimposed and concomitant broadband frequency range changes. This is an important confound in estimating changes in alpha power not employed in prior studies. The authors show this approach produces more consistent and robust findings than standard band-filtering approaches. As noted below, this approach may also account for more subtle differences when compared to prior work studying similar effects.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Weakness 2.1 Theoretical framing:

      The authors frame their study as testing between two alternative views on the organization, and putative functions, of occipital alpha oscillations: i) alpha oscillation amplitude reflects broad shifts in arousal state, with large spatial coherence and uniformity across cortex; ii) alpha oscillation amplitude reflects more specific perceptual processes and can be modulated at local spatial scales. However, in the introduction this framing seems mostly focused on comparing some of the first observations of alpha with more contemporary observations. Therefore, I read their introduction to more reflect the progress in studying alpha oscillations from Berger's initial observations to the present. I am not aware of a modern alternative in the literature that posits alpha to lack spatially specific modulations. I also note this framing isn't particularly returned to in the discussion.

      This was helpful feedback. We have rewritten nearly the entire Introduction to frame the study differently. The emphasis is now on the fact that several intracranial studies of spatial tuning of alpha (in both human and macaque) tend to show increases in alpha due to visual stimulation, in contrast to a century of MEG/EEG studies, from Berger to the present, showing decreases. We believe that the discrepancy is due to an interaction between measurement type and brain signals. Specifically, intracranial measurements sum decreases in alpha oscillations and increases in broadband power on the same trials, and both signals can be large. In contrast, extracranial measures are less sensitive to the broadband signals and mostly just measure the alpha oscillation. Our study reconciles this discrepancy by removing the baseline broadband power increases, thereby isolating the alpha oscillation, and showing that with iEEG spatial analyses, the alpha oscillation decreases with visual stimulation, consistent with EEG and MEG results.

      Weakness 2.2 A second important variable here is the spatial scale of measurement.

      It follows that EEG based studies will capture changes in alpha activity up to the limits of spatial resolution of the method (i.e. limited in ability to map RFs). This methodological distinction isn't as clearly mentioned in the introduction, but is part of the author's motivation. Finally, as noted below, there are several studies in the literature specifically addressing the authors question, but they are not discussed in the introduction.

      The new Introduction now explicitly contrasts EEG/MEG with intracranial studies and refers to the studies below.

      Weakness 2.3 Prior studies:

      There are important findings in the literature preceding the author's work that are not sufficiently highlighted or cited. In general terms, the spatio-temporal properties of the EEG/iEEG spectrum are well known (i.e. that changes in high frequency activity are more focal than changes in lower frequencies). Therefore, the observations of spatially larger RFs for alpha activities is highly predicted. Specifically, prior work has examined the impact of using different frequency ranges to estimate RF properties, for example ECoG studies in the macaque by Takura et al. NeuroImage (2016) [PubMed: 26363347], as well as prior ECoG work by the author's team of collaborators (Harvey et al., NeuroImage (2013) [PubMed: 23085107]), as well as more recent findings from other groups (Luo et al., (2022) BioRxiv: https://doi.org/10.1101/2022.08.28.505627). Also, a related literature exists for invasively examining RF mapping in the time-voltage domain, which provides some insight into the author's findings (as this signal will be dominated by low-frequency effects). The authors should provide a more modern framing of our current understanding of the spatial organization of the EEG/iEEG spectrum, including prior studies examining these properties within the context of visual cortex and RF mapping. Finally, I do note that the author's approach to these questions do reflect an important test of prior findings, via an improved approach to RF characterization and iEEG frequency isolation, which suggests some important differences with prior work.

      Thank you for these references and suggestions. Some of the references were already included, and the others have been added.

      There is one issue where we disagree with the Reviewer, namely that “the observations of spatially larger RFs for alpha activities is highly predicted”. We agree that alpha oscillations and other low frequency rhythms tend to be less focal than high frequency responses, but there are also low frequency non-rhythmic signals, and these can be spatially focal. We show this by demonstrating that pRFs solved using low frequency responses outside the alpha band (both below and above the alpha frequency) are small, similar to high frequency broadband pRFs, but differing from the large pRFs associated with alpha oscillations. Hence we believe the degree to which signals are focal is more related to the degree of rhythmicity than to the temporal frequency per se. While some of these results were already in the supplement, we now address the issue more directly in the main text in a new section called, “2.5 The difference in pRF size is not due to a difference in temporal frequency.”

      We incorporated additional references into the Introduction, added a new section on low frequency broadband responses to the Results (section 2.5), and expanded the Discussion (section 3.2) to address these new references.

      Weakness 2.4 Statistical testing:

      The authors employ many important controls in their processing of data. However, for many results there is only a qualitative description or summary metric. It appears very little statistical testing was performed to establish reported differences. Related to this point, the iEEG data is highly nested, with multiple electrodes (observations) coming from each subject, how was this nesting addressed to avoid bias?

      We reviewed the primary claims made in the manuscript and, for each claim, we specify the supporting analyses and, where appropriate, how we address the issue of nesting. Although some of these analyses were already in the manuscript, many of them are new, including all of the analyses concerning nesting. We believe that putting this information in one place will be useful to the reader, and we now include this text as a new section in the supplement, “Graphical and statistical support for primary claims.”
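
      The response above does not describe how nesting was handled in practice. Purely as an illustration of the reviewer's concern, one common way to respect electrodes nested within subjects is a two-level bootstrap that resamples subjects first and then electrodes within each resampled subject. The sketch below is an assumption-laden example, not the authors' analysis: the function name `nested_bootstrap_ci`, the choice of the median as the statistic, and the toy values are all hypothetical.

```python
import random
import statistics


def nested_bootstrap_ci(values_by_subject, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the median of a nested dataset.

    values_by_subject maps a subject label to the list of per-electrode
    values from that subject. Each bootstrap draw resamples subjects with
    replacement, then resamples electrodes within each drawn subject, so
    that between-subject variability contributes to the interval.
    """
    rng = random.Random(seed)
    subjects = list(values_by_subject)
    boot_stats = []
    for _ in range(n_boot):
        sampled = []
        for subject in rng.choices(subjects, k=len(subjects)):
            electrodes = values_by_subject[subject]
            sampled.extend(rng.choices(electrodes, k=len(electrodes)))
        boot_stats.append(statistics.median(sampled))
    boot_stats.sort()
    lower = boot_stats[int((alpha / 2) * n_boot)]
    upper = boot_stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical per-electrode values (e.g. a pRF size ratio) for three subjects.
toy = {"p1": [1.0, 1.2, 0.9], "p2": [2.0, 2.1], "p3": [1.5]}
ci_low, ci_high = nested_bootstrap_ci(toy)
```

      Compared with pooling all electrodes as if they were independent, this kind of resampling typically widens the interval when a single subject contributes many electrodes.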

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 2.1:

      Data presentation: In several places, the authors discuss important features of cortical responses as measured with iEEG that need to be carefully considered. This is totally appropriate and a strength of the author's work, however, I feel the reader would benefit from more depiction of the time-domain responses, to help better understand the authors frequency domain approach. For example, Figure 1 would benefit from showing some form of voltage trace (ERP) and spectrogram, not just the power spectra. In addition, part (a) of Figure 1 could convey some basic information about the timing of the experimental paradigm.

      We changed panel A of Figure 1 to include the timing of the experimental paradigm, and we added panels C and D to show the electrode time series before and after regressing out the ERP.

      Recommendation 2.2

      Update introduction to include references to prior EEG/iEEG work on spatial distribution across frequency spectrum, and importantly, prior work mapping RFs with different frequencies.

      We have addressed this issue and re-written our introduction. Please refer to our response in Public Review for further details.

      Recommendation 2.3

      Figure 3 has several panels and should be labeled to make it easier to follow. The dashed line in lower power spectra isn't defined in a legend and is missing from the upper panel - please clarify.

      We updated Figure 3 and reordered the panels to clarify how we computed the summary metrics in broadband and alpha for each stimulus location (i.e., the “ratio” values plotted in panel B). We also simplified the plot of the alpha power spectrum. It now shows a dashed line representing a baseline-corrected response to the mapping stimulus, which is defined in the legend and explained in the caption.

      Recommendation 2.4

      Power spectra are always shown without error shading, but they are mean estimates.

      We added error shading to Figures 1, 2 and 3.

      Recommendation 2.5

      The authors deal with voltage transients in response to visual stimulation by subtracting out the trial-averaged mean (commonly performed). However, the efficacy of this approach depends on signal quality and so some form of depiction for this processing step is needed.

      We added a depiction of the processing steps for regressing out the averaged responses in Figure 1 in an example electrode (panels C and D). We also show in the supplement the effect of regressing out the ERP on all the electrode pRFs. We have added Supplementary Figure 1-2.
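
      As a rough illustration of the processing step discussed in this recommendation, the sketch below removes a trial-averaged response (ERP) from each trial by least-squares regression. It is a minimal pure-Python sketch and not the authors' code: the function name `regress_out_erp`, the choice to fit one scaling coefficient per trial, and the toy data are assumptions made for illustration.

```python
def regress_out_erp(trials):
    """Remove the trial-averaged response (ERP) from each trial.

    trials is a list of equal-length lists of voltage samples, all from the
    same stimulus condition. For each trial, the ERP is scaled by the
    least-squares coefficient beta = <trial, erp> / <erp, erp> and then
    subtracted, so every residual is orthogonal to the ERP.
    Returns (erp, residuals).
    """
    n_trials = len(trials)
    n_samples = len(trials[0])
    erp = [sum(trial[k] for trial in trials) / n_trials for k in range(n_samples)]
    erp_energy = sum(e * e for e in erp)
    residuals = []
    for trial in trials:
        if erp_energy > 0:
            beta = sum(x * e for x, e in zip(trial, erp)) / erp_energy
        else:
            beta = 0.0  # flat ERP: nothing to regress out
        residuals.append([x - beta * e for x, e in zip(trial, erp)])
    return erp, residuals


# Toy example with two short "trials" (hypothetical values).
erp, residuals = regress_out_erp([[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]])
```

      Plain subtraction of the mean, as described by the reviewer, corresponds to fixing the coefficient at 1 for every trial; fitting it per trial lets the evoked component vary in amplitude across trials.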

      Recommendation 2.6

      I have a similar request for the authors latency correction of their data, where they identified a timing error and re-aligned the data without ground truth. Again, this is appropriate, but some depiction of the success of this correction is very critical for confirming the integrity of the data.

      We now report more detail on the latency correction, and also point out that any small error in the estimate would not affect our conclusions (4.6 ECoG data analysis | Data epoching). The correction was important for a prior paper on temporal dynamics (Groen et al, 2022), which used data from the same participants and estimated the latency of responses. In this paper, our analyses are in the spectral domain (and discard phase), so small temporal shifts are not critical. We now also link to the public code associated with that paper, which implemented the adjustment and quantified the uncertainty in the latency adjustment.

      More details on the latency adjustment are provided in section 4.6.

      Recommendation 2.7

      In many places the authors report their data shows a 'summary' value, please clarify if this means averaging or summation over a range.

      For both broadband and alpha, we derive one summary value (a scalar) per trial for each stimulus. For broadband, the summary metric is the ratio of power during a given trial to power during blanks, where power in a trial is the geometric mean of the power at each frequency within the defined band. This is equation 3 in the methods, which is now referred to the first time that summary metrics are mentioned in the results. For alpha, the summary metric is the height of the Gaussian from our model-based approach. This is defined in equations 1 and 2, which are also now referred to the first time summary metrics are mentioned in the results.

      We added explanation of the summary metrics in the figure captions and results where they are first used, and also referred to the equations in the methods where they are defined.
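
      The broadband summary metric described in this response can be sketched in a few lines. This is an illustrative reading of the verbal description, not a transcription of the manuscript's equation 3: the function names are hypothetical, the 70-180 Hz default band is taken from Reviewer #2's summary, and applying the same geometric mean to the blank-period spectrum is an assumption.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in the log domain."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def broadband_ratio(freqs, trial_power, blank_power, band=(70.0, 180.0)):
    """Scalar broadband summary for one trial.

    freqs, trial_power and blank_power are equal-length sequences giving the
    frequency of each spectral bin (Hz) and the power in that bin for the
    trial and for the blank baseline. The result is the geometric mean of
    in-band trial power divided by the geometric mean of in-band blank power.
    """
    in_band = [i for i, f in enumerate(freqs) if band[0] <= f <= band[1]]
    if not in_band:
        raise ValueError("no frequency bins fall inside the requested band")
    trial_gm = geometric_mean([trial_power[i] for i in in_band])
    blank_gm = geometric_mean([blank_power[i] for i in in_band])
    return trial_gm / blank_gm


# Toy spectra (hypothetical): in-band power is four times the blank level.
freqs = list(range(1, 201))
blank = [1.0] * len(freqs)
trial = [4.0 if 70 <= f <= 180 else 1.0 for f in freqs]
ratio = broadband_ratio(freqs, trial, blank)
```

      Averaging in the log domain keeps a few high-power bins from dominating the summary, which is one reason a geometric mean is a natural choice for spectra that fall off steeply with frequency.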

      Recommendation 2.8

      The authors conclude: "we have discovered that spectral power changes in the alpha range reflect both suppression of alpha oscillations and elevation of broadband power." It might not have been the intention, but 'discovered' seems overstated.

      We agree and changed this sentence.

      Recommendation 2.9

      Supp Fig 9 is a great effort by the authors to convey their findings to the reader, it should be a main figure.

      We are glad you found Supplementary Figure 9 valuable. We moved this figure to the main text.

      Reviewer #3 (Public Review):

      Summary:

      This study tackles the important subject of sensory driven suppression of alpha oscillations using a unique intracranial dataset in human patients. Using a model-based approach to separate changes in alpha oscillations from broadband power changes, the authors try to demonstrate that alpha suppression is spatially tuned, with similar center location as high broadband power changes, but much larger receptive field. They also point to interesting differences between low-order (V1-V3) and higher-order (dorsolateral) visual cortex. While I find some of the methodology convincing, I also find significant parts of the data analysis, statistics and their presentation incomplete. Thus, I find that some of the main claims are not sufficiently supported. If these aspects could be improved upon, this study could potentially serve as an important contribution to the literature with implications for invasive and non-invasive electrophysiological studies in humans.

      We thank the reviewer for the summary.

      Strengths:

      The study utilizes a unique dataset (ECOG & high-density ECOG) to elucidate an important phenomenon of visually driven alpha suppression. The central question is important and the general approach is sound. The manuscript is clearly written and the methods are generally described transparently (and with reference to the corresponding code used to generate them). The model-based approach for separating alpha from broadband power changes is especially convincing and well-motivated. The link to exogenous attention behavioral findings (figure 8) is also very interesting. Overall, the main claims are potentially important, but they need to be further substantiated (see weaknesses).

      We thank the reviewer for the positive comments.

      Weaknesses:

      I have three major concerns:

      Weakness 3.1. Low N / no single subject results/statistics:

      The crucial results of Figure 4,5 hang on 53 electrodes from four patients (Table 2). Almost half of these electrodes (25/53) are from a single subject. Data and statistical analysis seem to just pool all electrodes, as if these were statistically independent, and without taking into account subject-specific variability. The mean effect per each patient was not described in text or presented in figures. Therefore, it is impossible to know if the results could be skewed by a single unrepresentative patient. This is crucial for readers to be able to assess the robustness of the results. N of subjects should also be explicitly specified next to each result.

      We have made substantial changes to deal with subject-specific effects, including new results and new figures.

      • Figure 4 now shows variance explained by the alpha pRF broken down by each participant for electrodes in V1 to V3. We also now show a similar figure for dorsolateral electrodes in Supplementary Figure 4-2.

      • Figure 5, which shows results from individual electrodes in V1 to V3, now includes color coding of electrodes by participant to make it clear how the electrodes group with participant. Similarly, for dorsolateral electrodes, we show electrodes grouped by participant in Supplementary Figure 5-1. Same for Supplementary Figure 6-2.

      • Supplementary Figure 7-2 now shows the benefits of our model-based approach for estimating alpha broken down by individual participants.

      • We also now include a new section in the supplement that summarizes, for every major claim, what the supporting data are and how we addressed the issue of nesting electrodes by participant (section “Graphical and statistical support for primary claims”).

      Weakness 3.2. Separation between V1-V3 and dorsolateral electrodes:

      Out of 53 electrodes, 27 were doubly assigned as both V1-V3 and dorsolateral (Table 2, Figures 4,5). That means that out of 35 V1-V3 electrodes, 27 might actually be dorsolateral. This problem is exacerbated by the low N. For example, all the 20 electrodes in patient 8 assigned as V1-V3 might as well be dorsolateral. This double assignment didn't make sense to me and I wasn't convinced by the authors' reasoning. I think it needlessly inflates the N for comparing the two groups and casts doubts on the robustness of these analyses.

      Electrode assignment was probabilistic to reflect uncertainty in the mapping between location and retinotopic map. The probabilistic assignment is handled in two ways.

      (1) For visualizing results of single electrodes, we simply go with the maximum probability, so no electrode is visualized for both groups of data. For example, Figure 5a (V1-V3) and supplementary Figure 5-1a (dorsolateral electrodes) have no electrodes in common: no electrode is in both plots.

      (2) For quantitative summaries, we sample the electrodes probabilistically (for example Figures 4, 5c). So, if for example, an electrode has a 20% chance of being in V1 to V3, and 30% chance of being in dorsolateral maps, and a 50% chance of being in neither, the data from that electrode is used in only 20% of V1-V3 calculations and 30% of dorsolateral calculations. In 50% of calculations, it is not used at all. This process ensures that an electrode with uncertain assignment makes no more contribution to the results than an electrode with certain assignment. An electrode with a low probability of being in, say, V1-V3, makes little contribution to any reported results about V1-V3. This procedure is essentially a weighted mean, which the reviewer suggests in the recommendations. Thus, we believe there is not a problem of “double counting”.
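
      The sampling logic in point (2) can be sketched as follows. This is a minimal illustration rather than the analysis code used for the paper: the single electrode, the random seed, and the number of draws are hypothetical, and only the 20% / 30% / 50% probabilities come from the example above. On each draw the electrode is assigned to exactly one category, so it can never contribute to both groups in the same calculation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Assignment probabilities for one hypothetical electrode (from the example above).
labels = np.array(["V1-V3", "dorsolateral", "neither"])
probs = np.array([0.2, 0.3, 0.5])

# Each draw stands for one resampled calculation; the electrode lands in
# exactly one category per draw.
n_draws = 100_000
draws = rng.choice(labels, size=n_draws, p=probs)

frac_v1v3 = np.mean(draws == "V1-V3")
frac_dorsolateral = np.mean(draws == "dorsolateral")
frac_unused = np.mean(draws == "neither")

print(frac_v1v3, frac_dorsolateral, frac_unused)
```

      Over many draws the electrode enters roughly 20% of the V1-V3 calculations and 30% of the dorsolateral calculations, which is why the procedure behaves like a probability-weighted mean.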

      The alternative would have been to use maximum probability for all calculations. However, we think that doing so would be misleading, since it would not take into account uncertainty of assignment, and would thus overstate differences in results between the maps.

      We now clarify in the Results that for probabilistic calculations, the contribution of an electrode is limited by the likelihood of assignment (Section 2.3). We also now explain in the methods why we think probabilistic sampling is important.

      Weakness 3.3. Alpha pRFs are larger than broadband pRFs:

      First, as broadband pRF models were on average better fit to the data than alpha pRF models (dark bars in Supp Fig 3. Top row), I wonder if this could entirely explain the larger Alpha pRF (i.e. worse fits lead to larger pRFs). There was no analysis to rule out this possibility.

      We addressed this question in a new paragraph in Discussion section 3.1 (“What is the function of the large alpha pRFs?”, paragraph beginning… “Another possible interpretation is that the poorer model fit in the alpha pRF is due to lower signal-to-noise”). This paragraph both refers to prior work on the relationship between noise and pRF size and to our own control analyses (Supplementary Figure 5-2).

      Weakness 3.4 Statistics

      Second, examining closely the entire 2.4 section there wasn't any formal statistical test to back up any of the claims (not a single p-value is mentioned). It is crucial in my opinion to support each of the main claims of the paper with formal statistical testing.

      We agree that it is important for the reader to be able to link specific results and analyses to specific claims. We are not convinced that null hypothesis statistical testing is always the best approach. This is a topic of active debate in the scientific community.

      We added a new section that concisely states each major claim and explicitly annotates the supporting evidence. (Section 4.7). Please also refer to our responses to Reviewer #2 regarding statistical testing (Reviewer weakness 2.4 “Statistical testing”)

      Weakness 3.5 Summary

      While I judge these issues as crucial, I can also appreciate the considerable effort and thoughtfulness that went into this study. I think that addressing these concerns will substantially raise the confidence of the readership in the study's findings, which are potentially important and interesting.

      We again thank the reviewer for the positive comments.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for how to address the three major concerns:

      Suggestion 3.1.

      I am very well aware that it's very hard to have n=30 in a visual cortex ECOG study. That's fine. Best practice would be to have a linear mixed effects model with patients as a random effect. However, for some figures with just 3-4 patients (Figure 4,5) the sample size might be too small even for that. At the very minimum, I would expect to show in figures/describe in text all results per patient (perhaps one can do statistics within each patient, and show for each patient that the effect is significant). Even in primate studies with just two subjects it is expected to show that the results replicate for subject A and B. It is necessary to show that your results don't depend on a single unrepresentative subject. And if they do, at least be transparent about it.

      We have addressed this thoroughly. Please see response to Weakness 3.1 (“Low N / no single subject results/statistics”).

      Suggestion 3.2.

      I just don't get it. I would simply assign an electrode to V1-V3 or dorsolateral cortex based on which area has the highest probability. It doesn't make sense to me that an electrode that has 60% of being in dorsolateral cortex and only 10% to be in V1-V3 would be assigned as both V1-V3 and dorsolateral. Also, what's the rationale to include such electrode in the analysis for let's say V1-V3 (we have weak evidence to believe it's there)? I would either assign electrodes based on the highest probability, or alternatively do a weighted mean based on the probability of each electrode belonging to each region group (e.g. electrode with 40% to be in V1-V3, will get twice the weight as an electrode who has 20% to be in V1-V3) but this is more complicated.

      We have addressed this issue. Please refer to our response in Public Review (“Weakness 3.2 Separation between V1-V3 and dorsolateral”) for details.

      Suggestion 3.3.

      First, to exclude the possibility that alpha pRF are larger simply because they have a worse fit to the neural data, I would show if there is a correlation between the goodness-of-fit and pRF size (for alpha and broadband signals, separately). No [negative] correlation between goodness-of-fit and pRF size would be a good sign. I would also compare alpha & broadband receptive field size when controlling for the goodness-of-fit (selecting electrodes with similar goodness-of-fit for both signals). If the results replicate this way it would be convincing.

      Second, there are no statistical tests in section 2.4, possibly also in others. Even if you employ bootstrap / Monte-Carlo resampling methods you can extract a p-value.

      We have addressed this issue. Please refer to our response in Public Review Point 3.3 (“Alpha pRFs are larger than broadband pRFs”) for further details.

      Suggestion 3.4.

      Also, I don't understand the resampling procedure described in lines 652-660: "17.7 electrodes were assigned to V1-V3, 23.2 to dorsolateral, and 53 to either " - but 17.7 + 23.2 doesn't add up to 53. It also seems as if you assign visual areas differently in this resampling procedure than in the real data - "and randomly assigned each electrode to a visual area according to the Wang full probability distributions". If you assign in your actual data 27 electrodes to both visual areas, the same should be done in the resampling procedure (I would expect exactly 35 V1-V3 and 45 dorsolateral electrodes in every resampling, just the pRFs will be shuffled across electrodes).

      We apologize for the confusion.

      We fixed the sentence above, clarified the caption to Table 2, and also explained the overall strategy of probabilistic resampling better. See response to Public Review point 3.2 for details.

      Suggestion 3.5.

      These are rather technical comments but I believe they are crucial points to address in order to support your claims. I genuinely think your results are potentially interesting and important but these issues need to be first addressed in a revision. I also think your study may carry implications beyond just the visual domain, as alpha suppression is observed for different sensory modalities and cortical regions. Might be useful to discuss this in the discussion section.

      Agree. We added a paragraph on this point to the Discussion (very end of 3.2).

    1. eLife Assessment

      This fundamental study examines whether synaptic cell adhesion molecules neuroligin 1-3 resident on astrocytes, rather than neurons, exert effect on synaptic structure and function. With compelling evidence, the authors report that deletion of neuroligins 1-3 specifically in astrocytes does not alter synapse formation or astrocyte morphology in the hippocampus or visual cortex. This study highlights the specific role of neuronal neuroligins rather than their astrocytic counterparts in synaptogenesis.

    2. Reviewer #1 (Public review):

      Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well characterized, astrocyte specific, and inducible cre line. By using such selective mouse genetic methods, the authors here show that astrocytic neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.

      Comments on revisions:

      My previous comments have been addressed. I have no additional points to make and congratulate the authors.

    3. Reviewer #2 (Public review):

      In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin (Nlgn) family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Nlgn1-3 specifically from astrocytes in mice, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. To further extend this finding, the authors additionally analyzed human neurons co-cultured with mouse glia lacking expression of Nlgn1-4. No difference in excitatory synaptic transmission was observed between neurons cultured in the presence of wildtype vs. Nlgn1-4 conditional knockout glia. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.

      Overall, this is a strong study that addresses a fundamental and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, Nlgn1-3 are efficiently deleted from astrocytes in vivo, and that this deletion does not lead to major alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes. While the co-culture experiments are somewhat more difficult to interpret due to lack of a control for the effect of wildtype mouse astrocytes on human neurons, they are also consistent with the notion that deletion of Nlgn1-4 from astrocytes has no consequences for the function of excitatory synapses. Together, the data from this study provide compelling and important evidence that, whatever the role of astrocytic Neuroligins may be, they do not contribute substantially to synapse formation or function under the conditions investigated.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well characterized, astrocyte specific, and inducible cre line. By using such selective mouse genetic methods, the authors here show that astrocytic neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.

      We thank the reviewer for the balanced and informative summary.

      Reviewer #2 (Public Review):

      In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Neuroligins 1, 2 and 3 specifically from astrocytes, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.

      Overall, this is a strong study that addresses an important and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, no alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes, are observed.

      We are also grateful for this reviewer’s constructive comments.

      One caveat to this study is that the authors do not directly provide evidence that their Tamoxifen-inducible conditional deletion paradigm does indeed result in efficient deletion of all three Neuroligins from astrocytes. Using a Cre-dependent tdTomato reporter line, they show that tdTomato expression is efficiently induced by the current paradigm, and they refer to a prior study showing efficient deletion of Neuroligins from neurons using the same conditional Nlgn1-3 mouse lines but a different Cre driver strategy. However, neither of these approaches directly provide evidence that all three Neuroligins are indeed deleted from astrocytes in the current study. In contrast, Stogsdill et al. employed FACS and qPCR to directly quantify the loss of Nlgn2 mRNA from astrocytes. This leaves the current Golf et al. study somewhat vulnerable to the criticism, however unlikely, that their lack of synaptic effects may be a consequence of incomplete Neuroligin deletion, rather than a true lack of effect of astrocytic Neuroligins.

      The concern is valid. In the original submission of this paper, we did not establish that the Cre recombinase we used actually deleted neuroligins in astrocytes. We have now addressed this issue in the revised paper with new experiments as described below.

      However, the reviewer’s impression that the Stogsdill et al. paper confirmed full deletion of Nlgn2 is a misunderstanding of the data in that paper. The reviewer is correct that Stogsdill et al. performed FACS to test the efficacy of the GLAST-Cre mediated deletion of Nlgn2-flox mice, followed by qRT-PCR comparing heterozygous with homozygous mutant mice. With their approach, no wild-type control could be used, as these would lack reporter expression. However, this experiment does NOT allow conclusions about the degree of recombination, both overall recombination (i.e. recombination in all astrocytes regardless of TdT+) and recombination in TdT+ astrocytes because it doesn’t quantify recombination. To quantify the degree of recombination, the paper would have had to perform genomic PCR measurements.  

      The problem with the data on the degree of recombination in the Stogsdill et al. (2017) paper, as we understand them, is two-fold.

      First, the GLAST-Cre line only targets ~40-70% of astrocytes, at least as evidenced by highly sensitive Cre-reporter mice in a variety of studies using this Cre line. The 40-70% variation is likely due to differences in the reporter mice and the tamoxifen injection schedule used. In comparison, we are targeting most astrocytes using the Aldh1l1-CreERT2 mice. Moreover, GLAST-Cre mice exhibit neuronal off-targeting, consistent with at least some of the remaining Nlgn2 qRT-PCR signal in the FACS-sorted cells. As we describe next, this signal also likely comes from astrocytes where recombination was incomplete. This is the reason why we, like everyone else, are now using the Aldh1l1-Cre line that has been shown to be more efficient both in terms of the overall targeting of astrocytes (i.e. nearly complete) and the level of recombination observed in reporter(+) astrocytes.

      Second, Stogsdill et al. detected a significant decrease in the Nlgn2 qRT-PCR signal in the FACS-sorted homozygous Nlgn2 KO cells compared to the heterozygous Nlgn2 KO cells, but the Nlgn2 qRT-PCR signal was still quite large. The data are presented as normalized to the HET condition. As a result, we don’t know the true level of gene deletion (i.e. compared to TdT(-) astrocytes). For example, based on the Stogsdill et al. data the HET manipulation could have induced only a 20% reduction in Nlgn2 mRNA levels in TdT(+) astrocytes, in which case the KO would have produced a 40% reduction in Nlgn2 mRNA in TdT(+) astrocytes. Moreover, it is possible, based on our own experience with the GLAST-Cre line, that the reporter may also not turn on in some astrocytes where other alleles have been independently recombined, just as some astrocytes that are TdT(+) would still be wild-type or heterozygous for Nlgn2. Thus, it is impossible to calculate the actual percentage of recombination from these data, even in TdT(+) cells, absent PCR of genomic DNA from isolated cells. Alternatively, comparison of mRNA levels using primers sensitive to floxed sequences in wild-type controls versus cKO mice would have also yielded a much better idea of the recombination efficiency.
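
      The normalization problem can be made concrete with a small worked example. The sketch below is purely illustrative: the function name is hypothetical, the first scenario uses the 20% / 40% figures from the paragraph above, and the second scenario is an invented contrast that does not assume the KO reduction is twice the HET reduction. It uses no values from Stogsdill et al.

```python
def ko_relative_to_het(het_reduction, ko_reduction):
    """KO mRNA level expressed relative to HET.

    Each argument is the fractional reduction from the wild-type level,
    which is exactly the quantity a HET-normalized readout does not report.
    """
    return (1.0 - ko_reduction) / (1.0 - het_reduction)


# Scenario from the text: HET removes 20% and KO removes 40% of Nlgn2 mRNA.
marginal = ko_relative_to_het(0.20, 0.40)

# Hypothetical contrast: a much deeper deletion in both genotypes.
substantial = ko_relative_to_het(0.50, 0.625)

# Both scenarios give a HET-normalized value of about 0.75.
print(marginal, substantial)
```

      Because two very different absolute deletion levels produce the same HET-normalized value, the normalized signal alone cannot distinguish them without a wild-type reference.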

      In summary, it is unclear whether the Nlgn2 deletion in the Stogsdill et al. paper was substantial or marginal – it is simply impossible to tell.

      Reviewer #3 (Public Review):

      This study investigates the roles of astrocytes in the regulation of synapse development and astrocyte morphology using conditional KO mice carrying mutations of neuroligins 1-3 in astrocytes with the deletion starting at two different time points (P1 and P10/11). The authors use morphological, electrophysiological, and cell-biological approaches and find that there are no differences in synapse formation and astrocyte cytoarchitecture in the mutant hippocampus and visual cortex. These results differ from the previous results (Stogsdill et al., 2017), although the authors make several discussion points on how the differences could have been induced. This study provides important information on how astrocytes and neurons interact with each other to coordinate neural development and function. The experiments were well-designed, and the data are of high quality.

      We also thank this reviewer for helpful comments!

      Recommendations for the authors:

      This project was meant to rigorously test the intriguing overall question whether neuroligins, which are abundantly expressed in astrocytes, regulate synapse formation as astrocytic synapse organizers. The goal of the paper was NOT to confirm or dispute the conclusion by Stogsdill et al. (Nature 2017) that Nlgn2 expressed in astrocytes is essential for excitatory synapse formation and that astrocytic Nlgn1-3 are required for proper astrocyte morphogenesis. Instead, the project was meant to address the much broader question whether the abundant expression of any neuroligin, not just Nlgn2, in astrocytes is essential for neuronal excitatory or inhibitory synapse formation and/or for the astrocyte cytoarchitecture. We felt that this was an important question independent of the Stogsdill et al. paper. We analyzed in our experiments young adult mice, a timepoint that was chosen deliberately to avoid the possibility of observing a possible developmental delay rather than a fundamental function that extends beyond development.

      We do recognize that the conclusion by Stogsdill et al. (2017) that Nlgn2 expression in astrocytes is essential for excitatory synapse formation was very exciting to the field but contradicted a large literature demonstrating that Nlgn2 protein is exclusively localized to inhibitory synapses and absent from excitatory synapses (to name just a few papers, see Graf et al., Cell 2004; Varoqueaux et al., Eur. J. Cell Biol. 2004; Patrizi et al., PNAS 2008;  Hoon et al., J. Neurosci. 2009). In addition, the conclusion of Stogsdill et al. that astrocytic Nlgn2 specifically drove excitatory synapse formation was at odds with previous findings documenting that the constitutive deletion of Nlgn2 in all cells, including astrocytes, has no effect on excitatory synapse numbers (again, to name a few papers, see Varoqueaux et al., Neuron 2006; Blundell et al., Genes Brain Behav. 2008; Poulopoulos et al., Neuron 2009; Gibson et al., J. Neurosci. 2009). These contradictions conferred further urgency to our project, but please note that this project was primarily driven by our curiosity about the function of astrocytic neuroligins, not by a fruitless desire to test the validity of one particular Nature paper.

      The general goal of our paper notwithstanding, few papers from our lab have received as much attention and as many negative comments on social media as this paper when it was published as a preprint. Because we take these criticisms seriously, we have over the last year performed extensive additional experiments to ensure that our findings are well founded. We feel that, on balance, our data are incompatible with the notion that astrocytic neuroligins play a fundamental role in excitatory synapse formation but are consistent with other prior findings obtained with neuroligin KO mice. In the new data we added to the paper, we not only characterized the Cre-mediated deletion of neuroligins in depth, but also employed an independent second system (human neurons cultured on mouse glia) to further validate our conclusions as described below. Although we believe that our results are incompatible with the notion that astrocytic neuroligins fundamentally regulate excitatory or inhibitory synapse formation, we also conclude with regret that we still don’t know what astrocytic neuroligins actually do. Thus, the function of astrocytic neuroligins, as there surely must be one, remains a mystery.

      Finally, there are many possible explanations for the discrepancies between our conclusions and those of Stogsdill et al. as described in our paper. Most of these explanations are technical and may explain why not only our, but also the results of many other previous studies from multiple labs, are inconsistent with the conclusions by Stogsdill et al. (2017), as discussed in detail in the revised paper.

      Reviewer #1 (Recommendations For The Authors):

      The paper is very clear and well written. I have only one comment and that is to increase the sizes of Figs 2, 4 and 6 so that the imaging panels can be seen more clearly. Also, although I know the n numbers are provided in the figure legends, the authors may help the reader by providing them in the results when key data and findings are reported.

      We agree and have followed the reviewer’s suggestions as best as we could.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given the strength and importance of the claims that the authors make, I would highly recommend adding some quantitative evidence regarding the efficacy of deletion in astrocytes, e.g. using the same strategy as in Stogsdill et al. As unlikely as it may be that Neuroligin deletion is in fact incomplete, this possibility cannot be excluded unless directly measured. To avoid future discussions on this subject, it seems that the onus is on the authors to provide this information.

      We concur that this is an important point and have devoted a year-long effort to address it. Note, however, that the strategy employed by Stogsdill et al. does not actually allow conclusions about their recombination efficiency. As described above, it only allows the conclusion that some recombination took place. The Stogsdill et al. Nature paper (2017) is a bit confusing on this point. This approach is thus not appropriate to address the question raised by the reviewer.

      We have performed two experiments to address the issue raised by the reviewer.

      First, we used a viral (i.e. AAV2/5) approach to express Rpl22 with a triple HA-tag, also known as Ribotag, which allows us to purify ribosome-bound mRNA from targeted cells for downstream gene expression analysis. The novel construct is driven by the GfaABC1D promoter and includes two additional features that make it particularly useful. First, upstream of Ribotag is a membrane-targeted Lck-mVenus followed by a self-cleaving P2A sequence. This allows easy visualization of targeted astrocytes. Second, we have incorporated a cassette of four copies of six miRNA targeting sequences (4x6T) for miR-124 as was recently published (Gleichman et al., 2023) to eliminate off-target expression in neurons. Based on qPCR analysis, the updated construct allowed >95% de-enrichment of neuronal mRNA and slightly improved observed recombination rates (~10% per gene) relative to an earlier version without 4x6T. Mice that were injected with tamoxifen at P1, similar to other experiments in the paper, were then stereotactically injected at ~P35-40 within the dorsal hippocampus with AAV2/5-GfaABC1D-Lck-mVenus-P2A-Rpl22-HA-4x6T. Approximately 3 weeks later, acute slices were prepared, visualized for fluorescence, and both CA1 and nearby cortex that was partially targeted were isolated for downstream ribosome affinity purification with HA antibodies. Total RNA was saved as input. qPCR was performed using assays that are sensitive to the exons that are floxed in the Nlgn123 cKO mice, so that our quantifications are not confounded by potential differences in nonsense-mediated decay. Our control data reveal a striking enrichment of an astrocyte marker gene (e.g. aquaporin-4) and de-enrichment of genes for other cell types. In the CA1, we observed robust loss of Nlgn3 (~96%), Nlgn2 (~86%), and Nlgn1 (65%) gene expression. In the cortex, we observed a similarly robust loss of Nlgn3 (93%), Nlgn2 (83%), and Nlgn1 (72%) expression.

      Given that our targeting of astrocytes based on Ai14 Cre-reporter mice was ~90-99%, these reductions are striking and definitive. The existence of some residual transcript reflects the presence of a small population of astrocytes heterozygous for Nlgn2 and Nlgn3. In contrast, Nlgn1 appears more difficult to recombine and it is likely that some astrocytes are either heterozygous or homozygous knockout cells. Although it is thus possible that Nlgn1 could provide some compensation in our experiments, it is worth noting that Stogsdill et al. found that only Nlgn2 and Nlgn3 knockdown with shRNAs resulted in impaired astrocyte morphology by P21. Moreover, they found that Nlgn2 cKO in astrocytes with PALE of a Cre-containing pDNA impaired astrocyte morphology in a gene-dosage dependent manner and suppressed excitatory synapse formation at P21. Thus, our inability to delete all of Nlgn1 doesn’t readily explain contradictions between our findings and theirs.
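
      For readers less familiar with how a percent loss of expression can be derived from qPCR, the following is a minimal sketch that assumes the standard 2^-ΔΔCt relative quantification with roughly 100% amplification efficiency. The response does not state which quantification method was used, and the function name and all Ct values here are hypothetical.

```python
def percent_loss(ct_target_cko, ct_ref_cko, ct_target_ctrl, ct_ref_ctrl):
    """Percent loss of a target transcript in cKO relative to control.

    Assumes the 2^-ddCt method: each Ct is first normalized to a
    reference gene measured in the same sample (hypothetical choice),
    then the cKO sample is compared with the control sample.
    """
    d_ct_cko = ct_target_cko - ct_ref_cko        # normalized cKO signal
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl     # normalized control signal
    dd_ct = d_ct_cko - d_ct_ctrl                 # positive when cKO has less transcript
    remaining_fraction = 2.0 ** (-dd_ct)         # one cycle = twofold change
    return 100.0 * (1.0 - remaining_fraction)


# Hypothetical Ct values: the target crosses threshold 2 cycles later in cKO,
# so a quarter of the transcript remains.
print(percent_loss(25.0, 18.0, 23.0, 18.0))
```

      Under these assumptions, a shift of about 4.6 cycles would correspond to a loss of roughly 96%.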

      Second, in an independent approach we have cultured glia from mouse quadruple conditional Nlgn1234 KO mice and infected the glia with lentiviruses expressing inactive (DCre, control) or active Cre-recombinase. We confirmed complete recombination by PCR. We then cultured human neurons forming excitatory synapses on the glia expressing or lacking neuroligins and measured the frequency and amplitude of mEPSCs as a proxy for synapse numbers and synaptic function. As shown in the new Figure 9, we detected no significant changes in mEPSCs, demonstrating in this independent system that the glial neuroligins do not detectably influence excitatory synapse formation.

      (2) Along the same lines, the authors should be careful not to overstate their findings in this direction. For example, the figure caption for Figure 2 reads 'Nlgn1-3 are efficiently and selectively deleted in astrocytes by crossing triple Nlgn1-3 conditional KO mice with Aldh1l1-CreERT2 driver mice and inducing Cre-activity with tamoxifen early during postnatal development'. This is not technically correct and should be modified to reflect that the authors are not in fact assessing deletion of Nlgn1-3, but only expression of a tdTomato reporter.

      We agree – this is essentially the same criticism as comment #1.

      (3) In general, the animal numbers used for the experiments are rather low. With an n = 4 for most experiments, only large abnormalities would be detected anyway, while smaller alterations would not reach statistical significance due to the inherent biological and technical variance. For the most part, this is not a concern, since there really is no difference between WTs and Nlgn1-3 cKOs. However, trends are observed in some cases, and it is conceivable that these would become significant changes with larger n's, e.g. Figure 3H (Vglut2); Figure 4E (VGlut2 S.P., D.G.); Figure 6D (Vglut2). Increasing the numbers to n = 6 here would greatly strengthen the claims that no differences are observed.

      We concur that small differences would not have been detected in our experiments but feel that given the very large phenotypes of the neuroligin deletions in neurons and of the phenotypes reported by Stogsdill et al. (2017), which also did not employ a large number of animals, a very small phenotype in astrocytes would not have been very informative.

      Minor points:

      (1) Please state the exact genetic background for the mouse lines used.

      Our lab generally uses hybrid CD1/Bl6 mice to avoid artifacts produced by inbred genetic mutations in so-called ‘pure’ lines, especially Bl6 mice. This standard protocol was followed in the present study. Thus, the mice are on a mixed CD1/Bl6 hybrid background.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 4 demonstrates that neuroligin 1-3 deletions restricted to astrocytes do not affect the number of excitatory and inhibitory synapses in layer IV of the primary visual cortex. This conclusion could be further strengthened if the authors could provide electrophysiological evidence such as mE/IPSCs.

      We agree but have chosen a different avenue to further test our conclusions because slice electrophysiological experiments are time-consuming, labor intensive, and difficult to quantitate, especially in cortex.

      Specifically, we have co-cultured human neurons with astrocytes that either contain or lack neuroligins (new Fig. 9). With this experimental design, we have total control over ALL neuroligins in astrocytes. Electrophysiological recordings then demonstrated that the complete deletion of all glial neuroligins has no effect on mEPSC frequencies and amplitudes. Although clearly much more needs to be done, the new results confirm in an independent system that glial neuroligins have no effect on synapse formation in the neurons, even though neurons depend on astrocytes for synaptogenic factors as Ben Barres brilliantly showed a decade ago. However, it is important to note that dissociated glia in culture, while synaptogenic, are reactive and may not faithfully recapitulate all roles of astrocytes in synaptogenesis.

      (2) It would help readers if the images showing the punctate double marker stainings of excitatory/inhibitory synapses are presented in merged colors (i.e., yellow colors for red and green puncta colors).

      We have tried to improve the visualization of the rather voluminous studies we performed and illustrate in the figures as best as we could.

      (3) The resolutions of the images in the figures are not good, although I guess it is because the images are for review processes.

      We apologize and would like to assure the reviewer that we are supplying high-resolution images to the journal.

      (4) Typos in lines 82 and 274.

      We have corrected these errors.

    1. eLife Assessment

      This important work combines theory and experiment to demonstrate convincingly how humans make decisions about sequences of pairs of correlated observations. The proposed model for evidence integration in correlated environments will be of use for the study of decision-making.

    2. Reviewer #1 (Public review):

      Summary:

      The behavioral strategies underlying decisions based on perceptual evidence are often studied in the lab with stimuli whose elements provide independent pieces of decision-related evidence that can thus be equally weighted to form a decision. In more natural scenarios, in contrast, the information provided by these pieces is often correlated, which impacts how they should be weighted. Tardiff, Kang & Gold set out to study decisions based on correlated evidence and compare observed behavior of human decision makers to normative decision strategies. To do so, they presented participants with visual sequences of pairs of localized cues whose location was either uncorrelated, or positively or negatively correlated, and whose mean location across a sequence determined the correct choice. Importantly, they adjusted this mean location such that, when correctly weighted, each pair of cues was equally informative, irrespective of how correlated it was. Thus, if participants follow the normative decision strategy, their choices and reaction times should not be impacted by these correlations. While Tardiff and colleagues found no impact of correlations on choices, they did find them to impact reaction times, suggesting that participants deviated from the normative decision strategy. To assess the degree of this deviation, Tardiff et al. adjusted drift diffusion models (DDMs) for decision-making to process correlated decision evidence. These fits, and a comparison of different model variants revealed that participants considered correlations when weighing evidence, but did so with a slight underestimation of magnitude of this correlation. This finding made Tardiff et al. conclude that participants followed a close-to normative decision strategy that adequately took into account correlated evidence.

      Strength:

      The authors adjust a previously used experimental design to include correlated evidence in a simple, yet powerful way. The way it does so is easy to understand and intuitive, such that participants don't need extensive training to perform the task. Limited training makes it more likely that the observed behavior is natural and reflective of every-day decision-making. Furthermore, the design allowed the authors to make the amount of decision-related evidence equal across different correlation magnitudes, which makes it easy to assess whether participants correctly take account of these correlations when weighing evidence: if they do, their behavior should not be impacted by the correlation magnitude.

      The relative simplicity with which correlated evidence is introduced also allowed the authors to fall back to the well-established DDM for perceptual decisions, that has few parameters, is known to implement the normative decision strategy in certain circumstances, and enjoys a great deal of empirical support. The authors show how correlations ought to impact these parameters, and which changes in parameters one would expect to see if participants mis-estimate these correlations or ignore them altogether (i.e., estimate correlations to be zero). This allowed them to assess the degree to which participants took into account correlations on the full continuum from perfect evidence weighting to complete ignorance. More specifically, the authors showed that a consistent mis-estimation of the correlation magnitude would not impact the fraction of correct choices (as they observe), but only the reaction times. With this, they could show that participants in fact performed rational evidence weighting if one assumed that they slightly underestimated the correlation magnitude.

      Weaknesses:

      While the authors convincingly demonstrate that the observed decision-making behavior seems to stem from a slight underestimation of the correlation magnitudes, their experimental paradigm did not allow them to determine the origin of this bias. Through additional analyses they rule out various possibilities, like the impact of a Bayesian prior on estimated correlations. Nonetheless, the authors provide no normative explanation of the observed bias.

      A further minor weakness is that the authors only focus on a single normative aspect of the observed behavior, namely on whether participants optimally accumulate decision-related evidence across time. Another question is whether participants tune their decision boundaries to maximize reward rates or some other overall performance measures. While the authors discuss that the chosen drift-diffusion models (DDMs) have the potential of also implementing normative decisions in the latter sense, the authors' analysis does not address this question in the context of their task.

    3. Reviewer #2 (Public review):

      This study by Tardiff, Kang & Gold seeks to i) develop a normative account of how observers should adapt their decision-making across environments with different levels of correlation between successive pairs of observations, and ii) assess whether human decisions in such environments are consistent with this normative model. The authors first demonstrate that, in the range of environments under consideration here, an observer with full knowledge of the generative statistics should take both the magnitude and sign of the underlying correlation into account when assigning weight in their decisions to new observations: stronger negative correlations should translate into stronger weighting (due to the greater information furnished by an anticorrelated generative source), while stronger positive correlations should translate into weaker weighting (due to the greater redundancy of information provided by a positively correlated generative source). The authors then report an empirical study in which human participants performed a perceptual decision-making task requiring accumulation of information provided by pairs of perceptual samples, under different levels of pairwise correlation. They describe a nuanced pattern of results with effects of correlation being largely restricted to response times and not choice accuracy, which could be captured through fits of their normative model (in this implementation, an extension of the well-known drift diffusion model) to the participants' behaviour while allowing for mis-estimation of the underlying correlations. An intriguing result is that the observed pattern of behavioural effects is best explained by a model in which observers marginally underestimated the level of correlation between the generative sources, and that this bias affects behaviour through effects on stimulus encoding that then shape how the evidence furnished by each stimulus sample is weighted in decision formation.

      As the authors point out in their very well-written paper, appropriate weighting of information gathered in correlated environments has important consequences for real-world decision-making. Yet, while this function has been well studied for 'high-level' (e.g. economic) decisions, how we account for correlations when making simple perceptual decisions on well-controlled behavioural tasks has not been investigated. As such, this study addresses an important and timely question that will be of broad interest to psychologists and neuroscientists. The computational approach to arrive at normative principles for evidence weighting across environments with different levels of correlation is elegant, makes strong connections with prior work in different decision-making contexts, and should serve as a valuable reference point for future studies in this domain. The empirical study is well designed and executed, and the modelling approach applied to these data showcases an impressively deep understanding of relationships between different parameters of the drift diffusion model and its novel application to this setting. Another strength of the study is that it is preregistered.

      In my view, any major weaknesses of the study have been well addressed by the authors during review. An outstanding question that arises from the current work and remains unanswered here is around the (normative?) origin of the correlation underestimates, and the present work lays a strong foundation from which to pursue this question in the future.

    4. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their thoughtful feedback. We have made substantial revisions to the manuscript to address each of their comments, as we detail below. We want to highlight one major change in particular that addresses a concern raised by both reviewers: the role of the drift rate in our models. Motivated by their astute comments, we went back through our models and realized that we had made a particular assumption that deserved more scrutiny. We previously assumed that the process of encoding the observations made correct use of the objective, generative correlation, but then the process of calculating the weight of evidence used a mis-scaled, subjective version of the correlation. These assumptions led us to scale the drift rate in the model by a term that quantified how the standard deviation of the observation distribution was affected by the objective correlation (encoding), but to scale the bound height by the subjective estimate of the correlation (evidence weighing). However, we realized that encoding may also depend on the subjective correlation experienced by the participant. We have now tested several alternative models and found that the best-fitting model assumes that a single, subjective estimate of the correlation governs both encoding and evidence weighing. An important consequence of updating our models in this way is that we can now account for the behavioral data without needing the additional correlation-dependent drift terms (which, as reviewer #2 pointed out, were difficult to explain).

      We also note that we changed the title slightly, replacing “weighting” with “weighing” for consistency with our usage throughout the manuscript.

      Please see below for more details about this important point and our responses to the reviewers’ specific concerns. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The behavioral strategies underlying decisions based on perceptual evidence are often studied in the lab with stimuli whose elements provide independent pieces of decision-related evidence that can thus be equally weighted to form a decision. In more natural scenarios, in contrast, the information provided by these pieces is often correlated, which impacts how they should be weighted. Tardiff, Kang & Gold set out to study decisions based on correlated evidence and compare the observed behavior of human decision-makers to normative decision strategies. To do so, they presented participants with visual sequences of pairs of localized cues whose location was either uncorrelated, or positively or negatively correlated, and whose mean location across a sequence determined the correct choice. Importantly, they adjusted this mean location such that, when correctly weighted, each pair of cues was equally informative, irrespective of how correlated it was. Thus, if participants follow the normative decision strategy, their choices and reaction times should not be impacted by these correlations. While Tardiff and colleagues found no impact of correlations on choices, they did find them to impact reaction times, suggesting that participants deviated from the normative decision strategy. To assess the degree of this deviation, Tardiff et al. adjusted drift-diffusion models (DDMs) for decision-making to process correlated decision evidence. Fitting these models to the behavior of individual participants revealed that participants considered correlations when weighing evidence, but did so with a slight underestimation of the magnitude of this correlation. This finding made Tardiff et al. conclude that participants followed a close-to-normative decision strategy that adequately took into account correlated evidence.
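
      The idea of making each pair equally informative regardless of its correlation can be sketched numerically. The following assumes, for illustration only, that each pair is drawn from a bivariate Gaussian with a common mean of +mu or -mu, a common standard deviation sigma, and correlation rho, in which case the expected log-likelihood ratio per pair is 4·mu²/(sigma²·(1+rho)). The function names and parameter values are hypothetical and are not taken from the manuscript.

```python
import math


def expected_log_lr(mu, sigma, rho):
    """Expected logLR (for +mu vs -mu) carried by one correlated pair.

    Assumes a bivariate Gaussian with equal means and variances.
    """
    return 4.0 * mu ** 2 / (sigma ** 2 * (1.0 + rho))


def matched_mean(mu_uncorrelated, rho):
    """Mean that keeps the expected logLR equal to the rho = 0 case."""
    return mu_uncorrelated * math.sqrt(1.0 + rho)


# Hypothetical values: the matched mean shrinks for negative correlations
# and grows for positive ones, while the evidence per pair stays constant.
for rho in (-0.6, 0.0, 0.6):
    mu = matched_mean(0.1, rho)
    print(rho, round(mu, 4), round(expected_log_lr(mu, 1.0, rho), 4))
```

      Under this assumed model, an observer who weighs each pair correctly receives the same expected evidence in every correlation condition, which is why correlation-independent behavior is the normative prediction.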

      Strengths:

      The authors adjust a previously used experimental design to include correlated evidence in a simple, yet powerful way. The way it does so is easy to understand and intuitive, such that participants don't need extensive training to perform the task. Limited training makes it more likely that the observed behavior is natural and reflective of everyday decision-making. Furthermore, the design allowed the authors to make the amount of decision-related evidence equal across different correlation magnitudes, which makes it easy to assess whether participants correctly take account of these correlations when weighing evidence: if they do, their behavior should not be impacted by the correlation magnitude.

      The relative simplicity with which correlated evidence is introduced also allowed the authors to fall back to the well-established DDM for perceptual decisions, which has few parameters, is known to implement the normative decision strategy in certain circumstances, and enjoys a great deal of empirical support. The authors show how correlations ought to impact these parameters, and which changes in parameters one would expect to see if participants misestimate these correlations or ignore them altogether (i.e., estimate correlations to be zero). This allowed them to assess the degree to which participants took into account correlations on the full continuum from perfect evidence weighting to complete ignorance. With this, they could show that participants in fact performed rational evidence weighting if one assumed that they slightly underestimated the correlation magnitude.

      Weaknesses:

      The experiment varies the correlation magnitude across trials such that participants need to estimate this magnitude within individual trials. This has several consequences:

      (1) Given that correlation magnitudes are estimated from limited data, the (subjective) estimates might be biased towards their average. This implies that, while the amount of evidence provided by each 'sample' is objectively independent of the correlation magnitude, it might subjectively depend on the correlation magnitude. As a result, the normative strategy might differ across correlation magnitudes, unlike what is suggested in the paper. In fact, it might be the case that the observed underestimates of the correlation magnitude correspond to the normative strategy.

      We thank the reviewer for raising this interesting point, which we now address directly with new analyses including model fits (pp. 15–24). These analyses show that the participants were computing correlation-dependent weights of evidence from observation distributions that reflected suboptimal misestimates of correlation magnitudes. This strategy is normative in the sense that it is the best that they can do, given the encoding suboptimality. However, as we note in the manuscript, we do not know the source of the encoding suboptimality (pp. 23–24). We thus do not know if there might be a strategy they could have used to make the encoding more optimal.

      (2) The authors link the normative decision strategy to putting a bound on the log-likelihood ratio (logLR), as implemented by the two decision boundaries in DDMs. However, as the authors also highlight in their discussion, the 'particle location' in DDMs ceases to correspond to the logLR as soon as the strength of evidence varies across trials and isn't known by the decision maker before the start of each trial. In fact, in the used experiment, the strength of evidence is modulated in two ways:

      (i) by the (uncorrected) distance of the cue location mean from the decision boundary (what the authors call the evidence strength) and

      (ii) by the correlation magnitude. Both vary pseudo-randomly across trials, and are unknown to the decision-maker at the start of each trial. As previous work has shown (e.g. Kiani & Shadlen (2009), Drugowitsch et al. (2012)), the normative strategy then requires averaging over different evidence strength magnitudes while forming one's belief. This averaging causes the 'particle location' to deviate from the logLR. This deviation makes it unclear if the DDM used in the paper indeed implements the normative strategy, or is even a good approximation to it.

      We appreciate this subtle, but important, point. We now clarify that the DDM we use includes degrees of freedom that are consistent with normative decision processes that rely on the imperfect knowledge that participants have about the generative process on each trial, specifically: 1) a single drift-rate parameter that is fit to data across different values of the mean of the generative distribution, which is based on the standard assumption for these kinds of task conditions in which stimulus strength is varied randomly from trial-to-trial and thus prevents the use of exact logLR (which would require stimulus strength-specific scale factors; Gold and Shadlen, 2001); 2) the use of a collapsing bound, which in certain cases (including our task) is thought to support a stimulus strength-dependent calibration of the decision variable to optimize decisions (Drugowitsch et al, 2012); and 3) free parameters (one per correlation) to account for subjective estimates of the correlation, which affected the encoding of the observations that are otherwise weighed in a normative manner in the best-fitting model.

      Also, to clarify our terminology, we define the objective evidence strength as the expected logLR in a given condition, which for our task is dependent on both the distance of the mean from the decision boundary and the correlation (p. 7). 

      Given that participants observe 5 evidence samples per second and on average require multiple seconds to form their decisions, it might be that they are able to form a fairly precise estimate of the correlation magnitude within individual trials. However, whether this is indeed the case is not clear from the paper.

      These points are now addressed directly in Results (pp. 23–24) and Figure 7 supplemental figures 1–3. Specifically, we show that, as the reviewer correctly surmised above, empirical correlations computed on each trial tended to be biased towards zero (Fig 7–figure supplement 1). However, two other analyses were not consistent with the idea that participants’ decisions were based on trial-by-trial estimates of the empirical correlations: 1) those with the shortest RTs did not have the most-biased estimates (Fig 7–figure supplement 2), and 2) there was no systematic relationship between objective and subjective fit correlations across participants (Fig 7–figure supplement 3).
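
      A per-trial empirical correlation of the kind referred to above can be sketched as follows. This is a minimal illustration that assumes Pearson's r computed over the pairs shown on a single trial; the response does not specify the estimator used, and the function name and example values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between paired observations from one trial."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined when one series is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cue positions from a short trial with five pairs.
left = [0.3, -0.1, 0.4, 0.0, 0.2]
right = [0.2, 0.1, 0.5, -0.2, 0.1]
print(pearson_r(left, right))
```

      With only a handful of pairs per trial, such an estimate is noisy, which is one reason to ask whether decisions could plausibly rely on it.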

      Furthermore, the authors capture any underestimation of the correlation magnitude by an adjustment to the DDM bound parameter. They justify this adjustment by asking how this bound parameter needs to be set to achieve correlation-independent psychometric curves (as observed in their experiments) even if participants use a 'wrong' correlation magnitude to process the provided evidence. Curiously, however, the drift rate, which is the second critical DDM parameter, is not adjusted in the same way. If participants use the 'wrong' correlation magnitude, then wouldn't this lead to a mis-weighting of the evidence that would also impact the drift rate? The current model does not account for this, such that the provided estimates of the mis-estimated correlation magnitudes might be biased.

      We appreciate this valuable comment, and we agree that we previously neglected the potential impact of correlation misestimates on evidence strength. As we now clarify, the correlation enters these models in two ways: 1) via its effect on how the observations are encoded, which involves scaling both the drift and the bound; and 2) via its effect on evidence weighing, which involves scaling only the bound (pp. 15–18). We previously assumed that only the second form of scaling might involve a subjective (mis-)estimate of the correlation. We now examine several models that also include the possibility of either or both forms using subjective correlation estimates. We show that a model that assumes that the same subjective estimate drives both encoding and weighing (the “full-rho-hat” model) best accounts for the data. This model provides better fits (after accounting for differences in numbers of parameters) than models with: 1) no correlation-dependent adjustments (“base” model), 2) separate drift parameters for each correlation condition (“drift” model), 3) optimal (correlation-dependent) encoding but suboptimal weighing (“bound-rho-hat” model, which was our previous formulation), 4) suboptimal encoding and weighing (“scaled-rho-hat” model), and 5) optimal encoding but suboptimal weighing and separate correlation-dependent adjustments to the drift rate (“bound-rho-hat plus drift” model). We have substantially revised Figures 5–7 and the associated text to address these points.
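
      To make the role of a subjective correlation estimate in evidence weighing concrete, the following is a minimal discrete-sample sketch. It is not the fitted DDM described above: it assumes Gaussian pairs, a known mean, a fixed (non-collapsing) bound, and a subjective estimate rho_hat that affects only the weight applied to each pair, not its encoding. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_trial(mu, sigma, rho, rho_hat, bound, rng, max_pairs=10_000):
    """Accumulate weighted pairs until |decision variable| reaches the bound.

    Pairs are generated with the objective correlation rho, but each pair
    is weighed as if the correlation were rho_hat.
    Returns (choice, number_of_pairs); choice is 0 if no bound was reached.
    """
    weight = 2.0 * mu / (sigma ** 2 * (1.0 + rho_hat))
    decision_variable = 0.0
    for n in range(1, max_pairs + 1):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1 = mu + sigma * z1
        # Mix z1 into the second observation to induce correlation rho.
        x2 = mu + sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        decision_variable += weight * (x1 + x2)
        if abs(decision_variable) >= bound:
            return (1 if decision_variable > 0 else -1), n
    return 0, max_pairs


# Same random pairs, weighed with an underestimated versus the true correlation.
under = simulate_trial(0.2, 1.0, 0.6, 0.4, 3.0, random.Random(1))
exact = simulate_trial(0.2, 1.0, 0.6, 0.6, 3.0, random.Random(1))
print(under, exact)
```

      In this simplified setting, underestimating a positive correlation inflates the weight on every pair, so for an identical sequence of pairs the bound is reached no later than with the correct weight. How this maps onto the fitted drift and bound parameters depends on the model variants compared above.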

      Lastly, the paper makes it hard to assess how much better the participants' choices would be if they used the correct correlation magnitudes rather than underestimates thereof. This is important to know, as it only makes sense to strictly follow the normative strategy if it comes with a significant performance gain.

      We now include new analyses in Fig. 7 that demonstrate how much participants' choices and RT deviate from: 1) an ideal observer using the objective correlations, and 2) an observer who failed to adjust for the fit subjective correlation when weighing the evidence (i.e., using the subjective correlation for encoding but a correlation of zero for weighing). We now indicate that participants’ performance was quite close to that predicted by the ideal observer (using the true, objective correlation) for many conditions. Thus, we agree that they might not have had the impetus to optimize the decision process further, assuming it were possible under these task conditions.

      Reviewer #2 (Public review):

      Summary:

      This study by Tardiff, Kang & Gold seeks to: i) develop a normative account of how observers should adapt their decision-making across environments with different levels of correlation between successive pairs of observations, and ii) assess whether human decisions in such environments are consistent with this normative model.

      The authors first demonstrate that, in the range of environments under consideration here, an observer with full knowledge of the generative statistics should take both the magnitude and sign of the underlying correlation into account when assigning weight in their decisions to new observations: stronger negative correlations should translate into stronger weighting (due to the greater information furnished by an anticorrelated generative source), while stronger positive correlations should translate into weaker weighting (due to the greater redundancy of information provided by a positively correlated generative source). The authors then report an empirical study in which human participants performed a perceptual decision-making task requiring accumulation of information provided by pairs of perceptual samples, under different levels of pairwise correlation. They describe a nuanced pattern of results with effects of correlation being largely restricted to response times and not choice accuracy, which could partly be captured through fits of their normative model (in this implementation, an extension of the well-known drift-diffusion model) to the participants' behaviour while allowing for misestimation of the underlying correlations.
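
      The direction of the normative weighting described above can be checked with a short calculation. The sketch below assumes, for illustration, a bivariate Gaussian pair with common mean +mu or -mu, common standard deviation sigma, and correlation rho, for which the log-likelihood ratio of one pair is 2·mu·(x1 + x2)/(sigma²·(1+rho)). The function names are hypothetical, and the brute-force density is included only to verify the closed form.

```python
import math


def pair_log_lr(x1, x2, mu, sigma, rho):
    """Closed-form logLR (+mu vs -mu) for one correlated Gaussian pair."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 2.0 * mu * (x1 + x2) / (sigma ** 2 * (1.0 + rho))


def bivariate_log_density(x1, x2, mean, sigma, rho):
    """Log density of a bivariate Gaussian with equal means and variances."""
    z1 = (x1 - mean) / sigma
    z2 = (x2 - mean) / sigma
    quad = (z1 ** 2 - 2.0 * rho * z1 * z2 + z2 ** 2) / (1.0 - rho ** 2)
    log_norm = math.log(2.0 * math.pi * sigma ** 2 * math.sqrt(1.0 - rho ** 2))
    return -0.5 * quad - log_norm


# The same pair carries more weight when anticorrelated, less when correlated.
for rho in (-0.6, 0.0, 0.6):
    print(rho, pair_log_lr(0.3, 0.1, 0.2, 1.0, rho))
```

      The 1/(1+rho) factor is what makes negative correlations more informative and positive correlations more redundant under this assumed generative model.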

      Strengths:

      As the authors point out in their very well-written paper, appropriate weighting of information gathered in correlated environments has important consequences for real-world decision-making. Yet, while this function has been well studied for 'high-level' (e.g. economic) decisions, how we account for correlations when making simple perceptual decisions on well-controlled behavioural tasks has not been investigated. As such, this study addresses an important and timely question that will be of broad interest to psychologists and neuroscientists. The computational approach to arrive at normative principles for evidence weighting across environments with different levels of correlation is very elegant, makes strong connections with prior work in different decision-making contexts, and should serve as a valuable reference point for future studies in this domain. The empirical study is well designed and executed, and the modelling approach applied to these data showcases a deep understanding of relationships between different parameters of the drift-diffusion model and its application to this setting. Another strength of the study is that it is preregistered.

      Weaknesses:

      In my view, the major weaknesses of the study center on the narrow focus and subsequent interpretation of the modelling applied to the empirical data. I elaborate on each below:

      Modelling interpretation: the authors' preference for fitting and interpreting the observed behavioural effects primarily in terms of raising or lowering the decision bound is not well motivated and will potentially be confusing for readers, for several reasons. First, the entire study is conceived, in the Introduction and first part of the Results at least, as an investigation of appropriate adjustments of evidence weighting in the face of varying correlations. The authors do describe how changes in the scaling of the evidence in the drift-diffusion model are mathematically equivalent to changes in the decision bound - but this comes amidst a lengthy treatment of the interaction between different parameters of the model and aspects of the current task which I must admit to finding challenging to follow, and the motivation behind shifting the focus to bound adjustments remained quite opaque. 

      We appreciate this valuable feedback. We have revised the text in several places to make these important points more clearly. For example, in the Introduction we now clarify that “The weight of evidence is computed as a scaled version of each observation (the scaling can be applied to the observations or to the bound, which are mathematically equivalent; Green and Swets, 1966) to form the logLR” (p. 3). We also provide more details and intuition in the Results section for how and why we implemented the DDM the way we did. In particular, we now emphasize that the correlation enters these models in two ways: 1) via its effect on encoding the observations, which scales both the drift and the bound; and 2) via its effect on evidence weighing, which scales only the bound (pp. 15–18).

      Second, and more seriously, bound adjustments of the form modelled here do not seem to be a viable candidate for producing behavioural effects of varying correlations on this task. As the authors state toward the end of the Introduction, the decision bound is typically conceived of as being "predefined" - that is, set before a trial begins, at a level that should strike an appropriate balance between producing fast and accurate decisions. There is an abundance of evidence now that bounds can change over the course of a trial - but typically these changes are considered to be consistently applied in response to learned, predictable constraints imposed by a particular task (e.g. response deadlines, varying evidence strengths). In the present case, however, the critical consideration is that the correlation conditions were randomly interleaved across trials and were not signaled to participants in advance of each trial - and as such, what correlation the participant would encounter on an upcoming trial could not be predicted. It is unclear, then, how participants are meant to have implemented the bound adjustments prescribed by the model fits. At best, participants needed to form estimates of the correlation strength/direction (only possible by observing several pairs of samples in sequence) as each trial unfolded, and they might have dynamically adjusted their bounds (e.g. collapsing at a different rate across correlation conditions) in the process. But this is very different from the modelling approach that was taken. In general, then, I view the emphasis on bound adjustment as the candidate mechanism for producing the observed behavioural effects to be unjustified (see also next point).

      We again appreciate this valuable feedback and have made a number of revisions to try to clarify these points. In addition to addressing the equivalence of scaling the evidence and the bound in the Introduction, we have added the following section to Results (Results, p.18):

      “Note that scaling the bound in these formulations follows conventions of the DDM, as detailed above, to facilitate interpretation of the parameters. These formulations also raise an apparent contradiction: the “predefined” bound is scaled by subjective estimates of the correlation, but the correlation was randomized from trial to trial and thus could not be known in advance. However, scaling the bound in these ways is mathematically equivalent to using a fixed bound on each trial and scaling the observations to approximate logLR (see Methods). This equivalence implies that in the brain, effectively scaling a “predefined” bound could occur when assigning a weight of evidence to the observations as they are presented.”

      We also note in Methods (pp. 40–41):

      “In the DDM, this scaling of the evidence is equivalent to assuming that the decision variable accumulates momentary evidence of the form (x1 + x2) and then dividing the bound height by the appropriate scale factor. An alternative approach would be to scale both the signal and noise components of the DDM by the scale factor. However, scaling the bound is both simpler and maintains the conventional interpretation of the DDM parameters in which the bound reflects the decision-related components of the evidence accumulation process, and the drift rate represents sensory-related components.”
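
      The equivalence described above can be illustrated with a minimal sketch. It is not the fitted model: it uses a simple discrete accumulator with a fixed bound, and it assumes each pair (x1, x2) is drawn with equal means `mu`, equal standard deviations `sigma`, and correlation `rho`, so that the logLR scale factor is 2·mu / (sigma²·(1 + rho)). All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def correlated_pairs(mu, sigma, rho, n, rng):
    """Draw n pairs (x1, x2), each with mean mu, SD sigma, and correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mu + sigma * z1
        x2 = mu + sigma * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        pairs.append((x1, x2))
    return pairs


def first_crossing(evidence, bound):
    """Return (step, choice) when the running sum first reaches +/-bound, else None."""
    total = 0.0
    for step, e in enumerate(evidence, start=1):
        total += e
        if abs(total) >= bound:
            return step, (1 if total > 0 else -1)
    return None


def compare_formulations(rho, mu=0.5, sigma=1.0, bound=3.0, n=500, seed=0):
    """Run the same sample sequence through both formulations."""
    rng = random.Random(seed)
    pairs = correlated_pairs(mu, sigma, rho, n, rng)
    # Assumed logLR scale factor for equal-variance Gaussian pairs.
    scale = 2.0 * mu / (sigma ** 2 * (1.0 + rho))
    # (a) Scale the evidence, keep the bound fixed.
    scaled_evidence = first_crossing([scale * (x1 + x2) for x1, x2 in pairs], bound)
    # (b) Accumulate the raw sum x1 + x2, divide the bound by the scale factor.
    scaled_bound = first_crossing([x1 + x2 for x1, x2 in pairs], bound / scale)
    return scaled_evidence, scaled_bound


if __name__ == "__main__":
    for rho in (-0.75, -0.5):
        a, b = compare_formulations(rho)
        print(rho, a, b)
```

      Under these assumptions, both formulations terminate on the same sample with the same choice, because multiplying the accumulated sum by a positive constant and dividing the bound by that constant define the same stopping rule.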

      We believe we provide strong evidence that participants adjust their evidence weighing to account for the correlations (see response below), but we remain agnostic as to how exactly this weighing is implemented in the brain.

      Modelling focus: Related to the previous point, it is stated that participants' choice and RT patterns across correlation conditions were qualitatively consistent with bound adjustments (p.20), but evidence for this claim is limited. Bound adjustments imply effects on both accuracy and RTs, but the data here show either only effects on RTs, or RT effects mixed with accuracy trends that are in the opposite direction to what would be expected from bound adjustment (i.e. slower RT with a trend toward diminished accuracy in the strong negative correlation condition; Figure 3b). Allowing both drift rate and bound to vary with correlation conditions allowed the model to provide a better account of the data in the strong correlation conditions - but from what I can tell this is not consistent with the authors' preregistered hypotheses, and they rely on a posthoc explanation that is necessarily speculative and cannot presently be tested (that the diminished drift rates for higher negative correlations are due to imperfect mapping between subjective evidence strength and the experimenter-controlled adjustment to objective evidence strengths to account for effects of correlations). In my opinion, there are other candidate explanations for the observed effects that could be tested but lie outside of the relatively narrow focus of the current modelling efforts. Both explanations arise from aspects of the task, which are not mutually exclusive. The first is that an interesting aspect of this task, which contrasts with most common 'univariate' perceptual decision-making tasks, is that participants need to integrate two pieces of information at a time, which may or may not require an additional computational step (e.g. averaging of two spatial locations before adding a single quantum of evidence to the building decision variable). 
There is abundant evidence that such intermediate computations on the evidence can give rise to certain forms of bias in the way that evidence is accumulated (e.g. 'selective integration' as outlined in Usher et al., 2019, Current Directions in Psychological Science; Luyckx et al., 2020, Cerebral Cortex) which may affect RTs and/or accuracy on the current task. The second candidate explanation is that participants in the current study were only given 200 ms to process and accumulate each pair of evidence samples, which may create a processing bottleneck causing certain pairs or individual samples to be missed (and which, assuming fixed decision bounds, would presumably selectively affect RT and not accuracy). If I were to speculate, I would say that both factors could be exacerbated in the negative correlation conditions, where pairs of samples will on average be more 'conflicting' (i.e. further apart) and, speculatively, more challenging to process in the limited time available here to participants. Such possibilities could be tested through, for example, an interrogation paradigm version of the current task which would allow the impact of individual pairs of evidence samples to be more straightforwardly assessed; and by assessing the impact of varying inter-sample intervals on the behavioural effects reported presently.

      We thank the reviewer for this thoughtful and valuable feedback. We have thoroughly updated the modeling section to include new analysis and clearer descriptions and interpretations of our findings (including Figs. 5–7 and additional references to the Usher, Luyckx, and other studies that identified decision suboptimalities). The comment about “an additional computational step” in converting the observations to evidence was particularly useful, in that it made us realize that we were making what we now consider to be a faulty assumption in our version of the DDM. Specifically, we assumed that subjective misestimates of the correlation affected how observations were converted to evidence (logLR) to form the decision (implemented as a scaling of the bound height), but we neglected to consider how suboptimalities in encoding the observations could also lead to misestimates of the correlation. We have retained the previous best-fitting models in the text, for comparison (the “bound-rho-hat” and “bound-rho-hat + drift” models). In addition, we now include a “full-rho-hat” model that assumes that misestimates of rho affect both the encoding of the observations, which affects the drift rate and bound height, and the weighing of the evidence, which affects only the bound height. This was the best-fitting model for most participants (after accounting for different numbers of parameters associated with the different models we tested). Note that the full-rho-hat model predicts the lack of correlation-dependent choice effects and the substantial correlation-dependent RT effects that we observed, without requiring any additional adjustments to the drift rate (as we resorted to previously).

      In summary, we believe that we now have a much more parsimonious account of our data, in terms of a model in which subjective estimates of the correlation are alone able to account for our patterns of choice and RT data. We fully agree that more work is needed to better understand the source of these misestimates but also think those questions are outside the scope of the present study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A few minor comments:

      (1) Evidence can be correlated in multiple ways. It could be correlated within individual pieces of evidence in a sequence, or across elements in that sequence (e.g., across time). This distinction is important, as it determines how evidence ought to be accumulated across time. In particular, if evidence is correlated across time, simply summing it up might be the wrong thing to do. Thus, it would be beneficial to make this distinction in the Introduction, and to mention that this paper is only concerned with the first type of correlation.

      We now clarify this point in the Introduction (pp. 5–6).

      (2) It is unclear without reading the Methods how the blue dashed line in Figure 4c is generated. To my understanding, it is a prediction of the naive DDM model. Is this correct?

      We now specify the models used to make the predictions shown in Fig. 4c (which now includes an additional model that uses unscaled observations as evidence).

      (3) In Methods, given the importance of the distribution of x1 + x2, it would be useful to write it out explicitly, e.g., x1 + x2 ~ N(2 mu_g, ..), specifying its mean and its variance.

      Excellent suggestion, added to p. 38.
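
      For reference, the standard result can be written out as follows. This is a sketch under stated assumptions, not a quotation of the revised Methods: it assumes that x1 and x2 are jointly Gaussian with equal means ±μ_g (the sign depending on the generative source), equal variances σ_g², and correlation ρ. The symbol σ_g is introduced here for illustration.

```latex
\begin{align}
  x_1 + x_2 &\sim \mathcal{N}\!\left(\pm 2\mu_g,\; 2\sigma_g^2\,(1+\rho)\right), \\
  \log \mathrm{LR} &= \frac{2\mu_g}{\sigma_g^2\,(1+\rho)}\,(x_1 + x_2).
\end{align}
```

      The second line follows from the first by taking the log ratio of the two Gaussian densities of the sum. It makes the correlation dependence explicit: the weight on each pair grows as ρ becomes more negative and shrinks as ρ becomes more positive.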

      (4) From Methods and the caption of Figure 6 - Supplement 1 it becomes clear that the fitted DDM features a bound that collapses over time. I think that this should also be mentioned in the main text, as it is a not-too-unimportant feature of the model.

      Excellent suggestion, added to p. 15, with reference to Fig. 6-supplement 1 on p. 20.

      (5) The functional form of the bound is 2 (B - tb t). To my understanding, the effective B changes as a function of the correlation magnitude. Does tb as well? If not, wouldn't it be better if it does, to ensure that 2 (B - tb t) = 0 independent of the correlation magnitude?

      In our initial modeling, we also considered whether the correlation-dependent adjustment, which is a function of both correlation sign and magnitude, should be applied to the initial bound or to the instantaneous bound (i.e., after collapse, affecting tb as well). In a pilot analysis of data from 22 participants in the 0.6 correlation-magnitude group, we found that this choice had a negligible effect on the goodness-of-fit (deltaAIC = -0.9, protected exceedance probability = 0.63, in favor of the instantaneous bound scaling). We therefore used the instantaneous bound version in the analyses reported in the manuscript but doubt this choice was critical based on these results. We have clarified our implementation of the bound in Methods (pp. 43–44).
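
      The distinction between the two options can be sketched as follows. This is an illustrative reading of "initial" versus "instantaneous" scaling applied to the linear form 2 (B - tb t), not the fitting code; the function names and the generic `scale` argument are hypothetical.

```python
def bound_instantaneous_scaling(t, B, tb, scale):
    """Scale the whole collapsing bound, so the collapse rate tb is scaled too."""
    return scale * 2.0 * (B - tb * t)


def bound_initial_scaling(t, B, tb, scale):
    """Scale only the starting height B; the collapse rate tb is unchanged."""
    return 2.0 * (scale * B - tb * t)


def collapse_time_instantaneous(B, tb, scale):
    """Time at which the instantaneously scaled bound reaches zero."""
    return B / tb  # independent of the scale factor


def collapse_time_initial(B, tb, scale):
    """Time at which the initially scaled bound reaches zero."""
    return scale * B / tb  # shifts with the scale factor


if __name__ == "__main__":
    B, tb = 1.0, 0.5
    for scale in (0.5, 1.0, 2.0):
        print(scale,
              collapse_time_instantaneous(B, tb, scale),
              collapse_time_initial(B, tb, scale))
```

      With instantaneous scaling the bound reaches zero at the same time for every scale factor, which is the property the reviewer asks about; with initial-only scaling the zero-crossing time shifts in proportion to the scale factor.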

      Reviewer #2 (Recommendations for the authors):

      In addition to the points raised above, I have some minor suggestions/open questions that arose from my reading of the manuscript:

      (1) Are the predictions outlined in the paper specific to cases where the two sources are symmetric around zero? If distributions are allowed to be asymmetric then one can imagine cases (i.e. when distribution means are sufficiently offset from one another) where positive correlations can increase evidence strength and negative correlations decrease evidence strength. There's absolutely still value and much elegance in what the authors are showing with this work, but if my intuition is correct, it should ideally be acknowledged that the predictions are restricted to a specific set of generative circumstances.

      We agree that there are a lot of ways to manipulate correlations and their effect on the weight of evidence. At the end of the Discussion, we emphasize that our results apply to this particular form of correlation (p. 32).

      (2) Isn't Figure 4C misleading in the sense that it collapses across the asymmetry in the effect of negative vs positive correlations on RT, which is clearly there in the data and which simply adjusting the correlation-dependent scale factor will not reproduce?

      We agree that this analysis does not address any asymmetries in suboptimal estimates of positive versus negative correlations. We believe that those effects are much better addressed using the model fitting, which we present later in the Results section. We have now simplified the analyses in Fig. 4c, reporting the difference in RT between positive and negative correlation conditions instead of a linear regression.

      (3) I found the transition on p.17 of the Results section from the scaling of drift rate by correlation to scaling of bound height to be quite abrupt and unclear. I suspect that many readers coming from a typical DDM modelling background will be operating under the assumption that drift rate and bound height are independent, and I think more could be done here to explain why scaling one parameter by correlation in the present case is in fact directly equivalent to scaling the other.

      Thank you for the very useful feedback. We have substantially revised this text to make these points more clearly.

      (4) P.3, typo: Alan *Turing*

      That’s embarrassing. Fixed.

      (5) P.27, typo: "participants adopt a *fixed* bound"

      Fixed.

    1. eLife Assessment

      This study focuses on the role of a T-cell-specific receptor, ctla-4, in a new zebrafish model of IBD-like phenotype. Although implicated in IBD, the function of ctla-4 has been hard to study in mice as the KO is lethal. Ctla-4 mutant zebrafish exhibited significant intestinal inflammation and dysbiosis, mirroring the pathology of inflammatory bowel disease (IBD) in mammals, providing a valuable new model to the field of IBD research. This is a key study with convincing evidence, comprehensive transcriptomic analysis, histological examinations, and functional assays all supporting the findings.

    2. Reviewer #1 (Public review):

      "Unraveling the Role of Ctla-4 in Intestinal Immune Homeostasis: Insights from a novel Zebrafish Model of Inflammatory Bowel Disease" generates a 14bp deletion/early stop codon mutation that is viable in a zebrafish homolog of ctla-4. This mutant exhibits an IBD-like phenotype, including decreased intestinal length, abnormal intestinal folds, decreased goblet cells, abnormal cell junctions between epithelial cells, increased inflammation, and alterations in microbial diversity. Bulk and single-cell RNA-seq show upregulation of immune and inflammatory response genes in this mutant (especially in neutrophils, B cells, and macrophages) and downregulation of genes involved in adhesion and tight junctions in mutant enterocytes. The work suggests that the makeup of immune cells within the intestine is altered in these mutants, potentially due to changes in lymphocyte proliferation. Introduction of recombinant soluble Ctla-4-Ig to mutant zebrafish rescued body weight, histological phenotypes, and gene expression of several pro-inflammatory genes, suggesting a potential future therapeutic route.

      Strengths:

      - Generation of a useful new mutant in zebrafish ctla-4
      - The demonstration of an IBD-like phenotype in this mutant is extremely comprehensive.
      - Demonstrated gene expression differences provide mechanistic insight into how this mutation leads to IBD-like symptoms.
      - Demonstration of rescue with a soluble protein suggests exciting future therapeutic potential
      - The manuscript is mostly well organized and well written.

      Initial Weaknesses were addressed during review.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to elucidate the role of Ctla-4 in maintaining intestinal immune homeostasis by using a novel Ctla-4-deficient zebrafish model. This study addresses the challenge of linking CTLA-4 to inflammatory bowel disease (IBD) due to the early lethality of CTLA-4 knockout mice. Four lines of evidence were presented to show that Ctla-4-deficient zebrafish exhibited hallmarks of IBD in mammals: 1) impaired epithelial integrity and infiltration of inflammatory cells; 2) enrichment of inflammation-related pathways and the imbalance between pro- and anti-inflammatory cytokines; 3) abnormal composition of immune cell populations; and 4) reduced diversity and altered microbiota composition. By employing various molecular and cellular analyses, the authors established ctla-4-deficient zebrafish as a convincing model of human IBD.

      Strengths:

      The characterization of the mutant phenotype is very thorough, from anatomical to histological and molecular levels. The finding effectively established ctla-4 mutants as a novel zebrafish model for investigating human IBD. Evidence from the histopathological and transcriptome analysis was very strong and supports a severe interruption of immune system homeostasis in the zebrafish intestine. Additional characterization using sCtla-4-Ig further probed the molecular mechanism of the inflammatory response, and provided a potential treatment plan for targeting Ctla-4 in IBD models.

      Weaknesses:

      To probe the molecular mechanism of Ctla-4, the authors used a spectrum of antibodies that target Ctla-4 or its receptors. The phenotype assayed was lymphocyte proliferation, while it was the composition rather than number of immune cells that was observed to be different in the scRNASeq assay. Although sCtla-4 has an effect of alleviating the IBD-like phenotypes, I found this explanation a bit oversimplified.

      Comments on revised version:

      The authors have sufficiently addressed all my concerns and I don't have further suggestions.

    4. Reviewer #3 (Public review):

      Summary:

      The current study of mutant zebrafish for IBD modeling is worthwhile. The authors provide extensive evidence, including histopathological observations, gut microflora profiling, and transcriptomic data from intestinal tissue and mucosal cells. This multi-omic study demonstrates enteritis pathology at multiple levels in the zebrafish model.

      Strengths:

      An important immune checkpoint of Treg cells was knocked out in zebrafish, and enteritis was subsequently observed. This model could serve as a substitute for the mouse knockout model in investigating the molecular mechanisms of gut disease.

      Weaknesses:

      (1) In Fig. 2I, the purple glycogen signals stained by PAS were ignored in the quantitative statistics. The purple-stained area could be calculated with ImageJ.
      (2) The characters in Fig. 3G are too small to read. It is suggested to adjust this picture or move it to the supplementary material at a larger size.
      (3) The tissue seems damaged for the IgG ctrl in Fig. 8B. It is suggested to find another slice to present here.
      (4) Line 667 & 743: "16S rRNA sequencing" should be "16S rRNA gene sequencing". Please check this point throughout the text.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      The manuscript suggests the zebrafish homolog of ctla-4 and generates a new mutant in it. However, the locus that is mutated is confusingly annotated as both CD28 (current main annotation in ZFIN) and CTLA-4/CD152 (one publication from 2020), see: https://zfin.org/ZDB-GENE-070912-128. Both human CTLA-4 and CD28 align with relatively similar scores to this gene. There seem to be other orthologs of these receptors in the zebrafish genome, including CD28-like (https://zfin.org/ZDB-GENE-070912-309) which neighbors the gene annotated as CD28 (exhibiting similar synteny as human CD28 and CTLA-4). It would be helpful to provide more information to distinguish between this family of genes and to further strengthen the evidence that this mutant is in ctla-4, not cd28. Also, is one of these genes in the zebrafish genome (e.g. cd28l) potentially a second homolog of CTLA-4? Is this why this mutant is viable in zebrafish and not mammals? Some suggestions:

      (a) A more extensive sequence alignment that considers both CTLA-4 and CD28, potentially identifying the best homolog of each human gene, especially taking into account any regions that are known to produce the functional differences between these receptors in mammals and effectively assigns identities to the two genes annotated as "cd28" and "cd28l" as well as the gene "si:dkey-1H24.6" that your CD28 ORF primers seem to bind to in zebrafish.

      In response to the reviewer's insightful suggestions, we have conducted more extensive sequence alignment and phylogenetic analyses that consider CTLA-4, CD28, and CD28-like molecules, taking into account key regions crucial for the functionalities of, and functional differences between, these molecules across various species, including mammals and zebrafish.

      Identification of zebrafish Ctla-4: We identified zebrafish Ctla-4 as a homolog of mammalian CTLA-4 based on key conserved structural and functional characteristics. Structurally, the Ctla-4 gene shares similar exon organization compared to mammalian CTLA-4. Ctla-4 is a type I transmembrane protein with typical immunoglobulin superfamily features. Multiple amino acid sequence alignments revealed that Ctla-4 contains a <sup>113</sup>LFPPPY<sup>118</sup> motif and a <sup>123</sup>GNGT<sup>126</sup> motif in the ectodomain, and a tyrosine-based <sup>206</sup>YVKF<sup>209</sup> motif in the distal C-terminal region. These motifs closely resemble MYPPPY, GNGT, and YVKM motifs in mammalian CTLA-4s, which are essential for binding to CD80/CD86 ligands and molecular internalization and signaling inhibition. Despite only 23.7% sequence identity to human CTLA-4, zebrafish Ctla-4 exhibits a similar tertiary structure with a two-layer β-sandwich architecture in its extracellular IgV-like domain. Four cysteine residues responsible for the formation of two pairs of disulfide bonds (Cys<sup>20</sup>-Cys<sup>91</sup>/Cys<sup>46</sup>-Cys<sup>65</sup> in zebrafish and Cys<sup>21</sup>-Cys<sup>92</sup>/Cys<sup>48</sup>-Cys<sup>66</sup> in humans) that connect the two-layer β-sandwich are conserved. Additionally, a separate cysteine residue (Cys<sup>120</sup> in zebrafish and Cys<sup>120</sup> in humans) involved in dimerization is also present, and Western blot analysis under reducing and non-reducing conditions confirmed Ctla-4’s dimerization. Phylogenetically, Ctla-4 clusters with other known CTLA-4 homologs from different species with high bootstrap probability, while zebrafish Cd28 groups separately with other CD28s. Functionally, Ctla-4 is predominantly expressed on CD4<sup>+</sup> T and CD8<sup>+</sup> T cells in zebrafish. 
It plays a pivotal inhibitory role in T cell activation by competing with CD28 for binding to CD80/86, as validated through a series of both in vitro and in vivo assays, including microscale thermophoresis assays which demonstrated that Ctla-4 exhibits a significantly higher affinity for Cd80/86 than Cd28 (KD = 0.50 ± 0.25 μM vs. KD = 2.64 ± 0.45 μM). These findings confirm Ctla-4 as an immune checkpoint molecule, reinforcing its identification within the CTLA-4 family.
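
      As a small illustration of how identity figures of this kind are computed, the sketch below calculates percent identity over a pair of pre-aligned sequences. It is not the pipeline used for the analyses above: the function name is hypothetical, gapped columns are excluded from the denominator (one of several conventions in use), and the inputs are only the short motifs quoted in this response, so the outputs are not comparable to the whole-protein value of 23.7%.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over aligned columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Motifs quoted in the text: zebrafish Ctla-4 versus mammalian CTLA-4.
    for zebrafish, mammal in (("LFPPPY", "MYPPPY"), ("GNGT", "GNGT"), ("YVKF", "YVKM")):
        print(zebrafish, mammal, round(percent_identity(zebrafish, mammal), 1))
```

      The motif-level values are much higher than the overall identity, which is consistent with the point made above: the functionally critical motifs are conserved even though the full-length sequences have diverged.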

      Comparison between zebrafish Cd28 and "Cd28l": Zebrafish Cd28 contains an extracellular SYPPPF motif and an intracellular FYIQ motif. The extracellular SYPPPF motif is essential for binding to Cd80/CD86, while the intracellular FYIQ motif likely mediates kinase recruitment and co-stimulatory signaling. In contrast, the "Cd28l" molecule lacks the SYPPPF motif, which is critical for Cd80/CD86 binding, and exhibits strong similarity in its C-terminal 79 amino acids to Ctla-4 rather than Cd28. Consequently, "Cd28l" resembles an atypical Ctla-4-like molecule but fails to exhibit Cd80/CD86 binding activity.

      We have incorporated the relevant analysis results into the main text of the revised manuscript and updated Supplementary Figure 1. Additionally, we provide key supplementary analyses here for the reviewer's convenience.  

      Author response image 1.

      Illustrates the alignment of Ctla-4 (XP_005167576.1) and Ctla-4-like (XP_005167567.1, previously referred to as "Cd28l") in zebrafish, generated using ClustalX and Jalview. Conserved and partially conserved amino acid residues are highlighted in color gradients ranging from carnation to red, respectively. The B7-binding motif is encircled with a red square.

      (b) Clearer description in the main text of such an analysis to better establish that the mutated gene is a homolog of ctla-4, NOT cd28.

      We appreciate the reviewer's advice. Additional confirmation of zebrafish Ctla-4 is detailed in lines 119-126 of the revised manuscript.

      (c) Are there mammalian anti-ctla-4 and/or anti-cd28 antibodies that are expected to bind to these zebrafish proteins? If so, looking to see whether staining is lost (or western blotting is lost) in your mutants could be additionally informative. (Our understanding is that your mouse anti-Ctla-4 antibody is raised against recombinant protein generated from this same locus, and so is an elegant demonstration that your mutant eliminates the production of the protein, but unfortunately does not contribute additional information to help establish its homology to mammalian proteins).

      This suggestion holds significant value. However, a major challenge in fish immunology research is the limited availability of antibodies suitable for use in fish species; antibodies developed for mammals are generally not applicable. We attempted to use human and mouse anti-CTLA-4 and anti-CD28 antibodies to identify Ctla-4 and Cd28 in zebrafish, but the results were inconclusive, with no expected signals. This outcome likely arises from the low sequence identity between human/mouse CTLA-4 and CD28 and their zebrafish homologs (ranging from 21.3% to 23.7% for CTLA-4 and 21.2% to 24.0% for CD28). Therefore, developing specific antibodies against zebrafish Ctla-4 is essential for advancing this research.

      The methods section is generally insufficient and doesn't describe many of the experiments performed in this manuscript. Some examples:

      (a) No description of antibodies used for staining or Western blots (Figure1C, 1D, 1F).

      (b) No description of immunofluorescence protocol (Figure 1D, 1F).

      (c) No description of Western blot protocol (Figure 1C, 2C).

      (d) No description of electron microscopy approach (Figure 2K).

      (e) No description of the approach for determining microbial diversity (Entirety of Figure 6).

      (f) No description of PHA/CFSE/Flow experiments (Figure 7A-E).

      (g) No description of AlphaFold approach (Figures 7F-G).

      (h) No description of co-IP approach (Figure 7H).

      (i) No description of MST assay or experiment (Figure 7I).

      (j) No description of purification of recombinant proteins, generation of anti-Ctla-4 antibody, or molecular interaction assays (Figures S2 and S6).

      We apologize for this oversight. The methods section was inadvertently incomplete due to an error during the file upload process at submission. This issue has been addressed in the revised manuscript. We appreciate your understanding.

      Figure 5 suggests that there are more Th2 cells 1, Th2 cells 2, and NKT cells in ctla-4 mutants through scRNA-seq. However, as the cell numbers for these are low in both genotypes, there is only a single replicate for each genotype scRNA-seq experiment, and dissociation stress can skew cell-type proportions, this finding would be much more convincing if another method that does not depend on dissociation was used to verify these results. Furthermore, while Th2 cells 2 are almost absent in WT scRNA-seq, KEGG analysis suggests that a major contributor to their clustering may be ribosomal genes (Fig. 5I). Since no batch correction was described in the methods, it would be beneficial to verify the presence of this cluster in ctla-4 mutants and WT animals through other means, such as in situ hybridization or transgenic lines.   

      We are grateful for the insightful comments provided by the reviewer. Given that research on T cell subpopulations in fish is still in its nascent stages, the availability of specific marker antibodies and relevant transgenic strains remains limited. Our single-cell RNA sequencing (scRNA-seq) analysis revealed that a distinct Th2 subset 2 was predominantly observed in Ctla-4 mutants but was rare in wild-type zebrafish, suggesting that this subset may primarily arise under pathological conditions associated with Ctla-4 mutation. Due to the near absence of Th2 subset 2 in wild-type samples, KEGG enrichment analysis was performed exclusively on this subset from Ctla-4-deficient intestines. The ribosome pathway was significantly enriched, suggesting that these cells may be activated to fulfill their effector functions. However, confirming the presence of Th2 subset 2 using in situ hybridization or transgenic zebrafish lines is currently challenging due to the lack of lineage-specific markers for detailed classification of Th2 cell subsets and the preliminary nature of scRNA-seq predictions.

      To address the reviewer's suggestion to confirm compositional changes in Th2 and NKT cells using dissociation-independent methods, we quantified mRNA levels of Th2 (il4, il13, and gata3) and NKT (nkl.2, nkl.4, and prf1.1) cell marker genes via RT-qPCR in intestines from wild-type and mutant zebrafish. As shown in Figure S7B and S7C, these markers were significantly upregulated in Ctla-4-deficient intestines compared to wild-type controls. This indicates an overall increase in Th2 and NKT cell activity in mutant zebrafish, which aligns with our scRNA-seq analysis and supports the validity of our initial findings.

      Before analyzing the scRNA-seq data, we performed batch correction using the Harmony algorithm via cloud-based Cumulus v1.0 on the aggregated gene-count matrices. This methodological detail has been included in the “Materials and Methods” section of the revised manuscript. Moreover, the RT-qPCR results are presented in Supplementary Figures S7B and S7C.

      Quality control (e.g., no. of UMIs, no. of genes, etc.) metrics of the scRNAseq experiments should be presented in the supplementary information for each sample to help support that observed differential expression is not merely an outcome of different sequencing depths of the two samples.

      As illustrated in Fig. S5, the quality control data have been supplemented to include the effective cell number of the sample, along with pre- and post-filtering metrics such as nFeature_RNA, nCount_RNA and mitochondrial percentage (percent.mito). Furthermore, scatter plots comparing the basic information of the sample cells before and after filtering are provided.
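
      The per-cell metrics named in this response can be illustrated with a minimal sketch. It assumes that each cell is represented as a mapping from gene name to UMI count and that mitochondrial genes carry an "mt-" prefix; the filtering thresholds and the example cell are hypothetical and are not taken from the manuscript.

```python
def qc_metrics(counts, mito_prefix="mt-"):
    """Compute per-cell QC metrics from a {gene: UMI count} mapping."""
    n_count = sum(counts.values())                        # total UMIs in the cell
    n_feature = sum(1 for c in counts.values() if c > 0)  # genes with at least one UMI
    mito = sum(c for g, c in counts.items()
               if g.lower().startswith(mito_prefix))      # UMIs from mitochondrial genes
    percent_mito = 100.0 * mito / n_count if n_count else 0.0
    return {"nCount_RNA": n_count,
            "nFeature_RNA": n_feature,
            "percent.mito": percent_mito}


def passes_filter(metrics, min_features=200, max_percent_mito=20.0):
    """Hypothetical thresholds, shown only to illustrate pre- vs post-filtering."""
    return (metrics["nFeature_RNA"] >= min_features
            and metrics["percent.mito"] <= max_percent_mito)


# Hypothetical toy cell (far smaller than a real cell, for readability).
example_cell = {"mt-co1": 30, "il1b": 5, "actb1": 65, "gata3": 0}
example_metrics = qc_metrics(example_cell)
```

      Comparing the distributions of these three values per sample, before and after filtering, is what allows a reader to judge whether the two samples were sequenced to similar depth.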

      Some references to prior research lack citations. Examples:

      (a)"Given that Ctla-4 is primarily expressed on T cells (Figure 1E-F), and its absence has been shown to result in intestinal immune dysregulation, indicating a crucial role of this molecule as a conserved immune checkpoint in T cell inhibition."

      The references were incorporated into line 71 of the revised manuscript.

      (b) Line 83: Cite evidence/review for the high degree of conservation in adaptive immunity.

      The references were incorporated into line 93 of the revised manuscript.

      (c) Lines 100-102: Cite the evidence that MYPPPY is a CD80/86 binding motif.

      The references were incorporated into line 117 of the revised manuscript.

      The text associated with Figure 8 (Lines 280-289) does not clearly state that rescue experiments are being done in mutant zebrafish.

      We have provided a clear explanation of the rescue experiments conducted in Ctla-4-deficient zebrafish. This revision has been incorporated into line 319.

      Line 102: Is there evidence from other animals that LFPPPY can function as a binding site for CD80/CD86? Does CD28 also have this same motif?

      The extracellular domains of CTLA-4 and CD28, which bind to CD80/CD86, are largely conserved across various species. This conservation is exemplified by a central PPP core motif, although the flanking amino acids exhibit slight variations. In mammals, both CTLA-4 and CD28 feature the conserved MYPPPY motif. By contrast, in teleost fish, such as rainbow trout, CTLA-4 contains an LYPPPY motif, while CD28 has an MYPPPI motif (Ref. 1). Grass carp CTLA-4 displays an LFPPPY motif, whereas its CD28 variant bears an IYPPPF motif. Yeast two-hybrid assays confirm that these motifs facilitate interactions between grass carp CTLA-4 and CD28 with CD80/CD86 (Ref. 2). Similarly, zebrafish Ctla-4 contains the LFPPPY motif observed in grass carp, while Cd28 exhibits a closely related SYPPPF motif.

      References:

      (1) Bernard, D et al. (2006) Costimulatory Receptors in a Teleost Fish: Typical CD28, Elusive CTLA-4. J Immunol. 176: 4191-4200.

      (2) Lu T Z et al. (2022) Molecular and Functional Analyses of the Primordial Costimulatory Molecule CD80/86 and Its Receptors CD28 and CD152 (CTLA-4) in a Teleost Fish. Frontiers in Immunology. 13:885005.
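
      The motif comparison in the response above can be sketched as a simple pattern search. The pattern below is derived only from the six motifs named in the response (MYPPPY, LYPPPY, MYPPPI, LFPPPY, IYPPPF, and SYPPPF); it is an illustrative assumption, not a validated binding-site model, and the sequence fragment is hypothetical.

```python
import re

# Central PPP core with the flanking residues seen in the six motifs listed
# in the response. This is an illustrative pattern, not a curated profile.
PPP_MOTIF = re.compile(r"[MLIS][YF]PPP[YIF]")

NAMED_MOTIFS = ["MYPPPY", "LYPPPY", "MYPPPI", "LFPPPY", "IYPPPF", "SYPPPF"]


def find_motifs(sequence):
    """Return (start index, matched motif) pairs found in a protein sequence."""
    return [(m.start(), m.group()) for m in PPP_MOTIF.finditer(sequence.upper())]


# Hypothetical fragment containing the LFPPPY motif; not a real Ctla-4 sequence.
hits = find_motifs("ACDEGLFPPPYKLT")
```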

      Line 110-111: Suggest adding citation of these previously published scRNAseq data to the main text in addition to the current description in the Figure legend.

      The reference has been added in line 129 in the main text.

      Figure 3B: It would be helpful to label a few of the top differentially expressed genes in Panel B?

      The top differentially expressed genes have been labeled in Figure 3B.

      Figure 3G: It's unclear how this analysis was conducted, what this figure is supposed to demonstrate, and in its current form it is illegible.

      Figure 3G displays a protein-protein interaction network constructed from differentially expressed genes. The densely connected nodes, representing physical interactions among proteins, provide valuable insights for basic scientific inquiry and biological or biomedical applications. As proteins are crucial to diverse biological functions, their interactions illuminate the molecular and cellular mechanisms that govern both healthy and diseased states in organisms. Consequently, these networks facilitate the understanding of pathogenic and physiological processes involved in disease onset and progression.

      To construct this network, we first utilized the STRING database (https://string-db.org) to generate an initial network diagram using the differentially expressed genes. This diagram was subsequently imported into Cytoscape (version 3.9.1) for visualization and further analysis. Node size and color intensity reflect the density of interactions, indicating the relative importance of each protein. Figure 3G illustrates that IL1β was a central cytokine hub in the disease process of intestinal inflammation in Ctla-4-deficient zebrafish.
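
      As a rough illustration of how a hub can be read from such a network, the sketch below counts interaction partners per node from an edge list. It assumes that "density of interactions" corresponds to node degree, which the response does not state explicitly, and the edges are hypothetical placeholders rather than STRING output.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct interaction partners per node in an undirected network."""
    # Normalise each edge so (a, b) and (b, a) count once, and drop self-loops.
    unique_edges = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for a, b in unique_edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical toy edges, including one duplicate in reversed order.
TOY_EDGES = [
    ("il1b", "tnfa"),
    ("il1b", "il6"),
    ("il1b", "mpx"),
    ("tnfa", "il6"),
    ("il6", "il1b"),
]

degrees = node_degrees(TOY_EDGES)
hub, hub_degree = degrees.most_common(1)[0]
```

      In a Cytoscape-style rendering, a value such as `degrees[node]` would typically be mapped to node size or colour intensity.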

      Expression scale labeling:

      (a) Most gene expression scales are not clearly labeled: do they represent mean expression or scaled expression? Has the expression been log-transformed, and if so, which log (natural log? Log10? Log2?). See: Figure 3E, 3I, 4D, 4E, 5B, 5G, 5H, 6I.

      The gene expression scales are detailed in the figure legends. Specifically, Figures 3E, 3I, and 6I present heatmaps depicting row-scaled expression levels for the corresponding genes. In contrast, Figures 4D and 4E display heatmaps illustrating the mean expression of these genes. Additionally, the dot plots in Figures 5B, 5G, and 5H visualize the mean expression levels of the respective genes.

      (b) For some plots, diverging color schemes (i.e. with white/yellow in the middle) are used for non-diverging scales and would be better represented with a sequential color scale. See: 4D, 4E, and potentially others (not fully clear because of the previous point).

      The color schemes in Figures 4D and 4E have been updated to a sequential color scale. The gene expression data depicted in these figures represent mean expression values and have not undergone log transformation. This information has been incorporated into the figure legend for clarity.
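
      The distinction drawn above between row-scaled and mean expression can be made concrete with a short sketch. It assumes that row scaling means a per-gene z-score computed with the sample standard deviation, as in common heatmap tools; the responses do not specify the exact formula, and the input values are hypothetical.

```python
from statistics import mean, stdev


def row_scale(row):
    """Z-score one gene's expression values across samples (one heatmap row)."""
    if len(row) < 2:
        return [0.0 for _ in row]
    mu = mean(row)
    sd = stdev(row)  # sample standard deviation (an assumption, see lead-in)
    if sd == 0:
        return [0.0 for _ in row]  # a constant row carries no contrast
    return [(x - mu) / sd for x in row]


# Hypothetical expression values for one gene in three samples.
scaled = row_scale([1.0, 2.0, 3.0])
```

      Row-scaled values are centred on zero and therefore suit a diverging colour scale, whereas unscaled mean expression is non-negative and reads better on a sequential scale.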

      Lines 186-187: Though it is merely suggested, apoptotic gene expression can be upregulated as part of the dissociation process for single-cell RNAseq. This would be much stronger if supported by a staining, such as anti-Caspase 3.

      Following the reviewer's insightful recommendations, we conducted a TUNEL assay to evaluate apoptosis in the posterior intestinal epithelial cells of both wild-type and Ctla-4-deficient zebrafish. As expected, our results demonstrate a significant increase in epithelial cell apoptosis in Ctla-4-deficient zebrafish compared with wild-type fish. The corresponding data are presented in Figure S6D and have been incorporated into the manuscript. Detailed protocols for the TUNEL assay have also been included in the Materials and Methods section.

      Author response image 2.

      Quantification of TUNEL-positive cells per 1 × 10<sup>4</sup> μm<sup>2</sup> in the posterior intestines of wild-type (WT) and ctla-4<sup>-/-</sup> zebrafish (n = 5), comparing apoptotic cell density between the two genotypes.
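
      The density unit used in this legend can be illustrated with a short calculation. The function name and the example numbers are hypothetical; only the normalisation to 1 × 10<sup>4</sup> μm<sup>2</sup> comes from the legend.

```python
def cells_per_1e4_um2(n_positive_cells, area_um2):
    """Normalise a TUNEL-positive cell count to an area of 1e4 square micrometres."""
    if area_um2 <= 0:
        raise ValueError("area_um2 must be positive")
    return n_positive_cells * 1e4 / area_um2


# Hypothetical field: 12 positive cells counted in 40,000 square micrometres.
density = cells_per_1e4_um2(12, 40_000)
```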

      Lines 248-251: This manuscript demonstrates gut inflammation and also changes in microbial diversity, but I don't think it demonstrates an association between them, which would require an experiment that for instance rescues one of these changes and shows that it ameliorates the other change, despite still being a ctla-4 mutant.

      We appreciate the valuable comments from the reviewer. Recently, the relationship between inflammatory bowel disease (IBD) and gut microbial diversity has garnered considerable attention, with several key findings emerging from human IBD studies. For instance, patients with IBD (including ulcerative colitis and Crohn's disease) exhibit reduced microbial diversity, which is correlated with disease severity. This decrease in microbial richness is thought to stem from the loss of normal anaerobic bacteria, such as Bacteroides, Eubacterium, and Lactobacillus (Refs. 1-6). Research using mouse models has shown that inflammation increases oxygen and nitrate levels within the intestinal lumen, along with elevated host-derived electron acceptors, thereby promoting anaerobic respiration and overgrowth of Enterobacteriaceae (Ref. 7). Consistent with these findings, our study observed a significant enrichment of Enterobacteriaceae in the inflamed intestines of Ctla-4-deficient zebrafish, which supports the observations in mice. Despite this progress, the zebrafish model for intestinal inflammation remains under development, with limitations in available techniques for manipulating intestinal inflammation and reconstructing gut microbiota. These challenges hinder investigations into the association between intestinal inflammation and changes in microbial diversity. We plan to address these issues through ongoing technological advancements and further research. We thank the reviewer for their understanding.

      References:

      (1) Ott S J, Musfeldt M, Wenderoth D F, Hampe J, Brant O, Fölsch U R et al. (2004) Reduction in diversity of the colonic mucosa associated bacterial microflora in patients with active inflammatory bowel disease. Gut 53:685-693.

      (2) Manichanh C, Rigottier-Gois L, Bonnaud E, Gloux K, Pelletier E, Frangeul L et al. (2006) Reduced diversity of faecal microbiota in Crohn's disease revealed by a metagenomic approach. Gut 55:205-211.

      (3) Qin J J, Li R Q, Raes J, Arumugam M, Burgdorf K S, Manichanh C et al. (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59-U70.

      (4) Sha S M, Xu B, Wang X, Zhang Y G, Wang H H, Kong X Y et al. (2013) The biodiversity and composition of the dominant fecal microbiota in patients with inflammatory bowel disease. Diagn Micr Infec Dis 75:245-251.

      (5) Ray K. (2015) IBD. Gut microbiota in IBD goes viral. Nat Rev Gastroenterol Hepatol 12:122.

      (6) Papa E, Docktor M, Smillie C, Weber S, Preheim S P, Gevers D et al. (2012) Non-Invasive Mapping of the Gastrointestinal Microbiota Identifies Children with Inflammatory Bowel Disease. Plos One 7: e39242-39254.

      (7) Hughes E R, Winter M G, Duerkop B A, Spiga L, de Carvalho T F, Zhu W H et al. (2017) Microbial Respiration and Formate Oxidation as Metabolic Signatures of Inflammation-Associated Dysbiosis. Cell Host Microbe 21:208-219.

      Lines 270-272 say that interaction between Cd28/ctla-4 and Cd80/86 was demonstrated through bioinformatics, flow-cytometry, and Co-IP. Does this need to reference Fig S6D for the flow data? Figures 7F-G are very hard to read or comprehend as they are very small. Figure 7H is the most compelling evidence of this interaction and might stand out better if emphasized with a sentence referencing it on its own in the manuscript. 

      In this study, we utilized an integrated approach combining bioinformatics prediction, flow cytometry, and co-immunoprecipitation (Co-IP) to comprehensively investigate and validate the interactions between Cd28/Ctla-4 and Cd80/86. Flow cytometry analysis, as depicted in Supplementary Figure 6D (revised as Supplementary Figure 8F), demonstrated the surface expression of Cd80/86 on HEK293T cells and quantified their interactions with Cd28 and Ctla-4. These experiments not only validated the interactions between Cd80/86 and Cd28/Ctla-4 but also revealed a dose-dependent relationship, providing robust supplementary evidence for the molecular interactions under investigation. Furthermore, in Figure 7F-G, the axis font sizes were enlarged to improve readability. Additionally, in response to reviewers' feedback, we have emphasized Figure 7H, which presents the most compelling evidence for molecular interactions, by including a standalone sentence in the text to enhance its prominence.

      For Figure 7A-E, for non-immunologists, it is unclear what experiment was performed here - it would be helpful to add a 1-sentence summary of the assay to the main text or figure legend.

      We apologize for this oversight. Figures 7A–E illustrate the functional assessment of the inhibitory role of Ctla-4 in Cd80/86 and Cd28-mediated T cell activation. A detailed description of the methodologies associated with Figures 7A–E is provided in the ‘Materials and Methods’ section of the revised manuscript.

      For Figure 7F-G, it is extremely hard to read the heat map legends and the X and Y-axis. Also, what the heatmaps show and how that fits the overall narrative can be elaborated significantly.

      We regret this oversight. To enhance clarity, we have increased the font size of the heatmap legends and the X and Y-axes, as shown in the following figure. Additionally, a detailed analysis of these figures is provided in lines 299–306 of the main text.

      In general, the main text that accompanies Figure 7 should be expanded to more clearly describe these experiments/analyses and their results.

      We have conducted a detailed analysis of the experiments and results presented in Figure 7. This analysis is described in lines 278-314.

      Reviewer #2:

      The scRNASeq assay is missing some basic characterization: how many WT and mutant fish were assayed in the experiment? how many WT and mutant cells were subject to sequencing? Before going to the immune cell types, are intestinal cell types comparable between the two conditions? Are there specific regions in the tSNE plot in Figure 4A abundant of WT or ctla-4 mutant cells?

      In the experiment, we analyzed 30 wild-type and 30 mutant zebrafish for scRNA-seq, with an initial dataset comprising 8,047 cells in the wild-type group and 8,321 cells in the mutant group. Sample preparation details are provided on lines 620-652. Due to the relatively high expression of mitochondrial genes in intestinal tissue, quality control filtering yielded 3,263 cells in the wild-type group and 4,276 cells in the mutant group. Given that the intestinal tissues were dissociated using identical protocols, the resulting cell types are comparable between the two conditions. Both the wild-type and Ctla-4-deficient groups contained enterocytes, enteroendocrine cells, smooth muscle cells, neutrophils, macrophages, B cells, and a cluster of T/NK/ILC-like cells. Notably, no distinct regions were enriched for either condition in the tSNE plot (Figure 4A).
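
      The effect of quality control filtering on the two groups can be summarised from the cell numbers given above. The sketch uses only those four counts; the function name is an illustrative choice.

```python
# (cells before QC filtering, cells after QC filtering), as stated in the response
CELL_COUNTS = {
    "wild-type": (8047, 3263),
    "Ctla-4-deficient": (8321, 4276),
}


def retention_percent(before, after):
    """Percentage of cells retained after quality control filtering."""
    return 100.0 * after / before


retained = {group: round(retention_percent(before, after), 1)
            for group, (before, after) in CELL_COUNTS.items()}
```

      With these numbers, roughly 40.5% of wild-type cells and 51.4% of mutant cells pass filtering, a difference that is worth keeping in mind when comparing cell-type proportions between the groups.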

      The cell proliferation experiment using PHA stimulation assay demonstrated the role of Ctla-4 in cell proliferation, while the transcriptomic evidence points towards activation rather than an overall expansion of T-cell numbers. This should be discussed towards a more comprehensive model of how subtypes of cells can be differentially proliferating in the disease model.

      In the PHA-stimulated T cell proliferation assay, we aimed to investigate the regulatory roles of Ctla-4, Cd28, and Cd80/86 in T cell activation, focusing on validating Ctla-4's inhibitory function as an immune checkpoint. While our study examined general regulatory mechanisms, it did not specifically address the distinct roles of Ctla-4 in different T cell subsets. We appreciate the reviewer's suggestion to develop a more comprehensive model that elucidates differential T cell activation across various subsets in disease models. However, due to the nascent stage of research on fish T cell subsets and limitations in lineage-specific antibodies and transgenic strains, such investigations are currently challenging. We plan to pursue these studies in the future. Despite these constraints, our single-cell RNA sequencing data revealed an increased proportion of Th2 subset cells in Ctla-4-deficient zebrafish, as evidenced by elevated expression levels of Th2 markers (Il4, Il13, and Gata3) via RT-qPCR (see Figure S7B). Notably, recent studies in mouse models have shown that naïve T cells from CTLA-4-deficient mice tend to differentiate into Th2 cells post-proliferation, with activated Th2 cells secreting higher levels of cytokines like IL-4, IL-5, and IL-13, thereby exerting their effector functions (Refs. 1-2). Consequently, our findings align with observations in mice, suggesting conserved CTLA-4 functions across species. We have expanded the "Discussion" section to clarify these points.

      References:

      (1) Bour-Jordan H, Grogan J L, Tang Q Z, Auger J A, Locksley R M, Bluestone J A et al. (2003) CTLA-4 regulates the requirement for cytokine-induced signals in T<sub>H</sub>2 lineage commitment. Nature Immunology 4: 182-188.

      (2) Khattri Roli, Auger, Julie A, Griffin Matthew D, Sharpe Arlene H, Bluestone Jeffrey A et al. (1999) Lymphoproliferative Disorder in CTLA-4 Knockout Mice Is Characterized by CD28-Regulated Activation of Th2 Responses. The Journal of Immunology 162:5784-5791.

      It would be nice if the authors could also demonstrate whether other tissues in the zebrafish have an inflammation response, to show whether the model is specific to IBD.

      In addition to intestinal tissues, we also performed histological analysis on the liver of Ctla-4-deficient zebrafish. The results showed that Ctla-4 deficiency led to mild edema in a few hepatocytes, and lymphocyte infiltration was not significant. Compared to the liver, we consider intestinal inflammation to be more pronounced.

      Some minor comments on terminology

      (a) "multiomics" usually refers to omics experiments with different modalities (e.g. transcriptomics, proteomics, metabolomics etc), while the current paper only has transcriptomics assays. I wouldn't call it "multiomics" analysis.

      We appreciate the reviewer's attention to this issue. The "multi-omics" has been revised to "transcriptomics".

      (b) In several parts of the figure legend the author mentioned "tSNE nonlinear clustering" (Figures 4A and 5A). tSNE is an embedding method rather than a clustering method.

      The "tSNE nonlinear clustering" has been revised to "tSNE embedding".

      (c) Figure 1E is a UMAP rather than tSNE.

      The "tSNE" has been revised to "UMAP" in the figure legend in line 1043.

      Reviewer #3: 

      Line 28: The link is not directly reflected in this sentence describing CTLA-4 knockout mice.

      We appreciate the reviewer for bringing this issue to our attention. We have expanded our description of CTLA-4 knockout mice on lines 77-84.

      Line 80-83: There is a lack of details about the CTLA-4-deficient mice. The factor that Th2 response could be induced has been revealed in mouse model. See the reference entitled "CTLA-4 regulates the requirement for cytokine-induced signals in TH2 lineage commitment" published in Nature Immunology.

      We thank the reviewer for providing valuable references. We have added descriptions detailing the differentiation of T cells into Th2 cells in CTLA-4-deficient mice on lines 78–81, and the relevant references have been cited in the revised manuscript.

      To better introduce the CTLA-4 immunobiology, the paper entitled "Current Understanding of Cytotoxic T Lymphocyte Antigen-4 (CTLA-4) Signaling in T-Cell Biology and Disease Therapy" published in Molecules and Cells should be referred.

      We have provided additional details on CTLA-4 immunology (lines 75-84) and have included the relevant reference in the revised manuscript.

      In current results, there are many sentences that should be moved to the discussion, such as lines 123-124, lines 152-153, lines 199-200, and lines 206-207. So, the result sections just describe the results, and the discussions should be put together in the discussion.

      We have relocated these sentences to the 'Discussion' section and refined the writing.

      In the discussion, the zebrafish enteritis model, such as DSS/TNBS and SBMIE models, should also be compared with the current CTLA-4 knockout model. Also, the comparison between the current fish IBD model and the previous mouse model should also be included, to enlighten the usage of CTLA-4 knockout zebrafish IBD model.

      We compared the phenotypes of our current Ctla-4-knockout zebrafish IBD model with other models, including DSS-induced IBD models in zebrafish and mice, as well as TNBS- and SBM-induced IBD models in zebrafish. The details are included in the "Discussion" section (lines 353-365).

      As to the writing, the structure of the discussion is poor. The paragraphs are very long and hard to follow. Many findings from current results were not yet discussed. I just can't find any discussion about the alteration of intestinal microbiota.

      In response to the reviewers' constructive feedback, we have revised and enhanced the discussion section. Furthermore, we have integrated the most recent research findings relevant to this study into the discussion to improve its relevance and comprehensiveness.

      In the discussion, the aerobic-related bacteria in 16s rRNA sequencing results should be focused on echoing the histopathological findings, such as the emptier gut of CTLA-4 knockout zebrafish.

      As mentioned above, the discussion section has been revised and expanded to provide a better understanding of the potential interplay among intestinal inflammatory pathology, gut microbiota alterations, and immune cell dysregulation in Ctla-4-deficient zebrafish. Furthermore, promising avenues for future research that warrant further investigation were also discussed.

      In the current method, there are no descriptions for many used methods, which already generated results, such as WB, MLR, MST, Co-IP, AlphaFold2 prediction, and how to make currently used anti-zfCTLA4 antibody. Also, there is a lack of description of the method of the husbandry of knockout zebrafish line.

      We regret these flaws. The methods section was inadvertently incomplete due to an error during the file upload process at submission. This issue has been rectified in the revised manuscript. Additionally, Ctla-4-deficient zebrafish were reared under the same conditions as wild-type zebrafish, and the rearing methods are now described in the "Generation of Ctla-4-deficient zebrafish" section of the Materials and Methods.

      Line 360: the experimental zebrafish with different ages could be a risk for unstable intestinal health. See the reference entitled "The immunoregulatory role of fish-specific type II SOCS via inhibiting metaflammation in the gut-liver axis" published in Water Biology and Security. The age-related differences in zebrafish could be observed in the gut.

      We appreciate the reviewers' reminders. The Ctla-4 mutant zebrafish used in our experiments were 4 months old, while the wild-type zebrafish ranged from 4 to 6 months old. These experimental fish were relatively young and uniformly distributed in age. During our study, we examined the morphological structures of the intestines in zebrafish aged 4 to 6 months and observed no significant abnormalities. These findings align with previous research indicating no significant difference in intestinal health between 3-month-old and 6-month-old wild-type zebrafish (Ref. 1). Consequently, we conclude that there is no notable aging-related change in the intestines of zebrafish aged 4 to 6 months. This reduces the risk associated with age-related variables in our study. We have added an explanation stating that the Ctla-4 mutant zebrafish used in the experiments were 4 months old (Line 449) in the revised manuscript.

      Reference

      (1) Shan Junwei, Wang Guangxin, Li Heng, Zhao Xuyang et al. (2023) The immunoregulatory role of fish-specific type II SOCS via inhibiting metaflammation in the gut-liver axis. Water Biology and Security 2: 100131-100144.

      Section "Generation of Ctla-4-deficient zebrafish": There is a lack of description of PCR condition for the genotyping.

      The target DNA sequence was amplified at 94 °C for 4 min, followed by 35 cycles at 94°C for 30 s, 58°C for 30 s and 72°C for 30 s, culminating in a final extension at 72 °C for 10 min. The polymerase chain reaction (PCR) conditions are described in lines 458-460.
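
      The cycling conditions above can be written down as a small program description. The temperatures, hold times, and cycle number come from the response; the step labels (denaturation, annealing, extension) are a conventional reading rather than the authors' wording, and the total excludes ramping time between temperatures.

```python
# Each step is (label, temperature in degrees Celsius, hold time in seconds).
INITIAL_STEPS = [("initial denaturation", 94, 240)]
CYCLE_STEPS = [
    ("denaturation", 94, 30),
    ("annealing", 58, 30),
    ("extension", 72, 30),
]
FINAL_STEPS = [("final extension", 72, 600)]
N_CYCLES = 35


def total_hold_seconds(initial, cycle, final, n_cycles):
    """Sum the programmed hold times, ignoring ramping between temperatures."""
    one_cycle = sum(seconds for _, _, seconds in cycle)
    return (sum(seconds for _, _, seconds in initial)
            + n_cycles * one_cycle
            + sum(seconds for _, _, seconds in final))


total_seconds = total_hold_seconds(INITIAL_STEPS, CYCLE_STEPS, FINAL_STEPS, N_CYCLES)
total_minutes = total_seconds / 60
```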

      How old of the used mutant fish? There should be a section "sampling" to provide the sampling details.

      The "Sampling" information has been incorporated into the "Materials and Methods" section of the revised manuscript. Wild-type and Ctla-4-deficient zebrafish of different ages were housed in separate tanks, each labeled with its corresponding birth date. Experiments used Ctla-4-deficient zebrafish aged 4 months and wild-type zebrafish aged 4 to 6 months.

      Line 378-380: The index for the histopathological analysis should be detailed, rather than just provide a reference. I don't think these indexes are good enough to specifically describe the pathological changes of intestinal villi and mucosa. It is suggested to improve with detailed parameters. As described in the paper entitled "Pathology of Gastric Intestinal Metaplasia: Clinical Implications" published in Am J Gastroenterol., histochemically, normal gastric mucins are pH neutral, and they stain magenta with periodic acid-Schiff (PAS). In an inflamed gut, acid mucins replace the original gastric mucins and are stained blue with Alcian blue (AB). So, to reveal the pathological changes of goblet cells and involved mucin components, AB staining should be added. Also, for the number of goblet cells in the inflammatory intestine, combining PAS and AB staining is the best way to reveal all the goblet cells. In Figure 2, there were very few goblet cells. The infiltration of lymphocytes and the empty intestinal lumen could be observed. Thus, the ratio between the length of intestinal villi and the intestinal ring radius should be calculated.

      In response to the reviewers’ valuable suggestions, we have augmented the manuscript by providing additional parameters related to the pathological changes observed in the Ctla-4-deficient zebrafish intestines, including the mucin component changes identified through PAS and AB-PAS staining, the variations in the number of goblet cells evaluated by AB-PAS staining, and the ratio of intestinal villi length to the intestinal ring radius, as illustrated in the following figures. These new findings are detailed in the "Materials and Methods" (lines 563-566) and "Results" (lines 143-146) sections, along with Supplementary Figure S3 of the revised manuscript.

      Section "Quantitative real-time PCR": What's the machine used for qPCR? How about the qPCR validation of RNA seq data? I did not see any related description of data and methods for qPCR validation. In addition, beta-actin is not a stable internal reference gene, to analyze inflammation and immune-related gene expression. See the reference entitled "Actin, a reliable marker of internal control?" published in Clin Chim Acta. Other stable housekeeping genes, such as EF1alpha and 18s, could be better internal references.

      RT-qPCR experiments were conducted using a PCR thermocycler device (CFX Connect Real-Time PCR Detection System with Precision Melt Analysis<sup>TM</sup> Software, Bio-Rad, Cat. No. 1855200EM1). This information has been incorporated into lines 608-610 of the "Materials and Methods" section. In these experiments, key gene sequences of interest, including il13, mpx, and il1β, were extracted from RNA-seq data for RT-qPCR validation. To ensure accurate normalization, potential internal controls were evaluated, and β-actin was identified as a suitable candidate due to its consistent expression levels in the intestines of both wild-type and Ctla-4-deficient zebrafish. The use of β-actin as an internal control is further supported by its application in recent studies on intestinal inflammation (Refs 1–2).

      References:

      (1) Tang Duozhuang, Zeng Ting, Wang Yiting, Cui Hui et al. (2020) Dietary restriction increases protective gut bacteria to rescue lethal methotrexate-induced intestinal toxicity. Gut Microbes 12: 1714401-1714422.

      (2) Malik Ankit, Sharma Deepika et al. (2023) Epithelial IFNγ signaling and compartmentalized antigen presentation orchestrate gut immunity. Nature 623: 1044-1052.
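
      Normalisation of a target gene to an internal control such as β-actin is often done with the 2<sup>-ΔΔCt</sup> method. The response does not state which quantification method was used, so the sketch below is an assumption; it also assumes roughly 100% amplification efficiency, and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method (assumed, see lead-in)."""
    delta_sample = ct_target_sample - ct_ref_sample      # normalise to the reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control           # compare sample with control
    return 2 ** (-delta_delta)


# Hypothetical Ct values: target gene versus reference gene in mutant and wild-type.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
```

      The calculation is only as reliable as the reference gene: if the reference Ct differs between genotypes, the fold change shifts accordingly, which is why the stability of β-actin across both groups matters here.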

      How to generate sCtla-4-Ig, Cd28-Ig and Cd80/86? No method could be found.

      We apologize for the omission of these methods. The detailed protocols have now been added to the "Materials and Methods" section of the revised manuscript (lines 464-481).

      Figure 5: As reviewed in the paper entitled "Teleost T and NK cell immunity" published in Fish and Shellfish Immunology, two types of NK cell homologues have been described in fish: non-specific cytotoxic cells and NK-like cells. There is no NKT cell identified in the teleost yet. Therefore, "NKT-like" could be better to describe this cell type.

      We refer to "NKT" cells as "NKT-like" cells, as suggested.

      For the supplementary data of scRNA-seq, there lacks the details of expression level.

      The expression levels of the corresponding genes are provided in Supplemental Table 4.

      Supplemental Table 1: There are no accession numbers of amplified genes.

      The accession numbers of the amplified genes are included in Supplemental Table 1.

      The English needs further editing.

      We have made efforts to enhance the English to meet the reviewers' expectations.

      Line 32: The tense should be the past.

      This tense error has been corrected.

      Line 363-365: The letter of this approval should be provided as an attachment.

      The approval document is provided as an attachment.

      Line 376: How to distinguish the different intestinal parts? Were they judged as the first third, second third, and last third parts of the whole intestine?

      The differences among the three segments of zebrafish intestine are apparent. The intestinal tube narrows progressively from the anterior to the mid-intestine and then to the posterior intestine. Moreover, the boundaries between the intestinal segments are well-defined, facilitating the isolation of each segment.

      Line 404: Which version of Cytoscape was used?

      The version of Cytoscape used in this study is 3.9.1. Information about the Cytoscape version is provided on line 603.

      Product information for both Percoll and the cell strainer should be provided.

      The information regarding Percoll and cell strainers has been added on lines 626 and 628, respectively.

      Line 814: The full name of MST should be given here.

      The acronym MST stands for "Microscale Thermophoresis", a technique that has been referenced on lines 1157-1158.

    1. eLife Assessment

      This translational study presents a direct cross-species comparison (between mice, rats, and humans) of choice behavior in the same perceptual decision-making task. The study is rare in opening a window on the evolution of decision-making, and the results will be important for many disciplines including behavioral sciences, psychology, neuroscience, and psychiatry. While the strength of the evidence presented is solid, the manuscript would benefit from additional information and analyses to strengthen and clarify its main conclusions.

    2. Reviewer #1 (Public review):

      This work presents data from three species (mice, rats, and humans) performing an evidence accumulation task that has been designed to be as similar as possible between species (and is based on a solid foundation of previous work on decision-making). The tasks are well-designed, and the analyses are solid and clearly presented, showing that there are differences in the overall parameters of the decision-making process between the species. This is valuable to neuroscientists who aim to translate behavioral and neuroscientific findings from rodents to humans and offers a word of caution for the field in readily claiming that behavioral strategies and computations are representative of all mammals. The dataset would be of great interest to the community and may be a source of further modelling of across-species behavior, but unfortunately, neither data nor code are currently shared.

      A few other questions remain that make the conclusions of the paper a bit hard to assess:

      (1) The main weakness is that the authors claim that all species rely on evidence accumulation as a strategy, but this is not tested against other models (see e.g. Stine et al. https://elifesciences.org/articles/55365): the fact that the DDM fits rather well does not mean that this is the strategy that each species was carrying out.

      (2) In all main analyses, it is unclear what the effect is of the generative flash rate and how this has been calibrated between species. Only in Figure 6C do we see basic psychometric functions, but these should presumably also feature as a crucial variable dominating the accuracy and RTs (chronometric functions) across species. The very easy trials are useful to constrain the basic sensorimotor differences that may account for RT variability, e.g. perhaps the small body of mice requires them to move a relatively longer distance to trigger the response.

      (3) The GLM-HMM results (that mice are not engaged in all trials) are very important, but they imply that mouse DDM fits may well be more similar to rats and humans if done only on engaged trials. Could it be that the main species differences are driven by different engagement state occupations?

      (4) It would be very helpful if the authors could present a comprehensive overview (perhaps a table) of the factors that may be relevant for explaining the observed species differences. This may include contextual/experimental variables (age range (adolescent humans vs. mice/rats, see https://www.jax.org/news-and-insights/jax-blog/2017/november/when-are-mice-considered-old), reward source, etc.) and also outcomes (e.g. training time required to learn the task, # trials per session and in total).

    3. Reviewer #2 (Public review):

      Summary:

      Chakravarty et al. propose a 'synchronized framework' for studying perceptual decision-making (DM) across species, namely humans, rats, and mice. Although all species shared hallmarks of evidence accumulation, the results highlighted species-specific differences. Humans were the slowest and most accurate, rats optimized the speed-accuracy tradeoff to maximize the reward rate, and mice were the fastest but least accurate. In addition, while humans were better fit by a classic DDM with fixed bounds, rodents were better fit by a DDM with collapsing bounds. While comparing behavioral strategies in evidence accumulation tasks across species is an important and timely question, some of the presented differences across species lack a clear interpretation and could simply be caused by differences in the task design. Important information and analyses about the DDM and the other models used are missing, which lowers the confidence and enthusiasm about the results.

      Strengths:

      The comparison of behavior across species, including humans and commonly used laboratory species like rats and mice, is a fundamental step in neuroscience to establish more informed links between animal experiments and human cognition. In this work, Chakravarty et al. analyze and model the behavior of three species during the same evidence accumulation task. They draw conclusions about the different strategies used in each case.

      Weaknesses:

      Novelty:

      While quite relevant, some parts of the work presented are more novel than others. That EA drives choice behavior and that these choices can be described with a DDM has been shown before (see e.g. Kane et al. 2023; Brunton et al. 2013; Pinto et al. 2018). The novelty here mostly lies in the comparison of three species in the same task and in fitting the same exact model (close quantitative comparison of behavioral strategies). However, some of the differences lack a clear interpretation. For instance, the values of some of the DDM fitted parameters between the three species are not ordered "as expected" (e.g. non-decision time or DDM BIC). Other comparison results completely lack an explanation (e.g. rats' RTs are near optimal while those of humans and mice are not). The aspect that I found most novel and exciting is the application of HMMs to each of the species. However, this part comes at the end of the paper and has been done without sufficient depth. There is almost no explanation for the results. I would suggest the authors bring this part forward and move back other aspects which are, in my opinion, less novel or interpretable (e.g. results around the optimality of RT).

      Task design:

      Since there is no fixation, the response time (RT) reflects both the evidence integration time and the motor time (stimuli are played until a response is given). This design makes it hard to compare RTs between species. While humans just had to press a button, rodents had to move their whole bodies from a central port to a side port. When comparing rats and mice, their difference in size relative to port distance could explain different RTs. This could for example explain the large difference in non-decision time (ndt) in Figure 3F between mice and rats. Are the measurements of the rat and the mouse boxes comparable? The authors should explain this difference more openly and discuss its implications when interpreting the results. The Methods should also provide information about the distance between ports for each species. I also strongly recommend including a few videos of rats and mice performing the task to have a sense of the movements involved in the task in each species.

      (1) DDM

      Goodness of fit:

      The authors conclude that the three species use an accumulation of evidence strategy because they can fit a DDM. However, there is little information about the goodness of these fits. They only show the RT distributions for one example subject (too small to distinguish whether the fit of the histograms is good or not). We suggest they make a figure showing in more detail the match of the RT distributions across subjects (e.g. they can compare RT quartiles for data and model for the entire group of subjects). Then they provide BIC which is a measure that depends on the number of trials. Were the number of trials matched across subjects/species? Could the authors provide a measure independent of the number of trials (e.g. cross-validated log-likelihood per trial)? Moreover, is this BIC computed only on the RTs, mouse responses, or both?
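      To make the point about trial-count dependence concrete, the following is a minimal sketch, not the authors' analysis. It compares BIC with a per-trial log-likelihood for two hypothetical subjects whose fit quality per trial is identical but whose trial counts differ. The parameter count (7), the per-trial log-likelihood (-0.9 nats), and the trial counts are all illustrative assumptions.

```python
import math


def bic(total_log_likelihood, n_params, n_trials):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_trials) - 2.0 * total_log_likelihood


def log_likelihood_per_trial(total_log_likelihood, n_trials):
    """Normalise the summed log-likelihood by the number of trials."""
    return total_log_likelihood / n_trials


# Hypothetical values (assumptions for illustration only).
N_PARAMS = 7          # number of fitted model parameters
PER_TRIAL_LL = -0.9   # identical fit quality per trial for both subjects
N_FEW, N_MANY = 500, 5000

# Summed log-likelihoods scale with the number of trials.
ll_few = PER_TRIAL_LL * N_FEW
ll_many = PER_TRIAL_LL * N_MANY

bic_few = bic(ll_few, N_PARAMS, N_FEW)
bic_many = bic(ll_many, N_PARAMS, N_MANY)

norm_few = log_likelihood_per_trial(ll_few, N_FEW)
norm_many = log_likelihood_per_trial(ll_many, N_MANY)

print(f"BIC: {bic_few:.1f} (few trials) vs {bic_many:.1f} (many trials)")
print(f"Per-trial log-likelihood: {norm_few:.3f} vs {norm_many:.3f}")
```

      In this sketch the BIC values differ by roughly an order of magnitude purely because of the trial counts, while the per-trial measure is the same. In practice the per-trial log-likelihood would be evaluated on held-out trials, as the reviewer suggests.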

      Overparameterization:

      The authors chose to include as DDM parameters the variability of the initial offset, the variability in non-decision time, and the variability of the drift rate. Having so many parameters with just one stimulus condition (80:20 ratio of flashes) may lead to unidentifiability problems as recognized previously (e.g. see M. Jones (2021), osf.io/preprints/psyarxiv/gja3u). Their parameter recovery analysis (Supplementary Figure 3) shows that at least two of these variability parameters cannot be recovered. I also couldn't find the values of these parameters for the fitted DDM. So I was wondering to what extent adding these parameters improves the fits and is necessary overall.

      Tachometric curves:

      The authors show increasing tachometric curves (i.e. Accuracy vs RT) and use this finding as proof of accumulation. They fit these curves using a GAAM with little justification or detail (in fact the GAAM seems to over-fit the data a bit). The authors do not say, however, that the other model used, i.e. the DDM, may not reproduce these increasing tachometric curves because "in its basic form", the DDM gives flat tachometric curves. Does the DDM fitted to the individual RT and choice data capture the monotonic increase observed in the tachometric curves?
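      The statement that a basic DDM yields flat tachometric curves can be checked with a short simulation. The sketch below is an illustration under stated assumptions, not the authors' model: it simulates a DDM with fixed symmetric bounds, an unbiased starting point, and no across-trial variability, then computes accuracy within RT quantile bins. All parameter values (drift, bound, time step, trial count) are hypothetical, and non-decision time is omitted.

```python
import math
import random


def simulate_basic_ddm(n_trials=4000, drift=1.0, bound=0.7,
                       noise=1.0, dt=0.005, seed=0):
    """Euler simulation of a fixed-bound DDM starting midway between bounds.

    Returns a list of (decision_time, is_correct) tuples. The upper bound
    is taken to be the correct choice because the drift is positive.
    """
    rng = random.Random(seed)
    step_sd = noise * math.sqrt(dt)
    trials = []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < bound:
            x += drift * dt + rng.gauss(0.0, step_sd)
            t += dt
        trials.append((t, x > 0))
    return trials


def tachometric_curve(trials, n_bins=4):
    """Accuracy within consecutive RT quantile bins (fastest to slowest)."""
    ordered = sorted(trials, key=lambda trial: trial[0])
    size = len(ordered) // n_bins
    curve = []
    for b in range(n_bins):
        start = b * size
        stop = (b + 1) * size if b < n_bins - 1 else len(ordered)
        chunk = ordered[start:stop]
        curve.append(sum(correct for _, correct in chunk) / len(chunk))
    return curve


curve = tachometric_curve(simulate_basic_ddm())
print([round(acc, 3) for acc in curve])
```

      Under these assumptions accuracy should be roughly constant across RT bins, so an increasing tachometric curve in the data would have to come from additional model components (for example across-trial variability or time-varying bounds), which is why the reviewer asks whether the fitted DDM reproduces it.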

      Correct vs Error trials:

      In a similar line, the authors do not test the fitted DDM separately in correct vs error trials, which is a classical distinction that most DDMs can't capture. It would be good to know if: (1) the RTs in the data of correct vs error responses are similar (quantified in panel Figure 2B because in 2E it is not clear) and (2) the same trend between correct and error RTs is observed in the fitted DDMs.

      Urgency model:

      It is not clear how the urgency model used works. The authors cite Ditterich (2006), but in that paper, the urgency signal was applied to a race model with two decision variables: the urgency signal "accelerated" both DVs equally and sped up the race without favoring one DV versus the other. In a one-dimensional DDM, it is not clear where the urgency is applied. We assume it is applied in the direction of the stimulus, but then it is unclear how the urgency knows about the stimulus, which is what the DDM is trying to estimate in the first place. The authors should explain this model in greater detail and try to resolve this question.

      Despite finding differences between species, the analyses seem mostly exploratory instead of hypothesis-driven. There is little justification for why differences in some DDM parameters across species would be expected.

      (2) GLM and HMM

      The GLM fits show nicely that humans, rats, and mice weigh the total provided evidence differently (Figures 6C-D). This may be because the internal noise in the accumulation of evidence is higher, but it could also simply be because animals do not weigh the evidence that is presented when they are already moving towards the side ports. A parsimonious alternative to the "more noisy" species is simply that they only consider the first part of the stimulus. Extending the GLM to capture the differential weighting of each sequential sample (what is called the Psychophysical kernel, PK) should be straightforward and would provide a fairer comparison between species (i.e. perhaps the slope of the psychometric curves is not that different, once evidence is weighted in each species with its corresponding PK).

      Choice Bias:

      Panel 3G (DDM starting point) shows that both rats and mice are slightly but systematically biased to the Left (x0 < 0.5). Panel 6D "Bias" seems to be showing the absolute value of the GLM bias parameter. It would be nice to (i) show the signed GLM bias parameter; (ii) check whether the biases computed in the DDM and GLM are comparable across species and subjects (from the GLM they look comparable in magnitude across species, whereas in the DDM they weren't: mice had a much bigger |x0| in the DDM); and (iii) explain (or at least comment on) why animals show a systematic bias to one side.
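      A psychophysical kernel of the kind suggested here can be estimated by giving each time bin of the stimulus its own weight in a logistic regression on choice. The sketch below is a hypothetical illustration, not the authors' GLM: it simulates an observer who uses only the first two of four evidence bins, then recovers the per-bin weights by plain gradient ascent. The bin count, the 80:20 evidence statistics per bin, the observer gain, and the optimiser settings are all assumptions.

```python
import math
import random


def simulate_early_weighting_observer(n_trials=1500, n_bins=4,
                                      used_bins=2, gain=2.0, seed=1):
    """Simulate choices from an observer who ignores late evidence.

    Each bin carries evidence +1 (right) or -1 (left); 80% of bins agree
    with the rewarded side. Only the first `used_bins` bins drive choice.
    """
    rng = random.Random(seed)
    evidence, choices = [], []
    for _ in range(n_trials):
        side = rng.choice((-1, 1))
        e = [side if rng.random() < 0.8 else -side for _ in range(n_bins)]
        p_right = 1.0 / (1.0 + math.exp(-gain * sum(e[:used_bins])))
        evidence.append(e)
        choices.append(1 if rng.random() < p_right else 0)
    return evidence, choices


def fit_psychophysical_kernel(evidence, choices, n_iter=200, lr=2.0):
    """Logistic regression with one weight per time bin, plus a bias."""
    n, k = len(evidence), len(evidence[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * k, 0.0
        for e, y in zip(evidence, choices):
            z = bias + sum(w * x for w, x in zip(weights, e))
            err = y - 1.0 / (1.0 + math.exp(-z))
            grad_b += err
            for j in range(k):
                grad_w[j] += err * e[j]
        # Gradient ascent on the mean log-likelihood.
        bias += lr * grad_b / n
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


evidence, choices = simulate_early_weighting_observer()
kernel, bias = fit_psychophysical_kernel(evidence, choices)
print([round(w, 2) for w in kernel], round(bias, 2))
```

      In this toy case the fitted kernel should be large for the early bins and near zero for the late bins, whereas a single weight on the summed evidence would report only a shallower psychometric slope. That is the confound between "noisier accumulation" and "early weighting" the reviewer describes.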

    4. Reviewer #3 (Public review):

      Summary:

      This study directly compares decision-making strategies between three species, humans, rats, and mice. Based on a new and common behavioral task that is largely shared across species, specific features of evidence accumulation could be quantified and compared between species. The authors argue their work provides a framework to study decision-making across species, which can be studied by the same decision models. The authors report specific features of decision-making strategies, such as humans having a larger decision threshold leading to more accurate responses, and rodents deciding under time pressure.

      Strengths:

      The behavioral task is set up in similar, comparable ways across species, allowing for employing the same decision models and directly comparing specific features of decision behavior. This approach is compelling since it is otherwise challenging to compare behavior between species. Data analysis is solid and does not only quantify features of classic drift-diffusion models, but also additional commonly applied behavior models or features such as win-stay/lose-shift strategies, reward-maximization behavior, and slow, latent changes in behavior strategies. This approach reveals some interesting species differences, which are a starting point to investigate species-specific decision strategies more deeply and could inform a broad set of past and future behavior studies commonly used in cognitive and neuroscience.

      Weaknesses:

      (1) The choice of the stimulus difficulty is unclear, as choosing a single, specific evidence strength (80:20) could limit model fitting performance and interpretation of psychometric curves. This could also limit conclusions about species differences since the perceptual sensitivity seems quite different between species. Thus, the 80:20 lies at different uncertainty levels for the different species, which are known to influence behavioral strategies. This might be addressed by exploiting the distribution of actually delivered flashes, but it remained unclear to me to what degree this is the case. Previous perceptual discrimination studies typically sample multiple evidence levels to differentiate the source of variability in choice behavior.

      (2) The authors argue that their task is novel and that their task provides a framework to investigate perceptual decision-making. However, very similar, and potentially more powerful, perceptual decision-making tasks (e.g., using several evidence strength levels) have been used in humans, non-human primates, rats, mice, and other species. In some instances, analogous behavioral tasks, including studies using the same sensory stimulus, have been used across multiple species. While these may have been published in different papers, they have been conducted in some instances by the same lab and using the same analyses. Further, much of this work is not referenced here. This limits the impact of this work.

      (3) The employed drift-diffusion model has many parameters, which are not discussed in detail. Results in Supplementary Figures 3-5 are not explained or discussed, including the interpretation that model recovery tests fail to recover some of the parameters (eg, Figures S3E, G). This makes the interpretation of such models more difficult.

      (4) The results regarding potential reward-maximization strategies are compelling and connect perceptual and normative decision models. The results are however limited by the different inter-trial intervals and trial initiation times between species, which are shown in Figure S6. It's unclear to me how to interpret, for example, how the long trial initiation times in rats relate to a putative reward-maximizing strategy. This compares to the very low trial initiation times (ie, very 'efficient') of humans, even though they are 'too accurate' in terms of their sampling time. Reward-maximizing strategies seem difficult with such different trial times and in the absence of experimental manipulation.

    1. eLife Assessment

      In this important study, the authors use computational modeling to explore how rapid learning can be reconciled with the accumulation of stable memories in the olfactory bulb, where adult neurogenesis is prominent. They focus on the "flexibility-stability dilemma" and how it is resolved through local mechanisms within the olfactory bulb. These compelling results present a coherent picture of a neurogenesis-dependent learning process that aligns with diverse experimental observations and may serve as a foundation for further experimental and computational studies.

    2. Reviewer #1 (Public review):

      Summary:

      Sakelaris and Riecke used computational modeling to explore how neurogenesis and sequential integration of new neurons into a network support memory formation and maintenance. They focus on the integration of granule cells in the olfactory bulb, a brain area where adult neurogenesis is prominent. Experimental results published in recent years provide an excellent basis to address the question at hand by biologically constrained models. The study extends previous computational models and provides a coherent picture of how multiple processes may act in concert to enable rapid learning, high stability of memories, and high memory capacity. This computational model generates experimentally testable predictions and is likely to be valuable to understand the roles of neurogenesis and related phenomena in memory. One of the key findings is that important features of the memory system depend on transient properties of adult-born granule cells such as enhanced excitability and apoptosis during specific phases of the development of individual neurons. The model can explain many experimental observations and suggests specific functions for different processes (e.g., importance of apoptosis for continual learning). While this model is obviously a massive simplification of the biological system, it conceptualizes diverse experimental observations into a coherent picture, it generates testable predictions for experiments, and it will likely inspire further modeling and experimental studies. Nonetheless, there are issues that the authors should address.

      Strengths:

      (1) The model can explain diverse experimental observations.

      (2) The model directly represents the biological network.

      Weaknesses:

      As with many other models of biological networks, this model contains major simplifications.

    3. Reviewer #2 (Public review):

      Summary:

      This is an excellent paper that demonstrates computational modeling at its best. The authors propose a mechanism to provide flexibility to learn new information while preserving stability in neural networks by combining structural plasticity and synaptic plasticity.

      Strengths:

      An intriguing idea, that is well embedded in experimental data.

      The problem posed is real; the model is designed and implemented using data, yet adds novel and useful insight to those data. The project proposes a parsimonious explanation for why neurogenesis can be better than classical plasticity and how stability versus flexibility can be solved with this approach.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    4. Reviewer #3 (Public review):

      The manuscript is focused on local bulbar mechanisms to solve the flexibility-stability dilemma in contrast to long-range interactions documented in other systems (hippocampus-cortex). The network performance is assessed in a perceptual learning task: the network is presented with alternating, similar artificial stimuli (defined as enrichment) and the authors assess its ability to discriminate between these stimuli by comparing the mitral cell representations quantified by Fisher discriminant analysis. The authors use enhancement in discriminability between stimuli as a function of the degree of specificity of connectivity in the network to quantify the formation of an odor-specific network structure which as such has memory - they quantify memory as the specificity of that connectivity.
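      For readers unfamiliar with the discriminability measure mentioned above, the following minimal sketch shows one common form of a Fisher discriminant between the population responses to two stimuli. The exact definition used by the authors is not given in this review, so the diagonal (independent-cell) form here, the function name, and the toy response values are assumptions for illustration only.

```python
from statistics import mean, pvariance


def fisher_discriminant(responses_a, responses_b):
    """Sum over cells of squared mean difference over summed variance.

    Each argument is a list of trials; each trial is a list with one
    response per cell. Cells are treated as independent (an assumption).
    """
    total = 0.0
    for cell_a, cell_b in zip(zip(*responses_a), zip(*responses_b)):
        mean_diff = mean(cell_a) - mean(cell_b)
        total += mean_diff ** 2 / (pvariance(cell_a) + pvariance(cell_b))
    return total


# Toy data: two trials per stimulus, two cells per trial (hypothetical).
stimulus_a = [[1.0, 4.0], [3.0, 6.0]]
stimulus_b = [[1.0, 8.0], [3.0, 10.0]]

score = fisher_discriminant(stimulus_a, stimulus_b)
print(score)
```

      In this toy example only the second cell separates the two stimuli, so it contributes the whole score. Larger values indicate that the two response distributions are easier to tell apart.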

      The focus on neurogenesis, excitability, and synaptic connectivity of abGCs is topical, and the authors systematically built their model, clearly stating their assumptions and setting up the questions and answers. In my opinion, the combination of latent dendritic representations, excitability, and apoptosis in an age-dependent manner is interesting and as the authors point out leads to experimentally testable hypotheses. I have however several concerns with the novelty of the work, the lack of referencing of previous work on granule cells-mitral cell interactions more generally, and the biological plausibility of the model that, in my opinion, should be further addressed to better contextualize the model.

      (1) The authors find that a network with age-dependent synaptic plasticity outperforms one with constant age-independent plasticity and that having more GCs per se is not sufficient to explain this effect. In addition, having an initial higher excitability of GCs leads to increased performance. To what degree is the increased excitability of abGCs conceptually, and necessarily, independent of their having higher synaptic plasticity rates / fast synapses?

      (2) The authors do not mention previous theoretical work on the specificity of mitral to granule cell interactions from several groups (Koulakov & Rinberg - Neuron, 2011; Gilra & Bhalla, PLoSOne, 2015; Grabska-Bawinska...Mainen, Pouget, Latham, Nat. Neurosci. 2017; Tootoonian, Schaefer, Latham, PLoS Comput. Biol., 2022), nor work on the relevance of top-down feedback from the olfactory cortex on the abGC during odor discrimination tasks (Wu & Komiyama, Sci. Adv. 2020), or of top-down regulation from the olfactory cortex on regulating the activity of the mitral/tufted cells in task engaged mice (Lindeman et al., PLoS Comput. Biol., 2024), or in naïve mice that encounter odorants (in the absence of specific context; Boyd, et al., Cell Rep, 2015; Otazu et al., Neuron 2015, Chae et al., Neuron, 2022). In particular, the presence of rich top-down control of granule cell activity (including of abGCs) puts into question the plausibility of one of the opening statements of the authors with respect to relying solely on local circuit mechanisms to solve the flexibility-stability dilemma. I think the discussion of this work is important in order to put into context the idea of specific interactions between the abGCs and the mitral cells.

      (3) To what degree does specific connectivity reflect a specific stimulus configuration, and is it a good proxy for determining stimulus discriminability and memory capacity in terms of temporal activity patterns (differences in latency/phase with respect to the respiration cycle, etc.), which may account for a substantial fraction of the ability to discriminate between stimuli? The authors mention in the discussion that this is, indeed, an upper bound and that specific connectivity is necessary for different temporal activity patterns, but a further expansion on this topic would help in understanding the limitations of the model.

      (4) Reward or reward prediction error signals are not considered in the model. They however are ubiquitous in nature and likely to be encountered and shape the connectivity and activity patterns of the abGC-mitral cell network. Including a discussion of how the model may be adjusted to incorporate reward/error signals would strengthen the manuscript.

      Specific Comments

      (1) Lines 84-86; 507-509; Eq(3): Sensory input is defined by a basal parameter of MCs spontaneous activity (Sspontaneus) and the odor stimuli input (Siodor), but it is not clear from the main text or methods how sensory inputs (glomerular patterns) were modeled.

      (2) Lines 118-122: The perceptual learning task is explained only in the context of the discriminability of similar artificial stimuli using the Fisher discriminant and the "Memory" metric. A detailed description of the logic, methods, and objective of the perceptual learning task, taking into account Comment 1, would help to better understand the model.

      (3) Rapid re-learning of a forgotten odor pair is enabled by sensory-dependent dendritic elaboration of the neurons that initially encoded the odors, and the observed re-learning would occur even if neurogenesis were blocked following the first enrichment, even though the initial learning did require neurogenesis. When would this ever occur in nature? The re-learning of an odor period? Why is this highlighted in the study?

    1. eLife Assessment

      This study presents valuable findings related to seasonal brain size plasticity in the Eurasian common shrew (Sorex araneus), which is an excellent model system for these studies. The evidence supporting the authors' claims is convincing. The work will be of interest to biologists working on neuroscience, plasticity, and evolution.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Thomas et al. set out to study seasonal brain gene expression changes in the Eurasian common shrew. This mammalian species is unusual in that it does not hibernate or migrate but instead stays active all winter while shrinking and then regrowing its brain and other organs. The authors previously examined gene expression changes in two brain regions and the liver. Here, they added data from the hypothalamus, a brain region involved in the regulation of metabolism and homeostasis. The specific goals were to identify genes and gene groups that change expression with the seasons and to identify genes with unusual expression compared to other mammalian species. The reason for this second goal is that genes that change with the season could be due to plastic gene regulation, where the organism simply reacts to environmental change using processes available to all mammals. Such changes are not necessarily indicative of adaptation in the shrew. However, if the same genes are also expression outliers compared to other species that do not show this overwintering strategy, it is more likely that they reflect adaptive changes that contribute to the shrew's unique traits.

      The authors succeeded in implementing their experimental design and identified significant genes in each of their specific goals. There was an overlap between these gene lists. The authors provide extensive discussion of the genes they found.

      The scope of this paper is quite narrow, as it adds gene expression data for only one additional tissue compared to the authors' previous work in a 2023 preprint. The two papers even use the same animals, which had been collected for that earlier work. As a consequence, the current paper is limited in the results it can present. This is somewhat compensated by an expansive interpretation of the results in the discussion section, but I felt that much of this was too speculative. More importantly, there are several limitations to the design, making it hard to draw stronger conclusions from the data. The main contribution of this work lies in the generated data and the formulation of hypotheses to be tested by future work.

      Strengths:

      The unique biological model system under study is fascinating. The data were collected in a technically sound manner, and the analyses were done well. The paper is overall very clear, well-written, and easy to follow. It does a thorough job of exploring patterns and enrichments in the various gene sets that are identified.

      I specifically applaud the authors for doing a functional follow-up experiment on one of the differentially expressed genes (BCL2L1), even if the results did not support the hypothesis. It is important to report experiments like this and it is terrific to see it done here.

      Comments on revised version:

      This updated version of the paper is improved compared to its initial version. As such, the strengths remain the same as before, with a fascinating model system and an interesting research question. The earlier weaknesses related to overinterpretation of the data have been largely fixed by shortening the paper and adding appropriate caveats throughout. The paper now also includes a significance test for its overlap between gene lists. While this turned out to be negative (i.e., there is not more overlap between lists than expected by chance), reporting this result transparently has strengthened the paper.
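      A significance test for the overlap between two gene lists is often done with a hypergeometric tail probability. The sketch below shows that calculation as an illustration only; the review does not state which test the authors used, and the function name and example numbers are hypothetical.

```python
from math import comb


def overlap_p_value(n_background, n_list_a, n_list_b, n_overlap):
    """P(overlap >= n_overlap) when list B is drawn at random.

    n_background: number of genes tested in both analyses
    n_list_a, n_list_b: sizes of the two gene lists
    n_overlap: observed number of genes shared by the two lists
    """
    total = comb(n_background, n_list_b)
    largest_possible = min(n_list_a, n_list_b)
    tail = sum(
        comb(n_list_a, k) * comb(n_background - n_list_a, n_list_b - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / total


# Hypothetical example: 10 background genes, two lists of 3 genes each.
p_full_overlap = overlap_p_value(10, 3, 3, 3)
p_any_overlap_count = overlap_p_value(10, 3, 3, 0)
print(p_full_overlap, p_any_overlap_count)
```

      A large p-value from such a test corresponds to the negative result described above: the lists share no more genes than expected by chance given their sizes and the background set.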

    3. Reviewer #2 (Public review):

      Summary:

      Shrews go through winter by shrinking their brain and most organs, then regrow them in the spring. The gene expression changes underlying this unusual brain size plasticity were unknown. Here, the authors looked for potential adaptations underlying this trait by looking at differential expression in the hypothalamus. They found enrichments for DE in genes related to the blood brain barrier and calcium signaling, and also used comparative data to look at gene expression differences that are unique in shrews. This study leverages a fascinating organismal trait to understand plasticity and what might be driving it at the level of gene expression. This manuscript also lays the groundwork for further developing this interesting system.

      Strengths:

      One strength is that the authors used OU models to look for adaptation in gene expression. The authors also added cell culture work to bolster their findings.

      Comments on revised version:

      I think that the authors have made a strong revision. No other comments.

    4. Reviewer #3 (Public review):

      Summary:

      In their study, the authors combine seasonal and comparative transcriptomics to identify candidate genes with plastic, canalized, or lineage-specific (i.e., divergent) expression patterns associated with an unusual overwintering phenomenon (Dehnel's phenomenon - seasonal size plasticity) in the Eurasian shrew. Their focus is on the shrinkage and regrowth of the hypothalamus, a brain region that undergoes significant seasonal size changes in shrews and plays a key role in regulating metabolic homeostasis. Through comparative transcriptomic analysis, they identify genes showing derived (lineage-specific), plastic (seasonally regulated), and canalized (both lineage-specific and plastic) expression patterns. The authors hypothesize that genes involved in pathways such as the blood-brain barrier, metabolic state sensing, and ion-dependent signaling will be enriched among those with notable transcriptomic patterns. They complement their transcriptomic findings with a cell culture-based functional assessment of a candidate gene believed to reduce apoptosis.

      Strengths:

      The study's rationale and its integration of seasonal and comparative transcriptomics are well-articulated and represent an advancement in the field. The transcriptome, known for its dynamic and plastic nature, is also influenced by evolutionary history. The authors effectively demonstrate how multiple signals (evolutionary, constitutive, and plastic) can be extracted, quantified, and interpreted. The chosen phenotype and study system are particularly compelling: not only do they exemplify an extreme case of Dehnel's phenotype, but the metabolic requirements of the shrew also suggest that genes regulating metabolic homeostasis are under strong selection.

      Weaknesses:

      The results of the expression patterns are quite compelling and a number of interesting downstream hypotheses are outlined; however, the interpretation of the role of each gene and pathway identified is speculative which dampens the overall impact of the work. That said, I commend the authors on functionally testing one of the differentially expressed genes. I also commend the inclusion of that negative result.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents valuable findings related to seasonal brain size plasticity in the Eurasian common shrew (Sorex araneus), which is an excellent model system for these studies. The evidence supporting the authors' claims is convincing. However, the authors should be careful when applying the term adaptive to the gene expression changes they observe; it would be challenging to demonstrate the differential fitness effects of these gene expression changes. The work will be of interest to biologists working on neuroscience, plasticity, and evolution.

      We appreciate the reviewers’ suggestions and comments. The phylogenetic ANOVA we used (EVE) tests for a separate RNA expression optimum specific to the shrew lineage, consistent with expectations for adaptive evolution of gene expression. But, as you noted, while this analysis highlights many candidate genes evolving in a manner consistent with positive selection, further functional validation is required to confirm if and how these genes contribute to Dehnel’s phenomenon. In the discussion, we now emphasize that inferred adaptive expression of these genes is putative and outline that future studies are needed to test the function of proposed adaptations. For example, the cell line validation of the effect of BCL2L1 on apoptosis is a case study that tests the function of a putatively adaptive change in gene expression, and it illuminates this limitation. We have also refined our discussion to focus more on pathway-level analyses rather than on individual genes, and have addressed the other issues raised, including clarity of methods and using sex as a covariate in our analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Thomas et al. set out to study seasonal brain gene expression changes in the Eurasian common shrew. This mammalian species is unusual in that it does not hibernate or migrate but instead stays active all winter while shrinking and then regrowing its brain and other organs. The authors previously examined gene expression changes in two brain regions and the liver. Here, they added data from the hypothalamus, a brain region involved in the regulation of metabolism and homeostasis. The specific goals were to identify genes and gene groups that change expression with the seasons and to identify genes with unusual expression compared to other mammalian species. The reason for this second goal is that genes that change with the season could be due to plastic gene regulation, where the organism simply reacts to environmental change using processes available to all mammals. Such changes are not necessarily indicative of adaptation in the shrew. However, if the same genes are also expression outliers compared to other species that do not show this overwintering strategy, it is more likely that they reflect adaptive changes that contribute to the shrew's unique traits.

      The authors succeeded in implementing their experimental design and identified significant genes in each of their specific goals. There was an overlap between these gene lists. The authors provide extensive discussion of the genes they found.

      The scope of this paper is quite narrow, as it adds gene expression data for only one additional tissue compared to the authors' previous work in a 2023 preprint. The two papers even use the same animals, which had been collected for that earlier work. As a consequence, the current paper is limited in the results it can present. This is somewhat compensated by an expansive interpretation of the results in the discussion section, but I felt that much of this was too speculative. More importantly, there are several limitations to the design, making it hard to draw stronger conclusions from the data. The main contribution of this work lies in the generated data and the formulation of hypotheses to be tested by future work.

      Thank you for your interest in our manuscript and for your insights. We addressed your comments below: we now highlight the limitations of our study design in the discussion and emphasize that, while a second optimum of gene expression in shrews is consistent with adaptive evolution, we recognize that not all sources of variation in gene expression can be fully accounted for. We highlight the putative nature of these results in our revisions, especially in our new limitations section (lines 541-555).

      Strengths:

      The unique biological model system under study is fascinating. The data were collected in a technically sound manner, and the analyses were done well. The paper is overall very clear, well-written, and easy to follow. It does a thorough job of exploring patterns and enrichments in the various gene sets that are identified.

      I specifically applaud the authors for doing a functional follow-up experiment on one of the differentially expressed genes (BCL2L1), even if the results did not support the hypothesis. It is important to report experiments like this and it is terrific to see it done here.

      We are glad to hear that you found our manuscript fascinating and clearly written. While we hoped to see an effect of BCL2L1 on apoptosis as proposed, we agree that reporting null results is valuable when validating evolutionary inferences.

      Weaknesses:

      While the paper successfully identifies differentially expressed seasonal genes, the real question is (as explained by the authors) whether these are evolved adaptations in the shrews or whether they reflect plastic changes that also exist in other species. This question was the motivation for the inter-species analyses in the paper, but in my view, these cannot rigorously address this question. Presumably, the data from the other species were not collected in comparable environments as those experienced by the shrews studied here. Instead, they likely (it is not specified, and might not be knowable for the public data) reflect baseline gene expression. To see why this is problematic, consider this analogy: if we were to compare gene expression in the immune system of an individual undergoing an acute infection to other, uninfected individuals, we would see many, strong expression differences. However, it would not be appropriate to claim that the infected individual has unique features - the relevant physiological changes are simply not triggered in the other individuals. The same applies here: it is hard to draw conclusions from seasonal expression data in the shrews to non-seasonal data in the other species, as shrew outlier genes might still reflect physiological changes that weren't active in the other species.

      There is no solution for this design flaw given the public data available to the authors except for creating matched data in the other species, which is of course not feasible. The authors should acknowledge and discuss this shortcoming in the paper.

      Thank you for taking the time to provide such insightful feedback. As you noted, while shrews experience seasonal size changes, their environments may differ from those of the other species used in this experiment, leading to increased or decreased expression of certain genes and reducing our ability to accurately detect selection across the phylogeny. Although we sought to control for as many sources of variation as possible, such as using only post-pubescent, wild, or non-domesticated individuals when feasible, we recognize that not all sources of variation can be fully accounted for within a practical experiment. We agree that these sources of variation can introduce both false positives and negatives into our results, and we have now highlighted this limitation within our discussion (lines 538-552).

      Related to the point above: in the section "Evolutionary Divergence in Expression" it is not clear which of the shrew samples were used. Was it all of them, or only those from winter, fall, etc? One might expect different results depending on this. E.g., there could be fewer genes with inferred adaptive change when using only summer samples. The authors should specify which samples were included in these analyses, and, if all samples were used, conduct a robustness analysis to see which of their detected genes survive the exclusion of certain time points.

      Thank you for this attention to detail. We used spring adults for this analysis. This decision was made because we only used post-pubescent individuals for all species in the analysis, and this was the only season in which adult shrews were going through Dehnel’s phenomenon. We have now clarified this in both the methods and results (line 247 and line 667).

      In the same section, were there also genes with lower shrew expression? None are mentioned in the text, so did the authors not test for this direction, or did they test and there were no significant hits?

      We did test for decreased shrew expression compared to the rest of the species, but no genes showed a significant decrease. We hypothesize two potential reasons for this result: (1) if a gene were selected for decreased expression, selection for constitutive expression of the gene across all species may be weak, and decreased expression may thus be found in other lineages as well; or (2) decreased or absent expression may relax selection on the coding regions, and thus these genes are not recovered because we identify 1:1 orthologs. This is consistent with results provided in the original methods manuscript. Thank you for pointing out that we did not discuss this information in the text; we now include it in our results (lines 250-251).

      The Discussion is too long and detailed, given that it can ultimately only speculate about what the various expression changes might mean. Many of the specific points made (e.g. about the blood-brain-barrier being more permissive to sensing metabolic state, about cross-organ communication, the paragraphs on single, specific genes) are a stretch based on the available data. Illustrating this point, the one follow-up experiment the authors did (on BCL2L1) did not give the expected result. I really applaud the authors for having done this experiment, which goes beyond typical studies in this space. At the same time, its result highlights the dangers of reading too much into differential expression analyses.

      We agree with your point: while our extensive discussion is useful for generating hypotheses for future testing, ultimately some of the discussion may be too speculative for our readers. To amend this, we have reduced some portions of our discussion and focused more on pathways than individual genes, including removing mechanisms related to HRH2, FAM57B, GPR3, and GABAergic neurons. We hope that this highlights to the reader the speculative nature of many of our results.

      There is no test of whether the five genes observed in both analyses (seasonal change and inter-species) exceed the number expected by chance. When two gene sets are drawn at random, some overlap is expected randomly. The expected overlap can be computed by repeated draws of pairs of random sets of the same size as seen in real data and by noting the overlap between the random pairs. If this random distribution often includes sets of five genes, this weakens the conclusions that can be drawn from the genes observed in the real data.

      Thank you for highlighting this approach, which was greatly needed. After running this test, we found that the observed number of overlapping genes exceeded the expected overlap, but not significantly. We now show this in our methods (lines 277-278) and results (lines 719-720).
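
      The resampling test the reviewer describes can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the universe size, and the gene-set sizes are all hypothetical placeholders, and it assumes both gene sets are drawn uniformly from the same pool of tested genes.

```python
import random


def overlap_null(n_universe, size_a, size_b, n_draws=10_000, seed=0):
    """Simulate the overlap between two random gene sets.

    Genes are represented by integer indices 0..n_universe-1 (an assumption
    made for illustration; real gene IDs would work the same way).
    """
    rng = random.Random(seed)
    universe = range(n_universe)
    overlaps = []
    for _ in range(n_draws):
        set_a = set(rng.sample(universe, size_a))
        set_b = set(rng.sample(universe, size_b))
        overlaps.append(len(set_a & set_b))
    return overlaps


def empirical_p(overlaps, observed):
    """One-sided empirical p-value: share of random draws whose overlap is at
    least as large as the observed one (with the usual +1 correction)."""
    hits = sum(1 for o in overlaps if o >= observed)
    return (hits + 1) / (len(overlaps) + 1)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the example run.
    null = overlap_null(n_universe=10_000, size_a=200, size_b=150, n_draws=2_000)
    print("mean random overlap:", sum(null) / len(null))
    print("empirical p for an observed overlap of 5:", empirical_p(null, 5))
```

      Under these assumptions the overlap follows a hypergeometric distribution, so an exact test would give the same answer as the simulation in the limit of many draws; the resampling version is shown because it mirrors the procedure described in the review.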

      Reviewer #2 (Public review):

      Summary:

      Shrews go through winter by shrinking their brain and most organs, then regrow them in the spring. The gene expression changes underlying this unusual brain size plasticity were unknown. Here, the authors looked for potential adaptations underlying this trait by looking at differential expression in the hypothalamus. They found enrichments for DE in genes related to the blood-brain barrier and calcium signaling, as well as used comparative data to look at gene expression differences that are unique in shrews. This study leverages a fascinating organismal trait to understand plasticity and what might be driving it at the level of gene expression. This manuscript also lays the groundwork for further developing this interesting system.

      We are glad you found our manuscript interesting and thank you for your feedback. We hope that we have addressed all of your concerns as described below.

      Strengths:

      One strength is that the authors used OU models to look for adaptation in gene expression. The authors also added cell culture work to bolster their findings.

      Weaknesses:

      I think that there should be a bit more of an introduction to Dehnel's phenomenon, given how much it is used throughout.

      Thank you for this insight. With a lengthy introduction and discussion, we agree that the importance of Dehnel’s phenomenon may have been overshadowed. We have shortened both sections and emphasized the background on Dehnel’s phenomenon in the first two paragraphs of the introduction, allowing this extraordinary seasonal size plasticity to stand out.

      Reviewer #3 (Public review):

      Summary:

      In their study, the authors combine developmental and comparative transcriptomics to identify candidate genes with plastic, canalized, or lineage-specific (i.e., divergent) expression patterns associated with an unusual overwintering phenomenon (Dehnel's phenomenon - seasonal size plasticity) in the Eurasian shrew. Their focus is on the shrinkage and regrowth of the hypothalamus, a brain region that undergoes significant seasonal size changes in shrews and plays a key role in regulating metabolic homeostasis. Through combined transcriptomic analysis, they identify genes showing derived (lineage-specific), plastic (seasonally regulated), and canalized (both lineage-specific and plastic) expression patterns. The authors hypothesize that genes involved in pathways such as the blood-brain barrier, metabolic state sensing, and ion-dependent signaling will be enriched among those with notable transcriptomic patterns. They complement their transcriptomic findings with a cell culture-based functional assessment of a candidate gene believed to reduce apoptosis.

      Strengths:

      The study's rationale and its integration of developmental and comparative transcriptomics are well-articulated and represent an advancement in the field. The transcriptome, known for its dynamic and plastic nature, is also influenced by evolutionary history. The authors effectively demonstrate how multiple signals (evolutionary, constitutive, and plastic) can be extracted, quantified, and interpreted. The chosen phenotype and study system are particularly compelling, as it not only exemplifies an extreme case of Dehnel's phenotype, but the metabolic requirements of the shrew suggest that genes regulating metabolic homeostasis are under strong selection.

      Weaknesses:

      (1) In a number of places (described in detail below), the motivation for the experimental, analytical, or visualization approach is unclear and may obscure or prevent discoveries.

      Thank you for finding our research and manuscript compelling, as well as for the valuable feedback that will drastically improve our manuscript. We hope that we have alleviated your concerns by following your instructions below.

      (2) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text:

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Figure 1B).

      - The authors do not indicate whether they perform cluster-specific GO or KEGG pathway enrichment analyses. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses.

      Thank you for this valuable feedback. We did not want to include clusters we deemed to be related to development, as these should not be attributed to changes associated with Dehnel’s phenomenon. We did this through qualitative, visual inspection, which we realize can differ between parties (i.e., clusters 2, 8, and 12 appeared to be seasonal). Qualitatively, we were looking for extreme divergence between Stage 1 and Stage 5 individuals: if expression were related to season and not development, then the averages of these stages within a cluster should be relatively similar. We have now quantified this as large differences in z-score (abs(summer juvenile-summer adult)>1.25) without meaningful interseason variations determined by a second local maximum (abs(autumn-winter)<0.5 and abs(winter-summer)<0.5), and added it to both our methods (lines 699-702) and results (line 192).
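
      Read literally, the thresholds quoted above amount to a simple rule on each cluster's mean z-scores, sketched below. This is an interpretation for illustration, not the authors' implementation: the function name and dictionary keys are hypothetical, and because the response does not say which summer group enters the winter-summer comparison, the sketch assumes summer adults.

```python
def is_developmental_cluster(mean_z, dev_cutoff=1.25, season_cutoff=0.5):
    """Flag a cluster as a 'developmental shift' cluster.

    mean_z maps a stage label to the cluster's mean z-score at that stage.
    Cutoffs follow the values quoted in the response (1.25 and 0.5).
    """
    # Large juvenile-vs-adult difference within the same season.
    dev_shift = abs(mean_z["summer_juvenile"] - mean_z["summer_adult"]) > dev_cutoff
    # No meaningful variation between the later seasons.
    flat_autumn_winter = abs(mean_z["autumn"] - mean_z["winter"]) < season_cutoff
    # ASSUMPTION: "summer" in abs(winter-summer) is taken to be summer adults.
    flat_winter_summer = abs(mean_z["winter"] - mean_z["summer_adult"]) < season_cutoff
    return dev_shift and flat_autumn_winter and flat_winter_summer


if __name__ == "__main__":
    # Made-up cluster means, for illustration only.
    developmental = {"summer_juvenile": 1.5, "autumn": 0.1, "winter": 0.0, "summer_adult": -0.2}
    seasonal = {"summer_juvenile": 0.8, "autumn": 0.2, "winter": -1.0, "summer_adult": 0.9}
    print(is_developmental_cluster(developmental))
    print(is_developmental_cluster(seasonal))
```

      Clusters for which the rule returns false would be retained as candidate seasonal clusters under this reading.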

      Regarding the combination of clusters for pathway enrichment compared to individual pathways, we agree that combining clusters may be more informative for overall homeostasis, compared to individual clusters, which may inform us on processes directly related to Dehnel’s phenomenon. Initially, we were hesitant to conduct this analysis, as clusters contain small gene sets, reducing the ability to detect pathway enrichments. We have now included this analysis, which is reported in our methods (lines 703-704), results (lines 203-204), and a new supplemental table.

      (3) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We agree that our rationale for validating BCL2L1 function in neural cell lines was not clearly explained in the manuscript. We selected BCL2L1 because it is the furthest downstream gene in the apoptotic pathway, thus making it the most directly involved gene in programmed cell death, whereas upstream genes could influence additional genes or alternative processes. We have clarified this choice in the revised methods section (lines 748-750).

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis, but it is unclear if this was also done for the S. araneus analysis. If not, why? If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis? Sex-specific expression elevates within-group variation and could impact the discovery of differentially expressed genes.

      Regarding the use of sex as a covariate, we acknowledge the concerns raised. In our evolutionary analyses, we maintained a balanced sex ratio within species when possible. EVE models handle the effect of sex on gene expression as intraspecific variation. In shrews, however, we used males exclusively, as females were only found among juvenile individuals. Including those juvenile females would have introduced age effects, with perhaps a larger effect on our results. For the seasonal data, we have now included sex as a covariate in differential expression analyses. However, our design is imbalanced in relation to sex, which we have now discussed in our methods (lines 713-714) and discussion limitations (lines 544-548).

      (4) Discussion: The term "adaptive" is used frequently and liberally throughout the discussion. The interpretation of seasonal changes in gene expression as indicators of adaptive evolution should be done cautiously as such changes do not necessarily imply causal or adaptive associations.

      Thank you for this insight. We have reviewed our discussion and clarified that adaptations are putative (e.g., lines 146, 285, and 332), and highlighted this in our limitations section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I would recommend always spelling out "Dehnel's phenomenon" or even replacing this term (after crediting the DP term) with the more informative "seasonal size plasticity". Every time I saw "DP", I had to remind myself what this referred to. If the authors choose not to do so, please use the acronym consistently (e.g. line 186 has it spelled out).

      We have replaced the acronym DP with either the full term or the more informative “seasonal size plasticity” throughout the text.

      (2) Line 202: "DEG" has not been defined. Simply add to the line before.

      Thank you for this attention to detail. We have added this to the line above (210).

      (3) Please add a reference for the "AnAge" tool that was used to determine if samples were pubescent.

      Thank you for identifying this oversight. We have now cited the proper paper in line 634.

      (4) In the BCL2L1 section in the results, add a callout to Figure 2D.

      We have now added a callout to Figure 2D within the results (line 234).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 122: is associated? These adaptations?

      Thank you for identifying that we were missing the words “associated with” here. We have fixed this in the revision.

      (2) The first paragraph of the Results should be moved to the methods, except maybe the number of orthologs.

      Thank you for this insight. We have removed this portion from the results section.

      (3) Why a Bonferroni correction on line 188? That seems too strict.

      We agree the Bonferroni correction is strict. Results from other, less strict methods for controlling the false discovery rate are also not significant after correction. These corrections can be found within the data; however, we only report on the Bonferroni correction.
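
      To make the contrast between the two kinds of correction concrete, the sketch below compares Bonferroni adjustment with the Benjamini-Hochberg procedure, one common false-discovery-rate method. The response does not name the alternative methods the authors tried, so the choice of Benjamini-Hochberg is an assumption, and the p-values are invented for illustration.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
    print("Bonferroni:", bonferroni(raw))
    print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

      Benjamini-Hochberg adjusted values are never larger than the Bonferroni ones, so a result that remains non-significant under it will also be non-significant under Bonferroni.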

      (4) Line 427: "is a novel candidate gene for several neurological disorders" needs some references. I see them a couple of sentences later, but that's quite a sentence with no references at the end.

      We have added the proper citations for this sentence (line 524).

      Reviewer #3 (Recommendations for the authors):

      (1) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text Line176-193:

      - The authors report the total number of genes meeting inclusion criteria (>0.5-fold change between any two stages and 2 samples >10 normalized reads), but it would be more informative to also provide the number of genes within each temporal cluster. This would offer a clearer understanding of how gene expression patterns are distributed over time.

      Unfortunately, this information is difficult to depict on our figure and would use too much space in the text. We have thus added a description of the range of genes in a new supplemental table depicting this information.

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Fig. 1B). Using a differential gene expression criterion might be more suitable. For example, do excluded genes show significant log-fold differences between late-stage comparisons?

      As previously mentioned, we have now quantified seasonal shifts as large differences in z-score (abs(summer juveniles-summer adults)>1.25) without meaningful interseason variations determined by a second local maximum (abs(autumn-winter)<0.5 and abs(winter-summer)<0.5), and added it to our methods (lines 699-702). We then follow this up with differential expression analyses as described in Figure 2.

      - Did the authors perform cluster-specific GO or KEGG pathway enrichment analyses instead of focusing on the combined set of genes across the season shift clusters? While I understand that the small number of genes in each cluster may be limiting, if pathways emerge from cluster-specific analysis, they could provide more detailed insights into the functional significance of these temporal expression patterns. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses. Additionally, no corrections for multiple hypothesis testing were applied, as noted in the results. A more refined gene set (e.g., using differential expression criteria, described above) could be more appropriate for these analyses.

      We have now included cluster-specific KEGG enrichments as previously described.

      (2) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets - Figure 2 and lines195-227:

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We have now included the reasoning for further validation of BCL2L1 as described above.

      - The relevance of the "higher degree" differentially expressed genes needs more explanation. Although this group of genes is highlighted in the results, they are not featured in any subsequent analyses, leaving their importance unclear.

      Thank you for this insight. We have removed this from the methods as it is not relevant to subsequent analyses or conclusions.

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis (Line 525), but it is unclear if this was also done for the S. araneus analysis. If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis?

      We have now incorporated information on sex as described above.

      (3) Discussion:

      The term "adaptive" is used frequently and liberally throughout the discussion, but the authors should be cautious in interpreting seasonal changes in gene expression as indicators of adaptive evolution. Such changes do not necessarily imply causal or adaptive associations, and this distinction should be clearly stated when discussing the results.

      Thank you for this feedback; we agree with your conclusion. While a second expression optimum in the shrew lineage is indicative of adaptive expression, we cannot fully determine whether these expression differences are caused by genetic or environmental factors, despite careful attention to experimental design. We have highlighted this as a limitation in the discussion.

      (4) Minor Editorial Comment:

      Line 105: "... maintenance of an energy budgets..." delete "an"

      We have removed this grammatical error.

    1. eLife Assessment

      The manuscript by de La Forest Divonne et al. offers an important and detailed exploration of the immune cells in the oyster Crassostrea gigas, by correlating distinct hemocyte morphotypes with specific single-cell transcriptional profiles. The evidence supporting the conclusion is convincing, deriving from the comprehensive dataset that not only captures unicellular diversity but also associates these cells with distinct immune roles, making it an invaluable resource for the broader research community.

    2. Reviewer #1 (Public review):

      Summary

      In this manuscript, De La Forest Divonne et al. build a repertory of hemocytes from adult Pacific oysters combining scRNAseq data with cytologic and biochemical analyses. Three categories of hemocytes were described previously in this species (i.e. blast, hyalinocyte and granulocytes). Based on scRNAseq data, the authors identified 7 hemocyte clusters presenting distinct transcriptional signatures. Using KEGG pathway enrichment and RBGOA, the authors determined the main molecular features of the clusters. In parallel, using cytologic markers, the authors classified 7 populations of hemocytes (i.e. ML, H, BBL, ABL, SGC, BGC, and VC) presenting distinct sizes, nucleus sizes, acidophilic/basophilic, presence of pseudopods, cytoplasm/nucleus ratio and presence of granules. Then, the authors compared the phenotypic features with potential transcriptional signatures seen in the scRNAseq. The hemocytes were separated in a density gradient to enrich for specific subpopulations. The cell composition of each cell fraction was determined using cytologic markers and the cell fractions were analysed by quantitative PCR targeting major cluster markers (two per cluster). With this approach, the authors could assign cluster 7 to VC, cluster 2 to H, and cluster 3 to SGC. The other clusters did not show a clear association with this experimental approach. Using phagocytic assays, ROS, and copper monitoring, the authors showed that ML and SGC are phagocytic, ML produces ROS, and SGC and BGC accumulate copper. Then with the density gradient/qPCR approach, the authors identified the populations expressing anti-microbial peptides (ABL, BBL, and H). At last, the authors used Monocle to predict differentiation trajectories for each subgroup of hemocytes using cluster 4 as the progenitor subpopulation.

      The manuscript provides a comprehensive characterisation of the diversity of circulating immune cells found in Pacific oysters.

      Strengths

      The combination of scRNAseq, cytologic markers and gradient based hemocyte sorting offers an integrative view of the immune cell diversity. Hemocytes represent a very plastic cell population that has key roles in homeostatic and challenged conditions. Grasping the molecular features of these cells at the single-cell level will help understand their biology. This type of study may help elucidate the diversification of immune cells in comparative studies and evolutionary immunology.

      Weaknesses

      Several figures show inconsistency leading to erroneous conclusions and some conclusions are poorly supported. Moreover, the manuscript remains highly descriptive with limited comparison with the available literature.

      Comments on revisions:

      The authors replied to most comments.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors):

      Major comments

      (1) Line 201: The threshold of 0.25 was maintained to select enriched genes, which minimizes the value of the GO term enrichment analyses. It may notably explain why the term phagosome is enriched in cluster 7, while experimental data indicate that cluster 7 is not phagocytic. In addition, the authors mentioned in the 1st response to reviewer that they would include DotPlot to illustrate the specificity of the genes corresponding to the main GO terms. This should notably include the ribosomal genes found enriched in cluster 4, which constitute the basis used by the authors to call cluster 4 the progenitor cluster.

      We appreciate the reviewer’s concern regarding our chosen log2FC threshold (0.25) for GO term enrichment. To assess the robustness of our approach, we tested more stringent thresholds (e.g., 0.5) and verified that our overall interpretations remain consistent. However, we acknowledge that certain GO terms, such as phagosome, may appear in clusters that are not primarily phagocytic. This is likely due to the fact that genes involved in vesicle trafficking, endo-lysosomal compartments and intracellular degradation processes overlap with those classically associated with phagocytosis.

      Therefore, the KEGG-based enrichment of phagosome in cluster 7 does not necessarily imply active phagocytosis but could instead reflect these alternative vesicular processes. As we show, cluster 7 corresponds to vesicular cells, which we named after the very high content of vesicular structures seen in cytology. As functional annotation based solely on transcriptomic data can sometimes lead to overinterpretation, we emphasize the importance of biological validation, which we have partially addressed through functional assays in this study.
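      The threshold comparison described in this response can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names and log2FC values are hypothetical, and the Jaccard overlap is only one assumed way of checking how much the enriched gene set changes between the 0.25 and 0.5 cutoffs.

```python
# Hypothetical per-gene log2 fold changes for one cluster (illustrative values only).
log2fc = {
    "geneA": 0.30,
    "geneB": 0.75,
    "geneC": 1.20,
    "geneD": 0.26,
    "geneE": 0.55,
}


def select_enriched(log2fc_by_gene, threshold):
    """Return the set of genes whose log2FC is strictly above the threshold."""
    return {gene for gene, value in log2fc_by_gene.items() if value > threshold}


def jaccard(a, b):
    """Jaccard similarity between two sets (defined as 1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


lenient = select_enriched(log2fc, 0.25)   # threshold used in the manuscript
stringent = select_enriched(log2fc, 0.5)  # more stringent threshold tested
overlap = jaccard(lenient, stringent)
```

      A stringent set is always contained in the lenient one, so the overlap mainly indicates how many weakly enriched genes the lower cutoff lets into the downstream GO term analysis.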

      Regarding the specificity of ribosomal gene expression in cluster 4, we analyzed the distribution of ribosomal genes expressed across all clusters, as shown in Supplementary Figure S1-J. This analysis demonstrates that cluster 4 is specifically enriched in ribosome-related genes, reinforcing its characterization as a transcriptionally active population. Given that ribosomal gene expression is a key feature often associated with proliferative or metabolically active cells, these findings support our initial interpretation that cluster 4 may represent an undifferentiated or progenitor-like population.

      We acknowledge the reviewer’s suggestion to include a DotPlot to further illustrate the specificity of these genes in cluster 4. However, we believe that Supplementary Figure S1-J already effectively demonstrates this enrichment by presenting the percentage of ribosomal genes per cluster. A DotPlot representation would primarily convey the same information in a different format, but without providing additional insight into the specificity of ribosomal gene expression within cluster 4.

      (2) The lineage analysis is highly speculative and based on weak evidence. Rooting the hemocyte lineage at C4 is based on rRNA expression levels. C6 would constitute a better candidate, notably given its expression of PU-1, ELF2 and GATA3, which regulate progenitor differentiation in mammals (doi: 10.3389/fimmu.2019.00228, doi: 10.1128/microbiolspec.mchd-0024-2, doi: 10.1098/rsob.180152), while C4 does not display any specific transcription factors (Figure 7I). In addition, the representation and interpretation of the transcriptome dynamics in the different lineages are erroneous. There are major inconsistencies between the data shown in the heatmaps of Fig 7C-H and Fig S10 and the dotplot in Fig 7I. Gata3 (G31054) and CgTFEB (G30997) illustrate the inconsistency: Fig S10C shows GATA3 going down from cluster 4 to cluster 6, while Fig 7I shows an increased level of expression in cluster 6 compared to cluster 4. CgTFEB (G30997) decreases from C4 to VC in Fig 7F, while it increases according to Fig 7I. Lastly, in Figure 7D the UMAP shows a transition from C4 to C5 while the heatmap mentions C4 to C6 (I believe there is a mix-up with Figure 7E).

      We sincerely apologize for the inconsistencies noted between the different panels of Figure 7. These discrepancies resulted from using an incorrect matrix dataset during the initial representation. To address this issue, we have fully reprocessed the data and now provide a corrected and improved depiction of gene expression dynamics along the pseudotime trajectory. We are grateful to the reviewer for helping us correct these mistakes.

      In the revised version, we offer a comprehensive and consistent representation of expression level variations for key genes identified by the Monocle3 algorithm. Supplementary Figure S10 now presents the average expression variation of these significant genes as a function of pseudotime. Based on this dataset, we carefully selected representative genes to construct panels C to H of Figure 7, ensuring coherence across all figures. These updated panels show both average expression levels and the percentage of expressing cells along the pseudotime trajectory, providing a clearer interpretation of transcriptomic dynamics.

      We appreciate the reviewer’s helpful feedback regarding our lineage analysis and the suggestion that cluster 6 might be a more appropriate progenitor based on the expression of mammalian-like transcription factors such as PU-1, ELF2, and GATA3. Below, we clarify our rationale for choosing cluster 4 as the root of the pseudotime and discuss the functional implications of the identified transcription factors.

      We can hypothesize that clusters 4, 5, or 6 could each potentially represent early progenitor-like states, as these three clusters are transcriptionally close (Lines 539-541). These clusters have not yet been conclusively identified in terms of classical hemocyte morphology, and they appear to arise from ABL- or BBL-type cells. Our decision to root the pseudotime at cluster 4 was motivated by its strong expression of core transcription and translation genes, suggesting a particular stage of translational activity that was not observed for cluster 5 or cluster 6. Clusters 5 and 6 may correspond to a similar population of cells, most probably Blast-Like cells at different stages of cell cycle or differentiation engagement.

      Although cluster 6 expresses PU-1, ELF2, and GATA3, which are known regulators of haematopoietic progenitor differentiation in vertebrates, it is essential to highlight that structural homology does not necessarily imply functional equivalence. Moreover, the expression of PU-1, ELF2, and GATA3 does not strictly characterize a population as “undifferentiated” or progenitor-like. Studies such as those by Buenrostro et al. (Cell, 2018) have demonstrated that these transcription factors can remain active in or reemerge during more lineage-committed stages. For instance, PU-1 is essential for myeloid and B-cell differentiation, GATA3 is involved in T-lymphocyte lineage commitment (though transiently expressed in early progenitors), and ELF2 participates in lineage-specific pathways. Thus, their presence does not imply a primitive state but rather highlights their broader functional roles in guiding and refining lineage decisions. Functional annotation of these transcription factors in invertebrate systems remains speculative, particularly as morphological or molecular markers specific to these early hemocyte lineages are not yet fully established. Further functional assays (e.g., knockdown/overexpression or lineage tracing using cells (ABL and BBL) from clusters 4, 5 and 6) will be necessary to determine which hemocyte population harbors progenitor properties and differentiation potential.

      To further address the reviewer’s concern, we performed complementary pseudotime analyses by initiating Monocle 3 trajectories from clusters 4, 5, and 6 individually, as well as collectively (4/5/6). These analyses (see attached figure) confirm that the overall differentiation topology remains unchanged regardless of the selected root, consistently revealing two main pathways: one leading to hyalinocytes and the other to the granular lineage (ML, SGC, and VC). This consistency strongly suggests that clusters 4, 5, and 6 represent related pools of progenitor-like cells. Therefore, choosing cluster 4 based on its transcription/translation readiness does not alter the inferred branching architecture of hemocyte differentiation.

      We appreciate the reviewer’s suggestions, which have helped us improve our manuscript and clarify our rationale.

      Author response image 1.

      Representation of the trajectories obtained from Monocle3 analysis using different pseudotime origins, showing that changing the rooting did not alter the overall differentiation topology. (A) Pathways identified with cluster 4, (B) cluster 5, (C) cluster 6, and (D) cluster 4/5/6 origins.

      (3) Concerning the AMP expression analysis in Figure 6: the qPCR data show that Cg-BPI and Cg-Defh are expressed broadly in all fractions including 6 and 7, which is in conflict with the statement Line 473 indicating that SGC (fractions 6 and 7) is not expressing AMP. In addition, this analysis should be combined with the expression profile of all AMP in the scRNAseq data (list available in 10.1016/j.fsi.2015.02.040).

      We thank the reviewer for highlighting this point. We acknowledge that the qPCR data show expression of Cg-BPI and Cg-Defh across all fractions, including fractions 6 and 7 corresponding to SGC. However, our conclusion that SGCs do not express antimicrobial peptides (AMPs) was based on a correlation analysis rather than direct detection of AMPs in granular cells. Specifically, the qPCR experiments were designed to measure AMP expression levels in fractionated hemocyte populations relative to a control sample of whole hemolymph. We then performed a correlation analysis between AMP expression levels and the proportion of each hemocyte type in the fractions. This approach allowed us to infer a lower expression of AMP in granular cells, as reflected in the heatmap presented in Figure 6.
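      The correlation step described above can be sketched as follows. The response does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the seven fraction values are hypothetical numbers chosen only to show the calculation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical values for seven density-gradient fractions:
# proportion of small granule cells (SGC) in each fraction, and
# AMP expression in that fraction relative to whole hemolymph.
sgc_proportion = [0.0, 0.05, 0.1, 0.2, 0.4, 0.7, 0.9]
amp_relative_expression = [2.0, 1.8, 1.9, 1.2, 0.9, 0.5, 0.4]

r = pearson(sgc_proportion, amp_relative_expression)
```

      A negative coefficient of this kind indicates that fractions richer in a given cell type show lower AMP expression. It supports an inference of lower expression in that cell type, not a demonstration that the cells express no AMP at all.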

      Regarding the suggestion to integrate AMP expression profiles from scRNA-seq data, we wrote that the limited sequencing depth of our scRNA-seq analysis was insufficient to accurately detect AMP expression (Lines 472-473: “However, due to the limited sequencing depth, the scRNA-seq analysis was not sensitive enough to reveal AMP expression.”). Additionally, many of the known AMPs of Crassostrea gigas are not annotated in the genome, further complicating their identification within the scRNA-seq dataset. As a result, we were unable to perform the requested integration of AMP expression profiles from scRNA-seq data.

      (4) The transcription factor expression analysis is descriptive and the interpretation too partial. These data should be compared with other systems. Most transcription factors show functional conservation, notably in the inflammatory pathways, which can provide valuable information to understand the function of the clusters 5 and 6 for which limited data are available.

      We appreciate the reviewer’s suggestion to compare the identified transcription factors with other systems. However, since we did not perform a detailed phylogenetic analysis of the transcription factors identified in our dataset, we refrain from making assumptions about their functional conservation across species. Our analysis aims to provide a descriptive overview of transcription factor expression patterns in hemocyte clusters, which serves as a foundation for future functional studies. While transcription factor profiles may provide insights into the potential roles of clusters 5 and 6, assigning precise functions based solely on bioinformatic predictions remains speculative. Further experimental validation, including functional assays and evolutionary analyses, would be necessary to confirm the roles of these transcription factors, which is beyond the scope of the present study.

      Minor comments

      Line 212-213: the text should be reformulated. In the result part, it is more important to mention that the reannotation is based on conserved proteins functions than to mention the tool Orson.

      We have reworded this section to emphasize that the updated annotation is function-based, using Orson primarily as the bioinformatics tool for improved GO annotation. We now place the emphasis on the conserved protein functions underlying the reannotation. Lines 212-215 : “Using the Orson pipeline (see Materials and Methods), these files were used to extract and process the longest CDSs for GO-term annotation, and we then reannotated each predicted protein by sequence homology, assigning putative functions and improving downstream GO-term analyses.”

      Figure 2: I would recommend homogenizing the two DotPlot representations with the same color gradient and representing the gene numbers in both cases.

      We appreciate the reviewer’s suggestion to improve the clarity and consistency of Figure 2. In response, we have homogenized the color gradients across the two DotPlot representations and have included gene numbers in both cases to ensure a more uniform and informative visualization.

      Table 2: pct1 and pct2 should be presented individually like in table 1

      We now present these columns separately (pct1, pct2) as in Table 1, so readers can compare the fraction of expressing cells in each cluster more transparently.

      Line 403-414: how many cells were quantified for the phagocytic experiments ?

      We have added the exact number of cells that were counted to determine phagocytic indices and the number of technical/biological replicates. Line 411, the text was modified : “Macrophage-like cells and small granule cells showed a phagocytic activity of 49 % and 55 %, respectively, and a phagocytosis index of 3.5 and 5.2 particles per cell respectively (Fig. 5B and Supp. Fig. 7B), as confirmed in 3 independent experiments examining a total of 2,807 cells.”
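      For readers unfamiliar with the two quantities reported here, a minimal sketch of how they can be derived from per-cell particle counts is given below. The definitions are assumptions, since the response does not spell them out: activity is taken as the percentage of cells with at least one engulfed particle, and the index as the mean number of particles per phagocytic cell. The counts are hypothetical.

```python
def phagocytosis_summary(particles_per_cell):
    """Return (activity in percent, index in particles per phagocytic cell).

    Assumed definitions: activity is the share of cells containing at least
    one particle; the index averages particle counts over those cells only.
    """
    if not particles_per_cell:
        raise ValueError("at least one cell is required")
    phagocytic = [n for n in particles_per_cell if n > 0]
    activity_pct = 100.0 * len(phagocytic) / len(particles_per_cell)
    index = sum(phagocytic) / len(phagocytic) if phagocytic else 0.0
    return activity_pct, index


# Hypothetical particle counts for six scored cells.
counts = [0, 0, 3, 5, 0, 2]
activity_pct, index = phagocytosis_summary(counts)
```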

      Line 458: for copper staining, how many cells and how many replicates were done for the quantification ?

      We have specified the number of hemocytes and number of independent replicates used when quantifying rhodanine-stained (copper-accumulating) cells. Line 458 the following text was added : “and a total of 1,562 cells were examined across three independent experiments.”

      Line 461: what are the authors referring to when mentioning the link between copper homeostasis and scRNAseq?

      Single-cell RNA sequencing (scRNA-seq) analysis revealed an upregulation of several copper transport-related genes, including G4790 (a copper transporter) with a 2.7 log2FC and a pct ratio of 42, as well as the divalent cation transporters G5864 (zinc transporter ZIP10) and G4920 (zinc transporter 8), specifically in cluster 3 cells identified as small granule cells. These findings reinforce a potential role for this cluster in metal homeostasis.

      We modified lines 462-467 as: “These results provide functional evidence that small granule cells (SGCs) are specialized in metal homeostasis in addition to phagocytosis, as suggested by the scRNA-seq data identifying cluster 3. Specifically, single-cell RNA sequencing revealed an upregulation of copper transport-related genes, including G4790 (a copper transporter) with a 2.7 log2FC and a pct ratio of 42, reinforcing the role of SGCs in copper homeostasis (see Supp. File S1).”

      Line 611: it would be nice to display the enrichment of the phagocytic receptor in cluster 3 (dotplot or feature plot) to illustrate the comment.

      We appreciate the reviewer’s insightful suggestion regarding a more comprehensive analysis of phagocytic receptors. While a full inventory is beyond the scope of this study, we acknowledge the value of such an approach and hope that our findings will serve as a foundation for future investigations in this direction.

      Although we have highlighted certain phagocytic receptors (e.g., a scavenger receptor domain-containing gene) in our scRNA-seq dataset, it is beyond the scope of the current study to inventory all phagocytosis-related receptors in the C. gigas genome, which itself would be a substantial undertaking. Moreover, single-cell RNA sequencing captures only about 15–20% of each cell’s mRNA, so we inherently lose a significant portion of the transcriptome, further limiting our ability to pinpoint all relevant phagocytic receptor genes. Adding more figures to cover every candidate receptor would risk overloading this paper, thus we focus on the most prominent examples. A promising approach for more exhaustive analysis would involve efficiently isolating granulocytes (e.g., via Percoll gradient) and performing targeted RNA-seq on this cell population to thoroughly explore genes involved in phagocytosis.

      Line 640-644: the authors mentioned that ML may be able to perform ETosis based on the oxidative burst. This hypothesis requires further evidence. Are other markers of ETosis expressed in this cell type?

      We agree that additional experimental evidence (e.g., detection of histone citrullination, extracellular DNA networks) is necessary to confirm ETosis in molluscan immune cells. We present ML-mediated ETosis only as a speculative possibility based on oxidative burst capacity, as previous work has shown that ETosis is inhibited by NADPH oxidase inhibitors (Poirier et al. 2014). Nevertheless, the expression of histones in the macrophage-like cluster (cluster 1) reinforces this possibility, as histone modifications play a key role in chromatin decondensation during ETosis.

      Reviewer #2 (Recommendations for the authors):

      Figure 1: In Figure 1B, the cell clusters are named 1 to 7, whereas in Figure 1C they are displayed as clusters 0 to 6. There is a mismatch between the identification of the clusters.

      We thank the reviewer for identifying this inconsistency. The cluster numbering has been corrected to ensure consistency between Figures 1B and 1C.

      Figure 2B: the font size could be increased for greater clarity.

      We thank the reviewer for this suggestion. The font size in Figure 2B has been increased to improve clarity and readability.

      Line 221: "Figures 2B, C and D" appears to refer to Figure S2 rather than the main Figure 2.

      The text has been corrected to properly reference the figure.

      Line 754: "Anopheles gambiae" should be italicised

      We thank the reviewer for pointing this out. "Anopheles gambiae" has been italicized accordingly.

      Bibliography

      Buenrostro, Jason D. et al. Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation. Cell, Volume 173, Issue 6, 1535-1548.e16.

      Poirier, Aurore C. et al. Antimicrobial Histones and DNA Traps in Invertebrate Immunity. Journal of Biological Chemistry, Volume 289, Issue 36, 24821-24831.

    1. eLife Assessment

      This important study shows how the relative importance of inter-species interactions in microbiomes can be inferred from empirical species abundance data. The methods based on statistical physics of disordered systems are convincing and rigorous, and allow for distinguishing healthy and non-healthy human gut microbiomes via differences in their inter-species interaction patterns. This work should be of broad interest to researchers in microbial ecology and theoretical biophysics.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors develop a novel method to infer ecologically-informative parameters across healthy and diseased states of the gut microbiota, although the method is generalizable to other datasets for species abundances. The authors leverage techniques from theoretical physics of disordered systems to infer different parameters (mean and standard deviation for the strength of bacterial interspecies interactions, a bacterial immigration rate, and the strength of demographic noise) that describe the statistics of microbiota samples from two groups: one for healthy subjects and another one for subjects with chronic inflammation syndromes. To do this, the authors simulate communities with a modified version of the Generalized Lotka-Volterra model and randomly-generated interactions, and then use a moment-matching algorithm to find sets of parameters that better reproduce the data for species abundances. They find that these parameters are different for the healthy and diseased microbiota groups. The results suggest, for example, that bacterial interaction strengths, relative to noise and immigration, are more dominant for microbiota dynamics in diseased states than in healthy states.

      We think that this manuscript brings an important contribution that will be of interest in the areas of statistical physics, (microbiota) ecology, and (biological) data science. The evidence of their results is solid and the work improves the state-of-the-art in terms of methods. There are a few weaknesses that, in our opinion, the authors could address to further improve the work.

      Strengths:

      (1) Using a fairly generic ecological model, the method can identify the change in the relative importance of different ecological forces (distribution of interspecies interactions, demographic noise, and immigration) in different sample groups. The authors focus on the case of the human gut microbiota, showing that the data are consistent with a higher influence of species interactions (relative to demographic noise and immigration) in a disease microbiota state than in healthy ones.

      (2) The method is novel, original, and it improves the state-of-the-art methodology for the inference of ecologically relevant parameters. The analysis provides solid evidence for the conclusions.

      Weaknesses:

      In the way it is written, this work might be mostly read by physicists. We believe that, with some rewriting, the authors could better highlight the ecological implications of the results and make the method more accessible to a broader audience.

    3. Reviewer #2 (Public review):

      Summary:

      This valuable work aims to infer, from microbiome data, microbial species interaction patterns associated with healthy and unhealthy human gut microbiomes. Using solid techniques from statistical physics, the authors propose that healthy and unhealthy microbiome interaction patterns substantially differ. Unhealthy microbiomes are closer to instability and single-strain dominance; whereas healthy microbiomes showcase near-neutral dynamics, mostly driven by demographic noise and immigration.

      Strengths:

      A well-written article, relatively easy to follow and transparent despite the high degree of technicality of the underlying theory. The authors provide a powerful inferring procedure, which bypasses the issue of having only compositional data.

      Weaknesses:

      (1) This sentence in the introduction seems key to me: "Focusing on single species properties as species abundance distribution (SAD), fail to characterise altered states of microbiome." Yet it is not explained what is meant by 'fail', and thus what the proposed approach 'solves'.

      (2) Lack of validation, following arbitrary modelling choices made (symmetry of interactions, weak-interaction limit, uniform carrying capacity).

      Inconsistent interpretation of instability. Here, instability is associated with the transition to the marginal phase, which becomes chaotic when interaction symmetry is broken. But as the authors acknowledge, the weak interaction limit does not reproduce fat-tailed abundance distributions found in data. On the other hand, strong interaction regimes, where chaos prevails, tend to do so (Mallmin et al, PNAS 2024). Thus, the nature of the instability that unhealthy microbiomes approach is unclear.

      (3) Three technical points about the methodology and interpretation.

      a) How can the order parameters h and q0 be inferred, if in the compositional data they are fixed by definition?

      b) How is it possible that weaker interaction variance is associated with approach to instability, when the opposite is usually true?

      c) Having an idea of how the empirical data compare to the theoretical fits would be valuable.

      Implications:

      As the authors say, this is a proof of concept. They point at limits and ways to go forward, in particular pointing at ways in which species abundance distributions could be better reproduced by the predicted dynamical models. One implication that is missing, in my opinion, is the interpretability of the results, and what this work achieves that was missing from other approaches (see weaknesses section above): what do we learn from the fact that changes in microbial interactions characterise healthy from unhealthy microbiota? For instance, what does this mean for medical research?

    4. Reviewer #3 (Public review):

      Summary:

      I found the manuscript to be well-written. I have a few questions regarding the model, though the bulk of my comments are requests to provide definitions and additional clarity. There are concepts and approaches used in this manuscript that are clear boons for understanding the ecology of microbiomes but are rarely considered by researchers approaching the manuscript from a traditional biology background. The authors have clearly considered this in their writing of S1 and S2, so addressing these comments should be straightforward. The methods section is particularly informative and well-written, with sufficient explanations of each step of the derivation that should be informative to researchers in the microbial life sciences who are not well-versed with physics-inspired approaches to ecology dynamics.

      Strengths:

      The modeling efforts of this study primarily rely on a disordered form of the generalized Lotka-Volterra (gLV) model. This model can be appropriate for investigating certain systems, and the authors are clear about when and how more mechanistic models (i.e., consumer-resource) can lead to gLV. Phenomenological models such as this have been found to be highly useful for investigating the ecology of microbiomes, so this modeling choice seems justified, and the limitations are laid out.
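      To make the model class discussed here concrete, the following is a minimal sketch of a disordered generalized Lotka-Volterra simulation with symmetric random interactions, immigration, and demographic noise. The exact equation, the scaling of the interaction statistics with the number of species, the noise amplitude, and all parameter values are assumptions made for illustration; they are not taken from the manuscript.

```python
import math
import random


def simulate_glv(n_species=20, mu=0.5, sigma=0.3, temperature=0.01,
                 immigration=1e-3, dt=0.01, steps=2000, seed=0):
    """Euler-Maruyama integration of an assumed disordered gLV model:

        dN_i = [N_i (1 - N_i - sum_j alpha_ij N_j) + immigration] dt
               + sqrt(2 * temperature * N_i) dW_i

    alpha is symmetric with Gaussian entries of mean mu / S and standard
    deviation sigma / sqrt(S), where S is the number of species.
    """
    rng = random.Random(seed)
    alpha = [[0.0] * n_species for _ in range(n_species)]
    for i in range(n_species):
        for j in range(i + 1, n_species):
            value = rng.gauss(mu / n_species, sigma / math.sqrt(n_species))
            alpha[i][j] = value
            alpha[j][i] = value  # symmetric interactions

    abundances = [rng.uniform(0.1, 1.0) for _ in range(n_species)]
    for _ in range(steps):
        updated = []
        for i in range(n_species):
            interaction = sum(alpha[i][j] * abundances[j] for j in range(n_species))
            drift = abundances[i] * (1.0 - abundances[i] - interaction) + immigration
            noise = math.sqrt(2.0 * temperature * abundances[i] * dt) * rng.gauss(0.0, 1.0)
            # Abundances cannot be negative, so clip at zero.
            updated.append(max(abundances[i] + drift * dt + noise, 0.0))
        abundances = updated
    return abundances


final = simulate_glv()
total = sum(final)
relative = [n / total for n in final]  # compositional view of the community
```

      Converting the final state to relative abundances mirrors the compositional nature of sequencing data, which is why quantities such as the mean abundance become tied to the number of species.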

      Weaknesses:

      The authors use metagenomic data of diseased and healthy patients that were first processed in Pasqualini et al. (2024). The use of metagenomic data leads me to a question regarding the role of sampling effort (i.e., read counts) in shaping model parameters such as $h$. This parameter is equal to the average of 1/# species across samples because the data are compositional in nature. My understanding is that $h$ was calculated using total abundances (i.e., read counts). The number of observed species is strongly influenced by sampling effort, so it would be useful if the number of reads were plotted against the number of species for healthy and diseased subjects.
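      The quantity discussed in this comment can be sketched as follows, assuming, as stated above, that h is the average across samples of one over the number of observed species. The species counts are hypothetical, and pairing each sample's read total with its richness is only meant to show the data needed for the requested reads-versus-species plot.

```python
# Hypothetical read counts per species for three samples (illustrative only).
samples = [
    {"sp1": 5000, "sp2": 3000, "sp3": 2000, "sp4": 0},
    {"sp1": 800, "sp2": 0, "sp3": 200, "sp4": 0},
    {"sp1": 40000, "sp2": 30000, "sp3": 20000, "sp4": 10000},
]


def richness(counts):
    """Number of species observed (non-zero read count) in one sample."""
    return sum(1 for c in counts.values() if c > 0)


def estimate_h(sample_counts):
    """Average of 1 / richness across samples (assumed definition of h)."""
    return sum(1.0 / richness(counts) for counts in sample_counts) / len(sample_counts)


h = estimate_h(samples)

# (total reads, observed species) pairs, i.e. the points of an effort-vs-richness plot.
effort_vs_richness = [(sum(counts.values()), richness(counts)) for counts in samples]
```

      In this toy example the shallowest sample also has the fewest observed species, which is the kind of dependence on sampling effort that such a plot would reveal.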

      However, the role of sampling effort can depend on the type of data, and my instinct about the role that sampling effort plays in species detection is primarily based on 16S data. The dependency between these two variables may be less severe for the authors' metagenomic pipeline. This potential discrepancy raises a broader issue regarding the investigation of microbial macroecological patterns and the inference of ecological parameters. Often microbial macroecology researchers rely on 16S rRNA amplicon data because that type of data is abundant and comparatively low-cost. Some in microbiology and bioinformatics are increasingly pushing researchers to choose metagenomics over 16S. Sometimes this choice is valid (discovery of new MAGs, investigation of allele frequency changes within species, etc.), sometimes it is driven by the false equivalence "more data = better". The outcome, though, is that we have a body of more-or-less established microbial macroecological patterns which rest on 16S data and are now slowly incorporating results from metagenomics. To my knowledge, there has not been a systematic evaluation of the macroecological patterns that do and do not vary by one's choice in 16S vs. metagenomics. Several of the authors in this manuscript have previously compared the MAD shape for 16S and metagenomic datasets in Pasqualini et al. (2024), but moving forward, a more comprehensive study seems necessary.

      References

      Pasqualini, Jacopo, et al. "Emergent ecological patterns and modelling of gut microbiomes in health and in disease." PLOS Computational Biology 20.9 (2024): e1012482.

    5. Author response:

      Reviewer #1:

      Strengths:

      (1) Using a fairly generic ecological model, the method can identify the change in the relative importance of different ecological forces (distribution of interspecies interactions, demographic noise, and immigration) in different sample groups. The authors focus on the case of the human gut microbiota, showing that the data are consistent with a higher influence of species interactions (relative to demographic noise and immigration) in a disease microbiota state than in healthy ones.

      (2) The method is novel, original, and it improves the state-of-the-art methodology for the inference of ecologically relevant parameters. The analysis provides solid evidence for the conclusions.

      Weaknesses:

      In the way it is written, this work might be mostly read by physicists. We believe that, with some rewriting, the authors could better highlight the ecological implications of the results and make the method more accessible to a broader audience.

      We thank the reviewer for their positive and constructive feedback. We particularly appreciate the recognition of the novelty and robustness of our method, as well as the insight that it sheds light on the shifting ecological forces between healthy and diseased microbiomes. In response to the concern about the manuscript’s accessibility, we aim to revise key sections – including the Introduction, Results, and Discussion – to more clearly articulate the ecological relevance of our theoretical findings. We would like to emphasize that our approach offers a novel perspective for analyzing individual species' abundances, as well as for understanding interaction patterns and stability at the community level. By placing our results within a broader context accessible to readers from diverse backgrounds, we aim for the revised version to appeal to a wider audience, including ecologists and microbiome scientists, while preserving the rigor of our underlying statistical physics framework.

      Reviewer #2:

      Strengths:

      A well-written article, relatively easy to follow and transparent despite the high degree of technicality of the underlying theory. The authors provide a powerful inferring procedure, which bypasses the issue of having only compositional data. 

      Weaknesses:

      (1) This sentence in the introduction seems key to me: "Focusing on single species properties as species abundance distribution (SAD), it fails to characterise altered states of microbiome." Yet it is not explained what is meant by 'fail', and thus what the proposed approach 'solves'.

      (2) Lack of validation, following arbitrary modelling choices made (symmetry of interactions, weak-interaction limit, uniform carrying capacity). Inconsistent interpretation of instability. Here, instability is associated with the transition to the marginal phase, which becomes chaotic when interaction symmetry is broken. But as the authors acknowledge, the weak interaction limit does not reproduce fat-tailed abundance distributions found in data. On the other hand, strong interaction regimes, where chaos prevails, tend to do so (Mallmin et al, PNAS 2024). Thus, the nature of the instability towards which unhealthy microbiomes approach is unclear.

      (3) Three technical points about the methodology and interpretation. a) How can order parameters h and q0 can be inferred, if in the compositional data they are fixed by definition? b) How is it possible that weaker interaction variance is associated with an approach to instability, when the opposite is usually true? c) Having an idea of what the empirical data compares to the theoretical fits would be valuable.

      Implications: As the authors say, this is a proof of concept. They point at limits and ways to go forward, in particular pointing at ways in which species abundance distributions could be better reproduced by the predicted dynamical models. One implication that is missing, in my opinion, is the interpretability of the results, and what this work achieves that was missing from other approaches (see weaknesses section above): what do we learn from the fact that changes in microbial interactions characterise healthy from unhealthy microbiota? For instance, what does this mean for medical research?

      We greatly appreciate the reviewer’s thoughtful analysis highlighting both the strengths and areas of ambiguity in our work.

      (1) In the revised version, we will clarify the sentence on the limitations of species abundance distributions (SADs): while SADs summarize the relative abundance of individual species, they fail to capture the species-species correlations, which we have shown (Seppi et al., Biomolecules 2023) to be more sensitive to the health state of the host. Our method therefore focuses on the interaction statistics among species, providing insight into the underlying dynamics and stability of microbiomes and how they differ between healthy and unhealthy hosts.

      (2) Regarding model assumptions, we acknowledge that the weak interaction regime and symmetry hypotheses simplify the analysis and may not capture all empirical richness, such as fat-tailed distributions of species abundance. However, we interpret instability not as a path to chaos per se, but as a transition toward a multi-attractor phase, where each microbiome reaches a different fixed point. This is consistent with prior empirical findings invoking the “Anna Karenina principle”, where healthy microbiomes resemble one another, but disease states tend to deviate from this picture (see Pasqualini et al., PLOS Comp. Bio. 2024). We consider our framework a starting point and agree that further extensions incorporating strong interaction regimes (as suggested by Mallmin et al., PNAS 2024) or relaxing other model assumptions could reveal even richer dynamical patterns. The computational pipeline we present can, in fact, be easily generalized to include different population dynamics models.

      On the technical questions: (a) While compositional data constrain relative abundances, we can still estimate diversity-dependent parameters (h and q0) using alpha-diversity statistics across samples, which show meaningful variation; (b) The counter-intuitive instability that the reviewer pointed out arises from the interplay between demographic stochasticity and quenched disorder. It is the combined contribution of these two factors in phase space – not either one alone – that drives the transition. For clarity, see Figure 1 in Altieri et al., Phys. Rev. Lett. 2021; (c) We plan to include plots that compare empirical data to theoretical model fits. This will help visualize how well the model captures observed microbial community properties and how, even with larger species interaction heterogeneity (σ) and larger demographic noise (𝑇), healthy communities are more stable (i.e., distant from the critical line), as measured by the replicon eigenvalue. Finally, regarding interpretability and implications: by showing that ecological interaction networks – not just species identities – differ between healthy and unhealthy states, our work suggests a conceptual shift. This could inform medical strategies aimed at restoring community-level stability rather than targeting individual microbes. In the revised Discussion section, we will elaborate on this point to better highlight its practical implications and outline potential directions for future research.

      Reviewer #3:

      Strengths:

      The modeling efforts of this study primarily rely on a disordered form of the generalized Lotka-Volterra (gLV) model. This model can be appropriate for investigating certain systems, and the authors are clear about when and how more mechanistic models (i.e., consumer-resource) can lead to gLV. Phenomenological models such as this have been found to be highly useful for investigating the ecology of microbiomes, so this modeling choice seems justified, and the limitations are laid out. 

      Weaknesses:

      The authors use metagenomic data of diseased and healthy patients that were first processed in Pasqualini et al. (2024). The use of metagenomic data leads me to a question regarding the role of sampling effort (i.e., read counts) in shaping model parameters such as h. This parameter is equal to the average of 1/# species across samples because the data are compositional in nature. My understanding is that it was calculated using total abundances (i.e., read counts). The number of observed species is strongly influenced by sampling effort, so it would be useful if the number of reads were plotted against the number of species for healthy and diseased subjects. However, the role of sampling effort can depend on the type of data, and my instinct about the role that sampling effort plays in species detection is primarily based on 16S data. The dependency between these two variables may be less severe for the authors' metagenomic pipeline. This potential discrepancy raises a broader issue regarding the investigation of microbial macroecological patterns and the inference of ecological parameters. Often microbial macroecology researchers rely on 16S rRNA amplicon data because that type of data is abundant and comparatively low-cost. Some in microbiology and bioinformatics are increasingly pushing researchers to choose metagenomics over 16S. Sometimes this choice is valid (discovery of new MAGs, investigate allele frequency changes within species, etc.), sometimes it is driven by the false equivalence "more data = better". The outcome, though, is that we have a body of more-or-less established microbial macroecological patterns which rest on 16S data and are now slowly incorporating results from metagenomics. To my knowledge, there has not been a systematic evaluation of the macroecological patterns that do and do not vary by one's choice in 16S vs. metagenomics. 
      Several of the authors in this manuscript have previously compared the MAD shape for 16S and metagenomic datasets in Pasqualini et al., but moving forward, a more comprehensive study seems necessary.

      We thank the reviewer for this insightful and nuanced comment, which particularly highlights the broader methodological context of our data sources. Indeed, metagenomic sequencing introduces different biases with respect to 16S data. First, we would like to emphasize that we estimated the order parameters from the data by using relative abundances. Second, while the concern regarding the influence of sequencing depth and species diversity on the estimation of the order parameters is valid, we refer to a previous publication by some of the authors (Pasqualini et al., 2024; see Figure 4, panels g and h). There, we pointed out that the observed outcome is only weakly influenced by sequencing depth in our dataset, while the main impact on the order parameter estimates comes from the species diversity of the two groups. In the same publication, we showed that other well-known patterns (species abundance distribution, mean abundance distribution) are also observed. Finally, to mitigate the effect of the number of samples and sequencing depth, we estimated the order parameters by a bootstrap procedure (90% of samples for healthy and diseased groups, 5000 resamples), which resulted in the error bars in Figure 2.
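
      The resampling step described above can be illustrated with a minimal sketch. It is not the authors' pipeline, and it rests on several assumptions: the input is a samples-by-species matrix of relative abundances; h is taken as the mean relative abundance over observed species (which equals 1/#species when a sample sums to one, as Reviewer #3 notes) and q0 as the mean squared relative abundance; each resample draws 90% of the samples without replacement. The function names `order_parameters` and `bootstrap_order_parameters` are hypothetical.

```python
import numpy as np


def order_parameters(rel_abund):
    """Return (h, q0) for a samples-by-species matrix of relative abundances.

    Assumed definitions (not confirmed by the source):
      h  = average over samples of the mean abundance of observed species
      q0 = average over samples of the mean squared abundance of observed species
    Every sample is assumed to contain at least one observed species.
    """
    richness = (rel_abund > 0).sum(axis=1)           # observed species per sample
    h = np.mean(rel_abund.sum(axis=1) / richness)    # equals mean(1/S) if rows sum to 1
    q0 = np.mean((rel_abund ** 2).sum(axis=1) / richness)
    return h, q0


def bootstrap_order_parameters(rel_abund, frac=0.9, n_resamples=5000, seed=0):
    """Resample a fraction of the samples and return the mean and spread of (h, q0)."""
    rng = np.random.default_rng(seed)
    n_samples = rel_abund.shape[0]
    k = max(1, int(round(frac * n_samples)))         # samples kept per resample
    estimates = np.empty((n_resamples, 2))
    for i in range(n_resamples):
        # Assumption: subsample without replacement within one group
        idx = rng.choice(n_samples, size=k, replace=False)
        estimates[i] = order_parameters(rel_abund[idx])
    # The standard deviation across resamples plays the role of an error bar
    return estimates.mean(axis=0), estimates.std(axis=0)
```

      In such a sketch the function would be called once per group (healthy, diseased), and the spread across resamples would correspond to error bars of the kind mentioned for Figure 2.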

      We also fully agree with the broader call for a systematic comparison of macroecological patterns derived from 16S and metagenomic data. While some of us have already begun exploring this direction (e.g., Pasqualini et al., 2024), the reviewer’s comment highlights its significance and motivates us to pursue a more comprehensive, integrative analysis across data types. While we found qualitative agreement of these patterns with previous publications (e.g., Grilli, Nature Comm. 2020), we will acknowledge this as an important future direction in the Discussion section.

      References

      (1) Seppi, M., Pasqualini, J., Facchin, S., Savarino, E.V. and Suweis, S., 2023. Emergent functional organization of gut microbiomes in health and diseases. Biomolecules, 14(1), p.5.

      (2) Pasqualini, J., Facchin, S., Rinaldo, A., Maritan, A., Savarino, E. and Suweis, S., 2024. Emergent ecological patterns and modelling of gut microbiomes in health and in disease. PLOS Computational Biology, 20(9), p.e1012482.

      (3) Mallmin, E., Traulsen, A. and De Monte, S., 2024. Chaotic turnover of rare and abundant species in a strongly interacting model community. Proceedings of the National Academy of Sciences, 121(11), p.e2312822121.

      (4) Altieri, A., Roy, F., Cammarota, C. and Biroli, G., 2021. Properties of equilibria and glassy phases of the random Lotka-Volterra model with demographic noise. Physical Review Letters, 126(25), p.258301.

      (5) Grilli, J., 2020. Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1), p.4743.

    1. eLife Assessment

      This study makes an important contribution to the molecular mechanisms of neural circuit formation. The data convincingly show that the transcription factor Sp1 regulates ephrin-mediated axon guidance in the spinal cord. Although the authors show that Sp1 and its co-activators p300 and CBP are required to induce ephrin expression, additional discussion and/or experiments are needed to support the claims that Sp1 regulates cis-binding of Epha receptors, or that Sp1 controls ephrin expression in relevant motor neuron populations. The study will be of broad interest to developmental neurobiologists.

    2. Reviewer #1 (Public review):

      The manuscript by Liao et al investigates the mechanisms that induce ephrin expression in spinal cord lateral motor column (LMC) neurons to facilitate axon guidance into the dorsal and ventral limb. The authors show that Sp1 and its co-activators p300 and CBP are required to induce ephrin expression to modulate the responsiveness of motor neurons to external ephrin cues. The study is well done and convincingly demonstrates the role of Sp1 in motor neuron axon guidance.

      Further discussion and clarification of some results would further improve the study.

      (1) The mechanism that the authors propose (Figure 7) and is also supported by their data is that Sp1 induces ephrinA5 in LMCm and ephrinB2 in LMCl to attenuate inappropriate responses to external ephrins in the limb. Therefore, deletion of Sp1 should result in mistargeting of LMCl and LMCm axons, as shown in the mouse data, but no overt changes in the number of axons in the ventral and dorsal limb. From the mouse backfills, it seems that an equal number of LMCm/LMCl project into the wrong side of the limb. However, the chick data show an increase of axons projecting into the ventral limb in the Sp1 knockout. Is this also true in the mouse? The authors state that medial and lateral LMC neurons differ in their reliance on Sp1 function but that is not supported by the mouse backfill data (27% vs 32% motor neurons mistargeted). Also, the model presented in Figure 7 does not explain how Sp1 overexpression leads to axon guidance defects.

      (2) The authors do not directly show changes in ephrin expression in motor neurons, either in chick or mouse, after Sp1 knockout, which is the basis of their model. The experiment in Figure 4G seems to be Sp1 overexpression rather than knockdown (as mentioned in the results) and NSC-34 cells may not be relevant to motor neurons in vivo. NSC-34 experiments are also not described in the methods.

      (3) There is no information about how the RNA-sequencing experiment was done (which neurons were isolated, how, at what age, how many replicates, etc) so it is hard to interpret the resulting data.

      (4) It is unclear why the authors chose to use a Syn1-cre driver rather than a motor neuron restricted cre driver. Since this is a broad neuronal cre driver, the behavioral defects shown in Figure 7 may not be solely due to Sp1 deletion in motor neurons. Are there other relevant neuronal populations that express Sp1 that are targeted by this cre-mediated deletion?

    3. Reviewer #2 (Public review):

      Summary:

      This study shows that transcription factor Sp1 is required for correct ventral vs. dorsal targeting of limb-innervating LMC motor neurons using mouse and chick as model systems. In a wild-type embryo, lateral LMC axons specifically target dorsal muscles while medial LMC axons target ventral muscles. The authors convincingly show that this specificity is lost when Sp1 is knocked down or knocked out - axons of both lateral and medial LMC motor neurons project to both dorsal and ventral muscles in mutant conditions. The authors then conduct RNA-seq and ChIP experiments to show that Sp1 loss of function disrupts Ephrin-Epha receptor signaling pathway genes. These molecules are known to provide attractive or repulsive cues to guide LMC axons to their targets. The authors show that attraction/repulsion properties of medial and lateral LMC axons to specific Ephrin/Epha molecules are in fact disrupted in Sp1 mutants using ex vivo explant studies. Finally, the authors show that behaviors like coordinated movement and grip strength are also affected in Sp1 mutant mice. This study convincingly shows that Sp1 is important for correct circuit wiring of LMC neurons, and moves the field forward by elucidating a new level of transcriptional regulation required in this process. However, the authors' claim that the mode of Sp1-mediated regulation is through cis-attenuation of Epha activity is not well supported. These and additional strengths and weaknesses in approach and in data interpretation are discussed below.

      Strengths:

      (1) The study convincingly shows that wildtype levels of Sp1 are necessary for LMC axon targeting specificity. The combination of the following approaches is a strength:

      a) Both loss of function and gain of function experiments are performed for Sp1 and show complementary effects on the axon targeting phenotype.

      b) Retrograde labeling of LMC neurons from dorsal and ventral muscles shows that Sp1 mutants clearly lose the specificity of LMC axon targeting.

      c) The authors also use explant experiments to show that both loss of Sp1 and gain of Sp1 show clear changes in attraction and repulsion to specific ephrin and epha receptor molecules.

      d) The Sp1 loss and gain of function experiments are well controlled to show that the changes in axon wiring observed are not due to cell death, cell fate switches, or due to unequal numbers of medial and lateral LMC neurons being labeled in the experiments.

      (2) It is also convincing that Sp1 requires cofactors p300 and CBP for its function. In the absence of these cofactors, the gain of function phenotypes of Sp1 are subdued.

      Weaknesses:

      (1) The robustness of RNAseq and ChIP experiments is difficult to judge as methods are not described. For example, it is unclear if RNAseq is performed on purified motor neurons or on whole spinal cords. This is an important consideration as Sp1 is a broadly expressed protein.

      (2) The authors state that expression of Ephrin A5 and Ephrin B2 is reduced based on RNAseq data, however, it is not shown that this reduction occurs specifically in LMC neurons.

      (3) The authors show Sp1 ChIP peaks at Ephrin B2 promoter, but nothing is mentioned about peaks at Ephrin A5 or other types of signaling molecules like Sema7a, which are also differentially expressed in Sp1 mutants. There is also no mention of the correlation between changes in gene expression seen in RNAseq data and the binding profile of Sp1 seen in ChIP data, which could help establish the robustness of these datasets.

      (4) The authors conclude that Sp1 functions by activating Ephrin A5 in medial LMC and Ephrin B2 in lateral LMC. The argument, as I understand it, is that this activation leads to cis attenuation of their respective Epha receptors and therefore targeting the correct muscle. Though none of the data presented go against this hypothesis, this hypothesis is also not fully supported. Specifically:

      a) It would be important to know that modulation of Sp1 expression leads to changes in EphrinA5 and B2 in LMC lateral/medial neurons.

      b) It would also be important to show that none of the other changes caused by Sp1 are responsible for axon mistargeting by performing rescue experiments with Ephrin A5 and Ephrin B2.

      c) To make the most convincing case, experiments showing increased or decreased cis-binding of Ephrin molecules with Epha receptors would be necessary. This study would still be compelling without this last experiment, but the language in the abstract would need to be modulated.

      (5) All behavior experiments are done in a pan-neuronal knockout of Sp1. As Sp1 is broadly expressed in neurons, a statement describing whether and why the authors think the phenotypes arise from Sp1's function in LMC motor neurons would be helpful. Experimentally, rescue experiments in which Sp1 is restored in LMC neurons or motor neurons would also make this claim more convincing.

    4. Reviewer #3 (Public review):

      Summary:

      This is a compelling study on the role of Sp1 in motor axon trajectory selection, demonstrating that Sp1 is both necessary and sufficient for correct axon guidance in the limb. Sp1 regulates ephrin ligand expression to fine-tune Eph/ephrin signaling in the lateral motor column (LMC) neurons.

      Strengths:

      The study integrates multiple approaches, including in ovo electroporation in chick embryos, conditional knockout mouse models, transcriptomic analyses, and functional assays such as stripe assays and behavioral testing, to provide robust evidence for Sp1's role in axon guidance mechanisms. The manuscript is well-written and scientifically rigorous, and the findings are of broad interest to the developmental neuroscience community.

      Weaknesses:

      Some aspects of the manuscript could be improved to enhance clarity, ensure logical flow, and strengthen the impact of the findings.

    5. Author response:

      Reviewer 1:

      (1) Clarification of axon mistargeting patterns and model interpretation

      We will clarify the apparent discrepancy between chick and mouse axon mistargeting data. Specifically, we will expand the explanation in the main text and Figure 7 legend and/or revise the model in Figure 7 to better reflect observed phenotypes and clarify how Sp1 overexpression contributes to mistargeting.

      (2) Evidence for Sp1-dependent ephrin expression

      We agree that demonstrating ephrin expression changes in motor neurons is essential. We will:

      • Conduct in situ hybridization and/or immunostaining for ephrins in control and Sp1 mutant spinal cords from both chick and mouse embryos.

      • Clarify and expand the methodological details of the NSC-34 cell experiments shown in Figure 4G.

      (3) RNA-seq experiment details

      We will revise the Methods section to provide additional experimental details.

      (4) Use of Syn1-cre

      We acknowledge concerns about the broad expression of Syn1-cre. To address this:

      We will clarify our rationale for using Syn1-cre and describe its expression pattern in the spinal cord.

      We are evaluating the feasibility of additional experiments using a motor neuron-specific Cre driver to confirm cell-type specificity.

      We will include a new paragraph in the Discussion addressing potential contributions from other neuronal populations.

      Reviewer 2:

      (1) & (2) Clarification and localization of RNA-seq data

      We will expand the Methods section to provide greater detail on the RNA-seq approach. In addition, we will validate ephrin downregulation in LMC neurons using in situ hybridization and/or immunostaining.

      (3) Integration of ChIP and RNA-seq data

      We will:

      Report additional ChIP peaks for ephrinA5 and other differentially expressed genes such as Sema7a.

      Add a summary figure that integrates ChIP and RNA-seq results to strengthen the link between Sp1 binding and transcriptional regulation.

      (4) Clarification of the cis-attenuation model

      We recognize that our data do not yet directly demonstrate Sp1’s role in cis-attenuation. To address this:

      We will revise the abstract and main text to frame Sp1's role in cis-attenuation as a hypothesis.

      We are exploring the feasibility of ephrinA5 and B2 rescue experiments in Sp1-deficient embryos to test specificity.

      (5) Behavioral phenotypes and cell-type specificity

      We will clarify that behavioral phenotypes may result from combined effects across neuron populations due to Syn1-cre expression. To address this:

      We are planning rescue experiments with Sp1 expression in chick embryos to test for rescue of axon misrouting.

      We will include a new paragraph in the Discussion to highlight this limitation and discuss alternative interpretations.

      Reviewer 3:

      We appreciate your positive evaluation and support for the rigor of our study.

      In response to your suggestions:

      We are revising the manuscript to improve clarity and flow, particularly the transitions between datasets.

      We will update Figure 7 and the associated text to more clearly convey the working model and avoid overinterpretation.

      We thank all reviewers for their constructive feedback and are committed to addressing each point thoroughly. All revisions will be clearly marked in the resubmitted manuscript.

    1. eLife Assessment

      This study offers valuable insights into the role of miR-283 in ventral-lateral neurons (LNvs) and its impact on senescence, cardiac function, and aging in the Drosophila melanogaster model. However, the evidence supporting some of the conclusions remains incomplete, and further mechanistic studies are needed to clarify how miR-283 affects normal aging and influences exercise adaptations. Nonetheless, the work can be of interest to cell biologists studying miRNA biology, aging, and age-related diseases.

    2. Reviewer #1 (Public review):

      In this study, Li et al. investigated the role of miR-283 in regulating cardiac aging and its potential contribution to age-related bradyarrhythmia. Using Drosophila as a model, the authors demonstrated that systemic overexpression or knockdown of miR-283 induced age-associated bradycardia. Notably, the study found that miR-283 knockdown in ventral-lateral neurons (LNvs), rather than in the heart, was sufficient to induce bradyarrhythmia, an effect the authors linked to the upregulation of miR-283 expression in both the brain and heart. The study also explored the beneficial impact of exercise on cardiac aging, showing that endurance training mitigated bradyarrhythmia, correlating with reduced miR-283 accumulation in the brain and myocardium.

      The conclusions of this paper are mostly well supported by data; however, some concerns arise from the unexpected finding that bradyarrhythmia was triggered by miR-283 knockdown in LNvs rather than in the heart, suggesting a non-cell-autonomous mechanism. A more precise mechanistic explanation linking miR-283 dysregulation in LNvs to cardiac dysfunction would strengthen the study's conclusions. While the authors propose cwo as a potential target of miR-283, no functional experiments were conducted to confirm its role in mediating miR-283's effects. Additionally, it remains unclear whether reduced miR-283 levels in LNvs lead to accelerated aging rather than a cardiac-specific effect. Likewise, the potential influence of miR-283 on the circadian clock and its broader impact on aging warrant further investigation.

      Major Comments:

      (1) A significant concern arises from the unexpected outcome observed in miR-283 knockdown in LNvs, which suggests a non-cell-autonomous mechanism. Elucidating the mechanisms by which miR-283 deficiency leads to the observed phenotypes would provide a more comprehensive understanding of the study's implications.

      (2) The authors propose cwo as a potential target of miR-283; however, no functional experiments were conducted to confirm its role in mediating miR-283's effects. Similarly, direct evidence demonstrating that cwo is a bona fide target of miR-283 in LNvs should be provided.

      (3) It remains unclear whether miR-283 knockdown in LNvs results in accelerated aging rather than a cardiac-specific effect. This hypothesis is supported by observations that pdf>miR-283SP animals exhibit systemic premature senescence (elevated SA-β-gal activity in both the heart and brain), cardiac dysfunction, impaired climbing ability, and reduced lifespan.

      (4) The finding that reduced miR-283 levels in LNvs lead to accelerated aging raises an important, yet unexplored, question: does miR-283 influence the circadian clock, thereby broadly affecting aging?

      Two aspects of this question should be addressed:

      (a) Is the circadian rhythm disrupted in miR-283 knockdown experiments?

      (b) Do circadian rhythm defects impact aging?

      (5) The authors state that miR-283 knockdown in LNvs led to bradyarrhythmia, which was mainly caused by miR-283 upregulation in the whole brain and heart. However, it is unclear which experiments support this conclusion. Could the authors clarify this point?

      (6) Given that miR-283 expression varies with age, could the upregulation of miR-283 in both the brain and heart be a consequence of accelerated aging rather than a specific effect of miR-283 knockdown in LNvs?

      (7) While the beneficial effects of exercise on cardiac function appear clear, the claim that this effect is mediated through miR-283 function in LNvs seems premature. The data suggest that exercise-induced improvement occurs in both wild-type and miR-283-SP animals, raising the possibility that exercise acts through a miR-283-independent mechanism.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript presents findings that indicate a role for a conserved miRNA (miR-283 in flies) in controlling Drosophila heart rate. Further, the manuscript localizes the relevant tissue for the function of this miRNA to a subset of neurons that are heavily involved in circadian regulation, thus presenting an interesting mechanistic link between the circadian system and heart rate. Either ubiquitous knockout or ubiquitous overexpression negatively impacts several aspects of heart performance, with a pronounced effect on heart rate. Interestingly, knockdowns in the heart itself are innocuous, but knockdown in LNvs neurons recapitulates the effect on heart rate. The authors use bioinformatics to identify the clockwork orange (cwo) gene as a potential target and validate that cwo expression is reduced when miR-283 is knocked down in LNvs neurons in vivo and also validate that cwo is regulated by miR-283 in cell culture luciferase assays. Exercise shows a modest ability to restore normal cwo expression and a trend toward an effect on survival, but shows a much stronger rescue of the heart rate phenotype.

      Strengths:

      Evidence is strong for the effect of miR-283 in pdf-positive neurons on the control of heart rate and for cwo as a downstream effector of miR-283.

      Work to identify specific targets of miR-283 is well-done and successfully identified a key downstream regulator in cwo.

      The potential mechanism using miR-283 to link circadian neurons to heart rate regulation is novel and exciting.

      Weaknesses:

      The evidence that this is related to normal aging is rather weak, and the effect of exercise on the observed parameters is small and not necessarily working through the miR-283/cwo mechanism.

      The authors seem to be conflating two hypotheses in their interpretations. Is miR-283 working through circadian mechanisms or age-related mechanisms? While it is true that aging tends to reduce heart rate, I don't think that means that any intervention that reduces heart rate is causing "senescence". Similarly, reduced survival in miR-283 knockdown flies does not prove that miR-283 promotes healthy aging per se, just that miR-283 is required for health regardless of age.

      Survival reduction is quite modest, which does not necessarily support the idea that the bradycardia is causing major health issues or premature senescence for the flies. The interpretation of the longevity experiments throughout the manuscript seems overstated.

      The study would benefit greatly from a direct test of the authors' proposed pathway for exercise to improve bradycardia.

      The statement in the discussion "inducing endurance exercise of anti gravity climbing in flies with miR-283 knockdown in LNvs can improve bradyarrhythmic features by decreasing brain miR-283 expression" is not fully supported by data in the paper. There is an association, but it cannot be said to be the full cause (or even required) without further experiments.

      The summary figure includes both data-supported mechanistic relationships and mechanisms that are inferred or assumed.