10,000 Matching Annotations
  1. Last 7 days
    1. Reviewer #1 (Public review):

      In the manuscript, Aldridge and colleagues investigate the role of IL-27 in regulating hematopoiesis during T. gondii infection. Using loss-of-function approaches, reporter mice, and the generation of serial chimeric mice, they elegantly demonstrate that IL-27 induction plays a critical role in modulating bone marrow myelopoiesis and monocyte generation to the infection site. The study is well-designed, with clear experimental approaches that effectively address the mechanisms by which IL-27 regulates bone marrow myelopoiesis and prevents HSC exhaustion.

    2. Reviewer #2 (Public review):

      Summary:

      Aldridge et al. aim to demonstrate the role of IL27 in limiting emergency myelopoiesis in response to Toxoplasma gondii infection by acting directly at the level of early haematopoietic progenitors.

      They used different mouse genetic models, such as HSC lineage tracing, IL27 and IL27R-deficient mice, to show that:

      (1) HSCs actively participate in emergency myelopoiesis during Toxoplasma gondii infection.

      (2) The absence of IL27 and IL27R increases monocyte progenitors and monocytes, mainly inflammatory monocytes CCR2hi.

      (3) At steady state, loss of IL27 impairs HSC fitness as competitive transplantation shows long-term engraftment deficiency of IL27 BM cells. This impairment is exacerbated after infection.

      (4) IL27 is produced by various BM and other tissue cells at steady state, and its expression increases with infection, mainly by increasing the number of monocytes producing it.

      Although it is indisputable that IL27 has a role in emergency myelopoiesis by limiting the number of pro-inflammatory monocytes in response to infection, the authors' claim that it acts only on HSCs and not on more committed progenitors (CMP, GMP, MP) is not supported by the quality of the data presented here, as described below in the weakness section. In addition, this study highlights a role for IL27 during infection, but does not focus on trained immunity, which is the focus of the targeted elife issue.

      Weaknesses:

      (1) In Figure 4, MFI quantification is required. This figure also shows the expression level (FACS and RNA) in progenitors (GMP and CMP, GP, MP), which is quite similar to that of HSC at this level, so it is really surprising that CMP does not respond at all to IL27 (S5C).

      (2) Total BM was used to test the direct effect of IL27 on HSC. There could be an indirect effect from other more mature BM cells, even if they show lower receptor expression than HSC. This should be done on a different sorted population to prove the direct effect of IL27 on HSC. The authors need to look more closely at some stat-dependent genes or stat itself in different sorted cell populations, not just irgm1. It is also known that Stat is associated with increased HSC proliferation in response to IFN, which is the opposite of what is observed here.

      (3) The decrease in HSC fitness in IL27R KO at steady state could be an indirect effect of the increase in proinflammatory monocytes contributing to high levels of inflammatory cytokines in the BM and thus chronic HSC activation that is enhanced in response to infection. What is the pro-inflammatory cytokine profile of the BM of IL27 or IL27R deficient mice and of mixed chimera mice?

      (4) Furthermore, the FACS profile of KI67/brdu of Figure 7 is doubtful, as it is shown in different literature that KSL are not predominantly quiescent as shown here, but about 50% are KI67-. This is also inconsistent with the increase of HSC observed in Figure 1. Quantification of total BruDU+ HSC and other progenitors is also important to quantify all cells that have proliferated during infection. As the repopulation of IL27-deficient BM is also lower in the absence of infection, the proliferation of HSC in IL27R KO mice in the absence of infection is also important.

      (5) The immunofluorescence in Figure 3 shows a high level of background and it is difficult to see the GFP and tomato positive cells. In this sense, the number of HSCs quantified as Procr+ (more than 8000 on a single BM section) is inconsistent with the total number of HSCs that a BM can contain (i.e., around 6000 per BM as quantified in Figure 1).

      (6) The addition of arrows to the figure will help to visualise positive cells. It is also not clear why the author normalised the GFP+ cells to the tomato+ cells in Figure 3D.

      (7) Furthermore, even if monocytes represent a high proportion of IL27-producing cells, they are only 50% of the cells at 5dpi, as shown in Figure 3 and S4. Without other monocyte markers, line 307 is incorrect.

      (8) How do the authors explain that in Figure 1, 5-10% of labelled precursors and monocytes can give 100% of monocytes? This would mean that only labelled HSC can differentiate into PEC monocytes.

    1. eLife Assessment

      The granularity with which neural activity in the sensorimotor cortex of mice corresponds to voluntary forelimb motion is a key open question. This paper provides convincing evidence for the encoding of low-level features like joint angles and represents an important step forward toward understanding the cortical origins of limb control signals.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the encoding of forelimb movement parameters using a reach-to-grasp task in mice. The authors use a modified version of the water-reaching paradigm developed by Galinanes and Huber. Two-photon calcium imaging was then performed with GCaMP6f to measure activity across both the contralateral caudal forelimb area (CFA) and the forelimb portion of primary somatosensory cortex (fS1) as mice perform the reaching behavior. Established methods were used to extract the activity of imaged neurons in layer 2/3, including methods for deconvolving the calcium indicator's response function from fluorescence time series. Video-based limb tracking was performed to track the positions of several sites on the forelimb during reaching and extract numerous low-level (joint angle) and high-level (reach direction) parameters. The authors find substantial encoding of parameters for both the proximal and distal parts of the limb across both CFA and fS1, with individual neurons showing heterogeneous parameter encoding. Limb movement can be decoded similarly well from both CFA and fS1, though CFA activity enables decoding of reach direction earlier and for a more extended duration than fS1 activity. Collectively, these results indicate involvement of a broadly distributed sensorimotor region in mouse cortex in determining low-level features of limb movement during reach-to-grasp.

      Strengths:

      The technical approach is of very high quality. In particular, the decoding methods are well designed and rigorous. The use of partial correlations to distinguish correlation between cortical activity and either proximal or distal limb parameters or either low- or high-level movement parameters was very nice. The limb tracking was also of extremely high quality, and critical here to revealing the richness of distal limb movement during task performance.

      The task itself also reflects an important extension of the original work by Galinanes and Huber. The demonstration of a clear, trackable grasp component in a paradigm where mice will perform hundreds of trials per day expands the experimental opportunities for the field. This is an exciting development.

      The findings here are important and the support for them is solid. The work represents an important step forward toward understanding the cortical origins of limb control signals. One can imagine numerous extensions of this work to address basic questions that have not been reachable in other model systems.

      Collectively, these strengths made this manuscript a pleasure to read and review.

      Weaknesses:

      In the last section of the results, the authors purport to examine the representation of "higher-level target-related signals," using the decoding of reach direction. While I think the authors are careful in their phrasing here, I think they should be more explicit about what these signals could be reflecting. The "signals" here that are used to decode direction could relate to anything - low-level signals related to limb or postural muscles, or true high-level commands that dictate only what movement downstream motor centers should execute, rather than the muscle commands that dictate how. One could imagine using a partial correlation-type approach again here to extract a signal uncorrelated with all the measured low-level parameters, but there would still be all the unmeasured ones. Again, I think it is still ok to call these "high-level signals," but I think some explicit discussion of what these signals could reflect is necessary.

      Related to this, I think the manuscript in general does not do an adequate job of explicitly raising the important caveats in interpreting parametric correlations in motor system signals, like those raised by Todorov, 2000. The authors do an expert job of handling the correlations, using PCA to extract uncorrelated components and using the partial correlation approach. However, more clarity about the range of possible signal types the recorded activity could reflect seems necessary.

      The manuscript could also do a better job of clarifying relevant similarities and differences between the rodent and primate systems, especially given the claims about the rodent being a "first-class" system for examining the cellular and circuit basis of motor control, which I certainly agree with. Interspecies similarities and differences could be better addressed both in the Introduction, where results from both rodents and primates are intermixed (second paragraph), and in the Discussion, where more clarity on how results here agree and disagree with those from primates would be helpful. For example, the ratio of corticospinal projections targeting sensory and motor divisions of the spinal cord differs substantially between rodents and primates. As another example, the relatively high physical proximity between the typical neurons in mouse M1 and S1 compared to primates seems likely to yoke their activity together to a greater extent. There is also the relatively large extent of fS1 from which forelimb movements can be elicited through intracortical microstimulation at current levels similar to those for evoking movement from M1. All of these seem relevant in the context of findings that activity in mouse M1 and S1 are similar.

      In addition, there are a number of other issues related to the interpretation of findings here that are not adequately addressed. These are described in the Recommendations for improvement.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Grier, Salimian, and Kaufman characterize the relationship between the activity of neurons in sensorimotor cortex and forelimb kinematics in mice performing a reach-to-grasp task. First, they train animals to reach to two cued targets to retrieve water reward, measure limb motion with high resolution, and characterize the stereotyped kinematics of the shoulder, elbow, wrist, and digits. Next, they find that inactivation of the caudal forelimb motor area severely impairs coordination of the limb and prevents successful performance of the task. They then use calcium imaging to measure the activity of neurons in motor and somatosensory cortex, and demonstrate that fine details of limb kinematics can be decoded with high fidelity from this activity. Finally, they show reach direction (left vs right target) can be decoded earlier in the trial from motor than from somatosensory cortex.

      Strengths:

      In my opinion, this manuscript is technically outstanding and really sets a new bar for motor systems neurophysiology in the mouse. The writing and figures are clear, and the claims are supported by the data. This study is timely, as there has been a recent trend towards recording large numbers of neurons across the brain in relatively uncontrolled tasks and inferring a widespread but coarse encoding of high-level task variables. The central finding here, that sensorimotor cortical activity reflects fine details of forelimb movement, argues against the resurgent idea of cortical equipotentiality, and in favor of a high degree of specificity in the responses of individual neurons and of the specialization of cortical areas.

      Weaknesses:

      It would be helpful for the authors to be more explicit about which models of mouse cortical function their results support or rule out, and how their findings break new conceptual ground.

    1. eLife Assessment

      This useful study reports a method to detect and analyze a novel post-translational modification, lysine acetoacetylation (Kacac), finding it regulates protein metabolism pathways. The study unveils epigenetic modifiers involved in placing this mark, including key histone acetyltransferases such as p300, and concomitant HDACs, which remove the mark. Proteomic and bioinformatics analysis identified many human proteins with Kacac sites, potentially suggesting broad effects on cellular processes and disease mechanisms. While the data presented are solid, the functional validation of the sites would add significantly to the manuscript's description of this modification; the study will be of interest to those studying protein and metabolic regulation.

    2. Reviewer #1 (Public review):

      Summary

      Lysine acetoacetylation (Kacac) is a recently discovered histone post-translational modification (PTM) connected to ketone body metabolism. This research outlines a chemo-immunological method for detecting Kacac, eliminating the requirement for creating new antibodies. The study demonstrates that acetoacetate acts as the precursor for Kacac, which is catalyzed by the acyltransferases GCN5, p300, and PCAF, and removed by the deacetylase HDAC3. Acetoacetyl-CoA synthetase (AACS) is identified as a central regulator of Kacac levels in cells. A proteomic analysis revealed 139 Kacac sites across 85 human proteins, showing the modification's extensive influence on various cellular functions. Additional bioinformatics and RNA sequencing data suggest a relationship between Kacac and other PTMs, such as lysine β-hydroxybutyrylation (Kbhb), in regulating biological pathways. The findings underscore Kacac's role in histone and non-histone protein regulation, providing a foundation for future research into the roles of ketone bodies in metabolic regulation and disease processes.

      Strengths

      (1) The study developed an innovative method by using a novel chemo-immunological approach to the detection of lysine acetoacetylation. This provides a reliable method for the detection of specific Kacac using commercially available antibodies.

      (2) The research has done a comprehensive proteome analysis to identify unique Kacac sites on 85 human proteins by using proteomic profiling. This detailed landscape of lysine acetoacetylation provides a possible role in cellular processes.

      (3) The functional characterization of enzymes explores the activity of acetoacetyltransferase of key enzymes like GCN5, p300, and PCAF. This provides a deeper understanding of their function in cellular regulation and histone modifications.

      (4) The impact of acetyl-CoA and acetoacetyl-CoA on histone acetylation provides the differential regulation of acylations in mammalian cells, which contributes to the understanding of metabolic-epigenetic crosstalk.

      (5) The study examined acetoacetylation levels and patterns, which involve experiments using treatment with acetohydroxamic acid or lovastatin in combination with lithium acetoacetate, providing insights into the regulation of SCOT and HMGCR activities.

      Weakness

      (1) There is a limitation to functional validation, related to the work on the biological relevance of identified acetoacetylation sites. Hence, the study requires certain functional validation experiments to provide robust conclusions regarding the functional implications of these modifications on cellular processes and protein function. For example, functional implications of the identified acetoacetylation sites on histone proteins would aid the interpretation of the results.

      (2) The authors could have studied acetoacetylation patterns between healthy cells and disease models like cancer cells to investigate potential dysregulation of acetoacetylation in pathological conditions, which could provide insights into their PTM function in disease progression and pathogenesis.

      (3) The time-course experiments could be performed following acetoacetate treatment to understand temporal dynamics, which can capture the acetoacetylation kinetic change, thereby providing a mechanistic understanding of the PTM changes and their regulatory mechanisms.

      (4) Though the discussion section indeed provides critical analysis of the results in the context of existing literature, further providing insights into acetoacetylation's broader implications in histone modification. However, the study could provide a discussion on the impact of the overlap of other post-translational modifications with Kacac sites with their implications on protein functions.

      Impact

      The authors successfully identified novel acetoacetylation sites on proteins, expanding the understanding of this post-translational modification. The authors conducted experiments to validate the functional significance of acetoacetylation by studying its impact on histone modifications and cellular functions.

    3. Reviewer #2 (Public review):

      In the manuscript by Fu et al., the authors developed a chemo-immunological method for the reliable detection of Kacac, a novel post-translational modification, and demonstrated that acetoacetate and AACS serve as key regulators of cellular Kacac levels. Furthermore, the authors identified the enzymatic addition of the Kacac mark by acyltransferases GCN5, p300, and PCAF, as well as its removal by deacetylase HDAC3. These findings indicate that AACS utilizes acetoacetate to generate acetoacetyl-CoA in the cytosol, which is subsequently transferred into the nucleus for histone Kacac modification. A comprehensive proteomic analysis has identified 139 Kacac sites on 85 human proteins. Bioinformatics analysis of Kacac substrates and RNA-seq data reveals the broad impacts of Kacac on diverse cellular processes and various pathophysiological conditions. This study provides valuable additional insights into the investigation of Kacac and would serve as a helpful resource for future physiological or pathological research.

      The following concerns should be addressed:

      (1) A detailed explanation is needed for selecting H2B (1-26) K25 sites over other acetylation sites when evaluating the feasibility of the chemo-immunological method.

      (2) In Figure 2(B), the addition of acetoacetate and NaBH4 resulted in an increase in Kbhb levels. Specifically, please investigate whether acetoacetylation is primarily mediated by acetoacetyl-CoA and whether acetoacetate can be converted into a precursor of β-hydroxybutyryl (bhb-CoA) within cells. Additional experiments should be included to support these conclusions.

      (3) In Figure 2(E), the amount of pan-Kbhb decreased upon acetoacetate treatment when SCOT or AACS was added, whereas this decrease was not observed with NaBH4 treatment. What could be the underlying reason for this phenomenon?

      (4) The paper demonstrates that p300, PCAF, and GCN5 exhibit significant acetoacetyltransferase activity and discusses the predicted binding modes of HATs (primarily PCAF and GCN5) with acetoacetyl-CoA. To validate the accuracy of these predicted binding models, it is recommended that the authors design experiments such as constructing and expressing protein mutants, to assess changes in enzymatic activity through western blot analysis.

      (5) HDAC3 shows strong de-acetoacetylation activity compared to its de-acetylation activity. Specific experiments should be added to verify the molecular docking results. The use of HPLC is recommended, in order to demonstrate that HDAC3 acts as an eraser of acetoacetylation and to support the above conclusions. If feasible, mutating critical amino acids on HDAC3 (e.g., His134, Cys145) and subsequently analyzing the HDAC3 mutants via HPLC and western blot can further substantiate the findings.

      (6) The resolution of the figures needs to be addressed in order to ensure clarity and readability.

    4. Reviewer #3 (Public review):

      Summary:

      This paper presents a timely and significant contribution to the study of lysine acetoacetylation (Kacac). The authors successfully demonstrate a novel and practical chemo-immunological method using the reducing reagent NaBH4 to transform Kacac into lysine β-hydroxybutyrylation (Kbhb).

      Strengths:

      This innovative approach enables simultaneous investigation of Kacac and Kbhb, showcasing their potential in advancing our understanding of post-translational modifications and their roles in cellular metabolism and disease.

      Weaknesses:

      The paper's main weaknesses are the lack of SDS-PAGE analysis to confirm HATs purity and loading consistency, and the absence of cellular validation for the in vitro findings through knockdown experiments. These gaps weaken the evidence supporting the conclusions.

  2. Apr 2025
    1. eLife Assessment

      This paper investigates how isoform II of transcription factor RUNX2 promotes cell survival and proliferation in oral squamous cell carcinoma cell lines. The authors used gain and loss of function techniques to provide convincing evidence showing that RUNX2 isoform silencing led to cell death via several mechanisms including apoptosis and ferroptosis that was partially suppressed through RUNX2 regulation of PRDX2 expression. The study provides valuable insight into the underlying mechanism by which RUNX2 acts in oral squamous cell carcinoma.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, authors investigated the role of RUNT-related transcription factor 2 (RUNX2) in oral squamous carcinoma (OSCC) growth and resistance to ferroptosis. They found that RUNX2 suppresses ferroptosis through transcriptional regulation of peroxiredoxin-2. They further explored the upstream positive regulator of RUNX2, HOXA10 and found that HOXA1/RNUX2/PRDX2 axis protects OSCC from ferroptosis.

      Strengths:

      The study is well designed and provides a novel mechanism of HOXA1/RNUX2/PRDX2 control of ferroptosis in OSCC.

      Weaknesses:

      According to the data presented in (Figure 2F, Figure 3F and G, Figure 5D and Figure 6E and F), apoptosis seems to be affected in the same amount as ferroptosis by HOXA1/RNUX2/PRDX2 axis, which raises a question on the authors' specific focus on ferroptosis in this study. Reasonably, authors should adapt the title and the abstract in a way that it recapitulates the whole data, which is HOXA1/RNUX2/PRDX2 axis control of cell death, including ferroptosis and apoptosis in OSCC.

      Comments on revisions:

      The revised manuscript has been well improved, and I'm satisfied with the authors' response to my comments.

    3. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This paper investigates how isoform II of transcription factor RUNX2 promotes cell survival and proliferation in oral squamous cell carcinoma cell lines. The authors used gain and loss of function techniques to provide incomplete evidence showing that RUNX2 isoform silencing led to cell death via several mechanisms including ferroptosis that was partially suppressed through RUNX2 regulation of PRDX2 expression. The study provides useful insight into the underlying mechanism by which RUNX2 acts in oral squamous cell carcinoma, but the conclusions of the authors should be revised to acknowledge that ferroptosis is not the only cause of cell death.

      We appreciate the editor’s positive comments on our work and the valuable suggestions provided by the reviewers. We did find that RUNX2 isoform II knockdown or HOXA10 knockdown could also lead to apoptosis. We have revised our title as following: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”. In addition, we have also revised our conclusions in the abstract as follows: “OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.” We have added more experiments to better support our conclusions. Please see following responses to reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, authors investigated the role of RUNT-related transcription factor 2 (RUNX2) in oral squamous carcinoma (OSCC) growth and resistance to ferroptosis. They found that RUNX2 suppresses ferroptosis through transcriptional regulation of peroxiredoxin-2. They further explored the upstream positive regulator of RUNX2, HOXA10 and found that HOXA10/RUNX2/PRDX2 axis protects OSCC from ferroptosis.

      Strengths:

      The study is well designed and provides a novel mechanism of HOXA10/RUNX2/PRDX2 control of ferroptosis in OSCC.

      Weaknesses:

      According to the data presented in (Figure 2F, Figure 3F and G, Figure 5D and Figure 6E and F), apoptosis seems to be affected in the same amount as ferroptosis by HOXA10/RUNX2/PRDX2 axis, which raises questions on the authors' specific focus on ferroptosis in this study. Reasonably, authors should adapt the title and the abstract in a way that recapitulates the whole data, which is HOXA10/RUNX2/PRDX2 axis control of cell death, including ferroptosis and apoptosis in OSCC.

      We really grateful for your comments. We agree that these figures do show that isoform II-knockdown or HOXA10-knockdown could induce apoptosis. We have adapted the title and abstract as follow:

      Title: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”.

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”

      In addition, we have performed the rescue experiment showing that PRDX2 overexpression rescues the apoptosis induced by isoform II-knockdown (Figure 4-figure supplement 4) or HOXA10-knockdown (Figure 7-figure supplement 2).

      We have added the description about these experiments in result “RUNX2 isoform II promotes the expression of PRDX2” and “HOXA10 inhibits ferroptosis and apoptosis through RUNX2 isoform II” as follow: “In addition, we found that PRDX2 overexpression could partially reduce the increased apoptosis caused by isoform II-knockdown. (Figure 4-figure supplement 4).” “PRDX2 overexpression also could rescue the increased cellular apoptosis caused by HOXA10 knockdown (Figure 7-figure supplement 2).”.

      Comments:

      In the description of the result section related to Figure 3E, the author wrote "In addition, we found that isoform II-knockdown induced shrunken mitochondria with vanished cristae with transmission electron microscopy (Figure 3E). These results suggest that RUNX2 isoform II may suppress ferroptosis." The interpretation provided here is not clear to the reviewer. How shrunken mitochondria and vanished cristae can be linked to ferroptosis?

      We apologize for the inaccurate description. Ferroptotic cells usually exhibit shrunken mitochondria, reduced or absent cristae, and increased membrane dentistry (Dixon et al., 2012). However, the presence of shrunken mitochondria or vanished cristae does not guarantee that ferroptosis has occurred in the cells. Other evidences, such as the increased ROS production and lipid peroxidation accumulation in cells with RUNX2 isoform II-knockdown must be evaluated as we are showing in Figure 3A and 3B. Furthermore, isoform II overexpression suppressed ROS production (Figure 3C) and lipid peroxidation (Figure 3D). We have revised our interpretation as follow: “In addition, we found that isoform II-knockdown induced shrunken mitochondria with vanished cristae with transmission electron microscopy (Figure 3E). This phenomenon along with the above results of ROS production and lipid peroxidation accumulation assays suggests that RUNX2 isoform II may suppress ferroptosis.”.

      Dixon, S. J., Lemberg, K. M., Lamprecht, M. R., Skouta, R., Zaitsev, E. M., Gleason, C. E., . . . Stockwell, B. R. (2012). Ferroptosis: an iron-dependent form of nonapoptotic cell death. Cell, 149(5), 1060-1072. doi:10.1016/j.cell.2012.03.042 PMID:22632970

      The electron microscopy images show more elongated mitochondria in the RUNX2 isoform II-KO cells than in RUNX2 isoform II positive cells, which might result from the fusion of mitochondria. These images should complete with a fluorescent mitochondria staining of these cells.

      We do find that the TEM images of RUNX2 isoform II-knockdown cells show more elongated mitochondria. The mitochondria undergo cycles of fission and fusion, known as mitochondrial dynamics, which in turn leads to changes in mitochondrial length. Through examining factors related to mitochondrial dynamics, we find that isoform II knockdown could decrease the expression levels of FIS1 (Fission, Mitochondrial 1) (Figure 3-figure supplement 2B) which mediates the fission of mitochondria. Therefore, we speculate that the elongated mitochondria in the isoform II-knockdown cells may be due to the decrease in mitochondrial fission through inhibiting FIS1 expression.

      In addition, we have tried our best to perform the fluorescent staining of mitochondrial to observe mitochondrial morphology. However, due to the quality of probes and fluorescent microscope, our images of mitochondrial fluorescence were not satisfactory. So, we re-capture more electron microscopy images, measure the length of mitochondria, and perform statistical analyses. We find that isoform II-knockdown cells show significantly more mitochondrial elongation than the control cells (Author response image 1 and Figure 3-figure supplement 2A). Therefore, we believe that isoform II knockdown promotes mitochondrial elongation to be relatively reliable.

      Author response image 1.

      The new electron microscopy images in RUNX2 isoform II-knockdown cells. RSL3 (a ferroptosis activator) served as a positive control. Scale bar: 1 μm. The calculation and statistical analysis of mitochondrial elongation were added in Figure 3-figure supplement 2A.

      What is the oxygen consumption rate in RUNX2 KO cells?

      We have performed a new mitochondrial stress assay to analyze the oxygen consumption rate (OCR). We find that RUNX2 isoform II-knockdown can decrease OCR in OSCC cell line. This result has been added to Figure 3-figure supplement 3A and B. It is consistent with our observation of the damaged mitochondria morphology in the cells with RUNX2 isoform II knockdown.

      The increase in cell proliferation after RUNX2 overexpression in Figure 2A is not convincing, is there any differences in their migration or invasion capacity?

      We agree that overexpression of isoform II didn’t dramatically enhance OSCC cell proliferation. We consider that it may be due to the existing high level of isoform II in OSCC cells. We have performed wound-healing assay and transwell assay to analyze the migration or invasion capacity of cells with RUNX2 isoform II or isoform I overexpression. We find that isoform II overexpression has no effect on the migration and invasion in OSCC cells (Figure 2-figure supplement 2). This phenomenon suggests that further increasing isoform II cannot improve the migration or invasion capacity of OSCC cells. However, isoform I overexpression suppresses the migration and invasion of cancer cells (Figure 2-figure supplement 2), indicating that the upregulation of isoform I, which is downregulated in OSCC cells, may inhibit tumorigenesis. In addition, we found that the expression level of isoform I was lower in TCGA OSCC patients than that in normal controls (Figure 1D), and patients with higher isoform I showed longer overall survival (Figure 1-figure supplement 1). These results support that isoform I may inhibit tumorigenesis in OSCC cells.

      The in vivo study shows 50% reduction in primary tumor growth after RUNX2 inhibition by shRNA in CAL 27 xenografts, but only one shRNA is shown. Is this one shRNA clone? At least 2 shRNA clones should be used.

      In this vivo primary tumor growth experiment, we used a CAL 27 stable cell line transfected with an shRNA against RUNX2 isoform II (shisoform II-1). We agree that at least two shRNAs should be used. In this revision, we perform another tumor growth experiment with the CAL 27 stably transfected with another new shRNA targeting the different region in isoform II (shisoform II-2). As with the previous experiment, CAL 27 cells stably transfected with this new shRNA also showed significantly reduced tumor growth and weight than those transfected with non-specific control shRNA in nude mice (Figure 2-figure supplement 4A-D).

      Apoptosis and necroptosis seem to be affected in the same amount as ferroptosis by HOXA10/RUNX2/PRDX2 axis. This is evident from experiments in Figure 3E, F and from Figure 6E, F and Figure 3G. Either Fer-1, Z-VAD, or Nec-1 used alone, were not able to fully restore cell proliferation to control cell level, which implies an additive effect of ferroptosis, apoptosis and necrosis. The author should verify potential additive or synergistic effect of the combination of Fer-1 and Z-VAD in these assays after si-RUNX2 in Figure 3 F and G and after si-HOX assays.

      We sincerely appreciate your valuable comments. We have performed the new assay to analyze the potential additive or synergistic effect of the combination of Fer-1 and Z-VAD after RUNX2 isoform II (si-II) or HOXA10 (si-HOX) knockdown. We find that the combination of Fer-1 and Z-VAD is more effective in rescuing the cell proliferation than Fer-1 or Z-VAD alone. (Figure 3- figure supplement 6 and Figure 6- figure supplement 4).

      What is the effect of PRDX2 or HOXA10 depletion on tumor growth?

      We have performed a new xenograft tumor formation assay in nude mice to analyze the effect of PRDX2-knockdown on tumor growth. We found that CAL 27 cells stably transfected with shRNAs against PRDX2 showed significantly reduced tumor growth and weight than those transfected with non-specific control shRNA in nude mice (Figure 4-figure supplement 2A-D). Regarding the effect of HOXA10 depletion on tumor growth, please allow us to cite a study (Guo et al., 2018) which demonstrated that HOXA10 knockout in Fadu cells (a cell line of pharyngeal squamous cell carcinoma) could inhibit tumor growth. 

      We have added these results to the section of “RUNX2 isoform II promotes the expression of PRDX2” as follows: “In line with the inhibitory effect of isoform II-knockdown on tumor growth, CAL 27 cells stably transfected with anti-PRDX2 shRNAs showed notably reduced tumor growth and weight than those transfected with non-specific control shRNA in nude mice (Figure 4-figure supplement 2A-D).”.

      Guo, L. M., Ding, G. F., Xu, W., Ge, H., Jiang, Y., Chen, X. J., & Lu, Y. (2018). MiR-135a-5p represses proliferation of HNSCC by targeting HOXA10. Cancer Biol Ther, 19(11), 973-983. doi:10.1080/15384047.2018.1450112 PMID:29580143

      What is the clinical relevance of HOXA10 in OSCC patients?

      In Figure 5-figure supplement 1B, we have showed that the expression levels of HOXA10 in TCGA OSCC patients were also significantly higher than those in normal controls. In this revision, we further find that patients with higher HOXA10 show significantly shorter overall survival in TCGA OSCC dataset (Figure 5-figure supplement 2C). In addition, we have also analyzed the expression of HOXA10 in our clinical OSCC and adjacent normal tissues, and found that HOXA10 expression level of OSCC tissues is significantly higher than that of normal controls (Figure 5-figure supplement 2A and B), which is consistent with the results from TCGA OSCC dataset.

      We have revised our writing in the result “HOXA10 is required for RUNX2 isoform II expression and cell proliferation in OSCC” as follows: “Similarly, HOXA10 expression level of our clinical OSCC tissues is significantly higher than that of adjacent normal tissues (Figure 5-figure supplement 2A and B). Moreover, TCGA OSCC patients with higher expression levels of HOXA10 showed shorter overall survival (Figure 5-figure supplement 2C).”

      Reviewing editor (Public Review):

      This paper reports the role of the Isoform II of RUNX2 in activating PRDX2 expression to suppress ferroptosis in oral squamous cell carcinoma (OSCC).

      The following major issues should be addressed.

      A major postulate of this study is the specific role of RUNX2 isoform II compared to isoform I.

      Figure 1F shows association between patient survival and Iso II expression, but nothing is shown for Iso I, this should be added, in addition the number of patients at risk in each category should be shown.

      We sincerely appreciate your valuable comments. We have added the survival curve of isoform I (exon 2.1) in the new Figure 1-figure supplement 1. In contrast to isoform II, patients with higher isoform I showed longer overall survival. The numbers of patients at risk in each category in the Figure 1F and Figure 1-figure supplement 1 are added.

      The authors test Iso I and Iso II overexpression in CAL27 or SCC-9 model cell lines. In Fig. 2A in CAL27, the overexpression of Iso II is much stronger than Iso I so it seems premature to draw any conclusions. More importantly, however, no Iso l silencing is shown in either of the cell lines nor the xenografted tumours. This is absolutely essential for the authors hypothesis and should be tested using shRNA in cells and xenografted tumours.

      Thank you for your valuable comments. We agree that the overexpression of isoform I is much stronger than isoform II in CAL 27 cells in Fig. 2A-B. We have done another repeat experiment which shows the similar overexpression of isoform II and I in Figure 2A-figure supplement 1. This repeat experiment also shows that overexpression of FLAG tagged isoform II significantly promoted the proliferation of OSCC cells. We tried our best to knockdown isoform I. However, the specific sequence of isoform I is 317 nt. We designed four anti-isoform I siRNAs, and unfortunately found that none of these siRNAs could knockdown isoform I efficiently. Please see following Author response image 2. Therefore, currently we cannot knockdown isoform I. However, we have tried the overexpression of isoform I. We find that isoform I overexpression inhibits the migration and invasion of cancer cells (Figure 2- figure supplement 2). In addition, we have shown that isoform II overexpression showed enhanced cell proliferation compared with isoform I overexpression in OSCC cells (Figure 2A). Therefore, we consider that isoform I is not essential for OSCC cell proliferation and tumorigenesis. Then, we mainly focus on isoform II in this study.  

      Author response image 2.

      The knockdown efficiency of RUNX2 isoform I (anti-isoform I, si-I-1, si-I-2, si-I-3, si-I-4) in OSCC cells were analyzed by RT-PCR, 18S rRNA served as a loading control. The sequences of siRNAs are as follows: 5’ GGCCACUUCGCUAACUUGU 3’ (si-I-1), 5’ GUUCCAAAGACUCCGGCAA 3’ (si-I-2), 5’ UGGCUGUUGUGAUGCGUAU 3’ (si-I-3), and 5’ CGGCAGUCGGCCUCAUCAA 3’ (si-I-4).

      A major conclusion of this study is that Iso II expression suppresses ferroptosis. To support this idea, the authors use the inhibitor Ferrostatin-1 (Fer -1). While Fer-1 typically does not lead to a 100% rescue, here the effect is only marginal and as shown in Figures 3F and G only marginally better than Z-VAD or Necrostatin 1. These data do not support the idea that the major cause of cell death is ferroptosis. Instead. Iso II silencing leads to cell death through different pathways. The authors should acknowledge this and rephrase the conclusion of the paper accordingly. Moreover, the authors consistently confound cell proliferation with cell death.

      We agree that RUNX2 isoform II-knockdown could also induce apoptosis. We have revised the description in the title and abstract as follow:

      Title: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”.

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”.

      Conclusion: “In conclusion, we identified RUNX2 isoform II as a novel ferroptosis and apoptosis inhibitor in OSCC cells by transactivating PRDX2 expression. RUNX2 isoform II plays oncogenic roles in OSCC. Moreover, we also found that HOXA10 is an upstream regulator of RUNX2 isoform II and is required for suppressing ferroptosis and apoptosis through RUNX2 isoform II and PRDX2.”.

      We apologize for confusing cell proliferation with cell death. We have checked the whole manuscript and corrected the mistakes.

      In Fig. 4A the authors investigate GPX1 expression, whereas GPX4 is often the key ferroptosis regulator, this has to be tested. This is important as the authors also test the effect of the GPX4 inhibitor RSL3, however, the authors do not determine IC<sub50</sub> values of the different cell lines with or without Iso II overexpression or silencing or compared to other RSL3 sensitive or resistant cells. Without this information, no conclusions can be drawn.

      We greatly appreciated the reviewer’s comments. We have performed new experiment to analyze the effect of isoform II on GPX4 expression. We find that isoform II knockdown decreases the expression of GPX4 mRNA and protein (Figure 4-figure supplement 1A and B), and conversely isoform II overexpression promotes GPX4 expression (Figure 4-figure supplement 1C and D), which is consistent with the inhibition of ferroptosis by RUNX2 isoform II. As an upstream positive regulator of RUNX2 isoform II, HOXA10 knockdown also inhibited the expression of GPX4 mRNA and protein (Figure 6-figure supplement 1A and B).

      We also perform new experiment to determine IC<sub50</sub> values of the cells with or without isoform II overexpression or silencing. We find that isoform II overexpression elevates the IC<sub50</sub> values of RSL3 (Figure 3-figure supplement 8A), in contrast, isoform II-knockdown decreases the IC<sub50</sub> values of RSL3 (Figure 3-figure supplement 8B).

      We have added the description of these experiments in Result “RUNX2 isoform II suppresses ferroptosis”, “RUNX2 isoform II promotes the expression of PRDX2” and “HOXA10 inhibits ferroptosis through RUNX2 isoform II” as follow:

      RUNX2 isoform II suppresses ferroptosis: “Isoform II overexpression could elevate the IC<sub50</sub> values of RSL3 (Figure 3-figure supplement 8A), in contrast, isoform II-knockdown decreased the IC<sub50</sub> values of RSL3 (Figure 3-figure supplement 8B).”.

      RUNX2 isoform II promotes the expression of PRDX2: “Firstly, we found that RUNX2 isoform II-knockdown or overexpression could downregulate or upregulate the expression of GPX4 mRNA and protein, respectively (Figure 4-figure supplement 1A-D). In addition to the GPX4, we found that PRDX2 is the most significantly down-regulated gene upon isoform II-knockdown in CAL 27 (Figure 4A).”.

      HOXA10 inhibits ferroptosis through RUNX2 isoform II: “In addition, HOXA10-knockdown could suppress the expression of GPX4 mRNA and protein (Figure 6-figure supplement 1A and B).”.

      In summary, while the authors show that RUNX2 Iso II expression enhances cell survival, the idea that cell death is principally via ferroptosis is not fully established by the data. The authors should modify their conclusions accordingly.

      We agree that RUNX2 isoform II could enhance cell survival via suppressing both ferroptosis and apoptosis. We have revised the description in the title and abstract as follow:

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”.

      Conclusion: “In conclusion, we identified RUNX2 isoform II as a novel ferroptosis and apoptosis inhibitor in OSCC cells by transactivating PRDX2 expression. RUNX2 isoform II plays oncogenic roles in OSCC. Moreover, we also found that HOXA10 is an upstream regulator of RUNX2 isoform II and is required for suppressing ferroptosis and apoptosis through RUNX2 isoform II and PRDX2.”

    1. eLife Assessment

      This important study offers a molecular characterization of neurons and glia in the adult nervous system of the fruit fly Drosophila melanogaster. The study focuses on the progeny of a specific set of neural stem cells that contribute to the central complex, a conserved brain region that plays key roles in sensorimotor integration. The data are convincing and collected using validated methodology, generating an invaluable resource for future studies. The study will be of interest to developmental neurobiologists.

    2. Reviewer #1 (Public review):

      Summary:

      Epiney et al. use single-nuclei RNA sequencing (snRNA-seq) to characterize the lineage of Type-2 (T2) neuroblasts (NBs) in the adult Drosophila brain. To isolate cells born from T2 NBs, the authors used a genetic tool that specifically allows the permanent labeling of T2-derived cell types, which are then FAC-sorted for snRNA-seq. This effective labeling approach also allows them to compare the isolated T2 lineage cells with T1-derived cell types by a simple exclusion method. The authors begin by describing a transcriptomic atlas for all T1 and T2-derived neuronal and glia clusters, reporting that the T2-derived lineage comprises 161 neuronal clusters, in contrast to the T1 lineage which comprises 114 of them. The authors then use the expression of VAChT, VGlut, Gad1, Tbh, Ple, SerT, and Tdc2 to show that T2 neuroblasts generate all major neuron classes of fast-acting neurotransmitters. Strikingly, they show that a subset of glia and neuronal clusters have disproportionate enrichment in males or females, suggesting that T2 neuroblasts generate sex-biased cell types. The authors then proceed to characterize neuropeptide expression across T2-derived neuronal clusters and argue that the same neuropeptide can be expressed across different cell types, while similar cell types can express distinct neuropeptides. The functional implication of both observations, however, remains to be tested. Furthermore, the authors describe combinatorial transcription factor (TF) codes that are correlated with neuropeptide expression for T2-derived neurons along with an overall TF code for all T2-derived cell types, both of which will serve as an important starting point for future investigations. Finally, the authors map well-studied neuronal types of the central complex to the clusters of their T2-derived snRNA-seq dataset. They use known marker combinations, bulk RNA-seq data and highly specific split-GAL4 driver lines to annotate their T2-derived atlas, establishing a comprehensive transcriptomic atlas that would guide future studies in this field.

      Strengths:

      This study provides an in-depth transcriptomic characterization of neurons and glia derived from Type-2 neuroblast lineages. The results of this manuscript offer several future directions to investigate the mechanisms of diversifying neuronal identity. The datasets of T1-derived and T2-derived cells will pave the way for studies focused on the functional analysis of combinatorial TF codes specifying cell identity, sex-based differences in neurogenesis and gliogenesis, the relationship between neuropeptide (co)expression and cell identity, and the differential contributions of distinct progenitor populations to the same cell type.

      Weaknesses:

      The study presents several important observations based on the characterization of Type II neuroblast-derived lineages. However, a mechanistic insight is missing for most observations. The idea that there is a sex-specific bias to certain T2-derived neurons and glial clusters is quite interesting, however, the functional significance of this observation is not tested or discussed extensively. Finally, the authors do not show whether the combinatorial TF code is indeed necessary for neuropeptide expression or if this is just a correlation due to cell identity being defined by TFs. Functional knockdown of some candidate TFs for a subset of neuropeptide-expressing cells would have been helpful in this case.

      Comments on revisions:

      The authors have addressed my recommendations.

    3. Reviewer #2 (Public review):

      In this manuscript, Epiney et al., present a single-nucleus sequencing analysis of Drosophila adult central brain neurons and glia. By employing an ingenious permanent labeling technique, they trace the progeny of T2 neuroblasts, which play a key role in the formation of the central complex. This transcriptomic dataset is poised to become a valuable resource for future research on neurogenesis, neuron morphology, and behavior.

      The authors further delve into this dataset with several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. While some of the bioinformatic analyses are preliminary, they would benefit from additional experimental validation in future studies.

      Comments on revisions:

      We appreciate the authors' efforts to address some of the comments. While these revisions have improved the clarity of certain sections, some of the larger concerns remain unaddressed. Specifically, the manuscript still lacks the additional analyses that would allow for more specific conclusions, rather than the general observations currently presented. Although the revisions have certainly made the text clearer, the core issue of needing more detailed analysis to draw more concrete conclusions still stands.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Epiney et al. use single-nuclei RNA sequencing (snRNA-seq) to characterize the lineage of Type-2 (T2) neuroblasts (NBs) in the adult Drosophila brain. To isolate cells born from T2 NBs, the authors used a genetic tool that specifically allows the permanent labeling of T2-derived cell types, which are then FAC-sorted for snRNA-seq. This effective labeling approach also allows them to compare the isolated T2 lineage cells with T1-derived cell types by a simple exclusion method. The authors begin by describing a transcriptomic atlas for all T1 and T2-derived neuronal and glia clusters, reporting that the T2-derived lineage comprises 161 neuronal clusters, in contrast to the T1 lineage which comprises 114 of them. The authors then use the expression of VAChT, VGlut, Gad1, Tbh, Ple, SerT, and Tdc2 to show that T2 neuroblasts generate all major neuron classes of fast-acting neurotransmitters. Strikingly, they show that a subset of glia and neuronal clusters have disproportionate enrichment in males or females, suggesting that T2 neuroblasts generate sex-biased cell types. The authors then proceed to characterize neuropeptide expression across T2-derived neuronal clusters and argue that the same neuropeptide can be expressed across different cell types, while similar cell types can express distinct neuropeptides. The functional implication of both observations, however, remains to be tested. Furthermore, the authors describe combinatorial transcription factor (TF) codes that are correlated with neuropeptide expression for T2-derived neurons along with an overall TF code for all T2-derived cell types, both of which will serve as an important starting point for future investigations. Finally, the authors map well-studied neuronal types of the central complex to the clusters of their T2-derived snRNA-seq dataset. They use known marker combinations, bulk RNA-seq data and highly specific split-GAL4 driver lines to annotate their T2-derived atlas, establishing a comprehensive transcriptomic atlas that would guide future studies in this field.

      Thanks for the clear and accurate summary of our findings.

      Strengths:

      This study provides an in-depth transcriptomic characterization of neurons and glia derived from Type-2 neuroblast lineages. The results of this manuscript offer several future directions to investigate the mechanisms of diversifying neuronal identity. The datasets of T1-derived and T2-derived cells will pave the way for studies focused on the functional analysis of combinatorial TF codes specifying cell identity, sex-based differences in neurogenesis and gliogenesis, the relationship between neuropeptide (co)expression and cell identity, and the differential contributions of distinct progenitor populations to the same cell type.

      Thank you for the positive comments.

      Weaknesses:

      The study presents several important observations based on the characterization of Type II neuroblast-derived lineages. However, a mechanistic insight is missing for most observations. The idea that there is a sex-specific bias to certain T2-derived neurons and glial clusters is quite interesting, however, the functional significance of this observation is not tested or discussed extensively. Finally, the authors do not show whether the combinatorial TF code is indeed necessary for neuropeptide expression or if this is just a correlation due to cell identity being defined by TFs. Functional knockdown of some candidate TFs for a subset of neuropeptide-expressing cells would have been helpful in this case.

      We agree that we do not provide mechanistic or functional insights. Our goal was to produce hypothesis generating datasets for our lab and others to use to direct functional or mechanistic studies.

      Reviewer #2 (Public review):

      In this manuscript, Epiney et al., present a single-nucleus sequencing analysis of Drosophila adult central brain neurons and glia. By employing an ingenious permanent labeling technique, they trace the progeny of T2 neuroblasts, which play a key role in the formation of the central complex. This transcriptomic dataset is poised to become a valuable resource for future research on neurogenesis, neuron morphology, and behavior.

      Thank you for the positive comments.

      The authors further delve into this dataset with several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. While some of the bioinformatic analyses are preliminary, they would benefit from additional experimental validation in future studies.

      Thank you for the positive comments. We too hope that future research will benefit from this dataset.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) In Figures 1E and 4A, the T1 and T2 glia subsets reveal sub-clusters for several cell types as seen by the distribution of points on the UMAP. This observation is never validated or discussed. Do these sub-clusters represent true differences in identities or are they artifacts of the single-nucleus preparation? For Figure 1E, it is not clear whether specific sub-clusters (see Ensheathing-4 vs Ensheathing-5 and Astrocyte-2 vs. Astrocyte-6) are differentially enriched between the T1 and T2 lineages. The existence of these sub-clusters must be discussed or dismissed.  

      We agree that this needs to be addressed more clearly in the manuscript and have made text changes in the Results and Discussion sections to clarify. We note that a recent glial cell atlas (Lago-Baldaia et al., 2023: PMID: 37862379) of the developing fly VNC and optic lobes found sub-clusters that mapped to the same subtype annotations. Interestingly, Lago-Baldaia and colleagues found that the transcriptional diversity of glia cell types did not match the morphological diversity of glia validated in vivo. See text changes below.

      Lines 131-133: “Similar to a previous glial cell atlas (Lago-Baldaia et al., 2023) we found some glial subtypes (astrocytes, ensheathing, and subperineurial) mapped to multiple clusters (Figure 1E, 1F).”

      Lines 206-208: “In line with our T1+T2 atlas and previous glia cell atlas (Lago-Baldaia et al., 2023), some subtypes mapped to several subclusters including ensheathing, astrocytes, and chiasm (Figure 4A-B).”

      Lines 397-401: “Similar to a recent glial cell atlas (Lago-Baldaia et al., 2023), we found glial subtypes like astrocytes, ensheathing, and subperineurial glia mapped to several sub-clusters (Figure 1E-F). It remains unclear if these sub-clusters with the same cell type annotation represent distinct glial identities or different transcriptional states within these populations.”

      (2) The authors present evidence for sex-specific neuronal and glia subtypes and find differential expression of specific yolk proteins and long non-coding RNAs. However, whether any of these differences are driven by other canonical sex-specific genes such as Fruitless (Fru) or Double-sex (Dbx) has not been reported or discussed. The authors must re-analyze their data for these genes and claim whether they have any contribution to sex-specific sub-clusters.

      Thank you for pointing this out. We have made text changes and clarifications to highlight the expression of other canonical sex-specific genes. Fru was enriched in male nuclei as expected. Interestingly, dbx was enriched in female nuclei. It remains to be determined if these genes are mechanisms that may be driving sex-specific changes.

      Lines 224-226: “Additionally, female nuclei were enriched for dbx (Supp Table 8). Male glial nuclei expressed higher levels of genes including the male-specific genes lncRNA:rox1/2 and fru (Figure 5C; Supp Table 8) (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997).”

      Lines 237-239: “Male nuclei expressed higher levels of genes including the male-specific genes lncRNA:rox1/2 and fru (Figure 5G; Supp Table 9) (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997).”

      Lines 428-431:” We found the expected differential expression of yolk proteins (yp1, yp2, yp3) enriched in female nuclei and the long non-coding RNAs rox1/2 and fru enriched in male neuronal nuclei (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997; Warren et al., 1979). Interestingly, we found dbx to be enriched in both glial and neuronal female nuclei.”

      Lines 433-435: “It remains to be determined if these genes are driving these sex-specific differences in glia and neurons.”

      (3) In Figure 6C, it is unclear whether the Ms-2A-LexA-expressing neurons of clusters 157 and 160 project to two different neuropils or share projects to both neuropils. However, it is not explicitly shown in the immunostaining data whether indeed there are two populations to begin with. The authors must check for cluster 157 and 160 specific markers (such as Dh44 and ple) and test whether they appear mutually exclusively in the Ms-2A-LexA-expressing neurons. The same reasoning would apply to the data shown in Figures 6D and 6E, where the authors must test whether the NPF and AstA expressing cells are indeed neurons from clusters 100 and 128, using orthogonal cluster markers to conclude that they are similar (or the same) neurons.

      We changed the focus of the paragraph to confirm that these neurons indeed come from type II and that they target the central complex. Although due to the lack of reagents we cannot test the identity of each one of these neurons, we could make meaningful interpretations of the staining to validate our ideas about neuropeptidergic cells in the central complex. We made sure to mention the limitation of our experiment to avoid any wrong conclusions.

      Minor points

      (1) Line 115 - "cluster that represents optic lobe neurons". How was this cluster identified?

      We reexamined the most significant genes enriched in this cluster 124, and found they are Rh2, ninaC, trpl, and phototransduction related genes (Supplemental table 1). We reassigned the identity of this cluster as ocelli, which also express photoreceptor genes but can’t be easily removed during dissection. We modified the text as follows:

      "We used known markers (Croset et al., 2018; Davie et al., 2018; Supp Table 2) to identify distinct cell types in the central brain, including glia, mushroom body neurons, olfactory projection neurons, clock neurons, Poxn+ neurons, serotonergic neurons, dopaminergic neurons, octopaminergic neurons, corazonergic neurons, hemocytes, and ocelli (Figure 1B, Supp. Table 1)."

      (2) As the separation in Figure 1B is not obvious, annotated cell type clusters must be re-colored instead of being labelled as the exact dots are indistinguishable. This would especially be helpful for OCTY, SER, OPN, and CLK clusters.

      (3) Cluster labels in Figure 1C are barely visible and the font size must be increased for the reader. Recoloring the cluster identities and attaching a legend would again help in this case.

      We recolored the atlas in Figure 1B, 1C and 1C’ and increased the font size in Figure 1C’.

      (4) For Figure 4A, clusters should be labelled on the UMAP along with the legend as it is difficult for the reader to match identities using Seurat colors. The same is true for the UMAPs in Figure 5A.

      Yes, we agree that labeling would improve readability and have done so for UMAPs in Figure 4A and 5A-A’’.

      Reviewer #2 (Recommendations for the authors):

      In this manuscript, Epiney et al., present a single-nucleus sequencing analysis of adult central brain neurons and glia Through the use of a ingenious permanent labeling technique, they are able to trace the progeny of T2 neuroblasts, which contribute significantly to the formation of the central complex. This transcriptomic dataset is the first of its kind and will likely serve as a valuable resource for future studies.

      The authors further explore this dataset through several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. However, the approach used to identify the identity of each neuron cluster could be more clearly articulated, and some of the authors' conclusions are more generalized - either already well-established or lacking sufficient support.

      Detailed comments:

      Abstract - "Our data support the hypothesis that each transcriptional cluster represents one or a few closely related neuron subtypes. - Is this a novel finding? If so, it would be helpful if the authors could explain why this is the case more clearly.

      Our results are not generally novel, and many single cell/single nuclei RNA-seq papers have been published (more citations added to Introduction). Our work is novel in that we analyze Type 1 and Type 2 neuroblasts in the central brain.

      Line 53 - In the introduction the authors should also reference other single-cell studies done in the Drosophila brain.

      Done.

      Line 59 - There are some typos here. The authors could also mention type zero.

      Both done.

      Figure 1 and Sup Table 1 - Authors show in sup table 1 the top cell markers by cluster but there is no correspondence between cluster number and identity. The authors do not say which known markers were used to give the identity to each cluster.

      We have added the cell identity in the Supplemental Table 1. For the unknown cells, we left the column blank. We have also added a Supplemental Table 2 to show the markers we used to give identity to the clusters.

      Supplementary Tables - For each table, more detailed information should be provided regarding what is being compared and the methods used for these comparisons.

      We have added the methods we used in Seurat to generate each individual table.

      Line 138 - Differential gene expression analysis between T1 and T2 glial progeny did not show differences across any glial cell types (Supp Table 4). - Was this comparison done per cluster? Is differential gene expression of top markers, which are anyway the genes that define each glial cell type, enough for this type of analysis?

      Yes, we performed the differential expression analysis using all genes (i.e., not just marker defining) at a cluster-by-cluster resolution with results in Supplemental Table 4. We have edited the text to make this clarification.

      Lines 139-141: “Differential gene expression analysis for all genes between T1 and T2 glial progeny did not show differences across any glial cell types or clusters (Supp Table 4).”

      Line 146 - We identified T1-derived neurons by excluding cells co-expressing T2-specific. Markers FLP+/GFP+/RFP+ plus repo+ glial clusters. - Bioinformatically, correct?

      Yes. We clarified the sentence as follows:

      "We identified T1-derived neurons by bioinformatically excluding cells co-expressing T2-specific markers FLP+/GFP+/RFP+ plus repo+ glial clusters."

      Line 156 - We found that each cluster strongly expressed a unique combination of genes. - As they are grouped by seurat in different clusters, why is this surprising?

      Line 175 - "top 10 significantly enriched genes gathered from each T2 neuron cluster" - can these lists be included?

      Yes they are grouped by Seurat. We toned down the sentence and refer each combination of genes as cluster markers. We modified the sentences as follows:

      Each unique combination of enriched genes could be referred to as cluster markers.

      Line 211- How did the authors identify sex-biased clusters? How did the authors separate the samples/cells by sex? Was it done bioinformatically by the expression of certain genes? If so, which?

      We collected male and female nuclei separately. We have added text in the methods section as follows:

      "Equal amounts of male and female central brains (excluding optic lobes) were dissected at room temperature within 1 hour. The samples were flash-frozen in liquid nitrogen and stored separately at -80°.

      In the first round, we pooled male and female brains together to select GFP+ nuclei and used particle-templated instant partitions to capture single nuclei to generate cDNA library (Fluent BioSciences, Waterton, MA). In the second and third rounds, RFP+ nuclei from male and female brains were collected separately. The split-pool method was then used to generate barcoded cDNA libraries from each individual nucleus."

      Are there sex-specific differences in genes in glia other than genes that were previously known to be sex-specific?

      We report the comprehensive list of sex-specific differences in gene expression for both glia and neurons in Supp tables 8 and 9.

      Line 237 - When the authors mention "We conclude that male and female adult T2 neurons have sex-specific differences in gene expression within the same neuronal subtype" does this mean that these neurons are the same in male and in female brains, but they additionally specifically express sex-specific genes?

      Yes, we report that male and females contain the same neurons defined by their transcriptional profile. It remains to be seen if this sex-specific differences changes how these same neuronal subtypes function between male and females. We have added additional text in the discussion to expand on this thought.

      Lines 437-441: “It remains to be determined if these genes are driving sex-specific differences within glial and neuronal subtypes. These genes may reflect sex-specific differences in the adult central brain and may provide insight into how behavioral circuits are linked to sex-specific behaviors. Future work should aim to characterize and test these genes.”

      Line 250 - The idea behind these sections "What is the relationship between neuropeptide expression and cluster identity?" "relation between cluster and morphology" lacks clarity. As clusters are defined based on principal component analysis, and the genes used to define a cluster are dependent on this method, there is no assumption that each cluster represents only one type of neuron or that it should include only neurons expressing the same neurotransmitter genes. Even if some clusters consist of a single neuron type, this should not be generalized to all clusters (and vice-versa).

      Correct, we cannot determine from the transcriptome data whether distinct clusters will have different morphology. We have changed the focus of the question to address that we are confirming they come from type 2 and that they target the central complex while comparing to known cells that express the neuropeptide.

      Line 265 - We first assayed the neuronal morphology of Ms+ neurons - why did the authors choose these neurons?

      Resolved in main text: “we found that type II-derived Ms-2A-LexA-expressing neurons project to multiple layers of the dorsal fan-shaped body and the entire ellipsoid body, suggesting an unknown class of Ms+ neurons targeting to EB and/orFB".

      Line 268 - "Currently we can't determine whether Ms+ neurons in clusters 157 and 160 project to different CX neuropils, or whether neurons from both clusters share projections into both neuropils. " - The purpose of this point is unclear.

      Resolved in text: “we found that type II-derived Ms-2A-LexA-expressing neurons project to multiple layers of the dorsal fan-shaped body and the entire ellipsoid body, suggesting an unknown class of Ms+ neurons targeting to EB and/or FB”.

      Line 279 - This analysis could be more explored.

      Thank you for your feedback. As the comment was somewhat broad, we were unsure of the specific revisions needed and have therefore left the text unchanged.

      Line 301 - The text regarding this section, and the description and details of respective figures should be proofread to ensure clarity.

      Done.

      Line 386 - Alternatively, co-expression may be due to background from RNAs released during dissociation. - RNA in soup could be bioinformatically analysed.

      Correct. We opted to delete this sentence since our split-pool based method does not create background RNA expression. Additionally, the analysis is performed on scaled expression >2, and any background RNA is unlikely to yield such high expression.

      Discussion - Some of the conclusions are a bit too general, suggesting that the results might be meaningful, but also acknowledging the possibility of artifacts. If the authors could refine this, it would strengthen the manuscript.

      We are sorry but we are uncertain what you are asking; we don't know what you want us to refine. Our apologies for the misunderstanding.

    1. eLife Assessment

      This manuscript presents an important contribution to the field of single-cell transcriptomic analysis in cancer by introducing a novel computational framework-SCellBOW-which applies embedding techniques from natural language processing to model phenotypic heterogeneity in tumors. The revised version includes new validation experiments and significant clarifications that provide convincing evidence for the method's utility. The authors have benchmarked SCellBOW across diverse datasets, including glioblastoma, breast, and metastatic prostate cancer, and have demonstrated its superior performance compared to existing state-of-the-art methods.

    2. Reviewer #2 (Public review):

      Summary:

      The authors developed a novel tool, SCellBOW, to perform cell clustering and infer survival risks on individual cancer cell clusters from the single cell RNA seq dataset. The key ideas/techniques used in the tool include transfer learning, bag of words (BOW), and phenotype algebra which is similar to word algebra from natural language processing (NLP). Comparisons with existing methods demonstrated that SCellBOW provides superior clustering results and exhibits robust performance across a wide range of datasets. Importantly, a distinguishing feature of SCellBOW compared to other tools is its ability to assign risk scores to specific cancer cell clusters. Using SCellBOW, the authors identified a new group of prostate cancer cells characterized by a highly aggressive and dedifferentiated phenotype.

      Strengths:

      The application of natural language processing (NLP) to single-cell RNA sequencing (scRNA-seq) datasets is both smart and insightful. Encoding gene expression levels as word frequencies is a creative way to apply text analysis techniques to biological data. When combined with transfer learning, this approach enhances our ability to describe the heterogeneity of different cells, offering a novel method for understanding the biological behavior of individual cells and surpassing the capabilities of existing cell clustering methods. Moreover, the ability of the package to predict risk, particularly within cancer datasets, significantly expands the potential applications.

      Weaknesses:

      Given the promising nature of this tool, it would be beneficial for the authors to test the risk-stratification functionality on other types of tumors with high heterogeneity, such as liver and pancreatic cancers, which currently lack clinically relevant and well-recognized stratification methods. Additionally, it would be worthwhile to investigate how the tool could be applied to spatial transcriptomics by analyzing cell embeddings from different layers within these tissues.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      This review evaluates the SCellBOW framework, which applies phenotype algebra to obtain vectors from cancer subclusters or user-defined subclusters.

      Strengths:

      SCellBOW employs an innovative application of NLP-inspired techniques to analyze scRNA-seq data, facilitating the identification and visualization of phenotypically divergent cell subpopulations. The framework demonstrates robustness in accurately representing various cell types across multiple datasets, highlighting its versatility and utility in different biological contexts. By simulating the impact of specific malignant subpopulations on disease prognosis, SCellBOW provides valuable insights into the relative risk and aggressiveness of cancer subpopulations, which is crucial for personalized therapeutic strategies. The identification of a previously unknown and aggressive AR−/NElow subpopulation in metastatic prostate cancer underscores the potential of SCellBOW in uncovering clinically significant findings.

      Major concerns:

      The reliance on bulk RNA-seq data as a reference raises concerns about potentially misleading results due to the presence of RNA expression from immune cells in the TME. It is unclear if SCellBOW adequately addresses this issue, which could affect the accuracy of the cancer subcluster vectors.

      We appreciate the reviewer's concerns. To address the concern about potentially misleading results due to the TME when using bulk RNA-seq data as a reference:

      a. We account for systematic biases between the single-cell and bulk transcriptomics readouts by creating pseudo-bulk profiles for single-cell clusters, enabling more accurate comparisons [Section Materials and methods, Data preparation for phenotype algebra].

      b. We encode expressions into word vectors and co-embed them together. By doing this, we mitigate any possibility of systematic differences in the embedding. It is imperative that we subject both single-cell and bulk data through the same treatments because otherwise, it will be difficult to perform algebraic operations on them [Section Materials and methods, Generating vectors for phenotype algebra].

      c. In our new analysis of the tumor microenvironment, we have shown that SCellBOW effectively differentiates between malignant and non-malignant cells, confirming that it is not biased by the immune cell composition in the bulk RNA-seq data [Section SCellBOW facilitates survival-risk attribution of tumor subpopulations, Fig. 5g-h].

      The method of extracting vectors in phenotype algebra appears to be a straightforward subtraction operation. This simplicity might limit its efficiency in excluding associations with phenotypes from specific subpopulations, potentially leading to inaccurate interpretations of the data.

      Thanks for this excellent query. Vector algebra operations are not done in the gene expression space (i.e., gene expression vectors associated with tumor samples), rather we process the single cell and bulk expression profiles through multiple steps (pseudo-bulk vector generation for single cell clusters, mapping gene expression values to word frequencies as better understood by the Doc2vec neural networks etc.) to ensure their embeddings are consistent and capture intricate phenotypic information. We have demonstrated this through rigorous validation of the clusters yielded on various types of healthy and diseased samples. Furthermore, we have demonstrated the consistency of the vector algebra operations on known cancer subtypes in breast cancer, glioblastoma, and prostate cancer. We have clarified this further in text. [Section Materials and methods, ‘Generating vectors for phenotype algebra’, ‘Survival risk attribution’].

      The review would benefit from additional validation studies to assess the effectiveness of SCellBOW in distinguishing between cancerous and non-cancerous signals, particularly in heterogeneous tumor environments.

      We thank the reviewer for advising this additional validation. While our study primarily focused on signals from malignant cells, we have now considered the impact of the tumor microenvironment. We observed that the predicted risk score increases when the immune component is subtracted from the tumor, suggesting that tumor aggressiveness increases in the absence of immune components. Importantly, the aggressiveness ranking of tumor subtypes (NE > ARAL > ARAH) remained consistent, confirming that SCellBOW effectively preserves subtype-specific risk stratification [Section SCellBOW facilitates survival-risk attribution of tumor subpopulations, Fig. 5g-h].

      Further clarification on how SCellBOW handles mixed-cell populations within bulk RNA-seq data would strengthen the evaluation of its applicability and reliability in diverse research settings.

      We really appreciate the reviewer’s observation. We clarify that rather than relying on absolute gene expression values, SCellBOW maps bulk RNA-seq data into an embedding space, where we extract the latent representation of the tumor. This process effectively masks the influence of mixed-cell populations, reducing biases introduced by immune or stromal components. Furthermore, phenotype algebra operates within this embedding space by comparing cosine similarities between latent representations of bulk and pseudo-bulk datasets, rather than using direct gene expression values. This allows SCellBOW to capture biologically meaningful relationships and infer tumor-specific signals effectively, even in the presence of heterogeneous cell populations. Our benchmarking across diverse cancer types confirms its effectiveness [Section Results, ‘SCellBOW enables pseudo-grading of metastatic prostate cancer tumor microenvironment’, ‘Unsupervised risk-stratification of metastatic prostate cancer clusters using SCellBOW’].

      Reviewer #2 (Public Review):

      The authors developed a novel tool, SCellBOW, to perform cell clustering and infer survival risks on individual cancer cell clusters from the single-cell RNA seq dataset. The key ideas/techniques used in the tool include transfer learning, bag of words (BOW), and phenotype algebra which is similar to word algebra from natural language processing (NLP). Comparisons with existing methods demonstrated that SCellBOW provides superior clustering results and exhibits robust performance across a wide range of datasets. Importantly, a distinguishing feature of SCellBOW compared to other tools is its ability to assign risk scores to specific cancer cell clusters. Using SCellBOW, the authors identified a new group of prostate cancer cells characterized by a highly aggressive and dedifferentiated phenotype.

      Strengths:

      The application of natural language processing (NLP) to single-cell RNA sequencing (scRNA-seq) datasets is both smart and insightful. Encoding gene expression levels as word frequencies is a creative way to apply text analysis techniques to biological data. When combined with transfer learning, this approach enhances our ability to describe the heterogeneity of different cells, offering a novel method for understanding the biological behavior of individual cells and surpassing the capabilities of existing cell clustering methods. Moreover, the ability of the package to predict risk, particularly within cancer datasets, significantly expands the potential applications.

      Major concerns:

      Given the promising nature of this tool, it would be beneficial for the authors to test the risk-stratification functionality on other types of tumors with high heterogeneity, such as liver and pancreatic cancers, which currently lack clinically relevant and well-recognized stratification methods. Additionally, it would be worthwhile to investigate how the tool could be applied to spatial transcriptomics by analyzing cell embeddings from different layers within these tissue

      (1) We completely agree with the reviewer’s view. Our selection of glioblastoma and breast cancer for this study was primarily driven by the focus on extensively studied and well-defined cancer types. To demonstrate the effectiveness of our model, we tested it on advanced prostate cancer, which currently lacks clinically relevant and well-recognized stratification methods. This application to metastatic prostate cancer serves as a proof of concept, illustrating our model's potential to provide valuable insights into cancer types where established stratification approaches are limited or absent.

      (2) Regarding the application of our tool to spatial transcriptomics, we have already analyzed data from Digital Spatial Profiling (DSP). The article is already quite complex and involved, and we are afraid the inclusion of spatial transcriptomics may amount to a significant extension of the method. To this end, although we will discuss the future possibilities, we will skip the method validity check on spatial transcriptomics data.

      Reviewer #2 (Recommendations For The Authors):

      (1) "SCellBOW adapts the popular document-embedding model Doc2vec for single-cell latent representation learning, which can be used for downstream analysis...": Using only simple gene frequency might overlook the dependent relationships between genes, potentially compromising the biological significance. This could be discussed further.

      This is an excellent point raised by the reviewer. We acknowledge that using only simple gene frequency may overlook dependent relationships between genes, potentially compromising biological significance. To address this, we have now compared SCellBOW on the specific task of phenotype algebra and demonstrated its effectiveness in capturing meaningful biological relationships which is overlooked by simple gene frequency. We have now added the results of this comparison and showed that gene expression data alone couldn't cut it for accurate risk stratification [Section Overall discussion, Supplementary Note 7, Supplementary Fig. 8i-k].

      (2) "While existing methods effectively reveal the subpopulations, they are insufficient in associating malignant risk with specific cellular subpopulations identified from scRNA-seq data....": Perhaps I missed it in the methods section, but how does SCellBOW compare to simply performing pseudobulk analysis on separate cell clusters, treating them as bulk RNA-seq, and then associating the signatures with disease prognosis?

      This is an insightful point, and we appreciate the opportunity to clarify it.

      (1) While pseudobulk analysis on separate cell clusters, followed by associating their signatures with disease prognosis, is a common approach, SCellBOW achieves this without requiring a priori knowledge of prognostic biomarkers to determine whether a subpopulation is aggressive.

      (2) Moreover, pseudobulk analysis aggregates gene expression across cells, which can potentially mask intra-cluster heterogeneity, thereby obscuring important signatures associated with disease prognosis. In contrast, the latent representation in SCellBOW captures the semantic meaning of disease aggressiveness, allowing for a more nuanced and biologically meaningful risk assessment.

      (3) "The proposed approach, SCellBOW, can effectively capture the heterogeneity and risk associated with each phenotype, enabling the identification and assessment of malignant cell subtypes in tumors directly from scRNA-seq gene expression profiles, thereby eliminating the need for marker genes...": Have the author compared the resulting group with well-known markers and do they overlap?

      We appreciate this thoughtful question. While SCellBOW does not rely on predefined marker genes for clustering or risk stratification, we have systematically evaluated whether the resulting subpopulations align with well-known markers. To assess this, we compared SCellBOW-derived clusters with established marker-based annotations across multiple datasets. We observed a significant overlap between SCellBOW clusters and canonical marker-defined cell types in various cancers, including GBM, BRCA, and mCRPC.

      (4) "We constructed three use cases leveraging publicly available scRNA-seq datasets...": The three training and testing datasets are all from healthy tissue. How about in tumor tissue? i.e., Could SCellBOW also identify better cell clusters in tumor datasets?

      We appreciate the reviewer’s inquiry. For benchmarking and method validation, we primarily selected normal tissue datasets as they are heavily annotated and well-characterized. Our goal was to extensively evaluate SCellBOW across different clustering metrics, including ARI, NMI, and SI, which required datasets with reliable ground truth. Tumor datasets, in contrast, often lack confirmatory ground truth, making direct benchmarking more challenging. However, to assess SCellBOW’s applicability in tumor settings, we performed downstream analyses on tumor scRNA-seq datasets using phenotype algebra. Our results demonstrate that SCellBOW effectively identifies distinct cell clusters, including malignant and non-malignant populations, reinforcing its applicability in tumor settings [Section Results, ‘Unsupervised risk-stratification of metastatic prostate cancer clusters using SCellBOW’].

      Minor issues:

      (1) Labels of subplots within the manu/figure should be revised to ensure correct order (missing Figures 3a-d, 4b before 4a, etc).

      We thank the reviewer for pointing this out. We have corrected the figure labels and ensured that all subplots follow the correct order, aligning with the manuscript.

      (2) "reaffirmed the clinically known aggressiveness order, i.e., CLA >-MES >-PRO, where CLA succeeds the rest of the subtypes in aggressiveness48 (Figures 4c, d)...": "Fig. 4c, d" should be "Fig. 4e, f". Also please put Figure 4a before 4b. Overall the order of Figure 4 needs to be revised to match the order in the manu. Similar to Figure 6.

      We have corrected the figure reference to Fig. 4e, f and revised the order of Figure 4 to maintain consistency with the manuscript.

      (3) "Our results showed that SCellBOW learned latent representation of single-cells accurately captures the 'semantics' associated with cellular phenotypes and allows algebraic operations such as'+' and'-'." Figure 5f (SCellBOW performances on mCRPC) should also be cited here since Supplementary Figure 6 contains three datasets (GBM, BRCA, mCRPC) while in Figure 4 only GBM and BRCA were shown?

      We thank the reviewer for this suggestion. We have now cited Figure 5f in this section to ensure that all datasets, including mCRPC, are appropriately referenced.

      (4) Under the subheading "SCellBOW facilitates survival-risk attribution of tumor subpopulations", the lines start with "We refer to this as phenotype algebra. We utilized this ability to find an association between the embedding vectors, representing total tumor - a specific malignant cell cluster with tumor aggressiveness..." could be reduced a little bit especially the re-intro of phenotype algebra since the author has already discussed previously (under "overview of SCellBOW").

      We appreciate the feedback and have condensed this section to avoid redundancy while maintaining clarity in connecting phenotype algebra to survival-risk attribution.

      (5) "Most CD4+ T cells map to CL0 and CL9 (here, CL is used as an abbreviation for cluster) (Figure 3f)..." "(here, CL is used as an abbreviation for cluster)" this note could be moved forward to SF2 since CL is first introduced in SF2.

      We thank the reviewer for the suggestion. We have moved the definition of CL (cluster) to Supplementary Figure 2 (SF2), where it is first introduced, for improved clarity.

    1. eLife Assessment

      The submission by Praveen and colleagues reports important findings describing the structure of genetic and colour variation in its native range for the globally invasive weed Lantana camara. Whilst the importance of the research question and the scale of the sampling is appreciated, the analysis, which is currently incomplete, requires further tests to support the claims made by the authors.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:<br /> The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

    3. Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using 19,008 genome-wide SNPs data from 359 individuals in India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study would be that the output modelling analysis is largely qualitative, which cannot be directly comparable to the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These are theoretically well-reported knowledge, and such findings themselves are not novel, although it may have become interesting these findings are quantitatively integrated with their empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing their SNPs data would be more interesting. For example, by ABC-based methods, authors can infer the split time between subpopulations identified in this study. If such split time is older than the recorded invasion date, the result supports the scenario that multiple introductions may have contributed to the population structure of this species. In the current form of the manuscript, multiple introductions were implicated but not formally tested.

      I also have several concerns regarding the authors' population genetic analyses. First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, the first screening of the SNPs would be biased strongly, which may have led to spurious outputs in a series of downstream analyses. Second, in the genetic simulation, it is not clear how a set of parameters such as mutation rate, recombination rate, and growth rate were determined and how they are appropriate. Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation. Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested. Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively? I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

    4. Author response:

      We sincerely thank the editor and both reviewers for their time and thoughtful feedback on our manuscript. We have addressed several of the concerns in the responses below and are currently working on additional analyses to further strengthen the study. These results will be incorporated into the final version of the research paper.

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:<br /> The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

      The reviewer raises the possibility that the observed genetic patterns may have originated through the selection of different varieties by the horticultural industry. While it is plausible that artificial selection can lead to the formation of distinct morphs, the presence of a strong structure between them in the wild populations cannot be explained just based on selection. In the wild, different flower colour variants frequently occur in close physical proximity and should, in principle, allow for cross-fertilization. Over time, this gene flow would be expected to erode any genetic structure shaped solely by past selection. However, our results show no evidence of such a breakdown in structure. Despite co-occurring in immediate proximity, the flower colour variants maintain distinct genetic identities. This suggests the presence of a barrier to gene flow, likely maintained by the species' mating system. Moreover, the presence of many of these flower colour morphs in the native range—as documented through observations on platforms like iNaturalist—suggests that these variants may have a natural origin rather than being solely products of horticultural selection.

      While it is plausible that horticultural breeding involved efforts to generate new varieties through crossing—resulting in the emergence of some of the observed morphs—even if this were the case, the dynamics of a self-fertilizing species would still lead to rapid genetic structuring. Following hybridization, just a few generations of selfing are sufficient to produce inbred lines, which can then maintain distinct genetic identities. As discussed in our manuscript, such inbred lines could be associated with specific flower colour morphs and persist through predominant self-fertilization. This mechanism provides a compelling explanation for the strong genetic structure observed among co-occurring flower colour variants in the wild.

      While a recent bottleneck may have increased inbreeding, the strong and consistent genetic structuring we observe within populations is more indicative of predominant self-fertilization. To further validate this, we conducted a bagging experiment on Lantana camara inflorescences to exclude insect-mediated cross-pollination. The results showed no significant difference in seed set between bagged and open-pollinated flowers, supporting the conclusion that L. camara is primarily self-fertilizing in India.

      As the reviewer rightly points out, the mating system of a species plays a crucial role in shaping patterns of genetic structure. However, in many natural populations, structuring patterns are often influenced by a combination of factors such as selection, barriers to gene flow, and genetic drift. In some cases, the mating system exerts a more prominent influence at the microgeographic level, while in others, it can shape genetic structure at broader spatial scales. What is particularly interesting in our study is that - the mating system appears to shape genetic structure at a subcontinental scale. Despite the species having undergone other evolutionary forces—such as a genetic bottleneck and expansion due to its invasive nature—the mating system exerts a more pronounced effect on the observed genetic patterns, and the influence of the mating system is remarkably strong, resulting in a clear and consistent genetic structure across populations.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using 19,008 genome-wide SNPs data from 359 individuals in India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study would be that the output modelling analysis is largely qualitative, which cannot be directly comparable to the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These are theoretically well-reported knowledge, and such findings themselves are not novel, although it may have become interesting these findings are quantitatively integrated with their empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing their SNPs data would be more interesting. For example, by ABC-based methods, authors can infer the split time between subpopulations identified in this study. If such split time is older than the recorded invasion date, the result supports the scenario that multiple introductions may have contributed to the population structure of this species. In the current form of the manuscript, multiple introductions were implicated but not formally tested.

      Through our SLiM simulations, we aimed to demonstrate that a pattern of strong genetic structure within a location—similar to what we observed in Lantana camara—can arise under a predominantly self-fertilizing mating system. These simulations were not parameterized using species-specific data from Lantana but were intended as a conceptual demonstration of the plausibility of such patterns under selfing using SNP data. While the theoretical consequences of self-fertilisation have been widely discussed, relatively few studies have directly modelled these patterns using SNP data. Our SLiM simulations contribute to this gap and support the notion that the observed genetic structuring in Lantana may indeed result from predominant self-fertilisation.

      We thank the reviewer for the suggestion regarding the use of simulations based on genomic data from Lantana and for explaining the importance of it. We are currently conducting demographic simulations using genomic data from Lantana to estimate divergence times between the different flower colour variants. We believe this analysis will offer deeper insights and provide further clarity on the points raised by the reviewers.

      I also have several concerns regarding the authors' population genetic analyses. First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, the first screening of the SNPs would be biased strongly, which may have led to spurious outputs in a series of downstream analyses.

      Hardy-Weinberg Equilibrium (HWE) filtering is a commonly used step in SNP filtering analysis to exclude loci potentially under selection, thereby enriching for neutral variants and minimizing bias in downstream analyses. To ensure that our results are not influenced by selection-driven SNPs, we conducted the analysis both with and without applying the HWE filter. Notably, the number of SNPs retained did not drop significantly after filtering, and the overall patterns observed remained consistent across both approaches.

      Second, in the genetic simulation, it is not clear how a set of parameters such as mutation rate, recombination rate, and growth rate were determined and how they are appropriate. Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation.

      The aim of the SLiM simulation was to demonstrate that the extreme genetic structuring observed in Lantana camara can plausibly arise in natural systems under predominant self-fertilization. For the simulation, we used mutation and recombination rates estimated for Arabidopsis thaliana, as these parameters are currently unknown for Lantana. The details of this will be added in the revised version, and thanks to the reviewer for pointing this out. While we acknowledge that this simulation does not provide an exact representation of the species' evolutionary history, the goal of the simulation was not to produce precise estimates but rather to illustrate the feasibility of such strong genetic structuring resulting from self-fertilization alone. The impact of the selfing on the mutation rate is not incorporated in the simulations now. We will look into the details of this.

      Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested.

      We recognize that one of the key improvements needed for the manuscript is to provide experimental evidence supporting self-fertilization. To address this, we conducted a bagging experiment on Lantana camara inflorescences to prevent insect visitation and eliminate insect-mediated cross-fertilization. The results showed no significant difference in seed set between bagged and open-pollinated inflorescences, indicating that Lantana is predominantly self-fertilizing in India. This finding is consistent with our genetic data and will be included in the revised version of the manuscript.

      Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively? I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

      The different flower colour variants are visually distinguishable. Our classification of these variants is not based on the colour of individual flowers at a single time point, but rather on the overall colour change pattern across the inflorescence over time. In other words, the temporal aspect of colour change has been considered in our grouping. For example, in the “yellow-pink” variant, flowers begin as yellow when young and gradually turn pink as they age. Importantly, variants that follow this pattern do not transition to an orange type at any stage, which distinguishes them from other colour types. The varieties that don't change colours are named based on the single flower colour like “orange”.

    1. eLife Assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. In this latest version, the authors addressed the benchmarking of the novel method, but the absence of quantitative comparisons to state-of-the-art methods still make this study incomplete. Based on the shown validation approaches, one can neither ultimately judge if the shown method will be an advance over previous work nor whether the approach will be of general useful applicability.

    2. Reviewer #1 (Public review):

      The authors present tviblindi, an algorithm to infer cell development trajectories from single-cell molecular data. The paper is well-written and the algorithm is conceptually interesting. However, the validation is incomplete as the comparison against existing trajectory inference methods is weak: although the lack of a proper benchmark was pointed out as the main weakness of the original version of the manuscript, the revised version still only contains qualitative comparisons against state-of-the-art methods.

      Both me and Reviewer 2 pointed out that the lack of a proper benchmark against state-of-the-art methods on a wider variety of datasets (including scRNA-seq data) was a major weakness of the original version of the manuscript. In response to this criticism, the authors now did the following:

      - They ran various competitor methods on the datasets that were used already for the previous version of the manuscript.<br /> - They ran tviblindi and two of the competitors on two public scRNA-seq datasets.<br /> - For all datasets, they qualitatively assessed the trajectories computed by tviblindi and its competitors and argued that tviblindi's trajectories better reflect the biological signal in the data.<br /> - The results of all of these additional analyses are reported in the supplement, which has now become very lengthy (88 pages).

      In my opinion, this is insufficient to establish that tviblindi is comparable or even superior to the state of the art in the field. To show that this is the case, the authors would have to carry out a systematic benchmark study which relies on quantitative evaluation metrics rather than on qualitative intepretations of trajectories. As method developers, we are all susceptive to confirmation bias when comparing our new algorithms to the state of the art. To avoid this pitfall, reporting quantitative performance metrics is required. At the moment, the only quantitative metric reported by the authors is runtime, which is insufficient.

      Moreover, the results of a benchmark study should be reported in the main manuscript, not in the supplement. When presenting a new algorithm in a field as crowded as trajectory inference, a benchmark against the state of the art serves to establish trust in the new algorithm and to provide the readers with a rationale to use it for their research. For this, the results of the benchmark have to be presented prominently and should not be hidden in the supplement.

      A second major criticism raised in Reviewer 2's review of the original version of the manuscript is that tviblindi invites cherry picking due to its inherently interactive design. In response to this, the authors now argue at length that "the data-driven expert interpretation approach of tviblindi" (quote from Section 2.2.2) is a strength rather than a weakness. If we concede for the sake of the argument that tviblindi's "expert interpretation approach" is indeed a strength of the method (although I tend to agree with Reviewer 2 that it is rather a limitation), usability for biologists becomes critical. However, given the current implementation of tviblindi, its usability is far from optimal. The authors do not provide tviblindi as a web interface that is directly usable for domain experts without programming experience and not even as a package that is installable via some widely used package manager such as conda. Instead, they implemented tviblindi as an R package with a Shiny GUI that can either run in a Docker container or requires the installation of several dependencies. I therefore strongly doubt that many biologists will be able or willing to run tviblindi, which substantially limits the value of its "expert interpretation approach". Moreover, tviblindi does not support Apple silicon, which prevented also myself from testing the tool.

    3. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. While such efforts are in principle useful, the absence of benchmarking against synthetic data and a wide range of different single-cell data sets make this study incomplete. Based on what is presented, one can neither ultimately judge if this will be an advance over previous work nor whether the approach will be of general applicability.

      We thank the eLife editor for the valuable feedback. Both benchmarking against other methods and validation on a synthetic dataset (“dyntoy”) are indeed presented in the Supplementary Note, although this was not sufficiently highlighted in the main text, which has now been improved.

      Our manuscript contains benchmarking against a challenging synthetic dataset in Figure 1; furthermore, both the synthetic dataset and the real-world thymus dataset have been analyzed in parallel using currently available TI tools (as detailed in the Supplementary Note). z other single-cell datasets (single-cell RNA-seq) were added in response to the reviewers' comments.

      One of the reviewers correctly points out that tviblindi goes against the philosophy of automated trajectory inference. This is correct; we believe that a new class of methods, complementary to fully automated approaches, is needed to explore datasets with unknown biology. tviblindi is meant to be a representative of this class of methods—a semi-automated framework that builds on features inferred from the data in an unbiased and mathematically well-founded fashion (pseudotime, homology classes, suitable low-dimensional representation), which can be used in concert with expert knowledge to generate hypotheses about the underlying dynamics at an appropriate level of detail for the particular trajectory or biological process.

      We would also like to mention that the algorithm and the workflow are not the sole results of the paper. We have thoroughly characterized human thymocyte development, where, in addition to expected biological endpoints, we found and characterized an unexpected activated thymic T-reg endpoint.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present tviblindi, a computational workflow for trajectory inference from molecular data at single-cell resolution. The method is based on (i) pseudo-time inference via expecting hitting time, (ii) sampling of random walks in a directed acyclic k-NN where edges are oriented away from a cell of origin w.r.t. the involved nodes' expected hitting times, and (iii) clustering of the random walks via persistent homology. An extended use case on mass cytometry data shows that tviblindi can be used elucidate the biology of T cell development.

      Strengths:

      - Overall, the paper is very well written and most (but not all, see below) steps of the tviblindi algorithm are explained well.

      - The T cell biology use case is convincing (at least to me: I'm not an immunologist, only a bioinformatician with a strong interest in immunology).

      We thank the reviewer for feedback and suggestions that we will accommodate, we respond point-by-point below

      Weaknesses:

      - The main weakness of the paper is that a systematic comparison of tviblindi against other tools for trajectory inference (there are many) is entirely missing. Even though I really like the algorithmic approach underlying tviblindi, I would therefore not recommend to our wet-lab collaborators that they should use tviblindi to analyze their data. The only validation in the manuscript is the T cell development use case. Although this use case is convincing, it does not suffice for showing that the algorithms's results are systematically trustworthy and more meaningful (at least in some dimension) than trajectories inferred with one of the many existing methods.

      We have compared tviblindi to several trajectory inference methods (Supplementary note section 8.2: Comparison to state-of-the-art methods, namely Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021), StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      Also, in the meantime we have successfully used tviblindi to investigate human B-cell development in primary immunodeficiency (Bakardjieva M, et al. Tviblindi algorithm identifies branching developmental trajectories of human B-cell development and describes abnormalities in RAG-1 and WAS patients. Eur J Immunol. 2024 Dec;54(12):e2451004. doi: 10.1002/eji.202451004.).

      - The authors' explanation of the random walk clustering via persistent homology in the Results (subsection "Real-time topological interactive clustering") is not detailed enough, essentially only concept dropping. What does "sparse regions" mean here and what does it mean that "persistent homology" is used? The authors should try to better describe this step such that the reader has a chance to get an intuition how the random walk clustering actually works. This is especially important because the selection of sparse regions is done interactively. Therefore, it's crucial that the users understand how this selection affects the results. For this, the authors must manage to provide a better intuition of the maths behind clustering of random walks via persistent homology.

      In order to satisfy both reader types: the biologist and the mathematician, we explain the mathematics in detail in the Supplementary Note, section 4. We improved the Results text to better point the reader to the mathematical foundations in the Supplementary Note.  

      - To motivate their work, the authors write in the introduction that "TI methods often use multiple steps of dimensionality reduction and/or clustering, inadvertently introducing bias. The choice of hyperparameters also fixes the a priori resolution in a way that is difficult to predict." They claim that tviblindi is better than the original methods because "analysis is performed in the original high-dimensional space, avoiding artifacts of dimensionality reduction." However, in the manuscript, tviblindi is tested only on mass cytometry data which has a much lower dimensionality than scRNA-seq data for which most existing trajectory inference methods are designed. Since tviblindi works on a k-NN graph representation of the input data, it is unclear if it could be run on scRNA-seq data without prior dimensionality reduction. For this, cell-cell distances would have to be computed in the original high-dimensional space, which is problematic due to the very high dimensionality of scRNA-seq data. Of course, the authors could explicitly reduce the scope of tviblindi to data of lower dimensionality, but this would have to be stated explicitly.

      In the manuscript we tested the framework on the scRNA-seq data from Park et al 2020 (DOI: 10.1126/science.aay3224). To illustrate that tviblindi can work directly in the high-dimensional space, we applied the framework successfully on imputed 2000 dimensional data. Furthermore we successfully used tviblindi to investigate bone marrow atlas scRNA-Seq dataset Zhang et al. (2024) and atlas of mouse gastrulation Pijuan-Sala et al. (2019). The idea behind tviblindi is to be able to work without the necessity to use non-linear dimensionality reduction techniques, which reduce the dimensionality to a very low number of dimensions and whose effects on the data distribution are difficult to predict. On the other hand the use of (linear) dimensionality reduction techniques which effectively suppress noise in the data such as PCA is a good practice (see also response to reviewer 2). We have emphasized this in the revised version and added the results of the corresponding analysis (see Supplementary note, section 9).

      - Also tviblindi has at least one hyper-parameter, the number k used to construct the k-NN graphs (there are probably more hidden in the algorithm's subroutines). I did not find a systematic evaluation of the effect of this hyper-parameter.

      Detailed discussion of the topic is presented in the Supplementary Note, section 8.1, where Spearman correlation coefficient between pseudotime estimated using k=10 and k=50 nearest neighbors was 0.997.   The number k however does affect the number of candidate endpoints. But even when larger k causes spurious connection between unrelated cell fates, the topological clustering of random walks allows for the separation of different trajectories. We have expanded the “sensitivity to hyperparameters” section 8.1 also in response to reviewer 2.

      Reviewer #2 (Public Review):

      Summary:

      In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.

      We thank the reviewer for feedback and suggestions that we have accommodated, we responded point-by-point below.

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large scale single-cell data. This enables the authors to make the method more user friendly and interactive allowing real-time user query with the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, by allowing for user bias leading to possible overfitting for a specific data.

      tviblindi is not designed as a fully automated TI tool (although it implements a fully automated module), but as a data driven framework for exploratory analysis of unknown data. There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models. 

      tvilblindi tries to solve this challenge by intentionally overfitting the data and keeping the level of resolution on a single random walk. In this way we aim to capture all putative local relationships in the data. The on-demand aggregation of the walks using the global topology of the data allows researchers to use their expert knowledge to choose the right level of detail (as demonstrated in the Figure 4 of the manuscript) while relying on the topological structure of the high dimensional point cloud. At all times tviblindi allows to inspect the composition of the trajectory to assess the variance in the development, possible hubs on the KNN-graph etc.

      The main weakness of the method is lack of benchmarking the method on real data and comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms, the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not sited. Including those that specifically use persistent homology to discover trajectories (Rizvi et.al. published Nat Biotech 2017). Including those that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank published in Lange et.al Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data and comparison to the very many other previous trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work (especially due to its inability to handle the noise typical in real world data (as was even demonstrated in the little bit of application to real world data provided).

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      Beyond general lack of benchmarking there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the author can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so much interactive options for the user to guide the result, the method is more user/bias driven than data-driven. So a rigorous and quantitative discussion of robustness of the method, as well as how to ensure data-driven inference and avoid over-fitting would be useful.

      Local directionality in expression data is a challenge which is not, to our knowledge, solved. And we are not sure it can be solved entirely, even theoretically. The random walks passing “through” the apoptotic phase are biologically infeasible, but it is an (unbiased) representation of what the data look like based on the diffusion model. It is a property of the data (or of the panel design), which has to be interpreted properly rather than a mistake. Of note, except for Monocle3 (which does not provide the directionality) other tested methods did not discover this trajectory at all.

      The “zoom in” has in fact nothing to do with “passing through the apoptosis”. We show how the researcher can investigate the suggested trajectory to see if there is an additional structure of interest and/or relevance. This investigation is still data driven (although not fully automated). Anecdotally in this particular case this branching was discovered by a bioinformatician, who knew nothing about the presence of beta-selection in the data.  

      We show that the trajectory of apoptosis of cortical thymocytes consists of 2 trajectories corresponding to 2 different checkpoints (beta-selection and positive/negative selection). This type of a structure, where 2 (or more) trajectories share the same path for most of the time, then diverge only to be connected at a later moment (immediately from the point of view of the beta-selection failure trajectory) is a challenge for TI algorithms and none of tested methods gave a correct result. More importantly there seems to be no clear way to focus on these kinds of structures (common origin and common fate) in TI methods.

      Of note, the “zoom in” is a recommended and convenient method to look for an inner structure, but it does not necessarily mean addition of further homological classes. Indeed, in this case the reason that the structure is not visible directly is the limitation of the dendrogram complexity (only branches containing at least 10% of simulated random walks are shown by default). In summary, tviblindi effectively handled all noise in the data that obscured biologically valid trajectories for other methods. We have improved the discussion of the robustness in the current version.  

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data where there is less of an issue of dropouts and the proteins may be chosen to be large independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as high degree of noise (e.g. ambient RNA), introduces very large degree of noise so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason, otherwise one is learning noise rather than signal. It would be great to have a discussion on the feasibility of the method as is for such noisy data and provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

      We agree with the reviewer. In our manuscript we wanted to showcase that tviblindi can directly operate in high-dimensional space (thousands of dimensions) and we used MAGIC imputation for this purpose. This was not ideal. More standard approach, which uses 30-50 PCs as input to the algorithm resulted in equivalent trajectories. We have added this analysis to the study (Supplementary note, section 9).

      In summary, the fact that tviblindi scales well with dimensionality of the data and is able to work in the original space does not mean that it is always the best option. We have added a corresponding comment into the Supplementary note.  

      Reviewer #3 (Public Review):

      Summary:

      Stuchly et al. proposed a single-cell trajectory inference tool, tviblindi, which was built on a sequential implementation of the k-nearest neighbor graph, random walk, persistent homology and clustering, and interactive visualization. The paper was organized around the detailed illustration of the usage and interpretation of results through the human thymus system.

      Strengths:

      Overall, I found the paper and method to be practical and needed in the field. Especially the in-depth, step-by-step demonstration of the application of tviblindi in numerous T cell development trajectories and how to interpret and validate the findings can be a template for many basic science and disease-related studies. The videos are also very helpful in showcasing how the tool works.

      Weaknesses:

      I only have a few minor suggestions that hopefully can make the paper easier to follow and the advantage of the method to be more convincing.

      (1) The "Computational method for the TI and interrogation - tviblindi" subsection under the Results is a little hard to follow without having a thorough understanding of the tviblindi algorithm procedures. I would suggest that the authors discuss the uniqueness and advantages of the tool after the detailed introduction of the method (moving it after the "Connectome - a fully automated pipeline".

      We thank the reviewer for the suggestion and we have accommodated it to improve readability of the text.

      Also, considering it is a computational tool paper, inevitably, readers are curious about how it functions compared to other popular trajectory inference approaches. I did not find any formal discussion until almost the end of the supplementary note (even that is not cited anywhere in the main text). Authors may consider improving the summary of the advantages of tviblindi by incorporating concrete quantitative comparisons with other trajectory tools.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      (2) Regarding the discussion in Figure 4 the trajectory goes through the apoptotic stage and reconnects back to the canonical trajectory with counterintuitive directionality, it can be a checkpoint as authors interpret using their expert knowledge, or maybe a false discovery of the tool. Maybe authors can consider running other algorithms on those cells and see which tracks they identify and if the directionality matches with the tviblindi.

      We have indeed used the thymus dataset for comparison of all TI algorithms listed above. Except for Monocle 3 they failed to discover the negative selection branch (Monocle 3 does not offer directionality information). Therefore, a valid topological trajectory with incorrect (expert-corrected) directionality was partly or entirely missed by other algorithms. 

      (3) The paper mainly focused on mass cytometry data and had a brief discussion on scRNA-seq. Can the tool be applied to multimodality data such as CITE-seq data that have both protein markers and gene expression? Any suggestions if users want to adapt to scATAC-seq or other epigenomic data?

      The analysis of multimodal data is the logical next step and is the topic of our current research. At this moment tviblindi cannot be applied directly to multimodal data. It is possible to use the KNN-graph based on multimodal data (such as weighted nearest neighbor graph implemented in Seurat) for pseudotime calculation and random walk simulation. However, we do not have a fully developed triangulation for the multimodal case yet. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses:

      -  Benchmark against existing trajectory inference methods.

      -  Benchmark on scRNA-seq data or an explicit statement that, unlike existing methods, tviblindi is not designed for such data.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      -  Systematic evaluation of the effetcs of hyper-parameters on the performance of tviblindi (as mentioned above, there is at least one hyper-parameter, the number k to construct the k-NN graphs).

      This is described in Supplementary Note section 8.1

      Recommendations for improving the writing and presentation:

      -  The GitHub link to the algorithm which is currently hidden in the Methods should be moved to the abstract and/or a dedicated section on code availability.

      -  The presentation of the persistent homology approach used for random walk clustering should be improved (see public comment above).

      This is described extensively in Supplementary Note  

      -  A very minor point (can be ignored by the authors): consider renaming the algorithm. At least for me, it's extremely difficult to remember.

      We choose to keep the original name

      Minor corrections to the text and figures:

      -  Labels and legend texts are too small in almost all figures.

      Reviewer #2 (Recommendations For The Authors):  

      (1) On page 3: "(2) Analysis is performed in the original high-dimensional space avoiding artifacts of dimensionality reduction." In mass cytometry data where there is no issue of dropouts, one may choose proteins such that they are not correlated with each other making dimensionality reduction techniques less relevant. But in the context of an unbiased assays such as single-cell RNA-sequencing (scRNA-seq), one measures all the genes in a cell so dimensionality reduction can help resolve the redundancy in the feature space due to correlated/co-regulated gene expression patterns. This assumption forms the basis of most methods in scRNA-seq. More importantly, in scRNA-seq data the dropouts and ambient molecules in mRNA counts result in so much noise that modeling cells in the full gene expression is highly problematic. So the authors are requested to discuss in detail how they would propose to deal with noise in scRNA-seq data.

      On this note, the authors mention in Supplementary Note 9 (Analysis of human thymus single-cell RNA-seq data): "Imputed data are used as the input for the trajectory inference, scaled counts (no imputation) are shown in line plots". The line plots indicate the gene expression trends along the obtained pseudotime. The authors use MAGIC to impute the data, and we request the authors to mention this in the Methods section (currently one must look through the code on Supplementary Note 1.3 to find this). Data imputation in single-cell RNA-seq data are intended to enable quantification of individual gene expression distribution or pairwise gene associations. But when all the genes in an imputed data are used for visualization, clustering or trajectory inference, the averaging effect will compound and result in severely smoothed data that misses important differences between cell states. Especially, in the case of MAGIC, which uses a transition matrix raised to a power, it is over-smoothing of the data to use a transition matrix smoothed data to obtain another transition matrix to calculate the hitting time (or simulate random walks). Second, the authors' proposal to use scaled counts to study gene trends cannot be generalized to other settings due to drop out issue. Given the few genes (and only one branch) that are highlighted in Figure 7D-G and Figure 31 in Supplementary Note, it is hard to say if scaling raw values would pick up meaningful biology robustly here for other branches.

      We recommend that this data be reanalyzed with non-imputed data used for trajectory inference and imputed gene expression used for line plots.

      As stated above in the public review, we reanalyzed the scRNA Seq data using a more standard approach (first 50 principal components). We have also analyzed two additional scRNA Seq datasets (Section 1 and section 10 of Supplementary Note)

      On the same note, the authors use Seurat's CellCycleScoring to obtain the cell cycle phase of each cell and later use ScaleData to regress them out. While we agree that it is valuable to remove cell cycle effect from the data for trajectory inference (and has been used previously in other methods), the regression approach employed in Seurat's ScaleData is not appropriate. It is an aggressive approach that severely changes expression pattern of many genes and can result in new artifacts (false positives) in the data. We recommend the authors to explore this more and consider using a more principled alternatives such as fscLVM (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1334-8). 

      Cell cycle correction is an open problem (Heumos, Nat Rev Genetics, 2023)

      Here we use an (arguably aggressive) approach to make the presentation more straightforward. The cells we are interested here (end #6) are not dividing and the regression does not change the conclusion drawn in the paper

      (2) The figures provided are extremely low in resolution that it is practically impossible to correctly interpret a lot of the conclusion and references made in the figure (especially Figure 3 in the main text).

      Resolution of the Figures was improved

      (3) There are many aspects of the method that enable easy user biases and can lead to substantial overfitting of the data.

      a. On page 7: "The topology of the point cloud representing human T-cell development is more complex ... and does not offer a clear cutoff for the choice of significant sparse regions. Interactive selection allows the user to vary the resolution and to investigate specific sparse regions in the data iteratively." This implies that the method enables user biases to be introduced into the data analysis. While perhaps useful for exploration, quantitative trajectory assessment using such approach can be faulty when the user (A) may not know the underlying dynamics (B) forces preconceived notion of trajectory.

      The authors should consider making the trajectory inference approach less dependent on interactive user input and show that the trajectory results are robust to any choices the user may make. It may also help if the authors provide an effective guide and mention clearly what issues could result due to the use of such thresholds.

      As explained in the response in public reviews, tviblindi is not designed as a fully automated TI tool, but as a data driven framework for exploratory analysis of unknown data. 

      There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models.  To specifically address the points raised by the reviewer:

      “(A) may not know the underlying dynamics” - tviblindi is designed to perform exploratory analysis of the unknown underlying dynamics. We showcase in the study how this can be performed and we highlight possible cases which can be resolved expertly (spurious connections (doublets), different scales of resolution (beta selection)). Crucially, compared to other TI methods, tviblindi offers a clear mechanism on how to discover, focus and resolve these issues which would (and do) contaminate the trajectories discovered fully automatically by tested methods (cf. the beta selection, or the development of plasmacytoid dendritic cells (PDCs) (Supplementary note, section 10.1).

      “(B) forces preconceived notion of trajectory” - user interaction in tviblindi does not force a preconceived notion of the trajectory. The random walks are simulated before the interactive step in an unbiased manner. During the interactive step the user adjusts trajectory specific resolution - incorrect choice of the resolution may result in either merging distinct trajectories into one or over separating the trajectories (which is arguably much less serious). However the interactive step is designed to deal with exactly this kind of challenge. We showcase (e.g. beta selection, or PDCs development) how to address the issue - tviblindi allows us to investigate deeper structure in any considered trajectory.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools. It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner, including pseudotime, homology classes, and appropriate low-dimensional representations. These can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      b. In Figure 4, the authors discuss the trajectory of cells emanating from CD3 negative double positive stage and entering apoptotic phase and mention tviblindi may give "the false impression that cells may pass through an apoptotic phase into a later developmental stage" and propose that the interactive version of tviblindi can help user zoom into (increase resolution) this phenomenon and identify that there are in fact two trajectories in one. Given this, how do the other trajectories in the data change if a user manually adjusts the resolution? A quantification of the robustness is important. Also, it appears that a more careful data clean up could avoid such pitfalls where the algorithm infers trajectory based on mixed phenotype and the user would not have to manually adjust the resolution to obtain clear biological conclusion. We not that the original publication of this data did such "data clean up" using simple diffusion map based dimensionality reduction which the authors boast they avoid. There is a reason for this dimensionality reduction (distinguishing signal from noise), even in CyTOF data, let alone its importance in single cell data.

      The reviewer is concerned about two different, but intertwined issues we wish to untangle here. First, data clean-up is typically done on the premise that dead cells are irrelevant and they are a source of false signals. In the case of the thymocytes in the human thymus this premise is not true. Apoptotic cells are a legitimate (actually dominant) fate of the development and thus need to be represented in the TI dataset. Their biological behavior is however complex as they stop expressing proteins and thus lose their surface markers gradually, as dictated by the particular protein degradation kinetics. So can we clean up dead and dying cells better? Yes, but we don't want to do it since we would lose cells we want to analyze. Second, do trajectories change when we zoom into the data? No, only the level of detail presented visually changes. Since we calculate 5000 trajectories in the dataset, we need to aggregate them already for the hierarchical clustering visualization. Note that Figure 4, panel A highlights 159 trajectories selected in V. group. Zooming in means that the hierarchy of trajectories within V. group is revealed (panel D, groups V.a and Vb.) and can be interpreted on the vaevictis and lineplot graphs (panel E, F). 

      c. In the discussion, the authors write "[tviblindi] allows the selection and grouping of similar random walks into trajectories based on visual interaction with the data". This counters the idea of automated trajectory inference and can lead to severe overfitting.

      As explained in reply to Q3, our aim was NOT to create a fully automated trajectory inference tool. Even more, in our experience we realized that all current tools are taking this fully  automated approach with a search for an “ideal” set of hyperparameters. This, in our experience,  leads to a “blackbox” tool that is difficult to interpret for the expert in the biological field. To respond to this need we designed a modular approach where the results of the TI are presented and the expert can interact with them to focus the visualization and to derive interpretation. Our interactive concept is based on 15 years of experience with the data analysis in flow cytometry, where neither manual gating nor full automation is the ultimate solution but smart integration of both approaches eventually wins the game.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools.  It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner. These features include pseudotime, homology classes, and appropriate low-dimensional representations. These features can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      d. The authors provide some comment on the robustness to the relaxation parameter for witness complex construction in Supplementary Note Section 8.1.2 but it is limited given the importance of this parameter and a more thorough investigation is recommended. We request the authors to provide concrete examples with figures of how changing alpha2 parameter leads to simplicial complexes of different sizes and an assessment of contexts in which the parameter is robust and when not (in both simulated and publicly available real data). Of note, giving the users a proper guide for parameter choice based on these examples and offering them ways to quantify robustness of their results may also be valuable.

      Section 8 in Supplementary Note was extended as requested.

      e. The authors are requested for an assessment of possible short-circuits (e.g. cells of two distantly related phenotypes that get connected erroneously in the trajectory) in the data, and how their approach based on persistent homology deals with it.

      If a short circuit results in a (spurious) alternative trajectory, the persistent homology approach allows us to distinguish it from genuine trajectories that do not follow the short circuit. This prevents contamination of the inferred evolution by erroneous connections. The ability to distinguish and separate distinct trajectories with the same fate is a major strength of this approach (e.g., the trajectory through doublets or the trajectories around checkpoints in thymocytes’ evolution).

      (4) The authors propose vaevictis as a new visualization tool and show its performance compared to the standard UMAP algorithm on a simulated data set (Figure 1 in Supplementary Notes). We recommend a more comprehensive comparison between the two algorithms on a wide array of publicly available single-cell datasets. As well as comparison to other popular dimensionality reduction approaches like force directed layouts, which are the most widely used tool specifically to visualize trajectories.

      We added Section 10 to Supplementary Note that presents multiple comparisons of this kind. It is important to note that tviblindi works independently of visualization and any preferred visualization can be used in the interactive phase (multiple visualisation methods are implemented).

      (5) In Supplementary Note 8.2, the authors compare tviblindi against the other methods. We recommend the authors to quantify the comparison or expand on their assesments in real biological data. For example, in comparison against Palantir and VIA the authors mention "... discovers candidate endpoints in the biological dataset but lacks toolbox to interrogate subtle features such as complex branching" and "fails to discover subtle features (such as Beta selection)" respectively. We recommend the authors to make these comparisons more precise or provide quantification. While the added benefit of interactive sessions of tviblindi may make it more user friendly, the way tviblindi appears to enable analysis of subtle features (e.g. Figure 1H) should be possible in Palantir or VIA as well.

      We extended the comparisons and presented them in Section 8 and 10 in Supplementary Note.  

      (6) The notion of using random walk simulations to identify terminal (and initial states) has been previously used in single-cell data (CellRank algorithm: https://www.nature.com/articles/s41592-021-01346-6). We request the authors to compare their approach to CellRank.

      We compared our algorithm to the CellRank successor CellRank 2 (see section 8.2, Supplementary Note)

      (7) The notion of using persistent homology to discover trajectories has been previously used in single cell data https://pubmed.ncbi.nlm.nih.gov/28459448/. we request a comparison to this approach

      The proposed algorithm was not able to accommodate the large datasets we used.

      scTDA (Rizvi, Camara et al. Nat. Biotechnol. 2017) has not been updated for 6 years. It is not suited for complex atlas-sized datasets both in terms of performance and utility, with its limited visualization tools. It also lacks capabilities to analyze individual trajectories.

      (8) In Figure 3B, the authors visualize the endpoints and simulated random walks using the connectome. There is no edge from start to the apoptotic cells here. It is not clear why? If they are not relevant based on random walks, can the user remove them from analysis? Same for the small group of pink cells below initial point.

      The connectome is a fully automated approach (similar to PAGA) which gives a basic overview of the data. It is not expected to be able to compete with the interactive pipeline of tviblindi for the same reasons as the fully automated methods (difficult to predict the effect of hyperparameters).

      (9) In Supplementary Figure 3, in relation to "Variants of trajectories including selection processes" the author mention that there is a spurious connection between CD4 single positive, and the doublet set of cells. The authors mention that the presence of dividing cells makes it difficult to remove the doublets. We request the authors to discuss why. For example, the authors seem to have cell cycle markers (e.g. Ki67, pH3, Cyclin) and one would think that coupled with DNA intercalator 191/193lr one could further clean-up the data. Can the authors employ alternative toolkits such as doublet detection methods?

      To address this issue, we do remove doublets with illegitimate cell barcodes (e.g. we remove any two cells from two samples with different barcode which present with double barcode). Although there are computational doublet removal approaches for mass cytometry (Bagwell, Cytometry A 2020), mostly applied to peripheral blood samples (where cell division is not present under steady state immune system conditions), these are however not well suited for situations where dividing samples occur (Rybakowska P, Comput Struct Biotechnol J. 2021), which is the case of our thymocyte samples. Furthermore, there are other situations where doublet formation is not an accident, but rather a biological response (Burel JG, Cytometry A (2020). Thus, the doublet cell problem is similar to the apoptotic cell problem discussed earlier.

      We could remove cells with the double DNA signal, but this would remove not only accidental doublets but also the legitimate (dividing) cells. So the question is how to remove the illegitimate doublets but not the legitimate?

      Of note, the trajectory going through doublets does not affect the interpretation of other trajectories as it is readily discriminated by persistent homology and thus random walks passing through this (spurious) trajectory do not contaminate the markers’ evolution inferred for legitimate trajectories.

      We therefore prefer to remove only the barcode illegitimate and keep all others in analysis, using the expert analysis step also to identify (using the cell cycle markers plus other features) the artificially formed doublets and thus spurious connections.

      (10) The authors should discuss how the gene expression trend plots are made (e.g. how are the expression averaged? Rolling mean?).

      The development of those markers is shown as a line plot connecting the average values of a specific marker within a pseudotime segment. By default, the pseudotime values are divided into uniform segments (each containing the same number of points) whose number can be changed in the GUI. To focus on either early or late stages of the development, the segment division can be adjusted in GUI. See section 6 of the Supplementary Note.

      Reviewer #3 (Recommendations For The Authors):

      The overall figures quality needs to be improved. For example, I can barely see the text in Figure 3c.

      Resolution of the Figures was improved

    1. eLife Assessment

      This study provides a valuable and comprehensive dataset on transcription factor binding in Pseudomonas aeruginosa, along with analyses of its regulatory network, key virulence and metabolic regulators, and a pangenomic examination of transcription factors. Utilizing large-scale ChIP-seq and multi-omics integration, the research convincingly supports the hierarchical regulatory structures and offers insights into virulence mechanisms. While further experimental validation is needed, this publicly accessible PATF_Net database enhances its utility for researchers investigating this significant pathogen associated with hospital infections and antibiotic resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, Huang et al. revealed the complex regulatory functions and transcription network of 172 unknown transcriptional factors (TFs) in Pseudomonas aeruginosa PAO1. They have built a global TF-DNA binding landscape and elucidated binding preferences and functional roles of these TFs. More specifically, the authors established a hierarchical regulatory network and identified ternary regulatory motifs, and co-association modules. Since P. aeruginosa is a well known pathogen, the authors thus identified key TFs associated with virulence pathways (e.g., quorum sensing [QS], motility, biofilm formation), which could be potential drug targets for future development. The authors also explored the TF conservation and functional evolution through pan-genome and phylogenetic analyses. For the easy searching by other researchers, the authors developed a publicly accessible database (PATF_Net) integrating ChIP-seq and HT-SELEX data.

      Strengths:

      (1) The authors performed ChIP-seq analysis of 172 TFs (nearly half of the 373 predicted TFs in P. aeruginosa) and identified 81,009 significant binding peaks, representing one of the largest TF-DNA interaction studies in the field. Also, The integration of HT-SELEX, pan-genome, and phylogenetic analyses provided multi-dimensional insights into TF conservation and function.

      (2) The authors provided informative analytical Framework for presenting the TFs, where a hierarchical network model based on the "hierarchy index (h)" classified TFs into top, middle, and bottom levels. They identified 13 ternary regulatory motifs and co-association clusters, which deepened our understanding of complex regulatory interactions.

      (3) The PATF_Net database provides TF-target network visualization and data-sharing capabilities, offering practical utility for researchers especially for the P. aeruginosa field.

      Weaknesses:

      (1) There is very limited experimental validation for this study. Although 24 virulence-related master regulators (e.g., PA0815 regulating motility, biofilm, and QS) were identified, functional validation (e.g., gene knockout or phenotypic assays) is lacking, leaving some conclusions reliant on bioinformatic predictions. Another approach for validation is checking the mutations of these TFs from clinical strains of P. aeruginosa, where chronically adapted isolates often gain mutations in virulence regulators.

      (2) ChIP-seq in bacteria may suffer from low-abundance TF signals and off-target effects. The functional implications of non-promoter binding peaks (e.g., coding regions) were not discussed.

      (3) PATF_Net currently supports basic queries but lacks advanced tools (e.g., dynamic network modeling or cross-species comparisons). User experience and accessibility remain underevaluated. But this could be improved in the future.

      Achievement of Aims and Support for Conclusions

      (1) The authors successfully mapped global P. aeruginosa TF binding sites, constructed hierarchical networks and co-association modules, and identified virulence-related TFs, fulfilling the primary objectives. The database and pan-genome analysis provide foundational resources for future studies.

      (2) The hierarchical model aligns with known virulence mechanisms (e.g., LasR and ExsA at the bottom level directly regulating virulence genes). Co-association findings (e.g., PA2417 and PA2718 co-regulating pqsH) resonate with prior studies, though experimental confirmation of synergy is needed.

      Impact on the Field and Utility of Data/Methods

      (1) This study fills critical gaps in TF functional annotation in P. aeruginosa, offering new insights into pathogenicity mechanisms (e.g., antibiotic resistance, host adaptation). The hierarchical and co-association frameworks are transferable to other pathogens, advancing comparative studies of bacterial regulatory networks.

      (2) PATF_Net enables rapid exploration of TF-target interactions, accelerating candidate regulator discovery.

    3. Reviewer #3 (Public review):

      Summary:

      The authors utilized ChIP-seq on strains containing tagged transcription factor (TF)-overexpression plasmids to identify binding sites for 172 transcription factors in P. aeruginosa. High-quality binding site data provides a rich resource for understanding regulation in this critical pathogen. These TFs were selected to fill gaps in prior studies measuring TF binding sites in P. aeruginosa. The authors further perform a structured analysis of the resulting transcriptional regulatory network, focusing on regulators of virulence and metabolism, in addition to performing a pangenomic analysis of the TFs. The resulting dataset has been made available through an online database. While the implemented approach to determining functional TF binding sites has limitations, the resulting dataset still has substantial value to P. aeruginosa research.

      Strengths:

      The generated TF binding site database fills an important gap in regulatory data in the key pathogen P. aeruginosa. Key analyses of this dataset presented include an analysis of TF interactions and regulators of virulence and metabolism, which should provide important context for future studies into these processes. The online database containing this data is well organized and easy to access. As a data resource, this work should be of significant value to the infectious disease community.

      Weaknesses:

      Drawbacks of the study include 1) challenges interpreting binding site data obtained from TF overexpression due to unknown activity state of the TFs on the measured conditions, 2) limited practical value of the presented TRN topological analysis, and 3) lack of independent experimental validation of the proposed master regulators of virulence and metabolism.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work done by Huang et.al. revealed the complex regulatory functions and transcription network of 172 unknown transcription factors of Pseudomonas aeruginosa PAO1. The authors utilized ChIP-seq to profile TFs binding site information across the genome, demonstrating diverse regulatory relationships among them via hierarchical networks with three levels. They further constructed thirteen ternary regulatory motifs in small subs and co-association atlas with 7 core associated clusters. The study also uncovered 24 virulence-related master regulators. The pan-genome analysis uncovered both the conservation and evolution of TFs with P. aeruginosa complex and related species. Furthermore, they established a web-based database combining both existing and novel data from HT-SELEX and ChIP-seq to provide TF binding site information. This study offered valuable insights into studying transcription regulatory networks in P. aeruginosa and other microbes.

      Strengths:

      The results are presented with clarity, supported by well-organized figures and tables that not only illustrate the study's findings but also enhance the understanding of complex data patterns.

      Thank you for your valuable feedback on our paper exploring the transcription regulatory networks in P. aeruginosa.

      Weaknesses:

      The results of this manuscript are mainly presented in systematic figures and tables. Some of the results need to be discussed as an illustration how readers can utilize these datasets.

      We appreciate the valuable suggestion about enhancing the practical aspects of our manuscript. We have expanded the discussion section to include more detailed explanations of how these datasets can be utilized in practical applications. 

      Reviewer #2 (Public review):

      In this work, the authors comprehensively describe the transcriptional regulatory network of Pseudomonas aeruginosa through the analysis of transcription factor binding characteristics. They reveal the hierarchical structure of the network through ChIP-seq, categorizing transcription factors into top-, middle-, and bottom-level, and reveal a diverse set of relationships among the transcription factors. Additionally, the authors conduct a pangenome analysis across the Pseudomonas aeruginosa species complex as well as other species to study the evolution of transcription factors. Moreover, the authors present a database with new and existing data to enable the storage and search of transcription factor binding sites. The findings of this study broaden our knowledge on the transcriptome of P. aeruginosa. This study sheds light on the complex interconnections between various cellular functions that contribute to the pathogenicity of P. aeruginosa, along with the associated regulatory mechanisms. Certain findings, such as the regulatory tendencies of DNA-binding domain-types, provides valuable insights on the possible functions of uncharacterized transcription factors and new functions of those that have already been characterized. The techniques used hold great potential for discovery of transcription factor functions in understudied organisms as well.

      The study would benefit from a more clear discussion on the implications of various findings, such as binding preferences, regulatory preferences, and the link between regulatory crosstalk and virulence. Additionally, the pangenome analysis would be furthered through a discussion of the divergence of the transcription factors of P. aeruginosa PAO1 across species in relation to the findings on the hierarchical structure of the transcriptional regulatory network.

      Thank you for your positive feedback and suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      (1) It appears that many TFs are conserved among bacteria, archaebacteria, fungi, plants, and animals. Does this mean these TFs in bacterial could be the ancestors of TFs in fungi, plants, and animals? If we fetch these TFs out and build an evolutionary tree, can we visual the three kingdoms as well?

      Thank you for this comment. While many TFs are conserved across bacteria, archaea, fungi, plants, and animals, this conservation does not necessarily imply a direct ancestral relationship. Instead, it may reflect the fundamental importance of certain domains and regulatory mechanisms, which could have arisen from a common ancestral system or through convergent evolution. If we fetch TF PA2032 out to build an evolutionary tree by setting PAO1 as the root, we can visualize these kingdoms in a tree. We added this content in the revised manuscript. Please see Figure S7D and Lines 404-411.

      “The phylogenetic tree of PA2032 across bacteria, archaea, fungi, plants, and animals, with PAO1 as the root revealed that the bacterial TFs (purple) indicates a high degree of conservation within prokaryotes, suggesting a fundamental role in core regulatory processes. In contrast, eukaryotic TFs (fungi, plants, and animals) form distinct clades with longer branch lengths, indicating significant divergence and specialization during eukaryotic evolution. These findings suggest that while TF is conserved across domains of life, its functional roles and regulatory mechanisms have undergone substantial diversification in eukaryotes.”

      (2) Can the authors give an indication how could we employ the findings of this study in designing next generation of antimicrobial agents?

      Thank you for this important suggestion. We have provided this content in the discussion part. Please see Lines 481-492.

      “The extensive datasets generated in this study offer valuable insights into understanding and targeting P. aeruginosa pathogenicity. The genome-wide binding profiles can be systematically analyzed through our hierarchical regulatory network framework to decode complex virulence mechanisms. The virulence-related master regulators and core regulatory clusters identified in this study highlighted key nodes of transcriptional control. Understanding these regulatory relationships is particularly valuable for identifying targets whose modulation would significantly impact virulence while accounting for potential compensatory mechanisms. This knowledge base thus provides a foundation for developing targeted approaches to combat P. aeruginosa infections, moving beyond traditional antibiotic strategies toward more sophisticated interventions based on regulatory network manipulation.”

      Minor:

      (1) Lines 178-180: It would strengthen the discussion to include a few additional references that support the claims made in this section, providing a more comprehensive context for the readers.

      Yes. We have added more citations(1-5) (No. 1-5 in the references at the end of the rebuttal) to support the claims. Please see Line 182.

      (2) Line 198: You mention 'seven' motifs containing toggle switches, but Fig.3 actually displays eight motifs. Please revise this discrepancy to ensure consistency between the text and the figure.

      Yes. We have revised the wording to “eight”. Please see Line 200.

      (3) Figure 3A: Consider adding a diagram or legend that represents the colors associated with each DNA-binding domain (DBD) family.

      Thank you for your suggestion. The colors of DBD were aligned with the legend in Figure S3. We have added it in Figure 3A.

      Reviewer #2 (Recommendations for the authors):

      Line 21: The use of the abbreviation 'TF' should be done at the first instance of 'transcription factor'.

      Yes. We have revised it. Please see Line 21.

      Line 74: The purpose of this paragraph is slightly unclear. It is recommended that appropriate modifications are made.

      We are sorry for the confusion. The purpose of this paragraph was to introduce the major virulence pathways in P. aeruginosa and mention the important role of TRN in these pathways. We have modified it to make it clearer. Please see Lines 74-75.

      “P. aeruginosa employs diverse virulence pathways to establish successful infection, with QS being one of the major mechanisms involving the expression of many virulence genes.”

      Line 113: How were these 172 TFs selected?

      Thank you for indicating this question. In a previous study, we performed HT-SELEX to characterize the DNA-binding motifs of all TFs in P. aeruginosa PAO1, successfully identifying binding sequences for 182 TFs. To further elucidate the binding landscapes of the rest, we performed ChIP-seq on the remaining TFs (172 TFs in total with high-quality ChIP-seq libraries). Please see Lines 100-101 in the revised manuscript.

      Line 119: Defining other features, namely downstream and include Feature, would be helpful.

      Thank you for your suggestion. We have added the definition for all peak annotation in the legend. Please see Lines 569-574.

      “Annotation heatmap of all peak distribution with 6 locations: Upstream, where the peak is located entirely upstream of the gene; Downstream, where the peak is positioned completely downstream of the gene; Inside, where the peak is entirely contained within the gene body; OverlapStart, where the peak overlaps with the 5' end of the gene; OverlapEnd, where the peak overlaps with the 3' end of the gene; and IncludeFeature, where the peak completely encompasses the gene.”

      Line 129: The distribution type of AraC-type TFs is unclear - it is mentioned that AraC has a 'broad distribution', but it is later stated that it has a 'narrow distribution'.

      We are sorry for this mistake, and we have revised the example for “broad distribution”, which is Cor_CI instead of AraC. Please see Lines 132-135.

      Line 161: 'h value' here may need to be modified to 'absolute h value'.

      Yes. We have revised it. Please see Line 164.

      Line 502: "s The DNA" needs to be corrected.

      Yes. We have revised it. Please see Line 514.

      Line 515: It would be helpful to readers if the reference used for these pathways was cited.

      Yes. We have added the review reference (Shao et al, 2023) related to these pathways(6) (the 6th reference at the end of the rebuttal). Please see Line 527.

      Line 558: "Translation start site" needs to be corrected to "Transcription start site"

      The “TSS” here exactly indicated “Translation start site”.

      Line 593. "Virulent" pathways needs to be corrected to "virulence" pathways.

      Yes. We have revised it. Please see Line 609.

      Line 604: The type of categorization based on which the proportion of genes is displayed needs to be mentioned.

      Yes, we agree. We have added the type of categorization in the legend. Please see Lines 621-627.

      “Figure 6. Conservation and variability of TFs in PAO1. (A). The pie chart shows the proportions of genes categorized by their presence across P. aeruginosa strains for all genes. (B). The pie chart shows the distribution of TFs identified from PAO1 across different conservation categories. (C). The bar plot of the proportion for non-core TFs. Genes are categorized based on their presence frequency across P. aeruginosa strains: Core genes (present in 99% ~ 100% strains), Soft core genes (present in 95% ~ 99% strains), Shell genes (present in 15% ~ 95% strains), and Cloud genes (present in 0% ~ 15% strains).”

      Reference:

      (1) Liang H, Deng X, Li X, Ye Y, Wu M. 2014. Molecular mechanisms of master regulator VqsM mediating quorum-sensing and antibiotic resistance in Pseudomonas aeruginosa. Nucleic acids research 42:10307-10320.

      (2) Jones CJ, Ryder CR, Mann EE, Wozniak DJ. 2013. AmrZ modulates Pseudomonas aeruginosa biofilm architecture by directly repressing transcription of the psl operon. Journal of bacteriology 195:1637-1644.

      (3) Hickman JW, Harwood CS. 2008. Identification of FleQ from Pseudomonas aeruginosa as ac‐di‐GMP‐responsive transcription factor. Molecular microbiology 69:376-389.

      (4) Déziel E, Gopalan S, Tampakaki AP, Lépine F, Padfield KE, Saucier M, Xiao G, Rahme LG. 2005. The contribution of MvfR to Pseudomonas aeruginosa pathogenesis and quorum sensing circuitry regulation: multiple quorum sensing‐regulated genes are modulated without affecting lasRI, rhlRI or the production of N‐acyl‐L‐homoserine lactones. Molecular microbiology 55:998-1014.

      (5) Lizewski SE, Lundberg DS, Schurr MJ. 2002. The transcriptional regulator AlgR is essential for Pseudomonas aeruginosa pathogenesis. Infection and immunity 70:6083-6093.

      (6) Shao X, Yao C, Ding Y, Hu H, Qian G, He M, Deng X. 2023. The transcriptional regulators of virulence for Pseudomonas aeruginosa: Therapeutic opportunity and preventive potential of its clinical infections. Genes & Diseases 10:2049-2063.

    1. eLife Assessment

      In contrast with mammals, measures of cochlear tuning in budgerigars do not match the frequency dependence of behavioral tuning. Earlier behavioral data in the budgerigar had shown good selectivity at around 3-4 kHz, but it was unknown whether this unusual selectivity arose in the inner ear or was a more central adaptation. The authors measured both auditory-nerve tuning curves and stimulus-frequency otoacoustic emissions and found fairly normal-looking cochlear tuning in the budgerigar. These important findings imply that any behavioral/perceptual differences in frequency selectivity are likely more central in original. These solid new data also provide significant support for the utility of otoacoustic estimates of cochlear tuning.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, the authors provide compelling evidence that stimulus-frequency otoacoustic emission (SFOAE) phase-gradient delays predict the sharpness (quality factors) of auditory-nerve-fiber (ANF) frequency tuning curves in budgerigars. In contrast with mammals, neither SFOAE- nor ANF-based measures of cochlear tuning match the frequency dependence of behavioral tuning in this species of parakeet. Although the reason for the discrepant behavioral results (taken from previous studies) remains unexplained, the present data provide significant and important support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored only in mammals.

      Strengths:

      * The OAE and ANF data appear solid and believable. (The behavioral data are taken from previous studies and the resulting limitations are discussed.)

      * No other study in birds (and only a single previous study in mammals) has combined behavioral, auditory-nerve, and otoacoustic estimates of cochlear tuning in a single species.

      * SFOAE-based estimates of cochlear tuning were obtained by assuming that the tuning ratio estimated in chicken applies also to the budgerigar. Possible complications arising from an avian apical-basal transition analogous to that found in mammals are discussed.

    3. Reviewer #2 (Public review):

      Summary:

      Earlier behavioral data in the budgerigar have suggested frequency selectivity that was different from that in many other avian species, showing particularly good selectivity at around 3-4 kHz. It was unknown whether this unusual selectivity was determined in the inner ear, or whether it was a more central adaptation. The results using direct auditory-nerve tuning curves and less invasive stimulus-frequency otoacoustic emissions, suggest fairly normal-looking cochlear tuning in the budgerigar, implying that any behavioral/perceptual differences in frequency selectivity are likely more central in original.

      Strengths:

      - The study presents novel data in budgerigar, comparing the bandwidths of auditory-nerve tuning curves with the latencies of stimulus-frequency otoacoustic emissions (SFOAEs), which are thought to reflect the sharpness of cochlear tuning.<br /> - Using a conversion factor taken from previous data in the chicken to avoid circularity of reasoning, the study shows quite good correspondence between the non-invasive estimates obtained from SFOAEs and the tuning obtained from auditory-nerve fibers. Similarity between budgerigar and chicken are harder to ascertain with the way the data are presented.

      Weaknesses:

      - The comparison of SFOAEs and auditory-nerve tuning curves in the most interesting regions (beyond 3.5 kHz, where some perceptual anomalies seem to occur in some previous data), relies on an extrapolation of the data from the chicken.<br /> - No new behavioral data are presented, so the comparisons made in the paper are between studies separated by decades. None of the behavioral studies cited used the more current techniques that have been claimed to provide a behavioral estimate of cochlear tuning.

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Previous studies in mammals and other vertebrates have shown that a noninvasive measure of cochlear tuning, based on the latency derived from stimulus-frequency otoacoustic emissions, provides a reasonable, and non-invasive, estimate of cochlear tuning. This valuable study confirms that finding in a new species, the budgerigar, and provides convincing support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored primarily in mammals. The study's remaining claims of a mismatch between behavioral frequency selectivity and cochlear tuning are based on old behavioral data, and collected in an extreme frequency region at the edge of the limits of hearing. Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, and the evidence for these claims is weak.

      We appreciate the detailed summary of our paper by the editors highlighting its strengths. As described in the following responses, we added additional evidence to the Introduction supporting that budgerigars have (1) unusual behavioral frequency tuning compared to other bird species and (2) unusual behavioral tuning results in budgerigars are not readily explainable by the audiogram. This additional background information, including Fig. 1B, substantially strengthens the claim of mismatched behavioral and neural/otoacoustic frequency tuning in budgerigars. Moreover, that the behavioral data are “old” seems not particularly relevant considering that the same behavioral methods are still widely used in animal research, as elaborated upon in the responses below. We suggest the term “previously published” to clarify the behavioral data used in our analyses.

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, the authors provide compelling evidence that stimulus-frequency otoacoustic emission (SFOAE) phase-gradient delays predict the sharpness (quality factors) of auditory-nerve-fiber (ANF) frequency tuning curves in budgerigars. In contrast with mammals, neither SFOAE- nor ANF-based measures of cochlear tuning match the frequency dependence of behavioral tuning in this species of parakeet. Although the reason for the discrepant behavioral results (taken from previous studies) remains unexplained, the present data provide significant and important support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored only in mammals.

      Strengths:

      * The OAE and ANF data appear solid and believable. (The behavioral data are taken from previous studies.)

      * No other study in birds (and only a single previous study in mammals) has combined behavioral, auditory-nerve, and otoacoustic estimates of cochlear tuning in a single species.

      * SFOAE-based estimates of cochlear tuning now avoid possible circularity and were are obtained by assuming that the tuning ratio estimated in chicken applies also to the budgerigar.

      Weaknesses:

      * In mammals, accurate prediction of neural Q_ERB from otoacoustic N_SFOAE involves the application of species-invariance of the tuning ratio combined with an attempt to compensate for possible species differences in the location of the so-called apical-basal transition (for a review, see Shera & Charaziak, Cochlear frequency tuning and otoacoustic emissions. Cold Spring Harb Perspect Med 2019; 9:pii a033498. doi: 10.1101/cshperspect.a033498; in particular, the text near Eq. 2 and the value of CFa|b).

      Despite this history, the manuscript makes no mention of the apical-basal transition, its possible role in birds, or why it was ignored in the present analysis. As but one result, the comparative discussion of the tuning ratio (paragraph beginning on lines 383) is incomplete and potentially misleading. Although the paragraph highlights differences in the tuning ratio across groups, perhaps these differences simply reflect differences in the value of CFa|b. For example, if the cochlea of the budgerigar is assumed to be entirely "apical" in character (so that CFa|b is around 7-8 kHz), then the budgerigar tuning ratios appear to align remarkably well with those previously obtained in mammals (see Shera et al 2010, Fig 9).

      We added sections on the apical-basal transition to the Results and Discussion, including how this concept might apply in budgerigars and other birds.

      * For the most part, the authors take previous behavioral results in budgerigar at face value, attributing the discrepant behavioral results to hypothesized "central specializations for the processing of masked signals". But before going down this easy road, the manuscript would be stronger if the authors discussed potential issues that might affect the reliability of the previous behavioral literature. For example, the ANF data show that thresholds rise rapidly above about 5 kHz. Might the apparent broadening of the behavioral filters arise as a consequence of off-frequency listening due to the need to increase signal levels at these frequencies? Or perhaps there are other issues. Inquiring readers would appreciate an informed discussion.

      This is a good point, also raised by reviewer 2, that declining audibility above 4 kHz could impact behavioral tuning estimates. On the other hand, other bird species with highly similar audiograms to budgerigars show conventional behavioral tuning that increases in sharpness relatively slowly and monotonically for higher frequences. Thus, the unusual pattern of behavioral tuning in budgerigars is not fully explainable by the audiogram. We added a section to the Introduction highlighting these points.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes two new sets of data involving budgerigar hearing: 1) auditory-nerve tuning curves (ANTCs), which are considered the 'gold standard' measure of cochlear tuning, and 2) stimulus-frequency otoacoustic emissions (SFOAEs), which are a more indirect measure (requiring some assumptions and transformations to infer cochlear tuning) but which are non-invasive, making them easier to obtain and suitable for use in all species, including humans. By using a tuning ratio (relating ANTC bandwidths and SFOAE delay) derived from another bird species (chicken), the authors show that the tuning estimates from the two methods are in reasonable agreement with each other over the range of hearing tested (280 Hz to 5.65 kHz for the ANTCs), and both show a slow monotonic increase in cochlear tuning quality over that range, as expected. These new results are then compared with (much) older existing behavioral estimates of frequency selectivity in the same species.

      Strengths:

      This topic is of interest, because there are some indications from the older behavioral literature that budgerigars have a region of best tuning, which the current authors refer to as an 'acoustic fovea', at around 4 kHz, but that beyond 5 kHz the tuning degrades. Earlier work has speculated that the source could be cochlear or higher (e.g., Okanoya and Dooling, 1987). The current study appears to rule out a cochlear source to this phenomenon.

      Weaknesses:

      The conclusions are rendered questionable by two major problems.

      The first problem is that the study does not provide new behavioral data, but instead relies on decades-old estimates that used techniques dating back to the 1970s, which have been found to be flawed in various ways. The behavioral techniques that have been developed more recently in the human psychophysical literature have avoided these well-documented confounds, such as nonlinear suppression effects (e.g., Houtgast, https://doi.org/10.1121/1.1913048; Shannon, https://doi.org/10.1121/1.381007; Moore, https://doi.org/10.1121/1.381752), perceptual confusion between pure-tone maskers and targets (e.g., Neff, https://doi.org/10.1121/1.393678), beats and distortion products produced by interactions between simultaneous maskers and targets (e.g., Patterson, https://doi.org/10.1121/1.380914), unjustified assumptions and empirical difficulties associated with critical band and critical ratio measures (Patterson, https://doi.org/10.1121/1.380914), and 'off-frequency listening' phenomena (O'Loughlin and Moore, https://doi.org/10.1121/1.385691). More recent studies, tailored to mimic to the extent possible the techniques used in ANTCs, have provided reasonably accurate estimates of cochlear tuning, as measured with ANTCs and SFOAEs (Shera et al., 2003, 2010; Sumner et al., 2010). No such measures yet exist in budgerigars, and this study does not provide any. So the study fails to provide valid behavioral data to support the claims made.

      We appreciate the reviewer’s efforts in summarizing and critiquing our study. We feel that the budgerigar data collected by the Dooling and Saunders labs remain essentially valid today. The methods used in these behavioral studies are rigorous and remain widely used in animal research (e.g., critical bands and ratios: Yost & Shofner, 2009; King et al., 2015; simultaneous masking: Burton et al., 2018). The methods are based on the same power-spectrum-model assumptions of auditory masking as even the most recent and elaborate human psychophysical procedures. We therefore believe that it remains highly relevant to test and report whether these methods can accurately predict cochlear tuning. More importantly, while forward-masking behavioral results are hypothesized to more accurately predict cochlear tuning humans (Shera et al., 2002; Joris et al., 2011; Sumner et al., 2018), evidence from nonhumans is controversial. For example, one study showed a closer match between forward-masking results and auditory-nerve tuning (ferret: Sumner et al., 2018), whereas several others showed a close match for simultaneous masking results (e.g., guinea pig, chinchilla, macaque; reviewed by Ruggero & Temchin, 2005; see Joris et al., 2011 for macaque auditory-nerve tuning). Moreover, forward- and simultaneous-masking results can often be equated with a simple scaling factor (e.g., Sumner et al., 2018). Given no consensus on an optimal behavioral method, and seemingly limited potential for the “wrong” method to fundamentally transform the shape of the behavioral tuning quality function, it seems reasonable to accept previously published behavioral tuning estimates as valid while also discussing limitations and remaining open to alternative interpretations. We added these points to the discussion and added clarification throughout as to the specific behavioral approaches used.

      The second, and more critical, problem can be observed by considering the frequencies at which the old behavioral data indicate a worsening of tuning. From the summary shown in the present Fig. 2, the conclusion that behavioral frequency selectivity worsens again at higher frequencies is based on four data points, all with probe frequencies between 5 and 6 kHz. Comparing this frequency range with the absolute thresholds shown in Fig. 3 (as well as from older budgerigar data) shows it to be on the steep upper edge of the hearing range. Thus, we are dealing not so much with a fovea as the point where hearing starts to end. The point that anomalous tuning measures are found at the edge of hearing in the budgerigar has been made before: Saunders et al. (1978) state in the last sentence of their paper that "the size of the CB rapidly increases above 4.0 kHz and this may be related to the fact that the behavioral audibility curve, above 4.0 kHz, loses sensitivity at the rate of 55 dB per octave."

      Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, in humans as well as in other species. The few attempts to measure human frequency selectivity at that upper edge have resulted in quite messy data and unclear conclusions (e.g., Buus et al., 1986, https://doi.org/10.1007/978-1-4613-2247-4_37). Indeed, the only study to my knowledge to have systematically tested human frequency selectivity in the extended high frequency range (> 12 kHz) seems to suggest a substantial broadening, relative to the earlier estimates at lower frequencies, by as much as a factor of 2 in some individuals (Yasin and Plack, 2005; https://doi.org/10.1121/1.2035594) - in other words by a similar amount as suggested by the budgerigar data. The possible divergence of different measures at the extreme end of hearing could be due to any number of factors that are hard to control and calibrate, given the steep rate of threshold change, leading to uncontrolled off-frequency listening potential, the higher sound levels needed to exceed threshold, as well as contributions from middle-ear filtering. As a side note, in the original ANTC data presented in this study, there are actually very few tuning curves at or above 5 kHz, which are the ones critical to the argument being forwarded here. To my eye, all the estimates above 5 kHz in Fig. 3 fall below the trend line, potentially also in line with poorer selectivity going along with poorer sensitivity as hearing disappears beyond 6 kHz.

      This is an excellent point, also raised by reviewer 1, that declining audibility above 4 kHz could influence behavioral tuning measures. While we acknowledge this possibility, declining audibility cannot fully explain the unusual pattern of behavioral frequency tuning in budgerigars considering that other bird species with the same audiogram phenotype show conventional tuning patterns. We added these points to the Introduction and Fig. 1B. We also added clarification throughout that it is not just the shape of tuning function that is noteworthy in budgerigars, but also the extreme slope in the 1-3.5 kHz region. Behavioral tuning quality in budgerigars increases by 5.3 dB/octave in this range (i.e., nearly doubling each octave increase in frequency), vs. 1.8 dB/octave in humans, 2.5 dB/octave in ferret, 1.1 dB/octave in macaque, and 1.9 dB/octave in starling. This additional background information, including Fig. 1B, substantially strengthens the claim of mismatched behavioral and neural/otoacoustic frequency tuning in budgerigars.

      The basic question posed in the current study title and abstract seems a little convoluted (why would you expect a behavioral measure to reflect cochlear mechanics more accurately than a cochlear-based emissions measure?). A more intuitive (and likely more interesting) way of framing the question would be "What is the neural/mechanical source of a behaviorally observed acoustic fovea?" Unfortunately, this question does not lend itself to being answered in the budgerigar, as that 'fovea' turns out to be just the turning point at the end of the hearing range. There is probably a reason why no other study has referred to this as an acoustic fovea in the budgerigar.

      Overall, a safe interpretation of the data is that hearing starts to change (and becomes harder to measure) at the very upper frequency edge, and not just in budgerigars. Thus, it is difficult to draw any clear conclusions from the current work, other than that the relations between ANTC and SFOAEs estimates of tuning are consistent in budgerigar, as they are in most (all?) other species that have been tested so far.

      We removed the term fovea from the paper. See above for our argument that unusual behavioral tuning in budgerigars is not simply or fully explainable by the audiogram.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Line 34. As far as I could tell, no other study has referred to this region in budgerigar as an acoustic fovea. Probably for good reason (see above). This wording should probably be avoided.

      We removed the term.

      Line 35. Describing 3.5-4 kHz as 'mid-frequencies' is a stretch. 4 kHz is actually the corner frequency, above which hearing degrades.

      We added a more detailed and accurate description of the tuning pattern.

      Lines 89-91. This seems a nice statement of the problem, and to my mind makes for a much better rationale for the study.

      Line 255. "mixed effect" should "mixed effects".

      We made the correction.

      Line 380. Kuhn and Saunders didn't measure high enough to detect any changes in tuning.

      We removed the reference here.

    1. eLife Assessment

      This important study provides compelling evidence for the evolutionary diversification and conserved NFκB-inducing function of RHIM-containing RIP kinase proteins across animal lineages, combining thorough bioinformatic analysis with functional assays in human cells. The findings are of broad interest to immunologists and evolutionary biologists, though some novel observations would benefit from deeper conceptual integration.

    2. Reviewer #2 (Public review):

      Summary:

      By combining bioinformatical and experimental approaches, the authors address the question why several vertebrate lineages lack specific genes of the necroptosis pathway, or those that regulate the interplay between apoptosis and necroptosis. The lack of such genes was already known from previous publications, but the current manuscript provides a more in-depth analysis and also uses experiments in human cells to address the question of functionality of the remaining genes and pathways. A particular focus is placed on RIPK3/RIPK1 and their dual roles in inducing NFkB and/or necroptosis.

      Strengths:

      The well documented bioinformatical analyses provide a comprehensive data basis of the presence/absence of RIP-kinases, other RHIM proteins, apoptosis signaling proteins (FADD,CASP8,CASP10) and some other genes involved in these pathway. Several of these genes are known to be missing in certain animal lineages, which raises the question why their canonical binding partners are present in these species. By expressing several such proteins (both wildtype and mutants destroying particular interaction regions) in human cells, the authors succeed in establishing a general role of RIPK3 and RIPK1 in NFkB activation. This function appears to be better conserved and more universal than the necroptotic function of the RHIM proteins. The authors also scrutinize the importance of the kinase function and RHIM integrity for these separate functionalities.

      Weaknesses:

      A weakness of the presented study is the experimental restriction to human HEK293 cells. There are several situations where the functionality of proteins from distant organisms (like lampreys or even mussels) in human cells is not necessarily indicative of their function in native context. In some cases, these problems are addressed by co-expressing potential interaction partners, but not all of these experiments are really informative. However, I agree with the authors that it is not possible to perform all the experiments in native cells, and that comparing all proteins in the same (human) cell type allows for a better comparison.

      The conclusions drawn by the authors are supported by convincing evidence. I have no doubts that this study will be very useful for future studies addressing the evolution of necroptosis and its regulation by NFkB and apoptosis.

    3. Reviewer #3 (Public review):

      In this study, the authors employ both computational and experimental methods to reveal functional conservation of RIP family kinases and associated proteins in animals, with particular focus on mammals and other major groups of vertebrates. The bionformatic part of the work involves genomic data from diverse animal groups, providing insightful data on loss and duplications patterns for RIP and other necroptosis-related genes, and positive selection signals for RIPK1/3 genes in certain mammalian clades. These findings are then extensively used for selecting species and RHIM tetrad candidates for further experiments, in which the authors demonstrate different modes of functional conservation for RIPK proteins in necroptosis and NF-kB signaling across vertebrate species.

      As an only major drawback, I would mention several important findings which the authors make in the course of their research but do not pursue further in the experimental part of the paper. These include:

      • An additional copy for RIPK2 (RIPK2B) found in monotremes and non-mammalian vertebrates and its functions;<br /> • The entire diversity of RHIM functional tetrad variants; of particular interest here are IQFG and IQLG tetrads specific for bats, which are known to harbor human-affecting viruses and were demonstrated to have their RIPK1/3 genes under positive selection in this study;<br /> • Functions and involvement of RIPK3 protein in NF-kB pathway in lampreys;<br /> • The mode of NF-kB activation in non-mammalian species retaining ZBP1 copies.

      Further elucidation of some or all of these points in the experimental part would facilitate conceptualizing the paper's numerous findings, which otherwise might appear insufficiently scrutinized. On the other hand, I agree that at least some of them require separate studies to be elucidated in. Given the importance of the results presented in this paper, I believe these points will be further addressed in future works.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript titled "Evolutionary and Functional Analyses Reveal a Role for the RHIM in Tuning RIPK3 Activity Across Vertebrates" by Fay et al. explores the function of RIPK gene family members across a wide range of vertebrate and invertebrate species through a combination of phylogenomics and functional studies. By overexpressing these genes in human cell lines, the authors examine their capacity to activate NF-κB and induce cell death. The methods employed are appropriate, with a thorough analysis of gene loss, positive selection, and functionality. While the study is well-executed and comprehensive, its broader relevance remains limited, appealing mainly to specialists in this specific field of research. It misses the opportunity to extract broader insights that could extend the understanding of these genes beyond evolutionary conservation, particularly by employing evolutionary approaches to explore more generalizable functions.

      Major comments:

      The main issue I encounter is distinguishing between what is novel in this study and what has been previously demonstrated. What new insights have been gained here that are of broader relevance? The discussion, which would be a good place to do so, is very speculative and has little to do with the actual results. Throughout the manuscript, there is little explanation of the study's importance beyond the fact that it was possible to conduct it. Is the evolutionary analysis being used to advance our understanding of gene function, or is the focus merely on how these genes behave across different species? The former would be exciting, while the latter feels less impactful.

      We thank the reviewer for the positive feedback. With regard to the major comment, we have now made changes throughout the revised manuscript to highlight the novel insights that emerge from our work, as well as the importance of using evolutionary and functional analyses to understand gene function. 

      Reviewer #2 (Public review):

      Summary:

      By combining bioinformatical and experimental approaches, the authors address the question of why several vertebrate lineages lack specific genes of the necroptosis pathway or those that regulate the interplay between apoptosis and necroptosis. The lack of such genes was already known from previous publications, but the current manuscript provides a more in-depth analysis and also uses experiments in human cells to address the question of the functionality of the remaining genes and pathways. A particular focus is placed on RIPK3/RIPK1 and their dual roles in inducing NFkB and/or necroptosis.

      Strengths:

      The well-documented bioinformatical analyses provide a comprehensive data basis of the presence/absence of RIP-kinases, other RHIM proteins, apoptosis signaling proteins (FADD, CASP8, CASP10), and some other genes involved in these pathways. Several of these genes are known to be missing in certain animal lineages, which raises the question of why their canonical binding partners are present in these species. By expressing several such proteins (both wildtype and mutants destroying particular interaction regions) in human cells, the authors succeed in establishing a general role of RIPK3 and RIPK1 in NFkB activation. This function appears to be better conserved and more universal than the necroptotic function of the RHIM proteins. The authors also scrutinize the importance of the kinase function and RHIM integrity for these separate functionalities.

      Weaknesses:

      A major weakness of the presented study is the experimental restriction to human HEK293 cells. There are several situations where the functionality of proteins from distant organisms (like lampreys or even mussels) in human cells is not necessarily indicative of their function in the native context. In some cases, these problems are addressed by co-expressing potential interaction partners, but not all of these experiments are really informative.

      A second weakness is that the manuscript addresses some interesting effects only superficially. By using host cells that are deleted for certain signaling components, a more focussed hypothesis could have been tested.

      Thus, while the aim of the study is mostly met, it could have been a bit more ambitious. The limited conclusions drawn by the authors are supported by convincing evidence. I have no doubts that this study will be very useful for future studies addressing the evolution of necroptosis and its regulation by NFkB and apoptosis.

      We thank the reviewer for the positive feedback. We agree that our study is limited by using HEK293 cells. However, we do not have appropriate cell lines for all species analyzed and therefore wished to use a single system to test all effects. As the reviewer points out, we do  co-express when possible, and are careful in the manuscript to not overextend our conclusions. We, like the reviewer, believe that many of the intriguingly findings in this study, which was intended to cover a broad range of species, will be useful for more in-depth studies in a given species.

      Reviewer #3 (Public review):

      This important study provides insights into the functional diversification of RIP family kinase proteins in vertebrate animals. The provided results, which combine bioinformatic and experimental analyses, will be of interest to specialists in both immunology and evolutionary biology. However, the computational part of the methodology is insufficiently covered in the paper and the experimental results would benefit from including data for additional species.

      We thank the reviewer for the positive feedback. As described below, we have now addressed the concerns about the description of the computational methods.

      (1) In the Methods section concerning gene loss analysis, the authors refer to the 'Phylogenetic analysis' section for details of RIPK sequence acquisition and alignment procedure. This section is missing from the manuscript as provided. In its absence, it is hard for the reviewer to provide relevant comments on gene presence/absence analysis.

      We have expanded the gene loss analysis methods to be more comprehensive. 

      (2) In the same section, the authors state that gene sequences were filtered and grouped based on the initial gene tree pattern (lines 448-449). How exactly did the authors filter the non-RIP kinases and other irrelevant homologs from the gene trees? Did they consider the reciprocal best (BLAST) hit approach or similar approaches for orthology inference? Did they also encounter potential pseudogenes of genes marked as missing in Figure 1C? Will the gene trees mentioned be available as supplementary files?

      We have expanded the gene loss analysis methods to be more comprehensive. 

      (3) The authors state the presence of additional RIPK2 paralog in non-therian vertebrates.

      The ramifications of this paralog loss in therians are not discussed in the text, although RIPK2 is also involved in NF-kB activation. In addition, the RIPK2B gene loss pattern is shunned from Figure 1C to Supplementary Figure 4, despite posing comparable interest to the reader.

      We are also intrigued by the RIPK2/RIPK2B data and felt it important to include our findings here, however we do not have functional data for RIPK2B at this point and feel it is better suited for a separate study. We therefore focused both the title and the main figures on RIPK3, for which we have functional data.

      (4) The authors present evidence for (repeated) positive selection in both RIPK1 and RIPK3 in bats; however, neither bat RIPK1/3 orthologs nor bat-specific RHIM tetrad variants (IQFG, IQLG) are considered in the experimental part of the work.

      We included a tetrad variant (VQFG) that is found in bats and multiple other species. We wanted to test a wide range of variant amino acids, so testing both IQFG (found only in bats) and VQFG (found in bats and multiple other diverse species) was not of high importance.

      (5) The authors present gene presence/absence patterns for zebra mussels as an outgroup of vertebrate species analyzed. From the evolutionary perspective, adding results for a closer invertebrate group, such as lancelets, tunicates, or echinoderms, would be beneficial for reconstructing the evolutionary progression of RIPK-mediated immune functions in animals.

      In our initial analyses, we searched for RIPK-like proteins in cnidarians, arthropods, nematodes, amoeba, and spiralia, with only spiralia species containing proteins with substantial homology to vertebrate RIPK1 proteins, as defined by a homologous N-terminal kinase domain and C-terminal RHIM and death domain. We have expanded this analysis to include lancelets, tunicates, and echinoderms and found several lancelet species with RIPK1 like proteins. These data have been added to the manuscript.

      (6) In the broader sense, the list of non-mammalian species included in the study is not explained or substantiated in the text. What was the rationale behind selecting lizards, turtles, and lampreys for experimental assays? Why was turtle RIPK3 but not turtle RIPK1CT protein used for functional tests? Which results do the authors expect to observe if amphibian or teleost RIPK1/3 are included in the analysis, especially those with divergent tetrad variants?

      We have added additional text to define our rationale for selecting which species were tested. 

      (7) For lamprey RIPK3, the observed NF-kB activity levels still remain lower than those of mammalian and reptilian orthologs even after catalytic tetrad modification. In the same way, switching human RIPK3 catalytic tetrad to that of lamprey does not result in NF-kB activation. What are the potential reasons for the observed difference? Does it mean that lamprey's RIPK3 functions in NF-kB activation are, at least partially, delegated to RIPK1?

      The function of lamprey RIPK3 is intriguing, albeit unknown. The reduced activation in human cells may be due to an incompatibility between lamprey RIPK3 and human NF-kB machinery, or it may not function in NF-kB at all. Considering that lamprey do not have other components of the known mammalian necroptosis pathway, it is unclear what function RIPK3 would serve in these species. It is possible lamprey may have a necroptosis pathway that is RIPK3-dependent but distinct from the mammalian pathway. It is an interesting question for future study. 

      (8) In lines 386-388, the authors state that 'only non-mammalian RIPK1CT proteins required the RHIM for maximal NF-kB activation', which is corroborated by results in Figure 4B. The authors further associate this finding with a lack of ZBP1 in the respective species (lines 388-389). However, non-squamate reptiles seem to retain ZBP1, as suggested by

      Supplementary Table 1. Given that, do the authors expect to observe RHIM-independent (maximal) NF-kB activation in turtles and crocodilians or respective RIPK1CT-transfected cells?

      While turtles and crocodiles do retain ZBP1, it is still unclear if they are able to activate ZBP1/RIPK3/MLKL-dependent necroptosis similar to mammals, especially given the divergence in the turtle ZBP1 RHIMs seen in Figure 4C. Future studies will be needed to further test our hypotheses and to continue to characterize innate immune function and evolution across a range of vertebrate species. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) The title is somewhat restrictive, as it only mentions RIPK3, despite the manuscript covering a broader range of RIPKs and associated proteins.

      We agree that a title that encompasses both the breadth of our study and the depth with which we analyzed RIPK3 would be ideal. However, we were unable to come up with a succinct title that conveyed both points appropriately, so opted for one that focused on our RIPK3 insights.

      (2) Several supplementary figures contain valuable information that could be incorporated into the main figures for greater clarity and emphasis.

      We agree that many interesting pieces of data are in the supplement. We felt it was important to include those data in the manuscript, but also wanted to keep the main manuscript figures as focused as possible.  

      Reviewer #2 (Recommendations for the authors):

      (1) I do not fully agree with the claim that caspase-8 is absent from fish. I briefly repeated this part of the analysis and found several fish proteins that cluster with caspase-8 rather than caspase-10 or cFLIP. From the method section, it does not really become clear how the Casp8/Casp10/cFLIP decision was made, and particularly, how cases were addressed where Genew predate the caspase-8/caspase-10 split. To name just a few examples, the authors might check uniprot:A0A444UA91, W5MXS4, or A0A8X8BKJ8 for being fish Caspase-8 candidates.

      We thank the reviewer for their critical analysis. CASP8 and CASP10 are very similar proteins in humans. We are distinguishing between the two based on vertebrate phylogeny with outgroup proteins (CASP2, CASP9, and CFLAR, see tree in Author response image 1 below) to help define the CASP8/CASP10 clade. Once we isolate CASP8/10, we build an additional tree to distinguish CASP8 and CASP10. Using this method, all fish CASP8/10-like proteins cluster with the mammalian CASP10 clade rather than the CASP8 clade, despite many fish proteins being annotated as CASP8 or CASP8-like. We do acknowledge that, because of the similarities between CASP8 and CASP10, there are likely proteins that can fall in either clade depending on which outgroups are included. To this end, we have updated our gene loss figure to only denote whether a species has no CASP8/10, a single CASP8/10 protein, or both CASP8 and CASP10. We have also updated our methods to better define how we completed our analyses. 

      Author response image 1.

      (2) While analyzing which RIPK3 protein causes cell death (lines 188ff), the underlying assumption is that the heterologous RIPK3 proteins can interact with human MLKL and activate it by phosphorylation. No attempts are being made to check if MLKL actually gets phosphorylated, and this issue is also not discussed. In Figure 2C, cell death is either measured by RIPK3 overexpression alone or by the additional overexpression of ZBP1 and MLKL. However, it is not shown if in all cases all the transfected proteins are expressed at a comparable level, or if the observed cell death might be caused by MLKL/ZBP1 overexpression alone.

      Cell death is dependent on expression of ZBP1, MLKL, and RIPK3, as shown in

      Supplementary Figure 6. We have attempted to detect phospho-MLKL via western blot. However, in these overexpression assays, we are able to detect phospho-MLKL in the presence of RIPK3 and MLKL alone, independent of activation of cell death. In fact, we see reduced phospho-MLKL and reduced expression of MLKL overall when ZBP1, MLKL, and RIPK3 are added, presumably due to cell death induced in these conditions (see blot in Author response image 2 below). We therefore felt these data were of limited use here.

      Author response image 2.

      (3) The manuscript describes a well-documented bioinformatical analysis and acknowledges the body of earlier published work on necroptosis evolution and associated gene losses. However, when discussing the RHIM-related aspects, the authors do not mention previous publications on RHIM conservation in invertebrates and even fungal proteins such as Het-S. They also fail to mention/discuss the amyloid-forming properties of RHIMs, which I consider crucial for understanding the function of RHIM-containing proteins.

      We thank the reviewer for their insight. We have added additional points on both RHIM conservation and amyloid formation.

      (4) Related to the above issue: In lines 226ff, the induction of NFkB by RIPK3 overexpression is described. While RIPK3 from other mammals requires endogenous (human) RIPK1 to be present, lizard and turtle RIPK3 do not require human RIPK1 but *do* require functional RHIMs. It is not checked (or at least discussed) if RHIM amyloid formation is required, nor if the RHIM of the heterologous RIPK3 might act through interaction with endogenous (human) RIPK3.

      We and others (PMID: 29073079) did not detect RIPK3 protein in HEK293T cells. This, combined with the requirement for exogenous RIPK3 to activate cell death, indicate that endogenous RIPK3 is not contributing to these assays. 

      (5) In lines 275ff, the authors observe that RIPK1s from other mammalian species do not require the RHIM for NFkB activation, while RIPK1 from non-mammalian species do require the RHIM. I wonder why the (in my opinion) most obvious explanation is not addressed: Maybe the mammalian RIPK1 proteins are similar enough to the human one so that they can signal on their own, while the more distant RIPK1 cannot and thus require human RIPK1 (associated via RHIMs) for NFkB activation? Since the authors used RIPK1-deficient cells in previous experiments, wouldn't it make sense to test them here, too?

      It is intriguing that the more diverged RIPK1 species require the RHIM for NF-kB signaling. In Supplementary Figure 12, we do test the mammalian and non-mammalian proteins in RIPK1 KO cells and all proteins are able to activate NF-kB. So while nonmammalian RIPK1 signaling is dependent on the RHIM, it is independent of endogenous RIPK1.  

      Minor comments:

      (1) In the legend of Figure 1, there is a typo "heat amp".

      This typo has now been corrected.

      (2) In Figure 3A, the term "FUBAR" is not explained at all.

      FUBAR has now been defined in the methods section.

      Reviewer #3 (Recommendations for the authors):

      A few typos and graph inconsistencies have been encountered in the course of the manuscript, e.g.:

      (1) Line 168: 'heat amp' -> 'heat map'.

      (2) Lines 290-291: 'known mediate' -> 'known to mediate' (?)

      We thank the reviewer for catching these mistakes. They have been corrected. 

      (3) Supplementary Figure 12: Are human RIPK1 results presented in both 'mammalian' and 'non-mammalian' parts of the figure? If so, why do human data differ between the graphs?

      Mammalian and non-mammalian data were collected in separate experiments with human RIPK1 used as a control for both. The human data shown in the two graphs represent two separate experiments.

    1. eLife Assessment

      This important study investigates the molecular mechanisms by which the p53 isoforms Δ133p53α and Δ160p53α exert dominant-negative effects on full-length p53 (FLp53). Through a combination of chromatin immunoprecipitation, transcriptional reporter assays, subcellular localization analyses, and protein aggregation experiments, the authors provide solid evidence that these N-terminally truncated isoforms promote co-aggregation with FLp53, disrupting its transcriptional activity and cellular distribution. The revised manuscript successfully addresses prior reviewer concerns, and the findings are well supported by the experimental data.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The authors have provided a mechanism by which how presence of truncated P53 can inactivate function of full length P53 protein. The authors proposed this happens by sequestration of full length P53 by truncated P53. In the study, the performed experiments are well described.

      Significance:

      The work in significant, since it points out more mechanistic insight how wild type full length P53 could be inactivated in the presence of truncated isoforms, this might offer new opportunity to recover P53 function as treatment strategies against cancer.

      Comments on latest version:

      The authors have made significant effort to address my concerns using the system available to them. I find the justifications provided in the rebuttal letter and the revised figures satisfactory. My initial concerns regarding the overexpression system have been largely addressed. However, the experimental system used by the authors lacks the means to measure the effect on endogenous p53, which remains a limitation.

    3. Reviewer #2 (Public review):

      Summary:

      The revised manuscript by Zhao and colleagues presents a novel and compelling investigation into the p53 isoforms, Δ133p53 and Δ160p53, which are implicated in aggressive cancer phenotypes. The primary goal of this study was to elucidate how these isoforms exert a dominant-negative impact on the activity of full-length p53 (FLp53). The authors demonstrate that the Δ133p53 and Δ160p53 isoforms display impaired binding to p53-regulated promoters. Their findings suggest that the dominant-negative effects observed are primarily due to the co-aggregation of FLp53 with Δ133p53 and Δ160p53.

      Overall, the study is innovative, thoroughly executed, and supported by robust data analysis. The authors have effectively addressed the reviewers' criticisms and incorporated their suggestions in this revised manuscript.

      Significance:

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the co-aggregation of FLp53 with Δ133p53 and Δ160p53.

    4. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Authors has provided a mechanism by which how presence of truncated P53 can inactivate function of full length P53 protein. Authors proposed this happens by sequestration of full length P53 by truncated P53.

      In the study, performed experiments are well described.

      My area of expertise is molecular biology/gene expression, and I have tried to provide suggestions on my area of expertise. The study has been done mainly with overexpression system and I have included few comments which I can think can be helpful to understand effect of truncated P53 on endogenous wild type full length protein. Performing experiments on these lines will add value to the observation according to this reviewer.

      Major comments:

      (1) What happens to endogenous wild type full length P53 in the context of mutant/truncated isoforms, that is not clear. Using a P53 antibody which can detect endogenous wild type P53, can authors check if endogenous full length P53 protein is also aggregated as well? It is hard to differentiate if aggregation of full length P53 happens only in overexpression scenario, where lot more both of such proteins are expressed. In normal physiological condition P53 expression is usually low, tightly controlled and its expression get induced in altered cellular condition such as during DNA damage. So, it is important to understand the physiological relevance of such aggregation, which could be possible if authors could investigate effect on endogenous full length P53 following overexpression of mutant isoforms.

      Thank you very much for your insightful comments.

      (1) To address “what happens to endogenous wild-type full-length P53 in the context of mutant/truncated isoforms," we employed a human A549 cell line expressing endogenous wild-type p53 under DNA damage conditions such as an etoposide treatment(1). We choose the A549 cell line since similar to H1299, it is a lung cancer cell line (www.atcc.org). For comparison, we also transfected the cells with 2 μg of V5-tagged plasmids encoding FLp53 and its isoforms Δ133p53 and Δ160p53. As shown in Author response image 1A, lanes 1 and 2, endogenous p53 expression, remained undetectable in A549 cells despite etoposide treatment, which limits our ability to assess the effects of the isoforms on the endogenous wild-type FLp53. We could, however, detect the V5-tagged FLp53 expressed from the plasmid using anti-V5 (rabbit) as well as with antiDO-1 (mouse) antibody (Author response image 1). The latter detects both endogenous wildtype p53 and the V5-tagged FLp53 since the antibody epitope is within the Nterminus (aa 20-25). This result supports the reviewer’s comment regarding the low level of expression of endogenous p53 that is insufficient for detection in our experiments.   

      In summary, in line with the reviewer’s comment that ‘under normal physiological conditions p53 expression is usually low,’ we could not detect p53 with an anti-DO-1 antibody. Thus, we proceeded with V5/FLAG-tagged p53 for detection of the effects of the isoforms on p53 stability and function. We also found that protein expression in H1299 cells was more easily detectable than in A549 cells (Compare Author response image 1A and B). Thus, we decided to continue with the H1299 cells (p53-null), which would serve as a more suitable model system for this study.  

      (2) We agree with the reviewer that ‘It is hard to differentiate if aggregation of full-length p53 happens only in overexpression scenario’. However, it is not impossible to imagine that such aggregation of FLp53 happens under conditions when p53 and its isoforms are over-expressed in the cell. Although the exact physiological context is not known and beyond the scope of the current work, our results indicate that at higher expression, p53 isoforms drive aggregation of FLp53. Given the challenges of detecting endogenous FLp53, we had to rely on the results obtained with plasmid mediated expression of p53 and its isoforms in p53-null cells.

      Author response image 1.

      Comparative analysis of protein expression in A549 and H1299 cells. (A) A549 cells (p53 wild-type) were treated with etoposide to induce endogenous wild-type p53 expression. To assess the effects of FLp53 and its isoforms Δ133p53 and Δ160p53 on endogenous wild-type p53 aggregation, A549 cells were transfected with 2 μg of V5-tagged p53 expression plasmids, with or without etoposide (20μM for 8h) treatment. Western blot analysis was done with the anti-V5 (rabbit) to detect V5-tagged proteins and anti-DO-1 (mouse), the latter detects both endogenous wild-type p53 and V5-tagged FLp53. The merged image corresponds to the overlay between the V5 and DO1 antibody signals. (B) H1299 cells (p53-null) were transfected with 2 μg V5tagged p53 expression plasmids or the empty vector control pcDNA3.1. Western blot analysis was done with the anti-V5 (mouse) antibody. 

      (2) Can presence of mutant P53 isoforms can cause functional impairment of wild type full length endogenous P53? That could be tested as well using similar ChIP assay authors has performed, but instead of antibody against the Tagged protein if the authors could check endogenous P53 enrichment in the gene promoter such as P21 following overexpression of mutant isoforms. May be introducing a condition such as DNA damage in such experiment might help where endogenous P53 is induced and more prone to bind to P53 target such as P21.

      Thank you very much for your valuable comments and suggestions. To investigate the potential functional impairment of endogenous wild-type p53 by p53 isoforms, we initially utilized A549 cells (p53 wild-type), aiming to monitor endogenous wild-type p53 expression following DNA damage. However, as mentioned and demonstrated in Author response image 1, endogenous p53 expression was too low to be detected under these conditions, making the ChIP assay for analyzing endogenous p53 activity unfeasible. Thus, we decided to utilize plasmid-based expression of FLp53 and focus on the potential functional impairment induced by the isoforms.

      (3) On similar lines, authors described:

      "To test this hypothesis, we escalated the ratio of FLp53 to isoforms to 1:10. As expected, the activity of all four promoters decreased significantly at this ratio (Figure 4A-D). Notably, Δ160p53 showed a more potent inhibitory effect than Δ133p53 at the 1:5 ratio on all promoters except for the p21 promoter, where their impacts were similar (Figure 4E-H). However, at the 1:10 ratio, Δ133p53 and Δ160p53 had similar effects on all transactivation except for the MDM2 promoter (Figure 4E-H)."

      Again, in such assay authors used ratio 1:5 to 1:10 full length vs mutant. How authors justify this result in context (which is more relevant context) where one allele is Wild type (functional P53) and another allele is mutated (truncated, can induce aggregation). In this case one would except 1:1 ratio of full-length vs mutant protein, unless other regulation is going which induces expression of mutant isoforms more than wild type full length protein. Probably discussing on these lines might provide more physiological relevance to the observed data.

      Thank you for raising this point regarding the physiological relevance of the ratios used in our study.

      (1) In the revised manuscript (lines 193-195), we added in this direction that “The elevated Δ133p53 protein modulates p53 target genes such as miR‑34a and p21, facilitating cancer development(2, 3). To mimic conditions where isoforms are upregulated relative to FLp53, we increased the ratios to 1:5 and 1:10.” This approach aims to simulate scenarios where isoforms accumulate at higher levels than FLp53, which may be relevant in specific contexts, as also elaborated above.

      (2) Regarding the issue of protein expression, where one allele is wild-type and the other is isoform, this assumption is not valid in most contexts. First, human cells have two copies of TPp53 gene (one from each parent). Second, the TP53 gene has two distinct promoters: the proximal promoter (P1) primarily regulates FLp53 and ∆40p53, whereas the second promoter (P2) regulates ∆133p53 and ∆160p53(4, 5). Additionally, ∆133TP53 is a p53 target gene(6, 7) and the expression of Δ133p53 and FLp53 is dynamic in response to various stimuli. Third, the expression of p53 isoforms is regulated at multiple levels, including transcriptional, post-transcriptional, translational, and post-translational processing(8). Moreover, different degradation mechanisms modify the protein level of p53 isoforms and FLp53(8). These differential regulation mechanisms are regulated by various stimuli, and therefore, the 1:1 ratio of FLp53 to ∆133p53 or ∆160p53 may be valid only under certain physiological conditions. In line with this, varied expression levels of FLp53 and its isoforms, including ∆133p53 and ∆160p53, have been reported in several studies(3, 4, 9, 10). 

      (3) In our study, using the pcDNA 3.1 vector under the human cytomegalovirus (CMV) promoter, we observed moderately higher expression levels of ∆133p53 and ∆160p53 relative to FLp53 (Author response image 1B). This overexpression scenario provides a model for studying conditions where isoform accumulation might surpass physiological levels, impacting FLp53 function. By employing elevated ratios of these isoforms to FLp53, we aim to investigate the potential effects of isoform accumulation on FLp53.

      (4) Finally does this altered function of full length P53 (preferably endogenous one) in presence of truncated P53 has any phenotypic consequence on the cells (if authors choose a cell type which is having wild type functional P53). Doing assay such as apoptosis/cell cycle could help us to get this visualization.

      Thank you for your insightful comments. In the experiment with A549 cells (p53 wild-type), endogenous p53 levels were too low to be detected, even after DNA damage induction. The evaluation of the function of endogenous p53 in the presence of isoforms is hindered, as mentioned above. In the revised manuscript, we utilized H1299 cells with overexpressed proteins for apoptosis studies using the Caspase-Glo® 3/7 assay (Figure 7). This has been shown in the Results section (lines 254-269). “The Δ133p53 and Δ160p53 proteins block pro-apoptotic function of FLp53.

      One of the physiological read-outs of FLp53 is its ability to induce apoptotic cell death(11). To investigate the effects of p53 isoforms Δ133p53 and Δ160p53 on FLp53-induced apoptosis, we measured caspase-3 and -7 activities in H1299 cells expressing different p53 isoforms (Figure 7). Caspase activation is a key biochemical event in apoptosis, with the activation of effector caspases (caspase-3 and -7) ultimately leading to apoptosis(12). The caspase-3 and -7 activities induced by FLp53 expression was approximately 2.5 times higher than that of the control vector (Figure 7). Co-expression of FLp53 and the isoforms Δ133p53 or Δ160p53 at a ratio of 1: 5 significantly diminished the apoptotic activity of FLp53 (Figure 7). This result aligns well with our reporter gene assay, which demonstrated that elevated expression of Δ133p53 and Δ160p53 impaired the expression of apoptosis-inducing genes BAX and PUMA (Figure 4G and H). Moreover, a reduction in the apoptotic activity of FLp53 was observed irrespective of whether Δ133p53 or Δ160p53 protein was expressed with or without a FLAG tag (Figure 7). This result, therefore, also suggests that the FLAG tag does not affect the apoptotic activity or other physiological functions of FLp53 and its isoforms. Overall, the overexpression of p53 isoforms Δ133p53 and Δ160p53 significantly attenuates FLp53-induced apoptosis, independent of the protein tagging with the FLAG antibody epitope.”

      Referees cross-commenting

      I think the comments from the other reviewers are very much reasonable and logical.

      Especially all 3 reviewers have indicated, a better way to visualize the aggregation of full-length wild type P53 by truncated P53 (such as looking at endogenous P53# by reviewer 1, having fluorescent tag #by reviewer 2 and reviewer 3 raised concern on the FLAG tag) would add more value to the observation.

      Thank you for these comments. The endogenous p53 protein was undetectable in A549 cells induced by etoposide (Figure R1A). Therefore, we conducted experiments using FLAG/V5-tagged FLp53.  To avoid any potential side effects of the FLAG tag on p53 aggregation, we introduced untagged p53 isoforms in the H1299 cells and performed subcellular fractionation. Our revised results, consistent with previous FLAG-tagged p53 isoforms findings, demonstrate that co-expression of untagged isoforms with FLAG-tagged FLp53 significantly induced the aggregation of FLAG-FLp53, while no aggregation was observed when FLAG-tagged FLp53 was expressed alone (Supplementary Figure 6). These results clearly indicate that the FLAG tag itself does not contribute to protein aggregation. 

      Additionally, we utilized the A11 antibody to detect protein aggregation, providing additional validation (Figure 8 from Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137). Given that the fluorescent proteins (~30 kDa) are substantially bigger than the tags used here (~1 kDa) and may influence oligomerization (especially GFP), stability, localization, and function of p53 and its isoforms, we avoided conducting these vital experiments with such artificial large fusions. 

      Reviewer #1 (Significance):

      The work in significant, since it points out more mechanistic insight how wild type full length P53 could be inactivated in the presence of truncated isoforms, this might offer new opportunity to recover P53 function as treatment strategies against cancer.

      Thank you for your insightful comments. We appreciate your recognition of the significance of our work in providing mechanistic insights into how wild-type FLp53 can be inactivated by truncated isoforms. We agree that these findings have potential for exploring new strategies to restore p53 function as a therapeutic approach against cancer. 

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the coaggregation of FLp53 with Δ133p53 and Δ160p53.

      This study is innovative, well-executed, and supported by thorough data analysis. However, the authors should address the following points:

      (1) Introduction on Aggregation and Co-aggregation: Given that the focus of the study is on the aggregation and co-aggregation of the isoforms, the introduction should include a dedicated paragraph discussing this issue. There are several original research articles and reviews that could be cited to provide context.

      Thank you very much for the valuable comments. We have added the following paragraph in the revised manuscript (lines 74-82): “Protein aggregation has become a central focus of modern biology research and has documented implications in various diseases, including cancer(13, 14, 15). Protein aggregates can be of different types ranging from amorphous aggregates to highly structured amyloid or fibrillar aggregates, each with different physiological implications. In the case of p53, whether protein aggregation, and in particular, co-aggregation with large N-terminal deletion isoforms, plays a mechanistic role in its inactivation is yet underexplored. Interestingly, the Δ133p53β isoform has been shown to aggregate in several human cancer cell lines(16). Additionally, the Δ40p53α isoform exhibits a high aggregation tendency in endometrial cancer cells(17). Although no direct evidence exists for Δ160p53 yet, these findings imply that p53 isoform aggregation may play a major role in their mechanisms of actions.”

      (2) Antibody Use for Aggregation: To strengthen the evidence for aggregation, the authors should consider using antibodies that specifically bind to aggregates.

      Thank you for your insightful suggestion. We addressed protein aggregation using the A11 antibody which specifically recognizes amyloid-like protein aggregates. We analyzed insoluble nuclear pellet samples prepared under identical conditions as described in Figure 6B. To confirm the presence of p53 proteins, we employed the anti-p53 M19 antibody (Santa Cruz, Cat No. sc-1312) to detect bands corresponding to FLp53 and its isoforms Δ133p53 and Δ160p53. The monomer FLp53 was not detected (Figure 8, lower panel, Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137), which may be attributed to the lower binding affinity of the anti-p53 M19 antibody to it. These samples were also immunoprecipitated using the A11 antibody (Thermo Fischer Scientific, Cat No. AHB0052) to detect aggregated proteins. Interestingly, FLp53 and its isoforms, Δ133p53 and Δ160p53, were clearly visible with Anti-A11 antibody when co-expressed at a 1:5 ratio suggesting that they underwent co-aggregation. However, no FLp53 aggregates were observed when it was expressed alone (Author response image 2). These results support the conclusion in our manuscript that Δ133p53 and Δ160p53 drive FLp53 aggregation. 

      Author response image 2.

      Induction of FLp53 Aggregation by p53 Isoforms Δ133p53 and Δ160p53. H1299 cells transfected with the FLAG-tagged FLp53 and V5-tagged Δ133p53 or Δ160p53 at a 1:5 ratio. The cells were subjected to subcellular fractionation, and the resulting insoluble nuclear pellet was resuspended in RIPA buffer. The samples were heated at 95°C until the pellet was completely dissolved, and then analyzed by Western blotting. Immunoprecipitation was performed using the A11 antibody, which specifically recognizes amyloid protein aggregates, and the anti-p53 M19 antibody, which detects FLp53 as well as its isoforms Δ133p53 and Δ160p53. 

      (3) Fluorescence Microscopy: Live-cell fluorescence microscopy could be employed to enhance visualization by labeling FLp53 and the isoforms with different fluorescent markers (e.g., EGFP and mCherry tags).

      We appreciate the suggestion to use live-cell fluorescence microscopy with EGFP and mCherry tags for the visualization FLp53 and its isoforms. While we understand the advantages of live-cell imaging with EGFP / mCherry tags, we restrained us from doing such fusions as the GFP or corresponding protein tags are very big (~30 kDa) with respect to the p53 isoform variants (~30 kDa).  Other studies have shown that EGFP and mCherry fusions can alter protein oligomerization, solubility and aggregation(18, 19) Moreover, most fluorescence proteins are prone to dimerization (i.e. EGFP) or form obligate tetramers (DsRed)(20, 21, 22), potentially interfering with the oligomerization and aggregation properties of p53 isoforms, particularly Δ133p53 and Δ160p53.

      Instead, we utilized FLAG- or V5-tag-based immunofluorescence microscopy, a well-established and widely accepted method for visualizing p53 proteins. This method provided precise localization and reliable quantitative data, which we believe meet the needs of the current study. We believe our chosen method is both appropriate and sufficient for addressing the research question.

      Reviewer #2 (Significance):

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the coaggregation of FLp53 with Δ133p53 and Δ160p53.

      We sincerely thank the reviewer for the thoughtful and positive comments on our manuscript and for highlighting the significance of our findings on the p53 isoforms, Δ133p53 and Δ160p53. 

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript entitled "Δ133p53 and Δ160p53 isoforms of the tumor suppressor protein p53 exert dominant-negative effect primarily by coaggregation", the authors suggest that the Δ133p53 and Δ160p53 isoforms have high aggregation propensity and that by co-aggregating with canonical p53 (FLp53), they sequestrate it away from DNA thus exerting a dominantnegative effect over it.

      First, the authors should make it clear throughout the manuscript, including the title, that they are investigating Δ133p53α and Δ160p53α since there are 3 Δ133p53 isoforms (α, β, γ), and 3 Δ160p53 isoforms (α, β, γ).

      Thank you for your suggestion. We understand the importance of clearly specifying the isoforms under study. Following your suggestion, we have added α in the title, abstract, and introduction and added the following statement in the Introduction (lines 57-59): “For convenience and simplicity, we have written Δ133p53 and Δ160p53 to represent the α isoforms (Δ133p53α and Δ160p53α) throughout this manuscript.” 

      One concern is that the authors only consider and explore Δ133p53α and Δ160p53α isoforms as exclusively oncogenic and FLp53 dominant-negative while not discussing evidences of different activities. Indeed, other manuscripts have also shown that Δ133p53α is non-oncogenic and non-mutagenic, do not antagonize every single FLp53 functions and are sometimes associated with good prognosis. To cite a few examples:

      (1) Hofstetter G. et al. D133p53 is an independent prognostic marker in p53 mutant advanced serous ovarian cancer. Br. J. Cancer 2011, 105, 15931599.

      (2) Bischof, K. et al. Influence of p53 Isoform Expression on Survival in HighGrade Serous Ovarian Cancers. Sci. Rep. 2019, 9,5244.

      (3) Knezovi´c F. et al. The role of p53 isoforms' expression and p53 mutation status in renal cell cancer prognosis. Urol. Oncol. 2019, 37, 578.e1578.e10.

      (4) Gong, L. et al. p53 isoform D113p53/D133p53 promotes DNA doublestrand break repair to protect cell from death and senescence in response to DNA damage. Cell Res. 2015, 25, 351-369.

      (5) Gong, L. et al. p53 isoform D133p53 promotes efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Sci. Rep. 2016, 6, 37281.

      (6) Horikawa, I. et al. D133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell Death Differ. 2017, 24, 1017-1028.

      (7) Gong, L. p53 coordinates with D133p53 isoform to promote cell survival under low-level oxidative stress. J. Mol. Cell Biol. 2016, 8, 88-90.

      Thank you very much for your comment and for highlighting these important studies. 

      We agree that Δ133p53 isoforms exhibit complex biological functions, with both oncogenic and non-oncogenic potentials. However, our mission here was primarily to reveal the molecular mechanism for the dominant-negative effects exerted by the Δ133p53α and Δ160p53α isoforms on FLp53 for which the Δ133p53α and Δ160p53α isoforms are suitable model systems. Exploring the oncogenic potential of the isoforms is beyond the scope of the current study and we have not claimed anywhere that we are reporting that. We have carefully revised the manuscript and replaced the respective terms e.g. ‘prooncogenic activity’ with ‘dominant-negative effect’ in relevant places (e.g. line 90). We have now also added a paragraph with suitable references that introduces the oncogenic and non-oncogenic roles of the p53 isoforms.

      After reviewing the papers you cited, we are not sure that they reflect on oncogenic /non-oncogenic role of the Δ133p53α isoform in different cancer cases.  Although our study is not about the oncogenic potential of the isoforms, we have summarized the key findings below:

      (1) Hofstetter et al., 2011: Demonstrated that Δ133p53α expression improved recurrence-free and overall survival (in a p53 mutant induced advanced serous ovarian cancer, suggesting a potential protective role in this context.

      (2) Bischof et al., 2019: Found that Δ133p53 mRNA can improve overall survival in high-grade serous ovarian cancers. However, out of 31 patients, only 5 belong to the TP53 wild-type group, while the others carry TP53 mutations.

      (3) Knezović et al., 2019: Reported downregulation of Δ133p53 in renal cell carcinoma tissues with wild-type p53 compared to normal adjacent tissue, indicating a potential non-oncogenic role, but not conclusively demonstrating it.

      (4) Gong et al., 2015: Showed that Δ133p53 antagonizes p53-mediated apoptosis and promotes DNA double-strand break repair by upregulating RAD51, LIG4, and RAD52 independently of FLp53.

      (5) Gong et al., 2016: Demonstrated that overexpression of Δ133p53 promotes efficiency of cell reprogramming by its anti-apoptotic function and promoting DNA DSB repair. The authors hypotheses that this mechanism is involved in increasing RAD51 foci formation and decrease γH2AX foci formation and chromosome aberrations in induced pluripotent stem (iPS) cells, independent of FL p53.

      (6) Horikawa et al., 2017: Indicated that induced pluripotent stem cells derived from fibroblasts that overexpress Δ133p53 formed noncancerous tumors in mice compared to induced pluripotent stem cells derived from fibroblasts with complete p53 inhibition. Thus, Δ133p53 overexpression is "non- or less oncogenic and mutagenic" compared to complete p53 inhibition, but it still compromises certain p53-mediated tumor-suppressing pathways. “Overexpressed Δ133p53 prevented FL-p53 from binding to the regulatory regions of p21WAF1 and miR-34a promoters, providing a mechanistic basis for its dominant-negative

      inhibition of a subset of p53 target genes.”

      (7) Gong, 2016: Suggested that Δ133p53 promotes cell survival under lowlevel oxidative stress, but its role under different stress conditions remains uncertain.

      We have revised the Introduction to provide a more balanced discussion of Δ133p53’s dule role (lines 62-73):

      “The Δ133p53 isoform exhibit complex biological functions, with both oncogenic and non-oncogenic potentials. Recent studies demonstrate the non-oncogenic yet context-dependent role of the Δ133p53 isoform in cancer development. Δ133p53 expression has been reported to correlate with improved survival in patients with TP53 mutations(23, 24), where it promotes cell survival in a nononcogenic manner(25, 26), especially under low oxidative stress(27). Alternatively, other recent evidences emphasize the notable oncogenic functions of Δ133p53 as it can inhibit p53-dependent apoptosis by directly interacting with the FLp53 (4, 6). The oncogenic function of the newly identified Δ160p53 isoform is less known, although it is associated with p53 mutation-driven tumorigenesis(28) and in melanoma cells’ aggressiveness(10). Whether or not the Δ160p53 isoform also impedes FLp53 function in a similar way as Δ133p53 is an open question. However, these p53 isoforms can certainly compromise p53-mediated tumor suppression by interfering with FLp53 binding to target genes such as p21 and miR-34a(2, 29) by dominant-negative effect, the exact mechanism is not known.” On the figures presented in this manuscript, I have three major concerns:

      (1) Most results in the manuscript rely on the overexpression of the FLAGtagged or V5-tagged isoforms. The validation of these construct entirely depends on Supplementary figure 3 which the authors claim "rules out the possibility that the FLAG epitope might contribute to this aggregation. However, I am not entirely convinced by that conclusion. Indeed, the ratio between the "regular" isoform and the aggregates is much higher in the FLAG-tagged constructs than in the V5-tagged constructs. We can visualize the aggregates easily in the FLAG-tagged experiment, but the imaging clearly had to be overexposed (given the white coloring demonstrating saturation of the main bands) to visualize them in the V5-tagged experiments. Therefore, I am not convinced that an effect of the FLAG-tag can be ruled out and more convincing data should be added. 

      Thank you for raising this important concern. We have carefully considered your comments and have made several revisions to clarify and strengthen our conclusions.

      First, to address the potential influence of the FLAG and V5 tags on p53 isoform aggregation, we have revised Figure 2 and removed the previous Supplementary Figure 3, where non-specific antibody bindings and higher molecular weight aggregates were not clearly interpretable. In the revised Figure 2, we have removed these potential aggregates, improving the clarity and accuracy of the data.

      To further rule out any tag-related artifacts, we conducted a coimmunoprecipitation assay with FLAG-tagged FLp53 and untagged Δ133p53 and Δ160p53 isoforms. The results (now shown in the new Supplementary Figure 3) completely agree with our previous result with FLAG-tagged and V5tagged Δ133p53 and Δ160p53 isoforms and show interaction between the partners. This indicates that the FLAG / V5-tags do not influence / interfere with the interaction between FLp53 and the isoforms. We have still used FLAGtagged FLp53 as the endogenous p53 was undetectable and the FLAG-tagged FLp53 did not aggregate alone. 

      In the revised paper, we added the following sentences (Lines 146-152): “To rule out the possibility that the observed interactions between FLp53 and its isoforms Δ133p53 and Δ160p53 were artifacts caused by the FLAG and V5 antibody epitope tags, we co-expressed FLAG-tagged FLp53 with untagged Δ133p53 and Δ160p53. Immunoprecipitation assays demonstrated that FLAGtagged FLp53 could indeed interact with the untagged Δ133p53 and Δ160p53 isoforms (Supplementary Figure 3, lanes 3 and 4), confirming formation of hetero-oligomers between FLp53 and its isoforms. These findings demonstrate that Δ133p53 and Δ160p53 can oligomerize with FLp53 and with each other.”

      Additionally, we performed subcellular fractionation experiments to compare the aggregation and localization of FLAG-tagged FLp53 when co-expressed either with V5-tagged or untagged Δ133p53/Δ160p53. In these experiments, the untagged isoforms also induced FLp53 aggregation, mirroring our previous results with the tagged isoforms (Supplementary Figure 5). We’ve added this result in the revised manuscript (lines 236-245): “To exclude the possibility that FLAG or V5 tags contribute to protein aggregation, we also conducted subcellular fractionation of H1299 cells expressing FLAG-tagged FLp53 along with untagged Δ133p53 or Δ160p53 at a 1:5 ratio. The results showed (Supplementary Figure 6) a similar distribution of FLp53 across cytoplasmic, nuclear, and insoluble nuclear fractions as in the case of tagged Δ133p53 or Δ160p53 (Figure 6A to D). Notably, the aggregation of untagged Δ133p53 or Δ160p53 markedly promoted the aggregation of FLAG-tagged FLp53 (Supplementary Figure 6B and D), demonstrating that the antibody epitope tags themselves do not contribute to protein aggregation.” 

      We’ve also discussed this in the Discussion section (lines 349-356): “In our study, we primarily utilized an overexpression strategy involving FLAG/V5tagged proteins to investigate the effects of p53 isoforms Δ133p53 and Δ160p53 on the function of FLp53. To address concerns regarding potential overexpression artifacts, we performed the co-immunoprecipitation (Supplementary Figure 6) and caspase-3 and -7 activity (Figure 7) experiments with untagged Δ133p53 and Δ160p53. In both experimental systems, the untagged proteins behaved very similarly to the FLAG/V5 antibody epitopecontaining proteins (Figures 6 and 7 and Supplementary Figure 6). Hence, the C-terminal tagging of FLp53 or its isoforms does not alter the biochemical and physiological functions of these proteins.”

      In summary, the revised data set and newly added experiments provide strong evidence that neither the FLAG nor the V5 tag contributes to the observed p53 isoform aggregation.

      (2) The authors demonstrate that to visualize the dominant-negative effect, Δ133p53α and Δ160p53α must be "present in a higher proportion than FLp53 in the tetramer" and the need at least a transfection ratio 1:5 since the 1:1 ration shows no effect. However, in almost every single cell type, FLp53 is far more expressed than the isoforms which make it very unlikely to reach such stoichiometry in physiological conditions and make me wonder if this mechanism naturally occurs at endogenous level. This limitation should be at least discussed.

      Thank you for your insightful comment. However, evidence suggests that the expression levels of these isoforms such as Δ133p53, can be significantly elevated relative to FLp53 in certain physiological conditions(3, 4, 9). For example, in some breast tumors, with Δ133p53 mRNA is expressed at a much levels than FLp53, suggesting a distinct expression profile of p53 isoforms compared to normal breast tissue(4). Similarly, in non-small cell lung cancer and the A549 lung cancer cell line, the expression level of Δ133p53 transcript is significantly elevated compared to non-cancerous cells(3). Moreover, in specific cholangiocarcinoma cell lines, the Δ133p53 /TAp53 expression ratio has been reported to increase to as high as 3:1(9). These observations indicate that the dominant-negative effect of isoform Δ133p53 on FLp53 can occur under certain pathological conditions where the relative amounts of the FLp53 and the isoforms would largely vary. Since data on the Δ160p53 isoform are scarce, we infer that the long N-terminal truncated isoforms may share a similar mechanism.

      (3) Figure 5C: I am concerned by the subcellular location of the Δ133p53α and Δ160p53α as they are commonly considered nuclear and not cytoplasmic as shown here, particularly since they retain the 3 nuclear localization sequences like the FLp53 (Bourdon JC et al. 2005; Mondal A et al. 2018; Horikawa I et al, 2017; Joruiz S. et al, 2024). However, Δ133p53α can form cytoplasmic speckles (Horikawa I et al, 2017) when it colocalizes with autophagy markers for its degradation.

      The authors should discuss this issue. Could this discrepancy be due to the high overexpression level of these isoforms? A co-staining with autophagy markers (p62, LC3B) would rule out (or confirm) activation of autophagy due to the overwhelming expression of the isoform.

      Thank you for your thoughtful comments. We have thoroughly reviewed all the papers you recommended (Bourdon JC et al., 2005; Mondal A et al., 2018; Horikawa I et al., 2017; Joruiz S. et al., 2024)(4, 29, 30, 31). Among these, only the study by Bourdon JC et al. (2005) provided data regarding the localization of Δ133p53(4). Interestingly, their findings align with our observations, indicating that the protein does not exhibit predominantly nuclear localization in the Figure 8 from Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137. The discrepancy may be caused by a potentially confusing statement in that paper(4).

      The localization of p53 is governed by multiple factors, including its nuclear import and export(32). The isoforms Δ133p53 and Δ160p53 contain three nuclear localization sequences (NLS)(4). However, the isoforms Δ133p53 and Δ160p53 were potentially trapped in the cytoplasm by aggregation and masking the NLS. This mechanism would prevent nuclear import. 

      Further, we acknowledge that Δ133p53 co-aggregates with autophagy substrate p62/SQSTM1 and autophagosome component LC3B in cytoplasm by autophagic degradation during replicative senescence(33). We agree that high overexpression of these aggregation-prone proteins may induce endoplasmic reticulum (ER) stress and activates autophagy(34). This could explain the cytoplasmic localization in our experiments. However, it is also critical to consider that we observed aggregates in both the cytoplasm and the nucleus (Figures 6B and E and Supplementary Figure 6B). While cytoplasmic localization may involve autophagy-related mechanisms, the nuclear aggregates likely arise from intrinsic isoform properties, such as altered protein folding, independent of autophagy. These dual localizations reflect the complex behavior of Δ133p53 and Δ160p53 isoforms under our experimental conditions.

      In the revised manuscript, we discussed this in Discussion (lines 328-335): “Moreover, the observed cytoplasmic isoform aggregates may reflect autophagy-related degradation, as suggested by the co-localization of Δ133p53 with autophagy substrate p62/SQSTM1 and autophagosome component LC3B(33). High overexpression of these aggregation-prone proteins could induce endoplasmic reticulum stress and activate autophagy(34). Interestingly, we also observed nuclear aggregation of these isoforms (Figure 6B and E and Supplementary Figure 6B), suggesting that distinct mechanisms, such as intrinsic properties of the isoforms, may govern their localization and behavior within the nucleus. This dual localization underscores the complexity of Δ133p53 and Δ160p53 behavior in cellular systems.”

      Minor concerns:

      -  Figure 1A: the initiation of the "Δ140p53" is shown instead of "Δ40p53"

      Thank you! The revised Figure 1A has been created in the revised paper.

      -  Figure 2A: I would like to see the images cropped a bit higher, so the cut does not happen just above the aggregate bands

      Thank you for this suggestion. We’ve changed the image and the new Figure 2 has been shown in the revised paper.

      -  Figure 3C: what ratio of FLp53/Delta isoform was used?

      We have added the ratio in the figure legend of Figure 3C (lines 845-846) “Relative DNA-binding of the FLp53-FLAG protein to the p53-target gene promoters in the presence of the V5-tagged protein Δ133p53 or Δ160p53 at a 1: 1 ratio.”

      -  Figure 3C suggests that the "dominant-negative" effect is mostly senescencespecific as it does not affect apoptosis target genes, which is consistent with Horikawa et al, 2017 and Gong et al, 2016 cited above. Furthermore, since these two references and the others from Gong et al. show that Δ133p53α increases DNA repair genes, it would be interesting to look at RAD51, RAD52 or Lig4, and maybe also induce stress.

      Thank you for your thoughtful comments and suggestions. In Figure 3C, the presence of Δ133p53 or Δ160p53 only significantly reduced the binding of FLp53 to the p21 promoter. However, isoforms Δ133p53 and Δ160p53 demonstrated a significant loss of DNA-binding activity at all four promoters: p21, MDM2, and apoptosis target genes BAX and PUMA (Figure 3B). This result suggests that Δ133p53 and Δ160p53 have the potential to influence FLp53 function due to their ability to form hetero-oligomers with FLp53 or their intrinsic tendency to aggregate. To further investigate this, we increased the isoform to FLp53 ratio in Figure 4, which demonstrate that the isoforms Δ133p53 and Δ160p53 exert dominant-negative effects on the function of FLp53. 

      These results demonstrate that the isoforms can compromise p53-mediated pathways, consistent with Horikawa et al. (2017), which showed that Δ133p53α overexpression is "non- or less oncogenic and mutagenic" compared to complete p53 inhibition, but still affects specific tumor-suppressing pathways. Furthermore, as noted by Gong et al. (2016), Δ133p53’s anti-apoptotic function under certain conditions is independent of FLp53 and unrelated to its dominantnegative effects.

      We appreciate your suggestion to investigate DNA repair genes such as RAD51, RAD52, or Lig4, especially under stress conditions. While these targets are intriguing and relevant, we believe that our current investigation of p53 targets in this manuscript sufficiently supports our conclusions regarding the dominant-negative effect. Further exploration of additional p53 target genes, including those involved in DNA repair, will be an important focus of our future studies.

      - Figure 5A and B: directly comparing the level of FLp53 expressed in cytoplasm or nucleus to the level of Δ133p53α and Δ160p53α expressed in cytoplasm or nucleus does not mean much since these are overexpressed proteins and therefore depend on the level of expression. The authors should rather compare the ratio of cytoplasmic/nuclear FLp53 to the ratio of cytoplasmic/nuclear Δ133p53α and Δ160p53α.

      Thank you very much for this valuable suggestion. In the revised paper, Figure 5B has been recreated.  Changes have been made in lines 214215: “The cytoplasm-to-nucleus ratio of Δ133p53 and Δ160p53 was approximately 1.5-fold higher than that of FLp53 (Figure 5B).” 

      Referees cross-commenting

      I agree that the system needs to be improved to be more physiological.

      Just to precise, the D133 and D160 isoforms are not truncated mutants, they are naturally occurring isoforms expressed in almost every normal human cell type from an internal promoter within the TP53 gene.

      Using overexpression always raises concerns, but in this case, I am even more careful because the isoforms are almost always less expressed than the FLp53, and here they have to push it 5 to 10 times more expressed than the FLp53 to see the effect which make me fear an artifact effect due to the overwhelming overexpression (which even seems to change the normal localization of the protein).

      To visualize the endogenous proteins, they will have to change cell line as the H1299 they used are p53 null.

      Thank you for these comments. We’ve addressed the motivation of overexpression in the above responses. We needed to use the plasmid constructs in the p53-null cells to detect the proteins but the expression level was certainly not ‘overwhelmingly high’. 

      First, we tried the A549 cells (p53 wild-type) under DNA damage conditions, but the endogenous p53 protein was undetectable. Second, several studies reported increased Δ133p53 level compared to wild-type p53 and that it has implications in tumor development(2, 3, 4, 9). Third, the apoptosis activity of H1299 cells overexpressing p53 proteins was analyzed in the revised manuscript (Figure 7). The apoptotic activity induced by FLp53 expression was approximately 2.5 times higher than that of the control vector under identical plasmid DNA transfection conditions (Figure 7). These results rule out the possibility that the plasmid-based expression of p53 and its isoforms introduced artifacts in the results. We’ve discussed this in the Results section (lines 254269).

      Reviewer #3 (Significance):

      Overall, the paper is interesting particularly considering the range of techniques used which is the main strength.

      The main limitation to me is the lack of contradictory discussion as all argumentation presents Δ133p53α and Δ160p53α exclusively as oncogenic and strictly FLp53 dominant-negative when, particularly for Δ133p53α, a quite extensive literature suggests a not so clear-cut activity.

      The aggregation mechanism is reported for the first time for Δ133p53α and Δ160p53α, although it was already published for Δ40p53α, Δ133p53β or in mutant p53.

      This manuscript would be a good basic research addition to the p53 field to provide insight in the mechanism for some activities of some p53 isoforms.

      My field of expertise is the p53 isoforms which I have been working on for 11 years in cancer and neuro-degenerative diseases

      Thank you very much for your positive and critical comments. We’ve included a fair discussion on the oncogenic and non-oncogenic function of Δ133p53 in the Introduction following your suggestion (lines 62-73). 

      References

      (1) Pitolli C, Wang Y, Candi E, Shi Y, Melino G, Amelio I. p53-Mediated Tumor Suppression: DNA-Damage Response and Alternative Mechanisms. Cancers 11,  (2019).

      (2) Fujita K, et al. p53 isoforms Delta133p53 and p53beta are endogenous regulators of replicative cellular senescence. Nature cell biology 11, 1135-1142 (2009).

      (3) Fragou A, et al. Increased Δ133p53 mRNA in lung carcinoma corresponds with reduction of p21 expression. Molecular medicine reports 15, 1455-1460 (2017).

      (4) Bourdon JC, et al. p53 isoforms can regulate p53 transcriptional activity. Genes & development 19, 2122-2137 (2005).

      (5) Ghosh A, Stewart D, Matlashewski G. Regulation of human p53 activity and cell localization by alternative splicing. Molecular and cellular biology 24, 7987-7997 (2004).

      (6) Aoubala M, et al. p53 directly transactivates Δ133p53α, regulating cell fate outcome in response to DNA damage. Cell death and differentiation 18, 248-258 (2011).

      (7) Marcel V, et al. p53 regulates the transcription of its Delta133p53 isoform through specific response elements contained within the TP53 P2 internal promoter. Oncogene 29, 2691-2700 (2010).

      (8) Zhao L, Sanyal S. p53 Isoforms as Cancer Biomarkers and Therapeutic Targets. Cancers 14,  (2022).

      (9) Nutthasirikul N, Limpaiboon T, Leelayuwat C, Patrakitkomjorn S, Jearanaikoon P. Ratio disruption of the ∆133p53 and TAp53 isoform equilibrium correlates with poor clinical outcome in intrahepatic cholangiocarcinoma. International journal of oncology 42, 1181-1188 (2013).

      (10) Tadijan A, et al. Altered Expression of Shorter p53 Family Isoforms Can Impact Melanoma Aggressiveness. Cancers 13,  (2021).

      (11) Aubrey BJ, Kelly GL, Janic A, Herold MJ, Strasser A. How does p53 induce apoptosis and how does this relate to p53-mediated tumour suppression? Cell death and differentiation 25, 104-113 (2018).

      (12) Ghorbani N, Yaghubi R, Davoodi J, Pahlavan S. How does caspases regulation play role in cell decisions? apoptosis and beyond. Molecular and cellular biochemistry 479, 1599-1613 (2024).

      (13) Petronilho EC, et al. Oncogenic p53 triggers amyloid aggregation of p63 and p73 liquid droplets. Communications chemistry 7, 207 (2024).

      (14) Forget KJ, Tremblay G, Roucou X. p53 Aggregates penetrate cells and induce the coaggregation of intracellular p53. PloS one 8, e69242 (2013).

      (15) Farmer KM, Ghag G, Puangmalai N, Montalbano M, Bhatt N, Kayed R. P53 aggregation, interactions with tau, and impaired DNA damage response in Alzheimer's disease. Acta neuropathologica communications 8, 132 (2020).

      (16) Arsic N, et al. Δ133p53β isoform pro-invasive activity is regulated through an aggregation-dependent mechanism in cancer cells. Nature communications 12, 5463 (2021).

      (17) Melo Dos Santos N, et al. Loss of the p53 transactivation domain results in high amyloid aggregation of the Δ40p53 isoform in endometrial carcinoma cells. The Journal of biological chemistry 294, 9430-9439 (2019).

      (18) Mestrom L, et al. Artificial Fusion of mCherry Enhances Trehalose Transferase Solubility and Stability. Applied and environmental microbiology 85,  (2019).

      (19) Kaba SA, Nene V, Musoke AJ, Vlak JM, van Oers MM. Fusion to green fluorescent protein improves expression levels of Theileria parva sporozoite surface antigen p67 in insect cells. Parasitology 125, 497-505 (2002).

      (20) Snapp EL, et al. Formation of stacked ER cisternae by low affinity protein interactions. The Journal of cell biology 163, 257-269 (2003).

      (21) Jain RK, Joyce PB, Molinete M, Halban PA, Gorr SU. Oligomerization of green fluorescent protein in the secretory pathway of endocrine cells. The Biochemical journal 360, 645-649 (2001).

      (22) Campbell RE, et al. A monomeric red fluorescent protein. Proceedings of the National Academy of Sciences of the United States of America 99, 7877-7882 (2002).

      (23) Hofstetter G, et al. Δ133p53 is an independent prognostic marker in p53 mutant advanced serous ovarian cancer. British journal of cancer 105, 1593-1599 (2011).

      (24) Bischof K, et al. Influence of p53 Isoform Expression on Survival in High-Grade Serous Ovarian Cancers. Scientific reports 9, 5244 (2019).

      (25) Gong L, et al. p53 isoform Δ113p53/Δ133p53 promotes DNA double-strand break repair to protect cell from death and senescence in response to DNA damage. Cell research 25, 351-369 (2015).

      (26) Gong L, et al. p53 isoform Δ133p53 promotes efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Scientific reports 6, 37281 (2016).

      (27) Gong L, Pan X, Yuan ZM, Peng J, Chen J. p53 coordinates with Δ133p53 isoform to promote cell survival under low-level oxidative stress. Journal of molecular cell biology 8, 88-90 (2016).

      (28) Candeias MM, Hagiwara M, Matsuda M. Cancer-specific mutations in p53 induce the translation of Δ160p53 promoting tumorigenesis. EMBO reports 17, 1542-1551 (2016).

      (29) Horikawa I, et al. Δ133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell death and differentiation 24, 1017-1028 (2017).

      (30) Mondal AM, et al. Δ133p53α, a natural p53 isoform, contributes to conditional reprogramming and long-term proliferation of primary epithelial cells. Cell death & disease 9, 750 (2018).

      (31) Joruiz SM, Von Muhlinen N, Horikawa I, Gilbert MR, Harris CC. Distinct functions of wild-type and R273H mutant Δ133p53α differentially regulate glioblastoma aggressiveness and therapy-induced senescence. Cell death & disease 15, 454 (2024).

      (32) O'Brate A, Giannakakou P. The importance of p53 location: nuclear or cytoplasmic zip code? Drug resistance updates : reviews and commentaries in antimicrobial and anticancer chemotherapy 6, 313-322 (2003).

      (33) Horikawa I, et al. Autophagic degradation of the inhibitory p53 isoform Δ133p53α as a regulatory mechanism for p53-mediated senescence. Nature communications 5, 4706 (2014).

      (34) Lee H, et al. IRE1 plays an essential role in ER stress-mediated aggregation of mutant huntingtin via the inhibition of autophagy flux. Human molecular genetics 21, 101-114 (2012).

    1. eLife Assessment

      This important study advances our understanding of maladaptive innate immune training. The experimental evidence supporting the conclusions is convincing and the expert reviewers strongly endorse the manuscript. The work will be of high interest to both researchers in the trained immunity field and clinician scientists.

    2. Reviewer #1 (Public review):

      Summary:

      The concept that trained immunity, as defined, can be beneficial to subsequent immune challenges is important in the broad context of health and disease. The significance of this manuscript is the finding that trained immunity is actually a two-edged sword, herein, detrimental in the context of LPS-induced Acute Lung Injury that is mediated by AMs.

      Strengths:

      Several lines of evidence in different mouse models support this conclusion. The postulation that differences in immune responses in individuals is linked to differences in the mycobiome and consequent B-glucan makeup is provocative.

      Weaknesses:

      However, the findings that the authors state are relevant to sepsis are actually confined to a specific lung injury model and not classically-defined sepsis, the ontogeny of the reprogrammed AMs is uncertain, and links in the proposed signaling pathways need to be strengthened.

      Comments on the latest version:

      The manuscript is improved with further clarifications and additional experimentation. My prior concerns are addressed.

    3. Reviewer #2 (Public review):

      Summary:

      Prével et al. present an in vivo study in which they reveal an interesting aspect of β-glucan, a known inducer of enhanced immune responses termed trained immunity in sterile inflammation. The authors can show that β-glucan's can reprogram alveolar macrophages (AMs) in the lungs through neutrophils and IFNγ signaling and independent of Dectin1. This reprogramming occurs at both transcriptional and metabolic levels. After β-glucan training, LPS induced sterile inflammation exacerbated acute lung injury via enhanced immunopathology. These findings highlight a new aspect of β-glucan's role in trained immunity and its potential detrimental effects when enhanced pathogen clearance is not required.

      Strengths:

      - This manuscript is well-written and effectively conveys its message.

      - The authors provide important evidence that β-glucan training is not solely beneficial but depending on the context can also enhance immunopathology. This will be important to the field for two reasons. It shows again that trained immunity can also be harmful. Jentho et al. 2021 had already provided further evidence for this aspect. And it highlights anew that LPS application is an insufficient infection model.

      Original weaknesses noted:

      - Only a little physiological data from the in vivo models is provided.

      - Effects in histology appear to be rather weak.

      Comments on latest version:

      The authors have revised the new version according to my suggestions or responded in a sufficient manner to my requests, with one exception. I recommend to rename TNF as explained by Grimstad in JAMA Dermatol. 2016;152(5):557.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The concept that trained immunity, as defined, can be beneficial to subsequent immune challenges is important in the broad context of health and disease. The significance of this manuscript is the finding that trained immunity is actually a two-edged sword, herein, detrimental in the context of LPS-induced Acute Lung Injury that is mediated by AMs.

      Strengths:

      Several lines of evidence in different mouse models support this conclusion. The postulation that differences in immune responses in individuals are linked to differences in the mycobiome and consequent B-glucan makeup is provocative.

      Weaknesses:

      The findings that the authors state are relevant to sepsis, are actually confined to a specific lung injury model and not classically-defined sepsis. In addition, the ontogeny of the reprogrammed AMs is uncertain. Links in the proposed signaling pathways need to be strengthened.

      Reviewer #2 (Public review):

      Summary:

      Prével et al. present an in vivo study in which they reveal an interesting aspect of β-glucan, a known inducer of enhanced immune responses termed trained immunity in sterile inflammation. The authors can show, that β-glucan's can reprogram alveolar macrophages (AMs) in the lungs through neutrophils and IFNγ signaling and independent of Dectin1. This reprogramming occurs at both transcriptional and metabolic levels. After β-glucan training, LPS-induced sterile inflammation exacerbated acute lung injury via enhanced immunopathology. These findings highlight a new aspect of β-glucan's role in trained immunity and its potential detrimental effects when enhanced pathogen clearance is not required.

      Strengths:

      (1) This manuscript is well-written and effectively conveys its message.

      (2) The authors provide important evidence that β-glucan training is not solely beneficial, but depending on the context can also enhance immunopathology. This will be important to the field for two reasons. It shows again, that trained immunity can also be harmful. Jentho et al. 2021 have already provided further evidence for this aspect. And it highlights anew that LPS application is an insufficient infection model.

      Weaknesses:

      (1) Only a little physiological data is provided by the in vivo models.

      (2) The effects in histology appear to be rather weak.

      Reviewer #1 (Recommendations for the authors):

      The opening paragraph in the introduction focuses on sepsis. This is misleading since this manuscript does not address sepsis but rather intranasal-administered LPS-induced acute lung injury.

      We are in total agreement with the reviewer and have modified the introduction to focus on acute lung injury with clinical relevance more associated to TLR4-mediated acute lung injury and lung inflammation.

      The authors make definitive statements that AMs originate from fetal liver monocytes. However, it is well known that the ontogeny of AMs is complex and AMs can be populated, in part, from peripheral monocytes. The ontogeny of reprogrammed AMs was not addressed in this study but they may come from monocyte-derived AMs following B-glucan training (transfer of AMs into Csf2rb KO mice does not prove the contrary). In this regard, do, for example, the percentages of CD11b+ AMs change? More phenotyping of the control and reprogrammed AMs would enhance the interpretation of the findings.

      The reviewer is correct that the ontogeny of AMs can be heterogenous, especially following a pulmonary challenge. In β-glucan-treated mice, Figure 1I shows no changes in frequency or number of AMs in the BAL. As the reviewer suggested, we repeated this experiment and incorporate more markers for AMs. New Supplementary Figure 1C shows the expression of CD11b on AMs (CD11c<sup>+</sup>SiglecF<sup>+</sup>) from control and β-glucan-treated mice. While the frequency increases with LPS administration, we show no difference between control and β-glucan groups suggesting β-glucan does not induce the expansion of monocyte-derived AMs. Additionally, in New Supplementary Figure 1D, we show the expression of AM-associated markers in order to better delineate their phenotype. We observed no differences in MHCII, CD169, CD64 and F4/80 in β-glucan-treated mice, but an increase in CD80<SUP>+</SUP> AMs following βglucan suggesting enhanced activation corroborating their proinflammatory phenotype. Collectively, these data indicate that while the frequency and number of either yolk-sac or BMderived AMs are unchanged in the β-glucan treated mice, the activation of AMs is enhanced after the systemic treatment with β-glucan.

      The abstract seems to overpromise a bit. First, it mentions trained immunity and HSCs, but they don't seem to formally address either in the context of this model (there is reprogramming as assessed by transcriptome and metabolic analyses which is suggestive as stated by the authors, but do the changes overlap significantly with classically trained immunity?), and second, it links phenotypes together in a pathway(s) that they haven't actually interrogated - although they look at transcripts and do a seahorse assay they don't actually confirm that any of those findings are related to the increased response to LPS in vivo. The long discussion with all the caveats highlights these limitations, all relegated to future studies.

      We thank the reviewer for this comment. In response, we have revised the abstract to more accurately highlight the key findings of this study. Specifically, we introduced the concept of central trained immunity to describe the phenomena commonly observed with β-glucan treatment, contrasting it with the peripheral trained immunity detailed in the manuscript.

      The use of Csf2rb-/- mice to complement the clodronate approach is interesting (this approach has been used in the past with influenza virus). In addition to lacking AMs, these mice develop pulmonary alveolar proteinosis. Do the authors have histopathology from these mice in the current model? They mention PAP in the discussion.

      Pulmonary alveolar proteinosis (PAP) typically develops in Csf2b-/- mice from 12 weeks of age onwards (Stanley et al., Proc Natl Acad Sci USA, 1994). However, in our model, mice were euthanized at 6 weeks, ensuring that pulmonary function and structure remained intact. A hallmark of PAP is the accumulation of protein, primarily surfactant, in BAL. To investigate this, we measured BAL protein concentration and observed no differences at baseline (Figure 2F). These findings were further supported by the absence of differences in BAL proinflammatory cytokine concentrations (Figure 2H).

      A question about their BAL technique? In the control mice without glucan/LPS stimulation, only 40% of BAL cells are AMs [and the total number of AMs (range of <103 to 2-3 x 104) is at least 5-fold lower than typically seen in BALs from healthy mice (105), and there didn't seem to be many PMNs either. Are 60% of the BAL cells lymphocytes/ RBCs? Is it possible that overall AM numbers are changing, but CD11c/SiglecF-positive cell numbers stay the same (only assessed 2 markers)? More phenotyping would help.

      We appreciate the reviewer’s comment and would like to clarify that alveolar macrophages (AMs) are presented in the manuscript as a frequency of viable cells rather than as a frequency of CD45<SUP>+</SUP> cells, to ensure consistency throughout the study. The remaining cells in the samples are likely epithelial cells and lymphocytes, as red blood cells are lysed during sample processing. For additional context, we now provide data showing AMs as a percentage of CD45<SUP>+</SUP> cells, which account for 80–90% of leukocytes. Furthermore, in New Supplementary Figure 1D, we highlight the expression of AM-associated markers to better define their phenotype. We observed no differences in MHCII, CD169, CD64, or F4/80 expression in βglucan-treated mice. However, there was an increase in CD80<SUP>+</SUP> AMs, indicating enhanced activation and corroborating their proinflammatory phenotype.

      Author response image 1.

      AMs as percentage of CD45<SUP>+</SUP> cells. Mice were treated with β-glucan for seven days. We show CD11c<sup>+</sup>SiglecF<sup>+</sup> cells in the bronchoalveolar lavage (BAL) as a percentage of CD45<SUP>+</SUP> cells (n=5).

      Line 130-131. TNF is decreased and not pointed out.

      In the poly(I:C) model, the difference in the BAL TNF concentration is not statistically different between naïve and trained mice due to high variability of data. The reviewer is correct that TNFα does not appear to reflect Poly(I:C)-mediated ALI. We have included this point in the revised manuscript (Line 146-148).

      Reviewer #2 (Recommendations for the authors):

      Suggestions:

      (1) The authors provide evidence for enhanced ALI via different techniques, e.g. histology, vascular leakage, immune cell composition in BAL etc. It would be interesting to see whether there were any changes in the disease severity of ALI. If possible the authors could provide data for survival, temperature, weight, and/or glucose in the different groups.

      Mice are extremely resistant to the pulmonary LPS model. We have previously assessed lethality of our LPS model, and all mice survive even with an increased intranasal dose of LPS 200μg (Pernet et al, Nature, 2023). To address the reviewer concerns, we next assessed the morbidity by monitoring weight loss following LPS challenge and showed β-glucan-treated mice exhibit a delayed recovery time after 4 days LPS treatment (New Supplementary Figure 1B).

      (2) The authors show that ß-glucan mediated training enhances ALI. Conversely, the opposite, decreased immunopathology should be observed in case an LPS tolerance model would be used. I am wondering whether this has already been performed, given that the (LPS/immune)tolerance field is already older than the training field. If not, I suggest incorporating this feature in their discussion.

      Thank you for this insightful comment. While LPS has long been recognized to induce tolerance, studies have also shown that intranasal exposure to ambient levels of LPS can induce alveolar macrophage (AM) training via type I interferon signaling (Zahalka et al., Mucosal Immunol, 2022). In contrast, Mason et al. demonstrated that systemic LPS stimulation induces tolerance through TNF-α signaling, resulting in diminished AM phagocytosis and superoxide production. This leads to reduced neutrophil recruitment and impaired bacterial clearance in a Pseudomonas aeruginosa pneumonia model (J Infect Dis, 1997). Furthermore, we recently reported that systemic administration of β-glucan induces central trained immunity, generating a distinct subset of regulatory neutrophils that promote disease tolerance against influenza viral infection (Khan et al., Nat Immunol, 2025). These findings highlight the complex and context-dependent interplay between training and tolerance. We have expanded on this point in the discussion section of the revised manuscript (Lines 289-297).

      (3) The finding that trained immunity can exert not only beneficial effects but also enhance immunopathology is interesting and should be further explored. Already Jentho et al. (PNAS 2021) have shown that upon sterile inflammation as imposed by LPS, (heme) training can lead to enhanced mortality. This might be a relevant trade-off in trained immunity since no beneficial resistance effect by pathogen killing can be obtained. It would be interesting to see, in their model, whether heme would also enhance ALI after intranasal LPS application. Or at least, can the authors discuss this finding more, also in relation to the already published evidence?

      Thank you for raising this interesting point, which is indeed relevant to our study. Jentho et al. demonstrated that training by heme can be beneficial in combating infectious challenges but can have deleterious effects in the context of sterile inflammation. The concept of endogenous training agents like heme, with their diverse effects on immune cells, aligns well with our βglucan model, particularly given the high prevalence of fungal agents in the microbiome.

      While investigating the effects of heme on alveolar macrophages would certainly be intriguing, Jentho and colleagues have already reported the maladaptive effects of heme, such as tissue damage, during sterile LPS-induced inflammation. As such, these findings might be redundant in the context of our model. However, we have drawn a relevant parallel and expanded on this discussion in the revised manuscript (Lines 382-385).

      (4) It is not clear how the histologies were evaluated. This is a field of great subjectivity. The authors should describe it in more detail. The best option would have been a blinded observer. Was this done?

      Histology samples were evaluated according to ATS 2011 guidelines regarding “Features and measurements of experimental acute lung injury in animals” by a blinded pathologist. We have specified this in the methods of the revised manuscript.

      Minor:

      (1) Line 108 and ff. Please change TNF, not TNFa

      Since we used an ELISA specific for TNF-α rather than general TNF, it is more accurate to refer to it as TNF-α.

      (2) Line 513 and ff. Please use Greek letters when appropriate, e.g. IFN-γ not IFNg.

      Thank you for pointing out these mistakes, we rectified these in the text.

    1. eLife Assessment

      In this preregistered study, Kunkel and colleagues set out to compare the magnitude and duration of placebo versus nocebo effects in healthy volunteers, and also to examine the different factors contributing to these effects. The authors follow a rigorous methodology in a within-subjects design, taking into consideration standard conventions for manipulation of expectations, and using an appropriate sham condition. They present compelling evidence of long-lasting placebo and nocebo effects, with nocebo responses demonstrating consistently greater strength. These valuable results have the potential for a great impact in the field of experimental and clinical pain.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with the use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Specifically, the study aimed to:

      (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning<br /> (2) examine the persistence of these effects one week later, and<br /> (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      Weaknesses:

      There are a limited number of trials per test condition (10), which means that the trajectory of responses to the manipulation may not be adequately explored.

      On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60, and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. There is a potential risk of revealing the manipulation to participants during the re-familiarization process, as they were not previously briefed to expect the painful stimulus intensity to vary without the application of sham TENS stimulation.

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by the differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about expectation effects here.

      A randomisation error meant that 25 participants received an unbalanced number of 448 trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80).

    3. Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      Weaknesses:

      There were two highly addressable weaknesses in my opinion:

      (1) I could not find the preregistration - this is crucial to verify what analyses the authors have committed to prior to writing the manuscript. Please provide a link leading directly to the preregistration - searching for the specified number in the suggested website yielded no results.

      (2) There is a recurring issue which is easy to address: because the Methods are located after the Results, many of the constructs used, analyses conducted, and even the main placebo and nocebo inductions are unclear, making it hard to appreciate the results in full. I recommend finding a way to detail at the beginning of the results section how placebo and nocebo effects have been induced. While my background means I am familiar with these methods, other readers will lack that knowledge. Even a short paragraph or a figure (like Figure 4) could help clarify the results substantially. For example, a significant portion of the results is devoted to the conditioning part of the experiment, while it is unknown which part was involved (e.g., were temperatures lowered/increased in all trials or only in the beginning).

    4. Author response:

      Public Reviews:  

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with the use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Specifically, the study aimed to:

      (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning

      (2) examine the persistence of these effects one week later, and

      (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      We thank the reviewer for their thorough evaluation of our manuscript and for highlighting the novelty and originality of our study.

      Weaknesses:

      There are a limited number of trials per test condition (10), which means that the trajectory of responses to the manipulation may not be adequately explored.

      We appreciate the reviewer’s comment regarding the number of trials in the test phase (i.e., 10 trials per condition). This trial number was chosen to ensure comparability with previous studies employing similar designs and research questions (e.g. Colloca et al., 2010). Our primary objective was to directly compare placebo and nocebo effects within a within-subject design and to examine their persistence one week after the first test session. While we did not specifically aim to investigate the trajectory of responses within a single testing session, we fully agree that a comprehensive analysis of the trajectories of expectation effects on pain would be a valuable extension of our work. We will acknowledge this limitation and future direction in the revised manuscript. 

      On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60, and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. There is a potential risk of revealing the manipulation to participants during the re-familiarization process, as they were not previously briefed to expect the painful stimulus intensity to vary without the application of sham TENS stimulation.

      We thank the reviewer for the opportunity to clarify that participants were informed at the beginning of the experiment that we would use different stimulation intensities to re-familiarize them with the stimuli before the second test session. We are therefore confident that participants perceived this step as part of a recalibration rather than associating it with the experimental manipulation. We will add this information to the revised version of the manuscript. 

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by the differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about expectation effects here.

      We appreciate the reviewer’s comment and agree that, despite the careful calibration of the three pain stimuli, we cannot entirely rule out the possibility that temporal dynamics during the conditioning session were influenced by differential physiological effects of the varying stimulus intensities (e.g., intensity-dependent habituation or sensitization). We will address this in the revision of the manuscript, but we would like to emphasize that the stronger nocebo effects during the test phase are statistically controlled for any differences in the conditioning session. 

      A randomisation error meant that 25 participants received an unbalanced number of 448 trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80).

      We agree that it is unfortunate that 25 participants were conditioned with an unbalanced number of trials per condition during the conditioning session. In the revised version of the manuscript, we will include additional analyses to demonstrate that this imbalance did not systematically bias the results and that the findings observed during the test phase remain robust despite this error.  

      Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      We thank the reviewer for their positive evaluation of our manuscript and for acknowledging the large sample size, methodological rigor, and the significant implications for clinical applications and the broader research field.

      Weaknesses:

      There were two highly addressable weaknesses in my opinion:

      (1) I could not find the preregistration - this is crucial to verify what analyses the authors have committed to prior to writing the manuscript. Please provide a link leading directly to the preregistration - searching for the specified number in the suggested website yielded no results.

      We apologize that the registration number alone does not directly lead to the preregistration of this study. We thank the reviewer for pointing this out and will include a link to the preregistration in the revised manuscript. This study was pre-registered with the German Clinical Trial Register (registration number: DRKS00029228; https://drks.de/search/de/trial/DRKS00029228).

      (2) There is a recurring issue which is easy to address: because the Methods are located after the Results, many of the constructs used, analyses conducted, and even the main placebo and nocebo inductions are unclear, making it hard to appreciate the results in full. I recommend finding a way to detail at the beginning of the results section how placebo and nocebo effects have been induced. While my background means I am familiar with these methods, other readers will lack that knowledge. Even a short paragraph or a figure (like Figure 4) could help clarify the results substantially. For example, a significant portion of the results is devoted to the conditioning part of the experiment, while it is unknown which part was involved (e.g., were temperatures lowered/increased in all trials or only in the beginning).

      We thank the reviewer for this comment and suggestion. In the revised version, we will restructure the manuscript and include more detailed information about the key experimental procedures and design at the beginning of the Results section to enhance clarity and improve the interpretability of the reported findings.

    1. eLife Assessment

      This paper describes the study of the evolution of the N-terminal domain of the MSH6 mismatch repair protein in regard to the presence or absence of histone reader domains. While the presence of the histone reader domains was previously known, the phylogenetic analysis of these domains performed here establishing their insertion through convergent evolution is important, definitively done, and establishes an interesting feature of the MSH6 family of proteins. The work is convincing but the presentation of the structural features of MSH6 could be improved.

    2. Reviewer #1 (Public review):

      Summary:

      Previous studies have shown that the MSH6 family of mismatch repair proteins contains an unstructured N-terminal domain that contains either a PWWP domain, a Tudor domain or neither and that the interaction of the histone reader domains with the appropriate histone H3 modification enhances mismatch repair, and hence reduces mutation rates in coding regions to some extent. However, the elimination of the MSH6-histone modification probably does not completely eliminate mismatch repair, although the published papers on this point do not seem definitive.

      In this study, the authors perform a details phylogenetic analysis of the presence of the PWWP and Tudor domains in MSH6 proteins across the tree of life. They observe that there are basically three classes of organisms that contain either a PWWP domain, a Tudor domain, or neither. On the basis of their analysis, they suggest that this represents convergent evolution of the independent acquisition of histone reader domains and that key amino acid residues in the reader domains are selected for.

      Strengths:

      The phylogenetic aspects of the work seem well done and the basic evolutionary conclusions of the work are well supported. The basic evolutionary conclusions are interesting and there is little to criticize from my perspective.

      Weaknesses:

      A major concern about this paper is that the authors fail to put their work into the proper context of what is already known about the N-terminus of MSH6. Further, their structural studies, which are really structural illustrations, are misleading, often incorrect, and not always helpful in addition to having been published before.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Monroe JG and colleagues show a compelling case of convergent evolution in the fusion between an important mismatch repair protein (MSH6) and histone reader domains across the tree of life. These fused MSH6 readers have been shown to be important for the recruitment of MSH6 to exon-rich genome locations, therefore improving the efficiency of reducing mutation rates in coding regions.

      Comparative genomic analyses here performed revealed independent instances of MSH6 fusion with histone readers in plants and metazoa with several instances of putative loss (or gain) across the phylogeny. The work also unveiled instances of MSH6 fusion putatively interesting domains in fungi which might be worth exploring in the future.

      The authors also show potential signatures of purifying selection in functional amino acids MSH6 histone readers.

      Overall the approach is adequate for the questions proposed to be answered, the analyses are rigorous and support the authors' claims.

      DNA repair genes are essential to maintain genome stability and fidelity, and alterations in these pathways have been associated with hypermutation phenotypes in the context for instance of cancer in humans, with sometimes implications in treatment resistance. This is an important work that contributes to our understanding of the evolutionary consequences of the evolution of epigenome-targeted DNA repair.

      Strengths:

      The methods used are adequate for the questions and support the results. The search for MSH6 fusions was rigorous and conservative, which strengthens the significance of the claims on the evolutionary history of these fusion events.

      Weaknesses:

      I did not identify any major weaknesses, but please see my suggestions/recommendations.

    4. Reviewer #3 (Public review):

      Summary:

      In the manuscript entitled "Convergent evolution of epigenome recruited DNA repair across the Tree of Life", Monroe et al. investigate bioinformatically how some important mechanisms of epigenome-targeted DNA repair evolved at the tree of life scale. They provide a clear example of convergent evolution of these mechanisms between animals and plants, investigating more than 4000 eukaryotic genomes, and uncovering a significant association between gain/retention of such mechanisms with genome size and high intron content, that at least partially explains the evolutionary patterns observed within major eukaryotic lineages.

      Strengths:

      The manuscript is well written, clear, and understandable, and has potentially broad interest. It provides a thorough analysis of the evolution of MSH6-related DNA repair mechanisms using more than 4000 eukaryotic genomes, a pretty impressive number allowing to identify both large-scale (i.e. kingdoms) as well as shorter-scale (i.e. phyla, orders) evolutionary patterns. Moreover, despite providing no experimental validation, it investigates with a sufficient degree of depth, a potential relationship between gain/retention of epigenome recruited DNA repair mediated by MSH6 and genomic, as well as life-history (population size, body mass, lifespan), traits. In particular, it provides convincing evidence for a causative effect between genome size/intron content and the presence/absence of this mechanism. Moreover, it stimulates further scientific investigation and biological questions to be addressed, such as the conservation of epigenomes across the tree of life, the existence of potential trade-offs in gain/retention vs. loss of such mechanisms, and the relationship between these processes, mutation rate heterogeneity, and evolvability.

      Weaknesses:

      Despite the interesting and necessary insights provided on (1) the evolution of DNA repair mechanisms, and (2) the convergent evolution of molecular mechanisms, this bioinformatic study emanates from studies in humans and Arabidopsis already showing signs of potential convergent evolution in aspects of epigenome-recruited DNA repair. For this, this study, although bioinformatically remarkably thorough, does not come as a surprise, potentially lowering its novelty.

      What could have increased further its impact, interest, and novelty could have been a more comprehensive understanding of the causative processes leading to gain/retention vs. loss of MSH6-related epigenetic recruitment mechanisms. The authors provide interesting associations with life-history traits (yet not significant), and significant links with genome size and intron content only at the theoretical level. For the first aspect, the analyses could have expanded toward other life-history traits. For the second, maybe it could have been even possible to tackle experimentally some of the generated questions, functionally in some models, or deepened using specific case studies.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Previous studies have shown that the MSH6 family of mismatch repair proteins contains an unstructured N-terminal domain that contains either a PWWP domain, a Tudor domain or neither and that the interaction of the histone reader domains with the appropriate histone H3 modification enhances mismatch repair, and hence reduces mutation rates in coding regions to some extent. However, the elimination of the MSH6-histone modification probably does not completely eliminate mismatch repair, although the published papers on this point do not seem definitive.

      In this study, the authors perform a details phylogenetic analysis of the presence of the PWWP and Tudor domains in MSH6 proteins across the tree of life. They observe that there are basically three classes of organisms that contain either a PWWP domain, a Tudor domain, or neither. On the basis of their analysis, they suggest that this represents convergent evolution of the independent acquisition of histone reader domains and that key amino acid residues in the reader domains are selected for.

      Strengths:

      The phylogenetic aspects of the work seem well done and the basic evolutionary conclusions of the work are well supported. The basic evolutionary conclusions are interesting and there is little to criticize from my perspective.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      A major concern about this paper is that the authors fail to put their work into the proper context of what is already known about the N-terminus of MSH6. Further, their structural studies, which are really structural illustrations, are misleading, often incorrect, and not always helpful in addition to having been published before.

      Thank you for the helpful suggestions on this front. We agree that some of the structural visualizations were over simplified and apologize for the lack of clarity. Notably, we did not annotate the presence of putative or known short PCNA-interacting protein (PIP) motifs which have been found at the linker disordered N-terminus of MSH6 proteins. Indeed, while not direct to our investigation of the origins of histone readers, the PIP motifs are an interesting and functionally important feature of MSH6 structural biology, especially because they may facilitate DNA repair processes more generally. In the revised manuscript, we aim to improve the scholarship on this topic and clarify the presence/importance of this motif for MSH6 function, as well as what is known about the structural biology of the MSH6 N-terminus more broadly. We will add annotations of the PIP motif and will also improve structural prediction by visualizing MSH6 structure in its dimerized form with MSH2, for a more accurate estimate of its folding in vivo. We hope that these in addition to other valuable suggested improvements will enhance the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this work, Monroe JG and colleagues show a compelling case of convergent evolution in the fusion between an important mismatch repair protein (MSH6) and histone reader domains across the tree of life. These fused MSH6 readers have been shown to be important for the recruitment of MSH6 to exon-rich genome locations, therefore improving the efficiency of reducing mutation rates in coding regions.

      Comparative genomic analyses here performed revealed independent instances of MSH6 fusion with histone readers in plants and metazoa with several instances of putative loss (or gain) across the phylogeny. The work also unveiled instances of MSH6 fusion putatively interesting domains in fungi which might be worth exploring in the future.

      The authors also show potential signatures of purifying selection in functional amino acids MSH6 histone readers.

      Overall the approach is adequate for the questions proposed to be answered, the analyses are rigorous and support the authors' claims.

      DNA repair genes are essential to maintain genome stability and fidelity, and alterations in these pathways have been associated with hypermutation phenotypes in the context for instance of cancer in humans, with sometimes implications in treatment resistance. This is an important work that contributes to our understanding of the evolutionary consequences of the evolution of epigenome-targeted DNA repair.

      Strengths:

      The methods used are adequate for the questions and support the results. The search for MSH6 fusions was rigorous and conservative, which strengthens the significance of the claims on the evolutionary history of these fusion events.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      I did not identify any major weaknesses, but please see my suggestions/recommendations.

      Thank you, we will also address your suggestions, which provide valuable recommendations for improving the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      In the manuscript entitled "Convergent evolution of epigenome recruited DNA repair across the Tree of Life", Monroe et al. investigate bioinformatically how some important mechanisms of epigenome-targeted DNA repair evolved at the tree of life scale. They provide a clear example of convergent evolution of these mechanisms between animals and plants, investigating more than 4000 eukaryotic genomes, and uncovering a significant association between gain/retention of such mechanisms with genome size and high intron content, that at least partially explains the evolutionary patterns observed within major eukaryotic lineages.

      Strengths:

      The manuscript is well written, clear, and understandable, and has potentially broad interest. It provides a thorough analysis of the evolution of MSH6-related DNA repair mechanisms using more than 4000 eukaryotic genomes, a pretty impressive number allowing to identify both large-scale (i.e. kingdoms) as well as shorter-scale (i.e. phyla, orders) evolutionary patterns. Moreover, despite providing no experimental validation, it investigates with a sufficient degree of depth, a potential relationship between gain/retention of epigenome recruited DNA repair mediated by MSH6 and genomic, as well as life-history (population size, body mass, lifespan), traits. In particular, it provides convincing evidence for a causative effect between genome size/intron content and the presence/absence of this mechanism. Moreover, it stimulates further scientific investigation and biological questions to be addressed, such as the conservation of epigenomes across the tree of life, the existence of potential trade-offs in gain/retention vs. loss of such mechanisms, and the relationship between these processes, mutation rate heterogeneity, and evolvability.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      Despite the interesting and necessary insights provided on (1) the evolution of DNA repair mechanisms, and (2) the convergent evolution of molecular mechanisms, this bioinformatic study emanates from studies in humans and Arabidopsis already showing signs of potential convergent evolution in aspects of epigenome-recruited DNA repair. For this, this study, although bioinformatically remarkably thorough, does not come as a surprise, potentially lowering its novelty.

      What could have increased further its impact, interest, and novelty could have been a more comprehensive understanding of the causative processes leading to gain/retention vs. loss of MSH6-related epigenetic recruitment mechanisms. The authors provide interesting associations with life-history traits (yet not significant), and significant links with genome size and intron content only at the theoretical level. For the first aspect, the analyses could have expanded toward other life-history traits. For the second, maybe it could have been even possible to tackle experimentally some of the generated questions, functionally in some models, or deepened using specific case studies.

      We agree that this work expands on recent experimental work in humans and Arabidopsis on the function of histone readers in MSH6, PWWP and Tudor, respectively. However, the evolution of these fusions remained a significant knowledge gap, limiting the degree to which functional work could be translated to other organisms. This study definitively characterized the evolutionary history of MHS6 histone readers and lays the groundwork for future investigations in diverse species. We agree that more causal inference would be valuable to understand the evolutionary pressures acting on MSH6 histone reader presence/absence. Indeed, we prioritized the conservative approach of testing hypotheses with strict phylogenetically constrained contrasts. While we observed highly significant associations between histone readers and genomic traits like intron content, associations with life history traits were only significant before accounting for phylogeny. It is possible that this is due to a lack of power because such traits are only available in limited taxa. In the revised manuscript, we aim to clarify potential causes, outline future experimental work beyond the scope of this individual study, and argue that this work highlights the need to catalog trait diversity at broader phylogenetic scales.  We also address other valuable suggestions in the revised manuscript.

    1. eLife Assessment

      This study provides valuable insights into how auditory stimuli influence the temporal dynamics of visual perception by modulating brain rhythms (oscillations) in the alpha band. The authors present convincing evidence that auditory input induces a drop in visual alpha frequency, increasing the time window for audio-visual integration, and subsequently shifting the predictive role from prestimulus alpha frequency to alpha phase. The conclusions are well-supported by the combination of psychophysics, electrophysiological recordings (EEG), non-invasive brain stimulation (tACS), and computational modelling.

    2. Reviewer #1 (Public review):

      Summary:

      Is peristimulus alpha (8-14 Hz) frequency and/or phase involved in shaping the length of visual and audiovisual temporal binding windows, as posited by the discrete sampling hypothesis? If so, to what extent and perceptual scenario are they functionally relevant? The authors addressed such questions by collecting EEG data during the completion of the widely-known 2-flash fusion paradigm, administered both in a standard (i.e., visual only, F2) and audiovisual (i.e., 2 flashes and 1 beep, F2B1) fashion. Instantaneous frequency estimation performed over parieto-occipital sensors revealed slower alpha rhythms right after stimulus onset in the F2B1 condition, as compared to the F2, a pattern found to correlate with the difference between modality-specific ISIs (F2B1-F2). Of note, peristimulus alpha frequency differed also between 1 vs 2 flashes reports, although in the visual modality only (i.e., faster alpha oscillations in 2 flash percept vs 1 flash). This pattern of results was reinvigorated in a causal manner via occipital tACS, which was capable of, respectively, narrowing down vs enlarging the temporal binding window of individuals undergoing 13 Hz vs 8 Hz stimulation in the F2 modality alone. To elucidate what the oscillatory signatures of crossmodal integration might be, the authors further focused on the phase of posterior alpha rhythms. Accordingly, the Phase Opposition Sum proved to significantly differ between modalities (F2B1 vs F2) during the prestimulus time window, suggesting that audiovisual signals undergo finer processing based on the ongoing phase of occipital alpha oscillations, rather than the speed at which these rhythms cycle. As a last bit of information, a computational model factoring in the electrophysiological assumptions of both the discrete sampling hypothesis and auditory-induced phase-resetting was devised. Analyses run on such synthetic data were partially able to reproduce the patterns witnessed in the empirical dataset. While faster frequency rates broadly provide a higher probability to detect 2 flashes instead of 1, the occurrence of a concurrent auditory signal in cross-modal trials should cause a transient elongation (i.e. slower frequency rate) of the ongoing alpha cycle due to phase-reset dynamics (as revealed via inter-trial phase clustering), prompting larger ISIs during F2B1 trials. Conversely, the model provides that alpha oscillatory phase might predict how well an observer dissociates sensory information from noise (i.e., perceptual clarity), with the second flash clearly perceived as such as long as it falls within specific phase windows along the alpha cycle.

      Strengths:

      The authors leveraged complementary approaches (EEG, tACS, and computational modelling), the results thereof not only integrate, but depict an overarching mechanistic scenario elegantly framing phase-resetting dynamics into the broader theoretical architecture posited by the discrete sampling hypothesis. Analyses on brain oscillations (either via frequency sliding and phase opposition sum) mostly appear to be methodologically sound, and very-well supported by tACS results. Under this perspective, the modelling approach serves as a convenient tool to reconcile and shed more light on the pieces of evidence gathered on empirical data, returning an appealing account on how cross-modal stimuli interplay with ongoing alpha rhythms and differentially affect multisensory processing in humans.

      Weaknesses:

      Some information relative to the task and the analyses is missing. For instance, it is not entirely clear from the text what the number of flashes actually displayed in explicit short trials is (1 or 2?). We believe it is always two, but it should be explicitly stated.

      Moreover, the sample size might be an issue. As highlighted by a recent meta-analysis on the matter (Samaha & Romei, 2024), an underpowered sample size may very well drive null-findings relative to tACS data in F2B1 trials, in interplay with broad and un-individualized frequency targets.

      Some criticality arises regarding the actual "bistability" of bistable trials, as the statistics relative to the main task (i.e., the actual means and SEMs are missing) broadly point toward a higher proclivity to report 2 instead of 1 flash in both F2B1 and F2 trials. This makes sense to some extent, given that 2 flashes have always been displayed (at least in bistable trials), yet tells about something botched during the pretest titration procedure.

      Coming to the analyses on brain waves, one main concern relates to the phase-reset-induced slow-down of posterior alpha rhythms being of true oscillatory nature, rather than a mere evoked response (i.e., not sustained over time). Another question calling for some further scrutiny regards the overlooked pattern linking the temporal extent of the IAF differences between F2 and F2B1 trials with the ISIs across experimental conditions (explicit short, bistable, and explicit long). That is, the wider the ISI, the longer the temporal extent of the IAF difference between sensory modalities. Although neglected by the authors, such a trend speaks in favour of a rather nuanced scenario stemming from not only auditory-induced phase-reset alpha cycle elongation, but also some non-linear and perhaps super-additive contribution of flash-induced phase-resetting. This consideration introduces some of the issues about the computational simulation, which was modelled around the assumption of phase-resetting being triggered by acoustic stimuli alone. Given how appealing the model already is, I wonder whether the authors might refine the model accordingly and integrate the phase-resetting impact of visual stimuli upon synthetic alpha rhythms. Relatedly, I would also suggest the authors to throw in a few more simulations to explore the parameter space and assay, to which quantitative extent the model still holds (e.g. allowing alpha frequency to randomly change within a range between 8 and 13 Hz, or pivoting the phase delay around 10 or 50 ms). As a last remark, I would avoid, or at least tone down, concluding that the results hereby presented might reconcile and/or explain the null effects in Buergers & Noppeney, 2022; as the relationship between IAFs and audiovisual abilities still holds when examining other cross-modal paradigms such as the Sound-Induced Flash-Illusion (Noguchi, 2022), and the aforementioned patterns might be due to other factors, such as a too small sample size (Samaha & Romei, 2024).

    3. Reviewer #2 (Public review):

      Summary:

      The authors used a visual flash discrimination task in which two flashes are presented one after another with different inter-stimulus intervals. Participants either perceive one flash or two flashes. The authors show that the simultaneous presence of an auditory input extends the temporal window of integration, meaning that two flashes presented shortly after one another are more likely to be perceived as a single flash. Auditory inputs are accompanied by a reduction in alpha frequency over visual areas. Prestimulus alpha frequency predicts perceptual outcomes in the absence of auditory stimuli, whereas prestimulus alpha phase becomes the dominant predictor when auditory input is present. A computational model based on phase-resetting theory supports these findings. Additionally, a transcranial stimulation experiment confirms the causal role of alpha frequency in unimodal visual perception but not in cross-modal contexts.

      Strengths:

      The authors elegantly combined several approaches-from behavior to computational modeling and EEG-to provide a comprehensive overview of the mechanisms involved in visual integration in the presence or absence of auditory input. The methods used are state-of-the-art, and the authors attempted to address possible pitfalls.

      Weaknesses:

      The use of Bayesian statistics could further strengthen the paper, especially given that a few p-values are close to the significance threshold (lines 162 & 258), but they are interpreted differently in different cases (absence of effect vs. trend).

      Overall, these results provide new insights into the role of alpha oscillations in visual processing and offer an interesting perspective on the current debate regarding the roles of alpha phase and frequency in visual perception. More generally, they contribute to our understanding of the neural dynamics of multisensory integration.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigated the impact of an auditory stimulus on visual integration at the behavioral, electrophysiological, and mechanistic levels. Although the role of alpha brain oscillations on visual perception has been widely studied, how the brain dynamics in the visual cortices are influenced by a cross-modal stimulus remains ill-defined. The authors demonstrated that auditory stimulation systematically induced a drop in visual alpha frequency, increasing the time window for audio-visual integration, while in the unimodal condition, visual integration was modulated by small variations within the alpha frequency range. In addition, they only found a role of the phase of alpha brain oscillations on visual perception in the cross-modal condition. Based on the perceptual cycles' theory framework, the authors developed a model allowing them to describe their results according to a phase resetting induced by the auditory stimulation. These results showed that the influence of well-known brain dynamics on one modality can be disrupted by another modality. They provided insights into the importance of investigating cross-modal brain dynamics, and an interesting model that extends the perceptual cycle framework.

      Strengths:

      The results are supported by a combination of various, established experimental and analysis approaches (e.g., two-flash fusion task, psychometric curves, phase opposition), ensuring strong methodological bases and allowing direct comparisons with related findings in the literature.

      The model the authors proposed is an extension and an improvement of the perceptual cycle's framework. Interestingly, this model could then be tested in other experimental approaches.

      Weaknesses:

      There is an increasing number of studies in cognitive neuroscience showing the importance of considering inter-individual variability. The individual alpha frequency (IAF) varied from 8 to 13 Hz with a huge variability across participants, and studies have shown that the IAF influenced visual perception. Investigating inter-individual variations of the IAF in the reported results would be of great interest, especially for the model.

      Although the use of non-invasive brain stimulation to infer causality is a method of great interest, the use of tACS in the presented work is not optimal. Instead of inducing alpha brain oscillations in visual cortices, the use of tACS to activate the auditory cortex instead of the actual auditory stimulation would have presented more interest.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Is peristimulus alpha (8-14 Hz) frequency and/or phase involved in shaping the length of visual and audiovisual temporal binding windows, as posited by the discrete sampling hypothesis? If so, to what extent and perceptual scenario are they functionally relevant? The authors addressed such questions by collecting EEG data during the completion of the widely-known 2-flash fusion paradigm, administered both in a standard (i.e., visual only, F2) and audiovisual (i.e., 2 flashes and 1 beep, F2B1) fashion. Instantaneous frequency estimation performed over parieto-occipital sensors revealed slower alpha rhythms right after stimulus onset in the F2B1 condition, as compared to the F2, a pattern found to correlate with the difference between modality-specific ISIs (F2B1-F2). Of note, peristimulus alpha frequency differed also between 1 vs 2 flashes reports, although in the visual modality only (i.e., faster alpha oscillations in 2 flash percept vs 1 flash). This pattern of results was reinvigorated in a causal manner via occipital tACS, which was capable of, respectively, narrowing down vs enlarging the temporal binding window of individuals undergoing 13 Hz vs 8 Hz stimulation in the F2 modality alone. To elucidate what the oscillatory signatures of crossmodal integration might be, the authors further focused on the phase of posterior alpha rhythms. Accordingly, the Phase Opposition Sum proved to significantly differ between modalities (F2B1 vs F2) during the prestimulus time window, suggesting that audiovisual signals undergo finer processing based on the ongoing phase of occipital alpha oscillations, rather than the speed at which these rhythms cycle. As a last bit of information, a computational model factoring in the electrophysiological assumptions of both the discrete sampling hypothesis and auditory-induced phase-resetting was devised. Analyses run on such synthetic data were partially able to reproduce the patterns witnessed in the empirical dataset. While faster frequency rates broadly provide a higher probability to detect 2 flashes instead of 1, the occurrence of a concurrent auditory signal in cross-modal trials should cause a transient elongation (i.e. slower frequency rate) of the ongoing alpha cycle due to phase-reset dynamics (as revealed via inter-trial phase clustering), prompting larger ISIs during F2B1 trials. Conversely, the model provides that alpha oscillatory phase might predict how well an observer dissociates sensory information from noise (i.e., perceptual clarity), with the second flash clearly perceived as such as long as it falls within specific phase windows along the alpha cycle.

      Strengths:

      The authors leveraged complementary approaches (EEG, tACS, and computational modelling), the results thereof not only integrate, but depict an overarching mechanistic scenario elegantly framing phase-resetting dynamics into the broader theoretical architecture posited by the discrete sampling hypothesis. Analyses on brain oscillations (either via frequency sliding and phase opposition sum) mostly appear to be methodologically sound, and very-well supported by tACS results. Under this perspective, the modelling approach serves as a convenient tool to reconcile and shed more light on the pieces of evidence gathered on empirical data, returning an appealing account on how cross-modal stimuli interplay with ongoing alpha rhythms and differentially affect multisensory processing in humans.

      Weaknesses:

      Some information relative to the task and the analyses is missing. For instance, it is not entirely clear from the text what the number of flashes actually displayed in explicit short trials is (1 or 2?). We believe it is always two, but it should be explicitly stated.

      We thank the reviewer for highlighting this important point. In our study, all explicit trials consistently presented two flashes. We will clearly state this detail in the Methods section to avoid any further confusion.

      Moreover, the sample size might be an issue. As highlighted by a recent meta-analysis on the matter (Samaha & Romei, 2024), an underpowered sample size may very well drive null-findings relative to tACS data in F2B1 trials, in interplay with broad and un-individualized frequency targets.

      We thank the reviewer for raising this point. First, we would like to clarify that our results do not suggest that the frequency effect is absent in the F2B1 condition; rather, it is relatively attenuated compared to the F2 condition. If the sample size were the primary issue, we would expect to observe a null effect in both conditions. Instead, the stronger frequency modulation in F2 confirms that the sound-induced modulation is present, albeit reduced in the audiovisual context. In our revised manuscript, we will explicitly note that our claim is not that there is no frequency effect in F2B1 but that the effect is weaker relative to F2, and we will also acknowledge the potential limitations associated with sample size and the lack of individualized frequency targeting.

      Some criticality arises regarding the actual "bistability" of bistable trials, as the statistics relative to the main task (i.e., the actual means and SEMs are missing) broadly point toward a higher proclivity to report 2 instead of 1 flash in both F2B1 and F2 trials. This makes sense to some extent, given that 2 flashes have always been displayed (at least in bistable trials), yet tells about something botched during the pretest titration procedure.

      We thank the reviewer for pointing out the potential bias toward reporting “two flashes” in the bistable trials. Because our experimental design involves presenting two flashes in both explicit and bistable trials, a slight tendency to report two flashes may naturally arise, especially at threshold levels determined during pretesting. We believe, however, that this bias does not undermine our primary findings. Our psychophysical procedure is designed to align the inter-stimulus interval with each participant’s fusion threshold, aiming for a near 50/50 split between “one-flash” and “two-flash” reports. However, given that two flashes are always presented, participants may be predisposed to report two flashes when uncertain. This reflects a plausible perceptual bias inherent in the bistable design, rather than a systematic flaw. Importantly, this tendency appears at comparable levels in both the F2 and F2B1 conditions, indicating that it does not selectively affect any particular condition. In the revised manuscript, we will include additional descriptive statistics, such as means and standard deviations, to demonstrate that the observed bias remains within an acceptable range and does not compromise our core conclusions regarding the modulatory effect of auditory input on visual integration.

      Coming to the analyses on brain waves, one main concern relates to the phase-reset-induced slow-down of posterior alpha rhythms being of true oscillatory nature, rather than a mere evoked response (i.e., not sustained over time).

      We appreciate the reviewer’s concern regarding this issue. First, the sustained decrease in posterior alpha frequency observed in our study—persisting for approximately 280 ms—substantially exceeds the typical duration of an auditory evoked potential (generally 50–200 ms) (Näätänen and Picton, 1987). This extended period of modulation suggests that it is not merely a transient evoked response.

      Second, our analysis of alpha power further supports this interpretation. A purely evoked response is usually accompanied by a corresponding increase in signal power; however, our results show no such power increase when comparing the F2B1 condition with the F2 condition.

      Moreover, the observed increase in alpha phase resetting—as measured by inter-trial phase coherence (ITC)—does not significantly correlate with changes in alpha power. This dissociation further indicates that the auditory-induced effects are unlikely to be driven solely by evoked potentials, but are more consistent with a reorganization of the intrinsic neural oscillatory activity.

      Together, these lines of evidence strongly support the view that the auditory-induced decrease in alpha frequency reflects true changes in ongoing oscillatory dynamics, rather than being merely a transient evoked response.

      Another question calling for some further scrutiny regards the overlooked pattern linking the temporal extent of the IAF differences between F2 and F2B1 trials with the ISIs across experimental conditions (explicit short, bistable, and explicit long). That is, the wider the ISI, the longer the temporal extent of the IAF difference between sensory modalities. Although neglected by the authors, such a trend speaks in favour of a rather nuanced scenario stemming from not only auditory-induced phase-reset alpha cycle elongation, but also some non-linear and perhaps super-additive contribution of flash-induced phase-resetting. This consideration introduces some of the issues about the computational simulation, which was modelled around the assumption of phase-resetting being triggered by acoustic stimuli alone. Given how appealing the model already is, I wonder whether the authors might refine the model accordingly and integrate the phase-resetting impact of visual stimuli upon synthetic alpha rhythms.

      We appreciate the reviewer’s insightful comment regarding the potential influence of flash-induced phase resetting on the temporal extent of the IAF differences. We acknowledge that the observation—that wider ISIs are associated with a longer period of IAF differences—hints at a non-linear or even super-additive interaction between auditory- and flash-induced phase resetting mechanisms.

      However, the primary focus of our current study is on how auditory stimuli affect alpha oscillatory dynamics. Our experimental design and computational model were specifically optimized to capture auditory-induced phase resetting. Incorporating the additional influence of flash-induced effects would require a significantly more refined experimental framework and a more complex modeling approach. This added complexity could obscure the interpretation of our main findings, which are centered on auditory influences.

      In the revised manuscript, we will address this intriguing possibility in the Discussion section. We will acknowledge that while the data hint at a potential visual contribution, our present model deliberately isolates auditory-induced phase resetting to maintain clarity. We also propose that future research, with more precise experimental designs and enhanced modeling techniques, is necessary to fully disentangle and capture the interplay between auditory and flash-induced phase resetting mechanisms.

      Relatedly, I would also suggest the authors to throw in a few more simulations to explore the parameter space and assay, to which quantitative extent the model still holds (e.g. allowing alpha frequency to randomly change within a range between 8 and 13 Hz, or pivoting the phase delay around 10 or 50 ms).

      We appreciate the reviewer’s suggestion to further explore our model’s parameter space. In response, we will conduct additional simulations that incorporate variability in alpha frequency—sampling randomly between 8 and 13 Hz—and examine alternative phase delays (e.g., around 10 and 50 ms). By systematically adjusting these parameters, we can more thoroughly evaluate the model’s robustness and delineate its boundaries under a broader range of neurophysiological conditions. We will present these results in the revised manuscript and discuss how they inform our understanding of alpha-driven visual integration in cross-modal contexts.

      As a last remark, I would avoid, or at least tone down, concluding that the results hereby presented might reconcile and/or explain the null effects in Buergers & Noppeney, 2022; as the relationship between IAFs and audiovisual abilities still holds when examining other cross-modal paradigms such as the Sound-Induced Flash-Illusion (Noguchi, 2022), and the aforementioned patterns might be due to other factors, such as a too small sample size (Samaha & Romei, 2024).

      We appreciate the reviewer’s suggestion and will revise our claims accordingly. In the revised manuscript, we will clarify that while our study demonstrates a mechanism by which alpha oscillations influence audiovisual integration in certain paradigms, this does not mean that our findings fully reconcile all conflicting results in the literature. We will emphasize that our mechanism may help explain why alpha frequency plays a critical role in some experimental settings, but that factors such as sample size, task parameters, and experimental design differences likely contribute to the divergent results observed across studies. Accordingly, we acknowledge that further research with larger samples and more refined methodologies is necessary to fully reconcile these discrepancies. This more cautious interpretation will be clearly discussed in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors used a visual flash discrimination task in which two flashes are presented one after another with different inter-stimulus intervals. Participants either perceive one flash or two flashes. The authors show that the simultaneous presence of an auditory input extends the temporal window of integration, meaning that two flashes presented shortly after one another are more likely to be perceived as a single flash. Auditory inputs are accompanied by a reduction in alpha frequency over visual areas. Prestimulus alpha frequency predicts perceptual outcomes in the absence of auditory stimuli, whereas prestimulus alpha phase becomes the dominant predictor when auditory input is present. A computational model based on phase-resetting theory supports these findings. Additionally, a transcranial stimulation experiment confirms the causal role of alpha frequency in unimodal visual perception but not in cross-modal contexts.

      Strengths:

      The authors elegantly combined several approaches-from behavior to computational modeling and EEG-to provide a comprehensive overview of the mechanisms involved in visual integration in the presence or absence of auditory input. The methods used are state-of-the-art, and the authors attempted to address possible pitfalls.

      Weaknesses:

      The use of Bayesian statistics could further strengthen the paper, especially given that a few p-values are close to the significance threshold (lines 162 & 258), but they are interpreted differently in different cases (absence of effect vs. trend).

      We appreciate the reviewer’s suggestion regarding the use of Bayesian statistics. We agree that a Bayesian framework can offer valuable complementary insights to our analysis by helping to distinguish whether a marginal p-value represents a trend or truly indicates the absence of an effect. To enhance the robustness of our conclusions, we will incorporate supplemental Bayesian analyses in the revised manuscript.

      Overall, these results provide new insights into the role of alpha oscillations in visual processing and offer an interesting perspective on the current debate regarding the roles of alpha phase and frequency in visual perception. More generally, they contribute to our understanding of the neural dynamics of multisensory integration.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the impact of an auditory stimulus on visual integration at the behavioral, electrophysiological, and mechanistic levels. Although the role of alpha brain oscillations on visual perception has been widely studied, how the brain dynamics in the visual cortices are influenced by a cross-modal stimulus remains ill-defined. The authors demonstrated that auditory stimulation systematically induced a drop in visual alpha frequency, increasing the time window for audio-visual integration, while in the unimodal condition, visual integration was modulated by small variations within the alpha frequency range. In addition, they only found a role of the phase of alpha brain oscillations on visual perception in the cross-modal condition. Based on the perceptual cycles' theory framework, the authors developed a model allowing them to describe their results according to a phase resetting induced by the auditory stimulation. These results showed that the influence of well-known brain dynamics on one modality can be disrupted by another modality. They provided insights into the importance of investigating cross-modal brain dynamics, and an interesting model that extends the perceptual cycle framework.

      Strengths:

      The results are supported by a combination of various, established experimental and analysis approaches (e.g., two-flash fusion task, psychometric curves, phase opposition), ensuring strong methodological bases and allowing direct comparisons with related findings in the literature.

      The model the authors proposed is an extension and an improvement of the perceptual cycle's framework. Interestingly, this model could then be tested in other experimental approaches.

      Weaknesses:

      There is an increasing number of studies in cognitive neuroscience showing the importance of considering inter-individual variability. The individual alpha frequency (IAF) varied from 8 to 13 Hz with a huge variability across participants, and studies have shown that the IAF influenced visual perception. Investigating inter-individual variations of the IAF in the reported results would be of great interest, especially for the model.

      We appreciate the reviewer’s valuable feedback regarding the importance of inter-individual variability in alpha frequency. In our current study, we have already addressed participant-level variability in our neural data by performing inter-subject correlation analyses, investigating whether individual reductions in alpha frequency correlate with broader temporal integration windows at the behavioral level.

      Moreover, our computational model incorporates physiologically realistic distributions for key parameters, including frequency and amplitude, which captures some degree of individual variability. Nevertheless, we acknowledge that a more targeted examination of how different IAF values specifically affect the model’s predictions would be highly valuable. In response, we will expand our simulations to systematically explore a range of IAF values and assess their impact on temporal integration windows and related measures of audiovisual processing. These additional analyses will help clarify the role of inter-individual variability in alpha frequency and further strengthen the mechanistic account offered by our model. We will detail these enhancements and discuss their implications in the revised manuscript.

      Although the use of non-invasive brain stimulation to infer causality is a method of great interest, the use of tACS in the presented work is not optimal. Instead of inducing alpha brain oscillations in visual cortices, the use of tACS to activate the auditory cortex instead of the actual auditory stimulation would have presented more interest.

      We appreciate the reviewer’s suggestion and acknowledge that non-invasive brain stimulation offers promising avenues for inferring causality. In our study, our primary hypothesis focused on the role of occipital alpha oscillations in defining the temporal window for visual integration, and accordingly we targeted visual cortex in our tACS protocol.

      We recognize that stimulating the auditory cortex could provide additional insights into auditory contributions to phase resetting. However, accurately targeting the auditory cortex with tACS presents technical challenges. The auditory cortex is located deeper within the temporal lobe, and factors such as variable skull thickness and complex current spread make it difficult to reliably modulate its neural activity compared to the more superficial visual areas. Indeed, recent studies have demonstrated that tACS-induced electric fields in the temporal regions tend to be weaker and less focal—for example, Huang et al. (2017) and Opitz et al. (2016) highlight the limitations in achieving robust stimulation of deeper or anatomically complex brain regions using conventional tACS approaches.

      Given these considerations, while we agree that future investigations could benefit from exploring auditory cortex stimulation—either as an alternative or as a complementary approach—the present study remains focused on visual alpha modulation, where our protocol is well validated and yields reliable results. In the revised manuscript, we will clearly discuss these issues and acknowledge the potential, yet technically challenging, possibility of stimulating the auditory cortex in future work to further disentangle the contributions of auditory and visual inputs to cross-modal integration.

    1. eLife assessment

      In this useful study, the authors tested the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes using modeling and behavioral experiments, claiming that bumblebees rely most on ground-views for homing. However, due to a lack of analysis of the bees' behavior during training and a lack of information as to how the homing behavior of bees develops over time, the evidence supporting their claims is currently incomplete. Moreover, there was concern that the experimental environment was not representative of natural scenes, thus limiting the findings of the study.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented. 

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.

    3. Reviewer #2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the net entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.<br /> The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:<br /> line 51: "Snapshot models perform best with bird's eye views";<br /> line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it.";<br /> line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:<br /> Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      Behavioural analysis:<br /> The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).<br /> In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

    4. Author response:

      Reviewer 1 (Public Review):

      “Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.”

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest entrance using views within a confined area. While many studies have focused on larger scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing on a smaller scale, especially in dense environments.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      We agree with your comment about the term "clutter". Therefore, we will refer to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the net entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views." Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the clutter but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features, which were moved between training and testing. (Neither the camera nor the wall were moved between training and test). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we will include model results with the arena wall in the revised paper.

      As we wanted to investigate if bees would use ground views or bird’s eye views to home in a dense environment, we think the catchment volumes would provide qualitatively similar, though quantitatively more detailed information as catchment slices. Our approach of catchment slices is sufficient to predict whether ground or bird' s-eye views perform better in leading to the nest, and we will, therefore, not include further computations of catchment volumes.

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments. A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Our current knowledge of learning flights did not permit these investigations of bee training. Firstly, our setup does not allow us to record each inbound and outbound flight of the bumblebees during training. Doing so would require blocking the entire colony for extended time periods, potentially impairing the motivation of the bees to forage or the survival and development of the colony. Secondly, the exact locations where bees learn or if and whether they continuously learn by weighting the visual experience based on their positions and orientations is not always clear. It makes it difficult to categorise these flights accurately in learning and return flights. Additionally, homing models remain elusive on the learning mechanisms at play during the learning flights. Therefore, we believe that continuous effort must be made to understand bees' learning and homing ability. We felt it was necessary first to establish that bees could navigate back to the nest in a dense, cluttered environment. With this understanding, we are currently conducting a detailed study of the bees' learning flights in various dense environments and provide these results in a separate article.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the clutter.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled lab conditions. Both field and lab research are absolutely necessary and should feed each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of these components for the behaviour through targeted variation of individual components of the environment. These results should guide field-based experiments for validation.

      Our lab settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and will refer to our environment as a "dense environment."

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factor inherent to field work, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious

      mechanisms for homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.

    1. eLife Assessment

      This important study investigates whether neural prediction of words can be measured through pre-activation of neural network word representations in the brain; solid evidence is provided that neural network representations of neighboring words are correlated in natural language. Therefore, it is crucial to differentiate between neural activity that predicts the upcoming word and neural activity that encodes the current words - information that can be used to predict the upcoming word. The study is of potential interest to researchers investigating language encoding in the brain or in large language models. Additional discussions are needed regarding the distinction between prediction and stimulus dependency and potential methods to distinguish them.

    2. Reviewer #1 (Public review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

      Weaknesses:

      I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

      This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

    3. Reviewer #2 (Public review):

      Summary:

      At a high level, the reviewers demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

    4. Reviewer #3 (Public review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies, specifically, the auto-correlation in the stimuli-rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      (1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      (2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings - an approach used in prior work.

      (3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

      Weaknesses:

      (1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

      (2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

      (3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

      Weaknesses:

      I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

      This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

      We thank the reviewer for their comments and address both points.

      Conceptually, there is a key difference between encoding predictions, i.e. pre-activations of future words, versus encoding stimulus dependencies. The speech acoustics provide a useful control case: they encode the stimulus (and therefore stimulus dependencies) but do not predict. When we apply the encoding analysis to the acoustics (i.e. when we estimate the acoustics pre-onset from post-onset words), we observe the “hallmarks of prediction” – yet, clearly, the acoustics aren't "predicting" the next word.

      This reveals the methodological issue: if the brain were just passively filtering the stimulus (akin to a speech spectrogram), these "prediction hallmarks" would still appear in the acoustics encoding results, despite no actual prediction taking place. Therefore, one necessary criterion for concluding pre-activation from pre-stimulus neural encoding, is that at least the pre-stimulus encoding performance is better on neural data than on the stimulus itself. This would show that the pre-onset neural signal contains additional predictive information about the next word beyond that of the stimulus (e.g. acoustics) itself. We will make this point more prominent in the revision.

      Regarding the regression: different weights are estimated per time point in a time-resolved regression. This allows for modeling of unfolding responses over time, but also for the learning of stimulus dependencies.

      To sum up, the difference between encoding dependencies and predictions is at the core of our work. We appreciate this was not clear in the initial version and we will make this much clearer in the revision, conceptually and methodologically.

      Reviewer #2 (Public review):

      Summary:

      At a high level, the reviewers demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

      We thank the reviewer for their assessment.

      We believe the limitation we highlight extends beyond the specific method of residualizing features. Rather, it points to a fundamental problem: adjusting the features (X matrix) alone cannot address stimulus dependencies that persist in the signal (y matrix), as we demonstrate by using a different signal (acoustics) that encodes no predictions. While removing dependencies from the signal would be more thorough, this would also eliminate the effect of interest. We view this as a fundamental limitation of the encoding analysis approach combined with the experimental design, rather than something that can be resolved analytically. We will perform additional analyses to test this premise and elaborate on this point in our revision.

      Reviewer #3 (Public review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies, specifically, the auto-correlation in the stimuli-rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      (1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      (2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings - an approach used in prior work.

      (3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

      Weaknesses:

      (1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

      (2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

      (3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

      We thank the reviewer for their comments.

      We want to address a key unclarity regarding the procedure of regressing out embedding dependencies. While Goldstein et al. showed that neural encoding results persist after their control analysis (like we did, too, in our supplementary Figure S3), this does not lessen the concern surrounding stimulus dependencies. Our analyses demonstrate that even after such residualization, the "hallmarks of prediction" remain encodable in the speech acoustics – a control system that, by definition, cannot predict upcoming words. Therefore, the hallmarks of prediction can be fully explained by stimulus dependencies. This persistence in the acoustics strengthens rather than lessens our concerns about dependencies.

      This connects to a broader methodological point: our key evidence comes from analyzing the stimulus material itself as a control system. By comparing results from encoding neural responses to those of a system that encodes the stimulus, and therefore the dependencies that cannot predict the upcoming input (like acoustics), we can establish proper criteria for concluding that the brain engages in prediction. Notably, the Goldstein dataset was not available when we conducted this research. However, for the revision we will perform additional analyses to make a more direct comparison.

      Finally, our focus was not to definitively test whether the brain predicts upcoming words, but rather to establish rigorous methodological and epistemological criteria for making such claims. We will elaborate on this crucial distinction in our revision and more prominently feature our central argument about the limitations of current evidence for neural prediction.

    1. eLife Assessment

      This study offers valuable insights into brain responses to words in the visual cortex of blind and sighted individuals. However, the evidence supporting the authors' claims remains incomplete, and the conclusions would benefit from a more comprehensive characterization of the conceptual properties of the word stimuli. This work will be of broad interest to cognitive neuroscientists, psycholinguists, and neurologists investigating meaning representation in the brain.

    2. Reviewer #1 (Public review):

      Summary:

      This fMRI study shows that two regions of the visual cortex (BA18 and BA19) of blind and sighted individuals carry information about the physical similarity of objects denoted by words. This effect was found for written words (Braille in blind, visual in sighted) but not spoken words. The evidence complements earlier studies reporting physical similarity effects in the occipitotemporal cortex of blind and sighted individuals (e.g., Peelen et al., 2014).

      Strengths:

      The study addresses an important question in the fields of neural plasticity and visual cortex organization. The study is generally well-conducted and the findings are clearly presented.

      Weaknesses:

      While the evidence is statistically strong, it is currently incomplete because of missing control analyses (see below). The framing of the results, as arguing against the pluripotent cortex account, is not entirely convincing as it was not clear that the study addressed the key predictions of that account.

      Main comments:

      (1) The study is framed as a test of Bedny's "cognitively pluripotent cortex" proposal (2017) that attributes the increased visual cortex response to linguistic stimuli in blind individuals to high-level cognitive functions. Key evidence for this account came from studies showing increased responses in blind visual cortex to certain grammatical manipulations and to solving mathematical equations. The current study did not include such manipulations. Instead, the current study focused on the representation of objects denoted by single words. Bedny's account did not make a strong argument that the physical similarity of word referents should be differently represented in blind and sighted individuals - if it did, please state this explicitly. Indeed, evidence that (some regions of) the visual cortex represent objects similarly in blind and sighted individuals does not seem incompatible with it.

      (2) Throughout the manuscript (including the abstract) it was not clear what was meant with "visual cortex" or "visual areas"; whether this refers to early visual cortex (V1/BA17) or to visual cortex more generally (e.g., BA17-BA19, occipitotemporal cortex (MT, etc)). This is important for the theoretical arguments and for the interpretation of the results. If visual cortex = BA17, the current results point to potentially important differences between blind and sighted individuals, with the physical similarity of objects only observed in the visual cortex of the blind. If visual cortex is meant to include areas beyond BA17, the blind and sighted show similarities in the current study, although such similarities have been observed before using similar research approaches.

      (3) Related to the point above, the abstract does not accurately describe the results, as it only describes the similarities between blind and sighted but not the differences. The study revealed differences between groups, particularly in BA17 - primary visual cortex. The differences between the groups are also illustrated by the strikingly different searchlight results in the two groups separately (Figure S6). These differences do not reach significance in a whole-brain-corrected contrast, but that likely reflects a lack of power (particularly for a between-group contrast).

      (4) Results were found for written words but not spoken words (Figure S9). This is somewhat surprising considering that the visual cortex was more strongly activated for written words in the sighted, with this activation presumably not adding any information about the physical properties of word referents. Together with the widespread significance of clusters correlating with the physical similarity matrix (Figure 6), this raises the possibility of a confound. It would be good to ensure that this is not the case, e.g., you could create similarity matrices based on word length, word visual similarity (e.g., overlap in letters), and word frequency, and correlate these matrices with the physical similarity matrix to ensure that these correlations are not positive (or if they are, partial it out).

      (5) The study included a task manipulation, with participants either judging physical or conceptual properties. This task manipulation is a central aspect of the design but does not feature anywhere in the results, and is also not discussed or introduced in the text. It would be interesting to know whether the results depend on the property (physical/conceptual) being task-relevant. But more importantly, a potential concern is that the responses in the task (given for each object using a two-response button box) correlate with physical or conceptual similarity and that this explains the fMRI findings. For example, two objects that are elongated would both receive a "yes" button press when participants answer the question "is this elongated"; these objects would also be rated as physically similar. This may apply more to physical than conceptual similarity. To exclude this possibility, the responses need to be analysed and included in the fMRI analyses, either as a regressor in the GLM or as another matrix to be partialed out at the final stage of analysis.

      (4) Many of the blind participants had some residual vision (9/20 had light perception, 2/20 had contour perception); this could possibly have prevented the reorganization of visual cortex.

    3. Reviewer #2 (Public review):

      Summary:

      The authors show, through rigorous and extensive analyses, that the visual cortex in both congenitally blind and sighted participants represented differences between individual words presented across sensory modalities. In both groups, the activation patterns for words in the visual cortex reflected physical, but not conceptual similarity between word referents. This suggests a similar representation for both groups of words, one derived from vision-oriented mechanisms, and does not reflect significant functional reorganization in blindness.

      Strengths:

      The theoretical question is sound, as is the analysis approach. The authors' literature discussion is thorough, and the writing is clear.

      Weaknesses:

      I have only minor concerns left open.

      (1) In the representational connectivity analysis, what is the average value across the brain? The authors compare the representational correlation across brain regions to the average value, but the average itself is not reported.

      (2) Can the authors add a map showing the representational connectivity values across the brain in addition to the bar plot? It would make it easier to see what networks show similar neural representation to the visual cortex.

      (3) Are the participants in the behavioral experiment from which the physical and conceptual similarity between word referents were collected matching in age or education with the fMRI participants?

      (4) Although there are no group differences in the correlation of the physical similarity, I think it is important to acknowledge that the effect is only significant at the searchlight level in the blind early visual cortex (Figure S6).

    4. Reviewer #3 (Public review):

      Summary:

      This study examines semantic processing in the visual cortex of both congenitally blind and sighted individuals using fMRI and multivariate pattern analysis (MVPA). The key finding is that the visual cortex in both groups encodes the physical properties of word referents, rather than their conceptual similarities. These results suggest that the same representational mechanisms operate in both the blind and sighted brain.

      Strengths:

      (1) The findings contribute to a broader understanding of cortical reorganization and provide evidence for top-down processing of word referents, even in the absence of visual experience.

      (2) The experiment incorporates both spoken and written word presentations (Braille for blind participants), ensuring that the results are not confounded by modality effects.

      (3) The study employs a rigorous methodological approach, combining multivariate and univariate analyses to strengthen the validity of its findings.

      (4) The paper is well-structured and clearly written, making it easy to follow.

      Weaknesses:

      (1) The word stimuli consists of only 20 nouns referring to concrete entities. However, in the behavioral experiment, participants rated the physical and conceptual similarity of only 30 word pairs, which represents just a subset of all possible word pair combinations. The average similarity ratings across subjects were then used to construct stimuli similarity matrices, which were correlated with the fMRI similarity matrices in the MVPA analysis. What is the rationale for presenting only a small subset of all possible word pair combinations to participants? Additionally, the instruction to rate the "conceptual similarity" of word pairs seems somewhat ambiguous. Would "conceptual similarity" correlate with "physical similarity"? Instead of subjective ratings, why not use cosine similarity scores from pretrained language models to construct the "conceptual similarity" matrices? This approach could provide a more objective and reproducible measure of conceptual similarity.

      (2) There are only six questions each for assessing the physical and conceptual properties of the words in the fMRI experiment. Most of the physical property questions focus on shape-related attributes (e.g., round, angular, elongated, symmetrical), while the conceptual properties are limited to three pairs of antonyms (living/non-living, natural/manufactured, pleasant/unpleasant). These aspects seem insufficient to comprehensively characterize the physical and conceptual properties of the nouns. What was the rationale behind selecting only these six questions? Could this limited set of attributes introduce bias in how the neural representations in the visual cortex are interpreted?

      (3) Two of the blind participants are right-handed, and two may have some form of contour vision. What was the rationale for including these participants? In addition, the sample size for blind participants is relatively small (N = 20). Does the sample size provide sufficient justification for the main conclusion that the visual cortex in both blind and sighted groups represents the physical properties of word referents? Additionally, could individual differences among blind participants impact the results, and were any analyses conducted to account for such variability?

      (4) I appreciate the authors' effort to integrate both univariate and multivariate approaches in their analyses. However, the results appear somewhat contradictory: The MVPA results suggest similar neural representations of word referents in the visual cortex for both blind and sighted participants. However, the univariate analyses indicate higher activation in the visual cortex of blind participants. How can these two findings be reconciled? The authors attributed the increased activation in the visual cortex of blind participants to their "enhanced excitability", but what exactly does "excitability" mean in this context? Could this increased activation instead reflect an alternative neural strategy for processing semantic information in the blind brain? If so, how does this align with the claim that similar representational mechanisms exist in both blind and sighted individuals?

      (5) The authors interpret their findings to suggest that the visual cortex can represent the physical properties of words even without visual experience, attributing this to top-down modulation from higher cognitive regions, which then backprojects to the visual cortex. However, it is unclear why only physical properties, and not conceptual properties, are backprojected. If higher cognitive regions modulate the visual cortex in a top-down manner, wouldn't both physical and conceptual attributes be expected to influence its activity? Could the authors clarify the mechanism that selectively supports physical property encoding over conceptual representation?

    1. eLife Assessment

      In this important study, the authors advance our understanding of copper uptake by chalkophores and their targeted metalloproteins in Mycobacterium tuberculosis. These convincing data demonstrate that chalkophore-acquired copper is solely incorporated into the Mtb bcc:aa3 copper-iron respiratory oxidase under low copper conditions, and that chalkophore-mediated protection of the respiratory chain is critical to Mtb virulence. These findings may be leveraged for drug discovery and will be of broad interest to those studying bacterial pathogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      It is known that the nrp operon is induced by copper deprivation and encodes the synthesis of chalkophores. The authors carried out a genetic analysis that revealed transcriptional differences for WT and Mtb∆nrp when exposed to the copper chelator tetrathiomolybdate (TTM). The authors found that copper chelation results in upregulation of genes in the chalkophore cluster as well as genes involved in the respiratory chain: including, components of the heme-dependent oxidase CytBD and subunits of the bcc:aa3 heme-copper oxidase. Utilizing several knockout variants and inhibitors, the authors showed that copper starvation survival requires chalkophore synthesis and that copper starvation results in dysfunctional bcc:aa3 oxidase. By monitoring oxygen consumption, they go on to show that copper deprivation inhibits respiration through the bcc:aa3 oxidase. Lastly, the authors compare virulence of WT Mtb, Mtb∆nrp and MtbΔnrpΔcydAB strains in mice spleen and lung. The Mtb∆nrp strain showed mild attenuation, but virulence in MtbΔnrpΔcydAB was severely attenuated and complementation with the chalkophore biosynthetic pathway restored Mtb virulence. These results suggest that chalkophore mediated protection of the respiratory chain is critical to Mtb virulence, and that redundant respiratory oxidases within Mtb provide respiratory chain flexibility that may promote host adaptation.

      This new information about Mtb biology may be leveraged for drug discovery, highlighting that the Mtb respiratory pathway is a promising drug target, where one may target the Mtb chalkophore biosynthetic pathway in conjunction with CytBD, to obliterate Mtb.

      Strengths: Overall, the paper is very clear and well written, with thorough and well-thought-out experimentation.

      No weaknesses.

      Comments on revisions:

      The authors have addressed all the reviewers' comments.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well-written manuscript that clearly demonstrates that the nrp encoded diisonitrile chalkophore is necessary for function of the bcc-aa3 oxidase supercomplex under low copper conditions. In addition, the study demonstrates the chlakophore is important early during infection when copper sequestration is employed by the host as a method of nutritional immunity.

      Strengths:

      The authors use genetic approaches, including single and double mutants of chalkophore biosynthesis, and both the Mtb oxidases. Use a copper chelators to restrict copper in vitro. A strength of the work was the use of a synthesized a Mtb chalkophore analogue to show chemical complementation of the mutant nrp locus. Oxphos metabolic activity was measured by oxygen consumption and ATP levels. Importantly, the study demonstrated that chalkophore, especially in a strain lacking the secondary oxidase, was necessary for early infection and ruled out a role for adaptive immunity in the chalkophore lacking Mtb by use of SCID mice. It is interesting that after two weeks of infection and onset of adaptive immunity the chalkophore is not required, which is consistent with the host environment switching from a copper restricted to copper overload in phagosomes.

      Weaknesses:

      None noted

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the group of Glickman expand on their previous studies on the function of chalkophores during growth of and infection by Mycobacterium tuberculosis. Previously, the group had shown that chalkophores, which are metallophores specific for the scavenging of copper, are induced by M. tuberculosis under copper deprivation conditions. Here, they show that chalkophores, under copper limiting conditions, are essential for the uptake of copper and maturation of a terminal oxidase, the heme-copper oxidase, cytochrome bcc:aa3. As M. tuberculosis has two redundant terminal oxidases, growth of and infection by M. tuberculosis is only moderated if both the chalkophores and the second terminal oxidase, cytochrome bd, are inhibited.

      Strengths:

      A strength of this work is that the lab-culture experiments are complemented with mice infection models, providing strong indications that host-inflicted copper deprivation is a condition that M. tuberculosis has adapted to for virulence.

      Weaknesses:

      Because the phenotype of M. tuberculosis lacking chalkophores is similar, if not identical, to using Q203, an inhibitor of cytochrome bcc:aa3, the authors propose that the copper-containing cytochrome bcc:aa3 is the only recipient of copper-uptake by chalkophores. A minor weakness of the work is that this latter conclusion is not verified under infection conditions and other copper-enzymes might still be functionally required during one or more stages of infection.

      Comments on revisions:

      I thank the authors for carefully addressing my suggestion to the original submission and congratulate them on their work.

    5. Author response:

      The following is the authors’ response to the original reviews

      Response to public reviews:

      We thank the reviewers for their careful evaluation of our manuscript and appreciate the suggestions for improvement. We will outline our planned revisions in response to these reviews.

      Reviewer 2: “The one exception is the claim that "maintenance of respiration is the only cellular target of chalkophore mediated copper acquisition." While under the in vitro conditions tested this does appear to be the case; however, it can't be ruled out that the chalkophore is important in other situations. In particular, for maintenance of the periplasmic superoxide dismutase, SodC, which is the other M. tuberculosis enzyme known to require copper.”

      And

      Reviewer 3: “Because the phenotype of M. tuberculosis lacking chalkophores is similar, if not identical, to using Q203, an inhibitor of cytochrome bcc:aa3, the authors propose that the coppercontaining cytochrome bcc:aa3 is the only recipient of copper-uptake by chalkophores. A minor weakness of the work is that this latter conclusion is not verified under infection conditions and other copper-enzymes might still be functionally required during one or more stages of infection.

      Both comments concern the question of whether the bcc:aa3 respiratory oxidase supercomplex is the only target of chalkophore delivered copper. In culture, our experiments suggest that bcc:aa3 is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis D_ctaD_ (a subunit of bcc:aa3) is growth impaired, copper chelation with TTM does not exacerbate that growth defect, and that a D_ctaD_D_nrp_ double mutant is no more sensitive to TTM than D_ctaD_. These data indicate that role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 oxidase is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrest growth of M. tuberculosis D_nrp, but the combination has no additional effect, indicating that when Q203 is inhibiting the _bcc:aa3 oxidase, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore mediated Cu acquisition. We have added this caveat to the discussion of revised version of this manuscript. 

      Response to Reviewers Recommendations for the authors:

      Reviewing Editor Comments:

      In addition to the specific recommendations below, there was consensus that the conclusions/discussion should contextualize that the results cannot exclude that in other conditions (such as in infection), enzymes other than cytochrome bcc:aa3 receive copper from the chalkophore system.

      Reviewer #1 (Recommendations for the authors):

      (1) In the introduction, the authors mention that the nrp operon is only present in pathogenic Mtb and Mycobacterium marinum but not non-pathogenic mycobacterium. Is the nrp operon present in other pathogenic mycobacterium such as in M. leprae, M. avium or M. abscessus?

      Bhatt et al (PMID 30381350) presented an analysis of the distribution of nrp gene clusters in mycobacteria and concluded that M. bovis, M. leprae and M. canetti clearly encode nrp genes. M. marinum has been shown to have a functional chalkophore biosynthetic cluster, but the presence of this system in other mycobacteria awaits experimental validation. We have added the Bhatt reference to this sentence in the introduction. 

      (2) Figure 1A - it would be helpful if the genes were grouped and labeled as per their purpose (for example, CytBD components, bcc:aa3 components). While these are described in the text, the genes belonging to the chalkophore cluster are not defined in the text, and are thus not easily identified in the figure.

      The order of genes in the heatmap is determined by unsupervised clustering as indicated by the dendrogram to the left of the heatmap. To highlight chalkophore and CytBD genes, we have added color coding to the gene names and explained this color coding in the legend. 

      (3) Figure 2B/2C - it is interesting that complementation of ΔnrpΔcydAB with cydABCD does not rescue growth to Δnrp levels. Is there an explanation for this? 

      AND

      (4) Figure 2C - BCS is not introduced in the text for this figure nor are the results described - which seems like an oversight. It is interesting that BCS treatment does have a full rescue with cydABCD complementation, while TTM treatment does not. Is there an explanation for this?

      We thank the reviewer for raising this issue. We have attempted several different complementation constructs, including CydAB alone and different promoters, to address the partial complementation in question. However, we do not have an adequate explanation for this partial complementation. As the reviewer notes, the partial complementation is only evident with TTM, not BCS. However, we cannot speculate on the reason for this difference at present.  We have added a note to the text in the results section noting this difference. 

      (5) Figure 2F - is there a reason for the change in TTM concentrations (50 μM TTM vs 10 μM TTM)? Is the concentration for Q203 in both single treatment and combinatory tests 100nM?  

      We have clarified the 100nm Q203 concentration in the figure legend. To avoid confusion, we have removed the 50µM TTM condition from panel F because the growth inhibition phenotype of 10µM is shown in panel E and is the comparator for the combined TTM/Q203 condition in panel F. 

      (6) Figure 3A - I assume d0 = day 0, d3 = day 3. This should be defined.

      We have modified the legend to clarify these abbreviations. 

      (7) Figure 4B - as complementation of nrp for ΔnrpΔcydAB returns levels back to WT, I assume there is no attenuation with ΔcydAB alone? Clarification would be appreciated.

      The mouse phenotype of M. tuberculosis D_cydAB_ is reported here:

      https://www.pnas.org/doi/10.1073/pnas.1706139114#sec-1 and this paper is reference 22 of the paper and was noted in the discussion. 

      Reviewer #2 (Recommendations for the authors):

      In vitro conditions that require SodC could reveal a role for the chalkophore (ie., exposure to extracellular or periplasmic superoxide stress under low iron conditions). Some minor confusion exists with the terminology around the two oxidases found in M. tuberculosis. The bcc:aa3 oxidase is a supercomplex between the reductase and oxidase complexes. This point should be clarified in the introduction as the term supercomplex isn't used until later in line 194 and without definition. Referring to the bcc:aa3 supercomplex as an oxidase is fine but is sometimes confusing especially when mentioning the target of Q203 is the oxidase as it targets the reductase portion of the supercomplex.

      We thank the reviewer for this point. We have modified the text to refer to the supercomplex at first mention and modified subsequent mentions to be clearer. 

      In the RNA preparation section boxes appear in several places where spaces should be.

      We do not see these boxes so we suspect this is a conversion error of some type. 

      Reviewer #3 (Recommendations for the authors):

      The authors have very carefully performed their studies and their main conclusions are amply supported by the data. The manuscript is also very clearly written, and easily accessible to a broad audience interested in both bioinorganic chemistry and mycobacteria. I have two recommendations:

      (1) I agree that the evidence shows that chalkophores provide copper to cytochrome bcc:aa3. Under lab-culture conditions, it could well be that, when cytochrome bd is deleted or inhibited, cytochrome bcc:aa3 is rate limiting. Under lab-culture conditions, it is also clear that only the expression of a select number of enzymes is affected. However, this does not mean that cytochrome bcc:aa3 is the ONLY enzyme that receives copper from chalkophores. Thus, under infection conditions, other copper enzymes might be important. For instance, M. tuberculosis expresses a Cu-Zn superoxide dismutase. In summary, perhaps the authors would consider changing the wording of statements such as that in Figure 2E and the conclusions drawn in the discussion.

      This comment concerns the question of whether the bcc:aa3 respiratory supercomplex is the only target of chalkophore delivered copper. In culture, our experiments suggest that the supercomplex is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis D_ctaD_ (a subunit of the bcc:aa3 supercomplex) is growth impaired, copper chelation with TTM does not exacerbate that growth defect, and that a D_ctaD_D_nrp_ double mutant is no more sensitive to TTM than D_ctaD_. These data indicate that role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 supercomplex is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrest growth of M. tuberculosis D_nrp, but the combination has no additional effect, indicating that when Q203 is inhibiting _bcc:aa3, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore mediated Cu acquisition. We have added the following to the discussion: “Although chalkophore mediated protection of the bcc:aa3 supercomplex is an important virulence function, we cannot exclude the possibility that additional copper dependent enzymes use chalkophore delivered copper during infection.”

      (2) There is a difference between copper-uptake (e.g. by chalkophores) and the maturation of metallo-enzymes. A short paragraph discussing knowledge from other bacteria in this area would help understand the role chalkophores (e.g. see 10.1128/mBio.00065-18 or 10.1111/mmi.14701). This could possibly be extended with a genome analysis to check which other proteins are present in M. tuberculosis.

      We thank the reviewer for this point. We agree that our data does not distinguish between 1) a generic role for the chalkophore in copper uptake, with the ultimate candidate metalloenzyme rendered dysfunctional by copper loss, and 2) the chalkophore being an intrinsic part of the cytochrome maturation pathway and interacting directly with the target enzymes. We have added this point to the discussion but have not otherwise added the suggested full discussion of metalloenzyme maturation as we believe this discussion is beyond the scope of our data. 

      Finally, can I suggest the labels d0 and d3 are made clearer in Figure 3A (and defined in the legend).

      We have modified the legend to be clearer.

    1. eLife Assessment

      This is a valuable study that tests the functional role of food-washing behavior in removing tooth-damaging sand and grit in long-tailed macaques and whether dominance rank predicts level of investment in the behavior. The evidence that food-washing is deliberate is compelling and the evidence that individual investment in the behavior varies is solid. Overall, the paper should be of interest to researchers interested in foraging behavior, cognition, and primate evolution.

    2. Reviewer #1 (Public review):

      In this paper, the authors had 2 aims:

      (1) Measure macaques' aversion to sand and see if its' removal is intentional, as it likely in an unpleasurable sensation that causes tooth damage.

      (2) Show that or see if monkeys engage in suboptimal behavior by cleaning foods beyond the point of diminishing returns, and see if this was related to individual traits such as sex and rank, and behavioral technique.

      They attempted to achieve these aims through a combination of geochemical analysis of sand, field experiments, and comparing predictions to an analytical model.

      The authors' conclusions were that they verified a long-standing assumption that monkeys have an aversion to sand as it contains many potentially damaging fine grained silicates, and that removing it via brushing or washing is intentional.

      They also concluded that monkeys will clean food for longer than is necessary, i.e. beyond the point of diminishing returns, and that this is rank-dependent.

      High and low-ranking monkeys tended not to wash their food, but instead over-brushed it, potentially to minimize handling time and maximize caloric intake, despite the long-term cumulative costs of sand.

      This was interpreted through the *disposable soma hypothesis*, where dominants maximize immediate needs to maintain rank and increase reproductive success at the potential expense of long-term health and survival.

      Strengths:

      The field experiment seemed well designed, and their quantification of the physical and mineral properties of quartz particles (relative to human detection thresholds) seemed good relative to their feret diameter and particle circularity (to a reviewer that is not an expert in sand). The *Rank Determination* and *Measuring Sand* sections were clear.

      In achieving Aim 1, the authors validated a commonly interpreted, but unmeasured function, of macaque and primate behavior-- a key study/finding in primate food processing and cultural transmission research.

      I commend their approach in trying to develop a quantitative model to generate predictions to compare to empirical data for their second aim.<br /> This is something others should strive for.

      I really appreciated the historical context of this paper in the introduction and found it very enjoyable and easy to read.

      I do think that interpreting these results in the context of the *disposable soma hypothesis* and the potential implications in the *paleolithic matters* section about interpreting dental wear in the fossil record are worthwhile.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We thank the editors and Reviewers 1 and 3 for their though6ul consideration of our manuscript. The present revision is submitted to address comments raised concerning rank determinations and the following sentence in the editorial assessment:

      The evidence that food-washing is deliberate is compelling, but the evidence for variable and adaptive investment depending on rank, including the fitness-relevance and ultimate evolutionary implications of the findings, is incomplete given limitations of the experimental design.

      Close reading of this sentence reveals two parallel threads. The first can be read as “…evidence for variable rank is incomplete given the limitations of the experimental design,” whereas the second can be read as “…evidence for adaptive investment and fitness is incomplete given the limitations of the experimental design.” The first alludes to a critique of our methods, while the second alludes to points of discussion unrelated to our experimental design. Unpacking this sentence is important because it casts the totality of our paper as “incomplete,” a word of consequence for early-career scholars because it prevents indexing in Web of Science.

      For clarity, we will refer to these topics as Thread 1 and Thread 2 in the following response.

      Thread 1 seems rooted in a comment made by Reviewer 1, which is reproduced below:

      I am still struck that there was an analysis of only trials where <3 individuals are present. If rank was important, I would imagine that behavior might be different in social contexts when theA, scrounging, policing, aggression, or other distractions might occur-- where rank would have effects on foraging behavior. Maybe lower rankers prioritize rapid food intake then. If rank should be related to investment in this behavior, we might expect this to be magnified (or different) in social contexts where it would affect foraging. It might just be that the data was too hard to score or process in those settings, or the analysis was limited. Additionally, I think that more robust metrics of rank from more densely sampled focal follow data would be a beJer measure, but I acknowledge the limitations in getting the ideal. Since rank is central to the interpretation of these results, I think that reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.

      We are grateful for this perspective of Reviewer 1, but it puts us in an uncomfortable position. We must respond rather forcefully because of its influence on the above assessment. A problem with R1’s comment is that it uses the word “foraging” (a behavior we did not study) instead of “cleaning” (the behavior we did study). Still, we can substitute the latter word with the former to get the gist of it. 

      R1 criticizes our methods as a prelude for imagining the behaviors of our study animals, a form of conjecture. R1 correctly supposes a positive relationship between the number of animals and the intensity of competition for a limited food resource, a well-known phenomenon; and, yes, the food in each trial was decidedly limited, being fixed at nine cucumber slices. But R1 incorrectly presumes rank effects on cleaning under conditions of intense food competition. When the number of monkeys participating in a trial exceeded the number of feeding stations (n = 3), we saw little or no cleaning effort, either brushing or washing. So, rank effects on cleaning are immaterial under these conditions. As our study goals were narrowly focused on detecting individual propensities, or choices, as a function of rank, we limited our analysis to trials involving three monkeys or fewer. In retrospect, we admit that we should have provided better justification for our choice of trials, so we’ve edited one of our sentences:

      Original sentence 

      Formerly lines 219-220: To minimize the potential confounding effects of dominance interactions, we analyzed trials with ≤ 3 monkeys.

      Revised sentence

      Current lines 219-224: We excluded trials from analysis if the number of participating monkeys exceeded the number of feeding stations, as these conditions produced high levels of feeding competition with scant cleaning behavior. Such conditions effectively erased individual variation in sand removal, the topic motivating our experiment. Accordingly, we analyzed trials with ≤ 3 monkeys, putting 937 food-handling bouts into the GLMM statistical models, which included data on individual rank, sex, and sand treatment.

      R1’s final criticism – “I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal” – seems to imply that rank data were collected during our experiment. On the contrary, we determined ranks from five years of focal follows preceding the experiment, achieving the very standard that R1 describes as ideal. The relevant text appeared on lines 165-169 in version 2.0:

      To determine the rank-order of adults, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). In some cases, these data were supplemented with ad libitum observations. This protocol existed during five years (2013-2018) of continual observations before we conducted our experiment in July-August 2018. 

      Naturally, we were puzzled by R1’s dismissal of our methods, as well as R1’s conclusion, reached without evidence, that “[the] reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.” It is unsubstantiated assertation with no definition of robustness, making it difficult for anyone to objectively assess the quality of our data.

      We detect in R1’s words some unfamiliarity with the social organization of our study species, which is fair enough. To better orient readers to the dominance hierarchy of Macaca fascicularis, and to boost reader confidence in the volume and quality of our rank data, we have added several sentences to this section of the manuscript, lines 169-183:

      Macaques form multi-male multi-female (polygynandrous) social groups with individual dominance hierarchies. In M. fascicularis, the hierarchy is strictly linear and extremely steep, meaning aggression is unidirectional (de Waal, 1977; van Noordwijk and van Schaik, 2001) with profound asymmetries in outcomes for individuals of adjacent ranks (Balasubramaniam et al., 2012). Further, the dominance hierarchies of philopatric females are stable and predictable. Daughters follow the pattern of youngest ascendancy, ranking just below their mothers with few known exceptions among older sisters (de Waal, 1977; van Noordwijk and van Schaik, 1999). Taken together, these species traits are conducive to unequivocal rank determinations. 

      To determine the rank-order of adults in our study group, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5-min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). These data were supplemented with ad libitum observations and all rank determinations were updated monthly, and when males immigrated or emigrated. This protocol predates our experiment in July-August 2018, representing 970 hr of focal data during five years of systematic study (2013-2018). 

      Thread 2 criticizes our evidence for adaptive investment and fitness, describing it is a limitation of our experimental design. Accordingly, the totality of our experiment was classified as “incomplete.” Yet, our experiment was never designed to collect such evidence, and we make no claims of having it. Rather, we discussed potential fitness consequences to highlight the broader significance of our study, connecting it diverse bodies of literature, from evolutionary theory to paleoanthropology. Our intent was to follow the conventions of scientific writing; to put our results into conversation with the wider literature and set an agenda for future research.

      On reflection, Thread 2 seems to pivot around something as arbitrary as structure. Previously, our results and discussion were combined under a single section header (“Results and Discussion”), a stylistic choice to economize words. Our manuscript is a Short Report, which is limited to 1,500 words of main text. But this level of concision proved counterproductive. It blurred our results and discussion in the minds of readers. Indeed, Reviewer 3 described it as “misleading,” a barbed word that accomplishes the same act attributed to us. To counter this perspective, we have simply partitioned our Results (now “Experimental Results”) and Discussion to draw a sharper distinction between the two components of our paper.

    1. eLife Assessment

      This important study identifies a mechanism by which caspases are activated in a non-lethal context to induce functional modulation in Drosophila olfactory receptor neurons. To deliver, the authors generated a new reporter of caspases, used TurboID to identify proteins proximal of the Drosophila executioner caspases Drice, and then focused on Fasciclin 3 as a mediator. The experimental results and the main conclusions are convincing. This substantial body of work will be of interest to researchers across fields, from neuroscience of olfaction to development and cell biology.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Muramoto and colleagues have examined a mechanism by which the executioner caspase Drice is activated in a non-lethal context in Drosophila. The authors have comprehensively examined this in the Drosophila olfactory receptor neurons using sophisticated techniques. In particular, they had to engineer a new reporter by which non-lethal caspase activation could be detected. The authors conducted a proximity labeling experiment and identified Fasciclin 3 as a key protein in this context. While removal of Fascilin 3 did not block non-lethal caspase activation (likely because of redundant mechanisms), its overexpression was sufficient to activate non-lethal caspase activation.

      Strengths:

      While non-lethal functions of caspases have been reported in several contexts, far less is known about the mechanisms by which caspases are activated in these non-lethal contexts. So, the topic is very timely. The overall detail of this work is impressive and the results, for the most part, are well controlled and justified.

      Weaknesses:

      The behavioral results shown in Fig. 6 need more explanation and clarification (more details below). As currently shown, the results of Fig. 6 seem uninterpretable. Also, overall presentation of the Figures and description in legends can be improved.

      Comments on revisions:

      The authors have adequately addressed my comments.

    3. Reviewer #2 (Public review):

      In this revised version of the study, the authors investigate the role of caspases in neuronal modulation through non-lethal activation. They analyze proximal proteins of executioner caspases using a variety of techniques, including TurboID and a newly developed monitoring system based on Gal4 manipulation, called MASCaT. They demonstrate that overexpression of Fas3G promotes the non-lethal activation of caspase Dronc in olfactory receptor neurons. In addition, they investigate the regulatory mechanisms of non-lethal function of caspase by performing a comprehensive analysis of proximal proteins of executioner caspase Drice. It is important to point out that the authors use an array of techniques from western blot to behavioral experiments and also that the generated several reagents, from fly lines to antibodies. In this revised version of the manuscript the authors addressed the concerns raised by this reviewer in a very thorough way. This is an interesting work that would appeal to readers of multiple disciplines. As a whole these findings suggest that overexpression of Fas3G enhances a non-lethal caspase activation in ORNs, providing a novel experimental model that will allow for exploration of molecular processes that facilitate caspase activation without leading to cell death.

      Comments on revisions:

      I would like to thank the authors for fully addressing my concerns.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Muramoto and colleagues have examined a mechanism by which the executioner caspase Drice is activated in a non-lethal context in Drosophila. The authors have comprehensively examined this in the Drosophila olfactory receptor neurons using sophisticated techniques. In particular, they had to engineer a new reporter by which non-lethal caspase activation could be detected. The authors conducted a proximity labeling experiment and identified Fasciclin 3 as a key protein in this context. While the removal of Fascilin 3 did not block non-lethal caspase activation (likely because of redundant mechanisms), its overexpression was sufficient to activate non-lethal caspase activation.

      Strengths:

      While non-lethal functions of caspases have been reported in several contexts, far less is known about the mechanisms by which caspases are activated in these non-lethal contexts. So, the topic is very timely. The overall detail of this work is impressive and the results for the most part are wellcontrolled and justified.

      Weaknesses:

      The behavioral results shown in Figure 6 need more explanation and clarification (more details below). As currently shown, the results of Figure 6 seem uninterpretable. Also, overall presentation of the Figures and description in legends can be improved.

      We sincerely thank the reviewer for their highly positive evaluation of our study, particularly from a technical perspective. We also greatly appreciate the valuable comments provided on our manuscript. In response, we have revised the manuscript with a particular focus on Figure 6, as well as the overall presentation of the figure and its description in the legends, in accordance with the reviewer’s suggestions. For further clarification, please refer to our detailed point-by-point responses provided below.

      Reviewer #2 (Public review):

      In this study, the authors investigate the role of caspases in neuronal modulation through non-lethal activation. They analyze proximal proteins of executioner caspases using a variety of techniques, including TurboID and a newly developed monitoring system based on Gal4 manipulation, called MASCaT. They demonstrate that overexpression of Fas3G promotes the non-lethal activation of caspase Dronc in olfactory receptor neurons. In addition, they investigate the regulatory mechanisms of non-lethal function of caspase by performing a comprehensive analysis of proximal proteins of executioner caspase Drice. It is important to point out that the authors use an array of techniques from western blot to behavioral experiments and also that the generated several reagents, from fly lines to antibodies.

      This is an interesting work that would appeal to readers of multiple disciplines. As a whole these findings suggest that overexpression of Fas3G enhances a non-lethal caspase activation in ORNs, providing a novel experimental model that will allow for exploration of molecular processes that facilitate caspase activation without leading to cell death.

      We sincerely thank the reviewer for their highly positive evaluation of our study, particularly from a methodological perspective. We also greatly appreciate the valuable comments provided on our manuscript. In response, we have revised the manuscript in line with the reviewer’s suggestions. For further clarification, please refer to our detailed point-by-point responses provided below.

      Reviewing Editor comments:

      I am pleased to let you know that our reviewers found the results in your paper important and the evidence compelling. There are a few minor comments and a point was raised regarding figure 6 for which further details were asked. Please see the reviewer's comments. We are looking forward to receiving an updated version of your very interesting paper.

      We are grateful to you and the reviewers for dedicating time to review our manuscript and for providing insightful comments and suggestions. We have revised our manuscript in line with the reviewers' feedback. The major revision involves clarifying the two-choice preference assay presented in Figure 6. Details of these revisions are provided in our point-by-point responses to the reviewers’ comments below. The new and extensively modified sections of text are highlighted in blue. We have introduced new panels (Figures 1D, 3D, 6B, and 6C) and made modifications to Figure 6A. The previous Figure 1D has been relocated to Figure 1–figure supplement 1B. Additionally, our detailed responses to the reviewers’ comments are also highlighted in blue within the point-by-point response section. With all concerns and suggestions from the Editor and reviewers addressed, our conclusion—that executioner caspase is proximal to Fasciclin 3 which facilitates non-lethal activation in Drosophila olfactory receptor neurons—is now more robustly supported. We are confident that our revised manuscript makes a significant contribution to the fields of caspase function and neurobiology. We remain hopeful that the reviewers will find it suitable for publication in eLife.

      Reviewer #1 (Recommendations for the authors):

      The main comment here is related to Figure 6, which needs to be better explained. First, if the results in Figure 6B and C are conducted with young flies, why is the preference index close to 0? Aren't these young flies more attracted to ACV? Second, what are the results with Dronc-RNAi and DroncDN alone? These should be shown to more accurately assess the outcome of Fas3G expression with and without Dronc inhibition. Third, if Fas3G overexpression induces non-lethal caspase activation and a behavioral change, why does Dronc inhibition enhance (and not suppress) this behavioral change?

      We sincerely thank the reviewer for the comment. We used one-week-old young flies for the two-choice preference assay. We found that 16 hours of starvation combined with 25% ACV in the trap elicited a robust attraction behavior to the vinegar (New Figure 6B). In contrast, 4 hours of starvation with 1% ACV in the trap resulted in milder attraction behavior, with the preference index value being close to 0 but still showing a positive trend (New Figure 6B). Since our hypothesis is that non-lethal caspase activation suppresses attraction behavior, and that inhibiting caspase activation could enhance attraction, we used the milder experimental condition for subsequent analyses.

      In the original manuscript, we did not test Dronc inhibition alone because caspase activation is rarely observed in young flies (as demonstrated in Figure 3C, New Figure 3D, etc), suggesting that Dronc inhibition during this stage would not affect behavior. This hypothesis is further supported by previous research showing that inhibition of caspase activity in aged flies restores attraction behavior but does has no effect in young flies (Chihara et al., 2014). To validate this hypothesis, we conducted the two-choice preference assay again, including caspase activity inhibition by Dronc<sup>DN</sup> expression alone. As expected, Dronc inhibition alone did not alter behavior in young flies (New Figure 6C).

      We also observed that Fas3G overexpression promotes a weak, though not statistically significant, enhancement in attraction behavior. Importantly, simultaneous inhibition of caspase activity further enhanced attraction behavior (New Figure 6C). These results suggest that Fas3G overexpression has a dual function: one aspect promotes attraction behavior, while the other induces non-lethal caspase activation. In this context, non-lethal caspase activation appears to counteract the behavioral response, acting as a regulatory brake. To address the reviewer’s comments comprehensively, we included the New Figure 6B and replaced the original Figure 6B and C with New Figure 6C. Additionally, we revised the manuscript text as follows:

      Using a two-choice preference assay with ACV (Figure 6A), we found that 16 hours of starvation combined with 25% ACV in the trap elicited a robust attraction behavior to the vinegar (Figure 6B). In contrast, 4 hours of starvation with 1% ACV in the trap resulted in milder attraction behavior, with the preference index value being close to 0 but still showing a positive trend (Figure 6B). Under the milder experimental condition, we first confirmed that inhibition of caspase activity through expressing Dronc<sup>DN</sup> didn’t affect attraction behavior in young adult (Figure 6C), consistent with a previous report (Chihara et al., 2014).We then observed that the overexpression of Fas3G, which activates caspases, did not impair attraction behavior. Instead, it rather appeared to enhance the tendency for attraction behavior (Figure 6C), suggesting that Fas3G promotes attraction behavior. Finally, we found that inhibiting Fas3G overexpression-facilitated non-lethal caspase activation by expressing Dronc<sup>DN</sup> strongly promoted attraction to ACV (Figure 6C). Overall, these results suggest that Fas3G overexpression has a dual function: it enhances attraction behavior while also triggering non-lethal caspase activation, which counteracts the behavioral response, functioning as a regulatory brake without causing cell death.

      Other minor comments are below:

      The authors should clarify that while they refer to their caspases reporters as "non-lethal caspase reporters", these are caspase reporters in general and can report both lethal and non-lethal caspase activation. Of course, the only surviving cells are those that experience non-lethal caspase activation.

      We thank the reviewer for pointing this out. This reporter can monitor caspase activation with high sensitivity only if the cell is capable of transcribing and translating the reporter proteins following cleavage of the probe, most likely in living cells. However, as mentioned, using the term “non-lethal reporter” is not accurate, as additional experiments are required to determine whether caspase activation leads to cell death. Therefore, we removed the term “non-lethal” and referred to this reporter simply as a highly sensitive caspase reporter in the revised manuscript.

      Some of the figure panels could be better described in the legends (e.g. Figure 1E, 1F, 4E, 4F).

      We thank the reviewer for the comment. We have included additional explanations in the figure legends throughout the manuscript.

      In Figure 3C, the OL and AL regions should be marked in the figure as done in Figure 1C.

      We thank the reviewer for the comment. We have marked OL and AL regions in Figure 3C and Figure 2A as in Figure 1C.

      In Figures 4A and B, the authors should rearrange the order of the x-axis to reflect the order that appears in the text (Dronc first).

      We thank the reviewer for the comment. We have rearranged the order of labels on the X-axis to reflect the order that appears in the text.

      In Figure 6B, do the colors imply anything? If so, it should be explained. 

      We thank the reviewer for pointing this out. We intended to use the colors where the light blue bars represent Fas3G overexpression, while the red dots indicate caspase-activated conditions. In the New Figure 6C, we used light blue dots for Fas3G overexpression and red bars for caspase-activated conditions. We have added an explanation in the figure legend. In addition, we have removed the colors in Figure 4B and have added an explanation in the figure legend in Figure 4D.  

      Reviewer #2 (Recommendations for the authors):

      (1) For the methods section make a table for the lines, the way they are listed is not the most easy to read.

      We thank the reviewer for the comment. We have listed the fly strains used in this study in Table S3.

      (2) Lines 420 to 573, not sure why this is here, this information should be in the figure or figure legend, or make a table if necessary.

      We thank the reviewer for the comment. We have listed the detailed genotypes corresponding to each figure in Table S4.

      (3) Blocking with donkey serum, do you get better results than bovine?

      We have not conducted tests with bovine serum for immunohistochemistry. Donkey serum was used throughout the manuscript.

      (4) The Methods section is very thorough and complete but I recommend the use of tables to organize some of the reagents used.

      We thank the reviewer for the comment. We have listed the fly strains used in this study in Table S3 and the detailed genotypes corresponding to each figure in Table S4.

      (5) Line 647 spells out LC-MS/MS.

      We thank the reviewer for pointing this out. We have provided the full spelling as “liquidchromatography-tandem mass spectrometry”.

      (6) Line 808 spells out ACV (apple cider vinegar) and MQ (MilliQ water).

      We thank the reviewer for pointing this out. We have provided the full spelling as suggested.

      (7) Figure 1D. Why do you use only females? 

      We thank the reviewer for pointing this out. In the original manuscript, we analyzed female flies by crossing each Gal4 strain with UAS-Drice-RNAi; Drice::V5::TurboID virgin females. In this case, because Pebbled-Gal4 is located on X chromosome, we could only use female flies for the analysis. To address this, we examined the expression pattern in males flies by crossing each Gal4 virgin female with UAS-Drice-RNAi; Drice::V5::TurboID males. As expected, Drice expression is also mostly depleted when using the ORN-specific Gal4 driver, Pebbled-Gal4, suggesting that Drice expression is predominantly observed in ORNs in males as well. We have added New Figure 1D to present the male data. The original Figure 1D, which presents female data, has been relocated to Figure 1–figure supplement 1B.

      (8) Figure 1D. Be clear about the LN driver used here in the figure.

      We thank the reviewer for pointing this out. We used Orb<sup>0449</sup>-Gal4 driver (#63325, Bloomington Drosophila Stock Center), which has been previously characterized as an LN-specific Gal4 driver (Wu et al., 2017). Accordingly, we have revised “LN-Gal4” to “Orb<sup>0449</sup>-Gal4” throughout the manuscript.

      (9) Figure 1 and Supplementary Figure 1 images are very good. I would recommend the use of a different color palette, to help visualization for colorblind readers (such as this reviewer).

      We apologize for any inconvenience caused. We chose the green/magenta color pair because these are complementary colors, which generally provide better contrast compared to other color pairs. Therefore, we have decided to continue using this pair. To enhance readability, we have intensified the magenta signal in the New Figure 1D and Figure 1–figure supplement 1B. We retained the original magenta signal levels in Figure 1C and Figure 1–figure supplement 1A to avoid oversaturation. Instead, we have kept the Streptavidin-only signal images alongside the color merged images for clarity. We hope these adjustments improve the visualization and help you better interpret the figures.

      (10) Based on Supplementary Figure 1 and based on the fact that Figures 1B and 1C use males, why not used also males for Figure 1D?

      Please refer to our reply to comment #7. We have now included the results for males in the New Figure 1D, which show a similar expression pattern to that observed in females. The results for females originally shown in Figure 1D have been relocated to Figure 1–figure supplement 1B.

      (11) Why were the old versus young flies used for Figure 3 raised at 29C? Why not let the animals age at 25C? The use of 29C throughout the manuscript is not clear.

      We thank the reviewer for pointing this out. Most of the UAS fly strains used in this study, including a Fas3G overexpression line, are UASz lines, which exhibit relatively low expression levels compared to UASt lines (DeLuca and Spradling, 2018). Since the Gal4/UAS system is temperature-dependent (Duffy, 2002), we performed most of the experiments at 29°C to enhance gene expression.

      For the aging experiments, we chose to rear flies at 29°C because higher temperatures accelerate aging including neuronal aging (Okenve-Ramos et al., 2024), allowing for faster experimentation, and 29°C is within the ecologically relevant range of temperatures for Drosophila melanogaster (SotoYéber et al., 2018). Additionally, we confirmed that a subset of olfactory receptor neurons undergo aging-dependent caspase activation at both 29°C and 25°C, as shown in New Figure 3D.

      (12) Why not use an Or42b specific GAL 4 for the aging experiment? What are the odorants that are detected by this ORN? Are any of the odorants behaviorally relevant compounds?

      We thank the reviewer for pointing this out. While the exact odorant detected by Or42b neurons has not been fully determined, these neurons innervate the DM1 region in the antennal lobe, which is activated by ACV. Additionally, Or42b neurons have been shown to be required for attraction behavior to ACV (Semmelhack and Wang, 2009), supporting the relevance of ACV for the behavioral experiment.   We used Or42b-Gal4 to confirm that Or42b neurons undergo aging-dependent caspase activation, which is detectable using the MASCaT system (New Figure 3D). Furthermore, we verified that these neurons exhibit aging-dependent caspase activation at both 25°C and 29°C (New Figure 3D).

      (13) Make the panel lettering in all the figures bigger or bold.

      We thank the reviewer for pointing this out. We have increased the size of the panel lettering and made it bold throughout the figures to improve the readability.

      (14) Line 806. MilliQ water.

      We thank the reviewer for pointing this out. We have ensured that “MilliQ water” is consistently spelled this way throughout the manuscript.

      (15) Figure 6. The authors need to be more clear on the experimental conditions. At what time of the day was this experiment performed? Was the experiment run in DD? Were the flies young or old?

      We thank the reviewer for pointing this out. We performed the assay using one-week-old young flies under constant dark conditions during both the starvation period and the assay. We have added a detailed explanation in the Methods section. For clarity, we have also revised Figure 6A to provide a more detailed explanation of the experimental setup.

      References

      Chihara T, Kitabayashi A, Morimoto M, Takeuchi K-I, Masuyama K, Tonoki A, Davis RL, Wang JW, Miura M. 2014. Caspase inhibition in select olfactory neurons restores innate attraction behavior in aged Drosophila. PLoS Genet 10:e1004437.

      DeLuca SZ, Spradling AC. 2018. Efficient expression of genes in the Drosophila germline using a UAS promoter free of interference by Hsp70 piRNAs. Genetics 209:381–387.

      Duffy JB. 2002. GAL4 system in Drosophila: a fly geneticist’s Swiss army knife. Genesis 34:1–15.

      Okenve-Ramos P, Gosling R, Chojnowska-Monga M, Gupta K, Shields S, Alhadyian H, Collie C, Gregory E, Sanchez-Soriano N. 2024. Neuronal ageing is promoted by the decay of the microtubule cytoskeleton. PLoS Biol 22:e3002504.

      Semmelhack JL, Wang JW. 2009. Select Drosophila glomeruli mediate innate olfactory attraction and aversion. Nature 459:218–223.

      Soto-Yéber L, Soto-Ortiz J, Godoy P, Godoy-Herrera R. 2018. The behavior of adult Drosophila in the wild. PLoS One 13:e0209917.

      Wu B, Li J, Chou Y-H, Luginbuhl D, Luo L. 2017. Fibroblast growth factor signaling instructs ensheathing glia wrapping of Drosophila olfactory glomeruli. Proc Natl Acad Sci U S A 114:7505–7512.

    1. eLife Assessment

      In this useful study, the authors tested the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes using modeling and behavioral experiments, claiming that bumblebees rely most on ground-views for homing. However, due to a lack of analysis of the bees' behavior during training and a lack of information as to how the homing behavior of bees develops over time, the evidence supporting their claims is currently incomplete. Moreover, there was concern that the experimental environment was not representative of natural scenes, thus limiting the findings of the study.

    2. Reviewer #2 (Public review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exit the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the net entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information to the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views";<br /> line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it.";<br /> line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows to record the flight behaviour of bees in great spatial and temporal detail and in principle also to reconstruct the visual information available to the bees throughout the arena.

      Modelling: The revised manuscript now presents the results of modelling that includes information potentially available to the bees from the arena wall and in particular from the top edge of the arena.

      As I predicted, this increases the width of rotational image difference functions and therefore provides directional guidance over a larger range of misalignments. However, the authors dismiss the modelling results based on such reconstructed views which more realistically describe the information available to the bumblebees, because (line 291ff): 'Further simulations with a rendered arena wall led to worse results because the agent was mainly led to the centre of the arena (Fig. S17, Fig. S18-21)".

      What the modelling in Fig. 17 actually shows is that the agent is led more or less exactly to the 'entry points' to the arena chosen by the real bees (Fig. 4). The authors ignore this and in their rebuttal state that 'We hypothesised that the arena wall and object location created ambiguity'. The problem here is that you don't remove potential 'ambiguity' for real bees by ignoring information they are unlikely to ignore.

      Behavioural analysis: The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Fig. 17.

      Basically both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      The revised manuscript does not address my concerns. The rebuttal states that a detailed analysis of learning and return flights was 'outside the scope of this particular study', that their experimental design 'does not require the entire history of the bee's trajectory to be tested', that 'the entire flight history...will require...effort...conceptually' and that it would be 'difficult to test a hypothesis'.

      These responses clarify the frustrating problem with this study: The authors are more concerned with testing hypotheses than with trying to understand how bumblebees learn to cope with a situation which constrains their learning choreography and confronts them with the one fundamental problem view-based homing has: repetitive scene elements.

      Homing is an experience-dependent process and to understand what cues the bees used to navigate this set-up requires an analysis of the whole learning process. For instance, it may well be that the B+G+ bees initially did enter the array from above, but subsequently learnt a more efficient route into the array, by simply entering it from the side, followed by 'unguided' searching.

      General: The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses unnatural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Fig. 9-12) when more distant scene elements are excluded.

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved. A nice start would be to build camera-based 3D models of natural bumblebee nest entrance environments and analyse whether there are any particularly unusual challenges for the visual localization of the nest entrance.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public Review):

      Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small spatial scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest hole within a confined area. While many studies have focused on larger spatial scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing, especially in dense environments as we propose here.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      See the new discussion at lines 192-197

      We agree with your comment about the term "clutter". Therefore, we referred to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      See line 20 and we changed the wording throughout the manuscript and figures.

      Reviewer 1 (Recommendations): 

      The manuscript is well written, nicely designed experiments and well illustrated. I have a few comments below.

      It would be useful to discuss known data of learning flights in bumblebees, and the height or catchment area of their flights. This will allow the reader to compare your exp design to the natural learning flights.

      In our study, we first focused on demonstrating the ability to solve a homing task in a dense environment. As we observed the bees returning within the dense environment and not from above it (contrary to the model predictions), we investigated whether they flew above it during their first flights. The bees did indeed fly above, demonstrating their ability to ascend and descend within the constellation of objects (see Supplementary Material Fig. 22).

      In nature, the learning flight of bumblebees may cover several decametres, with the loops performed during these flights increasing with flight time (e.g. Osborne et al. 2013; Woodgate et al. 2016). A similar pattern can be observed on a smaller spatial scale (e.g. Philippides et al. 2013). Similar to the loops that extend over time, the bees gradually gain altitude (Lobecke et al., 2018). However, these observations come from studies where few conspicuous objects surround the nest entrance.

      Although our study  focussed on the performance in goal finding in cluttered environments, we now also address the issue of learning flights in the discussion, as learning flights are the scaffolding of visual learning. We have already conducted several learning flight experiments to fill the knowledge gap mentioned above. These will allow us in a forthcoming paper to compare learning flights in this environment with the existing literature (Sonntag et al., 2024).

      We added a reference to this in the discussion (lines 218-219 and 269-272)

      Include bumblebee in the title rather than 'bees'.

      We adapted the title accordingly:

      “Switching perspective: Comparing ground-level and bird’s-eye views for bumblebees navigating dense environments”

      I found switching between bird-views and frog-views to explain bee-views slightly tricky to read. Why not use 'ground-views', which you already have in the title?

      We agree and adapted the wording in the manuscript according to this suggestion.

      I am not convinced there is evidence here to suggest the bees do not use view-based navigation, because of the following: In L66: unclear what were the views centred around, I assume it is the nest. Is 45cm above the ground the typical height gained by bumblebees during learning flight? The clutter seems to be used more as an obstacle that they are detouring to reach the goal, isn't it?

      Based on many previous studies, view-based navigation can be assumed to be one of the plausible mechanisms bees use for homing (Cartwright & Collett, 1987; Doussot et al., 2020; Lehrer & Collett, 1994; Philippides et al., 2013; Zeil, 2022). In our tests, when the dense environment was shifted to a different position in the flight arena, almost no bees searched at the real location of the nest entrance but at the fictive new location within the dense environment, indicating that the bees assumed  the nest to be located within the dense environment, and therefore  that vision played a crucial role for homing. We thus never meant that the bees were not using view-based navigation. We clarified this point in the revised manuscript.

      See lines 247-248, 250-259, added visual memory to schematic in Fig. 6

      In our model simulations, the memorised snapshots were centred around the nest. However, we found that a multi-snapshot model could not explain the behaviour of the bees. This led us to suggest that bees likely employ acombination of multiple mechanisms for navigation.

      We refined paragraph about possible alternative homing mechanisms. See lines  218-263

      The height of learning flights has not been extensively investigated in previous studies, and typical heights are not well-documented in the literature. However, from our observations of the first outbound flights of bumblebees within the dense environment, we noted that they quickly increased their altitude and then flew above the objects. Since the objects had a height of 0.3 metres, we chose 0.45 metres as a height above the objects for our study.

      Furthermore, the nest is positioned within the arrangement of objects, making it a target the bees must actively find rather than detour around.

      I think a discussion to contrast your findings with Murray and Zeil 2017 will be useful. It was unclear to me whether the flight arena had UV availability, if it didn't, this could be a reason for the difference.

      We referred to this study in the discussion of the revised paper (see our response to the public review). Lines 192-197

      As in most lab studies on local homing, the bees did not have UV light available in the arena. Even without this, they were successful in finding their nest position during the tests. We clarified that in the revised manuscript. See line 334-336

      Figure 2A, can you add a scale bar?

      We added a scale bar to the figure showing the dimensions of the arena. See Fig. 2

      The citation of figure orders is slightly off. We have Figure 5 after Figure 2, without citing Figures 3 and 4. Similarly for a few others.

      We carefully checked the order of cited figures and adapted them.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the net entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions: line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the dense environment but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features, which were moved between training and testing (neither the camera nor the wall were moved between training and test). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we included model results with the arena wall in the supplements of the revised paper. See lines 290-293, Figures S17-21

      We agree that the catchment volumes would provide quantitatively more detailed information as catchment slices. Nevertheless, since our goal was  to investigate if bees would use ground views or bird's eye views to home in a dense environment, catchment slices, which provide qualitatively similar information as catchment volumes, are sufficient to predict whether ground or bird's-eye views perform better in leading to the nest. Therefore, we did not include further computations of catchment volumes. (ll. 296-297)

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17. Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments(Baddeley et al., 2012; Dittmar et al., 2010; Doussot et al., 2020; Möller, 2012; Wystrach et al., 2011, 2013; Zeil, 2012). A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Since we observed behavioural responses different from the one suggested by the models, it becomes interesting to look at the flight history. If we had found an alignment between the model and the behaviour, looking at thehistory would have become much less interesting. Thus our results raise an interest in looking at the entire flight history, which will require not only effort on the recording procedure, but as well conceptually. At the moment the underlying mechanisms of learning during outbound, inbound, exploration, or orientation flight remains evasive and therefore difficult to test a hypothesis. A detailed description of the flight during the entire bee history would enable us to speculate alternative models to the one tested in our study, but would remain limited in testing those.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the dense environment.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled laboratory conditions. Both field and laboratory research are necessary and should complement each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of the components of the environment for the behaviour through targeted variation of them. These results yield precious information to then guide future field-based experiments for validation.

      Our laboratory settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was based on the knowledge that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and now refer to the  environment as a "dense environment."

      We changed the wording throughout the manuscript and figures.

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factors inherent to field work conditions, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious assessments of catchment areas in the context of local homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.

      Reviewer 2 (Recommendations):

      (1) Clarify what is meant by modelling panoramic images at 1cm intervals (only?) along the x-axis of the arena.

      The panoramic images were taken along a grid with 0.5cm steps within the dense environment and 1cm steps in the rest of the arena. A previous study (Doussot et al., 2020) showed successful homing of multi-snapshot models in an environment of similar scale with a grid with 2cm steps. Therefore, we think that our scaling is sufficiently fine. We apologise for the missing information in the method section and added it to the revised manuscript. See lines 286-287

      (2) In Figures 9-12 what are the memory0 to memory7 locations and reference image orientations? Explain clearly which image comparisons generated the rotIDFs shown.

      Memory 0 to memory 7 are examples of the eight memorised snapshots, which are aligned in the nest direction and taken around the nest. In the rotIDFs shown, we took memory 0 as a reference image, and compared the 7 others by rotating them against memory 0. We clarified that in the revised manuscript.

      See revised figure caption in Fig. S9 – 16

      (3) Figure 9 seems to compare 'bird's-eye', not 'frog's-eye' views.

      We apologise for that mistake and carefully double-checked the figure caption.

      See revised figure caption Fig. S9

      (4) Why do you need to invoke a PI vector (Figure 6) to explain your results?

      Since the bees were able to home in the dense environment without entering the object arrangement from above but from the side, image matching alone could not explain the bees’ behaviour. Therefore, we suggest, as an hypothesis for future studies, a combination of mechanisms such as a home vector. Other alternatives, perhaps without requiring a PI vector, may explain the bees’ behaviour, and we will welcome any future contributions from the scientific community.

      References

      Baddeley, B., Graham, P., Husbands, P., & Philippides, A. (2012). A Model of Ant Route Navigation Driven by Scene Familiarity. PLoS Computational Biology,8(1), e1002336. https://doi.org/10.1371/journal.pcbi.1002336

      Capaldi, E. A., Smith, A. D., Osborne, J. L., Farris, S. M., Reynolds, D. R., Edwards, A. S., Martin, A., Robinson, G. E., Poppy, G. M., & Riley, J. R. (2000).

      Ontogeny of orientation flight in the honeybee revealed by harmonic radar. Nature, 403. https://doi.org/10.1038/35000564

      Cartwright, B. A., & Collett, T. S. (1987). Landmark maps for honeybees. Biological Cybernetics, 57(1), 85–93. https://doi.org/10.1007/BF00318718

      Dittmar, L., Stürzl, W., Baird, E., Boeddeker, N., & Egelhaaf, M. (2010). Goal seeking in honeybees: Matching of optic flow snapshots? Journal of Experimental Biology, 213(17), 2913–2923. https://doi.org/10.1242/jeb.043737

      Doussot, C., Bertrand, O. J. N., & Egelhaaf, M. (2020). Visually guided homing of bumblebees in ambiguous situations: A behavioural and modelling study. PLoS Computational Biology, 16(10). https://doi.org/10.1371/journal.pcbi.1008272

      Lehrer, M., & Collett, T. S. (1994). Approaching and departing bees learn different cues to the distance of a landmark. Journal of Comparative Physiology A, 175(2), 171–177. https://doi.org/10.1007/BF00215113

      Lobecke, A., Kern, R., & Egelhaaf, M. (2018). Taking a goal-centred dynamic snapshot as a possibility for local homing in initially naïve bumblebees. Journal of Experimental Biology, 221(2), jeb168674. https://doi.org/10.1242/jeb.168674

      Möller, R. (2012). A model of ant navigation based on visual prediction. Journal of Theoretical Biology, 305, 118–130. https://doi.org/10.1016/j.jtbi.2012.04.022

      Murray, T., & Zeil, J. (2017). Quantifying navigational information: The catchment volumes of panoramic snapshots in outdoor scenes. PLOS ONE, 12(10), e0187226. https://doi.org/10.1371/journal.pone.0187226

      Osborne, J. L., Smith, A., Clark, S. J., Reynolds, D. R., Barron, M. C., Lim, K. S., & Reynolds, A. M. (2013). The ontogeny of bumblebee flight trajectories: From Naïve explorers to experienced foragers. PLoS ONE, 8(11). https://doi.org/10.1371/journal.pone.0078681

      Philippides, A., de Ibarra, N. H., Riabinina, O., & Collett, T. S. (2013). Bumblebee calligraphy: The design and control of flight motifs in the learning and return flights of Bombus terrestris. Journal of Experimental Biology, 216(6), 1093–1104. https://doi.org/10.1242/jeb.081455

      Sonntag, A., Lihoreau, M., Bertrand, O. J. N., & Egelhaaf, M. (2024). Bumblebees increase their learning flight altitude in dense environments. bioRxiv, 2024.10.14.618154. https://doi.org/10.1101/2024.10.14.618154

      Woodgate, J. L., Makinson, J. C., Lim, K. S., Reynolds, A. M., & Chittka, L. (2016). Life-long radar tracking of bumblebees. PLoS ONE, 11(8). https://doi.org/10.1371/journal.pone.0160333

      Wystrach, A., Mangan, M., Philippides, A., & Graham, P. (2013). Snapshots in ants? New interpretations of paradigmatic experiments. Journal of Experimental Biology, 216(10), 1766–1770. https://doi.org/10.1242/jeb.082941

      Wystrach, A., Schwarz, S., Schultheiss, P., Beugnon, G., & Cheng, K. (2011). Views, landmarks, and routes: How do desert ants negotiate an obstacle course? Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 197(2), 167–179. https://doi.org/10.1007/s00359-010-0597-2

      Zeil, J. (2012). Visual homing: An insect perspective. Current Opinion in Neurobiology, 22(2), 285–293. https://doi.org/10.1016/j.conb.2011.12.008

      Zeil, J. (2022). Visual navigation: Properties, acquisition and use of views. Journal of Comparative Physiology A. https://doi.org/10.1007/s00359-022-01599-2

    1. eLife Assessment

      These important findings detail the role of Pim1 and Pim2 in controlling the behaviour and activity of 'killer' T cells; a vital cell within of our immune system. The authors capitalized on high resolution quantitative analysis of the proteomes and transcriptomes of Pim1/Pim2-deficient CD8 T cells to provide compelling evidence for how the PIM1/2 kinases control TCR-driven activation and IL-2/IL-15-driven proliferation and differentiation into effector T cells. It's also noteworthy that Pim1/Pim2 impact is better revealed through quantitative proteomics than transcriptomics.

    2. Reviewer #1 (Public review):

      Summary and Strengths:

      The study focuses on PIM1 and 2 in CD8 T cell activation and differentiation. These two serine/threonine kinases belong to a large network of Serine/Threonine kinases that acts following engagement of the TCR and of cytokine receptors and phosphorylates proteins that control transcriptional, translational and metabolic programs that result in effector and memory T cell differentiation. The expression of PIM1 and PIM2 is induced by the T-cell receptor and several cytokine receptors. The present study capitalized on high-resolution quantitative analysis of the proteomes and transcriptomes of Pim1/Pim2-deficient CD8 T cells to decipher how the PIM1/2 kinases control TCR-driven activation and IL-2/IL-15-driven proliferation, and differentiation into effector T cells.

      Quantitative mass spectrometry-based proteomics analysis of naïve OT1 CD8 T cell stimulated with their cognate peptide showed that the PIM1 protein was induced within 3 hours of TCR engagement and its expression was sustained at least up to 24 hours. The kinetics of PIM2 expression was protracted as compared to that of PIM1. Such TCR-dependent expression of PIM1/2 correlated with the analysis of both Pim1 and Pim2 mRNA. In contrast, Pim3 mRNA was only expressed at very low levels and the PIM3 protein not detected by mass spectrometry. Therefore, PIM1 and 2 are the major PIM kinases in recently activated T cells. Pim1/Pim2 double knockout (Pim dKO) mice were generated on a B6 background and found to express lower number of splenocytes. No difference in TCR/CD28-driven proliferation was observed between WT and Pim dKO T cells over 3 days in culture. Quantitative proteomics of >7000 proteins further revealed no substantial quantitative or qualitative differences in protein content or proteome composition. Therefore, other signaling pathways can compensate for the lack of PIM kinases downstream of TCR activation.

      Considering that PIM1 and PIM2 kinase expression is regulated by IL-2 and IL-15, antigen-primed CD8 T cells were expanded in IL-15 to generate memory phenotype CD8 T cells or expanded in IL-2 to generate effector cytotoxic T lymphocytes (CTL). Analysis of the survival, proliferation, proteome, and transcriptome of Pim dKO CD8 T cells kept for 6 days in IL-15 showed that PIM1 and PIM2 are dispensable to drive the IL-15-mediated metabolic or differentiation programs of antigen-primed CD8 T cells. Moreover, Pim1/Pim2-deficiency had no impact on the ability of IL-2 to maintain CD8 T cell viability and proliferation. However, WT CTL downregulated expression of CD62L whereas the Pim dKO CTL sustained higher CD62L expression. Pim dKO CTL were also smaller and less granular than WT CTL. Comparison of the proteome of day 6 IL-2 cultured WT and Pim dKO CTL showed that the latter expressed lower levels of the glucose transporters, SLC2A1 and SLC2A3, of a number of proteins involved in fatty acid and cholesterol biosynthesis, and CTL effector proteins such as granzymes, perforin, IFNg and TNFa. Parallel transcriptomics analysis showed that the reduced expression of perforin and some granzymes correlated with a decrease in their mRNA whereas the decreased protein levels of granzymes B and A, and of the glucose transporters SLC2A1 and SLC2A3 did not correspond with decreased mRNA expression. Therefore, PIM kinases are likely required for IL-2 to maximally control protein synthesis in CD8 CTL. Along that line, the translational repressor PDCD4 was increased in Pim dKO CTL and pan-PIM kinase inhibitors caused a reduction in protein synthesis rates in IL-2 expanded CTL. Finally, the differences between Pim dKO and WT CTL in terms of CD62L expression resulted in that Pim dKO CTL but not WT CTL retained the capacity to home to secondary lymphoid organs. In conclusion, this thorough and solid study showed that the PIM1/2 kinases shape the effector CD8 T cell proteomes rather than transcriptomes and are important mediators of IL2-signalling and CD8 T cell trafficking.

      Weaknesses: None

      Comments on revisions:

      The authors have been able to provide in their rebuttal letter fair answers to most of the queries primarily raised by Reviewer 2 and they have incorporated the corresponding results in the revised text. It makes the paper stronger.

    3. Reviewer #2 (Public review):

      Summary:

      Using a suite of techniques (e.g., RNA seq, proteomics, and functional experiments ex vivo) this paper extensively focuses on the role of PIM1/2 kinases during CD8 T-cell activation and cytokine-driven (i.e., IL-2 or IL-15) differentiation. The authors key finding is that PIM1/2 enhance protein synthesis in response to IL-2 stimulation, but not IL-15, in CD8+ T cells. Loss of PIM1/2 made T cells less 'effector-like', with lower granzyme and cytokine production, and a surface profile that maintained homing towards secondary lymphoid tissue. The cytokines the authors focus on are IL-15 and Il-2, which drive naïve CD8 T cells towards memory or effector states, respectively. Although PIM1/2 are upregulated in response to T-cell activation and cytokine stimulation (e.g., IL-15, and to a greater extent, IL-2), using T cells isolated from a global mouse genetic knockout background of PIM1/2, the authors find that PIM1/2 did not significantly influence T-cell activation, proliferation, or expression of anything in the proteome under anti-CD3/CD28 driven activation with/without cytokine (i.e., IL-15) stimulation ex vivo. This is perhaps somewhat surprising given PIM1/2 are upregulated, albeit to a small degree, in response to IL-15, and yet PIM1/2 did not seem to influence CD8+ T cell differentiation towards a memory state. Even more surprising is that IL-15 was previously shown to influence the metabolic programming of intestinal intraepithelial lymphocytes, suggesting cell-type specific effects from PIM kinases. What the authors went on to show, however, is that PIM1/2 KO altered CD8 T cell proteomes in response to IL-2. Using proteomics, they saw increased expression of homing receptors (i.e., L-selectin, CCR7), but reduced expression of metabolism-related proteins (e.g., GLUT1/3 & cholesterol biosynthesis) and effector-function related proteins (e.g., IFNy and granzymes). Rather neatly, by performing both RNA-seq and proteomics on the same IL-2 stimulated WT vs. PIM1/2 KO cells, the authors found that changes at the proteome level were not corroborated by differences in RNA uncovering that PIM1/2 predominantly influence protein synthesis/translation. Effectively, PIM1/2 knockout reduced the differentiation of CD8+ T cells towards an effector state. In vivo adoptive transfer experiments showed that PIM1/2KO cells homed better to secondary lymphoid tissue, presumably owing to their heightened L-selectin expression (although this was not directly examined).

      Strengths:

      Overall, I think the paper is scientifically good, and I have no major qualms with the paper. The paper as it stands is solid, and while the experimental aim of this paper was quite specific/niche, it is overall a nice addition to our understanding of how serine/threonine kinases impact T cell state, tissue homing, and functionality. Of note, they hint towards a more general finding that kinases may have distinct behaviour in different T-cell subtypes/states. I particularly liked their use of matched RNA-seq and proteomics to first suggest that PIM1/2 kinases may predominantly influence translation (then going on to verify this via their protein translation experiment - although I must add this was only done using PIM kinase inhibitors not the PIM1/2KO cells). I also liked that they used small molecule inhibitors to acutely reduce PIM1/2 activity, which corroborated some of their mouse knockout findings - this experiment helps resolve any findings resulting from potential adaptation issues from the PIM1/2 global knockout in mice but also gives it a more translational link given the potential use of PIM kinase inhibitors in the clinic. The proteomics and RNA seq dataset may be of general use to the community, particularly for analysis of IL-15 or IL-2 stimulated CD8+ T cells.

      Weaknesses:

      None. My comments here have been addressed in the previous review.

    4. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their careful reading of our manuscript and their considered feedback. Please see our detailed response to reviewer comments inset below.

      In addition to requested modifications we have also uploaded the proteomics data from 2 of the experiments contained within the manuscript onto the Immunological Proteome Resource (ImmPRes) website: immpres.co.uk making the data available in an easy-to-use graphical format for interested readers to interrogate and explore. We have added the following text to the data availability section (lines 1085-1091) to indicate this:

      “An easy-to-use graphical interface for examining protein copy number expression from the 24-hour TCR WT and Pim dKO CD4 and CD8 T cell proteomics and IL-2 and IL-15 expanded WT and Pim dKO CD8 T cell proteomics datasets is also available on the Immunological Proteome Resource website: immpres.co.uk (Brenes et al., 2023) under the Cell type(s) selection: “T cell specific” and Dataset selection: “Pim1/2 regulated TCR proteomes” and “Pim1/2 regulated IL2 or IL15 CD8 T cell proteomes”.”

      As well as indicating in figure legends where proteomics datasets are first introduced in Figures 1, 2 and 4 with the text:

      “An interactive version of the proteomics expression data is available for exploration on the Immunological Proteome Resource website: immpres.co.uk

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary and Strengths:

      The study focuses on PIM1 and 2 in CD8 T cell activation and differentiation. These two serine/threonine kinases belong to a large network of Serine/Threonine kinases that acts following engagement of the TCR and of cytokine receptors and phosphorylates proteins that control transcriptional, translational and metabolic programs that result in effector and memory T cell differentiation. The expression of PIM1 and PIM2 is induced by the T-cell receptor and several cytokine receptors. The present study capitalized on high-resolution quantitative analysis of the proteomes and transcriptomes of Pim1/Pim2-deficient CD8 T cells to decipher how the PIM1/2 kinases control TCRdriven activation and IL-2/IL-15-driven proliferation, and differentiation into effector T cells.

      Quantitative mass spectrometry-based proteomics analysis of naïve OT1 CD8 T cell stimulated with their cognate peptide showed that the PIM1 protein was induced within 3 hours of TCR engagement, and its expression was sustained at least up to 24 hours. The kinetics of PIM2 expression was protracted as compared to that of PIM1. Such TCRdependent expression of PIM1/2 correlated with the analysis of both Pim1 and Pim2 mRNA. In contrast, Pim3 mRNA was only expressed at very low levels and the PIM3 protein was not detected by mass spectrometry. Therefore, PIM1 and 2 are the major PIM kinases in recently activated T cells. Pim1/Pim2 double knockout (Pim dKO) mice were generated on a B6 background and found to express a lower number of splenocytes. No difference in TCR/CD28-driven proliferation was observed between WT and Pim dKO T cells over 3 days in culture. Quantitative proteomics of >7000 proteins further revealed no substantial quantitative or qualitative differences in protein content or proteome composition. Therefore, other signaling pathways can compensate for the lack of PIM kinases downstream of TCR activation.

      Considering that PIM1 and PIM2 kinase expression is regulated by IL-2 and IL-15, antigen-primed CD8 T cells were expanded in IL-15 to generate memory phenotype CD8 T cells or expanded in IL-2 to generate effector cytotoxic T lymphocytes (CTL). Analysis of the survival, proliferation, proteome, and transcriptome of Pim dKO CD8 T cells kept for 6 days in IL-15 showed that PIM1 and PIM2 are dispensable to drive the IL-15mediated metabolic or differentiation programs of antigen-primed CD8 T cells. Moreover, Pim1/Pim2-deficiency had no impact on the ability of IL-2 to maintain CD8 T cell viability and proliferation. However, WT CTL downregulated the expression of CD62L whereas the Pim dKO CTL sustained higher CD62L expression. Pim dKO CTL was also smaller and less granular than WT CTL. Comparison of the proteome of day 6 IL-2 cultured WT and Pim dKO CTL showed that the latter expressed lower levels of the glucose transporters, SLC2A1 and SLC2A3, of a number of proteins involved in fatty acid and cholesterol biosynthesis, and CTL effector proteins such as granzymes, perforin, IFNg, and TNFa. Parallel transcriptomics analysis showed that the reduced expression of perforin and some granzymes correlated with a decrease in their mRNA whereas the decreased protein levels of granzymes B and A, and the glucose transporters SLC2A1 and SLC2A3 did not correspond with decreased mRNA expression. Therefore, PIM kinases are likely required for IL-2 to maximally control protein synthesis in CD8 CTL. Along that line, the translational repressor PDCD4 was increased in Pim dKO CTL and pan-PIM kinase inhibitors caused a reduction in protein synthesis rates in IL-2expanded CTL. Finally, the differences between Pim dKO and WT CTL in terms of CD62L expression resulted in Pim dKO CTL but not WT CTL retained the capacity to home to secondary lymphoid organs. In conclusion, this thorough and solid study showed that the PIM1/2 kinases shape the effector CD8 T cell proteomes rather than transcriptomes and are important mediators of IL2-signalling and CD8 T cell trafficking.

      Weaknesses:

      None identified by this reviewer.

      Reviewer #2 (Public Review):

      Summary:

      Using a suite of techniques (e.g., RNA seq, proteomics, and functional experiments ex vivo) this paper extensively focuses on the role of PIM1/2 kinases during CD8 T-cell activation and cytokine-driven (i.e., IL-2 or IL-15) differentiation. The authors' key finding is that PIM1/2 enhances protein synthesis in response to IL-2 stimulation, but not IL-15, in CD8+ T cells. Loss of PIM1/2 made T cells less 'effector-like', with lower granzyme and cytokine production, and a surface profile that maintained homing towards secondary lymphoid tissue. The cytokines the authors focus on are IL-15 and Il-2, which drive naïve CD8 T cells towards memory or effector states, respectively. Although PIM1/2 are upregulated in response to T-cell activation and cytokine stimulation (e.g., IL-15, and to a greater extent, IL-2), using T cells isolated from a global mouse genetic knockout background of PIM1/2, the authors find that PIM1/2 did not significantly influence T-cell activation, proliferation, or expression of anything in the proteome under anti-

      CD3/CD28 driven activation with/without cytokine (i.e., IL-15) stimulation ex vivo. This is perhaps somewhat surprising given PIM1/2 is upregulated, albeit to a small degree, in response to IL-15, and yet PIM1/2 did not seem to influence CD8+ T cell differentiation towards a memory state. Even more surprising is that IL-15 was previously shown to influence the metabolic programming of intestinal intraepithelial lymphocytes, suggesting cell-type specific effects from PIM kinases. What the authors went on to show, however, is that PIM1/2 KO altered CD8 T cell proteomes in response to IL-2. Using proteomics, they saw increased expression of homing receptors (i.e., L-selectin, CCR7), but reduced expression of metabolism-related proteins (e.g., GLUT1/3 & cholesterol biosynthesis) and effector-function related proteins (e.g., IFNy and granzymes). Rather neatly, by performing both RNA-seq and proteomics on the same IL2 stimulated WT vs. PIM1/2 KO cells, the authors found that changes at the proteome level were not corroborated by differences in RNA uncovering that PIM1/2 predominantly influence protein synthesis/translation. Effectively, PIM1/2 knockout reduced the differentiation of CD8+ T cells towards an effector state. In vivo adoptive transfer experiments showed that PIM1/2KO cells homed better to secondary lymphoid tissue, presumably owing to their heightened L-selectin expression (although this was not directly examined).

      Strengths:

      Overall, I think the paper is scientifically good, and I have no major qualms with the paper. The paper as it stands is solid, and while the experimental aim of this paper was quite specific/niche, it is overall a nice addition to our understanding of how serine/threonine kinases impact T cell state, tissue homing, and functionality. Of note, they hint towards a more general finding that kinases may have distinct behaviour in different T-cell subtypes/states. I particularly liked their use of matched RNA-seq and proteomics to first suggest that PIM1/2 kinases may predominantly influence translation (then going on to verify this via their protein translation experiment - although I must add this was only done using PIM kinase inhibitors, not the PIM1/2KO cells). I also liked that they used small molecule inhibitors to acutely reduce PIM1/2 activity, which corroborated some of their mouse knockout findings - this experiment helps resolve any findings resulting from potential adaptation issues from the PIM1/2 global knockout in mice but also gives it a more translational link given the potential use of PIM kinase inhibitors in the clinic. The proteomics and RNA seq dataset may be of general use to the community, particularly for analysis of IL-15 or IL-2 stimulated CD8+ T cells.

      We thank the reviewer for their comments supporting the robustness and usefulness of our data.

      Weaknesses:

      It would be good to perform some experiments in human T cells too, given the ease of e.g., the small molecule inhibitor experiment.

      The suggestions to check PIM inhibitor effects in human T cell is a good one. We think an ideal experiment would be to use naïve cord blood derived CD4 and CD8 cells as a model to avoid the impact of variability in adult PBMC and to really look at what PIM kinases do as T cells first respond to antigen and cytokines. In this context there is good evidence that the signalling pathways used by antigen receptors or the cytokines IL-2 and IL-15 are not substantially different in mouse and human. We have also previously compared proteomes of mouse and human IL-2 expanded cytotoxic T cells and they are remarkably similar. As such we feel that mature mouse CD8 T cells are a genetically tractable model to use to probe the signalling pathways that control cytotoxic T cell function. To repeat the full set of experiments observed within this study with human T cells would represent 1-year of work by an experienced postdoctoral fellow.

      Unfortunately, the funding for the project has come to an end and there is no capacity to complete this work.

      Would also be good for the authors to include a few experiments where PIM1/2 have been transduced back into the PIM1/2 KO T cells, to see if this reverts any differences observed in response to IL-2 - although the reviewer notes that the timeline for altering primary T cells via lentivirus/CRISPR may be on the cusp of being practical such that functional experiments can be performed on day 6 after first stimulating T cells.

      A rescue experiment could indeed be informative, though of course comes with challenges/caveats with re-expressing both proteins that have been deleted at once and ability to control the level of PIM kinase that is re-expressed. This work using the Pim dKO mice was performed from 2019-2021 and was seriously impacted by the work restrictions during the COVID19 pandemic. We had to curtail all mouse colonies to allow animal staff to work within the legal guidelines. We had to make choices and the Pim1/2 dKO colony was stopped because we felt we had generated very useful data from the work but could not justify continued maintenance of the colony at such a difficult time. As such we no longer have this mouse line to perform these rescue experiments.

      We have however, performed a limited number of retroviral overexpression studies in WT IL-2-expanded CTL, where T cells were transfected after 24 hours activation and phenotype measured on day 6 of culture. We chose to leave these out of the initial manuscript as these were overexpression under conditions where PIM expression was already high, rather than a true test of the ability of PIM1 or PIM2 to rescue the Pim dKO phenotype. A more robust test would also have required doing these overexpression experiments in IL-15 expanded or cytokine deprived CTL where PIM kinase expression is low, however, we ran out of time and funding to complete this work.

      We have provided Author response image 1 below from the experiments performed in the IL-2 CTL for interested readers. The limited experiments that were performed do support some key phenotypes observed with the Pim dKO mice or PIM inhibitors, finding that PIM1 or PIM2 overexpression was sufficient to increase S6 phosphorylation, and provided a small further increase in GzmB expression above the already very high levels in IL-2 expanded CTL.

      Author response image 1.

      PIM1 or PIM2 overexpression drives increased GzmB expression and S6 phosphorylation in WT IL-2 CTL. OT1 lymph node cell suspensions were activated for 24 hours with SIINFEKL peptide (10 ng/mL), IL-2 (20 ng/mL) and IL-12 (2 ng/mL) then transfected with retroviruses to drive expression of PIM1-GFP, PIM2-GFP fusion proteins or a GFP only control. T cells were split into fresh media and IL-2 daily and (A) GzmB expression and (B) S6 phosphorylation assessed by flow cytometry in GFP+ve vs GFP-ve CD8 T cells 5 days post-transfection (i.e. day 6 of culture). Histograms are representative of 2 independent experiments.

      Other experiments could also look at how PIM1/2 KO influences the differentiation of T cell populations/states during ex vivo stimulation of PBMCs or in vivo infection models using (high-dimensional) flow cytometry (rather than using bulk proteomics/RNA seq which only provide an overview of all cells combined).

      We did consider the idea of in vivo experiments with the Pim1/2 dKO mice but rejected this idea as the mice have lost PIM kinases in all tissues and so we would not be able to understand if any phenotype was CD8 T cell selective. To note the Pim1/2 dKO mice are smaller than normal wild type mice (discussed further below) and clearly have complex phenotypes. An ideal experiment would be to make mice with floxed Pim1 and Pim2 alleles so that one could use cre recombinase to make a T cell-specific deletion and then study the impact of this in in vivo models. We did not have the budget or ethical approval to make these mice. Moreover, this study was carried out during the COVID pandemic when all animal experiments in the UK were severely restricted. So our objective was to get a molecular understanding of the consequences of losing theses kinases for CD8 T cells focusing on using controlled in vitro systems. We felt that this would generate important data that would guide any subsequent experiments by other groups interested in these enzymes.

      We do accept the comment about bulk population proteomics. Unfortunately, single cell proteomics is still not an option at this point in time. High resolution multidimensional flow cytometry is a valuable technique but is limited to looking at only a few proteins for which good antibodies exist compared to the data one gets with high resolution proteomics.

      Alongside this, performing a PCA of bulk RNA seq/proteomes or Untreated vs. IL-2 vs. IL-15 of WT and PIM1/2 knockout T cells would help cement their argument in the discussion about PIM1/2 knockout cells being distinct from a memory phenotype.

      We thank the reviewer for this very good suggestion. We have now included PCAs for the RNAseq and proteomics datasets of IL-2 and IL-15 expanded WT vs Pim dKO CTL in Fig S5 and added the following text to the discussion section of the manuscript (lines 429-431):

      “… and PCA plots of IL-15 and IL-2 proteomics and RNAseq data show that Pim dKO IL-2 expanded CTL are still much more similar to IL-2 expanded WT CTL than to IL-15 expanded CTL (Fig S5)”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In panel B of Figure S1, are the smaller numbers of splenocytes found in dKO fully accounted for by a reduction in the numbers of T cells or also correspond to a reduction in B cell numbers? Are the thymus and lymph nodes showing the same trend?

      We’re happy to clarify on this.

      Since we were focused on T cell phenotypes in the paper this is what we have plotted in this figure, however there is also a reduction in total number of B, NK and NKT cells in the Pim dKO mice (see James et al, Nat Commun, 2021 for additional subset percentages). We find that all immune subsets we have measured make up the same % of the spleen in Pim dKO vs WT mice (we show this for T cell subsets in what was formerly Fig S1C and is now Fig S1A), the total splenocyte count is just lower in the Pim dKO mice (which we show in what was formerly Fig S1B and is now Fig S1C). To note, the Pim dKO mice were smaller than their WT counterparts (though we have not formally weighed and quantified this) and we think this is likely the major factor leading to lower total splenocyte numbers.

      We have not checked the thymus so can’t comment on this. We can confirm that lymph nodes from Pim dKO mice had the same number and % CD4 and CD8 T cells as in WT.

      For our in vitro studies we have made sure to either use co-cultures or for single WT and Pim dKO cultures to equalise starting cell densities between wells to account for the difference in total splenocyte number. We have now clarified this point in the methods section lines 682-684

      “For generation of memory-like or effector cytotoxic T lymphocytes (CTL) from mice with polyclonal T cell repertoires, LN or spleen single cell suspensions at an equal density for WT and Pim dKO cultures (~1-3 million live cells/mL)….”

      Reviewer #2 (Recommendations For The Authors):

      Line 89-99 - PIM kinase expression is elevated in T cells in autoimmunity and inhibiting therefore may make some sense if PIM is enhancing T cell activity. Why then would you use an inhibitor in cancer settings? This needs better clarification for readers, with reference to T cells, particularly given this is an important justification for looking at PIM kinases in T cells.

      We thank the reviewer for highlighting the lack of clarity in our explanation here.

      PIM kinase inhibitors alone are proposed as anti-tumour therapies for select cancers to block tumour growth. However so far these monotherapies haven’t been very effective in clinical trials and combination treatment options with a number of strategies are being explored. There are two lines of logic for why PIM kinase inhibitors might be a good combination with an e.g. anti-PD1 or adoptive T cell immunotherapy. 1) PIM kinase inhibition has been shown to reduce inhibitory/suppressive surface proteins (e.g. PDL1) and cytokine (e.g. TGFbeta) expression in tumour cells and macrophages in the tumour microenvironment. 2) Inhibiting glycolysis and increasing memory/stem-like phenotype has been identified as desirable for longer-lasting more potent anti-tumour T cell immunity. PIM kinase inhibition has been shown to reduce glycolytic function and increase several ‘stemness’ promoting transcription factors e.g. TCF7 in a previous study. Controlled murine cancer models have shown improvement in clearance with the combination of pan-Pim kinase inhibitors and anti-PD1/PDL1 treatments (Xin et al, Cancer Immunol Res, 2021 and Chatterjee et al, Clin Cancer Res 2019).

      It is worth noting, this is seemingly contradictory with other studies of Pim kinases in T cells that have generally found Pim1/2/3 deletion or inhibition in T cells to be suppressive of their function.

      We have clarified this reasoning/seeming conflict of results in the introductory text as follows (lines 90-101):

      “PIM kinase inhibitors have also entered clinical trials to treat some cancers (e.g. multiple myeloma, acute myeloid leukaemia, prostate cancer), and although they have not been effective as a monotherapy, there is interest in combining these with immunotherapies. This is due to studies showing PIM inhibition reducing expression of inhibitory molecules (e.g. PD-L1) on tumour cells and macrophages in the tumour microenvironment and a reported increase of stem-like properties in PIM-deficient T cells which could potentially drive longer lasting anti-cancer responses (Chatterjee et al., 2019; Xin et al., 2021; Clements and Warfel, 2022). However, PIM kinase inhibition has also generally been shown to be inhibitory for T cell activation, proliferation and effector activities (Fox et al., 2003; Mikkers et al., 2004; Jackson et al., 2021) and use of PIM kinase inhibitors could have the side effect of diminishing the anti-tumour T cell response.”  

      Line 93 - The use of 'some cancers' is rather vague and unscientific - please correct phrasing like this. The same goes for lines 54 and 77 (some kinases and some analyses).

      We have clarified the sentence in what is now Line 91 to include examples of some of the cancers that PIM kinase inhibitors have been explored for (see text correction in response to previous reviewer comment), which are predominantly haematological malignancies. The use of the phrase ‘some kinases’ and ‘some analyses’ in what are now Lines 52 and 75 is in our view appropriate as the subsequent sentence/(s) provide specific details on the kinases and analyses that are being referred to.

      Lines 146-147 - Could it be that rather than redundancies, PIM KO is simply not influential on TCR/CD28 signalling in general but influences other pathways in the T cell?

      We agree that the lack of PIM1/2 effect could also be because PIM targets downstream of TCR/CD28 are not influential and have clarified the text as follows (lines 156-161):

      “These experiments quantified expression of >7000 proteins but found no substantial quantitative or qualitative differences in protein content or proteome composition in activated WT versus Pim dKO CD4 and CD8 T cells (Fig 1G-H) (Table S1). Collectively these results indicate that PIM kinases do not play an important unique role in the signalling pathways used by the TCR and CD28 to control T cell activation.”

      Line 169 - Instead of specifying control - maybe put upregulate or downregulate for clarity.

      We have changed the text as per reviewer suggestion (see line 183)

      Line 182-183 - I would move the call out for Figure 2D to after the last call out for Figure 2C to make it more coherent for readers.

      We have changed the text as per reviewer suggestion (see lines 197-200)

      Line 190 - 14,000 RNA? total, unique? mRNA?

      These are predominantly mRNA since a polyA enrichment was performed as part of the standard TruSeq stranded mRNA sample preparation process, however, a small number of lncRNA etc were also detected in our RNA sequencing. We left the results in as part of the overall analysis since it may be of interest to others but don’t look into it further. We do mention the existence of the non-mRNA briefly in the subsequent sentence when discussing the total number of DE RNA that were classified as protein coding vs non-coding.

      We have edited this sentence as follows to more accurately reflect that the RNA being referred to is polyA+ (lines 205-207):

      “The RNAseq analysis quantified ~14,000 unique polyA+ mRNA and using a cut off of >1.5 fold-change and q-value <0.05 we saw that the abundance of 381 polyA+ RNA was modified by Pim1/Pim2-deficiency (Fig 2E) (Table S2A).

      Questions/points regarding figures:

      Figure 1 - Is PIM3 changed in expression with the knockout of PIM1/2 in mice? Although the RNA is low could there be some compensation here? The authors put a good amount of effort in to showing that mouse T cells do not exhibit differences from knocking out pim1/2 i.e., Efforts have been made to address this using activation markers and cell size, cytokines, and proliferation and proteomics of activated T cells. What do the resting T cells look like though? Although TCR signalling is not impacted, other pathways might be. Resting-state comparison may identify this.

      In all experiments Pim3 mRNA was only detected at very low levels and no PIM3 protein was detected by mass spectrometry in either wild type or PIM1/2 double KO TCR activated or cytokine expanded CD8 T cells (See Tables S1, S3, S4). There was similarly no change in Pim3 mRNA expression in RNAseq of IL-2 or IL-15 expanded CD8 T cells (See Tables S2, S6). While we have not confirmed this in resting state cells for all the conditions examined, there is no evidence that PIM3 compensates for PIM1/2deficiency or that PIM3 is substantially expressed in T cells.

      Figure 1A&B - Does PIM kinase stay elevated when removing TCR stimulus? During egress from lymph node and trafficking to infection/tumour/autoimmune site, T cells experience a period of 'rest' from T-cell activation so is PIM upregulation stabilized, or does it just coincide with activation? This could be a crucial control given the rest of the study focuses on day 6 after initial activation (which includes 4 days of 'rest' from TCR stimulation). Nice resolution on early time course though.

      This is an interesting question. Unfortunately, we do not know how sensitive PIM kinases are to TCR stimulus withdrawal, as we have not tried removing the TCR stimulus during early activation and measuring PIM expression.

      Based on the data in Fig 2A there is a hint that 4 hours withdrawal of peptide stimulus may be enough to lose PIM1/2 expression (after ~36 hrs of TCR activation), however, we did not include a control condition where peptide is retained within the culture. Therefore, we cannot resolve this question from the current experimental data, as this difference could also be due to a further increase in PIMs in the cytokine treated conditions rather than a reduction in expression in the no cytokine condition. This ~36-hour time point is also at a stage where T cells have become more dependent on cytokines for their sustained signalling compared to TCR stimulus.

      It is worth noting that PIM kinases are thought to have fairly short mRNA and protein half lives (~5-20 min for PIM1 in primary cells, ~10 min – 1 hr for PIM2). This is consistent with previous observations that cytotoxic T cells need sustained IL-2/Jak signalling to sustain PIM kinase expression, e.g. in Rollings et al (2018) Sci Signaling, DOI:10.1126/scisignal.aap8112 . We would therefore expect that sustained signalling from some external signalling receptor whether this is TCR, costimulatory receptors or cytokines is required to drive Pim1/2 mRNA and protein expression.

      Figure 1D - the CD4 WT and Pim dKO plots are identical - presumably a copying error - please correct.

      We apologise for the copying error and have amended the manuscript to show the correct data. We thank the reviewer for noticing this mistake.

      In Figure 1H - there is one protein found significant - would be nice to mention what this is - for example, if this is a protein that influences TCR levels this could be quite important.

      The protein is Phosphoribosyl Pyrophosphate synthase 1 like 1 (Prps1l1).

      This was a low confidence quantification (based on only 2 peptides) with no known function in T cells. Based on what is known, this gene is predominantly expressed in the testis (though also detected in spleen, lung, liver). A whole-body KO mouse found no difference in male fertility. No further phenotype has been reported in this mouse. See: Wang et al (2018) Mol Reprod Dev, DOI: 10.1002/mrd.23053

      We have added the following text to the legend of Figure 1H to address this protein:

      “Phosphoribosyl Pyrophosphate synthase 1 like 1 (Prps1l1), was found to be higher in Pim dKO CD8 T cells, but was a low confidence quantification (based on only 2 unique peptides) with no known function in T cells.”

      Figure S1 - In your mouse model the reduction in CD4 T cells is quite dramatic in the spleen - is this reduced homing or reduced production of T cells through development?

      Could you quantify the percentage of CD45+ cells that are T cells from blood too? Would be good to have a more thorough analysis of this new mouse model.

      We apologise for the lack of clarity around the Pim dKO mouse phenotype. Something we didn’t mention previously due to a lack of a formal measurement is that the Pim dKO mice were typically smaller than their WT counterparts. This is likely the main reason for total splenocytes being lower in the Pim dKO mice - every organ is smaller. It is not a phenotype reported in Pim1/2 dKO mice on an FVB background, though has been reported in the Pim1/2/3 triple KO mouse before (see Mikkers et al, Mol Cell Biol 2004 doi: 10.1128/MCB.24.13.6104-6115.2004).

      The % cell type composition of the spleen is equivalent between WT and Pim dKO mice and as mentioned above, was controlled for when setting up of our in vitro cultures.

      We have revised the main text and changed the order of the panels in Fig S1 to make this caveat clearer as follows (lines 138-144):

      “There were normal proportions of peripheral T cells in spleens of Pim dKO mice (Fig S1A) similar to what has been reported previously in Pim dKO mice on an FVB/N genetic background (Mikkers et al., 2004), though the total number of T cells and splenocytes was lower than in age/sex matched wild-type (WT) mouse spleens (Fig S1B-C). This was not attributable to any one cell type (Fig S1A)(James et al., 2021) but was instead likely the result of these mice being smaller in size, a phenotype that has previously been reported in Pim1/2/3 triple KO mice (Mikkers et al., 2004).”

      Figure S1C - why are only 10-15% of the cells alive? Please refer to this experiment in the main text if you are going to include it in the supplementary figure.

      With regards what was previously Fig S1C (now Fig S1A) we apologise for our confusing labelling. We were quoting these numbers as the percentage of live splenocytes (i.e. % of live cells). Typically ~80-90% of the total splenocytes were alive by the time we had processed, stained and analysed them by flow cytometry direct ex vivo. Of these CD4 and CD8 T cells made up ~%10-15 of the total live splenocytes (with most of the rest of the live cells being B cells).  

      We have modified the axis to say “% of splenocytes” to make it clearer that this is what we are plotting.

      Figure S1 - Would be good to show that the T cells are truly deficient in PIM1/2 in your mice to be absolutely sure. You could just make a supplementary plot from your mass spec data.

      This is a good suggestion and we have now included this data as supplementary figure 2.

      To note, due to the Pim1 knockout mouse design this is not as simple as showing presence or absence of total PIM1 protein detection in this instance.

      To elaborate: the Pim1/Pim2 whole body KO mice used in this study were originally made by Prof Anton Berns’ lab (Pim1 KO = Laird et al Nucleic Acids Res, 1993, doi: 10.1093/nar/21.20.4750, with more detail on deletion construct in te Riele, H. et al, Nature,1990, DOI: 10.1038/348649a0; Pim2 KO = Mikkers et al, Mol Cell Biol, 2004, DOI: 10.1128/MCB.24.13.6104-6115.2004). They were given to Prof Victor Tybulewicz on an FVB/N background. He then backcrossed them onto the C57BL/6 background for > 10 generations then gave them to us to intercross into Pim1/2 dKO mice on a C57BL/6 background.

      The strategy for Pim1 deletion was as follows:

      A neomycin cassette was recombined into the Pim1 gene in exon 4 deleting 296 Pim1 nucleotides. More specifically, the 98th pim-1 codon (counted from the ATG start site = the translational starting point for the 34 kDa isoform of PIM1) was fused in frame by two extra codons (Ser, Leu) to the 5th neo codon (pKM109-90 was used). The 3'-end of neo included a polyadenylation signal. The cassette also contains the PyF101 enhancer (from piiMo +PyF101) to ensure expression of neo on homologous recombination in ES cells.

      Collectively this means that the PIM1 polypeptide is made prior to amino acid 98 of the 34 kDa isoform but not after this point. This deletes functional kinase activity in both the 34 kDa and 44 kDa PIM1 isoforms. Ablation of PIM1 kinase function using this KO was verified via kinase activity assay in Laird et al. Nucelic Acids Res 1993.

      The strategy to delete Pim2 was as follows:

      “For the Pim2 targeting construct, genomic BamHI fragments encompassing Pim2 exons 1, 2, and 3 were replaced with the hygromycin resistance gene (Pgp) controlled by the human PGK promoter.” (Mikkers et al Mol Cell Biol, 2004)

      The DDA mass spectrometry data collected in Fig 1 G-H and supplementary table 1 confirmed we do not detect peptides from after amino acid residue 98 in PIM1 (though we do detect peptides prior to this deletion point) and we do not detect peptides from the PIM2 protein in the Pim dKO mice. Thus confirming that no catalytically active PIM1/PIM2 proteins were made in these mice.

      We have added a supplementary figure S2 showing this and the following text (Lines 155-156):

      “Proteomics analysis confirmed that no catalytically active PIM1 and PIM2 protein were made in Pim dKO mice (Fig S2).”

      Figure 2A - I found the multiple arrows a little confusing - would just use arrows to indicate predicted MW of protein and stars to indicate non-specific. Why are there 3 bands/arrows for PIM2?  

      The arrows have now been removed. We now mention the PIM1 and PIM2 isoform sizes in the figure legend and have left the ladder markings on the blots to give an indication of protein sizes. There are 2 isoforms for PIM1 (34 and 44 kDa) in addition to the nonspecific band and 3 isoforms of PIM2 (40, 37, 34 kDa, though two of these isoform bands are fairly faint in this instance). These are all created via ribosome use of different translational start sites from a single Pim1 or Pim2 mRNA transcript.

      The following text has been added to the legend of Fig 2A:

      “Western blots of PIM1 (two isoforms of 44 and 34 kDa, non-specific band indicated by *), PIM2 (three isoforms of 40, 37 and 34 kDa) or pSTAT5 Y694 expression.”

      Figure 2A - why are the bands so faint for PIM1/2 (almost non-existent for PIM2 under no cytokine stim) here yet the protein expression seems abundant in Figure 1B upon stim without cytokines? Is this a sensitivity issue with WB vs proteomics? My apologies if I have missed something in the methods but please explain this discrepancy if not.

      There is differing sensitivity of western blotting versus proteomics, but this is not the reason for the discrepancy between the data in Fig 1B versus 2A. These differences reflect that Fig1 B and Fig 2A contrast PIM levels in two different sets of conditions and that while proteomics allows for an estimate of ‘absolute abundance’ Western blotting only shows relative expression between the conditions assessed.  

      To expand on this… Fig 1B proteomics looks at naïve versus 24 hr aCD3/aCD28 TCR activated T cells. The western blot data in Fig 2A looks at T cells activated for 1.5 days with SIINFEKL peptide and then washed free of the media containing the TCR stimulus and cultured with no stimulus for 4 or 24 hrs hours and contrast this with cells cultured with IL-2 or IL-15 for 4 or 24 hours. All Fig 2A can tell us is that cytokine stimuli increases and/or sustains PIM1 and PIM2 protein above the level seen in TCR activated cells which have not been cultured with cytokine for a given time period. Overexposure of the blot does reveal detectable PIM1 and PIM2 protein in the no cytokine condition after 4 hrs. Whether this is equivalent to the PIM level in the 24 hr TCR activated cells in Fig 1B is not resolvable from this experiment as we have not included a sample from a naïve or 24 hr TCR activated T cell to act as a point of reference.

      Figure 4F - Your proteomics data shows substantial downregulation in proteomics data for granzymes and ifny- possibly from normalization to maximise the differences in the graph - and yet your flow suggests there are only modest differences. Can you explain why a discrepancy in proteomics and flow data - perhaps presenting in a more representative manner (e.g., protein counts)?

      The heatmaps are a scaled for ‘row max’ to ‘row min’ copy number comparison on a linear scale and do indeed visually maximise differences in expression between conditions. This feature of these heatmaps is also what makes the lack of difference in GzmB and GzmA at the mRNA heatmap in Fig 5C quite notable.

      We have now included bar graphs of Granzymes A and B and IFNg protein copy number in Figure 4 (see new Fig 4G-H) to make clearer the magnitude of the effect on the major effector proteins involved in CTL killing function. It is worth noting that flow cytometry histograms from what was formerly Fig 4G (now Fig 4I) are on a log-scale so the shift in fluorescence does generally correspond well with the ~1.7-2.75-fold reduction in protein expression observed.

      Figure 4G - did you use isotype controls for this flow experiment? Would help convince labelling has worked - particularly for low levels of IFNy production.

      We did not use isotype controls in these experiments but we are using a well validated interferon gamma antibody and very carefully colour panel/compensation controls to minimise background staining. The only ways to be 100% confident that an antibody is selective is to use an interferon gamma null T cell which we do not have. We do however know that the antibody we use gives flow cytometry data consistent with other orthogonal approaches to measure interferon gamma e.g. ELISA and mass spectrometry.

      Figure 5M - why perform this with just the PIM kinase inhibitors? Can you do this readout for the WT vs. PIM1/2KO cells too? This would really support your claims for the paper about PIM influencing translation given the off-target effects of SMIs.

      Regrettably we have not done this particular experiment with the Pim dKO T cells. As mentioned above, due to this work being performed predominantly during the COVID19 pandemic we ultimately had to make the difficult decision to cease colony maintenance. When work restrictions were lifted we could not ethically or economically justify resurrecting a mouse colony for what was effectively one experiment, which is why we chose to test this key biological question with small molecule inhibitors instead.

      We appreciate that SMIs have off target effects and this is why we used multiple panPIM kinase inhibitors for our SMI validation experiments. While the use of 2 different inhibitors still doesn’t completely negate the concern about possible off-target effects, our conclusions re: PIM kinases and impact on proteins synthesis are not solely based on the inhibitor work but also based on the decreased protein content of the PIM1/2 dKO T cells in the IL-2 CTL, and the data quantifying reductions in levels of many proteins but not their coding mRNA in PIM1/2dKO T cells compared to controls.

    1. eLife Assessment

      This article presents a valuable genetic spatio-temporal analysis of malaria-infected individuals from four villages in a highly seasonal transmission setting in The Gambia, covering the period between December 2014 and May 2017. Evidence generated by the study's laboratory and data processing approaches is solid and helps to advance the understanding of malaria in The Gambia, particularly due to its longitudinal design and the inclusion of asymptomatic cases.

    2. Reviewer #2 (Public review):

      Summary:

      Malaria transmission in the Gambia is highly seasonal, whereby periods of intense transmission at the beginning of the rainy season are interspersed by long periods of low to no transmission. This raises several questions about how this transmission pattern impacts the spatiotemporal distribution of circulating parasite strains, how parasites persist during the dry season, and how asymptomatic infections contribute to maintaining transmission during the low/no transmission season.

      Combining a molecular barcode genotyping using 101 bi-allelic SNPs and SNPs from Whole Genome Sequence (WGS) in a "consensus barcode", the authors aimed at measuring the relatedness between parasites at different spatial (i.e., individual, household, village, and region) and temporal (i.e., high, low, and the corresponding the transitions) levels by assessing the fraction of the genome having a common ancestry (i.e. Identity-by-Descent (IBD)).

      By measuring the Complexity of Infection (COI) and parasite relatedness by IBD the authors show that a large fraction of infections is polygenomic and stable over time, resulting in a high recombinational diversity. Moreover, they show that transmission intensity increases during the transition from the dry to wet seasons. However, they find that there is a higher probability of finding similar genotypes within the same household, but this similarity rapidly disappears over time and is not observed between different villages. If there is no drug selection during the dry season, and if resistance results in a fitness cost, alleles associated with drug resistance may change in frequency. The authors looked at the frequencies of six drug-resistance haplotypes (aat1, crt, dhfr, dhps, kelch13, and mdr1), and found no evidence of changes in allele frequencies associated with seasonality. They also find chronic infections lasting from one month to one and a half years with no dependence on age or gender.

      This work makes use of genomic information and IBD analytic tools to show parasite relatedness from asymptomatic infections at different spatial and temporal scales, thus providing a better understanding of the transmission dynamics of malaria in highly seasonal environments.

      Strength:

      The authors use a combination of high-quality barcodes (425 barcodes representing 101 bi-allelic SNPs) and 199 high-quality genome sequences to infer the fraction of the genome with shared Identity by Descent (IBD) (i.e. a metric of recombination rate) over several time points covering two years. The barcode and whole genome sequence combination allows full use of a large dataset, to confidently infer the relatedness of parasite isolates at various spatiotemporal scales and show the advantage of using genomic information for understanding malaria transmission dynamics.

      The authors aimed to establish how seasonal transmission cycles shape the spatiotemporal parasite population structure using metrics such as parasite genetic diversity, genetic relatedness, and frequency of drug resistance alleles, as well as the contribution of asymptomatic chronic carriers to sustained transmission. The results support their conclusions.

      Using a combination of molecular barcodes and available whole genome sequence datasets opens new opportunities to understand malaria transmission dynamics in different transmission settings. This allows for data analysis at different spatiotemporal granularities, having a practical utility for identifying malaria control targets and acquiring metrics to evaluate malaria control programs. The development of molecular barcodes using similar SNPs by different malaria control programs would be of great utility to compare and understand malaria transmission dynamics in different settings worldwide.

    3. Reviewer #3 (Public review):

      This study aimed to examine the impact of seasonality on the population genetics of malaria parasites. To achieve this, the researchers conducted a longitudinal study in a region with seasonal malaria transmission. Over a 2.5-year period, blood samples were collected from 1,516 participants across four villages in the Upper River Region of The Gambia. These samples were tested for malaria parasite infection, and the parasites from positive samples were genotyped using a genetic barcode and/or whole genome sequencing. Genetic relatedness analysis was then performed to explore the findings

      The study identified three key findings:

      (1) The malaria parasite population undergoes continuous recombination, with no single genotype predominating, in contrast to viral populations;

      (2) Parasite relatedness is influenced by both spatial and temporal factors; and

      (3) The lowest genetic relatedness among parasites occurs during the transition from the low to high transmission seasons, which the authors linked to increased recombination during sexual reproduction in mosquitoes.

      The results section is well-structured, and the figures are clear and self-explanatory. The methods are adequately described, providing a solid foundation for the findings. While there are no unexpected results, it is reassuring to see the anticipated outcomes supported by actual data. The conclusions are generally well-supported and the recommendation to target asymptomatic infections is logical and relevant.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The manuscript titled "Household clustering and seasonal genetic  variation of Plasmodium falciparum at the community-level in The Gambia" presents a valuable genetic spatio-temporal analysis of  malaria-infected individuals from four villages in The Gambia, covering  the period between December 2014 and May 2017. The majority of samples  were analyzed using a SNP barcode with the Spotmalaria panel, with a  subset validated through WGS. Identity-by-descent (IBD) was calculated  as a measure of genetic relatedness and spatio-temporal patterns of the  proportion of highly related infections were investigated. Related  clusters were detected at the household level, but only within a short  time period.

      Strengths:

      This study offers a valuable dataset, particularly due to its  longitudinal design and the inclusion of asymptomatic cases. The  laboratory analysis using the Spotmalaria platform combined and  supplemented with WGS is solid, and the authors show a linear  correlation between the IBD values determined with both methods,  although other studies have reported that at least 200 SNPs are required for IBD analysis. Data-analysis pipelines were created for (1) variant  filtering for WGS and subsequent IBD analysis, and (2) creating a  consensus barcode from the spot malaria panel and WGS data and  subsequent SNP filtering and IBD analysis.

      Weaknesses:

      Further refining the data could enhance its impact on both the scientific community and malaria control efforts in The Gambia.

      (1) The manuscript would benefit from improved clarity and better  explanation of results to help readers follow more easily. Despite  familiarity with genotyping, WGS, and IBD analysis, I found myself  needing to reread sections. While the figures are generally clear and  well-presented, the text could be more digestible. The aims and  objectives need clearer articulation, especially regarding the rationale for using both SNP barcode and WGS (is it to validate the approach with the barcode, or is it to have less missing data?). In several analyses, the purpose is not immediately obvious and could be clarified.

      The text of the manuscript has now been thoroughly revised. But please let us know if a specific section remains unclear.

      (2) Some key results are only mentioned briefly in the text without  corresponding figures or tables in the main manuscript, referring only  to supplementary figures, which are usually meant for additional detail, but not main results. For example, data on drug resistance markers  should be included in a table or figure in the main manuscript.

      We agree with the reviewer suggesting to move the prevalence of drug resistance markers from supplementary figures (previously Figure S8) to the main manuscript (now Figure 5). If other Figure/Table should be moved to the main manuscript please let us know.

      (3) The study uses samples from 2 different studies. While these are  conducted in the same villages, their study design is not the same,  which should be addressed in the interpretation and discussion of the  results. Between Dec 2014 and Sept 2016, sampling was conducted only in 2 villages and at less frequent intervals than between Oct 2016 to May  2017. The authors should assess how this might have impacted their  temporal analysis and conclusions drawn. In addition, it should be  clarified why and for exactly in which analysis the samples from Dec  2016 - May 2017 were excluded as this is a large proportion of your  samples.

      We have clarified which set of samples was used in our Results (Lines 293-295, 316-319). While two villages were recruited halfway through the study, two villages (J and K, Figure 1C) consistently provided data for each transmission season. Importantly, our temporal analysis accounts for these differences by grouping paired barcodes based on their respective locations (Figure 3B). Despite variations in sampling frequency, we still observe a clear overall decline in relatedness between the ‘0-2 months’ and ‘2-5 months’ groups, both of which include barcodes from all four villages.

      (4) Based on which criteria were samples selected for WGS? Did the  spatiotemporal spread of the WGS samples match the rest of the genotyped samples? I.e. were random samples selected from all times and places,  or was it samples from specific times/places selected for WGS?

      All P. falciparum positive samples were sent for genotyping and whole genome sequencing, ensuring no selection bias. However, only samples with sufficient parasite DNA were successfully sequenced. We have updated the text (Line 129-130) and added a supplementary figure (Figure S4) to show the sample collection broken down by type of data (barcode or genome). High quality genomes are distributed across all time points.

      (5) The manuscript would benefit from additional detail in the methods section.

      Please see our response in the section “Recommendation for the authors”.

      (6) Since the authors only do the genotype replacement and build  consensus barcode for 199 samples, there is a bias between the samples  with consensus barcode and those with only the genotyping barcode. How  did this impact the analysis?

      While we acknowledge the potential for bias between samples with a consensus barcode (based on WGS) and those with genotyping-only barcodes, its impact is minimal. WGS does indeed produce a more accurate barcode compared to SNP genotyping, but any errors in the genotyping barcodes were mitigated by excluding loci that systematically mismatched with WGS data (see Figure S3). Additionally, the use of WGS improved the accuracy of 51 % (216/425) of barcodes, which strengthens the overall quality and validity of our analysis.

      (7) The linear correlation between IBD-values of barcode vs genome is  clear. However, since you do not use absolute values of IBD, but a  classification of related (>=0.5 IBD) vs. unrelated (<0.5), it  would be good to assess the agreement of this classification between the 2 barcodes. In Figure S6 there seem to be quite some samples that would be classified as unrelated by the consensus barcode, while they have  IBD>0.5 in the Genome-IBD; in other words, the barcode seems to be  underestimating relatedness.

      a. How sensitive is this correlation to the nr of SNPs in the barcode?

      We measured the agreement between the two classifications using specificity (0.997), sensitivity (0.841) and precision (0.843) described in the legend of Figure S8. To further demonstrate the good agreement between the two methods, we calculated a Cohen’s kappa value of 0.839 (Lines 226, 290), indicative of a strong agreement (McHugh 2012). As expected, the correlation between IBD values obtained by both methods improves (higher Cohen’s kappa and R<sup>2</sup>) as the cutoff for the minimal number of comparable and informative loci per barcode pair is raised (data not shown).

      (8) With the sole focus on IBD, a measure of genetic relatedness, some of the conclusions from the results are speculative.

      a. Why not include other measures such as genetic diversity, which  relates to allele frequency analysis at the population level (using, for example, nucleotide diversity)? IBD and the proportion of highly  related pairs are not a measure of genetic diversity. Please revise the  manuscript and figures accordingly.

      We agree with the fact that IBD is not a direct measure of genetic diversity, even though both are related (Camponovo et al., 2023). More precisely, IBD is a measure of the level of inbreeding in the population (Taylor et al., 2019). We have updated our manuscript by replacing “genetic diversity” with “genetic relatedness” or “inbreeding/outcrossing” when appropriate. Nucleotide diversity would be relevant if we wanted to compare different settings, e.g. Africa vs Asia, however this is not the case here.

      b. Additionally, define what you mean by "recombinatorial genetic  diversity" and explain how it relates to IBD and individual-level  relatedness.

      We considered the term ‘recombinatorial genetic diversity’ to be equivalent to the level of inbreeding in the population. Because this expression is rather uncommon, we decided to drop it from our manuscript and replace it with “inbreeding/outcrossing”.

      c. Recombination is one potential factor contributing to the loss of  relatedness over time. There are several other factors that could  contribute, such as mobility/gene flow, or study-specific limitations  such as low numbers of samples in the low transmission season and many  months apart from the high transmission samples.

      Indeed, the loss of relatedness could be attributed not only to the recombination of local cases but also to new parasites introduced by imported malaria cases. As we stated in our manuscript, previous studies have shown a limited effect of imported cases on maintaining transmission (Lines 72-74). Nevertheless, we cannot definitely exclude that imported cases have an effect on inbreeding levels, since we do not have access to genetic data of surrounding parasites at the time of the study. We updated the discussion accordingly (Lines 497-501).

      d. By including other measures such as linkage disequilibrium you could  further support the statements related to recombination driving the loss of relatedness.

      This commendable suggestion is actually part of an ongoing project focusing on the sharing of IBD fragments and how it correlates with linkage disequilibrium. However, we believe that this analysis would not fit in the scope of our manuscript which is really about spatio-temporal effects on parasite relatedness at a local scale.

      (9) While the authors conclude there is no seasonal pattern in the  drug-resistant markers, one can observe a big fluctuation in the dhps  haplotypes, which go down from 75% to 20% and then up and down again  later. The authors should investigate this in more detail, as dhps is  related to SP resistance, which could be important for seasonal malaria  chemoprofylaxis, especially since the mutations in dhfr seem near-fixed  in the population, indicating high levels of SP resistance at some of  the time points.

      As the reviewer noted, the DHPS A437G haplotype appears to decrease in prevalence twice throughout our study: from the 2015 and 2016 high transmission seasons to the subsequent 2016 and 2017 low transmission seasons. Seasonal Malaria Chemoprophylaxis (SMC) was carried out in the area through the delivery of sulfadoxine–pyrimethamine plus amodiaquine to children 5 years old and younger during high transmission seasons. As DHPS A437G haplotype has been associated with resistance to sulfadoxine, its apparent increase in prevalence during high transmission seasons could be resulting from the selective pressure imposed on parasites. After SMC, the decrease in prevalence observed during low transmission seasons could be caused by a fitness cost of the mutation favouring wild-type parasites over resistant ones. We updated our manuscript to reflect this relevant observation (Lines 400-405).

      (10) I recommend that raw data from genotyping and WGS should be deposited in a public repository.

      Genotyping data is available in the supplementary table 4 (Table S4). Whole genome sequencing is accessible in a European Nucleotide Archive public repository with the identifiers provided in supplementary table 5 (Table S5). We added references to these tables in the manuscript (Lines 249-250).

      Reviewer #2 (Public review):

      Summary:

      Malaria transmission in the Gambia is highly seasonal, whereby periods  of intense transmission at the beginning of the rainy season are  interspersed by long periods of low to no transmission. This raises  several questions about how this transmission pattern impacts the  spatiotemporal distribution of circulating parasite strains. Knowledge  of these dynamics may allow the identification of key units for targeted control strategies, the evaluation of the effect of selection/drift on  parasite phenotypes (e.g., the emergence or loss of drug resistance  genotypes), and analyze, through the parasites' genetic nature, the  duration of chronic infections persisting during the dry season. Using a combination of barcodes and whole genome analysis, the authors try to  answer these questions by making clever use of the different  recombination rates, as measured through the proportion of genomes with  identity-by-descent (IBD), to investigate the spatiotemporal relatedness of parasite strains at different spatial (i.e., individual, household,  village, and region) and temporal (i.e., high, low, and the  corresponding the transitions) levels. The authors show that a large  fraction of infections are polygenomic and stable over time, resulting  in high recombinational diversity (Figure 2). Since the number of  recombination events is expected to increase with time or with the  number of mosquito bites, IBD allows them to investigate the  connectivity between spatial levels and to measure the fraction of  effective recombinational events over time. The authors demonstrate the  epidemiological connectivity between villages by showing the presence of related genotypes, a higher probability of finding similar genotypes  within the same household, and how parasite-relatedness gradually  disappears over time (Figure 3). Moreover, they show that transmission  intensity increases during the transition from dry to wet seasons  (Figure 4). If there is no drug selection during the dry season and if  resistance incurs a fitness cost it is possible that alleles associated  with drug resistance may change in frequency. The authors looked at the  frequencies of six drug-resistance haplotypes (aat1, crt, dhfr, dhps,  kelch13, and mdr1), and found no evidence of changes in allele  frequencies associated with seasonality. They also find chronic  infections lasting from one month to one and a half years with no  dependence on age or gender.

      The use of genomic information and IBD analytic tools provides the  Control Program with important metrics for malaria control policies, for example, identifying target populations for malaria control and  evaluation of malaria control programs.

      Strength:

      The authors use a combination of high-quality barcodes (425 barcodes  representing 101 bi-allelic SNPs) and 199 high-quality genome sequences  to infer the fraction of the genome with shared Identity by Descent  (IBD) (i.e. a metric of recombination rate) over several time points  covering two years. The barcode and whole genome sequence combination  allows full use of a large dataset, and to confidently infer the  relatedness of parasite isolates at various spatiotemporal scales.

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate the impact of seasonality on the malaria parasite population genetic. To achieve this, the researchers conducted a longitudinal study in a region characterized by seasonal malaria  transmission. Over a 2.5-year period, blood samples were collected from  1,516 participants residing in four villages in the Upper River Region  of The Gambia and tested the samples for malaria parasite positivity.  The parasites from the positive samples were genotyped using a genetic  barcode and/or whole genome sequencing, followed by a genetic  relatedness analysis.

      The study identified three key findings:

      (1) The parasite population continuously recombines, with no single genotype dominating, in contrast to viral populations;

      (2) The relatedness of parasites is influenced by both spatial and temporal distances; and

      (3) The lowest genetic relatedness among parasites occurs during the  transition from low to high transmission seasons. The authors suggest  that this latter finding reflects the increased recombination associated with sexual reproduction in mosquitoes.

      The results section is well-structured, and the figures are clear and  self-explanatory. The methods are adequately described, providing a  solid foundation for the findings. While there are no unexpected  results, it is reassuring to see the anticipated outcomes supported by  actual data. The conclusions are generally well-supported; however, the  discussion on the burden of asymptomatic infections falls outside the  scope of the data, as no specific analysis was conducted on this aspect  and was not stated as part of the aims of the study. Nonetheless, the  recommendation to target asymptomatic infections is logical and  relevant.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript would benefit from additional detail in the methods section.

      a. Refer to Figure 1 when you describe the included studies and sample processing.

      We added the reference to Figure 1 (Line 131).

      b. While you describe each step in the pipeline, you do not specify the  tools, packages, or environment used (the GitHub link is also  non-functional). A graphic representation of the pipeline, with more  bioinformatic details than Supplementary Figure S1, would be helpful.  Add references to used tools and software created by others.

      The GitHub link has been updated and is now functional. We find Figure S1 already heavy in details, adding in more would be detrimental to our will of it being an easily readable summary of our pipeline. Readers seeking in-depth explanation of our pipeline might be more interested in reading the methods section instead. We are very much committed to credit the authors of the tools that were essential for us to create our analysis pipeline. The two most relevant tools that we used are hmmIBD and the Fws calculation, which were both cited in the methods (Lines 148-152, 214-215).

      c. What changed in the genotyping protocol after May 2016? Does it not  lead to bias in the (temporal) analysis by leaving these loci in for  samples collected before May 2016 and making them 'unknown' for the  majority of samples collected after this date?

      These 21 SNPs all clustered in 1 of the 4 multiplexes used for molecular genotyping, which likely failed to produce accurate base calls. We updated the text to include this information (Lines 198-200).

      The rationale behind the discarding of these 21 SNPs for barcodes sampled after May 2016 was that they were consistently mismatching with the WGS SNPs, probably due to genotyping error as mentioned above. However, by replacing these unknown positions in the molecular barcodes with WGS SNPs, 141 samples did recover some of these 21 SNPs with the accurate base calls (Figure S3A). Additionally, we added an extra analysis to assess the agreement between barcodes and WGS data (Figure S3B).

      d. Related to this, how are unknown and mixed genotypes treated in the  binary matrix? How is the binary matrix coded? Is 0 the same as the  reference allele? So all the missing and mixed are treated as  references? How many missing and mixed alleles are there, how often does it occur and how does this impact the IBD analysis?

      We acknowledge that the details that we provided regarding the IBD analysis were confusing. hmmIBD requires a matrix that contains positive or null integers for each different allele at a given loci (all our loci were bi-allelic, thus only 0 and 1 were used) and -1 for missing data. In our case, we set missing and mixed alleles to -1, which were then ignored during the IBD estimation. The corresponding text was updated accordingly (Lines 173-175).

      e. By excluding households with less than 5 comparisons, are you not preselecting households with high numbers of cases, and therefore higher likelihood of transmission within the household?

      All participants in each household were sampled at every collection time point. This sampling was unbiased towards likelihood of transmission. Excluding pairs of households with less than 5 comparisons was necessary to ensure statistical robustness in our analyses. Besides, this does not necessarily restrict the analysis to only households with a high number of cases as it is the total number of pairs between households that must equal 5 at least (for instance these pairs would pass the cutoff: household with 1 case vs household with 5 cases; household with 2 cases vs household with 3 cases).

      (2) Since the authors only do the genotype replacement and build  consensus barcode for 199 samples, there is a bias between the samples  with consensus barcode and those with only the genotyping barcode. How  did this impact the analysis?

      See (6) in the Public Review.

      a. It would be good to get a better sense of the distribution of the nr  of SNPs in the barcode. The range is 30-89, and 30 SNPs for IBD is  really not that much.

      Adding the range of the number of available SNPs per barcode is indeed particularly relevant. We added a supplementary figure (Figure S5) showing the distribution of homozygous SNPs per barcode, showing that a very small minority of barcodes have only 30 SNPs available for IBD (average of 65, median of 64).

      b. Did you compare the nr of SNPs in the consensus vs. only genotyped  barcodes? Is there more missing data in the genotype-only barcodes?

      We added a supplementary figure (Figure S5) with the distribution of homozygous SNPs in consensus (216 samples) and molecular (209 samples) barcodes. Consensus barcodes have more homozygous SNPs (average 76, median 82) than molecular barcodes (average of 54, median of 53), showing the improvement resulting from using whole genome sequencing data.

      c. How was the cut-off/sample exclusion criteria of 30 SNPs in the barcode determined?

      As described above (Public review section 7.a.), we removed pairs of barcodes with less than 30 comparable loci (and 10 informative loci) because this led to a good agreement between IBD values obtained from barcodes and genomes while still retaining a majority of pairwise IBD values.

      d. Was there more/less IBD between sample pairs with a consensus barcode vs those with genotype-only barcodes?

      We separated pairwise IBD values into two groups: “within consensus” and “within molecular”. The percentages of related barcodes (IBD ≥ 0.5) was virtually identical between “within consensus” (1.88 %) and “within molecular” (1.71 %) groups (χ<sup>2</sup> = 1.33, p value > 0.24).

      (3) Line 124 adds a reference for the PCR method used.

      We have updated this information: varATS qPCR (Line 121).

      (4) Line 126, what is MN2100ff? Is this the catalogue number of the  cellulose columns? Please clarify and add manufacturer details.

      MN2100ff was a replacement for CF11. We added a link to the MalariaGen website describing the product and the procedure (Lines 124-125).

      (5) Line 143: Figure S7 is the first supplementary figure referenced. Change the order and make this Figure S1?

      The numbering of figures is now fixed.

      (6) Line 154: How many SNPs were in the vcf before filtering?

      There were 1,042,186 SNPs before filtering. This information was added to the methods (Line 168).

      (7) Line 156: Why is QUAL filtered at 10000? This seems extremely high.  (I could be mistaken, but often QUAL above 50 or so is already fine, why discard everything below 10000?). What is the range of QUAL scores in  your vcf?

      We used the QUAL > 10000 to make our analyses less computationally intensive while keeping enough relevant genetic information. We agree that keeping variants with extremely high values of QUAL is not relevant above a certain threshold as it translates into infinitesimally low probabilities (10<sup>-(QUAL/10)</sup>) of the variant calling being wrong. We then decided to use a minimal population minor allele frequency (MAF) of 0.01 to keep a variant as this will make the IBD calculation more accurate (Taylor et al., 2019). The variant filtering was carried out with the MAF > 0.01 filter, resulting in 27,577 filtered SNPs with a minimal QUAL of 132. With a cutoff of 3000 available SNPs, we retrieved all 199 genomes previously obtained with the QUAL > 10000 condition. The methods have been updated accordingly (Lines 166-170).

      (8) Line 161-165: How did you handle the mixed alleles in the hmmIBD  analysis for the WGS data? Did you set them as 0 as you do later on for  the consensus barcode?

      Mixed alleles and missing data were ignored. This translated into a value of -1 for the hmmIBD matrix and not 0 as we incorrectly stated previously. We updated our manuscript with this correct information (Lines 173-175).

      (9) Line 168-171: How many SNPs do you have in the WGS dataset after all the filtering steps? If the aim of the IBD with WGS was to validate the IBD-analysis with the barcode, wouldn't it make sense to have at least  200 loci (as shown in Taylor et al to be required for hmmIBD) in the WGS data? What proportion of comparisons were there with only 100 pairs of  loci? This seems like really few SNPs from WGS data.

      There were 27,577 SNPs overall in the 199 high quality genomes. In our analysis, we make the distinction between comparable and informative loci. For two loci to be comparable, they both have to be homozygous. To be informative, they must be comparable and at least one of them must correspond to the minor allele in the population. We borrowed this term and definition from hmmIBD software which yields directly the number of informative loci per pair. By keeping pairs with at least 100 informative SNPs, we aimed to reduce the number of samples artificially related because only population major alleles are being compared. Pairs of genomes had between 1073 and 27466 of these, way above the recommended 200 loci in Taylor et al. (2019). We added more details on comparable and informative sites (Lines 152-160).

      (10) Line 178: why remove the 12 loci that are absent from the WGS? Are  these loci also poorly genotyped in the spotmalaria panel?

      As our goal is to validate the reliability of molecular genotyped SNPs, these 12 loci have to be removed. Especially because we did find a consistent discrepancy between genotyped and WGSed SNPs, which cannot be tested if these SNPs are absent from the genomes.

      (11) Line 180-182: What do you mean by this sentence: "Genomic barcodes  are built using different cutoffs of within-sample MAF and aligned  against molecular barcodes from the same isolates." Is this the analysis presented in the supplementary figure and resulting in the cut-off of  MAF 0.2? Please clarify.

      A loci where both alleles are called can result from two distinct haploïd genomes present or from an error occurring during sequencing data acquisition or processing. To distinguish between the two, we empirically determined the cutoff of within-sample MAF above which the loci can be considered heterozygous and below which only the major allele is kept. The corresponding figure was indeed Figure S2 (referenced in next sentence Lines 192-195). We clarified our approach in the methods (Lines 190-192) and legends of Figures S2 and Figure S3.

      (12) Line 191: How often was there a mismatch between WGS and SNP barcode?

      We added a panel (Figure S3B) showing the average agreement of each SNP between molecular genotyping and WGS. We highlighted the 21 discrepant SNPs showing a lower agreement only for samples collected after May 2016.

      (13) Line 201-204: This part is unclear (as above for the WGS): did you  include sample pairs with more than 10 paired loci? But isn't 10 loci  way too few to do IBD analysis?

      We included pairs of samples with at least 30 comparable loci and 10 informative paired loci (refer to our answer to comment 8 for the difference between the two). We added more details regarding comparable and informative sites (Lines 152-160). Indeed, using fewer than 200 loci leads to an IBD estimation that is on average off by 0.1 or more (Taylor et al., 2019). However we showed that the barcode relatedness classification based on a cutoff of IBD (related when above 0.5, unrelated otherwise) was close enough to our gold standard using genomes (each pair having more than 1000 comparable sites). Because we use this classification approach rather than the exact value of barcode-estimated IBD in our study, our 30 minimum comparable sites cutoff seems sufficient.

      (14) Lines 206-207: which program did you use to analyse Fws?

      We did not use any program, we computed Fws according to Manske et al. (2012) methods.

      (15) Line 233: "we attempted parasite genotyping and whole genome  sequencing of 522 isolates over 16 time points" => This is confusing, you did not do WGS of 522 samples, only 199 as mentioned in the next  sentence.

      We attempted whole genome sequencing on 331 isolates and molecular genotyping on 442 isolates with 251 isolates common between the two methods. We updated our text to clarify this point (Lines 247-252).

      (16) Lines 256-259: Add a range of proportions or some other summary  statistic in this section as you are only referring here to  supplementary figures to support these statements.

      The text has been updated (Lines 271-274).

      (17) Line 260: check the formatting of the reference "Collins22" as the rest of the document references are numbered.

      Fixed.

      (18) Figure 2/3:

      a. You could also inspect relatedness at the temporal level, by  adjusting the network figure where the color is village and shape is  time (month/year).

      Although visualising the effect of time on the parasite relatedness network would be a valuable addition, we did not find any intuitive and simple way of doing so. Using shapes to represent time might end up being more confusing than helpful, especially because the sampling was not done at fixed intervals.

      b. To further support the statement of clustering at the household  level, it might be useful to add a (supplementary) figure with the  network with household number/IDs as color or shape. In the network,  there seems to be a lot of relatedness within the villages and between  villages. Perhaps looking only at the distribution of the proportion of  highly related isolates is simplifying the data too much. Besides, there is no statistical difference between clustering at the household vs  within-village levels as indicated in Figure 3.

      Unfortunately, there are too many households (71 in Figure 2) to make a figure with one color or shape per household readable. The statistical test of the difference between the within household and within village relatedness yielded a p value above the cutoff of 0.05 (p value of 0.084). However, it is possible that the lack of significance arises from the relatively low number of data points available in the “within household” group. This is even more plausible considering the statistical difference of both “within household” and “within village” groups with “between village” group. Overall, our results indicate a decreasing parasite relatedness with spatial distance, and that more investigation would be needed to quantify the difference between “within household” and “within village” groups. 

      (19) Figure 4: Please add more description in the caption of this figure to help interpret what is displayed here. Figure 4A is hard to  interpret and does not seem to show more than is already shown in Figure 3A. What do the dots represent in Figure 4B? It is not clear what is  presented here.

      Compared to Figure 3A, Figure 4A enables the visualization of the relatedness between each individual pair of time points, which are later used in the comparison of relatedness between seasonal groups in Figure 4B. For this reason, we believe that Figure 4A should remain in the manuscript. However, we agree that the relationship between Figure 4A and Figure 4B is not intuitive in the way we presented it initially. For this reason, we added more details in the legend and modified Figure 4A to highlight the seasonal groups used in Figure 4B. 

      (20) Line 360-361: what did you do when haplotypes were not identical?

      We explained it in the methods section (Lines 144-146): in this case, only WGS haplotypes were kept.

      (21) Section chronic infections: it is important to mention that the  majority of chronic infections are individuals from the monthly  dry-season cohort.

      We added a statement about the 21 chronically infected individuals that were also part of the December 2016 – May 2017 monthly follow-up (Lines 423-426).

      (22) Lines 381-386: Did you investigate COI in these individuals? Could  it be co-circulating strains that you do not pick up at all times due to the consensus barcodes and discarding of mixed genotypes (and does not  necessarily show intra-host competition. That is speculation and should  perhaps not be in the results)?

      This is exactly what we think is happening. Due to the very nature of genotyping, only one strain may be observed at a time in the case of a co-infection, where distinct but related strains are simultaneously present in the host. The picked-up strain is typically the one with the highest relative abundance at the time of sampling. As the reviewer stated, fluctuation of strain abundance might not only be due to intra-host competition but also asynchronous development stages of the two strains. We added this observation to the manuscript (Lines 432-435).

      (22) Figure 6: highlight the samples where the barcode was not available in a different color to be able to see the difference between a  non-matching barcode and missing data.

      We thank the reviewer for this great suggestion. We have now added to Figure 6 barcodes available along with their level of relatedness with the dominant genotypes for each continuous infections.

      (24) Improve the discussion by adding a clear summary of the main  findings and their implications, as well as study-specific limitations.

      The Discussion has been updated with a paragraph summarizing the primary results (Lines 451-457).

      (25) Line 445: "implying that the whole population had been replaced in just one year "

      a. What do you mean by replaced? Did other populations replace the  existing populations? I am not sure the lack of IBD is enough to show  that the population changed/was replaced. Perhaps it is more accurate to say that the same population evolved. Nevertheless, other measures such as genetic diversity and genetic differentiation or population  structure.would be more suitable to strengthen these conclusions.

      We agree that “replaced” was the wrong term in this case. We rather intended to describe how the numerous recombinations between malaria parasites completely reshaped the same initial population which gradually displayed lower levels of relatedness over time. We updated the manuscript accordingly (Lines 507-512).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 260: Remove Collins 22.

      Fixed.

      (2) Lines 270-274: 73 + 213 = 286 not 284; sum of percentages is equal to 101%.

      The numbers are correct: the 73 barcodes identical (IBD >= 0.9) to another barcode are a subset of the 213 related (IBD >= 0.5) to another barcode. However we agree that this might be confusing and will considering barcodes to be related if they have an IBD between 0.5 and 0.9, while excluding those with an IBD >= 0.9. The text has been updated (Lines 299-301).

      (3) Section: "Independence of seasonality and drug resistance markers prevalence".

      The text has been revised and the supplementary figure is now a main figure.

      (4) For readers unaware of malaria control policy in the Gambia it would be helpful to have more details on the specifics of anti-malarial drug  administration.

      We added the drugs used in SMC (sulfadoxine-pyrimethamine and amodiaquine) and the first line antimalarial treatment in use in The Gambia during our study (Coartem) (Lines 383-388).

      Reviewer #3 (Recommendations for the authors):

      (1) The abstract is not as clear as the authors' summary. For example, I found the sentence starting with "with 425 P. falciparum..." hard to  follow.

      The abstract has been updated.

      (2) It is better to consistently use "barcode genotyping "or "genotyping by barcode". Sometimes "molecular genotyping" is used instead of  "barcode genotyping"

      We have now replaced all occurrences of “barcode genotyping” with “molecular genotyping” or “molecular barcode genotyping”. We prefer to stick with “molecular genotyping” as this let us distinguish between the molecular and the genomic barcode.

      (3) The introduction is quite disjoined and does not provide a clear  build-up to the gap in knowledge that the study is attempting to fill.  please revise.

      Introduction is now thoroughly revised.

      (4) Line 31 "with notable increase of parasite differentiation" is an interpretation and not an observation.

      We have modified that sentence (Lines 31-33).

      (5) Overall, the introduction requires substantial revision.

      Introduction is now thoroughly revised.

      (6) Line 70 "parasite population adapts..." I thought this required phenotypic analysis and not genetics?

      The idea is that population of parasites may adapt to environmental conditions (such as seasonality) by selecting the most fitted genotypes. For instance, antimalarial exposure has an effect of selecting parasites with specific mutations in drug resistance related genes, and this even appears to be transient (for example with chloroquine). As such, there is good reason to think that seasonality might have a similar effect on parasite genetics.

      (7) Line 129-130: the #442 is not reflected in the schematic Figure 1.

      This is an intentional choice to make the figure more synthetic. For this reason, we included the Figure S1, which provides more details on the data collection and analysis pipeline.

      (8) Line 242-243: "Made with natural earth". What is this?

      This is a statement acknowledging the use of Natural Earth data to produce the map presented in Figure 1A.

      (9) Line 260: "collins22", is this a reference?

      Fixed.

      (10) Line 269-70. Very hard to follow. Please revise.

      We changed the text (Lines 293-297).

      (11) Line 324: similarly... I think there is a typo here.

      We did not find any typo in this specific sentence. However, “Similarly to Figure 3” sounds maybe a bit off, so we changed it to “As in Figure 3” (Line 351).

      (12) Line 332-334: very hard to follow. please revise. Again, the lower  parasite relatedness during the transition from low to high was linked  to recombination occurring in the mosquito but what about infection  burden shifting to naive young children? Is there a role for host  immunity in the observed reduction in parasite-relatedness during the  transition period?

      This text has been rewritten (Lines 356-361).

      About the hypothesis of infection burden shifting to naïve young children, this question is difficult to address in The Gambia because children under 5 years old received Seasonal Malaria Chemoprophylaxis during the high transmission season. In older children (6-15 years old), the prevalence was similar to adults (Fogang et al., 2024).

      About the role of host immunity on parasite relatedness across time and space, our dataset is too small to divide it in different age groups. Further studies should address this very interesting question.

    1. eLife Assessment

      The authors show MRI relaxation time changes that are claimed to originate from cell membrane potential changes. This would be a substantial contribution if true because it may provide a mechanism whereby membrane potential changes could be inferred noninvasively. However, the membrane potential manipulations applied here are performed on a slow time scale and are known to induce cell swelling. Cell swelling has been previously shown to affect relaxation time. Experiments could be performed to rule out this hypothesis, but the authors have chosen not to perform these experiments. The study is therefore useful, but the evidence is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation time (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K+. The motivation has been revised to state that the results suggest a potential approach to non-invasively detect changes in membrane potential using MRI.

      Strengths:

      The authors argue that the use of various concentrations of KCL in the extracellular fluid depolarize or hyperpolarize the cell pellets used, and that this change in membrane potential is the driving force for the T2 (and T1-supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCL concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCL, the NaCL molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch clamp recordings, which is a gold standard. With 80 mM of KCL in the ECF, a change in T2 of the cell pellets of ~10 ms is observed with the intracellular potential recorded as about -6 mv. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of hydrogen protons on macromolecules to free water) decreases by about 10% at this 80 mM KCL concentration. Similar results are seen in a Jurkat cell line and similar, but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCL solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.

      Weaknesses:

      While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data isn't consistent with the conclusions, but there are other controls not included that would be necessary to draw the conclusion that it is membrane potential that is driving these T1 and T2 changes. The results are consistent with Stroman et al. Magn. Reson. in Med. 59:700-706 (increased T2 with KCL) as well as some other cited work. However all those authors emphasize that cell swelling is the mechanism, not cell membrane potentials.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCL depolarization and the light transmittance increases. The time course of these changes is quite slow, of the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce the exact same timecourse as the KCL depolarization changes (and vice versa for the hyperosmotic changes - which cause cell shrinkage). Their conclusion therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for T2 changes, as they have not measured that. It is however well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions for the two compartment relaxation model for this are well established (see Menon and Allen, Magn. Reson. in Med. 20:214-227 (1991). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work is completely consistent with the Stroman work and completely consistent with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the degree cell swelling or shrinkage. Measuring intracellular potential is not enough to clarify the mechanism.

      So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. And in fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seem in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling. The First line of the discussion still claims that T2 relaxation time and pool size ratio (PSR) can detect responses to membrane potential changes modulated by ionic solutions. However, in the absence of cell swelling controls, this cannot be stated.

      For this mechanism to be relevant to measuring neuronal activity directly or explaining techniques such DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potentials.

      Comments on revisions:

      The manuscript is well written and my previous methodological concerns have been clarified as well. There are no flaws in the experiments, but the interpretation really depends on simultaneous measurements of cell volume and membrane potential, which have yet to be done.

    3. Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate a mechanism whereby magnetic resonance imaging (MRI) can reflect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions that are known to depolarize or hyperpolarize excitable cells. The authors specifically examine two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K+ and Ba2+ concentrations lead to T2 increases and PSR decreases that vary approximately linearly with parallel measurements of voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke T2 increases in rat brains, and that these changes are reversed when potassium is renormalized. Min et al. argue that their results suggest a basis for noninvasive functional imaging of cellular voltage signals. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven insufficient for neuroscientific or clinical applications. The current paper suggests that one of the simplest and most widely used MRI contrast mechanisms-T2 weighted imaging-may indicate correlates of membrane potential if measured in the absence of the hemodynamic signals that most functional MRI (fMRI) experiments rely on. The authors make their case using quantitative tests that include some controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      The major weakness of the paper is that it uses only slow correlational experiments to probe the relationship between MRI contrast and membrane potential. The authors do not examine effects on the subsecond time scale that is of greatest interest, and they do not adequately consider how biophysical factors with only loose relationship to electrophysiological variables could explain their imaging results. Notably, depolarizing ionic solutions that perturb membrane potential can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. In their revised manuscript, the authors acknowledge that cell swelling might contribute to the MRI signals they report, but they do nothing to probe the contributions or characteristics of such effects. If cell swelling accounted for the author's MRI results, it would likely operate on a time scale far too slow to yield useful indications of membrane potential. Given these considerations and the absence of data demonstrating correspondence of electrophysiological measures with MRI readouts on a fast time scale, the paper fails to provide evidence that membrane potential changes can be meaningfully detected by MRI.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation time (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K<sup>+</sup>. The motivation is to explain T2 changes that have previously been observed by the authors in an in vivo model with neural stimulation (DIANA) and to try to provide a mechanism to explain those changes.

      Strengths:

      The authors argue that the use of various concentrations of KCL in the extracellular fluid depolarize or hyperpolarize the cell pellets used and that this change in membrane potential is the driving force for the T2 (and T1-supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCL concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCL, the NaCL molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch clamp recordings, which is a gold standard. With 80 mM of KCL in the ECF, a change in T2 of the cell pellets of ~10 ms is observed with the intracellular potential recorded as about -6 mv. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of hydrogen protons on macromolecules to free water) decreases by about 10% at this 80 mM KCL concentration. Similar results are seen in a Jurkat cell line and similar, but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCL solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.

      Weaknesses:

      [Reviewer 1, Comment 1] While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data isn't consistent with the conclusions, but there are other controls not included that would be necessary to draw the conclusion that it is membrane potential that is driving these T1 and T2 changes. Unfortunately for these authors, similar experiments conducted in 2008 (Stroman et al. Magn. Reson. in Med. 59:700-706) found similar results (increased T2 with KCL) but with a different mechanism, that they provide definite proof for. This study was not referenced in the current work.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCL depolarization and the light transmittance increases. The time course of these changes is quite slow, of the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce the exact same time course as the KCL depolarization changes (and vice versa for the hyperosmotic changes - which cause cell shrinkage). Their conclusion, therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for T2 changes, as they have not measured that. It is however well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions for the two compartment relaxation model for this are well established (see Menon and Allen, Magn. Reson. in Med. 20:214-227 (1991). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work is completely consistent with the Stroman work and completely consistent with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the presence or absence of cell swelling. Measuring intracellular potential is not enough to clarify the mechanism.

      [Reviewer 1, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed changes in T2, PSR, and T1, especially in pelletized cells. For this reason, we already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes, though this study did not present the magnitude of the cell volume changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we additionally discussed the work of Stroman et al. in the revised manuscript.

      In addition, we acknowledge that the title and main conclusion of the original manuscript may be misleading, as we did not separately consider the effect of cell volume changes on MR parameters. To more accurately reflect the scope and results of this study and also take into account the reviewer 2’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Finally, when [K<sup>+</sup>]-induced membrane potential changes are involved, there seems to be factors other than cell volume changes that appear to influence T<sup>2</sup> changes. Our follow-up study shows that there are differences in volume changes for the same T<sup>2</sup> change in the following two different situations: pure osmotic volume changes versus [K<sup>+</sup>]-induced volume changes. For example, for the same T<sup>2</sup> change, the volume change for depolarization is greater than the volume change for hypoosmotic conditions. We will present these results in this coming ISMRM 2025 and are also preparing a manuscript to report shortly.

      [Reviewer 1, Comment 2] So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. In fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seen in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling.

      For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potential.

      [Reviewer 1, Response 2] In the context of cell swelling occurring at rapid response times, if we define cell swelling simply as an “increase in cell volume,” there are several studies reporting transient structural (or volumetric) changes (e.g., ~nm diameter change over ~ms duration) in neuron cells during action potential propagation (Akkin et al., Biophys J 93:1347-1353, 2007; Kim et al., Biophys J 92:3122-3129, 2007; Lee et al., IEEE Trans Biomed Eng 58:3000-3003, 2011; Wnek et al., J Polym Sci Part B: Polym Phys 54:7-14, 2015; Yang et al., ACS Nano 12:4186-4193, 2018). These studies show a good correlation between membrane potential changes and cell volume changes (even if very small) at the cellular level within milliseconds.

      As mentioned in the Response 1 above, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly mentioned as one of the limitations in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (e.g., T<sup>2</sup> and PSR) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in the Response 1 above.

      There are a few smaller issues that should be addressed.

      [Reviewer 1, Comment 3] (1) Why were complicated imaging sequences used to measure T1 and T2? On a Bruker system it should be possible to do very simple acquisitions with hard pulses (which will not need dictionaries and such to get quantitative numbers). Of course, this can only be done sample by sample and would take longer, but it avoids a lot of complication to correct the RF pulses used for imaging, which leads me to the 2nd point.

      [Reviewer 1, Response 3] We appreciate the reviewer’s suggestion regarding imaging sequences. In fact, we used dictionaries for fitting in vivo T<sup>2</sup> decay data, not in vitro data. Sample-by-sample nonlocalized acquisition with hard pulses may be applicable for in vitro measurements. However, for in vivo measurements, a slice-selective multi-echo spin-echo sequence was necessary to acquire T<sup>2</sup> maps within a reasonable scan time. Our choice of imaging sequence was guided by the need to spatially resolve MR signals from specific regions of interest while balancing scan time constraints.

      [Reviewer 1, Comment 4] (2) Figure S1 (H) is unlike any exponential T2 decay I have seen in almost 40 years of making T2 measurements. The strange plateau at the beginning and the bump around TE = 25 ms are odd. These could just be noise, but the fitted curve exactly reproduces these features. A monoexponential T2 decay cannot, by definition, produce a fit shaped like this.

      [Reviewer 1, Response 4] The T<sup>2</sup> decay curves in Figure S1(H) indeed display features that deviate from a simple monoexponential decay. In our in vivo experiments, we used a multi-echo spin-echo sequence with slice-selective excitation and refocusing pulses. In such sequences, the echo train is influenced by stimulated echoes and imperfect slice profiles. This phenomenon is inherent to the pulse sequence rather than being artifacts or fitting errors (Hennig, Concepts Magn Reson 3:125-143, 1991; Lebel and Wilman, Magn Reson Med 64:1005-1014, 2010; McPhee and Wilman, Magn Reson Med 77:2057-2065, 2017). Therefore, we fitted the T<sub>2</sub> decay curve using the technique developed by McPhee and Wilman (2017).

      [Reviewer 1, Comment 5] (3) As noted earlier, layered samples produce biexponential T2 decays and monoexponential T1 decays. I don't quite see how this was accounted for in the fitting of the data from the pellet preparations. I realize that these are spatially resolved measurements, but the imaging slice shown seems to be at the boundary of the pellet and the extracellular media and there definitely should be a biexponential water proton decay curve. Only 5 echo times were used, so this is part of the problem, but it does mean that the T2 reported is a population fraction weighted average of the T2 in the two compartments.

      [Reviewer 1, Response 5] We understand the reviewer’s concern regarding potential biexponential decay due to the presence of different compartments. In our experiments, we carefully positioned the imaging slice sufficiently remote from the pellet-media interface. This approach ensures that the signal predominantly arises from the cells (and interstitial fluid), excluding the influence of extracellular media above the cell pellet. We described the imaging slice more clearly in the revised manuscript. As mentioned in our Methods section, for in vitro experiments, we repeated a single-echo spin-echo sequence with 50 difference echo times. While Figure 1C illustrates data from five echo times for visual clarity, the full dataset with all 50 echo times was used for fitting. We clarified this point in the revised manuscript to avoid any misunderstanding.

      [Reviewer 1, Comment 6] (4) Delta T1 and T2 values are presented for the pellets in wells, but no absolute values are presented for either the pellets or the KCL solutions that I could find.

      [Reviewer 1, Response 6] As requested by the reviewer, we included the absolute values in the supplementary information.

      Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate that magnetic resonance imaging (MRI) can detect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions. The authors specifically study two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K<sup>+</sup> and Ba2+ concentrations lead to T2 increases and PSR decreases that vary approximately linearly with voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke reversible T2 increases in rat brains and that these changes are reversed when potassium is renormalized. Min et al. argue that this implies that membrane potential changes cause the MRI effects, providing a potential basis for detecting cellular voltages by noninvasive imaging. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven too weak or insufficiently reproducible for neuroscientific or clinical applications. The current paper suggests remarkably that one of the simplest and most widely used MRI contrast mechanisms-T2 weighted imaging-may indicate membrane potentials if measured in the absence of the hemodynamic signals that most functional MRI (fMRI) experiments rely on. The authors make their case using a diverse set of quantitative tests that include controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      [Reviewer 2, Comment 1] The major weakness of the paper is that it uses correlational data to conclude that there is a causational relationship between membrane potential and MRI contrast. Alternative explanations that could explain the authors' findings are not adequately considered. Most notably, depolarizing ionic solutions can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. Such effects could in principle account for Min et al.'s results, and indeed it is difficult to see how they would not contribute, but they occur on a time scale far too slow to yield useful indications of membrane potential. The authors' observation that PSR correlates negatively with T2 in their experiments is also consistent with this explanation, given the inverse relationship usually observed (and mechanistically expected) between these two parameters. If the authors could show a tight correspondence between millisecond-scale membrane potential changes and MRI contrast, their argument for a causal connection or a useful correlational relationship between membrane potential and image contrast would be much stronger. As it is, however, the article does not succeed in demonstrating that membrane potential changes can be detected by MRI.

      [Reviewer 2, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed MR parameter changes. For this reason, we have already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008) and Phi Van et al. (Sci Adv 10:eadl2034, 2024). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we additionally discussed both work of Stroman et al. and Phi Van et al. in the revised manuscript.

      In addition, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations of this study in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (although on a slow time scale) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in the Response 1 to Reviewer 1’s Comment 1 above.

      Together, we acknowledge that the title and main conclusion of the original manuscript may be misleading. To more accurately reflect the scope and results of this study and also consider the reviewer’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      [Reviewer 1, Comment 7] The manuscript is well written. One thing to emphasize early on is that the KCL depolarization is done in an equimolar (or isotonic) manner. I was not clear on this point until I got to the very end of the methods. This is a strength of the paper and should be presented earlier.

      [Reviewer 1, Response 7] In response to the reviewer’s suggestion, we have revised the manuscript to present the equimolar characteristic of our experiment earlier.

      [Reviewer 1, Comment 8] In terms of experiments, the relaxation time measurements are not well constructed. They should be done with a CPMG sequence with hundreds of echos and properly curve fit. This is entirely possible on a Bruker spectrometer.

      [Reviewer 1, Response 8] As noted in our Response to Reviewer 1’s Comment 3, while a CPMG sequence with numerous echoes and straightforward curve fitting can be effective, it is less feasible for in vivo experiments. Our multi-echo spin-echo sequence was a balanced approach between spatial resolution, reasonable scan duration, and the need to localize signals within specific regions of interest.

      [Reviewer 1, Comment 9] Measurements of cell swelling should be done to determine the time course of the cell swelling. This could be with NMR (CPMG) or with light scattering. For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity.

      [Reviewer 1, Response 9] We acknowledge the importance of further research to further strengthened the claims of this study through additional experiments such as cell volume recording. We will do it in future studies.

      As noted in our Response 2 to Reviewer 1’s Comment 2, this study does not address rapid membrane potential changes on the millisecond scale, and we acknowledge that establishing the precise timing of cell swelling is crucial for fully understanding the mechanisms of DIANA. Our current work demonstrates that MR parameters (e.g., T<sup>2</sup> and PSR) correlate strongly with membrane potential-modulating ionic environments, but it does not extend to millisecond-scale neural activation. We recognize the importance of further experiments, such as direct cell volume measurements and plan to incorporate it in future studies to build on the insights gained from the present work.

      Reviewer #2 (Recommendations for the authors):

      Here are a few comments, questions, and suggestions for improvement:

      [Reviewer 2, Comment 2] I could not find much information about the various incubation times and delays used for the authors' in vitro experiments. For each of the in vitro experiments in particular, how long were cells exposed to the stated ionic condition prior to imaging, and how long did the imaging take? Could this and any other relevant information about the experimental timing please be provided and added to the methods section?

      [Reviewer 2, Response 2] We have included the information about the preparation/incubation times in the revised manuscript. For the scan time, it was already stated in the original manuscript: 23 minutes for the single-echo spin-echo sequence and 23 minutes for the inversion-recovery multi-echo spin-echo, for a total of 46 minutes.

      [Reviewer 2, Comment 3] In what format were the cells used for patch clamping, and were any controls done to ensure that characteristics of these cells were the same as those pelleted and imaged in the MRI studies? How long were the incubation times with ionic solutions in the patch clamp experiment? This information should likewise be added to the paper.

      [Reviewer 2, Response 3] We have clarified in the revised manuscript that SH-SY5Y cells were patch clamp-measured in their adherent state. On the other hand, the cells were dissociated from the culture plate and pelleted, so the experimental environments were not entirely identical. The patch clamp experiments involved a 20–30 minutes incubation period with the ionic solutions. We have included this information in the revised manuscript.

      [Reviewer 2, Comment 4] Can the authors provide information about the mean cell size observed under each condition in their in vitro experiments?

      [Reviewer 2, Response 4] We did not directly quantify the mean cell size for each in vitro condition in this study, so we do not have corresponding data. However, we acknowledge that this information could provide valuable insights into potential mechanisms underlying the observed MR parameter changes. In future experiments, we plan to include direct cell-size measurements to further elucidate how changes in cell volume or hydration contribute to our MR findings.

      [Reviewer 2, Comment 5] The ionic challenges used both in vitro and in vivo could also have affected cell permeability, with corresponding effects that would be detectable in diffusion weighted imaging. Did the authors examine this or obtain any results that could reflect on contributions of permeability properties to the contrast effects they report?

      [Reviewer 2, Response 5] We did not perform diffusion-weighted imaging and therefore do not have direct data regarding changes in cell permeability. We agree that incorporating diffusion-weighted measurements could help distinguish whether the MR parameters changes are driven primarily by membrane potential shifts, cell volume changes, or variations in permeability properties. We will consider these approaches in our future studies.

      [Reviewer 2, Comment 6] Clearly, a faster stimulation method such as optogenetics, in combination with time-locked MRI readouts of the pelleted cells, would be more effective at demonstrating a useful relationship between cellular neurophysiology and MRI contrast in vitro. Can the authors present data from such an experiment? Is there any information they can present that documents the time course of observed responses in their experiments?

      [Reviewer 2, Response 6] In the current study, our methodology did not include time-resolved or dynamic measurements. While it may be possible to obtain indirect information about the temporal dynamics using T<sup>2</sup>-weighted or MT-weighted imaging, such an experiment was beyond the scope of this work. However, we agree that an optogenetic approach with time-locked MRI acquisitions could help directly link cell physiology to MRI contrast, and we will explore this in future studies.

      [Reviewer 2, Comment 7] The authors used a drug cocktail to suppress hemodynamic effects in the experiments of Figs. 5-6. What evidence is there that this cocktail successfully suppresses hemodynamic responses and that it also preserves physiological responses to the ionic challenges used in their experiments? Were analogous in vivo results also obtained in the absence of the cocktail?

      [Reviewer 2, Response 7] We appreciate the reviewer’s concern regarding pharmacological suppression of hemodynamic effects. Although each component is known to inhibit nitric oxide synthesis, we did not directly measure the degree of hemodynamic suppression in this study. In addition, we cannot definitively confirm that these agents preserved the physiological responses to the ionic challenges. We have clarified these points in the revised manuscript and identified them as limitations of the study.

      [Reviewer 2, Comment 8] Why weren't PSR results reported as part of the in vivo experimental results in Fig. 5? Does PSR continue to vary inversely to T2 in these experiments?

      [Reviewer 2, Response 8] In our current experimental setup, acquiring the T<sup>2</sup> map four times required 48 minutes, and extending the scan to include additional quantitative MT measurements for PSR would have significantly prolonged the scanning session. Given that these experiments were conducted on acutely craniotomized rats, maintaining stable physiological conditions for such a long period of time was challenging. Therefore, due to time constraints, we did not perform MT measurements and focused on T<sub>2</sub> mapping.

      [Reviewer 2, Comment 9] The authors have established in vivo optogenetic stimulation paradigms in their laboratory and used them in the Toi et al. DIANA study. Were T2 or PSR changes observed in vivo using standard T2 measurement or T2-weighted imaging methods that do not rely on the DIANA pulse sequence they originally applied?

      [Reviewer 2, Response 9] Our current T<sub>2</sub> mapping experiments utilized a standard multi-echo spin-echo sequence, rather than the DIANA pulse sequence employed in our previous work. In this respect, the T<sub>2</sub> changes we observed in vivo do not rely on the specialized DIANA methodology.

      [Reviewer 2, Comment 10] In the discussion section, the authors state that to their knowledge, theirs "is the first report that changes in membrane potential can be detected through MRI." This cannot be true, as their own Toi et al. Science paper previously claimed this, and a number of the studies cited on p.2 also claimed to detect close correlates of neuroelectric activity. This statement should be amended or revised.

      [Reviewer 2, Response 10] We appreciate the reviewer’s comment. We have revised the discussion section of the manuscript to reflect the points raised by the reviewer.

      [Reviewer 2, Comment 11] Because the current study does not actually demonstrate that changes in membrane potential can be detected by MRI, the authors should alter the title, abstract, and a number of relevant statements throughout the text to avoid implying that this has been shown. The title, for instance, could be changed to "Responses to depolarizing and hyperpolarizing ionic solutions measured by magnetic resonance imaging of excitable cells and rat brains," or something along these lines.

      [Reviewer 2, Response 11] We appreciate the reviewer’s suggestions. We have revised the title, abstract, and relevant statements of the manuscript to clarify that our findings show MR-detectable responses to ionic solutions that are expected to modulate membrane potential, rather than demonstrating direct detection of membrane potential changes by MRI.

      [Reviewer 2, Comment 12] The axes in Fig. 3 seem to be mislabeled. I think the horizontal axes are supposed to be membrane potential measured in mV.

      [Reviewer 2, Response 12] Thank the reviewer for finding an error. We have corrected the axis labels in Figure 3 to indicate membrane potential (in mV) on the horizontal axis.

      [Reviewer 2, Comment 13] Since neither the experiments in Jurkat cells (Fig. 4) nor the in vivo MRI tests (Fig. 5-6) appear to have made in conjunction with membrane potential measurements, it seems like a stretch to refer to these experiments as involving manipulation of membrane potentials per se. Instead, the authors should refer to them as involving administration of stimuli expected to be depolarizing or hyperpolarizing. The "hyperpolarization" and "depolarization" labels of Fig. 4 similarly imply a result that has not actually been shown, and should ideally be changed.

      [Reviewer 2, Response 13] To prevent any misleading that membrane potential changes were directly measured in Jurkat cells or in vivo, we have revised the relevant text and figure labels.

      [Reviewer 2, Comment 14] The changes in T2 and PSR documented with various K<sup>+</sup> challenges to Jurkat cells in Fig. 4 seem to follow a step-function-like profile that differs from the results reported in SH-SY5Y cells. Can the authors explain what might have caused this difference?

      [Reviewer 2, Response 14] We currently do not have a definitive explanation for why Jurkat cells exhibit a step-function-like response to varying K⁺ levels, whereas SH-SY5Y cells show a linear response to log [K<sup>+</sup>]. Experiments that include direct membrane potential measurements in Jurkat cells would help clarify whether this difference arises from genuinely different patterns of depolarization/hyperpolarization or from other factors. We have revised the revised manuscript to address this point.

    1. eLife Assessment

      A regression discontinuity analysis finds essentially no effect of 1 additional year of secondary education on brain structure in adulthood. This is a valuable finding that adds to the literature on the impact of education on brain health. While the finding is convincing on its own, as the analysis was pre-registered and very carefully conducted, the impact is limited as the manipulated variable only relates to a single additional year of education (remaining in education to 15 vs 16 years of age).

    2. Reviewer #2 (Public review):

      Summary:

      The authors conduct a causal analysis of years of secondary education on brain structure in late life. They use a regression discontinuity anlaysis to measure the impact of a UK law change in 1972 that increased the years of mandatory education by 1 year. Using brain imaging data from the UK Biobank, they find essentially no evidence for 1 additional year of education altering brain structure in adulthood.

      Strengths:

      The authors pre-registered the study and the regression discontinuity was very carefully described and conducted. They completed a large number of diagnostic and alternate analyses to allow for different possible features in the data. (Unlike a positive finding, a negative finding is only bolstered by additional alternative anlayses).

      Weaknesses:

      While the work is of high quality for the precise question asked, ultimately the exposure (1 additional year of education) is a very modest manipulation and the outcome measured long after the intervention. Thus a null finding here is completely consistent educational attainement (EA) in fact having an impact on brain structure, where EA may reflect elements of training after second education (e.g. university, post-graduate qualifications, etc) and not just stopping education at 16 yrs yes/no.

    3. Reviewer #3 (Public review):

      Summary:

      This study investigates evidence for a hypothesised, causal relationship between education, specifically the number of years spent in school, and brain structure as measured by common brain phenotypes such as surface area, cortical thickness, total volume and diffusivity.

      To test their hypothesis, the authors rely on a "natural" intervention, that is, the 1972 ROSLA act that mandated an extra year of education for all 15-year olds. The study's aim is to determine potential discontinuities in the outcomes of interest at the time of the policy change, which would indicate a causal dependence. Naturalistic experiments of this kind are akin to randomised controlled trials, the gold standard for answering questions of causality.

      Using two complementary, regression-based approaches, the authors find no discernible effect of spending an extra year in primary education on brain structure. The authors further demonstrate that observational studies showing an effect between education and brain structure may be confounded and thus unreliable when assessing causal relationships.

      Strengths:

      - A clear strength of this study is the large sample size totalling up to 30k participants from the UK Biobank. Although sample sizes for individual analyses are an order of magnitude smaller, most neuroimaging studies usually have to rely on much smaller samples.<br /> - This study has been preregistered in advance, detailing the authors' scientific question, planned method of inquiry and intended analyses, with only minor, justifiable changes in the final analysis.<br /> - The analyses look at both global and local brain measures used as outcomes, thereby assessing a diverse range of brain phenotypes that could be implicated in a causal relationship with a person's level of education.<br /> - The authors use multiple methodological approaches, including validation and sensitivity analyses, to investigate the robustness of their findings and, in the case of correlational analysis, highlight differences with related work by others.<br /> - The extensive discussion of findings and how they relate to the existing, somewhat contradictory literature gives a comprehensive overview of the current state of research in this area.

      Weaknesses:

      - This study investigates a well-posed but necessarily narrow question in a specific setting: 15-year old British students born around 1957 who also participate in the UKB imaging study roughly 60 years later. Thus conclusions about the existence or absence of any general effect of the number of years of education on the brain's structure are limited to this specific scenario.<br /> - The modelling approach used in this study requires that all covariates of no interest are equal before and after the cut-off, something that is impossible to test. However, other studies have not found specific issues that would invalidate ROSLA as a natural experiment.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      This fascinating manuscript studies the effect of education on brain structure through a natural experiment. Leveraging the UK BioBank, these authors study the causal effect of education using causal inference methodology that focuses on legislation for an additional mandatory year of education in a regression discontinuity design. 

      Strengths: 

      The methodological novelty and study design were viewed as strong, as was the import of the question under study. The evidence presented is solid. The work will be of broad interest to neuroscientists 

      Weaknesses: 

      There were several areas which might be strengthed from additional consideration from a methodological perspective. 

      We sincerely thank the reviewer for the useful input, in particular, their recommendation to clarify RD and for catching some minor errors in the methods (such as taking the log of the Bayes factors). 

      Reviewer #1 (Recommendations for the authors): 

      (1) The fuzzy local-linear regression discontinuity analysis would benefit from further description. 

      (2) In the description of the model, the terms "smoothness" and "continuity" appear to be used interchangeably. This should be adjusted to conform to mathematical definitions. 

      We have now added to our explanations of continuity regression discontinuity. In particular, we now explain “fuzzy”, and add emphasis on the two separate empirical approaches (continuity and local-randomization), along with fixing our use of “smoothness” and “continuity”.

      results:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      (3) The optimization of the smoother based on MSE would benefit from more explanation and consideration. How was the flexibility of the model taken into account in testing? Were there any concerns about post-selection inference? A sensitivity analysis across bandwidths is also necessary. Based on the model fit in Figure 1, results from a linear model should also be compared. 

      It is common in the RD literature to illustrate plots with higher-order polynomial fits while inference is based on linear (or at most quadratic) models (Cattaneo, Idrobo & Titiunik, 2019). We agree that this field-specific practice can be confusing to readers. Therefore, we have redone Figure 1 using local-linear fits better aligning with our analysis pipeline. Yet, it is still not a one-to-one alignment as point estimation and confidence are handled robustly while our plotting tools are simple linear fits. In addition, we updated Sup. Fig 3 and moved 3rd-order polynomial RD plots to Sup. Fig 4.

      Empirical RD has many branching analytical decisions (bandwidth, polynomial order, kernel) which can have large effects on the outcome. Fortunately, RD methodology is starting to become more standardized (Catteneo & Titiunik, 2022, Ann. Econ Rev) as there have been indications of publication bias using these methods (Stommes, Aronow & Sävje, 2023, Research and Politics (This paper suggest it is not researcher degrees of freedom, rather inappropriate inferential methods)). While not necessarily ill-intended, researcher degrees of freedom and analytic flexibility are major contributors to publication bias. We (self) limited our analytic flexibility by using pre-registration (https://osf.io/rv38z).

      One of the most consequential analytic decisions in RD is the bandwidth size as there is no established practice, they are context-specific and can be highly influential on the results. The choice of bandwidths can be framed as a ‘bias vs. variance trade-off’. As bandwidths increase, variance decreases since more subjects are added yet bias (misspecification error/smoothing bias) also increases (as these subjects are further away and less similar). In our case, our assignment (running/forcing) variable is ‘date of birth in months’; therefore our smallest comparison would be individuals born in August 1957 (unaffected/no treatment) vs September 1957 (affected/treated). This comparison has the least bias (subjects are the most similar to each other), yet it comes at the expense of very few subjects (high variance in our estimate). 

      MSE-derived bandwidths attempt to solve this issue by offering an automatic method to choose an analysis bandwidth in RD. Specifically, this aims to minimize the MSE of the local polynomial RD point estimator – effectively choosing a bandwidth by balancing the ‘bias vs. variance trade-off’ (explained in detail 4.4.2 Cattaneo et al., 2019 p 45 - 51 “A practical introduction to regression discontinuity designs: foundations”). Yet, you are very correct in highlighting potential overfitting issues as they are “by construction invalid for inference” (Calonico, Cattaneo & Farrell, 2020, p. 192). Quoting from Cattaneo and Titiunik’s Annual Review of Economics from 2022: 

      “Ignoring the misspecification bias can lead to substantial overrejection of the null hypothesis of no treatment effect. For example, back-of-the-envelop calculations show that a nominal 95% confidence interval would have an empirical coverage of about 80%.”

      Fortunately, modern RD analysis packages (such as rdrohust or RDHonest) calculate robust confidence intervals - for more details see Armstrong and Kolesar (2020). For a summary on MSE-bandwidths see the section “Why is it hard to estimate RD effects?” in Stommes and colleagues 2023 (https://arxiv.org/abs/2109.14526). For more in-depth handling see the Catteneo, Idrobo, and Titiunik primer (https://arxiv.org/abs/1911.09511).

      Lastly, with MSE-derived bandwidths sensitivity tests only make sense within a narrow window of the MSE-optimized bandwidth (5.5 Cattaneo et al., 2019 p 106 - 107). When a significant effect occurs, placebo cutoffs (artificially moving the cutoff) and donut-hole analysis are great sensitivity tests. Instead of testing our bandwidths, we decided to use an alternate RD framework (local randomization) in which we compare 1-month and 5-month windows. Across all analysis strategies, MRI modalities, and brain regions, we do not find any effects of the education policy change ROSLA on long-term neural outcomes.

      (4) In the Bayesian analysis, the authors deviated from their preregistered analytic plan. This whole section is a bit confusing in its current form - for example, point masses are not wide but rather narrow. Bayes factors are usually estimated; it is unclear how or why a prior was specified. What exactly is being modeled using a prior? Also, throughout - If the log was taken, as the methods seem to indicate for the Bayes factor, this should be mentioned in figures and reported estimates. 

      First, we would like to thank you for spotting that we incorrectly kept the log in the methods. We have fixed this and added the following sentence to the methods: 

      “Bayes factors are reported as BF<sub>10</sub> in support of the alternative hypothesis, we report Bayes factors under 1 as the multiplicative inverse (BF<sub>01</sub> = 1/BF)”

      All Bayesian analyses need to have a prior. In practice, this becomes an issue when you’re uncertain about 1) the location of the effect (directionality & center mass, defined by a location parameter), yet more importantly, the 2) confidence/certainty of the range-spread of possible effects (determined by a scale parameter). In normally distributed priors these two ‘beliefs’ are represented with a mean and a standard deviation (the latter impacts your confidence/certainty on the range of plausible parameter space). 

      Supplementary figure 6 illustrates several distributions (location = 0 for all) with varying scale parameters; when used as Bayesian priors this indicates differing levels of confidence in our certainty of the plausible parameter space. We illustrate our three reported, normally distributed priors centered at zero in blue with their differing scale parameters (sd = .5, 1 & 1.5).

      All of these five prior distributions have the same location parameter (i.e., 0) yet varying differences in the scale parameter – our confidence in the certainty of the plausible parameter space. At first glance it might seem like a flat/uniform prior (not represented) is a good idea – yet, this would put equal weight on the possibility of every estimate thereby giving the same probability mass to implausible values as plausible ones. A uniform prior would, for instance, encode the hypothesis that education causing a 1% increase in brain volume is just as plausible as it causing either a doubling or halving in brain volume. In human research, we roughly know a range of reasonable effect sizes and it is rare to see massive effects.

      A benefit of ‘weakly-informative’ priors is that they limit the range of plausible parameter values. The default prior in STAN (a popular Bayesian estimation program; https://mc-stan.org) is a normally distributed prior with a mean of zero and an SD of 2.5 (seen in orange in the figure; our initial preregistered prior). This large standard deviation easily permits positive and negative estimates putting minimal emphasis on zero. Contrast this to BayesFactor package’s (Morey R, Rouder J, 2023) default “wide” prior which is the Cauchy distribution (0, .7) illustrated in magenta (for more on the Cauchy see: https://distribution-explorer.github.io/continuous/cauchy.html). 

      These different defaults reflect differing Bayesian philosophical schools (‘estimate parameters’ vs ‘quantify evidence’ camps); if your goal is to accurately estimate a parameter it would be odd to have a strong null prior, yet (in our opinion) when estimating point-null BF’s a wide default prior gives far too much evidence in support of the null. In point-null BF testing the Savage-Dickey density ratio is the ratio between the height of the prior at 0 and the height of the posterior at zero (see Figure under section “testing against point null 0”). This means BFs can be very prior sensitive (seen in SI tables 5 & 6). For this reason, we thought it made sense to do prior sensitivity testing, to ensure our conclusions in favor of the null were not caused solely by an overly wide prior (preregistered orange distribution) we decided to report the 3 narrower priors (blue ones).

      Alternative Bayesian null hypotheses testing methods such as using Bayes Factors to test against a null region and ‘region of practical equivalence testing’ are less prior sensitive, yet both methods demand the researcher (e.g. ‘us’) to decide on a minimal effect size of practical interest. Once a minimal effect size of interest is determined any effect within this boundary is taken as evidence in support of the null hypothesis.

      (5) It is unclear why a different method was employed for the August / September data analysis compared to the full-time series. 

      We used a local-randomization RD framework, an entirely different empirical framework than continuity methods (resulting in a different estimate). For an overview see the primer by Cattaneo, Idrobo & Titiunik 2023 (“A Practical Introduction to Regression Discontinuity Designs: Extensions”; https://arxiv.org/abs/2301.08958).

      A local randomization framework is optimal when the running variable is discrete (as in our case with DOB in months) (Cattaneo, Idrobo & Titiunik 2023). It makes stronger assumptions on exchangeability therefore a very narrow window around the cutoff needs to be used. See Figure 2.1 and 2.2 (in the Cattaneo, Idrobo & Titiunik 2023) for graphical illustrations of 1) a randomized experiment, 2) a continuity RD design, and 3) local-randomization RD. Using the full-time series in a local randomization analysis is not recommended as there is no control for differences between individuals as we move further away from the cutoff – making the estimated parameter highly endogenous.

      We understand how it is confusing to have both a new framework and Bayesian methods (we could have chosen a fully frequentist approach) but using a different framework allows us to weigh up the aforementioned ‘bias vs variance tradeoff’ while Bayesian methods allow us to say something about the weight of evidence (for or against) our hypothesis.

      (6) Figure 1 - why not use model fits from those employed for hypothesis testing? 

      This is a great suggestion (ties into #3), we have now redone Figure 1.

      (7) The section on "correlational effect" might also benefit from additional analyses and clarifications. Indeed, the data come from the same randomized experiment for which minimum education requirements were adjusted. Was the only difference that the number of years of education was studied as opposed to the cohort? If so, would the results of this analysis be similar in another subsample of the UK Biobank for which there was no change in policy?

      We have clarified the methods section for the correlational/associational effect. This was the same subset of individuals for the local randomization analysis; all we did was change the independent variable from an exogenous dummy-coded ROSLA term (where half of the sample had the natural experiment) to a continuous (endogenous) educational attainment IV. 

      In principle, the results from the associational analysis should be exactly the same if we use other UK Biobank cohorts. To see if the association of education attainment with the global neuroimaging cohorts was similar across sub-cohorts of new individuals, we conducted post hoc Bayesian analysis on eight more subcohort of 10-month intervals, spaced 2 years apart from each other (Sup. Figure 7; each indicated by a different color). Four of these sub-cohorts predate ROSLA, while the other four are after ROSLA. Educational attainment is slowly increasing across the cohorts of individuals born from 1949 until 1965; intriguingly the effect of ROSLA is visually evident in the distributions of educational attainment (Sup. Figure 7). Also, as seen in the cohorts predating ROSLA more and more individuals were (already) choosing to stay in education past 15 years of age (see cohort 1949 vs 1955 in Sup. Figure 7).

      Sup. Figure 8 illustrates boxplots of the educational attainment posterior of the eight sub-cohorts in addition to our original analysis (s1957) using a normal distributed prior with a mean of 0 and a sd of 1. Total surface area shows a remarkably replicable association with education attainment. Yet, it is evident the “extremely strong” association we found for CSF was a statistical fluke – as the posterior of other cohorts (bar our initial test) crosses zero. The conclusions for the other global neuroimaging covariates where we concluded ‘no associational effect’ seems to hold across cohorts.

      We have now added methods, deviation from preregistration, and the following excerpt to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”

      Reviewer #2 (Public review): 

      Summary: 

      The authors conduct a causal analysis of years of secondary education on brain structure in late life. They use a regression discontinuity analysis to measure the impact of a UK law change in 1972 that increased the years of mandatory education by 1 year. Using brain imaging data from the UK Biobank, they find essentially no evidence for 1 additional year of education altering brain structure in adulthood. 

      Strengths: 

      The authors pre-registered the study and the regression discontinuity was very carefully described and conducted. They completed a large number of diagnostic and alternate analyses to allow for different possible features in the data. (Unlike a positive finding, a negative finding is only bolstered by additional alternative analyses). 

      Weaknesses: 

      While the work is of high quality for the precise question asked, ultimately the exposure (1 additional year of education) is a very modest manipulation and the outcome is measured long after the intervention. Thus a null finding here is completely consistent educational attainment (EA) in fact having an impact on brain structure, where EA may reflect elements of training after a second education (e.g. university, post-graduate qualifications, etc) and not just stopping education at 16 yrs yes/no. 

      The work also does not address the impact of the UK Biobank's well-known healthy volunteer bias (Fry et al., 2017) which is yet further magnified in the imaging extension study (Littlejohns et al., 2020). Under-representation of people with low EA will dilute the effects of EA and impact the interpretation of these results. 

      References: 

      Fry, A., Littlejohns, T. J., Sudlow, C., Doherty, N., Adamska, L., Sprosen, T., Collins, R., & Allen, N. E. (2017). Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. American Journal of Epidemiology, 186(9), 1026-1034. https://doi.org/10.1093/aje/kwx246 

      Littlejohns, T. J., Holliday, J., Gibson, L. M., Garratt, S., Oesingmann, N., Alfaro-Almagro, F., Bell, J. D., Boultwood, C., Collins, R., Conroy, M. C., Crabtree, N., Doherty, N., Frangi, A. F., Harvey, N. C., Leeson, P., Miller, K. L., Neubauer, S., Petersen, S. E., Sellors, J., ... Allen, N. E. (2020). The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nature Communications, 11(1), 2624. https://doi.org/10.1038/s41467-020-15948-9 

      We thank the reviewer for the positive comments and constructive feedback, in particular, their emphasis on volunteer bias in UKB (similar points were mentioned by Reviewer 3). We have now addressed these limitations with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We also highlighted it both in the results and methods.

      We appreciate that one year of education may seem modest compared to the entire educational trajectory, but as an intervention, we disagree that one year of education is ‘a very modest manipulation’. It is arguably one of the largest positive manipulations in childhood development we can administer. If we were to translate a year of education into the language of a (cognitive) intervention, it is clear that the manipulation, at least in terms of hours, days, and weeks, is substantial. Prior work on structural plasticity (e.g., motor, spatial & cognitive training) has involved substantially more limited manipulations in time, intensity, and extent. There is even (limited) evidence of localized persistent long-term structural changes (Wollett & Maguire, 2011, Cur. Bio.).

      We have now also highlighted the limited generalizability of our findings since we estimate a ‘local’ average treatment effect. It is possible higher education (college, university, vocational schools, etc.) could impact brain structure, yet we see no theoretical reason why it would while secondary wouldn’t. Moreover, higher education education is even trickier to research empirically due to heightened self and administrative selection pressures. While we cannot discount this possibility, the impacts of endogenous factors such as genetics and socioeconomic status are most likely heightened. That being said, higher education offers exciting possibilities to compare more domain-specific processes (e.g., by comparing a philosophy student to a mathematics student). Causality could be tested in European systems with point entry into field-specific programs – allowing comparison of students who just missed entry criteria into one topic and settled for another.

      Regarding the amount of time following the manipulation, as we highlight in our discussion this is both a weakness and a strength. Viewed from a developmental neuroplasticity lens it would have been nice to have imaging immediately following the manipulation. Yet, from an aging perspective, our design has increased power to detect an effect.  

      Reviewer #2 (Recommendations for the authors): 

      (1) The authors assert there is no strong causal evidence for EA on brain structure. This overlooks work from Mendielian Randomisation, e.g. this careful work: https://pubmed.ncbi.nlm.nih.gov/36310536/ ... evidence from (good quality) MR studies should be considered. 

      We thank the reviewer for highlighting this well-done mendelian randomization study. We have now added this citation and removed previous claims on the “lack of causal evidence existing”. We refrain from discussing Mendelian randomization, as it it would need to be accompanied by a nuanced discussion on the strong limitations regarding EduYears-PGS in Mendelian randomization designs.

      (2) Tukey/Boxplot is a good name for your identification of outliers but your treatment of outliers has a well-recognized name that is missing: Windsorisation. Please add this term to your description to help the reader more quickly understand what was done. 

      Thanks, we have now added the term winsorized.

      (3) Nowhere is it plainly stated that "fuzzy" means that you allow for imperfect compliance with the exposure, i.e. some children born before the cut-off stayed in school until 16, and some born after the cut-off left school before 16. For those unfamiliar with RD it would be very helpful to explain this at or near the first reference of the term "fuzzy". 

      We have now clarified the term ‘fuzzy’ to the results and methods:

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      (4) Supplementary Figure 2 never states what the percentage actually measures. What exactly does each dot represent? Is it based on UK Biobank subjects with a given birth month? If so clarify. 

      Fixed!

      Reviewer #3 (Public review): 

      Summary: 

      This study investigates evidence for a hypothesized, causal relationship between education, specifically the number of years spent in school, and brain structure as measured by common brain phenotypes such as surface area, cortical thickness, total volume, and diffusivity. 

      To test their hypothesis, the authors rely on a "natural" intervention, that is, the 1972 ROSLA act that mandated an extra year of education for all 15-year-olds. The study's aim is to determine potential discontinuities in the outcomes of interest at the time of the policy change, which would indicate a causal dependence. Naturalistic experiments of this kind are akin to randomised controlled trials, the gold standard for answering questions of causality. 

      Using two complementary, regression-based approaches, the authors find no discernible effect of spending an extra year in primary education on brain structure. The authors further demonstrate that observational studies showing an effect between education and brain structure may be confounded and thus unreliable when assessing causal relationships. 

      Strengths: 

      (1) A clear strength of this study is the large sample size totalling up to 30k participants from the UK Biobank. Although sample sizes for individual analyses are an order of magnitude smaller, most neuroimaging studies usually have to rely on much smaller samples. 

      (2) This study has been preregistered in advance, detailing the authors' scientific question, planned method of inquiry, and intended analyses, with only minor, justifiable changes in the final analysis. 

      (3) The analyses look at both global and local brain measures used as outcomes, thereby assessing a diverse range of brain phenotypes that could be implicated in a causal relationship with a person's level of education. 

      (4) The authors use multiple methodological approaches, including validation and sensitivity analyses, to investigate the robustness of their findings and, in the case of correlational analysis, highlight differences with related work by others. 

      (5) The extensive discussion of findings and how they relate to the existing, somewhat contradictory literature gives a comprehensive overview of the current state of research in this area. 

      Weaknesses: 

      (1) This study investigates a well-posed but necessarily narrow question in a specific setting: 15-year-old British students born around 1957 who also participated in the UKB imaging study roughly 60 years later. Thus conclusions about the existence or absence of any general effect of the number of years of education on the brain's structure are limited to this specific scenario. 

      (2) The authors address potential concerns about the validity of modelling assumptions and the sensitivity of the regression discontinuity design approach. However, the possibility of selection and cohort bias remains and is not discussed clearly in the paper. Other studies (e.g. Davies et al 2018, https://www.nature.com/articles/s41562-017-0279-y) have used the same policy intervention to study other health-related outcomes and have established ROSLA as a valid naturalistic experiment. Still, quoting Davies et al. (2018), "This assumes that the participants who reported leaving school at 15 years of age are a representative sample of the sub-population who left at 15 years of age. If this assumption does not hold, for example, if the sampled participants who left school at 15 years of age were healthier than those in the population, then the estimates could underestimate the differences between the groups.". Recent studies (Tyrrell 2021, Pirastu 2021) have shown that UK Biobank participants are on average healthier than the general population. Moreover, the imaging sub-group has an even stronger "healthy" bias (Lyall 2022). 

      (3) The modelling approach used in this study requires that all covariates of no interest are equal before and after the cut-off, something that is impossible to test. Mentioned only briefly, the inclusion and exclusion of covariates in the model are not discussed in detail. Standard imaging confounds such as head motion and scanning site have been included but other factors (e.g. physical exercise, smoking, socioeconomic status, genetics, alcohol consumption, etc.) may also play a role. 

      We thank the reviewer for their numerous positive comments and have now attempted to address the first two limitations (generalizability and UKB bias) with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We further highlight this in the results section:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      Healthy volunteer bias can create two types of selection bias; crucially participation itself can serve as a collider threatening internal validity (outlined in van Alten et al., 2024; https://academic.oup.com/ije/article/53/3/dyae054/7666749). Natural experimental designs are partially sheltered from this major limitation, as ‘volunteer bias’ would have to differentially impact individuals on one side of the cutoff and not the other – thereby breaking a primary design assumption of regression discontinuity. Substantial prior work (including this article) has not found any threats to the validity of the 1973 ROSLA (Clark & Royer 2010, 2013; Barcellos et al., 2018, 2023; Davies et al., 2018, 2023). While the Davies 2028 article did IP-weight with the UK Biobank sample, Barcellos and colleagues 2023 (and 2018) do not, highlighting the following “Although the sample is not nationally representative,  our estimates have internal validity because there is no differential selection on the two sides of the September 1, 1957 cutoff – see  Appendix A.”.

      The second (more acknowledged & arguably less problematic) type of selection bias results in threats to external validity (aka generalizability). As highlighted in your first point; this is a large limitation with every natural experimental design, yet in our case, this is further amplified by the UK Biobank’s healthy volunteer bias. We have now attempted to highlight this limitation in the discussion passage above.

      Point 3 – the inability to fully confirm design validity – is again, another inherent limitation of a natural experimental approach. That being said, extensive prior work has tested different predetermined covariates in the 1973 ROSLA (cited within), and to our knowledge, no issues have been found. The 1973 ROSLA seems to be one of the better natural experiments around (there was also a concerted effort to have an ‘effective’ additional year; see Clark & Royer 2010). For these reasons, we stuck with only testing the variables we wanted to use to increase precision (also offering new neuroimaging covariates that didn’t exist in the literature base). One additional benefit of ROSLA was that the cutoff was decided years later on a variable that happened (date of birth) in the past – making it particularly hard for adolescents to alter their assignments.

      Reviewer #3 (Recommendations for the authors): 

      (1) FMRIB's preprocessing pipeline is mentioned. Does this include deconfounding of brain measures? Particularly, were measures deconfounded for age before the main analysis? 

      This is such a crucial point that we triple-checked, brain imaging phenotypes were not corrected for age (https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf) – large effects of age can be seen in the global metrics; older individuals have less surface area, thinner cortices, less brain volume (corrected for head size), more CSF volume (corrected for head size), more white matter hyperintensities, and worse FA values. Figure 1 shows these large age effects, which are controlled for in our continuity-based RD analysis.

      One’s date of birth (DOB) of course does not match perfectly to their age, this is why we included the covariate ‘visit date’; this interplay can now be seen in our updated SI Figure 1 (recommended in #3) which shows the distributions of visit date, DOB, and age of scan. 

      In a valid RD design covariates should not be necessary (as they should be balanced on either side of the cutoff), yet the inclusion of covariates does increase precision to detect effects. We tested this assumption, finding the effect of ‘visit date’ and its quadratic term to be not related to ROSLA (Sup. Table 1). This adds further evidence (specific to the UK Biobank sample) to the existing body of work showing the 1973 ROSLA policy change to not violate any design assumptions. Threats to internal validity would more than likely increase endogeneity and result in ‘false causal positive causal effects’ (which is not what we find).  

      (2) Despite the large overall sample size, I am wondering whether the effective number of samples is sufficient to detect a potentially subtle effect that is further attenuated by the long time interval before scanning. As stated, for the optimised bandwidth window (DoB 20 to 35 months around cut-off), N is about 5000. Does this mean that effectively about 250 (10%) out of about 2500 participants born after the cut-off were leaving school at 16 rather than 15 because of ROSLA? For the local randomisation analysis, this becomes about N=10 (10% out of 100). Could a power analysis show that these cohort sizes are large enough to detect a reasonably large effect? 

      This is a very valid point, one which we were grappling with while the paper was out for review. We now draw attention to this in the results and highlight this as a limitation in the discussion. While UKB’s non-representativeness limits our power (10% affected rather than 25% in the general population), it is still a very large sample. Our sample size is more in line with standard neuroimaging studies than with large cohort studies. 

      The novelty of our study is its causal design, while we could very precisely measure an effect of some phenotype (variable X) in 40,000 individuals. This effect is probably not what we think we are measuring. Without IP-weighting it could even have a different sign. But more importantly, it is not variable X – it is the thousands of things (unmeasured confounders) that lead an individual to have more or less of variable X. The larger the sample the easier it is for small unmeasured confounders to reach significance (Big data paradox) – this in no way invalidates large samples, it is just our thinking and how we handle large samples will hopefully change to a more casual lens.

      (3) Supplementary Figure 1: A similar raincloud plot of date of birth would be instructive to visualise the distribution of subjects born before and after the 1957 cut-off. 

      Great idea! We have done this in Sup Fig. 1 for both visit date and DOB.

      (4) p.9: Not sure about "extreme evidence", very strong would probably be sufficient. 

      As preregistered, we interpreted Bayes Factors using Jeffrey’s criteria. ‘Extreme evidence’ is only used once and it is about finding an associational effect of educational attainment on CSF (BF10 > 100). Upon Reviewer 1’s recommendation 7, we conducted eight replication samples (Sup. Figure 7 & 8) and have now added the following passage to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”

      (5) The code would benefit from a bit of clean-up and additional documentation. In its current state, it is not easy to use, e.g. in a replication study. 

      We have now further added documentation to our code; including a readme describing what each script does. The analysis pipeline used is not ideal for replications as the package used for continuity-based RD (RDHonest) initially could not handle covariates – therefore we manually corrected our variables after a discussion with Prof Kolesár (https://github.com/kolesarm/RDHonest/issues/7). 

      Prof Kolesár added this functionality recently and future work should use the latest version of the package as it can correct for covariates. We have a new preprint examining the effect of 1972 ROLSA on telomere length in the UK Biobank using the latest package version of RDHonest (https://www.biorxiv.org/content/10.1101/2025.01.17.633604v1). To ensure maximum availability of such innovations, we will ensure the most up-to-date version of this script becomes available on this GitHub link (https://github.com/njudd/EduTelomere).

    1. eLife Assessment

      The work presented is important for our understanding of the development of the cardiac conduction system and its regulation by T-box transcription factors. The conclusions are supported by convincing data. Overall this is an excellent study that advances our understanding of cardiac biology and has implications beyond the immediate field of study.

    2. Reviewer #2 (Public review):

      Summary:

      The goal of this work is to define the functions of T-box transcription factors Tbx3 and Tbx5 in the adult mouse ventricular cardiac conduction system (VCS) using a novel conditional mouse allele in which both genes are targeted in cis. A series of studies over the past 2 decades by this group and others have shown that Tbx3 is a transcriptional repressor that patterns the conduction system by repressing genes associated with working myocardium, while Tbx5 is a potent transcriptional activator of "fast" conduction system genes in the VCS. In a previous work, the authors of the present study further demonstrated that Tbx3 and Tbx5 exhibit an epistatic relationship whereby the relief of Tbx3-mediated repression through VCS conditional haploinsufficiency allows better toleration of Tbx5 VCS haploinsufficiency. Conversely, excess Tbx3-mediated repression through overexpression results in disruption of the fast-conduction gene network despite normal levels of Tbx5. Based on these data the authors proposed a model in which repressive functions of Tbx3 drive adoption of conduction system fate, followed by segregation into a fast-conducting VCS and slow-conduction AVN through modulation of the Tbx5/Tbx3 ratio in these respective tissue compartments.

      The question motivating the present work is: If Tbx5/Tbx3 ratio is important for slow versus fast VCS identity, what happens when both genes are completely deleted from the VCS? Is conduction system identity completely lost without both factors and if so, does the VCS network transform into a working myocardium-like state? To address this question, the authors have generated a novel mouse line in which both Tbx5 and Tbx3 are floxed on the same allele, allowing complete conditional deletion of both factors using the VCS-specific MinK-CreERT2 line, convincingly validated in previous work. The goal is to use these double conditional knockout mice to further explore the model of Tbx3/Tbx5 co-dependent gene networks and VCS patterning. First the authors demonstrate that the double conditional knockout allele results in the expected loss of Tbx3 and Tbx5 specifically in the VCS when crossed with Mink-CreERT2 and induced with tamoxifen. The double conditional knockout also results in premature mortality. Detailed electrophysiological phenotyping demonstrated prolonged PR and QRS intervals, inducible ventricular tachycardia, and evidence of abnormal impulse propagation along the septal aspect of the right ventricle. In addition, the mutants exhibit downregulation of VCS genes responsible for both fast conduction AND slow conduction phenotypes with upregulation of 2 working myocardial genes including connexin-43. The authors conclude that loss of both Tbx3 and Tbx5 results in "reversion" or "transformation" of the VCS network to a working myocardial phenotype, which they further claim is a prediction of their model and establishes that Tbx3 and Tbx5 "coordinate" transcriptional control of VCS identity.

      Overall Appraisal:

      As noted above, the present study does not further explore the Tbx5/Tbx3 ratio concept since both genes are completely knocked out in the VCS. Instead, the main claims are that absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function. However, only limited data are presented to support the claim of transcriptional reprogramming since the knockout cells are not directly compared to working myocardial cells at the transcriptional level and only a small number of key genes are assessed (versus genome-wide assessment). In addition, the optical mapping dataset has alternative interpretations that are not excluded or thoroughly discussed.

      In sum, while this study adds an elegantly constructed genetic model to the field, the data presented mostly fit within the existing paradigm of established functions of Tbx3 and Tbx5. The authors present some evidence to support the claim that VCS cells adopt a working myocardial phenotype in the absence of Tbx3 and Tbx5, but some key experiments that could more definitively test this model were not performed, reducing the degree to which the data support the conclusions.

      Strengths:

      (1) Successful generation of a novel Tbx3-Tbx5 double conditional mouse model<br /> (2) Successful VCS-specific deletion of Tbx3 and Tbx5 using a VCS-specific inducible Cre driver line<br /> (3) Well-powered and convincing assessments of mortality and physiological phenotypes<br /> (4) Isolation of genetically modified VCS cells using flow.

      Weaknesses:

      (1) In general, the data is consistent with a long-standing and well-supported model in which Tbx3 represses working myocardial genes and Tbx5 activates expression of VCS genes, which seem like distinct roles in VCS patterning.<br /> (2) More direct quantitative comparison of Tbx5 Adult VCS KO with Tbx5/Tbx3 Adult VCS double KO would be helpful to ascertain whether deletion of Tbx3 on top of Tbx5 deletion changes the underlying phenotype in some discernable way beyond mRNA expression of a few genes. Superficially, the phenotypes look quite similar at the EKG and arrhythmia inducibility level and no optical mapping data from single Tbx5 KO is presented for comparison to the double KO. I understand that single Tbx5 VCS KO mutants have been evaluated in previous publications but I think in order to evaluate the claims presented here, it would be important to do a direct comparison using the same assays and conditions.<br /> (3) The authors claim that double knockout VCS cells transform to working myocardial fate, but there is no comparison of gene expression levels between actual working myocardial cells and the Tbx3/Tbx5 DKO VCS cells so it's hard to know if the data reflect an actual cell state change or a more non-specific phenomenon with global dysregulation of gene expression or perhaps dedifferentiation. I understand that the upregulation of Gja1 and Smpx is intended to address this, but it's only two genes and it seems relevant to understand their degree of expression relative to actual working myocardium. In addition, the gene panel is somewhat limited and does not include other key transcriptional regulators in the VCS such as Irx3 and Nkx2-5. RNA-seq in these populations would provide a clearer comparison among the groups.<br /> (4) From the optical mapping data, it is difficult to distinguish between the presence of (1) a focal proximal right bundle branch block due to dysregulation of gene expression in the VCS but overall preservation of the right bundle and its distal ramifications; from (2) actual loss of the VCS with reversion of VCS cells to a working myocardial fate. Related to this, the authors claim that this experiment allows for direct visualization of His bundle activation, but can the authors confirm or provide evidence that the tissue penetration of their imaging modality allows for imaging of a deep structure like the AV bundle as opposed to the right bundle branch which is more superficial? Does the timing of the separation of the sharp deflection from the subsequent local activation suggest visualization of more distal components of the VCS rather than the AV bundle itself? Additional clarification would be helpful.

      impact:

      The present study contributes a novel and elegantly constructed mouse model to the field. The data presented generally corroborate existing models of transcriptional regulation in the VCS. Acknowledging that the present work is strong start, some additional studies not included in the present manuscript will be needed for this new mouse model to decisively advance the field of VCS transcriptional biology.

    3. Reviewer #3 (Public review):

      Summary:

      In the study presented by Burnicka-Turek et al., the authors generated for the first time a mouse model to cause the combined conditional deletion of Tbx3 and Tbx5 genes. This has been impossible to achieve to date due to the proximity of these genes in chromosome 5, preventing the generation of loss of function strategies to delete simultaneously both genes. It is known that both Tbx3 and Tbx5 are required for the development of the cardiac conduction system by transcription factor-specific but also overlapping roles as seen in the common and diverse cardiac defects found in patients with mutations for these genes. After validating the deletion efficiency and specificity of the line, the authors characterised the cardiac phenotype associated to cardiac conduction system (CCS)-specific combined deletion of Tbx5 and Tbx3 in the adult by inducing the activation of the CCS-specific tamoxifen inducible Cre recombination (MinK-creERT) at 6 weeks after birth. Their analysis of 8-9 weeks old animals did not identify any major morphological cardiac defects. However, the authors found conduction defects including prolonged PR and QTR intervals and ventricular tachycardia causing the death of the double mutants, which do not survive more than 3 months after tamoxifen induction. Molecular and optical mapping analysis of the ventricular conduction system (VCS) of these mutants concluded that, in the absence of Tbx5 and Tbx3 function, the cells forming the ventricular conduction system (VCS) become working myocardium and lose the specific contractile features characterising VCS cells. Altogether, the study identified the critical combined role of Tbx3 and Tbx5 in the maintenance of the VCS in adulthood.

      Strengths:

      The study generated a new animal model to study the combined deletion of Tbx5 and Tbx3 in the cardiac conduction system. This unique model has provided the authors with the perfect tool to answer their biological questions. The study includes top-class methodologies to assess the functional defects present in the different mutants analysed, and gathered very robust functional data on the conduction defects present in these mutants. They also applied optical action potential (OAP) methods to demonstrate the loss of conduction action potential and the acquisition of working myocardium action potentials in the affected cells because of Tbx5/Tbx3 loss of function. The study used simpler molecular and morphological analysis to demonstrate that there are no major morphological defects in these mutant and that indeed, the conduction defects found are due to the acquisition of working myocardium features by the VCS cells. Altogether, this study identified the critical role of these transcription factors in the maintenance of the VCS in the adult heart.

      Weaknesses:

      In the opinion of this reviewer, the weakness in the study lays in the morphological and molecular characterization. The morphological analysis simply described the absence of general cardiac defects in the adult heart, however, whether the CCS tissues are present or not was not investigated. Linage tracing analysis using the reporter lines included in the crosses described in the study, will determine if there are changes in CCS tissue composition in the different mutants studied. Similarly, combining this reporter analysis with the molecular markers found to be dysregulated by qPCR and western blot will demonstrate that indeed the cells that were specified as VCS in the adult heart become working myocardium in the absence of Tbx3 and Tbx5 function.

      Comments on revisions:

      I would like to thank the authors for their revised manuscript and for their corrections based on the suggestions from the 3 reviewers. Although I would have preferred to see some of the additional experiments suggested by any of the reviewers to improve the robustness and depth of the study integrated in the revised version of the manuscript, I acknowledge that the authors may prefer to develop them as follow-up studies. So, looking forward to seeing the follow-up study unravelling the detailed molecular regulation controlled by Tbx3/Tbx5 during the formation and maintenance of the ventricular cardiac conduction system.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Strengths:

      Great genetic model. Phenotypic consequences at the organ and organismal levels are well investigated. The requirement of both Tbx3 and Tbx5 for maintaining VCS cell state has been demonstrated.

      We thank Reviewer #1 for acknowledging the effort involved in generating and characterizing the Tbx3/Tbx5 double conditional knockout mouse model and for highlighting the significance of this work in elucidating the role of these transcription factors in maintaining the functional and transcriptional identity of the ventricular conduction system. 

      Weaknesses:

      The actual cell state of the Tbx3/Tbx5 deficient conducting cells was not investigated in detail, and therefore, these cells could well only partially convert to working cardiomyocytes, and may, in reality, acquire a unique state.

      We agree with Reviewer #1 that the Tbx3/Tbx5 double mutant ventricular conduction myocardial cells may only partially convert to working cardiomyocytes or may acquire a unique state.  The transcriptional state of the double mutant VCS cells was investigated by bulk profiling of key genes associated with specific conduction and non-conduction cardiac regions, including fast conduction, slow conduction, or working myocardium. Neither the bulk transcriptional approaches nor the optical mapping approaches we employed capture single-cell data; in both cases, the data represents aggregated signals from multiple cells (1, 2). Single cell approaches for transcriptional profiling and cellular electrophysiology would clarify this concern and are appropriate for future studies. 

      (1) O’Shea C, Nashitha Kabri S, Holmes AP, Lei M, Fabritz L, Rajpoot K, Pavlovic D (2020) Cardiac optical mapping – State-of-the-art and future challenges. The International Journal of Biochemistry & Cell Biology 126:105804. doi: 10.1016/j.biocel.2020.105804. (2) Efimov IR, Nikolski VP, and Salama G (2004) Optical Imaging of the Heart. Circulation Research 95:21-33. doi: 10.1161/01.RES.0000130529.18016.35.

      Reviewer #2 (Public review):

      Summary:

      The goal of this work is to define the functions of T-box transcription factors Tbx3 and Tbx5 in the adult mouse ventricular cardiac conduction system (VCS) using a novel conditional mouse allele in which both genes are targeted in cis. A series of studies over the past 2 decades by this group and others have shown that Tbx3 is a transcriptional repressor that patterns the conduction system by repressing genes associated with working myocardium, while Tbx5 is a potent transcriptional activator of "fast" conduction system genes in the VCS. In a previous work, the authors of the present study further demonstrated that Tbx3 and Tbx5 exhibit an epistatic relationship whereby the relief of Tbx3-mediated repression through VCS conditional haploinsufficiency allows better toleration of Tbx5 VCS haploinsufficiency. Conversely, excess Tbx3-mediated repression through overexpression results in disruption of the fast-conduction gene network despite normal levels of Tbx5. Based on these data the authors proposed a model in which repressive functions of Tbx3 drive the adoption of conduction system fate, followed by segregation into a fast-conducting VCS and slow-conduction AVN through modulation of the Tbx5/Tbx3 ratio in these respective tissue compartments.

      The question motivating the present work is: If Tbx5/Tbx3 ratio is important for slow versus fast VCS identity, what happens when both genes are completely deleted from the VCS? Is conduction system identity completely lost without both factors and if so, does the VCS network transform into a working myocardium-like state? To address this question, the authors have generated a novel mouse line in which both Tbx5 and Tbx3 are floxed on the same allele, allowing complete conditional deletion of both factors using the VCS-specific MinK-CreERT2 line, convincingly validated in previous work. The goal is to use these double conditional knockout mice to further explore the model of Tbx3/Tbx5 co-dependent gene networks and VCS patterning. First, the authors demonstrate that the double conditional knockout allele results in the expected loss of Tbx3 and Tbx5 specifically in the VCS when crossed with Mink-CreERT2 and induced with tamoxifen. The double conditional knockout also results in premature mortality. Detailed electrophysiological phenotyping demonstrated prolonged PR and QRS intervals, inducible ventricular tachycardia, and evidence of abnormal impulse propagation along the septal aspect of the right ventricle. In addition, the mutants exhibit downregulation of VCS genes responsible for both fast conduction AND slow conduction phenotypes with upregulation of 2 working myocardial genes including connexin-43. The authors conclude that loss of both Tbx3 and Tbx5 results in "reversion" or "transformation" of the VCS network to a working myocardial phenotype, which they further claim is a prediction of their model and establishes that Tbx3 and Tbx5 "coordinate" transcriptional control of VCS identity.

      We appreciate Reviewer #2’s detailed summary of the study’s aims, methodologies, and findings, as well as their thoughtful suggestions for further analysis. We are grateful for their recognition of our genetic model’s novelty and robustness.

      Overall Appraisal:

      As noted above, the present study does not further explore the Tbx5/Tbx3 ratio concept since both genes are completely knocked out in the VCS. Instead, the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function.

      We agree with this reviewer’s assessment of the assertions in our manuscript.  The novel combined Tbx5/Tbx3 double mutant model does not further explore the TBX5/TBX3 ratio concept, which we previously examined in detail (1). Instead, as the Reviewer notes, this manuscript focuses on testing a model that the coordinated activity of Tbx3 and Tbx5 defines specialized ventricular conduction identity. 

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Strengths:

      (1) Successful generation of a novel Tbx3-Tbx5 double conditional mouse model.

      (2) Successful VCS-specific deletion of Tbx3 and Tbx5 using a VCS-specific inducible Cre driver line.

      (3) Well-powered and convincing assessments of mortality and physiological phenotypes. (4) Isolation of genetically modified VCS cells using flow.

      We thank Reviewer #2 for acknowledging the listed strengths of our study.

      Weaknesses:

      (1) In general, the data is consistent with a long-standing and well-supported model in which Tbx3 represses working myocardial genes and Tbx5 activates the expression of VCS genes, which seem like distinct roles in VCS patterning. However, the authors move between different descriptions of the functional relationship and epistatic relationship between these factors, including terms like "cooperative", "coordinated", and "distinct" at various points. In a similar vein, sometimes terms like "reversion" are used to describe how VCS cells change after Tbx3/Tbx5 conditional knockout, and other times "transcriptional shift" and at other times "reprogramming". But these are all different concepts. The lack of a clear and consistent terminology for describing the phenomena observed makes the overarching claims of the manuscript more difficult to evaluate.

      We discriminate prior work on the “long-standing and well-supported model’ supported by investigation of the role of Tbx5 and Tbx3 independently from this work examining the coordinated role of Tbx5 and Tbx3. Prior work demonstrated that Tbx3 represses working myocardial genes and Tbx5 activates expression of VCS genes, consistent with the reviewer’s suggestion of their distinct roles in VCS patterning. However, the current study uniquely evaluates the combined role of Tbx3 and Tbx5 in distinguishing specialized conduction identify from working myocardium, for the first time. 

      We appreciate Reviewer #2’s feedback regarding the need for consistent terminology when describing the impact of the double Tbx3 and Tbx5 mutant. We will edit the manuscript to replace terms like “reversion” with “transcriptional shift” or “transformation” when describing the observed phenotype, and we will use “coordination” to describe the combined role of Tbx5 and Tbx3 in maintaining VCS-specific identity.

      (2) A more direct quantitative comparison of Tbx5 Adult VCS KO with Tbx5/Tbx3 Adult VCS double KO would be helpful to ascertain whether deletion of Tbx3 on top of Tbx5 deletion changes the underlying phenotype in some discernable way beyond mRNA expression of a few genes. Superficially, the phenotypes look quite similar at the EKG and arrhythmia inducibility level and no optical mapping data from a single Tbx5 KO is presented for comparison to the double KO.

      We thank Reviewer #2 for the suggestions that a direct comparison between Tbx5 single conditional knockout and Tbx3/Tbx5 double conditional knockout models may help isolate the specific contribution of Tbx3 deletion in addition to Tbx5 deletion. 

      Previous studies have assessed the effect of single Tbx5 CKO in the VCS of murine hearts (1, 3, 5). Arnolds et al. demonstrated that the removal of Tbx5 from the adult ventricular conduction system results in VCS slowing, including prolonged PR and QRS intervals, prolongation of the His duration and His-ventricular (HV) interval (3).

      Furthermore, Burnicka-Turek et al. demonstrated that the single conditional knockout of Tbx5 in the adult VCS caused a shift toward a pacemaker cell state, with ectopic beats and inappropriate automaticity (1). Whole-cell patch clamping of VCS-specific Tbx5 deficient cells revealed action potentials characterized by a slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization - features characteristic of nodal action potentials rather than typical VCS action potentials (3). These observations were interpreted as uncovering nodal potential of the VCS in the absence of Tbx5. Based on the role of Tbx3 in CCS specification (2), we hypothesized that the nodal state of the VCS uncovered in the absence of Tbx5 was enabled by maintained Tbx3 expression. This motivated us to generate the double Tbx5

      / Tbx3 knockout model to examine the state of the VCS in the absence of both T-box TFs. In the current study, we demonstrate that the VCS-specific deletion of Tbx3 and Tbx5 results in the loss of fast electrical impulse propagation in the VCS, similar to that observed in the single Tbx5 mutant. However, unlike the Tbx5 single mutant, the Tbx3/Tbx5 double deletion does not cause a gain of pacemaker cell state in the VCS. Instead, the physiological data suggests a transition toward non-conduction working myocardial physiology. This conclusion is supported by the presence of only a single upstroke in the optical action potential (OAP) recorded from the His bundle region and VCS cells in Tbx3/Tbx5 double conditional knockout mice. The electrical properties of VCS cells in the double knockout are functionally indistinguishable from those of ventricular working myocardial cells. As a result, ventricular impulse propagation is significantly slowed, resembling activation through exogenous pacing rather than the rapid conduction typically associated with the VCS. We will edit the text of the manuscript to more carefully distinguish the observations between these models, as suggested.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109.

      (5) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265. PMID: 15289437.

      (3) The authors claim that double knockout VCS cells transform to working myocardial fate, but there is no comparison of gene expression levels between actual working myocardial cells and the Tbx3/Tbx5 DKO VCS cells so it's hard to know if the data reflect an actual cell state change or a more non-specific phenomenon with global dysregulation of gene expression or perhaps dedifferentiation. I understand that the upregulation of Gja1 and Smpx is intended to address this, but it's only two genes and it seems relevant to understand their degree of expression relative to actual working myocardium. In addition, the gene panel is somewhat limited and does not include other key transcriptional regulators in the VCS such as Irx3 and Nkx2-5. RNA-seq in these populations would provide a clearer comparison among the groups.

      And

      the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function. However, only limited data are presented to support the claim of transcriptional reprogramming since the knockout cells are not directly compared to working myocardial cells at the transcriptional level and only a small number of key genes are assessed (versus genome-wide assessment).

      We appreciate Reviewer #2’s suggestion to expand the gene expression analysis in Tbx3/Tbx5-deficient VCS cells by including other specific genes and comparisons with “native”/actual working ventricular myocardial cells and broadening the gene panel. In this study, we evaluated core cardiac conduction system markers, revealing a loss of conduction system-specific gene expression in the double mutant VCS. Furthermore, we evaluated key working myocardial markers normally excluded from the conduction system, Gja1 and Smpx, revealing a shift towards a working myocardial state in the double mutant VCS (Figure 4). We agree that a more comprehensive analysis, such as transcriptome-wide approaches, would offer greater clarity on the extent and specificity of the observed shift from conduction to non-conduction identity. These approaches are appropriate directions for future studies.

      (4) From the optical mapping data, it is difficult to distinguish between the presence of (a) a focal proximal right bundle branch block due to dysregulation of gene expression in the VCS but overall preservation of the right bundle and its distal ramifications; from (b) actual loss of the VCS with reversion of VCS cells to a working myocardial fate. Related to this, the authors claim that this experiment allows for direct visualization of His bundle activation, but can the authors confirm or provide evidence that the tissue penetration of their imaging modality allows for imaging of a deep structure like the AV bundle as opposed to the right bundle branch which is more superficial? Does the timing of the separation of the sharp deflection from the subsequent local activation suggest visualization of more distal components of the VCS rather than the AV bundle itself? Additional clarification would be helpful.

      And

      In addition, the optical mapping dataset is incomplete and has alternative interpretations that are not excluded or thoroughly discussed.

      We agree with Reviewer #2 that the resolution of the optical mapping experiment may be insufficient to precisely localize the conduction block due to the limited signal strength from the VCS. It is possible that the region defined as the His Bundle also includes portions of the right bundle branch. Our control mice show VCS OAP upstrokes consistent with those reported by Tamaddon et al. (2000) using Di-4-ANEPPS (1). We appreciate the Reviewer’s attention to alternative interpretations, and we will incorporate these caveats into the manuscript text. 

      (1) Tamaddon HS, Vaidya D, Simon AM, Paul DL, Jalife J, Morley GE (2000) Highresolution optical mapping of the right bundle branch in connexin40 knockout mice reveals slow conduction in the specialized conduction system. Circulation Research 87:929-36. doi: 10.1161/01.res.87.10.929. 

      Impact:

      The present study contributes a novel and elegantly constructed mouse model to the field. The data presented generally corroborate existing models of transcriptional regulation in the VCS but do not, as presented, constitute a decisive advance.

      And

      In sum, while this study adds an elegantly constructed genetic model to the field, the data presented fit well within the existing paradigm of established functions of Tbx3 and Tbx5 in the VCS and in that sense do not decisively advance the field. Moreover, the authors' claims about the implications of the data are not always strongly supported by the data presented and do not fully explore alternative possibilities.

      We appreciate Reviewer # 2’s acknowledgment of the elegance and novelty of the mouse model we generated. However, we respectfully disagree with their assessment that this work merely corroborates existing models without providing a decisive advance. Previous studies have investigated single Tbx5 or Tbx3 gene knockouts in-depth and established the T-box ratio model for distinguishing fast VCS from slow nodal conduction identity (1) that the reviewer alludes to in earlier comments. In contrast, this study aimed to explore a different model, that the combined effects of Tbx5 and Tbx3 distinguish adult VCS identity from non-conduction working myocardium. The coordinated Tbx3 and Tbx5 role in conduction system identify remained untested due to the lack of a mouse model that allowed their simultaneous removal. The very model the reviewer recognizes as “novel and elegantly constructed” has allowed the examination of the coordinated role of Tbx5 and Tbx3 for the first time. While we acknowledge the opportunity for additional depth of investigation of this model in future studies, the data we present provides consistent experimental support for the coordinated requirement of both Tbx5 and Tbx3 for ventricular cardiac conduction system identity. 

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Reviewer #3 (Public review):

      Summary:

      In the study presented by Burnicka-Turek et al., the authors generated for the first time a mouse model to cause the combined conditional deletion of Tbx3 and Tbx5 genes. This has been impossible to achieve to date due to the proximity of these genes in chromosome 5, preventing the generation of loss of function strategies to delete simultaneously both genes. It is known that both Tbx3 and Tbx5 are required for the development of the cardiac conduction system by transcription factor-specific but also overlapping roles as seen in the common and diverse cardiac defects found in patients with mutations for these genes. After validating the deletion efficiency and specificity of the line, the authors characterized the cardiac phenotype associated with the cardiac conduction system (CCS)-specific combined deletion of T_bx5_ and Tbx3 in the adult by inducing the activation of the CCS-specific tamoxifen-inducible Cre recombination (MinKcreERT) at 6 weeks after birth. Their analysis of 8-9-week-old animals did not identify any major morphological cardiac defects. However, the authors found conduction defects including prolonged PR and QTR intervals and ventricular tachycardia causing the death of the double mutants, which do not survive more than 3 months after tamoxifen induction. Molecular and optical mapping analysis of the ventricular conduction system (VCS) of these mutants concluded that, in the absence of Tbx5 and Tbx3 function, the cells forming the ventricular conduction system (VCS) become working myocardium and lose the specific contractile features characterizing VCS cells. Altogether, the study identified the critical combined role of Tbx3 and Tbx5 in the maintenance of the VCS in adulthood.

      Strengths:

      The study generated a new animal model to study the combined deletion of Tbx5 and Tbx3 in the cardiac conduction system. This unique model has provided the authors with the perfect tool to answer their biological questions. The study includes top-class methodologies to assess the functional defects present in the different mutants analyzed, and gathered very robust functional data on the conduction defects present in these mutants. They also applied optical action potential (OAP) methods to demonstrate the loss of conduction action potential and the acquisition of working myocardium action potentials in the affected cells because of Tbx5/Tbx3 loss of function. The study used simpler molecular and morphological analysis to demonstrate that there are no major morphological defects in these mutants and that indeed, the conduction defects found are due to the acquisition of working myocardium features by the VCS cells. Altogether, this study identified the critical role of these transcription factors in the maintenance of the VCS in the adult heart.

      We appreciate the Reviewer’s comments regarding the originality and utility of our model and the strengths of our methodological approach. The Reviewer’s appreciation of the molecular and morphological analyses as well as their constructive feedback is highly valuable.

      Weaknesses:

      In the opinion of this reviewer, the weakness in the study lies in the morphological and molecular characterization. The morphological analysis simply described the absence of general cardiac defects in the adult heart, however, whether the CCS tissues are present or not was not investigated. Lineage tracing analysis using the reporter lines included in the crosses described in the study will determine if there are changes in CCS tissue composition in the different mutants studied. Similarly, combining this reporter analysis with the molecular markers found to be dysregulated by qPCR and western blot, will demonstrate that indeed the cells that were specified as VCS in the adult heart, become working myocardium in the absence of Tbx3 and Tbx5 function.

      We appreciate the reviewer’s concern regarding the morphology of the cardiac conduction system in the Tbx3/Tbx5 double conditional knockout model. We did not observe any structural abnormalities, as the Reviewer notes. We agree with their suggestion for using Genetic Inducible Fate Mapping to mark cardiac conduction cells expressing MinKCre. In fact, we utilized this approach to isolate VCS cells for transcriptional profiling. Specifically, we combined the tamoxifen-inducible MinKCreERT allele with the Cre-dependent R26Eyfp reporter allele to label MinKCre-expressing cells in both control VCS and VCS-specific double Tbx3/Tbx5 knockouts. EYFP-positive cells were isolated for transcriptional studies, ensuring that our analysis exclusively targeted conduction system-lineage marked cells. The ability to isolate MinKCre-marked cells from both controls and Tbx5/Tbx3 double mutants indicates that VCS cells persisted in the double knockout. Nonetheless, the suggestion for in-vivo marking by Genetic Inducible

      Fate Mapping and morphologic analysis is a valuable recommendation for future studies. 

      Reviewer #1 (Recommendations for the authors):

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Previous work suggested the prediction that VCS-specific genetic ablation of both the TBX3 and TBX5 would transform fast-conducting adult VCS into cells resembling working myocardium, eliminating specialized CCS fate. The current study suggests that this prediction is at least to some extent accurate.

      We appreciate Reviewer #1’s summary and recognition of our study. As the review notes, the simultaneous deletion of Tbx3 and Tbx5 in the mature ventricular conduction system (VCS) suggests a conversion of VCS to "ordinary" ventricular working myocytes. To our knowledge, this represents a novel observation and experimental model that uniquely captures the combined roles of these essential T-box transcription factors. We believe that this model offers a valuable platform for further investigation into the transcriptional mechanisms underlying conduction system specialization.

      (1) The huge effort made to generate the DKO model contrasts with the limited efforts made to study the mechanism. Conditional deficiency of Tbx3 and Tbx5 creates an artificial situation that is useful for addressing fundamental mechanistic questions. The authors provide a rather superficial analysis of the changes in the VCS upon deletion of these two critically important factors and do not provide really novel insights into their requirement/function in the VCS gene regulatory network and epigenetic state. So to what extent do VCS cardiomyocytes (CMs) from Tbx3/5 DKO mice resemble "simple" working myocardium? To what extent do these cells acquire the working myocardial (epigenetic) state, do these cells have an epigenetic memory of the Tbx3/Tbx5+ history, is the enhancer usage between the modified VCS CMs and the working CMs similar or not, etc.? The assumption that the authors' data indicate that the DKO VCS CMs simply acquire a ventricular working "fate" is unlikely. Following this reasoning, the reverse experiment to induce Tbx3 and Tbx5 expression in working CMs would result in complete conversion to VCS CMs, which is also unlikely.

      To answer such questions, transcriptomic and epigenetic state analysis, electrophysiologic analysis (e.g. patch-clamp), cell/subcellular level analysis, etc. would be required, as well as a comparison of the changed state of the DKO VCS CMs to that of working CMs.

      This initial study focused on generating the Tbx3:Tbx5 double-conditional knockout model and characterizing the resulting physiological and molecular changes within the VCS. We analyzed transcriptomic markers of fast conduction (VCS), slow conduction (nodal), and non-conduction (working myocardium). Additionally, we applied optical mapping to evaluate the physiological consequences of the double knockout, which allowed a calculated AP of the VCS to be generated. We agree that a more in-depth mechanistic investigation of the VCS transformation upon Tbx3/Tbx5 deletion by transcriptomic or cellular electrophysiology could provide a deeper understanding of the precise transcriptional/epigenetic state of the VCS in the double knockout and clarify whether there is a partial or complete conversion of VCS cells to a simple working myocardial phenotype. The suggestions by the reviewer will be considered for future studies.

      (2) Tbx3 stimulates BMP-TGFb signaling (e.g. positive loop between Tbx3-Bmp2), which in turn stimulates EMT and modulates the behavior of endocardial and mesenchymal cells. Did the authors investigate the impact of Tbx3/5 DKO on non-CM cells in and around the VCS? (see also comment 1). The insulation of the AVB for example could be a Tbx3/5 non cell autonomous target.

      We appreciate the Reviewer’s suggestion to examine the impact of Tbx3/Tbx5 deletion on non-CM cells surrounding the VCS. While this is an intriguing avenue for future exploration, it falls outside the scope of the current study, which focused on the cardiomyocyte-specific roles of Tbx3 and Tbx5 in maintaining adult VCS identity.

      (3) The MinK-Cre line used (from the Moskowitz lab) also recombines in the AVN (Arnolds et al 2011). The authors do not mention changes in the AVN, and systematically call the line VCS specific (which refers to the AVB, BB, PVCS I assume). This could also impact the PR interval. Please address.

      The MinK-Cre line recombines in the atrioventricular bundle (AVB) and bundle branches (BB). It recombines in cardiomyocytes adjacent to the atrioventricular node (AVN). We previously interpreted these cells as the penetrating portion of the His bundle into the AVN. This line does not recombine in the vast majority, if any, physiologic nodal cells. We also assessed nodal conduction parameters by invasive electrophysiologic (EP) studies. Our data showed that non-VCS parameters, including sinus node recovery time, AV node recovery time, and atrial and ventricular effective refractory periods, remained within normal ranges in Tbx3:Tbx5-deficient mice (please see Figure 2I). These findings indicate that AVN function is preserved in the VCS-specific double knockout, reinforcing the specificity of the observed conduction defects to the ventricular conduction system.

      (4) Did the authors also investigate the electrophysiological changes in the (EGFP+) DKO VCS CMs? Would these resemble the properties of ventricular working CMs, or would they still show some VCS properties? (see also comment 1).

      We performed electrophysiologic analysis of the double knockout by optical mapping. Optical mapping provides tissue-level resolution, capturing the functional behavior of clusters of thousands of cells simultaneously, rather than individual cells. While this technique does not achieve single-cell resolution, it allows for a comprehensive assessment of electrophysiological changes across the VCS region. Single cell electrophysiology is a good idea for future studies. 

      (5) Throughout the manuscript, the authors use "patterning" and "fate", which are applicable to development and differentiation, not to the situation where a gene is removed from fully differentiated cells in an adult organism resulting in a change of these cells. Perhaps more appropriate are "state" change and the requirement for "homeostasis/maintenance" of state.

      We appreciate the Reviewer’s concern regarding the terminology used to describe changes in VCS cell identity. To ensure precision and uniformity, we replaced terms such as “fate” and “patterning” with “state” or “maintenance” to reflect the shift in cellular characteristics in a fully differentiated adult tissue context. 

      Minor:

      (1) Please provide all data points in bar graphs.

      We have incorporated individual data points into the bar graphs as suggested, ensuring enhanced transparency and clarity in the data presentation.

      “(2) Formally, gene expression levels between samples are not normally distributed. The Welch t-test used here assumes a normal distribution. Therefore, nonparametric tests should be used.

      We appreciate Reviewer #1’s consideration of the appropriate statistical approach to the qPCR data and clarify our statistical approach here. Normality within each experimental group was assessed using the Shapiro-Wilk test. Between-group comparisons were conducted using Welch t-test, and multiple comparisons were corrected using the Benjamini & Hochberg method to control the false discovery rate (FDR) (71). If a significant difference was detected between two groups (t-test FDR < 0.05) but normality was rejected in any of the compared groups (Shapiro-Wilk P < 0.05), a non-parametric Wilcoxon rank-sum test was used for verification. A significant group-mean difference was confirmed at one-tailed Wilcoxon P≤0.05 (detailed in Supplementary Data Set I). Furthermore, we have updated the qRT-PCR information in each figure and their respective legends as follows. Statistical analysis was performed using R version 4.2.0. We have included a new Supplementary Data Set I, detailing the statistical analysis of qRT-PCR data. Additionally, we have revised the Methods/Statistics section to detail the applied statistical analysis. 

      (3) Some of the panels of figures are tiny and cannot be evaluated. For example, in Figure 1B the actual data (expression of Tbx3/5) is impossible to see.

      We appreciate the Reviewer’s observation and have revised the figures to improve visual clarity and ensure that the presented data are easily interpretable by readers.

      Reviewer #2 (Recommendations for the authors):

      Additional Experiments, Data, Analysis:

      (1) Comparisons between both single knockouts and double knockouts at the phenotypic level are needed. In some instances, the data is shown (e.g., mortality and EKG) but direct statistical comparison is not performed. In other instances (optical mapping and gene expression), data with single knockouts are not shown. If combined VCS Tbx3/Tbx5 deletion does not change the phenotype of the VCS Tbx5 single deletion, this should be explicitly stated and discussed.

      We appreciate Reviewer #2’s suggestion to compare the phenotypic outcomes of the Tbx3 and Tbx5 single conditional knockout models with those observed in Tbx3/Tbx5 double conditional knockout model. We have expanded the discussion section of our manuscript to incorporate a more detailed comparison between the double Tbx3/Tbx5 model and the single Tbx5 and Tbx3 models [1-5], highlighting the distinct phenotypic outcomes of the single and double knockouts.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109. [5] Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265.

      (2) Genome-wide expression analysis including working myocardium would provide stronger evidence for interconversion of cell states. Ideally, this would include single knockouts.

      We agree that a genome-wide expression analysis, including a direct comparison with working myocardium, would provide more comprehensive insights into cell state transitions in Tbx3:Tbx5-deficient VCS cells. Additionally, incorporating single knockout models into such analyses would further clarify the distinct and cooperative contributions of Tbx3 and Tbx5 to maintaining VCS identity. This is a good suggestion for future studies.

      (3) This may not be essential to support the authors' claims, but the addition of epigenetic data from single and double KO VCS using ATAC-seq (which can be performed with relatively small numbers of cells) could provide stronger evidence for cell state changes of the kind hypothesized by the authors.

      We agree that epigenetic data such as ATAC-seq would complement transcriptional analyses and provide insight into chromatin states that underlie the observed cellular reprogramming. This is a good suggestion for follow-up studies to further characterize the molecular state of Tbx3:Tbx5-deficient VCS cells.

      (4) Additional clarification of the optical mapping experiments to exclude alternative interpretations like focal right bundle branch block and to include single knockouts for comparison - if the Tbx5 single KO looks the same as the double KO that would be very important to know and would directly affect interpretation of the experiment.

      Right septal optical mapping preparation involved removing the right ventricular free wall to directly image the right ventricular septum, which contains the VCS. In a healthy mouse, there are two peak components of the optical action potential upstroke, the first peak due to the activation of the VCS and the second due to the activation of the ventricular cardiomyocytes. Importantly, in Tbx3:Tbx5 double-conditional knockout mice, the first peak was absent, rather than delayed, indicating loss of fast conduction through the VCS. This absence suggests a shift in VCS cells toward a ventricular working myocardial phenotype, rather than a regional conduction block or delayed propagation through a structurally intact VCS.

      Previous studies from our group have extensively characterized the effect of single Tbx5 knockout on the VCS in murine hearts [1, 2, 3]. Arnolds et al. demonstrated that VCSspecific Tbx5-deficiency results in significant slowing of VCS conduction, evidenced by prolonged PR and QRS intervals, along with lengthening of the atrio-Hisian interval, His duration, and Hisioventricular interval [1]. Although both single Tbx5 knockout and Tbx3:Tbx5 double knockout mice exhibit slowing of ventricular conduction system, our optical mapping studies reveal distinct differences in their electrophysiological phenotypes. Burnicka-Turek et al. showed that the single knockout of Tbx5 in the VCS leads to a shift toward a pacemaker cell state, evidenced by ectopic beats originating in the ventricles and inappropriate automaticity [3]. During spontaneous beats, electrical impulses were retrogradely activated, propagating from the ventricles to the atria [3]. Whole-cell patch clamping recordings confirmed that Tbx5-deficient VCS cells displayed action potentials resembling pacemaker cells, characterized by slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization [3]. In contrast, our current study on VCS-specific Tbx3:Tbx5 double knockout demonstrates a loss of the VCS-specific fast conduction propagation. Optical mapping demonstrated the absence of the initial upstroke corresponding to VCS activation in the His bundle region, indicating a shift in the VCS cells toward a ventricular working myocardium state. This loss of fast conduction properties highlights a fundamental distinction between single and double knockouts, suggesting that both Tbx3 and Tbx5 are required to maintain VCS identity and function.

      (1) D. E. Arnolds et al., “TBX5 drives Scn5a expression to regulate cardiac conduction system function,” J. Clin. Invest., vol. 122, no. 7, pp. 2509–2518, Jul. 2012, doi: 10.1172/JCI62617.

      (2) Moskowitz, I.P., Pizard, A., Patel, V.V., Bruneau, B.G., Kim, J.B., Kupershmidt, S., Roden, D., Berul, C.I., Seidman, C.E., Seidman, J.G. (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131(16):4107-4116. 

      (3) Burnicka-Turek, O., Broman, M.T., Steimle, J.D., Boukens, B.J., Peterenko, N.B, Ikegami, K., Nadadur, R.D., Qiao, Y., Arnolds, D.E., Yang, X.H., Patel, V.V., Nobrega, M.A., Efimov, I.R., Moskowitz, I.P. (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circ Res. 127(3):e94-e106. 

      Methods:

      (1) Additional methods on FACS are required. The methods section references a paper from 2004 (reference 67) that describes the flow sorting of embryonic cardiomyocytes. However, flow cytometric isolation of intact adult cardiomyocytes, which the authors describe in the present work, is a distinct technique and generally requires special equipment. These need to be described in more detail to be fully replicable.

      We thank Reviewer #2 for highlighting the need to provide additional details regarding our flow cytometric isolation of adult VCS cardiomyocytes. While we referenced earlier methods, we agree that isolating adult cardiomyocytes requires specialized approaches. Therefore, we revised the Methods section to include a detailed description of the equipment, procedures, and adaptations specific to isolating intact adult VCS cells to ensure full replicability.

      Minor Corrections:

      (1) Figure 1D. Please add a statistical test for mortality between the double conditional KO and the Tbx5 conditional KO.

      We have revised Figure 1D to include the statistical test comparing mortality between the Tbx3:Tbx5 double conditional knockout and the Tbx5 conditional knockout cohorts.

      (2) Figure 2A, 2I, 3A: Please include all individual data points not just a bar graph with error bars.

      We have added all individual data points to the bar graphs as recommended, enhancing the transparency and clarity of the data presentation.

      (3) Figure 2A: Please consider separate graphs for PR and QRS with appropriately scaled Y-axis so differences are easier to see.

      We appreciate Reviewer #2’s suggestion and fully agree with it. As a result, we have revised Figure 2A to include separate graphs for PR and QRS intervals, each with appropriately scaled Y-axes. This adjustment enhanced both the readability and the clarity of the observed differences.

      (4) Figure 3 G-K: The figure would be easier to interpret for the reader if genotypes were shown in the figure not just in the legend.

      We agree with Reviewer #2’s suggestion and have revised Figure 3 accordingly by adding genotype labels directly to the histological sections in Panels G-K. This update improves clarity, making the data easier for readers to interpret without needing to refer to the figure legend.

      (5) Figure 4A, C: Are vertical axes mislabeled? They say, "CON VCS and TBX5OE VCS". Please double-check axis labels and data on the graph.

      We appreciate the Reviewer bringing the mislabeling of the vertical axis in Figure 4 to our attention. We have corrected the labeling errors and ensured consistency between the graph and the underlying data.

      (6) Legend to Supplementary Figure 6. Says "Tbx3:Tbx3" instead of "Tbx3:Tbx5".

      We thank Reviewer #2 for pointing out the typo. It has been corrected to: “Supplementary Figure 6. Tbx3:Tbx5 double-conditional knockout mice exhibit QRS prolongation”.

      (7) Discussion. The authors write, "In Tbx3:Tbx5 double VCS knockout, we observed repression of fast VCS markers and also repression of Pan-CCS markers transcribed throughout the entire CCS." The term 'repression' has a specific connotation with transcription regulators that is likely not intended in this context so perhaps 'reduced expression' would be better here?

      We agree with Reviewer #2 and have replaced “repression” with “reduced expression” throughout the text (look below for references).

      “In the Tbx3:Tbx5 double VCS knockout, we observed a reduction in the expression of both fast VCS markers and Pan-CCS markers transcribed throughout the entire CCS.”

      (8) Discussion, the authors write, "This study combined with prior literature (1, 7, 11, 15, 26, 53, 54) indicates that the presence of both Tbx3 and Tbx5 is necessary for the specification of the adult VCS (Figure 7)." Since this work presents data from an adult conditional deletion, it's not clear how it informs our understanding of the specification, which occurs during development. Perhaps "maintenance of VCS fate" would be more appropriate here?

      We agree with Reviewer #2 that the term “maintenance of VCS fate” is more appropriate in the context of our study. Accordingly, we have updated the text to reflect this terminology.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 2B: It is hard to see the IF images. What is the cardiac structure studied? Maybe a dashed line and a label to define the region and the structure represented will help. As the authors have described that the crosses used contain a reporter allele (R26-EYFP), a clearer way to show these results would be to include images of the linage traced cells with the reporter, not only to identify the CCS structure analyzed, but also to demonstrate that the deletion is specific to the MinK-creERT expression in the CCS.

      We appreciate the Reviewer’s suggestion to improve the clarity of Figure 2B by delineating the cardiac structures analyzed. In response, we have added dashed lines and labels to highlight the regions of interest within the IF images. Unfortunately, we were unable to capture high-quality EYFP fluorescence images for these sections. However, to address this concern, we microdissected the region shown in the IF images and performed FACS to isolate EYFP-positive cells from this specific area. These sorted cells were subsequently used for qPCR analysis, which confirmed the presence of Tbx3 and Tbx5 in control samples and the successful deletion of both genes in the doubleconditional knockout samples (Figure 2C, middle panel). We believe this approach provides robust evidence for the specificity of the MinK-CreERT expression in the CCS and the efficiency of gene deletion in the targeted region.

      (2) 3G-K: The authors describe the absence of morphological defects in the tissue sections of adult hearts from the different genotypes analyzed. Although this reviewer agrees that there seem to be no major defects in the general cardiac morphology of these animals, the higher magnification images suggest some tissue differences at the level of the AVN especially in the double HET, double HOMO, and the Tbx3 HOMO. Is that due to the section plane used? If so, more appropriate and comparable sections must be provided. Again, as the crosses used by the authors contain a reporter allele (R26-EYFP), it is required that the authors show that the CCS cells, where deletions are induced, are still present in equivalent areas in the mutants and that they remain in similar numbers only failing to maintain their specification into CCS due to Tbx3 and Tbx5 loss of function.

      This analysis will reinforce the authors' claims on the role of Tbx5/Tbx3 in this process.

      We thank the reviewer for their thorough assessment and thoughtful feedback on our histological analysis. The higher magnification images in Figure 3G-K do not specifically present the AVN. These sections primarily represent areas of the ventricular conduction system (VCS), particularly the His bundle and bundle branches, rather than the AVN itself. We do not believe that the observed morphological differences are related to AVN tissue, and there were no functional deficits attributable to the AVN in the double knockout. Furthermore, the Mink-Cre allele used in this study does not recombine in the ANV proper.   We agree that confirming the presence of CCS cells in equivalent regions across different genotypes is crucial. Our approach using FACS-based isolation of EYFP-positive cells from the VCS, followed by qPCR analysis, provides evidence that these cells remain present in double conditional knockouts, although they fail to maintain their specialized gene expression profile. This reinforces our conclusion that Tbx3 and Tbx5 are essential for maintaining the molecular identity of CCS cells, rather than their physical presence.

      (3) Figure 4: The authors performed molecular analysis by qPCR and WB in Tbx5/Tbx3 double mutants to demonstrate that CCS cells lose the expression of CCS genes and express working myocardium genes. Could this be further demonstrated by ISH, HCR, or IF together with lineage tracing to provide evidence that these changes are located where the CCS tissues are in the control embryos? Analysis of 2 or 3 of these markers of each type on tissue sections would be enough.

      We thank the Reviewer for their insightful suggestion regarding additional validation of our molecular findings through ISH, HCR, or IF combined with lineage tracing. However, we would like to clarify that the molecular analyses we performed by qPCR and WB were conducted on EYFP-positive cells that were specifically isolated from the ventricular conduction system (VCS) region of both control and double conditional knockout (dCKO) mice. These EYFP-positive cells were obtained through fluorescence-activated cell sorting (FACS), ensuring that our analyses were confined to the targeted VCS population. Alternate approaches are appropriate for future studies to investigate the precise genomic and molecular nature of the transformation observed in the double knockout.

      (4) Discussion: in the discussion section the authors conclude that the combined role of Tbx5/Tbx3 is critical for the specification of the adult VCS. However, as the Tbx5/Tbx3 loss of function conditions are only induced in adult animals 6 weeks old, would it be more appropriate that their function is the maintenance of the VCS cell fate and that if not present these cells return to the working myocardium fate? If the authors believe that these genes are involved in the induction of VCS specification in adults, then they need to demonstrate that, before the loss of function induction at 6 weeks, these cells are not yet specified as adult VCS.

      We appreciate the Reviewer’s clarification regarding terminology. We agree that our study focuses on adult-specific conditional deletion and thus reflects the maintenance, rather than the specification, of VCS cell fate. Accordingly, we have revised the text to explicitly state that Tbx3 and Tbx5 are critical for maintaining VCS identity in adult mice, and that their loss leads to a shift toward a working myocardial fate.

      Minor:

      (1) There is no consistency in the way the quantitative data is shown in graphs. There are some graphs showing only bars, other dot plots, and other a combination of both. The authors must homogenise the representation of quantitative data showing the different data points in dot plots and not in bar graphs.

      We have standardized the quantitative data presentation across all figures, by including individual data points in bar graphs, ensuring enhanced transparency and clarity.

      (2) Figure 3: The labels defining the genotypes corresponding to the different histological sections of adult hearts (Panels G-K) are missing. Panels J and K are not referenced in the text.

      We thank Reviewer #3 for highlighting these omissions. We have added the genotype labels to the histological sections in Panels G-K of Figure 3 to ensure clarity. Furthermore, we have now referenced Panels J and K in the results and in the supplementary material (please look below for references).

      “Histological examination of all four-chambers demonstrated no discernible differences between VCS-specific Tbx3:Tbx5 double-knockout (Tbx3<sup>fl/fl</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) and control (Tbx3<sup>+/+</sup>;Tbx5<sup>+/+</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) mice, nor between . the double-knockout (Tbx3<sup>fl/fl</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) and single-knockout models for either Tbx3 (Tbx3<sup>fl/fl</sup>;Tbx5<sup>+/+</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) or Tbx5 (Tbx3<sup>+/+</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>).Ventricular muscle appeared normal without hypertrophy or myofibrillar disarray and no fibrosis was present (Figure 3G, 3I, 3J, and 3K, respectively).”

      “Additionally, we confirmed the absence of histological and structural abnormalities in these mice, aligning with previous findings (Figures 3A, 3F versus 3B, and 3K versus 3G, respectively)(1, 11).”

      (3) Typo: Supplementary Figure 6. Tbx3:Tbx3 double-conditional knockout: it should say Tbx5:Tbx3 double-conditional knockout.

      We thank Reviewer #3 for pointing out the typo. It has been corrected to: “Supplementary Figure 6. Tbx3:Tbx5 double-conditional knockout mice exhibit QRS prolongation”.

    1. eLife Assessment

      This manuscript makes important contributions to our understanding of cell polarization dynamics by demonstrating how compensatory regulatory and spatial mechanisms enhance the robustness of polarization patterns. By integrating a computational pipeline with comparisons to experimental data, the authors provide convincing evidence that stability and asymmetry in reaction-diffusion networks are crucial for polarization in C. elegans zygotes. Their findings offer novel insights into essential biological processes such as cell migration, division, and symmetry breaking. Future theoretical and experimental work could refine the model by addressing its acknowledged limitations.

    2. Joint Public Review:

      In this manuscript, the authors aim to evaluate the robustness of stable asymmetric polarization patterns by analyzing both a minimal 2-node network and a more biologically realistic 5-node network based on the C. elegans polarization system. They introduce a computational pipeline for systematically exploring reaction-diffusion network dynamics. Their study highlights the limitations of the widely used 2-node antagonistic network, demonstrating its susceptibility to simple modifications that disrupt polarization. However, they show that polarization stability can be restored by combining multiple regulatory mechanisms, and that spatially varying kinetic parameters can fine-tune the interface position. The authors further investigate the 5-node network of C. elegans, identifying key parameters that enhance its robustness against perturbations. Their findings provide novel insights into the mechanisms that ensure stable polarization in biological systems.

      The major strengths of this work lie in its rigorous computational approach and the clarity of its findings. The authors demonstrate that the widely used 2-node antagonistic network is highly sensitive to parameter changes, requiring precise fine-tuning to maintain stable polarization. However, they show that stability can be restored through compensatory modifications, which expand the range of parameter sets supporting polarization. By further exploring spatial parameter variations, the authors reveal how compensatory adjustments can stabilize polarization patterns, offering insights into potential biological mechanisms regulating interface localization.

      Extending their analysis to the C. elegans polarization network, the authors construct a 5-node model grounded in an extensive literature review. Their computational pipeline identifies key parameters that enhance robustness, and their model successfully replicates experimental observations, even in mutant conditions. Notably, among 34 possible network structures, only the naturally evolved 5-node network with mutual inhibition between specific components maintains stable polarization, highlighting its evolutionary optimization. This work significantly advances our understanding of polarization maintenance and provides a valuable framework for future in silico experiments.

      Despite its strengths, the study has some limitations related to simplifying assumptions. The model neglects cortical flows and the role of actomyosin dynamics, which are known to be crucial during the establishment phase of polarization in the C. elegans zygote. While the authors focus on the maintenance phase, the absence of these biomechanical effects may limit the model's applicability to the full polarization process. Additionally, the assumption of infinitely fast cytoplasmic diffusion disregards potential effects of cytoplasmic flows on the stability of molecular distributions. Experimental measurements suggest that cytoplasmic diffusion coefficients are only an order of magnitude higher than membrane diffusion coefficients, meaning that finite diffusion combined with cytoplasmic flows could influence polarization stability. Although the authors acknowledge and discuss these limitations, incorporating these effects in future models could provide a more complete picture of the polarization dynamics in C. elegans embryos.

    3. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This manuscript makes valuable contributions to our understanding of cell polarisation dynamics and its underlying mechanisms. Through the development of a computational pipeline, the authors provide solid evidence that compensatory actions, whether regulatory or spatial, are essential for the robustness of the polarisation pattern. However, a more comprehensive validation against experimental data and a proper estimation of model parameters are required for further characterization and predictions in natural systems, such as the C. elegans embryo.

      We sincerely thank the editor(s) for their pertinent assessment. We have carefully considered the constructive recommendations and made the necessary revisions in the manuscript, which are also detailed in this response letter. We have implemented most of the revisions requested by the reviewers. For the few requests we did not fully accept, we have provided justifications. The corresponding revisions in both the Manuscript and Supplementary Information are highlighted with a yellow background. To provide a more comprehensive validation against experimental data and model parameters used for characterizing and predicting natural systems, we reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These results effectively demonstrate how comprehensively the network structure and parameters capture the characteristics of the C. elegans embryo. We have also acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements.

      Joint Public Review:

      The polarisation phenomenon describes how proteins within a signalling network segregate into different spatial domains. This phenomenon holds fundamental importance in biology, contributing to various cellular processes such as cell migration, cell division, and symmetry breaking in embryonic morphogenesis. In this manuscript, the authors assess the robustness of stable asymmetric patterns using both a previously proposed minimal model of a 2-node network and a more realistic 5-node network based on the C. elegans cell polarisation network, which exhibits anterior-posterior asymmetry. They introduce a computational pipeline for numerically exploring the dynamics of a given reaction-diffusion network and evaluate the stability of a polarisation pattern. Typically, the establishment of polarisation requires the mutual inhibition of two groups of proteins, forming a 2-node antagonistic network. Through a reaction-diffusion formulation, the authors initially demonstrate that the widely-used 2-node antagonistic network for creating polarised patterns fails to maintain the polarised pattern in the face of simple modifications. However, the collapsed polarisation can be restored by combining two or more opposing regulations. The position of the interface can be adjusted with spatially varied kinetic parameters. Furthermore, the authors show that the 5-node network utilised by C. elegans is the most stable for maintaining polarisation against parameter changes, identifying key parameters that impact the position of the interface.

      We sincerely thank the editor(s) for the pertinent summary!

      While the results offer novel and insightful perspectives on the network's robustness for cell polarisation, the manuscript lacks comprehensive validation against experimental data, justified node-node network interactions, and proper estimation of model parameters (based on quantitative measurements or molecular intensity distributions). These limitations significantly restrict the utility of the model in making meaningful predictions or advancing our understanding of cell polarisation and pattern formation in natural systems, such as the C. elegans embryo.

      We sincerely thank the editor(s) for the comment!

      To provide a more comprehensive validation against experimental data and model parameters, we reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These meaningful predictions effectively demonstrate the utility of our model’s network structure and parameters in advancing our understanding of cell polarisation and pattern formation in natural systems, exemplified by the C. elegans embryo.

      We have also acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “node-node network interactions” and the “proper estimation of model parameters (based on quantitative measurements or molecular intensity distributions)”, both of which rely on experimental measurements of biological information.   However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agreed that such information is essential for establishing a more utilizable model and discussed it thoroughly in 3. Discussion and conclusion. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions in the C. elegans embryo.

      The study extends its significance by examining how cells maintain pattern stability amid spatial parameter variations, which are common in natural systems due to extracellular and intracellular fluctuations. The authors found that in the 2-node network, varying individual parameters spatially disrupt the pattern, but stability is restored with compensatory variations. Additionally, the polarisation interface stabilises around the step transition between parameter values, making its localisation tunable. This suggests a potential biological mechanism where localisation might be regulated through signalling perception.

      We sincerely thank the editor(s) for the pertinent review!

      Focusing on the C. elegans cell polarisation network, the authors propose a 5-node network based on an exhaustive literature review, summarised in a supplementary table. Using their computational pipeline, they identify several parameter sets capable of achieving stable polarisation and claim that their model replicates experimental behaviour, even when simulating mutants. They also found that among 34 possible network structures, the wild-type network with mutual inhibition is the only one that proves viable in the computational pipeline. Compared with previous studies, which typically considered only 2- or 3-node networks, this analysis provides a more complete and realistic picture of the signalling network behind polarisation in the C. elegans embryo. In particular, the model for C. elegans cell polarisation paves the way for further in silico experiments to investigate the role of the network structure over the polarisation dynamics. The authors suggest that the natural 5-node network of C. elegans is optimised for maintaining cell polarisation, demonstrating the elegance of evolution in finding the optimal network structure to achieve certain functions.

      We sincerely thank the editor(s) for the pertinent review!

      Noteworthy limitations are also found in this work. To simplify the model for numerical exploration, the authors assume several reactions have equivalent dynamics, reducing the parameter space to three independent dimensions. While the authors briefly acknowledge this limitation in the "Discussion and Conclusion" section, further analysis might be required to understand the implications. For instance, it is not clear how the results depend on the particular choice of parameters. The authors showed that adding additional regulation might disrupt the polarised pattern, with the conclusion apparently depending on the strength of the regulation. Even for the 5-node wild-type network, which is the most robust, adding a strong enough self-activation of [A], as done in the 2-node network, will probably cause the polarised pattern to collapse as well.

      We sincerely thank the editor(s) for the comment!

      Now we have thoroughly expanded our acknowledgment of the model’s limitations in in 2. Results and 3. Discussion and conclusion. To rule out the equivalent dynamics assumption undermines our conclusions, we have added simulations showing that the cell polarization pattern stability does not depend on the exact strength of each regulation, provided the regulations on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of various parameter values ( i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>) for all nodes and regulations in simple 2-node network and C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.

      Additionally, the authors utilise parameter values that are unrealistic, fail to provide units for some of them, and assume unknown parameter values without justification. The model appears to have non-dimensionalised length but not time, resulting in a mix of dimensional and non-dimensional variables that can be confusing. Furthermore, they assume equal values for Hill coefficients and many parameters associated with activation and inhibition pathways, while setting inhibition intensity parameters to 1. These arbitrary choices raise concerns about the fidelity of the proposed model in representing the real system, as their selected values could potentially differ by many orders of magnitude from the actual parameters.

      We sincerely thank the editor(s) for the comment!

      We apologize for the confusion. The non-dimensionalised parameter values are adopted from previous theoretical research [Seirin-Lee et al., Cells, 2020], which originates from the experimental measurement in [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011]. With the in silico time set as 2 sec per step, now we have added the Supplemental Text justifying how the units are removed during non-dimensionalization. This demonstrates that the derived non-dimensionalized parameter in this paper achieves realistic values on the same order of magnitude as those observed in reality, confirming the fidelity of the proposed model in representing the real system.

      The assumption of “equal values for Hill coefficients and many parameters associated with activation and inhibition pathways” is to reduce the parameter space for affordable computational cost. It is a widely-used strategy to fix Hill coefficients [Seirin-Lee et al., J. Theor. Biol., 2015; Seirin-Lee, Bull. Math. Biol., 2021] and unify parameter values for different pathways in network research about both cell polarization [Marée et al., Bull. Math. Biol., 2006; Goehring et al., Science, 2011; Trong et al., New J. Phys., 2014] and other biological topics (e.g., plasmid transferring in the microbial community [Wang et al., Nat. Commun., 2020]), to control computational cost. Nevertheless, to rule out that the equivalent dynamics assumption undermines our conclusions, we have added simulations showing that the cell polarization pattern stability does not depend on the exact parameter values associated with activation and inhibition pathways, provided the regulations on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of various parameter values (i.e_., _γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>) for all nodes and regulations in simple 2-node network and C. elegans 5-node network, to achieve pattern stability. Under these conditions ( i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.

      To confirm the fidelity of the proposed model in representing the real system, we reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These results effectively demonstrate how comprehensively the network structure and parameters capture the characteristics of the C. elegans embryo. We have also acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements.

      It is worth noting that, although a strict match between numerical and realistic parameter values with consistent units is always helpful, a lot of notable pure numerical studies successfully unveil principles that help interpret [Ma et al., Cell, 2009] and synthesize real biological systems [Chau et al., Cell, 2012]. These studies suggest that numerical analysis in biological systems remains powerful, even when comprehensive experimental data from prior research are not fully available.

      The definition of stability and its evaluation in the proposed pipeline might also be too narrow. Throughout the paper, the authors discuss the stability of the polarised pattern, checked by an exhaustive search of the parameter space where the system reaches a steady state with a polarised pattern instead of a homogeneous pattern. It is not clear if the stability is related to the linear stability analysis of the reaction terms, as conducted in Goehring et al. (Science, 2011), which could indicate if a homogeneous state exists and whether it is stable or unstable. The stability test is performed through a pipeline procedure where they always start from a polarised pattern described by their model and observe how it evolves over time. It is unclear if the conclusions depend on the chosen initial conditions. Particularly, it is unclear what would happen if the initial distribution of posterior molecules is not exactly symmetric with respect to the anterior molecules, or if the initial polarisation is not strong.

      We sincerely thank the editor(s) for the comment!

      The definition of stability and its evaluation in the proposed pipeline consider two criteria: 1. The pattern is polarized; 2. The pattern is stable. Following simulations, figures, and videos (Fig. 1-6; Fig. S1-S5; Fig. S7-S9; Movie S1-S5) have sufficiently demonstrated that the parameters and networks set up capture the cell polarization dynamis regarding both the stable and unstable states very well.

      Now we have added new simulation on alternative initial conditions. They demonstrating the necessity of a polarized initial pattern set up independently of the reaction-diffusion network during the establishment phase, probably through additional mechanisms such as the active actomyosin contractility and flow [Cuenca et al., Development, 2003; Gross et al., Nat. Phys., 2019]. Our conclusions ( i.e., single-sided self-regulation, single-sided additional regulation, and unequal system parameters cause the stable polarized pattern to collapse) have little dependence on the chosen initial conditions as long as the unsymmetric initial patterns can set up a stable polarized pattern. A part of the simulations institutively show our conclusions still hold if the initial distribution of posterior molecules is not exactly symmetric with respect to the anterior molecules, or if the initial polarisation is not strong (Fig. S4 and Fig. S9).

      Regarding the biological interpretation and relevance of the model, it overlooks some important aspects of the C. elegans polarisation system. The authors focus solely on a reaction-diffusion formulation to reproduce the polarisation pattern. However, the polarisation of the C. elegans zygote consists of two distinct phases: establishment and maintenance, with actomyosin dynamics playing a crucial role in both phases (see Munro et al., Dev Cell 2004; Shivas & Skop, MBoC 2012; Liu et al., Dev Biol 2010; Wang et al., Nat Cell Biol 2017). Both myosin and actin are crucial to maintaining the localisation of PAR proteins during cell polarisation, yet the authors neglect cortical flows during the establishment phase and any effects driven by myosin and actin in their model, failing to capture the system's complexity. How this affects the proposed model and conclusions about the establishment of the polarisation pattern needs careful discussion. Additionally, they assume that diffusion in the cytoplasm is infinitely fast and that cytoplasmic flows do not play any role in cell polarity. Finite cytoplasmic diffusion combined with cytoplasmic flows could compromise the stability of the anterior-posterior molecular distributions. The authors claim that cytoplasmic diffusion coefficients are two orders of magnitude higher than membrane diffusion coefficients, but they seem to differ by only one order of magnitude (Petrášek et al., Biophys. J. 2008). The strength of cytoplasmic flows has been quantified by a few studies, including Cheeks et al., and Curr Biol 2004.

      We sincerely thank the editor(s) for the comment!

      Indeed, previous research highlighted the importance of convective cortical flow in orchestrating the localisation of PAR proteins during the establishment phase of polarisation formation [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Beatty et al., Development, 2013]. However, during the maintenance phase, the non-muscle myosin II (NMY-2) is regulated downstream by the PAR protein network rather than serving as the primary upstream factor controlling PAR protein localization [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Beatty et al., Development, 2013]. While some theoretical studies integrated both reaction-diffusion dynamics and the effects of myosin and actin [Tostevin, 2008; Goehring, Science, 2011], others focused exclusively on reaction-diffusion dynamics [Dawes et al., Biophys. J., 2011; Seirin-Lee et al., Cells, 2020]. We have now clarified the distinction between the establishment and maintenance phases in 1. Introduction, emphasized our research focus on the reaction-diffusion dynamics during the maintenance phase in 2. Results, and provided a discussion of the omitted actomyosin dynamics to foster a more comprehensive understanding in the future in 3. Discussion and conclusion. The effect of the establishment phase is studied as the initial condition for the cell polarization simulation solely governed by reaction-diffusion dynamics, with new simulations demonstrating the necessity of a polarized initial pattern set up independently of the reaction-diffusion network during the establishment phase, probably through additional mechanisms such as the active actomyosin contractility and flow [Cuenca et al., Development, 2003; Gross et al., Nat. Phys., 2019].

      Cytoplasmic and membrane diffusion coefficients differ by two orders of magnitude according to previous experimental measurements on PAR-2 and PAR-6 [Goehring et al., J. Cell Biol., 2011; Lim et al., Cell Rep., 2021]. Many previous C. elegans cell polarization models have incorporated mass-conservation model combined with finite cytoplasmic diffusion, but this model description can lead to reverse spatial concentration distribution between the cell membrane and cytosol [Fig. 3 of Seirin-Lee et al., J. Theor. Biol., 2016; Fig. 2ab of Seirin-Lee et al., J. Math. Biol., 2020], disobeying experimental observation [Fig. 4A of Sailer et al., Dev. Cell, 2015; Fig. 1A of Lim et al., Cell Rep., 2021]. This implies that the infinite cytoplasmic diffusion, without precise experiment-based parameter assignment or accounting for other hidden biological processes ( e.g., protein production and degradation), may be inappropriate in modeling the real spatial concentration distributions distinguished between the cell membrane and cytosol. To address this issue, some theoretical research incorporated protein production and degradation into their model, to acquire the consistent spatial concentration distribution between the cell membrane and cytosol [Tostevin et al., Biophys. J., 2008]. More definitive experimental data on the spatiotemporal changes in protein diffusion, production, and degradation are essential for providing a more realistic representation of cellular dynamics and enhancing the model's predictive power.

      Now we have acknowledged the possibly overlooked aspects of the C. elegans polarisation system in 3. Discussion and conclusion, a detailed outline of potential model improvements. Those aspects include, but are not limited to, issues involving “neglect cortical flows” and the “diffusion in the cytoplasm is infinitely fast”. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness. The meaningful predictions of five experimental groups and eight perturbed conditions in the C. elegans embryo faithfully supports the biological interpretation and relevance of the model.

      Although the authors compare their model predictions to experimental observations, particularly in reproducing mutant behaviours, they do not explicitly show or discuss these comparisons in detail. Diffusion coefficients and off-rates for some PAR proteins have been measured (Goehring et al., JCB 2011), but the authors seem to use parameter values that differ by many orders of magnitude, perhaps due to applied scaling. To ensure meaningful predictions, whether their proposed model captures the extensive published data should be evaluated. Various cellular/genetic perturbations have been studied to understand their effects on anterior-posterior boundary positioning. Testing these perturbations' responses in the model would be important. For example, comparing the intensity distribution of PAR-6 and PAR-2 with measurements during the maintenance phase by Goehring et al., JCB 2011, or comparing the normalised intensity of PAR-3 and PKC-3 from the model with those measured by Wang et al., Nat Cell Biol 2017, during establishment and maintenance phases (in both wild-type and cdc-42 (RNAi) zygotes) could provide insightful validation. Additionally, in the presence of active CDC-42, it has been observed that PAR-6 extends further into the posterior side (Aceto et al., Dev Biol 2006). Conducting such validation tests is essential to convince readers that the model accurately represents the actual system and provides insights into pattern formation during cell polarisation.

      We sincerely thank the editor(s) for the comment!

      To provide more comprehensive validations and refinements to ensure the model accurately represents biological systems, we extensively reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total from published data, comprising eight perturbed conditions and using wild-type as the reference. We have also explicitly show the comparison between model predictions and experimental observations (including the mutant behaviors reproduction as well) in detail, by describing how “cell polarization pattern characteristics in simulation” responds to various cellular/genetic perturbations (Section 2.5; Fig. 5; Fig. S7 and S8). The original and new validation tests conducted can convince readers that the model accurately represents the actual system and provides insights into pattern formation during cell polarisation.

      The diffusion coefficients for anterior and posterior molecular species were assigned according to previous experimental and theoretical research [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020]. The off-rates are assigned uniformly by searching viable parameter sets that can set up a network with cell polarization pattern stability. Now we have added simulations showing that the cell polarization pattern stability and response to network structure and parameter perturbation does not depend on the exact parameter values (incl., diffusion coefficients and off-rates), provided the parameter values on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of various parameter values ( i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>) for all nodes and regulations in simple 2-node network and C. elegans 5-node network, to achieve pattern stability. Under these conditions ( i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.

      With the in silico time set as 2 sec per step, now we have added the Supplemental Text justifying how the units are removed during non-dimensionalization. This demonstrates that the derived non-dimensionalized parameter in this paper achieves realistic values on the same order of magnitude as those observed in reality, confirming the fidelity of the proposed model in representing the real system. We agreed that full experimental measurements of biological information are essential for establishing a more utilizable model and discussed it thoroughly in 3. Discussion and conclusion.

      A clear justification, with references, for each network interaction between nodes in the five-node model is needed. Some of the activatory/inhibitory signals proposed by the authors have not been demonstrated ( e.g. CDC-42 directly inhibiting CHIN-1). Table S2 provided by the authors is insufficient to justify each node-node interaction, requiring additional explanations. (See the review by Gubieda et al., Phil. Trans. R. Soc. B 2020, for a similar node network that differs from the authors' model.) Additionally, the intensity distributions of cortical PAR-3 and PKC-3 seem to vary significantly during both establishment and maintenance phases (Wang et al., Nat Cell Biol 2017), yet the authors consider the PAR-3/PAR-6/PKC-3 as a single complex. The choices in the model should be justified, as the presence or absence of clustering of these PAR proteins can be crucial during cell polarisation (Wang et al., Nat Cell Biol 2017; Dawes & Munro, Biophys J 2011).

      We sincerely thank the editor(s) for the comment!

      Now we have acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “each network interaction between nodes” and the “consider the PAR-3/PAR-6/PKC-3 as a single complex”, in which the former one relies on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agreed that such information is essential for establishing a more utilizable model and discussed it thoroughly in 3. Discussion and conclusion.

      In consistent with previous modeling efforts [Goehring et al., Science, 2011; Gross et al., Nat. Phys., 2019; Lim et al., Cell Rep., 2021], our model treats the PAR-3/PAR-6/PKC-3 complex as a single entity for simplification, thus neglecting the potentially distinct spatial distributions of each single molecular species. We agree that a more comprehensive model, capable of resolving the individual localization patterns of these anterior PAR proteins, would be a valuable future direction. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions in the C. elegans embryo.

      In summary, the authors successfully demonstrate the importance of compensatory actions in maintaining polarisation robustness. Their computational pipeline offers valuable insights into the dynamics of reaction-diffusion networks. However, the lack of detailed experimental validation and realistic parameter estimation limits the model's applicability to real biological systems. While the study provides a solid foundation, further work is needed to fully characterise and validate the model in natural contexts. This work has the potential to significantly impact the field by providing a new perspective on the robustness of cell polarisation networks.

      We sincerely thank the editor(s) for the pertinent summary!

      To provide a more comprehensive validation against experimental data and model parameters, three more groups of the qualitative and semi-quantitative phenomenon regarding CDC-42 are reproduced based on previously published experiments (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total, comprising eight perturbed conditions and using wild-type as the reference.

      With the in silico time set as 2 sec per step, now we have added the Supplemental Text justifying how the units are removed during non-dimensionalization. This demonstrates that the derived non-dimensionalized parameter in this paper achieves realistic values on the same order of magnitude as those observed in reality, confirming the fidelity of the proposed model in representing the real system. Together with the reproduction of five experimental groups (eight perturbed conditions with wild-type as the reference), the model’s applicability to real biological systems in natural contexts are are fully characterised and validated.

      The computational pipeline developed could be a valuable tool for further in silico experiments, allowing researchers to explore the dynamics of more complex networks. To maximise its utility, the model needs comprehensive validation and refinement to ensure it accurately represents biological systems. Addressing these limitations, particularly the need for more detailed experimental validation and realistic parameter choices, will enhance the model's predictive power and its applicability to understanding cell polarisation in natural systems.

      We sincerely thank the editor(s) for the comment!

      To provide more comprehensive validations and refinements to ensure the model accurately represents biological systems, we extensively reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total from published data, comprising eight perturbed conditions and using wild-type as the reference. We have also explicitly show the comparison between model predictions and experimental observations (including the mutant behaviors reproduction as well) in detail, by describing how “cell polarization pattern characteristics in simulation” responds to various cellular/genetic perturbations (Section 2.5; Fig. 5; Fig. S7 and S8).

      With the in silico time set as 2 sec per step, now we have added the Supplemental Text justifying how the units are removed during non-dimensionalization. This demonstrates that the derived non-dimensionalized parameter in this paper achieves realistic values on the same order of magnitude as those observed in reality, confirming the fidelity of the proposed model in representing the real system. Together with the reproduction of five experimental groups (eight perturbed conditions with wild-type as the reference), the model's predictive power and its applicability to understanding cell polarisation in natural systems are enhanced.

      Now we have added simulations showing that the cell polarization pattern stability and response to network structure and parameter perturbation does not depend on the exact parameter values (incl., diffusion coefficients, basal off-rates and inhibition intensity), provided the parameter values on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of various parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>) for all nodes and regulations in simple 2-node network and C. elegans 5-node network, to achieve pattern stability. Under these conditions ( i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.

      Recommendations for the Authors:

      (1) Parameterisation and Model Validation: The authors utilise parameter values that lack realism and fail to provide units for some of them, which can lead to confusion. For instance, the length of the cell is set to 0.5 without clear justification, raising questions about the scale used. Additionally, there's a mix of dimensional and non-dimensional variables, potentially complicating interpretation. Furthermore, arbitrary choices such as equal Hill coefficients and setting inhibition intensity parameters to 1 raise concerns about model fidelity. To ensure meaningful predictions, the authors should validate their model against extensive published data, including cellular/genetic perturbations. For example, comparing intensity distributions of PAR proteins measured during maintenance phases by Goehring et al., JCB 2011, and those obtained from the model could provide valuable validation. Similarly, comparisons with data from Wang et al., Nat Cell Biol 2017, on wild-type and cdc-42 (RNAi) zygotes, as well as observations from Aceto et al., Dev Biol 2006, on PAR-6 extension in the presence of active CDC-42, would strengthen the model's validity. Such validation tests are essential for convincing readers that the model accurately represents the actual system and can provide insights into pattern formation during cell polarisation.

      We sincerely thank the editor(s) and referee(s) for the helpful suggestion!

      Now we have added a new section, Parameter Nondimensionalization and Order of Magtitude Consistency, into Supplemental Text. In this section, we introduced how we adopted the parameter nondimensionalization and value assignments from previous works [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020]. We listed four examples (i.e., evolution time, membrane diffusion coefficient, basal off-rate, and inhibition intensity) to show the consistency in order of magtitude between numerical and realistic values.

      The assumption of “equal Hill coefficients” is to reduce the parameter space for an affordable computational cost. It is a widely-used strategy to fix Hill coefficients [Seirin-Lee et al., J. Theor. Biol., 2015; Seirin-Lee, Bull. Math. Biol., 2021] in network research, to control computational cost. Besides, setting inhibition intensity parameters to 1 is for determining a numerical scale. Now we have added simulations showing that the cell polarization pattern stability does not depend on the exact parameter values associated with activation and inhibition pathways, provided the regulations on both sides are initially balanced as a whole (Fig. S5). Specifically, we used a Monte Carlo method to sample a wide range of various parameter values (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>) for all nodes and regulations in simple 2-node network and C. elegans 5-node network, to achieve pattern stability. Under these conditions (i.e., without any reduction in the parameter space), single-sided self-regulation, single-sided additional regulation, and unequal system parameters still cause the stable polarized pattern to collapse, consistent with our conclusions in the simplified conditions with the parameter space reduced to three independent dimensions.

      To confirm the fidelity of the proposed model in representing the real system, we reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference. These results effectively demonstrate how comprehensively the network structure and parameters capture the characteristics of the C. elegans embryo. We have also acknowledged the limitations of the current cell polarization model and provided, in 2. Results and 3. Discussion and conclusion, a detailed outline of potential model improvements.

      It is worth noting that, although a strict match between numerical and realistic parameter values with consistent units is always helpful, a lot of notable pure numerical studies successfully unveil principles that help interpret [Ma et al., Cell, 2009] and synthesize real biological systems [Chau et al., Cell, 2012]. These studies suggest that numerical analysis in biological systems remains powerful, even when comprehensive experimental data from prior research are not fully available.

      (2) Parameter Changes: It is not clear how the parameters change as more complicated networks are explored, and how this affects the comparison between the simple and complete model. Clarification on this point would be beneficial.

      We sincerely thank the editor(s) and referee(s) for the helpful suggestion!

      The computational pipeline in Section 2.1 is generalized for all reaction-diffusion networks, including the simple and complete ones studied in this paper. The parameter changes included two parts: 1. The mutual activation in the anterior (none for the simple 2-node network and q<sub2</sub> for the complete 5-node network); 2. The viable parameter sets (122 sets for the simple 2-node network and 602 sets for the complete 5-node network). Now we have explicitly clarified those differences:

      Those differences don’t affect the comparison between the simple and complete models. Now we have added comprehensive comparisons between the simple and complete models about 1. How they respond to alternative initial conditions consistently (Fig. S2). 2. How they respond to alternative single modifications consistently (Fig. S4 and S9), even when the parameters (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>) are assigned with various values concerning all nodes and regulations (Fig. S5).

      (3) Exploration of Model Parameter Space: In the two-node dual antagonistic model, the authors observe that the cell polarisation pattern is unstable for different systems (Fig. 1). However, it remains uncertain whether this instability holds true for the entire model parameter space. Have the authors thoroughly screened the full model parameter space to support their statements? It would be beneficial for the authors to provide clarification on the extent of their exploration of the model parameter space to ensure the robustness of their conclusions.

      We sincerely thank the editor(s) and referee(s) for the helpful suggestion!

      The trade-off between considered parameter space and computational cost is a long-term challenge in network study as there are always numerous combinations of network nodes, edges, and parameters [Ma et al., Cell, 2009; Chau et al., Cell, 2012]. The computational pipeline in Section 2.1 generalized for all reaction-diffusion networks exerts two strategies to limit the computational cost and set up a basic network reference: 1. Dimension Reduction (Strategy 1) - Unifying the parameter values for different nodes and different edges within the same regulatory type to minimize the unidentical parameter numbers into 3; 2: Parameter Space Confinement (Strategy 2): Enumerating the dimensionless parameter set on a three-dimensional (3D) grid confined by γ∈ [0,0.05] in steps ∆γ = 0.001, k<sub>1</sub>∈[0,5] in steps ∆k<sub>1</sub> = 0.05,  and  in steps .

      In the early stage of our project, we tried to explore “the entire model parameter space” as indicated by the reviewer. We first tried to use the Monte Carlo method to find parameter solutions in an open parameter space and with all parameter values allowed to be different. However, such a process is full of randomness and is computationally expensive (taking months to search viable parameter sets but still unable to profile the continuous viable parameter space; the probability of finding a viable parameter set is no higher than 0.02%, making it very hard to profile a continuous viable parameter space). Now we clearly can see the viable parameter space is a thin curved surface where all parameters have to satisfy a critical balance (Fig. 3a, b, Fig. 5e, f). This is why we exert a typical strategy for dimension reduction in network research in both cell polarization [Marée et al., Bull. Math. Biol., 2006; Goehring et al., Science, 2011; Trong et al., New J. Phys., 2014] and other biological topics (e.g., plasmid transferring in the microbial community [Wang et al., Nat. Commun., 2020]), i.e., unifying the parameter values for different nodes and different edges within the same regulatory type.

      Additionally, the curved surface for viable parameter space can be extended to infinite as long as the parameter balance is achieved (Fig. 3a, b, Fig. 5e, f), it is impossible or unnecessary to explore “the entire model parameter space”. Setting up a confined parameter region near the original point for parameter enumeration can help profile the continuous viable parameter space, which is sufficient for presenting the central conclusion of this paper – that is - the network structure and parameter need to satisfy a balance for stable cell polarization.

      To support a comprehensive study considering all kinds of reference and perturbed networks, we have maximized the parameter domain size by exhausting all the computational research we can access, including 400-500 Intel(R) Core(TM) E5-2670v2 and Gold 6132 CPU on the server (High-Performance Computing Platform at Peking University) and 5 Intel(R) Core(TM) i9-14900HX CPU on personal computers.

      To make it certain that instability holds true when the model parameter space is extended, we add a comprehensive comparison between the simple and complete models about how their instability occurs consistently even when the parameters (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub>) are assigned with various values concerning all nodes and regulations, searched by the Monte Carlo method (Fig. S5).

      (4) Sensitivity of Numerical Solutions to Initial Conditions: Are the numerical solutions in both models sensitive to the chosen initial condition? What results do the models provide if uniform initial distributions were utilised instead?

      We sincerely thank the editor(s) and referee(s) for the comments!

      To investigate both the simple network and the realistic network consisting of various node numbers and regulatory pathways [Goehring et al., Science, 2011; Lang et al., Development, 2017], we propose a computational pipeline for numerical exploration of the dynamics of a given reaction-diffusion network's dynamics, specifically targeting the maintenance phase of stable cell polarization after its initial establishment [Motegi et al., Nat. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020].

      Now we have added new simulations and explanations for the sensitivity of numerical solutions to initial conditions. For both models, a uniform initial distribution leads to a homogeneous pattern while a Gaussian noise distribution leads to a multipolar pattern. In contrast, an initial polarized distribution (even with shifts in transition planes, weak polarization, or asymmetric curve shapes between the two molecular species) can maintain cell polarization reliably.

      (5) Initial Conditions and Stability Tests: In Figure 1, the authors discuss the stability of the basic two-node network (a) upon modifications in (b-d). The stability test is performed through a pipeline procedure in which they always start from a polarised pattern described by Equation (4) and observe how the pattern evolves over time. It would be beneficial to explore whether the stability test depends on this specific initial condition. For instance, what would happen if the posterior molecules have an initial distribution of 1/(1+e^(-10x)), which is not exactly symmetric with respect to the anterior molecules' distribution of 1-1/(1+e^(-20x))? Additionally, if the initial polarisation is not as strong, for example, with the anterior molecules having a distribution of 10-1/(1+e^(-20x)) and the posterior molecules having a distribution of 9+1/(1+e^(-20x)), how would this affect the results?

      We sincerely thank the editor(s) and referee(s) for the constructive advice!

      Now we have added comprehensive comparisons between the simple and complete models about how they respond to alternative initial conditions consistently (Fig. S4, Fig. S9). The successful cell polarization pattern requests an initial polarized pattern, but its following stability and response to perturbation depend very little on the specific form of the initial polarized pattern. All the conditions mentioned by the reviewer have been included.

      (6) Stability Analysis: Throughout the paper, the authors discuss the stability of the polarised pattern. The stability is checked by an exhaustive search of the parameter space, ensuring the system reaches a steady state with a polarised pattern instead of a homogeneous pattern. It would be beneficial to explore if this stability is related to a linear stability analysis of the model parameters, similar to what was conducted in Reference [18], which can determine if a homogeneous state exists and whether it is stable or unstable. Including such an analysis could provide deeper insights into the system's stability and validate its robustness.

      We sincerely thank the editor(s) and referee(s) for the comments!

      We agree that the linear stability analysis can potentially offer additional insights into polarized pattern behavior. However, this approach often requests the aid of numerical solutions and is therefore not entirely independent [Goehring et al., Science, 2011]. Over the past decade, numerical simulations have consistently proven to be a reliable and sufficient approach for studying network dynamics, spanning from C. elegans cell polarization [Tostevin et al., Biophys. J, 2008; Blanchoud et al., Biophys. J, 2015; Seirin-Lee, Dev. Growth Differ., 2020] to topics in metazoon [Chau et al., Cell, 2012; Qiao et al., eLife, 2022; Sokolowski et al., arXiv, 2023]. Numerous purely numerical studies have successfully unveiled principles that help interpret [Ma et al., Cell, 2009] and synthesized real biological systems [Chau et al., Cell, 2012], independent of additional mathematical analysis. Thus, we leverage our numerical framework to address the cell polarization problems cell polarization problems in this paper.

      To confirm the reliability of stability checked by an exhaustive search of the parameter space, now we reproduce the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we reproduce five experimental groups in total (two acting on LGL-1 and three acting on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      To confirm the robustness of our conclusions regarding the system's stability, now we add comprehensive comparisons between the simple and complete models about 1. How they respond to alternative initial conditions consistently (Fig. S4; Fig. S9). 2. How they respond to alternative single modifications consistently, even when the parameters (i.e., γ, α, k<sub>1</sub>, k<sub>2</sub>, q<sub>1</sub>, q<sub>2</sub> and [X<sub>c</sub> ) are assigned with various values concerning all nodes and regulations (Fig. S5).

      (7) Interface Position Determination: In Figure 4, the authors demonstrate that by using a spatially varied parameter, the position of the interface can be tuned. Particularly, the interface is almost located at the step where the parameter has a sharp jump. However, in the case of a homogeneous parameter (e.g., Figure 4(a)), the system also reaches a stable polarised pattern with the interface located in the middle (x = 0), similar to Figure 4(b), even though the homogeneous parameter does not contain any positional information of the interface. It would be helpful to clarify the difference between Figure 4(a) and Figure 4(b) in terms of the interface position determination.

      We sincerely thank the editor(s) and referee(s) for the comments!

      The case of a homogeneous parameter (e.g., Fig. 4a), in which the system also reaches a stable polarised pattern with the interface located in the middle (x = 0), is just a reference adopted from Fig. 1a to show that the inhomogeneous positional information in Fig. 4b can achieve a similar stable polarised pattern.

      Now we clarify the interface position determination to Section 2.4 to improve readability. Moreover, it is marked with grey dashed line in all the patterns in Fig. 4 and Fig. 6 to highlight the importance of inhomogeneous parameters on interface localization.

      (8) Presented Comparison with Experimental Observations: The comparison with experimental observations lacks clarity. It isn't clear that the model "faithfully recapitulates" the experimental observations (lines 369-370). We recommend discussing and showing these comparisons more carefully, highlighting the expectations and similarities.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we remove the word “faithfully” and highlight the expectations and similarities of each experimental group by describing “cell polarization pattern characteristics in simulation: …”.

      (9) Validation of Model with Experimental Data: Given the extensive number of model parameters and the uncertainty of their values, it is essential for the authors to validate their model by comparing their results with experimental data. While C. elegans polarisation has been extensively studied, the authors have yet to utilise existing data for parameter estimation and model validation. Doing so would considerably strengthen their study.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      To utilise existing data for parameter estimation, now we add a new section, Parameter Nondimensionalization and Order of Magtitude Consistency, into Supplemental Text. In this section, we introduced how we adopted the parameter nondimensionalization and value assignments from previous works [Goehring et al., J. Cell Biol., 2011; Goehring et al., Science, 2011; Seirin-Lee et al., Cells, 2020]. We listed four examples (i.e., evolution time, membrane diffusion coefficient, basal off-rate, and inhibition intensity) to show the consistency in order of magtitude between numerical and realistic values.

      To utilise existing data for model validation, now we reproduce the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], we reproduce five experimental groups in total (two acting on LGL-1 and three acting on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      Also, we acknowledge the limitations of the current cell polarization model and provided, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “extensive number of model parameters” and “uncertainty of their values”, both of which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agreed that such information is essential for establishing a more utilizable model and discussed it thoroughly in 3. Discussion and conclusion. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions with wild-type as a reference in the C. elegans embryo.

      (10) Enhancing Model Accuracy by Considering Cortical Flows: The authors are encouraged to include cortical flows in their cell polarisation model, as these flows are known to be pivotal in the process. Although the current model successfully predicts cell polarisation without accounting for cortical flows, research has demonstrated their significant role in polarisation formation. By incorporating cortical flows, the model would provide a more thorough and precise representation of the biological process. Furthermore, previous studies, such as those by Goehring et al. (References 17 and 18), highlight the importance of convective actin flow in initiating polarisation. It would be valuable for the authors to address the contribution of convection with actin flow to the establishment of the polarisation pattern. The polarisation of the C. elegans zygote progresses through two distinct phases: establishment and maintenance, both heavily influenced by actomyosin dynamics. Works by Munro et al. (Dev Cell 2004), Shivas & Skop (MBoC 2012), Liu et al. (Dev. Biol. 2010), and Wang et al. (Nat Cell Biol 2017) underscore the critical roles of myosin and actin in orchestrating the localisation of PAR proteins during cell polarisation. To enhance the fidelity of their model, we recommend that the authors either integrate cortical flows and consider the effects driven by myosin and actin, or provide a discussion on the repercussions of omitting these dynamics.

      We sincerely thank the editor(s) and referee(s) for the comment!

      Indeed, previous research highlighted the importance of convective cortical flow in orchestrating the localisation of PAR proteins during the establishment phase of polarisation formation [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Beatty et al., Development, 2013]. However, during the maintenance phase, the non-muscle myosin II (NMY-2) is regulated downstream by the PAR protein network rather than serving as the primary upstream factor controlling PAR protein localization. While some theoretical studies integrated both reaction-diffusion dynamics and the effects of myosin and actin [Tostevin et al., Biophys J, 2008; Goehring et al, Science, 2011], others focused exclusively on reaction-diffusion dynamics [Dawes et al., Biophys. J., 2011; Seirin-Lee et al., Cells, 2020]. Now we clarify the distinction between the establishment and maintenance phases, emphasize our research focus on the reaction-diffusion dynamics during the maintenance phase, and provide a discussion of these omitted dynamics to foster a more comprehensive understanding in the future, as suggested.

      (11) Further Justification of Network Interactions: The authors should provide additional explanations, supported by empirical evidence, for the network interactions assumed in their model. This includes both node-node interactions and the rationale behind protein complex formations. Some of the proposed interactions lack empirical validation, as noted in studies such as Gubieda et al., Phil. Trans. R. Soc. B 2020. Additionally, discrepancies in protein intensity distributions, as observed in Wang et al., Nat Cell Biol 2017, should be addressed, particularly concerning the consideration of the PAR-3/PAR-6/PKC-3 complex as a single entity. Justifying these choices is crucial for ensuring the model's credibility and alignment with experimental findings.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      In consistency with previous modeling efforts [Goehring et al., Science, 2011; Gross et al., Nat. Phys., 2019; Lim et al., Cell Rep., 2021], our model treats the PAR-3/PAR-6/PKC-3 complex as a single entity for simplification, thus neglecting the potentially distinct spatial distributions of each single molecular species.

      Now we acknowledge the limitations of the current cell polarization model and provided, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “node-node interactions” and “discrepancies in protein intensity distributions”, both of which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agreed that such information is essential for establishing a more utilizable model and discussed it thoroughly in 3. Discussion and conclusion.

      To ensure the model's credibility and alignment with experimental findings, now we reproduce the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      (12) Further Justification of Node-Node Network Interactions: The authors should provide further justification for the node-node network interactions assumed in their study. To the best of our knowledge, some of the node-node interactions proposed have not yet been empirically demonstrated. Providing additional explanations for these interactions would enhance the credibility of the model and ensure its alignment with empirical evidence.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Now we acknowledge the limitations of the current cell polarization model and provided, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “node-node network interactions”, which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agreed that such information is essential for establishing a more utilizable model and discussed it thoroughly in 3. Discussion and conclusion.

      To enhance the credibility of the model and ensure its alignment with empirical evidence, we reproduced the qualitative and semi-quantitative phenomenon in three more experimental groups previously published (Section 2.5; Fig. S8) [Gotta et al., Curr. Biol., 2001; Aceto et al., Dev. Biol., 2006]. Combined with the original experiments (Section 2.5; Fig. 5; Fig. S7) [Hoege et al., Curr. Biol., 2010; Beatty et al., Development, 2010; Beatty et al., Development, 2013], now we have reproduced five experimental groups in total (two acting on LGL-1 and three on CDC-42), comprising eight perturbed conditions and using wild-type as the reference.

      (13) Justification for Network Interactions and Protein Complexes: The authors must provide clear justifications, supported by references, for each network interaction between nodes in the five-node model. Some of the activatory/inhibitory signals proposed lack empirical validation, such as CDC-42 directly inhibiting CHIN-1. The provided Table S2 is insufficient to justify these interactions, necessitating additional explanations. Reviewing relevant literature, such as the work by Gubieda et al., Phil. Trans. R. Soc. B 2020, may offer insights into similar node networks. Furthermore, the authors should address discrepancies in protein intensity distributions, as observed in studies like Wang et al., Nat Cell Biol 2017. Specifically, the authors consider the PAR-3/PAR-6/PKC-3 complex as a single entity despite potential differences in their distributions. Justification for this choice is essential, particularly considering the importance of clustering dynamics during cell polarisation, as demonstrated by Wang et al., Nat Cell Biol 2017, and Dawes & Munro, Biophys J 2011.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      In consistent with previous modeling efforts [Goehring et al., Science, 2011; Gross et al., Nat. Phys., 2019; Lim et al., Cell Rep., 2021], our model treats the PAR-3/PAR-6/PKC-3 complex as a single entity for simplification, thus neglecting the potentially distinct spatial distributions of each single molecular species. Besides, the inhibition of CHIN-1 from CDC-42, which recruits cytoplasmic PAR-6/PKC-3 to form a complex, may act indirectly to restrict CHIN-1 localization through phosphorylation [Sailer et al., Dev. Cell, 2015; Lang et al., Development, 2017].

      Now we acknowledge the limitations of the current cell polarization model and provided, in 3. Discussion and conclusion, a detailed outline of potential model improvements. The limitations include, but are not limited to, issues involving “each network interaction between nodes in the five-node model” and “discrepancies in protein intensity distributions”, both of which rely on experimental measurements of biological information. However, comprehensive experimental measurement data on every molecular species, their interactions, and each species’ intensity distribution in space and time were not fully available from prior research. Refinement is lacking for some of these interactions, potentially requiring years of additional experimentation. Moreover, for certain species at specific developmental stages, only relative (rather than absolute) intensity measurements are available. We agreed that such information is essential for establishing a more utilizable model and discussed it thoroughly in 3. Discussion and conclusion. From a theoretical perspective, we adopted assumptions from the previous literature and constructed a minimal model for a specific cell polarization phase to investigate the network's robustness, supported by five experimental groups and eight perturbed conditions with wild-type as a reference in the C. elegans embryo.

      (14) Incorporating Cytoplasmic Dynamics into the Model: The authors assume infinite cytoplasmic diffusion and neglect the role of cytoplasmic flows in cell polarity, which may oversimplify the model. Finite cytoplasmic diffusion combined with flows could potentially compromise the stability of anterior-posterior molecular distributions, affecting the accuracy of the model's predictions. The authors claim a significant difference between cytoplasmic and membrane diffusion coefficients, but the actual disparity seems smaller based on data from Petrášek et al., Biophys. J. 2008. For example, cytosolic diffusion coefficients for NMY-2 and PAR-2 differ by less than one order of magnitude. Additionally, the strength of cytoplasmic flows, as quantified by studies such as Cheeks et al., and Curr Biol 2004, should be considered when assessing the impact of cytoplasmic dynamics on polarity stability. Incorporating finite cytoplasmic diffusion and cytoplasmic flows into the model could provide a more realistic representation of cellular dynamics and enhance the model's predictive power.

      We sincerely thank the editor(s) and referee(s) for the comment!

      Cytoplasmic and membrane diffusion coefficients differ by two orders of magnitude according to previous experimental measurements on PAR-2 and PAR-6 [Goehring et al., J. Cell Biol., 2011; Lim et al., Cell Rep., 2021]. Many previous C. elegans cell polarization models have incorporated mass-conservation model combined with finite cytoplasmic diffusion, but this model description can lead to reverse spatial concentration distribution between the cell membrane and cytosol [Fig. 3 of Seirin-Lee et al., J. Theor. Biol., 2016; Fig. 2ab of Seirin-Lee et al., J. Math. Biol., 2020], disobeying experimental observation [Fig. 4A of Sailer et al., Dev. Cell, 2015; Fig. 1A of Lim et al., Cell Rep., 2021]. This implies that the infinite cytoplasmic diffusion, without precise experiment-based parameter assignment or accounting for other hidden biological processes (e.g., protein production and degradation), may be inappropriate in modeling the real spatial concentration distributions distinguished between the cell membrane and cytosol. To address this issue, some theoretical research incorporated protein production and degradation into their model, to acquire the consistent spatial concentration distribution between the cell membrane and cytosol [Tostevin et al., Biophys. J., 2008]. More definitive experimental data on the spatiotemporal changes in protein diffusion, production, and degradation are essential for providing a more realistic representation of cellular dynamics and enhancing the model's predictive power.

      Cytoplasmic flows indeed play an unneglectable role in cell polarity during the establishment phase [Kravtsova et al., Bull. Math. Biol., 2014], which creates a spatial gradient of actomyosin contractility and directs PAR-3/PKC-3/PAR-6 to the anterior membrane by cortical flow [Rose et al., WormBook, 2014; Lang et al., Development, 2017]. However, during the maintenance phase, the non-muscle myosin II (NMY-2) is regulated downstream by the PAR protein network rather than serving as the primary upstream factor controlling PAR protein localization [Goehring et al., J. Cell Biol., 2011; Rose et al., WormBook, 2014; Geβele et al., Nat. Commun., 2020]. While some theoretical studies integrated both reaction-diffusion dynamics and the effects of myosin and actin [Tostevin, 2008; Goehring, Science, 2011], others focused exclusively on reaction-diffusion dynamics [Dawes et al., Biophys. J., 2011; Seirin-Lee et al., Cells, 2020]. We now emphasize our research focus on the reaction-diffusion dynamics during the maintenance phase, so the dynamics between NMY-2 and PAR-2 are not included. We have also provided a discussion of the simplified cytoplasmic diffusion and omitted cytoplasmic flows to foster a more comprehensive understanding in the future.

      (15) Explanation of Lethality References: On page 13, the authors mention lethality without adequately explaining why they are drawing connections with lethality experimental data.

      We sincerely thank the editor(s) and referee(s) for the comment!

      It is well-known that cell polarity loss in C. elegans zygote will lead to symmetric cell division, which brings out the more symmetric allocation of molecular-to-cellular contents in daughter cells; this will result in abnormal cell size, cell cycle length, and cell fate in daughter cells, followed by embryo lethality [Beatty et al., Development, 2010; Beatty et al., Development, 2013; Rodriguez et al., Dev. Cell, 2017; Jankele et al., eLife, 2021]. Now we explain why we are drawing connections with lethality experimental data in Section 2.5.

      (16) Improved Abstract: "...However, polarity can be restored through a combination of two modifications that have opposing effects..." This sentence could be revised for better clarity. For example, the authors could consider rephrasing it as follows: "...However, polarity restoration can be achieved by combining two modifications with opposing effects...".

      We sincerely thank the editor(s) and referee(s) for helpful advice!

      Now we revise the abstract as follows:

      “Abstract – However, polarity restoration can be achieved by combining two modifications with opposing effects.”

      (17) Conservation of Mass in Network Models: Is conservation of mass satisfied in their network models?

      We sincerely thank the editor (s) and referee(s) for the comment!

      While previous experiments provide evidence for near-constant protein mass during the establishment phase [Goehring et al., Science, 2011], whether this is consistent until the end of maintenance is unclear.

      Many previous C. elegans cell polarization models have assumed mass conservation on the cell membrane and in the cell cytosol, this model description can lead to reverse spatial concentration distribution between the cell membrane and cytosol [Fig. 3 of Seirin-Lee et al., J. Theor. Biol., 2016; Fig. 2ab of Seirin-Lee et al., J. Math. Biol., 2020], disobeying experimental observation [Fig. 4A of Sailer et al., Dev. Cell, 2015; Fig. 1A of Lim et al., Cell Rep., 2021]. This implies that mass conservation may be inappropriate in modeling the real spatial concentration distributions distinguished between the cell membrane and cytosol. To address this issue, some theoretical research incorporated protein production and degradation into their model, instead of assuming mass conservation [Tostevin et al., Biophys. J., 2008]. More definitive experimental data on the spatiotemporal changes in protein mass are essential for constructing a more accurate model.

      Given the absence of a universally accepted model in agreement with experimental observation, we adopted the assumption that the concentration of molecules in the cytosol (not the total mass on the cell membrane and in the cell cytosol) is spatially inhomogeneous and temporally constant, which was also used before [Kravtsova et al., Bull. Math. Biol., 2014]. In the context of this well-mixed constant cytoplasmic concentration, our model successfully reproduced the cell polarization phenotype in wild-type and eight perturbed conditions (Section 2.5; Fig. S7; Fig. S8), supporting the validity of this simplified, yet effective, model. Now we have provided a discussion of protein mass assumption to foster a more comprehensive understanding in the future.

      (18) Comparison of Network Structures: In Figure 1c, the authors demonstrate that the symmetric two-node network is susceptible to single-sided additional regulation. They considered four subtypes of modifications, depending on whether [L] is in the anterior or posterior and whether [A] and [L] are mutually activating or inhibiting. What is the difference between the structure where [L] is in the anterior and in the posterior? Upon comparing the time evolution of the left panel ([L] is sided with

      ) and the right panel ([L] is sided with [A]), the difference is so tiny that they are almost indistinguishable. It might be beneficial for the authors to provide a clearer explanation of the differences between these network structures to aid in understanding their implications.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      The difference between the structures where [L] is in the anterior and posterior is the initial spatial concentration distribution of [L], which is polarized to have a higher concentration in the anterior and posterior respectively. The time evolution of the left panel ([L] is sided with [P]) and the right panel [L] is sided with [P]) is almost indistinguishable because the perturbation from [L] is slight (less than over one order of magnitude) compared to the predominant [A]~[P] interaction ( for [A]~[P] mutual inhibition while for [A]~[L] mutual inhibition and for [A]~[L] mutual activation), highlighting the response of cell polarization pattern. To aid the readers in understanding their implications, we have added the [L] and plotted the spatial concentration distribution of all three molecular species at t=0,100, 200, 300, 400 and 500 in Fig. S3, where the difference between the [L] ones in the left and right panels are distinguishably shown.

      (19) Figure Reference: In line 308, Fig. 4a is referenced when explaining the loss of pattern stability by modifying an individual parameter, but this is not shown in that panel. Please update the panel or adjust the reference in the main text.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Fig. 4 focuses on the regulatable shift of the zero-velocity interface by modifying a pair of individual parameters, not on the loss (or recovery) of pattern stability, which has been analyzed as a focus in Fig. 1, Fig. 2, and Fig. 3. Fig. 4a is actually from the same simulation as the one in Fig. 1a, which has spatially uniform parameters used as a reference in Fig. 4. The individual parameter modification in other subfigures of Fig. 4 shows how the zero-velocity interface is shifted in a regulatable manner always in the context of pattern stability. Now we update the panel, adjust the reference, add one more paragraph, and improve the wording to clarify how the analyses in Fig. 4 are carried out on top of the pattern stability already studied.

      (20) Viable Parameter Sets: In line 355, the number of viable parameter sets (602) is not very informative by itself. We suggest reporting the fraction or percentage of sets tested that resulted in viable results instead. This applies similarly to lines 411 and 468.

      We sincerely thank the editor(s) and referee(s) for the constructive comment!

      Now the fraction/percentage of parameter sets tested that resulted in viable results are added everywhere the number appears.

      (21) Perturbation Experiments: In lines 358-359, "the perturbation experiments" implies that those considered are the only possible ones. Please rephrase to clarify.

      We sincerely thank the editor(s) and referee(s) for the helpful advice!

      Now we rephrase three paragraphs to clarify why the perturbation experiments involved with [L] and [C] are considered instead of other possible ones.

      (22) Figure 2S: This figure is unclear. The caption states that panel (a) shows the "final concentration distribution," but only a line is shown. If "distribution" refers to spatial distribution, please clarify which parameters are shown.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now we clarify the “spatial concentration distribution” and which parameters are shown in the figure caption.

      (23) Figure 5 and 6 Captions: The captions for Figures 5 and 6 could benefit from clarification for better understanding.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we clarify the details in the captions of Fig. 5 and Fig. 6 for better understanding.

      (24) Figure 5 Legend: The legend on the bottom right corner of Figure 5 is unclear. Please specify to which panel it refers.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we clarify to which the legend on the bottom right corner of Fig. 5 refers.

      (25) L and A~C Interactions: In paragraphs 405-418, please explain why the L and A~C interactions are removed for the comparison instead of others.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we add a separate paragraph and a supplemental figure to explain why the L and A~C interactions are removed for the comparison instead of others.

      (26) Network Structures in Figure S3: From the "34 possible network structures" considered in Figure S3 (lines 440-441), why are the "null cases" (L disconnected from the network) relevant? Shouldn't only 32 networks be considered?

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now the two “null cases” are removed:

      (27) Figure S3 Caption: The caption must state that the position of the nodes (left or right) implies the polarisation pattern. Additionally, with the current size of the figure, the dashed lines are extremely hard to differentiate from the continuous lines.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we state that the position of the nodes (left or right) implies the polarization pattern. Additionally, we have modified the figure size and dashed lines so that the dash lines are adequately distinguishable from the continuous lines.

      (28) Equation #7: It is confusing to use P as the number of independent simulations when P is also one of the variables/species in the network. Please consider using different notation.

      We sincerely thank the editor(s) and refer(s) for the hhelpful advice!

      Now we replace the P in current Equation #8 with Q and the P in current Equation #10 with W.

      (29) Use of "Detailed Balance": The authors used the term "detailed balance" to describe the intricate balance between the two groups of proteins when forming a polarised pattern. However, "detailed balance" is a term with a specific meaning in thermodynamics. Breaking detailed balance is a feature of nonequilibrium systems, and the polarisation phenomenon is evidently a nonequilibrium process. Using the term "detailed balance" may cause confusion, especially for readers with a physics background. It might be advisable to reconsider the terminology to avoid potential confusion and ensure clarity for readers.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      To avoid potential confusion and ensure clarity for readers, now we replace “detailed balance” with “balance”, “required balance”, or “interplay” regarding different contexts.

      (30) Terminology: The word "molecule" is used where "molecular species" would be more appropriate, e.g., lines 456 and 551. Please revise these instances.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we replace all the “molecule” by “molecular species” as suggested.

      (31) Section 2.5: This section is confusing. It isn't clear where the "method outlined" (line 464) is nor what "span an iso-velocity surface at vanishing speed" means in line 470. The sentence in lines 486-488, "An expression similar to Eq. 8 enables quantitative prediction...", is too vague. Please clarify these points and specify what the "similar expression" is and where it can be found.

      We sincerely thank the editor(s) and referee(s) for the constructive suggestion!

      Now we clarify these points and specify the terms as suggested.

      (32) Software Mention: The software is only mentioned in the abstract and conclusions. It should also be mentioned where the computational pipeline is described, and the instructions available in the supplementary information need to be referenced in the main text.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now we mention the software where the computational pipeline is described and reference the instructions available in the Supplemental Text.

      (33) Supplementary Material References: Several parts of the supplementary material are never referenced in the main text, including Figure S1, Movies S3-S4, and the Instructions for PolarSim. Please reference these in the main text to clarify their relevance and how they fit with the manuscript's narrative.

      We sincerely thank the editor(s) and referee(s) for pointing out this problem!

      Now we add all the missing references for supplementary materials to the main text properly.

    1. eLife Assessment

      This important study uses the delay line axon model in the chick brainstem auditory circuit to examine the interactions between oligodendrocytes and axons in the formation of internodal distances. This is a significant and actively studied topic, and the authors have used this preparation to support the hypothesis that regional heterogeneity in oligodendrocytes underlies the observed variation in internodal length. In a solid series of experiments, the authors have used enhanced tetanus neurotoxin light chains, a genetically encoded silencing tool, to inhibit vesicular release from axons and support the hypothesis that regional heterogeneity among oligodendrocytes may underlie the biased nodal spacing pattern in the sound localization circuit.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Egawa and colleagues investigates differences in nodal spacing in an avian auditory brain stem circuit. The results are clearly presented and data are of very high quality. The authors make two main conclusions:

      (1) Node spacing, i.e. internodal length, is intrinsically specified by the oligodendrocytes in the region they are found in, rather than axonal properties (branching or diameter).

      (2) Activity is necessary (we don't know what kind of signaling) for normal numbers of oligodendrocytes and therefore the extent of myelination.

      These are interesting observations, albeit phenomenon. I have only a few criticisms that should be addressed:

      (1) The use of the term 'distribution' when describing the location of nodes is confusing. I think the authors mean rather than the patterns of nodal distribution, the pattern of nodal spacing. They have investigated spacing along the axon. I encourage the authors to substitute node spacing or internodal length for node distribution.

      (2) In Seidl et al. (J Neurosci 2010) it was reported that axon diameter and internodal length (nodal spacing) were different for regions of the circuit. Can the authors help me better understand the difference between the Seidl results and those presented here?

      (3) The authors looked only in very young animals - are the results reported here applicable only to development, or does additional refinement take place with aging?

      (4) The fact that internodal length is specified by the oligodendrocyte suggests that activity may not modify the location of nodes of Ranvier - although again, the authors have only looked during early development. This is quite different than this reviewer's original thoughts - that activity altered internodal length and axon diameter. Thus, the results here argue against node plasticity. The authors may choose to highlight this point or argue for or against it based on results in adult birds?:

      Significance:

      This paper may argue against node plasticity as a mechanism for tuning of neural circuits. Myelin plasticity is a very hot topic right now and node plasticity reflects myelin plasticity. this seems to be a circuit where perhaps plasticity is NOT occurring. That would be interesting to test directly. One limitation is that this is limited to development.

    3. Reviewer #2 (Public review):

      Summary:

      Egawa et al describe the developmental timeline of the assembly of nodes of Ranvier in the chick brainstem auditory circuit. In this unique system, the spacing between nodes varies significantly in different regions of the same axon from early stages, which the authors suggest is critical for accurate sound localization. Egawa et al set out to determine which factors regulate this differential node spacing. They do this by using immunohistological analyses to test the correlation of node spacing with morphological properties of the axons, and properties of oligodendrocytes, glial cells that wrap axons with the myelin sheaths that flank the nodes of Ranvier. They find that axonal structure does not vary significantly, but that oligodendrocyte density and morphology varies in the different regions traversed by these axons, which suggests this is a key determinant of the region-specific differences in node density and myelin sheath length. They also find that differential oligodendrocyte density is partly determined by secreted neuronal signals, as (presumed) blockage of vesicle fusion with tetanus toxin reduced oligodendrocyte density in the region where it is normally higher. Based on these findings, the authors propose that oligodendrocyte morphology, myelin sheath length, and consequently nodal distribution are primarily determined by intrinsic oligodendrocyte properties rather than neuronal factors such as activity.

      Major comments:

      (1) It is essential that the authors validate the efficiency of TeNT to prove that vesicular release is indeed inhibited, to be able to make any claims about the effect of vesicular release on oligodendrogenesis/myelination.

      (2) Related to 1, can the authors clarify if their TeNT expression system results in the whole tract being silenced? It appears from Fig. 6 that their approach leads to sparse expression of TeNT in individual neurons, which enables them to measure myelination parameters. Can the authors discuss how silencing a single axon can lead to a regional effect in oligodendrocyte number?

      (3) The authors need to fully revise their statistical analyses throughout and supply additional information that is needed to assess if their analyses are adequate:<br /> (3.1) the authors use a variety of statistical tests and it is not always obvious why they chose a particular test. For example, in Fig. 2G they chose a Kruskal-Wallis test instead of a two-way ANOVA or Mann-Whitney U test, which are much more common in the field. What is the rationale for the test choice?<br /> (3.2) in some cases, the choice of test appears wholly inappropriate. For example, in Fig. 3H-K, an unpaired t-test is inappropriate if the two regions were analysed in the same samples. In Fig. 5, was a t-test used for comparisons between multiple groups in the same dataset? If so, an ANOVA may be more appropriate.<br /> (3.3) in some cases, the authors do not mention which test was used (Fig 3: E-G no test indicated, despite asterisks; G/L/M - which regression test that was used? What does r indicate?)<br /> (3.4) more concerningly, throughout the results, data may have been pseudo-replicated. t-tests and ANOVAs assume that each observation in a dataset is independent of the other observations. In figures 1-4 and 6 there is a very large "n" number, but the authors do not indicate what this corresponds to. This leaves it open to interpretation, and the large values suggest that the number of nodes, internodal segments, or cells may have been used. These are not independent experimental units, and should be averaged per independent biological replicate - i.e. per animal (N).<br /> (3.5) related to the pseudo-replication issue, can the authors include individual datapoints in graphs for full transparency, per biological replicates, in addition or in alternative to bar-graphs (e.g. Fig. 5 and 6).

      (4) The main finding of the study is that the density of nodes differs between two regions of the chicken auditory circuit, probably due to morphological differences in the respective oligodendrocytes. Can the authors discuss if this finding is likely to be specific to the bird auditory circuit?

      (5) Provided the authors amend their statistical analyses, and assuming significant differences remain as shown, the study shows a correlation (but not causation) between node spacing and oligodendrocyte density, but the authors did not manipulate oligodendrocyte density per se (i.e. cell-autonomously). Therefore, the authors should either include such experiments, or revise some of their phrasing to soften their claims and conclusions. For example, the word "determine" in the title could be replaced by "correlate with" for a more accurate representation of the work. Similar sentences throughout the main text should be amended.

      (6) The authors fail to introduce, or discuss, very pertinent prior studies, in particular to contextualize their findings with:<br /> (6.1) known neuron-autonomous modes of node formation prior to myelination, e.g. Zonta et al (PMID 18573915); Vagionitis et al (PMID 35172135); Freeman et al (PMID 25561543)<br /> (6.2) known effects of vesicular fusion directly on myelinating capacity and oligodendrogenesis, e.g. Mensch et al (PMID 25849985)<br /> (6.3) known correlation of myelin length and thickness with axonal diameter, e.g. Murray & Blakemore (PMID 7012280); Ibrahim et al (PMID 8583214); Hildebrand et al (PMID 8441812).<br /> (6.4) regional heterogeneity in the oligodendrocyte transcriptome (page 9, studies summarized in PMID 36313617)

      Significance:

      In our view the study tackles a fundamental question likely to be of interest to a specialized audience of cellular neuroscientists. This descriptive study is suggestive that in the studied system, oligodendrocyte density determines the spacing between nodes of Ranvier, but further manipulations of oligodendrocyte density per se are needed to test this convincingly.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have investigated the myelination pattern along the axons of chick avian cochlear nucleus. It has already been shown that there are regional differences in the internodal length of axons in the nucleus magnocellularis. In the tract region across the midline, internodes are longer than in the nucleus laminaris region. Here the authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons. However, the demonstration falls rather short of being convincing.

      Major comments:

      (1) The authors neglect the possibility that nodal cluster may be formed prior to myelin deposition. They have investigated stages E12 (no nodal clusters) and E15 (nodal cluster plus MAG+ myelin). Fig. 1D is of dubious quality. It would be important to investigate stages between E12 and E15 to observe the formation of pre-nodes, i.e., clustering of nodal components prior to myelin deposition.

      (2) The claim that axonal diameter is constant along the axonal length need to be demonstrated at the EM level. This would also allow to measure possible regional differences in the thickness of the myelin sheath and number of myelin wraps.

      (3) The observation that internodal length differs is explain by heterogeneity of sources of oligodendrocyte is not convincing. Oligodendrocytes a priori from the same origin remyelinate shorter internode after a demyelination event.

      Significance:

      The authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons.

    5. Author response:

      General Statements

      We sincerely appreciate the constructive comments from the reviewers, which have significantly enhanced the clarity and rigor of our manuscript. Most of their suggestions have already been incorporated into the revised version. Additionally, we are conducting an additional experiment to further substantiate our conclusions, and preliminary data seem to support our findings.

      As pointed out by Reviewer #1, the regulation of neural circuit function by oligodendrocytes is currently a highly significant and actively studied topic. Our study demonstrates that regional heterogeneity in oligodendrocytes underlies the microsecond-level computational processes in the sound localization circuit. We believe this work represents a substantial contribution to the field.

      Description of the planned revisions

      • Evaluation of node formation along axons sparsely expressing eTeNT (related to Reviewer #2: comment 1)

      Based on the approximately 90% expression efficiency of A3V-eTeNT in NM neurons, we interpreted that vesicular release from NM axons was largely inhibited in the NL region, leading to the suppression of oligodendrogenesis and the subsequent emergence of unmyelinated segments. However, the effects of eTeNT on myelination are likely diverse, and a possibility remains that eTeNT directly disrupted axon-oligodendrocyte interactions, preventing oligodendrocytes from myelinating the axons expressing eTeNT.

      To test this possibility, we have initiated an additional experiment to evaluate formation of nodes along axons, while expressing eTeNT sparsely by electroporation. Preliminary results indicated that unmyelinated segments did not increase, supporting our original conclusion. After completion of the experiment, we will include the findings as a Supplementary Figure associated with Figure 6, which will provide a clearer understanding of how eTeNT influences myelination.

      Description of the revisions that have already been incorporated in the transferred manuscript

      • Revised terminology from "nodal distribution" to "nodal spacing" throughout the manuscript. (Reviewer #1: comment 1)

      • Emphasized that our analyses were focused on the main trunk of NM axons (Reviewer #1: comment 2) We explicitly stated throughout the manuscript that we analyzed the main trunk of NM axons and made it clear that our findings do not contradict those by Seidl et al. (J Neurosci 2010), showing the similar axon diameter between midline and ventral NL regions (page 7, line 7).

      • Added an explanation on the maturation of sound localization circuit (Reviewer #1: comment 3) We explained that chickens have high ability of sound localization at hatch, emphasizing that the sound localization circuit is almost fully developed by E21 (page 4, line 12).

      • Emphasized the diverse effects of neuronal activity on oligodendrocytes (page 10, line 18) (Reviewer #1: comment 4)

      • Added details on the efficiency of A3V-eTeNT expression in NM neurons to the Results section (page 8, line 5) (Reviewer #2: comment 1)  

      • Made it clear in Figure Legend for Figure 6D that the analysis was conducted under the condition, where most of the axons were labeled by A3V-eTeNT (page 31, line 9) (Reviewer #2: comment 2)

      • Clarified the rationale for statistical test selection (Reviewer #2: comment 3.1)

      • Reanalyzed all statistical data with appropriate methods using R (Reviewer #2: comment 3.2)

      • Clearly indicated which statistical tests were used in each figure (Reviewer #2: comment 3.3)

      • Clarified what n represents and N used in each experiment (Reviewer #2: comment 3.4)

      • Added individual data points to bar graphs in Figure  5 and 6 (Reviewer #2: comment 3.5)

      • Emphasized the importance of comparing the ITD circuit with that of rodents (page 11, line 32) (Reviewer #2: comment 4) 

      • Softened the expressions related to "determine" (Reviewer #2: comment 5)

      Our study demonstrates that regional differences in the intrinsic properties of oligodendrocytes are the prominent determinant of nodal spacing patterns. However, we acknowledge that this does not establish a direct causation. Accordingly, relevant expressions have been revised throughout the manuscript.

      • Added references (Reviewer #2: comment 6)

      • Corrected units in Figure 1G (Reviewer #2: comment 7)

      • Added discussion about the involvement of pre-nodal clusters in the regional differences in nodal spacing (page 9, line 35) (Reviewer #3: comment 1).

      Related to this issue, we have added new data to Figure 6I.

      • Discussed the possibility that the developmental origin and/or the pericellular microenvironment of OPCs contributed to the regional heterogeneity of oligodendrocytes (page 9, line 21) (Reviewer #3: comment 3).

      • Added references used in the response to reviewers into the main text.

      • Corrected the data error in Figure 6G, H

      • Corrected the dataset in Figure 3E

      We limited the data in Figure 3E–G to those measuring both myelin length and diameter simultaneously.

      Description of analyses that authors prefer not to carry out

      • Analysis in adult chickens (Reviewer #1: comment 3,4)

      The chick brainstem auditory circuit is nearly fully developed by E21, and we have also demonstrated that nodal spacing increases by approximately 20% while maintaining regional differences up to P9. Therefore, our study covers the period from pre-myelination to postfunctional maturation, and we think that the necessity of analyzing aged animals is small.

      • Functional evaluation of the efficiency of eTeNT suppression (Reviewer #2: comment 1)

      It is technically challenging to quantitatively assess the inhibition of vesicular release by eTeNT in NM axons given that multiple synapses from different NM axons converge onto postsynaptic neurons. In addition, previous studies have already validated the efficacy of this construct in multiple species. Therefore, we will not evaluate electrophysiologically the extent of vesicular release inhibition by eTeNT in this study. Instead, we have provided clear evidence that A3V-eTeNT is expressed efficiently and leads to notable phenotypic changes, such as the inhibition of oligodendrogenesis. (page 8, line 5).

      • Replacing figures with data averaged per animal (Reviewer #2: comment 3.4)

      Our study focuses on the distribution of morphological characteristics at the single-cell level rather than solely on group means. Averaging measurements per animal could obscure this cellular heterogeneity and potentially misrepresent our findings. Given that data distributions in our plots show clear distinctions, we believe that averaging per biological replicate is not essential in this case. If requested, we will be happy to provide the outputs of PlotsOfDifferences as supplementary source data files, similar to those used in eLife publications, for each figure.

      • Additional experiments to manipulate oligodendrocyte density (Reviewer #2: comment 5)

      We have already demonstrated that A3V-eTeNT reduces oligodendrocyte density in the NL region, and some of the arguments in our study are based on this result. Therefore, we think that further experiments are not necessary.

      • Verification of the presence of pre-nodal clusters (Reviewer #3: comment 1)

      We investigated the presence of pre-nodal clusters on NM axons, but we could not identify them in the immunohistochemistry of AnkG. As the occurrence of pre-nodal clusters varies depending on neuronal type, we consider that pre-nodal clusters are not prominent in the NM axons and that further experimental validation would not be necessary. Instead, we have added a discussion on the possibility that pre-nodal clusters contribute to regional differences in nodal spacing along NM axons (page 9, line 35).

      • Axon diameter measurements using EM (Reviewer #3: comment 2)

      This experiment was already done by Seidl et al. (2010), and hence, we do not think it necessary to repeat it. We believe that the relative differences in axon diameter between the regions could be adequately assessed using the optical approach with membrane-targeted GFP.

    1. eLife Assessment

      The authors use a multidisciplinary approach to provide a valuable link between Beta-alanine and S. Typhimurium (STM) infection and virulence. The work shows how Beta-alanine synthesis mediates zinc homeostasis regulation, possibly contributing to virulence. The work is convincing as it adds to the existing knowledge of metabolic flexibility displayed by STM during infection. However, the authors need to address some lingering concerns.

    2. Reviewer #1 (Public review):

      Summary:

      Ma & Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes with the nutrient-poor intracellular niche within macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays important roles in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies, and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates regulation of zinc homeostatisis in Salmonella.

      Strengths and weaknesses:

      The results and model are adequately supported by the authors' data. Further work will need to be performed to learn whether the Zn2+ functions as proposed in their mechanism. By performing a small set of confirmatory experiments in S. Typhi, the authors provide some evidence of relevance to human infections.

      Impact:

      This work adds to the body of literature on the metabolic flexibility of Salmonella during infection that enable pathogenesis.

    3. Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting due to its life within a compact compartment, which we call SCV or Salmonella containing vacuole in the field of Salmonella. SCV is a tight-fitting vacuole where the acquisition of nutrients is a key factor by Salmonella. The authors among many nutrients, focussed on beta-alanine. It is also known that Salmonella requires beta-alanine from many other studies. The authors have done in vitro RAW macrophage infection assays and In vivo mouse infection assays to see the life of Salmonella in the presence of beta-alanine. They concluded by comprehending that beta-alanine modulates the expression of many genes including zinc transporters which is required for pathogenesis.

      Strengths:

      Made a couple of knockouts in Salmonella and did transcriptomic to understand the global gene expression pattern

      Weaknesses:

      Transport of Beta-alanine to SCV is not yet elucidated. Is it possible to determine whether the Zn transporter is involved in B-alanine transport?

      Beta-alanine can also be shuttled to form carnosine along with histidine. If beta-alanine is channelled to make more carnosine, then the virulence phenotypes may be very different.

      Some amino acid transporters can be knocked out to see if beta-alanine uptake is perturbed. Like ArgT transport Arginine, and its mutation perturbs the uptake of beta-alanine. What is the beta-alanine concentration in the SCV? SCVS can be purified at different time points, and the Beta-alanine concentration can be measured

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma, Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes with the nutrient-poor intracellular niche within the macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays an important role in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates the regulation of zinc homeostasis in Salmonella. The impact of this work is questionable. There are already many studies reporting Salmonella-effector interactions, and while this adds to that knowledge it is not a significant advance over previous studies. While the authors are investigating an interesting question, the work has two important weaknesses; if addressed, the conclusions of this work and broader relevance to bacterial pathogenesis would be enhanced.

      Strengths:

      This reviewer appreciates the multidisciplinary nature of the work. The overall presentation of the figure graphics are clear and organized.

      Weaknesses:

      First, this study is very light on mechanistic investigations, even though a mechanism is proposed. Zinc homeostasis in cells, and roles in bacteria infections, are complex processes with many players. The authors have not thoroughly investigated the mechanisms underlying the roles of B-Ala and panD in impacting STM infection such that other factors cannot be ruled out. Defining the cellular content of Zn2+ STM in vivo would be one such route. With further mechanistic studies, the possibility cannot be ruled out that the authors have simply deleted two important genes and seen an infection defect - this may not relate directly to Zn2+ acquisition.

      Thank you for your patient and thoughtful reading, as well as the constructive comments and advice regarding our manuscript. We have revised the manuscript based on your comments and suggestions.

      You are correct that this work has not thoroughly investigated the mechanisms underlying the roles of β-alanine, panD, and zinc in impacting Salmonella infection. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (Δ_panD_), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Meanwhile, we concur that additional, unknown mechanisms are involved in the virulence regulation by β-alanine in Salmonella. Our findings indicate that the double mutant Δ_panD_Δ_znuA_, which cannot synthesize β-alanine nor uptake zinc, is more attenuated than the single mutant Δ_znuA_ (Figures 5D and 6B). This suggests that the contribution of β-alanine to Salmonella's virulence is partially dependent on zinc acquisition. We have revised the related descriptions throughout the manuscript for clarity (lines 31, 304, 341,1056, 1068).

      Second, the authors hint at their newly described mechanism/pathway being important for disease and possibly a target for therapeutics. This claim is not justified given that they have employed a single STM strain, which was isolated from chickens and is not even a clinical isolate. The authors could enhance the impact of their findings and relevance to human disease by demonstrating it occurs in human clinical isolates and possibly other serovars. Further, the use of mouse macrophage as a model, and mice, have limited translatability to human STM infections.

      We thank you for your comments and advice on our manuscript and are delighted to accept them. Salmonella Typhimurium causes systemic disease in mice, which is similar to the symptoms of typhoid fever in humans and has been widely used to explore the pathogenesis of Salmonella. Based on your comment, we have now performed additional experiments to confirm several key points of our findings in another typical Salmonella serovar, Salmonella enterica serovar Typhi, which is a human-limited serovar and the cause of typhoid fever in humans (PLoS Pathog. 2012, 8(10):e1002933).

      We constructed the panD mutant strain (ΔpanD) in the S. Typhi strain Ty2 and  subsequently compared the replication of ΔpanD with that of the Ty2 wild-type in the human THP-1 monocyte like cell line (ATCC TIB-22) using gentamicin protection assays. The results showed that the replication of ΔpanD in THP-1cells was reduced by 2.6-fold at 20 h post-infection compared to the Ty2 wild-type strain  (P < 0.01) (Figure 2_figure Supplement 3), suggesting that panD also facilitates S. Typhi replication in human macrophages and may be involved in the systemic infection of S. Typhi in humans. This result has been included in the revised manuscript. (lines 203-210).

      Based on these results, we speculate that PanD may serve as a potential target for treating Salmonella infection.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28. Latin phrases like de novo should be italicized.

      Thank you for your careful review. We have revised the manuscript thoroughly (Lines 28, 65, 77, 106, 171, 173, 214, 1002, 1023, 1078).

      (2) Line 45. 'survival' typo.

      We have corrected it in the revised manuscript (Line 45).

      (3) Line 57. What evidence or prior work supports the SCV of macrophages in a nutrient-poor environment? Citation needed.

      The relevant reference has now been added (lines 62-63).

      (4) Lines 65-68. If an 'increasing number of studies have focused' on this topic, please cite them here.

      The relevant reference has now been added (lines 72-73).

      (5) Lines 69-71. Citations are needed for these claims.

      The relevant reference has now been added (lines 76-77, 79-80).

      (6) Line 76-77. Citation needed for this claim.

      The relevant reference has now been added (lines 84, 86).

      (7) Line 116-122, and Figure 1C, and Figure 1 legend. An important claim in this work is that the amino acid content of the macrophage cytoplasm is different +/- STM infection. The authors need to explain this result more carefully and define their acronyms. What is VIP, Log2 FC, etc.? What do the colors in Figure 1C mean? They are not defined. If possible, it would be more approachable to list these as molar concentrations, weight/cell, or number of molecules/cell. The authors should calculate an effect size for each of these data to help assess if the differences are meaningful. Without this information, and a clearer explanation of what these data are, it is difficult to evaluate the authors' claim that "8 [amino acids] showed significant differences in abundance."

      Thank you for the comment. The full names of VIP (Variable Importance in the Projection) and FC (fold change) have been included in the revised manuscript. In Figure 1C of the original manuscript, pink represents the content of amino acids that increased following Salmonella infection, whereas blue signifies the content of amino acids that decreased after Salmonella infection.

      Based on your suggestion, we have revised Figure 1C (now Figure 1C, D in the revised manuscript) and the content of amino acids is now expressed as weight per cell (ng/ 10<sup>7</sup> cells). The legend has been updated accordingly. (lines 9931-997).

      (8) Line 134-138. Additional controls are required for this experiment. By adding a nutrient (B-Ala) you have increased the nutrient availability and growth potential of the bacteria. This may not relate to anything special to B-Ala. Perhaps the addition of another amino acid, or sugar, would have a similar impact. Further, this result would be more compelling if the authors demonstrated a dose-dependent effect of B-Ala addition.

      Thank you for the comment. To further confirm that host-derived β-alanine can promote intracellular Salmonella replication, we have added varying concentrations of β-alanine (0.5, 1, 2, and 4 mM) to the culture medium (RPMI) of RAW264.7 cells. Subsequently, we infected these cells with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Our observations indicate that the addition of 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the increase in Salmonella intracellular replication was dose-dependent, as illustrated in the revised Figure 1E. These findings suggest that host-derived β-alanine facilitates Salmonella replication inside macrophages. We have included these results in the revised manuscript (lines 141-149).

      (9) Lines 181-184, and Figure 2E. In addition to the fold-change replication data, here and elsewhere the authors should provide raw CFU counts for data transparency.

      Thank you for bringing this to our attention. In this work, we have utilized “fold intracellular replication (20 h intracellular bacterial CFU/ 2 h intracellular bacterial CFU)” to illustrate the differences in intracellular replication of different Salmonella strains in macrophages. The term “fold intracellular replication” is commonly employed in recently published reports (eg. FEMS Microbiol Lett. 2024, 9;371:fnae067; mBio. 2024, 15(7):e0112824; Front Microbiol. 2024, 14:1340143). To ensure data transparency, we have included the raw CFU counts in the source data file.

      (10) Line 197. Why employ i.p. injection of STM? As a non-typhoidal serovar, STM infection is enteric, and so i.p. injection seems very artificial if the goal is to understand the role B-Ala synthesis in disease.

      Thank you for the comment. Salmonella can induce gastroenteritis or systemic infection, which are associated with its capacity to invade intestinal epithelial cells and replicate within macrophages, respectively. In this study, using gentamicin protection assays and immunofluorescence analysis, we demonstrated that β-alanine is crucial for Salmonella replication inside macrophages. Since replication in macrophages is a key determinant of systemic Salmonella infection, we hypothesized that β-alanine also affects Salmonella systemic infection in vivo. Intraperitoneal (i.p.) injection enables Salmonella to disseminate directly to systemic sites via the lymphatic and bloodstream systems, bypassing the need for intestinal invasion (Microbiol Res. 2023, 275:127460; Int Immunopharmacol. 2016, 31:233-8). Thus, we conducted the mice infection assays via intraperitoneal (i.p.) injection to ascertain whether β-alanine affects systemic Salmonella infection. We have included the description in the revised manuscript to enhance clarity. (lines 217-221).

      Whether β-alanine influences Salmonella invasion of intestinal epithelial cells and intestinal colonization has not been investigated in this work; this issue will be explored in our future studies.

      (11) Line 207-214 and Figure 3. If the hypothesis is that B-Ala mediates STM survival/virulence through enhancing metabolism in the SCV and intracellular niche, why did the authors not investigate/enumerate STM in this niche in their in vivo studies?

      Thank you for the comment. Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (Δ_panD_), and complemented strain (cpanD) within the macrophages of the mouse liver. The findings indicated that the number of Δ_panD_ in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and the complementation of Δ_panD_ increased the bacterial count in each liver macrophage to the level of WT (refer to Figure 3E in the revised manuscript). These results have been included in the revised manuscript. (lines 234-239).

      (12) Figure 4B - the down genes label is cut off.

      Thank you for your careful review. We have corrected it in the revised Figure 4B.

      (13) Line 260-265. SPI-2 needs to be defined and introduced, as do other terms here, to make the work approachable to non-STM specialists.

      The introduction of SPI-2 has been added to the revised manuscript. (Lines 290-292).

      (14) Line 300-301. Additional experiments are needed to support the claim that "data indicate that β-alanine promotes in vivo virulence of Salmonella, partially by increasing the expression of zinc transporter genes." Gene up- or down-regulation does not necessarily have any meaningful impact on function or activity. The authors here need an assay that confirms that the function of znuA is disrupted, such as examining the cell Zn2+ content in vivo at different levels of B-Ala exposure and/or panD activity. Moreover, more Zn2+ is not necessarily beneficial for STM, at levels too high zinc can exert cell toxicity. So, the authors have a correlation but no data supporting this mechanism explains their observations of virulence and infection. How much Zn2+ is ideal for STM growth?

      Thank you for the comment. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-кB activation and impairing NF-кB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, the efficient acquisition of zinc may play a crucial role in the survival and replication of Salmonella within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella utilizes the high-affinity ZnuABC zinc transporter to maximize zinc availability within host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages.

      You are correct that more zinc is not necessarily beneficial for Salmonella, as excessive zinc can inhibit the growth of Salmonella. Considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and may not attain very high concentrations during Salmonella's growth within macrophages. We have included a discussion on this matter in the revised manuscript.t (lines 459-466).

      (15) Figure 6B. Related to the above, these data would be more compelling with higher n and a dose-dependent response demonstrated for Zn2+ addition. This is a central point of the manuscript, and effectively what the authors propose as the underlying mechanism, and it should be more robustly substantiated.

      Thank you for the comment. As stated in the previous response, we were unable to directly assess the bacterial zinc concentration during Salmonella growth within macrophages. Instead, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. Moreover, considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and may not attain very high concentration during Salmonella's growth within macrophages.

      Reviewer #2 (Public review):

      Summary:

      Salmonella exploits host- and bacteria-derived β-alanine to efficiently replicate in host macrophages and cause systemic disease. β-alanine executes this by increasing the expression of zinc transporter genes and therefore the uptake of zinc by intracellular Salmonella.

      Strengths:

      The experiments designed are thorough and the claims made are directly related to the outcome of the experiments. No overreaching claims were made.

      Weaknesses:

      A little deeper insight was expected, particularly towards the mechanistic aspects. For example, zinc transport was found to be the cause of the b-alanine-mediated effect on Salmonella intracellular replication. It would have been very interesting to see which are the governing factors that may get activated or inhibited due to Zn accumulation that supports such intracellular replication.

      We appreciate your review and advice. To further investigate the mechanisms by which β-alanine, panD, and zinc influence Salmonella infection, we have conducted additional experiments as suggested. For instance, we examined the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (Δ_panD_). This approach indirectly reflects zinc acquisition by intracellular Salmonella, as it is challenging to isolate sufficient amounts of the bacteria from infected cells or tissues for zinc concentration measurement. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared to that in WT-infected counterparts (Figures 5E and 6A). This suggests that the panD gene and β-alanine are crucial for Salmonella to absorb zinc from host cells. This new information has been included in the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth. (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910 ). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-кB activation and impairing NF-кB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, efficient zinc uptake could be crucial for Salmonella survival and replication within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella exploits the high-affinity ZnuABC zinc transporter to maximize zinc availability in host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages. We have addressed this issue in the revised manuscript (lines 459-466).

      Reviewer #2 (Recommendations for the authors):

      A few general clarifications and suggested experiments:

      (1) Metabolome analysis: Salmonella can itself produce b-alanine. Given that it is isolated from infected cells where salmonella has scavenged b-alanine from host cytosol as well as produced it, how b-alanine levels went down in metabolome analysis is confusing.

      Thank you for the comment. The method for targeted metabolic profiling is conducted as outlined in a recently published paper by our group (Nat Commun. 2021, 12(1):879). To prevent delays and changes in metabolite concentrations during the separation of bacterial contents from macrophages, we determined the combined metabolite concentrations directly from infected cells and Salmonella. We observed that each Salmonella cell contained only 0.01%-0.02% of the concentration of each corresponding combined metabolite. Approximately 94% of the infected macrophages contained no more than ten bacteria at 8 hours post-infection, confirming that the combined metabolites were predominantly from the host. We have included an explanation of this issue in the method section. (lines 557-560).

      (2) What is the basal level of b-alanine produced by macrophages? How was 1 mM conc. chosen?

      According to our results, the content of β-alanine in uninfected RAW264.7 cells is 26-33 μM/10<sup>7</sup> cell (700-900 ng/10<sup>7</sup> cell). The 1 mM concentration was chosen based on a published report (Appl Microbiol Biotechnol. 2004, 65(5):576-82).

      Additionally, we have supplemented the culture medium (RPMI) of RAW264.7 cells with 0.5, 1, 2, and 4 mM β-alanine and subsequently infected them with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Our observations revealed that the supplementation with 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the addition of β-alanine to the infected cells resulted in a dose-dependent increase in Salmonella intracellular replication, as depicted in Figure 1E. These findings further support the notion that host-derived β-alanine facilitates Salmonella replication within macrophages. This data has been incorporated into the revised manuscript (lines 141-149).

      (3) The antimicrobial activity of macrophages preventing the growth of intracellular Salmonella will primarily be governed by genes such as GBPs, defensins, nitric oxide, etc. The expression of these genes should be tested rather than cytokines which are secreted with little effect on intracellular Salmonella.

      Thank you for the suggestion. We have investigated the levels of ROS (reactive oxygen species) and RNS (reactive nitrogen species) in Salmonella-infected RAW264.7 cells, both in the presence and absence of 1 mM β-alanine. The results indicated that β-alanine did not affect the ROS and RNS levels in RAW 264.7 cells (Figure 1_figure Supplement 1), suggesting that β-alanine does not influence the antimicrobial activity of macrophages. We have included these results in the revised manuscript (lines150-153).

      (4) For animal experiments, how many times was the experiment repeated? Can the animal experiment be done with b-alanine supplementation and panD mutant? Can the liver be stained to detect the bacteria?

      Thank you for the comment.

      i) Mouse infection assays were conducted twice, with at least 2 mice (n ≥ 2) in each injection group. The combined data from the two experiments was used for statistical analysis. This information has been added to the revised manuscript. (lines 678-681).

      ii) As suggested, mice infected with the panD mutant (Δ_panD_) were administered β-alanine (500 mg/kg/day, Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37) orally on a daily basis. On the third day post-infection, the bacterial burden in the liver and spleen and the body weight of the infected mice were measured. The results indicated that administering β-alanine to mice did not affect the bacterial burden of ΔpanD in the liver and spleen nor did it influence the body weight of the infected mice (please refer to Author response image 1 below). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly synthesized into carnosine in mice, and the free β-alanine, particularly that which enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      Author response image 1.

      iii) Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (Δ_panD_), and complemented strain (c_panD_) within the macrophages of the mouse liver. The findings indicate that the number of Δ_panD_ in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and the complementation of Δ_panD_ increased the bacterial count in each liver macrophage to the level of WT (Figure 3E in the revised manuscript). These results have been included in the revised manuscript. (lines 234-239).

      Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting due to its life within a compact compartment, which we call SCV or Salmonella containing vacuole in the field of Salmonella. SCV is a tight-fitting vacuole where the acquisition of nutrients is a key factor by Salmonella. The authors among many nutrients, focussed on beta-alanine. It is also known from many other studies that Salmonella requires beta-alanine. The authors have done in vitro RAW macrophage infection assays and In vivo mouse infection assays to see the life of Salmonella in the presence of beta-alanine. They concluded by comprehending that beta-alanine modulates the expression of many genes including zinc transporters which are required for pathogenesis.

      Strengths:

      This study made a couple of knockouts in Salmonella and did a transcriptomic investigation to understand the global gene expression pattern.

      Weaknesses:

      The following questions are unanswered:

      (1) It is not clear how the exogenous beta-alanine is taken up by macrophages.

      We thank the reviewer for the question. It has been reported that β-alanine is transported into eukaryotic cells via the TauT (SLC6A6) and PAT1 (SLC36A1) transporters (Acta Physiol (Oxf). 2015, 213(1):191-212; Am J Physiol Cell Physiol. 2020 Apr 1;318(4):C777-C786; Biochim Biophys Acta. 1994, 1194(1):44-52.).

      (2) It is not clear how the Beta-alanine from the cytosol of the macrophage enters the SCV.

      According to the published report, translocation of SPI2 effector proteins induces the formation of specific tubular membrane compartments extend from the SCV, known as Salmonella-induced filaments (SIFs) (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology (Reading). 2012, 158(Pt 5):1147-1161). The membranes and lumens of both SIFs and SCVs form a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). We hypothesize that β-alanine may enter SCVs from the cytoplasm of macrophages via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (3) It is not clear how the beta-alanine from SCV enters the bacterial cytosol.

      Thank you for the question. We have attempted to identify the transporter of β-alanine in Salmonella, but we found that the CycA transporter, which transports β-alanine in Escherichia coli, does not function in the same manner in Salmonella, despite Salmonella being closely related to E. coli.

      BasC is a bacterial LAT (L-Amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of Pseudomonas, Acinetobacter, and Aeromonas, etc. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (4) There is no clarity on the utilization of exogenous beta-alanine of the host and the de novo synthesis of beta-alanine by panD of Salmonella.

      Thank you for the comment. Our findings indicated that β-alanine levels were reduced in Salmonella-infected RAW264.7 cells. Furthermore, the addition of β-alanine to the culture medium (RPMI) of RAW264.7 cells significantly enhanced Salmonella replication, suggesting that the intracellular Salmonella utilize host-derived β-alanine for their growth. However, to date, we have not identified the transporter responsible for the uptake of exogenous β-alanine into the Salmonella cytosol.

      Moreover, we have discovered that the replication of the Salmonella panD mutant within macrophages and its virulence in mice are significantly reduced compared to the wild type (WT), indicating that the de novo synthesis of β-alanine is crucial for Salmonella's intracellular replication and virulence.

      These results indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages.

      Reviewer #3 (Recommendations for the authors):

      Cite this paper from 1985, which talks about the role of beta-alanine in Salmonella infection J Gen Microbiol,. 1985 May;131(5):1083-90. doi: 10.1099/00221287-131-5-1083. A Salmonella typhimurium strain defective in uracil catabolism and beta-alanine synthesis, T P West, T W Traut, M S Shanley, G A O'Donovan

      We have now cited this paper in the revised manuscript (lines 82-83).

      (2) BasC- can be important for beta-alanine transport. CycA transporter was not found to be involved in beta-alanine. However, it is important to find out which transporter is required for the uptake of beta-alaine.

      Thank you for pointing it out. We agree that it is important to determine which transporter is necessary for the uptake of β-alanine in Salmonella. BasC is a bacterial LAT (L-Amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of Pseudomonas, Acinetobacter, and Aeromonas, etc. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (3) Bacteria being quite stringent with its energy resources, it is unlikely that it will use de novo synthesis if the host resources are available. Only if the host resources are depleted, can it turn on the de novo synthesis involving panD. What is the status of fold-replication of panD mutant in the presence of exogenous addition of beta-alanine?

      Thank you for the comment. The addition of 1 to 4 mM of β-alanine increased the replication of the panD mutant (Δ_panD_) in RAW264.7 cells by 1.7- to 3.1-fold. This increase in Salmonella intracellular replication was dose-dependent, as shown in Figure 2H of the revised manuscript, further illustrating that host-derived β-alanine promotes Salmonella replication inside macrophages.

      We agree that bacteria are quite stringent with their energy resources. The results of this work indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages. We speculate that Salmonella relies on a large amount of β-alanine to efficiently replicate in macrophages, thereby highlighting the importance of β-alanine for Salmonella intracellular growth. We have discussed this issue in the revised manuscript. (lines 392-396).

      (4) 100% survival of animals infected with panD mutant is a bit of concern. What happens when beta-alanine is fed to mice and infected with panD mutant?

      Thank you for the comment. As suggested, mice infected with the panD mutant (ΔpanD) were administered β-alanine (500 mg/kg/day, as reported in Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37) orally on a daily basis. On the third day post-infection, the bacterial load in the liver and spleen, as well as the body weight of the infected mice, were measured. The results indicated that administering β-alanine did not affect the bacterial load of Δ_panD_ in the liver and spleen nor did it influence the body weight of the infected mice (refer to Author response image 1). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly converted into carnosine in mice, and the free β-alanine, particularly that which enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      (5) How does beta-alanine from macrophages' cytosol enter the SCV.

      Thank you for pointing it out. According to published reports, the translocation of SPI2 effectors triggers the formation of specialized tubular membrane compartments, known as Salmonella-induced filaments (SIFs), which extend from the SCV (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology. 2012, 158:1147-1161). The membranes and lumens of SIFs and SCVs create a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). Consequently, it is plausible that β-alanine enters SCVs from the macrophage cytoplasm via SIFs. This information has been included in the revised manuscript.(lines 56-61).

      (6) It would be essential to dissect the role of exogenous beta-alanine and the use of de novo synthesized beta-alanine.

      We agree that it is essential to dissect the role of exogenous β-alanine and the use of de novo synthesized β-alanine. Our results indicate that Salmonella-infected macrophages exhibited lower levels of β-alanine compared to mock-infected macrophages. Furthermore, β-alanine supplementation in the cell medium enhanced Salmonella replication within macrophages in a dose-dependent manner, revealing that Salmonella utilizes host-derived β-alanine to promote intracellular replication. Additionally, a deficiency in the biosynthesis of β-alanine, resulting from mutation of the rate-limiting gene panD, led to reduced Salmonella replication in macrophages and systemic infection in mice. This suggests that Salmonella also employs bacterial-derived β-alanine to enhance intracellular replication and pathogenicity.

      We sought to identify the main transporters responsible for β-alanine uptake in Salmonella. Unfortunately, we have not yet found the transporter. We will address this issue in our future work.

    1. eLife Assessment

      By taking advantage of noise in gene expression, this important study introduces a new approach for detecting directed causal interactions between two genes without perturbing either. The main theoretical result is supported by a proof. Preliminary simulations and experiments on small circuits are solid, but further investigations are needed to demonstrate the broad applicability and scalability of the method.

    2. Reviewer #2 (Public Review):

      Summary:

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality.

      The authors benchmark their approach experimentally in several synthetic circuits. In 4 positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in 2 of the 4 positive control circuits. The authors constructed 16 negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter, or simply the cellular growth rate. The proposed method detected a causal effect in two of the 16 negative controls, which the authors argue is perhaps not a false positive, but due to an unexpected causal effect. Overall, the data support the potential value of the proposed approach.

      Strengths:

      The idea of a "no-causality control" in the context of detected directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations.

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations.

      The authors have improved the clarity and completeness of their proof compared to a previous version of the manuscript.

      Limitations:

      The authors themselves clearly outline the primary limitations of the study: The experimental benchmark is a proof of principle, and limited to synthetic circuits involving a handful of genes expressed on plasmids in E. coli. As acknowledged in the Discussion, negative controls were chosen based on the absence of known interactions, rather than perturbation experiments. Further work is needed to establish that this technique applies to other organisms and to biological networks involving a wider variety of genes and cellular functions. It seems to me that this paper's objective is not to delineate the technique's practical domain of validity, but rather to motivate this future work, and I think it succeeds in that.

      Might your new "Proposed additional tests" subsection be better housed under Discussion rather than Results?

      I may have missed this, but it doesn't look like you ran simulation benchmarks of your bootstrap-based test for checking whether the normalized covariances are equal. It would be useful to see in simulations how the true and false positive rates of that test vary with the usual suspects like sample size and noise strengths.

      It looks like you estimated the uncertainty for eta_xz and eta_yz separately. Can you get the joint distribution? If you can do that, my intuition is you might be able to improve the power of the test (and maybe detect positive control #3?). For instance, if you can get your bootstraps for eta_xz and eta_yz together, could you just use a paired t-test to check for equality of means?

      The proof is a lot better, and it's great that you nailed down the requirement on the decay of beta, but the proof is still confusing in some places:

      - On pg 29, it says "That is, dividing the right equation in Eq. 5.8 with alpha, we write the ..." but the next equation doesn't obviously have anything to do with Eq. 5.8, and instead (I think) it comes from Eq 5.5. This could be clarified.

      - Later on page 29, you write "We now evoke the requirement that the averages xt and yt are stationary", but then you just repeat Eq. 5.11 and set it to zero. Clearly you needed the limit condition to set Eq. 5.11 to zero, but it's not clear what you're using stationarity for. I mean, if you needed stationarity for 5.11 presumably you would have referenced it at that step.

      It could be helpful for readers if you could spell out the practical implications of the theorem's assumptions (other than the no-causality requirement) by discussing examples of setups where it would or wouldn't hold.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. 

      We thank the referee for this summary of our work. 

      Strengths: 

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. 

      We thank the referee for highlighting the potential value of our proposed method.

      Weaknesses: 

      The practical utility of this method may not be straightforward and potentially be quite difficult to execute. Additionally, further investigations are needed to provide evidence of the broad applicability of the method to naturally occurring systems and its scalability beyond the simple circuit in which it is experimentally demonstrated. 

      We agree with these two points and have rewritten the manuscript, in particular highlighting the considerable future work that remains to be done to establish the broad applicability and scalability of our method.

      In the rewritten manuscript we explicitly spell out potential practical issues and we explicitly state that our presented proof–of–principle feasibility study does not guarantee that our method will successfully work in systems beyond the narrowly sampled test circuits. This helps readers to clearly distinguish between what we claim to have done from what remains to be done. The re-written parts and additional clarifications are:

      Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 10).

      Reviewer #2 (Public Review): 

      Summary: 

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality. 

      The authors benchmark their approach experimentally in several synthetic circuits. In four positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in two or three of the positive control circuits. The authors constructed sixteen negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter or simply the cellular growth rate. The proposed method detected a causal effect in one of the eight negative controls, which the authors argue is not a false positive, but due to an unexpected causal effect. Overall, the data support the practical usefulness of the proposed approach. 

      We thank the referee for their summary of our work.

      Strengths: 

      The idea of a "no-causality control" in the context of detected directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations. 

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations. 

      We thank the referee for summarizing the value of our work. 

      Caveats: 

      The term "causally" is used in the main-text statement of the central theorem (Eq 2) without a definition of this term. This makes it difficult to fully understand the statement of the paper's central theorem without diving into the supplement.  

      We thank the referee for this suggestion. In the revised manuscript we now define causal effects right before the statement of the main theorem of the main text (p. 2). We have also added a definition of the causal network arrows in the caption of Fig. 1 to help readers better understand our central claim.

      The basic argument of theorem 1 appears to rely on establishing that x(t) and y(t) are independent of their initial conditions. Yet, there appear to be some scenarios where this property breaks down: 

      (1) Theorem 1 does not seem to hold in the edge case where R=beta=W=0, meaning that the components of interest do not vary with time, or perhaps vary in time only due to measurement noise. In this case x(t), y(t), and z(t) depend on x(0), y(0), and z(0). Since the distributions of x(0), y(0), and z(0) are unspecified, a counterexample to the theorem may be readily constructed by manipulating the covariance matrix of x(0), y(0), and z(0). 

      (2) A similar problem may occur when transition probabilities decay with time. For example, suppose that again R=0 and X are degraded by a protease (B), but this protease is subject to its own first-order degradation. The deterministic version of this situation can be written, for example, dx/dt=-bx and db/dt=-b. In this system, x(t) approaches x(0)exp(-b(0)) for large t. Thus, as above, x(t) depends on x(0). If similar dynamics apply to the Y and Z genes, we can make all genes depend on their initial conditions, thus producing a pathology analogous to the above example. 

      The reviewer does not know when such examples may occur in (bio)physical systems. Nevertheless, since one of the advantages of mathematics is the ability to correctly identify the domain of validity for a claim, the present work would be strengthened by "building a fence" around these edge cases, either by identifying the comprehensive set of such edge cases and explicitly prohibiting them in a stated assumption set, or by pointing out how the existing assumptions already exclude them.  

      We thank the referee for bringing to our attention these edge cases that indeed violate our theorem as stated. In the revised manuscript we have “built a fence” around these edge cases by adding two requirements to the premise of our theorem: First, we have added the requirement that the degradation rate does not decay to zero for any possible realization. That is, if beta(t) is the degradation rate of X and Y for a particular cell over time, then taking the time average of beta(t) over all time must be non-zero. Second, we have added the requirement that the system has evolved for enough time such that the dual reporter averages <x> and <y>, along with the covariances Cov(x, z_{k}) and Cov(y, z_{k}) have reached a time-independent stationary state.  

      With these requirements, no assumptions need to be made about the initial conditions of the system, because any differences in the initial conditions will decay away as the system reaches stationarity. For instance, the referee’s example (1) is not possible with these requirements because beta(t) can no longer remain zero. Additionally, example (2) is no longer possible because the time average of the degradation rate would be zero, which is no longer allowed (i.e., we would have that integral from 0 to T of b(0)exp(-t)/T dt =  0 when T goes to infinity). 

      Note that adding the condition that degradation cannot decay to exactly zero does not reduce the biological applicability of the theorem. But as the referee correctly points out any mathematical theorem needs to be accurately stated and stand on its own regardless of whether biological systems could realize particular edge cases. Also note, that the requirement that the cellular ensemble has reached a time-independent distribution of cell-to-cell variability can be (approximately) experimentally verified by taking snapshots of ensemble variability at two sufficiently separate different moments in time. 

      In response to the referee’s comment, we have added the above requirements when stating the theorem in the main text. We have also added the requirement of non-decay of the degradation rate to the definition of the system in SI Sec. 4, along with the stationarity requirement in theorem 1 in SI Sec 5. We have also added mathematical details to the proof of the invariant in SI Sec 5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. They propose and experimentally demonstrate the utility of this idea with a synthetic reporter system in bacteria. 

      The paper is well written and clearly outlines the principle, the mathematical invariant relationship both to give the reader an intuitive understanding of why the relationship must be true and in their mathematical derivation of the proof of Theorem 1. 

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. However, the practical utility of this method may not be straightforward and potentially be quite difficult to execute. We think this work could offer a platform to advance the field of network inference, but would encourage the authors to address the following comments. 

      We thank the reviewer for the positive comments on readability, summarizing the value of our work, as well as the critical comments below that helped us improve the manuscript.

      Major comments: 

      (1) Although the invariant identity seems theoretically sound, the data from synthetic engineered circuits in this manuscript do not support that the invariant holds for natural causal relations between genes in wild-type cells. In all the positive control synthetic circuits (numbers 1 to 4) the target gene Z i.e. RFP was always on the plasmid, and in circuit #4 there was an additional endogenous copy. The authors recapitulate the X-to-Z causality in circuits 1, 2, and 3 but not 4. Ultimately, the utility of this method lies in the ability to capture causality from endogenous correlations, this observation suggests that the method might not be useful for that task. 

      We thank the referee for their careful reading of our synthetic circuits and sincerely apologize for an error in our description of circuit #4 in the schematic of Table S2 of the supplement. We incorrectly stated that this circuit contained a chromosomally expressed RFP. In fact, in circuit #4 RFP was only on the plasmid just like in the circuits #1-3. We have corrected the schematic in the revised manuscript and have verified that the other circuits are correctly depicted.

      In the revised manuscript, we now explicitly spell out that all our “positive control” test cases had the genes of interest expressed on plasmids, and that we have not shown that our method successfully detected causal interactions in a chromosomally encoded gene regulatory circuit, see additional statements in Sec. “Causally connected genes that break the invariant” on p. 6. 

      In the absence of any explicit experimental evidence, it is then important to consider whether chromosomally encoded circuits are expected to cause problems for our method which is based on a fluctuation test. Due to plasmid copy number fluctuations, X and Z will fluctuate significantly more when expressed on plasmids than when expressed chromosomally. However, because this additional variability is shared between X and Z it does not help our analysis which relies on stochastic differences in X and Z expression due to “intrinsic noise” effects downstream of copy number fluctuations. The additional “extrinsic noise” fluctuations due to plasmid copy number variability would wash out violations of Eq. (2) rather than amplify them. If anything, we thus expect our test cases to have been harder to analyze than endogenous fluctuations. This theoretical expectation is indeed borne out by numerical test cases presented in the revised supplement where plasmid copy fluctuations severely reduced the violations of Eq. 2, see new additional SI Sec. 15. 

      Additionally, the case of the outlier circuit (number 12) suggests that exogenous expression of certain genes may lead to an imbalance of natural stoichiometry and lead to indirect effects on target genes which can be misinterpreted as causal relations. Knocking out the endogenous copy may potentially ameliorate this issue but that remains to be tested. 

      We agree with the referee that the expression of exogenous genetic reporters can potentially affect cellular physiology and lead to undesired effects. In the revised manuscript we now explicitly spell out that the metabolic burden or the phototoxicity of introducing fluorescent proteins could in principle cause artificial interactions that do not correspond to the natural gene regulatory network, see Sec. “Proposed additional tests” on p. 8.

      However, it is also important to consider that the test circuit #12 represents a synthetic circuit with genes that were expressed at extremely high levels (discussed in 3rd paragraph of Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8), which led to the presumed cellular burden. Arguably, natural systems would not typically exhibit such high expression levels, but importantly even if they did, our method does not necessarily rely on fluorescently tagged proteins but can, in principle, also be applied to other methods such as transcript counting through sequencing or in-situ hybridization of fluorescent probes.  

      Ultimately, the value of this manuscript will be greatly elevated if the authors successfully demonstrate the recapitulation of some known naturally existing causal and non-causal relations. For this, the authors can choose any endogenous gene Z that is causally controlled by gene X. The gene X can be on the exogenous plasmid along with the reporter and the shared promoter. Same for another gene Z' which is not causally controlled by gene X. Potentially a knockout of endogenous X may be required but it might depend  on what genes are chosen. 

      If the authors think the above experiments are outside the scope of this manuscript, they should at least address these issues and comment on how this method could be effectively used by other labs to deduce causal relations between their favorite genes. 

      Because a full analysis of naturally occurring gene interactions was beyond the scope of our work, we agree with the referee’s suggestion to add a section to discuss the limitations of our experimental results. In the revised manuscript we reiterate that additional investigations are needed to show that the method works to detect causal interactions between endogenous genes, see Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study”  (p. 9). In the original manuscript we explicitly spelled out how other researchers can potentially carry out this further work in the subsections titled “Transcriptional dual reporters” (p. 3) and ”Translational dual reporters” (p. 3).  In the revised manuscript, we have added a section “Proposed additional tests” (p. 8) in which we propose an experiment analogous to the one proposed by the referee above, involving an endogenous gene circuit found in E. coli, as an example to test our invariant. 

      (2) For a theoretical exposition that is convincing, we suggest the authors simulate a larger network (for instance, a network with >10 nodes), like the one shown schematically in Figure 1, and demonstrate that the invariant relationship holds for the causally disconnected entities, but is violated for the causally related entities. It would also be interesting to see if any quantification for the casual distance between "X" and the different causally related entities could be inferred.  

      We thank the referee for this suggestion. We have added SI Sec. 14 where we present simulation results of a larger network with 10 nodes. We find that all of the components not affected by X satisfy Eq. (2) as they must. However, it is important to consider that we have analytically proven the invariant of Eq. (2) for all possible systems. It provably applies equally to networks with 5, 100, or 10,000 components. The main purpose of the simulations presented in Fig. (2) is to illustrate our results and to show that correlation coefficients do not satisfy such an invariant. However, they are not used as a proof of our mathematical statements.

      We thank the referee for the interesting suggestion of quantifying a “causal distance”. Unfortunately, the degree to which Eq. (2) is violated cannot directly equate to an absolute measure for the “causal distance” of an interaction. This is because both the strength of the interaction and the size of the stochastic fluctuations in X affect the degree to which Eq. (2) is violated. The distance from the line should thus be interpreted as a lower bound on the causal effect from X to Z because we do not know the magnitude of stochastic effects inherent to the expression of the dual reporters X and Y. While the dual reporters X and Y are identically regulated, they will differ due to stochastic fluctuations. Propagation of these fluctuations from X to Z are what creates an asymmetry between the normalized covariances. In the most extreme example, if X and Y do not exhibit any stochastic fluctuations we have x(t)=y(t) for all times and Eq. (2) will not be violated even in the presence of a strong causal link from X to Z.

      However, it might be possible to infer a relative causal distance to compare causal interactions within cells.

      That is, in a given network, the normalized covariances between X, Y and two other components of interest Z1, Z2 that are affected by X can be compared. If the asymmetry between (η𝑥𝑧1 , η𝑦𝑧1) is larger than the asymmetry between (η𝑥𝑧2 , η𝑦𝑧2) , then we might be able to conclude that X affects Z1 with a stronger interaction than the interaction from X to Z2, because here the intrinsic fluctuations in X are the same in both cases. 

      In response to the referee’s comment and to test the idea of a relative causal distance, we have simulated a larger network made of 10 components. In this network, X affects a cascade of components called Z8, Z9, and Z10, see the additional SI Sec. 14. Here the idea of a causal distance can be defined as the distance down the cascade: Z8 is closest to X and so has the largest causal strength, whereas Z10 has the weakest. Indeed, simulating this system we find that the asymmetry between η𝑥𝑧8 and η𝑦𝑧8 is the largest whereas that between  η𝑥𝑧10 and η𝑦𝑧10 the smallest. We also find that all of the components not affected by X have normalized covariances that satisfy Eq. (2). This result suggests that the relative causal distance or strength in a network could potentially be estimated from the degree of the violations of Eq. (2). 

      However, we note that these are preliminary results. In the case of the specific regulatory cascade now considered in SI Sec. 14, the idea of a causal distance can be well defined. Once feedback is introduced into the system, this definition may no longer make sense. For instance, consider the same network that we simulate in SI Sec. 14, but where the most downstream component in the cascade, Z10, feeds back and affects X and Y. In such a circuit it is unclear whether Z8 or Z10 is “causally closer” to X. A more thorough theoretical analysis, equipped with a more universal quantitative definition for causal distance or strength, would be needed to deduce what information can be inferred from the relative distances in the violations of Eq. (2). While this defines an interesting research question, answering it goes beyond the scope of the current manuscript. 

      Minor comments: 

      - The method relies on the gene X and the reporter Y having the same control which would result in similar dynamics. The authors do not quantitatively compare the YFP and CFP expression if this indeed holds for the synthetic circuits. It would be useful to know how much deviation between the two can be tolerated while not affecting the outcome. 

      We thank the referee for their comment. The invariant of Eq. (2) is indeed only guaranteed to hold only when the transcription rate of Y is proportional to that of X. How much levels of X and Y covary depends on the stochastic effects intrinsic to the expression of the dual reporters as well as how similar the transcriptional control of X and Y is. The stochastic difference between X and Y is exactly what we exploit. 

      However, in the limit of high YFP and CFP levels, intrinsic fluctuations that cause stochastic expression differences between X and Y become negligible and we can directly infer whether they are indeed tightly co-regulated from time-traces: Below, we show two single cell traces taken with our experimental setup in which the YFP and CFP fluorescence trajectories are almost exactly proportional. Both of these traces are from circuit #10 as defined in Table. S4. 

      Author response image 1.

      We chose the above traces because they showed the highest correlation between YFP and CFP levels. Other traces for lower expression levels have lower correlations due to effects of intrinsic noise (see Tables S2-S4). However, the existence of one trace in which YFP is almost perfectly proportional to CFP throughout can only occur if the YFP and CFP genes are under the same control. And, since the control of YFP and CFP genes in all of our synthetic circuits are identical (with the same promoters and plasmid positions), these data strongly suggest that our dual reporters are tightly co-regulated in all the synthetic circuits. Moreover, the negative control experiments presented in Fig. 3E provide a natural consistency check that the YFP and CFP are under the same control and satisfy Eq. (1).

      We agree that it would be useful to know how much the X and Y production rates can differ for Eq. (2) to hold. Importantly, our proven theorem already allows for the rates to differ by an unspecified proportionality constant. In response to the referee’s comment we have derived a more general condition under which our approach holds. In the newly added SI Sec. 7 we prove that Eq. (2) holds also when rates differ as long as the difference is stochastic in nature with an average of zero. We also prove that Eq. (2) holds in the face of multiplicative noise that is independent of the X and Y production rates.

      However, the production rates of X and Y cannot differ in all ways. Some types of differences between the X and Y production rates can lead to deviations of Eq. (2) even when there is no causal interaction. To highlight this, we added the results of simulations of a toy model in which the X and Y production rates differ by an additive noise term that does not average to zero, see Fig. S19B of the newly added SI Sec. 7.

      - The invariant should potentially hold true for any biological species that are causally related e.g. protein-protein interactions. Also, this method could potentially find many applications in eukaryotic cells. Although it's outside the scope of current work to experimentally demonstrate such applications, the authors should comment on experimental strategies to apply this method to overcome potential pitfalls (e.g. presence of enhancers in eukaryotic cells). 

      We thank the referee for this suggestion. We agree that there are potential pitfalls that could come into effect when our proposed approach is applied on more complex systems such as eukaryotic gene expression. In response to the referee’s comment, we have added an explicit discussion of these potential pitfalls in the discussion section “Limitations of this study” (see p. 10). 

      In particular, in eukaryotes there are many genes in which promoter sequences may not be the sole factor determining transcription rates. Other factors that can be involved in gene regulation include the presence of enhancers, epigenetic modifications, and bursts in gene expression, to name a few. We thus propose a few strategies, which include positioning the passive reporter at a similar gene loci as the gene of interest, measuring the gene regulation activities of the gene of interest and its passive reporter using a separate method, and exploiting the invariant with a third gene, where it is known there is no causal interaction, as a consistency check. In addition, we include in the SI a new section SI Sec. 8 which shows that the invariant holds in the face of many types of bursty gene expression dynamics.

      However, the above is not a comprehensive list. Some of the issues the referee mentions are serious and may not be straightforward to overcome. We now spell this out explicitly in the revised manuscript (p. 10). 

      - In the legend of Fig. 1, the sentence "Data points here are for..." is missing a few words, or needs to be rephrased. 

      We thank the referee for this comment. We have rewritten the figure caption, which now reads “Data points are numerical simulations of specific example networks (see SI for details) to illustrate the analytically proven theorem of Eq. 2.”

      - Fig. 2 talks about the uncertainties associated with each point on the scatter plots. However, it is difficult to understand the quantification in such a plot. It would be great to have a plot quantifying the uncertainties in the invariant relation for the different topologies studied, specifically in order to understand if one topology is consistently deviating more from the x=y line than the other topologies studied here.  

      We thank the referee for this suggestion. In the supplement of the revised manuscript we have added supplemental Figs. S3, S4, and  S5 to separately quantify the uncertainty of the difference processes plotted in Fig. 2 and have added a new section (SI Sec. 11) to discuss the processes simulated in Fig. 2 in more detail. In short, each simulated process generated less than ~5% of outliers when considering 95% confidence intervals (with the max percentage deviation being 5.01% for process 5, see Fig. S5). These outliers were then simulated over a larger number of simulations to reduce the sampling error, which resulted in 0% of outliers (see Sec. “Confidence intervals for finite sampling error” on Materials and Methods on p. 11). Some simulated processes generated larger percentage errors in the normalized covariances than others, but this is expected as different processes have different dynamics which will result in different degrees of sampling of the underlying distributions.

      Note, that the invariant of Eq. 2 is analytically proven for all tested topologies as none of the topologies include a causal effect from X to Z. Any deviation of the numerical data from the straight line prediction of Eq. 2 (right column in Fig. 2C) is due to the finite sampling of a stochastic process to estimate the true covariance from the sampling covariance. Any given parameter set was simulated several times which allowed us to estimate the sampling error from differences in between repeated samples. In the additional SI figures we now quantify this error for the different topologies. 

      In addition to the above changes we want to highlight that the purpose of the simulations presented in Fig. (2) is not to prove our statements or explore the behavior of different topologies. The purpose of the data presented in the right column of Fig. 2C is to illustrate the theoretical invariant and act as a numerical sanity check of our analytically proven result. In contrast, the data in the left column of Fig 2C illustrates that the correlations do not satisfy an invariant like Eq. 2 which applies to covariances but not correlations.  

      - The legend for Fig. 3 seems to end abruptly. There likely needs to be more.  

      We thank the referee for catching this mistake. We have corrected the accidentally truncated figure caption of Fig. 3.

      - There is a typo in equation (5.3) on page 23 of supplementary material, there should be x instead of y in the degradation equation of x. 

      We thank the referee for catching this mistake which has been corrected in the revised manuscript.

      - In the supplemental material, to understand the unexpected novel discovery of causality, Figure S5 is presented. However, this doesn't give the context for other negative controls designed, and the effect of rfp dynamics (which can be seen in the plots both in the main paper and the supplement) in the growth rate of cells in those constructs. As a baseline, it would be nice to have those figures.  

      We thank the referee for this suggestion. We have now included representative RFP traces with the growth rates for other negative control circuits, see Fig. S10. In addition, we have now included the cross correlation functions between RFP and growth rate in these negative control circuits, see Fig. S10A. While in all cases, RFP and growth rate are negatively correlated, the outlier circuit exhibits the largest negative correlation.

      The suggested comparison of the referee thus highlights that – in isolation – a negative correlation between RFP and growth rate is only weak evidence for our hypothesized causal interaction because negative correlations can result from the effect of growth rate affecting volume dilution and thus RFP concentration. Crucially, we thus additionally considered the overall variability of growth rate and found the outlier circuit has the largest growth rate variability which is indicative of something that is affecting the growth rate of those cells, see Fig. S10B. To compare the magnitude of RFP variability against other strains requires constraining the comparison group to other synthetic circuits that have RFP located on the chromosome rather than a plasmid. This is why we compare the CV of the outlier with the CV of circuit #5, which corresponds to the “regular” repressilator (i.e., the outlier circuit without the endogenous lacI gene). As an additional comparison, we computed the CV for a strain of E. coli that does not contain a synthetic plasmid at all, but still contains the RFP gene on the chromosome. We find that the CVs in the outlier circuit to be larger than in these two additional circuits, suggesting that the outlier circuit causes additional fluctuations in the RFP and growth rate. We now spell this out explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit“, p. 8).

      The referee is correct that the above arguments are only circumstantial evidence, but they do show that the data is consistent with a plausible explanation of the hypothesized causal interaction. Our main evidence for an RpoS mediated stress response that explains the deviations from Eq. 2 in the outlier circuit is the perturbation experiment in which the deviation disappears for the RpoS knockout strain. We now spell out this argument explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit“, p. 8).

      Reviewer #2 (Recommendations For The Authors): 

      The proof of theorem 1 relies on an earlier result, lemma 1. Lemma 1 only guarantees the existence of a "dummy" system that satisfies the separation requirement and preserves the dynamics of X and Y. However, in principle, it may be possible to maintain the dynamics of X and Y while still changing the relationship between Cov(X,Zk) and Cov(Y,Zk). This could occur if the dynamics of Zk differ in a particular way between the original system and the dummy system. So lemma 1 needs to be a little stronger- it needs  to mention that the dynamics of Zk are preserved, or something along these lines. The proof of lemma 1 appears to contain the necessary ingredients for what is actually needed, but this should be clarified. 

      We agree with the referee that this is an important distinction. Lemma 1 does in fact guarantee that any component Zk that is not affected by X and Y will have the same dynamics in the “dummy” system. However, as the referee points out, this is not stated in the lemma statement nor in the proof of the lemma. In response to the referee’s comment, we have made it clear in the lemma statement that the Zk dynamics are preserved in the “dummy” system, and we have also added details to the proof to show that this is the case, see Lemma 1 on p. 27 of the SI. 

      Readers who are familiar with chemical reaction diagrams, but not birth-death process diagrams may waste some time trying to interpret Equation 1 as a chemical reaction diagram with some sort of rate constant as a label on each arrow (I did this). It may be helpful to either provide a self-contained definition of the notation used, or mention a source where the necessary definitions can be found. 

      We agree with the referee. In the revised manuscript we have added a description of the notation used below Equation 1 of the main text, see p. 2. The notational overloading of the “arrow notation” is a perennial problem in the field and we thank the referee for reminding us of the need to clarify what the arrows mean in our diagrams.

      It would be helpful if the authors could propose a rule for deciding whether dependence is detected or not. As it stands presently, the output of the approach seems to be a chart like that in Figure 3D where you show eta_xz and eta_yz with confidence interval bars and the reader must visually assess whether the points more-or-less fall on the line of unity. It would be better to have some systematic procedure for making a "yes or no" call as to whether a causal link was detected or not. Having a systematic detection rule would allow you to make a call as to whether dependence in circuit 3 was detected or not. It would also allow you or a future effort to evaluate the true positive rate of the approach in simulated settings. 

      We thank the referee for this suggestion. In the revised manuscript we have added an explicit rule for detecting causality using the invariant of Eq. (2). Specifically, Eq. (2) can be re-written as r = 1 where r is the covariability ratio r = etaXZ/etaYZ. In that case, given 95% confidence intervals for the experimentally determined covariability ratio r, we say that there is a causal interaction if the confidence intervals overlap with the value of r = 1. 

      This corresponds to a null hypothesis test at the 2.5% significance level. The reason that it is at 2.5% significance and not 5% significance is as follows. Let’s say we measure a covariability ratio of r_m, and the 95% confidence interval is [r_m - e_m, r_m + e_m] for some error e_m. Without loss of generality, let’s say that r_m > 1 (the same applies if r_m < 1). This means that Prob(r < r_m - e_m) = 2.5% and Prob(r > r_m + e_m) = 2.5% , where r is the actual value of the covariability ratio. Under the null hypothesis that there is no causal interaction, we set r = 1. However, we now have Prob(1 < r_m + e_m) = 0, because we know that r_m > 1 and so we must have r_m + e_m > 1. The probability that the value of 1 falls outside the error bars is therefore 2.5% under the null hypothesis. 

      This proposed rule is the same rule that we used to detect statistical outliers in our simulations, where we found a “false positive” rate of 2.3% over 6522 simulated systems due to statistical sampling error (as discussed in the Materials and Methods section). In response to the referee’s suggestion, we have added the section “A rule for detecting causality in the face of measurement uncertainty” (p. 4). We also apply the rule to the experimental data and find that the rule detects 2/4 causal interactions in Fig. 3D. We have clarified this in the Fig. 3D caption, in the main text, and we have added a figure in the SI (Fig. S2) where we apply the null hypothesis test on the measured covariability ratios. 

      Note, whether the third interaction is “detected” or not depends on the cut-off value used. We picked the most common 95% rule to be consistent with the traditional statistical approaches. With this rule one of the data points lies right at the cusp of detection, but ultimately falls into the “undetected” category if a strictly binary answer is sought under the above rule. 

      It would be helpful to mention what happens when the abundance of a species hits zero. Specifically, there are two ways to interpret the arrow from X to X+d with a W on top: 

      Interpretation (1): 

      P(X+d | X) = W if X+d {greater than or equal to} 0  P(X+d | X) = 0 if X_i+d_i < 0 for at least one i 

      Interpretation (2): 

      P(X+d | X) = W regardless of whether X+d < 0  W = 0 whenever X_i < d_i for at least one i 

      Interpretation (1) corresponds to a graph where the states are indexed on the non-negative integers. Interpretation (2) corresponds to a graph where the states are indexed on the integers (positive or negative), and W is responsible for enforcing the non-negativity of mass. I believe you need the second interpretation because the first interpretation leads to problems with your definition of causality. For example, consider the reaction: 

      (Na, K) -- 0.1 --> (Na-1, K+1) 

      This could occur if Na and K are the intracellular concentrations of sodium and potassium ions in a cell that has an ATP-driven sodium-potassium exchanger whose rate is limited by the frequency with which extracellular potassium ions happen to flow by. Per the definition of causality found in the appendix, Na has no causal effect on K since Na does not show up in the reaction rate term. However, under interpretation (1), Na clearly has a causal effect on K according to a reasonable definition of causality because if Na=0, then the reaction cannot proceed, whereas if Na>0 then it can. However, under interpretation (2), the reaction above cannot exist and so this scenario is excluded. 

      We thank the referee for this comment that helped us clarify the meaning of arrows with propensities. In short, interpretation (2) corresponds to the definition of our stochastic systems. This is consistent with the standard notation used for the chemical master equation. As the referee points out, because molecular abundances cannot be negative, any biochemical system must then have the property that the propensity of a reaction must be equal to zero when the system is in a state in which an occurrence of that reaction would take one of the abundances to negative numbers. Stochastic networks that do not have this property cannot correspond to biochemical reaction networks.

      In the revised manuscript, we now spell this out explicitly to avoid any confusion, see SI page 25.

      Furthermore, we additionally discuss the referee’s example in which the rate of exchanging Na for K through an ion exchanger is approximately independent of the intracellular Na concentration. Because biochemical systems cannot become negative, it cannot be that the rate is truly constant, but at some point for low concentrations must go down until it becomes exactly zero for zero molecules. 

      Importantly, agreement with Eq. (2) does not imply that there is no causal effect from X to Zk. It is the deviation from Eq. (2) that implies the existence of a causal effect from X to Zk. Therefore, although the above referee’s example would constitute a causal interaction in our framework, it would not lead to a deviation of Eq. (2) because the fluctuations in Na (which we exploit) do not propagate to K. From a practical point of view, our method thus detects whether changing X over the observed range affects the production and degradation rates of Zk. 

      In the course of setting up the negative control benchmark circuits, a perturbation-based causal validation would be nice. For instance, first, verify that X does not affect Z by intervening on X (e.g. changing its copy number or putting it under the control of an inducible promoter), and ensuring that Z's activity is not affected by such interventions upon X. This approach would help to adjudicate questions of whether the negative control circuits actually have an unknown causal link. The existing benchmark is already reasonably solid in my view, and I do not know how feasible this would be with the authors' setup, but I think that a perturbation-based validation could in principle be the gold standard benchmark.  

      We agree that additional perturbation-based validation tests on all of the negative control circuits would indeed improve the evidence that our method worked as advertised. While such experiments are indeed beyond the scope of our current work we now explicitly point out the benefits of such additional controls in the revised Discussion.

      Below is a series of comments about typography, mostly about section 4 of the supplement. 

      We thank the referee for their careful reading and highlighting those mistakes.

      At the bottom of page 21, Z_aff is defined as the set of components that are affected by X. However, later Z_aff seems to refer to components affected by X or Y. For instance, in the proof of lemma 1, it is written "However, because a is part of z_aff, the {ak} variables must be affected by X and/or Y." 

      We thank the referee for catching this mistake. We have changed the definition of Z_aff throughout the supplement to refer to components affected by X or Y. If it can be experimentally ensured that Y is a passive reporter (i.e., it does not affect other components in the cell), then the theorem can only be violated if X affects Z. 

      In the equation following Eq 5.2, W_k and d_k should be W_i and d_i ?  

      Yes, the referee is correct. In the revised manuscript we have corrected W_k and d_k to W_i and d_i. 

      In Eq 5.3 in the lower-left transition diagram, I think a "y" should be an "x". 

      Yes, the referee is correct. In the revised manuscript  we have fixed this typo.

      In the master equation above Eq 5.5, the "R" terms for the y reactions are missing the alpha term, and I think two of the beta terms need to be multiplied by x and y respectively.  

      The referee is correct. In the revised manuscript  we have fixed this typo.

      The notation of Eq 5.8, where z_k(t) is the conditional expectation of z_kt, is strange and difficult to follow. Why does z_k(t) not get a bar over it like its counterparts for x, y, R, and beta? The bars, although not a perfect solution, do help.  

      We agree with the referee’s comment and have added further explanations to define the averages in question, see SI p. 28. In short, when we condition on the history of the components not affected by X or Y, we in effect condition on the time trajectories of z_{k} (when it is part of the components not affected by X and/or Y) and beta (since it only depends on the components not affected by X or Y). We thus previously did not include the bars when taking the averages of these components in the conditional space because the conditioning in effect sets their time-trajectories (so they become deterministic functions of time). In the revised manuscript we now also denote these conditional expectations with bars and we have added comments to the proof to clarify their definition.

      I think it would be helpful to show how the relationship <x>=<y>/alpha is obtained from Eq 5.5.  

      We agree with this suggestion and have added the derivations, see Eqs. (5.9) - (5.13) in the revised SI. 

      In the main text, the legend of Fig 3 cuts off mid-sentence.  

      We thank the referee for catching this mistake which has been fixed in the revised manuscript.

    1. eLife Assessment

      This important study provides compelling data from in vitro models and patient-derived samples to demonstrate how modulation of GSK3 activity can reprogram macrophages, revealing potential therapeutic applications in inflammatory diseases such as severe COVID-19. The study stands out for its clear and systematic presentation, convincing experimental approach, and the relevance of its findings to the field of immunology.

    2. Reviewer #1 (Public review):

      The manuscript by Rios et al. investigates the potential of GSK3 inhibition to reprogram human macrophages, exploring its therapeutic implications in conditions like severe COVID-19. The authors present convincing evidence that GSK3 inhibition shifts macrophage phenotypes from pro-inflammatory to anti-inflammatory states, thus highlighting the GSK3-MAFB axis as a potential therapeutic target. Using both GM-CSF- and M-CSF-dependent monocyte-derived macrophages as model systems, the study provides extensive transcriptional, phenotypic, and functional characterizations of these reprogrammed cells. The authors further extend their findings to human alveolar macrophages derived from patient samples, demonstrating the clinical relevance of GSK3 inhibition in macrophage biology.

      The experimental design is sound, leveraging techniques such as RNA-seq, flow cytometry, and bioenergetic profiling to generate a comprehensive dataset. The study's integration of multiple model systems and human samples strengthens its impact and relevance. The findings not only offer insights into macrophage plasticity but also propose novel therapeutic strategies for macrophage reprogramming in inflammatory diseases.

      Strengths:

      (1) Robust Experimental Design: The use of both in vitro and ex vivo models adds depth to the findings, making the conclusions applicable to both experimental and clinical settings.

      (2) Thorough Data Analysis: The extensive use of RNA-seq and gene set enrichment analysis (GSEA) provides a clear transcriptional signature of the reprogrammed macrophages.

      (3) Relevance to Severe COVID-19: The study's focus on macrophage reprogramming in the context of severe COVID-19 adds clinical significance, especially given the relevance of macrophage-driven inflammation in this disease.

      Weaknesses:

      There are no significant weaknesses in the study.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      The manuscript by Rios et al. investigates the potential of GSK3 inhibition to reprogram human macrophages, exploring its therapeutic implications in conditions like severe COVID-19. The authors present convincing evidence that GSK3 inhibition shifts macrophage phenotypes from pro-inflammatory to anti-inflammatory states, thus highlighting the GSK3-MAFB axis as a potential therapeutic target. Using both GM-CSF- and M-CSF-dependent monocyte-derived macrophages as model systems, the study provides extensive transcriptional, phenotypic, and functional characterizations of these reprogrammed cells. The authors further extend their findings to human alveolar macrophages derived from patient samples, demonstrating the clinical relevance of GSK3 inhibition in macrophage biology.

      The experimental design is sound, leveraging techniques such as RNA-seq, flow cytometry, and bioenergetic profiling to generate a comprehensive dataset. The study's integration of multiple model systems and human samples strengthens its impact and relevance. The findings not only offer insights into macrophage plasticity but also propose novel therapeutic strategies for macrophage reprogramming in inflammatory diseases.

      Strengths:

      (1) Robust Experimental Design: The use of both in vitro and ex vivo models adds depth to the findings, making the conclusions applicable to both experimental and clinical settings.

      (2) Thorough Data Analysis: The extensive use of RNA-seq and gene set enrichment analysis (GSEA) provides a clear transcriptional signature of the reprogrammed macrophages.

      (3) Relevance to Severe COVID-19: The study's focus on macrophage reprogramming in the context of severe COVID-19 adds clinical significance, especially given the relevance of macrophage-driven inflammation in this disease.

      Weaknesses:

      There are no significant weaknesses in the study, though some minor points could be addressed for clarity and completeness, as outlined in the recommendations below.

      Many thanks for these comments. Please find below the response to the  specific recommendations.

      Recommendations for the authors:

      (1) In lines 263-266, the term "MoMac-VERSE" and its associated clusters are introduced without sufficient explanation. The authors should provide additional clarification on what these clusters represent and how they were derived.

      We have revised the text according to the reviewer´s suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Q<sup>hi</sup>_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (2) In line 283, the reference labeled "2227" appears incorrect. It seems to be a formatting issue, and it might refer to references 22-27. Please verify and correct.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (3) In line 353, the reference is incorrect. Please reviewe ensure that all references are properly cited throughout the manuscript.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (4) In line 368, one of the patient samples shows a decreased IL-10 response after CHIR treatment. The authors should acknowledge the heterogeneity in the primary cell responses and adjust the conclusion accordingly to reflect this variability.

      We have modified the text following the reviewer´s comment, and acknowledge the heterogeneity in the production of IL-10 after GSK3 inhibition in the three analyzed samples. The modified text now states: "Consistent with these findings, CHIR-AMØ exhibited higher expression of MAFB (Figure 6F) whose increase correlated with an augmented secretion of Legumain, CCL2 and IL-10 (Figure 6G), although the latter was only seen in two samples, probably reflecting heterogeneity in primary cell responses."

      (5) Figure 7B: the UMAP shows 4 populations, but according to the visualization in the sup fig 3, there should be many more clusters. How do the authors explain this? Are these patient-specific clusters? Also, IMs can be separated into at least subpopulations. Can the authors plot also bona fide macrophage markers expressed by all subpopulations?

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000) and % of mitochondrial genes (< 15 %)]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.

      Addresing the first question, UMAPs in old Figure 7B and old Supplementary Figure 3B had a different  number of clusters because old Figure 7B was derived from old Supplementary Figure 3B after grouping macrophage clusters according to the expression of previously defined markers and to limit the weight of donor-specific clusters. Specifically, the macrophage clusters from old Figure 7B were re-grouped according to the differential expression of:

      - FCN1 (including cluster 4, 7 and 12 from Figure 7B): Infiltrating monocytes.

      - FABP4 and TYMS-negative (including clusters 0, 2, 5 and 13 from Figure 7B), or MARCO and INHBA (cluster 9 from Figure 7B) or PPARG (cluster 11 from Figure 7B): Alveolar macrophages (AMØ).

      - TYMS, MKI67, TOP2A and NUSAP1 (cluster 15 from Figure 7B): Proliferating AMØ.

      - LYVE1 or RNASE1 or LGMN (including clusters 1, 3, 6, 8, 10 and 14 from Figure 7B): Interstitial Macrophages (IMØ).

      As the reviewer suggested, this type of UMAP plot yielded a large number of donor-specific clusters. To avoid such a misleading representation, we have now plotted UMAPs after running scVI reduction in every case. The new plots are now shown in new Figure 7A, new Figure 7B, new Supplementary Figure 3 (containing the information of the 21310 single-cell transcriptomes from GSE128033) and the novel Supplementary Figure 4 (with the information of the single-cell transcriptomes from human lung macrophages from GSE128033).

      Finally, to address the last issue, we have now plotted the expression of genes used for macrophage definition (CD163, FABP4, LYVE1, FCN1), as well as proliferation-associated genes (TYMS, MKI67, TOP2A, NUSAP1) and other bona fide macrophage marker genes (SPI1, FOLR2) in Supplementary Figure 4C.

      (6) statistics should be indicated in every figure legend and for every subfigure where applicable.

      We have now included the specific statistical procedure applied for each Figure and panel.

      Reviewer 2 (Public review):

      The study by Rios and colleagues provides the scientific community with a compelling exploration of macrophage plasticity and its potential as a therapeutic target. By focusing on the GSK3-MAFB axis, the authors present a strong case for macrophage reprogramming as a strategy to combat inflammatory and fibrotic diseases, including severe COVID-19. Using a robust and comprehensive methodology, in this study it is conducted a broad transcriptomic and functional analyses and offers valuable mechanistic insights while highlighting its clinical relevance

      Strengths:

      Well performed and analyzed

      Weaknesses:

      Additional analyses, including mechanistic studies, would increase the value of the study

      In an effort to address the comment of the reviewer, we have performed more detailed analysis of the kinetics and dose-response effects of GSK3 inhibition, which are now provided as new Supplementary Figure 3A.

      Regarding additional mechanistic studies, we decided to explore the relationship between inactive GSK3β and MAFB levels at the early stages of M-CSF- or GM-CSF-driven monocyte-to-macrophage differentiation. These experiments, performed in three independent monocyte preparations, indicated that, 48 hours along differentiation, M-CSF promoted a huge increase in both MAFB expression and a slight (albeit significant) rise in inactive GSK3β (P-Ser9-GSK3β) (compared to either untreated or GM-CSF-treated monocytes), further supporting the macrophage re-programming effect of GSK3. However, since the M-CSF-promoted increase in MAFB levels was much robust than the enhancement in inactive GSK3β, we hypothesize that proteasomal degradation of MAFB might be also distinct between M-CSF- (M-MØ) and GM-CSF-dependent (GM-MØ) monocyte-derived macrophages.

      Author response image 1.

      Total GSK3β, p-Ser9-GSK3β and MAFB levels in three preparations of freshly purified monocytes either unstimulated (-) or stimulated with M-CSF (10 ng/ml) or GM-CSF (1,000 U/ml) at different time points, as determined by Western blot (upper panel). Vinculin protein levels were determined as protein loading control. Mean ± SEM of the GSK3β/Vinculin, p-Ser9-GSK3β/Vinculin, and MAFB/Vinculin protein ratios from the three independent experiments are shown (lower panel) (paired Student’s t test: *, p<0.05; ****, p<0.001).

      Based on this finding, we then determined proteasome activity in fully differentiated M-CSF- and GM-CSF-dependent monocyte-derived macrophages. Use of the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) in M-MØ and GM-MØ, either untreated or exposed to the proteasome inhibitor MG132, revealed that immune-proteasomal and proteasomal activity is significantly stronger in GM-MØ than in M-MØ,  as demonstrated in assays for chymotrypsin-like (ANW) and branched amino acid preferring (PAL) activity (immunoproteasome), and trypsin-like (KQL) activity (both proteasome and immunoproteasome). This result suggested that, indeed, immunoproteasomal activity might contribute to the differential expression of MAFB in M-MØ and GM-MØ.

      Author response image 2.

      Immunoproteasome activity in M-MØ and GM-MØ, either untreated or exposed to MG132, as determined using the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) on the three indicated peptides (upper panel).  Mean ± SEM of three independent experiments are shown (paired Student’s t test: *, p<0.05) (lower panel).

      Consequently, we next set up experiments to assess whether the proteasome inhibitor MG132 was capable of enhancing the expression of MAFB-dependent genes in GM-MØ. Preliminary results of GM-MØ exposure to MG132 for 6 hours indicated an increase in the expression of MAFB protein and the MAFB-dependent genes LGMN and IL10. , as well as a reduction in the expression of the GM-MØ-specific gene CD1C.

      Author response image 3.

      A. Schematic representation of the exposure of MG132 to GM-MØ for 6 hours. B. MAFB protein levels in four independent preparations of GM-MØ exposed to either DMSO (DMSO-GM-MØ) or the proteasome inhibitor MG132 (MG132-GM-MØ) for 6 hours, as determined by Western blot (left panel). GAPDH protein levels were determined as protein loading control. Mean ± SEM of the MAFB/GAPDH protein ratios from the four independent experiments are shown (right panel) (paired Student’s t test: ***, p<0.005). C. Relative mRNA levels of the indicated genes in DMSO-GM-MØ and MG132-GM-MØ, as determined by RT-PCR on seven independent samples (paired Student’s t test: ***, p<0.005; ****, p<0.001).

      Unfortunately, this proteasome inhibitor (MG-132) caused a great reduction in cell viability after 6-8 hours. Since a similar decrease in cell viability was observed upon analysis with the ONX-0914 immunoproteasome inhibitor, we could not procede any further with this approach.

      Given the reviewer´s suggestion to include mechanistic insights to the manuscript, we are now providing these results (and the corresponding figures) only for the reviewer´s information and to make clear our attempts to comply with his/her request.

      Recommendations for the authors:

      The results are of interest, and only some minor issues need to be addressed to strengthen the conclusions of the study.

      We gratefully thank the reviewer for his/her comments. 

      (1) This study employs a single dose of 10 μM of the GSK3 inhibitor CHIR-99021 for 48 hours, which is reasonable for in vitro studies. However, further investigation into the effect of different doses and exposure times could provide additional insight into optimal dosing and durability of reprogramming effects. In addition, would an alternative GSK3 inhibitors have comparable effects?

      Following the reviewer suggestion, we have performed a kinetics and dose-response analysis of the effects of CHIR-99021, using MAFB protein levels as a readout. This experiments is now shown in new Supplementary Figure 1A, that replaces the old Supplementary Figure 1A panel where a shorter kinetics was presented. Results of this new experiment indicates a maximal effect of 10µM CHIR-99021, and that the effect of the inhibitor becomes maximal 24-48 hours after treatment. The text has been modified accordingly, and it now states: "Kinetics and dose-response analysis of the effects of CHIR-99021 on MAFB expression showed that maximal protein levels were achieved after a 24-48 hour exposure to 10µM CHIR-99021 (Supplementary Figure 1A), conditions that were used hereafter."

      Regarding the use of alternative GSK3 inhibitors, we had already provided that information in Supplementary Figure 1B, where the effects of SB-216763 (10 µM) or LiCl (10 mM) were evaluated. The huge reversal of the Tyr<sup>216</sup>/Ser<sup>9</sup> GSK3β phosphorylation ratio observed with CHIR-99021 was not seen with other GSK3 inhibitors, as indicated in the text. In any event, we believe that the relevance of this result with SB-216763 or LiCl is minimized by the results generated after siRNA-mediated GSK3 knockdown (shown in Figure 4), that completely reproduced the effects seen with CHIR-99021.

      (2) Why in the "reanalysis of single cell RNAseq data" section, the authors use Seurat v5 (R) but then change to python, and the other way around?

      As indicated in the documentation for Integrative Analysis in Seurat v5 (https://satijalab.org/seurat/articles/seurat5_integration), scVIIntegration requires reticulate package which allow us to run Python environment in R.

      (3) When the authors refer to the clusters enriched in MoMacVERSE, they use the labels of the clusters (for example #2 or #3). I would suggest using the annotations described in the original paper, to link it to the bibliography published through the labels established in the paper.

      We have revised the text according to the reviewer´s suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Q<sup>hi</sup>_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (4) In line 309. Is there any significance on the "having a stronger effect"?

      We apologize for the misleading sentence. The phrase has been modified for better clarity, and the text now states: "Like CHIR-99021, silencing of both GSK3A and GSK3B augmented the expression of MAFB, with the simultaneous silencing of both GSK3A and GSK3B genes having a stronger effect (Figure 4B), and modulated the expression of 329 genes (Figure 4C,D)."

      (5) In line 337, "(22)(27)", are these references?

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (6) In the single-cell reanalysis, could you please provide integration Qc plots? It would be interesting to have it on the paper.

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000) and % of mitochondrial genes (< 15 %)]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.

      As requested by the reviewer, we are now providing the Qc plots for the re-analysis in the new Supplementary Figures 3 and 4.

    1. eLife Assessment

      This important work presents a stochastic branching process model of tumour-immune coevolution, incorporating stochastic antigenic mutation accumulation and escape within the cancer cell population. They then used this model to investigate how tumour-immune interactions influence tumour outcome and the summary statistics of sequencing data of bulk and single-cell sequencing of a tumour. The evidence is currently incomplete: statistical comparisons between the observed mutational burden distribution and theoretical predictions in the absence of immune selection should be carried out. Conclusions should be tested extensively for robustness/sensitivity to parameters.

    2. Reviewer #1 (Public review):

      Summary:

      The topic of tumor-immune co-evolution is an important, understudied topic with, as the authors noted, a general dearth of good models in this space. The authors have made important progress on the topic by introducing a stochastic branching process model of antigenicity/immunogenicity and measuring the proportion of simulated tumors that go extinct. The model is extensively explored, and the authors provide some nice theoretical results in addition to simulated results.

      Major comments

      The text in lines 183-191 is intuitively and nicely explained. However, I am not sure all of it follows from the figure panels in Figure 2. For example, the authors refer to a mutation that has a large immunogenicity, but it's not shown how many mutations, or the relative size of the mutations in Figure 2. The same comment holds true for the claim that spikes also arise for mutations with low antigenicity.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors developed a model of tumour-immune dynamics, incorporating stochastic antigenic mutation accumulation and escape within the cancer cell population. They then used this model to investigate how tumour-immune interactions influence tumour outcome and summary statistics of sequencing data.

      Strengths:

      This novel modeling framework addresses an important and timely topic. The authors consider the useful question of how bulk and single-cell sequencing may provide insights into the tumour-immune interactions and selection processes.

      Weaknesses:

      One set of conclusions presented in the paper is the presence of cyclic dynamics between effector/cancer cells, antigenicity, and immunogenicity. However, these conclusions are supported in the manuscript by two sample trajectories of stochastic simulations, and these provide mixed support for the conclusions (i.e. the phasing asynchrony described in the text does not seem to apply to Figure 2C). Similarly, the authors also find immune selection effects on the shape of the mutational burden in Figure 5 D/H using a qualitative comparison between the distributions and theoretical predictions in the absence of immune response. However the discrepancy appears quite small in panel D, and there are no quantitative comparisons provided to evaluate the significance. An analysis of the robustness of all the conclusions to parameter variation is missing. Lastly, the role of the Appendix results in the main messages of the paper is unclear.

    1. eLife Assessment

      This valuable study shows that locomotion-related modulations in the mouse visual cortex are not uniform but primarily affect neurons in muscarinic receptor-negative patches, which receive projections from specific cortical areas. While the evidence is mostly solid, some uncertainties remain regarding the link between anatomical data and functional measurements. The study should be of interest to neuroscientists interested in state modulation of cortical function.

    2. Reviewer #1 (Public review):

      Processing in the primary visual cortex (V1) of mice is not only based on sensory inputs but also strongly modulated by locomotion. In this study, Meier et al. ask whether neurons that are modulated by locomotion form clusters in V1. Their work is based on previous studies from their lab establishing a modularity in the organization of primary visual cortex based on M2-muscarinic-acetylcholine-receptor-positive patches and interpatches (Ji et al. 2015, D'Souza et al. 2019). In these studies, they have highlighted the clustering of specific visual pathways and inhibition. In the current study, they extend this modularity to motor inputs, confirming a clustering of locomotion modulated neurons but also show that these clusters overlap with the M2-negative interpatches of layer 1. Finally, they establish a blueprint for visual processing streams in V1, segregating projections to and from lateral visual areas (LM, AL, and RL) from projections to and from the lateral areas, including the visual area PM, the retrosplenial cortex (RSP), and the secondary motor area (MOs).

      Conceptually, this study provides an important finding in the organization of locomotion-related signaling in primary visual cortex, which clearly has substantial implications for sensory processing in visual cortex. While the anatomical data are solid, the link to physiology is incomplete. In conclusion, there are numerous issues that leave the main findings in some doubt, so the authors have some work to do before I find this story convincing.

      Major issues:

      (1) The major results in this study rely on proper quantification of neuronal responses during resting and running. Recently, it has been reported that hemodynamic occlusion can strongly influence measurements of fluorescent changes using two-photon imaging (Yogesh et al. 2025, doi.org/10.1101/2024.10.29.620650). Since it is unclear whether there is an inherent bias in vasculature and hemodynamic occlusion in M2 patches and interpatches, a quantification of the effect of hemodynamic occlusion would be necessary. This control would ideally be done using mice with GFP expression to test if there is still a clustering of locomotion-modulated neurons that overlaps with M2-negative interpatches. Alternatively, the authors should at the very least quantify the vascularization in M2 patches and interpatches.

      (2) To assess the effects, the authors use a correlation analysis for many of their findings (e.g., Figures 2b,c, 4j,k, ...). This, however, is inappropriate to assess the significance of the results. I suggest redoing all statistics with hierarchical bootstrap sampling (Saravanan et al. 2020, PMID: 33644783) or similar.

      (3) The authors use two different measures to assess whether and to what extent a neuron is locomotion sensitive, the LMI and "locomotion-responsive". While the LMI is defined based on recording in the light and dark (Figure 2), the "locomotion-responsiveness" is defined only in the dark (Figure 3a,c,d). The link between the two measures should be clarified.

      a) Additionally, Figure 2b shows higher average LMI for interpatches, but the locomotion-responsive fraction is similar in interpatches and patches (relative number of pairs in Figure 3c and Figure 3d). How do the authors explain this discrepancy?

      b) How is the LMI calculated - based on the average or the maximum response over stimuli? One particular stimulus? If the LMI is defined for each stimulus separately, what is plotted in Figure 2b?

      (4) In the last panels of Figures 4-7, the authors analyze the alignment of cell bodies with the M2 patches. While in superficial layers it might be straightforward to align the cell body locations with the M2 patches and interpatches in layer 1, this alignment does not appear to be trivial for deeper layers. The authors should provide additional material to convince the reader of the proper alignment.

      (5) Related to point 4 above - Given the importance of a proper alignment of M2 patches with the in vivo imaging, the in vivo - ex vivo alignment should be more convincing than Figure 1 C-E. Measuring M2 patches in vivo (as the authors have tried to do) would have provided more solid evidence. Have the authors tried to remove the dura for their in vivo imaging to increase signal-to-noise? In any case, more examples of proper alignment are necessary.

      (6) The authors state that locomotion selectively affects M2-/M2- pairs based on Figure 3c. However, to make this claim, there should be a significant difference between the correlation of stimulus-driven noise of M2-/M2- locomotion-responsive pairs and M2-/M2- locomotion-unresponsive pairs, AND no significant difference in the same analysis for M2+/M2+ pairs (i.e., testing the differences between the bars in Figure 3c and Figure 3d).

    3. Reviewer #2 (Public review):

      Summary:

      Meier et al. explore the variability of locomotion-related modulations in mouse area V1. They present 4 major findings: V1 L2/3 neurons beneath M2- interpatches are more strongly locomotion-modulated than those beneath M2+ patches, while V1 L2/3 neurons are more strongly orientation tuned. They then use viral tracing to examine the relationship of M2- interpatches and M2+ patches with inputs from and outputs to HVOs, MO, RSP, and LP, and find evidence for different closed-loop subnetworks within L1; these relationships, however, are more complicated for cell bodies in L2/3. Finally, they also describe an overlap between M2- interpatches and SOM+ dendrites/axons.

      Strengths:

      The strength of the manuscript is the detailed anatomical quantification of closed-loop connectivity, and the description of the organizing principles of M2- interpatches and M2+ patches.

      Weaknesses:

      The major weakness of the manuscript is the lack of a direct connection between the functional and the anatomical data, and the somewhat puzzling effects observed in the analysis of noise correlations. The former issue might be alleviated by modelling, where the authors could explore the space of possibilities that could explain the functional data based on the anatomical connectivity. Some control analyses could be done, for the comparison of noise correlations.

    4. Reviewer #3 (Public review):

      The authors build on the large body of their previous research, which showed that the mouse primary visual cortex is organised into two types of clusters, M2+ and M2-, which exhibit distinct input patterns from thalamus and higher visual cortical areas and distinct visual tuning preferences. The current study reveals that a like-to-like projection from within-cluster neurons to the areas that provide feedback projections and, furthermore, that neurons in the M2- clusters are more strongly affected by non-visual signals about the locomotion of the animal.

      The study adds fundamental insights to our understanding of the principles of cortical organisation and computation, specifically how the cortex integrates sensory and action-related signals.

      While the tracing data are very convincing, data analysis should be strengthened to support the claims:

      (1) The locomotion modulation index (LMI) compares the mean activity during running and not running but does not seem to account for differences between visual stimuli, so that the LMI could be influenced by the neuron's visual tuning rather than its sensitivity to locomotion, e.g. if the mouse was running more when the neuron's preferred stimulus was presented. Trials should first be averaged per stimulus, and then across stimuli. Alternatively, only the preferred stimulus could be considered.

      The significance test (unpaired t-test) suffers from the same flaw. Instead an ANOVA (with stimulus parameter as factor) would resolve the problem, or testing whether fitting the data with two tuning curves (one per locomotion state) or a single curve results in a lower error (using cross-validation).

      Given that there is evidence that specific visual stimuli can induce more or less running in mice, this issue is very important to account for behavioural differences across stimuli.

      (2) All bars in Figure 2b show a lower LMI than the reported mean LMI of 0.19. This should be checked.

      (3) Correlation tests: Pearson correlation is only meaningful when applied to continuous data. A more suitable test for discrete data like the M2 patch quantile is a rank test like Kendall's coefficient of rank correlation. This applies to data in Figure 2b,c, 4j,k, Figure 2 - Supplement 2,1a, etc.

      (4) How OSI was determined should be clarified. Specifically, were R_pref and R_ortho the mean responses to the two opposite movement directions? Similarly, how was the half-width at half-maximum of orientation determined? From the fits in Figure 2a, it looks like the widths of both Gaussians can be different.

      (5) The correlation measures in Figure 3 would greatly benefit from additional analyses to help interpretation of the results.

      a) Correlations between neurons typically increase with increasing firing rates (e.g., de la Rocha J, Doiron B, Shea-Brown E, Josić K, Reyes A. 2007. Correlation between neural spike trains increases with firing rate. Nature 448:802-6. doi:10.1038/nature06028). Could the higher correlations in M2+ pairs (Figure 3a) be explained by higher firing rates in M2+ compared to M2- neurons?

      b) To determine correlations in Figure 3a, trials during locomotion and stationarity were pooled. As locomotion impacts the firing rate of the neurons, it would be helpful to separate correlations between the two states, locomotion vs stationarity, so the measures reflect something closer to "noise correlations" rather than tuning to locomotion.

      c) Similarly, in Figure 3b, I wonder whether the large correlations in M2- pairs are driven by locomotion rather than functional connectivity. As suggested in b, a better test of noise correlations would be to account for locomotion, i.e., separate trials by stimulus identity and locomotion state. To prevent conditions with few trials from having greater weight in the overall noise correlations, I suggest the authors first z-score responses per condition, then determine noise correlations across all trials (as explained in Renart et al., 2010).

      d) Correlations in Figure 3a,b should be tested with an ANOVA and a control for multiple tests.

      (6) In plots like Figure 4j-l, it would be very informative to show individual measures (per ROI and mouse) in addition to mean +- SEM. As the counts are low (<10) it wouldn't obstruct the plot.

      (7) The caption of Figure 4l says that most retrogradely labelled cells are located in L2/3. However, the plot only shows data from L2/3 and a single section of L4, so one cannot compare it to other layers. Can the authors corroborate the claim with data from other layers?

      (8) Methods:<br /> The authors should provide more details on the visual stimuli: What was the background on which gratings were presented? How long was the inter-stimulus interval? What was presented during the inter-stimulus interval? How large were gratings used to map tuning to SF, TF, and orientation?

    1. eLife Assessment

      The findings are important and intriguing, with theoretical or practical implications beyond a single subfield. The computational methods employed are clever and sophisticated and the strength of evidence is convincing. Many of the methodological concerns raised after the first round of review were addressed in the revised version, although all three reviewers also highlighted that the exploratory nature of the paper and the lack of clarity regarding the hypotheses make it hard to assess the impact of the results on existing theories.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use a sophisticated task design and Bayesian computational modeling to test their hypothesis that information generalization (operationalized as a combination of self-insertion and social contagion) in social situations is disrupted in Borderline Personality Disorder. Their main finding relates to the observation that two different models best fit the two tested groups: While the model assuming both self-insertion and social contagion to be present when estimating others' social value preferences fit the control group best, a model assuming neither of these processes provided the best fit to BPD participants.

      Strengths:

      The revisions have substantially strengthened the paper and the manuscript is much clearer and easier to follow now. The strengths of the presented work lie in the sophisticated task design and the thorough investigation of their theory by use of mechanistic computational models to elucidate social decision-making and learning processes in BPD.

      Weaknesses:

      Some critical concerns remain after the first revision, particularly regarding the use of causal language and the clarity of the hypotheses and results, specified in the points below.

      (1) The authors frequently refer to their predictions and theory as being causal, both in the manuscript and in their response to reviewers. However, causal inference requires careful experimental design, not just statistical prediction. For example, the claim that "algorithmic differences between those with BPD and matched healthy controls" are "causal" in my opinion is not warranted by the data, as the study does not employ experimental manipulations or interventions which might predictably affect parameter values. Even if model parameters can be seen as valid proxies to latent mechanisms, this does not automatically mean that such mechanisms cause the clinical distinction between BPD and CON, they could plausibly also refer to the effects of therapy or medication. I recommend that such causal language, also implicit to expressions like "parameter influences on explicit intentional attributions", is toned down throughout the manuscript.

      (2) Although the authors have now much clearer outlined the stuy's aims, there still is a lack of clarity with respect to the authors' specific hypotheses. I understand that their primary predictions about disruptions to self-other generalisation processes underlying BPD are embedded in the four main models that are tested, but it is still unclear what specific hypotheses the authors had about group differences with respect to the tested models. I recommend the authors specify this in the introduction rather than refering to prior work where the same hypotheses may have been mentioned.

      (3) Caveats should also be added about the exploratory nature of the many parameter group comparisons. If there are any predictions about group differences that can be made based on prior literature, the authors should make such links clear.

      (4) I'm not sure I understand why the authors, after adding multiple comparison correction, now list two kinds of p-values. To me, this is misleading and precludes the point of multiple comparison corrections, I therefore recommend they report the FDR-adjusted p-values only. Likewise, if a corrected p-value is greater than 0.05 this should not be interpreted as a result.

      (5) Can the authors please elaborate why the algorithm proposed to be employed by BPD is more 'entropic', especially given both their self-priors and posteriors about partners' preferences tended to be more precise than the ones used by CON? As far as I understand, there's nothing in the data to suggest BPD predictions should be more uncertain. In fact, this leads me to wonder, similarly to what another reviewer has already suggested, whether BPD participants generate self-referential priors over others in the same way CON participants do, they are just less favourable (i.e., in relation to oneself, but always less prosocial) - I think there is currently no model that would incorporate this possibility? It should at least be possible to explore this by checking if there is any statistical relationship between the estimated θ_ppt^m and 〖p(θ〗_par |D^0).

      "To note, social contagion under M3 was highly correlated with contagion under M1 (see Fig S11). This provides some preliminary evidence that trauma impacts beliefs about individualism directly, whereas trauma and persecutory beliefs impact beliefs about prosociality through impaired trait mentalising" - I don't understand what the authors mean by this, can they please elaborate and add some explanation to the main text?

    3. Reviewer #2 (Public review):

      Summary:

      The paper investigates social-decision making, and how this changes after observing the behaviour of other people, in borderline personality disorder. The paper employs a task including three phases, the first where participants make decision on how to allocate rewards to oneself and to a virtual partner, the second where they observe the same task performed by someone else, and a third phase equivalent to phase one, but with a new partner. Using sophisticated computational modelling to analyse choice data, the study reports that borderline participants (versus controls) are more certain about their preferences in phase one, used more neutral priors and are less flexible during phase two, and are less influenced by partners in phase three.

      Strengths:

      The topic is interesting and important, and the findings are potentially intriguing. The computational methods employed is clever and sophisticated, at the cutting edge of research in the field.

      Weaknesses:

      The paper is not based on specific empirical hypotheses formulated at the outset, but, rather, it uses an exploratory approach. Indeed, the task is not chosen in order to tackle specific empirical hypotheses. This, in my view, is a limitation since the introduction reads a bit vague and it is not always clear which gaps in the literature the paper aims to fill. As a further consequence, it is not always clear how the findings speak to previous theories on the topic.

    4. Reviewer #3 (Public review):

      In this paper, the authors use a three-phase economic game to examine the tendency to engage in prosocial versus competitive exchanges with three anonymous partners. In particular, they consider individual differences in the tendency to infer about others' tendencies based on one's preferences and to update one's preferences based on observations of others' behavior. The study includes a sample of individuals diagnosed with borderline personality disorder and a matched sample of psychiatrically healthy control participants.

      On the whole, the experimental design is well-suited to the questions and the computational model analyses are thorough, including modern model-fitting procedures. I particularly appreciated the clear exposition regarding model parameterization and the descriptive Table 2 for qualitative model comparison. In the revised manuscript, the authors now provide a more thorough treatment of examining group differences in computational parameters given that the best-fitting model differed by group. They also examine the connection of their task and findings to related research focusing on self-other representation and mentalization (e.g., Story et al., 2024).

      The authors note that the task does not encourage competition and instead captures individual differences in the motivation to allocate rewards to oneself and others in an interdependent setting. The paper could have been strengthened by clarifying how the Social Value Orientation framework can be used to interpret the motivations and behavior of BPD versus CON participants on the task. Although the authors note that their approach makes "clear and transparent a priori predictions," the paper could be improved by providing a clear and consolidated statement of these predictions so that the results could be interpreted vis-a-vis any a priori hypotheses.

      Finally, the authors have amended their individual difference analyses to examine psychometric measures such as the CTQ alongside computational model parameter estimate differences. I appreciate that these analyses are described as exploratory. The approach of using a partial correlation network with bootstrapping (and permutation) was interesting, but the logic of the analysis was not clearly stated. In particular, there are large group (Table 1: CON vs. BPD) differences in the measures introduced into this network. As a result, it is hard to understand whether any partial correlations are driven primarily by mean differences in severity (correlations tend to be inflated in extreme groups designs due to the absence of observation in middle of scales forming each bivariate distribution). I would have found these exploratory analyses more revealing if group membership was controlled for.

    5. Author response:

      The following is the authors’ response to the original reviews

      Response to the Editors’ Comments

      Thankyou for this summary of the reviews and recommendations for corrections. We respond to each in turn, and have documented each correction with specific examples contained within our response to reviewers below.

      ‘They all recommend to clarify the link between hypotheses and analyses, ground them more clearly in, and conduct critical comparisons with existing literature, and address a potential multiple comparison problem.’

      We have restructured our introduction to include the relevant literature outlined by the reviewers, and to be more clearly ground the goals of our model and broader analysis. We have additionally corrected for multiple comparisons within our exploratory associative analyses. We have additionaly sign posted exploratory tests more clearly.

      ‘Furthermore, R1 also recommends to include a formal external validation of how the model parameters relate to participant behaviour, to correct an unjustified claim of causality between childhood adversity and separation of self, and to clarify role of therapy received by patients.’

      We have now tempered our language in the abstract which unintentionally implied causality in the associative analysis between childhood trauma and other-to-self generalisation. To note, in the sense that our models provide causal explanations for behaviour across all three phases of the task, we argue that our model comparison provides some causal evidence for algorithmic biases within the BPD phenotype. We have included further details of the exclusion and inclusion criteria of the BPD participants within the methods.

      R2 specifically recommends to clarify, in the introduction, the specific aim of the paper, what is known already, and the approach to addressing it.’

      We have more thoroughly outlined the current state of the art concerning behavioural and computational approaches to self insertion and social contagion, in health and within BPD. We have linked these more clearly to the aims of the work.

      ‘R2 also makes various additional recommendations regarding clarification of missing information about model comparison, fit statistics and group comparison of parameters from different models.’

      Our model comparison approach and algorithm are outlined within the original paper for Hierarchical Bayesian Model comparison (Piray et al., 2019). We have outlined the concepts of this approach in the methods. We have now additionally improved clarity by placing descriptions of this approach more obviously in the results, and added points of greater detail in the methods, such as which statistics for comparison we extracted on the group and individual level.

      In addition, in response to the need for greater comparison of parameters from different models, we have also hierarchically force-fitted the full suite of models (M1-M4) to all participants. We report all group differences from each model individually – assuming their explanation of the data - in Table S2. We have also demonstrated strong associations between parameters of equivalent meaning from different models to support our claims in Fig S11. Finally, we show minimal distortion to parameter estimates in between-group analysis when models are either fitted hierarchically to the entire population, or group wise (Figure S10).

      ‘R3 additionally recommends to clarify the clinical and cognitive process relevance of the experiment, and to consider the importance of the Phase 2 findings.’

      We have now included greater reference to the assumptions in the social value orientation paradigm we use in the introduction. We have also responded to the specific point about the shift in central tendencies in phase 2 from the BPD group, noting that, while BPD participants do indeed get more relatively competitive vs. CON participants, they remain strikingly neutral with respect to the overall statespace. Importantly, model M4 does not preclude more competitive distributions existing.

      ‘Critically, they also share a concern about analyzing parameter estimates fit separately to two groups, when the best-fitting model is not shared. They propose to resolve this by considering a model that can encompass the full dynamics of the entire sample.’

      We have hierarchically force-fitted the full suite of models (M1-M4) to all participants to allow for comparison between parameters within each model assumption. We report all group differences from each model individually – assuming their explanation of the data - in Table S2 and Table S3. We have also demonstrated strong associations between parameters of equivalent meaning from different models to support our claims in Fig S11. We also show minimal distortion to parameter estimates in between-group analysis when models are either fitted hierarchically to the entire population, or group wise (Figure S10).

      Within model M1 and M2, the parameters quantify the degree to which participants believe their partner to be different from themselves. Under M1 and M2 model assumptions, BPD participants have meaningfully larger versus CON (Fig S10), which supports the notion that a new central tendency may be more parsimonious in phase 2 (as in the case of the optimal model for BPD, M4). We also show strong correlations across models between under M1 and M2, and the shift in central tendenices of beliefs between phase 1 and 2 under M3 and M4. This supports our primary comparison, and shows that even under non-dominant model assumptions, parameters demonstrate that BPD participants expect their partner’s relative reward preferences to be vastly different from themselves versus CON.

      ‘A final important point concerns the psychometric individual difference analyses which seem to be conducted on the full sample without considering the group structure.’

      We have now more clearly focused our psychometric analysis. We control for multiple comparisons, and compare parameters across the same model (M3) when assessing the relationship between paranoia, trauma, trait mentalising, and social contagion. We have relegated all other exploratory analyses to the supplementary material and noted where p values survive correction using False Discovery Rate.

      Reviewer 1:

      ‘The manuscript's primary weakness relates to the number of comparisons conducted and a lack of clarity in how those comparisons relate to the authors' hypotheses. The authors specify a primary prediction about disruption to information generalization in social decision making & learning processes, and it is clear from the text how their 4 main models are supposed to test this hypothesis. With regards to any further analyses however (such as the correlations between multiple clinical scales and eight different model parameters, but also individual parameter comparisons between groups), this is less clear. I recommend the authors clearly link each test to a hypothesis by specifying, for each analysis, what their specific expectations for conducted comparisons are, so a reader can assess whether the results are/aren't in line with predictions. The number of conducted tests relating to a specific hypothesis also determines whether multiple comparison corrections are warranted or not. If comparisons are exploratory in nature, this should be explicitly stated.’

      We have now corrected for multiple comparisons when examining the relationship between psychometric findings and parameters, using partial correlations and bootstrapping for robustness. These latter analyses were indeed not preregistered, and so we have more clearly signposted that these tests were exploratory. We chose to focus on the influence of psychometrics of interest on social contagion under model M3 given that this model explained a reasonable minority of behaviour in each group. We have now fully edited this section in the main text in response, and relegated all other correlations to the supplementary materials.

      ‘Furthermore, the authors present some measures for external validation of the models, including comparison between reaction times and belief shifts, and correlations between model predicted accuracy and behavioural accuracy/total scores. However it would be great to see some more formal external validation of how the model parameters relate to participant behaviour, e.g., the correlation between the number of pro-social choices and ß-values, or the correlation between the change in absolute number of pro-social choices and the change in ß. From comparing the behavioural and computational results it looks like they would correlate highly, but it would be nice to see this formally confirmed.’

      We have included this further examination within the Generative Accuracy and Recovery section:

      ‘We also assessed the relationship (Pearson rs) between modelled participant preference parameters in phase 1 and actual choice behaviour: was negatively correlated with prosocial versus competitive choices (r=-0.77, p<0.001) and individualistic versus competitive choices (r=-0.59, p<0.001); was positively correlated with individualistic versus competitive choices (r=0.53, p<0.001) and negatively correlated with prosocial versus individualistic choices (r=-0.69, p<0.001).’

      ‘The statement in the abstract that 'Overall, the findings provide a clear explanation of how self-other generalisation constrains and assists learning, how childhood adversity disrupts this through separation of internalised beliefs' makes an unjustified claim of causality between childhood adversity and separation of self - and other beliefs, although the authors only present correlations. I recommend this should be rephrased to reflect the correlational nature of the results.’

      Sorry – this was unfortunate wording: we did not intend to imply causation with our second clause in the sentence mentioned. We have amended the language to make it clear this relationship is associative:

      ‘Overall, the findings provide a clear explanation of how self-other generalisation constrains and assists learning, how childhood adversity is associated with separation of internalised beliefs, and makes clear causal predictions about the mechanisms of social information generalisation under uncertainty.’

      ‘Currently, from the discussion the findings seem relevant in explaining certain aberrant social learning and -decision making processes in BPD. However, I would like to see a more thorough discussion about the practical relevance of their findings in light of their observation of comparable prediction accuracy between the two groups.’

      We have included a new paragraph in the discussion to address this:

      ‘Notably, despite differing strategies, those with BPD achieved similar accuracy to CON participants in predicting their partners. All participants were more concerned with relative versus absolute reward; only those with BPD changed their strategy based on this focus. Practically this difference in BPD is captured either through disintegrated priors with a new median (M4) or very noisy, but integrated priors over partners (M1) if we assume M1 can account for the full population. In either case, the algorithm underlying the computational goal for BPD participants is far higher in entropy and emphasises a less stable or reliable process of inference. In future work, it would be important to assess this mechanism alongside momentary assessments of mood to understand whether more entropic learning processes contribute to distressing mood fluctuation.’

      ‘Relatedly, the authors mention that a primary focus of mentalization based therapy for BPD is 'restoring a stable sense of self' and 'differentiating the self from the other'. These goals are very reminiscent of the findings of the current study that individuals with BPD show lower uncertainty over their own and relative reward preferences, and that they are less susceptible to social contagion. Could the observed group differences therefore be a result of therapy rather than adverse early life experiences?’

      This is something that we wish to explore in further work. While verbal and model descriptions appear parsimonious, this is not straight forward. As we see, clinical observation and phenomenological dynamics may not necessarily match in an intuitive way to parameters of interest. It may be that compartmentalisation of self and other – as we see in BPD participants within our data – may counter-intuitively express as a less stable self. The evolutionary mechanisms that make social insertion and contagion enduring may also be the same that foster trust and learning.

      ‘Regarding partner similarity: It was unclear to me why the authors chose partners that were 50% similar when it would be at least equally interesting to investigate self-insertion and social contagion with those that are more than 50% different to ourselves? Do the authors have any assumptions or even data that shows the results still hold for situations with lower than 50% similarity?’

      While our task algorithm had a high probability to match individuals who were approximately 50% different with respect to their observed behaviour, there was variation either side of this value. The value of 50% median difference was chosen for two reasons: 1. We wanted to ensure participants had to learn about their partner to some degree relative to their own preferences and 2. we did not want to induce extreme over or under familiarity given the (now replicated) relationship between participant-partner similarity and intentional attributions (see below). Nevertheless, we did have some variation around the 50% median. Figure 3A in the top left panel demonstrates this fluctuation in participant-partner similarity and the figure legend further described this distribution (mean = 49%, sd = 12%). In future work we want to more closely manipulate the median similarity between participants and partners to understand how this facilitates or inhibits learning and generalisation.

      There is some analysis of the relationship between degrees of similiarity and behaviour. In the third paragraph of page 15 we report the influence of participant-partner similarity on reaction times. In prior work (Barnby et al., 2022; Cognition) we had shown that similarity was associated with reduced attributions of harm about a partner, irrespective of their true parameters (e.g. whether they were prosocial/competitive). We replicate this previous finding with a double dissociation illustrated in Figure 4, showing that greater discrepancies in participant-partner prosociality increases explicit harmful intent attributions (but not self-interest), and discrepancies in participant-partner individualism reduces explicit self-interest attributions (but not harmful intent). We have made these clearer in our results structure, and included FDR correction values for multiple comparisons.

      The methods section is rather dense and at least I found it difficult to keep track of the many different findings. I recommend the authors reduce the density by moving some of the secondary analyses in the supplementary materials, or alternatively, to provide an overall summary of all presented findings at the end of the Results section.

      We have now moved several of our exploratory findings into the supplementary materials, noteably the analysis of participant-partner similarity on reaction times (Fig S9), as well as the uncorrected correlation between parameters (Fig S7).

      Fig 2C) and Discussion p. 21: What do the authors mean by 'more sensitive updates'? more sensitive to what?

      We have now edited the wording to specify ‘more belief updating’ rather than ‘sensitive’ to be clearer in our language.

      P14 bottom: please specify what is meant by axial differences.

      We have changed this to ‘preference type’ rather than using the term ‘axial’.

      It may be helpful to have Supplementary Figure 1 in the main text.

      Thank you for this suggestion. Given the volume of information in the main text we hope that it is acceptable for Figure S1 to remain in the supplementary materials.

      Figure 3D bottom panel: what is the difference between left and right plots? Should one of them be alpha not beta?

      The left and right plots are of the change in standard deviation (left) and central tendency (right) of participant preference change between phase 1 and 3. This is currently noted in the figure legend, but we had added some text to be clearer that this is over prosocial-competitive beliefs specifically. We chose to use this belief as an example given the centrality of prosocial-comeptitive beliefs in the learning process in Figure 2. We also noticed a small labelling error in the bottom panels of 3D which should have noted that each plot was either with respect to the precision or mean-shift in beliefs during phase 3.

      ‘The relationship between uncertainty over the self and uncertainty over the other with respect to the change in the precision (left) and median-shift (right) in phase 3 prosocial-competitive beliefs .’

      Supplementary Figure 4: The prior presented does not look neutral to me, but rather right-leaning, so competitive, and therefore does indeed look like it was influenced by the self-model? If I am mistaken please could the authors explain why.

      This example distribution is taken from a single BPD participant. In this case, indeed, the prior is somewhat right-shifted. However, on a group level, priors over the partner were closely centred around 0 (see reported statistics in paragraph 2 under the heading ‘Phase 2 – BPD Participants Use Disintegrated and Neutral Priors). However, we understand how this may come across as misleading. For clarity we have expanded upon Figure S4 to include the phase 1 and prior phase 2 distributions for the entire BPD population for both prosocial and individualistic beliefs. This further demonstrates that those with BPD held surprisingly neutral beliefs over the expectations about their partners’ prosociality, but had minor shifts between their own individualistic preferences and the expected individualistic preferences of their partners. This is also visible in Figure S2.

      Reviewer 2:

      ‘There are two major weaknesses. First, the paper lacks focus and clarity. The introduction is rather vague and, after reading it, I remained confused about the paper's aims. Rather than relying on specific predictions, the analysis is exploratory. This implies that it is hard to keep track, and to understand the significance, of the many findings that are reported.’

      Thank you for this opportunity to be clearer in our framing of the paper. While the model makes specific causal predictions with respect to behavioural dynamics conditional on algorithmic differences, our other analyses were indeed exploratory. We did not preregister this work but now given the intriguing findings we intent to preregister our future analyses.

      We have made our introduction clearer with respect to the aims of the paper:

      ‘Our present work sought to achieve two primary goals: 1. Extend prior causal computational theories to formalise the interrelation between self-insertion and social contagion within an economic paradigm, the Intentions Game and 2., Test how a diagnosis of BPD may relate to deficits in these forms of generalisation. We propose a computational theory with testable predictions to begin addressing this question. To foreshadow our results, we found that healthy participants employ a mixed process of self-insertion and contagion to predict and align with the beliefs of their partners. In contrast, individuals with BPD exhibit distinct, disintegrated representations of self and other, despite showing similar average accuracy in their learning about partners. Our model and data suggest that the previously observed computational characteristics in BPD, such as reduced self-anchoring during ambiguous learning and a relative impermeability of the self, arise from the failure of information about others to transfer to and inform the self. By integrating separate computational findings, we provide a foundational model and a concise, dynamic paradigm to investigate uncertainty, generalization, and regulation in social interactions.’

      ‘Second, although the computational approach employed is clever and sophisticated, there is important information missing about model comparison which ultimately makes some of the results hard to assess from the perspective of the reader.’

      Our model comparison employed what is state of the art random-effects Bayesian model comparison (Piray et al., 2019; PLOS Comp. Biol.). It initially fits each individual to each model using Laplace approximation, and subsequently ‘races’ each model against each other on the group level and individual level through hierarchical constraints and random-effect considerations. We included this in the methods but have now expanded on the descrpition we used to compare models:

      In the results -

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019; see Methods for more information). We report individual and group-level model responsibility, in addition to protected exceedance probabilities between-groups to assess model dominance.’

      We added to our existing description in the methods –

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019). During fitting we added a small noise floor to distributions (2.22e<sup>-16</sup>) before normalisation for numerical stability. Parameters were estimated using the HBI in untransformed space drawing from broad priors (μM\=0, σ<sup>2</sup><sub>M</sub> = 6.5; where M\={M1, M2, M3, M4}). This process was run independently for each group. Parameters were transformed into model-relevant space for analysis. All models and hierarchical fitting was implemented in Matlab (Version R2022B). All other analyses were conducted in R (version 4.3.3; arm64 build) running on Mac OS (Ventura 13.0). We extracted individual and group level responsibilities, as well as the protected exceedance probability to assess model dominance per group.’

      (1) P3, third paragraph: please define self-insertion

      We have now more clearly defined this in the prior paragraph when introducing concepts.

      ‘To reduce uncertainty about others, theories of the relational self (Anderson & Chen, 2002) suggest that people have availble to them an extensive and well-grounded representation of themselves, leading to a readily accessible initial belief (Allport, 1924; Kreuger & Clement, 1994) that can be projected or integrated when learning about others (self-insertion).’

      (2) Introduction: the specific aim of the paper should be clarified - at the moment, it is rather vague. The authors write: "However, critical questions remain: How do humans adjudicate between self-insertion and contagion during interaction to manage interpersonal generalization? Does the uncertainty in self-other beliefs affect their generalizability? How can disruptions in interpersonal exchange during sensitive developmental periods (e.g., childhood maltreatment) inform models of psychiatric disorders?". Which of these questions is the focus of the paper? And how does the paper aim at addressing it?

      (3) Relatedly, from the introduction it is not clear whether the goal is to develop a theory of self-insertion and social contagion and test it empirically, or whether it is to study these processes in BPD, or both (or something else). Clarifying which specific question(s) is addressed is important (also clarifying what we already know about that specific question, and how the paper aims at elucidating that specific question).

      We have now included our specific aims of the paper. We note this in the above response to the reviwers general comments.

      (4) "Computational models have probed social processes in BPD, linking the BPD phenotype to a potential over-reliance on social versus internal cues (Henco et al., 2020), 'splitting' of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others' irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Previous studies have typically overlooked how self and other are represented in tandem, prompting further investigation into why any of these BPD phenotypes manifest." Not clear what the link between the first and second sentence is. Does it mean that previous computational models have focused exclusively on how other people are represented in BPD, and not on how the self is represented? Please spell this out.

      Thank you for the opportunity to be clearer in our language. We have now spelled out our point more precisely, and included some extra relevant literature helpfully pointed out by another reviewer.

      ‘Computational models have probed social processes in BPD, although almost exclusively during observational learning. The BPD phenotype has been associated with a potential over-reliance on social versus internal cues (Henco et al., 2020), ‘splitting’ of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others’ irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Associative models have also been adapted to characterize  ‘leaky’ self-other reinforcement learning (Ereira et al., 2018), finding that those with BPD overgeneralize (leak updates) about themselves to others (Story et al., 2024). Altogether, there is currently a gap in the direct causal link between insertion, contagion, and learning (in)stability.’

      (5) P5, first paragraph. The description of the task used in phase 1 should be more detailed. The essential information for understanding the task is missing.

      We have updated this section to point toward Figure 1 and the Methods where the details of the task are more clearly outlined. We hope that it is acceptable not to explain the full task at this point for brevity and to not interrupt the flow of the results.

      “Detailed descriptions of the task can be found in the methods section and Figure 1.’

      (6) P5, second paragraph: briefly state how the Psychometric data were acquired (e.g., self-report).

      We have now clarified this in the text.

      ‘All participants also self-reported their trait paranoia, childhood trauma, trust beliefs, and trait mentalizing (see methods).’

      (7) "For example, a participant could make prosocial (self=5; other=5) versus individualistic (self=10; other=5) choices, or prosocial (self=10; other=10) versus competitive (self=10; other=5) choices". Not sure what criteria are used for distinguishing between individualistic and competitive - they look the same?

      Sorry. This paragraph was not clear that the issue is that the interpretation of the choice depends on both members of the pair of options. Here, in one pair {(self=5,other=5) vs (self=10,other=5)}, it is highly pro-social for the self to choose (5,5), sacrificing 5 points for the sake of equality. In the second pair {(self=10,other=10) vs (self=10,other=5)}, it is highly competitive to choose (10,5), denying the other 5 points at no benefit to the self. We have clarified this:

      ‘We analyzed the ‘types’ of choices participants made in each phase (Supplementary Table 1). The interpretation of a participant’s choice depends on both values in a choice. For example, a participant could make prosocial (self=5; other=5) versus individualistic (self=10; other=5) choices, or prosocial (self=10; other=10) versus competitive (self=10; other=5) choices. There were 12 of each pair in phases 1 and 3 (individualistic vs. prosocial; prosocial vs. competitive; individualistic vs. competitive).’  

      (8) "In phase 1, both CON and BPD participants made prosocial choices over competitive choices with similar frequency (CON=9.67[3.62]; BPD=9.60[3.57])" please report t-test - the same applies also various times below.

      We have now included the t test statistics with each instance.

      ‘In phase 3, both CON and BPD participants continued to make equally frequent prosocial versus competitive choices (CON=9.15[3.91]; BPD=9.38[3.31]; t=-0.54, p=0.59); CON participants continued to make significantly less prosocial versus individualistic choices (CON=2.03[3.45]; BPD=3.78 [4.16]; t=2.31, p=0.02). Both groups chose equally frequent individualistic versus competitive choices (CON=10.91[2.40]; BPD=10.18[2.72]; t=-0.49, p=0.62).’

      (9) P 9: "Models M2 and M3 allow for either self-insertion or social contagion to occur independently" what's the difference between M2 and M3?

      Model M2 hypothesises that participants use their own self representation as priors when learning about the other in phase 2, but are not influenced by their partner. M3 hypothesises that participants form an uncoupled prior (no self-insertion) about their partner in phase 2, and their choices in phase 3 are influenced by observing their partner in phase 2 (social contagion). In Figure 1 we illustrate the difference between M2 and M3. In Table 1 we specifically report the parameterisation differences between M2 and M3. We have also now included a correlational analysis of parameters between models to demonstrate the relationship between model parameters of equivalent value between models (Fig S11). We have also force fitted all models (M1-M4) to the data independently and reported group differences within each (see Table S2 and Table S3).

      (10) P 9, last paragraph: I did not understand the description of the Beta model.

      The beta model is outlined in detail in Table 1. We have also clarified the description of the beta model on page 9:

      ‘The ‘Beta model’ is equivalent to M1 in its causal architecture (both self-insertion and social contagion are hypothesized to occur) but differs in richness: it accommodates the possibility that participants might only consider a single dimension of relative reward allocation, which is typically emphasized in previous studies (e.g., Hula et al., 2018).’

      (11) P 9: I wonder whether one could think about more intuitive labels for the models, rather than M1, M2 etc.. This is just a suggestion, as I am not sure a short label would be feasible here.

      Thank you for this suggestion. We apologise that it is not very intitutive. The problem is that given the various terms we use to explain the different processes of generalisation that might occur between self and other, and given that each model is a different combination of each, we felt that numbering them was a lesser evil. We hope that the reader will be able to reference both Figure 1 and Table 1 to get a good feel for how the models and their causal implications differ.

      (12) Model comparison: the information about what was done for model comparison is scant, and little about fit statistics is reported. At the moment, it is hard for a reader to assess the results of the model comparison analysis.

      Model comparison and fitting was conducted using simultaneous hierarchical fitting and random-effects comparison. This is employed through the HBI package (Piray et al., 2019) where the assumptions and fitting proceedures are outlined in great detail. In short, our comparison allows for individual and group-level hierarchical fitting and comparison. This overcomes the issue of interdependence between and within model fitting within a population, which is often estimated separately.

      We have outlined this in the methods, although appreciate we do not touch upon it until the reader reaches that point. We have added a clarification statement on page 9 to rectify this:

      ‘All computational models were fitted using a Hierarchical Bayesian Inference (HBI) algorithm which allows hierarchical parameter estimation while assuming random effects for group and individual model responsibility (Piray et al., 2019; see Methods for more information). We report individual and group-level model responsibility, in addition to protected exceedance probabilities between-groups to assess model dominance.’

      (13) P 14, first paragraph: "BPD participants were also more certain about both types of preference" what are the two types of preferences?

      The two types of preferences are relative (prosocial-competitive) and absolute (individualistic) reward utility. These are expressed as b and a respectively. We have expanded the sentence in question to make this clearer:

      ‘BPD participants were also more certain about both self-preferences for absolute and relative reward ( = -0.89, 95%HDI: -1.01, -0.75; = -0.32, 95%HDI: -0.60, -0.04) versus CON participants (Figure 2B).’

      (14) "Parameter Associations with Reported Trauma, Paranoia, and Attributed Intent" the results reported here are intriguing, but not fully convincing as there is the problem of multiple comparisons. The combinations between parameters and scales are rather numerous. I suggest to correct for multiple comparisons and to flag only the findings that survive correction.

      We have now corrected this and controlled for multiple comparisons through partial correlation analysis, bootstrapping assessment for robustness, permutation testing, and False Detection Rate correction. We only report those that survive bootstrapping and permutation testing, reporting both corrected (p[fdr]) and uncorrected (p) significance.

      (15) Results page 14 and page 15. The authors compare the various parameters between groups. I would assume that these parameters come from M1 for controls and from M4 for BDP? Please clarify if this is indeed the case. If it is the case, I am not sure this is appropriate. To my knowledge, it is appropriate to compare parameters between groups only if the same model is fit to both groups. If two different models are fit to each group, then the parameters are not comparable, as the parameter have, so to speak, different "meaning" in two models. Now, I want to stress that my knowledge on this matter may be limited, and that the authors' approach may be sound. However, to be reassured that the approach is indeed sound, I would appreciate a clarification on this point and a reference to relevant sources about this approach.

      This is an important point. First, we confirmed all our main conclusions about parameter differences using the maximal model M1 to fit all the participants. We added Supplementary Table 2 to report the outcome of this analysis. Second, we did the same for parameters across all models M1-M4, fitting each to participants without comparison. This is particularly relevant for M3, since at least a minority of participants of both groups were best explained by this model. We report these analyses in Fig S11:

      Since the M4 is nested within M1, we argue that this comparison is still meaningful, and note explanations in the text for why the effects noted between groups may occur given the differences in their causal meaning, for example in the results under phase 2 analyses:

      ‘Belief updating in phase 2 was less flexible in BPD participants. Median change in beliefs (from priors to posteriors) about a partner’s preferences was lower versus. CON ( = -5.53, 95%HDI: -7.20, -4.00; = -10.02, 95%HDI: -12.81, -7.30). Posterior beliefs about partner were more precise in BPD versus CON ( = -0.94, 95%HDI: -1.50, -0.45;  = -0.70, 95%HDI: -1.20, -0.25).  This is unsurprising given the disintegrated priors of the BPD group in M4, meaning they need to ‘travel less’ in state space. Nevertheless, even under assumptions of M1 and M2 for both groups, BPD showed smaller posteriors median changes versus CON in phase 2 (see Table T2). These results converge to suggest those with BPD form rigid posterior beliefs.’

      (16) "We built and tested a theory of interpersonal generalization in a population of matched participants" this sentence seems to be unwarranted, as there is no theory in the paper (actually, as it is now, the paper looks rather exploratory)

      We thank the reviewer for their perspective. Formal models can be used as a theoretical statement on the casual algorithmic process underlying decision making and choice behaviour; the development of formal models are an essential theoretical tool for precision and falsification (Haslbeck et al., 2022). In this sense, we have built several competing formal theories that test, using casual architectures, whether the latent distribution(s) that generate one’s choices generalise into one’s predictions about another person, and simultaneously whether one’s latent distribution(s) that represent beliefs about another person are used to inform future choices.

      Reviewer 3:

      ‘My broad question about the experiment (in terms of its clinical and cognitive process relevance): Does the task encourage competition or give participants a reason to take advantage of others? I don't think it does, so it would be useful to clarify the normative account for prosociality in the introduction (e.g., some of Robin Dunbar's work).’

      We agree that our paradigm does not encourage competition. We use a reward structure that makes it contingent on participants to overcome a particular threshold before earning rewards, but there is no competitive element to this, in that points earned or not earned by partners have no bearing on the outcomes for the participant. This is important given the consideration of recursive properties that arise through mixed-motive games; we wanted to focus purely on observational learning in phase 2, and repercussion-free choices made by participants in phase 1 and 3, meaning the choices participants, and decisions of a partner, are theoretically in line with self-preferences irrespective of the judgement of others. We have included a clearer statement of the structure of this type of task, and more clearly cited the origin for its structure (Murphy & Ackerman, 2011):

      ‘Our present work sought to achieve two primary goals. 1. Extend prior causal computational theories to formalise and test the interrelation between self-insertion and social contagion on learning and behaviour to better probe interpersonal generalisation in health, and 2., Test whether previous computational findings of social learning changes in BPD can be explained by infractions to self-other generalisation. We accomplish these goals by using a dynamic, sequential social value economic paradigm, the Intentions Game, building upon a Social Value Orientation Framework (Murphy & Ackerman, 2011) that assumes motivational variation in joint reward allocation.’

      Given the introductions structure as it stands, we felt providing another paragraph on the normative assumptions of such a game was outside the scope of this article.

      ‘The finding that individuals with BPD do not engage in self-other generalization on this task of social intentions is novel and potentially clinically relevant. The authors find that BPD participants' tendency to be prosocial when splitting points with a partner does not transfer into their expectations of how a partner will treat them in a task where they are the passive recipient of points chosen by the partner. In the discussion, the authors reasonably focus on model differences between groups (Bayesian model comparison), yet I thought this finding -- BPD participants not assuming prosocial tendencies in phase 2 while CON participant did -- merited greater attention. Although the BPD group was close to 0 on the \beta prior in Phase 2, their difference from CON is still in the direction of being more mistrustful (or at least not assuming prosociality). This may line up with broader clinical literature on mistrustfulness and attributions of malevolence in the BPD literature (e.g., a 1992 paper by Nigg et al. in Journal of Abnormal Psychology). My broad point is to consider further the Phase 2 findings in terms of the clinical interpretation of the shift in \beta relative to controls.’

      This is an important point, that we contextualize within the parameterisation of our utility model. While the shift toward 0 in the BPD participants is indeed more competitive, as the reviewer notes, it is surprisingly centred closely around 0, with only a slight bias to be prosocial (mean = -0.47;  = -6.10, 95%HDI: -7.60, -4.60). Charitably we might argue that BPD participants are expecting more competitive preferences from their partner. However even so, given their variance around their priors in phase 2, they are uncertain or unconfident about this. We take a more conservative approach in the paper and say that given the tight proximity to 0 and the variance of their group priors, they are likely to be ‘hedging their bets’ on whether their partner is going to be prosocial or competitive. While the movement from phase 1 to 2 is indeed in the competitive direction it still lands in neutral territory. Model M4 does not preclude central tendancies at the start of Phase 2 being more in the competitive direction.

      ‘First, the authors note that they have "proposed a theory with testable predictions" (p. 4 but also elsewhere) but they do not state any clear predictions in the introduction, nor do they consider what sort of patterns will be observed in the BPD group in view of extant clinical and computational literature. Rather, the paper seems to be somewhat exploratory, largely looking at group differences (BPD vs. CON) on all of the shared computational parameters and additional indices such as belief updating and reaction times. Given this, I would suggest that the authors make stronger connections between extant research on intention representation in BPD and their framework (model and paradigm). In particular, the authors do not address related findings from Ereira (2020) and Story (2024) finding that in a false belief task that BPD participants *overgeneralize* from self to other. A critical comparison of this work to the present study, including an examination of the two tasks differ in the processes they measure, is important.’

      Thank you for this opportunity to include more of the important work that has preceded the present manuscript. Prior work has tended to focus on either descriptive explanations of self-other generalisation (e.g. through the use of RW type models) or has focused on observational learning instability in absence of a causal model from where initial self-other beliefs may arise. While the prior work cited by the reviewer [Ereira (2020; Nat. Comms.) and Story (2024; Trans. Psych.)] does examine the inter-trial updating between self-other, it does not integrate a self model into a self’s belief about an other prior to observation. Rather, it focuses almost exclusively on prediction error ‘leakage’ generated during learning about individual reward (i.e. one sided reward). These findings are important, but lie in a slightly different domain. They also do not cut against ours, and in fact, we argue in the discussion that the sort of learning instability described above and splitting (as we cite from Story ea. 2024; Psych. Rev.) may result from a lack of self anchoring typical of CON participants. Nevertheless we agree these works provide an important premise to contrast and set the groundwork for our present analysis and have included them in the framing of our introduction, as well as contrasting them to our data in the discussion.

      In the introduction:

      ‘The BPD phenotype has been associated with a potential over-reliance on social versus internal cues (Henco et al., 2020), ‘splitting’ of social latent states that encode beliefs about others (Story et al., 2023), negative appraisal of interpersonal experiences with heightened self-blame (Mancinelli et al., 2024), inaccurate inferences about others’ irritability (Hula et al., 2018), and reduced belief adaptation in social learning contexts (Siegel et al., 2020). Associative models have also been adapted to characterize  ‘leaky’ self-other reinforcement learning (Ereira et al., 2018), finding that those with BPD overgeneralize (leak updates) about themselves to others (Story et al., 2024). Altogether, there is currently a gap in the direct causal link between insertion, contagion, and learning (in)stability.’

      In the discussion:

      ‘Disruptions in self-to-other generalization provide an explanation for previous computational findings related to task-based mentalizing in BPD. Studies tracking observational mentalizing reveal that individuals with BPD, compared to those without, place greater emphasis on social over internal reward cues when learning (Henco et al., 2020; Fineberg et al., 2018). Those with BPD have been shown to exhibit reduced belief adaptation (Siegel et al., 2020) along with ‘splitting’ of latent social representations (Story et al., 2024a). BPD is also shown to be associated with overgeneralisation in self-to-other belief updates about individual outcomes when using a one-sided reward structure (where participant responses had no bearing on outcomes for the partner; Story et al., 2024b). Our analyses show that those with BPD are equal to controls in their generalisation of absolute reward (outcomes that only affect one player) but disintegrate beliefs about relative reward (outcomes that affect both players) through adoption of a new, neutral belief. We interpret this together in two ways: 1. There is a strong concern about social relativity when those with BPD form beliefs about others, 2. The absence of constrained self-insertion about relative outcomes may predispose to brittle or ‘split’ beliefs. In other words, those with BPD assume ambiguity about the social relativity preferences of another (i.e. how prosocial or punitive) and are quicker to settle on an explanation to resolve this. Although self-insertion may be counter-intuitive to rational belief formation, it has important implications for sustaining adaptive, trusting social bonds via information moderation.’

      In addition, perhaps it is fairer to note more explicitly the exploratory nature of this work. Although the analyses are thorough, many of them are not argued for a priori (e.g., rate of belief updating in Figure 2C) and the reader amasses many individual findings that need to by synthesized.’

      We have now noted the primary goals of our work in the introduction, and have included caveats about the exploratory nature of our analyses. We would note that our model is in effect a causal combination of prior work cited within the introduction (Barnby et al., 2022; Moutoussis et al., 2016). This renders our computational models in effect a causal theory to test, although we agree that our dissection of the results are exploratory. We have more clearly signposted this:

      ‘Our present work sought to achieve two primary goals. 1. Extend prior causal computational theories to formalise and test the interrelation between self-insertion and social contagion on learning and behaviour to better probe interpersonal generalisation in health, and 2., Test whether previous computational findings of social learning changes in BPD can be explained by infractions to self-other generalisation. We accomplish these goals by using a dynamic, sequential economic paradigm, the Intentions Game, building upon a Social Value Orientation Framework (Murphy & Ackerman, 2011) that assumes innate motivational variation in joint reward allocation.‘

      ‘Second, in the discussion, the authors are too quick to generalize to broad clinical phenomena in BPD that are not directly connected to the task at hand. For example, on p. 22: "Those with a diagnosis of BPD also show reduced permeability in generalising from other to self. While prior research has predominantly focused on how those with BPD use information to form impressions, it has not typically examined whether these impressions affect the self." Here, it's not self-representation per se (typically, identity or one's view of oneself), but instead cooperation and prosocial tendencies in an economic context. It is important to clarify what clinical phenomena may be closely related to the task and which are more distal and perhaps should not be approached here.’

      Thank you for this important point. We agree that social value orientation, and particularly in this economically-assessed form, is but one aspect of the self, and we did not test any others. A version of the social contagion phenomena is also present in other aspects of the self in intertemporal (Moutoussis et al., 2016), economic (Suzuki et al., 2016) and moral preferences (Yu et al., 2021). It would be most interesting to attempt to correlate the degrees of insertion and contagion across the different tasks.

      We take seriously the wider concern that behaviour in our tasks based on economic preferences may not have clinical validity. This issue is central in the whole field of computational psychiatry, much of which is based on generalizing from tasks like ours, and discussing correlations with psychometric measures. We hope that it is acceptable to leave such discussions to the many reviews on computational psychiatry (Montague et al., 2012; Hitchcock et al., 2022; Huys et al., 2016). Here, we have just put a caveat in the dicussion:

      ‘Finally, a limitation may be that behaviour in tasks based on economic preferences may not have clinical validity. This issue is central to the field of computational psychiatry, much of which is based on generalising from tasks like that within this paper and discussing correlations with psychometric measures. Extrapolating  economic tasks into the real world has been the topic of discussion for the many reviews on computational psychiatry (e.g. Montague et al., 2012; Hitchcock et al., 2022; Huys et al., 2016). We note a strength of this work is the use of model comparison to understand causal algorithmic differences between those with BPD and matched healthy controls. Nevertheless, we wish to further pursue how latent characteristics captured in our models may directly relate to real-world affective change.’

      ‘On a more technical level, I had two primary concerns. First, although the authors consider alternative models within a hierarchical Bayesian framework, some challenges arise when one analyzes parameter estimates fit separately to two groups, particularly when the best-fitting model is not shared. In particular, although the authors conduct a model confusion analysis, they do not as far I could tell (and apologies if I missed it) demonstrate that the dynamics of one model are nested within the other. Given that M4 has free parameters governing the expectations on the absolute and relative reward preferences in Phase 2, is it necessarily the case that the shared parameters between M1 and M4 can be interpreted on the same scale? Relatedly, group-specific model fitting has virtues when believes there to be two distinct populations, but there is also a risk of overfitting potentially irrelevant sample characteristics when parameters are fit group by group.

      To resolve these issues, I saw one straightforward solution (though in modeling, my experience is that what seems straightforward on first glance may not be so upon further investigation). M1 assumes that participants' own preferences (posterior central tendency) in Phase 1 directly transfer to priors in Phase 2, but presumably the degree of transfer could vary somewhat without meriting an entirely new model (i.e., the authors currently place this question in terms of model selection, not within-model parameter variation). I would suggest that the authors consider a model parameterization fit to the full dataset (both groups) that contains free parameters capturing the *deviations* in the priors relative to the preceding phase's posterior. That is, the free parameters $\bar{\alpha}_{par}^m$ and $\bar{\beta}_{par}^m$ govern the central tendency of the Phase 2 prior parameter distributions directly, but could be reparametrized as deviations from Phase 1 $\theta^m_{ppt}$ parameters in an additive form. This allows for a single model to be fit all participants that encompasses the dynamics of interest such that between-group parameter comparisons are not biased by the strong assumptions imposed by M1 (that phase 1 preferences and phase 2 observations directly transfer to priors). In the case of controls, we would expect these deviation parameters to be centred on 0 insofar as the current M1 fit them best, whereas for BPD participants should have significant deviations from earlier-phase posteriors (e.g., the shift in \beta toward prior neutrality in phase 2 compared to one's own prosociality in phase 1). I think it's still valid for the authors to argue for stronger model constraints for Bayesian model comparison, as they do now, but inferences regarding parameter estimates should ideally be based on a model that can encompass the full dynamics of the entire sample, with simpler dynamics (like posterior -> prior transfer) being captured by near-zero parameter estimates.’

      Thank you for the chance to be clearer in our modelling. In particular, the suggestion to include a model that can be fit to all participants with the equivalent of the likes of partial social insertion, to check if the results stand, can actually be accomplished through our existing models.  That is, the parameter that governs the flexibility over beliefs in phase 2 under models M1 (dominant for CON participant) and M2 parameterises the degree to which participants think their partner may be different from themselves. Thus, forcibly fitting M1 and M2 hierarchically to all participants, and then separately to BPD and CON participants, can quantify the issue raised: if BPD participants indeed distinguish partners as vastly different from themselves enough to warent a new central tendency, should be quantitively higher in BPD vs CON participants under M1 and M2.

      We therefore tested this, reporting the distributional differences between for BPD and CON participants under M1, both when fitted together as a population and as separate groups. As is higher for BPD participants under both conditions for M1 and M2 it supports our claim and will add more context for the comparison - may be large enough in BPD that a new central tendency to anchor beliefs is a more parsimonious explanation.

      We cross checked this result by assessing the discrepancy between the participant’s and assumed partner’s central tendencies for both prosocial and individualistic preferences via best-fitting model M4 for the BPD group. We thereby examined whether belief disintegration is uniform across preferences (relative vs abolsute reward) or whether one tendency was shifted dramatically more than another.  We found that beliefs over prosocial-competitive preferences were dramatically shifted, whereas those over individualistic preferences were not.

      We have added the following to the main text results to explain this:

      Model Comparison:

      ‘We found that CON participants were best fit at the group level by M1 (Frequency = 0.59, Protected Exceedance Probability = 0.98), whereas BPD participants were best fit by M4 (Frequency = 0.54, Protected Exceedance Probability = 0.86; Figure 2A). We first analyse the results of these separate fits. Later, in order to assuage concerns about drawing inferences from different models, we examined the relationships between the relevant parameters when we forced all participants to be fit to each of the models (in a hierarchical manner, separated by group). In sum, our model comparison is supported by convergence in parameter values when comparisons are meaningful. We refer to both types of analysis below.’

      Phase 1:

      ‘These differences were replicated when considering parameters between groups when we fit all participants to the same models (M1-M4; see Table S2).’

      Phase 2:

      ‘To check that these conclusions about self-insertion did not depend on the different models, we found that only under M1 and M2 were consistently larger in BPD versus CON. This supports the notion that new central tendencies for BPD participants in phase 2 were required, driven by expectations about a partner’s relative reward. (see Fig S10 & Table S2). and parameters under assumptions of M1 and M2 were strongly correlated with median change in belief between phase 1 and 2 under M3 and M4, suggesting convergence in outcome (Fig S11).’

      ‘Furthermore, even under assumptions of M1-M4 for both groups, BPD showed smaller posterior median changes versus CON in phase 2 (see Table T2). These results converge to suggest those with BPD form rigid posterior beliefs.’

      ‘Assessing this same relationship under M1- and M2-only assumptions reveals a replication of this group effect for absolute reward, but the effect is reversed for relative reward (see Table S3). This accords with the context of each model, where under M1 and M2, BPD participants had larger phase 2 prior flexibility over relative reward (leading to larger initial surprise), which was better accounted for by a new central tendency under M4 during model comparison. When comparing both groups under M1-M4 informational surprise over absolute reward was consistently restricted in BPD (Table S3), suggesting a diminished weight of this preference when forming beliefs about an other.’

      Phase 3

      ‘In the dominant model for the BPD group—M4—participants are not influenced in their phase 3 choices following exposure to their partner in phase 2. To further confirm this we also analysed absolute change in median participant beliefs between phase 1 and 3 under the assumption that M1 and M3 was the dominant model for both groups (that allow for contagion to occur). This analysis aligns with our primary model comparison using M1 for CON and M4 for BPD  (Figure 2C). CON participants altered their median beliefs between phase 1 and 3 more than BPD participants (M1: linear estimate = 0.67, 95%CI: 0.16, 1.19; t = 2.57, p = 0.011; M3: linear estimate = 1.75, 95%CI: 0.73, 2.79; t = 3.36, p < 0.001). Relative reward was overall more susceptible to contagion versus absolute reward (M1: linear estimate = 1.40, 95%CI: 0.88, 1.92; t = 5.34, p<0.001; M3: linear estimate = 2.60, 95%CI: 1.57, 3.63; t = 4.98, p < 0.001). There was an interaction between group and belief type under M3 but not M1 (M3: linear estimate = 2.13, 95%CI: 0.09, 4.18, t = 2.06, p=0.041). There was only a main effect of belief type on precision under M3 (linear estimate = 0.47, 95%CI: 0.07, 0.87, t = 2.34, p = 0.02); relative reward preferences became more precise across the board. Derived model estimates of preference change between phase 1 and 3 strongly correlated between M1 and M3 along both belief types (see Table S2 and Fig S11).’

      ‘My second concern pertains to the psychometric individual difference analyses. These were not clearly justified in the introduction, though I agree that they could offer potentially meaningful insight into which scales may be most related to model parameters of interest. So, perhaps these should be earmarked as exploratory and/or more clearly argued for. Crucially, however, these analyses appear to have been conducted on the full sample without considering the group structure. Indeed, many of the scales on which there are sizable group differences are also those that show correlations with psychometric scales. So, in essence, it is unclear whether most of these analyses are simply recapitulating the between-group tests reported earlier in the paper or offer additional insights. I think it's hard to have one's cake and eat it, too, in this regard and would suggest the authors review Preacher et al. 2005, Psychological Methods for additional detail. One solution might be to always include group as a binary covariate in the symptom dimension-parameter analyses, essentially partialing the correlations for group status. I remain skeptical regarding whether there is additional signal in these analyses, but such controls could convince the reader. Nevertheless, without such adjustments, I would caution against any transdiagnostic interpretations such as this one in the Highlights: "Higher reported childhood trauma, paranoia, and poorer trait mentalizing all diminish other-to-self information transfer irrespective of diagnosis." Since many of these analyses relate to scales on which the groups differ, the transdiagnostic relevance remains to be demonstrated.’

      We have restructured the psychometric section to ensure transparency and clarity in our analysis. Namely, in response to these comments and those of the other reviewers, we have opted to remove the parameter analyses that aimed to cross-correlate psychometric scores with latent parameters from different models: as the reviewer points out, we do not have parity between dominant models for each group to warrant this, and fitting the same model to both groups artificially makes the parameters qualitatively different. Instead we have opted to focus on social contagion, or rather restrictions on , between phases 1 and 3 explained by M3. This provides us with an opportunity to examine social contagion on the whole population level isolated from self-insertion biases. We performed bootstrapping (1000 reps) and permutation testing (1000 reps) to assess the stability and significance of each edge in the partial correlation network, and then applied FDR correction (p[fdr]), thus controlling for multiple comparisons. We note that while we focused on M3 to isolate the effect across the population, social contagion across both relative and absolute reward under M3 strongly correlated with social contagion under M1 (see Fig S11).

      ‘We explored whether social contagion may be restricted as a result of trauma, paranoia, and less effective trait mentalizing under the assumption of M3 for all participants (where everyone is able to be influenced by their partner). To note, social contagion under M3 was highly correlated with contagion under M1 (see Fig S11). We conducted partial correlation analysis to estimate relationships conditional on all other associations and retained all that survived bootstrapping (1000 reps), permutation testing (1000 reps), and subsequent FDR correction. Persecution and CTQ scores were both moderately associated with MZQ scores (RGPTSB r = 0.41, 95%CI: 0.23, 0.60, p = 0.004, p[fdr]=0.043; CTQ r = 0.354 95%CI: 0.13, 0.56, p=0.019, p[fdr]=0.02). MZQ scores were in turn moderately and negatively associated with shifts in prosocial-competitive preferences () between phase 1 and 3 (r = -0.26, 95%CI: -0.46, -0.06, p=0.026, p[fdr]=0.043). CTQ scores were also directly and negatively associated with shifts in individualistic preferences (; r = -0.24, 95%CI: -0.44, -0.13, p=0.052, p[fdr]=0.065). This provides some preliminary evidence that trauma impacts beliefs about individualism directly, whereas trauma and persecutory beliefs impact beliefs about prosociality through impaired mentalising (Figure 4A).’

      (1) As far as I could tell, the authors didn't provide an explanation of this finding on page 5: "However, CON participants made significantly fewer prosocial choices when individualistic choices were available" While one shouldn't be forced to interpret every finding, the paper is already in that direction and I found this finding to be potentially relevant to the BPD-control comparison.

      Thank you for this observation. This sentance reports the fact that CON participants were effectively more selfish than BPD participants. This is captured by the lower value of reported in Figure 2, and suggests that CON participants were more focused on absolute value – acting in a more ‘economically rational’ manner – versus BPD participants. This fits in with our fourth paragraph of the discussion where we discuss prior work that demonstrates a heightened social focus in those with BPD. Indeed, the finding the reviewer highlights further emphasises the point that those with BPD are much more sensitive, and motived to choose, options concerning relative reward than are CON participants. The text in the discussion reads:

      ‘We also observe this in self-generated participant choice behaviour, where CON participants were more concerned over absolute reward versus their BPD counterparts, suggesting a heighted focus on relative vs. absolute reward in those with BPD.’

      (2) The adaptive algorithm for adjusting partner behavior in Phase 2 was clever and effective. Did the authors conduct a manipulation check to demonstrate that the matching resulted in approximately 50% difference between one's behavior in Phase 1 and the partner in Phase 2? Perhaps Supplementary Figure suffices, but I wondered about a simpler metric.

      Thanks for this point. We highlight this in Figure 3B and within the same figure legend although appreciate the panel is quite small and may be missed.  We have now highlighted this manipulation check more clearly in behavioural analysis section of the main text:

      ‘Server matching between participant and partner in phase 2 was successful, with participants being approximately 50% different to their partners with respect to the choices each would have made on each trial in phase 2 (mean similarity=0.49, SD=0.12).’

      (3) The resolution of point-range plots in Figure 4 was grainy. Perhaps it's not so in the separate figure file, but I'd suggest checking.

      Apologies. We have now updated and reorganised the figure to improve clarity.

      (4) p. 21: Suggest changing to "different" as opposed to "opposite" since the strategies are not truly opposing: "but employed opposite strategies."

      We have amended this.

      (5) p. 21: I found this sentence unclear, particularly the idea of "similar updating regime." I'd suggest clarifying: "In phase 2, CON participants exhibited greater belief sensitivity to new information during observational learning, eventually adopting a similar updating regime to those with BPD."

      We have clarified this statement:

      ‘In observational learning in phase 2, CON participants initially updated their beliefs in response to new information more quickly than those with BPD, but eventually converged to a similar rate of updating.’

      (6) p. 23: The content regarding psychosis seemed out of place, particularly as the concluding remark. I'd suggest keeping the focus on the clinical population under investigation. If you'd like to mention the paradigm's relevance to psychosis (which I think could be omitted), perhaps include this as a future direction when describing the paradigm's strengths above.

      We agree the paragraph is somewhat speculative. We have omitted it in aid of keeping the messaging succinct and to the point.

      (7) p. 24: Was BPD diagnosis assess using unstructured clinical interview? Although psychosis was exclusionary, what about recent manic or hypomanic episodes or Bipolar diagnosis? A bit more detail about BPD sample ascertainment would be useful, including any instruments used to make a diagnosis and information about whether you measured inter-rater agreement.

      Participants diagnosed with BPD were recruited from specialist personality disorder services across various London NHS mental health trusts. The diagnosis of BPD was established by trained assessors at the clinical services and confirmed using the Structured Clinical Interview for DSM-IV (SCID-II) (First et al., 1997). Individuals with a history of psychotic episodes, severe learning disability or neurological illness/trauma were excluded. We have now included this extra detail within our methods in the paper:

      ‘The majority of BPD participants were recruited through referrals by psychiatrists, psychotherapists, and trainee clinical psychologists within personality disorder services across 9 NHS Foundation Trusts in the London, and 3 NHS Foundation Trusts across England (Devon, Merseyside, Cambridgeshire). Four BPD participants were also recruited by self-referral through the UCLH website, where the study was advertised. To be included in the study, all participants needed to have, or meet criteria for, a primary diagnosis of BPD (or emotionally-unstable personality disorder or complex emotional needs) based on a professional clinical assessment conducted by the referring NHS trust (for self-referrals, the presence of a recent diagnosis was ascertained through thorough discussion with the participant, whereby two of the four also provided clinical notes). The patient participants also had to be under the care of the referring trust or have a general practitioner whose details they were willing to provide. Individuals with psychotic or mood disorders, recent acute psychotic episodes, severe learning disability, or current or past neurological disorders were not eligible for participation and were therefore not referred by the clinical trusts.‘

    1. eLife Assessment

      This work provides important insights into mucosal antibody responses against SARS-CoV-2 following intranasal immunization by characterizing a large number of monoclonal antibodies at both mucosal and non-mucosal sites. The evidence supporting the claims is solid. The demonstrated in vitro antiviral activity of antibodies characterized provides a rationale for developing mucosal vaccines, especially if confirmed in vivo and benchmarked against antibodies generated following intramuscular vaccination.

    2. Reviewer #2 (Public review):

      Summary:

      Demonstrate the breadth of IgA response as determined by isolating individual antigen-specific B cells and generating mAbs in mice following intranasal immunization of mice with SARS-CoV2 Spike protein. The findings show that some IgA mAb can neutralize the virus, but many do not. Notable immunization with Wuhan S protein generates a weak response to the omicron variant.

      Strengths:

      Detailed analysis characterizing individual B cells with the generation of mAbs demonstrates the response's breadth and diversity of IgA responses and the ability to generate systemic immune responses.

      Comments on Revision:

      I have re-reviewed the paper and responses to my and other reviewers' comments. I feel the authors have adequately addressed my and other reviewer's comments.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Despite evidence suggesting the benefits of neutralizing mucosa-derived IgA in the upper airway in protection against the SARS-CoV-2 virus, all currently approved vaccines are administered intramuscularly, which mainly induces systemic IgG. Waki et al. aimed to characterize the benefits of intranasal vaccination at the molecular level by isolating B cell clones from nasal tissue. The authors found that Spike-specific plasma cells isolated from the spleen of vaccinated mice showed significant clonal overlap with Spikespecific plasma cells isolated from nasal tissue. Interestingly, they could not detect any spike-specific plasma cells in the bone marrow or Peyer's patches, indicating that these nose-derived cells did not necessarily home to and reside in these locations, although the Peyer's patch is not a typical plasma cell niche - rather the lamina propria of the gut would have been a better place to look. Furthermore, they found that multimerization improves the antibody/antigen binding when the antibody is of low or intermediate affinity, but that high-affinity monomeric antibodies do not benefit from multimerization. Lastly, the authors used a competitive ELISA assay to show that multimerization could improve the neutralizing capacity of these

      antibodies. 

      The strength of this paper is the cloning of multiple IgA from the nasal mucosae (n=99) and the periphery (n=114) post-SARS-CoV-2 i.n. vaccination to examine the clonal relationship of this IgA with other sites, including the spleen. This analysis provides novel insights into the nature of the mucosal antibody response at the site where the host would encounter the virus, and whether this IgA response disseminates to other

      tissues. 

      There were also some weaknesses: 

      (1) The finding that multimerization improves binding and neutralization is not surprising as this was observed before by Wang and Nussenzweig for anti-SARS-CoV-2 IgA (authors should cite Enhanced SARS-CoV-2 neutralization by dimeric IgA. Wang et al., Sci. Transl. Med 2021, 13:3abf1555). 

      We have cited the paper, and the relevant sentence has been modified as follows (line 51-53); Recent studies have demonstrated that multimeric IgA is more effective and provides greater cross-protection than IgG and M-IgA (Okuya et al., 2020b) (Asahi et al., 2002) (Dhakal et al., 2018) (Asahi-Ozaki et al., 2004) (Wang et al., 2021).

      In addition, as far as I can tell we cannot ascertain the purity of fractions from the size exclusion chromatography thus I wasn't sure whether the input material used in Fig. 4 was a mixed population of dimer/trimer/tetramer?  

      The S-IgAs used in the SPR analysis in Fig. 4 consist of a mixture of dimers, trimers, and tetramers. The observed values indicate the average affinity of the S-IgAs. Please refer to the revised version (line 278280).

      (2) The flow cytometric assessment of the IgA+ clones from the nasal mucosae was difficult to interpret (Fig. 1B). It was hard for me to tell what they were gating on and subsequently analyzing without an IgA-negative population for reference. 

      We have updated FACS plots to illustrate the presence of IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in Fig. 1B legend. Please find the relevant statements (line 115-120).

      (3) While the i.n. study itself is large and challenging, it would have been interesting to compare an i.m. route and examine the breadth of SARS-CoV-2 variant S1 binding for IgGs as in Fig. 2A. Are the IgA responses derived from the mucosae of greater breadth than systemic IgG responses? Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      I appreciate your consideration. Recent reports indicate that some M-IgA monomers possess neutralizing activity that is equivalent to or less than that of IgGs. However, the opposite phenomenon has also been observed. These results suggest that the Fc does not merely correlate with the degree of increase in antibody reactivity or functionality. We believe the discrepancies in previous studies are due to variations in the binding modes between the epitope and paratope of each antibody clone. Nevertheless, oligomerization enhances the functionality of most monomeric antibody clones, suggesting that the multivalent S-IgA enables a mode of action that is challenging to achieve with a monomeric antibody. Please refer to the revised version (line 399-403).

      Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      We have summarized the characteristics of the four types of nasal IgAs in Fig.7 and in the Discussion. Please refer to the revised version (line 405-422).

      Reviewer #2 (Public Review): 

      Summary: 

      This research demonstrates the breadth of IgA response as determined by isolating individual antigenspecific B cells and generating mAbs in mice following intranasal immunization of mice with SARS-CoV2 Spike protein. The findings show that some IgA mAb can neutralize the virus, but many do not. Notable immunization with Wuhan S protein generates a weak response to the omicron variant. 

      Strengths: 

      Detailed analysis characterizing individual B cells with the generation of mAbs demonstrates the response's breadth and diversity of IgA responses and the ability to generate systemic immune responses. 

      Weaknesses: 

      The data presentation needs clarity, and results show mAb ability to inhibit SARS-CoV2 in vitro. How IgA functions in vivo is uncertain. 

      We conducted an additional experiment using a hamster model and confirmed that S-IgAs can protect against SARS-CoV-2 infection. Please refer to the revised version (line 349-373 and 431-438).

      Reviewer #1 (Recommendations For The Authors): 

      (1) Figure 1A shows antibody titers in nasal lavage fluid and serum of mice post intranasal vaccination with SARS-CoV-2 Spike protein. The Y-axis of this figure is labeled as "U/mg" however these units are not clearly defined. 

      The antibody titers are expressed as optical density (OD450) value per total protein in nasal lavage fluids or serum. Please find the relevant statements (line 113-114).

      Furthermore, what do antibody titers in the nasal lavage fluid and serum look like post-intramuscular vaccination with the same vaccine and dose? Comparison of titers to the intramuscular route as well as to the PBS control would make this data more impactful. 

      We appreciate your consideration. We have not conducted experiments comparing the effects of intramuscular and intranasal administration using the same dosage and adjuvant. Cholera toxin has primarily been used as an adjuvant for nasal immunization, but it is seldom applied for intramuscular injection. We are interested in its impact on the immune compartment when using cholera toxin as an adjuvant for intramuscular injection. We plan to conduct further experiments in the future.

      Lastly, in Figure 1B, the detection of nasal IgG is not shown even though the authors assess nasally-derived IgG in the spleen further into the study.  

      Since the number of lymphocytes that can be collected from the nasal mucosa is limited, there is an insufficient capacity to isolate IgG+ plasma cells after collecting IgA+ plasma cells. Therefore, conducting such an experiment on mice is technically challenging. A larger animal, such as rats, will be necessary to perform this experiment. Further investigation is needed to determine whether antigen-specific IgG+ plasma cells, sharing V-(D)-J with nasal IgA, can be detected in the nasal mucosa.

      (2) There appears to be something amiss with the IgA stain. It is smushed up against the X-axis. Better flow cytometry profiles should be shown. Likewise in Supplemental Fig. 1A, their IgA stain appears to not be working. This must be addressed using positive and negative controls. 

      We have updated FACS-polts to show the IgA+ plasma cell in Fig.1B, and the detailed gating strategy is outlined in the Fig.1B legend. Please find the relevant statements on line 115-120.

      (3) We do not know the purity of the samples that were subjected to SPR and since the legend of Fig. 4 is partially incorrect, it was difficult to know how this experiment was done. 

      The S-IgA used in the SPR analysis shown in Figure 4 is a mixture of dimers, trimers, and tetramers, and the observed values are believed to reflect the affinity of the S-IgA in the nasal mucosa. Please refer to the revised version (line 278-280).

      (4) Fig. 5 results need to compare with some of the well-characterized mAb (IgG) to understand the biological significance of these neutralizing titres. 

      We have summarized the characteristics of the four types of nasal IgA in Fig.7 and in the Discussion. Please refer to the revised version (page 405-422).

      Communication of results: 

      (1) Authors could improve the communication of their results by introducing the vaccination protocol in the results section accompanied by a diagram of the vaccination strategy (nature of the Ag, route, and frequency). This could be Fig. 1A .  

      A schematic diagram of the vaccination protocol is presented in Fig.1.

      (2) Care should be taken with some of the terminology. Intranasal is the accepted term but authors sometimes use "internasal". The term "immunosuppression" on page 2 could be misleading as it means something different to other audiences. The distinction when speaking about "protection from harmful pathogens" should be made between protection against infection (ie sterilizing immunity) vs protection against disease (ie morbidity and mortality). Instead of "nose", one should say "nasal". Nose-related could be rephrased as "potentially nasal-derived". P.5, line 2 didn't make sense: "IgG+ plasma cells that express nose-related IgA"...

      In many places, Spike is missing it's "e".  

      We have made the correction accordingly.

      (3) Page 3: The lumping of the human and animal SARS-CoV-2 intranasal studies together is a bit misleading. Very little has worked for intranasal vaccination against SARS-CoV-2 in humans at this point in time (although hopefully that will change soon!). Authors should specify which studies were done in animals and which were done in humans. 

      The manuscript has been revised to include two citations on line 73-75 (Ewer et al., 2021 and Zhu et al., 2023).

      (4) What is ER-tracker? It comes out of nowhere and should be explained why it was used to the reader (as well as why they used the other markers) to sort for Spike-specific PC. 

      ER-Tracker is a fluorescent dye that is highly selective for the endoplasmic reticulum of living cells. Because plasma cells have an expanded endoplasmic reticulum for properly folding and secreting large quantities of antibodies, using ER-Tracker along with anti-CD138 facilitates the isolation of plasma cells from lymphocytes without the need for additional antibodies. Please refer to the revised version for details. (ine 130-134).

    1. eLife Assessment

      This study uses C. elegans to investigate how the Calcium/Calmodulin-dependent kinase CMK-1 regulates adaptation to thermo-nociceptive stimuli. The authors use compelling approaches to identify Calcineurin as a phosphorylation target of CMK-1 and to investigate the relationship between CMK-1 and Calcineurin using gain and loss of function genetic and pharmacological methods. The findings of this study are valuable as they show that CMK-1 and Calcineurin act in separate neurons in an antagonistic and complex manner to regulate thermo-nociceptive adaptation, and these results may be relevant for understanding some chronic human pain conditions.

    2. Reviewer #1 (Public review):

      Summary:

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring rate of heat evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.

      Conclusions: cmk-1 and tax-6 act in separate habituation processes primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.

      Strengths:

      The effects size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-1 demonstrate a complex interaction.

      A major concern concerning this manuscript was the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 does occasionally use the words 'habituation' or 'habituation-like' 10 times, however it uses 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused/conflated, however they are very different; sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus, therefore it can even occur outside of the nervous system. Habituation is a learning process where the nervous system responds less to a repeated stimulus, despite (at least part of the nervous system) the nervous system still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine if habituation or sensory adaptation are occurring. These experiments will allow the results to be interpreted with clarity; without them, it isn't actually clear what biological process is actually being studied. The authors have accepted this distinction and now correctly call the process adaptation.

      While there was originally some discrepancy between the two in vitro phosphorylation experiments and the in silico predictions, the revision has cleared up the issues.<br /> Figure 3 -S1: This model has been adjusted to more closely fit the data.

      The authors have expanded the discussion about the significance of the sites of cmk-1 and tax-6 function in the neural circuit.

    3. Reviewer #2 (Public review):

      Summary:

      The reduction in a response to a specific stimuli after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans, however the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie adaptation to a form of noxious stimuli based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive adaptation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive adaptation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive adaptation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermo-nociceptive adaptation. The authors propose a model based on their findings, illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate adaptation to thermo-nociceptive stimuli in a complex manner.

      Strengths:

      - Given the conservation of adaptation across phylogeny, identifying genes and mechanisms that underlie nociceptive adaptation in C. elegans may be relevant for understanding chronic pain in humans.<br /> - The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.<br /> - The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and adaptation is elegant.<br /> - The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron specific promoters in the nematode is a clear strength of the genetic model system.

      Weaknesses:

      - The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive adaptation, thus the physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene, and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.  

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring the rate of heat-evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.  

      Conclusions: cmk-1 and tax-6 act in separate habituation processes, primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.  

      Strengths:  

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-1 demonstrate a complex interaction.  

      Thanks a lot for these positive remarks.

      Weaknesses:  

      The major concern about this manuscript is the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 use the words 'habituation' or 'habituation-like' 10 times, however, they use 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused/conflated, however, they are very different; sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus, therefore it can even occur outside of the nervous system. Habituation is a learning process where the nervous system responds less to a repeated stimulus, despite (at least part of the nervous system) the nervous system still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to the fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where the application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine if habituation or sensory adaptation are occurring. These experiments will allow the results to be interpreted with clarity, without them, it isn't actually clear what biological process is actually being studied.  

      Thanks for the comment. As this reviewer points out, “adaptation” and “habituation” are often conflated. Many scientists (maybe not the majority though) use a less stringent definition for the word habituation, than the one presented by this reviewer. More particularly, the term habituation is used in human pain research to refer solely to the reduction of response to repeated stimuli, in the absence of a detailed assessment of the more stringent criteria mentioned here (see, e.g.,  PMID: 22337205 ; PMID: 18947923 ; PMID: 17258858; PMID: 20685171 ; PMID: 15978487). In addition to the practice in pain research, the main reason why we steered toward ‘habituation’ from our previous publication is because it immediately conveys the idea of a response reduction, whereas ‘adaptation’ could in principle be either an up-regulation or a downregulation of the response (again, based on various definitions). But we agree that using the word “habituation” came at the cost of triggering a confusion about the exact nature of the process, for those considering the stricter definition of the word “habituation” and those not in the narrower field of pain research. In the revised manuscript, we have thus changed this terminology to “adaptation”. Also following suggestions from Reviewer 2, we have strengthened the description of the protocol in the Result section and clarified, why the adaptation phenomenon is not a ‘thermal damage’ effect or ‘fatigue’ effect in the neuro-muscular circuit controlling reversal. One of the most convincing piece of evidence it cannot be solely explained by “damages” or “exhaustion” is simply the existence of non-adapting mutants (like cmk-1(lf)) or pharmacological treatments (Cyclosporin A) blocking the adaptation effect and enabling worm to continuously reverse for hours without any problems.  

      While the discrepancy between the in vitro phosphorylation experiments and the in silico predictions was discussed, the substantial discrepancy (over 85% of the substrates in the smaller in vitro dataset were not identified in the larger dataset) between the two different in vitro datasets was not discussed. This is surprising, as these approaches were quite similar, and it may indicate a measure of unreliability in the in vitro datasets (or high false negative rates).

      Thanks for the comment. This is an important aspect which we now more extensively cover in the Discussion section.

      The strong consistency of the CMK-1 recognition consensus sequences across the two in vitro dataset speaks against the unreliability of the analyses. Instead, there are a few points to highlight that explain the somewhat low degree of overlap between the two datasets, which indeed relate to the false negative rates as this reviewer suggests.

      (1) In the peptide library analysis, Trypsin cleavage prior to kinase treatment will leave a charged N-term or C- terminus and in addition remove part of the protein context required for efficient kinase recognition. This will have a variable effect across the different substrates in the peptide library, depending on the distance between the cleavage site and the phosphosite, but will not affect the native protein library. This effect increases the false negative rate in the peptide library.

      (2) The number and distribution of “available substrate phosphosites” diverge in the two libraries. Indeed, the peptide library is expected to contain a markedly larger diversity of potential CMK-1 substrate sites than the protein library (because the Trypsin digestion will reveal substrates that are normally buried in a native protein), but the depth of MS analysis is the same for the two libraries. In somewhat simplistic terms, the peptide-library analysis is prone to be saturated with abundant phosphorylated peptides, which prevent detecting all phosphosites. If the peptide analysis could have been made deeper, we would probably have increased the overlap (at the cost of increasing the number of false positive too).

      (3) We have chosen quite strict criteria and applied them separately to define each hit list; therefore, we know we have many false negatives in each list, which will naturally reduce the expected overlap.

      We now extended the discussion of the limited overlap of the two dataset in a dedicated paragraph in the discussion. We also clarify that we tend to give more trust to the protein-library dataset (since substrates are in a configuration closer to that in vivo), with those hits also present in the peptide dataset (like TAX-6 was) as the most convincing hits, as they could be validated in a second type of experiment.

      Additionally, the rationale for, and distinction between, the two separate in vitro experiments is not made clear.  

      We reasoned that both substrate types have their own benefits and limitations (as discussed in the manuscript), so it was an added value to run both. We proposed that the subset of targets present in both datasets to be the most solid list of candidates. We have reinforced this point in the discussion.  

      Line 207: After reporting that both tax-6 and cnb-1 mutants have high spontaneous reversals, it is not made clear why cnb-1 is not further explored in the paper. Additionally, this spontaneous reversal data should be in a supplementary figure.  

      We kept the focus of the article primarily on TAX-6, because it was identified as CMK-1 target in vitro; CNB-1 was not. Moreover, we didn’t have cnb-1(gf) mutants to pursue the analysis with, and we were stuck by the cnb-1(lf) constitutive high reversal rate for any further follow up. We have added a supplementary file to present the spontaneous reversals rates.

      Figure 3 -S1: This model doesn't explain why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement (presumably by reducing the inhibition by tax-6) but the +cyclo A group (inhibited tax-6) showed weaker response decrement, as here there is even further weakened inhibition of tax-6 on this process. Also, the cmk-1(lf) +cyclo A group is labeled as constitutive habituation, however, this doesn't appear to be the case in Figure 3 (seems like a similar initial level and response decrement phenotype to wildtype).  

      Thanks a lot for the comment. We are glad that the presentation of our complex dataset was clear enough to bring the reader to that level of detailed reflection and interpretation on the proposed model. To address the two points raised in this reviewer comment, we made modifications to the model presentation and provide additional clarifications below, where we use the term adaptation instead of habituation (as in the revised Figure):

      Regarding the first point, “why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement … but the +cyclo A group showed weaker response decrement”. This is really a very good point, that cannot be easily explained if all the branches (arrows) in the model have the same weight or work as ON/OFF switches. We tried to convey the relative importance of the regulation effect via the thickness of the arrow lines (which we have now clarified in the legend in the revised ms). The main ‘quantitative’ nuances to take into consideration here originate from 2 assumptions of the model (which we have clarified in the revised ms):

      Assumption 1: the inhibitory effect of TAX-6 on the CMK-1 antiadaptation branch and the inhibitory effect of TAX-6 on the CMK-1 pro-adaptation branch are not of the same magnitude (we have further enhanced the line thickness differences in the revised model, top left panel for wild type).

      Assumption 2: the two antagonistic direct effects of CMK-1 on adaptation are not of the same magnitude, most strikingly in the context of CMK-1(gf) mutants.

      In our model, the cyclosporin A treatment alone (bottom left panel) causes a strong boost on the CMK-1 inhibitory branch and a less marked boost on the CMK-1 activator branch (following assumption 1). This causes an imbalance between the two antagonist direct CMK-1-dependent drives, which reduces (but doesn’t fully block) adaptation. Indeed, we don’t observe a total block of adaptation with cyclosporin A in wild type, the effect being significantly milder than the totally nonadapting phenotypes seen, e.g., in TAX-6(gf) mutants. From there, the question is what happen in CMK-1(gf) background that would mask the anti-adaptation effect of Cyclosporin A? Here assumption 2 is relevant, and the CMK-1(gf) pro-adaptation direct branch is always prevalent and imbalances the regulation toward faster adaptation (the role of TAX-6 becoming negligible in the CMK-1(gf) background and ipso facto that of Cyclosporin A).

      Regarding the second point, “the cmk-1(lf) +cyclo A group is labeled as constitutive habituation”. We regret a confusing word choice in the first version of the manuscript; we intended to mean “normal habituation phenotype” but in the joint absence of antagonistic CMK-1 and TAX-6 regulatory signaling (so the regulation is not like in wild-type, but the phenotype ends up like in wild type). We have modified the label to “normal adaptation” and left a note in the legend that an apparently normal adaptation phenotype seems to be the default situation when the two antagonistic regulatory pathways are shut off.

      More discussion of the significance of the sites of cmk-1 and tax-6 function in the neural circuit should take place. Additionally, incorporating the suspected loci of cmk-1 and tax-6 in the neural circuit into the model would be interesting (using proper hypothetical language). For example, as it seems like AFD is not required for the naïve reversal response but just its reduction, cmk-1 activity in AFD might be generating inhibition of the reversal response by AFD. It certainly would be understandable if this isn't workable, given extrasynaptic signaling and other unknowns, but it potentially could also be helpful in generating a working model for these complex interactions. For example, cmk1 induces AIZ inhibition of AVA (AIZ is electrically coupled to AFD), and tax-6 reduces RIM activation of AVA (these neurons are also electrically coupled according to the diagram). RIM is also a neuropeptide-rich neuron, so this could allow it to interact with the cmk-1-related process(es) in AFD. Some discussion of possibilities like this could be informative.  

      Thanks for the comment. These hypothetical inter-cellular communication pathways are indeed nice possibilities. On the other hand, we could envision several additional pathways. While RIM is indeed a neuropeptide-rich neurons, all these neurons actually express neuropeptides. Following this helpful suggestion, we have slightly expanded the discussion of hypothetical cellular pathways that can be modulated downstream of CMK-1 in AFD. We also slightly lengthened the discussion to mention hypothetical post-synaptic target of TAX-6 within interneurons based on the literature.

      Provide an explanation for why some of the experiments in Figure 4 have such a high N, compared to other experiments.  

      The conditions with the highest n correspond to conditions which we have also used as ‘control’ condition for other type of experiments in the lab and as part of side projects, but which could be gathered for the present article. We have been working with cmk-1(lf) and tax-6(gf) mutants for many years… and the robust non-adapting phenotype was a reference point and a quality control when analyzing other nonadapting mutants.

      Because the loss of function and gain of function mutations in cmk-1 have a similar effect, it is likely that this thermosensory plasticity phenotype is sensitive to levels of cmk-1 activity. Therefore, it is not surprising that the cmk-1 promoter failed to rescue very well as these plasmid-driven rescues often result in overexpression. Given this and that the cmk-1p rescue itself was so modest, these rescue experiments are not entirely convincing (and very hard to interpret; for example, is the AFD rescue or the ASER rescue more complete? The ASER one is actually closer to the cmk-1p rescue). Given the sensitivity to cmk-1 activity levels, a degradation strategy would be more likely to deliver clear results (or perhaps even the overactivation approach used for tax-6).  

      Thanks for the comment. We respectfully disagree with this reviewer’s statement “the loss of function and gain of function mutations in cmk-1 have a similar effect”. We suspect a confusion here, because our data clearly show that these two mutant types have an opposite phenotype. That being said, we interpret the weak rescue effect with cmk-1p as a probable result of overexpression or incomplete/imbalanced expression across neurons (as the promoter used might not include all the relevant regulatory regions). We dedicated considerable efforts to establish an endogenous CMK-1::degron knock in, for tissue-specific auxin-induced degradation (AID), but we were unfortunately not able to obtain consistent results. Unfortunately, the only useful data regarding CMK-1 place-of-action are the cell-specific rescue data already included in the report.

      Reviewer #2 (Public review):  

      Summary:  

      The reduction in a response to a specific stimulus after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans, however, the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie habituation to a form of noxious stimuli based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive habituation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive habituation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive habituation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermonociceptive habituation. The authors propose a model based on their findings illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate habituation to thermo-nociceptive stimuli in a complex manner.  

      Strengths:  

      (1) Given the conservation of habituation across phylogeny, identifying genes and mechanisms that underlie nociceptive habituation in C. elegans may be relevant for understanding chronic pain in humans.  

      (2) The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.  

      (3) The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and habituation is elegant.  

      (4) The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron-specific promoters in the nematode is a clear strength of the genetic model system.  

      Thanks a lot for these positive remarks.

      Weaknesses:  

      (1) The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive habituation which disrupts the logical flow of the manuscript.  

      We understand this point and we have carefully considered and (reconsidered) the way to articulate the report. However, we could not present the story much differently as we would have no justification to investigate the role of TAX-6 and its interaction with CMK-1, if we would not have first identified it as phospho-target in vitro. Carefully considering this point, we found that the abstract of the first manuscript version was probably too cursory and susceptible to trigger wrong expectations among readers. We have thus extensively revised the abstract to clarify this point. Furthermore, we have reinforced this point in the last paragraph of the introduction and in the conclusion paragraph of the Discussion.

      (2) The physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

      We do agree and have explicitly mentioned this aspect in the abstract, in the end of the introduction, and in the discussion section.

      (3) It is not clear if Calcineurin is already a known substrate of CaM Kinases in other systems or if this finding is new.  

      We are not aware of any study having shown Calcineurin is a direct target of CaM kinase I. But it was found to be substrate of CaM kinase II as well as of other kinases, as we explicitly presented in the discussion section. We have complemented the text mentioning we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):  

      (1) The authors might consider reorganizing the results, so that the substrate phosphorylation analysis follows the cmk-1 habituation data, as it may not be clear to the reader why you are looking for substrates downstream of cmk-1 at that point. Or the authors could mention the previous habituation data for cmk-1 at the beginning of the results.  

      Thank you. This is something that we considered while (re-)writing. However, we prefer to keep CMK-1 data side-by-side with TAX-6 data, regarding the result section. Nevertheless, we have modified the last paragraph of intro to better transition and justify the specific interest of searching for CMK-1 targets in the context of the present study.

      (2) Line 209: 'controls' is too strong a word. 'regulates' would be better, and it should be stated that this is for 'spontaneous reversal behavior'.  

      Thank you. This was modified.

      (3) Line 359: we suspect that these reflect functional enrichments.  

      We don’t see what would exactly be wrong with the original sentence. The proposed change (if it is a proposed change) would completely obliterate the intended meaning of our sentence. We rewrote the sentence to be as clear as possible, as follows: ”Even if we cannot rule out an actual inclination of the CaM kinase pathway to regulate these processes, we suspect that these GO term enrichments rather reflect an analytical bias toward abundant proteins.”

      (4) Line 563: In this subsection, it is not made clear when the T0 and T60 heat pulses are given, in relation to the 20s ISI heat pulses given for 60 minutes. Are they the first and last pulse, or given some time before or after this train of heat pulses?  

      Thanks for spotting this poor description, which we have improved in the revised manuscript. The heat pulse recording is given immediately before and immediately after the 60 min of repeated stimulation. After the T0 heat pulse recording there is a period of about 30 s (period of post stimuli recording + transfer from the recording device (INFERNO) to the habituation device (ThermINATOR)).  For the T60 acquisition, there is a lag of about 50 s between the last ‘habituation’ stimuli and the recording stimuli (time needed to move the plate between the habituation device and the recording device + 40 s of baseline reversal recording in the absence of heat stimuli).

      Reviewer #2 (Recommendations for the authors):  

      (1) There appears to be little to no connection between the phosphorylation site discovered in Calcineurin (S443) and the behavioral phenotypes being studied. What is the thermo-nociceptive response if phosphorylation of S443 in Calcineurin is blocked (using a S443A mutation) and/or combined with CMK-1 gain of function?  

      Thanks for the suggestion. The suggested analysis is complicated by several factors. First, the tax-6(lf) is not directly suitable for rescue analysis (until we would have identified a way to restore baseline reversal), so we cannot use a S443A-carrying rescue transgene. Second, the truncated TAX-6(GF) mutant lacks the C-terminal part, including S443, so we cannot introduce a S443A in this context. The left approach would be to modify the endogenous locus. This again is complicated by the fact that S443 exists in two different isoforms (with conserved RxxS motifs in two different alternative exons). It will be very difficult to perform these experiments until we know more about the expression pattern and function of the respective isoforms. This is work in progress, but this analysis will need to await a future publication.

      (2) The authors should state clearly if Calcineurin is a novel substrate of CaM Kinase or if this is already known in the field.  

      We have complemented the text mentioning we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

      (3) The logical flow of the manuscript could be improved given that CMK-1 and Calcineurin appear to act in different cells to regulate nociceptive habituation.  

      As detailed above, we have considered this point carefully and modified the introduction and the abstract. The discussion about the two places of action was also improved.

      (4) More detail about the experimental methods used for the heat-evoked reversals should be included in the Results section.  

      Thanks for the suggestion. We have improved the description in the Method section and expanded the partial description in the result section, so readers could hopefully proceed without needing to go back and forth with the methods.

      (5) Check for typos. For example: line 197 - fix typo "...to a series repeated heat stimulation...".  

      Thank you. We have carefully read the revised manuscript to correct remaining typos.

    1. eLife Assessment

      The authors developed a methodology to graph antigenic surface loops on influenza virus neuraminidases. The hybrid proteins retained the structure of the neuraminidase scaffold and the antigenicity of the grafted loops. This fundamental work should help in developing novel neuraminidase constructs for use in influenza virus vaccines. The paper presents compelling evidence supporting the conclusions arrived at by the authors.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript described a structure-guided approach to graft important antigenic loops of the neuraminidase to a homotypic but heterologous NA. This approach allows the generation of well-expressed and thermostable recombinant proteins with antigenic epitopes of choice to some extent. The loop-grafted NA was designated hybrid.

      Strengths:

      The hybrid NA appeared to be more structurally stable than the loop-donor protein while acquiring its antigenicity. This approach is of value when developing a subunit NA vaccine which is difficult to express. So that antigenic loops could be potentially grafted to a stable NA scaffold to transfer strain-specific antigenicity.

    3. Reviewer #2 (Public review):

      In their manuscript, Rijal and colleagues describe a 'loop grafting' strategy to enhance expression levels and stability of recombinant neuraminidase. The work is interesting and important.

      Major points from first round of review:

      (1) The authors overstress the importance of the epitopes covered by the loops they use and play down the importance of antibodies binding to the side, the edges, or the underside of the NA. A number of papers describing those mAbs are also not included.

      (2) The rationale regarding the PR8 hybrid is not well described and should be described better.

      (3) Figure 3B and 6C: This should be given as numbers (quantified), not as '+'.

      (4) Figure 5A and 7A: Negative controls are missing.

      (5) The authors claim that they generate stable tetramers. Judging from SDS-PAGE provided in Supplementary Figure 3B (BS3-crosslined), many different species are present including monomers, dimers, tetramers, and degradation products of tetramers. In line 7 for example there are at least 5 bands.

      [Editors' note: the authors have appropriately responded to and addressed these points.]

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript described a structure-guided approach to graft important antigenic loops of the neuraminidase to a homotypic but heterologous NA. This approach allows the generation of well-expressed and thermostable recombinant proteins with antigenic epitopes of choice to some extent. The loop-grafted NA was designated hybrid.

      Strengths:

      The hybrid NA appeared to be more structurally stable than the loop-donor protein while acquiring its antigenicity. This approach is of value when developing a subunit NA vaccine which is difficult to express. So that antigenic loops could be potentially grafted to a stable NA scaffold to transfer strain-specific antigenicity.

      Weaknesses:

      However, major revisions to better organize the text, and figure and make clarifications on a number of points, are needed. There are a few cases in which a later figure was described first, data in the figures were not sufficiently described, or where there were mismatched references to figures.

      More importantly, the hybrid proteins did not show any of the advantages over the loop-donor protein in the format of VLP vaccine in mouse studies, so it's not clear why such an approach is needed to begin with if the original protein is doing fine.

      We thank the reviewer for their helpful comments. We have incorporated feedback from the authors to improve the manuscript. Please see our point-by-point response.

      The purpose of loop-grafting between H5N1/2021 (a high-expressor) and the PR8 virus was not to improve the expression of PR8, which is already a good expressing NA. Instead, the loop-grafting and the in vivo experiments were done to show the loop-specific protection following a lethal PR8 virus challenge.

      Reviewer #2 (Public review):

      In their manuscript, Rijal and colleagues describe a 'loop grafting' strategy to enhance expression levels and stability of recombinant neuraminidase. The work is interesting and important, but there are several points that need the author's attention.

      Major points

      (1) The authors overstress the importance of the epitopes covered by the loops they use and play down the importance of antibodies binding to the side, the edges, or the underside of the NA. A number of papers describing those mAbs are also not included.

      We have discussed the distribution of epitopes on NA molecule in the Discussion section "The distribution of epitopes in neuraminidase" (new line number 350). In Supplementary Figures 1 and 2, we have compiled the epitopes reported by polyclonal sera and mAbs via escape virus selection or crystal structural studies. There are 45 residues examples of escape virus selection, and we found that approximately 90% of the epitopes are located within the top loops (Loops 01 and Loops 23, which include the lateral sides and edges of NA). We have also included the epitopes of underside mAbs NDS.1 and NDS.3 in Supplementary Figure 2. Some of the interactions formed by these mAbs are also within the L01 and L23 loops. All relevant references are cited in Supplementary Figures 1 and 2.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      (2) The rationale regarding the PR8 hybrid is not well described and should be described better.

      We described the rationale for the PR8 hybrid (new lines 247-250). For clarity, we have added the following sentence within the section "Loop transfer between two distant N1 NAs:...."

      (new lines 255-258):

      "mSN1 showed sufficient cross-reactivity to N1/09 to protect mice against virus challenge. Therefore, we performed loop transfer between mSN1 and PR8N1, which differ by 18 residues within the L01 and L23 loops and show no or minimal cross-reactivity, to assess the loop-specific protection."

      (3) Figure 3B and 6C: This should be given as numbers (quantified), not as '+'.

      We have included the numerical data in Supplementary Figure 6. The data is presented in semi-quantitative manner for simplification. To improve clarity, we have now added the following sentence to the Figure 3c legend: "Refer to Supplementary Figure 6 for binding titration data".

      (4) Figure 5A and 7A: Negative controls are missing.

      A pool of Empty VLP sera was included as a negative control, showing no inhibition at 1:40 dilution. In the figure legends, we have stated "Pooled sera to unconjugated mi3 VLP was negative control and showed no inhibition at 1:40 dilution (not included in the graphs)"

      (5) The authors claim that they generate stable tetramers. Judging from SDS-PAGE provided in Supplementary Figure 3B (BS3-crosslinked), many different species are present including monomers, dimers, tetramers, and degradation products of tetramers. In line 7 for example there are at least 5 bands.

      Tetrameric conformation of soluble proteins is evidenced by the size-exclusion chromatographs shown in Figures 3a and 6b. The BS3 crosslinked SDS-PAGE are only suggestive data, indicating that the protein is a tetramer if a band appears at ~250 kDa. However, depending on the reaction conditions, lower molecular weight bands may also be observed if crosslinking is incomplete.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      - Description of Figure 2 on page 3 should go before Figure 3 lines 87-105 or swap the order of the two figures.

      We have moved lines 91-96, which refer to Figure 3, to appear after Figure 2.

      - Figure 3a, an EC50 should be calculated for both NA activity assay.

      Figure 3a has been updated to include the EC50 and AUC (Area under curve) values for both NA activity assays. The same update has also been made for Figure 6b.

      - Line 150, I'm not sure it's appropriate to cite a manuscript that was in preparation but not published. I'm referring to the two mAbs AG7C and AF9C that were claimed to bind to the L01 and L23 loops but not.

      We have changed the "manuscript in preparation" to "personal communication with Dr. Yan Wu, Capital Medical University".

      - The description in Figure 4a is lacking.

      We have added a detailed description for Figure 4a.

      - Figure 4c, sufficient description is needed. For example, the cavity should be outlined and annotated, what is the role of Val149? Why the first monomer is assigned a number of II and the second monomer with a number of I.

      We have added a detailed description for Figure 4c and amended the figure as per the reviewer’s suggestions.

      - Figure 5a, in addition to ELLA data to mSN1 and N1/09, ELLA data to N1/19 should also be measured and shown. Figure S7, please show IC50 instead of curves for better comparison.

      We included IC50 for mSN1 and N1/09 as we intended to associate the loops with protection.  Graphs for N1/19 have not been reported, but the IC50 titres from pooled sera are shown in Supplementary Figure 7 as a representation. Due to the limited sera sample sourced from tail vein bleed, these assays were performed using pooled sera, which represent the total response (established in numbers of experiments).

      - Line 234-238, the author made a statement about the data shown in Figure 7b "These results mirrored several studies in the literature which showed that immunization with the 2009 N1 could provide at least partial protection in mice and ferrets to the avian H5N1 challenge". The data did not reflect that. In Figure 5b, mSN1 protects as well as other proteins. In fact, there was no advantage of N109 and N109 hybrid over mSN1 in protection against the homologous H1N109. Although higher levels of NAI antibodies were induced with the homologous protein in Figure 5a. The protection could be contributed by non-NAI antibodies, so the authors should measure binding antibodies. The author may increase the challenge dose from 200 LD50 to 1000 LD50 to see a difference due to the strong immunogenicity of the nanoparticles vaccine plus addavax. Otherwise, it looks like loop grafting is not necessary as heterologous NA could broadly protect.

      We agree that msN1, despite its low NAI titres, was equally protective as homologous NA or its hybrid NA against H1N1/09 virus challenge at 200 LD50. There may be additional protective components, including non-NAI antibodies in homologous groups that may have contributed to the protection.

      We assessed sera binding to H1N1/2009 and found that the binding antibody levels were also lower in the msN1 group. The corresponding graph has now been added in Figure S7d. It was difficult to determine the NAI titre required to confer protection in this experiment. For this reason, we later chose PR8 as the challenge virus to demonstrate loop-specific protection.

      We are uncertain whether a 1000 LD50 challenge would have helped establish a correlation between protection and NAI IC50 titres, as the dose used is already lethal for DBA/2 mice.

      - Why would the authors separate work with N1/09 and N1/19 from PR8 N1? To this reviewer's understanding, they are all the same strategies with increasing numbers of dissimilar residues from N1/09 (12) to N1/19 (16) and to PR8 (18). They are all characterized by the same approaches in vitro and in vivo.

      We had two different goals for making hybrids with N1/09 and PR8 N1, therefore, we have presented these results separately.

      (1) For N1/09 and N1/19, we showed that loop-grafting improved protein yield and stability. Additionally, we showed that the N1/09 hybrid can be as protective as the homologous protein.

      (2) PR8 N1 is a high-yielding protein, so loop grafting did not significantly increase its yield. However, the PR8 virus challenge confirmed loop-specific protection.

      - For in vivo study testing the PR8 construct, although PR8 and PR8 hybrid protect better than the heterologous mSN1, the hybrid again did not show any advantages over the PR8 original proteins.

      That's correct - the PR8 hybrid was not advantageous over the original PR8 protein. However, the purpose of this experiment was to demonstrate loop specific protection. The PR8 hybrid (PR8 loops - mS scaffold) protected 6/6 mice, whereas mS hybrid (mS loops - PR8 scaffold) provided no protection.

      - Line 243-249, lack of reference to figures.

      References to Supplementary Figure 7b,c and Figure 2 has been added.

      - What was the reason that the challenge was one by 200 LD50 for 2009 H1N1 and 1000 LD50 for PR8.

      Viruses were titrated in the BALB/c strain for PR8 virus and the DBA/2 strain for X-179A (H1N1/2009) virus. These doses were selected based on their lethality and the time required to reach the endpoint (~20% weight loss) post-infection, which is 5-6 days. Most studies in the literature have used 10 LD50 or higher; thus the virus doses we used are relatively high.

      - Line 268, there is no Figure 5C.

      This was a mistake and has been corrected to Figure 6c.

      - Line 275 what are the readers supposed to see in supplementary Figure 5a? There is not enough description for the referred figures.

      A sentence has been added to Fig S5a description, to make a point about recognition of the NA scaffold by mAb CD6. "Binding by mAb CD6 is predominantly scaffold dependent and occurs across two protomers"

      - The discussion is very long and some of it is not relevant to the study. For example, the role of the tetramerization domain and the basis for structurally stable tetramer formation, were not the focuses of this study.

      We felt it was important to discuss the tetramerisation domain and the basis for stable tetramer formation. A previous study by Ellis et al.  used the VASP tetramerisation domain and introduced multiple NA interface mutations to achieve a more stable closed conformation. In contrast, NA proteins used in our study required the tetrabrachion tetramerisation domain to form a properly assembled tetramer.

      In lines 382-383, there is one unfinished sentence.

      This is corrected.

      The definition of the loops is also confusing. Line 381, the author stated that in the N1/19 hybrid design, residue N200S, could have been considered as part of the loop B2L23, and was it not?

      The designation of loop ends should not be rigid but rather based on multiple factors such as, their proximity to antigenic epitopes, charge, and hydrophobicity. This is discussed in the " Definition of loops" section.

      - Figure 1a and Figure S2, please provide sufficient descriptions, what do the blocks in different colors mean?

      We have updated the Figure 1a legend to indicate the colours.

      The descriptions for Figures S1 and S2 have also been revised for clarity.

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Line 37: Should be 'Influenza virus neuraminidase'.

      This is corrected.

      (2) Line 65: https://pubmed.ncbi.nlm.nih.gov/35446141/, https://pubmed.ncbi.nlm.nih.gov/33568453/ and https://pubmed.ncbi.nlm.nih.gov/28827718/ indicate that protective mAbs bind all over the NA head domain.

      We have discussed the epitopes on the NA head in detail in the section "The distribution of epitopes on Neuraminidase". In Supplementary Figures 1 and 2, we compiled several studies, including those on polyclonal sera and mAbs epitopes, emphasizing that loops 01 and 23 are the predominant antibody targets (~90%). Some antibodies also bind to the underside of NA. We have discussed and referenced these studies accordingly.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      The first reference has been included in both our discussion and Supplementary figure 1.

      The NA epitopes discussed in the second reference have also been incorporated into our discussion and Supplementary figures 1 and 2. Note that, the E258K mutation generated on the NA underside was not relevant to mAbs and was generated randomly by passaging of H3N2 A/New York/PV190/2017 virus. 

      The third reference pertains to murine mAbs against influenza B virus NA.

      (3) Lines 71, 72, and throughout: 'et al.' should be in italics.

      All "et al." have been italicised.

      (4) Many abbreviations are not defined including CHO, SDS-PAGE, MUNANA, mi3, HEPES, BSA, TPCK, MWCO, HRP, PBS, TMB, TCID50, LD50, MES, PEG, PGA, MME, PGA-LM.

      The text has been amended to define these abbreviations.

      (5) Line 209: Shouldn't this be ID50 instead of IC50? Also, it is not defined.

      IC50 has been defined.

      (6) Line 210, line 346, line 581-582: No need to capitalize letters at the beginning of words mid-sentence.

      This is amended.

      (7) Line 227: Is 2009 H1N1 NA meant?

      This has been changed to "H1N1/2009 neuraminidase"

      (8) Line 310: Is this really quantitatively true? (see major comment 1).

      Based on the compilation of epitopes from published NA mAbs and polyclonal sera (via escape mutagenesis and NA-Fabs crystal structures), it is accurate to state that the protective epitopes are primarily located within loops 01 and 23.

      Please also refer to our response to minor point 2. 

      (9) Line 352 and throughout the manuscript: 'in vitro' should be in italics.

      This is amended.

      (10) Line 355: https://pubmed.ncbi.nlm.nih.gov/35446141/https://pubmed.ncbi.nlm.nih.gov/33568453/ and https://pubmed.ncbi.nlm.nih.gov/28827718/ should be included here.

      Studies reporting epitopes on Influenza A neuraminidase have been compiled in Supplementary Figures 1 and 2 and cited appropriately.

      (11) Line 365: https://pubmed.ncbi.nlm.nih.gov/35446141/ and https://pubmed.ncbi.nlm.nih.gov/33568453/ also describe epitopes on the underside of the NA.

      Please refer to the above response to point 10.

      (12) Line 365: Reference https://pubmed.ncbi.nlm.nih.gov/37506693/ is missing here.

      The reference has been added.

      (13) Line 369-371: Is it really a minority?

      In terms of the protective response, the majority of the antibody response is directed towards loops 01 and 23, which form the top antigenic surface. The term 'lateral' is used in some literature to describe NA mAb epitopes; loops 01 and 23 also encompass the lateral regions.

      To clarify this, we have added the following sentence to the Discussion section - "The distribution of epitopes on neuraminidase"

      "It is important to note that loops 01 and 23 include a portion of epitopes that have been described in the literature as side, lateral, or underside (see mAbs NDS.1, NDS.3, and CD6 in Supplementary Fig. 2)"

      Additionally in our studies in mice, we showed that protection is mediated by antibodies targeting the loops (Figure 7). We are uncertain about the binding response to the NA underside, but the NA inhibiting and protective response to the underside appears to be minimal.

      Furthermore Lederhof et al. showed that among the 'underside' mAbs, NDS.1 protected mice against virus challenge, whereas NDS.3 did not. In our analysis (Supplementary Figure 2), NDS.1 makes eight-residue contacts with B4L01 and B5L01, whereas NDS.3 make five-residue contacts with B3L01 and B4L01.

      (14) Line 530: The A in ELLA already stands for assay.

      This is corrected.

    1. eLife Assessment

      This is an important study that examines the role of TFAM, a protein that helps maintain mtDNA, in mtDNA mutator mice. With convincing evidence, the authors have demonstrated that TFAM's counteractive role in mtDNA mutator mice is tissue-specific. The study does a thorough job of assessing the impact of modulating TFAM levels in a polg mutator mouse model of aging. The authors have thoroughly addressed all the points raised during the first round of review.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Kremer et al. characterizes the tissue-specific responses to changes in TFAM levels and mtDNA copy number in prematurely aging mice (polg mutator model). The authors find that overexpression of TFAM can have beneficial or detrimental effects depending on the tissue type. For instance, increased TFAM levels increase mtDNA copy number in the spleen and improve spleen homeostasis but do not elevate mtDNA copy number in the liver and impair mtDNA expression. Similarly, the consequences of reduced TFAM expression are tissue-specific. Reduced TFAM levels improve brown adipocyte tissue function while other tissues are unaffected. The authors conclude that these tissue-specific responses to altered TFAM levels demonstrate that there are tissue-specific endogenous compensatory mechanisms in response to the continuous mutagenesis produced in the prematurely aging mice model, including upregulation of TFAM expression, elevated mtDNA copy number, and altered mtDNA gene expression. Thus, the impact of genetically manipulating global TFAM expression is limited and there must be other determinants of mtDNA copy number under pathological conditions beyond TFAM.

      Strengths:

      Overall, this is an interesting study. It does a good job of demonstrating that given the multi-functional role of TFAM, the outcome of manipulating its activity is complex.

      Weaknesses:

      No major weaknesses noted. The authors have adopted all our suggestions to improve the clarity of the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Kremer et al. investigates the impact of modulation of expression of TFAM, a key protein involved in mitochondrial DNA (mtDNA) packaging and expression, in mtDNA mutator mice, which carry random mtDNA mutations. While previous research suggested that increasing TFAM could counteract the pathological effects of mtDNA mutations, this study reveals that the effects of TFAM modulation are tissue-specific. These findings highlight the complexity of mtDNA copy number regulation and gene expression, emphasizing that TFAM alone is not the sole determinant of mtDNA levels in contexts where oxidative phosphorylation is impaired. Other factors likely play a significant role, underscoring the need for nuanced approaches when targeting TFAM for therapeutic interventions.

      Strengths:

      The data presented in the manuscript are of high quality and support the major conclusions.

      Comments on revisions:

      The authors have thoroughly addressed all the points raised during the first round of review. Their revisions effectively clarify key aspects of the manuscript, and the additional data and explanations have significantly improved the overall quality of the work. I believe the manuscript is now well-prepared for publication.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This manuscript by Kremer et al. characterizes the tissue-specific responses to changes in TFAM levels and mtDNA copy number in prematurely aging mice (polg mutator model). The authors find that overexpression of TFAM can have beneficial or detrimental effects depending on the tissue type. For instance, increased TFAM levels increase mtDNA copy number in the spleen and improve spleen homeostasis but do not elevate mtDNA copy number in the liver and impair mtDNA expression.

      Similarly, the consequences of reduced TFAM expression are tissue-specific. Reduced TFAM levels improve brown adipocyte tissue function while other tissues are unaffected. The authors conclude that these tissue-specific responses to altered TFAM levels demonstrate that there are tissue-specific endogenous compensatory mechanisms in response to the continuous mutagenesis produced in the prematurely aging mice model, including upregulation of TFAM expression, elevated mtDNA copy number, and altered mtDNA gene expression. Thus, the impact of genetically manipulating global TFAM expression is limited and there must be other determinants of mtDNA copy number under pathological conditions beyond TFAM. 

      Strengths: 

      Overall, this is an interesting study. It does a good job of demonstrating that given the multi-functional role of TFAM, the outcome of manipulating its activity is complex. 

      Weaknesses: 

      No major weaknesses were noted. We have minor suggestions for improving the clarity of the manuscript that are detailed in the "recommendations for the authors" section. 

      We thank the reviewer for the suggestions and addressed them as described in the "recommendations for the authors" section.

      Reviewer #2 (Public review): 

      Summary: 

      This study by Kremer et al. investigates the impact of modulation of expression of TFAM, a key protein involved in mitochondrial DNA (mtDNA) packaging and expression, in mtDNA mutator mice, which carry random mtDNA mutations. While previous research suggested that increasing TFAM could counteract the pathological effects of mtDNA mutations, this study reveals that the effects of TFAM modulation are tissue-specific. These findings highlight the complexity of mtDNA copy number regulation and gene expression, emphasizing that TFAM alone is not the sole determinant of mtDNA levels in contexts where oxidative phosphorylation is impaired. Other factors likely play a significant role, underscoring the need for nuanced approaches when targeting TFAM for therapeutic interventions. 

      Strengths: 

      The data presented in the manuscript is of high quality and supports major conclusions. 

      Weaknesses: 

      The statistical methods used are not clearly described, and some marked nonsignificant results appear visually significant, which raises concerns about data analysis. 

      Data presentation requires improvement. 

      We thank the reviewer for the comments. We updated the text in the Materials and Methods section to state the statistical methods and improved the figures as described in detail in the "recommendations for the authors" section.

      Recommendations for the authors:

      (1) Please include testis data in Figure 2 given previous work by authors showing that elevated mtDNA copy number can improve testis function. It would be interesting to compare the changes in mtDNA copy number in testis to these other tissues.

      We measured mtDNA copy number in testis using the CytB probe and added it as Supplementary figure 2 A.

      (2) The clarity of Table 1 could be improved. It is difficult to know whether the changes in the TFAM to mtDNA ratio are driven by changes in TFAM levels or mtDNA copy number. A suggestion is to include the TFAM and mtDNA values in parenthesis next to each listed ratio.

      We updated Table 1 and included the values of the normalized TFAM and mtDNA levels in parentheses.

      (3) The authors should consider showing TFAM western blot data in Figure 1.

      We thank the reviewer for the suggestion but would like to keep the TFAM western blot data with the other western blot data for the respective tissue.

      (4) The graphs for qPCR data (e.g. Figure 2) show mRNA or mtDNA levels relative to the control, which is always set to 1. Why, then, does the control group display error bars?

      For the normalization of the data to the WT group, we first calculate the average of the values from all the samples of the WT group. We then divide all values from the samples of all groups, including the WT group, by that average value. By doing so, we set the average value of the WT group to 1 and express all values from all samples of all groups, including the WT group, relative to this average value. Differences between the samples of the WT group are hence retained and allow for error calculations and the display of error bars.  

      (5) Page 3 second sentence to the last: overexpression of TFAM leads to...? Did the author mean mtDNA?

      We updated the text to “Heterozygous knockout of Tfam in wild-type mice results in ~50% decrease of mtDNA levels, whereas moderate overexpression of Tfam leads to ~50% increase in mtDNA levels25,26”

      (6) The sentence "In summary, mtDNA copy number regulation is more complex than previously assumed and the TFAM-to-mtDNA ratio seems to be finely tuned in a tissue-specific manner" - not clear who assumed (references?) and based on what data, please rephrase.

      We updated the text and it now reads “In summary, mtDNA copy number regulation is more complex than suggested by previous studies23–27 and the TFAM-to-mtDNA ratio seems to be finely tuned in a tissue-specific manner.”

      (7) The significant increase in complex II activity under TFAM overexpression (Figure 3) warrants additional discussion.

      We updated the Results section and it now reads “We detected increased levels of the complex II subunit Succinate Dehydrogenase Complex Iron Sulfur Subunit B (SDHB). Complex II is exclusively nuclear encoded and a compensatory increase upon impaired mitochondrial gene expresson has been observed before32.

      We proceeded to measure the enzyme activities of individual OXPHOS complexes in liver mitochondria (Fig. 3C). The complex I and complex IV activities were reduced to about 50% in Polg-/mut; Tfam+/+ mice in comparison with wild-type mice (Fig. 3C). However, we did not see any further alteration of the reduced enzyme activities induced by TFAM overexpression or reduced TFAM expression (Fig. 3C). Interestingly, we detected a significant increase in complex II and complex II + complex III activity upon TFAM overexpression, which can partially be explained by the increased complex II protein levels we oberseved in Polg-/mut; Tfam+/OE mice (Fig. 3, B and C).”

      (8) The statistical methods used should be explicitly stated. Some results marked as non-significant appear visually significant, for example, mt-Cytb in Figure 2C, Supplementary Figure 2B).

      We updated the text in the Materials and Methods section to state the statistical methods and it now reads “Statistical analysis and generation of graphs were performed with GraphPad Prism v9 software except for quantitative mass spectrometry data which was analyzed and plotted using R as described above. Statistical comparisons were performed using one-way analysis of variance (ANOVA), and post hoc analysis was conducted with Dunnett’s multiple comparisons test. Values of P < 0.05 were considered statistically significant.”

      Minor points: 

      (1) Replace numerical indications of significance with asterisks for consistency.

      We replaced all numerical indications of significance with asterisks.

      (2) Abbreviations SKM and BAT are not defined.

      We removed the mentioning of SKM (skeletal muscle) as the data from this tissue was not included. The Introduction reads “In contrast, in brown adipose tissue (BAT), a decrease in TFAM levels normalized Uncoupling protein 1 (Ucp1) expression.”

      (3) Use uniform scales across bar graphs in Figure 2 to improve clarity.

      We updated Figure 2 to have uniform scales.

      (4) Remove or increase the transparency of data points in Figure 1A to make group averages more discernible.

      We removed the data points in Figure 1A.

      (5) Add a Y-axis title to Figure 1C.

      We added the Y-axis title “Heart / body weight” to Figure 1C.

      (6) Size of the font used in some figures (4?) is not appropriate.

      We increased the font size for the figures.

      (7) All figure legend titles need work. Insert "expression" after TFAM in the Figure 2 title, Change the title to "Modulation of TFAM expression..." in Figure 4. 

      The figure legends now read as follows:

      “Figure 2: Modulation of TFAM expression affects mtDNA copy number in a tissue-specific manner.”

      “Figure 4: Alteration of TFAM expression does not affect the heart phenotype of mtDNA mutator mice.”

    1. eLife Assessment

      This important paper describes the regulatory pathway of rRNA synthesis by Meioc-Piwil1 in germ cell differentiation in zebrafish. Using the molecular genetic and cytological approaches, the authors provide convincing evidence that Meioc antagonizes Piwil1, which downregulates the 45S pre-rRNA synthesis by heterochromatin formation for spermatocyte differentiation. The results will be of use to researchers in the field of germ cell/meiosis as well as RNA biosynthesis and chromatin.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper Kawasaki et al describe a regulatory role for the PIWI/piRNA pathway in rRNA regulation in Zebrafish. This regulatory role was uncovered through a screen for gonadogenesis defective mutants, which identified a mutation in the meioc gene, a coiled-coil germ granule protein. Loss of this gene leads to redistribution of Piwil1 from germ granules to the nucleolus, resulting in silencing of rRNA transcription.

      Strengths:

      Most of the experimental data provided in this paper is compelling. It is clear that in the absence of meioc, PiwiL1 translocates in to the nucleolus and results in down regulation of rRNA transcription. the genetic compensation of meioc mutant phenotypes (both organismal and molecular) through reduction in PiwiL1 levels are evidence for a direct role for PiwiL1 in mediating the phenotypes of meioc mutant.

      Weaknesses:

      Questions remain on the mechanistic details by which PiwiL1 mediated rRNA down regulation, and whether this is a function of Piwi in an unperturbed/wildtype setting. There is certainly some evidence provided in support of the a natural function for piwi in regulating rRNA transcription (figure 5A+5B). However, the de-enrichment of H3K9me3 in the heterozygous (Figure 6F) is very modest and in my opinion not convincingly different relative to the control provided. It is certainly possible that PiwiL1 is regulating levels through cleavage of nascent transcripts. Another aspect I found confounding here is the reduction in rRNA small RNAs in the meioc mutant; I would have assumed that the interaction of PiwiL1 with the rRNA is mediated through small RNAs but the reduction in numbers do not support this model. But perhaps it is simply a redistribution of small RNAs that is occurring. Finally, the ability to reduce PiwiL1 in the nucleolus through polI inhibition with actD and BMH-21 is surprising. What drives the accumulation of PiwiL1 in the nucleolus then if in the meioc mutant there is less transcription anyway?

      Despite the weaknesses outlined, overall I find this paper to be solid and valuable, providing evidence for a consistent link between PIWI systems and ribosomal biogenesis. Their results are likely to be of interest to people in the community, and provide tools for further elucidating the reasons for this link.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors report that Meioc is required to upregulate rRNA transcription and promote differentiation of spermatogonial stem cells in zebrafish. The authors show that upregulated protein synthesis is required to support spermatogonial stem cells' differentiation into multi-celled cysts of spermatogonia. Coiled coil protein Meioc is required for this upregulated protein synthesis and for increasing rRNA transcription, such that the Meioc knockout accumulates 1-2 cell spermatogonia and fails to produce cysts with more than 8 spermatogonia. The Meioc knockout exhibits continued transcriptional repression of rDNA. Meioc interacts with and sequesters Piwil1 to the cytoplasm. Loss of Meioc increases Piwil1 localization to the nucleolus, where Piwil1 interacts with transcriptional silencers that repress rRNA transcription.

      Strengths:

      This is fundamental study that expands our understanding of how ribosome biogenesis contributes to differentiation and demonstrates that zebrafish Meioc plays a role in this process during spermatogenesis. This work also expands our evolutionary understanding of Meioc and Ythdc2's molecular roles in germline differentiation. In mouse, the Meioc knockout phenocopies the Ythdc2 knockout, and studies thus far have indicated that Meioc and Ythdc2 act together to regulate germline differentiation. Here, in zebrafish, Meioc has acquired a Ythdc2-independent function. This study also identifies a new role for Piwil1 in directing transcriptional silencing of rDNA.

      Comments on revisions:

      Major and minor concerns were addressed in the revision.

    4. Reviewer #3 (Public review):

      Summary:

      The paper describes the molecular pathway to regulate germ cell differentiation in zebrafish through ribosomal RNA biogenesis. Meioc sequesters Piwil1, a Piwi homolog, which suppresses the transcription of the 45S pre-rDNA by the formation of heterochromatin, to the perinuclear bodies.

      Strong points:

      The authors nicely provided the molecular evidence on the antagonism of Meioc to Piwil1 in the rRNA synthesis, which supported by the genetic evidence that the inability of the meioc mutant to enter meiosis is suppressed by the piwil1 heterozygosity. The authors nicely address my previous points.

      Weak points:

      Although the authors made an effort to revise the text. However, there are still some points that the authors need to check their text. Some of them are shown in "Minor points" below. I am sorry that some of them should have been pointed in my previous review.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Kawasaki et al describe a regulatory role for the PIWI/piRNA pathway in rRNA regulation in Zebrafish. This regulatory role was uncovered through a screen for gonadogenesis defective mutants, which identified a mutation in the meioc gene, a coiled-coil germ granule protein. Loss of this gene leads to redistribution of Piwil1 from germ granules to the nucleolus, resulting in silencing of rRNA transcription.

      Strengths:

      Most of the experimental data provided in this paper is compelling. It is clear that in the absence of meioc, PiwiL1 translocates in to the nucleolus and results in down regulation of rRNA transcription. the genetic compensation of meioc mutant phenotypes (both organismal and molecular) through reduction in PiwiL1 levels are evidence for a direct role for PiwiL1 in mediating the phenotypes of meioc mutant.

      Weaknesses:

      Questions remain on the mechanistic details by which PiwiL1 mediated rRNA down regulation, and whether this is a function of Piwi in an unperturbed/wildtype setting. There is certainly some evidence provided in support of the natural function for piwi in regulating rRNA transcription (figure 5A+5B). However, the de-enrichment of H3K9me3 in the heterozygous (Figure 6F) is very modest and in my opinion not convincingly different relative to the control provided. It is certainly possible that PiwiL1 is regulating levels through cleavage of nascent transcripts. Another aspect I found confounding here is the reduction in rRNA small RNAs in the meioc mutant; I would have assumed that the interaction of PiwiL1 with the rRNA is mediated through small RNAs but the reduction in numbers do not support this model. But perhaps it is simply a redistribution of small RNAs that is occurring. Finally, the ability to reduce PiwiL1 in the nucleolus through polI inhibition with actD and BMH-21 is surprising. What drives the accumulation of PiwiL1 in the nucleolus then if in the meioc mutant there is less transcription anyway?

      Despite the weaknesses outlined, overall I find this paper to be solid and valuable, providing evidence for a consistent link between PIWI systems and ribosomal biogenesis. Their results are likely to be of interest to people in the community, and provide tools for further elucidating the reasons for this link.

      The amount of cytoplasmic rRNA in piwi+/- was increased by 26% on average (figure 5A+5B), the amount of ChiP-qPCR of H3K9 was decreased by about 26% (Figure 6F), and ChiP-qPCR of Piwil1 was decreased by 35% (Figure 6G), so we don't think there is a big discrepancy. On the other hand, the amount of ChiP-qPCR of H3K9 in meioc<sup>mo/mo</sup> was increased by about 130% (Figure 6F), while ChiP-qPCR of Piwil1 was increased by 50%, so there may be a mechanism for H3K9 regulation of Meioc that is not mediated by Piwil1. As for what drives the accumulation of Piwil1 in the nucleolus, although we have found that Piwil1 has affinity for rRNA (Fig. 6A), we do not know what recruits it. Significant increases in the 18-35nt small RNA of 18S, 28S rRNAs and R2 were not detected in meioc<sup>mo/mo</sup> testes enriched for 1-8 cell spermatogonia, compared with meioc<sup>+/mo</sup> testes. The nucleolar localization of Piwil1 has revealed in this study, which will be a new topic for future research.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors report that Meioc is required to upregulate rRNA transcription and promote differentiation of spermatogonial stem cells in zebrafish. The authors show that upregulated protein synthesis is required to support spermatogonial stem cells' differentiation into multi-celled cysts of spermatogonia. Coiled coil protein Meioc is required for this upregulated protein synthesis and for increasing rRNA transcription, such that the Meioc knockout accumulates 1-2 cell spermatogonia and fails to produce cysts with more than 8 spermatogonia. The Meioc knockout exhibits continued transcriptional repression of rDNA. Meioc interacts with and sequesters Piwil1 to the cytoplasm. Loss of Meioc increases Piwil1 localization to the nucleolus, where Piwil1 interacts with transcriptional silencers that repress rRNA transcription.

      Strengths:

      This is a fundamental study that expands our understanding of how ribosome biogenesis contributes to differentiation and demonstrates that zebrafish Meioc plays a role in this process during spermatogenesis. This work also expands our evolutionary understanding of Meioc and Ythdc2's molecular roles in germline differentiation. In mouse, the Meioc knockout phenocopies the Ythdc2 knockout, and studies thus far have indicated that Meioc and Ythdc2 act together to regulate germline differentiation. Here, in zebrafish, Meioc has acquired a Ythdc2-independent function. This study also identifies a new role for Piwil1 in directing transcriptional silencing of rDNA.

      Weaknesses:

      There are limited details on the stem cell-enriched hyperplastic testes used as a tool for mass spec experiments, and additional information is needed to fully evaluate the mass spec results. What mutation do these testes carry? Does this protein interact with Meioc in the wildtype testes? How could this mutation affect the results from the Meioc immunoprecipitation?

      Stem cell-enriched hyperplastic testes came from wild-type adult sox17::GFP transgenic zebrafish. Sperm were found in these hyperplastic testes, and when stem cells were transplanted, they self-renewed and differentiated into sperm. It is not known if the hyperplasias develop due to a genetic variant in the line. We added the following comment in L201-204.

      “The SSC-enriched hyperplastic testes, which are occasionally found in adult wildtype zebrafish, contain cells at all stages of spermatogenesis. Hyperplasia-derived SSCs self-renewed and differentiated in transplants of aggregates mixed with normal testicular cells.”

      Reviewer #3 (Public review):

      Summary:

      The paper describes the molecular pathway to regulate germ cell differentiation in zebrafish through ribosomal RNA biogenesis. Meioc sequesters Piwil1, a Piwi homolog, which suppresses the transcription of the 45S pre-rDNA by the formation of heterochromatin, to the perinuclear bodies. The key results are solid and useful to researchers in the field of germ cell/meiosis as well as RNA biosynthesis and chromatin.

      Strengths:

      The authors nicely provided the molecular evidence on the antagonism of Meioc to Piwil1 in the rRNA synthesis, which supported by the genetic evidence that the inability of the meioc mutant to enter meiosis is suppressed by the piwil1 heterozygosity.

      Weaknesses:

      (1) Although the paper provides very convincing evidence for the authors' claim, the scientific contents are poorly written and incorrectly described. As a result, it is hard to read the text. Checking by scientific experts would be highly recommended. For example, on line 38, "the global translation activity is generally [inhibited]", is incorrect and, rather, a sentence like "the activity is lowered relative to other cells" is more appropriate here. See minor points for more examples.

      Thank you for pointing that out. I corrected the parts pointed out.

      (2) In some figures, it is hard for readers outside of zebrafish meiosis to evaluate the results without more explanation and drawing.

      We refined Figure 1A and added explanation about SSC, sox17::egfp positive cells, and the SSC-enriched hyperplastic testis in L155-158.

      (3) Figure 1E, F, cycloheximide experiments: Please mention the toxicity of the concentration of the drug in cell proliferation and viability.

      When testicular tissue culture was performed at 0.1, 1, 10, 100, 250, and 500mM, abnormal strong OP-puro signals including nuclei were found in cells at 10mM or more. We added the results in the Supplemental Figure S2G. In addition, at 1mM, growth was perturbed in fast-growing 32≤-cell cysts of spermatogonia, but not in 1-4-cell spermatogonia, as described in L127-130.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I don't have any recommendations for improvement. While I have outlined some of the weaknesses of the paper above. I don't see addressing these questions as pertinent for publication of this paper.

      Reviewer #2 (Recommendations for the authors):

      (1) The manuscript uses the terms 1-2 cell spermatogonia, GSC, and SSC throughout the figures and text. For example, 1-2 cell spermatogonia is used in Figure 1C, GSC is used in Figure 1F, and SSC is used in Figure 1 legend. The use of all three terms without definitions as to how they each relate with one another is confusing, particularly to those outside the zebrafish spermatogenesis field. It would be best to only use one term if the three terms are used interchangeably or to define each term if they represent different populations.

      GSC is a writing mistake. In this study, sox17-positive cells, which have been confirmed to self-renew and differentiate (Kawasaki et al., 2016), are considered SSCs. On the other hand, a comparison of meioc and ythdc2 mutants revealed differences in the composition of each cyst, so we describe the number of cysts confirmed. We added new data that 1-2 cell spermatogonia are sox17-positive in Supplemental Figure S3 (L157-158).

      (2) Figure 1B: What does the "SC" label represent in these figure panels?

      We added the explanation in the Figure legend.

      (3) Fig 7B and S7B show incongruent results, and the text implies that Fig S7B data better reflects in vivo biology. It is not clear how the authors interpret the different results between 7B and S7B.

      Thank you for pointing that out. Fig 7A and 7B were obtained by isolating sox17-positive cells. Because it was difficult to detect nucleoli in the isolated cells, probably due to the isolation procedure, we added S7B, which was analyzed in sectioned tissues. As this reviewer pointed out, S7B reflects the in vivo state better, so we changed S7B to 7B and 7B to S7B.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1) For general readers, it is nice to add a scheme of zebrafish spermatogenesis (lines 77-78) together with Figure 1A.

      As mentioned above, we refined Figure 1A.

      (2) Line 28, silence: the word "silence" is too strong here since rDNA is transcribed in some levels to ensure the cell survival.

      Thank you for your comment. We changed "silence" to "maintain low levels."

      (3) Line 60, YTDHC2: Please explain more about what protein YTDHC2 is.

      We added a description of Ythdc2 in the introduction.

      (4) Line 69, Piwil1: Please explain more about what protein Piwil1 is.

      We added a description of Piwil1 in the introduction.

      (5) Figure 1B, sperm: Please show clearly which sperms are in this figure using arrows etc.

      We represented sperm using arrowheads in Fig 1B.

      (6) Figure 1C, SC: Please show what SC is in the legend.

      We added the explanation in the Figure legend.

      (7) Line 83, meiotic makers: should be "meiotic prophase I makers".

      Thank you for pointing out the inaccurate expression description. We revised it.

      (8) Line 84, phosphor-histone H3: Should be "histone H3 phospho-S10 "

      We revised it.

      (9) Figure S1A, PH3: Please add PH3 is "histone H3 phospho-S10 ".

      We revised it.

      (10) Figure S1A, moto+/-: this heterozygous mutant showed an increased apoptosis. If so, please mention this in the text. If not, please remove the data.

      Thank you for pointing that out. The heterozygous mutant did not increase apoptosis, so we removed the data.

      (11) Line 88, no females developed: This means all males in the mutant. If so, what Figure S1B shows? These cells are spermatocytes? No "oocytes" developed is correct here?

      All meioc<sup>mo/mo</sup> zebrafish were males, and the meioc<sup>mo/mo</sup> cells in Fig. S1B are spermatogonia. No spermatocytes or oocytes were observed. To show this, we added "no oocytes" in L90.

      (12) Line 89, initial stages: What do the initial stages mean here? Please explain.

      The “initial stages” was changed to the pachytene stage.

      (13) Figure S1C: mouse Meioc rectangle lacks a right portion of it. Please explain two mutations encode a truncated protein in the main text.

      I apologize. It seems that the portion was missing during the preparation of the manuscript. We corrected it. In addition, we added a description of the protein truncation in L100-101.

      (14) Line 99: What "GRCz11" is.

      GRCz11 refers to the version of the zebrafish reference genome assembly. We added this.

      (15) Figure S2A: Dotted lines are cysts. If so, please mention it in the legend.

      We corrected the figure legend.

      (16) Figure S2B and C:, B1-4, C1-7: Rather use spermatogonia etc as a caption here.

      We corrected the figure and figure legend.

      (17) Line 113, hereafter, wildtype: Should be "wild type" or "wild-type".

      We corrected them.

      (18) Figure 1C: Please indicate what dotted lines mean here.

      We added “Dotted lines; 1-2 cell spermatogonia.”

      (19) Line 113, de novo: Please italicize it.

      We corrected it.

      (20) Line 113-116: Figure 1D shows two populations in the protein synthesis (low and high) in the 1-2-cell stage. Please mention this in the text.

      We added mention of two population.

      (21) Line 121, in vitro: Please italicize it.

      We corrected it.

      (22) Line 138-139, Figure 2A: Please indicate two populations in the rRNA concentrations (low and high) in the 1-2-cell stage. How much % of each cell is?

      We added mention of two population and % of each cell.

      (23) Figure 2B, cytes: Please explain the rRNA expression in spermatocytes (cytes) in the text.

      The decrease in rRNA signal intensity in spermatocytes was added.

      (24) Figure 2A, lines 147, low signals: Figure 2A did not show big differences between wild type and the mutant. What did the authors mean here? Lower levels of rRNAs in the mutant than in wild type. If so, please write the text in that way.

      We think that it is important to note that we were unable to find cells with upregulated rRNA signals, and therefore changed to “could not find cells with high signals of rRNAs and Rpl15 in meioc<sup>mo/mo</sup> spermatogonia”.

      (25) Figure 2E: Please add a schematic figure of a copy of rDNA locus such as Fig. S3A right.

      We added a schema of rDNA locus and primer sites such as Figure S3A right (now Figure 2F) in Figure 2E.

      (26) Figure S3A: This Figure should be in the main Figure. The quantification of Northern blots should be shown as a graph with statistical analysis.

      We added the quantification and transfer to the main Figure (Figure 2F).

      (27) Figure 4A: Please show single-color images (red or green) with merged ones.

      We added single-color images in the Figure 4A.

      (28) Line 198, Piwil1: Please explain what Piwil1 is briefly.

      We are sorry, but we could not quite understand the meaning of this comment. To show that Piwil1 is located in the nucleolus, we indicated it as (Figure 4A, arrowhead) in L209.

      (29) Line 198, Ddx4-positive: What is "Ddx4-positive"? Explain it for readers.

      Ddx4 is a marker for germinal granules, and the description was changed to reflect this.

      (30) Line 209, Fig. S4D-G: Please mention the method of the detection of piRNA briefly.

      We have described that we have sequenced small RNAs of 18-35 nt. Accordingly, we changed the term piRNA to small RNA.

      (31) Line 217: Please mention piwil1 homozygous mutant are inviable.

      We added that piwil1-/- are viable in L231.

    1. eLife Assessment

      This study provides new insights into the expression profile of ILCs that demonstrate a history of RAG expression. It examines in part the potential intrinsic regulation of RAG expression and seeks to understand how the epigenetic state of ILCs is established, although a full understanding of intrinsic factors is only partially supported. The work provides a convincing and important molecular dataset, and strengthens our understanding of intrinsic regulation, and would be of interest more broadly to cell biologists seeking to understand immune cell development.

    2. Reviewer #1 (Public Review):

      The study starts with the notion that in an AD-like disease model, ILC2s in the Rag1 knock-out were expanded and contained relatively more IL-5+ and IL-13+ ILC2s. This was confirmed in the Rag2 knock-out mouse model.

      By using a chimeric mouse model in which wild-type knock-out splenocytes were injected into irradiated Rag1 knock-out mice, it was shown that even though the adaptive lymphocyte compartment was restored, there were increased AD-like symptoms and increased ILC2 expansion and activity. Moreover, in the reverse chimeric model, i.e. injecting a mix of wild-type and Rag1 knock-out splenocytes into irradiated wild-type animals, it was shown that the Rag1 knock-out ILC2s expanded more and were more active. Therefore, the authors could conclude that the RAG1 mediated effects were ILC2 cell-intrinsic.

      Subsequent fate-mapping experiments using the Rag1Cre;reporter mouse model showed that there were indeed RAGnaïve and RAGexp ILC2 populations within naïve mice. Lastly, the authors performed multi-omic profiling, using single-cell RNA sequencing and ATAC-sequencing, in which a specific gene expression profile was associated with ILC2. These included well-known genes but the authors notably also found expression of Ccl1 and Ccr8 within the ILC2. The authors confirmed their earlier observations that in the RAGexp ILC2 population, the Th2 regulome was more suppressed, i.e. more closed, compared to the RAGnaïve population, indicative of the suppressive function of RAG on ILC2 activity. I do agree with the authors' notion that the main weakness was that this study lacks the mechanism by which RAG regulates these changes in ILC2s.

      The manuscript is very well written and easy to follow, and the compelling conclusions are well supported by the data. The experiments are meticulously designed and presented. I wish to commend the authors for the study's quality.

    3. Reviewer #2 (Public Review):

      Summary:

      The study by Ver Heul et al., investigates the consequences of RAG expression for type 2 innate lymphoid cell (ILC2) function. RAG expression is essential for the generation of the receptors expressed by B and T cells and their subsequent development. Innate lymphocytes, which arise from the same initial progenitor populations, are in part defined by their ability to develop in the absence of RAG expression. However, it has been described in multiple studies that a significant proportion of innate lymphocytes show a history of Rag expression. In compelling studies several years ago, members of this research team revealed that early Rag expression during the development of Natural Killer cells (Karo et al., Cell 2014), the first described innate lymphocyte, had functional consequences.

      Here, the authors revisit this topic, a worthwhile endeavour given the broad history of Rag expression within all ILCs and the common use of RAG-deficient mice to specifically assess ILC function. Focusing on ILC2s and utilising state-of-the-art approaches, the authors sought to understand whether early expression of Rag during ILC2 development had consequences for activity, fitness, or function. Having identified cell-intrinsic effects in vivo, the authors investigated the causes of this, identifying epigenetic changes associated with the accessibility genes associated with core ILC2 functions.

      The manuscript is well written and does an excellent job of supporting the reader through reasonably complex transcriptional and epigenetic analyses, with considerate use of explanatory diagrams. Overall I think that the conclusions are fair, the topic is thought-provoking, and the research is likely of broad immunological interest. I think that the extent of functional data and mechanistic insight is appropriate.

      Strengths:

      - The logical and stepwise use of mouse models to first demonstrate the impact on ILC2 function in vivo and a cell-intrinsic role. Initial analyses show enhanced cytokine production by ILC2 from RAG-deficient mice. Then through two different chimeric mice (including BM chimeras), the authors convincingly show this is cell intrinsic and not simply as a result of lymphopenia. This is important given other studies implicating enhanced ILC function in RAG-/- mice reflect altered competition for resources (e.g. cytokines).

      - Use of Rag expression fate mapping to support analyses of how cells were impacted - this enables a robust platform supporting subsequent analyses of the consequences of Rag expression for ILC2.

      - Use of snRNA-seq supports gene expression and chromatin accessibility studies - these reveal clear differences in the data sets consistent with altered ILC2 function.

      - Convincing evidence of epigenetic changes associated with loci strongly linked to ILC2 function. This forms a detailed analysis that potentially helps explain some of the altered ILC2 functions observed in ex vivo stimulation assays.

      - Provision of a wealth of expression data and bioinformatics analyses that can serve as valuable resources to the field.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The study starts with the notion that in an AD-like disease model, ILC2s in the Rag1 knockout were expanded and contained relatively more IL-5<sup>+</sup> and IL-13<sup>+</sup> ILC2s. This was confirmed in the Rag2 knock-out mouse model.

      By using a chimeric mouse model in which wild-type knock-out splenocytes were injected into irradiated Rag1 knock-out mice, it was shown that even though the adaptive lymphocyte compartment was restored, there were increased AD-like symptoms and increased ILC2 expansion and activity. Moreover, in the reverse chimeric model, i.e. injecting a mix of wild-type and Rag1 knock-out splenocytes into irradiated wild-type animals, it was shown that the Rag1 knock-out ILC2s expanded more and were more active. Therefore, the authors could conclude that the RAG1 mediated effects were ILC2 cell-intrinsic.

      Subsequent fate-mapping experiments using the Rag1Cre;reporter mouse model showed that there were indeed RAGnaïve and RAGexp ILC2 populations within naïve mice. Lastly, the authors performed multi-omic profiling, using single-cell RNA sequencing and ATACsequencing, in which a specific gene expression profile was associated with ILC2. These included well-known genes but the authors notably also found expression of Ccl1 and Ccr8 within the ILC2. The authors confirmed their earlier observations that in the RAGexp ILC2 population, the Th2 regulome was more suppressed, i.e. more closed, compared to the RAGnaïve population, indicative of the suppressive function of RAG on ILC2 activity. I do agree with the authors' notion that the main weakness was that this study lacks the mechanism by which RAG regulates these changes in ILC2s.

      The manuscript is very well written and easy to follow, and the compelling conclusions are well supported by the data. The experiments are meticulously designed and presented. I wish to commend the authors for the study's quality.

      Even though the study is compelling and well supported by the presented data, some additional context could increase the significance:

      (1) The presence of the RAGnaïve and RAGexp ILC2 populations raises some questions on the (different?) origin of these populations. It is known that there are different waves of ILC2 origin (most notably shown in the Schneider et al Immunity 2019 publication, PMID 31128962). I believe it would be very interesting to further discuss or possibly show if there are different origins for these two ILC populations.

      Several publications describe the presence and origin of ILC2s in/from the thymus (PMIDs 33432227 24155745). Could the authors discuss whether there might be a common origin for the RAGexp ILC2 and Th2 cells from a thymic lineage? If true that the two populations would be derived from different populations, e.g. being the embryonic (possibly RAGnaïve) vs. adult bone marrow/thymus (possibly RAGexp), this would show a unique functional difference between the embryonic derived ILC2 vs. adult ILC2.

      We agree with the Reviewer that our findings raise important questions about ILC ontogeny. These are areas of ongoing investigation for us, and it is our hope this study may inform further investigation by others as well.

      Regarding the Schneider et al study, we have considered the possibility that RAG expression may mark a particular wave of ILC2 origin. In that study, the authors used a tamoxifen-based inducible Cre strategy in their experiments to precisely time the lineage tracing of a reporter from the Rosa26 locus. Those lineage tracing mice would overlap genetically with the RAG lineage tracing mice we used in our current study, thus performing combined timed migration fate mapping and RAG fate mapping experiments would require creating novel mouse strains.

      Similarly, the possible influence of the thymic or bone marrow environment on RAG expression in ILCs is an exciting possibility. Perhaps there are signals common to those environments that can influence all developing lymphocytes, including not only T and B cells but also ILCs, with one consequence being induction of RAG expression. While assessing levels of RAG-experienced ILCs in these tissues using our lineage tracing mouse may hint at these possibilities, conclusive evidence would require more precise control over the timing of RAG lineage tracing than our current reagents allow (e.g. to control for induction in those environments vs migration of previously fate-mapped cells to those environments).

      To answer these questions directly, we are developing orthogonal lineage tracing mouse strains, which can report on both timing of ILC development and RAG expression, but these mice are not available yet. Given the limitations of our currently available reagents, we were careful to focus our manuscript on the skin phenotype and the more descriptive aspects of the RAG-induced phenotype. We have elaborated on these important questions and referenced all the studies noted by the Reviewer in the Discussion section as areas of future inquiry on lines 421-433.  

      (2) On line 104 & Figures 1C/G etc. the authors describe that in the RAG knock-out ILC2 are relatively more abundant in the lineage negative fraction. On line 108 they further briefly mentioned that this observation is an indication of enhanced ILC2 expansion. Since the study includes an extensive multi-omics analysis, could the authors discuss whether they have seen a correlation of RAG expression in ILC2 with regulation of genes associated with proliferation, which could explain this phenomenon?

      We thank the Reviewer for pointing out this opportunity to further correlate our functional and multiomic findings. To address this, we first looked deeper into our prior analyses and found that among the pathways enriched in GSEA analysis of differentially expressed genes (DEGs) between RAG<sup>+</sup> and RAG<sup>-</sup> ILC2s, one of the pathways suppressed in RAG<sup>+</sup> ILC2s was “GOBP_EPITHELIAL_CELL_PROLIFERATION.”

      ( Author response image 1). There are a few other gene sets present in other databases such as MSigDB with terms including “proliferation,” but these are often highly specific to a particular cell type and experimental or disease condition (e.g. tissue-specific cancers). We did not find any of these enriched in our GSEA analysis.

      Author response image 1.

      GSEA plot of GOBP epithelial proliferation pathway in RAG-experienced vs RAG-naïve ILC2s.

      The ability to predict cellular proliferation states from transcriptomic data is an area of active research, and there does not appear to be any universally accepted method to do this reliably. We found two recent studies (PMIDs 34762642; 36201535) that identified novel “proliferation signatures.” Since these gene sets are not present in any curated database, we repeated our GSEA analysis using a customized database with the addition of these gene sets. However, we did not find enrichment of these sets in our RAG+/- ILC2 DEG list. We also applied our GPL strategy integrating analysis of our epigenomic data to the proliferation signature genes, but we did not see any clear trend. Conversely, our GSEA analysis did not identify any enrichment for apoptotic signatures as a potential mechanism by which RAG may suppress ILC2s.

      Notwithstanding the limitations of inferring ILC2 proliferation states from transcriptomic and epigenomic data, our experimental data suggest RAG exerts a suppressive effect on ILC2 proliferation. To formally test the hypothesis that RAG suppresses proliferation in the most rigorous way, we feel new mouse strains are needed that allow simultaneous RAG fate mapping and temporally restricted fate mapping. We elaborate on this in new additions to the discussion on lines 421-433.

      Reviewer #2 (Public Review):

      Summary:

      The study by Ver Heul et al., investigates the consequences of RAG expression for type 2 innate lymphoid cell (ILC2) function. RAG expression is essential for the generation of the receptors expressed by B and T cells and their subsequent development. Innate lymphocytes, which arise from the same initial progenitor populations, are in part defined by their ability to develop in the absence of RAG expression. However, it has been described in multiple studies that a significant proportion of innate lymphocytes show a history of Rag expression. In compelling studies several years ago, members of this research team revealed that early Rag expression during the development of Natural Killer cells (Karo et al., Cell 2014), the first described innate lymphocyte, had functional consequences.

      Here, the authors revisit this topic, a worthwhile endeavour given the broad history of Rag expression within all ILCs and the common use of RAG-deficient mice to specifically assess ILC function. Focusing on ILC2s and utilising state-of-the-art approaches, the authors sought to understand whether early expression of Rag during ILC2 development had consequences for activity, fitness, or function. Having identified cell-intrinsic effects in vivo, the authors investigated the causes of this, identifying epigenetic changes associated with the accessibility genes associated with core ILC2 functions.

      The manuscript is well written and does an excellent job of supporting the reader through reasonably complex transcriptional and epigenetic analyses, with considerate use of explanatory diagrams. Overall I think that the conclusions are fair, the topic is thoughtprovoking, and the research is likely of broad immunological interest. I think that the extent of functional data and mechanistic insight is appropriate.

      Strengths:

      - The logical and stepwise use of mouse models to first demonstrate the impact on ILC2 function in vivo and a cell-intrinsic role. Initial analyses show enhanced cytokine production by ILC2 from RAG-deficient mice. Then through two different chimeric mice (including BM chimeras), the authors convincingly show this is cell intrinsic and not simply as a result of lymphopenia. This is important given other studies implicating enhanced ILC function in RAG-/- mice reflect altered competition for resources (e.g. cytokines).

      - Use of Rag expression fate mapping to support analyses of how cells were impacted - this enables a robust platform supporting subsequent analyses of the consequences of Rag expression for ILC2.

      - Use of snRNA-seq supports gene expression and chromatin accessibility studies - these reveal clear differences in the data sets consistent with altered ILC2 function.

      - Convincing evidence of epigenetic changes associated with loci strongly linked to ILC2 function. This forms a detailed analysis that potentially helps explain some of the altered ILC2 functions observed in ex vivo stimulation assays.

      - Provision of a wealth of expression data and bioinformatics analyses that can serve as valuable resources to the field.

      We appreciate the strengths noted by the Reviewer for our study. We would like to especially highlight the last point about our single cell dataset and provision of supplemental data tables. Although our study is focused on AD-like skin disease and skin draining lymph nodes, we hope that our findings can serve as a valuable resource for future investigation into mechanisms of RAG modulation of ILC2s in other tissues and disease states.  

      Weaknesses:

      - Lack of insight into precisely how early RAG expression mediates its effects, although I think this is beyond the scale of this current manuscript. Really this is the fundamental next question from the data provided here.

      We thank the Reviewer for their recognition of the context of our current work and its future implications. We aimed to present compelling new observations within the scope of what our current data can substantiate. We believe answering the next fundamental question of the mechanisms by which RAG mediates its effects in ILC2s will require development of novel reagents. We are actively pursuing this, and we look forward to others building on our findings as well.

      - The epigenetic analyses provide evidence of differences in the state of chromatin, but there is no data on what may be interacting or binding at these sites, impeding understanding of what this means mechanistically.

      We thank the Reviewer for pointing out this aspect of the epigenomic data analysis and the opportunity to expand the scope of our manuscript. We performed additional analyses of our data to identify DNA binding motifs and infer potential transcription factors that may be driving the effects of a history of RAG expression that we observed. We hope that these additional data, analyses, and interpretation add meaningful insight for our readers.

      We first performed the analysis for the entire dataset and validated that the analysis yielded results consistent with prior studies (e.g. finding EOMES binding motifs as a marker in NK cells). Then, we examined the differences in RAG fate-mapped ILC2s. These analyses are in new Figure S10 and discussed on lines 277-316.  

      We also performed an analysis specifically on the Th2 locus, given the effects of RAG on type 2 cytokine expression. These analyses are in new Figure S12 and discussed on lines 366-378.

      - Focus on ILC2 from skin-draining lymph nodes rather than the principal site of ILC2 activity itself (the skin). This may well reflect the ease at which cells can be isolated from different tissues.

      We appreciate the Reviewer’s insight into the limitations of our study. Difficulties in isolating ILC2s from the skin were indeed a constraint in our study. In particular, we were unable to isolate enough ILC2s from the skin for stimulation and cytokine staining. Given that one of our main hypotheses was that RAG affects ILC2 function, we focused our studies on skin draining lymph nodes, which allowed measurement of the two main ILC2 functional cytokines, IL-5 and IL-13, as readouts in the key steady state and AD-like disease experiments.

      - Comparison with ILC2 from other sites would have helped to substantiate findings and compensate for the reliance on data on ILC2 from skin-draining lymph nodes, which are not usually assessed amongst ILC2 populations.

      We agree with the Reviewer that a broader survey of the RAG-mediated phenotype in other tissues and by extension other disease models would strengthen the generalizability of our observations. Indeed, we did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s ( Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and -donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated ( Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant ( Author response image 2B,D,F,H,J).

      Notwithstanding these results, given that we unexpectedly observed enhanced AD-like inflammation in the MC903 model in Rag1 KO mice, we concentrated our later experiments and analyses on defining the differences in skin draining ILC2s modulated by RAG. Our subsequent findings in the skin provoke many new hypotheses about the role of RAG in ILC2s in other tissues, and our tissue survey in the BM chimera provides additional rationale to pursue similar studies in disease models in other tissues. While this is an emerging area of investigation in our lab, we opted to focus this manuscript on our findings related to the AD-like disease model. We have ongoing studies to investigate other tissues, and we are still in the early stages of developing disease models to expand on these findings. However, if the reviewer feels strongly this additional data should be included in the manuscript, we are happy to add it. Considering the complexity of the data and concepts in the manuscript, we hoped to keep it focused to where we have strong molecular, cellular, and phenotypic outcomes.

      Author response image 2.

      Comparison of immune reconstitution in and ILC2 donor proportions in different tissues from BM chimeras. Equal quantities of bone marrow cells from Rag1<sup>-/-</sup> (CD45.2,CD90.2) and WT (CD45.2, CD90.1) C57Bl/6J donor mice were used to reconstitute the immune systems of irradiated recipient WT (CD45.1) C57Bl/6J mice. The proportion of live cells that are donor-derived (CD45.2), host-derived (CD45.1), or parenchymal (CD45-) [above] and proportion of ILC2s that are from Rag1<sup>-/-</sup> (CD90.2) or WT (CD90.1) donors [below] for A,B) skin C,D) sdLN E,F) lung G,H) spleen and I,J) mLN.

      - The studies of how ILC2 are impacted are a little limited, focused exclusively on IL-13 and IL-5 cytokine expression.

      We agree with the reviewer that our functional readout on IL-5 and IL-13 is relatively narrow. However, this focused experimental design was based on several considerations. First, IL-5 and IL-13 are widely recognized as major ILC2 effector molecules (Vivier et al, 2018, PMID 30142344). Second, in the MC903 model of AD-like disease, we have previously shown a clear correlation between ILC2s, levels of IL-5 and IL-13, and disease severity as measured by ear thickness (Kim et al, 2013, PMID 23363980). Depletion of ILC2s led to decreased levels of IL-13 and IL-5 and correspondingly reduced ear inflammation. However, while ILC2s are also recognized to produce other effector molecules such as IL-9 and Amphiregulin, which are likely involved in human atopic dermatitis (Namkung et al, 2011, PMID 21371865; Rojahn et al, 2020, PMID 32344053), there is currently no evidence linking these effectors to disease severity in the MC903 model. Third, IL-13 is emerging as a key cytokine driving atopic dermatitis in humans (Tsoi et al, 2019, PMID 30641038). Drugs targeting the IL-4/IL-13 receptor (dupilumab), or IL-13 itself (tralokinumab, lebrikizumab), have shown clear efficacy in treating atopic dermatitis. Interestingly, drugs targeting more upstream molecules, like TSLP (tezepelumab) or IL-33 (etokimab), have failed in atopic dermatitis. Taken together, these findings from both mouse and human studies suggest IL-13 is a critical therapeutic target, and thus functional readout, in determining the clinical implications of type 2 immune activation in atopic dermatitis.

      Aside from effector molecules, other readouts such as surface receptors may be of interest in understanding the mechanism of how RAG influences ILC2 function. For example, IL-18 has been shown to be an important co-stimulatory molecule along with TSLP in driving production of IL-13 by cutaneous ILC2s (Ricardo-Gonzalez et al, 2018, PMID 30201992). Our multiomic analysis showed decreased IL-18 receptor regulome activity in RAG-experienced ILC2s, which may be a mechanism by which RAG suppresses IL-13 production. Ultimately, in that study the role of IL-18 in enhancing MC903-induced inflammation through ILC2s was via increased production of IL-13, which was one of our major functional readouts. To clearly define mechanisms like these will require generation of new mice to interrogate RAG status in the context of tissue-specific knockout of other genes, such as the IL-18 receptor. We plan to perform these types of experiments in follow up studies. Notwithstanding this, we have now included additional discussion on lines 476508 to highlight why understanding how RAG impacts other regulatory and effector pathways would be an interesting area of future inquiry.

      Reviewer #3 (Public Review):

      In this study, Ver Heul et al. investigate the role of RAG expression in ILC2 functions. While RAG genes are not required for the development of ILCs, previous studies have reported a history of expression in these cells. The authors aim to determine the potential consequences of this expression in mature cells. They demonstrate that ILC2s from RAG1 or RAG2 deficient mice exhibit increased expression of IL-5 and IL-13 and suggest that these cells are expanded in the absence of RAG expression. However, it is unclear whether this effect is due to a direct impact of RAG genes or a consequence of the lack of T and B cells in this condition. This ambiguity represents a key issue with this study: distinguishing the direct effects of RAG genes from the indirect consequences of a lymphopenic environment.

      The authors focus their study on ILC2s found in the skin-draining lymph nodes, omitting analysis of tissues where ILC2s are more enriched, such as the gut, lungs, and fat tissue. This approach is surprising given the goal of evaluating the role of RAG genes in ILC2s across different tissues. The study shows that ILC2s derived from RAG-/- mice are more activated than those from WT mice, and RAG-deficient mice show increased inflammation in an atopic dermatitis (AD)-like disease model. The authors use an elegant model to distinguish ILC2s with a history of RAG expression from those that never expressed RAG genes. However, this model is currently limited to transcriptional and epigenomic analyses, which suggest that RAG genes suppress the type 2 regulome at the Th2 locus in ILC2s.

      We agree with the Reviewer that understanding the role of RAG in ILC2s across different tissues is an important goal. One of the primary inspirations for our paper was the clinical paradox that patients with Omenn syndrome, despite having profound adaptive T cell deficiency, develop AD with much greater penetrance than in the general population. Thus, there was always an appreciation for the likelihood that skin ILC2s have a unique proclivity towards the development of AD-like disease. Notwithstanding this, given the profound differences that can be found in ILC2s based on their tissue residence and disease state (as the Reviewer also points out below), we focused our investigations on characterizing the skin draining lymph nodes to better define factors underlying our initial observations of enhanced AD-like disease in Rag1<sup>-/-</sup> mice. While our findings in skin provoke the hypothesis that similar effects may be observed in other tissues and influence corresponding disease states, we were cautious not to suggest this may be the case by reporting surveys of other tissues without development of additional disease models to formally test these hypotheses. We present this manuscript now as a short, skin-focused study, rather than delaying publication to expand its scope. Truthfully, this project started in 2015 and has undergone many delays with the hopes of newer technologies and reagents coming to add greater clarity. We hope our study will enable others to pursue the goal of understanding the broader effects of RAG in ILC2s, and potentially other innate lymphoid lineages as well.

      We did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s ( Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated ( Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant ( Author response image 2B,D,F,H,J). However, given the lack of correlation to disease readouts in other organ systems, we chose to not include this data in our manuscript. However, if the Reviewer feels these data should be included, we would be happy to include as a supplemental figure.

      The authors report a higher frequency of ILC2s in RAG-/- mice in skin-draining lymph nodes, which is expected as these mice lack T and B cells, leading to ILC expansion. Previous studies have reported hyper-activation of ILCs in RAG-deficient mice, suggesting that this is not necessarily an intrinsic phenomenon. For example, RAG-/- mice exhibit hyperphosphorylation of STAT3 in the gut, leading to hyperactivation of ILC3s. This study does not currently provide conclusive evidence of an intrinsic role of RAG genes in the hyperactivation of ILC2s. The splenocyte chimera model is artificial and does not reflect a normal environment in tissues other than the spleen. Similarly, the mixed BM model does not demonstrate an intrinsic role of RAG genes, as RAG1-/- BM cells cannot contribute to the B and T cell pool, leading to an expected expansion of ILC2s. As the data are currently presented it is expected that a proportion of IL-5-producing cells will come from the RAG1/- BM.

      The Reviewer raises an important point about the potential cell-intrinsic roles of RAG vs the many cell-extrinsic explanations that could affect ILC2 populations, with the most striking being the lack of T and B cells in RAG knockout mice. It is well-established that splenocyte transfer into T and B cell-deficient mice reconstitutes T cell-mediated effects (such as the T cell transfer colitis model pioneered by Powrie and others), and we were careful in our interpretation of the splenocyte chimera experiment to conclude only that lack of Tregs was unlikely to explain the enhanced ADlike disease in T (and B) cell-deficient mice.

      We agree with the Reviewer that the Rag1<sup>-/-</sup> BM will not contribute to the B and T cell pool. However, BM from the WT mice would be expected to contribute to development of the adaptive lymphocyte pool. Indeed, we found that most of the CD45<sup>+</sup> immune cells in the spleens of BM chimera mice were donor-derived ( Author response image 3A), and total levels of B cells and T cells showed reconstitution in a pattern similar to control spleens from donor WT mice, while spleens from donor Rag1<sup>-/-</sup> mice expectedly had essentially no detectable adaptive lymphocytes ( Author response image 3B-D). From this, we concluded the BM chimera experiment was successful in establishing an immune environment with the presence of adaptive lymphocytes, and the differences in ILC2 proportions we observed were in the context of developing alongside a normal number of B and T lymphocytes. Notwithstanding the potential role of the adaptive lymphocyte compartment in shaping ILC2 development, since we transplanted equal amounts of WT and Rag1<sup>-/-</sup> BM into the same recipient environment, we are not able to explain how cell-extrinsic effects alone would account for the unequal numbers of WT vs Rag1<sup>-/-</sup> ILC2s we observed after immune reconstitution.

      Author response image 3.

      Comparison of immune reconstitution in BM chimeras to controls. Equal quantities of bone marrow cells from Rag1<sup>-/-</sup> (CD45.2) and WT (CD45.2) C57Bl/6J donor mice were used to reconstitute the immune systems of irradiated recipient WT (CD45.1) C57Bl/6J mice. A) Number of WT recipient CD45.1+ immune cells in the spleens of recipient mice compared to number of donor CD45.2+ cells (WT and Rag1<sup>-/-</sup>) normalized to 100,000 live cells. Comparison of numbers of B cells, CD4+ T cells, and CD8+ T cells in spleens of B) BM chimera mice, C) control WT mice and D) control Rag1<sup>-/-</sup> mice.

      We also subsequently found transcriptional and epigenomic differences in RAG-experienced ILC2s compared to RAG-naïve ILC2s. Critically, these differences were present in ILC2s from the same mice that had developed normally within an intact immune system, rather than in the setting of a BM transplant or a defective immune background such as in Rag1<sup>-/-</sup> mice.

      We recognize that there are almost certainly cell-extrinsic factors affecting ILC2s in Rag1<sup>-/-</sup> mice due to lack of B and T cells, and that BM chimeras are not perfect substitutes for simulating normal hematopoietic development. However, the presence of cell-extrinsic effects does not negate the potential contribution of cell-intrinsic factors as well, and we respectfully stand by our conclusion that our data support a role, however significant, for cell-intrinsic effects of RAG in ILC2s.

      Finally, the Reviewer mentions the interesting observation that gut ILC3s exhibit hyperphosphorylation of STAT3 in Rag1<sup>-/-</sup> mice compared to WT as an example of cell-extrinsic effects of RAG deficiency (we assume this is in reference to Mao et al, 2018, PMID 29364878 and subsequent work). We now reference this paper and have included additional discussion on how our observations of ILC2s may be generalizable to not only other organ systems, but also other ILC subsets, limitations on these generalizations, and future directions on lines 477-520.

      Overall, the level of analysis could be improved. Total cell numbers are not presented, the response of other immune cells to IL-5 and IL-13 (except the eosinophils in the splenocyte chimera mice) is not analyzed, and the analysis is limited to skin-draining lymph nodes.

      We thank the Reviewer for the suggestions to add rigor to our analysis. ILC2 populations are relatively rare, and we designed our experiments to assess frequencies, rather than absolute numbers. We did not utilize counting beads, so our counts may not be comparable between samples. We have added additional data for absolute cell counts normalized to 100,000 live cells for each experiment (see below for a summary of new panels in each figure). Our new data on total cell numbers are consistent with the initial observations regarding frequency of ILC2s we reported from our experiments. For the BM chimera experiments, we presented the proportions of ILC2s, and IL-5 and IL-13 positive ILC2s, by donor source, as this is the critical question of the experiment. Notwithstanding our analysis by proportion, we found that the frequency of Rag1<sup>-/-</sup> ILC2s, IL-5<sup>+</sup> cells, or IL-13<sup>+</sup> cells within Lin- population was also significantly increased. While our initial submission included only the proportions for clarity and simplicity, we now include frequency and absolute numbers in new panels for more critical appraisal of our data by readers.

      In New Figure 1, we added new panels for ILC2 cell number in both the AD-like disease experiment (C) and in steady state (H).

      In New Figure S2, we added a panel for ILC2 cell number in steady state (B).

      In Figure 2 and associated supplemental data in Figure S4, we added several more panels. For the splenocyte chimera, we added a panel for ILC2 cell number in New Figure 2C.

      We incorporated multiple new panels in New Figure S4 to address the need for more data to be shown for the BM chimera (also requested by Reviewer #2). These included total cell counts and frequency for ILC2 (New Figure S4F,G), and IL-5<sup>+</sup> (New Figure S4I,K) and IL-13<sup>+</sup> (New Figure S4J,L) ILCs in addition to the proportions originally presented in Figure 2.  

      In terms of the limited analysis of other tissues, our initial observation of enhanced AD-like disease in Rag1<sup>-/-</sup> compared to WT mice built on our prior work elucidating the role of ILC2s in the MC903 model of AD-like disease in mice and AD in humans (Kim et al, 2013, PMID 23363980). Consequently, we focused on the skin to further develop our understanding of the role of RAG1 in this model. As in our prior studies, technical limitations in obtaining sufficient numbers of ILC2s from the skin itself for ex vivo stimulation to assess effector cytokine levels required performing these experiments in the skin draining lymph nodes.

      We agree that IL-5 and IL-13 are major mediators of type 2 pathology and studying their effects on immune cells is an important area of inquiry, particularly since there are multiple drugs available or in development targeting these pathways. However, our goal was not to study what was happening downstream of increased cytokine production from ILC2s, but instead to understand what was different about RAG-deficient or RAG-naïve ILC2s themselves that drive their expansion and production of effector cytokines compared to RAG-sufficient or RAGexperienced ILC2s. By utilizing the same MC903 model in which we previously showed a critical role for ILC2s in driving IL-5 and IL-13 production and subsequent inflammation in the skin, we were able to instead focus on defining the cell-intrinsic aspects of RAG function in ILC2s.

      The authors have a promising model in which they can track ILC2s that have expressed RAG or not. They need to perform a comprehensive characterization of ILC2s in these mice, which develop in a normal environment with T and B cells. Approximately 50% of the ILC2s have a history of RAG expression. It would be valuable to know whether these cells differ from ILC2s that never expressed RAG, in terms of proliferation and expression of IL5 and IL-13. These analyses should be conducted in different tissues, as ILC2s adapt their phenotype and transcriptional landscape to their environment. Additionally, the authors should perform their AD-like disease model in these mice.

      We agree with the Reviewer (and a similar comment from Reviewer #2) that a broader survey of the RAG-mediated phenotype in other tissues and by extension other disease models would strengthen the generalizability of our observations. Indeed, we did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s ( Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated (Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant (Author response image 2B,D,F,H,J). We omitted these analyses to maintain the focus on the skin, but we will be happy to add this data to the manuscript if the Reviewer feels this figure should be helpful.

      Notwithstanding these results, given that we unexpectedly observed enhanced AD-like inflammation in the MC903 model in Rag1 KO mice, we concentrated our later experiments and analyses on defining the differences in skin draining ILC2s modulated by RAG. Our subsequent findings in the skin provoke many new hypotheses about the role of RAG in ILC2s in other tissues, and our tissue survey in the BM chimera provides additional rationale to pursue similar studies in disease models in other tissues. While this is an emerging area of investigation in our lab, we opted to focus this manuscript on our findings related to the AD-like disease model. We have ongoing studies to investigate other tissues, and we are still in the early stages of developing disease models to expand on these findings. However, if the reviewer feels strongly this additional data should be included in the manuscript, we are happy to add it. Considering the complexity of the data and concepts in the manuscript, we hoped to keep it focused to where we have strong molecular, cellular, and phenotypic outcomes. We elaborate on the implications of our work for future studies, including limitations of our study and currently available reagents and need for new mouse strains to rigorously answer these questions on lines 476-508

      The authors provide a valuable dataset of single-nuclei RNA sequencing (snRNA-seq) and ATAC sequencing (snATAC-seq) from RAGexp (RAG fate map-positive) and RAGnaïve (RAG fate map-negative) ILC2s. This elegant approach demonstrates that ILC2s with a history of RAG expression are epigenomically suppressed. However, key genes such as IL-5 and IL-13 do not appear to be differentially regulated between RAGexp and RAGnaïve ILC2s according to Table S5. Although the authors show that the regulome activity of IL-5 and IL-13 is decreased in RAGexp ILC2s, how do the authors explain that these genes are not differentially expressed between the RAGexp and RAGnaïve ILC2? I think that it is important to validate this in vivo.

      We thank the Reviewer for highlighting the value and possible elegance of our data. The Reviewer brings up an important issue that we grappled with in this study and that highlights a major technical limitation of single cell sequencing studies. Genes for secreted factors such as cytokines are often transcribed at low levels and are poorly detected in transcriptomic studies. This is particularly true in single cell studies with lower sequencing depth. Various efforts have been made to overcome these issues such as computational approaches to estimate missing data (e.g. van Djik et al, 2018, PMID 29961576; Huang et al, 2018, PMID 29941873), or recent use of cytokine reporter mice and dial-out PCR to enhance key cytokine signals in sequenced ILCs (Bielecki et al, 2021, PMID 33536623). We did not utilize computational methods to avoid the risk of introducing artifacts into the data, and we did not perform our study in cytokine reporter mice. Thus, cytokines were poorly detected in our transcriptomic data, as evidenced by lack of identification of cytokines as markers for specific clusters (e.g. IL-5 for ILC2s) or significant differential expression between RAG-naïve and RAG-experienced ILC2s.

      However, the multiomic features of our data allowed a synergistic analysis to identify effects on cytokines. For example, transcripts for the IL-4 and IL-5 were not detected at a high enough level to qualify as marker genes of the ILC2 cluster in the gene expression (GEX) assay but were identified as markers for the ILC2 cluster in the ATAC-seq data in the differentially accessible chromatin (DA) assay. Using the combined RNA-seq and ATAC-seq gene to peak links (GPL) analyses, many GPLs were identified in the Th2 locus for ILC2s, including for IL-13, which was not identified as a marker for ILC2s by any of the assays alone. Thus, our combined analysis took advantage of the potential of multiomic datasets to overcome a general weakness inherent to most scRNAseq datasets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Line 168; Reference 23 also showed expression in the NK cells, please add this reference to reference 24.

      We thank the reviewer for catching this oversight, and we have corrected it in the revised manuscript.

      - Please add the full names for GPL and sdLN in the text of the manuscript when first using these abbreviations. They are now only explained in the legends.

      We reviewed the manuscript text and found that we defined sdLNs for the first time on line 104. We defined GPLs for the first time on line 248. We believe these definitions are placed appropriately near the first references to the corresponding figures/analysis, but if the Reviewer believes we should move these definitions earlier, we are happy to do so.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest that the following reanalyses would improve the clarity of the data:

      - Can ILC2 numbers, rather than frequency, be used (e.g. in Figure 1C, S2B, and so on). This would substantiate the data that currently relies on percentages.

      This was a weakness also noted by Reviewer #3. We have added data on ILC2 numbers for each experiment as outlined below:

      In New Figure 1, we added new panels for ILC2 cell number in both the AD-like disease experiment (C) and in steady state (H).

      In New Figure S2, we added a panel for ILC2 cell number in steady state (B).

      In Figure 2 and associated supplemental data in Figure S4, we added several more panels. For the splenocyte chimera, we added a panel for ILC2 cell number in New Figure 2C.

      We incorporated multiple new panels in New Figure S4 to address the need for more data to be shown for the BM chimera (also requested by Reviewer #2). These included total cell counts and frequency for ILC2 (New Figure S4F,G), and IL-5<sup>+</sup> (New Figure S4I,K) and IL-13<sup>+</sup> (New Figure S4J,L) ILCs in addition to the proportions originally presented in Figure 2.  

      - Can the authors provide data on IL-33R expression on sdLN ILC2s? Expression of ST-2 (IL-33R) does vary between ILC2 populations and is impacted by the digestion of tissue. All of the data provided here requires ILC2 to be IL-33R<sup>+</sup>. In the control samples, the ILC2 compartment is very scarce - in LNs, ILC2s are rare. The gating strategy with limited resolution of positive and negative cells in the lineage gate doesn't help this analysis.

      The Reviewer raises a valid point regarding the IL-33R marker and ILC2s. We designed our initial experiments to be consistent with our earlier observations of skin ILC2s, which were defined as CD45<sup>+</sup>Lin-CD90+CD25+IL33+, and the scarcity of skin draining lymph node ILC2s at steady state was consistent with our prior findings (Kim et al, 2013, PMID 23363980). We can include MFI data on IL-33R expression in these cells if the reviewer feels strongly that this would add to the manuscript, but we did not include other ILC2-specific markers in these experiments that would give us an alternative total ILC2 count to calculate frequency of IL-33R<sup>+</sup> ILC2s, which would also make the context of the IL-33 MFI difficult to interpret.

      Other studies defining tissue specific expression patterns in ILC2s have called into question whether IL-33R is a reliable marker to define skin ILC2s (Ricardo-Gonzalez et al, 2018, PMID 30201992). However, there is evidence for region-specific expression of IL-33R (Kobayashi et al, 2019, PMID 30712873), with ILC2s in the subcutis expressing high levels of IL-33R and both IL5 and IL-13, while ILC2s in the epidermis and dermis have low levels of IL-33R and IL-5 expression. In contrast to the Kobayashi et al study, Ricardo-Gonzalez et al sequenced ILC2s from whole skin, thus the region-specific expression patterns were not preserved, and the lower expression of IL-33R in the epidermis and dermis may have diluted the signal from the ILC2s in the subcutis. These may also be the ILC2s most likely to drain into the lymph nodes, which is the tissue on which we focused our analyses (consistent with our prior work in Kim et al, 2013).

      - In Figure 2 (related to 2H, 2I) can flow plots of the IL-5 versus IL-13 gated on either CD90.1+CD45.2+ or CD90.2+CD45.2+ ILC2 be shown? I.e. gate on the ILC2s and show cytokine expression, rather than the proportion of donor IL5/13. The proportion of donor ILC2 is shown to be significantly higher in 2G. Therefore gating on the cells of interest and showing on a cellular basis their ability to produce the cytokines would better make the point I think.

      We agree that this is important additional data to include. We have added flow plots of sdLN ILC2s from the BM chimera divided by donor genotype showing IL-5 and IL-13 expression in New Figure S4H.

      I assume the authors have looked and there is no obvious data, but does analysis of transcription factor consensus binding sequences in the open chromatin provide any new insight?

      The Reviewer also commented on this in the public review. As copied from our response above:

      We found that the most enriched sites in the ILC2 gene loci contained the consensus sequence GGGCGG (or its reverse complement), a motif recognized by a variety of zinc finger transcription factors (TFs). Predictions from our analyses predicted the KLF family of zinc finger TFs as most likely to be enriched at the identified open chromatin regions. To infer which KLFs might be occupying these sites in the RAG-experienced or RAG-naïve cells, we also assessed the expression levels of these identified TFs. Interestingly, KLF2 and KLF6 are more expressed in RAG-experienced ILC2s. KLF6 is a tumor suppressor (PMID: 11752579), and both KLF6 and KLF2 were recently shown to be markers of “quiescent-like” ILCs (PMID: 33536623). Further, upon analysis of the Th2 locus, the (A/T)GATA(A/G) consensus site (or reverse complement) was enriched in identified open chromatin at that locus. The algorithm predicted multiple TFs from the GATA family as possible binding partners, but expression analysis showed only GATA3 was highly expressed in ILC2s, consistent with what would be predicted from prior studies (PMID: 9160750).

      We have added this data in new Figure S10 and new Figure S12, with corresponding text in the Results section on lines 277-316 and lines 366-378.

      In terms of phrasing and presentation:

      - It would help to provide some explanation of why all analyses focus on the draining LNs rather than the actual site of inflammation (the ear skin). I do not think it appropriate to ask for data on this as this would require extensive further experimentation, but there should be some discussion on this topic. This feels relevant given that the skin is the site of inflammatory insult and ILC2 is present here. How the ILC2 compartment in the skindraining lymph nodes relates to those in the skin is not completely clear, particularly given the prevailing dogma that ILC2 are tissue-resident.

      Given limitations of assessing cytokine production of the relatively rare population of skin-resident ILC2s, we focused on the skin-draining lymph nodes (sdLN). Our findings in the current manuscript are consistent with our prior work in Kim et al, 2013 (PMID 23363980), and more recently in Tamari et al, 2024 (PMID 38134932), which demonstrated correlation of increased ILC2s in sdLN with increased skin inflammation in the MC903 model. Similarly, Dutton et al (PMID 31152090) have demonstrated expansion of the sdLN ILC2 pool in response to MC903-induced AD-like inflammation in mice. We elaborate on the implications of our work for future studies, including limitations of our study (including the focus on the sdLN), and currently available reagents and need for new mouse strains to rigorously answer these questions on lines 476-508

      - I think the authors should explicitly state that cytokine production is assessed after ex vivo restimulation (e.g. Lines 112-113).

      We have added this statement to the revised text.

      - I also think that it would help to be consistent with axis scales where analyses are comparable (e.g. Figure 1D vs Figure 1H).

      We agree with the Reviewer and we have adjusted the axes for consistency. The data remains unchanged, but axes are slightly adjusted in New Figure 1 (D&I, E&J, F&K) and New Figure S2 (C-E match New Figure 1 D-F). This same axis scaling scheme is carried forward to New Figure 2 (D-E) and New Figure S4 (G,K,L). New data on cell counts is also included per request by Reviewers 2 and 3 (see above). However, we found results for total cells, including ILC2s (New Figure 1C,H, New Figure S2B, New Figure 2C, New Figure S4F), were consistent within experiments, but not between experiments, likely representing issues with normalizing counts (we did not include counting beads for more accurate total counts). Thus, the y-axes in those panels are not consistent between experiments/figures.

      We feel reporting the proportion of WT vs Rag1<sup>-/-</sup> donor cells for the BM chimera is most illustrative of the effect of RAG and have kept it in the main New Figure 2, but for the BM chimera experiment panels we also include the total counts of IL-5<sup>+</sup> and IL-13<sup>+</sup> ILC2s (New Figure S4I,J).

    1. eLife Assessment

      This important revised manuscript presents compelling findings by delineating two molecularly distinct liver cancer subtypes through comprehensive multi-omics integration and constructing a rigorously validated prognostic model. The authors have strengthened the analytical framework and validation across multiple datasets, including single-cell RNA sequencing. The evidence remains robust, with enhanced methodological clarity and expanded validation in both internal and independent cohorts. The revisions have improved the study's rigor and translational relevance.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to classify hepatocellular carcinoma (HCC) patients into distinct subtypes using a comprehensive multi-omics approach. They employed an innovative consensus clustering method that integrates multiple omics data types, including mRNA, lncRNA, miRNA, DNA methylation, and somatic mutations. The study further sought to validate these subtypes by developing prognostic models using machine learning algorithms and extending the findings through single-cell RNA sequencing (scRNA-seq) to explore the cellular mechanisms driving subtype-specific prognostic differences.

      Strengths:

      (1) Comprehensive Data Integration: The study's integration of various omics data provides a well-rounded view of the molecular characteristics underlying HCC. This multi-omics approach is a significant strength, as it allows for a more accurate and detailed classification of cancer subtypes.

      (2) Innovative Methodology: The use of a consensus clustering approach that combines results from 10 different clustering algorithms is a notable methodological advancement. This approach reduces the bias that can result from relying on a single clustering method, enhancing the robustness of the findings.

      (3) Machine Learning-Based Prognostic Modeling: The authors rigorously apply a wide array of machine learning algorithms to develop and validate prognostic models, testing 101 different algorithm combinations. This comprehensive approach underscores the study's commitment to identifying the most predictive models, which is a considerable strength.

      (4) Validation Across Multiple Cohorts: The external validation of findings in independent cohorts is a critical strength, as it increases the generalizability and reliability of the results. This step is essential for demonstrating the clinical relevance of the proposed subtypes and prognostic models.

      Weaknesses:

      (1) Inconsistent Storyline:<br /> Despite the extensive data mining and rigorous methodologies, the manuscript suffers from a lack of a coherent and consistent narrative. The transition between different sections, particularly from multi-omics data integration to single-cell validation, feels disjointed. A clearer articulation of how each analysis ties into the overall research question would improve the manuscript.

      (2) Questionable Relevance of Immune Cell Activity Analysis:<br /> The evaluation of immune cell activities within the cancer cell model raises concerns about its meaningfulness. The methods used to assess immune function in the tumor microenvironment may not be fully appropriate, potentially limiting the insights gained from this part of the study.

      (3) Incomplete Single-Cell RNA-Seq Validation:<br /> The validation of the findings using single-cell RNA-seq data appears insufficient to fully support the study's claims. While the authors make an effort to extend their findings to the single-cell level, the analysis lacks depth. A more comprehensive validation is necessary to substantiate the robustness of the identified subtypes.

      (4) Figures and Visualizations:<br /> Several figures in the manuscript are missing necessary information, which affects the clarity of the results. For instance, the pathways in Figure 3A could be clustered to enhance interpretability, the blue bar in Figure 4A is unexplained, and Figure 4B is not discussed in the text. Additionally, the figure legend in Figure 7C lacks detail, and many figure descriptions merely repeat the captions without providing deeper insights.

      (5) Appraisal of the Study's Aims and Results<br /> The authors have set out to achieve an ambitious goal of classifying HCC patients into distinct prognostic subtypes and validating these findings through both bulk and single-cell analyses. While the methodologies employed are innovative and the data integration comprehensive, the study falls short in fully achieving its aims due to inconsistencies in the narrative and incomplete validation. The results partially support the conclusions, but the lack of coherence and depth in certain areas limits the overall<br /> impact of the study.

      (6) Impact on the Field<br /> If the identified weaknesses are addressed, this study has the potential to significantly impact the field of HCC research. The multi-omics approach combined with machine learning is a powerful framework that could set a new standard for cancer subtype classification. However, the current state of the manuscript leaves some uncertainty regarding the practical applicability of the findings, particularly in clinical settings.

      (7) Additional Context<br /> For readers and researchers, this study offers a valuable look into the potential of integrating multi-omics data with machine learning to improve cancer classification and prognostication. However, readers should be aware of the noted weaknesses, particularly the need for more consistent narrative development and comprehensive validation of the methods. Addressing these issues could greatly enhance the study's utility and relevance to the community.

      Comments on revisions:

      The authors have addressed the reviewers' concerns effectively.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      (1) Storyline and Narrative Flow:

      Consider revising the manuscript to create a more coherent and consistent narrative. Clarify how each section of the study-particularly the transition from multi-omics data integration to single-cell RNA-seq validation-contributes to the overall research question. This will help readers better understand the logical flow of the study.

      We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:

      We have modified some text, including the connections between different sections in the results part and the objectives and roles of various analyses in each section, thus enhancing the coherence between the contexts and clarifying the objectives and functions of each analysis, We believe this will help readers better understand the main content of the entire text.

      (2) Immune Cell Activity Analysis:

      Reevaluate the methods used to assess immune cell activities within the context of the tumor microenvironment. Consider providing additional justification for the relevance of using the cancer cell model for this analysis. If necessary, explore alternative methods or models that might offer more meaningful insights into immune-tumor interactions.

      We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:

      Using RNA-Bulk data, we evaluated the tumor immune microenvironment through various methods to assess immune infiltration levels and responses to immunotherapy. We found that the results were largely consistent with those presented in the manuscript, providing strong support for our viewpoints. We also acknowledge the limitations of findings from bioinformatics analysis. In our upcoming research, we plan to develop organoid models with gene expression patterns of both CS1 and CS2 subtypes, using these models as a foundation for studying the tumor immune microenvironment.

      (3) Single-Cell RNA-Seq Validation:

      Expand the validation of your findings using single-cell RNA-seq data. This could include more in-depth analyses that explore the heterogeneity within the subtypes and confirm the robustness of your classification method at the single-cell level. This would strengthen the support for your claims about the relevance of the identified subtypes.

      We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:

      In this manuscript, we employed the NTP algorithm to classify malignant cells identified by the CopyKAT algorithm using characteristic genes of CS1 and CS2 subtypes. This approach is similar to previous method that analyzed patients in the ICGC cohort with the same subtype genes. We consider this classification method valid.

      After classifying the malignant cells, we performed metabolic and cell communication analyses on the CS1 and CS2 subtype cells, revealing significant differences in biological pathways enriched by differential genes, metabolic levels, and cell signaling patterns. These differences align with variations observed in prior classifications and analyses based on RNA-Bulk data.

      We also acknowledge that validating the classification method solely with the single-cell dataset from this study is insufficient. We analyzed GSE202642 using the same processes and methods as GSE229772, finding that the results were generally consistent, indicating that our classification method exhibits a degree of robustness at the single-cell level.

      (4) Methodological Justification:

      Provide a more detailed rationale for the selection of machine learning algorithms and integration strategies used in the study. Explain why the chosen methods are particularly well-suited for this research, and discuss any potential limitations they might have.

      We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:

      We have updated the methodology section to enhance readers' understanding of the fundamental principles involved. This analysis has two key features: first, it combines 10 machine learning algorithms to generate 101 models and ultimately selects the prognostic prediction model with the highest C-index from these 101 algorithms; second, it utilizes the LOOCV method to analyze the training and validation sets. Compared to the conventional method of randomly dividing the training and validation sets by a fixed ratio, this approach significantly minimizes the bias and randomness introduced by the splitting process. Therefore, we believe this analysis can leverage the characteristic genes of the CS1 and CS2 subtypes, combined with existing clinical data from public databases, to yield results that are more accurate and reliable than the commonly used prognostic models in previous literature, such as COX regression and Lasso regression, as well as other individual algorithms. While this analysis presents advantages over some previous modeling methods, it is essential to recognize that it remains based on analyses conducted using public databases, which may obscure certain factors that might be clinically relevant to patient prognosis due to the mathematical logic of the algorithms.

      (5) Figures and Visualizations:

      Improve the clarity of your figures by addressing the following:

      a) Figure 3A: Cluster the pathways to make the comparisons clearer and more meaningful.

      b) Figure 4A: Clearly explain the significance of the blue bar.

      c) Figure 4B: Ensure this figure is discussed in the main text to justify its inclusion.

      d) Figure 7C: Enhance the figure legend to provide more informative details.

      Additionally, ensure that figure descriptions go beyond the captions and provide detailed explanations that help the reader understand the significance of each figure.

      We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:

      Figure 3A: We clustered the samples based on CS1 and CS2 subtypes and displayed the immune-related cell scores of each sample as a heatmap.

      Figure 4A: The blue bars in the figure represent the average C-index of this algorithm combination in the training dataset TCGA and the validation dataset ICGC, which we have supplemented in the corresponding sections of the text.

      Figure 4B: We described this figure in the results section, which primarily aims to validate whether our prognostic prediction model can predict patient outcomes in the TCGA cohort. The results showed that after performing prognostic risk scoring on patients based on the prediction model and categorizing them into high-risk and low-risk groups, the two groups exhibited significant prognostic differences, with the high-risk group showing worse outcomes compared to the low-risk group. This indicates that our prognostic prediction model can effectively distinguish the prognostic risk differences among patients in the TCGA-LIHC cohort. We also discussed these findings in the discussion section.

      Figure 7C: We used both point color and size to visualize the levels of metabolic scores, resulting in two dimensions in the legend, which actually represent the same information. Therefore, we removed the results that used point size to indicate the levels of metabolic scores.

      (6) Supplementary Materials:

      Consider including more detailed supplementary materials that provide additional validation data, extended methodological descriptions, and any other information that would support the robustness of your findings.

      We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:

      In the subsequent version of the record, we will upload the important results obtained during the research to GitHub, and in this revision, we have updated some figures that may better explain the results or the robustness of the findings as supplementary materials.

      (7) Recent Literature:

      a) Incorporate more recent studies in your discussion, especially those related to HCC subtypes and the application of machine learning in oncology. This will provide a more current context for your work and help position your findings within the broader field.

      We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:

      We have reviewed several studies related to HCC subtype classification and the application of machine learning in this field. In the discussion section, we summarize the significance and limitations of these studies. Additionally, we discuss the characteristics of our study in comparison to previous research in this field.

      (8) Data and Code Availability:

      Ensure that all data, code, and materials used in your study are made available in line with eLife's policies. Provide clear links to repositories where readers can access the data and code used in your analyses.

      We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:

      We have examined the relevant data, code, and materials. We confirm that we have indicated the sources of the data and tools used in the analysis within the manuscript. Moreover, these data and tools are accessible via the websites or references we have provided.

      Reviewer #2 (Recommendations for the authors):

      (1) While the computational findings are robust, further experimental validation of the two subtypes, particularly the role of the MIF signaling pathway, would strengthen the biological relevance of the findings. In vitro or in vivo validation could confirm the proposed mechanisms and their influence on patient prognosis.

      We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:

      We intend to verify our findings in future studies using tumor cell line models and animal models. We aim to identify and intervene with key molecules in the MIF signaling pathway. We will investigate how the MIF signaling pathway affects tumor sensitivity to treatment in both cell line and animal models, along with the underlying mechanisms.

      (2) Consider testing the model on additional independent cohorts beyond the TCGA and ICGC datasets to further demonstrate its generalizability and applicability across different patient populations.

      We thank the reviewer’s suggestion, which have highlighted the deficiencies in this area, and we have made appropriate modifications:

      We analyzed the GSE14520 study recorded in the GEO database, which uploaded a cohort consisting of 209 HCC patients and their corresponding RNA sequencing data. We validated the prognostic model obtained in this study using this cohort, and found that the model effectively distinguishes patients into high-risk and low-risk prognostic categories. Furthermore, there is a significant prognostic difference between the high-risk and low-risk patient groups. This is consistent with the results we obtained previously.

      (3) Review the manuscript for long or complex sentences, which can be broken down into shorter, more readable parts.

      We have made revisions to the long and complex sentences in the manuscript without compromising its academic integrity and rationality, with the hope that this will help readers better understand the content of this study.

      During the revision process, in addition to addressing the reviewer comments, we conducted a thorough review of the analysis. In the course of this review, we identified a few errors in the data usage and have since corrected the relevant data and figures:

      Figure 4: Due to space constraints, we adjusted the composition of the figures after incorporating the validation results from the GSE14520 dataset.

      Figure 5A: We rechecked the regression coefficients included in the model, updated several more recent prognostic models, and calculated the C-index for 20 prognostic models in the TCGA and ICGC cohorts using a method consistent with previous studies.

      Figure 5C-D: We adjusted the clarity of the figures.

      Figure 8: We reclassified the selected malignant cells and updated the subtypes results. Subsequently, based on the repeatedly confirmed typing results, we comprehensively updated the analysis results of the subsequent cell communication network construction, ensuring that the entire analysis process remains consistent with previous findings. We also adjusted the composition of the figure and presented the images that could not be conveniently merged due to space constraints as Figure 9.

    1. eLife Assessment

      This valuable study uses C. elegans to provide new insights into the role of the conserved protein FLWR-1/Flower in synaptic transmission. Employing a variety of techniques, including calcium imaging, ultrastructural analysis, and electrophysiology, the paper provides evidence that challenges some previous thinking about FLWR-1 function. While most of the findings are convincing, some of the authors' conclusions about the mechanisms of FLWR-1 function remain somewhat speculative.

    2. Reviewer #1 (Public review):

      Public Review

      The authors investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions. They observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons restored the aldicarb-resistant phenotype of flwr-1 mutants to wild-type levels. By contrast, cholinergic neuron expression did not rescue aldicarb sensitivity at all. They also showed that FLWR-1 removal leads to increased Ca2+ signaling in motor neurons upon photo-stimulation. From these findings, the authors conclude that FLWR-1 helps maintain the balance between excitation and inhibition (E/I) by preferentially regulating GABAergic neuronal excitability in a cell-autonomous manner.

      Overall, the work presents solid data and interesting findings, however the proposed cell-autonomous model of GABAergic FLWR-1 function may be overly simplified in my opinion.

      Most of my previous comments have been addressed; however, two issues remain.

      (1) I appreciate the authors' efforts conducting additional aldicarb sensitivity assays that combine muscle-specific rescue with either cholinergic or GABergic neuron-specific expression of FLWR-1. In the revised manuscript, they conclude, "This did not show any additive effects to the pure neuronal rescues, thus FLWR-1 effects on muscle cell responses to cholinergic agonists must be cell-autonomous." However, I find this interpretation confusing for the reasons outlined below.

      Figure 1 - Figure Supplement 3B shows that muscle-specific FLWR-1 expression in flwr-1 mutants significantly restores aldicarb sensitivity. However, when FLWR-1 is co-expressed in both cholinergic neurons and muscle, the worms behave like flwr-1 mutants and no rescue is observed. Similarly, cholinergic FLWR-1 alone fails to restore aldicarb sensitivity (shown in the previous manuscript). These observations indicate a non-cell-autonomous interaction between cholinergic neurons and muscle, rather than a strictly muscle cell-autonomous mechanism. In other words, FLWR-1 expressed in cholinergic neurons appears to negate or block the rescue effect of muscle-expressed FLWR-1. Therefore, FLWR-1 could play a more complex role in coordinating physiology across different tissues. This complexity may affect interpretations of Ca2+ dynamics and/or functional data, particularly in relation to E/I balance, and thus warrants careful discussion or further investigation.

      (2) The revised manuscript includes new GCaMP analyses restricted to synaptic puncta. The authors mention that "we compared Ca2+ signals in synaptic puncta versus axon shafts, and did not find any differences," concluding that "FLWR-1's impact is local, in synaptic boutons." This is puzzling: the similarity of Ca2+ signals in synaptic regions and axon shafts seems to indicate a more global effect on Ca2+ dynamics or may simply reflect limited temporal resolution in distinguishing local from global signals due to rapid Ca2+ diffusion. The authors should clarify how they reached the conclusion that FLWR-1 has a localized impact at synaptic boutons, given that synaptic and axonal signals appear similar. Based on the presented data, the evidence supporting a local effect of FLWR-1 on Ca2+ dynamics appears limited.

    3. Reviewer #2 (Public review):

      Summary:

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca2+ channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions.

      Strengths:

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca2+ dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca2+ levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca2+-ATPase and PIP2 binding in FLWR-1's function.

      The authors have adequately addressed most of my previous concerns, however, I recommend minor revisions to further strengthen the study's rigor and interpretation:

      Major suggestions

      (1) This study relies heavily on aldicarb assays to support its conclusions. While these assays are valuable, their results may not fully align with direct assessment of neurotransmitter release from motor neurons. For instance, prior work has shown that two presynaptic modulators identified through aldicarb sensitivity assays exhibited no corresponding electrophysiological defects at the neuromuscular junction (Liu et al., J Neurosci 27: 10404-10413, 2007). Similarly, at least one study from the Kaplan lab has noted discrepancies between aldicarb assays and electrophysiological analyses. The authors should consider adding a few sentences in the Discussion to acknowledge this limitation and the potential caveats of using aldicarb assays, especially since some of the aldicarb assay results in this study are not easily interpretable.

      (2) The manuscript states, "Elevated Ca2+ levels were not further enhanced in a flwr-1;mca-3 double mutant." (lines 549-550). However, Figure 7C does not include statistical comparisons between the single and double mutants of flwr-1 and mca-3. Please add the necessary statistical analysis to support this statement.

      (3) The term "Ca2+ influx" should be avoided, as this study does not provide direct evidence (e.g. voltage-clamp recordings of Ca2+ inward currents in motor neurons) for an effect of the flwr-1 mutation of Ca2+ influx. The observed increase in neuronal GCaMP signals in response to optogenetic activation of ChR2 may result from, or be influenced by, Ca2+ mobilization from of intracellular stores. For example, optogenetic stimulation could trigger ryanodine receptor-mediated Ca2+ release from the ER via calcium-induced calcium release (CICR) or depolarization-induced calcium release (DICR). It would be more appropriate to describe the observed increase in Ca2+ signal as "Ca2+ elevation" rather than increased "Ca2+ influx".

    4. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      The authors investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions. They observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons restored the aldicarb-resistant phenotype of flwr-1 mutants to wild-type levels. By contrast, cholinergic neuron expression did not rescue aldicarb sensitivity at all. They also showed that FLWR-1 removal leads to increased Ca2+ signaling in motor neurons upon photo-stimulation. From these findings, the authors conclude that FLWR-1 helps maintain the balance between excitation and inhibition (E/I) by preferentially regulating GABAergic neuronal excitability in a cell-autonomous manner.

      Overall, the work presents solid data and interesting findings, however the proposed cell-autonomous model of GABAergic FLWR-1 function may be overly simplified in my opinion.

      Most of my previous comments have been addressed; however, two issues remain.

      (1) I appreciate the authors' efforts conducting additional aldicarb sensitivity assays that combine muscle-specific rescue with either cholinergic or GABergic neuron-specific expression of FLWR-1. In the revised manuscript, they conclude, "This did not show any additive effects to the pure neuronal rescues, thus FLWR-1 effects on muscle cell responses to cholinergic agonists must be cell- autonomous." However, I find this interpretation confusing for the reasons outlined below.

      Figure 1 - Figure Supplement 3B shows that muscle-specific FLWR-1 expression in flwr-1 mutants significantly restores aldicarb sensitivity. However, when FLWR-1 is co-expressed in both cholinergic neurons and muscle, the worms behave like flwr-1 mutants and no rescue is observed. Similarly, cholinergic FLWR-1 alone fails to restore aldicarb sensitivity (shown in the previous manuscript).

      This data is still shown in the manuscript, Fig. 3D. We interpreted our finding in the muscle/cholinergic co-rescue experiment as meaning, that FLWR-1 in cholinergic neurons over-compensates, so worms should be resistant, and the rescuing effect of muscle FLWR-1 is therefore cancelled. But it is true, if this were the case, why does the pure cholinergic rescue not show over-compensation? We added a sentence to acknowledge this inconsistency and we added a sentence in the discussion (see also below, comment 1) of reviewer #2).

      These observations indicate a non-cell-autonomous interaction between cholinergic neurons and muscle, rather than a strictly muscle cell-autonomous mechanism. In other words, FLWR-1 expressed in cholinergic neurons appears to negate or block the rescue effect of muscle-expressed FLWR-1. Therefore, FLWR-1 could play a more complex role in coordinating physiology across different tissues. This complexity may affect interpretations of Ca2+ dynamics and/or functional data, particularly in relation to E/I balance, and thus warrants careful discussion or further investigation.

      For the Ca2+ dynamics, we think the effects of flwr-1 are likely very immediate, as the imaging assay relies on a sensor expressed directly in the neurons or muscles under study, and not on indirect phenotypes as muscle contraction and behavior, that depend on an interplay of several cell types influencing each other.

      (2) The revised manuscript includes new GCaMP analyses restricted to synaptic puncta. The authors mention that "we compared Ca2+ signals in synaptic puncta versus axon shafts, and did not find any differences," concluding that "FLWR-1's impact is local, in synaptic boutons." This is puzzling: the similarity of Ca2+ signals in synaptic regions and axon shafts seems to indicate a more global effect on Ca2+ dynamics or may simply reflect limited temporal resolution in distinguishing local from global signals due to rapid Ca2+ diffusion. The authors should clarify how they reached the conclusion that FLWR-1 has a localized impact at synaptic boutons, given that synaptic and axonal signals appear similar. Based on the presented data, the evidence supporting a local effect of FLWR-1 on Ca2+ dynamics appears limited.

      We apologize, here we simply overlooked this misleading wording in our rebuttal letter. The data we mentioned, showing no obvious difference in axon vs. bouton, are shown in Author response image 1, including time constants for the onset and the offset of the stimulus (data is peak normalized for better visualization):

      Author response image 1.

      One can see that axonal Ca2+ signals may rise a bit slower than synaptic Ca2+ signals, as expected for Ca2+ entering the boutons, and then diffusing out into the axon. The loss of FLWR- 1 does not affect this. However, the temporal resolution of the used GCaMP6f sensor is ca. 200 ms to reach peak, and the decay time (to t1/2) is ca. 400 ms (PMID: 23868258). Thus, it would be difficult to see effects based on Ca2+ diffusion using this assay. For the decay, this is similar for both axon and synapse, while flwr-1 mutants do not reduce Ca2+ as much as wt. In the axon, there is a seemingly slightly slower reduction in flwr-1 mutants, however, given the kinetics of the sensor, this is likely not a meaningful difference. Therefore, we wrote we did not find differences. The interpretation should not have been that the impact of FLWR-1 is local. It may be true if one could image this at faster time scales, i.e. if there is more FLWR-1 localized in boutons (as indicated by our data showing FLWR-1 enrichment in boutons; Fig. 3), and when considering its possible effect on MCA-3 localization (and assuming that MCA-3 is the active player in Ca2+ removal), i.e. FLWR-1 recruiting MCA-3 to boutons (Fig. 9C, D).

      Reviewer #2 (Public review):

      Summary:

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca2+ channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions.

      Strengths:

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca2+ dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca2+ levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca2+-ATPase and PIP2 binding in FLWR-1's function.

      The authors have adequately addressed most of my previous concerns, however, I recommend minor revisions to further strengthen the study's rigor and interpretation:

      Major suggestions

      (1) This study relies heavily on aldicarb assays to support its conclusions. While these assays are valuable, their results may not fully align with direct assessment of neurotransmitter release from motor neurons. For instance, prior work has shown that two presynaptic modulators identified through aldicarb sensitivity assays exhibited no corresponding electrophysiological defects at the neuromuscular junction (Liu et al., J Neurosci 27: 10404-10413, 2007). Similarly, at least one study from the Kaplan lab has noted discrepancies between aldicarb assays and electrophysiological analyses. The authors should consider adding a few sentences in the Discussion to acknowledge this limitation and the potential caveats of using aldicarb assays, especially since some of the aldicarb assay results in this study are not easily interpretable.

      Aldicarb assays have been used very successfully in identifying mutants with defects in chemical synaptic transmission, and entire genetic screens have been conducted this way. The reviewer is right, one needs to realize that it is the balance of excitation and inhibition at the NMJ of C. elegans, which underlies the effects on the rate of aldicarb-induced paralysis, not just cholinergic transmission. I.e. if a given mutant affects cholinergic and GABAergic transmission differently, things become difficult to interpret, particularly if also muscle physiology is affected. Therefore, we combined mutant analyses with cell-type specific rescue. We acknowledge that results are nonetheless difficult to interpret. We thus added a sentence in the first paragraph of the discussion.

      (2) The manuscript states, "Elevated Ca2+ levels were not further enhanced in a flwr-1;mca-3 double mutant." (lines 549-550). However, Figure 7C does not include statistical comparisons between the single and double mutants of flwr-1 and mca-3. Please add the necessary statistical analysis to support this statement.

      Because we only marked significant differences in that figure, and n.s. was not shown. This was stated in the figure legend.

      (3) The term "Ca2+ influx" should be avoided, as this study does not provide direct evidence (e.g. voltage-clamp recordings of Ca2+ inward currents in motor neurons) for an effect of the flwr-1 mutation of Ca2+ influx. The observed increase in neuronal GCaMP signals in response to optogenetic activation of ChR2 may result from, or be influenced by, Ca2+ mobilization from of intracellular stores. For example, optogenetic stimulation could trigger ryanodine receptor-mediated Ca2+ release from the ER via calcium-induced calcium release (CICR) or depolarization-induced calcium release (DICR). It would be more appropriate to describe the observed increase in Ca2+ signal as "Ca2+ elevation" rather than increased "Ca2+ influx".

      Ok, yes, we can do this, we referred by ‘influx’ to cytosolic Ca2+, that fluxes into the cytosol, be it from the internal stores or the extracellular. Extracellular influx, more or less, inevitably will trigger further influx from internal stores, to our understanding. We changed this to “elevated Ca2+ levels” or “Ca2+ level rise” or “Ca2+ level increase”.

      Reviewer #1 (Recommendations for the authors):

      A thorough discussion on the impact of cell-autonomous versus non-cell-autonomous effects is necessary.

      Revise and clarify the distinction between local and global Ca²⁺ changes.

      See above.

      Reviewer #2 (Recommendations for the authors):

      Minor suggestions

      (1) In "Few-Ubi was shown to facilitate recovery of neurons following intense synaptic activity (Yao et al.,.   " (lines 283-284), please specify which aspects of neuronal recovery are influenced by the Flower protein.

      We added “refilling of SV pools”.

      (2) The abbreviation "Few-Ubi" is used for the Drosophila Flower protein (e.g., line 283, Figure 1A, and Figure 8A). Please clarify what "Ubi" stands for and verify whether its inclusion in the protein name is appropriate.

      This is inconsistent across the literature, sometimes Few-Ubi is also referred to as FweA. We now added this term. Ubi refers to ubiquitous („Therefore, we named this isoform fweubi because it is expressed ubiquitously in imaginal discs“) (Rhiner 2010)

      (3) The manuscript uses "pflwr-1" (line 303 and elsewhere) to denote the flwr-1 promoter. This notation could be misleading, as it may be interpreted as a gene name. Please consider using either "flwr-1p" or "Pflwr-1" instead. Additionally, ensure proper italicization of gene names throughout the manuscript.

      We changed this throughout. We will change to italicized at proof stage, it would be too time- consuming to spot these incidents now.

      (4) The authors tagged the C-terminus of FLWR-1 by GFP (lines 321). The fusion protein is referred to as "GFP::FLWR-1" throughout the manuscript. Please verify whether "FLWR-1::GFP" would be the more appropriate designation.

      Thank you, yes, we changed this in the text, GFP is indeed N-terminal.

      (5) In "This did not show any additive effects. " (line 363), please clarify what "This" refers to.

      Altered to “The combined rescues did not show any additive effects…”

      (6) In "..., supporting our previous finding of increased neurotransmitter release in GABAergic neurons" (lines 412-413), please provide a citation for the referenced previous study.

      This refers to our aldicarb data within this paper, just further up in the text. We removed “previous”.

      (7) Figure 4C, D examines the effect of flwr-1 mutation on body length in the genetic background of the unc-29 mutation, which selectively disrupts the levamisole-sensitive acetylcholine receptor. Please comment on the rationale for implicating only the levamisole receptor rather than the nicotinic acetylcholine receptor in muscle cells.

      This was because we used a behavioral assay. Despite the fact that the homopentameric ACR- 16/N-AChR mediate about 2/3 of the peak currents in response to acute ACh application to the NMJ (e.g. Almedom et al., EMBO J, 2009), the acr-16 mutant has virtually no behavioral / locomotion phenotype. Likely, this is because the heteropentameric, UNC-29 containing L- AChR, while only contributing 1/3 of the peak current, desensitizes much more slowly and thus unc-29 mutants show a severe behavioral phenotype (uncoordinated locomotion, etc.). We thus did not expect a major effect when performing the behavoral assay in acr-16 mutants and thus chose the unc-29 mutant background.

      (8) In "we found no evidence  insertion into the PM (Yao et al., 2009)", It appears that the cited paper was not authored by any of the current manuscript. Please confirm whether this citation is correctly attributed.

      This sentence was arranged in a misleading way, we did not mean that we authored this paper. It was change in the text: “While a facilitating role of Flower in endocytosis appears to be conserved in C. elegans, in contrast to previous findings from Drosophila (Yao et al., 2009), we found no evidence that FLWR-1 conducts Ca2+ upon insertion into the PM.”


      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Seidenthal et al. investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions, albeit in an unexpected manner. The authors observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes, which differs from earlier studies in flies that suggested the Flower protein promotes the formation of bulk endosomes. This is a valuable finding. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons restored the aldicarb-resistant phenotype seen in flwr-1 mutants to wild-type levels. In contrast, FLWR-1 expression in cholinergic neurons in flwr-1 mutants did not restore aldicarb sensitivity, yet muscle expression of FLWR-1 partially but significantly recovered the aldicarb-resistant defects. The study also revealed that removing FLWR-1 leads to increased Ca<sup>2+</sup> signaling in motor neurons upon photo-stimulation. Further, the authors conclude that FLWR-1 contributes to the maintenance of the excitation/inhibition (E/I) balance by preferentially regulating the excitability of GABAergic neurons. Finally, SNG-1::pHluorin data imply that FLWR-1 removal enhances synaptic transmission, however, the electrophysiological recordings do not corroborate this finding.

      Strengths:

      This study by Seidenthal et al. offers valuable insights into the role of the Flower protein, FLWR-1, in C. elegans. Their findings suggest that FLWR-1 facilitates the breakdown of endocytic endosomes, which marks a departure from its previously suggested role in forming endosomes through bulk endocytosis. This observation could be important for understanding how Flower proteins function across species. In addition, the study proposes that FLWR-1 plays a role in maintaining the excitation/inhibition balance, which has potential impacts on neuronal activity.

      Weaknesses:

      One issue is the lack of follow-up tests regarding the relative contributions of muscle and GABAergic FLWR-1 to aldicarb sensitivity. The findings that muscle expression of FLWR-1 can significantly rescue aldicarb sensitivity are intriguing and may influence both experimental design and data interpretation. Have the authors examined aldicarb sensitivity when FLWR-1 is expressed in both muscles and GABAergic neurons, or possibly in muscles and cholinergic neurons? Given that muscles could influence neuronal activity through retrograde signaling, a thorough examination of FLWR-1's role in muscle is necessary, in my opinion.

      We thank the reviewer for this suggestion. Indeed, the retrograde inhibition of cholinergic transmission by signals from muscle has been demonstrated by the Kaplan lab in a number of publications. We have now done the experiments that were suggested, see the new Fig. S3B: rescuing FLWR-1 in cholinergic neurons and in muscle did not perform any better in the aldicarb assay, while co-rescue in GABAergic neurons and muscle, like rescue in GABA neurons, led to a complete rescue to wild type levels. Thus, retrograde signaling from muscle to neurons does not contribute to effects on the E/I imbalance caused by the absence of FLWR1. The fact that muscle rescue can partially rescue the flwr-1 phenotype is likely due a cellautonomous effect of FLWR-1 on muscle excitability, facilitating muscle contraction.

      Would the results from electrophysiological recordings and GCaMP measurements be altered with muscle expression of FLWR-1? Most experiments presented in the manuscript compare wild-type and flwr-1 mutant animals. However, without tissue-specific knockout, knockdown, or rescue experiments, it is difficult to separate cell-autonomous roles from non-cell-autonomous effects, in particular in the context of aldicarb assay results. Also, relying solely on levamisole paralysis experiments is not sufficient to rule out changes in muscle AChRs, particularly due to the presence of levamisole-resistant receptors.

      We repeated the Ca<sup>2+</sup> imaging in cholinergic neurons, in response to optogenetic activation, with expression of FLWR-1 in muscle, see Fig. 4E. This did not significantly alter the increased excitability of the flwr-1 mutant. Thus, we conclude that, along with the findings in aldicarb assays, the function of FLWR-1 in muscle is cell-autonomous, and does not indirectly affect its roles in the motor neurons. Also, cholinergic expression of FLWR-1 by itself reduced Ca<sup>2+</sup> levels to those in wild type (Fig. 4E). In addition, we now also assessed the contribution of the N-AChR (ACR-16) to aldicarb-induced paralysis (Fig. S3C), showing that flwr-1 and acr-16 mutations independently mediate aldicarb resistance, and that these effects are additive. Thus, FLWR-1 does not affect the expression level or function of the N-AChR, as otherwise, the flwr1; acr-16 double mutation would not exacerbate the phenotype of the single mutants.

      This issue regarding the muscle role of FLWR-1 also complicates the interpretation of results from coelomocyte uptake experiments, where GFP secreted from muscles and coelomocyte fluorescence were used to estimate endocytosis levels. A decrease in coelomocyte GFP could result from either reduced endocytosis in coelomocytes or decreased secretion from muscles. Therefore, coelomocytespecific rescue experiments seem necessary to distinguish between these possibilities.

      We have performed a rescue of FLWR-1 in coelomocytes to address this, and found that this fully recovered the CC GFP signals to wild type levels. Therefore, the absence of FLWR-1 in muscles does not affect exocytosis of GFP. The data can be found in Fig. 5A, B.

      The manuscript states that GCaMP was used to estimate Ca<sup>2+</sup> levels at presynaptic sites. However, due to the rapid diffusion of both Ca<sup>2+</sup> and GCaMP, it is unclear how this assay distinguishes Ca<sup>2+</sup> levels specifically at presynaptic sites versus those in axons. What are the relative contributions of VGCCs and ER calcium stores here? This raises a question about whether the authors are measuring the local impact of FLWR-1 specifically at presynaptic sites or more general changes in cytoplasmic calcium levels.

      We compared Ca<sup>2+</sup> signals in synaptic puncta versus axon shafts, and did not find any differences. The data previously shown have been replaced by data where the ROIs were restricted to synaptic puncta. The outcome is the same as before. These data are provided in Fig. 4A, B, E, F. We thus conclude that the impact of FLWR-1 is local, in synaptic boutons.

      The experiments showing FLWR-1's presynaptic localization need clarification/improvement. For example, data shown in Fig. 3B represent GFP::FLWR-1 is expressed under its own promoter, and TagRFP::ELKS-1 is expressed exclusively in GABAergic neurons. Given that the pflwr-1 drives expression in both cholinergic and GABAergic neurons, and there are more cholinergic synapses outnumbering GABAergic ones in the nerve cord, it would be expected that many green FLWR-1 puncta do not associate with TagRFP::ELKS-1. However, several images in Figure 3B suggest an almost perfect correlation between FLWR-1 and ELKS-1 puncta. It would be helpful for the readers to understand the exact location in the nerve cord where these images were collected to avoid confusion.

      Thank you for making us aware that the provided images may be misleading. We have now extended this Figure (Fig. 3A-C) and provided more intensity profiles along the nerve cords in Fig. S4A-C. The quantitative analysis of average R<sup>2</sup> for the two fluorescent signals in each neuron type did not show any significant difference between the two, also after choosing slightly smaller ROIs for line scan analysis. We also highlighted the puncta corresponding to FLWR-1 in both neurons types, as well as to ELKS-1 in each specific neuron type, to identify FLWR-1 puncta without co-localized ELKS-1 signal. Also, we indicated the region that was imaged, i.e. the DNC posterior of the vulva, halfway to the posterior end of the nerve cord.

      The SNG-1::pHluorin data in Figure 5C is significant, as they suggest increased synaptic transmission at flwr-1 mutant synapses. However, to draw conclusions, it is necessary to verify whether the total amount of SNG-1::pHluorin present on synaptic vesicles remains the same between flwr-1 mutant and wild-type synapses. Without this comparison, a conclusion on levels of synaptic vesicle release based on changes in fluorescence might be premature, in particular given the results of electrophysiological recordings.

      We appreciate the comment. We now added data and experiments that verify that the basal SNG-1::pHluorin signal in the plasma membrane, measured at synaptic puncta and in adjacent axonal areas, is not different in flwr-1 mutants compared to wild type in the absence of stimulation. This data can be found in Fig. S5A. In addition, we cultured primary neurons from transgenic animals to compare total SNG-1::pHluorin to the vesicular fraction, by adding buffers of defined pH to the external, or buffers that penetrate the cell and fix intracellular pH. These experiments (Fig. S5B, C) showed no difference in the vesicle fraction of the pHluorin signal in wild type vs. flwr-1 mutant cells, demonstrating that flwr-1 mutants do not per se have altered SNG-1::pHluorin in their SV or plasma membranes.

      Finally, the interpretation of the E74Q mutation results needs reconsideration. Figure 8B indicates that the E74Q variant of FLWR-1 partially loses its rescuing ability, which suggests that the E74Q mutation adversely affects the function of FLWR-1. Why did the authors expect that the role of FLWR-1 should have been completely abolished by E74Q? Given that FLWR-1 appears to work in multiple tissues, might FLWR-1's function in neurons requires its calcium channel activity, whereas its role in muscles might be independent of this feature? While I understand there is ongoing debate about whether FLWR1 is a calcium channel, the experiments in this study do not definitively resolve local Ca<sup>2+</sup> dynamics at synapses. Thus, in my opinion, it may be premature to draw firm conclusions about calcium influx through FLWR-1.

      Thank you for bringing this up. We did not expect E74Q to necessarily abolish FLWR-1 function, unless it would be a Ca<sup>2+</sup> channel. Of course the reviewer is right, FLWR-1 might have functions as an ion channel as well as channel-independent functions. Yet, we are quite confident that FLWR-1 is not an ion channel. Instead, we think that E74Q alters stability of the protein (however, in the absence of biochemical data, we removed this conclusion), and that this impairs the function of FLWR-1 as a modulator, or possibly even, accessory subunit of the PMCA MCA-3. This interaction was indicated by a new experiment we added, where we found that FLWR-1 and MCA-3 must be physically very close to each other in the plasma membrane, using bimolecular fluorescence complementation (see new Fig. 9A, B). This provides a reasonable explanation for findings we obtained, i.e. increased Ca<sup>2+</sup> levels in stimulated neurons of the flwr-1 mutant. If FLWR-1 acts as a stimulatory subunit of MCA-3, then its absence may cause reduced MCA-3 function and thus an accumulation of Ca<sup>2+</sup> in the synaptic terminals. In Drosophila, hyperstimulation of neurons led to reduced Ca<sup>2+</sup> levels (Yao et al., 2017, PLoS Biol 15: e2000931), suggesting that Flower is a Ca<sup>2+</sup> channel. Based on our findings, we suggest an alternative explanation. Based on proteomics, the PMCA is a component of SVs (Takamori et al., 2006, Cell 127: 831-846). Increased insertion of PMCA into the plasma membrane during high stimulation, along with impaired endocytosis in flower mutants, would increase the steadystate levels of PMCA in the PM. This could lead to reduced steady state levels of Ca<sup>2+</sup>. This ‘g.o.f.’ in Flower may also impact on Ca<sup>2+</sup> microdomains of the P/Q type VGCC required for SV fusion, which could contribute to the rundown of EPSCs we find during synaptic hyperstimulation (Fig. 5G-J). We acknowledge, though, that Yao et al. (2009, Cell 138: 947– 960), showed increased uptake of Ca<sup>2+</sup> into liposomes reconstituted with purified Flower protein. However, it cannot be ruled out that a protein contaminant could be responsible, as the controls were empty liposomes, not liposomes reconstituted with a mutated Flower protein purified the same way.

      We also tested the E74Q mutant in its ability to rescue the reduced PI(4,5)P<sub>2</sub> levels in coelomocytes (CCs), where we observed no positive effect. While we have not measured Ca<sup>2+</sup> in CCs, we would assume that here a function of FLWR-1 affecting increased PI(4,5)P<sub>2</sub> levels is not linked to a channel function. It was, nevertheless, compromised by E74Q (Fig. 8D).

      Also, the aldicarb data presented in Figures 8B and 8D show notable inconsistencies that require clarification. While Figure 8B indicates that the 50% paralysis time for flwr-1 mutant worms occurs at 3.5-4 hours, Figure 8D shows that 50% paralysis takes approximately 2.5 hours for the same flwr-1 mutants. This discrepancy should be addressed. In addition, the manuscript mentions that the E74Q mutation impairs FLWR-1 folding, which could significantly affect its function. Can the authors show empirical data supporting this claim?

      We performed the aldicarb assays in a consistent manner, but nonetheless note that some variability from day to day can affect such outcomes. Importantly, we always measured each control (wild type, flwr-1) along with each test strain (FLWR-1 point mutants), to ensure the relevant estimate of a point-mutant’s effect. These assays have been repeated, now including the FLWR-1 wild type rescue strain as a comparison. The data are now combined in Fig. 8B. Regarding the assumed instability of the E74Q mutant, as we, indeed, do not have any experimental data supporting this, we removed this sentence.

      Reviewer #2 (Public review):

      Summary:

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca<sup>2+</sup> channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions.

      Strengths:

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca<sup>2+</sup> dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca<sup>2+</sup> levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca<sup>2+</sup>-ATPase and PIP2 binding in FLWR-1's function.

      Weaknesses:

      (1) The observation that flwr-1 knockout increases Ca<sup>2+</sup> levels in motor neurons is notable, especially as it contrasts with prior findings in flies. The authors propose that elevated Ca<sup>2+</sup> levels in flwr-1 knockout motor neurons may stem from "deregulation of MCA-3" (a Ca<sup>2+</sup> ATPase in the plasma membrane) due to FLWR-1 loss. However, this conclusion relies on limited and somewhat inconclusive data (Figure 7). Additional experiments could clarify FLWR-1's role in MCA-3 regulation. For instance, it would be informative to investigate whether mutations in other genes that cause elevated cytosolic Ca<sup>2+</sup> produce similar effects, whether MCA-3 physically interacts with FLWR-1, and whether MCA-3 expression is reduced in the flwr-1 knockout.

      We thank the reviewer for bringing up these critical points. As to other mutations that produce elevated cytosolic Ca<sup>2+</sup>: Possible mutations could be g.o.f. mutations of the ryanodine receptor UNC-68, the sarco-endoplasmatic Ca<sup>2+</sup> ATPase, or mutants affecting VGCCs, like the L-type channel EGL-19 or the P/Q-type channel UNC-2. However, any such mutant would affect muscle contractions (as we have shown for r.o.f. mutations in unc-68, egl-19 and unc-2 in Nagel et al. 2005 Curr Biol 15: 2279-84) and thus would affect aldicarb assays (see aldicarb resistance induced by RNAi of these genes in Sieburth et al., 2005, Nature 436: 510). The same should be expected for g.o.f. mutations of any such gene. In neurons, we would expect increased or decreased Ca<sup>2+</sup> levels in response to stimulation.

      Regarding the physical interaction of MCA-3 and FLWR-1, we performed bimolecular fluorescence complementation, with two fragments of mVenus fused to the two proteins. This assay shows mVenus reconstitution (i.e., fluorescence) if the two proteins are found in close vicinity to each other. Testing MCA-3 and FLWR-1 in muscle indeed showed a robust signal, evenly distributed on the plasma membrane. As a control, FLWR-1 did not interact with another plasma membrane protein, the stomatin UNC-1 interacting with gap junction proteins (Chen et al., 2007, Curr Biol 17: 1334-9). FLWR-1 also interacted with the ER chaperone Nicalin (NRA2 in C. elegans), which helps assembling the TM domains of integral membrane proteins in association with the SEC translocon. However, this signal only occurred in the ER membrane, demonstrating the specificity of the BiFC assay. This data is presented in Fig. 9A, B. Additionally, we show that FLWR-1 expression has a function in stabilizing MCA-3 localization at synapses, which is also in line with the idea of a direct interaction (Fig. 9C, D).

      (2) In silico analysis identified residues R27 and K31 as potential PIP2 binding sites in FLWR-1. The authors observed that FLWR-1(R27A/K31A) was less effective than wild-type FLWR-1 in rescuing the aldicarb sensitivity phenotype of the flwr-1 knockout, suggesting that FLWR-1 function may depend on PIP2 binding at these two residues. Given that mutations in various residues can impair protein function non-specifically, additional studies may be needed to confirm the significance of these residues for PIP2 binding and FLWR-1 function. In addition, the authors might consider explicitly discussing how this finding aligns or contrasts with the results of a previous study in flies, where alanine substitutions at K29 and R33 impaired a Flower-related function (Li et al., eLife 2020).

      We further investigated the role of these two residues in an in vivo assay for PIP2 binding and membrane association of a reporter. We used the coelomocytes (CCs), in which a previous publication demonstrated that a GFP variant tagged with a PH domain would be recruited to the CC membrane (Bednarek et al., 2007, Traffic 8: 543-53). This assay was performed in wild type, flwr-1 mutants, and flwr-1 mutants rescued with wild type FLWR-1, the FLWR-1(E74Q) mutant, or the FLWR-1(K27A; R31A) double mutant. The data are shown in Fig. 8C, D. While the wild type FLWR-1 rescued PH-GFP levels at the CC membrane to the wild type control, the FLWR-1(K27A; R31A) double mutant did not rescue the reporter binding, indicating that, at least in CCs, reduced PIP2 levels are associated with non-functional FLWR-1. Mechanistically, this is not clear at present, though we noted a possible mechanism as found for synaptotagmin, that recruits the PIP2 kinase to the plasma membrane via a lysine and arginine containing motif (Bolz et al., 2023, Neuron 111: 3765-3774.e3767). We mention this now in the discussion. We also discussed our data with respect to the findings of Li et al., about the analogous residues K27, R31 (K29, R33) in the discussion section, i.e. lines 667-670, and the differences of our findings in electron microscopy compared to the Drosophila work (more rather than less bulk endosomes) were discussed in lines 713-720.

      (3) A primary conclusion from the EM data was that FLWR-1 participates in the breakdown, rather than the formation, of bulk endosomes (lines 20-22). However, the reasoning behind this conclusion is somewhat unclear. Adding more explicit explanations in the Results section would help clarify and strengthen this interpretation.

      We added a sentence trying to better explain our reasoning. Mainly, the argument is that accumulation of such endosomes of unusually large size is seen in mutants affecting formation of SVs from the endosome (in endophilin and synaptojanin mutants), while mutants affecting mainly endocytosis (dynamin) cause formation of many smaller endocytic structures that stay attached to the plasma membrane (Kittelmann et al., 2013, PNAS 110: E3007-3016). We changed our data analysis in that we collated the data for what we previously termed endosomes and large vesicles. According to the paper by Watanabe, 2013, eLife 2: e00723, endosomes are defined by their location in the synapse, and their size. However, this work used a much shorter stimulus and froze the preparations within a few dozens to hundreds of msec after the stimulus, while we used the protocol of Kittelmann 2013, which uses 30 sec stimulation and freezing after 5 sec. There, endosomes were defined as structures larger than SVs or DCVs, but no larger than 80 nm, with an electron dense lumen, and were very rarely observed. In contrast, large vesicles or ‘100 nm vesicles’, ranged from 50-200 nm diameter, with a clear lumen, were morphologically similar to the bulk endosomes as observed by Li et al., 2021. We thus reordered our data and jointly analyzed these structure as large vesicles / bulk endosomes. The outcome is still the same, i.e. photostimulated flwr-1 mutants showed more LVs than wild type synapses.

      (4) The aldicarb assay results in Figure 3 are intriguing, indicating that reduced GABAergic neuron activity alone accounts for the flwr-1 mutant's hyposensitivity to aldicarb. Given that cholinergic motor neurons also showed increased activity in the flwr-1 mutant, one might expect the flwr-1 mutant to display hypersensitivity to aldicarb in the unc-47 knockout background. However, this was not observed. The authors might consider validating their conclusion with an alternative approach or, at the minimum, providing a plausible explanation for the unexpected result. Since aldicarb-induced paralysis can be influenced by factors beyond acetylcholine release from cholinergic motor neurons, interpreting aldicarb assay results with caution may be advisable. This is especially relevant here, as FLWR-1 function in muscle cells also impacts aldicarb sensitivity (Figure S3B). Previous electrophysiological studies have suggested that aldicarb sensitivity assays may sometimes yield misleading conclusions regarding protein roles in acetylcholine release.

      We tested the unc-47; flwr-1 animals again at a lower concentration of aldicarb, to see if the high concentration may have leveled the differences between unc-47 animals and the double mutant. This experiment is shown in Fig. S3D, demonstrating that the double mutant is significantly less resistant to aldicarb. This verifies that FLWR-1 acts not only in GABAergic neurons, but also in cholinergic neurons (as we saw by electron microscopy and electrophysiology), and that the increased excitability of cholinergic cells leads to more acetylcholine being released. In the double mutant, where GABA release is defective, this conveys hypersensitivity to aldicarb.

      (5) Previous studies have suggested that the Flower protein functions as a Ca<sup>2+</sup> channel, with a conserved glutamate residue at the putative selectivity filter being essential for this role. However, mutating this conserved residue (E74Q) in C. elegans FLWR-1 altered aldicarb sensitivity in a direction opposite to what would be expected for a Ca<sup>2+</sup> channel function. Moreover, the authors observed that E74 of FLWR1 is not located near a potential conduction pathway in the FLWR-1 tetramer, as predicted by Alphafold3. These findings raise the possibility that Flower may not function as a Ca<sup>2+</sup> channel. While this is a potentially significant discovery, further experiments are needed to confirm and expand upon these results.

      As above, we do not exclude that FLWR-1 may constitute a channel, however, based on our findings, AF3 structure predictions and data in the literature, we are considering alternative explanations for the observed effect on Ca<sup>2+</sup> levels of Flower mutants in worms and flies. The observations of increase Ca<sup>2+</sup> levels in stimulated flwr-1 mutant neurons could result from a reduced stimulation of the PMCA, and this was also observed with low stimulation in Drosophila (Yao et al., 2017). This idea is supported by the indications of a direct physical interaction, or proximity, of the two proteins. The reduced Ca<sup>2+</sup> levels after hyperstimulation of Drosophila Flower mutants may have to do with increased levels of non-recycling PMCA in the plasma membrane, indicating that PMCA requires Flower for recycling. This could be underlying the rundown of evoked PSCs we find in worm flwr-1 mutants, and would also be in line with a function of FLWR-1 and MCA-3 in coelomocytes, cells that constantly endocytose, and in which both proteins are required for proper function (our data, Figs. 5A, B; 8D, E) and Bednarek et al., 2007 (Traffic 8: 543-553). CCs need to recycle / endocytose membranes and membrane proteins, and such proteins, likely including FLWR-1 and MCA-3, need to be returned to the PM effectively.

      We thus refrained from testing a putative FLWR-1 channel function in Xenopus oocytes, in part also because we would not be able to acutely trigger possible FLWR-1 gating. A constitutive Ca<sup>2+</sup> current, if it were present, would induce large Cl<sup>-</sup> conductance in oocytes, that would likely be problematic / killing the cells. The demonstration that FLWR-1(E74Q) does not rescue the PI(4,5)P<sub>2</sub> levels in coelomocytes is also more in line with a non-channel function of FLWR-1.

      (6) Phrases like "increased excitability" and "increased Ca<sup>2+</sup> influx" are used throughout the manuscript. However, there is no direct evidence that motor neurons exhibit increased excitability or Ca<sup>2+</sup> influx. The authors appear to interpret the elevated Ca<sup>2+</sup> signal in motor neurons as indicative of both increased excitability and Ca<sup>2+</sup> influx. However, this elevated Ca<sup>2+</sup> signal in the flwr-1 mutant could occur independently of changes in excitability or Ca<sup>2+</sup> influx, such as in cases of reduced MCA-3 activity. The authors may wish to consider alternative terminology that more accurately reflects their findings.

      Thank you, we rephrased the imprecise wording. Ca<sup>2+</sup> influx was meant with respect to the cytosol.

      Reviewer #3 (Public review):

      Summary:

      Seidenthal et al. investigated the role of the Flower protein, FLWR-1, in C. elegans and confirmed its involvement in endocytosis within both synaptic and non-neuronal cells, possibly by contributing to the fission of bulk endosomes. They also uncovered that FLWR-1 has a novel inhibitory effect on neuronal excitability at GABAergic and cholinergic synapses in neuromuscular junctions.

      Strengths:

      This study not only reinforces the conserved role of the Flower protein in endocytosis across species but also provides valuable ultrastructural data to support its function in the bulk endosome fission process. Additionally, the discovery of FLWR-1's role in modulating neuronal excitability broadens our understanding of its functions and opens new avenues for research into synaptic regulation.

      Weaknesses:

      The study does not address the ongoing debate about the Flower protein's proposed Ca<sup>2+</sup> channel activity, leaving an important aspect of its function unexplored. Furthermore, the evidence supporting the mechanism by which FLWR-1 inhibits neuronal excitability is limited. The suggested involvement of MCA-3 as a mediator of this inhibition lacks conclusive evidence, and a more detailed exploration of this pathway would strengthen the findings.

      We added new data showing the likely direct interaction of FLWR-1 with the PMCA, possibly upregulating / stimulating its function. This data is shown now in Fig. 9A, B. Also, we show now that FLWR-1 is required to stabilize MCA-3 expression / localization in the pre-synaptic plasma membrane (Fig. 9C, D). These findings are not supporting the putative function of FLWR-1 as an ion channel, but suggest that increased Ca<sup>2+</sup> levels following neuron stimulation in flwr-1 mutants are due to an impairment of MCA-3 and thus reduced Ca<sup>2+</sup> extrusion.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors might consider focusing on one or two key findings from this study and providing robust evidence to substantiate their conclusions.

      We did substantiate the interactions of FLWR-1 and the PMCA, as well as assessing the function of FLWR-1 in the coelomocytes and the function of FLWR-1 in regulating PIP2 levels in the plasma membrane.

      Reviewer #3 (Recommendations for the authors):

      (1) Behavioral Analysis of Locomotion

      In Figure 1, the authors are encouraged to examine whether flwr-1 mutants show altered locomotion behaviors, such as velocity, in a solid medium.

      We performed such an analysis for wild type, comparing to flwr-1 mutants and flwr-1 mutants rescued with FLWR-1 expressed from the endogenous promoter. The data are shown in Fig. S1C. There was no difference. We note that we observed differences in swimming assays also only when we strongly stimulated the cholinergic neurons by optogenetic depolarization, but not during unstimulated, normal swimming.

      (2) Validation of FLWR-1 Tagging

      In Figure 2A, it is recommended that the authors confirm the functionality of the C-terminal-tagged FLWR-1.

      We performed such rescue assays during swimming. The data is shown in Fig. S2S, E. While the GFP::FLWR-1 animals were slightly affected right after the photostimulation, they quickly caught up with the wild type controls, while flwr-1 mutants remained affected even after several minutes.

      (3) Explanation of Differential Rescue in GABAergic Neurons and Muscle

      The authors should provide a rationale for why restoring FLWR-1 in GABAergic neurons fully rescues the aldicarb resistance phenotype, while its restoration in muscle also partially rescues it.

      We think that these effects are independent of each other, i.e. loss of FLWR-1 in muscles increases muscular excitability, which becomes apparent in the behavioral assay that depends on locomotion and muscle contraction. To assess this further, we performed combined GABAergic neuron and muscle rescue assays, as shown in Fig. S3B. The double rescue was not different from wild type, and performed better than the muscle rescue alone.

      (4) Rescue Experiments for Swimming Defect in GABAergic Neurons

      Consider adding rescue experiments to determine whether expressing FLWR-1 specifically in GABAergic neurons can restore the swimming defect phenotype.

      We did not perform this assay as swimming is driven by cholinergic neurons, meaning that we would only indirectly probe GABAergic neuron function and a GABAergic FLWR-1 rescue would likely not improve swimming much. Also, given the importance of the correct E/I balance in the motor neurons, it would likely require achieving expression levels that are very precisely matching endogenous expression levels, which is not possible in a cell-specific manner.

      (5) Further Data on GCaMP Assay for mca-3; flwr-1 Additive Effect

      The additive effect of the mca-3 and flwr-1 mutations on GCaMP signals requires further data for substantiation. Additional GCaMP recordings or statistical analysis would provide stronger support for the proposed interaction between MCA-3 and FLWR-1 in calcium signaling.

      Thank you. We increased the number of observations, and could thus improve the outcome of the assay in that it became more conclusive. Meaning, the double mutation was not exacerbating the effect of either single mutant, demonstrating that FLWR-1 and MCA-3 are acting in the same pathway. The data are in Fig. 7B, C.

      (6) Inclusion of Wild-Type FLWR-1 Rescue in Figures 8B and 8D

      Figures 8B and 8D would benefit from the inclusion of wild-type FLWR-1 as a rescue control.

      We included the FLWR-1 wild type rescue as suggested and summarized the data in Fig. 8B.

    1. eLife Assessment

      In this important work, a quantitative analysis method for three-dimensional morphogenetic processes during embryonic development is introduced. The proposed method is a pipeline combining several methods, allowing quantitative analysis of developmental processes without cell segmentation and tracking. Upon application of their method, the authors obtain convincing evidence that ascidian gastrulation is a two-step process. This work should be of interest to a broad range of developmental biologists who aim to obtain a quantitative understanding of morphogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors propose a new method to quantitatively assess morphogenetic processes during organismal development. They apply their method to ascidian morphogenesis and thus find that gastrulation is a two-step process.

      The method applies to morphogenetic changes of surfaces. It consists of the following steps: first, surface deformations are quantified based on microscopy images without requiring cellular segmentation and tracking. This is achieved by mapping, at each time point, a polygonal mesh initially defined on a sphere to the surface of the embryo. The mapped vertices of this polygonal mesh then serve as (Lagrangian) markers for the embryonic surface. From these, one can infer the deformation of the surface, which can be expressed in terms of the strain tensor at each point of the surface. Changes in the strain tensor give the strain rate, which captures the morphogenetic processes. Second, at each time point, the strain rate field is decomposed in terms of spherical harmonics. Finally, the evolution of the weights of the various spherical harmonics in the decomposition is analysed via a wavelet analysis. The authors apply their workflow to ascidian development between 4 and 8.7 hpf. From their analysis they find clear indications for gastrulation and neurulation and identify two sub-phases of gastrulation, namely, endoderm invagination and 'blastophore closure'.

      Strengths:

      The combination of various tools allows the authors to obtain a quantitative description of the developing embryo without the necessity of identifying fiducial markers. Visual inspection shows that their method works well. Furthermore, this quantification then allows for an unbiased identification of different morphogenetic phases.

      Weaknesses:

      At times, the explanation of the method is hard to follow, unless the reader is already familiar with concepts like level-set methods or wavelet transforms. Furthermore, the software for performing the determination of Lagrangian markers or the subsequent spectral analysis does not seem to be available to the readers.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors proposed a method to quantitatively analyze 3D live imaging data of early developing embryos, using the ascidian development as an example. For this purpose, the previously proposed level set method was used to computationally track the temporal evolution of reference points introduced on the embryo surface. Then, from the obtained three-dimensional trajectories, the velocity field was obtained, from which the strain rate field was computed. The strain rate field was analyzed using spherical harmonics.

      In this paper, the authors focused on the modes with lower order with real coefficients. The time evolution of these modes was analyzed using wavelet transforms. The results obtained by the pipeline reflected the developmental stages of ascidian embryos.

      Strengths:

      In this way, this manuscript proposes a pipeline of analyses combining various methods. The strength of this method lies in its ability to quantitatively analyze the deformation of the entire embryo without the requirement for cellular segmentation and tracking.

      Weaknesses:

      The mathematics behind this method is not straightforward to understand. The value of this method will be understood as analyses of real data using this method accumulate.

      Comments on revised version:

      I have reviewed the revised manuscript and the reply from the authors. All concerns have been addressed appropriately.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      (1) Figure 2 is mentioned before Figure 1

      We thank the reviewer for pointing this out, this was a mistake. What was meant by Figure 2 was actually Figure 1. This has been corrected in the manuscript.

      (2) Figure 1c: red is used to indicate cell junctions on raw data, but also the error.

      The color red is used to indicate cell junctions on raw data on figure 1c left, while it is used to indicate the error on figure 1c right.

      The Lagrangian error can be negative right? This is not reflected by the error scale which goes from 0% to 100%

      A negative Lagragian error would mean that the distance between real and simulated cellular junctions decreased over time. We effectively treat this case as if there was no displacement, and the error is hence 0%.

      Why do you measure the error in percent?

      The error is measured in percentages because it is relative to the apical length of a cell.

      (3) Figure 2: The distinction between pink and red in e_2(t) is very difficult. What do the lines indicate?

      The lines indicate directions of the eigen vectors of the strain rate tensor at every material particle of the embryo.

      (4) L156 "per unit length": Rather per unit time?

      We thank the reviewer for pointing this out. We apologize for this mistake. "per unit length" has been changed to "per unit time"

      (5) L159 "Eigen vectors in this sense": is there another sense?

      "In this sense" is referring to the geometric description of eigen vectors. The phrase has been removed

      (6) L164 "magnitude of the rate of change underwent by a particle at the surface of the embryo in the three orthogonal spatial directions of most significant rate of change."

      Would a decomposition in two directions within the surface's tangent plane and one perpendicular to it not be better?

      We also performed the decomposition of the strain rate tensor as suggested within the surface's tangent plane and one perpendicular to it, but did not notice any tangible differences in the overall analysis, especially after derivation of the scalar field.

      (7) L174 "morphological activity": I think this notion is never defined

      By morphological activity we mean any noticeable shape changes

      (8) L177: I did not quite understand this part

      This part tries to convey that the scalar strain rate field evidences coordinated cell behaviors by highlighting wide regions of red that traverse cell boundaries (e.g. fig.2b, $t=5.48hpb$). At the same time, the strain rate field preserves cell boundaries, highlighted by bands of red at cellular intersections, when cell coordinated cell behaviors are not preponderant (e.g. fig.2b, $t=4hpb$).

      (9) Ll 194 "Unsurprisingly, these functions play an important role in many branches of science including quantum mechanics and geophysics Knaack and Stenflo (2005); Dahlen and Tromp (2021)." Does this really help in understanding spherical harmonics?

      This comment was made with the aim of showing to the reader that Spherical Harmonics have proved to be useful in other fields. Although it does not help in understanding spherical harmonics, it establishes that they can be effective.

      (10) Figure 3a: I do not find this panel particularly helpful. What does the color indicate? What are the prefactors of the spherical harmonics?

      This panel showcases the restriction of the strain rate scalar field to the spherical harmonics with the l and m specified. Each material particle of the embryo surface at the time  is colored with respect to the value of . The values are computed according to equation 2 and are showcased in figure 3c.

      (11) L 265: Please define "scalogram" as opposed to a spectrogram.

      Scalograms are the result of wavelet transforms applied to a signal. Although spectrogram can specifically refer to the spectrum of frequencies resulting for example from a Fourier transform, the term can also be used in a broader sense to designate any time-frequency representation. In the context of this paper, we used it interchangeably with scalogram. We have changed all occurrences of spectrogram to scalogram in the revised manuscript.

      (12) L 299 "the analysis was carried out the 64-cell stage.": Probably 'the analysis was carried out at the 64-cell stage'

      We thank the reviewer for pointing this out. The manuscript was revised to reflect the suggested change.

      (13) L 340 "Another outstanding advantage over traditional is": Something seems to be missing in this sentence.

      We thank the reviewer for pointing this out. We have modified the sentence in the revised manuscript. It now reads “Another outstanding advantage of our workflow over traditional methods is that our workflow is able to compress the story of the development ... ”.

      (14) Ll 357 "on the one hand, the overall spatial resolution of the raw data, on the other hand, the induced computational complexity.": Is there something missing in this sentence

      The sentence tries to convey the idea that in implementing our method, there is a comprise to be made between the choice of the number of particles on the constructed mesh and the computational complexity induced by this choice. There is also a comprise to be made between this choice of the number of particles and the spatial resolution of the original dataset.

      Reviewer 2:

      (1) The authors should clearly state to which data this method has been applied in this paper. Also, to what kind of data can this method be applied? For instance, should the embryo surface be segmented?

      The method has been applied on 3D+time imaging data of ascidian embryonic development data hosted on the morphonet (morphonet.org) platform. The data on the morphonet platform comes in two formats: closed surface meshes of segmented cells spatially organized into the embryo, and 3D voxelated images of the embryo. The method was first designed for the former format and then extended to the later. There is no requirement for the embryo surface to be segmented.

      (2) In this paper, it is essential to understand the way that the authors introduced the Lagrangian markers on the surface of the embryo. However, understanding the method solely based on the description in the main text was difficult. I recommend providing a detailed explanation of the methodology including equations in the main text for clarity.

      We believe that adding mathematical details of the method into the text will cloud the text and make it more difficult to understand. Interested readers can refer to the supplementary material for detailed explanation of the method.

      (3) In eq.(1) of the supplementary information, d(x,S_2(t)) could be a distance function between S_1 and S_2 although it was not stated. How was the distance function between the surfaces defined?

      What was meant here was d(x,S_1(t)) where x is a point of S_2(t). d(x,S_1(t)) referring to the distance between point x and S_1(t). The definition of the distance function has been clarified in the supplementary information.

      (4) In the section on the level set scheme of supplementary information, the derivation of eq.(4) from eq.(3) was not clear.

      We added an intermediary equation for clarification.

      (5) Why is a reference shape S_1(0) absent at t=0?

      A reference shape S_1(0) is absent at t=0 precisely because that is what we are trying to achieve: construct an evolving Lagrangian surface S_2(t) matching S_1(t) at all times.

      (6) In Figure 2(a), it is unclear what was plotted. What do the colors mean? A color bar should be provided.

      The caption of the figure describes the colors: “a) Heatmap of the eigenvector fields of the strain rate tensor. Each row represents a vector field distinguished by a distinct root color (\textit{yellow, pink, white}). The gradient from the root color to red represents increasing magnitudes of the strain rate tensor.”

      (7) With an appropriate transformation, it would be possible to create a 2D map from a 3D representation shown in for instance Figure 2. Such a 2D representation would be more tractable for looking at the overall activities.

      We thank the reviewer for pointing this out. In Figure 4b of the supplementary information, we provide a 2D projection of the scalar strain rate field.

      (8) The strain rate is a second-order tensor that contains rich information. In this paper, the information in the tensor has been compressed into a scalar field by taking the square root of the sum of the squares of the eigenvalues. However, such a representation may not distinguish important events such as stretching and compression of the tissue. The authors should provide appropriate arguments regarding the limitations of this analysis.

      The tensor form of the strain rate field is indeed endowed with more information than the scalar eigen value field derived. However, our objective in this project was not to exhaust the richness of the strain rate tensor field but rather to serve as a proof of concept that our global approach to studying morphogenesis could in fact unveil sufficiently rich information on the dynamical processes at play. Although not in the scope of this project, a more thorough exploration of the strain rate tensor field could be the object of future investigations.

      (9) The authors claimed that similarities emerge between the spatiotemporal distribution of morphogenesis processes in the previous works and the heatmaps in this work. Some concrete data should be provided to support this claim.

      All claims have been backed with references to previous works. For instances, looking at figure 2b, the two middle panels on the lower row (5.48hpf, 6.97hpf), we explained that the concentration of red refers respectively to endoderm invagination during gastrulation, and zippering during neurulation [we cited Hashimoto et al. (2015)]. Here, we relied on eye observation to spot the similarities. The rest of the paper provides substantial and robust additional support for these claims using spectral decomposition in space and time.

      (10) The authors also claimed that "A notable by-product of this scalar field is the evidencing of the duality of the embryo as both a sum of parts constituted of cells and an emerging entity in itself: the strain rate field clearly discriminates between spatiotemporal locations where isolated single cell behaviours are preponderant and those where coordinated cell behaviours dominate." The authors should provide specific examples and analysis to support this argument.

      Here, we relied on eye observation to make this claim. This whole section of the paper “Strain rate field describes ascidian morphogenesis” was about computing, plot and observing the strain rate field.

      However, specific examples were provided. This paragraph was building towards this statement, and the evidence was scattered through the paragraph. We have now revised the sentence to ensure that we highlight specific examples:

      “A notable by-product of this scalar field is the evidencing of the duality of the embryo as both a sum of parts constituted of cells and an emerging entity in itself: the strain rate field clearly discriminates between spatiotemporal locations where isolated single cell behaviours are preponderant (e.g. fig.2b, $t=4hpb$) and those where coordinated cell behaviours dominate (e.g. fig.2b, $t=5.48hpb$).”

      (11) The authors should provide the details of the analysis method used in Figure 3b, including relevant equations. In particular, it would be helpful to clarify the differences that cause the observed differences between Figure 3b and Figure 3c.

      Figure 3b was introduced with the sentence: “In analogy to Principal Components Analysis, we measure the average variance ratio over time of each harmonic with respect to the original signal (Fig.3b).” explaining the origin of variance ratio values used in figure 3b. We have now added the mathematical expression to further clarify.

      (12) The authors found that the variance ratio of Y_00 was 64.4%. Y_00 is a sphere, indicating that most of the activity can be explained by a uniform activity. Which actual biological process explains this symmetrical activity?

      The reviewer makes a good point which also gave us a lot to think about during the analysis. Observing that the contribution of Y00 peaks during synchronous divisions, which are interestingly restricted only to the animal pole, we conjecture that localized morphological ripples and can be felt throughout the embryo. 

      (13) The contribution of other spherical harmonics than Y_00 and Y_10 should be shown.

      Other spherical harmonics contributed individual to less than 1% and we did not find it important to include them in the main figure. We will add supplementary material.

    1. eLife Assessment

      This important manuscript by Genzoni et al. reports the striking discovery of a regulatory role for trophic eggs in ant caste determination. Prior to this study, trophic eggs were widely assumed to play only a nutritional role in the colony, but this compelling study shows that trophic eggs can suppress queen development, and therefore regulate caste determination in specific social contexts.

    2. Reviewer #2 (Public review):

      The revised manuscript by Genzoni et al. reports the striking discovery of a regulatory role for trophic eggs. Prior to this study, trophic eggs were widely assumed to play a nutritional role in the colony, but this study shows that trophic eggs can suppress queen development, and therefore, can play a role in regulating caste determination in specific social contexts. In this revised version of the manuscript, the authors have addressed many of the concerns raised in the first version regarding the lack of sufficient information and context in the Introduction and Discussion. I have several (mostly minor) comments I would like the authors to address:

      Comments:

      (1) The authors' experimental design is based on the comparison of a larva-only (control) versus larva+3 trophic eggs (treatment). The authors convincingly show that the larva plus 3 trophic eggs treatment has an inhibitory effect versus larva-only control. However, the authors should have also done a treatment composed of larva + 3 viable eggs to determine if the inhibitory effect observed on queens is specific to trophic eggs or whether it is an inhibitory effect of all eggs. This has had important mechanistic consequences, because if the inhibitory effect is specific to trophic eggs, it means there are specific inhibitory factors deposited in trophic eggs during oogenesis and the differences observed between trophic versus viable eggs are meaningful beyond just nutritional differences. If the inhibitory effect is a property of all eggs, then the inhibitory factor is dumped into all eggs and the differences observed between trophic and viable eggs are related to something else. In all cases, this reviewer is not necessarily asking that they perform this additional treatment, but the authors have to be clear in the text that they cannot claim that the inhibitory effect is specific to trophic eggs alone without doing this experiment.

      (2) The other untested assumption the authors are making is that queen-laid trophic eggs would behave the same as worker-laid trophic eggs. This is apparent in the Discussion (line 422). They should instead highlight the interesting question of whether worker-laid trophic eggs would be similar in composition and have the same effect on caste as queen-laid eggs.

      (3) To this reviewer, they are missing a crucial explanation in the discussion. As far as this reviewer knows, young queens produce a higher proportion of trophic eggs than older queens, meaning that trophic egg production decreases with age of the queen. This raises the possibility that trophic eggs may, in part, function to prevent the production of more virgin queens in young and immature colonies with small colony sizes. This would allow colonies to invest in producing more workers at a time when rapidly expanding the colony is crucial in young colonies' life. Production of trophic eggs, therefore, may have a dual function: one for nutrition and larval survival, and one in suppressing queen development in immature young colonies. It can be said then that trophic eggs can regulate / influence caste determination in specific social / life history contexts of the colony, rather than only proposing that trophic eggs are a constant attempt by the queen to manipulate her offspring. I prefer the superorganism explanation, but readers should at least hear explanations at the individual and superorganism scales as a way of explaining the authors' discovery that trophic eggs suppress further queen development.

      (4) Why did the authors change the wording from caste "determination" to caste "differentiation." Determination is more appropriate because the trophic eggs do not affect morphogenesis of queens or workers, but rather the developmental switch between queens and workers.

      (5) Khila and Abouheif (2008) is listed in the References but not cited in the text.

      (6) On Line 70-81: "...may play a role in the regulation of body size" - I think the authors are trying to be broad in their language here since one study showed trophic eggs increased worker size but didn't induce queens, but this statement implies that the hypothesis is that trophic eggs act via body size to affect caste. Since the authors don't measure body size changes, only binary caste outcome, this is not the best way to set up the question. Could instead just conclude that previous work shows an effect on both caste and body size.

      (7) Paragraph beginning line 432: this paragraph seems out of place, not well connected to previous parts of discussion. It introduces the term "egg cannibalism" without defining it - not clear if this is meant as a synonym for eating of trophic eggs, or broader (i.e., eating viable eggs also). Could either remove the paragraph, or better set up the context that egg-eating behaviour is common in ants, could have evolved for worker policing reasons and/or for nutritional exchange, trophic eggs (and potentially co-option of trophic eggs for caste determination functions) presumably evolved in this context of existing egg-eating behaviour.

      (8) Line 41: Should read 'play an important part.

      (9) Line 51: The food that was given is listed, but there is no information about the quantity of food given.

      (10) Line 74: The paragraph states that queens were isolated for 16 hours per day. However, it lacks a clear reason for this specific duration. Why 16 hours? Could this isolation period have impacted egg quality or larval development?

      (11) Line 76: The eggs were collected every 8 hours and then held for 10 days until hatching. This is a very long time for eggs to be held outside of the normal colony environment. This could have a large impact on the viability of the eggs, and the resulting larvae.

      (12) Line 78: twice "that" in "suggested that that the larger castes"

      (13) Lines 96-97: the following sentence is unclear: "The question mark indicates that it is unclear whether about the evidence for the production trophic eggs by queens and workers"

      (14) Line 209: By simply stating "binomial GLMM," the authors are leaving out a crucial piece of information. Readers cannot fully understand how the model was fitted or how the coefficients should be interpreted without knowing the link function. Therefore, the critique is that for complete and replicable science, the link function must be reported.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosis have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether miRNA, proteins, triglycerides, glycogen levels in trophic eggs is what suppresses queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldober 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite as was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context to the study. We now provide more information on the life-history of ants and explained under what situations queens and workers may produce trophic eggs. We also mentioned that some ants such as Crematogaster smithi have a special caste of “large workers” which are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Goby and Ito (200) where the authors show that trophic eggs may play an important role in food distribution withing the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We have now provided information at the very beginning of the material and method section that the queens had been collected in populations known not to have dependentlineage (genetic caste determining) mechanisms of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not with the other Pogonomyrmex species. For some genera the species were also not listed in alphabetic order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      COMMENTS ON INTRODUCTION:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosis specifically. This is important because some colonies of Pogonomyrmex rugosis have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the error family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that it would be easier to have the information on whether queens and workers produce trophic on the branches of the Tree. However, having the information on the branches would suggest that the “trait” evolved on this part of the tree. As we do not know when worker or queen production of trophic eggs exactly evolved, we prefer to keep the figure as it is.

      Finally, we have also removed the none in the figure as suggested by the reviewer and discussed in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii: Masuko 2003).

      COMMENTS ON MATERIALS AND METHODS:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs because under natural conditions 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar natural condition.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens were isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, it is possible to reliably determine the proportion of viable and trophic eggs only by isolating queens. And importantly the main aim of these experiments was not to precisely determine the proportion viable and trophic eggs, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up and as soon as we disturb the nest, a worker takes them all and runs away with them). Moreover, it is possible that workers preferentially eat one type of eggs thus requiring to remove eggs as soon as queens would have laid them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to respond what was the effect of disturbance on the number and type of eggs laid. But again our aim was not to precisely determine these values but determine whether there was an effect of hibernation on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced in these colonies, but all first instar larvae have been introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Trophic eggs that would have been left over from the source colony should have been eaten within the first few days after creating the recipient colonies. However, even if some trophic eggs would have remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggests that the mechanism is in fact a different abundance of trophic eggs laid by queens. There is currently no information when exactly caste determination occurs during development

      COMMENTS ON RESULTS:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid”

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information in the main text that while viable eggs showed embryonic development at 25 and 65 hours (Fig 12 B, C) there was no such development for trophic eggs (Fig. 2 E,F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      The survival is given for the larvae without considering caste. We have changed the de X axis of panel B and reworded the Figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and more easy to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egglaying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No the data were not shown and we do not have excellent data to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that 4 colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2,1,2,1, see supplementary Table 2). So, it is likely that it is just by chance that these colonies produced only gynes.

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did that because there many different miRNA species, reflected by the fact that there is not just one size peak but multiple one. This is why we looked at size distribution

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we do not anymore have colonies in the laboratory so this is not possible.

      COMMENTS ON DISCUSSION:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two mentioned studies there were other factors that co-varied with variation in the quantity of trophic eggs. We mentioned that and suggested that it would be useful to conduct experimental manipulation of the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We made some suggestions of experiments that should be performed in the future

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph to make suggestions of future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. eLife Assessment

      This paper introduces an important theoretical method for characterizing symmetries of cells in biological tissues by capturing their real shape, and it sets results in contrast to related methods. The robustness of the paper's method to correctly capture dynamic and geometric changes that the cells may undergo is determined by convincing computational models, but the experimental support is incomplete and would benefit from better experimental imaging with higher quality and extended analysis. This would not only support the advantage of this method, but also strengthen its application to biological systems.

    2. Reviewer #1 (Public review):

      Summary:

      The authors' stated aim is to introduce so-called Minkowski tensors to characterize and quantify the shape of cells in tissues. The authors introduce Minkowski tensors and then define the p-atic order q<sub>p</sub>, where p is an integer, as a cell shape measure. They also introduce a previously defined measure of p-atic order in the form of the parameter γ<sub>p</sub>. The authors compute q<sub>p</sub>p for data obtained by simulating an active vertex model and a multiphase field model, where they focus on p=2 and p=6 - nematic and hexatic order - as the two values of highest biological relevance. Based on their analysis, the authors claim that q<sub>2</sub> and q<sub>6</sub> are independent, that there is no crossover for the coarse-grained quantities, that the comparison of q<sub>p</sub> for different values of p is not meaningful, and determine the dependence of the mean value of q<sub>2</sub> and q<sub>6</sub>q<sub>6</sub> on cell activity and deformability. They then apply their method to data from MDCK monolayers and argue that the γ<sub>p</sub> "fail to capture the nuances of irregular cell shapes".

      Strength:

      The work presents a set of parameters that are useful for analyzing cell shape.

      Weaknesses:

      The main weakness of the manuscript is that the points that the authors make are not sufficiently elaborated or supported by the data. Although they start out with Minkowski tensors, they eventually only consider the parameters q<sub>p</sub>, which can be defined without any recourse to Minkowski tensors. Also, I dare to doubt that the average reader will benefit from the introduction to Minkowski tensors as it remains abstract and does not really go beyond repeating definitions. Eventually, for me, the work boils down to the statement that when you want to characterize (2d) cell shape, then it is better to take the whole cell contour instead of only the positions of the vertices of a polygon that approximates the full cell shape. By the way, for polygons, the q<sub>p</sub> and γ<sub>p</sub> should convey the same information as the vertex positions contain the whole geometric information.

      Some statements made about the values of q<sub>p</sub> are not supported by the data. For example, an independence of values of q<sub>2</sub> and q<sub>6</sub> cannot be inferred from Figure 7. Actually, Figure 8 points to some dependence between these values as the peaks of the pdfs move in the opposite direction as deformability and activity are changed. Figure 1 suggests that in general, larger cells have lower values of q<sub>p</sub> for all p. Some more serious quantification should be obtained here.

      The presented experimental data on MDCK cells is anecdotal.

    3. Reviewer #2 (Public review):

      Summary:

      Orientational symmetries of cells and tissues play an important role in describing processes in development and disease, and the methods used to investigate them rely on the detection of cell shape. In this interesting and very timely manuscript by Lea Happel et al., Minkowski tensors are introduced to study the orientational symmetries of cells and set in comparison to existing shape descriptors, such as the shape function introduced by Armengol-Collado et al., which captures the orientational symmetry by the vertex positions of the polygonal shape of the cell. As an advantage, the Minkowski tensors consider the real cell shape with its arbitrary curvature of the cortex. Using computational models, such as the active vertex model and the multiphase field model, as well as experimental support with MDCK monolayers, the authors find that the orientational symmetries are independent of one another, as well as that they are dependent on the activity and deformability of the cells, resulting in a monotonic trend. A trend that has not been observed for the hexatic symmetry using the shape function. Together with the lack of hexatic-nematic crossover at the tissue scale, the authors suggest a reconsideration of findings from other shape descriptors. Taken together, the Minkowski tensors set a framework to investigate orientational symmetries at a single cell scale and how they may interplay in biological tissues.

      Strengths:

      The authors introduce the Minkowski tensors, which capture the p-atic orders of cells in tissues, considering their real shape instead of a polygonal approximation as reported for other shape descriptors in the literature. Thus, they do not depend on the vertex positions of the cells nor on the number of neighboring cells. The Minkowski tensors capture the dependence of the p-atic orders on the cell activity and deformability in a monotonic manner, which makes them a robust tool for quantifying p-atic orders at a single-cell scale, especially for rounded cells. The robustness has been tested by comparing the results of two computational model systems that simulate cell monolayers and whose results have been extended with experimental data. The Minkowski tensors have been used to explore the role of cell-cell adhesion and density in epithelial cells and have shown similar results to the shape function, a polygonal shape descriptor.

      Weaknesses:

      The authors point out the importance of studying the orientational order in biological systems. However, the current version of the manuscript lacks statistical information, a description of analysis methods, and experimental support. This support is needed to strengthen (i) the results of the two computational models and (ii) give weight to the authors' strong claim against other widely accepted shape descriptors capturing p-atic orders. The Minkowski tensors, which consider the real cell shapes, are reported to be a better method to investigate the p-atic orders of cells than the shape function introduced by Armengol-Collado et al. While there may be differences in the reported results coming from the two different approaches, both approaches show similar trends. As it stands, there is substantiated discussion as to why one method would be better than the other. The shape function, γ<sub>6</sub>, may not be monotonic for great changes in cell activity and deformability, hinting at a potential weakness. In contrast to the shape function and results by Armengol-Collado et al. and Eckert et al., the coarse-grained Minkowski tensors do not capture the hexatic-nematic crossover at the tissue scale, applied here only to computational models. The cells simulated in the computational models have a similar size and the monolayer has a nearly regular pattern, which does not reflect the density variance in biological tissues. To strengthen the author's claim that there is no crossover at the tissue scale, experimental verification is essential. Further, the robustness of the Minkowski tensors seems to rely on determining the p-atic orders on the shape of individual cells in the tissue. However, when applying the shape descriptor to experimental systems, the p-atic orders are very low, perhaps too low for comparisons between different p-atic orders with meaningful conclusions.

    4. Reviewer #3 (Public review):

      Hapel et al. submit an article entitled “Quantifying the shape of cells - from Minkowski tensors to p-atic order”. The paper reports the p-actic quantitative method - established in physics - to extract cell shapes in experiments using phase contrast images of MDCK cells and simulations - vertex model and phase fields. The rationale of the quantification with adaptation of Minkowski tensors, as well as the detailed extraction of distributions of shapes and plots, distributions quantifying shapes are documented, with an emphasis on changes in cell shapes and their importance in epithelial dynamics.

      Higher rank tensors are considered as well as representations with intuitive meanings and q<sub>i</sub> orders and their potential correlations or absence of correlations. For example, q<sub>2</sub> and q<sub>6</sub>, and statements about nematic and hexatic orders. A strong body of evidence is already reported in the papers of Armengol et al., quoted substantially in the paper, and the authors insist on an improvement thanks to the Minkowski tensors approach to challenge the former crossovers correlations statements.

      Although the approach seems to present advantages, the paper does not appear sufficiently novel. Beyond the Armengol et al. paper, the advantages of this approach compared to the shear decomposition (from MPI-PKS Dresden) or the links joining centroids and its neighbours approach (MSC/Curie Paris) for example.

    5. Author response:

      We thank the editors and the reviewers for their valuable comments. In response to these suggestions, we will add rigorous statistical measures and extend the experimental support of our findings in a revised version. Indeed, as we will show, doing so strengthens all the main claims. Specifically:

      Concerning Reviewer 1:

      - It is important to emphasise that the advantage of deriving shape measures q<sub>p</sub> from Minkowski tensors is their robustness and stability, that is well-established from extensive, rigorous mathematical analyses. Introducing q<sub>p</sub> without this connection to revised Minkowski tensors would not allow to claim this stability property for the considered measures.

      - Even though for a polygon the vertex positions contain the whole geometric information, using q<sub>p</sub> and γ<sub>p</sub> lead to different results, see Fig. 6 for an example.

      - We wholeheartedly agree that our statement on independence of values of q<sub>2</sub> and q<sub>6</sub> can be extended and more quantitatively established by rigorous statistical measures. This is exactly what we will do in the revised version, not only providing statistical measures on the presented data, but also extending our analyses to the published data from Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779. As we shall show these analyses further strengthen this claim, unequivocally establishing the independence of q<sub>2</sub> and q<sub>6</sub> in two different models (active vertex model and multiphase-field model), as well as two different sets of experiments (the ones in the original manuscript, and the published one from Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779).

      Concerning Reviewer 2:

      To fully address this point, we have extended our analyses to explore the published data of Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779. As we shall show in the revised manuscript, the crossover between nematic and hexatic is only specific to the use of γ<sub>p</sub> for characterizing the shape and coarse-graining of the associated order. Using q<sub>p</sub> as the shape measure this crossover disappears. Therefore, this analyses concretely demonstrate that the crossover is not a robust physical feature of the system and is dependent on the method used to define shape characteristics.

      Concerning Reviewer 3:

      We respectfully note a misunderstanding from the referee: The briefly mentioned approaches of other groups, turn out to be not measuring shape but connections between cells. Conceptually these approaches are therefore related to bond order parameters. We already comment at the end of the section introducing Minkowski tensors that bond order parameters cannot quantify the shape of a cell. The same argumentation also holds for other such approaches. In our revised version we will further clarify this distinction, to avoid any confusion or misinterpretation.

    1. eLife Assessment

      In this manuscript, the authors analyse the nanoscale localisation of α5β1 and αVβ3 integrins in integrin adhesion complexes (IAC) by dual-colour STORM and assess the spatial organisation at the nano and mesoscale of their main adaptors (paxillin, talin and vinculin). This is an important work that provides detailed analyses that reveal how elements of these complex structures are really organised at the nanoscale, an essential perspective for a better understanding of how IACs function and regulate mechanotransduction processes. The evidence presented is solid, with super-resolution imaging experiments conducted using a single, validated methodology and subsequent computational modelling that enabled a quantitative assessment of the resulting data.

    2. Reviewer #1 (Public review):

      Summary:

      In recent years, it has become increasingly evident how beautifully intricate IAC are at the nanoscale. Studies like the one presented here that shed light on the precise inner organisation of IAC are thus quite important and relevant in order to obtain a better in-depth understanding of IAC functioning and the contribution of different integrin subtypes to cell adhesive and mechanotransductive processes.

      Interestingly, the authors found a distinct localisation of α5β1 and αVβ3 integrin nanoclusters within focal adhesion of human fibroblasts, with α5β1 integrin nanoclusters being at the periphery of IAC and αVβ3 integrin nanoclusters randomly distributed. Furthermore, a surprisingly high percentage of inactive integrins within IAC and relatively low spatial integrin colocalisation with adaptor proteins has been shown.

      Strengths:

      This is a very thoroughly performed STORM-based assessment of the nanodistribution of α5β1 and αVβ3 nanoclusters within IAC (and outside). The image quality is outstanding, and the authors have meticulously executed the experiments and the image analyses.

      Weaknesses:

      The only weakness is maybe that the manuscript remains descriptive. However, the high quality of the "description" of the nano-organisation of IAC by this scrupulous study is really important to better understand the inner workings of IAC. It provides a very solid foundation to look deeper into the (patho)physiological implications of this organisation, see recommendations (which are rather suggestions in this case).

    3. Reviewer #2 (Public review):

      Summary:

      In this study, dual-color super-resolution microscopy analysis was performed to study the co-operation between integrins and focal adhesion proteins in human fibroblast cells. The study focused on two integrins which have been previously found to be mainly responsible for focal adhesions, namely α5β1 and αvβ3.

      Specifically, the study tried to shed light on the nanoclustering of integrins in focal adhesions.

      In the current study, more integrin nanoclusters were observed in focal adhesions compared to other cell-matrix adhesion structures. The study revealed that both α5β1 and αvβ3 form nanoclusters, and those appear segregated from each other. While αvβ3 nanoclusters organize randomly inside focal adhesions regardless of their activation state, α5β1 nanoclusters, and particularly the nanoclusters containing β1-integrin in active conformation, preferentially organized at the edges of focal adhesions. The nanoclusters formed by each integrin were similar in size.

      Cytoplasmic adapter proteins appeared less in nanocluster assemblies, suggesting that integrin nanoclusters are also forming without the studied cytoplasmic adapter proteins (talin, vinculin, paxillin). Active integrins were identified with the help of conformation-specific antibodies, and this enabled us to study the colocalization between integrins and their cytoplasmic adapter proteins. This analysis revealed that activated integrins are strongly engaged with adapter proteins

      Strengths:

      The study stems from the thorough computational modelling of the nanoclusters, which enables quantification of the behavior of the clusters, including their mesoscale distribution.

      The study strengthens the view that α5β1 and αvβ3 have specific functions in focal adhesions, α5β1 nanoclusters localizing preferentially on focal adhesion edges. The study also revealed that nanoclusters localized at the edges of focal adhesion were enriched for talin and paxillin but not for vinculin.

      Analysis of adaptor protein nanoclusters (paxillin, talin, and vinculin) revealed that all adapter protein nanoclusters studied here close to active β1 nanoclusters are enriched on the focal adhesion edge region, whereas integrin adaptor nanoclusters far from active β1 appear to be more uniformly distributed.

      Importantly, the current study suggests that integrin subtype-specific nanoclusters are not only present at an early stage of adhesion formation, but integrin nanoclusters remain segregated from each other also in mature focal adhesions, maintaining their sizes and number of molecules.

      Interestingly, the study revealed that selected cytoplasmic adaptors (paxillin, talin, and vinculin), also form nanoclusters of similar size and number of single molecule localizations as the integrins, regardless of whether they locate inside or outside focal adhesions. The adapter nanoclusters are enriched in the focal adhesion "belt", colocalizing with the active α5β1 integrin nanoclusters.

      Weaknesses:

      The current study is highly dependent on the antibodies. It is possible that antibodies containing two binding sites for antigen influence the nanoscale organization (and also activation) of the receptors. Control experiments to study the possible contribution of antibodies to the measured outcome should be performed to verify the main findings. One possible approach could be to use fluorescently tagged integrins available. Alternatively, integrins (or adapter proteins) could be tagged with a small ligand and detected using a monovalent binder.

      Only a limited number of integrin adapter proteins were investigated. Given the high number of identified adapter proteins, this is an understandable choice. However, it would be fascinating to understand if the nanoclusters of inactive integrins are dominantly bound with a certain adapter protein, such as tensin.

    4. Reviewer #3 (Public review):

      Summary:

      In their study, the authors reveal using dual-color super-resolution STORM microscopy modality and immunolabeling in fixed adherent cells, that β1 and β3 integrins as well as adaptors (paxillin, talin and vinculin) are all organized in nanoclusters of similar size (50nm) and molecular density (20 copy number) inside FAs but also outside. Using activity-specific immunolabeling of β1 and β3 integrins, they revealed that active integrin subpopulations were both clustered but in distinct exclusive nano-aggregates in agreement with Spiess et al. (2018). Once more, the "active" integrin nanoclusters displayed similar properties in terms of size and molecular density, suggesting that molecular organization in nanoclusters is an intrinsic property of integrins in plasma membrane multimerizing independently of their location (inside or outside FAs), their level of activation, or their connection to the cytoskeleton. Then the authors followed up by analyzing at the mesoscale how these "universal" nanoclustered adhesive units are distributed spatially. Inspecting the surface density of nanoclusters revealed that the density of integrin nanoclusters in FAs was 5x larger, compared to integrin nanoclusters outside adhesions. Interestingly, whereas the density of total integrin nanoclusters was 2-4x larger than adaptor nanoclusters, the density of "active" integrin nanoclusters stoichiometrically matches that of talin and vinculin nanoclusters, and was slightly outnumbered by paxillin nanoclusters. These findings suggest that inside FAs, among the total number of integrin nanoclusters, the subset of "active" integrin nanoclusters could be engaged with "adaptor" nanoclusters on a 1:1 ratio. Using analysis of the nearest neighbor distance (NND) between distinct integrin clusters and each of the adaptors, the authors report that they found negligible spatial colocalization of integrins with these adaptor proteins and that spatial segregation is essentially determined by the density of nanoclusters within the FAs. As authors reported that α5β1 and αvβ3 do not intermix at the nanoscale, the authors finally highlighted how α5β1 and αvβ3 distinct nanoclusters are differently organized and segregated inside FAs. Adapting the NND analysis in order to inspect how far the nanoclusters are from the edges of FAs they are located in, authors revealed that α5β1 but not αvβ3 integrin nanoclusters are enriched on FA edges and that similar FA edge-enriched distribution for "active" α5β1 and adaptor protein nanoclusters was found for talin and paxillin but not vinculin. The latter results suggest that FA edges could constitute multiprotein hubs for enhanced colocalization and activation for α5β1 integrin nanoclusters and adaptors such as talin and paxillin. Unfortunately NND analysis could not confirm this enhanced colocalization hypothesis.

      General Assessment:

      While the study presents some valuable findings, it reads currently as a compilation of intriguing but preliminary observations derived primarily from a single methodology (dual-color STORM and DBSCAN clustering analysis). As the initial findings often lack confirmation through additional data analysis (such as the NND analysis the authors used), there's a critical necessity to bolster the methodological approach. This should involve replicating the main findings using alternative single-molecule super-resolution techniques (such as quantitative DNA-PAINT) or employing different clustering analytical tools (such as voronoi-tessellation). Furthermore, the manuscript feels incomplete, focusing solely on describing molecular organization without offering substantial insights into how these observations correlate with the regulation, activation, and functionality of integrins at the cellular level.

      The manuscript presents extensive datasets and utilizes methodologies in which the investigators demonstrate expertise. Nevertheless, there's uncertainty regarding the novelty and broad appeal of the findings. For instance, the observation of integrin nanoclustering has been previously reported in several publications (e.g., Changede et al., Dev Cell 2015; Spiess et al., JCB 2018; Fujiwara et al., JCB 2023). Similarly, the accumulation of specific proteins at the periphery of FAs has been documented elsewhere (e.g., Sun et al., NCB 2016; Stubb et al., NatComm 2019; Nunes-Vicente TCB 2023), as well as the differential dynamic organization of α5β1 and αvβ3 integrins inside FAs (e.g., Rossier et al., NCB 2012). Beyond the universal organization of adhesive proteins, there's a need to identify novel insights that significantly advance the field. One potential avenue could involve pinpointing the molecular determinant controlling the FA edge enrichment of active α5β1 integrins and talin nanoclusters. For instance, could there be an interplay between α5β1 and αvβ3 integrin nanoclusters visible on one's organisation when suppressing the other using deletion (KO) or depletion (SiRNA)? Also, could KANK, which also exhibits enrichment and regulates talin activity (e.g., Sun et al., NCB 2016), play a role in this process? Identifying the molecular players that regulate even partially the mesoscale organization of nanoclusters of proteins would really benefit the breadth of this manuscript.

      Echoing the previous concern, the manuscript described a novel and rather surprising finding related to molecular clustering of adhesion proteins. Indeed, the fact that nanoclusters exhibit uniform size and molecular density regardless of the protein type, location, or activation level is indeed surprising and raises many questions about the methodology used to assess molecular clustering. I feel that the description and characterization of integrin nanoclusters appear incomplete and need to be expanded by comparing different analytical strategies for protein clustering. Furthermore, a lack of the manuscript in its actual form concerns the quantification of integrin numbers inside the observed nanoclusters. I agree that the path from optical microscopy to protein stoichiometry quantification is hard and full of drawbacks. But the authors do not fully address these issues that are extremely important when discussing protein nanoclustering. This quantitative aspect should be discussed.

      First, it is crucial for the authors to carefully examine and discuss in their manuscript whether there are any potential biases or limitations in the experimental techniques (dual-color STORM) or data analysis methods employed (DBSCAN). Second, the authors did not in the current manuscript, but should provide control samples to demonstrate the sensitivity and dynamic range of their experimental strategy.

      In STORM images displayed in Figure S1, the authors highlighted localization clusters detected by DBSCAN as a signature for integrin nanoclusters. But the authors do not discuss the localization spots that were not detected by DBSCAN. Could they be individual integrins? And if so, they should also be considered as useful information? This brings me to another related technical question about how DBSCAN handles the case where fluorescent molecules are blinking. This is important as multiple emissions by a single fluorophore could be detected as a nanocluster of several molecules where it would be an artefact due to the photophysics of the fluorophore. Could the authors comment on these points?

      Also, using isolated and stochastically physisorbed fluorophores (Ab coupled with activator /reporter pairs used in this study) on glass helped define the signature in STORM of a single isolated molecule. To obtain the signature of clustered fluorophores, the authors could use anti-donkey antibodies to cross-link those STORM-specifically labeled Ab as a means to artificially obtain clustered fluorophores. Ultimately, to avoid the bias effect of the glass surfaces on the photophysics of fluorophores and be in the same imaging conditions as for the described nanoclusters, the authors should use model systems composed of multimers of GFP vs. single GFP, immunolabeled with a GFP-binding monoclonal antibody. This will permit evaluation of the cluster signature obtained with DBSCAN analysis of STORM data for single vs. multimers of known stoichiometry. This would constitute an undisputable molecular stoichiometry ruler.

      Due to the surprising finding of the nanoclusters' "universality", it is imperative for the authors to validate the findings through complementary methodologies and analytical tools. This should involve replication of results using alternative super-resolution techniques (quantitative DNA-PAINT) and exploring different clustering algorithms (Voronoï-Tesselation) to ensure the robustness and reliability of the observations.

    5. Author response:

      As a short response to the public reviews, we would like to outline the following planned revisions:

      (1) Address the antibody concerns as indicated by reviewer 1

      (2) Assess the role of tensin (and possibly KANK), as suggested by reviewers 2 and 3, respectively.

      (3) Validate our main experimental findings using alternative super-resolution approaches, including STED to avoid potential blinking artefacts associated to standard STORM, and most possibly DNA-PAINT as a more quantitative technique, as suggested by reviewer 3.

      (4) Implement alternative analytical strategies to DBSCAN, including Voronoi tessellation as suggested by reviewer 3.

      (5) Expanded discussion on the main findings of our work and biological significance.

    1. eLife Assessment

      This paper presents a method for detecting Naegleria fowleri infection, which is almost always fatal, using small RNA from blood. This could be an important advance since early detection might improve treatment outcomes. The mouse work is methodologically solid, but only a very small number of human samples were available for human validation.

    2. Reviewer #1 (Public review):

      Summary:

      Early and accurate diagnosis is critical to treating N. fowleri infections, which often lead to death within 2 weeks of exposure. Current methods-sampling cerebrospinal fluid are invasive, slow, and sometimes unreliable. Therefore, there is a need for a new diagnostic method. Russell et al. address this need by identifying small RNAs secreted by Naegleria fowleri (Figure 1) that are detectable by RT-qPCR in multiple biological fluids including blood and urine. SmallRNA-1 and smallRNA-2 were detectable in plasma samples of mice experimentally infected with 6 different N. fowleri strains, and were not detected in uninfected mouse or human samples (Figure 4). Further, smallRNA-1 is detectable in the urine of experimentally infected mice as early as 24 hours post-infection (Figure 5). The study culminates with testing human samples (obtained from the CDC) from patients with confirmed N. fowleri infections; smallRNA-1 was detectable in cerebrospinal fluid in 6 out of 6 samples (Figure 6B), and in whole blood from 2 out of 2 samples (Figure 6C). These results suggest that smallRNA-1 could be a valuable diagnostic marker for N. fowleri infection, detectable in cerebrospinal fluid, blood, or potentially urine.

      Strengths:

      This study investigates an important problem, and comes to a potential solution with a new diagnostic test for N. fowleri infection that is fast, less invasive than current methods, and seems robust to multiple N. fowleri strains. The work in mice is convincing that smallRNA1 is detectable in blood and urine early in infection. Analysis of patient blood samples suggest that whole blood (but not plasma) could be tested for smallRNA-1 to diagnose N. fowleri infections.

      Weaknesses:

      (1) There are not many N. fowleri cases, so the authors were limited in the human samples available for testing. It is difficult to know how robust this biomarker is in whole blood (only 2 samples were tested, both had detectable smallRNA-1), serum (1 out of 1 sample tested negative), or human urine (presumably there is no material available for testing). This limitation is openly discussed in the last paragraph of the discussion section.

      (2) There seems to be some noise in the data for uninfected samples (Figures 4B-C, 5B, and 6C), especially for those with serum (2E). While this is often orders of magnitude lower than the positive results, it does raise questions about false positives, especially early in infection when diagnosis would be the most useful. A few additional uninfected human samples may be helpful.

    3. Reviewer #2 (Public review):

      Summary:

      The authors sought to develop a rapid and non-invasive diagnostic method for primary amoebic meningoencephalitis (PAM), a highly fatal disease caused by Naegleria fowleri. Due to the challenges of early diagnosis, they investigated extracellular vesicles (EVs) from N. fowleri, identifying small RNA biomarkers. They developed an RT-qPCR assay to detect these biomarkers in various biofluids.

      Strengths:

      (1) This study has a clear methodological approach, which allows for the reproducibility of the experiments.

      (2) Early and Non-Invasive Diagnosis - The identification of a small RNA biomarker that can be detected in urine, plasma, and cerebrospinal fluid (CSF) provides a non-invasive diagnostic approach, which is crucial for improving early detection of PAM.

      (3) High Sensitivity and Rapid Detection - The RT-qPCR assay developed in the study is highly sensitive, detecting the biomarker in 100% of CSF samples from human PAM cases and in mouse urine as early as 24 hours post-infection. Additionally, the test can be completed in ~3 hours, making it feasible for clinical use.

      (4) Potential for Disease Monitoring - Since the biomarker is detectable throughout the course of infection, it could be used not only for early diagnosis but also for tracking disease progression and monitoring treatment efficacy.

      (5) Strong Experimental Validation - The study demonstrates biomarker detection across multiple sample types (CSF, urine, whole blood, plasma) in both animal models and human cases, providing robust evidence for its clinical relevance.

      (6) Addresses a Critical Unmet Need - With a >97% case fatality rate, PAM urgently requires improved diagnostics. This study provides one of the first viable liquid biopsy-based diagnostic approaches, potentially transforming how PAM is detected and managed.

      Weaknesses:

      (1) Limited Human Sample Size - While the biomarker was detected in 100% of CSF samples from human PAM cases, the number of human samples analyzed (n=6 for CSF) is relatively small. A larger cohort is needed to validate its diagnostic reliability across diverse populations.

      (2) Lack of Pre-Symptomatic or Early-Stage Human Data - Although the biomarker was detected in mouse urine as early as 24 hours post-infection, there is no data on whether it can be reliably detected before symptoms appear in humans, which is crucial for early diagnosis and treatment initiation.

      (3) Plasma Detection Challenges - While the biomarker was detected in whole blood, it was not detected in human plasma, which could limit the ease of clinical implementation since plasma-based diagnostics are more common. Further investigation is needed to understand why it is absent in plasma and whether alternative blood-based approaches (e.g., whole blood assays) could be optimized.

    1. eLife Assessment

      The study investigates an emerging research field: the interaction between sleep and development. The authors use Drosophila larvae sleep as a study model and provide valuable insight into how neuropeptide circuitry controls larvae sleep. By using a broad range of behaviour and imaging methods and analysis, the authors conclude a sleep regulatory neural pathway of Hugin-PK2-Dilps in the Drosophila neurosecretory centre IPC. However, the evidence that supports this pathway is incomplete - in particular, the methodology in sleep measurement and the specificity at each step of the Hugin-PK2-Dilps pathway require further clarifying experiments or explanation.

    2. Reviewer #1 (Public review):

      Summary:

      The study investigates how neuropeptidergic signaling affects sleep regulation in Drosophila larvae. The authors first conduct a screen of CRISPR knock-out lines of genes encoding enzymes or receptors for neuropeptides and monoamines. As a result of this screen, the authors follow up on one hit, the hugin receptor, PK2-R1. They use genetic approaches, including mutants and targeted manipulations of PK2-R1 activity in insulin-producing cells (IPCs) to increase total sleep amounts in 2nd instar larvae. Similarly, dilp3 and dilp5 null mutants and genetic silencing of IPCs show increases in sleep. The authors also show that hugin mutants and thermogenetic/optogenetic activation of hugin-expressing neurons caused reductions in sleep. Furthermore, they show through imaging-based approaches that hugin-expressing neurons activate IPCs. A key finding is that wash-on of hugin peptides, Hug-γ and PK-2, in ex vivo brain preparations activates larval IPCs, as assayed by CRTC::GFP imaging. The authors then examine how the PK2-R1, hugin, and IPC manipulations affect adult sleep. Finally, the authors examine how Ca2+ responses through CRTC::GFP imaging in adult IPCs are influenced by the wash-on of hugin peptides. The conclusions of this paper are somewhat well supported by data, but some aspects of the experimental approach and sleep analysis need to be clarified and extended.

      Strengths:

      (1) This paper builds on previously published studies that examine Drosophila larval sleep regulation. Through the power of Drosophila genetics, this study yields additional insights into what role neuropeptides play in the regulation of Drosophila larval sleep.

      (2) This study utilizes several diverse approaches to examine larval and adult sleep regulation, neural activity, and circuit connections. The impressive array of distinct analyses provides new understanding into how Drosophila sleep-wake circuitry in regulated across the lifespan.

      (3) The imaging approaches used to examine IPC activation upon hugin manipulation (either thermogenetic activation or wash-on of peptides) demonstrate a powerful approach for examining how changes in neuropeptidergic signaling affect downstream neurons. These experiments involve precise manipulations as the authors use both in vivo and ex vivo conditions to observe an effect on IPC activity.

      Weaknesses:

      Although the paper does have some strengths in principle, these strengths are not fully supported by the experimental approaches used by the authors. In particular:

      (1) The authors show total sleep amount over an 18-hour period for all the measures of 2nd instar larval sleep throughout the paper. However, published studies have shown that sleep changes over the course of 2nd instar development, so more precise time windows are necessary for the analyses in this study.

      (2) Previously published reports of sleep metrics in both Drosophila larvae and adults include the average number of sleep episodes (bout number) and the average length of sleep episodes (bout length). Neither of these metrics is included in the paper for either the larval sleep or adult sleep data. Not including these metrics makes it difficult for readers to compare the findings in this study to previously published papers in the established Drosophila sleep literature.

      (3) Because Drosophila adult & larval sleep is based on locomotion, the authors need to show the activity values for the experiments supporting their key conclusions. They do show travel distances in Figure 2 - Figure Supplement 1, however, it is not clear how these distances were calculated or how the distances relate to the overall activity of individual larvae during sleep experiments. It is also concerning that inactivation of the PK2-R1-expressing neurons causes a reduction in locomotion speed. This could partially explain the increase in sleep that they observe.

      (4) The authors rely on homozygous mutant larvae and adult flies to support many of their conclusions. They also rely on Gal4 lines with fairly broad expression in the Drosophila brain to support their conclusions. Adding more precise tissue-specific manipulations, including thermogenetic activation and inhibition of smaller populations of neurons in the study would be needed to increase confidence in the presented results. Similarly, demonstrating that larval development and feeding are not affected by the broad manipulations would strengthen the conclusions.

      (5) Many of the experiments presented in this study would benefit from genetic and temperature controls. These controls would increase confidence in the presented results.

      (6) The authors claim that their findings in larvae uncover the circuit basis for larval sleep regulation. However, there is very little comparison to published studies demonstrating that neuropeptides like Dh44 regulate larval sleep. Because hugin-expressing neurons have been shown to be downstream of Dh44 neurons, the authors need to include this as part of their discussion. The authors also do not explain why other neuropeptides in the initial screen are not pursued in the study. Given the effect that these manipulations have on larval sleep in their initial screen, it seems likely that other neuropeptidergic circuits regulate larval sleep.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines larval sleep patterns and compares them to sleep regulation in adult flies. The authors demonstrate hallmark sleep characteristics in larvae, including sleep rebound and increased arousal thresholds. Through genetic and behavioral analyses, they identify PK2-R1 as a key receptor involved in sleep modulation, likely via the HuginPC-IPC signaling pathway. Loss of PK2-R1 results in increased sleep, which aligns with previous findings in hugin knockout mutants. While the study presents significant contributions to the field, further investigation is needed to address discrepancies with earlier research and strengthen mechanistic claims.

      Strengths:

      (1) The study explores a relatively understudied aspect of sleep regulation, focusing on larval development.

      (2) The use of an automated behavioral measurement system ensures precise quantification of sleep patterns.

      (3) The findings provide strong genetic and behavioral evidence supporting the role of the HuginPC-IPC pathway in sleep regulation.

      (4) The study has broader implications for understanding the evolution and functional divergence of sleep circuits.

      Weaknesses:

      (1) The manuscript does not sufficiently discuss previous studies, particularly concerning hugin mutants and their metabolic effects.

      (2) The specificity of IPC secretion mechanisms is unclear, particularly regarding potential indirect effects on Dilp2.

      (3) Alternative circuits, such as the HuginPC-DH44 pathway, require further consideration.

      (4) Functional connectivity between HuginPC neurons and IPCs is not directly validated.

      (5) Developmental differences in sleep regulatory mechanisms are not thoroughly examined.

    4. Reviewer #3 (Public review):

      Summary:

      Sleep affects cognition and metabolism, evolving throughout development. In mammals, infants have fast sleep-wake cycles that stabilize in adults via circadian regulation. In this study, the author performed a genetic screen for neurotransmitters/peptides regulating sleep and identified the neuropeptide Hugin and its receptor PK2-R1 as essential components for sleep in Drosophila larvae. They showed that IPCs express Pk2-R1 and silencing IPCs resulted in a significant increase in the sleep amount, which was consistent with the effect they observed in PK2-R1 knock-out mutants. They also showed that Hugin peptides, secreted by a subset of Hugin neurons (Hug-PC), activate IPCs through the PK2-R1 receptor. This activation prompts IPCs to release insulin-like peptides (Dilps), which are implicated in the modulation of sleep. They showed that Hugin peptides induce a PK2-R1 dependent calcium (Ca²⁺) increase in IPCs, which they linked to the release of Dilp3, showing a connection between Hugin signaling to IPCs, Dilp3 release, and sleep regulation. Additionally, the activation of Hug-PC neurons reduced sleep amounts, while silencing them had the opposite effect. In contrast to the larval stage, the Hugin/PK2-R1 axis was not critical for sleep regulation in Drosophila adults, suggesting that this neuropeptidergic circuitry has divergent roles in sleep regulation across different stages of development.

      Strengths:

      This study used an updated system for sleep quantification in Drosophila larvae, and this method allowed precise measurement of larval sleep patterns which is essential for the understanding of sleep regulation.

      The authors performed unbiased genetics screening and successfully identified novel regulators for larval sleep, Hugin and its receptor PK2-R1, making a substantial contribution to the understanding of neuropeptidergic control of sleep regulation.

      They clearly demonstrated the mechanism by which Hugin-expressing neurons influence sleep through the activation of IPCs via PK2-R1 with Ca2+ responses and can modulate sleep.

      Based on the demonstrated activation of PK2-R1 by the human Hugin orthologue Neuromedin U, research on human sleep disorders may benefit from the discoveries from Drosophila since sleep-regulating mechanisms are conserved across species.

      Weaknesses:

      The study primarily focused on sleep regulation in Drosophila larvae, showing that the Hugin/PK2-R1 axis is critical for larval sleep but not necessary for adult sleep. The effects of the Hugin axis in the adult are, however, incompletely explained and somewhat inconsistent. PK2-R1 knockout adults also display increased sleep, as does HugPC silencing, at least for daytime sleep. The difference lies in Dilp3/5 mutant animals showing decreased sleep and IPCs seemingly responding with reduced Dilp3 release to PK-2 treatment (Figure 6). It seems difficult to reconcile the author's conclusions regarding this point without additional data. It could be argued that PK2-R1 still regulates adult sleep, but not via Hugin and IPCs/Dilps.

      Another issue might be that the authors show relative sleep levels for adults using Trikinetics monitoring. From the methods, it is not clear if the authors backcrossed their line to an isogenic wild-type background to normalize for line-specific effects on sleep. Thus, it is likely that each line has differences in total sleep time due to background effects, e.g., their Kir2.1 control line showed reduced sleep relative to the compared genotypes. This might limit the conclusions on the role of Hugin/PK2-R1 on adult sleep.

    1. eLife Assessment

      This valuable study presents an alternative platform for nanobody discovery using phage-displayed synthetic libraries. The evidence supporting the platform is compelling, which is used to isolate and validate nanobodies targeting Drosophila secreted proteins. By making this library openly accessible, this provides an excellent resource to the wider scientific community. The detailed protocol used in this manuscript, associated with various methods for nanobody screening, provides an alternative and reliable platform for nanobody discovery.

    2. Reviewer #1 (Public review):

      Summary:

      Using highly specific antibody reagents for biological research is of prime importance. In the past few years, novel approaches have been proposed to gain easier access to such reagents. This manuscript describes an important step forward toward the rapid and widespread isolation of antibody reagents. Via the refinement and improvement of previous approaches, the Perrimon lab describes a novel phage-displayed synthetic library for nanobody isolation. They used the library to isolate nanobodies targeting Drosophila secreted proteins. They used these nanobodies in immunostainings and immunoblottings, as well as in tissue immunostainings and live cell assays (by tethering the antigens on the cell surface).

      Since the library is made freely available, it will contribute to gaining access to better research reagents for non-profit use, an important step towards the democratisation of science.

      Strengths:

      (1) New design for a phage-displayed library of high content.

      (2) Isolation of valuble novel tools.

      (3) Detailed description of the methods such that they can be used by many other labs.

      Weaknesses:

      My comments largely concentrate on the representation of the data in the different Figures.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors propose an alternative platform for nanobody discovery using a phage-displayed synthetic library. The authors relied on DNA templates originally created by McMahon et al. (2018) to build the yeast-displayed synthetic library. To validate their platform, the authors screened for nanobodies against 8 Drosophila secreted proteins. Nanobody screening has been performed with phage-displayed nanobody libraries followed by an enzyme-linked immunosorbent assay (ELISA) to validate positive hits. Nanobodies with higher affinity have been tested for immunostaining and immunoblotting applications using Drosophila adult guts and hemolymph, respectively.

      Strengths:

      The authors presented a detailed protocol with various and complementary approaches to select nanobodies and test their application for immunostaining and immunoblotting experiments. Data are convincing and the manuscript is well-written, clear, and easy to read.

      Weaknesses:

      On the eight Drosophila secreted proteins selected to screen for nanobodies, the authors failed to identify nanobodies for three of them. While the authors mentioned potential improvements of the protocol in the discussion, none of them have been tested in this manuscript.

      The same comment applies to the experiments using membrane-tethered forms of the antigens to test the affinity of nanobodies identified by ELISA. Many nanobodies fail to recognize the antigens. While authors suggested a low affinity of these nanobodies for their antigens, this hypothesis has not been tested in the manuscript.

      Improving the protocol at each step for nanobody selection would greatly increase the success rate for the discovery of nanobodies with high affinity.

    1. eLife Assessment

      This valuable study determines the functional requirements for localization and activity of S. cerevisiae septin-associated kinases using in vivo imaging, in vitro and in vivo protein-protein interaction assays, and an instructive in vivo "tethering" approach. In addition to confirming previous results, the study offers evidence that the septin-associated kinases may directly interact with the contractile ring machinery. Although the experiments appear to have been conducted correctly, the quantitative analysis of some experiments is incomplete and should be improved to strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors wanted to better understand how the various septin-associated kinases contribute to septin organization and function in budding yeast. This question has been recently addressed by similar kinds of studies but there are still some open questions, particularly as regards to what extent the kinases may interact with and/or modify components of the contractile ring that drives cytokinesis.

      Strengths:

      This study uses sensitive imaging with good temporal and spatial resolution to monitor the localization of various proteins in living cells. Particularly informative is the use of a GFP/GFP-binding-protein "tethering" approach to ask if the requirement for one protein can be bypassed by physically tethering another protein to a third protein. Results from a yeast two-hybrid assay for measuring protein-protein interactions in vivo are buttressed by direct in vitro binding assays using purified proteins, which is important given the likelihood of "bridging" interactions between yeast proteins in the two-hybrid approach. The authors' conclusions are quite well supported by the data.

      Weaknesses:

      A control for non-specific binding is missing from the in vitro binding assay. The figures suffer sometimes from the very small text in the labels, which obscures understanding. Ultimately, while the study provides some interesting and novel insights, we still don't understand which phosphorylation events on which proteins are important for the events occurring at the molecular level, so the advance in knowledge is somewhat incremental.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Bhojappa et al. provide insights into the function of septin-related kinases Elm1, Gin4, Hsl1, and Kcc4 in septin organization and actomyosin ring (AMR) structure and constriction. Their findings are both corroborative of and complementary to previous related studies.

      First, the authors provide a comparative analysis of the dynamic localization of these kinases at the bud neck, as well as a comparative analysis of defects in septin localization, splitting dynamics, AMR constriction rates, and cell morphology in kinase-deficient cells. They find that septin localization and splitting kinetics, as well as AMR constriction rates, are significantly perturbed in elm1∆ and gin4∆ mutants but remain largely unaffected in hsl1∆ and kcc4∆. A similar trend is observed in terms of cell morphology and viability.

      Next, the authors focus on elm1∆ and gin4∆ cells, demonstrating that the residence time of the F-BAR protein Hof1 is significantly increased and defective in these mutants. Using yeast two-hybrid (Y2H) and in vitro binding assays, they show that the KA1 domain of Gin4 interacts with the F-BAR domain of Hof1, which may explain the cytokinesis-related functions of Elm1 and Gin4. Supporting this, they find that Gin4's role in septin localization, AMR constriction kinetics, and Hof1 bud neck localization is kinase-independent.

      The authors then conduct a series of artificial tethering experiments given their bud neck localization is mostly interdependent. They first demonstrate that artificially tethering Gin4 to the bud neck rescues the morphology defects of elm1∆ cells, with the strongest rescue observed when Gin4 was forced to interact with Hsl1-an effect that was also kinase-independent. Additionally, artificial tethering of Hsl1 to the bud neck restores the morphology of elm1∆ cells in a KA1 domain-dependent manner, suggesting that Hsl1 functions downstream of Elm1 to maintain normal cell morphology. Consistently, artificial tethering of Elm1 to the bud neck in gin4∆ cells rescues morphology defects, as well as defects in Myo1 localization and AMR constriction, but only in the presence of full-length Hsl1. The rescue fails in the absence of Hsl1 or when using a version of Hsl1 lacking the KA1 domain, which supports the role of Hsl1 downstream to Elm1 in cytokinesis.

      Strengths

      Altogether, this study offers valuable insights into the mode of cytokinesis regulation mediated by the septin-related kinases, mainly Elm1, Gin4, and Hsl1, and would be an important contribution to the field of septins and cytokinesis after addressing current weaknesses.

      Weaknesses

      (1) When assessing rescue of the elm1∆ phenotype, it needs to become clearer whether only morphology or also cytokinesis and septin organization are rescued.

      (2) The quantification of the microscopy data does not always match up with the example images, and it's not always clear how the authors quantitatively analyzed their data.

      (3) The forced tethering data are key to the paper, but the lack of a summarizing table makes it difficult to grasp the full picture.

      (4) Novel results and those confirming earlier results could be better distinguished.

    4. Reviewer #3 (Public review):

      Summary:

      The study by Bhojappa et al. brings new and interesting elements about the stability of the septin ring and the crosstalk between septin and actomyosin ring assemblies. The study focuses on the four kinases associated with the septin ring, Elm1p, Gin4p, Hsl1p, and Kcc4p. Elm1 and Gin4 show strong knock-out phenotypes, whereas Hsl1p and Kcc4p show weak knock-out phenotypes. The Elm1p/Kccp1p and Gin4p/Hsl1p pairs show similar timing at the bud neck. While these kinases share redundant functions, Gin4 appears to have a unique interaction with the BAR domain protein Hof1, revealing a novel direct interaction between the septin and actomyosin rings. Interestingly, the kinase activity of Gin4 is not required for its role in septin organisation and AMR constriction. The last part of the manuscript shows an original protein tethering protocol used to show that Hsl1 and its membrane binding ability are required for phenotype rescue of gin4null cells.

      Strengths:

      The combination of genetics, cell imaging, and biochemical characterization of protein-protein interactions is attractive.

      Weaknesses:

      (1) Imaging and data analysis is the main weakness of this manuscript. The authors must avoid manual counting and selection when easy analysis software can be used to limit bias. Instead of presenting unclear statistics of "percentage phenotypes", they need to define clear metrics to offer meaningful phenotype analysis.

      (2) This manuscript examines a very complex mechanism with four kinases of overlapping function using new data and existing literature. A clearer picture/model at the end of the manuscript that synthesizes the current knowledge would be beneficial.