  1. Aug 2025
    1. eLife Assessment

      This manuscript by Li, Lu et al. presents important findings on the role of cDC1 in atherosclerosis and their influence on the adaptive immune system. Using Xcr1Cre-Gfp Rosa26LSL-DTA ApoE-/- mouse models, these data convincingly reveal an unexpected, non-redundant role of the XCL1-XCR1 axis in mediating cDC1 contributions to atherosclerosis.

    2. Reviewer #1 (Public review):

      Summary:

      In this study by Li et al., the authors re-investigated the role of cDC1 in atherosclerosis progression using the ApoE model. First, the authors confirmed the accumulation of cDC1 in atherosclerotic lesions in mice and humans. Then, in order to examine the functional relevance of this cell type, the authors developed a new mouse model to selectively target cDC1. Specifically, they inserted the Cre recombinase directly after the start codon of the endogenous XCR1 gene, thereby avoiding off-target activity. Following validation of this model, the authors crossed it with ApoE-deficient mice and found a striking reduction of aortic lesions (numbers and size) following a high-fat diet. The authors further characterized the impact of cDC1 depletion on lesional T cells and their activation state. Also, they provide in-depth transcriptomic analyses comparing lesional with splenic and nodal cDC1. These results imply cellular interactions between lesional T cells and cDC1. Finally, the authors show that the chemokine XCL1, which is produced by activated CD8 T cells (and NK cells), plays a key role in the interaction with XCR1-expressing cDC1 and particularly in atherosclerotic disease progression.

      Strengths:

      The surprising results on XCL1 represent a very important gain in knowledge. The role of cDC1 is clarified with a new genetic mouse model.

      Comments on revised version:

      The authors have addressed my concerns in the revised version of this manuscript.

    3. Reviewer #2 (Public review):

      This study investigates the role of cDC1 in atherosclerosis progression using Xcr1Cre-Gfp Rosa26LSL-DTA ApoE-/- mice. The authors demonstrate that selective depletion of cDC1 reduces atherosclerotic lesions in hyperlipidemic mice. While cDC1 depletion did not alter macrophage populations, it suppressed T cell activation (both CD4+ and CD8+ subsets) within aortic plaques. Further, targeting the chemokine Xcl1 (ligand of Xcr1) effectively inhibits atherosclerosis. The manuscript is well-written, and the data are clearly presented. The data provided in the article support the authors' conclusions.

      Comments on revised version:

      The authors have addressed all previous concerns and made appropriate revisions to the data. I have no further questions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study by Li et al., the authors re-investigated the role of cDC1 in atherosclerosis progression using the ApoE model. First, the authors confirmed the accumulation of cDC1 in atherosclerotic lesions in mice and humans. Then, in order to examine the functional relevance of this cell type, the authors developed a new mouse model to selectively target cDC1. Specifically, they inserted the Cre recombinase directly after the start codon of the endogenous XCR1 gene, thereby avoiding off-target activity. Following validation of this model, the authors crossed it with ApoE-deficient mice and found a striking reduction of aortic lesions (numbers and size) following a high-fat diet. The authors further characterized the impact of cDC1 depletion on lesional T cells and their activation state. Also, they provide in-depth transcriptomic analyses comparing lesional with splenic and nodal cDC1. These results imply cellular interactions between lesional T cells and cDC1. Finally, the authors show that the chemokine XCL1, which is produced by activated CD8 T cells (and NK cells), plays a key role in the interaction with XCR1-expressing cDC1 and particularly in atherosclerotic disease progression.

      Strengths:

      The surprising results on XCL1 represent a very important gain in knowledge. The role of cDC1 is clarified with a new genetic mouse model.

      Thank you.

      Weaknesses:

      My criticism is limited to the analysis of the scRNA-seq data of the cDC1. I think it would be important to match these data with published data sets on cDC1. In particular, the data set by Sophie Janssens' group on splenic cDC1 might be helpful here (PMID: 37172103; https://www.single-cell.be/spleen_cDC_homeostatic_maturation/datasets/cdc1). It would be good to assign clusters based on the categories used there (early/late, immature/mature), at least for splenic DC.

      Thank you very much for your help. Using the scRNA-seq data of Xcr1<sup>+</sup> cDC1 sorted from ApoE<sup>–/–</sup> mice, we re-annotated the populations following the methodology proposed by Sophie Janssens' group. These results are presented in Figure S9 and Figure S10 and described in detail in the Results and Discussion sections.

      Please refer to the Results section from line 264 to 284: “Using the scRNA-seq data of Xcr1<sup>+</sup> cDC1 sorted from hyperlipidemic mice, we annotated the 10 populations as shown in Figure S9A, following the methodology from a previous study [41]. Ccr7<sup>+</sup> mature cDC1s (Clusters 3, 7 and 9) and Ccr7<sup>–</sup> immature cDC1s (remaining clusters) were identified across cDC1 cells sorted from aorta, spleen and lymph nodes (Figure S9B). Further stratification based on marker genes reveals that Cluster 10 is the pre-cDC1, with high expression of CD62L (Sell) and low expression of CD8a (Figure S9C). Clusters 6 and 8 are the proliferating cDC1s, which express high levels of the cell cycling genes Stmn1 and Top2a (Figure S9D). Clusters 1 and 4 are early immature cDC1s, and Clusters 2 and 5 are late immature cDC1s, according to the expression pattern of Itgae and Nr4a2 (Figure S9E). Cluster 9 cells are early mature cDC1s, with elevated expression of Cxcl9 and Cxcl10 (Figure S9F). Clusters 3 and 7 are late mature cDC1s, characterized by the expression of Cd63 and Fscn1 (Figure S9G). As shown in Figure 5C and Figure S9, the major difference among the 10 populations is that aortic cDC1 cells lack pre-cDC1s (Cluster 10) and mature cells (Clusters 3, 7 and 9). Interestingly, in hyperlipidemic mice, splenic cDC1 possess only Cluster 3 as late mature cells, while lymph node cDC1 cells have two late mature populations, namely Cluster 3 and Cluster 7. In further analysis, we also compared splenic cDC1 cells from HFD mice to those from ND mice. As shown in Figure S10, HFD appears to affect early immature cDC1-1 cells (Cluster 1) and increases the abundance of late immature cDC1 cells (Clusters 2 and 5), although all 10 populations are present in both sample sources. We also found that Tnfaip3 and Serinc3 are among the most upregulated genes, while Apol7c and Tifab are downregulated in splenic cDC1 cells sorted from HFD mice”.

      Please refer to the Discussion section from line 380 to 385: “Based on the maturation analysis of the cDC1 scRNA-seq data [41], our findings suggest that aortic cDC1 cells differ markedly from those of the spleen and lymph nodes in lacking the mature clusters, whereas lymph node cDC1 cells contain an additional Fabp5<sup>+</sup> S100a4<sup>+</sup> late mature cluster. Our results also suggest that hyperlipidemia alters early immature cDC1 cells and the abundance of late immature cDC1 cells, which was associated with marked changes in the expression of Tnfaip3, Serinc3, Apol7c and Tifab”.

      Reviewer #2 (Public review):

      This study investigates the role of cDC1 in atherosclerosis progression using Xcr1Cre-Gfp Rosa26LSL-DTA ApoE-/- mice. The authors demonstrate that selective depletion of cDC1 reduces atherosclerotic lesions in hyperlipidemic mice. While cDC1 depletion did not alter macrophage populations, it suppressed T cell activation (both CD4+ and CD8+ subsets) within aortic plaques. Further, targeting the chemokine Xcl1 (ligand of Xcr1) effectively inhibits atherosclerosis. The manuscript is well-written, and the data are clearly presented. However, several points require clarification:

      (1) In Figure 1C (upper plot), it is not clear what the Xcr1 single-positive region in the aortic root represents, or whether this is caused by unspecific staining. So I wonder whether Xcr1 single-positive staining can reliably represent cDC1. For accurate cDC1 gating in Figure 1E, Xcr1+CD11c+ co-staining should be used instead.

      The observed false-positive signal in the wavy structures within immunofluorescence Figure 1C (upper panel) results from the strong autofluorescence of elastic fibers, a major vascular wall component (alongside collagen). This intrinsic property of elastic fibers is a well-documented confounder in immunofluorescence studies [A, B].

      In contrast, immunohistochemistry (IHC) employs an enzymatic chromogenic reaction (HRP with DAB substrate) that generates a brown precipitate exclusively at antigen-antibody binding sites. Importantly, vascular elastic fibers lack endogenous enzymatic activity capable of catalyzing the DAB reaction, thereby preventing this source of false positivity in IHC.

      Given that Xcr1 is exclusively expressed on conventional type 1 dendritic cells [C], and considering that IHC lacks the multiplexing capability inherent to immunofluorescence for antigen co-localization, single-positive Xcr1 staining reliably identifies cDC1s in IHC results.

      [A] König, K et al. “Multiphoton autofluorescence imaging of intratissue elastic fibers.” Biomaterials vol. 26,5 (2005): 495-500. doi:10.1016/j.biomaterials.2004.02.059

      [B] Andreasson, Anne-Christine et al. “Confocal scanning laser microscopy measurements of atherosclerotic lesions in mice aorta. A fast evaluation method for volume determinations.” Atherosclerosis vol. 179,1 (2005): 35-42. doi:10.1016/j.atherosclerosis.2004.10.040

      [C] Dorner, Brigitte G et al. “Selective expression of the chemokine receptor XCR1 on cross-presenting dendritic cells determines cooperation with CD8+ T cells.” Immunity vol. 31,5 (2009): 823-33. doi:10.1016/j.immuni.2009.08.027

      (2) Figure 4D suggests that cDC1 depletion does not affect CD4+/CD8+ T cells. However, only the proportion of these subsets within total T cells is shown. To fully interpret effects, the authors should provide:

      (a) Absolute numbers of total T cells in aortas.

      (b) Absolute counts of CD4+ and CD8+ T cells.

      Thanks for your suggestions. We agree that assessing both proportions and absolute numbers in Figure 4 provides a more complete picture of the effects of cDC1 depletion on T cell populations. We have also added the absolute counts of cDC1 cells and total T cells, as well as the CD44 MFI (mean fluorescence intensity) in CD4<sup>+</sup> and CD8<sup>+</sup> T cells, to Figure 4, and supplemented the corresponding text in the revised manuscript.

      Please refer to the Results section from line 183 to 187: “Subsequently, we assessed T cell phenotype in the two groups of mice. While neither the frequencies nor absolute counts of aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells differed significantly between the two groups of mice (Figure 4D-F), CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcr1<sup>+</sup> cDC1 depleted mice compared to controls (Figure 4G and H)”.

      (3) How does T cell activation mechanistically influence atherosclerosis progression? Why was CD69 selected as the sole activation marker? Were other markers (e.g., KLRG1, ICOS, CD44) examined to confirm activation status?

      We sincerely appreciate these insightful comments. As extensively documented in the literature, activated effector T cells (both CD4+ and CD8+) critically promote plaque inflammation and instability through their production of pro-inflammatory cytokines (particularly IFN-γ and TNF-α), which drive endothelial activation, exacerbate macrophage inflammatory responses, and impair smooth muscle cell function [A].

      In our study, we specifically investigated the role of cDC1 cells in atherosclerosis progression. Our key findings demonstrate that cDC1 depletion attenuates T cell activation (as shown by reduced CD69/CD44 expression) and that this reduction in activation is functionally linked to the observed decrease in atherosclerosis burden in our model.

      Regarding CD44 as an activation marker, we performed quantitative analyses of CD44 mean fluorescence intensity (MFI) in aortic T cells (Figure 4). Importantly, the MFI of CD44 was significantly lower on both CD4+ and CD8+ T cells from Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mice compared to the control ApoE<sup>–/–</sup> mice (data shown below), which is consistent with the CD69 result in Figure 4. We added the related description in the Results section.

      Please refer to the Results section from line 185 to 187 “CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4+ and CD8+ T cells from Xcr1+ cDC1 depleted mice compared to controls (Figure 4G and H)”.

      Similarly, the MFI of CD44 was significantly lower on both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice compared to the control ApoE<sup>–/–</sup> mice (data shown below), which is consistent with the CD69 result in Figure 7. We also added the related description in the Results section.

      Please refer to the Results section from line 308 to 309 “Crucially, CD69<sup>+</sup> frequency and CD44 MFI remained comparable in both aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells between two groups (Figure 7D-F).”

      [A] Hansson, Göran K, and Andreas Hermansson. “The immune system in atherosclerosis.” Nature immunology vol. 12,3 (2011): 204-12. doi:10.1038/ni.2001

      (4) Figure 7B: Beyond cDC1/2 proportions within cDCs, please report absolute counts of: Total cDCs, cDC1, and cDC2 subsets. Figure 7D: In addition to CD4+/CD8+ T cell proportions, the following should be included:

      (a) Total T cell numbers in aortas

      (b) Absolute counts of CD4+ and CD8+ T cells.

      Thanks for your suggestions. We have now included in Figure 7 the absolute counts of cDC, cDC1, and cDC2 cells, along with CD4<sup>+</sup> and CD8<sup>+</sup> T cells in aortic tissues. Additionally, we provide the corresponding CD44 mean fluorescence intensity (MFI) measurements for both CD4<sup>+</sup> and CD8<sup>+</sup> T cell populations. We added the related description in the Results section.

      Please refer to the Results section from line 303 to 311: “The flow cytometric results illustrated that both frequencies and absolute counts of Xcr1<sup>+</sup> cDC1 cells in the aorta were significantly reduced, but cDCs and cDC2 cells from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice were comparable with those from ApoE<sup>–/–</sup> mice (Figure 7A-C). Moreover, in both lymph node and spleen, the absolute numbers of pDC, cDC1 and cDC2 from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice were comparable with those from ApoE<sup>–/–</sup> mice (Figure S11). Crucially, CD69<sup>+</sup> frequency and CD44 MFI remained comparable in both aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells between the two groups (Figure 7D-F). However, aortic CD8<sup>+</sup> T cells exhibited reduced frequency and absolute count, while CD4<sup>+</sup> T cells showed increased frequency but unchanged counts in Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice versus controls (Figure 7G and H).”

      (5) cDC1 depletion reduced CD69+CD4+ and CD69+CD8+ T cells, whereas Xcl1 depletion decreased Xcr1+ cDC1 cells without altering activated T cells. How do the authors explain these different results? This discrepancy needs explanation.

      We sincerely appreciate your professional and insightful comments regarding the mechanistic relationship between cDC1 depletion and T cell activation. Direct cDC1 depletion in the Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mouse model removes both recruited and tissue-resident cDC1s, eliminating their multifunctional roles in antigen presentation, co-stimulation and cytokine secretion essential for T cell activation. In contrast, Xcl1 depletion reduces, but does not eliminate, cDC1 migration into plaques. Furthermore, alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3) or other pathways (e.g., BCL9/BCL9L) may partially rescue cDC1 recruitment [13, 68, 69], and non-cDC1 APCs (e.g., monocytes, cDC2s) may compensate for T cell activation [55, 70]. We emphasize that Xcl1 depletion specifically failed to alter T cell activation in hyperlipidemic ApoE<sup>–/–</sup> mice; its impact may differ in other pathophysiological contexts due to compensatory mechanisms. We thank you again for highlighting this nuance, which strengthens our mechanistic interpretation. We have added these points to the Discussion section and included new references.

      Please refer to the Discussion section from line 407 to 413: “Notably, while complete ablation of Xcr1<sup>+</sup> cDC1s impaired T cell activation, reduction of Xcr1<sup>+</sup> cDC1 recruitment via Xcl1 deletion did not significantly compromise this process. This discrepancy may arise through compensatory mechanisms: alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3) or other pathways (e.g., BCL9/BCL9L) may partially rescue Xcr1<sup>+</sup> cDC1 homing [13, 68, 69], while non-cDC1 antigen-presenting cells (e.g., monocytes, cDC2s) may sustain T cell activation [55, 70]. Furthermore, tissue-specific microenvironmental factors could potentially modulate its role in other diseases.”

      [13] Eisenbarth, S C. “Dendritic cell subsets in T cell programming: location dictates function.” Nature reviews. Immunology vol. 19,2 (2019): 89-103. doi:10.1038/s41577-018-0088-1

      [55] Brewitz, Anna et al. “CD8+ T Cells Orchestrate pDC-XCR1+ Dendritic Cell Spatial and Functional Cooperativity to Optimize Priming.” Immunity vol. 46,2 (2017): 205-219. doi:10.1016/j.immuni.2017.01.003

      [68] de Oliveira, Carine Ervolino et al. “CCR5-Dependent Homing of T Regulatory Cells to the Tumor Microenvironment Contributes to Skin Squamous Cell Carcinoma Development.” Molecular cancer therapeutics vol. 16,12 (2017): 2871-2880. doi:10.1158/1535-7163.MCT-17-0341

      [69] He F, Wu Z, Liu C, Zhu Y, Zhou Y, Tian E, et al. Targeting BCL9/BCL9L enhances antigen presentation by promoting conventional type 1 dendritic cell (cDC1) activation and tumor infiltration. Signal Transduct Target Ther. 2024;9(1):139. Epub 2024/05/30. doi: 10.1038/s41392-024-01838-9. PubMed PMID: 38811552; PubMed Central PMCID: PMC11137111.

      [70] Böttcher, Jan P et al. “Functional classification of memory CD8(+) T cells by CX3CR1 expression.” Nature communications vol. 6 8306. 25 Sep. 2015, doi:10.1038/ncomms9306

      Reviewer #1 (Recommendations for the authors):

      (1) Line 32 - The authors might want to add that the mouse model leads to a "constitutive" depletion of cDC1.

      Thanks for your advice; we have revised the sentence as follows.

      Please refer to the Results section from line 31 to 33: “we established Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mice, a novel and complex genetic model, in which cDC1 was constitutively depleted in vivo during atherosclerosis development”.

      (2) Line 187-188: The authors claim that T cell activation was "inhibited" if cDC1 was depleted. The data shows that the T cells were less activated, but there is no indication of any kind of inhibition; this should be corrected.

      Thanks for your advice; we have revised the sentence as follows.

      Please refer to the Results section from line 183 to 187: “Subsequently, we assessed T cell phenotype in the two groups of mice. While neither the frequencies nor absolute counts of aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells differed significantly between the two groups of mice (Figure 4D-F), CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcr1<sup>+</sup> cDC1 depleted mice compared to controls (Figure 4G and H)”.

      (3) Why are some splenic DC clusters absent in LNs and vice versa? This is not obvious to this reviewer and should at least be discussed.

      We appreciate the insightful question regarding the absence of certain splenic DC clusters in LNs. This phenomenon in Figure 5 aligns with the 'division of labor' paradigm in dendritic cell biology: tissue microenvironments evolve specialized DC subsets to address local immunological challenges. The absence of universal clusters reflects functional adaptation, not technical artifacts. We acknowledge that this tissue-specific heterogeneity warrants further discussion and have expanded our analysis to address this point in the discussion part of our manuscript.

      Please refer to the Discussion section from line 375 to 385: “This pronounced tissue-specific compartmentalization of Xcr1<sup>+</sup> cDC1 subsets may be related to multiple mechanisms, including developmental imprinting that instructs precursor differentiation into transcriptionally distinct subpopulations [62], and microenvironmental filtering, in which organ-specific chemokine axes (e.g., CCL2/CCR2 in the spleen) selectively recruit receptor-matched subsets [63, 64]. This spatial specialization optimizes pathogen surveillance for local immunological challenges. Based on the maturation analysis of the cDC1 scRNA-seq data [41], our findings suggest that aortic cDC1 cells differ markedly from those of the spleen and lymph nodes in lacking the mature clusters, whereas lymph node cDC1 cells contain an additional Fabp5<sup>+</sup> S100a4<sup>+</sup> late mature cluster. Our results also suggest that hyperlipidemia alters early immature cDC1 cells and the abundance of late immature cDC1 cells, which was associated with marked changes in the expression of Tnfaip3, Serinc3, Apol7c and Tifab”.

      [62]. Liu Z, Gu Y, Chakarov S, Bleriot C, Kwok I, Chen X, et al. Fate Mapping via Ms4a3-Expression History Traces Monocyte-Derived Cells. Cell. 2019;178(6):1509-25 e19. Epub 2019/09/07. doi: 10.1016/j.cell.2019.08.009. PubMed PMID: 31491389.

      [63]. Bosmans LA, van Tiel CM, Aarts S, Willemsen L, Baardman J, van Os BW, et al. Myeloid CD40 deficiency reduces atherosclerosis by impairing macrophages' transition into a pro-inflammatory state. Cardiovasc Res. 2023;119(5):1146-60. Epub 2022/05/20. doi: 10.1093/cvr/cvac084. PubMed PMID: 35587037; PubMed Central PMCID: PMCPMC10202633.

      [64]. Mildner A, Schonheit J, Giladi A, David E, Lara-Astiaso D, Lorenzo-Vivas E, et al. Genomic Characterization of Murine Monocytes Reveals C/EBPbeta Transcription Factor Dependence of Ly6C(-) Cells. Immunity. 2017;46(5):849-62 e7. Epub 2017/05/18. doi: 10.1016/j.immuni.2017.04.018. PubMed PMID: 28514690.

      [41]. Bosteels V, Marechal S, De Nolf C, Rennen S, Maelfait J, Tavernier SJ, et al. LXR signaling controls homeostatic dendritic cell maturation. Sci Immunol. 2023;8(83):eadd3955. Epub 2023/05/12. doi: 10.1126/sciimmunol.add3955. PubMed PMID: 37172103.

      (4) The authors should discuss how XCL1 could impact lesional cDC1 and T cell abundance. Notably, preDCs do not express XCR1, and T cells express XCL1 following TCR activation. Is there a recruitment or local proliferation defect of cDC1 in the absence of XCL1? Could there also be a role for NK cells as a potential source of XCL1?

      We appreciate your insightful questions regarding the differential effects of Xcl1 on cDC1s and T cells. Xcl1 primarily mediates the recruitment of mature cDC1s. Our data demonstrate that Xcl1 deletion significantly reduces aortic cDC1 abundance, which correlates with a concomitant decrease in CD8<sup>+</sup> T cell numbers within the aorta. These findings strongly suggest that the Xcl1-Xcr1 axis plays a regulatory role in T cell accumulation in aortic plaques.

      Consistent with prior studies [A, B], cDC1 recruitment can occur in the absence of Xcl1, which echoes our finding that cDC1 cells were still present, though less abundant, in Xcl1-knockout aortic plaques. We agree that further studies are required to determine how Xcl1-dependent and Xcl1-independent cDC1 cells activate T cells and whether they differ in their capacity to proliferate within the tissue. We have added these points to the Discussion section.

      Please refer to the Discussion section from line 407 to 415: “Notably, while complete ablation of Xcr1<sup>+</sup> cDC1s impaired T cell activation, reduction of Xcr1<sup>+</sup> cDC1 recruitment via Xcl1 deletion did not significantly compromise this process. This discrepancy may arise through compensatory mechanisms: alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3) or other pathways (e.g., BCL9/BCL9L) may partially rescue Xcr1<sup>+</sup> cDC1 homing [13, 68, 69], while non-cDC1 antigen-presenting cells (e.g., monocytes, cDC2s) may sustain T cell activation [55, 70]. Furthermore, tissue-specific microenvironmental factors could potentially modulate its role in other diseases. In summary, our findings identify Xcl1 as a potential therapeutic target for atherosclerosis, though its cellular origins and its regulation of lesional Xcr1<sup>+</sup> cDC1 and T cell dynamics require further study”.

      In the literature, Xcl1 is expressed by NK cells and subsets of T cells, and NK cells may be a potential source of Xcl1 during atherosclerosis, which deserves further investigation [A, C, D].

      [A] Böttcher, Jan P et al. “NK Cells Stimulate Recruitment of cDC1 into the Tumor Microenvironment Promoting Cancer Immune Control.” Cell vol. 172,5 (2018): 1022-1037.e14. doi:10.1016/j.cell.2018.01.004

      [B] He, Fenglian et al. “Targeting BCL9/BCL9L enhances antigen presentation by promoting conventional type 1 dendritic cell (cDC1) activation and tumor infiltration.” Signal transduction and targeted therapy vol. 9,1 139. 29 May. 2024, doi:10.1038/s41392-024-01838-9

      [C] Woo, Yeon Duk et al. “The invariant natural killer T cell-mediated chemokine X-C motif chemokine ligand 1-X-C motif chemokine receptor 1 axis promotes allergic airway hyperresponsiveness by recruiting CD103+ dendritic cells.” The Journal of allergy and clinical immunology vol. 142,6 (2018): 1781-1792.e12. doi:10.1016/j.jaci.2017.12.1005

      [D] Winkels, Holger et al. “Atlas of the Immune Cell Repertoire in Mouse Atherosclerosis Defined by Single-Cell RNA-Sequencing and Mass Cytometry.” Circulation research vol. 122,12 (2018): 1675-1688. doi:10.1161/CIRCRESAHA.117.312513

      Reviewer #2 (Recommendations for the authors):

      There is a logical error in line 298. I suggest revising to: "Collectively, these data suggest that Xcl1 promotes atherosclerosis by recruiting Xcr1+ cDC1 cells, which subsequently drive T cell activation in lesions."

      Thanks for your advice. Since Xcl1 deficiency reduced both the frequencies and absolute counts of Xcr1+ cDC1 and CD8+ T cells in lesions without affecting T cell activation, we revised the sentence along the lines you suggested, as follows.

      Please refer to the Results section from line 314 to 315: “Collectively, these data suggest that Xcl1 promotes atherosclerosis by recruiting Xcr1<sup>+</sup> cDC1 cells, and facilitating CD8<sup>+</sup> T cell accumulation in lesions”.

    1. eLife Assessment

      This important study elucidates the molecular function of the SARS-CoV-2 helicase NSP13, which inhibits the transcriptional activity of the YAP/TEAD complex in vitro and in vivo. The evidence supporting the authors' claims is compelling, based on cell biological assays and multi-omic studies. This work contributes to the understanding of a new regulatory mechanism of YAP/TEAD after SARS-CoV-2 infection and will be of interest to researchers investigating COVID-19 infection and the Hippo-YAP signaling pathway.

    2. Reviewer #1 (Public review):

      In the revised manuscript, Meng et al. report that SARS-CoV-2 infection suppresses YAP target gene transcription in both patient lung samples and iPSC-derived cardiomyocytes. Among the tested viral proteins, the helicase nonstructural protein 13 (NSP13) was identified as a key factor that impairs YAP/TEAD transcriptional activity. Through mutagenesis and protein-protein interaction studies, the authors propose a mechanism in which NSP13 binds the YAP/TEAD complex, remodels chromatin structure, and recruits transcriptional repressors to inhibit YAP/TEAD's transcriptional activity.

      Overall, this study uncovers a novel regulation of Hippo signaling by SARS-CoV-2 through NSP13, suggesting a potential role of this growth-related pathway in host innate immune response to viral infection. While these findings are intriguing, future studies are needed to validate the involvement of YAP/TEAD in patient tissues and to assess their potential as therapeutic targets against SARS-CoV-2.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Meng et al. describes a role for the coronavirus helicase NSP13 in the regulation of YAP-TEAD-mediated transcription. The authors present data that NSP13 expression in cells reduces YAP-induced TEAD luciferase reporter activity and that NSP13 transduction in cardiomyocytes blocks hyperactive YAP-mutant phenotypes in vivo. The mechanisms by which viral proteins (particularly those from coronaviruses) intersect with cellular signaling events are an important research topic, and the intersection of NSP13 with YAP-TEAD transcriptional activity (independent of upstream Hippo pathway-mediated signals) offers new knowledge that is of interest to a broad range of researchers.

      Strengths:

      The manuscript presents convincing data mapping the effects of NSP13 on YAP-TEAD reporter activity to the helicase domain. Moreover, the in vivo data demonstrating that NSP13 expression in YAP5SA mouse cardiomyocytes increased animal survival rates and restored cardiac function are striking and support the model presented.

      Weaknesses:

      While there are some hints at the mechanisms by which NSP13 regulates YAP-TEAD activity through the identification of NSP13-associated proteins by mass spec, the relationships and functions of these factors in the context of YAP-TEAD regulation require further study in the future.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major points

      (1) The authors discovered a novel regulation of the Hippo-YAP pathway by SARS-CoV-2 infection but did not address the pathological significance of this finding. It remains unclear why YAP downstream gene transcription needs to be inhibited in response to SARS-CoV-2 infection. Is this inhibition crucial for the innate immune response to SARS-CoV-2? The authors should re-analyze their snRNA-seq and bulk RNA-seq data described in Figure 1 to determine whether any of the affected YAP downstream genes are involved in this process.

      We appreciate the reviewer’s suggestion to clarify the pathological significance of YAP pathway inhibition in SARS-CoV-2 infection. To address this, we re-analyzed our snRNA-seq and bulk RNA-seq datasets to determine whether YAP target genes overlap with known mediators of the innate immune response. As described in Fig. 1C, bulk RNA-seq revealed decreased expression of multiple YAP downstream targets linked to innate immune regulation (e.g., Thbs1, Ccl2, Axl, and Csf1) in SARS-CoV-2–infected cells in vitro.

      snRNA-seq of alveolar type I (AT1) cells from COVID-19 patients revealed a more complex landscape: while we observed reduced YAP activity overall (Fig. 1G), multiple YAP target genes involved in innate immunity and cytokine signaling were paradoxically elevated (Supplemental Fig. 1E). Several factors likely explain these conflicting observations: 1. In the lung, AT1 cells (which are critical for gas exchange) may respond to virus infection in a cell-specific manner, upregulating immune-response genes through other signaling pathway(s); 2. In vivo, SARS-CoV-2 infection triggers a surge in cytokines, chemokines, and other local factors that can differentially modulate YAP binding sites and thus affect its downstream targets, a complexity not fully captured in vitro; 3. YAP is highly sensitive to mechanical signals and tissue architecture. The 3D tissue structure, altered cell–cell junctions in infected lung tissue, and fluid shear stress in the alveolar space could shape YAP target gene transcription differently from simplified monolayer cell cultures.

      We have expanded the Results section of the revised manuscript to include the above points. We also acknowledge that ongoing and future work is needed to delineate the exact molecular and tissue-specific pathways through which YAP inhibition confers a potential advantage in combating SARS-CoV-2.

      (2) The authors concluded that helicase activity is required for NSP13-induced inhibition of YAP transcriptional activity based on mutation studies (Figure 3B). This finding is somewhat confusing, as K131, K345/K347, and R567 are all essential residues for NSP13 helicase activity, yet mutating K131 did not affect NSP13's ability to inhibit YAP (Figure 3B). Additionally, there are no data showing exactly how NSP13 inhibits the YAP/TEAD complex through its helicase function. This point was also not reflected in their proposed working model (Figure 4H).

      We appreciate the reviewer’s concerns regarding the helicase‐dependent inhibition of YAP by NSP13, particularly the roles of K131, K345/K347, and R567. Based on published structural and biochemical studies, each of these residues uniquely supports helicase function (1): K131 is crucial for stabilizing the NSP13 stalk region by interacting with S424. Substituting K131 with alanine (K131A) reduces helicase efficiency but does not completely abolish it; K345/K347 are key DNA‐binding residues, and mutating both (K345A/K347A) largely prevents NSP13 from binding DNA, thus eliminating unwinding. R567 is critical for ATP hydrolysis, and the R567A mutant retains DNA binding capacity but fails to unwind it. In Fig. 3B, K131A suppresses YAP transactivation to nearly the same extent as wild‐type NSP13, suggesting that partial helicase activity is sufficient for complete YAP/TEAD inhibition. Conversely, the K345A/K347A and R567A mutants show markedly diminished repression, underscoring the importance of DNA binding and ATP hydrolysis.

      As the new Fig. 4J illustrates, NSP13 must bind DNA and hydrolyze ATP to unwind nucleic acids. This helicase‐dependent process likely enables NSP13 to remodel chromatin structure by binding TEAD and to properly organize YAP repressors at the YAP/TEAD complex, thereby preventing YAP/TEAD transactivation. In support of this mechanism, the K345A/K347A mutant, unable to anchor to DNA, fails to repress YAP and slightly increases YAP‐driven transcription (Fig. 3B), presumably by mislocalizing YAP repressors. Likewise, the ATPase‐dead R567A can bind DNA but does not unwind and remodel chromatin to recruit YAP repressors, resulting in a loss of YAP suppression (Fig. 3B and 3F). In our revised model, both DNA binding and ATP‐dependent unwinding are essential for NSP13 to suppress YAP transcriptional activity. We have updated the results, discussion, and model accordingly.

      (3) The proposed model that NSP13 binds TEAD4 to recruit repressor proteins and inhibits YAP/TEAD downstream gene transcription (Figure 4H) needs further characterization. Second, NSP13 is a DNA-binding protein, and its nucleic acid-binding mutant K345A/K347A failed to inhibit YAP transcriptional activity (Figure 3B). The authors should investigate whether NSP13 could bind to the TEAD binding sequence or the nearby sequence on the genome to modulate TEAD's DNA binding ability. Third, regarding the identified nuclear repressors, the authors should validate the interaction of NSP13 with the ones whose loss activates YAP transcriptional activity (Figure 4G). Lastly, why can't NSP13 bind TEAD4 in the cytoplasmic fractionation if both NSP13 and TEAD4 are detected there (Figure 3B)? This finding indicates their interaction is not a direct protein-protein interaction but is mediated by something in the nucleus, such as genomic DNA.

      (3a) We acknowledge that our NSP13 immunoprecipitation–mass spectrometry (IP-MS) did not identify any TEAD proteins (Fig. 4G and IP-MS tables). Several factors likely contributed to this outcome:

      (1) Low TEAD expression in HEK293T cells: Our IP-MS experiments were performed in HEK293T cells, which, according to the Human Protein Atlas, express TEAD1–4 at comparatively low levels (TEAD1: 16.5, TEAD2: 16.4, TEAD3: 4.9, TEAD4: 38.7 nTPM). In contrast, HeLa cells, where we successfully validated NSP13-mediated YAP suppression (Fig. 4H, Supplementary Fig.5B-D), show higher expression of these TEAD isoforms (TEAD1: 97.1, TEAD2: 27.3, TEAD3: 12.2, TEAD4: 48.1 nTPM). Therefore, insufficient TEAD abundance in HEK293T cells may limit the sensitivity needed to detect TEAD–NSP13 interactions in our proteomic screens.

      (2) Transience and potential DNA dependence: Our co-immunoprecipitation (co-IP) experiments (Fig. 4B, Supplementary Fig.4C-E) indicated that NSP13–TEAD4 binding is low-affinity. Under standard IP-MS conditions (which typically do not include chemical cross-linkers or nucleic acids to stabilize transient complexes), weak or short-lived interactions can be lost during washes or sample processing.

      (3) Additional supporting evidence: We carefully checked our IP-MS data and found that well-known TEAD binding proteins, including CTBP1/2 and GATA4, were pulled down, suggesting that TEAD’s absence does not rule out an NSP13–TEAD association.

      (3b) We sincerely appreciate the reviewer’s insightful suggestion. While we agree that mapping NSP13 occupancy at individual TEAD-binding motifs is valuable, we respectfully consider this to be beyond the scope of the current study. Biochemical and structural work on coronavirus NSP13 shows that it recognizes nucleic‑acid substrates primarily through their 5′ single‑stranded overhang and duplex architecture, not through a defined base sequence (2, 3). Accordingly, our data (Fig. 3B and 3F) indicate that DNA binding ability, rather than recognition of a specific motif, enables NSP13 to perform its helicase activity in proximity to TEAD and recruit repressors. Moreover, the DNA‑binding mutant K345A/K347A and the ATPase‑dead mutant R567A both fail to suppress YAP/TEAD transcription despite retaining the ability to interact with TEAD (Fig. 3B). These loss‑of‑function phenotypes demonstrate that NSP13’s chromatin engagement and unwinding activity, rather than sequence‑restricted targeting, are essential for repression. For these reasons, motif‑specific binding assays were not pursued in this revision, but we clarified in the discussion that NSP13’s DNA engagement is likely structural or TEAD-dependent rather than sequence‑directed. We also highlighted this as an important avenue for future investigation.

      (3c) To validate the NSP13 interacting proteins from our IP-MS data, we generated plasmids expressing several candidates (CCT3, SMARCD1, EIF4A1, LMNA, TTF2, and YY2) and performed co-IP assays. As predicted, we confirmed the robust interaction between NSP13 and TEAD (Supplemental Fig. 5E). However, these putative nuclear repressors exhibited weak binding to NSP13 compared with TEAD4, suggesting that NSP13 associates with them indirectly, possibly as part of a larger multiprotein complex or depending on the chromatin structure, rather than via direct protein–protein interaction (Fig. 4J).

      (3d) We appreciate the reviewer’s question. To investigate whether their association might be DNA‐dependent, we performed co‐IP experiments using nuclear lysates in the presence or absence of various nucleases: Universal Nuclease (which degrades all forms of DNA and RNA), DNase I (which cleaves both single‐ and double‐stranded DNA), and RNase H (which selectively cleaves the RNA strand in RNA/DNA hybrids). Our findings revealed that nucleic acid removal did not disrupt the NSP13/TEAD4 interaction (Supplemental Fig.4E), indicating that their binding is not solely mediated by DNA or RNA.

      Reviewer #2 (Public Review):

      Specific comments and suggestions for improvement of the manuscript:

      (1) NSP13 has been reported to block, in a helicase-dependent manner, episomal DNA transcription (PMID: 37347173), raising questions about the effects observed in the HOP-Flash and 8xGTIIC reporter assays. It would be valuable to demonstrate the specificity of the proposed effect of NSP13 on TEAD activation by YAP (versus broad effects on reporter assays) and also to show that NSP13 reduces the function of endogenous YAP-TEAD transcriptional activity (i.e., does ectopic NSP13 expression reduce the expression of YAP-induced TEAD target genes in cells?).

      We appreciate the reviewer’s comments and have carefully revisited the conclusions from the published paper (4) (PMID: 37347173), which reported that NSP13 suppresses episomal DNA transcription, as evidenced by reduced Renilla luciferase (driven by the herpes simplex virus thymidine kinase promoter) and GFP expression upon co‐expression with NSP13. For our experiments, we used a dual‐luciferase assay with Renilla luciferase (under the same promoter) as an internal control. After re-examining our raw Renilla luciferase data (now provided in the supplemental Excel file “Supporting data value”), we found that while 100 ng of NSP13 did not affect Renilla luciferase levels, 400 ng of NSP13 reduced them by approximately 50% relative to the YAP5SA‐only group (Supplemental Fig.2B, Fig.3C-D). We observed a similar reduction with NSP13 truncation mutants, an outcome not fully consistent with the published study (Supplemental Fig.3D, PMID: 37347173). However, unlike their finding of robust episomal DNA suppression, our data indicate that the K345A/K347A mutant of NSP13, which lacks DNA‐binding ability, completely lost its suppressive effect (Fig.3B).
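The effect of a partly suppressed internal control on the dual-luciferase readout can be made explicit with illustrative symbols rather than values from the study. Let F and R be the firefly and Renilla signals, let α be the fraction of firefly signal remaining with NSP13 (an assumed symbol), and assume, as reported for 400 ng NSP13, that Renilla falls to about half of its control value:

```latex
\frac{F_{\mathrm{NSP13}}/R_{\mathrm{NSP13}}}{F_{\mathrm{ctrl}}/R_{\mathrm{ctrl}}}
  = \frac{\alpha F_{\mathrm{ctrl}} / (0.5\,R_{\mathrm{ctrl}})}{F_{\mathrm{ctrl}}/R_{\mathrm{ctrl}}}
  = 2\alpha
```

For example, α = 0.2 (an 80% drop in raw firefly signal) gives a normalized ratio of 0.4, i.e. a 60% apparent reduction. Under this assumption, normalizing to a halved Renilla signal understates rather than inflates the raw repression.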

      We performed additional Notch reporter assays to address the concern that NSP13 might nonspecifically inhibit episomal DNA transcription (including the HOP‑Flash and 8×GTIIC reporters). These experiments revealed that co‑expression of NSP13 with NICD (Notch intracellular domain) does not suppress Notch signaling (Supplemental Fig. 2C), indicating that NSP13 does not globally block all reporter systems. To evaluate whether NSP13 reduces endogenous YAP‑TEAD activity, we transiently overexpressed NSP13 WT and its R567A mutant in HeLa cells. However, bulk RNA‑seq and qPCR analyses did not reveal a clear decrease in YAP target genes, possibly due to the low transfection efficiency (< 50%, Supplemental Fig.4D). Interestingly, we observed that YAP5SA was predominantly retained in the nucleus upon NSP13 or R567A co‑expression, suggesting that NSP13 (alone or together with its interacting partners) restricts YAP5SA cytoplasmic shuttling. Future studies will use stable cell lines expressing NSP13 WT or R567A to better characterize the mechanisms driving YAP5SA nuclear retention and clarify how NSP13 specifically suppresses YAP activity.

      (2) While the IP-MS experiment may have revealed new regulators of TEAD activity, the data presented are preliminary and inconclusive. No interactions are validated, and beyond slight changes in TEAD reporter activity following knockdown, no direct links to YAP-TEAD are demonstrated, and no link to NSP13 was shown. Also, no details are provided about the methods used for the IP-MS experiment, raising some concerns about potential false positive associations within the data.

      We appreciate the reviewer’s feedback regarding our IP-MS findings and acknowledge that additional validation is required to establish definitive links between the identified putative regulators, YAP-TEAD, and NSP13. We have taken the following steps (and plan further experiments) to address these concerns:

      (2a) Co-IP validation: As in our response to Reviewer #1 (3c), we generated plasmids expressing several top candidate interactors from the IP-MS data (CCT3, SMARCD1, EIF4A1, LMNA, TTF2, and YY2) and performed direct co-IP assays in a more controlled setting. The results indicated that these putative NSP13 interactors had weaker binding compared to TEAD4, implying that NSP13 may associate with them as part of a larger complex or depending on the chromatin structure rather than through a direct protein–protein interaction (Fig. 4J).

      (2b) qPCR validation: Beyond reporter assays for evaluating YAP transactivation after knockdown of the candidate YAP suppressors (Fig. 4H and Supplemental Fig. 5C), we performed qPCR to detect YAP activation on endogenous YAP-TEAD target genes (e.g., CTGF, CYR61, and AMOTL2) after CCT3 knockdown. Expression of CTGF and CYR61 was higher compared to control (Supplemental Fig. 5D), strengthening the case for an interaction relevant to YAP-TEAD signaling.

      (2c) To investigate how NSP13‐interacting proteins link to the YAP/TEAD complex, we examined the IP‑MS dataset and identified several well‐known YAP and TEAD binding partners, including CTBP1/2 (TEAD‐binding), GATA4 (TEAD‐binding), and multiple 14‐3‐3 isoforms (YWHAZ/YWHAB/YWHAH/YWHAQ, YAP binding). These findings suggest that NSP13 may form a larger nuclear complex with YAP/TEAD and associated cofactors. In the future, we will determine whether these putative TEAD regulators also interact with NSP13 under various conditions (e.g., in the presence or absence of DNA) and whether co‐expression of NSP13 influences their association with YAP or TEAD. This approach will clarify how NSP13 might leverage these factors to regulate YAP‐TEAD function.

      (2e) For the mass spectrometry experiments, HEK293T cells were transfected with Flag‐YAP1, HA‐NSP13, or Flag‐YAP1 + HA‐NSP13 according to the manufacturer’s standard protocols. After nuclear extraction and lysis, the supernatant was incubated with HA magnetic beads to immunoprecipitate (IP) NSP13. The IP samples were subsequently analyzed by mass spectrometry to identify NSP13‐associated proteins (Fig. 4F). Each experimental condition was performed in duplicate to ensure reproducibility. We included an appropriate negative control (Flag‐YAP1) and stringent data‐filtering criteria to minimize false positives. We apologize for not including these details in our original Methods section; in this revised manuscript, we have fully described the number of replicates, the controls used, and our data analysis pipelines.

    1. eLife Assessment

      The study presents valuable theoretical insights by attempting to classify pattern-forming gene subnetworks and exploring their potential mechanisms. However, the results are incomplete, as they rely on oversimplified models, limited classifications, and assumptions that may not hold in more complex or realistic scenarios.

    2. Reviewer #1 (Public review):

      Summary:

      The authors tackle a long-standing question in developmental theory: given a gene-regulatory network that includes extracellular signalling, which topologies are even capable of transforming an initial spatial profile into a genuinely new pattern? Building on the classical reaction-diffusion framework in one dimension, but imposing biologically motivated constraints, they prove that every one-signal sub-network must be either Hierarchical (H), self-activating (L+), or self-inhibiting (L-). They further demonstrate that only three composite classes of full networks (pure H, a coupled L+ L- "Turing" pair, and an L- module fed by an intracellular positive loop, termed "noise-amplifying") can create non-trivial spatial transformations. Analytical criteria and illustrative simulations are provided, together yielding a closed taxonomy that is claimed to be relevant for real systems.

      Strengths:

      (1) Useful classification framework. Reducing a vast number of possible gene circuits to three canonical pattern-forming motifs is a valuable organising insight for both theorists and experimentalists.

      (2) Logical completeness. All required cases are addressed, and the proofs elevate previous computational observations to formal statements.

      (3) Practical interpretability. Given a reaction network diagram, one can now decide (assuming the model applies to the real systems) whether spatial patterning is even possible, saving experimental effort on in-silico screens that could never succeed.

      Weaknesses:

      (1) The Results section is difficult to follow. Key logical steps and network configurations are described only briefly in prose, constantly requiring the reader to consult either the SI or other parts of the text (see the numerous references to the requirements R1-R5 listed at the beginning of the paper) to gain even a minimal understanding. As a result, a scientifically literate but non-specialist reader may struggle to grasp the argument within a reasonable amount of time.

      (2) A central step in the model formulation is the linearisation of the reaction term around a homogeneous steady state; higher-order kinetics, including ubiquitous bimolecular sinks such as A + B → AB, are simply collapsed into the Jacobian without any stated amplitude bound on the perturbations. Because the manuscript never analyses how far this assumption can be relaxed, the robustness of the three-class taxonomy under realistic nonlinear reactions or large spike amplitudes remains uncertain.
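To see what the linearisation discards, consider a bimolecular sink A + B → AB with mass-action rate k a b (an illustrative choice; the manuscript's reaction terms are not reproduced here) and expand around a homogeneous steady state (a^*, b^*) with small deviations δa, δb:

```latex
k\,ab = k\,a^{*}b^{*} + k\,b^{*}\,\delta a + k\,a^{*}\,\delta b + k\,\delta a\,\delta b
```

The Jacobian retains the entries k b^* and k a^* and drops k δa δb. The ratio of the dropped term to the retained linear terms is of order δb/b^* and δa/a^*, so the approximation holds only while |δa| ≪ a^* and |δb| ≪ b^*, which is exactly the kind of amplitude bound a large spike can violate.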

      (3) All modelling is confined to one spatial dimension, and the very definition of a "non-trivial" transformation is framed in terms of peak positions along a line, which clearly must be reformulated for higher dimensions. It is well known that diffusion behaves very differently in one, two, and three dimensions, so the relevance of the three-class taxonomy to real multicellular tissues remains unclear, or at least should be explained in more detail.
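As a reminder of how dimension d enters, these are standard results for a point source (not taken from the manuscript). Free diffusion with coefficient D spreads as the heat kernel, so the peak height decays as t^{-1/2}, t^{-1}, and t^{-3/2} in 1D, 2D, and 3D. With linear degradation rate k, the steady-state profile around a sustained point source also changes shape with dimension, with decay length λ = √(D/k):

```latex
G_d(r,t) = (4\pi D t)^{-d/2}\exp\!\left(-\frac{r^{2}}{4Dt}\right),
\qquad
u_{\mathrm{ss}}(r) \propto
\begin{cases}
e^{-|x|/\lambda} & d = 1\\
K_0(r/\lambda) & d = 2\\
e^{-r/\lambda}/r & d = 3
\end{cases}
```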

      Discussion:

      As stated above, there are several uncertainties about the relevance of the presented framework for real systems. However, if the results hold, researchers could look at a gene-network diagram and quickly judge whether it can make spatial patterns and, if so, which of the three known mechanisms it will use. That shortcut would save experimental and computational time. In the case that the results don't hold for the real systems, the authors' proof tools at least give theorists a solid base they can extend to more complex cases.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores how gene regulatory networks that include intra- and extracellular signaling can give rise to spatial patterns of gene expression in cells. The authors investigate this question in a simplified theoretical framework, where all cells are assumed to respond identically to signals, and spatial details such as cell boundaries and extensions are abstracted away. Within this setting, they identify three distinct signaling topologies, referred to as L and H types, and combine them into three minimal subnetworks capable of generating patterns. The study analyzes possible combinations of these topologies and examines how each subnetwork behaves under three different initial conditions. Combining the analyses with mathematical proofs and heuristic arguments, the authors define necessary conditions under which such networks can produce non-trivial spatial patterns.

      Strengths:

      The authors break down larger gene regulatory networks into smaller subnetworks, which allows for a more tractable analysis of pattern formation. These minimal subnetworks are examined under different initial conditions, providing a range of examples for how patterns can emerge in simplified settings. The study also proposes necessary conditions for pattern formation, which may be useful for identifying relevant network structures. In addition, the manuscript offers heuristic explanations for the emergence of patterns in each subnetwork, which help to interpret the simulation results and analytical criteria.

      Weaknesses:

      (1) We have serious concerns regarding the validity of the simulation results presented in the manuscript. Rather than simulating the full nonlinear system described by Equation (1), the authors base their results on a truncated expansion (Equation S.8.2) that captures only the time evolution of small deviations around a spatially homogeneous steady state. However, it remains unclear how this reduced system is derived from the full equations (specifically, which terms are retained or neglected and why) and how the expansion of the nonlinear function can be steady-state independent, as claimed. Additionally, in simulations involving the spike plus homogeneous initial condition, it is not evident (or, where equations are provided, it is not correct) that the assumed global homogeneous background actually corresponds to a steady state of the full dynamics. We elaborate on these concerns in the following:

      It is assumed that the homogeneous steady states are given by g_i=0 and g_i=c_i, where 1/c_i = \mu_i or \hat{\mu}_i, independently of the specific network structure. However, the basis for this assumption is unclear, especially since some of the functions do not satisfy this condition, for example f_5 as defined below Eq. S8.10.5. Moreover, if g_i=c_i does not correspond to a true steady state, then the time evolution of deviations from this state is not correctly described by Eq. S8.2, as the zeroth-order terms do not vanish in that case.

      Additionally, the equations used contain only linear terms and a cubic degradation term for each species g_i, while neglecting all quadratic terms and cubic terms involving cross-species interactions (i≠j). An explanation for this selective truncation is not provided, and without knowledge of the full equation (f), it is impossible to assess whether this expansion is mathematically justified. If, as suggested in the Supplementary Information, the linear and cubic terms are derived from f, then at the very least, the Jacobian matrix should depend on the background steady-state concentration. However, the equations for the small deviation around a steady state (including the Jacobian matrix) used in the simulations appear to be independent of the particular steady state concentration.
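The truncation question can be made concrete with the generic Taylor expansion of the reaction term around a candidate homogeneous state c. The notation here is introduced for illustration only: J, H, and T are the first-, second-, and third-derivative tensors of f evaluated at c.

```latex
f_i(\mathbf{c}+\delta\mathbf{g}) = f_i(\mathbf{c})
  + \sum_{j} J_{ij}(\mathbf{c})\,\delta g_j
  + \frac{1}{2}\sum_{j,k} H_{ijk}(\mathbf{c})\,\delta g_j\,\delta g_k
  + \frac{1}{6}\sum_{j,k,l} T_{ijkl}(\mathbf{c})\,\delta g_j\,\delta g_k\,\delta g_l
  + \mathcal{O}\!\left(|\delta\mathbf{g}|^{4}\right)
```

Three points follow directly: the constant term f_i(c) vanishes only if c is a true steady state; every coefficient depends on c, so the linear part cannot in general be steady-state independent; and keeping only the diagonal cubic terms T_{iiii} amounts to setting every H_{ijk} and every cross-species T_{ijkl} to zero, which needs separate justification.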

      This is why we believe that the differences observed between the spike-only initial condition and the spike superimposed on a homogeneous background are not due to the initial conditions themselves, but rather result from a modified reaction scheme introduced through a questionable cutoff.

      "In simulations with spike initial patterns, the reference value g≡0 represents an actual concentration of 0 and therefore, we must add to (S8.2) a Heaviside function Φ acting of f (i.e., Φ(f(g))=f(g) if f(g)>0 , Φ(f(g))=0 if f(g) ≤ 0 ) to prevent the existence of negative concentrations for any gene product (i.e., g_i<0 for some i )." (SI chapter S8).

      This cutoff alters the dynamics (no inhibition) and introduces a different reaction scheme between the two simulations. The need for this correction may itself reflect either a problem in the original equations (which should fulfill the necessary conditions and prevent negative concentrations (R4 in main text)) or the inappropriateness of using an expanded approximation which assumes independence on the steady state concentration. It is already questionable if the linearized equations with a cubic degradation term are valid for the spike initial conditions (with different background concentration values), as the amplitude of this perturbation seems rather large.
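The claim that the cutoff removes inhibition can be checked on a toy reaction term. This is a minimal sketch under stated assumptions, not the manuscript's equations: a hypothetical target product whose rate is activated by g1 and inhibited by g2 (coefficients a and b assumed), advanced by one explicit Euler step. Once inhibition dominates, Φ clamps the rate to zero, so raising g2 further no longer changes the outcome.

```python
def phi(rate: float) -> float:
    """Cutoff as quoted from SI S8: keep positive rates, set non-positive rates to zero."""
    return max(rate, 0.0)


def reaction_rate(g1: float, g2: float, a: float = 1.0, b: float = 2.0) -> float:
    """Hypothetical reaction term: activated by g1 (weight a), inhibited by g2 (weight b)."""
    return a * g1 - b * g2


def euler_step(g_target: float, g1: float, g2: float,
               dt: float = 0.1, cutoff: bool = False) -> float:
    """One explicit Euler step of d(g_target)/dt = f(g1, g2), optionally passed through phi."""
    f = reaction_rate(g1, g2)
    if cutoff:
        f = phi(f)
    return g_target + dt * f


# Inhibition dominates here (0.5 - 2.0 = -1.5).
print(euler_step(1.0, 0.5, 1.0))               # target decreases (about 0.85)
print(euler_step(1.0, 0.5, 1.0, cutoff=True))  # 1.0: inhibition has no effect
print(euler_step(1.0, 0.5, 5.0, cutoff=True))  # 1.0: stronger inhibition, same result
```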

      Lastly, we note that under the current simulation scheme, it is not possible to meaningfully assess criteria RH2a and RH2b, as they rely on nonlinear interactions that are absent from the implemented dynamics.

      (2) Most of the proofs presented in the Supplementary Information rely on linearized versions of the governing equations, and it remains unclear how these results extend to the fully nonlinear system. We are concerned that the generality of the conclusions drawn from the linear analysis may be overstated in the main text. For example, in Section S3, the authors introduce the concept of dynamic equivalence of transitive chains (Proposition S3.1) and intracellular transitive M-branching (Proposition S3.2), which pertains to the system's steady-state behavior. However, the proof is based solely on the linearized equations, without additional justification for why the result should hold in the presence of nonlinearities. Moreover, the linearized system is used to analyze the response to a "spike initial pattern of arbitrary height C" (SI Chapter S5.1), yet it is not clear how conclusions derived from the linear regime can be valid for large perturbations, where nonlinear effects are expected to play a significant role. We encourage the authors to clarify the assumptions under which the linearized analysis remains valid and to discuss the potential limitations of applying these results to the nonlinear regime.

      (3) Several statements in the main text are presented without accompanying proof or sufficient explanation, which makes it difficult to assess their validity. In some cases, the lack of justification raises serious doubts about whether the claims are generally true. Examples are:

      "For the purpose of clarity we will explain our results as if these cells have a simple arrangement in space (e.g., a 1D line or a 2D square lattice) but, as we will discuss, our results shall apply with the same logic to any distribution of cells in space." (Main text l.145-l.148).

      "For any non-trivial pattern transformation (as long as it is symmetric around the initial spike), there exists an H gene network capable of producing it from a spike initial pattern." (Main text l.366f).

      "In 2D there are no peaks but concentric rings of high gene product concentration centered around the spike, while in 3D there are concentric spherical shells." (Main text l. 447ff).

      (4) The study identifies one-signal networks and examines how combinations of these structures can give rise to minimal pattern-forming subnetworks. However, the analysis of the combinations of these minimal pattern-forming subnetworks remains relatively brief, and the manuscript does not explore how the results might change if the subnetworks were combined in upstream and downstream configurations. In our view, it is not evident that all possible gene regulatory networks can be fully characterized by these categories, nor that the resulting patterns can be reliably predicted. Rather, the approach appears more suited to identifying which known subnetworks are present within a larger network, without necessarily capturing the full dynamics of more complex configurations.

      (5) The definition of non-trivial pattern formation is provided only in the Supplementary Information, despite its central importance for interpreting the main results. It would significantly improve clarity if this definition were included and explained in the main text. Additionally, it remains unclear how the definition is consistently applied across the different initial conditions. In particular, the authors should clarify how slope-based measures are determined for both the random noise and sharp peak/step function initial states. Furthermore, the authors do not specify how the sign function is evaluated at zero. If the standard mathematical definition sgn(0)=0 is used, then even a simple widening of a peak could fulfill the criterion for non-trivial pattern transformation.
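The concern about sgn(0) can be illustrated with a hypothetical slope-sign criterion. The manuscript's actual definition lives in the SI and is not reproduced here; the sketch assumes a transformation is flagged as non-trivial whenever the sequence of discrete slope signs changes. Under sgn(0)=0, a spike that merely widens into a plateau changes its sign sequence, even though the number of peaks stays at one.

```python
def sgn(x: float) -> int:
    """Standard sign function with sgn(0) = 0."""
    return (x > 0) - (x < 0)


def slope_signs(profile: list[float]) -> list[int]:
    """Signs of forward differences between neighbouring cells."""
    return [sgn(b - a) for a, b in zip(profile, profile[1:])]


def count_peaks(profile: list[float]) -> int:
    """Count maxima, treating a flat top as a single peak (zero slopes are skipped)."""
    nonzero = [s for s in slope_signs(profile) if s != 0]
    return sum(1 for s, t in zip(nonzero, nonzero[1:]) if s == 1 and t == -1)


def naive_nontrivial(initial: list[float], final: list[float]) -> bool:
    """Hypothetical criterion: flag the transformation if any slope sign differs."""
    return slope_signs(initial) != slope_signs(final)


spike = [0, 0, 1, 0, 0]
widened = [0, 1, 1, 1, 0]  # same single peak, just wider

print(slope_signs(spike), slope_signs(widened))
print(naive_nontrivial(spike, widened), count_peaks(spike), count_peaks(widened))
```

Whether the manuscript's criterion behaves this way depends on how sgn(0) and the slopes are defined there, which is precisely what the comment asks the authors to state.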

      (6) The manuscript lacks a clear and detailed explanation of the underlying model and its assumptions. In particular, it is not well-defined what constitutes a "cell" in the context of the model, nor is it justified why spatial features of cells - such as their size or boundaries - can be neglected. Furthermore, the concept of the extracellular space in the one-dimensional model remains ambiguous, making it unclear which gene products are assumed to diffuse.

    4. Reviewer #3 (Public review):

      Pattern formation is responsible for generating the spatial organization of cells, tissues, and organs during embryogenesis. It operates within a multifactorial system including initial conditions, gene regulatory networks, extracellular signals, mechanical forces, stochastic noise, and environmental inputs. Finally, it ensures the functional anatomy of an organism.

      This study focuses on one central aspect of pattern formation: how spatial heterogeneity arises from an initial condition and evolves into a more complex or distinct spatial pattern (which the authors term non-trivial pattern formation). The authors made efforts to explore and characterize all possible ways to achieve this pattern formation. They do this by discussing how extracellular signals spread, how individual cells respond to those signals, and how those responses, in turn, modulate signal propagation.

      Finally, their comprehensive analysis concludes that there are three classes of interactions between extracellular signals and intracellular responses, corresponding to previously known mechanisms that can generate spatial patterns: differences in morphogen concentration across space, noise amplification, and Turing patterns.

    1. eLife Assessment

      This study presents a sequence-based method for predicting drug-interacting residues in intrinsically disordered proteins (IDPs), addressing an important challenge in understanding small-molecule:IDP interactions. The findings have solid support in illustrative examples that underscore the role of aromatic interactions. While predicted binding sites remain coarse, validation was performed on a total of 10 IDPs, four of them thoroughly and the other six less so. The method builds on previous work from the authors, with necessarily ad hoc modifications, and offers a starting point for further exploration in this emerging field.

    2. Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDPs, building on their recent method for predicting the transverse relaxation rates (R2) of IDPs, which was trained on 45 IDP sequences and their corresponding R2 values. The key finding is that IDPs interact with drugs mostly through aromatic residues, which is easy to rationalize because most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDPs. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations or NMR studies are needed.

      Weaknesses:

      The method does not depend on information about the binding compounds, which may allow it to capture general features of IDP-drug binding. However, the number of interacting residues varies with the size and chemical structure of the compound (for example, how many aromatic rings it contains), and this is not considered in this work. The lack of compound-specific information may restrict its application in compound optimization, which aims to derive specific and potent binding compounds.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al. and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al. in the model, local density of aromatic residues, local hydrophobicity (note that Tesei et al. have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.
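      One of the baselines suggested above, local aromatic density, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DIRseq or SeqDYN: only F, W and Y are counted as aromatic, the window half-width of 3 and the top-10% cutoff are hypothetical choices, and the optional `scale` argument is where a per-residue stickiness (lambda) or hydrophobicity table could be substituted.

```python
from typing import Dict, List, Optional

# Assumption: His is not counted as aromatic in this sketch.
AROMATIC = set("FWY")


def local_density(seq: str, half_window: int = 3,
                  scale: Optional[Dict[str, float]] = None) -> List[float]:
    """Per-residue mean of a residue score over a window of +/- half_window.

    With scale=None the score is 1.0 for F/W/Y and 0.0 otherwise, i.e. the
    local aromatic density. Passing a dict (e.g. a hypothetical stickiness
    or hydrophobicity table) swaps in a different per-residue property;
    residues missing from the dict score 0.0. Windows are truncated at
    the chain termini.
    """
    if scale is None:
        values = [1.0 if aa in AROMATIC else 0.0 for aa in seq]
    else:
        values = [scale.get(aa, 0.0) for aa in seq]
    n = len(seq)
    profile = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def top_residues(profile: List[float], fraction: float = 0.1) -> List[int]:
    """1-based positions of the highest-scoring residues (hypothetical cutoff)."""
    k = max(1, round(fraction * len(profile)))
    order = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    return sorted(i + 1 for i in order[:k])
```

      Running the same scaffold with each candidate table and comparing the top-ranked positions against CSP profiles would give the kind of side-by-side comparison the reviewer requests; the sketch makes no claim about how such baselines would actually perform.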

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which raises the question of whether the DIRseq implementation of SeqDYN is over-parameterized to the (very limited) data available now. I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader that, with such limited data, the model may be over-fit to the specific sequences studied previously, and that its generalizability will become clear only as more data are collected.

      (3) Third, perhaps my biggest concern here is that - implicit in the authors' assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific lengthscale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the authors' points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim?

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDPs, building on their recent work that predicts the transverse relaxation rates (R2) of IDPs and was trained on 45 IDP sequences and their corresponding R2 values. The key finding is that IDPs interact with drugs mostly through aromatic residues, which is easy to understand because most drugs contain aromatic rings. The authors validated the method using several case studies, and the predictions agree with chemical shift perturbations and MD simulations. The locations of the predicted residues serve as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDPs. The validity of the method is supported by case studies. It is easy to use and requires neither time-consuming MD simulations nor NMR studies.

      Weaknesses:

      The method does not use information about the binding compounds, so it may capture general features of IDP-drug binding. However, the number of interacting residues varies with the size and chemical structure of the compound (for example, how many aromatic rings it has), and this is not considered in this work. The lack of compound-specific information may restrict the method's application in compound optimization aimed at deriving specific and potent binders.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed, we did not apply the helix boost portion of SeqDYN to DIRseq, and will state so explicitly. We will also compare DIRseq with several alternative models.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al. and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al. in the model, local density of aromatic residues, local hydrophobicity (note that Tesei et al. have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We will compare predictions of these various parameter sets, and summarize the results in a table.

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which raises the question of whether the DIRseq implementation of SeqDYN is over-parameterized to the (very limited) data available now. I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader that, with such limited data, the model may be over-fit to the specific sequences studied previously, and that its generalizability will become clear only as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We will add that these tweaks were motivated by observations from MD simulations of drug interactions with α-syn (ref 20). As already noted in the response to the preceding comment, we will also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the authors' assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific lengthscale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we will add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the authors' points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We will cite several studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we will add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim. 

      We will add citations to both compound optimization and mechanism of action.

    1. eLife Assessment

      This work presents valuable new information on the microtubule-binding mode of the kinesin-13 MCAK. The authors use quantitative single-molecule studies to propose that MCAK preferentially binds to a GDP-Pi-tubulin portion of the microtubule end. However, the evidence provided to support this claim remains incomplete and would benefit from more rigorous methodology; in particular, the diffraction-limited experiments do not provide sufficient spatial resolution to support the authors' conclusions. In addition, a more thorough discussion of the existing literature would further strengthen the manuscript.

    2. Reviewer #1 (Public review):

      The authors responded to multiple criticisms with additional data and more detailed statistics, in some instances improving the quality of the work. However, I had difficulty understanding some of the authors' responses. The logic was not always apparent, the writing was occasionally confusing or would benefit from more careful wording, and some of the provided responses were superficial or raised new concerns. In some cases, the underlying data needed to support their responses were not shown. Thus, the current version of the manuscript does not sufficiently resolve the following critical issues raised by myself and other reviewers.

      (1) A clear new insight into a physiological process or cellular behavior remains lacking. The study largely confirms prior observations of MCAK binding to both the microtubule wall and end. However, it is still unclear whether direct binding to the tip, as opposed to accumulation via wall diffusion or interaction with other tip-binding proteins, is a significant mechanism.

      (2) The newly revealed adenosine-nucleotide-dependent binding preferences do not help clarify MCAK's catalytic function or its mechanisms of tip recognition. Consequently, the final summary figure remains speculative and is not convincingly supported by the data. It is also unclear what exactly is meant by the "working model" (figure title), or by the claim of "a simple rule of how the end-binding regulators coordinate their activities" (abstract).

      (3) As noted in my previous review, the effects of adding different adenosine nucleotides on MCAK binding to microtubules are much more pronounced than the differences in MCAK binding to tubulin with various guanosine-containing nucleotides, or to lattice versus tip (e.g., Fig. 5E). Therefore, the manuscript title, "MCAK recognizes the nucleotide-dependent feature at growing microtubule ends", does not do justice to the scale of these effects.

      (4) The title implies that MCAK selectively recognizes a feature determined by the tubulin-bound guanosine nucleotide. However, the authors frequently claim that MCAK binds to the "entire GTP cap." It appears that they exclude structural protrusions from their definition of the cap, which is debatable. Even using their definition, the conclusion that MCAK recognizes a specific "nucleotide-dependent feature" seems inconsistent with the claim that it binds uniformly across the cap. These distinctions were not made clear.

      (5) Some important technical details are still absent. For example, when reading the authors' response to another reviewer's question, I could not find an explanation of how the kon values for end and wall binding were calculated. These calculations clearly require assumptions, e.g. about the number of binding sites, but these details are not described. In addition, the binding data are expressed in units per tubulin dimer, which are non-standard and make comparisons to other published results difficult. There are other instances where more technical detail would be desirable, but they are too numerous to list here.

      (6) Several aspects of data presentation as graphs will make it difficult for other researchers to analyze or interpret the findings. Numerical Excel-style data sheets should be provided for all measurements, including raw data, not just the ratios or derived values shown in plots. Other, more significant issues include use of mean values for non-Gaussian distributions (e.g., dwell times); binding affinities inferred from single-concentration measurements, often under varying conditions (e.g., Figs. 3C, 4); and absence of side-by-side plotted controls (e.g., Fig. 6).

      (7) While the authors have added some quantitative values and descriptive detail, the manuscript still lacks a critical comparison of their findings with existing literature. This weakens the impact of the study and limits the reader's ability to place the results in a broader context.

    3. Reviewer #4 (Public review):

      The revised manuscript from Chen et al. implements many of the changes requested by the 3 reviewers of the initial submission. These changes are well-described in the corresponding Response to Reviews document. Of course, not every request from the reviewers was addressed, and the following major concerns remain:

      (1) The authors argue that MCAK binds to the same region as EB proteins, which they refer to as the "EB cap". Reviewers asked for experiments that would increase the size of the EB cap to create "comets" (e.g. by increasing the microtubule growth rate); the prediction is that the MCAK signal should increase in size as well. The authors declined to pursue these experiments. As a result, the EB signals and MCAK signals are diffraction-limited spots, as opposed to the predicted exponential decay signals characteristic of EB comets. The various diffraction-limited spots are then aligned with the diffraction-limited signal of the microtubule end. These alignments and sub-pixel comparisons are technically challenging. The revised manuscript does not go far enough to provide compelling evidence that all technical challenges were overcome. Thus, while the authors can safely conclude that MCAK, EBs, and the microtubule end do occupy the same diffraction-limited spot, more precise conclusions are not supported.

      (2) The reviewers criticized the initial manuscript for neglecting key references, particularly Kinoshita et al., Science 2001. Indeed, I cannot fathom writing a manuscript about MCAK and XMAP215 without putting a citation to such a landmark paper front and center. The authors have responded by including more discussion of the relevant literature (and citing Kinoshita et al.). However, the revised manuscript is often still cursory in giving credit where credit is due, contextualizing the new data, and generally engaging with the scholarship on MCAK.

      (3) The data presented does not include a simple measurement of the impact of MCAK on the catastrophe frequency of microtubules. The authors explain this absence by pointing out that their movies are short (5 min) and high frame rate (10 fps). While I understand that such imaging parameters are necessary to capture single molecule end-binding events, I do not understand why a separate set of experiments could not be performed. This type of "positive control" is often missing, as pointed out by the 3 reviewers.

      (4) Salt conditions, protein concentrations, and other key experimental parameters are not varied, even when varying them would provide excellent tests of the authors' hypotheses.

      In summary, the revised manuscript is improved in many ways, but the interested reader should look carefully at the previous reviews and compare the measurements presented here with those of other labs.

    1. eLife Assessment

      By taking advantage of noise in gene expression, this important study introduces a new approach for detecting directed causal interactions between two genes without perturbing either. The main theoretical result is supported by a proof. Preliminary simulations and experiments on small circuits are solid, but further investigations are needed to demonstrate the broad applicability and scalability of the method.

    2. Reviewer #2 (Public Review):

      Summary:

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality.

      The authors benchmark their approach experimentally in several synthetic circuits. In 4 positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in 2 of the 4 positive control circuits. The authors constructed 16 negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter, or simply the cellular growth rate. The proposed method detected a causal effect in two of the 16 negative controls, which the authors argue is perhaps not a false positive, but due to an unexpected causal effect. Overall, the data support the potential value of the proposed approach.

      Strengths:

      The idea of a "no-causality control" in the context of detecting directed gene interactions is a valuable conceptual advance that could find use in a variety of settings where practical considerations make perturbation-based causality detection experiments difficult.

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations.

      The authors have improved the clarity and completeness of their proof compared to a previous version of the manuscript.

      Limitations:

      The authors themselves clearly outline the primary limitations of the study: The experimental benchmark is a proof of principle, and limited to synthetic circuits involving a handful of genes expressed on plasmids in E. coli. As acknowledged in the Discussion, negative controls were chosen based on the absence of known interactions, rather than perturbation experiments. Further work is needed to establish that this technique applies to other organisms and to biological networks involving a wider variety of genes and cellular functions. It seems to me that this paper's objective is not to delineate the technique's practical domain of validity, but rather to motivate this future work, and I think it succeeds in that.

      Might your new "Proposed additional tests" subsection be better housed under Discussion rather than Results?

      I may have missed this, but it doesn't look like you ran simulation benchmarks of your bootstrap-based test for checking whether the normalized covariances are equal. It would be useful to see in simulations how the true and false positive rates of that test vary with the usual suspects like sample size and noise strengths.

      It looks like you estimated the uncertainty for eta_xz and eta_yz separately. Can you get the joint distribution? If you can do that, my intuition is you might be able to improve the power of the test (and maybe detect positive control #3?). For instance, if you can get your bootstraps for eta_xz and eta_yz together, could you just use a paired t-test to check for equality of means?
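      The joint estimate suggested above can be illustrated with a paired bootstrap: resample cells once per replicate and compute both normalized covariances on the same resample, so that the correlation between eta_xz and eta_yz is carried into the distribution of their difference. This is a minimal sketch under stated assumptions, not the authors' procedure: the normalized covariance is taken to be cov(a, b)/(mean(a)·mean(b)), which may differ from the manuscript's exact definition; `x`, `y` and `z` are hypothetical per-cell expression arrays; and a percentile interval on the bootstrapped difference is used in place of the paired t-test mentioned above.

```python
import numpy as np


def normalized_cov(a: np.ndarray, b: np.ndarray) -> float:
    """cov(a, b) / (mean(a) * mean(b)); an assumed definition of eta."""
    return float(np.cov(a, b, ddof=1)[0, 1] / (a.mean() * b.mean()))


def paired_bootstrap_diff(x, y, z, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap eta_xz - eta_yz, resampling cells jointly.

    Returns the point estimate and a (1 - alpha) percentile interval.
    An interval that excludes 0 is evidence against eta_xz == eta_yz,
    i.e. against the no-causality relationship assumed in this sketch.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    n = len(x)
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # The same resampled cells are used for both etas in each replicate.
        idx = rng.integers(0, n, size=n)
        diffs[b] = (normalized_cov(x[idx], z[idx])
                    - normalized_cov(y[idx], z[idx]))
    point = normalized_cov(x, z) - normalized_cov(y, z)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return point, (float(lo), float(hi))
```

      Because both estimates share each resample, noise common to the two etas largely cancels in the difference, which is the intuition behind the suggested gain in power; whether it would recover positive control #3 can only be checked on the actual data.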

      The proof is a lot better, and it's great that you nailed down the requirement on the decay of beta, but the proof is still confusing in some places:

      On pg 29, it says "That is, dividing the right equation in Eq. 5.8 with alpha, we write the ..." but the next equation doesn't obviously have anything to do with Eq. 5.8, and instead (I think) it comes from Eq 5.5. This could be clarified.

      Later on page 29, you write "We now evoke the requirement that the averages xt and yt are stationary", but then you just repeat Eq. 5.11 and set it to zero. Clearly you needed the limit condition to set Eq. 5.11 to zero, but it's not clear what you're using stationarity for. I mean, if you needed stationarity for 5.11 presumably you would have referenced it at that step.

      It could be helpful for readers if you could spell out the practical implications of the theorem's assumptions (other than the no-causality requirement) by discussing examples of setups where it would or wouldn't hold.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We have made the following small adjustments and resubmit the manuscript to be published as a Version of Record with eLife.

      Changes in main text of the manuscript:

      We have moved the “Proposed additional tests” subsection to the Discussion section as suggested by the referee. 

      We have added a link to a Github repository and a link to a Zenodo data repository at the beginning of the Materials and Methods section in the “Data and materials availability” subsection. The Github repository contains simulation code and data, and single-cell data analysis code. The Zenodo link contains our experimental data (we await your confirmation before we publish it officially on Zenodo).   

      Changes in the supplemental information files

      We have fixed the typo on page 29 of the SI in which Eq. (8) was referred to in a derivation. It should be Eq. (5) instead. We thank the referee for catching this mistake which has now been corrected.

      We have fixed a typo on page 29 of the SI: the word “evoke” has been changed to “invoke”.

      We have clarified the derivation on page 29 of the SI. The referee is correct that the limit condition was used to set the right-hand side of Eq. (5.11) to zero.

    1. eLife Assessment

      This important study reports an advancement in the diagnosis of Animal African Trypanosomosis (AAT), which adapts a CRISPR-based diagnostic tool (SHERLOCK4AAT) to detect different trypanosome species responsible for AAT. The evidence supporting the conclusions is convincing and in line with the current state-of-the-art diagnostics. This study will be of interest to the fields of Epidemiology, Public Health, and Veterinary Medicine.

    2. Reviewer #1 (Public review):

      Summary:

      The authors developed SHERLOCK4AAT, a CRISPR-Cas13a-based diagnostic toolbox for detecting multiple trypanosome species responsible for animal African trypanosomiasis. They created species-specific assays targeting six prevalent parasite species and validated the system using dried blood spots from domestic pigs in Guinea and Côte d'Ivoire. Field testing revealed high infection rates (62.7% of pigs infected) and, notably, the presence of human-infective parasites in domestic animals.

      Major Strengths:

      This study represents a valuable application of CRISPR-based detection technology to veterinary diagnostics, with strong potential for practical implementation. The authors conducted comprehensive validation, including statistical analyses to determine sensitivity and specificity, and demonstrated field utility through large-scale testing of 424 samples from two geographically distinct regions. The detection of human-infective parasites in pigs at both sites provides important One Health insights supporting integrated disease surveillance and has direct implications for public health policy and disease elimination programs. The methodology is robust, incorporating Bayesian statistical modeling and offering clear practical advantages such as dried blood spot compatibility and detection of active infections. The revised manuscript also addresses implementation considerations, including cost, training needs, and field logistics.

      Major Weaknesses:

      Some technical limitations constrain broader applicability. The assay for one key parasite species (T. vivax) shows suboptimal sensitivity, which may limit its utility in detecting this important pathogen. The current assay design does not distinguish between closely related species within the same subgenus, an important factor for certain epidemiological studies. Additionally, some assays relied on synthetic controls due to unavailable biological material, and the discussion on potential cross-reactivity with related kinetoplastid parasites is limited.

      Achievement of Aims:

      The authors clearly achieved their primary objectives of developing a sensitive, species-specific diagnostic system and demonstrating its applicability in real-world settings. The detection of human-infective trypanosomes in domestic pigs provides valuable epidemiological evidence in support of One Health strategies and targeted disease elimination efforts.

      Impact and Utility:

      This work responds to a well-documented need in veterinary diagnostics, where current methods often lack sensitivity or species discrimination. The system offers practical benefits for resource-limited settings through a short assay duration and compatibility with dried blood spot samples. While certain performance limitations may restrict broader adoption, the species identification capability represents a substantial advancement over existing approaches. The findings enhance our understanding of parasite diversity in livestock and their potential role as zoonotic reservoirs, with implications extending beyond veterinary medicine to public health surveillance and policy development.

      Context:

      This study makes a timely and relevant contribution to diagnostic epidemiology and One Health surveillance frameworks. The field-adapted use of advanced molecular detection technologies represents a significant step toward improved disease monitoring in regions where trypanosomiasis poses ongoing threats to animal health, agriculture, and human livelihoods. The cross-disciplinary implications for veterinary medicine, public health, and disease elimination programs underscore the broader significance of this work.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript is fundamental due to the significance of its findings. The strength of the evidence is compelling, and the manuscript is publishable now that the corrections have been made.

      Strengths:

      Use of a novel SHERLOCK4AAT toolkit for diagnosis.

      Identification of various sub-species of Trypanosomes.

      Differentiating the animal sub-species from the human one.

      Corrections Made:

      Definite articles have been removed from the title.

      The title has been shortened to 15 words.

      Typographical errors have been corrected.

      Weaknesses:

      None

    4. Reviewer #3 (Public review):

      Summary:

      The study adapts a CRISPR-based detection toolkit (SHERLOCK assay) using conserved and species-specific targets for the detection of some members of the Trypanosomatidae family of veterinary importance, together with species-specific assays to differentiate between the six most common animal trypanosome species responsible for AAT (SHERLOCK4AAT). The assays were able to discriminate between Trypanozoon (T. b. brucei, T. evansi and T. equiperdum), T. congolense (Savannah, Forest, Kilifi and Dzanga-Sangha), T. vivax, T. theileri, T. simiae and T. suis. The design of both broad and species-specific assays was based primarily on sequences of the 18S rRNA, GAPDH (glyceraldehyde-3-phosphate dehydrogenase) and invariant flagellum antigen (IFX) genes for species identification. Most importantly, the authors showed varying limits of detection for the different SHERLOCK assays, which are somewhat comparable to PCR-derived molecular techniques currently used for detecting animal trypanosomes, even though some of these methodologies have used other primers that target genes such as ITS1 and 7SL sRNA.

      The data presented in the study are particularly useful and of significant interest for diagnosis of AAT in affected areas.

      Strengths:

      The assays convincingly allow for the analysis and detection of most trypanosomes responsible for AAT.

      Weaknesses:

      The assay cannot distinguish T. b. brucei, T. evansi and T. equiperdum using the 18S rRNA gene, and the IFX-based assay does not achieve the sensitivity required to detect T. vivax. Together with T. congolense, T. brucei brucei and T. vivax are the predominant infective species in animals; therefore, a reliable assay should be able to detect them convincingly for proper use as a diagnostic.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses a critical gap in veterinary diagnostics by developing a CRISPR-based diagnostic toolbox (SHERLOCK4AAT) for detecting animal African trypanosomosis. It describes the development and field deployment of SHERLOCK4AAT, a CRISPR-Cas13-based diagnostic toolbox for the eco-epidemiological surveillance of animal African trypanosomosis (AAT) in West Africa. The authors successfully created and validated species-specific assays for multiple trypanosomes, including T. congolense, T. vivax, T. theileri, T. simiae, and T. suis, alongside pan-trypanosomatid and pan-Trypanozoon assays. The field validation in pigs from Guinea and Côte d'Ivoire revealed high trypanosome prevalence (62.7%) and frequent co-infections, and, importantly, identified T. b. gambiense in one animal at each site, suggesting pigs may serve as potential reservoirs for this human-infective parasite.

      A major strength of the study lies in its methodological innovation. By adapting SHERLOCK to target both conserved and species-discriminating sequences, the authors achieved high sensitivity and specificity in detecting Trypanosoma species. Their use of dried blood spots, validated thresholds through ROC analyses, and statistical robustness (e.g., Bayesian latent class modeling) provides a strong foundation for their conclusions.

      The results are significant: over 60% of pigs tested positive for at least one trypanosome species, with co-infections observed frequently and T. b. gambiense detected in pigs at both sites. These findings have direct implications for the role of animal reservoirs in human disease transmission and underscore the value of pigs as sentinel hosts in gHAT elimination efforts.

      The limitations are well acknowledged, particularly the suboptimal sensitivity of the T. vivax assay and the reliance on synthetic controls for T. suis and T. simiae. However, these limitations do not undermine the overall conclusions, and the paper provides a clear roadmap for further assay refinement and implementation.

      This study offers a timely, impactful, and well-substantiated contribution to the field. The SHERLOCK4AAT toolbox holds promise for improving AAT diagnostics in resource-limited settings and advancing One Health surveillance frameworks.

      Thank you

      Strengths: 

      (1) The adaptation of SHERLOCK technology for AAT represents a significant technical advancement, offering higher sensitivity than traditional parasitological methods and the ability to detect multiple species simultaneously.

      (2) Rigorously performed with validation using appropriate controls, ROC curve analyses, and Bayesian latent class modelling, establishing clear analytical sensitivity and specificity for most assays.

      (3) Testing 424 pig samples across two countries provides robust evidence of the tool's utility and reveals important epidemiological insights about trypanosome diversity and prevalence.

      (4) The identification of T. b. gambiense in pigs at both sites has significant implications for HAT elimination strategies and highlights the need for integrated One Health approaches.

      (5) The use of dried blood spots and RNA detection for active infections makes the approach practical for field surveillance in resource-limited settings.

      Thank you

      Weaknesses: 

      (1) The manuscript would benefit from a more detailed discussion of practical considerations such as cost, equipment requirements, and training needs for implementing SHERLOCK in endemic areas and rural settings, which would improve applicability.

      This is now addressed in the revised discussion (end of the first section).

      (2) Limited discussion of pig selection criteria: More justification for choosing pigs as sentinel animals and discussion of potential limitations of this approach would strengthen the manuscript.

      Yes, this is now more clearly explained in the revised discussion (beginning of the first section).

      (3) More details on why certain genes were targeted would strengthen the methods.

      The first result section ‘Selection of targets for broad and species-specific SHERLOCK assays targeting AAT species (SHERLOCK4AAT)’ is already dedicated to extensively explaining target selection, hence we’re afraid we don’t know what could be added.  

      (4) Table formatting could be improved for readability. 

      (5) Some figures are complex and would benefit from additional explanations in the legends.

      We have tried to improve these two aspects as much as possible in the revised manuscript.

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript is important due to the significance of the findings. The strength of evidence is convincing.

      Thank you

      Strengths: 

      (1) Using a Novel SHERLOCK4AAT toolkit for diagnosis. 

      (2) Identification of various sub-species of Trypanosomes. 

      (3) Differentiating the animal subspecies from the human one. 

      Thank you

      Weaknesses: 

      (1) The title is too long, and the use of definite articles should be reduced in the title.

      The title has been improved in the revised version.

      (2) The route of blood sample collection in the animals should be well defined and explained.

      This has been more clearly explained in the revised method section.

      Reviewer #3 (Public review):

      Summary: 

      The study adapts a CRISPR-based detection toolkit (SHERLOCK assay) using conserved and species-specific targets for the detection of some members of the Trypanosomatidae family of veterinary importance, together with species-specific assays to differentiate between the six most common animal trypanosome species responsible for AAT (SHERLOCK4AAT). The assays were able to discriminate between Trypanozoon (T. b. brucei, T. evansi, and T. equiperdum), T. congolense (Savannah, Forest, Kilifi, and Dzanga-Sangha), T. vivax, T. theileri, T. simiae, and T. suis. The design of both broad and species-specific assays was based primarily on sequences of the 18S rRNA, GAPDH (glyceraldehyde-3-phosphate dehydrogenase), and invariant flagellum antigen (IFX) genes for species identification. Most importantly, the authors showed varying limits of detection for the different SHERLOCK assays, which are somewhat comparable to PCR-derived molecular techniques currently used for detecting animal trypanosomes, even though some of these methodologies have used other primers that target genes such as ITS1 and 7SL sRNA.

      The data presented in the study are particularly useful and of significant interest for the diagnosis of AAT in affected areas.

      Thank you

      Strengths: 

      The assays convincingly allow for the analysis and detection of most trypanosomes in AAT.

      Thank you

      Weaknesses: 

      The assay cannot distinguish T. b. brucei, T. evansi, and T. equiperdum using the 18S rRNA gene, and the IFX-based assay does not achieve the sensitivity required to detect T. vivax. Together with T. congolense, T. brucei brucei and T. vivax are the predominant infective species in animals; therefore, a reliable assay should be able to detect them convincingly for proper use as a diagnostic.

      We agree with this point and aim to improve the toolbox for future studies.

      Reviewer #1 (Recommendations for the authors):

      (1) Provide additional details on the practicality of SHERLOCK deployment in the field, including training, costs, and infrastructure (potential challenges for field deployment, including suggestions for how to overcome these barriers).

      This is now addressed in the revised discussion (end of the first section).

      (2) Provide more detailed justification for choosing pigs as the main study species and discuss potential benefits and limitations of extending the approach to other livestock species.

      Yes, this is now more clearly explained in the revised discussion (beginning of the first section).

      (3) Add a comparison table comparing SHERLOCK4AAT performance metrics (sensitivity, specificity, LoD) with existing molecular diagnostic methods for AAT for ease of reference.

      There are dozens of different serological, immunological and molecular approaches with highly variable sensitivities and specificities, which have already been reviewed and compared in detail in two references from 2022 (Desquesnes et al., 2022a, 2022b), which we have cited, as well as in a newly added reference (Ebhodaghe et al., Acta Trop. 2018). Hence, we decided to refer only to the most comparable studies in the present article.

      (4) Review complex figures and improve legends for better readability and interpretation.

      We have tried to improve this as much as possible in the revised manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) Reduce the number of words in the title from 28 to not more than 20.

      The title has been improved in the revised version.

      (2) Specify the particular route of collection of blood samples in the various animals.

      Yes, this is now more clearly explained in the revised method section.

      (3) Correct all typographical errors. 

      We have tried to improve this as much as possible in the revised manuscript.

      Thanks. I wish you the best in your publication process. 

      Thank you

      Reviewer #3 (Recommendations for the authors): 

      Minor comments 

      (1) The authors could expand the discussion to include other recent diagnostic assays for animal trypanosomiasis, such as those that target other genes like tubulin.

      Please see our response to Reviewer #1, point (3), above.

      (2) The cost-effectiveness of the use of the assay can be discussed since the assay is expected to be used for work in some resource-deprived areas. For example, will it cost a researcher less to do a diagnosis with this assay relative to what is already available?

      This is now addressed in the revised discussion (end of the first section).

      (3) Is Cote d'Ivoire more endemic for AAT than Guinea? Could this account for the apparently consistent differences in the percentage of positive samples, or are these differences due to the type of samples used from the two locations?

      Because the sampling method, sample preservation and sample analysis were the same for both groups, it does appear that pigs (at least domestic ones) in the study region of Côte d'Ivoire were more frequently infected than those in the study region of Guinea. However, it would be risky to extrapolate these observations to AAT prevalence in the entire countries and/or to other mammals.

      (4) Can the authors comment on how long one can store the samples for an effective and reliable assay?

      The samples can be stored for several months at ambient temperature in a sealed bag with silica gel packages to reduce humidity. We have added this detail in the revised methods section.

      (5) It is not clear whether the authors used conventional molecular diagnostics to compare the data obtained from this particular cohort of animals, as reference is made to published data. It is not surprising that SHERLOCK performed better than parasitology-based methods.

      This is now addressed in the revised discussion.

      (6) (Figure 4D-5D) should be 4D and 5D.

      Thank you, this has been corrected.

    1. eLife Assessment

      This useful study integrates experimental methods from materials science with psychophysical methods to investigate how frictional stabilities influence tactile surface discrimination. The authors argue that force fluctuations arising from transitions between frictional sliding conditions facilitate the discrimination of surfaces with similar friction coefficients. However, the reliance on friction data from an artificial finger, combined with correlational analyses that fall short of establishing a mechanistic link to perception, renders the findings incomplete.

    2. Reviewer #2 (Public review):

      This is a revised version of a paper I reviewed previously.

      Again, the purpose of the paper is to suggest that common metrics, such as friction or any given physical property of the surface, are probably inadequate to predict the perception of a surface or its discriminability. Instead, the authors propose the very interesting and original idea that frictional instabilities are related to fine touch perception (title).

      Overall, the authors have put much effort into improving the manuscript, enhancing clarity, and avoiding overstatements. And I feel the narrative is indeed much improved and less ambiguous.

      However, the authors have systematically avoided addressing the main comment of all reviewers: the link made between the passive mock-finger experiment and the active human psychophysics is incorrect and should not be made, because its interpretation could be flawed.

      - First, this link is very weak (the correlation of 6 data points is barely significant).
      - Second, the real and mock fingers have very different properties (think about moisture, compliance, roughness, ...).
      - Third, the comparison is made between a passive, well-controlled experiment and an active exploration. Yet the comparison metric (number of events) clearly depends on the exploration procedure.

      In your response to my comments:

      "We have made changes throughout the manuscript to acknowledge that our findings are correlative, clarifying this throughout, and incorporating into the discussion how our work may enable biomechanical measurements and tactile decision making models"

      The authors admit that the analysis is flawed, yet they did not remove it. If they cannot demonstrate that the mock finger and the human finger behave the same way during the perceptual experiment, then they should remove Fig. 2, which combines apples and oranges. Alternatively, they should look at the active exploration data and compute the same metrics on those data.

      "This "weird choice" is the central innovation of this paper. This choice was necessary because we demonstrated that the common usage of friction coefficient is fundamentally flawed: we see that friction coefficient suggests that surface which are more different would feel more similar - indeed the most distinctive surfaces would be two surfaces that are identical, which is clearly spurious. "

      They did not "demonstrate" such a flaw. Again, the difference in friction was measured in the mock-finger trials. At the very least, the authors should verify that it also holds for the active human experiment.

      "To fully implement this, a decision-making model is necessary because, as a counter example, a participant could have generated 10 swipes of SFW and 1 swipe of a Sp, but the Sp may have been the most important event for making a tactile decision. This type of scenario is not compatible with the analysis suggested - and similar counterpoints can be made for other types of seemingly straightforward analysis."

      The suggested analyses are straightforward and would be much more valuable than the data from the mock finger, even with the potential variability stated above.

      "We recognize that, with all factors being equal, this sample size is on the smaller end"

      Yet, the authors did not collect additional data to confirm their findings.

    3. Reviewer #3 (Public review):

      Strengths:

      The paper describes a new perspective on friction perception, with the hypothesis that humans are sensitive to the instabilities of the surface rather than the coefficient of friction. The paper is very well written and with a comprehensive literature survey.

      One of the central tools used by the author to characterize the frictional behavior is the frictional instabilities maps. With these maps, it becomes clear that two different surfaces can have both similar and different behavior depending on the normal force and the speed of exploration. It puts forward that friction is a complicated phenomenon, especially for soft

      The psychophysics study is centered around an odd-one-out protocol, which has the advantage of avoiding any external reference to what friction or texture would mean, for example. The comparisons are based only on whether the textures are similar or not.

      The results show a significant relationship between the distance between frictional maps and the success rate in discriminating two kinds of surface.

      Weaknesses:

      The main weakness of the paper comes from the fact that the frictional maps and the extensive psychophysics study were not made at the same time, nor with the same finger. The frictional maps are produced with an artificial finger made out of PDMS, which is a poor substitute for the complex tribological properties of skin.

      The evidence would have been much stronger if the interaction had been measured during the psychophysical experiment. In addition, because of the protocol, the correlation is based on aggregates rather than on individual interactions. However, the current data already shed new light on the nature of frictional oscillations and their link to perception.

      The authors compensate with a third experiment where they used a 2AFC protocol and an online force measurement. But the results of this third study fail to solidify the relation.

      No map of the real finger interaction is shown, bringing doubt to the validity of the frictional map for something as variable as human fingers.

    4. Reviewer #4 (Public review):

      Summary:

      In this paper, Derkaloustian et al. look at the important topic of what affects fine touch perception. The observations that there may be some level of correlation with instabilities are intriguing. They attempted to characterize different materials by counting the frequency (occurrence #, not of vibration) of instabilities at various speeds and forces of a PDMS slab pulled lengthwise over the material. They then had humans perform the same vertical motion to discriminate between these samples. They correlated the % correct in discrimination with differences in frequency of steady sliding over the design space as well as other traditional parameters such as friction coefficient and roughness.

      The authors pose an interesting hypothesis and make an intriguing observation about the occurrence of instability regimes in different materials in contact with PDMS, which will be valuable for the community to see in publication. It should be noted, however, that the finger is complex, and there are many factors that may be oversimplified, and perhaps even incorrect, with the use of the PDMS finger. Some trends, such as surfaces that are more similar in PDMS friction coefficient being easier to discriminate than those with more different PDMS friction coefficients, contradict multiple other papers in the literature (Fehlberg et al., 2024; Smith and Scott, 1996). This may be because the PDMS finger is not representative of real finger conditions. A measurement of friction and the instabilities with a human finger, or a demonstration that the PDMS finger produces the same results (friction coefficient, instabilities, etc.) as a human finger, is needed.

      Strengths:

      The strength of this paper is in its intriguing hypothesis and important observation that instabilities may contribute to what humans are detecting as differences in these apparently similar samples.

      Weaknesses:

      There are significant weaknesses in how representative the PDMS finger, the vertical motion, and the sliding speed are of real human exploration. The real finger has multiple layers with different moduli. In fact, the stratum corneum cells, which form the outer layer at the interface and determine the friction, have a much higher modulus than PDMS. In addition, the flat contact area can cause shifting of contact points. Both can make the PDMS finger exhibit much more stick-slip than a real finger. Indeed, the regime maps show very little space with steady sliding. This does not represent human exploration of surfaces well. We do not tend to use forces and velocities that cause extensive stick-slip (frequent regions of 100% stick-slip), and the speeds used in the study are on the slow side, which also contributes to more stick-slip. At higher speeds and lower forces, all of the materials had steady sliding regions. Further, on these very smooth surfaces, friction and stiction are more complex, and considerations such as changes in finger material properties with sweat pore occlusion and sweat capillary forces cannot be dismissed. Also, the vertical motion of both the PDMS finger and the instructed human subjects is not the motion that humans typically use to discriminate between surfaces.

      This all leads to the critical question: why were the friction, normal force, and velocity not measured during human exploration using the real human finger? An alternative would be to show that the PDMS finger reproduces the results of the human finger. I have checked the authors' previous papers with this setup and did not find one showing that the PDMS finger produced the same results as a human finger (Carpenter et al., 2018; Dhong et al., 2018; Nolin et al., 2022, 2021). The reviewer is not asking for a more detailed psychophysical study with a decision-making model. All that is being asked is to use a human finger for the friction coefficient and instability measurements at typical human forces and speeds, or at least to perform these measurements with both fingers on one or two samples to show that the PDMS finger produces the same results as a human finger. The authors posed an extremely interesting hypothesis that humans may alter their speed to feel the instability transition regions. This is something that could be measured with a real finger but is not likely to be correlated accurately enough to match regime boundaries determined with such a simplified artificial finger.

      References

      Carpenter CW, Dhong C, Root NB, Rodriquez D, Abdo EE, Skelil K, Alkhadra MA, Ramírez J, Ramachandran VS, Lipomi DJ. 2018. Human ability to discriminate surface chemistry by touch. Mater Horiz 5:70-77. doi:10.1039/C7MH00800G

      Dhong C, Kayser LV, Arroyo R, Shin A, Finn M, Kleinschmidt AT, Lipomi DJ. 2018. Role of fingerprint-inspired relief structures in elastomeric slabs for detecting frictional differences arising from surface monolayers. Soft Matter 14:7483-7491. doi:10.1039/C8SM01233D

      Fehlberg M, Monfort E, Saikumar S, Drewing K, Bennewitz R. 2024. Perceptual Constancy in the Speed Dependence of Friction During Active Tactile Exploration. IEEE Transactions on Haptics 17:957-963. doi:10.1109/TOH.2024.3493421

      Nolin A, Licht A, Pierson K, Lo C-Y, Kayser LV, Dhong C. 2021. Predicting human touch sensitivity to single atom substitutions in surface monolayers for molecular control in tactile interfaces. Soft Matter 17:5050-5060. doi:10.1039/D1SM00451D

      Nolin A, Pierson K, Hlibok R, Lo C-Y, Kayser LV, Dhong C. 2022. Controlling fine touch sensations with polymer tacticity and crystallinity. Soft Matter 18:3928-3940. doi:10.1039/D2SM00264G

      Smith AM, Scott SH. 1996. Subjective scaling of smooth surface friction. Journal of Neurophysiology 75:1957-1962. doi:10.1152/jn.1996.75.5.1957

    1. eLife Assessment

      This valuable simulation study proposes a new coarse-grained model to explain the effects of CpG methylation on nucleosome wrapping energy and nucleosome positioning. The evidence to support the claims in the paper looks solid and this work will be of interest to the researchers working on gene regulation and mechanisms of DNA methylation.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors used a coarse-grained DNA model (cgNA+) to explore how DNA sequences and CpG methylation/hydroxymethylation influence nucleosome wrapping energy and the probability density of optimal nucleosomal configuration. Their findings indicate that both methylated and hydroxymethylated cytosines lead to increased nucleosome wrapping energy. Additionally, the study demonstrates that methylation of CpG islands increases the probability of nucleosome formation.

      Strengths:

      The major strength of this method is that the model explicitly includes the phosphate groups as DNA-histone binding-site constraints, enhancing the accuracy and computational efficiency of the CG model and allowing comprehensive calculations of DNA mechanical properties and deformation energies.

      Weaknesses:

      A significant limitation of this study is that the parameter sets for the methylated and hydroxymethylated CpG steps in the cgNA+ model are derived from all-atom molecular dynamics (MD) simulations that use previously established force field parameters for modified cytosines (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021). These parameters suggest that both methylated and hydroxymethylated cytosines increase DNA stiffness and nucleosome wrapping energy, which could predispose the coarse-grained model to replicate these findings. Notably, conflicting results from other all-atom MD simulations, such as those by Ngo T, et al. (Nat. Commun. 2016), show that hydroxymethylated cytosines increase DNA flexibility, in contrast to methylated cytosines. If the cgNA+ model were trained on these latter parameters or on other all-atom MD force fields, different conclusions might be reached regarding the effects of methylation and hydroxymethylation on nucleosome formation.

      Regardless of the training parameters of the cgNA+ model, the results presented in the manuscript indicate that methylated cytosines increase both DNA stiffness and nucleosome wrapping energy. However, when comparing nucleosome occupancy scores with predicted nucleosome wrapping energies and optimal configurations, the authors find that methylated CGIs exhibit higher nucleosome occupancies than unmethylated ones, which seems to contradict the expected relationship in which increased stiffness should reduce nucleosome-forming affinity. In the manuscript, the authors also admit that these conclusions "apparently runs counter to the (perhaps naive) intuition that high nucleosome forming affinity should arise for fragments with low wrapping energy". Previous all-atom MD simulations (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021; Ngo T, et al. Nat. Commun. 2016) show that the stiffer DNA upon CpG methylation reduces the affinity of DNA to assemble into nucleosomes or destabilizes nucleosomes. Given these findings, the authors need to address and reconcile these seemingly contradictory results, as the influence of epigenetic modifications on DNA mechanical properties and nucleosome formation is a critical aspect of their study.

      Understanding the influence of sequence-dependent and epigenetic modifications of DNA on mechanical properties and nucleosome formation is crucial for comprehending various cellular processes. The authors' study, focusing on these aspects, will definitely garner interest from the DNA methylation research community.

      Comments on revised version:

      The authors have addressed most of my comments and concerns regarding this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This study uses a coarse-grained model for double stranded DNA, cgNA+, to assess nucleosome sequence affinity. cgNA+ coarse-grains DNA on the level of bases and accounts also explicitly for the positions of the backbone phosphates. It has been proven to reproduce all-atom MD data very accurately. It is also ideally suited to be incorporated into a nucleosome model because it is known that DNA is bound to the protein core of the nucleosome via the phosphates.

      It is still unclear whether this harmonic model parametrized for unbound DNA is accurate enough to describe DNA inside the nucleosome. Previous models by other authors, using more coarse-grained models of DNA, have been rather successful in predicting base-pair sequence-dependent nucleosome behavior. This is at least the case as long as DNA shape is concerned, whereas assessing the role of DNA bendability (something this paper focuses on) has been consistently challenging in all nucleosome models, to my knowledge.

      It is thus of major interest whether this more sophisticated model is also more successful in handling this issue. As far as I can tell, the work is technically sound and properly accounts not only for the energy required in wrapping DNA but also for entropic effects, namely the change in entropy that DNA experiences when going from the free state to the bound state. The authors make an approximation here which seems to me to be a reasonable first step.

      Of interest is also that the authors have the parameters at hand to study the effect of methylation of CpG steps. This is especially interesting as it allows one to study a scenario where changes in the physical properties of base-pair steps via methylation might influence nucleosome positioning and stability in a cell-type-specific way.

      Overall, this is an important contribution to the question of how sequence affects nucleosome positioning and affinity. The findings suggest that cgNA+ has something new to offer. But the problem is complex, also on the experimental side, so many questions remain open. Despite this, I highly recommend publication of this manuscript.

      Strengths:

      The authors use their state-of-the-art coarse-grained DNA model, which seems ideally suited to be applied to nucleosomes as it accounts explicitly for the backbone phosphates.

      Weaknesses:

      The authors introduce penalty coefficients c_i to avoid steric clashes between the two DNA turns in the nucleosome. This requires c_i-values that are so high that standard deviations in the fluctuations of the simulation are smaller than in the experiments.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors use biophysical modeling to investigate differences in free energies and nucleosomal configuration probability densities of CpG islands and nonmethylated regions in the genome. Toward this goal, they develop and apply the cgNA+ coarse-grained model, an extension of their prior molecular modeling framework.

      Strengths:

      The study utilizes biophysical modeling to gain mechanistic insight into nucleosomal occupancy differences in CpG and nonmethylated regions in the genome.

      Weaknesses:

      Although the overall study is interesting, the manuscript needs more clarity in places. Moreover, the rationale and conclusions for some of the analyses are not well described.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors used a coarse-grained DNA model (cgNA+) to explore how DNA sequences and CpG methylation/hydroxymethylation influence nucleosome wrapping energy and the probability density of optimal nucleosomal configuration. Their findings indicate that both methylated and hydroxymethylated cytosines lead to increased nucleosome wrapping energy. Additionally, the study demonstrates that methylation of CpG islands increases the probability of nucleosome formation.

      Strengths:

      The major strength of this method is that the model explicitly includes elastic constraints on the positions of phosphate groups facing a histone octamer, as DNA-histone binding site constraints. The authors claim that their model enhances the accuracy and computational efficiency and allows comprehensive calculations of DNA mechanical properties and deformation energies.

      Weaknesses:

      A significant limitation of this study is that the parameter sets for the methylated and hydroxymethylated CpG steps in the cgNA+ model are derived from all-atom molecular dynamics (MD) simulations that suggest that both methylated and hydroxymethylated cytosines increase DNA stiffness and nucleosome wrapping energy (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021). This could predispose the coarse-grained model to replicate these findings. Notably, conflicting results from other all-atom MD simulations, such as those by Ngo T in Nat. Commun. 2016, show that hydroxymethylated cytosines increase DNA flexibility, contrary to methylated cytosines. If the cgNA+ model were trained on these latter parameters or on other all-atom force fields, different conclusions might be obtained regarding the effects of methylation and hydroxymethylation on nucleosome formation.

      Regardless of the training parameters of the cgNA+ model, the results presented in the manuscript indicate that methylated cytosines increase both DNA stiffness and nucleosome wrapping energy. However, when comparing nucleosome occupancy scores with predicted nucleosome wrapping energies and optimal configurations, the authors find that methylated CGIs exhibit higher nucleosome occupancies than unmethylated ones, which seems to contradict their own finding in the same paper that increased stiffness should reduce nucleosome formation affinity. In the manuscript, the authors also acknowledge that these conclusions “apparently runs counter to the (perhaps naive) intuition that high nucleosome forming affinity should arise for fragments with low wrapping energy”. Previous all-atom MD simulations (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021; Ngo T, et al. Nat. Commun. 2016) show that the stiffening of DNA upon CpG methylation reduces the affinity of DNA to assemble into nucleosomes or destabilizes nucleosomes. Given these findings, the authors need to address and reconcile these seemingly contradictory results, as the influence of epigenetic modifications on DNA mechanical properties and nucleosome formation is a critical aspect of their study. Understanding the influence of sequence-dependent and epigenetic modifications of DNA on mechanical properties and nucleosome formation is crucial for comprehending various cellular processes. The authors’ study, focusing on these aspects, will definitely garner interest from the DNA methylation research community.

      Training the cgNA+ model on alternative MD simulation datasets is certainly of interest to us. However, due to the significant computational cost, this remains a goal for future work. The relationship between nucleosome occupancy scores and nucleosome wrapping energy is still debated, with conflicting findings reported in the literature, as noted in our Discussion section. Interestingly, we find that our predicted log probability density of DNA spontaneously acquiring a nucleosomal configuration is a better indicator of nucleosome occupancy than our predicted DNA nucleosome wrapping energy.

      Reviewer #2 (Public Review):

      Summary:

      This study uses a coarse-grained model for double-stranded DNA, cgNA+, to assess nucleosome sequence affinity. cgNA+ coarse-grains DNA at the level of bases and also accounts explicitly for the positions of the backbone phosphates. It has been shown to reproduce all-atom MD data very accurately. It is also ideally suited to be incorporated into a nucleosome model because it is known that DNA is bound to the protein core of the nucleosome via the phosphates.

      It is still unclear whether this harmonic model parametrized for unbound DNA is accurate in describing DNA inside the nucleosome. Previous models by other authors, using more coarse-grained models of DNA, have been rather successful in predicting base-pair sequence-dependent nucleosome behavior. This is at least the case as far as DNA shape is concerned, whereas assessing the role of DNA bendability (something this paper focuses on) has been consistently challenging in all nucleosome models, to my knowledge.

      It is thus of major interest whether this more sophisticated model is also more successful in handling this issue. As far as I can tell the work is technically sound and properly accounts for not only the energy required in wrapping DNA but also entropic effects, namely the change in entropy that DNA experiences when going from the free state to the bound state. The authors make an approximation here which seems to me to be a reasonable first step.

      Of interest is also that the authors have the parameters at hand to study the effect of methylation of CpG steps. This is especially interesting as it allows us to study a scenario where changes in the physical properties of base-pair steps via methylation might influence nucleosome positioning and stability in a cell-type-specific way.

      Overall, this is an important contribution to the question of how the sequence affects nucleosome positioning and affinity. The findings suggest that cgNA+ has something new to offer. But the problem is complex, also on the experimental side, so many questions remain open.

      Strengths:

      The authors use their state-of-the-art coarse-grained DNA model which seems ideally suited to be applied to nucleosomes as it accounts explicitly for the backbone phosphates.

      Weaknesses:

      (1) According to the abstract, the authors consider two “scalar measures of the sequence-dependent propensity of DNA to wrap into nucleosomes”. One is the bending energy and the other is the free energy. Specifically, for the latter, the authors take the difference between the free energies of the wrapped and the free DNA. Whereas the entropy of the free DNA can be calculated exactly, they assume that the bound DNA always has the same entropy (independent of sequence) in its more confined state. The problem is that the way this is written (e.g. below Eq. 6) is hard to understand. The authors should mention that the negative of Eq. 6 is what physicists call the free energy, specifically the free energy difference between bound and free DNA.

      We have included the necessary clarifications in the revised manuscript, below Eq. 6.

      (2) In Eq. 5 the authors introduce penalty coefficients c_i. They write that values are “set by numerical experiment to keep distances ... within the ranges observed in the PDB structure, while avoiding sterical clashes in DNA.” This is rather vague, especially since it is unclear to me what type of steric clashes might occur. Figure 1 then shows a comparison between crystal structures and simulated structures. They are reasonably similar, but the standard deviations of the fluctuations in the simulations are smaller than in the experiments. Why did the authors not choose smaller c_i values to obtain a better fit? Do smaller values lead to unwanted large fluctuations that would lead to steric clashes between the two DNA turns? I also wonder what side views of the nucleosomes look like (experiments and simulations) and whether in this side view larger fluctuations of the phosphates can be observed in the simulation that would eventually lead to turn-turn clashes for smaller c_i values.

      The side-view plots of the experimental and predicted nucleosome structures have now been added to the Supplementary material (Figure S8). Indeed, smaller c_i values lead to steric clashes between the two turns of DNA; this is now specified in the Methods section. A possible improvement of our optimisation method, and a direction of future work, would be to add a penalty that prevents steric clashes to the objective function. The c_i values could then be reduced, allowing larger fluctuations that are even closer to those of the experimental structures. We added this explanation to the Results section.
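      Why larger penalty coefficients shrink the fluctuations can be seen in a minimal one-dimensional sketch. It assumes (for illustration only, not the cgNA+ implementation) a harmonic energy k/2 (x - mu)^2 about a ground state mu plus a quadratic penalty c/2 (x - x0)^2 toward a target position x0, both in units of k_B T. The sum is again quadratic, so the Boltzmann density stays Gaussian with precision k + c: its mean is pulled toward x0 and its standard deviation is 1/sqrt(k + c). The function name and numbers are hypothetical.

```python
import math


def penalized_gaussian(k, mu, c, x0):
    """Hypothetical 1D illustration: harmonic energy k/2 (x - mu)^2 plus a
    penalty c/2 (x - x0)^2 (energies in k_B T).

    Completing the square gives a Gaussian density with precision k + c.
    Returns (mean, standard deviation) of that density.
    """
    precision = k + c
    mean = (k * mu + c * x0) / precision
    std = 1.0 / math.sqrt(precision)
    return mean, std


# Increasing the penalty coefficient narrows the distribution.
for c in (0.0, 1.0, 10.0, 100.0):
    mean, std = penalized_gaussian(k=1.0, mu=0.0, c=c, x0=1.0)
    print(f"c = {c:6.1f}  mean = {mean:.3f}  std = {std:.3f}")
```

      In this toy setting, a penalty strong enough to prevent clashes necessarily narrows the distribution, consistent with the trade-off described in the response above.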

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors utilize biophysical modeling to investigate differences in free energies and nucleosomal configuration probability density of CpG islands and nonmethylated regions in the genome. Toward this goal, they develop and apply the cgNA+ coarse-grained model, an extension of their prior molecular modeling framework.

      Strengths:

      The study utilizes biophysical modeling to gain mechanistic insight into nucleosomal occupancy differences in CpG and nonmethylated regions in the genome.

      Weaknesses:

      Although the overall study is interesting, the manuscript needs more clarity in places. Moreover, the rationale and conclusions for some of the analyses are not well described.

      We edited the manuscript according to the reviewer’s suggestions and hopefully improved its readability.

      Reviewer #1 (Recommendations For The Authors):

      (1) The cgNA+ model parameters are derived from all-atom molecular dynamics (MD) simulations, yet there is no consensus within all-atom MD simulations regarding the impact of CpG methylation on DNA mechanical properties. The authors could consider fitting the coarse-grained model with a different all-atom force field to verify whether the conclusions regarding the effects of methylation and hydroxymethylation on DNA nucleosome wrapping energies still hold. For further details on MD simulations related to CpG methylation effects, the authors are advised to consult the review paper by Li et al. (2022) titled “DNA methylation: Precise modulation of chromatin structure and dynamics” published in Current Opinion in Structural Biology.

      Parametrizing the cgNA+ model using MD simulations with various force fields is certainly of interest to us. However, due to the computational cost involved, it remains a goal for future work.

      (2) Beyond DNA mechanical properties, which are directly linked to nucleosome wrapping energies in this study, the authors might also consider other factors such as geometric properties that could influence nucleosome formation. This approach might help the authors to reconcile the observed higher nucleosome occupancy scores for methylated CpGs. The authors are encouraged to review the aforementioned paper for additional experimental and MD simulation studies that could support this perspective.

      Geometric properties of DNA are directly incorporated into our method through the cgNA+ model equilibrium shape prediction µ. We compute the mechanical energy needed to deform µ to a nucleosomal configuration. Notably, the equilibrium shape µ is sensitive to methylation, as demonstrated in Figure 3.

      (3) There are some issues with citation accuracy in the manuscript. For instance, in the Discussion section, the authors attribute a statement to Collings and Anderson (2017), claiming that “methylated regions, known to have high wrapping energy, are among the highest nucleosome occupied elements in the genome.” However, upon reviewing this paper, it appears that it does not make any claims about the high wrapping energy of methylated regions.

      The paragraph has now been edited, and a separate citation, Pérez et al. (2012), is given for the statement that methylated regions have high wrapping energy.

      Reviewer #2 (Recommendations For The Authors):

      Please improve the readability by:

      (1) making clear that -ln ρ in Eq. 6 on page 4 is actually the free energy. Also, the word entropy comes too late (on page 7) where the best explanation of Eq. 6 is presented.

      We added a comment about -ln ρ being the free energy after Eq. 6 and also included an equation relating ln ρ to entropy.
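      The identification of -ln ρ with a free energy difference can be illustrated numerically under one stated assumption: that the free-DNA density is Gaussian, ρ(w) = exp(-E(w)) / Z with E(w) = 1/2 (w - µ)^T K (w - µ) in units of k_B T, as in the cgDNA family of models. The exact form of Eq. 6 is not reproduced here. Under this assumption, Z = (2π)^(n/2) det(K)^(-1/2), the free energy of the free DNA is F_free = -ln Z, and therefore -ln ρ(w) = E(w) - F_free, the wrapping energy minus the free energy of unbound DNA. Only the latter term carries the sequence-dependent entropy through ln det K. The function name and the 2x2 stiffness matrix below are hypothetical.

```python
import numpy as np


def gaussian_log_density(w, mu, K):
    """Hypothetical sketch: ln rho(w) for a Gaussian model with ground state mu
    and stiffness matrix K (energies in k_B T).

    Returns (ln rho, energy E(w), free energy of the free state F_free = -ln Z).
    """
    d = w - mu
    energy = 0.5 * d @ K @ d  # deformation energy relative to the ground state
    sign, logdet = np.linalg.slogdet(K)
    if sign <= 0:
        raise ValueError("stiffness matrix K must be positive definite")
    n = mu.shape[0]
    # F_free = -ln Z with Z = (2*pi)^(n/2) * det(K)^(-1/2)
    f_free = -0.5 * n * np.log(2.0 * np.pi) + 0.5 * logdet
    log_rho = -energy + f_free  # so -ln rho = E - F_free
    return log_rho, energy, f_free


K = np.array([[2.0, 0.5], [0.5, 1.0]])
mu = np.array([0.0, 0.0])
w = np.array([0.3, -0.1])
log_rho, E, F_free = gaussian_log_density(w, mu, K)
print(f"E = {E:.4f}, F_free = {F_free:.4f}, -ln rho = {-log_rho:.4f}")
```

      In this sketch, a stiffer K raises ln det K and hence F_free, which lowers -ln ρ for a fixed deformation energy. This is the entropic contribution that the reviewer asks to be made explicit.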

      (2) Pages 12 and 13 show two sets of experimental data. They are quite different from each other. When reading this, I wondered why there is this difference, but only on page 16 do you explain that these are different cell types. The difference should already be explained when the papers are introduced on page 12.

      A corresponding sentence already appeared on page 12: “The observations about nucleosome occupancy should be regarded as preliminary, and be treated with caution, as they are based on experimental data obtained for the cancerous HeLa cells Schwartz et al. (2019) and human genome embryonic stem cells Yazdi et al. (2015)”. We have now also added this information to the first paragraph of the subsection for clarity.

      Finally, I add here some general thoughts that came up when reading the paper, comparing your findings with earlier findings in the field. This is not a strict one-to-one comparison and thus does not have to find its way into this manuscript, but it might give ideas for future studies. Experiments suggest that nucleosomes prefer DNA with a high content of Cs and Gs. Figure 2 does not look at the GC content but at the number of CpGs; in any case, let’s use this as a proxy for GC content. Figure 2a suggests that the bending energy does not depend strongly on the number of CpG steps. This is consistent with earlier work with the rigid base-pair model, which shows the same behavior for GC content (for both MD and crystal parametrizations). Figure 2c (related to the negative free energy) shows that the propensity to bind goes down as the number of CpG steps increases. This suggests that the entropic cost of confining CpG-rich DNA increases, which in turn reflects that these DNA stretches are softer. This is rather interesting since, in the case of the rigid base-pair model, this effect is observed only when stiffnesses are extracted from crystal data, not MD data (however, this again refers to GC content). This might indicate a difference between the rigid base-pair model and cgNA+, which will be interesting to study in the future. The effect of CpG methylation is also interesting. The stiffer methylated steps lead to an increase in the energy with the number of such steps (Figure 2a). The entropic cost for binding is thus expected to be smaller, and this is indeed observed in Figure 2c when compared to the non-methylated steps.

      We thank the reviewer for this comment. As for the GC content, the energy and ln ρ plots are indeed very similar to those in Figure 2.

      Reviewer #3 (Recommendations For The Authors):

      (1) The formulation of the cgNA+ model in the method section was not easy to follow and can be described better to improve clarity.

      We have revised the model description and hope that its clarity has been improved.

      (2) The authors mention utilizing 100 human genome sequences with 100 configurations from DB. It would be helpful to clarify the source of these 100 human genome sequences. Are these 100 distinct regions on the human reference genome, or are they from a specific dataset or database?

      We now include an explanation about the origin of sequences: “The human genome sequences are a random subset of our sequence sample for the CGI and NMI intersection in the Chromosome 1, but the following observations remain unchanged for sequence samples from different genomic regions.”

      (3) The authors mention the lack of tail unwrapping in their model. It would be beneficial to understand the magnitude of this issue and its potential impact on the overall results. How significant is the lack of unwrapping events in their current model?

      We observed the unwrapping of approximately five base-pairs at each end of our predicted nucleosome configurations, in comparison to the experimental configurations (Figure 1). This issue could be solved by adding additional constraints at the ends of the 147 bp sequence. The wrapping energy would increase marginally, as only about 10 of 147 bp would be affected. We added this remark to the main text.

      (4) Observations from Figure 3 are not described properly. Are these differences statistically significant? Why is twist higher for CpG sites but lower for a roll?

      We added an explanation of how the statistics were computed to the caption of Figure 3. In fact, we didn’t use statistical estimates here, but generated all the possible cases and computed the exact statistics (for the given set of our model parameters). Regarding the changes in twist and roll, we have added the following comment on page 7: “The ground state changes resulting from cytosine modifications – primarily characterized by an average increase in roll and a decrease in twist – may be linked to steric hindrance caused by the cytosine 5-substituent (Battistini et al. (2021)). Notably, the negative coupling between twist and roll has already been observed in X-ray crystallography data (Olson et al. (1998)).”

      (5) Figure 4 does not clarify the authors’ conclusion of higher stiffness for ApT and TpA dinucleotides. The authors should provide further explanation for this observation.

      We revised the text to clarify that the statement regarding ApT and TpA being the most stiff and the most flexible dinucleotides is not a conclusion derived from Figure 4, but rather from earlier work that we cite.

      (6) In Figure 7, the authors note that methylated CGIs have higher nucleosome occupancy on average than unmethylated sequences. Is this observation statistically significant?

      We observe that methylated sequences have a higher average occupancy than unmethylated sequences in the Yazdi et al. data when the CpG count falls into the intervals from 5 to 14 and from 15 to 24. For each of the two intervals, this difference is statistically significant: a permutation test, used because the data are not normally distributed, yields a p-value of 0.0001 in both cases. The differences in mean scores shown in Figure 8 are also statistically significant. Such test results are expected, given the large sample sizes and the observed differences in means; therefore, we prefer not to include this discussion in the main text.
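      For readers unfamiliar with the procedure, a generic two-sided permutation test for a difference in mean occupancy scores can be sketched as follows. This is not the authors' analysis code. The number of permutations, the add-one correction, the function name and the synthetic input scores are all assumptions made for illustration.

```python
import numpy as np


def permutation_test_mean_diff(a, b, n_permutations=10_000, seed=0):
    """Generic two-sided permutation test for a difference in means.

    Returns (observed mean difference, estimated p-value).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    n_a = len(a)
    count = 0
    for _ in range(n_permutations):
        # Randomly relabel the pooled scores and recompute the difference.
        perm = rng.permutation(pooled)
        diff = perm[:n_a].mean() - perm[n_a:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_permutations + 1)


# Synthetic scores for illustration only (not real occupancy data).
rng = np.random.default_rng(42)
methylated = rng.normal(loc=1.1, scale=0.5, size=200)
unmethylated = rng.normal(loc=1.0, scale=0.5, size=200)
diff, p = permutation_test_mean_diff(methylated, unmethylated)
print(f"mean difference = {diff:.3f}, p = {p:.4f}")
```

      Because the p-value is estimated from a finite number of relabelings, its smallest attainable value under this convention is 1/(N + 1) for N permutations. Large samples with clearly separated means therefore tend to reach that floor.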

      (7) The authors note that their analyses to correlate nucleosome occupancy profile with the methylation state of underlying sequences are preliminary, as different cell lines were used to perform these analyses. Given this inconsistency, it needs to be clarified why this analysis was performed and what the takeaway is.

      We added the following comment at the end of the Results section: “Although comparing data from different cell lines is not optimal, to the best of our knowledge, no publicly available methylation and nucleosome occupancy data exist for the entire human genome within the same cell type. Nevertheless, since the lowest log probability densities in the human genome are predicted for CpG-rich sequences regardless of their methylation state (Figure 2d), and the same holds for both sets of the nucleosome occupancy scores (Figure 7), we conclude that the lowest occupancies occur for sequences with the lowest log probability densities.”

    1. eLife Assessment

      The authors addressed an important biological question, namely the role of glutamine metabolism in humoral responses, and they obtained solid conclusions. The strength of this study is that the authors used state-of-the-art transgenic mouse models together with in vitro analysis, thereby providing significant insights into the question posed. The following would strengthen the manuscript: i) adding more in-depth functionality/physiological relevance in the discussion part, and ii) regarding the experiments, the inclusion of more appropriate controls and a clearer and more accurate description of the methods.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Cho et al. present a comprehensive and multidimensional analysis of glutamine metabolism in the regulation of B cell differentiation and function during immune responses. They further demonstrate how glutamine metabolism interacts with glucose uptake and utilization to modulate key intracellular processes. The manuscript is clearly written, and the experimental approaches are informative and well-executed. The authors provide a detailed mechanistic understanding through the use of both in vivo and in vitro models. The conclusions are well supported by the data, and the findings are novel and impactful. I have only a few, mostly minor, concerns related to data presentation and the rationale for certain experimental choices.

      Detailed Comments:

      (1) In Figure 1b, it is unclear whether total B cells or follicular B cells were used in the assay. Additionally, the in vitro class-switch recombination and plasma cell differentiation experiments were conducted without BCR stimulation, which makes the system appear overly artificial and limits physiological relevance. Although the effects of glutamine concentration on the measured parameters are evident, the results cannot be confidently interpreted as true plasma cell generation or IgG1 class switching under these conditions. The authors should moderate these claims or provide stronger justification for the chosen differentiation strategy. Incorporating a parallel assay with anti-BCR stimulation would improve the rigor and interpretability of these findings.

      (2) In Figure 1c, the DMK-alone condition is not presented. This hinders readers' ability to properly assess the glutaminolysis dependency of the cells for the measured readouts. Also, CD138 acquisition in developing PCs goes hand in hand with decreased B220 expression. A representative FACS plot showing the gating strategy for the in vitro PCs should be added as a supplementary figure. Similarly, division number (going all the way to #7) may be tricky to gate and interpret. A representative FACS plot showing the separation of B cells according to their division numbers and a subsequent gating of CD138 or IgG1 in these gates would be ideal for demonstrating the authors' ability to distinguish these populations effectively.

      (3) A brief explanation should be provided for the exclusive use of IgG1 as the readout in class-switching assays, given that naïve B cells are capable of switching to multiple isotypes. Clarifying why IgG1 was preferentially selected would aid in the interpretation of the results.

      (4) The immunization experiments presented in Figures 1 and 2 are well designed, and the data are comprehensively presented. However, to prevent potential misinterpretation, it should be clarified that the observed differences between NP and OVA immunizations cannot be attributed solely to the chemical nature of the antigens - hapten versus protein. A more significant distinction lies in the route of administration (intraperitoneal vs. intranasal) and the resulting anatomical compartment of the immune response (systemic vs. lung-restricted). This context should be explicitly stated to avoid overinterpretation of the comparative findings.

      (5) NP immunization is known to be an inducer of an IgG1-dominant Th2-type immune response in mice. IgG2c is not a major player unless a nanoparticle delivery system is used. However, the authors arbitrarily included IgG2c in their assays in Figures 2 and 3. This may be confusing for the readers. The authors should either justify the IgG2c-mediated analyses or remove them from the main figures. (It can be added as supplemental information with proper justification).

      (6) Similarly, in affinity maturation analyses, including IgM is somewhat uncommon. I do not see any point in showing high affinity (NP2/NP20) IgMs (Figure 3d), since that data probably does not mean much.

      (7) Following on my comment for the PC generation in Figure 1 (see above), in Figure 4, a strategy that relies solely on CD40L stimulation is performed. This is highly artificial for the PC generation and needs to be justified, or more physiologically relevant PC generation strategies involving anti-BCR, CD40L, and various cytokines should be shown.

      (8) The effects of CB839 and UK5099 on cell viability are not shown. Including viability data under these treatment conditions would be a valuable addition to the supplementary materials, as it would help readers more accurately interpret the functional outcomes observed in the study.

      (9) It is not clear how the RNA-seq analysis in Figure 4h was generated. The experimental strategy and the setup need to be better explained.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the functional requirements for glutamine and glutaminolysis in antibody responses. The authors first demonstrate that the concentrations of glutamine in lymph nodes are substantially lower than in plasma, and that at these levels, glutamine is limiting for plasma cell differentiation in vitro. The authors go on to use genetic mouse models in which B cells are deficient in glutaminase 1 (Gls), the glucose transporter Slc2a1, and/or mitochondrial pyruvate carrier 2 (Mpc2) to test the importance of these pathways in vivo.

      Interestingly, deficiency of Gls alone showed clear antibody defects when ovalbumin was used as the immunogen, but not the hapten NP. For the latter response, defects in antibody titers and affinity were observed only when both Gls and either Mpc2 or Slc2a1 were deleted. These latter findings form the basis of the synthetic auxotrophy conclusion. The authors go on to test these conclusions further using in vitro differentiations, Seahorse assays, pharmacological inhibitors, and targeted quantification of specific metabolites and amino acids. Finally, the authors document reduced STAT3 and STAT1 phosphorylation in response to IL-21 and interferon (both type 1 and 2), respectively, when both glutaminolysis and mitochondrial pyruvate metabolism are prevented.

      Strengths:

      (1) The main strength of the manuscript is the overall breadth of experiments performed. Orthogonal experiments are performed using genetic models, pharmacological inhibitors, in vitro assays, and in vivo experiments to support the claims. Multiple antigens are used as test immunogens; this is particularly important given the differing results.

      (2) B cell metabolism is an area of interest but understudied relative to other cell types in the immune system.

      (3) The importance of metabolic flexibility and caution when interpreting negative results is made clear from this study.

      Weaknesses:

      (1) All of the in vivo studies were done in the context of boosters at 3 weeks and recall responses 1 week later. This makes specific results difficult to interpret. Primary responses, including germinal centers, are still ongoing at 3 weeks after the initial immunization. Thus, untangling what proportion of the defects are due to problems in the primary vs. memory response is difficult.

      (2) Along these lines, the defects shown in Figure 3h-i may not be due to the authors' interpretation that Gls and Mpc2 are required for efficient plasma cell differentiation from memory B cells. This interpretation would only be correct if the absence of Gls/Mpc2 leads to preferential recruitment of low-affinity memory B cells into secondary plasma cells. The more likely interpretation is that ongoing primary germinal centers are negatively impacted by Gls and Mpc2 deficiency, and this, in turn, leads to reduced affinities of serum antibodies.

      (3) The gating strategies for germinal centers and memory B cells in Supplemental Figure 2 are problematic, especially given that these data are used to claim only modest and/or statistically insignificant differences in these populations when Gls and Mpc2 are ablated. Neither strategy shows distinct flow cytometric populations, and it does not seem that the quantification focuses on antigen-specific cells.

      (4) Along these lines, the conclusions in Figure 6a-d may need to be tempered if the analysis was done on polyclonal, rather than antigen-specific cells. Alum induces a heavily type 2-biased response and is not known to induce much of an interferon signature. The authors' observations might be explained by the inclusion of other ongoing GCs unrelated to the immunization.

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, the authors investigate how glutaminolysis (GLS) and mitochondrial pyruvate import (MPC2) jointly shape B cell fate and the humoral immune response. Using inducible knockout systems and metabolic inhibitors, they uncover a "synthetic auxotrophy": when GLS activity/glutaminolysis is lost together with either GLUT1-mediated glucose uptake or MPC2, B cells fail to upregulate mitochondrial respiration, IL-21/STAT3 and IFN/STAT1 signaling are impaired, and plasma cell output and antigen-specific antibody titers drop significantly. This work thus demonstrates that parallel activation of two metabolic pathways promotes plasma cell differentiation and cytokine signaling. The dataset is technically comprehensive and conceptually novel, but some aspects leave the in vivo and translational significance uncertain.

      Strengths:

      (1) Conceptual novelty: the study goes beyond single-enzyme deletions to reveal conditional metabolic vulnerabilities and fate-deciding mechanisms in B cells.

      (2) Mechanistic depth: the study uncovers a novel "metabolic bottleneck" that impairs mitochondrial respiration and elevates ROS, and directly ties these changes to cytokine-receptor signaling. This is both mechanistically compelling and potentially clinically relevant.

      (3) Breadth of models and methods: inducible genetics, pharmacology, metabolomics, Seahorse assays, ELISpot/ELISA, RNA-seq, and two immunization models.

      (4) Potential clinical angle: the synergy of CB839 with UK5099 and/or hydroxychloroquine hints at a druggable pathway targeting autoantibody-driven diseases.

      Weaknesses:

      (1) Physiological relevance of "synthetic auxotrophy"

      The manuscript demonstrates that GLS loss is only crippling when glucose influx or mitochondrial pyruvate import is concurrently reduced, which the authors name "synthetic auxotrophy". I think it would help readers if the authors clarified this terminology, adding a concise definition of "synthetic auxotrophy" versus "synthetic lethality" early in the manuscript and justifying its relevance for B cells.

      While the overall findings, especially the subset specificity and the clinical implications, are generally interesting, the "synthetic auxotrophy" condition feels a little engineered. Therefore, the findings strongly raise the question of the likelihood of such a "double hit" in vivo and whether there are conditions, disease states, or drug regimens that would realistically generate such a "bottleneck". Hence, the authors should document or at least discuss whether GC or inflamed niches naturally show simultaneous downregulation/lack of glutamine and/or pyruvate. The authors should also aim to provide evidence that infections (e.g., influenza), hypoxia, treatments (e.g., rapamycin), or inflammatory diseases like lupus co-limit these pathways.

      It would hence also be beneficial to test the CB839 + UK5099/HCQ combinations in a short, proof-of-concept treatment in vivo, e.g., shortly before and after the booster immunization or in an autoimmune model. Likewise, it may also be insightful to discuss potential effects of existing treatments (especially CB839, HCQ) on human memory B cell or PC pools.

      (2) Cell survival versus differentiation phenotype

      Claims that the phenotypes (e.g., reduced PC numbers) are "independent of death" and are not merely the result of artificial cell stress would benefit from Annexin-V/active-caspase 3 analyses of GC B cells and plasmablasts. Please also show viability curves for inhibitor-treated cells.

      (3) Subset specificity of the metabolic phenotype

      Could the metabolic differences, mitochondrial ROS, and membrane-potential changes shown for activated pan-B cells (Figure 5) also be demonstrated ex vivo for KO mouse-derived GC B cells and plasma cells? This would also be insightful to investigate following NP-immunization (e.g., NP+ GC B cells 10 days after NP-OVA immunization).

      (4) Memory B cell gating strategy

      I am not fully convinced that the memory-B-cell gate in Supplementary Figure 2d is appropriate. The legend implies the population is defined simply as CD19+GL7-CD38+ (or CD19+CD38++?), with no further restriction to NP-binding cells. Such a gate could also capture naïve or recently activated B cells. From the descriptions in the figure and the figure legend, it is hard to verify that the events plotted truly represent memory B cells. Please clarify the full gating hierarchy and, ideally, restrict the MBC gate to NP+CD19+GL7-CD38+ B cells (or add additional markers such as CD80 and CD273). Generally, the manuscript would benefit from a more transparent presentation of gating strategies.

      (5) Deletion efficiency

      mRNA data show residual GLS/MPC2 transcripts (Supplementary Figure 8). Please quantify deletion efficiency in GC B cells and plasmablasts.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. The authors provide evidence that 1) non time-reversible models sometimes perform better than general time-reversible models when inferring phylogenetic trees out of simulated viral genome sequence data sets, and that 2) non time-reversible models can fit the real data better than the reversible substitution models commonly used in phylogenetics, a finding consistent with previous work. However, the methods are incomplete in supporting the main conclusion of the manuscript, that is that non time-reversible models should be incorporated in the model selection process for these data sets.

      The non-reversible models should be incorporated in the model selection process not because they perform significantly better, but because they do not perform worse than the reversible models and because the true biochemical processes of nucleotide substitution support non-reversibility.

      Reviewer #1 (Public Review):

      The study by Sianga-Mete et al. revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. This topic is not new: previous works have already shown that non-reversible, and also covarion, substitution models can fit real data better than the reversible substitution models commonly used in phylogenetics. In this regard, the results of the present study are not surprising. Specific comments are shown below.

      True

      It is well known that non-reversible models can fit the real data better than the commonly used reversible substitution models, see for example,

      https://academic.oup.com/sysbio/article/71/5/1110/6525257

      https://onlinelibrary.wiley.com/doi/10.1111/jeb.14147?af=R

      The manuscript indicates that the results (better fit of non-reversible models compared to reversible models) are surprising, but I do not think so; I think the results would be surprising if the reversible models provided a better fit.

      I think the introduction should be expanded with more information about non-reversible models and the diverse previous studies that have already evaluated them. I also think the manuscript should indicate that the results are not surprising, or more clearly justify why they are surprising.

      The surprise in the findings is in NREV12 performing better than NREV6 for double stranded DNA viruses as it was expected that NREV6 would perform better given the biochemical processes discussed in the introduction.

      In the introduction and/or discussion, I missed a discussion of recent work on the influence of substitution model selection on phylogenetic tree reconstruction. Some studies indicated that substitution model selection is not necessary for phylogenetic tree reconstruction,

      https://academic.oup.com/mbe/article/37/7/2110/5810088

      https://www.nature.com/articles/s41467-019-08822-w

      https://academic.oup.com/mbe/article/35/9/2307/5040133

      While others indicated that substitution model selection is recommended for phylogenetic tree reconstruction,

      https://www.sciencedirect.com/science/article/pii/S0378111923001774

      https://academic.oup.com/sysbio/article/53/2/278/1690801

      https://academic.oup.com/mbe/article/33/1/255/2579471

      The results of the present study seem to support this second view. I think this study could be improved by providing a discussion about this aspect, including the specific contribution of this study to that.

      In our conclusion we have stated that:

      The lack of available data regarding the proportions of viral life cycles during which genomes exist in single- and double-stranded states makes it difficult to rationally predict the situations where the use of models such as GTR, NREV6 and NREV12 might be most justified, particularly in light of the poor overall performance of NREV6 and GTR relative to NREV12 in describing mutational processes in viral genome sequence datasets. We therefore recommend case-by-case assessments of NREV12 vs NREV6 vs GTR model fit when deciding whether it is appropriate to consider the application of non-reversible models for phylogenetic inference and/or phylogenetic model-based analyses such as those intended to test for evidence of natural selection or the existence of molecular clocks.

      The real data were downloaded from the Los Alamos HIV database. I am wondering whether there were any criteria for selecting the sequences, or whether all the sequences in the database for every studied virus category were analysed. Also, was any quality filter applied? How were gaps and ambiguous nucleotides treated? Notice that these aspects could affect the fit of the models to the data.

      We selected varying numbers of sequences from the database for each virus type studied. Using the software AliView, we quality-filtered the data by re-aligning the sequences for each virus type.

      How are the non-reversible model and the data compared, considering the non-reversible substitution process? In particular, given an input MSA, how can one know whether a nucleotide substitution goes from state x to state y or from state y to state x in the real data if there is no reference (i.e., wild-type) sequence? All the sequences are mutants, and one may not have a reference to identify the direction of the mutation, which is required for the non-reversible model. One could consider the most abundant state to be the wild-type state, but that may not be the case in reality. I think this is a main problem for the practical application of non-reversible substitution models in phylogenetics.

      True

      Reviewer #1 (Recommendations for the authors):

      The reversible and non-reversible models used in this study assume that all the sites evolve under the same substitution matrix, which can be unrealistic. This aspect could be mentioned.

      Done

      The manuscript indicates that "a phylogenetic tree was inferred from an alignment of real sequences (Avian Leukosis virus) with an average sequence identity (API) of ~90%." Under which substitution model was that phylogenetic tree reconstructed? Could the use of that model bias downstream results by favoring results based on such a model?

      We have stated that the GTR+G model was used to reconstruct the tree. Yes, the use of the GTR+G model could bias the downstream results, as we also state in the paper.

      I was wondering which specific R function was used to calculate the weighted Robinson-Foulds metric. I think this should be included in the manuscript.

      We have stated that we used the weighted Robinson-Foulds metric (wRF), as implemented in the R package phangorn (Schliep, 2011).
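      To make concrete what the weighted Robinson-Foulds distance measures, a minimal sketch follows. It assumes each tree has already been reduced to a mapping from splits (bipartitions of the taxa) to branch lengths. Parsing Newick files, rooting, and the choice of whether to include terminal branches are left out, and the helper names `canonical`, `rf` and `weighted_rf` are hypothetical. In recent phangorn versions the corresponding R function appears to be `wRF.dist()`, which also offers normalisation options not reproduced here; it should be checked against the version actually used.

```python
def canonical(side, taxa):
    """Encode a bipartition by the side that does NOT contain a fixed reference taxon,
    so that a split A|B and its complement B|A map to the same key."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa - side) if reference in side else side


def rf(tree_a, tree_b):
    """Unweighted Robinson-Foulds distance: number of splits present in only one tree."""
    return len(set(tree_a) ^ set(tree_b))


def weighted_rf(tree_a, tree_b):
    """Weighted RF distance: sum over all splits of |w_a(s) - w_b(s)|.

    A split missing from one tree is treated as having length 0 there, so it
    contributes its full branch length from the other tree.
    """
    splits = set(tree_a) | set(tree_b)
    return sum(abs(tree_a.get(s, 0.0) - tree_b.get(s, 0.0)) for s in splits)


# Hypothetical five-taxon example using internal branches only.
TAXA = frozenset("ABCDE")
tree_1 = {canonical("AB", TAXA): 0.20, canonical("DE", TAXA): 0.10}
tree_2 = {canonical("AC", TAXA): 0.30, canonical("DE", TAXA): 0.15}

print(rf(tree_1, tree_2))                     # 2: {A,B} and {A,C} are unshared
print(round(weighted_rf(tree_1, tree_2), 6))  # 0.55 = 0.20 + 0.30 + |0.10 - 0.15|
```

      Unlike the unweighted RF distance, the weighted version is sensitive to branch-length estimates, which is why the estimation method and data quality matter more for wRF comparisons.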

      Although they are a minority, several datasets fit better with a reversible model than with a non-reversible model. I think that should be clearly indicated. In addition, in my opinion the AIC does not penalize the number of model parameters enough and favors the non-reversible models over the reversible models, but this is only my opinion, based on the definition of AIC, and it is not supported by data. Thus, I think the comparison between phylogenetic trees reconstructed under different substitution models was a good idea (but see also my second major comment).

      Noted

      When comparing phylogenetic trees, should one consider the effect of the estimation method and the quality of the studied data? For example, should bootstrap values be estimated for all the ancestral nodes, and only ancestral nodes with high support be evaluated in the comparison among trees?

      Yes, the estimation method and the quality of the studied data should be considered. With the unweighted RF distance, unlike the wRF distance, this will not matter, but for the weighted RF distance it does. When building the trees with RAxML, only high-support nodes are added to the tree.

      In Figure 3, I do not see (by eye) significant differences among the models. I see in the legend that the statistical evaluation was based on a t test, but I am not entirely convinced. Maybe it is only my view. Exactly which pairs of datasets are evaluated with the t test? Next, I would expect the influence of the substitution model on phylogenetic tree reconstruction to be higher at large levels of nucleotide diversity, because with more substitution events there is more information with which to see the effects of the model. However, the t test seems to show that differences occur only at low levels of nucleotide diversity (and large DNR). What could be the cause of this?

      The paired t-tests compare the wRF distances between the true tree and the trees inferred using the GTR model against the wRF distances between the true tree and the trees inferred using the NREV12 model.

      The influence of the NREV12 model on the reconstructed tree may not be significantly higher at large levels of nucleotide diversity because, beyond a certain level, the DNR values are simply unrealistic.

      Can the user perform substitution model selection (i.e., AIC) among reversible and non-reversible substitution models with IQTREE? If yes, then doing that should be the recommendation from this study, correct?

      But can DNR be estimated from a real dataset? DNR seems to be the key factor (Figure 3) for phylogenetic analysis under a proper model.

      Substitution model selection among reversible and non-reversible models can be performed using both HyPhy and IQTREE, and we have recommended that model testing be done as a first step before tree building. Estimating DNR from real datasets requires the substitution rate matrix of a non-reversible model.

      The manuscript has many text errors (including typos and incorrect citations). For example, many citations on page 20 show "Error! Reference source not found.". I think the authors should double-check the manuscript before submitting. Also, some text is not formally written. For example, "G represents gamma-distributed rates": rates of what? The text should be clear for readers who are not familiar with the topic (i.e., G represents gamma-distributed substitution rates among sites). In general, I recommend a detailed revision of the whole text of the manuscript.

      Done

      Reviewer #2 (Public Review):

      The authors evaluate whether non-time-reversible models fit data presenting strand-specific substitution biases better than time-reversible models do. Specifically, the authors consider what they call NREV6 and NREV12 as candidate non-time-reversible models. On the one hand, they show that AIC tends to select NREV12 more often than GTR on real virus data sets. On the other hand, they show using simulated data that NREV12 leads to inferred trees that are closer to the true generating tree when the data incorporate a certain degree of non-time-reversibility.

      Based on these two experimental results, the authors conclude that "We show that non-reversible models such as NREV12 should be evaluated during the model selection phase of phylogenetic analyses involving viral genomic sequences". This is a valuable finding, and I agree that this is potentially good practice.

      However, the manuscript lacks an experiment that links the two findings to support the conclusion, specifically one that answers the following question: does the best-fit model also lead to better tree topologies?

      Because NREV12 yields inferred trees that are closer to the true generating tree than GTR does, this shows that the best-fit model, in this case NREV12, leads to better tree topologies.

      On simulated data, the significance of the difference between GTR and NREV12 inferences is evaluated using a paired t test. I find no rationale or reference supporting the suitability of a paired t test for measuring the significance of differences in wRF distance. Also, the results show that NREV12 performs better than GTR on average, but a pairwise comparison would be more informative: for how many sequence alignments does NREV12 perform better than GTR?

      We used the paired t-test because it is the most widely used test for comparing mean values between two matched samples when the differences within each pair are normally distributed, and the wRF distances meet this requirement.

      The paired t-test incorporates the pairwise comparison, and the side-by-side boxplots show the pairwise wRF comparisons.
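      The reviewer's request (in how many alignments NREV12 beats GTR) can be answered from the same matched wRF distances used for the paired t-test. The sketch below is illustrative only: the wRF values are invented and the function names are hypothetical. It uses only the Python standard library, so it reports the paired t statistic and degrees of freedom rather than a t-based p-value (`scipy.stats.ttest_rel` would supply that). The per-alignment win counts feed an exact two-sided sign test, a distribution-free check that does not rely on the normality assumption behind the paired t-test.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, df) for matched samples: t = mean(d) / (sd(d) / sqrt(n)), with d = a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


def sign_test(a, b):
    """Count which sample has the smaller (better) wRF in each pair; ties are dropped.

    Returns (wins_a, wins_b, p), where p is an exact two-sided sign-test p-value
    under Binomial(n, 0.5).
    """
    wins_a = sum(1 for x, y in zip(a, b) if x < y)
    wins_b = sum(1 for x, y in zip(a, b) if y < x)
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k)
    return wins_a, wins_b, min(1.0, 2 * tail)


# Invented wRF distances to the true tree, one pair per simulated alignment.
wrf_gtr = [0.42, 0.38, 0.51, 0.47, 0.40, 0.45]
wrf_nrev12 = [0.39, 0.37, 0.46, 0.48, 0.36, 0.41]

t, df = paired_t_statistic(wrf_gtr, wrf_nrev12)
gtr_wins, nrev12_wins, p_sign = sign_test(wrf_gtr, wrf_nrev12)
print(f"paired t = {t:.2f} on {df} df")
print(f"NREV12 closer to the true tree in {nrev12_wins} of "
      f"{gtr_wins + nrev12_wins} alignments; sign-test p = {p_sign:.3f}")
```

      With these invented numbers the paired t statistic is about 2.9 on 5 degrees of freedom, while NREV12 wins in 5 of 6 alignments and the sign test gives p of about 0.22. Reporting both views shows whether an average advantage is consistent across alignments.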

      Reviewer #2 (Recommendations for the authors):

      The authors reference Baele et al., 2010 for describing NREV6 and NREV12. I suggest using the same names as in the referenced paper: GNR-SYM and GNR, respectively. Although I do not think there is a standard name for these models, I would use a previously used one.

      Our studies have been built around the names NREV6 and NREV12, and we would like to keep this naming as the standard across them.

      GTR and NREV12 models are already described in many other papers. I do not see the need to include such an extensive description. Also, a reference should be included for the discrete Gamma rate categories [1].

      We included the extensive description to help readers who are not very familiar with these models, particularly because our names for the models differ from those used in other papers.

      We have added referencing for the discrete gamma rate as recommended. (Yang, 1994)

      To evaluate the exhaustiveness and correctness of the results, I would recommend publishing as supplementary material the simulated data sets or the scripts for generating the data set, the scripts or command lines for the analysis, and the versions of the software used (e.g., IQTREE). Also, to strongly support the main conclusion of the manuscript, I suggest adding to the simulations section results the RF-distances of the best-fit selected model under AIC, AICc, and BIC as well.

      We can submit all the required datasets. The RF-distance results for the simulated data are available and will be submitted. However, we cannot add them to the main document, as this would create very long data tables.

      In some instances, the selection criterion used is stated to be AIC, while in others AIC-c is referenced; even the table captions mix both terms. It should be made clearer which criterion is employed, as AIC is not suitable for addressing the overparameterization of evolutionary models, given that it does not account for sample size. A previous preprint of this article [2] does not mention AIC-c, explicitly includes the formulas for AIC that do not take sample size into account, and reports the same results as this manuscript, which indicates that AIC, not AIC-c, was used here. This should be clarified. It is recommended to use AIC-c instead of AIC, especially if the ratio of sample size to model parameters is low [3]. Two further points can be made here. First, some authors consider tree branch lengths to be free model parameters and others do not; this paper does not specify how the model parameters are counted. Second, AIC tends to select more parameterized models than AIC-c, and overparameterization can lead to different tree inferences, as evidenced in Hoff et al., 2016. Therefore, it is expected that NREV12 is selected more frequently than NREV6 and GTR.
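      The reviewer's two points, how parameters are counted and whether the small-sample correction is applied, can change which model is selected. The sketch below uses the standard definitions AIC = 2k - 2 ln L and AICc = AIC + 2k(k + 1)/(n - k - 1). Everything else is an assumption for illustration: the log-likelihoods are invented, n is taken as the number of alignment sites, the GTR+G count (5 exchangeabilities, 3 base frequencies, 1 gamma shape) is the usual one, the NREV12+G count of 11 free rates plus 1 gamma shape is assumed, and branch lengths, when counted, are taken as 2s - 3 for an unrooted tree (GTR) and 2s - 2 for a rooted tree (NREV12), where s is the number of taxa.

```python
def aic(log_lik, k):
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """Small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_lik, k) + 2 * k * (k + 1) / (n - k - 1)


N_SITES = 300  # hypothetical alignment length, used as the sample size n
N_TAXA = 20    # hypothetical number of sequences

# name -> (invented max log-likelihood, free substitution-model parameters, branch lengths)
MODELS = {
    "GTR+G": (-2500.0, 9, 2 * N_TAXA - 3),      # unrooted tree
    "NREV12+G": (-2495.0, 12, 2 * N_TAXA - 2),  # rooted tree; parameter count assumed
}


def pick(criterion, count_branches):
    """Return the model with the lowest score, optionally counting branch lengths in k."""
    scores = {}
    for name, (log_lik, k_model, n_branches) in MODELS.items():
        k = k_model + (n_branches if count_branches else 0)
        scores[name] = criterion(log_lik, k)
    return min(scores, key=scores.get)


def aicc_here(log_lik, k):
    """AICc with the hypothetical alignment length fixed as n."""
    return aicc(log_lik, k, N_SITES)


for count_branches in (False, True):
    print(f"branch lengths counted: {count_branches}; "
          f"AIC picks {pick(aic, count_branches)}, AICc picks {pick(aicc_here, count_branches)}")
```

      With these invented numbers, AIC prefers NREV12+G whether or not branch lengths are counted, whereas AICc prefers GTR+G once branch lengths are counted as parameters. Real alignments may behave differently, which is why stating the criterion and the parameter-counting convention matters.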

      In my opinion, a pairwise comparison between GTR and NREV12 performance is of great interest here, and the box-and-whisker plots are not useful. Scatterplots would display the results better.

      The boxplots are meant to offer a simplified view of the results, as the paired t-tests perform all of the comparisons. We shall provide the scatterplots as supplementary information so that readers can see the full, detailed plots as recommended.

      Some references are missing.

      Missing references added

    2. Reviewer #1 (Public Review):

      The study by Sianga-Mete et al. revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. This topic is not new: previous works have already shown that non-reversible, and also covarion, substitution models can fit real data better than the reversible substitution models commonly used in phylogenetics. In this regard, the results of the present study are not surprising.

    3. Reviewer #2 (Public Review):

      The authors evaluate whether non-time-reversible models fit data presenting strand-specific substitution biases better than time-reversible models do. Specifically, the authors consider what they call NREV6 and NREV12 as candidate non-time-reversible models. On the one hand, they show that AIC tends to select NREV12 more often than GTR on real virus data sets. On the other hand, they show using simulated data that NREV12 leads to inferred trees that are closer to the true generating tree when the data incorporate a certain degree of non-time-reversibility. Based on these two experimental results, the authors conclude that "We show that non-reversible models such as NREV12 should be evaluated during the model selection phase of phylogenetic analyses involving viral genomic sequences". This is a valuable finding, and I agree that this is potentially good practice. However, the manuscript lacks an experiment that links the two findings to support the conclusion, specifically one that answers the following question: does the best-fit model also lead to better tree topologies?

      [Editors' note: the reviewers were sent the revised submission and rebuttal and based on their response, an amended eLife Assessment has been formulated.]

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      In this manuscript, Gruber et al. perform serial-section EM of the antennal lobe and reconstruct the neurites innervating two types of glomeruli: one that is narrowly tuned to geosmin and one that is broadly tuned to other odours. They quantify and describe various aspects of the innervation by olfactory sensory neurons (OSNs), uniglomerular projection neurons (uPNs), and the multiglomerular local interneurons (LNs) and PNs (mPNs). They find that narrowly tuned glomeruli had stronger connectivity from OSNs to PNs and LNs, and considerably more connections between sister OSNs and sister PNs, than the broadly tuned glomeruli. They also had less connectivity with the contralateral glomeruli. These observations are suggestive of strong feed-forward information flow with minimal presynaptic inhibition in narrowly tuned glomeruli, which might be ecologically relevant, for example, while making quick decisions such as avoiding a geosmin-laden landing site. In contrast, information flow in more broadly tuned glomeruli shows much more lateralisation of connectivity to the contralateral glomerulus, as well as to other ipsilateral glomeruli.

      The data are well presented, the manuscript clearly written, and the results will be useful to the olfaction community. Given that the hemibrain and FAFB datasets exist, I wonder whether the authors have considered verifying whether the trends they observe in connectivity hold across three brains. Are they stereotypic?

      We appreciate the reviewer’s positive view of our study and their thoughtful and relevant comment on the issue of individual variation. We agree that this is a very important question and note that it was also raised by the second Reviewer. It reflects both our limited understanding of the range of individual variation in synaptic connectivity (whether in flies, humans, or other species) and the challenge of determining which of the differences observed in our study are stereotypical features of each glomerulus type. Undoubtedly, this criticism addresses a crucial problem of practically all connectome studies to date, one for which there is no immediate solution. Studies of this type require so much time, effort and money that increasing the number of samples is seldom feasible. The Reviewer wonders whether we could compare our data with those made available by two of the largest connectome studies of Drosophila. This appeared to us to be a very good idea and we tried to follow the advice but, unfortunately, it proved impracticable for the reasons we explain below. The hemibrain data cannot be used for this purpose because they do not contain the full glomerulus DA2 (Schlegel et al., 2021). A different problem prevented us from using the FAFB dataset, the other dataset mentioned by the Reviewer. In this case the three glomeruli were sectioned and reconstructed, but the dataset lacks an annotated list of all synaptic connections corresponding to each glomerulus. Such an annotation (a compendium of all synaptic connections inside each glomerulus, specifying for each connection which type of neuron provides the presynaptic site and which the postsynaptic site) is essential for direct comparison with our data. It is important to keep in mind that the analytical tools currently available for these datasets (e.g., NeuPrint, FlyWire and CATMAID) cannot extract data on synapses exclusively from the glomerular volume of DA2 or DL5.

      In the case of FAFB, it is certainly theoretically possible to obtain the data by performing the annotation ourselves. However, such a study would demand so much time, effort and financial resources that we believe it would not be justified solely to increase the number of individuals from one to two. Instead, our manuscript includes a comparison of the OSN connectivity in VA1v and DL5 using the hemibrain dataset published by Schlegel et al. (2021) (see revised manuscript: lines 311–315; 431–434; 558–562; 602–606).

      Although we fully share the Reviewer’s view that a comparison including three flies would be better than one based on a single glomerulus of each type, we are still left with the question of which, if any, of the differences are stereotypic. Distinguishing stereotypical differences between particular glomeruli, in features such as those discussed in our study, from differences that simply fall within the normal range of individual variation is essentially a statistical problem. A first attempt at a comprehensive comparison focusing on intra- and inter-individual variability was recently made by comparing two connectome datasets from two different Drosophila individuals (Dorkenwald et al., 2024; Schlegel et al., 2024). At present, it is still unclear how many samples are needed for a statistically robust comparison of olfactory synaptic circuits in adult flies: perhaps 3, 6, or even 18 individuals?

      Reviewer #2 (Public Review):

      The chemoreceptor proteins expressed by olfactory sensory neurons differ in their selectivity, such that glomeruli vary in the breadth of volatile chemicals to which they respond. Prior work assessing the relationship between tuning breadth and the demographics of principal neuron types that innervate a glomerulus demonstrated that narrowly tuned glomeruli are innervated by more projection neurons (output neurons) and fewer local interneurons than more broadly tuned glomeruli. The present study used high-resolution electron microscopy to determine which synaptic relationships between principal cell types also vary with glomerulus tuning breadth, using a narrowly tuned glomerulus (DA2) and a broadly tuned glomerulus (DL5). The strength of this study lies in the comprehensive, synapse-level resolution of the approach. Furthermore, the authors implement a very elegant approach of using a 2-photon microscope to score the upper and lower bounds of each glomerulus, thus defining the bounds of their restricted regions of interest. There were several interesting differences, including more axo-axonic afferent synapses and dendrodendritic output neuron synapses in the narrowly tuned glomerulus, and more synapses upon sensory afferents from multiglomerular neurons and more output neuron autapses in the broadly tuned glomerulus.

      The study is limited by a few factors. There was a technical need to group all local interneurons, centrifugal neurons, and multiglomerular projection neurons into one category ("multiglomerular neurons"), which complicates any interpretation, as even multiglomerular projection neurons are very diverse. Additionally, there were as many differences between the two narrowly tuned glomeruli as between the narrowly and broadly tuned glomeruli. Architectural differences may therefore reflect not differences in tuning breadth but the ecological significance of the odors detected by cognate sensory afferents. Finally, some synaptic relationships are described as differing and others as being the same between glomeruli, but with only one sample of each glomerulus and no measure of inter-animal variability, it is difficult to determine whether measures truly differ. If these caveats are kept in mind, this work reveals some very interesting potential differences in circuit architecture associated with glomerular tuning breadth.

      This work establishes specific hypotheses about network function within the olfactory system that can be pursued using targeted physiological approaches. It also identifies key traits that can be explored using other high-resolution EM datasets and other glomeruli that vary in their tuning selectivity. Finally, the laser "branding" technique used in this study establishes a reduced-cost procedure for obtaining smaller EM datasets from targeted volumes of interest by leveraging the ability to transgenically label brain regions in Drosophila.

      CLASSIFICATION OF NEURONAL TYPES

      We agree that grouping diverse types of interneurons into a single category (referred to as MGNs) limits the ability to make interpretations about synaptic similarities and differences between specific neuronal types. This was, however, an unavoidable compromise resulting from our decision to generate a comprehensive, synapse-level reconstruction of the restricted regions encompassing the DA2 and DL5 glomeruli. As both reviewers have noted, this approach offers significant value and we hope the Editor will also recognize that this limitation does not prevent readers from gaining important and novel insights into the synaptic circuitry of these two glomeruli.  

      Similar to the approach taken by Tobin et al. (2017), we prioritized producing a densely reconstructed neuropile in which no synapses were omitted. The downside of this method is that not all synaptic connections could be reliably assigned to specific neuronal types, with about 12% remaining unassigned. We anticipate that future research, supported by advances in semi-automated tracing methods, improved imaging technologies, and increased personnel resources, will allow not only for the generation of more complete connectomes of the entire brain (Scheffer et al., 2020; Zheng et al., 2018), but also for the accurate reconstruction and classification of individual synapses, even in highly complex regions such as the olfactory glomeruli. We also expect that a second complete connectome of a male Drosophila will soon become available, which will provide valuable opportunities for comparisons across individuals and between male and female brains in future studies.

      INTERGLOMERULAR DIFFERENCES

      Thank you for this insightful comment. It is indeed true that despite both DA2 and VA1v being narrowly tuned glomeruli, they exhibit considerable differences in specific connectivity features (e.g., relative synaptic strengths above certain thresholds) and that those differences can be as pronounced as those observed between DA2 and the broadly tuned DL5. For this reason, comparing each individual glomerulus to every other is not a practical or informative approach. To derive robust interpretations, we focused instead on whether two glomeruli that share a particular functional characteristic—namely, being narrowly tuned for single odorants—also share connectivity patterns that distinguish them from a broadly tuned reference glomerulus.

      Our results support this. Furthermore, additional connectomics data reinforce our conclusions.

      For example, OSN-OSN connectivity is stronger in the two narrowly tuned glomeruli (DA2 and VA1v) relative to the broadly tuned glomerulus (DL5). While these pairwise differences alone are not conclusive, the finding that the two narrowly tuned glomeruli studied here share features that distinguish them from the broadly tuned glomerulus supports our interpretation. We found further support for this idea in the data reported by Schlegel et al. (2021). In that dataset, other narrowly tuned glomeruli (DA1, DL3, and DL4) also exhibit stronger OSN-OSN connectivity than broadly tuned glomeruli (DM1 or DM4).

      We do not deny that there are many differences between any given pair of glomeruli, regardless of whether they are narrowly or broadly tuned. Instead, we propose that most of the differences in circuit features we observed group the two narrowly tuned glomeruli together relative to the broadly tuned glomerulus. A more concise summary is now provided in the newly added Figure 8. We have also added explanatory text at the beginning of the section ‘specific features of narrowly tuned glomerular circuits’.

      ECOLOGICAL SIGNIFICANCE

      This is an interesting point. However, it is difficult to disentangle the "ecological significance" of processed odorants from the "tuning breadth" of a glomerulus. In the Drosophila olfactory system, glomerular circuits that respond to ecologically important odorants—such as those involved in reproduction or danger—tend to be more narrowly tuned. Moreover, while we refer to odorants with specific ecological significance as those linked to survival or reproductive behaviors, defining the significance of an odorant with precision is inherently challenging, as it can vary depending on context and environmental conditions.

      What both circuits share is their narrow tuning breadth. We therefore propose that the common circuit features of VA1v and DA2, highlighted in this study, are functionally related to the fact that each circuit processes single odorants. Consequently, their specificity is most likely determined at the level of the receptor. 

      INDIVIDUAL VARIABILITY

      We agree that accounting for inter-animal variability would strengthen the study. However, we are confident that even a modest, statistically sound assessment of this variability would require a larger sample size, certainly more than just two or three flies, which is presently not feasible.

      We refer the reviewer to our response to Reviewer #1 regarding this important issue.

      Initial insights into variability between flies have been provided through comparative analyses of the two most comprehensive female Drosophila melanogaster connectomes, the FAFB and hemibrain datasets (Schlegel et al., 2024). For more detailed quantitative comparisons regarding inter-animal variability, please refer to our response to the second major point raised by Reviewer #2. As highlighted by Schlegel et al. (2024), making definitive statements about the stereotypy of neuron numbers, unitary cell-cell connections (edges), or synaptic strengths (weights) remains a complex challenge.

      While appreciating the rigour of this work, we were surprised to notice the omission of a comparison of their observations with the two other existing datasets. This would not only have addressed the technical limitation of this particular study (the inability to identify specific neuron types due to imaging a small part of the brain) but would also have shed light on inter-animal variability.

      We strongly recommend that the authors make this comparison; the datasets are currently extremely user friendly, so we don't expect that replicating their key findings will be too onerous. This will be particularly important to resolve the issue of having to classify all multiglomerular local interneurons and multiglomerular projection neurons broadly as "MGNs". Such a comparison would dramatically strengthen this study, which poses very interesting questions but, in its current form, has this striking shortcoming.

      INDIVIDUAL VARIABILITY AS EXPRESSED HERE:

      Initially, we shared the opinion the Reviewer expresses here, but unfortunately it was not possible to follow this advice. As far as possible, we have compared some of our results to the values of the two datasets that the Reviewer refers to, but the absence of glomerulus DA2 in one of the datasets and the absence of synapse annotation for all the relevant glomeruli in the other prevented us from making a full comparison. Moreover, we believe that the problem of individual variation most probably cannot be solved by extending the comparison to one or two more flies.

      Reviewer #1 (Recommendations for The Authors): 

      The lines 270 - 282 confused me in the backdrop of Figure 3B. 

      The concern may stem from our inclusion of a comparison between the uPNs of glomerulus DA2 and the single uPN of glomerulus DL5 in the statistical analysis presented in Figure 3. This comparison was included to ensure a comprehensive representation of the data, highlighting the variability across all major cell groups. We have clarified this rationale in the revised manuscript (see lines 274-282).

      Reviewer #2 (Recommendations for The Authors): 

      I commend the authors for taking such a thorough approach to advance an interesting topic in olfaction. The following suggestions are intended to strengthen this study: 

      Major points: 

      A color-blind-friendly palette should be used for all figures. Currently, five of seven figures use red and green, and in particular, Figure 5 will be uninterpretable for red/green color-blind readers. 

      We are thankful for this important comment. As suggested by the Reviewer, we changed the color palette, replacing red with magenta, and updated the figure legends accordingly.

      This level of analysis is extremely resource and time-consuming, so even obtaining this information at this resolution is an impressive achievement. However, this study would be well served by strategically supplementing the analysis of this dataset with information from other publicly available connectomics datasets. For instance, some interpretations are limited because there is information from only a single DL5 and DA2 glomerulus. Any claims in which one glomerulus has more, less, or the same of a metric must be tempered because without replicates, there are no measures of inter-animal variability. As an example, on lines 386-387 the authors state "The relative synaptic strength between MGN>uPN was stronger in DA2 (12%) than DL5 (10%)". It is difficult to assess whether this represents a difference that is outside of the range of inter-animal variability inherent to the olfactory system. Taking select measures from the Hemibrain and FAFB (via FlyWire) datasets could help strengthen these claims. 

      We fully agree with the Reviewer’s opinion that, since our data come from one glomerulus of each type, “It is difficult to assess whether this represents a difference that is outside of the range of inter-animal variability inherent to the olfactory system.” This is a weakness of practically all connectome studies based on electron microscopy, in Drosophila and in other animals. We cannot be sure that measurements from the Hemibrain and FAFB datasets would strengthen our claims, because the magnitude of individual variation is presently not known, and solving this problem will most probably require more than one or two additional flies. In any case, it is not possible to compare our data with that of the hemibrain because DA2 was not included in that study. We ask the Reviewer to read the more detailed explanation in our response to Reviewer 1.

      In the particular case raised by the Reviewer above, the relative difference in synaptic strength exceeds 20%. Whether such a difference has functional relevance remains an open question, but Schlegel et al. (2024) support our interpretation. They showed that synaptic weights with differences larger than 20% tend to be consistent across individuals, with strong correlations within and between animals (Pearson’s R = 0.97 and R = 0.8; Fig. 4).

      Grouping all local interneurons, centrifugal neurons, and multiglomerular PNs into one category limits the ability to make interpretations about similarities or differences in the synaptic relationships involving MGNs. The authors could get an estimate of the number of multiglomerular PNs in DL5, VA1v, and DA2 from the Hemibrain and FlyWire platforms to get a better sense of differences between glomeruli in the MGN category.

      We agree that grouping a variety of interneurons into a single category (called MGNs) limits the ability to make interpretations about similarities or differences in the synaptic relationships involving different neurons. This was the unavoidable price to be paid once we decided to produce a “comprehensive, synapse-level resolution” map of these two glomeruli. It appears to us that both reviewers have clearly recognized the intrinsic value of this approach, and we hope that the Editor will share this opinion.

      Consistent with the assumptions of Tobin et al. (2017), our hypothesis on LN connectivity differences is based on the fact that LNs are the most numerous and broadly arborizing neurons of the class that we call multiglomerular neurons in the AL (Chou et al., 2010; Lin et al., 2012; Tanaka et al., 2012). Recent connectome studies confirm this feature across all glomeruli (Bates et al., 2020; Horne et al., 2018; Scheffer et al., 2020; Schlegel et al., 2021; Zheng et al., 2018).

      In response to the reviewer’s question, we conducted a case-specific reanalysis of the data from Horne et al. (2018), which provides comprehensive connectivity information for the VA1v glomerulus. This allowed us to quantify the proportional contributions of LNs (n = 56) and mPNs (n = 13) to all MGN connections (MGN-MGN, MGN>OSN, MGN>uPN, uPN>MGN, OSN>MGN).

      Our analysis showed that 84% of MGN output originates from LNs. Of the input to MGNs, 57% comes from LNs and 43% from mPNs, largely due to strong OSN>mPN input. Thus, for the filtered MGN connections relevant to distinguishing narrowly from broadly tuned circuits (e.g., MGN>OSN, uPN>MGN; see Fig. 8), LNs are the dominant contributors in VA1v. (These data are not included in the resubmitted manuscript.) This supports our interpretation that LNs are responsible for the majority of MGN connections underlying the observed differences between glomeruli.

      For instance, prior work has reported fewer local interneurons innervating DA2, but in this study there was the unexpected result of greater MGN innervation density and synapse number for DA2 relative to DL5. This discrepancy could be due to differences in the number of multiglomerular PNs innervating each glomerulus, which would be obscured when these PNs are combined with local interneurons in the MGN category.

      We agree that the greater MGN innervation density in DA2 in our study could reflect a stronger contribution from mPNs. However, innervation density alone does not indicate how many mPNs actually innervate DA2 or DL5. Alternatively, increased innervation and/or synaptic frequency of local interneurons (LNs) could also account for this observation. In our view, neuron number does not necessarily correlate with branching complexity or synaptic density.

      For example, the dendritic length of the single uPN in glomerulus DL5 is approximately equal to the combined dendritic length of the multiple uPNs of DA2. Similarly, Tobin et al. (2017) reported that when comparing uPNs in glomerulus DM6 between the left and right brain hemispheres, they found variability in cell number but not in dendritic length. More recently, the FAFB and hemibrain datasets showed a similar pattern in another neuronal type: cell number varied substantially for Kenyon cells between the two Drosophila individuals, yet this cell type consistently makes and receives similar numbers of presynapses and postsynapses in both individuals (Schlegel et al., 2024).

      On line 33 the authors cannot claim that DA2-OSNs experience less presynaptic inhibition based on the data in this study. Even without the limitations of the MGN category (described above), presynaptic inhibition depends on more than just the number of synapses, rather it is affected by GABA B receptor expression levels and the second messenger components downstream of this receptor. Physiological experiments are needed to justify this claim, so I recommend adjusting accordingly.

      We agree with the Reviewer and have adjusted the text on line 33 and in the main body of the text by referring to this finding as “presynaptic input”, which is what we have quantified, instead of “less presynaptic inhibition”.

      Figures 5 and 6 seek to distill the wealth of information from this study into broad take-home points for the reader, while still providing a good amount of detail. I think a final, more concise graphic summary (similar to the graphical abstract or Figure 6 of Grabe et al. 2016) depicting the most critical differences between glomeruli would further clarify the broad findings of this study.

      We appreciate this comment and we have added a “graphic summary” as the Reviewer proposed. We made a new figure that becomes Figure 8 and summarizes our results and highlights differences between narrowly and broadly tuned glomeruli in a more concise graphical abstract format.

      Minor points: 

      Much of the manuscript provides details about synapse fractions or % synapses for a given synaptic relationship. Please ensure that it is clear which principal cell types are being described, as it can be easy to get lost.

      Should line 284 say "...than DL5 as it has been reported that DA2 is innervated by fewer LNs..."?

      We appreciate the reviewer’s comment and have corrected this sentence (see text beginning at line 290).

      Taisz et al.  has been published, so the citation should be updated. 

      We have updated the corresponding citation.  

      On line 233, the authors ascribe the small electron-dense vesicles as likely housing sNPF released by MGNs. However, Carlsson et al. (2010) demonstrated that sNPF is released by OSNs, which was further functionally characterized by Root et al. (2011) and Ko et al. (2014). In terms of MGNs that release neuropeptides, Carlsson et al. 2010 demonstrated that local interneurons immunolabel for tachykinin, myoinhibitory peptide, and allatostatin-A, while two extrinsic neurons release SIFamide. In theory, aminergic neurons could also have small electron-dense vesicles, but this can be variable. 

      The Reviewer is completely right in this criticism. The MGNs certainly include neurons that have been reported to contain neuropeptides other than sNPF. We have corrected this sentence, and it now reads as follows (page 7, line 236): “Interestingly, besides the abundant clear small vesicles..

      On line 636, the Berck and Schlegel studies demonstrated that panglomerular local interneurons synapse upon OSN, but not that they induce presynaptic inhibition (which was demonstrated in the studies cited in the next sentence). I recommend adjusting this sentence.

      We agree and have corrected the text following the Reviewer’s advice. It now reads as follows (page 19, line 663): “We also observed that OSNs received less MGN feedback.

    2. eLife Assessment

      This study seeks to determine how synaptic relationships between principal cell types in the olfactory system vary with glomerulus selectivity and is therefore valuable to the field. The methodology is solid, and with the caveat that there was a technical need to group all local interneurons, centrifugal neurons and multiglomerular projection neurons into one category ("multiglomerular neurons"), this work reveals some very interesting potential differences in circuit architecture associated with glomerular tuning breadth.

    3. Reviewer #1 (Public review):

      In this manuscript, Gruber et al. perform serial EM sections of the antennal lobe and reconstruct the neurites innervating two types of glomeruli: one that is narrowly tuned to geosmin and one that is broadly tuned to other odours. They quantify and describe various aspects of the innervations of olfactory sensory neurons (OSNs), uniglomerular projection neurons (uPNs), and the multiglomerular local interneurons (LNs) and PNs (mPNs). They find that narrowly tuned glomeruli had stronger connectivity from OSNs to PNs and LNs, and considerably more connections between sister OSNs and sister PNs than the broadly tuned glomeruli. They also had less connectivity with the contralateral glomeruli. These observations are suggestive of strong feed-forward information flow with minimal presynaptic inhibition in narrowly tuned glomeruli, which might be ecologically relevant, for example, when making quick decisions such as avoiding a geosmin-laden landing site. In contrast, information flow in more broadly tuned glomeruli shows much more lateralisation of connectivity to the contralateral glomerulus, as well as to other ipsilateral glomeruli.

      The data are well presented, the manuscript clearly written, and the results will be useful to the olfaction community. I had earlier suggested comparisons with other EM datasets that exist to investigate stereotypy, and am convinced by their efforts and reasons for which these were either not possible to do or not possible within the timeframe of a revision.

      Comments on revisions:

      Thank you for the careful responses to my suggestions. I hope that such approaches will be possible by others going forward.

    4. Reviewer #2 (Public review):

      The chemoreceptor proteins expressed by olfactory sensory neurons differ in their selectivity, such that glomeruli vary in the breadth of volatile chemicals to which they respond. Prior work assessing the relationship between tuning breadth and the demographics of principal neuron types that innervate a glomerulus demonstrated that narrowly tuned glomeruli are innervated by more projection neurons (output neurons) and fewer local interneurons relative to more broadly tuned glomeruli. The present study used high-resolution electron microscopy to determine which synaptic relationships between principal cell types also vary with glomerulus tuning breadth, using a narrowly tuned glomerulus (DA2) and a broadly tuned glomerulus (DL5). The strength of this study lies in the comprehensive, synapse-level resolution of the approach. Furthermore, the authors implement a very elegant approach of using a 2-photon microscope to score the upper and lower bounds of each glomerulus, thus defining the bounds of their restricted regions of interest. Using this approach, the authors identify several architectural motifs that differ between glomeruli with different tuning properties.

      In the revised version of this study the authors discuss several important limitations. There was a technical need to group all local interneurons, centrifugal neurons and multiglomerular projection neurons into one category ("multiglomerular neurons") which complicates interpretations as even multiglomerular projection neurons are very diverse. With only 2 narrowly tuned glomeruli and 1 broadly tuned glomerulus, architecture differences may reflect more than just differences in tuning breadth. Finally, the degree to which inter-animal variability may contribute to differences between glomeruli is discussed. If these caveats are kept in mind, this work reveals some very interesting potential differences in circuit architecture associated with glomerular tuning breadth.

      This work establishes specific hypotheses about network function within the olfactory system that can be pursued using targeted physiological approaches. It also identifies key traits that can be explored using other high resolution EM datasets and other glomeruli that vary in their tuning selectivity. Finally, the laser "branding" technique used in this study establishes a reduced cost procedure for obtaining smaller EM datasets from targeted volumes of interest by leveraging the ability to transgenically label brain regions in Drosophila.

      Comments on revisions:

      I appreciate the thoughtful responses that the authors made regarding the initial assessment of their study. The authors discuss these limitations in their manuscript which should not be viewed as criticisms, but rather caveats to be considered for this study specifically and in some instances, for all connectomics studies.

      I still believe there is a lost opportunity to make use of the FlyWire dataset to make specific strategic comparisons. I do not propose attempting to replicate the comprehensive nature of the main study, but querying cell type based on glomerular innervation would allow the authors to address consistency of observed differences between glomeruli as ORNs and uPNs have been thoroughly annotated and analysis can be limited by neuropil. I agree that it is unclear how many individuals would need to be examined to achieve sufficient statistical power, but some of the circuit motifs revealed in this study can be readily tested in the FlyWire dataset. For instance, the observation from this study that narrowly tuned ORNs receive less synaptic input from LNs is supported in FlyWire, with DL5 ORNs getting far more synaptic input from LNs relative to DA2 and VA1v. I'm not proposing repeating all of the analyses from this study, and there is no doubt that inter-animal variability and technical differences can explain different observations across datasets, but I believe these are considerations of which the readers (who can query these synaptic relationships in FlyWire) should be made aware.

    1. eLife Assessment

      The manuscript presents a valuable finding that CCDC32, beyond its reported role in AP2 assembly, follows AP2 to the plasma membrane and regulates clathrin-coated pit assembly and dynamics. The authors further identify an alpha-helical region within CCDC32 that is essential for its interaction with AP2 and its cellular function. While live-cell and ultrastructural imaging data are solid, future biochemical studies will be needed to confirm the proposed CCDC32-AP2 interaction.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Yang et al. describe CCDC32 as a new clathrin-mediated endocytosis (CME) accessory protein. The authors show that CCDC32 binds directly to AP2 via a small alpha-helical region and that cells depleted of this protein show defective CME. Finally, the authors show that the CCDC32 nonsense mutations found in patients with cardio-facio-neuro-developmental syndrome (CFNDS) disrupt the interaction of this protein with the AP2 complex. The results presented suggest that CCDC32 may act as both a chaperone (as recently published) and a structural component of the AP2 complex.

    3. Reviewer #2 (Public review):

      Summary:

      The authors responded to my previous concerns with additional arguments and discussion. While I do not object to the publication of this work, two critical experiments are still missing.

      Weaknesses:

      First, biochemical assays using recombinant proteins should be conducted to determine whether CCDC32 binds to the full AP2 adaptor or to specific AP2 intermediates, such as hemicomplexes. The current co-IP data from mammalian cell lysates are too complex to interpret conclusively. Second, cell fractionation should be performed to assess whether, and how, CCDC32 associates with membrane-bound AP2.

    4. Reviewer #3 (Public review):

      In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that CCDC32 has a role in the early stages of endocytosis, mainly through its interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live-cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits, and the invagination depth, in addition to transferrin receptor (TfnR) uptake experiments. Binding of CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull-down experiments.

      Together, these experiments define a phenotype for CCDC32 knockdown and CCDC32 mutants in endocytosis, a very robust system in which defects are not easily detected. A mutation of CCDC32 mimicking CFNDS mutations is also addressed in this study and shown to cause endocytic defects.

      An experimental proof for the resistance of the different CCDC32 mutants to siRNA treatment would have helped to strengthen the conclusions.

      In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin mediated endocytosis and its binding to AP2.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This is a revision of a manuscript previously submitted to Review Commons. The authors have partially addressed my comments, mainly by expanding the introduction and discussion sections. Sandy Schmid, a leading expert on the AP2 adaptor and CME, has been added as a co-corresponding author. The main message of the manuscript remains unchanged. Through overexpression of fluorescently tagged CCDC32, the authors propose that, in addition to its established role in AP2 assembly, CCDC32 also follows AP2 to the plasma membrane and regulates CCP maturation. The manuscript presents some interesting ideas, but there are still concerns regarding data inconsistencies and gaps in the evidence.

      With due respect, we would argue that a role for CCDC32 in AP2 assembly is hardly ‘established’. Rather, a single publication reporting its role as a co-chaperone for AAGAB appeared while our manuscript was under review. We find some similar and some conflicting results, which are described in our revised manuscript. However, in combination our two papers clearly show that CCDC32, a previously unrecognized endocytic accessory protein, deserves further study.

      (1) eGFP-CCDC32 was expressed at 5-10 times higher levels than endogenous CCDC32. This high expression can artificially drive CCDC32 to the cell surface via binding to the alpha appendage domain (AD)-an interaction that may not occur under physiological conditions.

      While we acknowledge that overexpression of eGFP-CCDC32 could result in artificially driving it to CCPs, we do not believe this is the case for the following reasons:

      i. The bulk of our studies (Figures 2-4) demonstrate the effects of siRNA knockdown of CCDC32 on early stages of CME, and so it is likely that these functions require the presence of endogenous CCDC32 at nascent CCPs, as detected with overexpressed eGFP-CCDC32 by TIRF imaging.

      ii. At these levels of overexpression, eGFP-CCDC32 fully rescues the effects of siRNA KD of endogenous CCDC32 on Tfn uptake and CCP dynamics (Figure 6F,G). If the protein were artificially recruited to the AP2 appendage domain, one would expect it to compete with the recruitment of other EAPs to CCPs and hence to cause defects in CCP dynamics. Instead, we see the opposite: CCPs that are positive for eGFP-CCDC32 show normal dynamics and maturation rates, while CCPs lacking eGFP-CCDC32 are short-lived and more likely to be aborted (Figure 1C).

      iii. We have identified two modes of binding of CCDC32 to AP2 adaptors: one is through canonical AP2-AD binding motifs, the second is through an α-helix in CCDC32 that, by modeling, docks only to the open conformation of AP2. Overexpressed CCDC32 lacking this α-helix is not recruited to CCPs (Fig. 6D,E), indicating that the canonical AP2 binding motifs are not sufficient to recruit CCDC32 to CCPs, even when overexpressed.

      (2) Which region of CCDC32 mediates alpha AD binding? Strangely, the only mutant tested in this work, Δ78-98, still binds AP2, but shifts to binding only mu and beta. If the authors claim that CCDC32 is recruited to mature AP2 via the alpha AD, then a mutant deficient in alpha AD binding should not bind AP2 at all. Such a mutant is critical for establish the model proposed in this work.

      We understand the reviewer’s confusion and thus devoted a paragraph in the Discussion to this issue. As revealed by AlphaFold 3.0 modeling (Figure S6), binding of CCDC32 to the alpha AD likely occurs via the two canonical AP2-AD binding motifs encoded in CCDC32. Given the highly divergent nature of AP2-AD binding motifs, we could not identify these motifs without the AlphaFold 3.0 modeling. While these interactions could be detected by GST pull-downs, they are apparently not of sufficient affinity to recruit CCDC32 to CCPs in cells. In the text, we now describe the α-helix we identified as being essential for CCP recruitment as ‘a’ AP2 binding site on CCDC32 rather than ‘the’ AP2 binding site. Interestingly, and as also discussed, AlphaFold 3.0 identifies a highly predicted docking site on α-adaptin that is only accessible in the open, cargo-bound conformation of intact AP2. This is also consistent with the inability of CCDC32(Δ78-99) to bind the α:µ2 hemi-complex in cell lysates.

      We agree that further structural studies on CCDC32’s interactions with AP2 and its targeting to CCPs will be of interest for future work.

      (3) The concept of hemicomplexes is introduced abruptly. What is the evidence that such hemicomplexes exist? If CCDC32 binds to hemicomplexes, this must occur in the cytosol, as only mature AP2 tetramers are recruited to the plasma membrane. The authors state that CCDC32 binds the AD of alpha but not beta, so how can the Δ78-98 mutant bind mu and beta?

      We introduced the concept of hemicomplexes based on our unexpected (and now explicitly stated as such) finding that the CCDC32(Δ78-99) mutant efficiently co-IPs with a β2:µ2 hemicomplex. As stated, the efficiency of this pulldown suggests that the presumed stable AP2 heterotetramer must indeed exist in equilibrium between the two α:σ2 and β2:µ2 hemicomplexes, such that CCDC32(Δ78-99) can sequester and efficiently co-IP with the β2:µ2 hemicomplex. A previous study, now cited, had shown that the β2:µ2 hemicomplex could partially rescue null mutations of α in C. elegans (PMID: 23482940). We do not know how CCDC32 binds to the β2:µ2 hemicomplex, and we did not detect these interactions using AlphaFold 3.0. However, these interactions could be indirect and involve the AAGAB chaperone. It is also likely, based on the results of Wan et al. (PMID: 39145939), that the binding is through the µ2 subunit rather than β2. As mentioned above, and in our Discussion, further studies are needed to define the complex and multi-faceted nature of CCDC32-AP2 interactions.

      (4) The reported ability of CCDC32 to pull down AP2 beta is puzzling. Beta is not found in the CCDC32 interactome in two independent studies using 293 and HCT116 cells (BioPlex). In addition, clathrin is also absent in the interactome of CCDC32, which is difficult to reconcile with a proposed role in CCPs. Can the authors detect CCDC32 binding to clathrin?

      Based on the studies of Wan et al. (PMID: 39145939), it is likely that CCDC32 binds to µ2 rather than to β2 in the β2:µ2 hemicomplex. As for clathrin being absent from the CCDC32 pull-down, this is expected: the interactions of clathrin, even with AP2, are weak in solution (as shown in Figure 5C, clathrin is not detected in our AP2 pull-down), which prevents spontaneous assembly of clathrin coats in the cytosol. Rather, these interactions are strengthened both by the reduction in dimensionality that occurs on the membrane and by the avidity of multivalent interactions. For example, Kirchhausen reported that two AP2 complexes are required to recruit one clathrin triskelion to the PM.

      (5) Figure 5B appears unusual; is this a chimera?

      Figure 5B shows an internal insertion of the eGFP tag into an unstructured region in the AP2 hinge. As we have previously shown (PMID: 32657003), this construct, unlike other commonly used AP2 tags, is fully functional. We have rearranged the text in the figure legend to make this clearer.

      Figure 5C likely reflects a mixture of immature and mature AP2 adaptor complexes.

      This is possible, but mature heterotetramers are by far the dominant species; otherwise, the four subunits would not be immunoprecipitated at near-stoichiometric levels with the α subunit. Near-stoichiometric IPs with antibodies to the α-AD have been shown by many others in many cell types.

      (6) CCDC32 is reduced by about half in siRNA knockdown. Why not use CRISPR to completely eliminate CCDC32 expression?

      Fortuitously, partial knockdown was essential to reveal this second function of CCDC32, as we emphasize in our Discussion. Wan et al. used CRISPR to knock out CCDC32 and revealed its essential role as an AAGAB co-chaperone. In the complete absence of CCDC32, mature AP2 complexes fail to form. However, under our conditions of partial CCDC32 depletion, the expression of AP2 heterotetramers is unaffected, revealing a second function of CCDC32 at early stages of CME. We expect that the co-chaperone function of CCDC32 is catalytic, while its role in CME is more structural; hence the different concentration dependencies, the former being less sensitive to KD than the latter. This is one reason that many researchers are turning to CRISPRi for whole-genome perturbation studies, as many proteins play multiple roles that can be masked in KO studies.

      Reviewer #2 (Public review):

      Yang et al. describe CCDC32 as a new clathrin-mediated endocytosis (CME) accessory protein. The authors show that CCDC32 binds directly to AP2 via a small alpha-helical region and that cells depleted of this protein show defective CME. Finally, the authors show that the CCDC32 nonsense mutations found in patients with cardio-facio-neuro-developmental syndrome (CFNDS) disrupt the interaction of this protein with the AP2 complex. The results presented suggest that CCDC32 may act as both a chaperone (as recently published) and a structural component of the AP2 complex.

      Strengths:

      The conclusions presented are generally well supported by experimental data and the authors carefully point out the differences between their results and the results by Wan et al. (PNAS 2024).

      Weaknesses:

      The experiments regarding the role of CCDC32 in CFNDS still require some clarification for scientists working on this disease. The authors fail to mention that the CCDC32 isoform used in their studies differs from the one used when the CFNDS patient mutations were described, which may create some confusion. Also, the authors did not discuss that the frameshift mutations in patients may lead to nonsense-mediated decay.

      As requested we have more clearly described our construct with regard to the human mutations and added the possibility of NMD in the context of the human mutations.

      Reviewer #3 (Public review):

      In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that CCDC32 has a role in the early stages of endocytosis, mainly through its interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live-cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits, and the invagination depth, in addition to performing transferrin receptor (TfnR) uptake experiments. Binding of CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull-down experiments. While the interaction between CCDC32 and the alpha appendage domain of AP2 is clearly described, a discussion of potential associations with other AP2 domains would help in understanding the impact of CCDC32 on endocytosis.

      The reviewer is correct. That CCDC32 also interacts with other subunits of AP2 is evident from the findings of Wan et al. and from the fact that the CCDC32(Δ78-99) mutant efficiently co-IPs with the β2:µ2 hemicomplex. We have expanded our discussion of this point. CCDC32 remains a poorly characterized EAP, but one that we now believe is very interesting and worth further study.

      Together, these experiments allow the derivation of a phenotype for CCDC32 knockdown and CCDC32 mutants in endocytosis, a very robust system in which defects are not easily detected. A mutation of CCDC32 mimicking CFNDS mutations is also addressed in this study and shown to cause endocytic defects.

      In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin mediated endocytosis and its binding to AP2.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The authors must be clear about the differences between the CCDC32 isoform they used in their manuscript and the one used to describe the patient mutations. This could be done, for example, in the Methods. This is essential for other labs to be able to reproduce, follow up on and correctly cite these results.

      We have added this information to the Methods. 

      (2) I believe the authors have misunderstood what nonsense-mediated decay is. NMD occurs at the mRNA level and requires a full genomic context (introns and exons). The fact that a mutant protein is expressed normally from a construct by no means proves that NMD does not occur. I believe that adding the possibility of NMD would enrich the discussion.

      Thank you; we have now done more homework and have added this possibility to our discussion of the mutant phenotype. However, if a robust NMD mechanism resulted in a complete loss of CCDC32 protein, then the essential co-chaperone function reported by Wan et al. would result in a complete loss of AP2. A more detailed characterization of the cellular phenotype of these mutations, including assessment of AP2 expression levels, would be informative.

      Reviewer #3 (Recommendations for the authors):

      - It is not clear what the authors mean by '~30s lifetime cohort' (line 159). They refer to Figure 2H, which shows the % of CCPs. Can the authors explain exactly what kind of tracks they used for this analysis, for example which lifetime variations were accepted? Do they refer to the cohorts in Figure S4? In Figure S4, the most frequent tracks have lifetimes < 20 s (in contrast to what is stated in the main text). Why was this cohort not used?

      The ‘30s cohort’ refers to CCPs with lifetimes between 25 and 35 s, which encompasses the most abundant species in both control and CCDC32 KD cells, as shown by the probability curves in Figure 2H. Given the large number of CCPs analyzed, this cohort still provides large numbers for our analyses (n = 5998 and 4418 for control and siRNA-treated conditions, respectively). Figure 2H shows that the lifetime distribution of CCPs in cells treated with CCDC32 siRNA is shifted to shorter lifetimes. We have clarified this in the text.
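      A minimal sketch of how such a lifetime cohort could be selected from tracked CCP lifetimes is shown below, assuming lifetimes are available as a plain list of seconds per condition. The function names, the inclusive 25-35 s bounds and the 5 s histogram bins are illustrative assumptions rather than the authors' tracking pipeline, and the lifetimes are made-up values.

```python
from collections import Counter


def lifetime_cohort(lifetimes_s, lo=25.0, hi=35.0):
    """Keep tracks whose lifetime (seconds) lies in the inclusive [lo, hi] window."""
    return [t for t in lifetimes_s if lo <= t <= hi]


def lifetime_distribution(lifetimes_s, bin_width=5.0):
    """Fraction of all tracks in each lifetime bin; bin b covers [b, b + bin_width) s."""
    bins = Counter(bin_width * (t // bin_width) for t in lifetimes_s)
    total = len(lifetimes_s)
    return {b: n / total for b, n in sorted(bins.items())}


# Hypothetical track lifetimes (s), not data from the study.
control = [8, 12, 26, 28, 31, 34, 36, 40]
cohort = lifetime_cohort(control)
print(len(cohort), cohort)  # 4 [26, 28, 31, 34]
print(lifetime_distribution(control))
```

      Comparing `lifetime_distribution` between control and knockdown tracks is one simple way to visualize the shift toward shorter lifetimes described in the response.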

      - Figure S1: It is now clear why the mutant versions of CCDC32 are not detected in this western blot. However, data showing the resistance of these proteins to siCCDC32 are still missing (S1A is in the absence of siCCDC32, I assume, as the legend suggests). A western blot using an anti-GFP antibody, such as the one used in Figure S1, after siRNA knockdown would provide clarity.

      That these constructs all contain the same mutation in the siRNA target sequence gives us confidence that they are indeed resistant to siRNA.

      - 'Note that the anti-CCDC32 antibody does not detect the eGFP-CCDC32(∆78-98) as well as full-length and is unable to detect eGFP-CCDC32(1-54).' This phrase belongs to Figure S1 (B), not (A).

      Corrected.

      - The immunoprecipitations of CCDC32 and its mutants with AP2 and its subunits are somewhat confusing. In Figure 5, the authors show that CCDC32 interacts specifically with the alpha-AD, but not with the beta-AD of AP2. In Figure 6B and C, on the other hand, co-IPs are also shown with the beta and mu subunits of AP2. This is understandable in the context of the full AP2. However, when the interaction with the alpha subunit (and sigma) is abolished through mutation of helix 78-98, why would beta and mu still interact, given that the beta-AD cannot interact with CCDC32 on its own? Are interaction sites expected outside the ADs in the beta or mu subunits?

      See responses to Reviewer 1 above. This result likely reflects the co-chaperone activity of CCDC32 reported by Wan et al. and is likely due to the interaction they reported between CCDC32 and the µ2 subunit of β2:µ2 hemicomplexes.

      - Figure S6 D, E and F: How much confidence do the authors have on the AlphaFold predictions? Have the same binding poses been obtained repeatedly by independent predictions?

      We provide, with a color scale, the confidence score for each interaction, which is very high (>90%). Of course, this is still a prediction that will need to be verified by further structural studies as we have stated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Cook et al. have presented an important study on the transcriptomic and epigenomic signatures underlying craniofacial development in marsupials. Given the lack of a dunnart genome, the authors also generated long- and short-read sequence datasets to assemble and annotate a novel genome, allowing them to map RNA-seq data and ChIP-seq data for H3K4me3 and H3K27ac and thereby identify putative promoter and enhancer sites in the dunnart. They found that genes proximal to these regulatory loci were enriched for functions related to bone, skin, muscle and embryonic development, highlighting the precocious state of newborn dunnart facial tissue. When compared with mouse, the authors found a much higher proportion of promoter regions than enhancer regions aligned between species, and subsequent profiling identified regulatory elements that are conserved across species and important for mammalian craniofacial development. In contrast, the identification of dunnart-specific enhancers and patterns of RNA expression further confirms the precocious state of muscle development, as well as of sensory system development, in the dunnart, suggesting that early formation of these features is critical for neonatal marsupials, likely to assist with detecting and responding to cues that direct the joeys to the mother's teat after birth. This is one of the few epigenomic studies performed in marsupials (of any organ) and the first performed in the fat-tailed dunnart (also of any organ). Marsupials are emerging as an important model for studying mammalian development and evolution, and the authors have performed a novel and thorough analysis, impressively including the assembly of a new marsupial reference genome that will benefit many future studies.

      Strengths:

      The study provides multiple pieces of evidence supporting the important role that enhancer elements play in mammalian phenotypic evolution, namely the finding of a lower proportion of peaks present in both dunnart and mouse for enhancers than for promoters, and the finding that the dunnart shows more genes uniquely associated with its active enhancers than any other combination of mouse and dunnart samples, a pattern that was less pronounced for promoter-associated genes. In addition, rigorous parameters were used for the cross-species analyses to identify the conserved regulatory elements and the dunnart-specific enhancers. Regarding the results presented in Figure 1, I agree that it is a little surprising that the average promoter-TSS distance is greater than that for enhancers, and that this could be related to the possible presence of unannotated transcripts between genes. The authors addressed this well by examining the distribution of promoter-TSS distances and using proximal promoters (cluster #1) as high-confidence promoters for downstream analyses.

      The genome assembly method was thorough, using two different long-read methods (PacBio and ONT) to generate the reads for contig and scaffold construction, increasing the quality of the final assembled genome.

      Weaknesses:

      Biological replicates of facial tissue were collected at a single developmental time point of the fat-tailed dunnart within the first postnatal day (P0) and analysed in the context of similar mouse facial samples from the ENCODE consortium at six developmental time points; previous work from the authors has shown that the younger mouse samples (E11.5-12.5) approximately correspond to the dunnart developmental stage (Cook et al. 2021). However, it would be useful to have samples from at least one older dunnart time point, for example, at a developmental stage equivalent to mouse E15.5. This would provide additional insight into the extent of accelerated face development in the dunnart relative to mouse, i.e., how long do the regulatory elements activated early in the dunnart remain active, and does their function later influence other aspects of craniofacial development?

      We thank the reviewer for their feedback and agree that the inclusion of multiple postnatal stages in the dunnart would give further valuable insights to the comparative analyses. Unfortunately, we were limited by the pouch young available and prioritized ensuring robust data at a single stage for this study. We hope to expand this work to more stages in future studies.

      The authors refer to the development of the CNS being delayed in marsupials relative to placental mammals; however, evidence shows that development of the dunnart brain (whole brain or cortex) is protracted compared to mouse, by a factor of at least two, rather than delayed per se (Workman et al. 2013; Paolino et al. 2023). In addition, there is evidence that cortical formation and cell birth may begin at approximately the same stage across species, equivalent to the neonatal period in the dunnart (E10.5 in mouse), and that shortly afterwards, at the stage equivalent to mouse E12.5, the dunnart cortex shows signs of advanced neurogenesis followed by a protracted phase of neuronal maturation (Paolino et al. 2023). Therefore, marsupial CNS development may only appear delayed relative to mouse, when it instead begins at the same stage and then proceeds on a different timescale.

      The comparison here is not directly between CNS development in placentals and marsupials but between CNS development and the development of a subset of structures of the cranial skeleton and musculature (as first proposed by Kathleen Smith 1997). For example, Smith 1997 found that in eutherians, evagination of the telencephalon and appearance of pigment in the eye occur before the ossification of the premaxilla, maxilla, and dentary. However, in marsupials, evagination of the telencephalon and appearance of pigment in the eye occur concurrently with condensation of cartilage in the basicranium and the ossification of the premaxilla, maxilla, and dentary. Smith 1997 reports both a delay in the initiation of CNS development in marsupials relative to craniofacial ossification and a protraction of CNS development compared to placental mammals.

      This also highlights the challenges of correlating different staging systems between placentals and marsupials, as the stages determined to be equivalent can change depending on which developmental events are used. The protracted development of the CNS in marsupials (Smith 1997; Workman et al. 2013; Paolino et al. 2023) still supports the hypothesis that, during the short gestation period in marsupials, structures required for life outside the womb in an embryonic-like state, such as the orofacial region, are likely prioritized.

      We have clarified this based on the reviewer's feedback and added text referring to the protraction of marsupial CNS development to the Discussion section.

      [New text]: Marsupials display advanced development of the orofacial region relative to development of the central nervous system when compared to placental mammals[3,6].

      [New text]: Although development of the central nervous system is protracted in marsupials compared to placentals, marsupials have well-developed peripheral motor and sensory nerves (e.g., the trigeminal) at birth [5].

      Reviewer #2 (Public review):

      This study by Cook and colleagues utilizes genomic techniques to examine gene regulation in the craniofacial region of the fat-tailed dunnart at perinatal stages. Their goal is to understand how accelerated craniofacial development is achieved in marsupials compared to placental mammals.

      The authors employ state-of-the-art genomic techniques, including ChIP-seq, transcriptomics, and high-quality genome assembly, to explore how accelerated craniofacial development is achieved in marsupials compared to placental mammals. This work addresses an important biological question and contributes a valuable dataset to the field of comparative developmental biology. The study represents a commendable effort to expand our understanding of marsupial development, a group often underrepresented in genomic studies.

      The dunnart's unique biology, characterized by a short gestation and rapid craniofacial development, provides a powerful model for examining developmental timing and gene regulation. The authors successfully identified putative regulatory elements in dunnart facial tissue and linked them to genes involved in key developmental processes such as muscle, skin, bone, and blood formation. Comparative analyses between dunnart and mouse chromatin landscapes suggest intriguing differences in the deployment of regulatory elements and in gene expression patterns.

      Strengths

      (1) The authors employ a broad range of cutting-edge genomic tools to tackle a challenging model organism. The data generated - particularly ChIP-seq and RNA-seq from craniofacial tissue - are a valuable resource for the community, which can be employed for comparative studies. The use of multiple histone marks in the ChIP-seq experiments also adds to the utility of the datasets.

      (2) Marsupials occupy an important phylogenetic position, but they remain an understudied group. By focusing on the dunnart, this study addresses a significant gap in our understanding of mammalian development and evolution. Obtaining enough biological specimens for these experiments was likely a big challenge that the authors were able to overcome.

      (3) The comparison of enhancer landscapes and transcriptomes between dunnarts and mice can serve as the basis for subsequent studies that will examine the mechanisms of developmental timing shifts. The authors also carried out liftover analyses to identify orthologous enhancers and promoters in mice and dunnart.

      Weaknesses and Recommendations

      (1) The absence of genome browser tracks for the ChIP-seq data makes it difficult to assess the quality of the datasets, including peak resolution and signal-to-noise ratios. Including browser tracks would significantly strengthen the paper by providing further support for adequate data quality.

      We have put together an IGV session with the dunnart genome, annotation and ChIP-seq tracks. This is now available in the FigShare data repository (10.7554/eLife.103592.1).

      (2) The first two figures of the paper rely heavily on gene orthology analysis, motif enrichment, etc., to describe the genomic data generated from the dunnart. The main point of these figures is to demonstrate that the authors are capturing the epigenetic signature of the craniofacial region, but this is not clearly supported in the results. The manuscript should directly state what these analyses aim to accomplish, and provide statistical tests that strengthen confidence in the quality of the datasets.

      As this is the first epigenomic profiling study for this species, we performed extensive data quality control (see Supplementary Tables 2-3, 18, 20-23 and Supplementary Figures 1-3, 6-11). These figures and the corresponding Supplementary Tables show the robustness of the data, including well-described metrics for assessing promoters and enhancers, GO terms relevant to craniofacial development, and binding motifs for key developmental TF families.

      We have emphasised this aspect of the work more strongly in the results section, particularly in [Defining craniofacial putative enhancer- and promoter regions in the dunnart].

      (3) The observation that "promoters are located on average 106 kb from the nearest TSS" raises significant concerns about the quality of the ChIP-seq data and/or genome annotation. The results and supplemental information suggest a combination of factors, including unannotated transcripts and enhancer-associated H3K4me3 peaks, but this issue is not fully resolved in the manuscript. The authors should confirm that this is not caused by spurious peaks in the ChIP-seq analysis, and possibly improve the genome annotation with the transcriptomic datasets presented in the study.

      Spurious ChIP-seq peaks are possible, as there is no “blacklisted regions” database for the dunnart to filter on; however, we used a no-IP control and a stringent FDR of 0.01, and peaks had to be reproducible in two biological replicates, all of which should reduce the likelihood of false positives.

      H3K4me3 activity at enhancers is well established, in particular when enhancer sequences are also bound by RNA Pol II (Koch and Andrau, 2011; Pekowska et al., 2011). However, compared to H3K4me3 activity at promoters, H3K4me3 levels at enhancers are low (Calo and Wysocka, 2013). This is in line with our observation that H3K4me3 levels at enhancers are much lower than at promoter regions (see Supplementary Note 2). We found that H3K4me3 peaks located closer to the TSS had a stronger peak signal (mean = 46.10) than distal H3K4me3 peaks (mean = 6.95; Wilcoxon FDR-adjusted p < 2.2 × 10⁻¹⁶). This suggests that although some distal promoter peaks may be due to gaps in the annotation, the majority likely represent peaks associated with enhancer regions. We have emphasized this finding more strongly in the Results section:

      [New text]: H3K4me3 activity at enhancers is well established[25,26]; however, compared to H3K4me3 activity at promoters, H3K4me3 levels at enhancers are low[27]. This is in line with our observations, where H3K4me3 levels at distal enhancer peaks are nearly 7 times lower than those observed at promoter regions (see SupNote2).
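      The proximal versus distal comparison above is a two-sample rank-sum test. The sketch below implements the Wilcoxon rank-sum (Mann-Whitney U) test with the usual normal approximation (tie and continuity corrections), which is reasonable for thousands of peaks; exact small-sample p-values and the FDR adjustment across comparisons are not shown. The function names and signal values are hypothetical and are not the authors' code or data.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation. Returns (U for x, p)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction for the variance of U
    ties = sum(t**3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    # Continuity correction, clamped so p never exceeds 1
    z = max(abs(u - n1 * n2 / 2) - 0.5, 0.0) / sigma
    return u, math.erfc(z / math.sqrt(2))


# Hypothetical peak signals, for illustration only.
proximal = [40.2, 51.0, 47.3, 39.8, 55.1]
distal = [6.1, 8.4, 5.9, 7.7, 12.0]
print(rank_sum_test(proximal, distal))
```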

      (4) The comparison of gene regulation between a single dunnart stage (P1) and multiple mouse stages lacks proper benchmarking. Morphological and gene expression comparisons should be integrated to identify equivalent developmental stages. This "alignment" is essential for interpreting observed differences as true heterochrony rather than intrinsic regulatory differences.

      Given the developmental differences between eutherian and marsupial mammals, it is challenging to assign the dunnart a precise “equivalent” developmental stage in the mouse. From our morphological and developmental characterisation based on ossification patterns (see Cook et al. 2020 Nat Comms Bio), the dunnart orofacial region on the day of birth appears to be similar to that of an E12.5 mouse embryo (just prior to the appearance of ossified craniofacial bones). However, when we compared both regulatory elements and expressed genes between the dunnart at this stage (P1) and 5 developmental stages in the mouse, there was no obvious equivalent stage. For example, when we simply compare genes linked to enhancer peaks, the largest intersection between dunnart and any mouse stage comprises ~500 genes that are present in the dunnart and in mouse stages E10.5 and E12.5-E15.5 (Figure 5B). When we then compare genes expressed in the dunnart to temporal gene expression dynamics during mouse development, we find that the largest overlap is with genes highly expressed at E14.5 or E15.5 in the mouse (Figure 6, Supplementary Figure 5). We have strengthened the rationale for the selected mouse stages in the comparative analyses section of the Results.

      (5) The low conservation of putative enhancers between mouse and dunnart (0.74-6.77%) is surprising given previous reports of higher tissue-specific enhancer conservation across mammals. The authors should address whether this low conservation reflects genuine biological divergence or methodological artifacts (e.g., peak-calling parameters or genome quality). Comparisons with published studies could contextualize these findings.

      The reported range (0.74-6.77%) refers to the number of regions called as an active enhancer peak in both species (conserved activity) divided by the total number of dunnart peaks alignable to the mouse genome, which we expect to be low given sequence turnover rates and the evolutionary distance separating dunnarts and mice. The alignability (conserved sequence) of dunnart enhancers to the mouse genome was ~13% for 100 bp regions and can be found in Supplementary Table 22; we have now clarified this in the main text.

      [New Text]: After building dunnart-mm10 liftover chains (see Methods and SupNote5), we compared mouse and dunnart regulatory elements. The alignability (conserved sequence) of dunnart enhancers to the mouse genome was ~13% for 100 bp regions (Supplementary Table 22).

      The activity conservation range reported here is consistent with that previously reported for marsupial-placental enhancer comparisons (Villar et al. 2015), where ~1% of conserved liver-specific human enhancers had conserved activity in opossum. Follow-up work by Berthelot et al. 2018 also found that approximately 1% of human liver enhancers were conserved across the placental mammals included in the study.
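      The two percentages distinguished in this response (sequence alignability versus conserved activity among alignable peaks) can be made concrete with a short sketch. It assumes dunnart peaks have already been lifted over to mm10, with `None` marking regions that fail to lift over, and it treats any interval overlap with a mouse peak of the same mark as conserved activity; the helper names and coordinates are hypothetical, not the authors' liftover pipeline.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) a and b overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def conservation_summary(lifted_peaks, mouse_peaks):
    """Alignability and activity conservation for lifted-over dunnart peaks.

    lifted_peaks: dunnart peaks in mm10 coordinates as (chrom, start, end),
                  or None where the region did not lift over (unalignable).
    mouse_peaks:  mouse peaks for the same histone mark, as (chrom, start, end).
    """
    alignable = [p for p in lifted_peaks if p is not None]
    conserved = [p for p in alignable if any(overlaps(p, m) for m in mouse_peaks)]
    return {
        # share of all dunnart peaks with conserved sequence in mouse
        "alignability": len(alignable) / len(lifted_peaks),
        # share of alignable peaks that are also active in mouse
        "activity_conservation": len(conserved) / len(alignable) if alignable else 0.0,
    }


# Hypothetical coordinates, for illustration only.
lifted = [("chr1", 100, 200), None, None, ("chr2", 50, 150), ("chr1", 1000, 1100)]
mouse = [("chr1", 150, 400), ("chr3", 0, 100)]
print(conservation_summary(lifted, mouse))
```

      Because the denominator of the second ratio is only the alignable subset, a low activity-conservation percentage is not by itself evidence of poor alignment; the two numbers answer different questions.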

      (6) Focusing only on genes associated with shared enhancers excludes potentially relevant genes without clear regulatory conservation. A broader analysis incorporating all orthologous genes may reveal additional insights into craniofacial heterochrony.

      We appreciate the reviewer's comment. We understand that a broader analysis may provide some additional insights into this question; however, in this study our focus was on understanding the enhancers driving craniofacial development in these species. We linked enhancers with gene expression data as additional evidence of regulatory programs involved in craniofacial development. The majority (~70%) of reproducibly expressed genes were linked to an active enhancer and/or promoter. This is now highlighted in the Results section.

      [New Text]: There were 12,153 genes reproducibly expressed at a level > 1 TPM across three biological replicates, with the majority of expressed genes (67%; 8158/12153) located near an active enhancer and/or promoter peak.
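      The expression filter and the gene-to-peak overlap described above reduce to a threshold check and a set intersection. The sketch assumes per-gene TPM values for each replicate and a precomputed set of genes assigned to enhancer or promoter peaks; the function names, gene names and values are placeholders. Applied to the reported counts, the same ratio is 8158/12153 ≈ 0.67.

```python
def reproducibly_expressed(tpm_by_gene, threshold=1.0):
    """Genes with TPM above `threshold` in every biological replicate."""
    return {gene for gene, reps in tpm_by_gene.items() if all(v > threshold for v in reps)}


def fraction_near_peaks(expressed, peak_genes):
    """Share of expressed genes that are assigned to at least one peak."""
    return len(expressed & peak_genes) / len(expressed) if expressed else 0.0


# Placeholder TPM values for three replicates, not data from the study.
tpm = {
    "geneA": [5.2, 4.8, 6.1],
    "geneB": [0.9, 1.5, 2.0],   # fails in one replicate, so excluded
    "geneC": [12.0, 15.0, 9.0],
    "geneD": [3.0, 2.5, 4.0],
}
peak_genes = {"geneA", "geneC", "geneE"}
expressed = reproducibly_expressed(tpm)
print(sorted(expressed), fraction_near_peaks(expressed, peak_genes))
```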

      In conclusion, this study provides an important dataset for understanding marsupial craniofacial development and highlights the potential of genomic approaches in non-traditional model organisms. However, methodological limitations, including incomplete genome annotation and a lack of developmental benchmarking, weaken the robustness of the findings. Addressing these issues would significantly enhance the study's utility to the field and its ability to support the central conclusion that dunnart-specific enhancers drive accelerated craniofacial development.

      Reviewer #1 (Recommendations for the authors):

      Minor comments and corrections:

      (1) ChIP-seq FRiP fractions were much higher in dunnart samples than in mouse. Is this related to any differences in sample preparation in the ENCODE mouse datasets that the authors are aware of, such as different anti-histone antibodies (and therefore different binding efficiencies to the same histone marks across species)? The authors appear to have addressed something similar with respect to the much lower number of enriched peaks observed in the mouse samples relative to dunnart in Supp note 4. I suspect the "technical confounder" they refer to there is affecting both the FRiP scores and the higher correlation coefficients between IP and input in mouse.

      We chose the same antibodies used in the mouse craniofacial tissue ENCODE experiments; however, the procedure is slightly different. We used the MAGnify Chromatin Immunoprecipitation System, while the ENCODE assays performed by Bing Ren’s group in 2012 used an in-house MicroChIP protocol. Given that the mouse and dunnart samples were not processed together, by the same researcher, or with the same protocol, any number of technical confounders could be affecting enrichment. A low FRiP score indicates low specificity, as the majority of reads fall in non-specific regions (low enrichment), consistent with the higher correlation between IP and input in mouse. Data quality also appears to vary between H3K27ac and H3K4me3 in the mouse (Supplementary Table 21), with H3K4me3 FRiP scores more similar to those observed in our dunnart experiments. This suggests a potential confounder specific to the mouse H3K27ac IP. QC metrics (FRiP, BAM correlation) are consistent between the H3K27ac and H3K4me3 IPs in our experiments (Supplementary Table 20).
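      For readers less familiar with the metric, FRiP is the fraction of all mapped reads that overlap at least one called peak, so a low value means most reads fall outside enriched regions. The sketch below computes it from in-memory read and peak coordinates (half-open intervals per chromosome); real pipelines compute the same quantity from BAM and BED files, and the function names and coordinates here are hypothetical.

```python
from bisect import bisect_left


def merge_peaks(peaks):
    """Merge overlapping or touching half-open [start, end) peaks on one chromosome."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(reads_by_chrom, peaks_by_chrom):
    """Fraction of reads overlapping at least one peak (FRiP)."""
    total = in_peaks = 0
    for chrom, reads in reads_by_chrom.items():
        merged = merge_peaks(peaks_by_chrom.get(chrom, []))
        starts = [s for s, _ in merged]
        for r_start, r_end in reads:
            total += 1
            # Last merged peak that starts before the read ends; merged peaks are
            # disjoint and sorted, so it is the only candidate for an overlap.
            i = bisect_left(starts, r_end) - 1
            if i >= 0 and r_start < merged[i][1]:
                in_peaks += 1
    return in_peaks / total if total else 0.0


# Hypothetical coordinates, for illustration only.
peaks = {"chr1": [(100, 200), (150, 300), (500, 600)]}
reads = {"chr1": [(90, 110), (300, 350), (550, 560), (700, 750)], "chr2": [(10, 20)]}
print(frip(reads, peaks))  # 0.4
```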

      (2) Some of the promoter peak numbers in Supp table 1 do not match the numbers in the main text.

      We have corrected the incorrect number reported in the text for promoter peaks with orthologous genes (8590 -> 8597).

      (3) In Supp tables 2 and 3, the number of GO terms similar across tables is 466, which is ~42% of total number of enriched GO terms. However the authors mention that only 23% of terms were the same between promoters and enhancers, and a value of 42% was applied to the proportion of terms uniquely enriched for terms associated with genes assigned to promoters only. Unless I'm reading these Supp tables incorrectly, is it possible the proportions were mixed up?

      Thanks for catching this. The lists provided in Supplementary Table 2 were incorrect. The Supplementary Tables and in-text description have been corrected to reflect this.

      (4) It would be helpful to add a legend for the mouse samples in Supp Figure 10.

      We have added the labels to the plot.

      (5) In Supp note 5, regarding the percentage of alignable peaks recovered, the percentages mentioned for the 50bp and 500bp peak summit lengths for enhancers and promoters do not seem to match the values in Supp tables 22 and 23.

      Thank you for catching this; we have corrected the Supplementary Tables and the in-text values.

      (6) Please provide additional information explaining how dunnart RNA expression was associated with the five temporal expression clusters found in the mouse data shown in Figure 6. There is only one dunnart time point, so the species' temporal patterns could not be compared. In particular, how was the odds ratio calculated, and was it applied iteratively for dunnart against each mouse age and within each temporal cluster?

      The TCseq package takes the mouse expression data across all 6 stages and calls differentially expressed genes as those with an absolute log2 fold-change > 2 relative to the starting time point (E10.5). The mouse gene expression patterns were grouped into 5 clusters, each with a distinct temporal expression pattern (see Supplementary Figure 5D). The output is 5 lists, each containing unique genes that share a temporal pattern. Each of these lists of mouse genes was then compared to the orthologous genes expressed in the dunnart using Fisher's exact test, with correction for multiple testing using the Holm method. We have added additional details to the methods:

      [New text]: Orthologous genes reproducibly expressed >1 TPM in the dunnart were compared to the list of genes for each cluster using Fisher’s Exact Test followed by p-value corrections for multiple testing with the Holm method.
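      The per-cluster test and correction described above can be sketched in Python. This is illustrative only and is not the authors' code, which is not shown here. Several details are assumptions because the methods text leaves them open: the 2x2 table layout (rows: in this mouse cluster or not; columns: expressed >1 TPM in dunnart or not), the gene background used for the "not in cluster" row, and a one-sided test for enrichment. Both function names are hypothetical. In R, the same quantities would usually come from `fisher.test(..., alternative = "greater")` and `p.adjust(..., method = "holm")`.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment on the 2x2 table
        [[a, b],
         [c, d]]
    e.g. rows = in this mouse cluster / not in it (assumed background),
         columns = expressed in dunnart (>1 TPM) / not expressed.
    Returns (odds_ratio, p_value)."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    # With margins fixed, a is hypergeometric under the null; sum P(X >= a).
    numerator = sum(
        comb(row1, x) * comb(row2, col1 - x)  # comb returns 0 if k > n
        for x in range(a, min(row1, col1) + 1)
    )
    p_value = numerator / comb(total, col1)
    # Sample odds ratio; reported as infinite when b * c is zero.
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)
    return odds_ratio, p_value


def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the p-value at this 0-based rank by (m - rank), cap at 1,
        # and carry the running maximum so adjusted values never decrease.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted
```

      Usage: build one table per temporal cluster, call `fisher_exact_greater` on each, and pass the five p-values together to `holm_adjust`. The correction is applied across clusters, because only a single dunnart time point is available.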

      (7) SupFile1 and SupFile2 - which supplementary note or figure are these referring to?

      Apologies for this error. These items were meant to link to the FigShare repository where the supplementary files can be found. We have corrected this using the DOI for the repository.

      Reviewer #2 (Recommendations for the authors):

      (1) Authors should clarify that the mouse ENCODE data used for the comparisons was obtained from craniofacial tissue.

      This has now been corrected to clarify that the mouse ENCODE data used were from craniofacial tissues. Accession numbers and details of the ENCODE mouse embryonic facial prominence ChIP-seq and gene expression quantification files used in this study can be found in Supplementary Table 17.

      (2) Given the large differences in TPM for highly expressed genes shown in Figure 5, a MA or volcano plot would provide a more comprehensive view of global transcriptome differences between species.

      We have added this plot as Supplementary Figure 13.

      (3) It is unclear whether the enrichment analysis was performed for mouse genes, dunnart genes, or both.

      In reference to Figure 5, Gene Ontology enrichment analysis was performed on the top 500 highly expressed genes in dunnart. Because there is not an ontology database for dunnart gene IDs, these top 500 dunnart gene IDs were converted to the orthologous gene ID in mouse before performing the enrichment analysis. We apologise for the lack of clarity and have added additional text in the results section to make this clearer. In addition, the relevant methods section now reads:

      [New text]: As there is no equivalent gene ontology database for dunnart, we converted the Tasmanian devil RefSeq IDs to Ensembl v103 using biomaRt v2.46.3 and then converted these to mouse Ensembl v103 IDs. In this way we were able to use the mouse Ensembl Gene Ontology annotations for the dunnart gene domains. All gene ontology analyses were performed using clusterProfiler v4.1.4[117], with Gene Ontology from the org.Mm.eg.db v3.12.0 database[118], setting an FDR-corrected p-value threshold of 0.01 for statistical significance.

    2. eLife Assessment

      This important study of regulatory elements and gene expression in the craniofacial region of the fat-tailed dunnart shows that, compared to placental mammals, marsupial craniofacial tissue develops in a precocious manner, with enhancer regulatory elements as the primary driver of this difference. The compelling data, including a new dunnart genome assembly, provide an invaluable reference for future mammalian evolution studies, especially once additional developmental time points for the fat-tailed dunnart become available.

    3. Reviewer #1 (Public review):

      Summary:

      Compared to placental mammals, marsupials have a short gestation period and give birth to altricial young. To assist with the detection and response to cues that direct the neonate joeys to the mother's pouch, as well as latching onto the teat, marsupial craniofacial development at this stage is rapid and heterochronous relative to placentals. Cook et al. have presented an important study on the transcriptomic and epigenomic signature underlying this heterochronous development of craniofacial features across mammals, using the fat-tailed dunnart as a marsupial model.

      Given the lack of a dunnart genome, the authors generated long- and short-read sequence datasets to assemble and annotate a novel genome. This allowed them to map RNA-seq data and ChIP-seq data for H3K4me3 and H3K27ac, and thereby to identify putative promoter and enhancer sites in dunnart. They found that genes proximal to these regulatory loci were enriched for functions related to bone, skin, muscle and embryonic development, verifying the precocious state of newborn dunnart facial tissue. When compared with mouse, the authors found that a much higher proportion of promoter regions than enhancer regions aligned between species. Subsequent profiling identified regulatory elements that are conserved across species and important for mammalian craniofacial development. In contrast, dunnart-specific enhancers and patterns of RNA expression further confirm the precocious state of muscle and sensory system development in dunnart, suggesting that early formation of these features is critical for neonate marsupials.

      Marsupials are emerging as an important model for studying mammalian development and evolution, and the authors have performed a novel and thorough analysis that helps to elucidate the regulatory profile underlying craniofacial heterochrony. Impressively, this study also includes the assembly of a new marsupial reference genome that will benefit many future studies of mammalian developmental biology.

      Strengths:

      The genome assembly method was thorough. It used two different long-read technologies (PacBio and ONT) to generate the long reads for contig and scaffold construction, which increased the quality of the final assembled genome. This genome was effectively annotated and used for functional analysis of orthologous regulatory elements.

      The birth of altricial young is an important feature of marsupial development. It distinguishes marsupials from placental mammals, from which they are separated by about 160 million years of evolution. However, very little is known about the regulatory profile that contributes to the advanced craniofacial development required for joey survival. This is one of the few epigenomic studies performed in marsupials (of any organ) and the first performed in the fat-tailed dunnart (also of any organ), which begins to address this gap in knowledge.

      The study also provides evidence supporting the important role enhancer elements play in mammalian phenotypic evolution, relative to promoters.

      Weaknesses:

      Biological replicates of facial tissue were collected at a single developmental time point of the fat-tailed dunnart, within the first postnatal day (P0). These were analysed in the context of similar mouse facial samples from the ENCODE consortium at six developmental time points. Previous work from the authors has shown that the younger mouse samples (E11.5-12.5) approximately correspond to the dunnart developmental stage (Cook et al. 2021). However, it would be useful to have samples from at least one older dunnart time point, for example at a developmental stage equivalent to mouse E15.5. This would provide additional insight into the extent of accelerated face development in dunnart relative to mouse. For instance, it would show how long the regulatory elements activated early in dunnart remain active, and whether they later influence other aspects of craniofacial development.

    1. eLife Assessment

      This study presents a valuable comparison of the efficiency and precision of two prime editing methods for introducing single-nucleotide variants and longer exogenous DNA sequences into the zebrafish genome. Solid data support the conclusion that the PE2 prime editor nickase is more effective at introducing single-nucleotide variants, while the PEn prime editor nuclease is more effective at integrating short sequences of 3 to 30 base pairs, for both somatic and germline editing. The results will be of interest to the zebrafish community, in particular for modeling human disease variants in this organism.

    2. Reviewer #1 (Public review):

      Ono et al. compared the activity of the prime editor nickase PE2 and the prime editor nuclease PEn in introducing SNPs and short exogenous DNA sequences into the zebrafish genome to model human disease variants. They find that the nickase PE2 had a higher rate of precise integration for single-nucleotide substitutions, whereas the nuclease PEn showed improved precision when integrating short DNA sequences. In somatic tissue, the percentage of precise SNP variant edits improved when PE2 was injected as RNP instead of mRNA, but increased precise editing correlated with elevated indel formation. While PEn overall had higher rates of precise edits, its indel rate was also elevated. Similar rates were observed when introducing a 3 bp stop codon into the ror2 gene, either with a standard pegRNA with a 13-nucleotide homology arm or with a springRNA, which lacks the homology arm and drives integration via NHEJ. Including an abasic sequence in the springRNA prevented imprecise edits caused by scaffold incorporation, but did not improve the overall percentage of precise edits in somatic tissue. Recovery of a germline ror2-TGA integration allele using PEn RNP was robust, with 5 out of 10 founders transmitting a precise allele. Lastly, the authors demonstrate that PEn was effective at integrating a 30 bp nuclear localization signal (NLS) into the 5' end of GFP in an existing muscle-specific reporter line. However, the undefined number of cassettes in this multicopy transgene complicates accurate measurement of editing frequency. Integration of the NLS or other longer sequences at an endogenous locus would demonstrate the broad utility of this approach.

      From the work presented, it is unclear how prime editing could be used to transiently model human pathogenic variants, given the low frequency of precise edits in somatic tissue. It is also unclear how it could be used to isolate stable germline alleles of variants that are potentially dominant-negative or gain-of-function in nature. Without a direct comparison with CRISPR/Cas9 nuclease HDR-based methods that use oligonucleotide templates to introduce edits, the advantage of prime editing is unclear. A cost comparison between prime editing and HDR methods would also be of interest, particularly for integration of longer DNA sequences.

      The conclusions of the paper are mostly well supported. However, some changes to the text and additional analyses would strengthen the conclusions about when PE2 or PEn is preferable for introducing variants and short or long DNA sequences.

      (1) In Figure 3, the data indicate a significant increase in precise edits of the 3 bp TGA using PE2 RNP (11.5%) vs. PE2 mRNA (1.3%). At the adgrf3b locus, only PEn mRNA was tested for introducing the 3 bp and 12 bp insertions. The previous study testing PE2 for 3 and 12 bp insertions was mentioned, but the frequency was not listed, and the study wasn't cited (lines 204 - 207). A comparison of germline transmission rates using PE2 vs. PEn would support the conclusion that PEn allows precise integration of longer templates and recovery of germline integration alleles.

      (2) Figure 4 shows the results of introducing a TGA stop codon that is predicted to result in nonsense-mediated decay. Testing the ability to also isolate different substitution mutations in the germline would be useful information for identifying the most effective approach for generating human disease variant models.

      (3) A comparison with the prime editing variant knock-in frequencies reported in the recent publication by Vanhooydonck et al., 2025, Lab Animal should be included in the Discussion.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript provides a comparison of nickase-based (PE2) and nuclease-based (PEn) Prime Editors in zebrafish, evaluating their efficiencies for substitutions, short insertions (3-30 bp), and germline transmission.

      Strengths:

      The manuscript has demonstrated for the first time that nuclease-based PEn more efficiently inserts nucleotide sequences up to 30 bp (nuclear localization sequence) than PE2, providing an improvement for the application of gene editing in functional genetics research. Additionally, the demonstration of stable zebrafish lines with edited ror2 and smyhc1:gfp loci is well-supported by sequencing and phenotypic data, confirming functional consequences of edits.

      Weaknesses:

      The study lacks conceptual innovation, as the central methodology (RNP-based Prime Editor delivery in zebrafish) was previously established by Petri et al. (2022). The present study extends this by testing longer insertions (30 bp) with nuclease-based PEn, but this incremental advance does not substantially shift the field's understanding or capabilities. The manuscript does not sufficiently differentiate its contributions from these precedents.

      The comparative analysis between PE2 and PEn systems suffers from limited evidentiary support. The comparison relies on single loci for substitutions (crbn) and insertions (ror2), raising concerns about generalizability. Additional validation across multiple loci is necessary to support broad conclusions about PE2/PEn performance.

    4. Reviewer #3 (Public review):

      The manuscript by Ono et al. describes the application of prime editors to introduce precise genetic changes in the zebrafish model system. Probably the most important observation is that, compared to the "standard" PE2, the prime editor with full nuclease activity appears to be more efficient at introducing insertions into the genome. Many laboratories around the world have successfully used oligonucleotide-mediated HDR to insert short exogenous sequences, such as epitope tags or loxP sites, into the zebrafish genome, but the method suffers from a high frequency of indels at the edit site. Additional tools are therefore badly needed, which makes this manuscript very important. The length of the longest reported insertion (+30) is quite close to the size of the V5 (14 amino acids) and ALFA (12 amino acids without "spacer" prolines) epitope tags, as well as the loxP site (34 nucleotides). The conclusions drawn in the paper are supported by compelling evidence. I only have a few minor comments:

      (1) The logic for introducing two nucleotide changes (at +3 and +10) to change a single amino acid (I378) should be explicitly explained in the main body of the manuscript. It is indeed self-explanatory when looking at Supplementary Figure 1. One way of doing it could be to include Supplementary Figure 1a in Figure 1.

      (2) It is not clear why a 3-nucleotide insertion was used to generate W722X. The human W720X is a single-nucleotide polymorphism, and it should be possible to make a corresponding zebrafish mutant by introducing two nucleotide changes.

      (3) Lines 137-138: The T7 endonuclease assay used in Figure 2d detects all polymorphisms, both precise changes and indels. Thus, if this assay were performed on the embryos shown in Figure 1c-d, the overall percentage of modified alleles (precise prime edits plus indels) would similarly be higher for PEn than for PE2. I believe the conclusion in the last sentence of the paragraph is therefore incorrect.

      (4) Use of terminology. "Germline transmission" is typically used to refer to the fraction of F0s transmitting desired changes (or transgenes) to their progeny, while "germline mosaicism" refers to the fraction of F1s with the desired change in the progeny of a given F0. "Germline transmission" in line 217 should be replaced with "germline mosaicism".

      (5) Lines 253-255: The fraction of injected embryos that had mosaic nuclear expression of GFP, indicative of NLS insertion, should be clarified. It should also be clarified whether embryos positive for nuclear GFP were preselected for amplicon sequencing and germline transmission analyses. This is extremely important for extrapolation to scenarios like epitope tagging, where preselection is not possible.

      (6) Statistical analyses. It would be helpful to clarify why different statistical tests are sometimes used to assess seemingly very similar datasets (Figures 1c, 1d, 2b, 2c, 2f).

      (7) Discussion. Since authors suggest that PEn might be especially beneficial for insertion of additional sequences, it is important to stress locus-to-locus variability of success. While the precise +3 insertion was indeed tremendously efficient at both tested loci (ror2 and adgrf3b), +12 addition into adgrf3b was over 10 times less efficient (lines 193-194). In contrast, +30 into smyhc:GFP using the shorter pegRNA was highly efficient again with an average of 8.5% of sequence reads indicating precise integration (line 257, Figure 5c). Longer pegRNA did not work nearly as well (Figure 5c), but was still much better than +12 into adgrf3b. As dangerous as it is to extrapolate from small datasets, perhaps these observations indicate that optimization of RT template and PBS may be needed for each new locus in order to significantly outperform oligonucleotide-mediated HDR? If so, would the cost of ordering several pegRNAs and the effort needed to compare them factor in when deciding which method to use? Reported germline transmission rates for both ror2 W722X (+3, Figure 4a) and smyhc:NLS-GFP (+30, Figure 5f) are tantalizingly high.

    1. eLife Assessment

      This important study demonstrates that disruption of a common protein-folding system renders drug-resistant clinical bacteria susceptible to antibiotics. The work convincingly shows that targeting protein folding can be used to combat multidrug-resistant pathogens, both by potentiating the efficacy of existing drugs and by therapeutic use of small-molecule inhibitors. This study is significant and timely as it informs on a new strategy that is relevant to microbiologists and clinicians interested in combating antimicrobial resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors provide evidence that impairing cell envelope protein homeostasis, by blocking the machinery for disulfide bond formation, restores the efficacy of antibiotics, including β-lactam drugs and colistin, against antimicrobial-resistant (AMR) Gram-negative bacteria.

      Strengths:

      The authors take a thorough approach to show that antibiotic sensitivity is restored when the Dsb machinery is inhibited. This includes evaluating various antibiotics against both normal and Dsb-deficient pathogenic bacteria (i.e. Pseudomonas and Stenotrophomonas). The authors corroborate these findings by using Dsb inhibitors in addition to delta dsbA strains. The methodology is appropriate and includes measuring MICs as well as validating the observations in vivo using the Galleria model.

    3. Reviewer #2 (Public review):

      Summary:

      This work by Kadeřábková and Furniss et al. demonstrates the importance of a specific protein folding system for the effective folding of β-lactamase proteins, which are responsible for resistance to β-lactam antibiotics. It also shows that inhibiting this system sensitizes multidrug-resistant pathogens to β-lactam treatment. In addition, the authors extend these observations to a two-species co-culture model, in which β-lactamases provided by one pathogen can protect another, sensitive pathogen from β-lactam treatment. In this model, disrupting the protein folding system also abolished protection of the sensitive pathogen from antibiotic killing. Overall, the data provide a convincing foundation for subsequent investigations and for the development of inhibitors targeting β-lactamases and other resistance determinants. This and similar strategies may apply to polymicrobial contexts in which molecular interactions are suspected to confer resistance on otherwise antibiotic-sensitive pathogens.

      Strengths:

      The authors use clear and reliable molecular biology strategies to show that β-lactamase proteins from P. aeruginosa and Burkholderia species, expressed in E. coli in the absence of the dsbA protein folding system, are variably less capable of resisting the effects of different β-lactam antibiotics compared to the dsbA-competent parent strain (Figure 1). The appropriate control is included in the supplemental materials to demonstrate that this effect is specifically dependent on dsbA, since complementing the mutant with an intact dsbA gene restores antibiotic resistance (Figure S1). The authors subsequently show that this lack of activity can be explained by significantly reduced protein levels and loss-of-function protein misfolding in the dsbA mutant background (Figure 2). These data support the importance of this protein folding mechanism in the activity of multiple clinically relevant β-lactamases.

      Native bacterial species are used for subsequent experiments, and the authors provide important context for their antibiotic choices and concentrations by referencing the breakpoints that guide clinical practice. In Figure 4, the authors show that loss of the DsbA system in P. aeruginosa significantly sensitizes clinical isolates expressing different classes of β-lactamases to clinically relevant antibiotics. The appropriate control showing that the dsbA1 mutation does not result in sensitivity to a non-β-lactam antibiotic is included in Figure S2. The authors further show, using an in vivo model for antibiotic treatment, that treatment of a dsbA1 mutant results in moderate and near-complete survival of the infected organisms. The importance of this system in S. maltophilia is then investigated similarly (Figure 5), showing that a dsbA dsbL mutant is also sensitive to β-lactams and colistin, another antibiotic whose resistance mechanism is dependent on the DsbA protein folding system. Importantly, the authors show that a small-molecule inhibitor that disrupts the DsbA system, rather than genetic mutations, is also capable of sensitizing S. maltophilia to these antibiotics. It should be noted that while the sensitization is less pronounced, this molecule has not been optimized for S. maltophilia and would be expected to increase in efficacy following optimization. Together, the data support that interference with the DsbA system in native hosts can sensitize otherwise resistant pathogens to clinically relevant antibiotic therapy.

      Finally, the authors investigate the effects of co-culturing S. maltophilia and P. aeruginosa (Figure 5E). These assays are performed in synthetic cystic fibrosis sputum medium (SCFM), which provides a nutritional context similar to that in CF but without the presence of more complex components such as mucin. The authors show that while P. aeruginosa alone is sensitive to the antibiotic, it can survive moderate concentrations in the presence of S. maltophilia and even grow in higher concentrations where S. maltophilia appears to overproduce its β-lactamases. However, this protection is lost in S. maltophilia without the DsbA protein folding system, showing that the protective effect depends on functional production of β-lactamase in the presence of viable S. maltophilia. The authors further achieved the difficult task of labeling these multi-drug resistant pathogens with selection markers to determine co-infection CFUs in the supplemental materials. Overall, the data support a protective role for DsbA-dependent β-lactamase under these co-culture conditions.

      Weaknesses:

      No significant weaknesses are noted beyond the limitations identified and discussed by the authors.

    4. Reviewer #3 (Public review):

      Summary:

      In the face of emerging antibiotic resistance and the slow pace of drug discovery, strategies that enhance the efficacy of existing clinically used antibiotics are highly sought after. In this manuscript, the authors genetically manipulate a model bacterium (Escherichia coli) and clinically isolated, antibiotic-resistant strains of concern (Pseudomonas, Burkholderia, Stenotrophomonas). From this work they put forward an additional drug target for combating resistance and potentiating existing drugs. These observations were validated in pure cultures, in mixed bacterial cultures and in worm models. The drug target investigated in this study appears to be broadly relevant to the challenge posed by β-lactamase enzymes, which render β-lactam antibiotics ineffective in the clinic. Compounds that target this enzyme are already being developed. Some of them were tested in this study and showed promising results and potential for further optimization by medicinal chemists.

      Strengths:

      The work is well designed and well executed and targets an urgent area of research with the unprecedented increase in antibiotic resistance.

      Weaknesses:

      The impact of the work could be strengthened by demonstrating increased antibiotic efficacy in mouse models or wound models of Pseudomonas infection. Worm models are relevant, but still distant from investigations in mammalian models.

    5. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Recommendation For the Authors):

      Thanks to the authors for addressing my suggestions. I think these modifications have improved the clarity of the data and the overall presentation of the manuscript. The methods are now more clearly explained, and the additional details help make the results easier to interpret. Where addressing the comment wasn't feasible, the authors gave reasonable explanations. Overall, the revisions strengthen the paper, and I have no further concerns.

      Thank you for your recommendations, which have significantly improved our paper.

      Reviewer #2 (Recommendation For the Authors):

      The additional work conducted by the authors is greatly appreciated. All concerns (and beyond) have been thoroughly addressed by the authors and I am thankful for their consideration and attention to detail. Only one possible issue with the revisions is described below for consideration:

      Regarding the CFU counts and/or axis labels in Figure S3B, some of the listed "CFU per 1 mL" values (in both the figure itself and File S2B) are extraordinarily high. For example, the greatest CFU for PA14 observed in Figure 4E is ~1x10^9. However, PA14 at 0 ug/mL Ceftazidime reaches nearly 1x10^16 in Figure S3B. From what I can tell, this should be beyond the capacity of bacteria in this space by several orders of magnitude. (E.g., a cubic centimeter [~1 mL] is ~1x10^12 cubic micrometers. At their smallest dimensions and volume, a maximum of ~1x10^13 cells could theoretically fit in this space assuming no liquid and perfect organization.) Similarly, both "AMM" and "AMM (+PA14)" consistently reach CFUs between 1x10^12 and 1x10^14 in this assay. Are the authors confident in the values and/or depiction of CFUs for this figure? It seems like this could be a labeling, dilution or counting issue.

      Thank you for your positive remarks on our revised manuscript and for your constructive comments that have strengthened our work.

      We agree with the concern regarding the CFU counts in Figure S3B. The very high values (>10^12 CFU) reflect a technical enumeration artifact that, due to the nature of the assay, cannot be fully avoided. The origin of these inflated counts is described in more detail below:

      Following competition assays between Pseudomonas aeruginosa and Stenotrophomonas maltophilia in liquid culture with antibiotics, we enumerate survivors for each species by colony forming unit (CFU) counts. Because two different bacterial species must be quantified from mixed cultures, we use a gentamicin resistance marker carried by one species at a time.

      Each condition is therefore enumerated twice, as we alternate which species harbors the gentamicin cassette.

      During coculture in antibiotics and minimal medium, clinical isolates of P. aeruginosa and S. maltophilia, like those used here, can transiently increase their tolerance to antibiotics, including aminoglycosides. This reduces the effectiveness of gentamicin selection at the plating step needed for CFU enumeration. For the data presented in Figure S3B, in a subset of high-OD₆₀₀ conditions in the competition assay, this tolerance produces artificially inflated CFU values that exceed the biological carrying capacity.

      We evaluated alternative enumeration strategies (e.g., fluorescent protein markers with a nonselective medium), but these proved unsuitable for these strains due to differences in growth rates and media compatibility, introducing other large biases. Given these constraints, selective plating remains the only feasible approach for this work, and the associated artifact cannot be eliminated entirely.

      Importantly, transient resistance (tolerance), although common, is not a universal occurrence (e.g., we did not observe it when we performed the experiments shown in Figure 4E). When it does arise, it occurs reproducibly under the same high-OD₆₀₀ experimental conditions and does not obscure any of the relative comparisons that underpin our conclusions.

      For transparency, we have retained the measured values in Figure S3B and we note in the legend that counts above ~10^12 CFU represent a technical overestimation due to transient gentamicin tolerance. Counts below 10^12 CFU are accurately enumerated.

      Reviewer #3 (Recommendation For the Authors):

      All concerns have been satisfied and the manuscript is ready for publishing.

      Thank you for your recommendations, which have significantly improved our paper.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The study would benefit from presenting raw data in some cases, such as MIC values and SDS-PAGE gels, by clarifying the number of independent experiments used, as well as further clarification on statistical significance for some of the data.

      All original data used to generate Fig. 1, Fig. 4E, Fig. S3 and Fig. S4A are presented in File S2. Tab (A) is dedicated to data used for Fig. 1 and Fig. S4A, while tabs (B) and (C) show the data used for Fig. 4E and S3, respectively. This information is indicated in the legends of the relevant figures.

      All experiments in this study were performed in three independent (biological) experiments (with the exception of the complementation data shown in Fig. S1 and Fig. S5, which were performed in two independent (biological) experiments). The number of biological and technical replicates for each experiment is stated in the figure legends, as well as in the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper. Specifically, for antibiotic MIC assays we have not performed statistical analyses as per recommended practice. The reason for this is stated in the following section from the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper (lines 699-711 of the revised manuscript):

      “Antibiotic MIC values were determined in biological triplicate, except for MIC values recorded for dsbA complementation experiments in our E. coli K-12 inducible system that were carried out in duplicate. All ETEST MICs were determined as a single technical replicate, and all BMD MICs were determined in technical triplicate. All recorded MIC values are displayed in the relevant graphs; for MIC assays where three or more biological experiments were performed, the bars indicate the median value, while for assays where two biological experiments were performed the bars indicate the most conservative of the two values (i.e., for increasing trends, the value representing the smallest increase and for decreasing trends, the value representing the smallest decrease). We note that in line with recommended practice, our MIC results were not averaged. This should be avoided because of the quantized nature of MIC assays, which only inform on bacterial survival for specific antibiotic concentrations and do not provide information for antibiotic concentrations that lie in-between the tested values.”
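
      To make the reporting rule concrete, here is a minimal Python sketch, assuming each replicate has already been expressed as a fold change (test MIC divided by reference MIC). The function name is hypothetical. The handling of an even number (four or more) of replicates and of replicates that disagree in direction are our own assumptions; the excerpt above does not cover those cases.

```python
from statistics import median_low


def summarize_fold_changes(fold_changes):
    """Summarize MIC fold changes (test MIC / reference MIC) across biological replicates.

    Hypothetical helper illustrating the reporting rule quoted above:
    - three or more replicates: the median (MICs are never averaged);
    - two replicates: the most conservative value, i.e. the one closest to 1
      (smallest increase for increasing trends, smallest decrease for decreasing ones).
    """
    n = len(fold_changes)
    if n >= 3:
        # median_low always returns a recorded value, so with an even number of
        # replicates nothing is interpolated between tested concentrations.
        return median_low(fold_changes)
    if n == 2:
        a, b = fold_changes
        if a >= 1 and b >= 1:  # increasing trend: keep the smaller increase
            return min(a, b)
        if a <= 1 and b <= 1:  # decreasing trend: keep the smaller decrease
            return max(a, b)
        raise ValueError("replicates disagree on the direction of change")
    raise ValueError("at least two biological replicates are required")


print(summarize_fold_changes([0.125, 0.25, 0.0625]))  # 0.125
print(summarize_fold_changes([0.25, 0.5]))  # 0.5
```

      Reporting a recorded value rather than an interpolated one reflects the point made in the excerpt: an MIC assay says nothing about concentrations between the tested values.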

      Reviewer #2 (Public review):

      While Figure 5E demonstrates a protective effect of DsbA-dependent β-lactamase, the omission of CFU data for S. maltophilia makes it difficult to assess the applicability of the polymicrobial strategy. Since S. maltophilia is pre-cultured prior to the addition of P. aeruginosa and antibiotics, it is unclear whether the protective effect is dependent on high S. maltophilia CFU. It is also unclear what the fate of the S. maltophilia dsbA dsbL mutant is under these conditions. If DsbA-deficient S. maltophilia CFU is not impacted, then this treatment will result in the eradication of only one of the pathogens of interest. If the mutant is lost during treatment, then it is not clear whether the loss of protection is due specifically to the production of non-functional β-lactamase or simply the absence of S. maltophilia.

      We have simultaneously tracked the abundance of P. aeruginosa and S. maltophilia strains in our cross-protection experiment for select antibiotic concentrations. To be able to perform this experiment, we had to label two extremely-drug-resistant strains of S. maltophilia with an antibiotic resistance marker that allowed us to quantify them in mixtures with P. aeruginosa. Our results can be found in Fig. S3 of our revised manuscript and, in a nutshell, show that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia.

      The following text was added to address the questions of the reviewer:

      “Due to the naturally different growth rates of these two species (S. maltophilia grows much slower than P. aeruginosa) especially in laboratory conditions, the protocol we followed [1] requires S. maltophilia to be grown for 6 hours prior to co-culturing it with P. aeruginosa. To ensure that at this point in the experiment our two S. maltophilia strains, with and without dsbA, had grown comparatively to each other, we determined their cell densities (Fig. S3A). We found that S. maltophilia AMM dsbA dsbL had grown at a similar level as the wild-type strain, and both were at a higher cell density [~10<sup>7</sup> colony forming units (CFUs)] compared to the P. aeruginosa PA14 inoculum (5 × 10<sup>4</sup> CFUs)” (lines 353-361 of the revised manuscript).
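
      For scale, and taking the S. maltophilia density at face value (it is given only to an order of magnitude), the pre-grown culture outnumbers the P. aeruginosa inoculum by roughly 200-fold at the time of mixing. The symbols N_Sm and N_Pa are our own shorthand for the two cell numbers:

```latex
\frac{N_{\mathrm{Sm}}}{N_{\mathrm{Pa}}}
  \approx \frac{10^{7}\ \mathrm{CFU}}{5 \times 10^{4}\ \mathrm{CFU}}
  = \frac{10^{7}}{5 \times 10^{4}}
  = 2 \times 10^{2}
```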

      “To ensure that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia, we monitored the abundance of both strains in each synthetic community for select antibiotic concentrations (Fig. S3B). In this experiment we largely observed the same trends as in Fig. 4E. At low antibiotic concentrations, for example 4 μg/mL of ceftazidime, S. maltophilia AMM is fully resistant and thrives, thus outcompeting P. aeruginosa PA14 (dark pink and dark blue bars in Fig. S3B). The same can also be seen in Fig. 4E, whereby decreased P. aeruginosa PA14 CFUs are recorded. By contrast S. maltophilia AMM dsbA dsbL already displays decreased growth at 4 μg/mL of ceftazidime because of its non-functional L1-1 enzyme, allowing comparatively higher growth of P. aeruginosa (light pink and light blue bars in Fig. S3B). Despite the competition between the two strains, P. aeruginosa PA14 benefits from S. maltophilia AMM’s high hydrolytic activity against ceftazidime, which allows it to survive and grow in high antibiotic concentrations even though it is not resistant (see 128 μg/mL; dark pink and dark blue bars in Fig. S3B). In stark opposition, without its disulfide bond in S. maltophilia AMM dsbA dsbL, L1-1 cannot confer resistance to ceftazidime, resulting in killing of S. maltophilia AMM dsbA dsbL and, consequently, also of P. aeruginosa PA14 (see 128 μg/mL; light pink and light blue bars in Fig. S3B).

      The data presented here show that, at least under laboratory conditions, targeting protein homeostasis pathways in specific recalcitrant pathogens has the potential to not only alter their own antibiotic resistance profiles (Fig. 3 and 4A-D), but also to influence the antibiotic susceptibility profiles of other bacteria that co-occur in the same conditions (Fig. 5). Admittedly, the conditions in a living host are too complex to draw direct conclusions from this experiment. That said, our results show promise for infections, where pathogen interactions affect treatment outcomes, and whereby their inhibition might facilitate treatment” (lines 381-406 of the revised manuscript).

      The alleged clinical relevance and immediate, theoretical application of this approach should be properly contextualized. At multiple junctures, the authors state or suggest that interactions between S. maltophilia and P. aeruginosa are known to occur in disease or have known clinical relevance related to treatment failure and disease states. For instance, the citations provided for S. maltophilia protection of P. aeruginosa in the CF lung environment both describe simplified laboratory experiments rather than clinical or in vivo observations. Similarly, the citations provided for both the role of S. maltophilia in treatment failure and CF disease severity do not support either claim. The role of S. maltophilia in CF is currently unsettled, with more recent work reporting conflicting results that support S. maltophilia as a marker, rather than cause, of severe disease. These citations also do not support the suggestion that S. maltophilia specifically contributes to treatment failure. While it is reasonable to pursue these ideas as a hypothesis or potential concern, there is no evidence provided that these specific interactions occur in vivo or that they have clinical relevance.

      Thank you for your comment. You are entirely correct. We have amended the text throughout our revised manuscript to avoid overstating the role of S. maltophilia in CF infections and to reference additional relevant works in the literature. Please find below representative examples of such passages:

      “On the other hand, CF microbiomes are increasingly found to encompass S. maltophilia [2-4], a globally distributed opportunistic pathogen that causes serious nosocomial respiratory and bloodstream infections [5-7]. S. maltophilia is one of the most prevalent emerging pathogens [6] and it is intrinsically resistant to almost all antibiotics, including β-lactams like penicillins, cephalosporins and carbapenems, as well as macrolides, fluoroquinolones, aminoglycosides, chloramphenicol, tetracyclines and colistin. As a result, the standard treatment option for lung infections, i.e., broad-spectrum β-lactam antibiotic therapy, is rarely successful in countering S. maltophilia [7,8], creating a definitive need for approaches that will be effective in eliminating both pathogens” (lines 33-41 of the revised manuscript).

      “Of the organisms studied in this work, S. maltophilia deserves further discussion because of its unique intrinsic resistance profile. The prognosis of CF patients with S. maltophilia lung carriage is still debated [4,9-16], largely because studies with extensive and well-controlled patient cohorts are lacking. This notwithstanding, the therapeutic options against this pathogen are currently limited to one non-β-lactam antibiotic-adjuvant combination, trimethoprim-sulfamethoxazole [17-20], which is not always effective, and a few last-line β-lactam drugs, like the fifth-generation cephalosporin cefiderocol and the combination aztreonam-avibactam. Resistance to commonly used antibiotics causes many problems during treatment and, as a result, infections that harbor S. maltophilia have high case fatality rates [7]. This is not limited to CF patients, as S. maltophilia is a major cause of death in children with bacteremia [5]” (lines 440-450 of the revised manuscript).

      Reviewer #3 (Public review):

      The impact of the work could be strengthened by demonstrating increased efficacy of antibiotics in mouse models or wound models of Pseudomonas infection. Worm models are relevant, but still distant from investigations in animal models.

      Thank you for this comment. We appreciate the sentiment, and we would have liked to perform experiments in a murine model of infection. Several factors made this impossible, and as a result we used G. mellonella as an informative preliminary in vivo infection model. The DSB proteins have been shown to play a central role in bacterial virulence; because of this, our P. aeruginosa and S. maltophilia mutant strains are not efficient in establishing an infection, even in a wound model. This could have been overcome had we been able to use the chemical inhibitor of the DSB system in vivo; however, this is also not possible. The chemical compound that we use to inhibit the function of DsbA acts on DsbB. Inhibition of DsbB blocks the re-oxidation of DsbA and leads to its accumulation in its inactive, reduced form. However, the action of the inhibitor can be bypassed through re-oxidation and re-activation of DsbA by small-molecule oxidants such as L-cystine, which are abundant in rich growth media and animal tissues. This makes the inhibitor suitable only for in vitro assays that can be performed in minimal media, where small-molecule oxidants can be strictly excluded, and entirely unsuitable for an insect or a vertebrate animal model.
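
      The bypass argument can be illustrated with a deliberately simplified two-state model. This is our own sketch, not a model from the manuscript: DsbA is reduced as it oxidizes substrate proteins and is re-oxidized either by DsbB or by small-molecule oxidants such as L-cystine. All rate constants are hypothetical and in arbitrary units; the point is only qualitative.

```python
def oxidized_dsba_fraction(k_dsbb: float, k_oxidant: float, k_substrate: float) -> float:
    """Steady-state fraction of DsbA in its active, oxidized form (toy model).

    DsbA_ox -> DsbA_red at rate k_substrate (oxidizing substrate proteins);
    DsbA_red -> DsbA_ox at rate k_dsbb + k_oxidant (DsbB plus small-molecule oxidants).
    Balancing the fluxes, k_substrate * f = (k_dsbb + k_oxidant) * (1 - f), gives f.
    """
    if k_substrate <= 0:
        raise ValueError("k_substrate must be positive")
    k_reox = k_dsbb + k_oxidant
    return k_reox / (k_reox + k_substrate)


# Hypothetical scenarios (rates in arbitrary units)
untreated = oxidized_dsba_fraction(k_dsbb=1.0, k_oxidant=0.0, k_substrate=1.0)
inhibited_minimal = oxidized_dsba_fraction(k_dsbb=0.0, k_oxidant=0.0, k_substrate=1.0)
inhibited_rich = oxidized_dsba_fraction(k_dsbb=0.0, k_oxidant=1.0, k_substrate=1.0)
print(untreated, inhibited_minimal, inhibited_rich)  # 0.5 0.0 0.5
```

      In this toy picture, blocking DsbB depletes active DsbA only when no alternative oxidant is present. That matches minimal media, but not rich media or host tissue.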

      Reviewer #1 (Recommendation For the Authors):

      (1) The analysis of the role of DsbA in the assembly of cysteine-containing β-lactamases is a significant finding. However, in addition to showing the MIC fold difference, I think it would be important to show the raw data for the actual MIC values obtained for each β-lactamase enzyme/antibiotic combination and in both strains (+ and - dsbA).

      Also, can the authors clarify whether these experiments were conducted on 3 independent samples (there seems to be some contradictory information in the paper and the supplementary figures)? If possible, I would also recommend showing in the figure whether the MIC differences observed were statistically significant.

      All original data used to generate Fig. 1, Fig. 4E, Fig. S3 and Fig. S4A are presented in File S2. Tab (A) is dedicated to data used for Fig. 1 and Fig. S4A, while tabs (B) and (C) show the data used for Fig. 4E and S3, respectively. This information is indicated in the legends of the relevant figures.

      All experiments in this study were performed as three independent (biological) experiments, with the exception of the complementation data shown in Fig. S1 and Fig. S5, which were performed as two independent (biological) experiments. The number of biological and technical replicates for each experiment is stated in the figure legends, as well as in the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper. In line with recommended practice, we did not perform statistical analyses on antibiotic MIC assays. The reason for this is stated in the following excerpt from the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper (lines 699-711 of the revised manuscript):

      “Antibiotic MIC values were determined in biological triplicate, except for MIC values recorded for dsbA complementation experiments in our E. coli K-12 inducible system that were carried out in duplicate. All ETEST MICs were determined as a single technical replicate, and all BMD MICs were determined in technical triplicate. All recorded MIC values are displayed in the relevant graphs; for MIC assays where three or more biological experiments were performed, the bars indicate the median value, while for assays where two biological experiments were performed the bars indicate the most conservative of the two values (i.e., for increasing trends, the value representing the smallest increase and for decreasing trends, the value representing the smallest decrease). We note that in line with recommended practice, our MIC results were not averaged. This should be avoided because of the quantized nature of MIC assays, which only inform on bacterial survival for specific antibiotic concentrations and do not provide information for antibiotic concentrations that lie in-between the tested values.”

      (2) For Figure 2A, can the authors provide the full Westerns and, ideally, the SDS-PAGE gel corresponding to the Westerns where the β-lactamases and the control DnaK were detected?

      Thank you for this comment. Full immunoblots and SDS-PAGE analysis of the immunoblot samples for total protein content are shown in File S3 of our revised manuscript.

      (3) For the enzymatic assays, was the concentration of enzyme used "normalised" based on the amount detected in the Westerns where possible, or was only the total amount of protein considered? When similar amounts of enzyme were added, was the activity still compromised?

      The β-lactam hydrolysis assay was normalized based on the wet mass of the cell pellets of the tested strains. This means that, for each enzyme expressed in cells with and without DsbA, the strains were normalized to the same weight-to-volume ratio, and strains expressing the same enzyme were only compared to each other.
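
      A minimal sketch of this kind of normalization, assuming a fixed target wet-mass-to-volume ratio: each pellet is resuspended in a volume proportional to its mass, so that paired strains (with and without DsbA) expressing the same enzyme end up at the same ratio. The target ratio, strain labels and masses are hypothetical; the values actually used are not given here.

```python
def resuspension_volumes_ml(pellet_mass_mg: dict[str, float], target_mg_per_ml: float) -> dict[str, float]:
    """Buffer volume (mL) that brings each wet pellet to the same mass-to-volume ratio."""
    if target_mg_per_ml <= 0:
        raise ValueError("target ratio must be positive")
    return {strain: mass / target_mg_per_ml for strain, mass in pellet_mass_mg.items()}


# Hypothetical pair expressing the same beta-lactamase, with and without dsbA
volumes = resuspension_volumes_ml({"dsbA+": 50.0, "dsbA-": 40.0}, target_mg_per_ml=100.0)
print(volumes)  # {'dsbA+': 0.5, 'dsbA-': 0.4}
```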

      Because enzyme degradation in the absence of DsbA is a key factor underlying the effects we describe for most of the tested β-lactamases (see Fig. 2A and S4A; no protein band is detected for 5 of the 7 enzymes in the dsbA mutant), it was not possible to normalize our samples based on enzyme levels detected by immunoblot. Normalization based on enzyme amounts would be feasible had we purified each β-lactamase after expression in the two different strain backgrounds (+/- dsbA) assuming sufficient protein amounts could be isolated from the dsbA mutant strain. Nonetheless, we feel that such a comparison would be misleading, since enzyme degradation likely plays the biggest role in the lack of activity observed for most of these enzymes in the absence of DsbA.

      (4) I am not sure whether Fig 3 is very informative. Perhaps it could be redesigned to better encapsulate the findings in this manuscript (combine figures 3 and 6 into one). I would also include the chemical structure of the inhibitors used and perhaps show how they block the system by binding to DsbB.

      Thank you for this comment. Fig. 3 was combined with Fig. 6 of the submitted manuscript. The new model figure is Fig. 5 in our revised manuscript.

      The inhibitor compound used in our study has been extensively characterized in a previous publication [21]. Considering that this inhibitor is not the main focus of our paper, we have avoided showing its chemical structure in any of the main display items. That said, its structure can be found in File S5 of our revised manuscript, which contains the quality control information on this compound. As suggested, we included the following sentence to describe the mode of action of this inhibitor: “Compound 36 was previously shown to inhibit disulfide bond formation in P. aeruginosa via covalently binding onto one of the four essential cysteine residues of DsbB in the DsbA-DsbB complex [21]” (lines 309-311 of the revised manuscript).

      (5) Figure 4: Similar to my comment above, showing in the figure whether the differences observed in Figure 4, particularly in panels A-C, are statistically significant (e.g., the difference in Galleria survival in the presence and absence of dsbA) would be beneficial.

      As mentioned in our answer to comment 1 above, we have not performed statistical analyses for antibiotic MIC assays because, in line with recommended practice, our MIC results were not averaged (Fig. 3A,B,D,E of our revised manuscript). This should be avoided because of the quantized nature of MIC assays, which only inform on bacterial survival for specific antibiotic concentrations and do not provide information for antibiotic concentrations that lie in-between the tested values. Statistical analysis of G. mellonella survival data (Fig. 3C,F of our revised manuscript) was performed and is described fully in the legend of Fig. 3, as well as in the “Statistical analysis of experimental data” part of the “Materials and Methods” section of the paper (lines 729-738 of the revised manuscript). Finally, the statistical analyses for the most important comparisons in panels (C) and (F) of Fig. 3 are also marked directly on the figure.
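
      The survival test itself is specified in the Fig. 3 legend and is not reproduced in this response. For readers less familiar with how survival curves are compared, the Python sketch below implements a two-group log-rank (Mantel-Cox) test from scratch. This is one common choice for G. mellonella survival data; treat it as an illustration, not as the authors' analysis. Larvae still alive at the end of monitoring are treated as censored, and the example data are hypothetical.

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test.

    times_*: time of death or of last observation for each larva.
    events_*: 1 if the larva died at that time, 0 if censored (alive at end).
    Returns (chi-square statistic with 1 degree of freedom, two-sided p-value).
    """
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e})

    o_minus_e = 0.0  # observed minus expected deaths in group a
    variance = 0.0
    for t in death_times:
        at_risk = [(ti, e, g) for ti, e, g in data if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == "a")
        d = sum(1 for ti, e, _ in at_risk if ti == t and e)
        d_a = sum(1 for ti, e, g in at_risk if ti == t and e and g == "a")
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of deaths in group a at this time point
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no information to compare the two groups")
    chi2 = o_minus_e ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical example: 10 larvae per group, monitored for 72 h
chi2, p = log_rank_test([24] * 10, [1] * 10, [72] * 10, [0] * 10)
print(round(chi2, 2), p < 0.001)  # 19.0 True
```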

      (6) Were the authors able to test the redox state of DsbA upon addition of the DsbB inhibitor, to further demonstrate that the effects observed were indeed due to the obstruction of the Dsb machinery and not to off-target effects?

      Thank you for the opportunity to clarify this. In previous work from our lab, we have used a DSB system inhibitor termed “compound 12” in [22] with activity against DsbB proteins from Enterobacteria. In our previous study [23] we, indeed, tested the redox state of DsbA in the presence of this inhibitor compound. We could not perform the same experiment here with “compound 36” from [21], because we do not have an antibody against the DsbA protein of S. maltophilia. That said, we have carried out experiments that confirm that our results are due to specific inhibition of the DSB system and not because of off-target effects. In particular, we show that the gentamicin MIC values of S. maltophilia AMM remain unchanged in the presence of the inhibitor and treatment of S. maltophilia AMM dsbA dsbL with the compound does not affect its colistin MIC value (Fig. S2E and lines 317-320 of the revised manuscript).

      (7) Given the remarkable effects shown by the DsbB inhibitor, did the authors use this compound to assess whether inhibition of the Dsb system with small molecules would block cross-resistance in S. maltophilia - P. aeruginosa mixed communities (Fig 5D)?

      Unfortunately, this was not possible. The decrease in the ceftazidime MIC value of S. maltophilia AMM in the presence of the DSB inhibitor compound is more modest than the effect we observed when the dsbA dsbL mutant is used (compare Fig. 4D (left) with Fig. 4A of the revised manuscript). This means that, in the presence of the DSB inhibitor, sufficient amounts of functional β-lactamase are still present, and we expect that they would contribute to cross-protection of P. aeruginosa. While the use of the DSB inhibitor does have a drastic impact on the colistin resistance profile of S. maltophilia AMM (Fig. 4D of the revised manuscript), MCR enzymes, unlike β-lactamases, which act as common goods, act solely on the lipopolysaccharide of their producer and do not contribute to bacterial interactions, precluding the use of colistin for a cross-protection experiment.

      Reviewer #2 (Recommendation For the Authors):

      (1) The acronym used for synthetic cystic fibrosis sputum medium (lines 523, 531, 535, 601, and 603) is defined in the manuscript as 'SCF', but the common formulation is 'SCFM', including in the provided citation. Suggest changing to SCFM for consistency.

      Thank you for this comment. This has been amended throughout our revised manuscript.

      (2) In Figure 1, while the legend states that "No changes in MIC values are observed for strains harboring the empty vector control (pDM1)[...]" (lines 729-30), the median ceftazidime MIC in the pDM1 control appears to indicate a 2-fold decrease. This would not seem to significantly impact the other results, since the MIC decreases observed for other conditions are all 3-fold or greater, but this should be addressed and/or explained in the text.

      You are correct. Thank you for the opportunity to clarify this. Since MIC assays have a degree of variability, we have only considered decreases in MIC values that are greater than 2-fold. For most of our controls, the recorded MIC fold changes are below 2-fold. The only exception is the ceftazidime MIC drop of the empty-vector control, which shows a 2-fold change that we do not consider significant.
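
      The decision rule can be written as a small predicate. This is an illustrative sketch: the function name and example MIC values are hypothetical. Unlike the text, which discusses decreases, it checks changes in either direction.

```python
def exceeds_threshold(mic_reference: float, mic_test: float, threshold: float = 2.0) -> bool:
    """True if the MIC changed by more than `threshold`-fold in either direction."""
    if mic_reference <= 0 or mic_test <= 0:
        raise ValueError("MIC values must be positive")
    ratio = mic_reference / mic_test
    fold = max(ratio, 1 / ratio)  # express increases and decreases as a fold >= 1
    return fold > threshold


# Hypothetical ceftazidime MICs (ug/mL)
print(exceeds_threshold(32, 16))    # False: exactly 2-fold, not considered
print(exceeds_threshold(32, 8))     # True: 4-fold decrease
print(exceeds_threshold(1.5, 0.5))  # True: 3-fold decrease
```

      The strict inequality is what excludes the 2-fold ceftazidime drop of the empty-vector control.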

      To ensure that this is clear in our text and figure legends the following changes were made:

      The clause “only differences larger than 2-fold were considered” was added to the text (lines 110-111 of the revised manuscript).

      We amended the legend of Fig. 1 accordingly: “No changes in MIC values are observed for the aminoglycoside antibiotic gentamicin (white bars) confirming that absence of DsbA does not compromise the general ability of this strain to resist antibiotic stress. Minor changes in MIC values (≤ 2-fold) are observed for strains harboring the empty vector control (pDM1) or those expressing the class A β-lactamases L2-1 and LUT-1, which contain two or more cysteines (Table S1), but no disulfide bonds (top row)”.

      (3) Similarly, in Fig S1E, there appears to be only partial complementation for BPS-1m. Do the authors hypothesize that this observation is related to a folding defect, rather than degradation of protein, as described for BPS-1m for Figure 2?

      Thank you for the opportunity to clarify this. You are correct that we only achieve partial complementation for the E. coli strain expressing the BPS-1m enzyme from the Burkholderia complex. Despite the fact that the gene for this enzyme was codon optimized, we observed that its expression in E. coli is sub-optimal and incurs fitness effects. In fact, to record the data presented in our manuscript the E. coli strains had to be transformed anew every time. Considering that the related enzyme BPS-6 does not present any of these challenges, we attribute the partial complementation to technical difficulties with the expression of the bps-1m gene in E. coli. 

      We clarified this by adding the following clause to our manuscript: “we only achieve partial complementation for the dsbA mutant expressing BPS-1m, which we attribute to the fact that expression of this enzyme in E. coli is sub-optimal” (lines 132-134 of the revised manuscript).

      (4) Lines 204-206: "[...]we deleted the principal dsbA gene, dsbA1 (pathogenic bacteria often encode multiple DsbA analogues [24,25]), in several multidrug-resistant (MDR) P. aeruginosa clinical strains (Table S2)". That multiple DsbA analogues are often encoded is good information to provide, but it was unclear from quickly looking at the citations whether Pa is counted among these. Is it expected that all oxidative protein folding in Pa functions through DsbA1? Conveying this information, if possible, may make the impact of the results in this model clearer.

      Thank you for this comment. To address it we added the following text to our manuscript:

      “To determine whether the effects on β-lactam MICs observed in our inducible system (Fig. 1 and [23]) can be reproduced in the presence of other resistance determinants in a natural context with endogenous enzyme expression levels, we deleted the principal dsbA gene, dsbA1, in several multidrug-resistant (MDR) P. aeruginosa clinical strains (Table S2). Pathogenic bacteria often encode multiple DsbA analogues [24,25] and P. aeruginosa is no exception. It encodes two DsbAs, but DsbA1 has been found to catalyze the vast majority of the oxidative protein folding reactions taking place in its cell envelope [26]” (lines 172-178 of the revised manuscript).

      (5) Regarding the clinical Pa isolates G4R7 and G6R7, have the authors performed any phenotypic testing on these strains to identify differences that might explain the substantial difference in piperacillin MIC? I.e., can these isolates be distinguished by growth rate, genetic markers or expression levels, early or late infection, mucoidy, etc.? This is not essential for the current work, but could weigh on the efficacy of this treatment strategy for AIM-1-expressing clinical isolates (e.g., the G4R7 dsbA1 strain exhibits a piperacillin MIC still ~2-fold higher than WT G6R7).

      Thank you for the opportunity to clarify this. For the clinical strains used in our study, we have evaluated their antibiotic resistance profiles, but we have not performed any additional phenotypic characterization. Many factors contribute to differences in antibiotic resistance, ranging from β-lactamase expression levels to organismal effects like the ones mentioned by the reviewer. Such characterization would fall outside the scope of our paper, especially since we sensitize our tested P. aeruginosa clinical isolates to the majority of the β-lactam antibiotics tested.

      We acknowledged this by adding the following sentence to our revised manuscript: 

      “Despite the fact that P. aeruginosa G4R7 dsbA1 was not sensitized for piperacillin-tazobactam, possibly due to the high level of piperacillin-tazobactam resistance of the parent clinical strain, our results across these two isolates show promise for DsbA as a target against β-lactam resistance in P. aeruginosa” (lines 191-194 of the revised manuscript).

      (6) Lines 180-2: "This shows that without their disulfide bonds, these proteins are unstable and are ultimately degraded by other cell envelope proteostasis components [33]". While it is clear that protein is significantly lost in all cases except for BPS-1m in 2A, the dsbA pDM1bla constructs in 2B appear to all retain non-trivial (>10-fold) nitrocefin hydrolysis activity compared to the dsbA pDM1 control. This does not impact the other results in 2B, but it would seem that a loss-of-function folding defect, as described subsequently for BPS-1m, is also part of the explanation for the observed MIC decreases, and this was not necessarily clear from the quoted passage. This could simply be clarified in the final sentence - that both mechanisms are potentially in play - if the authors agree with that interpretation.

      You are correct; thank you for your comment. We amended the text in our revised manuscript as follows:

      “The data presented so far (Fig. 1 and 2) demonstrate that disulfide bond formation is essential for the biogenesis (stability and/or protein folding) and, in turn, activity of an expanded set of clinically important β-lactamases, including enzymes that currently lack inhibitor options” (lines 158-161 of the revised manuscript).

      (7) While it is clear from Figure S2 that the various dsb mutants do not have a general growth defect or collateral sensitivity to another antibiotic, it does not appear that there is an analogous control for the DSB inhibitor demonstrating no growth/toxic effects at the concentration used. This could be provided similarly to Figure S2, using gentamicin as a control antibiotic.

      We have carried out experiments that confirm that our results are due to specific inhibition of the DSB system and not because of off-target effects. In particular, we show that the gentamicin MIC values of S. maltophilia AMM remain unchanged in the presence of the inhibitor and treatment of S. maltophilia AMM dsbA dsbL with the compound does not affect its colistin MIC value (Fig. S2E and lines 317-320 of the revised manuscript).

      (8) Complementation is appropriately provided for experiments with E. coli, but is not provided for P. aeruginosa or S. maltophilia. It should be straightforward to complement in Pa, but this is also probably less critical considering the evidence from E. coli. However, since the Sm mutant targets a gene cluster with two genes, it would seem more imperative to complement this strain. This reviewer is not familiar enough with Sm to know if complementation is routine or feasible with this organism; if not, the controls for the DSB inhibitor should at least be provided.

      As mentioned in our response to comment 7 above, we have carried out experiments that confirm that our DSB inhibitor results are due to specific inhibition of the DSB system and not because of off-target effects.

      Moreover, in response to this comment, we have further demonstrated that our results are due to the specific interaction of DsbA with β-lactamase enzymes by complementing dsbA deletions in representative clinical strains of multidrug-resistant Pseudomonas aeruginosa and extremely-drug-resistant Stenotrophomonas maltophilia. We would like to note here that gene complementation in clinical isolates remains very rare in the literature due to their high levels of resistance and limited genetic tractability. Most of the few complementation examples reported for these two organisms are limited to strains that, although pathogenic, are commonly used in the lab, or to complementation efforts in non-clinical strain systems (for example use of P. aeruginosa PA14 for complementation, instead of the focal clinical isolate).

      We tested three different complementation strategies, two of which ended up being unsuccessful. After approximately 9 months of work, we succeeded in complementing a representative clinical strain for each organism (P. aeruginosa CDC #769 dsbA1 and S. maltophilia AMM dsbA dsbL) by inserting the dsbA1 gene from P. aeruginosa PAO1 into the Tn7 site on the chromosome. Both clinical strains show full complementation for every antibiotic tested; our complementation results can be found in Fig. S2B,D of the revised manuscript.

      The following text was added for P. aeruginosa clinical isolates:

      “We have demonstrated the specific interaction of DsbA with the tested β-lactamase enzymes in our E. coli K-12 inducible system using gentamicin controls (Fig. 1 and File S2A) and gene complementation (Fig. S1). To confirm the specificity of this interaction in P. aeruginosa, we performed representative control experiments in one of our clinical strains, P. aeruginosa CDC #769. We first tested the general ability of P. aeruginosa CDC #769 dsbA1 to resist antibiotic stress by recording MIC values against gentamicin, and found it unchanged compared to its parent (Fig. S2A). Gene complementation in clinical isolates is especially challenging and rarely attempted due to the high levels of resistance and lack of genetic tractability in these strains. Despite these challenges, to further ensure the specificity of the interaction of DsbA with tested β-lactamases in P. aeruginosa, we have complemented dsbA1 from P. aeruginosa PAO1 into P. aeruginosa CDC #769 dsbA1. We found that complementation of dsbA1 restores MICs to wild-type values for both tested β-lactam compounds (Fig. S2B) further demonstrating that our results in P. aeruginosa clinical strains are not confounded by off-target effects” (lines 226-239 of the revised manuscript).

      The following text was added for S. maltophilia clinical isolates: 

      “Since the dsbA and dsbL are organized in a gene cluster in S. maltophilia, we wanted to ensure that our results reported above were exclusively due to disruption of disulfide bond formation in this organism. First, we recorded gentamicin MIC values for S. maltophilia AMM dsbA dsbL and found them to be unchanged compared to the gentamicin MICs of the parent strain (Fig. S2C). This confirms that disruption of disulfide bond formation does not compromise the general ability of this organism to resist antibiotic stress. Next, we complemented S. maltophilia AMM dsbA dsbL. The specific oxidative roles and exact regulation of DsbA and DsbL in S. maltophilia remain unknown. For this reason and considering that genetic manipulation of extremely-drug-resistant organisms is challenging, we used our genetic construct optimized for complementing P. aeruginosa CDC #769 dsbA1 with dsbA1 from P. aeruginosa PAO1 (Fig. S2B) to also complement S. maltophilia AMM dsbA dsbL. We based this approach on the fact that DsbA proteins from one species have been commonly shown to be functional in other species [27-30]. Indeed, we found that complementation of S. maltophilia AMM dsbA dsbL with P. aeruginosa PAO1 dsbA1 restores MICs to wild-type values for both ceftazidime and colistin (Fig. S2D), conclusively demonstrating that our results in S. maltophilia are not confounded by off-target effects” (lines 282-297 of the revised manuscript).

      (9) In Figure 5E, the growth inhibition and loss of Pa CFU in 4 µg/mL ceftazidime for the Sm co-culture condition, which is subsequently lost in the Sm dsbA dsbL co-culture, does not appear to be discussed. As Pa is shown to grow fine in monoculture at this concentration, this result should be discussed in relation to the co-culture dynamics. Is it expected or observed that WT Sm is out-competing Pa under this condition and growing to a high CFU/mL? This would seem to have parallels to citation 49.

      As requested by this reviewer (see comment 10 below), we simultaneously tracked the abundance of P. aeruginosa and S. maltophilia strains in our cross-protection experiment. During this process, we probed the abundances of the two organisms at 4 µg/mL of ceftazidime. Our results can be seen in Fig. S3B of the revised manuscript. The reviewer is correct: these effects are due to competition between P. aeruginosa and S. maltophilia, with the latter reaching very high CFUs at this antibiotic concentration.

      The following text on co-culture dynamics was added to our revised manuscript: 

      “At low antibiotic concentrations, for example 4 μg/mL of ceftazidime, S. maltophilia AMM is fully resistant and thrives, thus outcompeting P. aeruginosa PA14 (dark pink and dark blue bars in Fig. S3B). The same can also be seen in Fig. 4E, where decreased P. aeruginosa PA14 CFUs are recorded. By contrast, S. maltophilia AMM dsbA dsbL already displays decreased growth at 4 μg/mL of ceftazidime because of its non-functional L1-1 enzyme, allowing comparatively higher growth of P. aeruginosa (light pink and light blue bars in Fig. S3B)” (lines 384-390 of the revised manuscript).

      (10) The data presented in Figure 5E would be augmented by the inclusion of, for at least a few representative cases, the Sm CFUs relative to the Pa CFUs. In describing the protective effects of Sm on Pa for imipenem treatment, the authors of citation 12 note that the effect was dependent on Sm cell density. This raises the immediate question of whether the protection observed in this work is similarly dependent on cell density of Sm. It is unclear if the authors expect Sm to persist under these conditions, and it seems Sm CFU should be expected to be relatively high considering it is pre-incubated for 6 hours prior to the assay. What is the physiological state of these cells, and how are they affected by ceftazidime? While many other variables are likely relevant to the translation of this protection, the relative abundance and localization of Sm and Pa commonly observed in CF patients, as well as the effective concentration of antibiotic observed in vivo, are likely worth considering.

      As mentioned in our response to comment 9 above, we have simultaneously tracked the abundance of P. aeruginosa and S. maltophilia strains in our cross-protection experiment for select antibiotic concentrations. To be able to perform this experiment, we had to label two extremely drug-resistant strains of S. maltophilia with an antibiotic resistance marker that allowed us to quantify them in mixtures with P. aeruginosa. Our results can be found in Fig. S3 of our revised manuscript and, in a nutshell, show that ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia.

      The following text was added to address the questions of the reviewer:

      “Due to the naturally different growth rates of these two species (S. maltophilia grows much more slowly than P. aeruginosa, especially in laboratory conditions), the protocol we followed [1] requires S. maltophilia to be grown for 6 hours prior to co-culturing it with P. aeruginosa. To verify that, at this point in the experiment, our two S. maltophilia strains (with and without dsbA) had grown comparably, we determined their cell densities (Fig. S3A). We found that S. maltophilia AMM dsbA dsbL had grown to a similar level as the wild-type strain, and both were at a higher cell density [~10⁷ colony forming units (CFUs)] than the P. aeruginosa PA14 inoculum (5 × 10⁴ CFUs)” (lines 353-361 of the revised manuscript).
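      For scale, the approximate densities quoted above imply that S. maltophilia outnumbered P. aeruginosa PA14 by roughly two orders of magnitude at the start of co-culture (N denotes CFUs; the ~10⁷ value is taken at face value):

```latex
\frac{N_{\mathrm{Sm}}}{N_{\mathrm{Pa}}} \approx \frac{10^{7}\,\mathrm{CFU}}{5 \times 10^{4}\,\mathrm{CFU}} = 200
```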

      “To determine whether ceftazidime treatment leads to eradication of both P. aeruginosa and S. maltophilia when disulfide bond formation is impaired in S. maltophilia, we monitored the abundance of both strains in each synthetic community for select antibiotic concentrations (Fig. S3B). In this experiment, we largely observed the same trends as in Fig. 4E. At low antibiotic concentrations, for example 4 μg/mL of ceftazidime, S. maltophilia AMM is fully resistant and thrives, thus outcompeting P. aeruginosa PA14 (dark pink and dark blue bars in Fig. S3B). The same can also be seen in Fig. 4E, where decreased P. aeruginosa PA14 CFUs are recorded. By contrast, S. maltophilia AMM dsbA dsbL already displays decreased growth at 4 μg/mL of ceftazidime because of its non-functional L1-1 enzyme, allowing comparatively higher growth of P. aeruginosa (light pink and light blue bars in Fig. S3B). Despite the competition between the two strains, P. aeruginosa PA14 benefits from S. maltophilia AMM’s high hydrolytic activity against ceftazidime, which allows it to survive and grow at high antibiotic concentrations even though it is not resistant (see 128 μg/mL; dark pink and dark blue bars in Fig. S3B). In stark contrast, in S. maltophilia AMM dsbA dsbL, L1-1 lacks its disulfide bond and cannot confer resistance to ceftazidime, resulting in killing of S. maltophilia AMM dsbA dsbL and, consequently, also of P. aeruginosa PA14 (see 128 μg/mL; light pink and light blue bars in Fig. S3B).

      The data presented here show that, at least under laboratory conditions, targeting protein homeostasis pathways in specific recalcitrant pathogens has the potential not only to alter their own antibiotic resistance profiles (Fig. 3 and 4A-D), but also to influence the antibiotic susceptibility profiles of other bacteria that co-occur in the same conditions (Fig. 5). Admittedly, the conditions in a living host are too complex to draw direct conclusions from this experiment. That said, our results show promise for infections where pathogen interactions affect treatment outcomes and where inhibiting these interactions might facilitate treatment” (lines 381-406 of the revised manuscript).

      (11) Regarding the role of microbial interactions in CF and other disease/infection contexts, the authors should temper their descriptions in accordance with the citations provided. As an example, lines 96-99: "For example, in the CF lung, highly drug-resistant S. maltophilia strains actively protect susceptible P. aeruginosa from β-lactam antibiotics [12], and ultimately facilitate the evolution of β-lactam resistance in P. aeruginosa [14]."

      Neither citation provided here attests to Sm protection of Pa "in the CF lung". Both papers use a simplified in vitro co-culture model to assess Sm protection of Pa from antibiotics and the evolution of Pa antibiotic resistance in the presence or absence of Sm, respectively. In the latter case, it should also be noted that while the authors observed somewhat faster Pa resistance evolution in one co-culture condition, they did not observe it in the other, and that resistance evolution in general was observed regardless of co-culture condition. There are also statements in the ultimate and penultimate paragraphs of the Discussion section that repeat these points. The authors could re-frame this aspect of their investigation as part of a working hypothesis related to potential interactions of these pathogens, and should appropriately caveat what is and is not known from in vitro and in vivo/clinical work.

      Thank you for your comment. You are entirely correct. We have amended the text throughout our revised manuscript to avoid overstating these findings and to be clear about the fact that they originate from experimental studies. Please find below representative examples of such passages:

      “In particular, some antibiotic resistance proteins, like β-lactamases, which decrease the quantities of active drug present, function akin to common goods, since their benefits are not limited to the pathogen that produces them but can be shared with the rest of the bacterial community. This means that their activity enables pathogen cross-resistance when multiple species are present [1,31], something that was demonstrated in recent work investigating the interactions between pathogens that naturally co-exist in CF infections. More specifically, it was shown that in laboratory co-culture conditions, highly drug-resistant S. maltophilia strains actively protect susceptible P. aeruginosa from β-lactam antibiotics [1]. Moreover, this cross-protection was found to facilitate, at least under specific conditions, the evolution of β-lactam resistance in P. aeruginosa [32]” (lines 47-57 of the revised manuscript).

      “The antibiotic resistance mechanisms of S. maltophilia impact the antibiotic tolerance profiles of other organisms that are found in the same infection environment. S. maltophilia hydrolyses all β-lactam drugs through the action of its L1 and L2 β-lactamases [7,8]. In doing so, it has been experimentally shown to protect other pathogens that are, in principle, susceptible to treatment, such as P. aeruginosa [1]. This protection, in turn, allows active growth of otherwise treatable P. aeruginosa in the presence of complex β-lactams, like imipenem [1], and, at least in some conditions, increases the rate of resistance evolution of P. aeruginosa against these antibiotics [32]” (lines 332-340 of the revised manuscript).

      (12) Regarding the role of S. maltophilia in CF disease, the authors should either discuss clinical associations more completely or note the conflicting data on its role in disease. As an example, lines 84-87: "As a result, the standard treatment option, i.e., broad-spectrum β-lactam antibiotic therapy, constitutes a severe risk for CF patients carrying both P. aeruginosa and S. maltophilia [10,11], creating an urgent need for antimicrobial approaches that will be effective in eliminating both pathogens."

      It is unclear how this treatment results in a "severe risk" for CF patients colonized by both Sm and Pa. Citation 10 suggests an association between anti-pseudomonal antibiotic use and increased prevalence of Sm, but neither citation supports a worsening clinical outcome from this treatment. Citation 10 further notes that clinical scores between Sm-positive and control cohorts could not be distinguished statistically. Citation 11 is a review that notes this conflicting data regarding Sm, including a reference to a more recent (at the time) result using multivariate analysis showing no independent effect of Sm on survival.

      The above point similarly applies to other statements in the manuscript, for example at lines 266-267: "Considering the contribution of S. maltophilia strains to treatment failure in CF lung infections [8,10,11][...]", as well as at lines 79-80: "Pulmonary exacerbations and severe disease states are also associated with the presence of S. maltophilia [8]".

      Again, the provided citations do not support the implication that Sm specifically 'contributes to treatment failure in CF lung infections' or that Sm is specifically associated with severe disease states. In addition to the previously discussed citations, citation 8 describes broad "pulmotypes" composed of 10 species/genera that could be associated with particular clinical (e.g., exacerbation) or treatment (e.g., antibiotic therapy) characteristics, but these cannot, without further analysis, be associated with, or causally linked to, a specific pathogen. While pulmotype 2 in citation 8 was associated with a more severe clinical state and appeared to have the highest relative abundance of Sm compared to other pulmotypes, Sm was not identified (Figure 4A) as an independent factor that distinguishes between moderate and severe disease, unlike Pa and some anaerobes (4F-H). The authors also observed that decreasing relative abundance of Pa, in particular, is correlated with subsequent exacerbation, but did not correlate this with the presence of any other species or genera. Again, this should be re-framed with the appropriate caveat that this is a hypothesis with possible clinical significance.

      Several suggested papers are included below on Sm association with clinical characteristics to incorporate into the manuscript if the authors choose to do so:

      https://doi.org/10.1177/14782715221088909

      https://doi.org/10.1016/j.prrv.2010.07.003

      https://doi.org/10.1016/j.jcf.2013.05.009

      https://doi.org/10.1002/ppul.23943

      https://doi.org/10.1002/14651858.CD005405.pub2

      https://doi.org/10.1164/rccm.2109078

      http://dx.doi.org/10.1136/thx.2003.017707

      https://erj.ersjournals.com/content/23/1/98.short

      Thank you for your comment. You are entirely correct. We have amended the text throughout our revised manuscript to avoid overstating the role of S. maltophilia in CF infections and to reference additional relevant works in the literature. Please find below representative examples of such passages:

      “On the other hand, CF microbiomes are increasingly found to encompass S. maltophilia [2-4], a globally distributed opportunistic pathogen that causes serious nosocomial respiratory and bloodstream infections [5-7]. S. maltophilia is one of the most prevalent emerging pathogens [6] and it is intrinsically resistant to almost all antibiotics, including β-lactams like penicillins, cephalosporins and carbapenems, as well as macrolides, fluoroquinolones, aminoglycosides, chloramphenicol, tetracyclines and colistin. As a result, the standard treatment option for lung infections, i.e., broad-spectrum β-lactam antibiotic therapy, is rarely successful in countering S. maltophilia [7,8], creating a definitive need for approaches that will be effective in eliminating both pathogens” (lines 33-41 of the revised manuscript).

      “Of the organisms studied in this work, S. maltophilia deserves further discussion because of its unique intrinsic resistance profile. The prognosis of CF patients with S. maltophilia lung carriage is still debated [4,9-16], largely because studies with extensive and well-controlled patient cohorts are lacking. This notwithstanding, the therapeutic options against this pathogen are currently limited to one non-β-lactam antibiotic-adjuvant combination, trimethoprim-sulfamethoxazole [17-20], which is not always effective, and a few last-line β-lactam drugs, like the fifth-generation cephalosporin cefiderocol and the combination aztreonam-avibactam. Resistance to commonly used antibiotics causes many problems during treatment and, as a result, infections that harbor S. maltophilia have high case fatality rates [7]. This is not limited to CF patients, as S. maltophilia is a major cause of death in children with bacteremia [5]” (lines 440-450 of the revised manuscript).

      Reviewer #3 (Recommendation For the Authors):

      (1) The referencing of supplemental figures does not follow a sequential order. For example, Figure S2 appears in the text before S1. The sequential ordering of figure numbers improves the readability and can be considered while editing the manuscript for revision.

      Thank you for this comment. This is amended in our revised manuscript and supplemental figures and files are cited in order.

      (2) It will be useful to provide a brief description of Ambler classes, since these are important to the study design (for a broader audience).

      Thank you for this suggestion. This has been added and can be found in lines 91-101 of the revised manuscript.

      (3) The rationale for using the K-12 strain of E. coli should be provided. It appears that this is a model system that is well established in their lab, but a scientific rationale can be listed. Maybe this strain does not have any lactamases in its genome other than the one being expressed, as compared to pathogenic E. coli?

      Thank you for this suggestion. This has been added and can be found in lines 104-106 of the revised manuscript.

      (4) The authors used a worm model to test their observations, which is relevant. Given the significant implications of their work in overcoming resistance to clinically used antibiotics and the availability of already generated dsbA mutants in clinical strains, it will be useful to investigate survival in animal models or at least wound models of Pseudomonas infections. The reviewer does not deem this necessary, but it will significantly increase the impact of their seminal work.

      Thank you for this comment. We appreciate the sentiment, and we would have liked to be able to perform experiments in a murine model of infection. Several factors made this impossible, and as a result we used G. mellonella as an informative preliminary in vivo infection model. The DSB proteins have been shown to play a central role in bacterial virulence. Because of this, our P. aeruginosa and S. maltophilia mutant strains are not efficient in establishing an infection, even in a wound model. This could be overcome had we been able to use the chemical inhibitor of the DSB system in vivo; however, this is not possible either, because the chemical compound that we use to inhibit the function of DsbA acts on DsbB. Inhibition of DsbB blocks the re-oxidation of DsbA and leads to its accumulation in its inactive reduced form. However, the action of the inhibitor can be bypassed through re-oxidation and re-activation of DsbA by small-molecule oxidants such as L-cystine, which are abundant in rich growth media or animal tissues. This makes the inhibitor suitable only for in vitro assays that can be performed in minimal media, where the presence of small-molecule oxidants can be strictly avoided, but entirely unsuitable for an insect or a vertebrate animal model.

    1. eLife Assessment

      This work represents an important contribution to our understanding of how membrane energetics influence protein conformation and function in mechano-sensitive channels. Through extensive molecular dynamics simulations and energetic analysis, the study convincingly demonstrates how the channel structure is shaped by a balance of protein and membrane-induced forces, effectively reconciling experimental data from different membrane environments. This work will appeal broadly to researchers and readers with interests in ion channel structure and function, mechanosensation, and membrane biophysics.

    2. Reviewer #1 (Public review):

      Dixit, Noe, and Weikl apply coarse-grained and all-atom molecular dynamics to determine the response of the mechanosensitive proteins Piezo 1 and Piezo 2 to tension. Cryo-EM structures in micelles show a high curvature of the protein whereas structures in lipid bilayers show lower curvature. Is the zero-stress state of the protein closer to the micelle structure or the bilayer structure? Moreover, while the tension sensitivity of channel function can be inferred from experiment, molecular details are not clearly available. How much do the protein's height and effective area change in response to tension? With these in hand, a quantitative model of its function follows that can be related to the properties of the membrane and the effect of external forces.

      Simulations indicate that in a bilayer the protein relaxes from the highly curved cryo-EM dome (Figure 1).

      Under applied tension the dome flattens (Figure 2) including the underlying lipid bilayer. The shape of the system is a combination of the membrane mechanical and protein conformational energies (Eq. 1). The membrane mechanical energy is well-characterized. It requires only the curvature and bending modulus as inputs. They determine membrane curvature and the local area metric (Eq. 4) by averaging the height on a grid and computing second derivatives (Eqs. 7, 8) consistent with known differential geometric formulas.
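      A minimal sketch of this kind of calculation, assuming the standard Monge-gauge formulas for the mean curvature M and the area element √(1 + |∇h|²) of a height field h(x, y) sampled on a square grid with spacing a, zero spontaneous curvature (so the Helfrich energy is 2κ∫M² dA), and NumPy central differences (`np.gradient`). The function names are hypothetical, and the manuscript's own Eqs. 4, 7 and 8 are not reproduced in this review, so the authors' implementation may differ in detail.

```python
import numpy as np


def monge_curvature(h, a):
    """Mean curvature M and area element of a height field h[y, x] (Monge gauge).

    h is sampled on a square lattice with spacing a (e.g. nm); derivatives are
    central finite differences from np.gradient (one-sided at the edges).
    """
    hy, hx = np.gradient(h, a)      # axis 0 -> y, axis 1 -> x
    hxy, hxx = np.gradient(hx, a)   # derivatives of h_x along y and x
    hyy, _ = np.gradient(hy, a)
    g = 1.0 + hx**2 + hy**2         # determinant of the induced metric
    M = ((1 + hy**2) * hxx - 2 * hx * hy * hxy + (1 + hx**2) * hyy) / (2 * g**1.5)
    return M, np.sqrt(g)


def bending_energy(h, a, kappa=1.0):
    """Helfrich energy 2*kappa*sum(M^2 dA), assuming zero spontaneous curvature."""
    M, area_element = monge_curvature(h, a)
    return 2.0 * kappa * np.sum(M**2 * area_element) * a**2


def excess_area(h, a):
    """Curved membrane area minus projected area: sum((sqrt(g) - 1) a^2)."""
    _, area_element = monge_curvature(h, a)
    return np.sum(area_element - 1.0) * a**2
```

      For a shallow, hypothetical Gaussian bump h = A exp(-r²/2σ²), these functions should approach the small-slope results ΔA ≈ πA²/2 and E ≈ πκA²/σ²; at finite a, the central differences slightly underestimate both.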

      While I am still generally critical of precise estimates of the energy from simulated membrane shapes (after all, it is not trivial to precisely determine even the bending modulus from a simulation), with their revision the authors have convinced me that their estimate is a high-quality one, without obvious issues. Although there appears to have been a miscommunication about whether to refine or coarsen the discretization, the authors have tried two grid spacings and obtained a similar deformation energy, which addresses my concern. Furthermore, they have computed the energy from drastically simplified, smoothened two-dimensional curvature profiles and obtained a similar value.

      In summary, this paper uses molecular dynamics simulations to quantify the force of the Piezo 1 and Piezo 2 proteins on a lipid bilayer using simulations under controlled tension, observing the membrane deformation, and using that data to infer protein mechanics. While much of the physical mechanism was previously known, the study itself is a valuable quantification.

    3. Reviewer #2 (Public review):

      Summary:

      In this study the authors suggest that the structure of Piezo2 in a tensionless simulation is flatter than the electron microscopy structure. This is an interesting observation and highlights the fact that the membrane environment is important for Piezo2 curvature. Additionally, the authors calculate the excess area of Piezo2 and Piezo1, suggesting that it is significantly smaller than the area calculated using the EM structure or simulations with restrained Piezo2. Finally, the authors propose an elastic model for Piezo proteins. These are very important findings, which would be of interest to the mechanobiology field.

      Whilst I like the suggestion that the membrane environment will change Piezo2 flatness, could this be happening because of the lower resolution of the MARTINI simulations? In other words, would it be possible that MARTINI is not able to model such curvature due to its lower resolution?

      Related to my comment above, the authors say that they only restrained the secondary structure using an elastic network model. Whilst I understand why they did this, Piezo proteins are relatively large. How can the authors be sure that restraints from this type of elastic network model, combined with the fact that MARTINI simulations are perhaps not very accurate in predicting protein conformations, can accurately represent the changes that occur within the Piezo channel under membrane tension?

      Modelling of Piezo1 seems to be based on homology to Piezo2. However, the authors need to further evaluate their model, e.g. by comparing it with an AlphaFold model.

      To calculate the tension-induced flattening of the Piezo channel, the authors "divide all simulation trajectories into 5 equal intervals and determine the nanodome shape in each interval by averaging over the conformations of all independent simulation runs in this interval." However, the flattening of the Piezo channel probably happens very quickly during the simulations, possibly within a single interval. Is this the case? If so, does this affect their calculations?
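      The interval averaging quoted by the reviewer can be sketched as follows. The sketch assumes that the height maps of all runs have already been rotated to a common protein orientation (as the authors describe for their averaging) and stored in an array of shape (runs, frames, ny, nx); the array layout and the function name are hypothetical. Tracking, for example, the apex height across successive interval averages is one way to see whether flattening is already complete within the first interval.

```python
import numpy as np


def interval_averages(heights, n_intervals=5):
    """Average aligned height maps over all runs within each of n_intervals
    consecutive time intervals.

    heights: array of shape (n_runs, n_frames, ny, nx).
    Returns an array of shape (n_intervals, ny, nx). If n_frames is not a
    multiple of n_intervals, interval lengths differ by at most one frame.
    """
    chunks = np.array_split(heights, n_intervals, axis=1)   # split along frames
    return np.stack([c.mean(axis=(0, 1)) for c in chunks])  # mean over runs and frames


# Example with synthetic data: 4 runs, 500 frames, 40 x 40 grid.
heights = np.random.default_rng(0).normal(size=(4, 500, 40, 40))
shapes = interval_averages(heights)
apex = shapes[:, 20, 20]  # hypothetical apex grid point, one value per interval
```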

      Finally, the authors use a specific lipid composition, which is asymmetric. Is it possible that the asymmetry of the membrane causes some of the changes in the curvature that they observe? Perhaps more controls, e.g. with a symmetric POPC bilayer, are needed to identify whether membrane asymmetry plays a role in the membrane curvature they observe.

    4. Reviewer #3 (Public review):

      Strengths:

      This work focuses on a problem of deep significance: quantifying the structure-tension relationship and underlying mechanism for the mechanosensitive Piezo 1 and 2 channels. Such an objective is challenging for molecular dynamics simulations, due to the relatively large size of each membrane-protein system. Nonetheless, the approach chosen here is based on methodology that is, in principle, established and widely accessible. Therefore, another group of practitioners would likely be able to reproduce these findings with reasonable effort.

      More specifically, while acknowledging the limitations of the MARTINI force field, this work makes a significant improvement compared to previous simulations of Piezo proteins by adopting a range of membrane tensions that includes physiologically relevant values (below 10 mN/m).

      Weaknesses:

      The two main results of this paper are (1) that both channels exhibit a flatter structure compared to cryo-EM measurements, and (2) their estimated force vs. displacement relationship. Although the former correlates at least quantitatively with prior experimental work, the latter relies exclusively on simulation results and model parameters.

      My remaining technical concerns in the revised manuscript are as follows:

      (1) At each membrane tension, all concurrent atomistic simulations were initialized from the same snapshot of a previous CG simulation: in my opinion, it is inaccurate to refer to those atomistic simulations as "independent" from each other (as is done twice in the caption of Figure 3, as well as in the text).

      (2) Continuum mechanics calculations were employed to model the membrane's curvature energetics. The bending modulus, κ, was not determined for the specific lipid composition used in this study, but was instead taken from previous MARTINI simulations involving the same primary lipid, POPC. Given that these calculations are intended to describe MARTINI simulations specifically, this approximation may be acceptable. However, it does not account for the increased stiffness observed in POPC/cholesterol mixtures (an effect measured experimentally but not reproduced by the MARTINI model), nor does it reflect the asymmetric conditions, as all referenced simulations involve symmetric bilayers. As a result, the bending energies and forces shown in Figure 5(c,d) are internally consistent within the model, but they probably correspond to real values up to an unknown multiplicative factor.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Dixit, Noe, and Weikl apply coarse-grained and all-atom molecular dynamics to determine the response of the mechanosensitive proteins Piezo 1 and Piezo 2 to tension. Cryo-EM structures in micelles show a high curvature of the protein whereas structures in lipid bilayers show lower curvature. Is the zero-stress state of the protein closer to the micelle structure or the bilayer structure? Moreover, while the tension sensitivity of channel function can be inferred from the experiment, molecular details are not clearly available. How much do the protein's height and effective area change in response to tension? With these in hand, a quantitative model of its function follows that can be related to the properties of the membrane and the effect of external forces. 

      Simulations indicate that in a bilayer the protein relaxes from the highly curved cryo-EM dome (Figure 1). 

      Under applied tension, the dome flattens (Figure 2) including the underlying lipid bilayer. The shape of the system is a combination of the membrane mechanical and protein conformational energies (Equation 1). The membrane's mechanical energy is well-characterized. It requires only the curvature and bending modulus as inputs. They determine membrane curvature and the local area metric (Equation 4) by averaging the height on a grid and computing second derivatives (Equations 7, 8) consistent with known differential geometric formulas. 

      The bending energy can be limited to the nanodome, but this implies that the noise in the membrane energy is significant. Where there is noise outside the dome, there is noise inside the dome. At the least, they could characterize the noisy energy due to inadequate averaging of membrane shape. 

      My concern for this paper is that they are significantly overestimating the membrane deformation energy based on their numerical scheme, which in turn leads to a much stiffer model of the protein itself.

      We agree that “thermal noise” is intrinsic to MD simulations, as in “real” systems, leading to thermally excited shape fluctuations of membranes and conformational fluctuations of proteins. However, for our coarse-grained simulations, the thermally excited membrane shape fluctuations can be averaged out quite well, and the resulting average shapes are smooth, see e.g. the shapes and lines of the contour plots in Fig. 1 and 2. For our atomistic simulations, the averaged shapes are not as smooth, see Fig. 3a and the lines of the contour plots in Fig. 3b. Therefore, we do not report bending energies for the nanodome shapes determined from atomistic simulations, because bending energy calculations are sensitive to remaining “noise” on small scales (due to the scale invariance of the bending energy), in contrast to calculations of excess areas, as we now state on lines 620ff.

      For our coarse-grained simulations, we now corroborate our bending energy calculations based on averaged 3d shapes by comparing to bending energy values obtained from highly smoothened 2d mean curvature profiles (see Fig. 1c for mean curvature profiles in tensionless membranes). We discuss this in detail from line 323 on, starting with:

      “To corroborate our bending energy calculations for these averaged three-dimensional nanodome shapes, we note that essentially identical bending energies can be obtained from the highly smoothened mean curvatures M of the two-dimensional membrane profiles. …”

      Two things would address this: 

      (1) Report the membrane energy under different graining schemes (e.g., report schemes up to double the discretization grain). 

      There are two graining schemes in the modeling, and we have followed the reviewer’s recommendation regarding the second scheme. In the first, more central graining scheme, we use square membrane patches with a sidelength of about 2 nm to determine membrane midplane shapes and lipid densities of each simulation conformation. This graining scheme has also been previously employed in Hu, Lipowsky, Weikl, PNAS 38, 15283 (2013) to determine the shape and thermal roughness of coarse-grained membranes. A sidelength of 2 nm is necessary to have sufficiently many lipid headgroups in the upper and lower leaflet in the membrane patches for estimating the local height of these leaflets, and the local membrane midplane height as the average of these leaflet heights (see subsection “Membrane shape of simulation conformation” in the Methods section for details). However, we strongly believe that doubling the sidelength of membrane patches in this discretization is not an option, because a discretization length of 4 nm is too coarse to resolve the membrane deformations in the nanodome, see e.g. the profiles in Fig. 1b. Moreover, any “noise” from this discretization is essentially completely smoothened out in the averaging process used in the analysis of the membrane shapes, at least for the coarse-grained simulations. This averaging process requires rotations of membrane conformations to align the protein orientations of the conformations (see subsection “Average membrane shapes and lipid densities” for details). Because of these rotations, the original discretization is “lost” in the averaging, and a continuous membrane shape is generated. To calculate the excess areas and bending energies for this smooth, continuous membrane shape, we use a discretization of the Monge plane into a square lattice with lattice parameter 1 nm.
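      The first graining scheme described here can be sketched as follows: headgroups are binned into square patches of side ≈2 nm, the mean headgroup height of each leaflet is computed per patch, and the midplane height is the average of the two leaflet heights. This is a simplified illustration, not the code behind the Methods section: it assumes that headgroup coordinates and leaflet assignments are already available, ignores periodic images, and uses a hypothetical function name.

```python
import numpy as np


def midplane_height(xy, z, upper, box, patch=2.0):
    """Estimate the membrane midplane height on square patches of side `patch`.

    xy: (N, 2) lateral headgroup positions in nm, inside [0, Lx] x [0, Ly].
    z: (N,) headgroup heights in nm.
    upper: (N,) boolean array, True for upper-leaflet headgroups.
    box: (Lx, Ly) lateral box size in nm.
    Returns an (nx, ny) array indexed [ix, iy]; patches that lack headgroups in
    either leaflet are NaN. Headgroups beyond the last full patch are dropped.
    """
    nx, ny = int(box[0] // patch), int(box[1] // patch)
    edges = [np.linspace(0.0, nx * patch, nx + 1), np.linspace(0.0, ny * patch, ny + 1)]

    def leaflet_mean(mask):
        # Sum of heights and headgroup count per patch; their ratio is the mean.
        s, _, _ = np.histogram2d(xy[mask, 0], xy[mask, 1], bins=edges, weights=z[mask])
        n, _, _ = np.histogram2d(xy[mask, 0], xy[mask, 1], bins=edges)
        with np.errstate(invalid="ignore", divide="ignore"):
            return s / n  # 0/0 -> NaN for empty patches

    # Midplane = average of the upper- and lower-leaflet heights.
    return 0.5 * (leaflet_mean(upper) + leaflet_mean(~upper))
```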

      In response to the referee’s suggestion, we now report that the results for the excess area do not change significantly when this lattice parameter is doubled to 2 nm. On line 597, we write:

      “For a lattice constant of a=2 nm, we obtain extrapolated values of the excess area Delta A from the coarse-grained simulations that are 2 to 3% lower than the values for a=1 nm, which is small compared to statistical uncertainties with relative errors of around 10%.”

      On lines 614ff, we now state that the bending energies are about 10% to 13% lower for a=2 nm, likely because the curvature in the nanodome is resolved less finely than for a=1 nm, rather than because of incomplete averaging and remaining roughness of the coarse-grained nanodome shapes.
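
      To illustrate how the lattice constant enters such a calculation, here is a minimal Python/NumPy sketch, not the authors’ implementation: it evaluates the Helfrich bending energy κ/2 ∫(2H)² dA of a Gaussian bump with central finite differences and the Monge-gauge expressions for the mean curvature and area element, for a = 1 nm and a = 2 nm. The bump amplitude and width (1 nm and 6 nm), the box size and the function names are assumptions chosen for illustration. In the small-slope limit, the continuum energy of a Gaussian bump h = A exp(−r²/2σ²) is πκA²/σ².

```python
import numpy as np

def helfrich_energy(h, a, kappa=1.0):
    """Bending energy of a height field h[y, x] on a square lattice with spacing a.

    Central finite differences for all derivatives, full Monge-gauge mean curvature
    and area element; only interior lattice sites are summed.
    """
    c = h[1:-1, 1:-1]
    hx = (h[1:-1, 2:] - h[1:-1, :-2]) / (2 * a)
    hy = (h[2:, 1:-1] - h[:-2, 1:-1]) / (2 * a)
    hxx = (h[1:-1, 2:] - 2 * c + h[1:-1, :-2]) / a**2
    hyy = (h[2:, 1:-1] - 2 * c + h[:-2, 1:-1]) / a**2
    hxy = (h[2:, 2:] - h[2:, :-2] - h[:-2, 2:] + h[:-2, :-2]) / (4 * a**2)
    g = 1.0 + hx**2 + hy**2  # determinant of the metric
    two_H = ((1 + hy**2) * hxx - 2 * hx * hy * hxy + (1 + hx**2) * hyy) / g**1.5
    return 0.5 * kappa * np.sum(two_H**2 * np.sqrt(g)) * a**2

def gaussian_bump(a, amplitude=1.0, sigma=6.0, half_width=30.0):
    """Hypothetical bump sampled on a square lattice with spacing a (all lengths in nm)."""
    x = np.arange(-half_width, half_width + 1e-9, a)
    X, Y = np.meshgrid(x, x)
    return amplitude * np.exp(-(X**2 + Y**2) / (2 * sigma**2))

amplitude, sigma = 1.0, 6.0                      # hypothetical bump, nm
analytic = np.pi * amplitude**2 / sigma**2       # small-slope continuum value, units of kappa
energies = {a: helfrich_energy(gaussian_bump(a, amplitude, sigma), a) for a in (1.0, 2.0)}
for a, e in energies.items():
    print(f"a = {a:.0f} nm: E = {e:.4f} kappa ({100 * (e / analytic - 1):+.1f}% vs analytic)")
```

In this sketch, the coarser lattice yields a lower energy because the three-point second-derivative stencil underestimates the curvature of the shorter-wavelength components of the shape.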

      (2) For a Gaussian bump with sigma=6 nm I obtained a bending energy of 0.6 kappa, so certainly in the ballpark of what they are reporting but significantly lower (compared to 2 kappa, Figure 5 lower left). It would be simpler, and I would argue more accurate, to use the Gaussian approximation to their curves in Figure 3, especially since they have not reported the variation of the membrane energy with respect to the discretization size, so I cannot judge the dependence of the energy on discretization. I view reporting the variation of the membrane energy with respect to discretization as essential for the analysis if their goal is to provide a quantitative estimate for the force of Piezo. The Helfrich energy computed from an analytical model with a membrane shape closely resembling the simulated shapes would be very helpful. According to my intuition, finite-difference estimates of curvatures will tend to overestimate the true membrane deformation energy, because white noise tends to lead to high curvature at short length scales, which is strongly penalized by the bending energy.

      Instead of using Gaussian bumps, we now also calculate the membrane bending energy from the two-dimensional, continuous mean curvature profiles (see Fig. 1c). These mean curvature profiles are highly smoothed (see figure caption for details). Nonetheless, we obtain essentially the same bending energies as in our discrete calculations for the averaged, smoothed three-dimensional membrane shapes, see new text on lines 326ff. We believe that this agreement corroborates our bending energy calculations. We still focus on the values obtained for the three-dimensional membrane shapes because these shapes are not fully rotationally symmetric: they exhibit variations with the three-fold symmetry of the Piezo proteins, see Figure 2a and b.
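
      For a rotationally symmetric profile h(r), the bending energy can be computed directly from the radial mean curvature profile. The sketch below (Python/NumPy) uses a hypothetical Gaussian profile rather than the simulated nanodome profiles, and compares the exact surface-of-revolution expression for 2H with the small-slope approximation 2H ≈ h'' + h'/r; for the Gaussian, the latter should reproduce πκA²/σ².

```python
import numpy as np

def trapezoid(f, x):
    """Trapezoidal rule, written out to avoid depending on a specific NumPy version."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def axisymmetric_bending_energy(r, h, kappa=1.0, linear=False):
    """Helfrich energy kappa/2 * integral of (2H)^2 dA for a rotationally symmetric profile h(r)."""
    hp = np.gradient(h, r, edge_order=2)
    hpp = np.gradient(hp, r, edge_order=2)
    if linear:  # small-slope (Monge) approximation
        two_H = hpp + hp / r
        dA = 2 * np.pi * r
    else:       # exact mean curvature and area element of the surface of revolution z = h(r)
        s = 1.0 + hp**2
        two_H = hpp / s**1.5 + hp / (r * np.sqrt(s))
        dA = 2 * np.pi * r * np.sqrt(s)
    return 0.5 * kappa * trapezoid(two_H**2 * dA, r)

amplitude, sigma = 1.0, 6.0                 # hypothetical profile, nm
r = np.linspace(0.005, 40.0, 8000)          # start just above r = 0 to avoid h'/r at the axis
h = amplitude * np.exp(-r**2 / (2 * sigma**2))
analytic = np.pi * amplitude**2 / sigma**2  # small-slope result, units of kappa
E_lin = axisymmetric_bending_energy(r, h, linear=True)
E_full = axisymmetric_bending_energy(r, h)
print(E_lin, E_full, analytic)
```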

      We agree that the bending energy of thermally rough membranes depends on the discretization scheme, because the discretization length of any such scheme sets a cut-off length for fluctuation modes in a Fourier analysis. But again, we average out the thermal noise, for reasons given in the Results section, and analyse smooth membrane shapes.
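
      Both the reviewer’s point about white noise and the role of averaging can be illustrated with a minimal sketch under explicit assumptions: independent Gaussian height noise of 0.05 nm per lattice site is added to a hypothetical Gaussian bump (amplitude 1 nm, σ = 6 nm) on a 1 nm lattice, and the small-slope energy κ/2 Σ(∇²h)² a² is evaluated for a single noisy conformation and for the average of 400 such conformations. For white noise of standard deviation s, the five-point Laplacian has variance 20s²/a⁴ per site, so the noise adds about 10κNs²/a² for N interior sites, and this contribution drops roughly as 1/M when M conformations are averaged. The rotational alignment used in the actual averaging is not modeled here.

```python
import numpy as np

def laplacian_energy(h, a, kappa=1.0):
    """Small-slope bending energy kappa/2 * sum (discrete Laplacian of h)^2 * a^2 over interior sites."""
    lap = (h[1:-1, 2:] + h[1:-1, :-2] + h[2:, 1:-1] + h[:-2, 1:-1] - 4 * h[1:-1, 1:-1]) / a**2
    return 0.5 * kappa * np.sum(lap**2) * a**2

a, sigma, amplitude = 1.0, 6.0, 1.0   # hypothetical lattice constant and bump, nm
noise_std, n_conf = 0.05, 400          # hypothetical per-site height noise (nm) and number of conformations
x = np.arange(-30.0, 30.0 + 1e-9, a)
X, Y = np.meshgrid(x, x)
clean = amplitude * np.exp(-(X**2 + Y**2) / (2 * sigma**2))

rng = np.random.default_rng(0)
confs = clean + rng.normal(0.0, noise_std, size=(n_conf, *clean.shape))

E_clean = laplacian_energy(clean, a)
E_single = laplacian_energy(confs[0], a)          # one rough conformation
E_avg = laplacian_energy(confs.mean(axis=0), a)   # energy of the averaged shape
n_interior = (len(x) - 2) ** 2
# per-site Laplacian variance 20 s^2 / a^4, times kappa/2 * a^2 per site
E_noise_expected = 10 * n_interior * noise_std**2 / a**2
print(E_clean, E_single, E_avg, E_noise_expected)
```

Because the noise term grows rapidly as the lattice constant is reduced, the energy of a rough, non-averaged shape would depend strongly on the discretization, whereas the energy of the averaged shape approaches that of the underlying smooth bump.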

      The fitting of the system deformation to the inverse time appears to be incredibly ad hoc ... Nor is it clear that the quantified model will be substantially changed without extrapolation. The authors should either justify the extrapolation more clearly (sorry if I missed it!) or also report the unextrapolated numbers alongside the extrapolated ones. 

      We report the values of the excess area and bending energy in the different time intervals of our analysis as data points in Fig. 4 and its figure supplement. We find it important to report the time dependence of these quantities, because the intended equilibration of the membrane shapes is not complete within the simulated time. Simply “cutting” the first 20% or 50% of the simulation trajectories and analysing the remaining parts as “equilibrated” therefore does not seem to be a reasonable choice here, at least for the membrane properties, i.e. for the excess area and bending energy. We agree that the linear extrapolation used in our analysis is a matter of choice. At least for the coarse-grained simulations, the extrapolated values of excess areas and bending energies are rather close to the values obtained in the last time windows (see Figure 4).
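
      The extrapolation discussed here can be sketched as a least-squares fit of y(t) = y∞ + b/t to the interval averages, where y∞ is the extrapolated value for 1/t → 0. In the sketch below, the window midpoints of a hypothetical 8 µs trajectory split into five windows and the synthetic values y = 40 + 30/t are assumptions for illustration, not data from the study; whether the fit is made against window midpoints is also an assumption.

```python
import numpy as np

def extrapolate_inverse_time(t, y):
    """Least-squares fit of y(t) = y_inf + b / t; returns (y_inf, b)."""
    b, y_inf = np.polyfit(1.0 / np.asarray(t, dtype=float), np.asarray(y, dtype=float), 1)
    return y_inf, b

# Hypothetical: an 8 us trajectory split into five 1.6 us windows, evaluated at the window midpoints.
t_mid = (np.arange(5) + 0.5) * 1.6     # 0.8, 2.4, 4.0, 5.6, 7.2 us
y = 40.0 + 30.0 / t_mid                # synthetic excess-area values (nm^2), not simulation data
y_inf, b = extrapolate_inverse_time(t_mid, y)
print(f"extrapolated: {y_inf:.2f}, last window: {y[-1]:.2f}")
```

With real data, the spread of the interval values around the fitted line indicates how much weight the extrapolated value can carry compared with the value of the last window.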

      In summary, this paper uses molecular dynamics simulations to quantify the force of the Piezo 1 and Piezo 2 proteins on a lipid bilayer using simulations under controlled tension, observing the membrane deformation, and using that data to infer protein mechanics. While much of the physical mechanism was previously known, the study itself is a valuable quantification. I identified one issue in the membrane deformation energy analysis that has large quantitative repercussions for the extracted model. 

      Reviewer #2 (Public review): 

      Summary: 

      In this study, the authors suggest that the structure of Piezo2 in a tensionless simulation is flatter compared to the electron microscopy structure. This is an interesting observation and highlights the fact that the membrane environment is important for Piezo2 curvature. Additionally, the authors calculate the excess area of Piezo2 and Piezo1, suggesting that it is significantly smaller compared to the area calculated using the EM structure or simulations with restrained Piezo2. Finally, the authors propose an elastic model for Piezo proteins. Those are very important findings, which would be of interest to the mechanobiology field. 

      Whilst I like the suggestion that the membrane environment will change Piezo2 flatness, could this be happening because of the lower resolution of the MARTINI simulations? In other words, would it be possible that MARTINI is not able to model such curvature due to its lower resolution? 

      Related to my comment above, the authors say that they only restrained the secondary structure using an elastic network model. Whilst I understand why they did this, Piezo proteins are relatively large. How can the authors know that these elastic network restraints, combined with the fact that MARTINI simulations are perhaps not very accurate in predicting protein conformations, can accurately represent the changes that happen within the Piezo channel during membrane tension?

      These questions regarding the reliability of the Martini model are very reasonable and are the reason why we also include results from atomistic simulations, at least for Piezo 2, and compare the results. In the Martini model, secondary structure constraints are standard. In addition, constraints on the tertiary structure (e.g. via an elastic network model) are typically used in simulations of soluble, globular proteins. However, such tertiary constraints would make it impossible to simulate the tension-induced flattening of the Piezo proteins. So instead, as we write on lines 427ff, “we relied on the capabilities of the Martini coarse-grained force field for modeling membrane systems with TM helix assemblies (Sharma and Juffer, 2013; Chavent et al., 2014; Majumder and Straub, 2021).” In these references, Martini simulations were used to study the assembly of transmembrane helices, leading to agreement with experimentally observed structures. As we state in our article, our atomistic simulations corroborate the Martini simulations, with the caveats that are now discussed more extensively in the new last paragraph of the Discussion section starting on line 362.

      Modelling of Piezo1 seems to be based on homology to Piezo2. However, the authors need to further evaluate their model, e.g. by comparing it with an Alphafold model.

      We understand the question, but consider it beyond the scope of our article, also because of the computational demand of the simulations. The question is: Do coarse-grained simulations of Piezo1 based on an Alphafold model as the starting structure lead to different results? It is important to note that we only model the rather flexible 12 TM helices at the outer ends of the Piezo 1 monomers via homology modeling to the Piezo 2 structure, which includes these TM helices. For the inner 26 TM helices, including the channel, we use the high-quality cryo-EM structure of Piezo 1. Alphafold may be an alternative for modeling the outer 12 helices, but we don’t think this would lead to statistically significant differences in simulations, e.g. because of the observed overall agreement of membrane shapes in all our Piezo 1 and Piezo 2 simulation systems.

      To calculate the tension-induced flattening of the Piezo channel, the authors "divide all simulation trajectories into 5 equal intervals and determine the nanodome shape in each interval by averaging over the conformations of all independent simulation runs in this interval." However, the flattening of the Piezo channel probably happens very quickly during the simulations, possibly within the same interval. Is this the case, and if so, does this affect their calculations?

      Unfortunately, the flattening is not that quick: it is not complete within the first time windows, see the data points in Figure 4. We therefore report the time dependence with the plots in Figure 4 and extrapolate, see also our response to reviewer 1 above.
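
      The interval averaging quoted by the reviewer can be sketched as follows. The array layout (runs × frames × grid points) and the function name are assumptions, and the rotational alignment of the conformations that precedes the averaging is omitted.

```python
import numpy as np

def interval_averages(data, n_intervals=5):
    """Average data of shape (n_runs, n_frames, ...) over all runs and over the frames
    of each of n_intervals equal time intervals.

    Returns an array of shape (n_intervals, ...) and the interval midpoints in frame units.
    """
    n_frames = data.shape[1]
    edges = np.linspace(0, n_frames, n_intervals + 1).astype(int)
    means = np.stack([data[:, lo:hi].mean(axis=(0, 1)) for lo, hi in zip(edges[:-1], edges[1:])])
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    return means, midpoints

# Synthetic check: 3 runs, 100 frames, 4 x 4 height grid whose value equals the frame index.
frames = np.arange(100, dtype=float)
data = np.broadcast_to(frames[None, :, None, None], (3, 100, 4, 4))
means, midpoints = interval_averages(data)
print(means[:, 0, 0], midpoints)
```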

      Finally, the authors use a specific lipid composition, which is asymmetric. Is it possible that the asymmetry of the membrane causes some of the changes in the curvature that they observe? Perhaps more controls, e.g. with a symmetric POPC bilayer are needed to identify whether membrane asymmetry plays a role in the membrane curvature they observe. 

      Because of the rather high computational demands, such controls are beyond our scope. We don’t expect statistically significant differences for symmetric POPC/cholesterol bilayers. On lines 229ff, we now state:

      “Our modelling assumes that any spontaneous curvature from asymmetries in the lipid composition is small compared to the curvature of the nanodome and, thus, negligible, which is plausible for the rather slight lipid asymmetry of our simulated membranes (see Methods).”

      Reviewer #3 (Public review): 

      Strengths: 

      This work focuses on a problem of deep significance: quantifying the structure-tension relationship and underlying mechanism for the mechanosensitive Piezo 1 and 2 channels. This objective presents a few technical challenges for molecular dynamics simulations, due to the relatively large size of each membrane-protein system. Nonetheless, the technical approach chosen is based on the methodology that is, in principle, established and widely accessible. Therefore, another group of practitioners would likely be able to reproduce these findings with reasonable effort. 

      Weaknesses: 

      The two main results of this paper are (1) that both channels exhibit a flatter structure compared to cryo-EM measurements, and (2) their estimated force vs. displacement relationship. Although the former correlates at least quantitatively with prior experimental work, the latter relies exclusively on simulation results and model parameters. 

      Below is a summary of the key points we recommend addressing in a revised version of the manuscript: 

      (1) The authors should report and discuss controls for the membrane energy calculations, specifically by increasing the density of the discretization graining. We also suggest validating the bending modulus used in the energy calculations for the specific lipid mixture employed in the study. 

      We have addressed both points, see our response to the reviewer’s comments for further details.

      (2) The authors should consider and discuss the potential limitations of the coarse-grained simulation force field and clarify how atomistic simulations validate the reported results, with a more detailed explanation of the potential interdependencies between the two. 

      We now discuss the caveats in the comparison of coarse-grained and atomistic simulations in more detail in a new paragraph starting on line 362.

      (3) The authors should provide further clarification on other points raised in the reviewers' comments, for instance, the potential role of membrane asymmetry. 

      We have done this – see above. We now further explain on lines 437ff why we use an asymmetric membrane. On lines 230ff, we discuss that any spontaneous membrane curvature due to lipid asymmetry is likely small compared to the nanodome curvature and, thus, negligible.

      Reviewer #1 (Recommendations for the authors): 

      (1) Report discretization dependence of the membrane energy (up to double the density of the current discretization graining). 

      We have added several text pieces in the paragraph “Excess area and bending energy” starting on line 583 in which we state how the results depend on the lattice constant a of the calculations.

      (2) Evaluate an analytical energy of a membrane bump with a shape similar to the simulation. This would be free of all sampling and discretization artifacts and would thus be an excellent lower bound of the energy. 

      We have done this for the curvature profile in Figure 1c and the corresponding curvature profiles of the shape profiles in Figure 2d, see new text on lines 326ff.

      Minor: 

      (1) The lipid density (Figure 1 right, 2c, 3c) is neither interesting nor referred to. It can be dropped.

      We think the lipid density maps are important for two reasons: First, they show the protein shape obtained after averaging conformations, as low-lipid-density regions. Second, the lipid densities are used in the calculation of the bending energies, to limit the bending energy calculations to the membrane in the nanodome, see Eq. 9. We therefore prefer to keep them.
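
      As an illustration of how a lipid density map can restrict an energy sum to membrane regions, the sketch below keeps only lattice sites whose lipid density exceeds a fraction of the bulk density. The threshold of 0.5, the function name and the synthetic maps are hypothetical; the actual criterion is the one given in Eq. 9 of the manuscript, which is not reproduced here.

```python
import numpy as np

def masked_energy(energy_density, lipid_density, a, bulk_density, threshold=0.5):
    """Sum energy_density * a^2 over sites whose lipid density exceeds threshold * bulk_density.

    Returns the masked energy and the boolean membrane mask.
    """
    membrane = lipid_density > threshold * bulk_density
    return float(np.sum(energy_density[membrane]) * a**2), membrane

# Synthetic 10 x 10 map: uniform lipid density except a lipid-free 4 x 4 "protein" patch.
lipid_density = np.ones((10, 10))
lipid_density[3:7, 3:7] = 0.0
energy_density = np.full((10, 10), 0.01)   # hypothetical energy per unit area, kappa/nm^2
E, membrane = masked_energy(energy_density, lipid_density, a=1.0, bulk_density=1.0)
print(E, membrane.sum())
```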

      (2) Figure 7 is attractive but not used in a meaningful way. I suggest inserting the protein graphic from Figure 7 into Figure 1 with the 4-helix bundles numbered alongside the structure. Figure 7 could then be dropped. 

      Figure 7 is a figure of the Methods section. We need it to illustrate and explain aspects of the setup (numbering of helices, missing loops) and analysis (numbering scheme of 4-TM helix units).

      (3) Some editing of the use of the English language would be helpful. "Exemplary" is a bit of a funny word choice, it implies that the conformation is excellent, and not simply representative. I'd suggest "Representative conformation". 

      We agree and have replaced “exemplary” by “representative”.

      (4) Typos: 

      Equation 4 - Missing parentheses before squared operator inside the square root. 

      We have corrected this mistake.

      Reviewer #2 (Recommendations for the authors): 

      This study focuses mainly on Piezo2; the authors do not perform any atomistic simulations of Piezo1, and the coarse-grained simulations for Piezo1 are shorter. As a result, their analysis for Piezo2 seems more complete. It would be good if the authors did similar studies with Piezo1 as with Piezo2. 

      We agree that atomistic simulations of Piezo 1 would be interesting, too. However, because the atomistic simulations are particularly demanding, this is beyond our scope.

      Reviewer #3 (Recommendations for the authors): 

      (1) At line 63, a very large tension from the previous work by De Vecchis et al is reported (68 mN/m). The authors are sampling values up to about 21 mN/m, which is considerably smaller. However, these values greatly exceed what typical lipid membranes can sustain (about 10 mN/m) before rupturing. When mentioning these large tensions, the authors should emphasize that these values are not physiologically significant, because they would rupture most plasma membranes. That said, their use in simulation could be justified to magnify the structural changes compared to experiments. 

      We agree that our largest membrane tension values are unphysiological. However, we see a main novelty and relevance of our simulations in the fact that we obtain a response of the nanodome in the physiological range of membrane tensions, see e.g. the third sentence of the abstract. Yes, we include simulations at tensions of 21 mN/m, but most of our simulated tension values are in the range from 0 to 10 mN/m (see e.g. Fig. 3e), in contrast to previous simulation studies.

      (2) At line 78 and in the Methods, the only reference given is for the CHARMM protein force field, not for the lipid force field.

      We have added the reference Klauda et al., 2010 for the CHARMM36 lipid force field in both spots.

      (3) (Line 83) Acknowledging that the authors needed to use the structure from micelles (because it has atomic resolution), how closely do their relaxed Piezo structures compare with the lower-resolution data from the MacKinnon and Patapoutian papers?

      There are no structures reported in these papers to compare with, only a clear flattening as stated.  

      (4) (Line 99) The authors chose a slightly asymmetric lipid membrane composition to capture some specific plasma-membrane features. However, they do not discuss which features are described by this particular composition, which doesn't include different acyl-chain unsaturations between leaflets. Further, they do not seem to comment on whether there is enrichment of certain lipid species coupled to curvature, or whether there is any "scrambling" occurring when the dome section and the planar membrane are stitched together in the preparation phase (Figure 8). 

      Enrichment of lipids in contact with the protein is addressed in Buyan et al., 2020, based on Martini simulations of Piezo 1. Our focus is different, but we still wanted to use an asymmetric membrane, as in essentially all previous simulation studies, to mimic the native Piezo membrane environment, as we now also state on lines 439ff. There is no apparent “scrambling” in the setup of our membrane systems. We did not explore any coupling between curvature and lipid composition, but will publish the simulation trajectories to enable such studies.

      (5) (Caption of Figure 2). Please comment briefly in the text why the tensionless simulation required a longer simulation run (e.g. larger fluctuations?) 

      We added the following explanation on line 500: “ … to explore the role of the long-range shape fluctuations in tensionless membranes for the relaxation into equilibrium”. The relaxation time of membrane shape fluctuations strongly increases with the wavelength, which in the absence of tension is limited only by the simulation box size. However, even for 8-microsecond trajectories, we do not observe complete equilibration and therefore decided to extrapolate the excess area and bending energy values obtained for different time intervals of the trajectories.
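
      The strong increase of the relaxation time with wavelength can be made concrete with the standard result for overdamped undulations of a membrane in a bulk fluid of viscosity η, τ(q) = 4η/(κq³ + σq) with q = 2π/λ. In the tensionless case τ ∝ λ³, so doubling the wavelength increases the relaxation time eightfold, whereas a tension σ cuts off this growth at long wavelengths. The values κ = 10⁻¹⁹ J and η = 10⁻³ Pa s are assumptions (the effective viscosity of coarse-grained solvent differs from that of water, and periodic boxes modify the hydrodynamics), so the sketch indicates trends rather than the time scales of the simulations.

```python
import numpy as np

def relaxation_time(wavelength, kappa, tension, eta):
    """Overdamped relaxation time tau(q) = 4*eta / (kappa*q**3 + tension*q) of a membrane undulation."""
    q = 2 * np.pi / wavelength
    return 4 * eta / (kappa * q**3 + tension * q)

kappa = 1e-19                                    # J, hypothetical bending rigidity
eta = 1e-3                                       # Pa s, roughly the viscosity of water
wavelengths = np.array([40e-9, 80e-9, 160e-9])   # m
for tension in (0.0, 1e-3):                      # N/m: tensionless vs 1 mN/m
    taus = relaxation_time(wavelengths, kappa, tension, eta)
    print(f"tension {tension * 1e3:.0f} mN/m:", ", ".join(f"{t * 1e9:.0f} ns" for t in taus))
```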

      (6) (Caption of Figure 3). Please clarify in the Methods how the atomistic simulations were initialized were they taken from independent CG simulation snapshots? If not, the use of the adjective "independent" would be questionable given the very short atomistic simulation time length. 

      We now added that the production simulations started from the same structure. On lines 386, we now discuss the starting structure of the atomistic simulations in more detail.

      (7) (Line 202). The approach of discretizing the bilayer shape is reasonable, but no justification was provided for the 1-nm grid spacing. In my opinion, there should be a supporting figure showing how the bending energy varies with the grid spacing. 

      We now report also the effect of a 2-nm grid spacing on the results, see new text passages on page 18, and provide an explanation for the smaller 1-nm grid spacing on lines 587ff, where we write:

      “This lattice constant [a = 1 nm] is chosen to be smaller than the bin width of about 2 nm used in determining the membrane shape of the simulation conformations, to take into account that the averaging of these membrane shapes can lead to a higher resolution compared to the 2 nm resolution of the individual membrane shapes.”
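
      The patch-based estimate of the membrane midplane from headgroup positions can be sketched as follows, assuming that the headgroup coordinates are already wrapped into a square box and assigned to leaflets. Each leaflet height is the mean headgroup z in a patch, and the midplane is the average of the two; patches without lipids of either leaflet are returned as NaN. Function and variable names are hypothetical, and the synthetic flat bilayer in the usage example only checks the bookkeeping.

```python
import numpy as np

def midplane_heights(xyz, upper, box_length, bin_width=2.0):
    """Local midplane height on square patches from lipid headgroup positions.

    xyz: (n, 3) headgroup coordinates in nm; upper: boolean array, True for upper-leaflet lipids.
    Returns an (nb, nb) array indexed [x-bin, y-bin]; NaN where a patch lacks lipids of either leaflet.
    """
    n_bins = int(round(box_length / bin_width))
    edges = np.linspace(0.0, box_length, n_bins + 1)
    leaflet_heights = []
    for mask in (upper, ~upper):
        x, y, z = xyz[mask].T
        sums, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=z)
        counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])
        with np.errstate(invalid="ignore", divide="ignore"):
            leaflet_heights.append(sums / counts)   # 0/0 -> NaN for empty patches
    return 0.5 * (leaflet_heights[0] + leaflet_heights[1])

rng = np.random.default_rng(0)
n, box = 2000, 20.0                                       # lipids per leaflet, box length in nm (hypothetical)
xy = rng.uniform(0.0, box, size=(2 * n, 2))
z = np.concatenate([np.full(n, 7.0), np.full(n, 3.0)])    # upper and lower headgroups of a flat bilayer
xyz = np.column_stack([xy, z])
upper = np.arange(2 * n) < n
mid = midplane_heights(xyz, upper, box)
print(np.nanmin(mid), np.nanmax(mid))
```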

      (8) (Line 211). The choice by the authors to use a mixed lipid composition complicates the task of defining a reasonable bending modulus. Experimentally and in atomistic simulations, lipids with one saturated tail (like POPC or SOPC) are much stiffer when they are mixed with cholesterol (https://doi.org/10.1529/biophysj.105.067652, https://doi.org/10.1103/PhysRevE.80.021931, https://doi.org/10.1093/pnasnexus/pgad269). On the other hand, MARTINI seems to predict a slight *softening* for POPC mixed with cholesterol (https://doi.org/10.1038/s41467-023-43892-x). Further complicating this matter, mixtures of phospholipids with different preferred curvatures are predicted to be softer than pure bilayers (e.g. https://doi.org/10.1021/acs.jpcb.3c08117), but asymmetric bilayers are stiffer than symmetric ones in some circumstances (https://doi.org/10.1016/j.bpj.2019.11.3398). 

      This issue can be quite thorny: therefore, my recommendation would be to either: (a) directly compute k for their lipid composition, which is straightforward when using large CG bilayers (as was done in Fowler et al, 2016), but it would also require more advanced methods for the atomistic ones; (b) use a reasonable *experimental* value for k, based on a similar enough lipid composition. 

      We now justify in somewhat more detail why we use an asymmetric membrane, but agree that this complicates the bending energy estimates. We only aim to estimate the bending energy in the Martini 2.2 force field, because our elasticity model is based on and, thus, limited to results obtained with this force field. We have included the two further references using the Martini 2.2 force field suggested by the reviewer on line 213, and now discuss in more detail how the bending rigidity estimate enters and affects the modeling, see lines 226ff.
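
      For option (a), a common route for large coarse-grained bilayers is to fit the height fluctuation spectrum to the Helfrich form ⟨|h_q|²⟩ = k_BT/[A(κq⁴ + σq²)], restricted to small wave numbers where lipid protrusions are negligible. The sketch below performs this fit by linear least squares on a synthetic spectrum generated with κ = 20 k_BT; it is not necessarily the procedure of Fowler et al. (2016), and the Fourier normalization convention is an assumption that must match the one used to compute h_q from simulation frames.

```python
import numpy as np

def fit_kappa_sigma(q, spectrum, area, kT=1.0):
    """Linear least-squares fit of <|h_q|^2> = kT / (area * (kappa*q**4 + sigma*q**2)).

    Returns (kappa, sigma) in units of kT (kappa) and kT per unit area (sigma).
    """
    target = kT / (area * np.asarray(spectrum))   # equals kappa*q^4 + sigma*q^2
    design = np.column_stack([q**4, q**2])
    (kappa, sigma), *_ = np.linalg.lstsq(design, target, rcond=None)
    return kappa, sigma

# Synthetic spectrum for a hypothetical 40 nm x 40 nm patch (not simulation data).
box = 40.0                                    # nm
area = box**2
q = 2 * np.pi * np.arange(1, 7) / box         # smallest wave numbers, nm^-1
kappa_true, sigma_true = 20.0, 0.01           # kT and kT/nm^2
spectrum = 1.0 / (area * (kappa_true * q**4 + sigma_true * q**2))
kappa_fit, sigma_fit = fit_kappa_sigma(q, spectrum, area)
print(kappa_fit, sigma_fit)
```

In practice, ⟨|h_q|²⟩ would be averaged over frames and over wave vectors of similar magnitude before fitting.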

      (9) (Line 224). Does this closing statement imply that all experimental work from ex-vivo samples describe Piezo states under some small but measurable tension? 

      We compare here to the cryo-EM structure in detergent micelles. So, there is no membrane tension, there may be a surface tension of the micelle, but we assume here that Piezo proteins are essentially force free in detergent micelles. Membrane embedding, in contrast, leads to strong forces on Piezo proteins already in the absence of membrane tension, because of the membrane bending energy.

      (10) (Line 304). The Discussion concludes with a reasonable point, albeit on a down note: could the authors elaborate on what kind of experimental approach may be able to verify their modeling results? 

      Very good question, but this is somewhat beyond our expertise. We don’t have a clear recommendation – it is complicated. What can be verified is the flattening, i.e. the height and curvature of the nanodome in lower-resolution experiments. We see our results in line with these experiments, see Introduction. 

      (11) (Line 331). The very title of the Majumder and Straub paper addresses the problem of excessive binding strength between protein beads in the MARTINI force field, which should be mentioned. Figure 3(d) shows that the atomistic systems have larger excess areas than the CG ones. This could be related to MARTINI's "stickiness", or just statistical sampling. Characterizing the grid spacing (see point 7 above) might help illuminate this. 

      We discuss now the larger excess area values of the atomistic simulations on lines 381ff.  

      (12) (Lines 367, 375). Are the harmonic restraints absolute position restraints or additional bonds?

      Note also that the schedule at which the restraints are released (10-ns intervals) is relatively quick. Does the membrane have enough time to equilibrate the number of lipids in each leaflet? 

      These are standard, absolute position restraints. The 10-ns intervals may be too short to fully equilibrate the numbers of lipids, we have not explored this. The main point in the setup was to have a reasonable TM helix embedding with a smooth membrane, without any rupturing. This turned out to be tricky, with the procedures illustrated in Figure 8 as solution. If the membrane is smooth, the lipid numbers quickly equilibrate either in the final relaxation or in the initial nanoseconds of the production runs.

      (13) (Line 387) The use of an isotropic barostat for equilibration further impedes the system's ability to relax its structure. I feel that the authors should validate more strongly their protocol to rule out the possibility that incomplete equilibration could bias dynamics towards flatter membranes, which is one of the main results of this paper. 

      We don’t see how choices in the initial relaxation steps could have affected our results, at least for the coarse-grained simulations. There is more and more flattening throughout all simulation trajectories, see e.g. the extrapolations in Figure 4. All initial simulation structures are significantly less flattened than the final structures in the production runs.

      (14) (Line 403). What is the protocol for reducing the membrane size for atomistic simulation? This is even more important to mention than for CG simulations. 

      We just cut lipids beyond the intended box size of the atomistic simulations. As a technical point, we now have also added on line 507 how PIP2 lipids were converted.

      (15) (Line 423). The CHARMM force field requires a cut-off distance of 12 Å for van der Waals forces, with a force-based continuous switching scheme. The authors should briefly comment on this deviation and its possible impact on membrane properties. Quick test simulations of very small atomistic bilayers with the chosen composition could be used as a comparison. 

      We don’t expect any relevant effect on membrane properties within the statistical accuracies of the quantities of interest here (i.e. excess areas).

      (16) (Equation 4). There are some mismatched parentheses: please check. 

      We have corrected this mistake.

      (17) (Equations 7-8). Why did the authors use finite-differences derivatives of z(x,y) instead of using cubic splines and the corresponding analytical derivatives? 

      In our experience, second derivatives of standard cubic splines can be problematic. The continuous membrane shapes we obtain in our analysis are averages of such splines. We find standard finite differences more reliable, and therefore discretize these shapes. Even for the 2D membrane profiles of Figures 1b and 2d, calculating curvatures from spline interpolations is problematic.
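
      The systematic error of standard central finite differences is easy to quantify: for a smooth profile, the three-point estimate of a second derivative has an error proportional to a². The sketch below checks this for a hypothetical Gaussian profile with σ = 6 nm at its apex, where the exact second derivative is −1/σ²; halving the spacing should reduce the error by about a factor of four. It does not test spline-based alternatives.

```python
import numpy as np

def second_derivative_central(f, x, a):
    """Three-point central-difference estimate of f''(x) with spacing a."""
    return (f(x + a) - 2.0 * f(x) + f(x - a)) / a**2

sigma = 6.0  # hypothetical profile width, nm

def profile(x):
    return np.exp(-x**2 / (2 * sigma**2))

exact = -1.0 / sigma**2  # f''(0) of the Gaussian profile
errors = {a: abs(second_derivative_central(profile, 0.0, a) - exact) for a in (2.0, 1.0, 0.5)}
for a, err in errors.items():
    print(f"a = {a} nm: error = {err:.2e} nm^-2")
print("error ratio a=1 / a=0.5:", errors[1.0] / errors[0.5])
```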

    1. eLife Assessment

      In this revised version, the authors provide a thorough investigation of the interaction of megakaryocytes (MK) with their associated extracellular matrix (ECM) during maturation; they provide compelling evidence that the existence of a dense cage-like pericellular structure containing laminin γ1 and α4 and collagen IV is key to fixing the perisinusoidal localization of MK and preventing their premature intravasation. Adhesion of MK to this ECM cage is dependent on integrin beta1 and beta3 expressed by MK. This strong conclusion is based on the use of state-of-the-art techniques such as the use of primary murine bone marrow MK cultures, mice lacking ECM receptors, namely integrin beta1 and beta3 null mice, as well as high-resolution 2D and 3D imaging. The study provides valuable insight into the role of cell-matrix interactions in MK maturation and provides an interesting model with practical implications for the fields of hemostasis and thrombosis.

    2. Reviewer #1 (Public review):

      The authors report on a thorough investigation of the interaction of megakaryocytes (MK) with their associated ECM during maturation. They report convincing evidence to support the existence of a dense cage-like pericellular structure containing laminin γ1 and α4 and collagen IV, which interacts with integrins β1 and β3 on MK and serves to fix the perisinusoidal localization of MK and prevent their premature intravasation. As with everything in nature, the authors support a Goldilocks range of MK-ECM interactions: inability to digest the ECM via inhibition of MMPs leads to insufficient MK maturation and development of smaller MK. This important work sheds light on the role of cell-matrix interactions in MK maturation, and suggests that higher-dimensional analyses are necessary to capture the full scope of cellular biology in the context of their microenvironment. The authors have responded appropriately to the majority of my previous comments.

      Some remaining points:

      In a previous critique, I had suggested that "it is unclear how activation of integrins allows the MK to become "architects for their ECM microenvironment" as the authors posit. A transcriptomic analysis of control and DKO MKs may help elucidate these effects". The authors pointed out the technical difficulty of obtaining sufficient numbers of MK for such analysis, which I accept, and instead analyzed mature platelets, finding no difference between control and DKO platelets. This is not necessarily surprising, since mature circulating platelets have no need to engage an ECM microenvironment, and for the same reason I would suggest that mature platelet analyses are not representative of MK behavior as regards ECM interactions.

    3. Reviewer #2 (Public review):

      Summary:

      This study makes a significant contribution to understanding the microenvironment of megakaryocytes (MKs) in the bone marrow, identifying an extracellular matrix (ECM) cage structure that influences MK localization and maturation. The authors provide compelling evidence for the presence of this ECM cage and its role in MK homeostasis, employing an array of sophisticated imaging techniques and molecular analyses.

      The authors have addressed most of the concerns raised in the previous review, providing clarifications and additional data that strengthen their conclusions.

      More broadly, this work adds to a growing recognition of the ECM as an active participant in haematopoietic cell regulation in the bone marrow microenvironment. This work could pave the way to future studies investigating how the megakaryocytes' ECM cage affects their function as part of the haematopoietic stem cell niche, and by extension, influences global haematopoiesis.

    1. eLife Assessment

      This study presents important findings demonstrating that the internalization and degradation of FZD5 and FZD8, two of the ten Frizzled proteins, are WNT dependent and do not involve DVL. The evidence supporting the claims of the authors is convincing. This research will be of interest to biologists specializing in Wnt signaling, cancer, and regenerative medicine.

    2. Reviewer #1 (Public review):

      Summary:

      The mechanism by which WNT signals are received and transduced into the cell has been the topic of extensive research. Cell surface levels of the WNT receptors of the FZD family are subject to tight control and it's well established that the transmembrane ubiquitin ligases ZNRF3 and RNF43 target FZDs for degradation and that proteins of the R-spondin family block this effect. This manuscript explores the role that WNT proteins play in receptor internalization, recycling and degradation, and the authors provide evidence that WNTs promote interactions of FZD with the ubiquitin ligases. Using cells mutant in all 3 DVL genes, the authors demonstrate that this effect of WNT on FZD is DVL-independent.

      Strengths:

      Overall, the data are of good quality and support the authors' hypothesis. A strength of this study is the use of CRISPR-mutated cell lines to establish genetic requirements for the various components. The finding that FZD internalization and degradation is WNT dependent and does not involve DVL is novel.

      Weaknesses:

      A weakness of the work includes a heavy reliance on overexpression of FZD proteins. To detect endogenous FZDs, the authors have inserted a V5 tag into the endogenous gene, which may affect their activity(ies).

    3. Reviewer #2 (Public review):

      In this manuscript, Luo et al. show that the ZNRF3/RNF43 E3 ubiquitin ligases participate in the selective endocytosis and degradation of FZD5/8 receptors in response to Wnt stimulation. In my opinion, there are three significant findings in this study: 1) Wnt proteins are required for ZNRF3/RNF43-mediated endocytosis and degradation of FZD receptors, and this constitutes an important negative regulatory loop. 2) Wnt can induce FZD endocytosis in the absence of ZNRF3/RNF43, but this does not influence total or cell-surface levels. 3) ZNRF3/RNF43 show substrate selectivity for FZD5/8 over the other 8 Frizzleds. Of course, many questions remain, and new ones emerge, as is often the case, but these findings challenge our dogmatic view of how ZNRF3/RNF43 regulate Wnt signaling and emphasize their role in Wnt-dependent Frizzled endocytosis/degradation and beta-catenin signaling.

      This is an elegant study employing several CRISPR-edited cell lines to tag endogenous Frizzled receptors and to knock out ZNRF3/RNF43 and all three Dishevelled proteins. One major strength of the study is therefore the careful assessment of the roles of RNF43 and ZNRF3 in endogenous expression contexts. This is especially relevant since overexpression of membrane E3 ligases has been shown to ectopically degrade membrane proteins and could have blurred previous interpretations. A second strength is clarifying the role of Dishevelled proteins in FZD endocytosis. Indeed, although previous studies suggested that the Wnt-promoted interaction between FZD and RNF43/ZNRF3 was mediated through Dvl, the authors clearly show that this is not the case (using Dvl knockout cells and functional assays). Dvl proteins, on the other hand, are still required for ligand-independent FZD endocytosis.

      The only weakness pertains to the difference in signaling outcome when FZD levels are raised by ZNRF3/RNF43 knockout versus by ectopic overexpression. Indeed, the authors suggest that in the absence of RNF43/ZNRF3 the receptors could be recycled back to the PM and thereby contribute to the increased signaling seen in the mutant cells. This has not been directly demonstrated.

    1. eLife Assessment

      This valuable study demonstrates that it is possible to decode information about characters and locations from single-unit responses in the human brain to a narrative movie, using a convincing technical approach to capture information in population-level dynamics. The study introduces an exciting dataset of single-unit responses in humans during a naturalistic and dynamic movie stimulus, with recordings from multiple regions within the medial temporal lobe. Using both a traditional firing-rate analysis and a population decoding analysis to connect these neural responses to the visual content of the movie, the authors show that, in this dataset, the decoding of semantic scene features (e.g., the person currently on screen), but not of scene transitions, is surprisingly driven by classically non-responsive neurons. Based on these findings, they argue that dynamic naturalistic semantic information may be processed within the medial temporal lobe at the population level.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Gerken et al examined how neurons in the human medial temporal lobe respond to and potentially code dynamic movie content. They had 29 patients watch a long-form movie while neurons within their MTL were monitored using depth electrodes. They found that neurons throughout the region were responsive to the content of the movie. In particular, neurons showed significant responses to people, places, and to a lesser extent, movie cuts. Modeling with a neural network suggests that neural activity within the recorded regions was better at predicting the content of the movies as a population, as opposed to individual neural representations. Surprisingly, a subpopulation of unresponsive neurons performed better than the responsive neurons at decoding the movie content, further suggesting that while classically nonresponsive, these neurons nonetheless provided critical information about the content of the visual world. The authors conclude from these results that low-level visual features, such as scene cuts, may be coded at the neuronal level, but that semantic features rely on distributed population-level codes.

      Strengths:

      Overall, the manuscript presents an interesting and reasonable argument for the authors' findings and conclusions. Additionally, the large number of patients and neurons that were recorded and analyzed makes this dataset unique and potentially very powerful. On the whole, the manuscript is very well written and, as it stands, presents an interesting and useful set of data about the intricacies of how dynamic naturalistic semantic information may be processed within the medial temporal lobe.

      Weaknesses:

      I have a number of concerns about some of the experimental and statistical methods employed; addressing them would, I feel, improve our understanding of the current data.

      In particular, the authors do not address the issue of superposed visual features very well throughout the manuscript. Previous research using naturalistic movies has shown that low-level visual features, particularly motion, are capable of driving much of the visual system (e.g., Bartels et al 2005; Bartels et al 2007; Huth et al 2012; Çukur et al 2013; Russ et al 2015; Nentwich et al 2023). In some of these papers, low-level features were regressed out to look at the influence of semantics; in others, the influence of low-level features was explicitly modeled. The current manuscript, for the most part, appears to ignore these features, with the exception of scene cuts. Given the previous evidence that low-level features continue to drive later cortical regions, including these as regressors of no interest or, more ideally, as additional variables would help to determine how well the MTL codes for semantic features over and above these lower-order variables.

      Following on from this, much of the current analysis relies on training deep neural networks to decode particular features. The results of these analyses are illuminating; however, throughout the manuscript, I increasingly wondered how the various variables interact with each other. For example, separate analyses were done for the patients, regions, and visual features, whereas the logistic regression analysis that was employed could take all of these variables as inputs together, obtaining beta weights for each one in an overall model. This would potentially show how much each variable contributes to the overall decoding relative to the others.

      A few more minor points that would help to clarify the current results involve the selection of data for particular analyses. For some analyses, the authors appropriately downsampled their datasets to compare across variables. However, there are a few places where similar downsampling would be informative but was not performed. In particular, the analyses for patients and regions may allow a more informative comparison if the full population were downsampled to match the size of the population for each patient or region of interest. This could be done with the Monte Carlo sampling that is used in other analyses, thus providing a control for population size while still sampling the full population.

    3. Reviewer #2 (Public review):

      Summary:

      This study introduces an exciting dataset of single-unit responses in humans during a naturalistic and dynamic movie stimulus, with recordings from multiple regions within the medial temporal lobe. The authors use both a traditional firing-rate analysis as well as a sophisticated decoding analysis to connect these neural responses to the visual content of the movie, such as which character is currently on screen.

      Strengths:

      The results reveal some surprising similarities and differences between these two kinds of analyses. For visual transitions (such as camera angle cuts), the neurons identified in the traditional response analysis (looking for changes in firing rate of an individual neuron at a transition) were the most useful for doing population-level decoding of these cuts. Interestingly, this wasn't true for character decoding; excluding these "responsive" neurons largely did not impact population-level decoding, suggesting that the population representation is distributed and not well-captured by individual-neuron analyses.

      The methods and results are well-described both in the text and in the figures. This work could be an excellent starting point for further research on this topic to understand the complex representational dynamics of single neurons during naturalistic perception.

      Weaknesses:

      (1) I am unsure what the central scientific questions of this work are, and how the findings should impact our understanding of neural representations. Among the questions listed in the introduction is "Which brain regions are informative for specific stimulus categories?". This is a broad research area that has been addressed in many neuroimaging studies for decades, and it's not clear that the results tell us new information about region selectivity. "Is the relevant information distributed across the neuronal population?" is also a question with a long history of work in neuroscience about localist vs distributed representations, so I did not understand what specific claim was being made and tested here. Responses in individual neurons were found for all features across many regions (e.g., Table S1), but decodable information was also spread across the population.

      (2) The character and indoor/outdoor labels seem fundamentally different from the scene/camera cut labels, and I was confused by the way that the cuts were put into the decoding framework. The decoding analyses took a 1600ms window around a frame of the video (despite labeling these as frame "onsets" like the feature onsets in the responsive-neuron analysis, I believe this is for any frame regardless of whether it is the onset of a feature), with the goal of predicting a binary label for that frame. Although this makes sense for the character and indoor/outdoor labels, which are a property of a specific frame, it is confusing for the cut labels since these are inherently about a change across frames. The way the authors handle this is by labeling frames as cuts if they are in the 520ms following a cut (there is no justification given for this specific value). Since the input to a decoder is 1600ms, this seems like a challenging decoding setup; the model must respond that an input is a "cut" if there is a cut-specific pattern present approximately in the middle of the window, but not if the pattern appears near the sides of the window. A more straightforward approach would be, for example, to try to discriminate between windows just after a cut versus windows during other parts of the video. It is also unclear how neurons "responsive" to cuts were defined, since the authors state that this was determined by looking for times when a feature was absent for 1000ms to continuously present for 1000ms, which would never happen for cuts (unless this definition was different for cuts?).

      (3) The architecture of the decoding model is interesting but needs more explanation. The data is preprocessed with "a linear layer of same size as the input" (is this a layer added to the LSTM that is also trained for classification, or a separate step?), and the number of linear layers after the LSTM is "adapted" for each label type (how many were used for each label?). The LSTM also gets to see data from 800 ms before and after the labeled frame, but usually LSTMs have internal parameters that are the same for all timesteps; can the model know when the "critical" central frame is being input versus the context, i.e., are the inputs temporally tagged in some way? This may not be a big issue for the character or location labels, which appear to be contiguous over long durations and therefore the same label would usually be present for all 1600ms, but this seems like a major issue for the cut labels since the window will include a mix of frames with opposite labels.

      (4) Because this is a naturalistic stimulus, some labels are very imbalanced ("Persons" appears in almost every frame), and the labels are correlated. The authors attempt to address the imbalance issue by oversampling the minority class during training, though it's not clear this is the right approach since the test data does not appear to be oversampled; for example, training the Persons decoder to label 50% of training frames as having people seems like it could lead to poor performance on a test set with nearly 100% Persons frames, versus a model trained to be biased toward the most common class. There is no attempt to deal with correlated features, which is especially problematic for features like "Summer Faces" and "Summer Presence", which I would expect to be highly overlapping, making it more difficult to interpret decoding performance for specific features.

      (5) Are "responsive" neurons defined as only those showing firing increases at a feature onset, or would decreased activity also count as responsive? If only positive changes are labeled responsive, this would help explain how non-responsive neurons could be useful in a decoding analysis.

      (6) Line 516 states that the scene cuts here are analogous to the hard boundaries in Zheng et al. (2022), but the hard boundaries are transitions between completely unrelated movies rather than scenes within the same movie. Previous work has found that within-movie and across-movie transitions may rely on different mechanisms, e.g., see Lee & Chen, 2022 (10.7554/eLife.73693).

    4. Reviewer #3 (Public review):

      This is an excellent, very interesting paper. There is a groundbreaking analysis of the data, going from typical picture presentation paradigms to more realistic conditions. I would like to ask the authors to consider a few points in the comments below.

      (1) From Figure 2, I understand that there are 7 neurons responding to the character Summer, but then in line 157, we learn that there are 46. Are the other 39 from other areas (not parahippocampal)? If this is the case, it would be important to see examples of these responses, as one of the main claims is that it is possible to decode as well as or better with non-responsive neurons than with responsive ones, which is, in principle, surprising.

      (2) Also in Figure 2, there seem to be relatively few neurons responding to Summer (1.88%) and to outdoor scenes (1.07%). Is this significant? Isn't it also a bit surprising, particularly for outdoor scenes, considering a previous paper by Mormann showing many outdoor scene responses in this area? It would be nice if the authors could comment on this.

      (3) I was also surprised to see far fewer responses to scene cuts (6.7%) than to camera cuts (51%), because every scene cut involves a camera cut. Could this be a result of the much larger number of camera cuts? (A way to test this would be to subsample the camera cuts.)

      (4) Line 201. The analysis of decoding on a per-patient basis is important, but it should be done on a per-session basis, i.e., considering only simultaneously recorded neurons, without any pooling. This is because pooling can overestimate decoding performance (see e.g. Quian Quiroga and Panzeri NRN 2009). If there was only one session per patient, then this should be called 'per-session' rather than 'per-patient' to make it clear that there was no pooling.

      (5) In general, the decoding results are quite interesting, and I was wondering if the authors could give a bit more insight by showing confusion matrices, with the predictions of the appearance of each of the characters, etc. Some of the characters may appear together, so this could be another entry of the decoder (say, predicting person A, B, C, A&B, A&C, B&C, A&B&C). I guess this could also show the power of analyzing the population activity.

      (6) Lines 406-407. The claim that stimulus-selective responses to characters did not account for the decoding of the same character is very surprising. If I understood it correctly, the response criterion the authors used captures 'responsiveness' but not 'selectivity'. So, were the responses selective (e.g., firing only to Summer) or non-selective (firing to a few characters)? This could explain why they didn't get good decoding results with responsive neurons. Again, it would be nice to see confusion matrices with the decoding of the characters. Another possible reason is that the neurons labelled as responsive have relatively weak and variable responses.

      (7) Line 455. The claim that 500 neurons drive decoding performance is very subjective. 500 neurons gives a performance of 0.38, and 50 neurons gives 0.33.

      (8) Lines 492-494. I disagree with the claim that "character decoding does not rely on individual cells, as removing neurons that responded strongly to character onset had little impact on performance". I have not seen strong responses to characters in the paper. In particular, the response to Summer in Figure 2 looks very variable and relatively weak. If there are stronger responses to characters, please show them to make a convincing argument. It is fine to argue that you can get information from the population, but in my view, there are no good single-cell responses (perhaps because the actors and the movie were unknown to the subjects) to make this claim. Also, an older paper (Quian Quiroga et al J. Neurophysiol. 2007) showed that the decoding of individual stimuli in a picture presentation paradigm was determined by the responsive neurons and that the non-responsive neurons did not add any information. The results here could be different due to the use of movies instead of picture presentations, but most likely due to the fact that, in the picture presentation paradigm, the pictures were of famous people for which there were strong single neuron responses, unlike with the relatively unknown persons in this paper.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Gerken et al examined how neurons in the human medial temporal lobe respond to and potentially code dynamic movie content. They had 29 patients watch a long-form movie while neurons within their MTL were monitored using depth electrodes. They found that neurons throughout the region were responsive to the content of the movie. In particular, neurons showed significant responses to people, places, and to a lesser extent, movie cuts. Modeling with a neural network suggests that neural activity within the recorded regions was better at predicting the content of the movies as a population, as opposed to individual neural representations. Surprisingly, a subpopulation of unresponsive neurons performed better than the responsive neurons at decoding the movie content, further suggesting that while classically nonresponsive, these neurons nonetheless provided critical information about the content of the visual world. The authors conclude from these results that low-level visual features, such as scene cuts, may be coded at the neuronal level, but that semantic features rely on distributed population-level codes.

      Strengths:

      Overall, the manuscript presents an interesting and reasonable argument for the authors' findings and conclusions. Additionally, the large number of patients and neurons that were recorded and analyzed makes this dataset unique and potentially very powerful. On the whole, the manuscript is very well written and, as it stands, presents an interesting and useful set of data about the intricacies of how dynamic naturalistic semantic information may be processed within the medial temporal lobe.

      We thank the reviewer for their comments on our manuscript and for describing the strengths of our presented work.

      Weaknesses:

      I have a number of concerns about some of the experimental and statistical methods employed; addressing them would, I feel, improve our understanding of the current data.

      In particular, the authors do not address the issue of superposed visual features very well throughout the manuscript. Previous research using naturalistic movies has shown that low-level visual features, particularly motion, are capable of driving much of the visual system (e.g., Bartels et al 2005; Bartels et al 2007; Huth et al 2012; Çukur et al 2013; Russ et al 2015; Nentwich et al 2023). In some of these papers, low-level features were regressed out to look at the influence of semantics; in others, the influence of low-level features was explicitly modeled. The current manuscript, for the most part, appears to ignore these features, with the exception of scene cuts. Given the previous evidence that low-level features continue to drive later cortical regions, including these as regressors of no interest or, more ideally, as additional variables would help to determine how well the MTL codes for semantic features over and above these lower-order variables.

      We thank the reviewer for this insightful comment and for the relevant literature regarding visual motion in not only the primary visual system but in cortical areas as well. While we agree that the inclusion of visual motion as a regressor of no interest or as an additional variable would be overall informative in determining if single neurons in the MTL are driven by this level of feature, we would argue that our analyses already provide some insight into its role and that only the parahippocampal cortical neurons would robustly track this feature.

      As noted by the reviewer, our model includes two features derived from visual motion: Camera Cuts (directly derived from frame-wise changes in pixel values) and Scene Cuts (a subset of Camera Cuts restricted to changes in scene). As shown in Fig. 5a, decoding performance for these features was strongest in the parahippocampal cortex (~20%), compared to other MTL areas (~10%). While the entorhinal cortex also showed some performance for Scene Cuts (15%), we interpret this as being driven by the changes in location that define a scene, rather than by motion itself.

      These findings suggest that while motion features are tracked in the MTL, the effect may be most robust in the parahippocampal cortex. We believe that quantifying more complex 3D motion in a naturalistic stimulus like a full-length movie is a significant challenge that would likely require a dedicated study. We agree this is an interesting future research direction and will update the manuscript to highlight this for the reader.

      A few more minor points that would help to clarify the current results involve the selection of data for particular analyses. For some analyses, the authors appropriately downsampled their datasets to compare across variables. However, there are a few places where similar downsampling would be informative but was not performed. In particular, the analyses for patients and regions may allow a more informative comparison if the full population were downsampled to match the size of the population for each patient or region of interest. This could be done with the Monte Carlo sampling that is used in other analyses, thus providing a control for population size while still sampling the full population.

      We thank the reviewer for raising this important methodological point. The decision not to downsample the patient- and region-specific analyses was deliberate, and we appreciate the opportunity to clarify our rationale.

      Generally, we would like to emphasize that, due to technical and ethical limitations of human single-neuron recordings, it is currently not possible to record large populations of neurons simultaneously in individual patients. The limited and variable number of recorded neurons per subject (Fig. S1) generally requires pooling neurons into pseudo-populations for decoding, which is a well‐established standard in human single‐neuron studies (see, e.g., Jamali et al., 2021; Kamiński et al., 2017; Minxha et al., 2020; Rutishauser et al., 2015; Zheng et al., 2022).

      For the patient-specific analysis, our primary goal was to show that no single patient's data could match the performance of the complete pseudo-population. Crucially, we found no direct relationship between the number of recorded neurons and decoding performance; patients with the most neurons (patients 4, 13) were not top performers, and those with the fewest (patients 11, 14) were not the worst (see Fig. 4). This indicates that neuron count was not the primary limiting factor and that downsampling would be unlikely to provide additional insight.

      Similarly, for the region-specific analysis, regions with larger neural populations did not systematically outperform those with fewer neurons (Fig. 5). Given the inherent sparseness of single-neuron data, we concluded that retaining the full dataset was more informative than excluding neurons simply to equalize population sizes.

      We agree that this methodological choice should be transparent and explicitly justified in the text. We will add an explanation to the revised manuscript to justify why this approach was taken and how it differs from the analysis in Fig. 6.
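      For readers weighing this exchange, the population-size control the reviewer proposes can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: `monte_carlo_subsample` and `score_fn` are hypothetical names, and `score_fn` stands in for training and testing the decoder on a given list of neurons. Each draw samples, without replacement, as many neurons from the full pooled population as a given patient or region contributes.

```python
import random
from statistics import mean


def monte_carlo_subsample(neuron_ids, subset_size, n_draws, score_fn, seed=0):
    """Score n_draws random subsets of subset_size neurons from the pooled population.

    score_fn is a hypothetical stand-in for training and evaluating the decoder
    on a list of neurons; it must return a number.
    """
    if not 0 < subset_size <= len(neuron_ids):
        raise ValueError("subset_size must be between 1 and the population size")
    rng = random.Random(seed)  # seeded so the draws are reproducible
    # Each draw samples neurons without replacement from the full pool.
    scores = [score_fn(rng.sample(neuron_ids, subset_size)) for _ in range(n_draws)]
    return mean(scores), scores


# Toy example: the "score" is simply the mean neuron ID of each draw.
pool = list(range(1000))
avg, draws = monte_carlo_subsample(pool, subset_size=40, n_draws=100, score_fn=mean)
print(len(draws))  # 100
```

      Under this scheme, a patient- or region-specific score would be compared with the distribution of `draws` obtained with `subset_size` set to that patient's or region's neuron count.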

      Reviewer #2 (Public review):

      Summary:

      This study introduces an exciting dataset of single-unit responses in humans during a naturalistic and dynamic movie stimulus, with recordings from multiple regions within the medial temporal lobe. The authors use both a traditional firing-rate analysis as well as a sophisticated decoding analysis to connect these neural responses to the visual content of the movie, such as which character is currently on screen.

      Strengths:

      The results reveal some surprising similarities and differences between these two kinds of analyses. For visual transitions (such as camera angle cuts), the neurons identified in the traditional response analysis (looking for changes in firing rate of an individual neuron at a transition) were the most useful for doing population-level decoding of these cuts. Interestingly, this wasn't true for character decoding; excluding these "responsive" neurons largely did not impact population-level decoding, suggesting that the population representation is distributed and not well-captured by individual-neuron analyses.

      The methods and results are well-described both in the text and in the figures. This work could be an excellent starting point for further research on this topic to understand the complex representational dynamics of single neurons during naturalistic perception.

      We thank the reviewer for their feedback and for summarizing the results of our work.

      (1) I am unsure what the central scientific questions of this work are, and how the findings should impact our understanding of neural representations. Among the questions listed in the introduction is "Which brain regions are informative for specific stimulus categories?". This is a broad research area that has been addressed in many neuroimaging studies for decades, and it's not clear that the results tell us new information about region selectivity. "Is the relevant information distributed across the neuronal population?" is also a question with a long history of work in neuroscience about localist vs distributed representations, so I did not understand what specific claim was being made and tested here. Responses in individual neurons were found for all features across many regions (e.g., Table S1), but decodable information was also spread across the population.

      We thank the reviewer for this important point, which gets to the core of our study's contribution. While concepts like regional specificity are well established from studies at the level of blood flow, their investigation at the single-neuron level in humans during naturalistic, dynamic stimulation remains a critical open question. The type of coding (sparse vs. distributed), on the other hand, cannot be investigated with blood-flow measurements, as these lack the necessary spatial and temporal resolution.

      Our study addresses this gap directly. The exceptional temporal resolution of single-neuron recordings allows us to move beyond traditional paradigms and examine cellular-level dynamics as neuronal responses unfold, frame by frame, to a more naturalistic and ecologically valid stimulus. It cannot be assumed that findings from other modalities or simplified stimuli will generalize to this context.

      To meet this challenge, we employed a dual analytical strategy: combining a classic single-unit approach with a machine learning-based population analysis. This allowed us to create a bridge between prior work and our more naturalistic data. A key result is that our findings are often consistent with the existing literature, which validates the generalizability of those principles. However, the differences we observe between these two analytical approaches are equally informative, providing new insights into how the brain processes continuous, real-world information.

      We will revise the introduction and discussion to more explicitly frame our work in this context, emphasizing the specific scientific question driving this study, while also highlighting the strengths of our experimental design and recording methods.

      (2) The character and indoor/outdoor labels seem fundamentally different from the scene/camera cut labels, and I was confused by the way that the cuts were put into the decoding framework. The decoding analyses took a 1600ms window around a frame of the video (despite labeling these as frame "onsets" like the feature onsets in the responsive-neuron analysis, I believe this is for any frame regardless of whether it is the onset of a feature), with the goal of predicting a binary label for that frame. Although this makes sense for the character and indoor/outdoor labels, which are a property of a specific frame, it is confusing for the cut labels since these are inherently about a change across frames. The way the authors handle this is by labeling frames as cuts if they are in the 520ms following a cut (there is no justification given for this specific value). Since the input to a decoder is 1600ms, this seems like a challenging decoding setup; the model must respond that an input is a "cut" if there is a cut-specific pattern present approximately in the middle of the window, but not if the pattern appears near the sides of the window. A more straightforward approach would be, for example, to try to discriminate between windows just after a cut versus windows during other parts of the video. It is also unclear how neurons "responsive" to cuts were defined, since the authors state that this was determined by looking for times when a feature was absent for 1000ms to continuously present for 1000ms, which would never happen for cuts (unless this definition was different for cuts?).

      We thank the reviewer for this valuable comment on the cut labels specifically. The choice to label as positive all frames within 520 ms after a cut was based on prior research and is intended to capture response onsets across all regions within the MTL (Mormann et al., 2008). We agree that this explanation is currently missing from the manuscript, and we will add a brief clarification in the revised version.

      As correctly noted, the decoding analysis does not rely on feature onset but instead continuously decodes features throughout the entire movie. Thus, all frames are included, regardless of whether they correspond to a feature onset.

      Our treatment of cut labels as sustained events is a deliberate methodological choice. Neural responses to events like cuts often unfold over time, and by extending the label, we provide our LSTM network with the necessary temporal window to learn this evolving signature. This approach not only leverages the sequential processing strengths of the LSTM (Hochreiter & Schmidhuber, 1997) but also ensures a consistent analytical framework for both event-based (cuts) and state-based (character or location) features.
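
      The labeling rule can be illustrated with a minimal Python sketch. It assumes frame and cut timestamps in milliseconds. The function name `label_cut_frames` is hypothetical. The 40 ms frame spacing in the example (25 frames per second) is also an assumption and is not the movie's actual frame rate. Every frame within 520 ms after the most recent cut is labeled positive, so the label is sustained rather than tied to a single onset frame.

```python
from bisect import bisect_right

def label_cut_frames(frame_times_ms, cut_times_ms, window_ms=520.0):
    """Label a frame 1 if it falls within window_ms after the most recent cut, else 0.

    The label is sustained over the window, so every frame of the post-cut
    response period is a positive example, not only the cut frame itself.
    """
    cuts = sorted(cut_times_ms)
    labels = []
    for t in frame_times_ms:
        i = bisect_right(cuts, t) - 1  # index of the last cut at or before t
        labels.append(1 if i >= 0 and t - cuts[i] < window_ms else 0)
    return labels

# Hypothetical example: frames every 40 ms, a single cut at 1000 ms.
frames = [40 * k for k in range(100)]
labels = label_cut_frames(frames, [1000])
print(sum(labels))  # 13 frames: 1000, 1040, ..., 1480 ms
```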

      (3) The architecture of the decoding model is interesting but needs more explanation. The data is preprocessed with "a linear layer of same size as the input" (is this a layer added to the LSTM that is also trained for classification, or a separate step?), and the number of linear layers after the LSTM is "adapted" for each label type (how many were used for each label?). The LSTM also gets to see data from 800 ms before and after the labeled frame, but usually LSTMs have internal parameters that are the same for all timesteps; can the model know when the "critical" central frame is being input versus the context, i.e., are the inputs temporally tagged in some way? This may not be a big issue for the character or location labels, which appear to be contiguous over long durations and therefore the same label would usually be present for all 1600ms, but this seems like a major issue for the cut labels since the window will include a mix of frames with opposite labels.

      We thank the reviewer for their insightful comments regarding the decoding architecture. The model consists of an LSTM followed by 1–3 linear readout layers, where the exact number of layers is treated as a hyperparameter and selected based on validation performance for each label type. The initial linear layer applied to the input is part of the trainable model and serves as a projection layer to transform the binned neural activity into a suitable feature space before feeding it into the LSTM. The model is trained in an end-to-end fashion on the classification task.
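
      The architecture described above can be made concrete with a minimal NumPy sketch of the forward pass only. The sketch has a projection layer the same size as the input, one LSTM layer, and then 1 to 3 linear readout layers. Several details are assumptions not stated in this response: the final LSTM hidden state feeds the readout, ReLU is used between readout layers, and the 40 bins in the example correspond to 40 ms bins over the 1600 ms window. All names and sizes are hypothetical. In practice such a model would be trained end to end with an autodiff framework, and training is not shown here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_decoder(n_units, hidden, n_readout_layers, seed=0):
    """Randomly initialize a projection -> LSTM -> readout decoder (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    params = {
        # trainable projection with the same size as the input (n_units -> n_units)
        "W_proj": rng.normal(0.0, 0.1, (n_units, n_units)),
        "b_proj": np.zeros(n_units),
        # LSTM weights for input, forget, candidate and output gates, stacked row-wise
        "W_x": rng.normal(0.0, 0.1, (4 * hidden, n_units)),
        "W_h": rng.normal(0.0, 0.1, (4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
        "readout": [],
    }
    # n_readout_layers linear layers; the last one maps to a single logit
    dims = [hidden] * n_readout_layers + [1]
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        params["readout"].append((rng.normal(0.0, 0.1, (d_out, d_in)), np.zeros(d_out)))
    return params

def forward(params, window):
    """window: (n_bins, n_units) binned activity for one context window.
    Returns the predicted probability that the labeled (central) frame is positive."""
    hidden = params["W_h"].shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in window:  # the same LSTM parameters are applied at every time step
        z = params["W_proj"] @ x_t + params["b_proj"]
        gates = params["W_x"] @ z + params["W_h"] @ h + params["b"]
        i, f, g, o = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    out = h
    for k, (W, b) in enumerate(params["readout"]):
        out = W @ out + b
        if k < len(params["readout"]) - 1:
            out = np.maximum(out, 0.0)  # ReLU between readout layers (assumed)
    return float(sigmoid(out[0]))

# Hypothetical example: 20 units, 40 bins, two readout layers.
params = init_decoder(n_units=20, hidden=16, n_readout_layers=2)
window = np.random.default_rng(1).poisson(1.0, (40, 20)).astype(float)
print(forward(params, window))
```

      Treating the number of readout layers as an argument mirrors how it is handled as a hyperparameter in the response.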

      Regarding temporal context, the model receives a 1600 ms window (800 ms before and after the labeled frame), and as correctly pointed out by the reviewer, LSTM parameters are shared across time steps. We do not explicitly tag the temporal position of the central frame within the sequence. While this may have limited impact for labels that persist over time (e.g., characters or locations), we agree this could pose a challenge for cut labels, which are more temporally localized.

      This is an important point. We will clarify this limitation in the revised manuscript and note that positional encoding is a valuable direction for future work to better guide the model's focus within the temporal window. Hyperparameters were optimized individually for each feature and data split. To improve methodological transparency, we will add a supplementary table detailing the hyperparameter ranges used in this optimization.
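
      As a hypothetical sketch of one simple form of positional information, not the planned method, an extra input channel can be appended to each time bin to encode that bin's offset from the labeled central frame. With this channel, an LSTM whose parameters are shared across time steps can distinguish the center of the window from its context. The function name and the linear [-1, 1] encoding are assumptions. Sinusoidal or learned encodings are common alternatives.

```python
import numpy as np

def add_relative_time_channel(window):
    """Append a channel running linearly from -1 (window start) to 1 (window end).

    window: (n_bins, n_units) array; with an odd n_bins the central bin receives 0.
    """
    n_bins = window.shape[0]
    rel_time = np.linspace(-1.0, 1.0, n_bins)[:, None]
    return np.concatenate([window, rel_time], axis=1)

window = np.zeros((41, 20))  # hypothetical: 41 bins centered on the labeled frame, 20 units
encoded = add_relative_time_channel(window)
print(encoded.shape)  # (41, 21)
```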

      (4) Because this is a naturalistic stimulus, some labels are very imbalanced ("Persons" appears in almost every frame), and the labels are correlated. The authors attempt to address the imbalance issue by oversampling the minority class during training, though it's not clear this is the right approach since the test data does not appear to be oversampled; for example, training the Persons decoder to label 50% of training frames as having people seems like it could lead to poor performance on a test set with nearly 100% Persons frames, versus a model trained to be biased toward the most common class. [...]

      We thank the reviewer for this critical and thoughtful comment. We agree that the imbalanced and correlated nature of labels in naturalistic stimuli is a key challenge.

      To address this, we follow a standard machine learning practice: oversampling is applied exclusively to the training data. This technique helps the model learn from underrepresented classes by creating more balanced training batches, thus preventing it from simply defaulting to the majority class. Crucially, the test set remains unaltered to ensure our evaluation reflects the model's true generalization performance on the natural data distribution.

      For the “Persons” feature, which appears in nearly all frames, defining a meaningful negative class is particularly challenging. The decoder must learn to identify subtle variations within a highly skewed distribution. Oversampling during training helps provide a more balanced learning signal, while keeping the test distribution intact ensures proper evaluation of generalization.

      The reviewer’s comment—that we are “training the Persons decoder to label 50% of training frames as having people”—may suggest that labels were modified. We want to emphasize this is not the case. Our oversampling strategy does not alter the labels; it simply increases the exposure of the rare, underrepresented class during training to ensure the model can learn its pattern despite its low frequency.

      We will revise the Methods section to describe this standard procedure more explicitly, clarifying that oversampling is a training-only strategy to mitigate class imbalance.
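
      The training-only oversampling described above can be sketched as follows. The sketch assumes binary per-frame labels and a precomputed train/test split of frame indices. The function name and the choice to balance the classes exactly 1:1 are assumptions for illustration. Labels are never changed. Minority-class training frames are simply drawn more than once, and the test indices are never passed to the function.

```python
import random

def oversample_minority(train_idx, labels, seed=0):
    """Return training indices with minority-class frames randomly duplicated until
    both classes are equally frequent. The labels themselves are left untouched."""
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    if not pos or not neg:
        return list(train_idx)  # nothing to balance against
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    resampled = majority + minority + extra
    rng.shuffle(resampled)
    return resampled

# Hypothetical "Persons"-like labels: the feature is absent in only 1 of every 20 frames.
labels = [0 if i % 20 == 0 else 1 for i in range(100)]
train_idx, test_idx = list(range(80)), list(range(80, 100))
balanced = oversample_minority(train_idx, labels)
# test_idx is left as is and evaluated on its natural class distribution
print(len(balanced), sum(labels[i] == 0 for i in balanced))  # 152 76
```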

      (5) Are "responsive" neurons defined as only those showing firing increases at a feature onset, or would decreased activity also count as responsive? If only positive changes are labeled responsive, this would help explain how non-responsive neurons could be useful in a decoding analysis.

      We define responsive neurons as those showing increased firing rates at feature onset; we did not test for decreases in activity. We thank the reviewer for this valuable comment and will address this point in the revised manuscript by assessing responsiveness without restricting the direction of the firing-rate change.
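
      A direction-agnostic responsiveness criterion could, for example, use a two-sided paired test on each unit's firing rate in the 1000 ms before versus the 1000 ms after each feature onset. The sketch below uses a sign-flip permutation test as a stand-in. It is not the authors' current test. The function name, the permutation count, and the use of the mean difference as the test statistic are all assumptions.

```python
import random

def two_sided_response_test(pre_rates, post_rates, n_perm=10000, seed=0):
    """Paired sign-flip permutation test on post - pre firing rates (one pair per onset).
    Returns (mean difference, two-sided p-value); increases and decreases both count."""
    diffs = [b - a for a, b in zip(pre_rates, post_rates)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # under the null, each onset's difference is equally likely to have either sign
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical unit whose firing drops after every onset (a decrease, not an increase).
pre = [5.0, 6.0, 4.5, 5.5, 6.5, 5.0, 4.0, 6.0, 5.5, 5.0]
post = [2.0, 3.0, 2.5, 1.5, 3.5, 2.0, 2.5, 3.0, 2.0, 1.0]
mean_diff, p = two_sided_response_test(pre, post)
print(mean_diff, p)
```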

      (6) Line 516 states that the scene cuts here are analogous to the hard boundaries in Zheng et al. (2022), but the hard boundaries are transitions between completely unrelated movies rather than scenes within the same movie. Previous work has found that within-movie and across-movie transitions may rely on different mechanisms, e.g., see Lee & Chen, 2022 (10.7554/eLife.73693).

      We thank the reviewer for pointing out this distinction and for including the relevant work of Lee & Chen (2022), which further contextualizes it. Indeed, the hard boundaries defined in Zheng et al. (2022) differ slightly from ours. That study distinguishes between (1) hard boundaries, i.e., transitions between unrelated movies, and (2) soft boundaries, i.e., transitions between related events within the same movie. While our camera cuts resemble their soft boundaries, our scene cuts do not fully align with either category. We defined scene cuts to be more similar to the study's hard boundaries, but we recognize that this correspondence is not exact. In the revised manuscript, we will clarify the distinctions between our scene cuts and the hard boundaries described in Zheng et al. (2022), and we will update our text to include the findings of Lee & Chen (2022).

      Reviewer #3 (Public review):

      This is an excellent, very interesting paper. There is a groundbreaking analysis of the data, going from typical picture presentation paradigms to more realistic conditions. I would like to ask the authors to consider a few points in the comments below.

      (1) From Figure 2, I understand that there are 7 neurons responding to the character Summer, but then in line 157, we learn that there are 46. Are the other 39 from other areas (not parahippocampal)? If this is the case, it would be important to see examples of these responses, as one of the main claims is that it is possible to decode as good or better with non-responsive compared to single responsive neurons, which is, in principle, surprising.

      We thank the reviewer for pointing out this ambiguity in the text. Yes, the other 39 units are responsive neurons from other areas. We will clarify to which neuronal sets the number of responsive neurons corresponds. We will also include response plots depicting the unit activity for the mentioned units.

      (2) Also in Figure 2, there seem to be relatively very few neurons responding to Summer (1.88%) and to outdoor scenes (1.07%). Is this significant? Isn't it also a bit surprising, particularly for outdoor scenes, considering a previous paper of Mormann showing many outdoor scene responses in this area? It would be nice if the authors could comment on this.

      We thank the reviewer for this insightful point. While a low response to the general 'outdoor scene' label seems surprising at first, our findings align with the established role of the parahippocampal cortex (PHC) in processing scenes and spatial layouts. In previous work using static images, each image introduces a new spatial context. In our movie stimulus, new spatial contexts specifically emerge at scene cuts. Accordingly, our data show a strong PHC response precisely at these moments. We will revise the discussion to emphasize this interpretation, highlighting the consistency with prior work.

      Regarding the first comment, we did not originally test if the proportion of the units is significant using e.g. a binomial test. We will include the results of a binomial test for each region and feature pair in the revised manuscript.
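
      For the planned binomial test, an exact one-sided version can be written with the Python standard library alone. The chance rate `p0` is an assumption. Here it is taken as the per-unit false-positive rate of the responsiveness test, for example its alpha of 0.05. The unit counts in the example are hypothetical. With many region-feature pairs, the resulting p-values would still need a multiple-comparison correction.

```python
from math import comb

def binomial_sf(k, n, p0):
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, p0): the probability of
    observing at least k responsive units out of n if each is a false positive with rate p0."""
    return sum(comb(n, j) * p0**j * (1.0 - p0) ** (n - j) for j in range(k, n + 1))

# Hypothetical region: 30 of 200 units pass the responsiveness test at alpha = 0.05.
print(binomial_sf(30, 200, 0.05))
```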

      (3) I was also surprised to see that there are many fewer responses to scene cuts (6.7%) compared to camera cuts (51%) because every scene cut involves a camera cut. Could this have been a result of the much larger number of camera cuts? (A way to test this would be to subsample the camera cuts.)

      The decrease in responsive units for scene cuts relative to camera cuts could indeed be due to the overall decrease in “trials” from one label to the other. To test this, we will follow the reviewer’s suggestion and perform tests using sets of randomly subsampled camera cuts and will include the results in the revised manuscript.
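
      The planned control can be sketched as follows. On each draw, as many camera cuts are sampled as there are scene cuts, and the fraction of responsive units is recomputed on that subsample. Repeating this yields a distribution against which the scene-cut fraction can be compared. In the sketch, `is_responsive` is a placeholder for whatever per-unit criterion is used. Each unit's data is assumed to be a list of per-cut response values, such as the post-minus-pre firing rate. All names are hypothetical.

```python
import random

def subsampled_responsive_fraction(unit_responses, n_scene_cuts, is_responsive,
                                   n_draws=1000, seed=0):
    """unit_responses: one list per unit of per-camera-cut response values.
    On each draw, the same random subset of n_scene_cuts camera cuts is used for every
    unit, and the fraction of units flagged by is_responsive is recorded."""
    rng = random.Random(seed)
    n_cuts = len(unit_responses[0])
    fractions = []
    for _ in range(n_draws):
        idx = rng.sample(range(n_cuts), n_scene_cuts)
        n_resp = sum(bool(is_responsive([resp[i] for i in idx])) for resp in unit_responses)
        fractions.append(n_resp / len(unit_responses))
    return fractions

# Hypothetical placeholder criterion: mean post-minus-pre rate above 0.5 Hz.
def mean_above_half(values):
    return sum(values) / len(values) > 0.5

units = [[1.0] * 50] * 3 + [[0.0] * 50] * 7  # 3 "responsive" and 7 silent toy units
fractions = subsampled_responsive_fraction(units, n_scene_cuts=10,
                                           is_responsive=mean_above_half, n_draws=100)
print(min(fractions), max(fractions))  # 0.3 0.3
```

      The observed scene-cut fraction could then be compared with, for example, the 2.5th to 97.5th percentiles of `fractions`.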

      (4) Line 201. The analysis of decoding on a per-patient basis is important, but it should be done on a per-session basis - i.e., considering only simultaneously recorded neurons, without any pooling. This is because pooling can overestimate decoding performances (see e.g. Quian Quiroga and Panzeri NRN 2009). If there was only one session per patient, then this should be called 'per-session' rather than 'per-patient' to make it clear that there was no pooling.

      The per-patient decoding was indeed also a per-session decoding, as each patient contributed only a single session to the dataset. We will make note of this explicitly in the text to resolve the ambiguity.

      (6) Lines 406-407. The claim that stimulus-selective responses to characters did not account for the decoding of the same character is very surprising. If I understood it correctly, the response criterion the authors used gives 'responsiveness' but not 'selectivity'. So, were the neurons' responses selective (e.g., firing only to Summer) or non-selective (firing to a few characters)? This could explain why they didn't get good decoding results with responsive neurons. Again, it would be nice to see confusion matrices for the decoding of the characters. Another possible reason is that the neurons labelled as responsive have relatively weak and variable responses.

      We thank the reviewer for pointing out the importance of selectivity in addition to responsiveness. Indeed, our response criterion does not take stimulus selectivity into account and exclusively measures increases in firing activity after feature onsets for a given feature irrespective of other features.

      We will adjust the text to reflect this shortcoming of the response-detection approach used here. To clarify the relationship between neural populations, we will add visualizations of the overlap of responsive neurons across labels for each subregion. These figures will be included in the revised manuscript.

      In our approach, we trained a separate network for each feature to mitigate the issue of correlated feature labels within the dataset (see earlier discussion). While this strategy deals effectively with correlated features, it precludes the generation of standard confusion matrices, because classification was performed independently for each feature.

      To directly assess the feature selectivity of responsive neurons, we will fit generalized linear models to predict their firing rates from the features. This approach will enable us to quantify their selectivity and compare it to that of the broader neuronal population.
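
      A minimal sketch of such a GLM is shown below. It assumes spike counts per time bin, binary feature regressors, and a Poisson likelihood with a log link fitted by iteratively reweighted least squares. The model family and fitting routine are assumptions, and the data are simulated. A unit would look selective if one feature's coefficient is large while the others stay near zero.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50, tol=1e-8):
    """Poisson GLM with log link fitted by iteratively reweighted least squares.
    X: (n_bins, n_features) design matrix including an intercept column; y: spike counts."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        XtW = X.T * mu           # working weights equal mu for the canonical log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Simulated unit: driven by feature 1, unaffected by feature 2, suppressed by feature 3.
rng = np.random.default_rng(0)
n_bins = 20000
features = rng.integers(0, 2, size=(n_bins, 3)).astype(float)  # e.g. character A, character B, outdoor
X = np.column_stack([np.ones(n_bins), features])
beta_true = np.array([0.2, 1.0, 0.0, -0.5])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat = fit_poisson_glm(X, y)
print(np.round(beta_hat, 2))
```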

      (7) Line 455. The claim that 500 neurons drive decoding performance is very subjective. 500 neurons gives a performance of 0.38, and 50 neurons gives 0.33.

      We agree with the reviewer that the phrasing is unclear. We will revise our summary of this analysis in Line 455 to state that the neuronal rankings derived from logistic regression yield a subset that achieves comparable performance.
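
      The ranking step can be illustrated with a small NumPy sketch. An L2-regularized logistic regression is fitted on standardized firing rates, and neurons are ordered by the magnitude of their weights. The top-k neurons (e.g., 50 or 500) would then be passed to the decoder. Plain gradient descent, the learning rate, and the regularization strength are assumptions, not the settings used in the study. The data are simulated.

```python
import numpy as np

def rank_neurons_by_logreg(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Fit L2-regularized logistic regression by gradient descent and rank neurons by |weight|.
    X: (n_samples, n_neurons) standardized firing rates; y: binary labels (0/1).
    Returns neuron indices from most to least influential, and the fitted weights."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return np.argsort(-np.abs(w)), w

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))  # hypothetical: 10 neurons, 2000 frames
y = (X[:, 3] + 0.3 * rng.normal(size=2000) > 0).astype(float)  # only neuron 3 carries the label
order, weights = rank_neurons_by_logreg(X, y)
top_k = order[:3]  # e.g. the subset that would be passed on to the decoder
print(order[0])  # 3
```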

      (8) Lines 492-494. I disagree with the claim that "character decoding does not rely on individual cells, as removing neurons that responded strongly to character onset had little impact on performance". I have not seen strong responses to characters in the paper. In particular, the response to Summer in Figure 2 looks very variable and relatively weak. If there are stronger responses to characters, please show them to make a convincing argument. It is fine to argue that you can get information from the population, but in my view, there are no good single-cell responses (perhaps because the actors and the movie were unknown to the subjects) to make this claim. Also, an older paper (Quian Quiroga et al J. Neurophysiol. 2007) showed that the decoding of individual stimuli in a picture presentation paradigm was determined by the responsive neurons and that the non-responsive neurons did not add any information. The results here could be different due to the use of movies instead of picture presentations, but most likely due to the fact that, in the picture presentation paradigm, the pictures were of famous people for which there were strong single neuron responses, unlike with the relatively unknown persons in this paper.

      This is an important point and we thank the reviewer for highlighting a previous paradigm in which responsive neurons did drive decoding performance. Indeed, the fact that the movie, its characters and the corresponding actors were novel to patients could explain the disparity in decoding performance by way of weaker and more variable responses. We will include additional examples in the supplement of responses to features. Additionally, we will modify the text to emphasize the point that reliable decoding is possible even in the absence of a robust set of neuronal responses. It could indeed be the case that a decoder would place more weight on responsive units if they were present (as shown in the mentioned paper and in our decoding from visual transitions in the parahippocampal cortex).

    1. eLife Assessment

      This study presents valuable findings by demonstrating that specific GPCR subtypes induce distinct extracellular vesicle miRNA signatures, highlighting a potential novel mechanism for intercellular communication with implications for receptor pharmacology. The data are compelling; however, more evidence is needed to determine whether the distinct extracellular vesicle miRNA signatures result from GPCR-dependent miRNA expression or GPCR-dependent incorporation of miRNAs into extracellular vesicles.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors explore a novel concept: GPCR-mediated regulation of miRNA release via extracellular vesicles (EVs). They perform an EV miRNA cargo profiling approach to investigate how specific GPCR activations influence the selective secretion of particular miRNAs. Given that GPCRs are highly diverse and orchestrate multiple cellular pathways - either independently or collectively - to regulate gene expression and cellular functions under various conditions, it is logical to expect alterations in gene and miRNA expression within target cells.

      Strengths:

      The novel idea of GPCR-mediated control of EV loading of miRNAs.

      Weaknesses:

      Incomplete findings failed to connect and show evidence of any physiological parameters that are directly related to the observed changes. Mechanistic detail is lacking.

      The manuscript falls short of providing a comprehensive understanding. Identifying changes in cellular and EV-associated miRNAs without elucidating their physiological significance or underlying regulatory mechanisms limits the study's impact. Without demonstrating whether these miRNA alterations have functional consequences, the findings alone are insufficient. The findings may be suitable for more specialized journals.

      Furthermore, a critical analysis of the relationship between cellular miRNA levels and EV miRNA cargo is essential. Specifically, comparing the intracellular and EV-associated miRNA pools could reveal whether specific miRNAs are preferentially exported, a behavior that should be inversely related to their cellular abundance if export serves a beneficial function by reducing intracellular levels. This comparison is vital to strengthen the biological relevance of the findings and support the proposed regulatory mechanisms by GPCRs.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      (1) Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      (2) Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      (3) Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      (4) Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      (1) No significant change in EV quantity or size following GPCR activation.

      (2) Each GPCR triggered a distinct EV miRNA expression profile.

      (3) miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      (4) miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      (1) Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      (2) Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      (3) Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      (1) Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      (2) Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      (3) EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors explore a novel concept: GPCR-mediated regulation of miRNA release via extracellular vesicles (EVs). They perform an EV miRNA cargo profiling approach to investigate how specific GPCR activations influence the selective secretion of particular miRNAs. Given that GPCRs are highly diverse and orchestrate multiple cellular pathways - either independently or collectively - to regulate gene expression and cellular functions under various conditions, it is logical to expect alterations in gene and miRNA expression within target cells.

      Strengths:

      The novel idea of GPCR-mediated control of EV loading of miRNAs.

      Weaknesses:

      Incomplete findings failed to connect and show evidence of any physiological parameters that are directly related to the observed changes. Mechanistic detail is lacking.

      We appreciate the reviewer's acknowledgment of the novelty of this study. We agree with the reviewer that further mechanistic insights would strengthen the manuscript. The mechanisms by which miRNAs are sorted into EVs remain poorly understood. Various factors, including RNA-binding proteins, sequence motifs, and subcellular localization, can influence this sorting process (Garcia-Martin et al., 2022; Liu & Halushka, 2025; Villarroya-Beltri et al., 2013; Yoon et al., 2015). Ago2, a key component of the RNA-induced silencing complex, binds to miRNAs and facilitates their sorting. Ago2 has been found in EVs and can be regulated by cellular signaling pathways. For instance, McKenzie et al. demonstrated that KRAS-dependent activation of MEK-ERK can phosphorylate Ago2, thereby regulating the sorting of specific miRNAs into EVs (McKenzie et al., 2016). In differentiated PC12 cells, Gαq activation leads to the formation of Ago2-associated granules, which selectively sequester unique transcripts (Jackson et al., 2022). Investigating how GPCRs, G proteins, and their downstream signaling affect Ago2 expression, localization, and phosphorylation state could provide valuable insights into how GPCRs regulate specific miRNAs within EVs. We have expanded on these potential mechanisms and future research in the discussion section.

      The manuscript falls short of providing a comprehensive understanding. Identifying changes in cellular and EV-associated miRNAs without elucidating their physiological significance or underlying regulatory mechanisms limits the study's impact. Without demonstrating whether these miRNA alterations have functional consequences, the findings alone are insufficient. The findings may be suitable for more specialized journals.

      Thank you for the feedback. We acknowledge that validating the target genes of the top candidate miRNAs is an important next step. In response to the reviewer's concerns, we have expanded the discussion of future research in the manuscript. Although this initial study is primarily descriptive, it establishes a novel conceptual link between GPCR signaling and EV-mediated communication.

      Furthermore, a critical analysis of the relationship between cellular miRNA levels and EV miRNA cargo is essential. Specifically, comparing the intracellular and EV-associated miRNA pools could reveal whether specific miRNAs are preferentially exported, a behavior that should be inversely related to their cellular abundance if export serves a beneficial function by reducing intracellular levels. This comparison is vital to strengthen the biological relevance of the findings and support the proposed regulatory mechanisms by GPCRs.

      We appreciate the valuable suggestions from the reviewer. EV miRNAs and cellular miRNAs may exhibit distinct profiles, as miRNAs can be selectively sorted into or excluded from EVs (Pultar et al., 2024; Teng et al., 2017; Zubkova et al., 2021). Investigating the difference between cellular miRNA levels and EV miRNA cargo would provide insight into the mechanism of miRNA sorting and the functions of miRNAs in recipient cells. Cellular miRNA expression is highly dynamic, so an accurate comparison requires profiling EV and cellular miRNAs simultaneously. However, as this was a pilot study, we were unable to measure cellular miRNAs without repeating the entire experiment.

      Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      (1) Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      (2) Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      (3) Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      (4) Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      (1) No significant change in EV quantity or size following GPCR activation.

      (2) Each GPCR triggered a distinct EV miRNA expression profile.

      (3) miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      (4) miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      (1) Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      (2) Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      (3) Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      (1) Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      We are encouraged that the reviewer recognized the novelty, methodological rigor, and significance of our work. We recognize the limitations of our current model system and emphasize the need to test additional GPCR families and cell lines in future studies, as detailed in the discussion section.

      (2) Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      We appreciate the feedback. We recognize the importance of validating the function of the top candidate miRNAs in the recipient cells, and this will be included in future studies. 

      (3) EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Thank you for the comment. EV isolation and purification are major challenges in EV research. Current isolation techniques are often ineffective at separating vesicles produced by different biogenetic pathways, and the lack of specific markers to differentiate EV subtypes adds to this complexity. We recognize that the presence of various subpopulations can complicate the interpretation of EV cargo. In our study, we used ultrafiltration followed by size-exclusion chromatography to balance EV purity and yield. We adhered to the MISEV2023 (Minimal Information for Studies of Extracellular Vesicles 2023) guidelines by reporting detailed isolation methods, assessing both positive and negative protein markers, confirming vesicle structure by electron microscopy, and verifying particle size distribution by nanoparticle tracking analysis (Welsh et al., 2024). Following these guidelines helps ensure the quality of our study and makes our findings easier to compare with those of other studies.

    1. eLife Assessment

      In this important manuscript, Ryan et al. perform a genome-wide CRISPR-based screen to identify genes that modulate TDP-43 levels in neurons. They identify a number of genes and pathways and highlight the BORC complex, which is required for anterograde lysosome transport, as one such regulator of TDP-43 protein levels. Overall, this is a convincing study that opens the door for additional future investigations on the regulation of TDP-43.

    2. Reviewer #1 (Public review):

      Summary:

      As TDP-43 mislocalization is a hallmark of multiple neurodegenerative diseases, the authors seek to identify pathways that modulate TDP-43 levels. To do this, they use a FACS-based genome-wide CRISPR knockdown (KD) screen in a Halo-tagged TDP-43 knock-in (KI) iPSC line. Their screen identifies a number of genetic modulators of TDP-43 expression, including BORC, which plays a role in lysosome transport.

      Strengths:

      A genome-wide CRISPR-based screen identifies a number of modulators of TDP-43 expression, generating hypotheses regarding RNA-binding protein (RBP) regulation and perhaps insights into disease.

    3. Reviewer #2 (Public review):

      Summary:

      The authors employ a novel CRISPRi FACS screen and uncover the lysosomal transport complex BORC as a regulator of TDP-43 protein levels in iNeurons. They also find that BORC subunit knockouts impair lysosomal function, leading to slower protein turnover and implicating lysosomal activity in the regulation of TDP-43 levels. This is highly significant for the field given that a) other proteins could also be regulated in this way, b) understanding the mechanisms that influence TDP-43 levels is important, given that its dysregulation is considered a major driver of several neurodegenerative diseases, and c) the proposed mechanism is novel.

      Strengths:

      The novelty and information provided by the CRISPRi screen. The authors provide evidence indicating that BORC subunit knockouts impair lysosomal function, leading to slower protein turnover and implicating lysosomal activity in the regulation of TDP-43 levels, and they show a mechanistic link between lysosome mislocalization and TDP-43 dysregulation. The study highlights the importance of localized lysosome activity in axons and suggests that lysosomal dysfunction could drive TDP-43 pathologies associated with neurodegenerative diseases like FTD/ALS. Further, the methods and concepts will have an impact on the larger community as well. The work also sets the stage for further work to understand the somewhat paradoxical findings that, even though the tagged TDP-43 protein is reduced in the screen, cryptic exon splicing is not altered and TDP-43 half-life is longer with BORC KD.

    4. Reviewer #3 (Public review):

      Summary:

      In this work, Ryan et al. have performed a state-of-the-art genome-wide CRISPR-based screen of iNeurons expressing a tagged version of TDP-43 in order to identify modifiers of its expression. Unexpectedly, using this approach the authors have uncovered a previously undescribed role of the BORC complex in affecting the levels of TDP-43 protein, but not its mRNA expression. Taken together, these findings represent a very solid piece of work that will certainly be important for the field.

      Strengths:

      BORC is a TDP-43 expression modifier that has never been described before, and it seemingly acts by regulating protein half-life rather than transcript levels. It has long been known that different labs have reported different half-lives for TDP-43 depending on the experimental system, but no work has ever explained these discrepancies. Now, the work of Ryan et al. has for the first time identified one of the factors that could account for these differences and play an important role in disease (although this remains to be determined in future studies).

      Genome-wide CRISPR screening has been shown to yield novel results with high reproducibility and could eventually be used to search for expression modifiers of many other proteins involved in neurodegeneration or other diseases.

    1. eLife Assessment

      Seon and Chung investigate how people change their own risk-taking behavior when they are being observed by a "risky" or "safe" player. Using computational modeling and model-informed fMRI, the authors present convincing evidence that participants adjust their choices to be congruent with the other player's type (either risky or safe). The conclusions of the paper are an important contribution to the field of social decision-making, as they show a differentiated adjustment of choices rather than the universally riskier choice behavior under observation that previous studies have claimed.

    2. Reviewer #2 (Public review):

      Summary:

      This study aims to investigate how social observation influences risky decision-making. Using a gambling task, the study explored how participants adjusted their risk-taking behavior when they believed their decisions were being observed by either a risk-averse or risk-seeking partner. The authors hypothesized that individuals would simulate the choices of their observers based on learned preferences and integrate these simulated choices into their own decision-making. In addition to behavioral experiments, the study employed computational modeling to formalize decision processes and fMRI to identify the neural underpinnings of risky decision-making under social observation.

      Strengths:

      The study provides a fresh perspective on social influence in decision-making, moving beyond the simple notion that social observation leads to uniformly riskier behavior. Instead, it shows that individuals adjust their choices depending on their beliefs about the observer's risk preferences, offering a more nuanced understanding of how social contexts shape decision-making. The authors provide evidence using comprehensive approaches, including behavioral data based on a well-designed task, computational modeling, and neuroimaging. The three models are well selected to compare at which level (e.g., computing utility, risk preference shift, and choice probability) the social influence alters one's risky decision-making. This approach allows for a more precise understanding of the cognitive processes underlying decision-making under social observation.

      Weaknesses:

      While the neuroimaging results are generally consistent with the behavioral and computational findings, the strength of the neural evidence could be improved. The authors' claims about the involvement of the TPJ and mPFC in integrating social information are plausible, but further analysis, such as model comparisons at the neuroimaging level, is needed to decisively rule out alternative interpretations that other computational models suggest.

      My concern raised above in the previous round has been addressed with the newly added results. I now find the manuscript substantially improved.

      I have only a minor suggestion: when discussing the conflict-related signals observed in the dACC and dlPFC, I encourage the authors to include alternative interpretations beyond conflict monitoring per se. For example, these signals may also reflect processes related to information updating during social learning or inference. While the study does not aim to dissociate these possibilities, acknowledging them would enrich the discussion and provide a broader perspective for readers.

      Comments on revised version:

      Thank you for the substantial revision. I believe the additional analyses have meaningfully strengthened the manuscript, particularly by improving the connection between the behavioral modeling and neuroimaging results. The findings are consistent with prior work while also providing novel insights.

      When discussing the conflict-related signals observed in the dACC/dlPFC, I encourage the authors to include alternative interpretations in addition to conflict monitoring per se. For example, these signals may also reflect processes related to information updating during social learning or inference. While the study does not aim to dissociate these possibilities, acknowledging them would enrich the discussion and offer a broader perspective for readers.

      I have updated my evaluation of the strength of evidence from Solid to Convincing.

    3. Reviewer #3 (Public review):

      Summary:

      This is an important paper using a novel paradigm to examine how observation affects social contagion of risk preferences. There is a lot of interest in the field in the mechanisms of social influence, and adding the factor of whether observation also influences these contagion effects is intriguing.

      Strengths:

      There is an impressive combination of a multi-stage behavioural task as well as computational modelling and neuroimaging. The analyses are well conducted and the sample size is reasonable.

      Comments on revised version:

      Thank you for your helpful responses to my concerns. The manuscript is much improved and will make an important contribution to the literature. I have one remaining clarification. My request was for the authors to speculate in the discussion about lifespan differences in susceptibility to social influence, because the paper discusses how observing others' choices makes people riskier. I think it is important to explicitly acknowledge in the discussion that the sample tested was young adults, and that the effects observed may not be the same in adolescents or older adults, as suggested in recent work (e.g., Reiter et al., 2019, Nat Commun; Su et al., 2024, Commun Psychol). This is important to qualify general statements about how humans behave when observing others' risky decisions.

    1. eLife Assessment

      This fundamental study demonstrates that lipid binding can regulate the dimerization state of the SARS-CoV-2 Orf9b protein. The data from biophysical and cellular experiments, along with mathematical modeling, are compelling. This paper is broadly relevant to those studying coupled equilibria across all areas of biology.

    2. Reviewer #1 (Public review):

      Summary:

      San Felipe and colleagues try to answer an important question about Sarbecovirus Orf9b-mediated suppression of interferon signaling, given that this small viral protein adopts two distinct conformations: a dimeric β-sheet-rich fold and a helix-rich monomeric fold when bound to Tom70. Two Orf9b structures, determined by X-ray crystallography and cryo-EM, suggest an equilibrium between the two Orf9b conformations, and it is important to understand how this equilibrium relates to its functions. To answer these questions, the authors developed a series of ordinary differential equations (ODEs) describing the Orf9b conformational equilibrium between homodimers and Tom70-bound monomers. They used SPR and a fluorescence polarization (FP) peptide displacement assay to identify parameters for the equilibrium and create a theoretical model. They then used the model to characterize the effects of lipid binding and of Orf9b mutations on homodimer stability, lipid binding, and the dimer-monomer equilibrium. They used their model to further analyze dimerization, lipid binding, and Orf9b-Tom70 interactions for truncated Orf9b, the Orf9b fusion, the S53E mutant (blocking Tom70 binding), and Orf9b from a set of SARS-CoV-2 variants of concern (VOCs). They evaluated the ability of different Orf9b variants to bind Tom70 using Co-IP experiments and assessed their activity in suppressing IFN signaling in cells.

      Overall, this work is well designed; the results are of high quality, well presented, and support the conclusions.

      Strengths:

      (1) They developed a working biophysical model for analyzing Orf9b monomer-dimer equilibrium and Tom70 binding based on SPR and FP experiments; this is an important tool for future investigation.

      (2) They prepared lipid-free Orf9b homodimer and determined its crystal structure.

      (3) They designed and purified an obligate Orf9b monomer, a fused dimer, and other variants, which are very important for further investigations.

      (4) They identified the lipid bound by the Orf9b homodimer using mass spectrometry data.

      (5) They proposed a working model of Orf9b-Tom70 equilibrium.

      Weaknesses:

      (1) It is difficult to understand why the obligate Orf9b dimer has IFN inhibition activity similar to that of the WT protein and the obligate Orf9b monomer truncations.

      (2) The roles of the Orf9b homodimer and of Orf9b-bound lipid in virus infection remain unknown.

      Comments on revisions:

      In the revised manuscript, the authors have addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This study focuses on Orf9b, a SARS-CoV-1/2 protein that regulates innate signaling through its interaction with Tom70. San Felipe et al. use a combination of biophysical methods to characterize the coupling between lipid-binding, dimerization, conformational change, and protein-protein-interaction equilibria for the Orf9b-Tom70 system. Their analysis provides a detailed explanation for previous observations of Orf9b function. In a cellular context, they find that other factors may also be important for the biological function of Orf9b.

      Strengths:

      San Felipe et al. elegantly combine structural biology, biophysics, kinetic modelling, and cellular assays, allowing detailed analysis of the Orf9b-Tom70 system. Such complex systems involving coupled equilibria are prevalent in various aspects of biology, and a quantitative description of them, while challenging, provides a detailed understanding and prediction of biological outcomes. Using SPR to guide initial estimates of the rate constants for solution measurements is an interesting approach.

      Weaknesses:

      This study would benefit from a more quantitative description of uncertainties in the numerous rate constants of the models, either through a detailed presentation of the sensitivity analysis or another approach such as MCMC. Quantitative uncertainty analysis such as MCMC is not trivial for ODEs, particularly when they involve many parameters and are to be fitted to numerous data points, as is the case for this study. The authors use sensitivity analysis as an alternative; however, the results of the sensitivity analysis are not presented in detail, and I believe the authors should consider whether there is a way to present this analysis more quantitatively. For example, could the residuals for each +/-10% parameter change for the peptide model be presented as a supplementary figure, and similarly for the more complex models? Further details of the range of rate constants tested would be useful, particularly for the ka and kB parameters.

      The authors build a model that incorporates an α-helix-β-sheet conformational change, but the rate constant for the conversion to the α-helix conformation is required to be second order. Although the authors provide some rationale, I do not find this sufficiently convincing given the large number of adjustable parameters in the model and the use of manual model fitting. The authors should discuss whether there is any precedent in the literature for second-order rate constants for conformational changes. On page 14, the authors state that this rate constant "had to be non-linear in the monomer β-sheet concentration"; how many other models did the authors explore? For example, would αT↔α↔αα↔ββ (i.e., conformational change before dimer dissociation) or α↔βαT↔ββ (i.e., Tom70 binding driving dimer dissociation) be other plausible models for the conformational change that do not require assumptions of second-order rate constants for the conformational change?

      Overall, this study progresses the analysis of coupled equilibria and provides insights into Orf9b function.

      Comments on revisions:

      The authors have done a satisfactory job addressing my concerns.

      Regarding my recommendations to the authors - point 7: "Orf9b-FITC:Tom70" and "PT", representing the same species, are still both used in the equations on page 14, which is confusing for anyone who may wish to re-use the model. I appreciate this is quite a subtle point but given the importance of the model for the manuscript I feel the authors should do their due diligence to ensure it is presented as clearly as possible.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      San Felipe and colleagues try to answer an important question about Sarbecovirus Orf9b-mediated suppression of interferon signaling, given that this small viral protein adopts two distinct conformations: a dimeric β-sheet-rich fold and a helix-rich monomeric fold when bound to Tom70. Two Orf9b structures, determined by X-ray crystallography and cryo-EM, suggest an equilibrium between the two Orf9b conformations, and it is important to understand how this equilibrium relates to its functions. To answer these questions, the authors developed a series of ordinary differential equations (ODEs) describing the Orf9b conformational equilibrium between homodimers and Tom70-bound monomers. They used SPR and a fluorescence polarization (FP) peptide displacement assay to identify parameters for the equilibrium and create a theoretical model. They then used the model to characterize the effects of lipid binding and of Orf9b mutations on homodimer stability, lipid binding, and the dimer-monomer equilibrium. They used their model to further analyze dimerization, lipid binding, and Orf9b-Tom70 interactions for truncated Orf9b, the Orf9b fusion, the S53E mutant (blocking Tom70 binding), and Orf9b from a set of SARS-CoV-2 variants of concern (VOCs). They evaluated the ability of different Orf9b variants to bind Tom70 using Co-IP experiments and assessed their activity in suppressing IFN signaling in cells.

      Overall, this work is well designed; the results are of high quality, well presented, and support the conclusions.

      We thank reviewer #1 for their thoughtful assessment of our work and their constructive feedback.

      Strengths:

      (1) They developed a working biophysical model for analyzing Orf9b monomer-dimer equilibrium and Tom70 binding based on SPR and FP experiments; this is an important tool for future investigation.

      (2) They prepared lipid-free Orf9b homodimer and determined its crystal structure.

      (3) They designed and purified an obligate Orf9b monomer, a fused dimer, and other variants, which are very important for further investigations.

      (4) They identified the lipid bound by the Orf9b homodimer using mass spectrometry data.

      (5) They proposed a working model of Orf9b-Tom70 equilibrium.

      Weaknesses:

      (1) It is difficult to understand why the obligate Orf9b dimer has IFN inhibition activity similar to that of the WT protein and the obligate Orf9b monomer truncations.

      We thank the reviewer for their observation and agree that the obligate homodimer IFN results were not what we expected given our FP kinetic results with the purified obligate homodimer; we noted our surprise in the discussion. We have two possible hypotheses for why this is the case.

      In our discussion, we noted the possible introduction of an increased avidity effect with fused homodimer and have improved it as follows with additions in red:

      “This result was unexpected, as we had anticipated that the obligate homodimer results would resemble those of the phosphomimetic. We hypothesize that this may be explained by two possible factors. First, we cannot exclude an increased avidity between Orf9b and Tom70 when using the fused homodimer. Although our modeled decrease in the association rate of Orf9b:Tom70 (which increases the K<sub>D</sub> of the complex) suggests that fusing two copies of Orf9b decreases the affinity for Tom70, one copy of the fusion construct could either bind two copies of Tom70 or undergo rapid rebinding to Tom70. These effects would lead to a much tighter interaction in cellular assays than we modeled in vitro. A second possible explanation is that our assumption of high lipid binding is not valid for cell-based assays.”

      We also noted a second possible explanation: our inability to isolate the apo fused homodimer for comparison with the lipid-bound fused homodimer, and the differences this could introduce into our assays. We briefly expanded on this, again with additions in red:

      “As we have shown with both WT and fusion constructs, recombinantly expressed and purified Orf9b is lipid-bound, and this can stabilize the homodimer to slow or inhibit binding to Tom70. For the Orf9b fusion construct, we attempted to isolate the lipid-free species through protein refolding as previously described, in order to compare the effect of lipid binding on the homodimer fusion (similar to our WT experiments); however, we could not recover the stably folded homodimer. We hypothesize that the discrepancy between our kinetic results and our Co-IP/IFN results could be due to subsaturation of the Orf9b fusion homodimers by lipids in cell-based assays. While we have shown that lipid binding occurs in recombinant expression systems, it is possible that in our cell-based signaling assays lipid binding affects only a minor population of Orf9b. Because we were unable to isolate the apo fusion homodimer, we could not directly compare fusion homodimer stability in the presence and absence of bound lipid. Therefore, it is possible that the apo fusion homodimer unfolds and refolds into α-helices that lead to Tom70 binding, similar to the WT construct.”

      (2) The roles of the Orf9b homodimer and of Orf9b-bound lipid in virus infection remain unknown.

      We agree that we did not directly test the role of the homodimer during infection, and this remains an open area for future studies. We have included this caveat in our discussion and suggested possible experiments and future directions that could help shed light on this:

      “Although we have not directly tested the role the homodimer conformation plays during infection, we have demonstrated that lipid binding to the homodimer can bias the equilibrium away from Tom70. Lipids, including palmitate, have been shown to act both as signaling molecules and as post-translational modifications during antiviral innate immune signaling (S Mesquita et al. 2024; Wen et al. 2022; S. Yang et al. 2019). As an example of the latter (referred to as S-acylation), MAVS, a mitochondrial type I IFN signaling protein that associates with Tom70 (X.-Y. Liu et al. 2010; McWhirter, Tenoever, and Maniatis 2005; Seth et al. 2005), is post-translationally palmitoylated, which affects its ability to localize to the mitochondrial outer membrane during viral infection; MAVS is also a known target of Orf9b (Bu et al. 2024; Lee et al. 2024). When this palmitoylation is impaired (either by mutation or by depletion of the palmitoylating enzyme ZDHHC24), IFN activation is impaired (Bu et al. 2024). Therefore, future investigations should consider whether the homodimer conformation of Orf9b can antagonize other IFN signaling factors such as MAVS by binding to palmitoyl groups. Indeed, Orf9b has already been shown to bind MAVS by Co-IP (Han et al. 2021); however, whether this occurs through the palmitoyl modification remains unknown.”

      Reviewer #2 (Public review):

      Summary:

      This study focuses on Orf9b, a SARS-CoV-1/2 protein that regulates innate signaling through its interaction with Tom70. San Felipe et al. use a combination of biophysical methods to characterize the coupling between lipid-binding, dimerization, conformational change, and protein-protein-interaction equilibria for the Orf9b-Tom70 system. Their analysis provides a detailed explanation for previous observations of Orf9b function. In a cellular context, they find that other factors may also be important for the biological function of Orf9b.

      Strengths:

      San Felipe et al. elegantly combine structural biology, biophysics, kinetic modelling, and cellular assays, allowing detailed analysis of the Orf9b-Tom70 system. Such complex systems involving coupled equilibria are prevalent in various aspects of biology, and a quantitative description of them, while challenging, provides a detailed understanding and prediction of biological outcomes. Using SPR to guide initial estimates of the rate constants for solution measurements is an interesting approach.

      Weaknesses:

      This study would benefit from a more quantitative description of uncertainties in the numerous rate constants of the models, either through a detailed presentation of the sensitivity analysis or another approach such as MCMC. Quantitative uncertainty analysis such as MCMC is not trivial for ODEs, particularly when they involve many parameters and are to be fitted to numerous data points, as is the case for this study. The authors use sensitivity analysis as an alternative; however, the results of the sensitivity analysis are not presented in detail, and I believe the authors should consider whether there is a way to present this analysis more quantitatively. For example, could the residuals for each +/-10% parameter change for the peptide model be presented as a supplementary figure, and similarly for the more complex models? Further details of the range of rate constants tested would be useful, particularly for the ka and kB parameters.

      We thank the reviewer for their constructive feedback and have generated supplemental figures providing a deeper analysis of the residuals for each model parameter adjusted by +/-10% from the reported values, which we have added as Figure 1 - Supplemental 3 and Figure 4 - Supplemental 5.

      We note that there are modest improvements in the residual plots when individual model parameters are lowered by 10% from their reported values for this single dataset; however, we chose the reported values because they improved model behavior across multiple concentration series in different datasets. To illustrate this, we have also included the RMSD values for each model parameter subjected to a +/-10% change for a single concentration time course, as well as the percent change in RMSD relative to the RMSD generated by our reported model parameters. We have also included text that notes the observed pattern in the residuals from Figure 4 - Supplement 5 and offers some explanations for why this may occur.

      “Inspection of the residuals from the 5 µM apo-Orf9b homodimer time course showed clear patterns when individual model parameters were subjected to a 10% increase or decrease from the reported values. While our proposed model qualitatively describes the concentration-dependent change in kinetic behavior, the residual plots suggest that additional binding reactions not captured by our model may also be occurring.”

      Figure 1 - Supplemental 3. Plots of residuals from the Orf9b peptide model showing the effect of a 10% increase or decrease in each model parameter. All residuals and reported values are with respect to the 100 µM unlabeled Orf9b peptide condition. Blue dots: reported value. Red dots: 10% increase in reported value. Green dots: 10% decrease in reported value. Table: RMSD values for model fits after a +/-10% change to each model parameter (left column) and percent change in RMSD relative to the reported model RMSD (right column).

      “As an alternative to attempting to place CIs on the parameters, we performed sensitivity analysis to determine which parameters the model was most sensitive to (see Methods and Figure 1 - Supplemental 3). Additionally, we note that the model parameters were derived from the fit of only one concentration (100 µM) but fit the other concentrations equally well. We observed that the model was most sensitive to a 10% increase or decrease in the rate of Orf9b-FITC:Tom70 ([PT]) dissociation, whereas all other model parameters showed no sensitivity to change (Figure 1 - Supplemental 3).”

      Figure 4 - Supplemental 5: Plot of residuals showing the effect of increasing or decreasing individual model parameters by 10% relative to the reported values. All residual plots are with respect to the 5 µM apo-Orf9b homodimer condition. Blue dots: reported value. Red dots: 10% increase in reported value. Green dots: 10% decrease in reported value. (Left columns) Table of RMSD values calculated from model fits showing the effect of a +/-10% change to individual model parameters. (Right columns) Percent change in RMSD for each +/-10% change to individual model parameters, relative to the RMSD of the reported model.

      We have also included the following revised text to accompany this figure.

      “Further, we repeated the sensitivity analysis described previously for the peptide model, considering the sensitivity of each model parameter individually (Figure 4 - figure supplemental 5). We found that when examining the residuals at the lowest concentration (5 µM), the model was most sensitive to changes in three parameters: the rates of homodimer association and dissociation and the rate of conversion from β- to α-monomers.”

      “Therefore, under low concentrations of Orf9b homodimer, binding to Tom70 is limited by the rate of homodimer association and dissociation as well as the conversion of Orf9b monomers to the α-helical conformation.”
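      To make the +/-10% sensitivity analysis described above more concrete, the sketch below applies it to a minimal peptide-displacement scheme in which the pre-equilibrated FITC-peptide:Tom70 complex dissociates (PT ⇌ P + T) and free Tom70 is captured by an unlabeled competitor (C + T ⇌ CT). It is an illustration only, not the authors' model or code: the reaction scheme, every rate constant, the concentrations, the fixed-step RK4 integrator, and the synthetic "data" (model output plus a small sinusoidal offset) are assumptions chosen for the example. Each parameter is scaled by 1.1 and 0.9 in turn, the RMSD against the data is recomputed, and the percent change relative to the unperturbed RMSD is reported, mirroring the tables described for Figure 1 - Supplemental 3.

      ```python
      import math

      # Hypothetical rate constants (µM, s); NOT the values reported in the manuscript.
      REPORTED = {
          "k_on_P": 1.0,    # FITC-peptide + Tom70 -> PT (µM^-1 s^-1)
          "k_off_P": 0.05,  # PT dissociation (s^-1)
          "k_on_C": 0.1,    # unlabeled competitor + Tom70 -> CT (µM^-1 s^-1)
          "k_off_C": 0.05,  # CT dissociation (s^-1)
      }
      P_TOT, T_TOT, C_TOT = 0.05, 1.0, 100.0  # labeled peptide, Tom70, competitor (µM)


      def derivs(y, p):
          P, T, PT, C, CT = y
          v_p = p["k_off_P"] * PT - p["k_on_P"] * P * T  # net PT dissociation
          v_c = p["k_on_C"] * C * T - p["k_off_C"] * CT  # net CT formation
          return [v_p, v_p - v_c, -v_p, -v_c, v_c]


      def pre_equilibrium(p):
          """PT formed before competitor addition (exact solution of P + T <-> PT)."""
          kd = p["k_off_P"] / p["k_on_P"]
          b = P_TOT + T_TOT + kd
          pt = (b - math.sqrt(b * b - 4 * P_TOT * T_TOT)) / 2
          return [P_TOT - pt, T_TOT - pt, pt, C_TOT, 0.0]


      def rk4_step(y, p, dt):
          s1 = derivs(y, p)
          s2 = derivs([v + 0.5 * dt * d for v, d in zip(y, s1)], p)
          s3 = derivs([v + 0.5 * dt * d for v, d in zip(y, s2)], p)
          s4 = derivs([v + dt * d for v, d in zip(y, s3)], p)
          return [v + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                  for v, d1, d2, d3, d4 in zip(y, s1, s2, s3, s4)]


      def simulate(p, n_samples=120, dt=0.02, steps_per_sample=50):
          """Fraction of labeled peptide bound (a proxy for the FP signal), sampled every 1 s."""
          y = pre_equilibrium(p)
          out = [y[2] / P_TOT]
          for _sample in range(n_samples):
              for _step in range(steps_per_sample):
                  y = rk4_step(y, p, dt)
              out.append(y[2] / P_TOT)
          return out


      def rmsd(a, b):
          return math.sqrt(sum((x - z) ** 2 for x, z in zip(a, b)) / len(a))


      def sensitivity(params, data):
          """RMSD and percent change in RMSD when each parameter is scaled by +/-10%."""
          ref = rmsd(simulate(params), data)
          results = {}
          for name in params:
              for label, factor in (("+10%", 1.1), ("-10%", 0.9)):
                  perturbed = dict(params, **{name: params[name] * factor})
                  r = rmsd(simulate(perturbed), data)
                  results[(name, label)] = (r, 100.0 * (r - ref) / ref)
          return ref, results


      # Synthetic "observations": reference model output plus a small deterministic offset.
      DATA = [f + 0.01 * math.sin(i) for i, f in enumerate(simulate(REPORTED))]

      if __name__ == "__main__":
          ref, results = sensitivity(REPORTED, DATA)
          print(f"reference RMSD: {ref:.4f}")
          for (name, label), (r, pct) in results.items():
              print(f"{name:8s} {label}: RMSD={r:.4f} ({pct:+.1f}%)")
      ```

      In this toy setting the competitor is in large excess, so free Tom70 is captured almost immediately and the displacement time course is paced mainly by PT dissociation; perturbing that rate would therefore be expected to change the RMSD far more than perturbing the competitor rates. Whether the same holds for the authors' fitted model is answered by their supplemental tables, not by this sketch.
      
      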

      We have also included a supplemental figure, Figure 4 - Supplemental 4, showing how changes in the model parameters ka and kB affect the model's behavior, to illustrate the range of values tested.

      Figure 4 - Supplemental 4: Plots of model behavior showing the effect of changes to the α-to-β and β-to-α monomer interconversion rates compared to experimental values. Data are modeled with respect to the 5 µM apo-Orf9b homodimer condition. The black line represents the reported model fit and parameter values.

      We have also incorporated the following revised text.

      “The model parameters k<sub>a</sub> and k<sub>B</sub> describe the rates of interconversion between the β-sheet and α-helix monomer conformations. These parameters must be estimated by modeling because our assays do not allow us to directly measure the folding rates between these conformations. To identify these values, we scanned k<sub>a</sub> and k<sub>B</sub> and selected the values that yielded the best agreement between the model and the experimental data (Figure 4 - figure supplemental 4).”

      The authors build a model that incorporates an α-helix-β-sheet conformational change, but the rate constant for the conversion to the α-helix conformation is required to be second order. Although the authors provide some rationale, I do not find this sufficiently convincing given the large number of adjustable parameters in the model and the use of manual model fitting. The authors should discuss whether there is any precedent in the literature for second-order rate constants for conformational changes. On page 14, the authors state that this rate constant "had to be non-linear in the monomer β-sheet concentration"; how many other models did the authors explore? For example, would αT↔α↔αα↔ββ (i.e., conformational change before dimer dissociation) or α↔βαT↔ββ (i.e., Tom70 binding driving dimer dissociation) be other plausible models for the conformational change that do not require assumptions of second-order rate constants for the conformational change?

      We thank the reviewer for their feedback. During our studies, we tested several models prior to the final one presented in Figure 4A. The first model we tested, described in Figure 4 - Supplemental 3, was ββ↔α↔αT with no conformational change. We tested several models that integrated the existing structural data for both Orf9b and Tom70 and found that, while these models could fit individual time series, they explained neither the concentration-dependent changes across subsequent time series nor the changes induced by lipid binding and by mutations in VOCs.

      With respect to the possibilities of αT↔α↔αα↔ββ and α↔βαT↔ββ models, we have revised our manuscript to mention that we did test additional models before we settled on the model that we presented.

      “We tested different reaction schemes that incorporated the interconversion between β-sheet and α-helix conformations, including models in which a conformational change in the homodimer, rather than in monomers, leads to Tom70 binding. None of these models adequately described our experimental results; therefore, we continued developing our model as outlined in Figure 4D.”

      With respect to the second-order rate describing the fold change from β to α, we have added the revised text to the manuscript:

      “We initially tested keeping the rate constant k<sub>a</sub> first order, like k<sub>B</sub>, which did yield the sigmoidal behavior we observed in the 5 µM apo-homodimer condition. However, this assumption failed to describe the data at other concentrations, resulting in a substantial overestimation compared to our experimental results when k<sub>B</sub> was held constant throughout. We found that when the β-sheet to α-helix rate constant (k<sub>a</sub>) was made second order, we were able to hold it constant across all concentrations tested, suggesting a non-linearity in the monomer β-sheet concentration.”

      Although this was surprising to us, we reasoned that one biological explanation for a second-order β-to-α conversion is that β-monomers may transiently self-associate to cooperatively fold into the α-helical conformation. We did acknowledge this choice to make the β-to-α parameter non-linear (unlike the α-to-β conversion, which is first order).

      We concede that we could not find specific examples in the literature of non-linear kinetics comparable to the system we describe; however, such systems have been reported for proteins with high structural plasticity, where transient interactions with another copy of the protein, or with another protein altogether, drive folding changes. We have revised the manuscript to include additional citations to papers describing such systems (Zuber et al. 2022; Tuinstra et al. 2008).
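      The difference between a first-order and a second-order β-to-α step discussed above can be illustrated with a small ODE system (ββ ⇌ 2β, β → α, α → β, α + T ⇌ αT). This is a hypothetical sketch, not the authors' model code: the rate constants, concentrations, and time units are invented, and the second-order law is assumed here to convert one β-monomer per event at a rate k<sub>a</sub>[β]². Under that assumption the effective per-monomer conversion rate, k<sub>a</sub>[β], grows with monomer concentration, whereas a first-order k<sub>a</sub> gives the same per-monomer rate at every concentration, which is one way a single constant could behave differently across a concentration series.

      ```python
      # Hypothetical sketch: Orf9b homodimer dissociation, beta->alpha refolding, Tom70 binding.
      # All rate constants and concentrations are invented for illustration (µM, minutes).
      RATES = {
          "k_diss": 0.05,   # beta-beta homodimer -> 2 beta-monomers (min^-1)
          "k_assoc": 0.5,   # 2 beta-monomers -> homodimer (µM^-1 min^-1)
          "k_a2": 0.02,     # beta -> alpha, second-order law: rate = k_a2 * [beta]^2 (µM^-1 min^-1)
          "k_a1": 0.02,     # beta -> alpha, first-order alternative: rate = k_a1 * [beta] (min^-1)
          "k_B": 0.01,      # alpha -> beta, first order (min^-1)
          "k_on_T": 1.0,    # alpha + Tom70 -> alpha:Tom70 (µM^-1 min^-1)
          "k_off_T": 0.01,  # alpha:Tom70 dissociation (min^-1)
      }
      T_TOT = 1.0  # total Tom70 (µM), hypothetical


      def derivs(y, r, second_order):
          bb, b, a, t, at = y
          v_dimer = r["k_diss"] * bb - r["k_assoc"] * b * b  # net homodimer dissociation
          # Net beta -> alpha flux; in the second-order case one beta converts per event.
          v_fold = (r["k_a2"] * b * b if second_order else r["k_a1"] * b) - r["k_B"] * a
          v_bind = r["k_on_T"] * a * t - r["k_off_T"] * at    # net alpha:Tom70 formation
          return [-v_dimer, 2 * v_dimer - v_fold, v_fold - v_bind, -v_bind, v_bind]


      def rk4_step(y, r, dt, second_order):
          s1 = derivs(y, r, second_order)
          s2 = derivs([v + 0.5 * dt * d for v, d in zip(y, s1)], r, second_order)
          s3 = derivs([v + 0.5 * dt * d for v, d in zip(y, s2)], r, second_order)
          s4 = derivs([v + dt * d for v, d in zip(y, s3)], r, second_order)
          return [v + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                  for v, d1, d2, d3, d4 in zip(y, s1, s2, s3, s4)]


      def simulate(dimer_conc, second_order=True, t_end=300.0, dt=0.05, r=RATES):
          """Start with all Orf9b as homodimer; return the final state [bb, b, a, T, aT]."""
          y = [dimer_conc, 0.0, 0.0, T_TOT, 0.0]
          for _ in range(int(round(t_end / dt))):
              y = rk4_step(y, r, dt, second_order)
          return y


      if __name__ == "__main__":
          for conc in (5.0, 10.0, 20.0):
              for second_order in (False, True):
                  at = simulate(conc, second_order)[4]
                  law = "second-order" if second_order else "first-order"
                  print(f"{conc:5.1f} µM dimer, {law:12s} k_a: Tom70 bound = {at / T_TOT:.2f}")
      ```

      Both rate laws conserve total Orf9b (2[ββ] + [β] + [α] + [αT]) and total Tom70, which is a useful sanity check when comparing alternative schemes such as those raised by the reviewer.
      
      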

      Overall, this study progresses the analysis of coupled equilibria and provides insights into Orf9b function.

      Reviewer #1 (Recommendations for the authors):

      (1) Was the unlabeled Orf9b peptide added to the pre-equilibrated Orf9b-FITC:Tom70 solution as a competitor? Figure 1D illustrates that the competitor was full-length Orf9b.

      We have revised the figure to illustrate that, in this experiment, the competitor is the unlabeled version of the FITC-labeled peptide and not the full-length Orf9b sequence.

      (2) Figure 2B: what is the higher-MW peak from the refolded Orf9b homodimer?

      We have added the following revised text (highlighted in red) to the manuscript to clarify Figure 2B.

      “The SEC elution profile and retention volume of refolded Orf9b directly overlapped with natively folded homodimeric Orf9b and suggested a high recovery of the refolded homodimer with the early eluting peaks corresponding to either a chaperone-bound species (natively folded) or misfolded protein (refolded) as judged by SDS-PAGE (Figure 2B). Together, the overlap in elution peaks corresponding to the folded homodimer suggested a high recovery of the homodimer from the refolding conditions.”

      (3) Figure 2C: in the main text, the authors state that "...observed that the refolded homodimer structure closely aligned with the lipid-bound reference structure, which shows that the homodimer fold can be recovered after denaturing". Please provide structural comparison details here: which software was used, and what are the RMSD and Dali Z-score?

      We have added the following revised text (highlighted in red) to the manuscript to clarify Figure 2C.

      “Aligning the structure of the Orf9b homodimer (PDB 6Z4U) with our structure of the refolded Orf9b homodimer (9N55) in PyMOL resulted in an RMSD of 1.1 Å. Further, we also searched our structures of the refolded Orf9b homodimer on the Dali server against the existing structures of the lipid-bound Orf9b homodimer, which yielded a Z-score of 2.2, indicating good correspondence between the structures.”
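For readers less familiar with the metric, RMSD after optimal rigid-body superposition can be computed with the Kabsch algorithm, sketched below with NumPy. This is a generic illustration that assumes the two coordinate sets are already matched atom-for-atom; PyMOL's `align` command additionally performs a sequence alignment and, by default, iterative outlier rejection, so its reported RMSD can differ from an all-atom Kabsch value.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate arrays after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Usage: a rotated and translated copy of the same points superposes exactly.
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))
theta = np.deg2rad(30)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(P, Q))  # effectively zero (floating-point noise)
```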

      (4) To confirm that the refolded Orf9b homodimer does not contain lipid, could the authors provide mass spectrometry data for the refolded Orf9b sample and compare it with the results in Figure 2 - Supplemental 1?

      We do not have complete mass spectrometry data for the refolded homodimer samples; however, we feel that the native mass spectrometry data provide a good orthogonal comparison between natively folded and refolded samples for the presence or absence of lipids. We concede that we only used mass spectrometry to characterize the four peaks unique to the natively folded deconvoluted spectra, which confirmed that the shift in mass relative to the expected homodimer molecular weight corresponded to the two lipids we presented. However, we would expect that performing mass spectrometry on the refolded sample would only further confirm our observations from the crystal structures and the native mass spectrometry.

      (5) Have the authors tried to use analytical ultracentrifugation to analyze the Orf9b dimer-monomer equilibrium, given that AUC provides a much more accurate measurement of molecular mass?

      We thank the reviewer for this suggestion and agree that AUC could be an additional useful strategy for monitoring the dimer-monomer equilibrium and would provide additional validation of the molecular weights of both the monomer and homodimer.

      While we have not performed AUC, we have revised our manuscript to include more discussion about the determination of molecular weights by SEC.

      “For the Orf9b homodimer, the retention volume was consistent with the molecular weight standards, given the expected molecular weight of the homodimer (~21 kDa) and the standard (~29 kDa). In the case of the Orf9b monomer, although we would expect the monomer (~10.6 kDa) to elute between the 13.4 kDa and 6.5 kDa molecular weight standards, its greater retention volume could be explained by non-specific hydrophobic interactions between monomeric Orf9b and the column.”
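SEC molecular-weight estimates of this kind usually rely on a calibration curve in which log10(MW) of the standards is assumed to vary linearly with elution volume. The sketch below fits such a line by least squares and inverts it for an unknown; the elution volumes are hypothetical placeholders, not the authors' data, and only the 29, 13.4 and 6.5 kDa standard masses come from the text. A protein that interacts with the resin elutes later and therefore reads as smaller than it really is.

```python
import math


def fit_calibration(volumes_ml, mw_kda):
    """Least-squares fit of log10(MW) = intercept + slope * elution_volume."""
    n = len(volumes_ml)
    ys = [math.log10(m) for m in mw_kda]
    x_mean = sum(volumes_ml) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in volumes_ml)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(volumes_ml, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def apparent_mw(volume_ml, intercept, slope):
    """Apparent molecular weight (kDa) read off the calibration line."""
    return 10 ** (intercept + slope * volume_ml)


# Hypothetical elution volumes (mL) for the 29, 13.4 and 6.5 kDa standards.
intercept, slope = fit_calibration([15.0, 16.6, 18.1], [29.0, 13.4, 6.5])
print(round(apparent_mw(16.0, intercept, slope), 1))
```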

      (6) The authors used truncation of 7 C-terminal amino acids to generate an obligate Orf9b monomer for their assays. It would be interesting to mutate residues at the homodimer interface to generate Orf9b monomers rather than deleting residues. For example, mutate 91-96aa (FVVVTV) to negatively charged residues, which will not only disrupt the dimerization interface, but also impair lipid binding. The dimer interface mutant should then be tested in their SPR, FP assays, as well as IFN inhibition assays.

      We thank the reviewer for their suggestion and agree that mutation of the 7 C-terminal amino acids into negatively charged residues could be an interesting alternative strategy to generating an obligate Orf9b monomer without the need for truncating the residues. Our choice of using the truncated construct we proposed was driven by our analysis of the structure of the homodimer which reveals that a significant portion of the dimer interface is composed of backbone-backbone hydrogen bonding between the two chains of Orf9b. We reasoned that truncating these residues would be the most effective way to compromise the interface between the two chains and drive a predominantly monomeric behavior, however, compromising the interface with multiple mutations is an intriguing alternative.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors could comment on the slow monomer-dimer exchange observed by SEC and how it fits with their other analysis.

      We thank the reviewer for their comment and concede that the slow exchange may be a limitation of this experimental setup. Our SPR experiments and modeling indicated that the homodimer dissociates rapidly into monomers: the measured off rate implies a homodimer half-life on the order of seconds, yet we still observe a noticeable dimer species on the chromatograms. We allowed the diluted samples to reach equilibrium before injection onto the analytical sizing column; however, it is possible that the system was still in a pre-equilibrium state prior to injection. Interactions between the protein and the column could also prevent full dissociation of the homodimer. While this is a limitation, we note that we did not use the Kd value obtained by non-linear regression of the equilibrium observed on the chromatograms for downstream experiments; instead, we used it as a ballpark estimate of the homodimer Kd, which is of the same order as the Kd determined by SPR.
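Both quantities in this argument follow from standard relations: for first-order dissociation the half-life is t½ = ln 2 / k_off, and for a monomer-dimer equilibrium 2M ⇌ D with Kd = [M]²/[D], the species concentrations at a given total monomer concentration follow from a quadratic. The sketch below assumes ideal two-state behavior with no column interactions, so it describes what a fully equilibrated sample would contain; the numerical inputs are hypothetical.

```python
import math


def half_life(k_off):
    """Half-life (s) of a complex dissociating with first-order rate k_off (1/s)."""
    return math.log(2) / k_off


def monomer_dimer(total, kd):
    """Free monomer and dimer concentrations for 2M <=> D with Kd = [M]^2 / [D].

    `total` is in monomer equivalents: total = [M] + 2[D].
    Substituting [D] = [M]^2 / Kd gives 2[M]^2/Kd + [M] - total = 0.
    """
    m = kd * (-1 + math.sqrt(1 + 8 * total / kd)) / 4
    d = m ** 2 / kd
    return m, d


# Hypothetical inputs: k_off = 0.1 1/s; total = 10 uM monomer, Kd = 1 uM.
print(round(half_life(0.1), 2))
m, d = monomer_dimer(10.0, 1.0)
print(round(m, 3), round(d, 3), round(2 * d / 10.0, 3))  # last value: fraction in dimers
```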

      (2) It might be useful to include the rate constants on the reaction arrows of the schematic representation of the models.

      We have revised Figure 4D to include the rates for both Orf9b monomer binding to Tom70 and Orf9b binding to Orf9b as derived from the SPR experiments, as well as the modeled values for the interconversion between α and β monomers. We also revised Figure 7 to include these values as well as the modeled dissociation rate for the lipid-bound homodimer.

      (3) I couldn't find how the sensitivity analysis was performed for the more complex models. Was this the same +/- 10% as per the peptide model?

      We applied the same ±10% sensitivity analysis used for the peptide model to the more complex equilibrium model and have revised our manuscript to state this clearly.
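A one-at-a-time ±10% sensitivity analysis of this kind can be sketched as follows: each parameter is scaled by 0.9 and 1.1 with the others held fixed, and the relative change in a model output is recorded. The `model` callable, the parameter names, and the toy dimer-fraction output are hypothetical stand-ins, not the authors' Berkeley Madonna models.

```python
import math


def local_sensitivity(model, params, rel_step=0.10):
    """Relative output change when each parameter is scaled by (1 -/+ rel_step).

    Returns {name: (change_at_minus, change_at_plus)} relative to the baseline.
    """
    baseline = model(**params)
    result = {}
    for name, value in params.items():
        changes = []
        for factor in (1 - rel_step, 1 + rel_step):
            perturbed = dict(params, **{name: value * factor})
            changes.append((model(**perturbed) - baseline) / baseline)
        result[name] = tuple(changes)
    return result


def dimer_fraction(total, kd):
    """Toy output: fraction of monomer equivalents in dimers for 2M <=> D."""
    m = kd * (-1 + math.sqrt(1 + 8 * total / kd)) / 4
    return 2 * (m ** 2 / kd) / total


print(local_sensitivity(dimer_fraction, {"total": 10.0, "kd": 1.0}))
```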

      (4) Further clarification of "inspection of residuals suggested that the fits were accurate". In Figure 1B, the residuals appear to show systematic errors, perhaps indicating other processes occurring.

      We agree that in the SPR kinetic fitting results for the Orf9b peptide binding to Tom70 in Figure 1B, there are some regions where the fit overestimates or underestimates the experimental results. This is partially the result of limitations in the number of binding models available in the analysis software, which is why we reported using a 1:1 Langmuir binding model. It is certainly possible that additional binding reactions occur; however, we limited our use of these specific kinetic results to the peptide model that we proposed in Figure 1D. We noted in the manuscript text that it was necessary to adjust the model parameter values to some extent to fit our experimental results, which may be partially explained by the SPR fitting errors.
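For reference, the 1:1 Langmuir interaction model assumes dR/dt = k_on·C·(R_max − R) − k_off·R, which has closed-form association and dissociation phases. The sketch below evaluates those curves, computes residuals against observed data, and reports the longest run of same-signed residuals; long runs like the regions noted in Figure 1B are the usual sign that a single 1:1 step does not capture all of the binding behavior. Parameter names and values are generic and hypothetical; this is not the SPR evaluation software's implementation.

```python
import math


def association(t, conc, k_on, k_off, r_max):
    """Response during analyte injection for a 1:1 Langmuir model, R(0) = 0."""
    k_obs = k_on * conc + k_off
    r_eq = r_max * k_on * conc / k_obs
    return r_eq * (1 - math.exp(-k_obs * t))


def dissociation(t, r0, k_off):
    """Response after the injection ends, starting from r0 at t = 0."""
    return r0 * math.exp(-k_off * t)


def residuals(times, observed, model):
    """Observed minus fitted response at each time point."""
    return [obs - model(t) for t, obs in zip(times, observed)]


def longest_run_same_sign(res):
    """Length of the longest stretch of consecutive residuals sharing a sign."""
    best = run = 0
    prev_sign = 0
    for r in res:
        sign = (r > 0) - (r < 0)
        if sign != 0 and sign == prev_sign:
            run += 1
        else:
            run = 1 if sign != 0 else 0
        prev_sign = sign
        best = max(best, run)
    return best
```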

      “With the parameter set obtained from the 100 µM condition, we then held all parameters fixed and simply changed the peptide concentrations in the model to fit the remaining conditions by hand. We note that this process saw the model parameter values change from the experimentally derived values by 3% at the lowest end up to 70% at the highest end, but they remained within an order of magnitude of the experimental SPR values. We speculate that this arises from differences in experimental setup between SPR- and FP-based methods of measuring kinetics.”

      (5) The manuscript builds logically, but given the sophisticated nature of the system and the modelling could benefit from more clarity/streamlining in the descriptions/illustrations.

      We have revised our manuscript in response to both reviewers' comments and hope that the clarity of the work is improved as a result.

      (6) Figure 4 Supplement 3 - where did the rate constants for Model 1 come from? Was there any attempt to alter them to fit the data better?

      We have clarified in the figure description that the rate constants used in Model 1 were the same values used in Figure 4B (but without the interconversion between beta and alpha rates).

      “Comparison of kinetic models 1 and 2 in describing experimental results from the kinetic binding assay. Experimental results using 10 µM of refolded Orf9b homodimer are shown as rings, with the predicted behavior of model 1 (equilibrium exchange) shown as a dark blue line. The predicted behavior of model 2 (equilibrium exchange with a conformational change between β-sheet and α-helical monomers) is shown as the light blue line. Model parameter values were the same as described in Figure 4D and kept constant in both model comparisons.”

      (7) What are and [PT] in the second set of equations (page 13)?

      [PT] refers to the concentration of “fluorescent probe” (Orf9b-FITC) and Tom70.

      (8) "Additionally, the fused homodimer association rate (which can be viewed as a rate of tertiary complex formation)" - can the authors provide a mathematical proof for this?

      In the case of the fused homodimer kinetic data, we did not develop a separate model to explicitly take into account the differences between using a fused construct versus the WT construct that can dissociate into monomers. We have clarified our interpretation of this in the manuscript.

      “Although our model explicitly describes homodimer dissociation into monomers as a requisite step for Orf9b binding to Tom70, we adapted it for the fusion experimental data. In this case, all model parameters other than the association and dissociation kinetics of the fluorescent probe and Tom70 were adjusted to achieve the best agreement with the experimental data. When applied to the fusion homodimer, the parameters describing homodimer dissociation into separate monomers could instead describe the dissociation of the two β-sheet domains away from each other in the tertiary structure but remaining physically linked through the linker region.”

      (9) "For Lambda and Omicron, the P10S mutation results in the serine being positioned to form several hydrogen bonds between R13 and the backbone carbonyl of A11 and L48 within the same chain..." is this taken from AlphaFold predicted structures of the mutants? If so, it should be made clear that this is derived from predicted structures. And even so, AlphaFold can be poor at determining structures of mutants, and so there is greater uncertainty in the prediction of the bonds.

      For Lambda, Omicron, and Delta mutations, we used PyMOL to examine how the placement of mutations could structurally explain the kinetic differences we observed in our model. We have clarified in the figure description that these predictions are not derived from AlphaFold.

      (10) "biological replicates" - is this different protein purifications?

      Yes, in this case biological replicates refer to different protein purifications for all variants described and tested.

      (11) Are any of the authors involved in the Berkeley Madonna commercial software used in the manuscript? If so, should this be in the conflict of interest statement?

      Yes, Michael Grabe is an owner of Berkeley Madonna, and we have updated our conflicts of interest statement to reflect this.

    1. eLife Assessment

      This important work describes a set of parameters that give a robust description of shape features of cells in tissues. The evidence for the usefulness of these parameters is solid. The work should be of interest for anybody analyzing epithelial dynamics, but more details about the analysis of experimental images are necessary and some streamlining of the text would increase the accessibility of the material for non-specialists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors' stated aim is to introduce so-called Minkowski tensors to characterize and quantify the shape of cells in tissues. The authors introduce Minkowski tensors and then define the p-atic order q_p as a cell shape measure, where p is an integer. They also introduce a previously defined measure of p-atic order in the form of the parameter γ_p. The authors compute q_p for data obtained by simulating an active vertex model and a multiphase field model, where they focus on p=2 and p=6 - so-called nematic and hexatic order - as the two values of highest biological relevance. Based on their analysis, the authors state that q_2 and q_6 are independent, that there is no crossover for the coarse-grained quantities, and that a comparison of q_p for different values of p is not meaningful, and they determine the dependence of the mean values of q_2 and q_6 on cell activity and deformability. Subsequently, they apply their method to data from MDCK monolayers and argue that the full range of q_p values needs to be considered to characterize shape and positional order in epithelia.
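As a concrete illustration of q_p, the sketch below computes it for a polygonal cell outline, assuming the boundary-based form q_p = |Σ_e |e| exp(i p θ_e)| / Σ_e |e|, where the sum runs over edges e of length |e| with outward-normal angle θ_e. This polygon-only version is an assumption for illustration; the manuscript's Minkowski-tensor formulation also covers smooth contours such as those from phase-field simulations.

```python
import cmath
import math


def q_p(vertices, p):
    """p-atic shape measure of a simple polygon with counterclockwise vertices.

    Each edge contributes its length times exp(i*p*theta), where theta is the
    angle of the edge's outward normal; the sum is normalised by the perimeter.
    """
    total = 0j
    perimeter = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        # For counterclockwise order the outward normal is (dy, -dx).
        theta = math.atan2(-dx, dy)
        total += length * cmath.exp(1j * p * theta)
        perimeter += length
    return abs(total) / perimeter


hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]
print(round(q_p(hexagon, 6), 6), round(q_p(hexagon, 2), 6))  # 1.0 0.0
print(round(q_p(rectangle, 2), 6))                           # 0.333333
```

A regular hexagon is perfectly hexatic (q_6 = 1) with no nematic component (q_2 = 0), while a 2:1 rectangle has q_2 = 1/3; the measure is unchanged by rotating or rescaling the outline.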

      Strength:

      The work presents a set of parameters that are useful for analyzing cell shape.

      Weaknesses:

      The introduction of the Minkowski tensors is hardly accessible to typical biologists. Ultimately, most quantification is done using q_p, which can be defined without recourse to Minkowski functionals. The relation to Minkowski functionals makes the important properties of robustness and stability evident. However, for an audience of biologists, the derivation of this property could be relegated to an Appendix. Instead, the text could directly go to the results of the analysis of experimental and modeling data.

      Important details about how the cell shapes are extracted from the experimental data are missing. The two data sets the authors consider are not analyzed in the same way.

    3. Reviewer #3 (Public review):

      Happel et al. present an article entitled Quantifying the shape of cells - from Minkowski tensors to p-atic order. The paper applies the p-atic quantification method, established in physics, to extract full cell shapes in biological experiments, using their own images of epithelial MDCK cells (phase contrast) and images reported in another paper, as well as their own simulations based on active vertex model and multiphase field approaches. The authors present the rationale for this new quantification strategy. They adapt the method of Minkowski tensors and extract cell shape readouts, plotting their distributions. Emphasis is given to changes in cell shape captured by this method. Higher-rank tensors are considered, as well as representations with intuitive meanings and q_i orders and their potential correlations or absence of correlations (for example, between q_2 and q_6), leading to statements about nematic and hexatic order.

      This analysis and its strengths are contrasted with Armengol-Collado et al. (2023), cited in the paper, who consider polygonal cell shapes and their shape function γ_p. The authors argue that the Minkowski tensor approach is a key improvement and, in doing so, challenge the crossover correlations reported in Armengol-Collado et al. (2023). In this context, they argue that a nematic liquid-crystal approach is not sufficient to capture cell dynamics in tissues. They also propose that q_2 and q_6 could serve as readouts for cell activity and deformability, among other statements related to their approach.

      A variety of analytical methods have been developed to track cells in monolayers in vitro and in vivo during morphogenesis, for example shear decomposition (from MPI-PKS Dresden) or the approach linking centroids to their neighbours (MSC/Curie Paris), to name a few. In the future, systematic comparisons between these analytical methods, highlighting their respective advantages and drawbacks, would be valuable. This will allow experimentalists to identify the most relevant methods to address their morphogenetic questions.

    1. eLife Assessment

      The study provides valuable technical advances to generate and isolate neural rosettes. The technique is robust, as indicated by both reviewers. The evidence is solid, as shown in orthogonal characterization by flow cytometry, morphology, and scRNA-seq. Comparison with the manual-rosette-picking protocol will enhance the validity of the claims.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to develop a fully scalable, feeder-free protocol for deriving dorsal forebrain neural rosette stem cells (NRSCs) from human pluripotent stem cells, eliminating the need for manual rosette isolation. Using dynamic suspension culture combined with single-SMAD inhibition (RepSox), they sought to generate FOXG1⁺/OTX2⁺ NRSCs within ten days and expand them through at least twelve passages while retaining regional identity. They also aimed to demonstrate the cells' capacity to differentiate into functional neurons, astrocytes, and oligodendrocytes under defined conditions.

      Strengths:

      A key strength is the elimination of labour-intensive manual rosette picking, which significantly reduces operator variability and enhances throughput. The authors provide diverse validation in the form of flow cytometry showing >95% OTX2⁺ over passages 2-12, immunocytochemistry, single-cell RNA-seq, and functional MEA recordings, confirming both regional fidelity and neuronal activity. They also demonstrate glial differentiation and reproducibility across two hESC lines.

      The results convincingly demonstrate that the RepSox/suspension approach yields high-purity dorsal forebrain neural progenitor cells (NRSCs) that maintain marker expression and multipotency through passage 12 and differentiate into electrophysiologically active neurons and mature glia. Thus, the authors have achieved their primary objectives.

      This protocol addresses a significant bottleneck in neural stem cell production by providing a reproducible, high-throughput alternative that is well-suited to drug screening, disease modelling, and potential cell therapy manufacturing. Standardised, scalable NRSC banks will accelerate neurodevelopmental and neurodegenerative disorder studies, enable automated bioreactor workflows, and encourage the sharing of resources across academia and industry.

      Weaknesses:

      Weaknesses include the lack of a direct comparison to conventional manual-selection protocols, and the need to improve the statistical rigor of all quantitative assays by applying appropriate hypothesis tests (e.g., t-tests or ANOVA with multiple-comparison correction) rather than reporting mean ± SD alone.
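As one option for the multiple-comparison correction mentioned here, the Holm-Bonferroni step-down procedure adjusts a set of p-values while controlling the family-wise error rate. The raw p-values could come from, for example, `scipy.stats.ttest_ind` or `scipy.stats.f_oneway`; the sketch below implements only the adjustment step in plain Python, and the input p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k from 0) is multiplied by (m - k), capped
        # at 1, and a running maximum keeps the adjusted values monotone.
        value = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values from four comparisons
print(holm_adjust(raw))
```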

      Additional Context:

      Beyond the core technical advance, it's important to situate this work within the broader landscape of neural stem cell research and its downstream applications. Traditionally, dorsal forebrain NSCs have been generated via manual rosette picking after dual-SMAD inhibition (Chambers et al., 2009), a process that is labor-intensive, low-throughput, and prone to operator-dependent variability. By eliminating that step, this protocol directly addresses a key barrier to standardizing NSC production under GMP-compatible conditions - critical for both large-scale drug screening and eventual clinical use. Stable, regionally specified forebrain NSCs are especially valuable for modeling early neurodevelopmental disorders (e.g., autism spectrum disorders, microcephaly) and late-onset pathologies (e.g., Alzheimer's disease) in vitro, where precise cortical patterning is essential to recapitulate disease phenotypes. Moreover, establishing long-term epigenetic fidelity (e.g., via future ATAC-seq or histone-mark profiling) will further reassure users that transcriptional consistency reflects preserved regulatory networks, not just transient marker expression. Finally, demonstrating robust cryopreservation viability (>80%) makes these cells a readily shareable resource for the community, accelerating cross-lab reproducibility and comparative studies of patient-derived iPSC lines. This context underscores how scalable, high-purity forebrain NSCs can transform both basic neuroscience research and translational pipelines.

    3. Reviewer #2 (Public review):

      In the present manuscript, Dannulat Frazier et al. provide a novel and advanced protocol for obtaining almost pure populations of neural rosette stem cells (NRSCs) expressing the general markers NES and SOX2. These NSCs are expandable and exhibit dorsal forebrain properties and markers that are maintained throughout passages in culture (at least until passage 12). The authors also demonstrate the multipotency of these NSCs by their ability to differentiate into functional neurons, and precursors of astrocytes and oligodendrocytes.

      This method does not require the usual step of manual rosette selection and allows a greater homogeneity of the NSCs obtained and the standardization of the protocol, which will allow greater advances in the applications of these NSCs in research and as models of disease or compound testing. The manuscript is of great interest for the research area, since it describes a new methodology that can facilitate the research and therapeutic application of NSCs.

      The manuscript is well-written; the results are clear, robust, and well-explained. The conclusions reached in this paper are well-supported by the data, but some aspects could be better clarified.

      (1) The NSC results presented in this manuscript extend only to passage 12; it would be interesting to know for how many passages these cells can be expanded while maintaining their initial properties. Have the authors analyzed passages beyond 12?

      (2) In Figure 2A, where different markers are shown in NSCs at different passages, it seems that at passage 12, there is a decrease in TJP1+ zones in relation to earlier passages, which could indicate a reduction in the potential to generate rosettes. Have the authors done any quantification along these lines? Could this be the case, or is it just an effect of the image chosen?

      (3) In Figure 3A, the decrease in PAX6 gene expression at passage 8 relative to passage 2 is very striking and intriguing, as it does not correspond to what is observed at the protein level. Have the authors verified this result using another technique, such as RT-qPCR?

      (4) In Figure 5B, the labeling for GFAP appears rather nuclear, despite GFAP being a cytoskeletal protein. How can the authors explain this?

    4. Author response:

      Reviewer #1 (Public review):

      Thank you for your thoughtful and constructive feedback on our manuscript. We greatly appreciate your insights regarding our work, as they are invaluable in refining our research.

      We are very happy to hear that you recognize the strengths of our method, particularly the elimination of manual rosette picking, which significantly enhances throughput and reduces variability. We are also pleased that our validation efforts—through flow cytometry, immunocytochemistry, single-cell RNA-sequencing, and functional MEA recordings—effectively demonstrate both the identity and functionality of our derived dorsal forebrain neural rosette stem cells (NRSCs).

      Regarding the identified weaknesses, we agree that a direct comparison with conventional manual-selection protocols, specifically those utilizing dual-SMAD inhibition, would be a significant improvement. To address this, we have initiated additional experiments that will directly compare our single-SMAD inhibition approach (RepSox) with dual-SMAD inhibition (SB/LDN), aiming for a comprehensive evaluation of both protocols.

      In terms of statistical rigor, we appreciate your suggestion on improving our quantitative assays. All data were collected from at least three independent experiments and presented as mean ± standard deviation unless otherwise specified. Due to the qualitative nature of the data, no formal statistical tests were performed for most of the experiments, and the mean and standard deviation were calculated for some of the quantitative measurements obtained, providing a descriptive summary of the data. When possible, we will incorporate appropriate statistical tests to present our data in a more robust manner, rather than merely reporting mean ± SD.

      Finally, we recognize the importance of situating our work within the broader landscape of neural stem cell research. We aim to elucidate the potential downstream applications for our protocol, which we believe will significantly impact neurodevelopmental and neurodegenerative disorder studies.

      Thank you again for your valuable suggestions. We look forward to refining our manuscript and enhancing the contribution of our research to the field.

      Reviewer #2 (Public review):

      Thank you for your thoughtful and constructive feedback on our manuscript. We appreciate your recognition of the novelty and potential impact of our protocol for obtaining neural rosette stem cells (NRSCs). Your comments are invaluable in improving our work.

      We are pleased that you found our methodology to be a significant advancement in the field, particularly the elimination of the manual rosette selection step, which hopefully will enhance homogeneity and standardization. We agree that this development has implications for research, disease modelling, and compound testing.

      Regarding your specific points:

      Passage expansion: Thank you for your insightful suggestion regarding the analysis beyond passage 12. We have continued passaging our NRSC line for more than 12 passages while maintaining the rosette structure. Although we do not yet have comprehensive and detailed analyses at these later passages, we will include some data and relevant information on our findings in the revised manuscript.

      TJP1+ zones: We appreciate your observation regarding the decreased TJP1+ zones at passage 12. We have not consistently detected a reduction in the number of rosettes or TJP1+ lumens across our cultures between passages. While some variability has been noted, we occasionally observe minor reductions at specific time points, followed by a recovery of rosettes in subsequent passages. This suggests that monitoring the number of rosettes is indeed a useful indicator of cell culture health. Cultures should be discarded if rosettes are completely lost. We will take a closer look at this aspect and report the findings in the revised manuscript.

      PAX6 Gene expression verification: Thank you for highlighting the discrepancy between PAX6 gene expression levels and protein levels. Unfortunately, we have not yet validated these results using an alternative technique. One potential explanation for this discrepancy may be the phenomenon of negative autoregulation, where increased levels of PAX6 protein can inhibit its own mRNA expression (Manuel et al., 2007). Moreover, Hsieh and Yang (2009) observed that during neurogenesis, PAX6 protein levels may not correlate linearly with mRNA levels, particularly in variable cellular environments. Additionally, post-transcriptional regulatory mechanisms, such as translation initiation mediated by Internal Ribosome Entry Sites (IRES), have been documented in various contexts involving PAX6, suggesting that mRNA levels may not fully represent functional protein levels in developing tissues (Li et al., 2023). We will go deeper into this discussion in the revised manuscript.

      GFAP Labeling: We appreciate your comments regarding the nuclear labeling of GFAP. In our astrocyte cultures, we have indeed observed GFAP localization in both the nucleus and the cytoplasm (Figure 5B). We will investigate this phenomenon further and provide a clearer explanation, supported by relevant literature, in the revised version. Although GFAP is primarily categorized as an intermediate filament protein localized in the cytoplasm, evidence suggests its nuclear localization may indicate additional regulatory roles during astrocyte development, activation, and pathology. This finding highlights the potential complexity of GFAP's role during fetal development and cellular stress, suggesting a broader functional scope that may extend into the nuclear space.

      Once again, thank you for your insightful feedback and for recognizing the potential of our research. We are committed to addressing your comments and enhancing the quality of our manuscript.

      Manuel, M. et al. (2007) ‘Controlled overexpression of Pax6 in vivo negatively autoregulates the Pax6 locus, causing cell-autonomous defects of late cortical progenitor proliferation with little effect on cortical arealization’, Development, 134(3), pp. 545–555. Available at: https://doi.org/10.1242/dev.02764.

      Hsieh, Y.-W. and Yang, X.-J. (2009) ‘Dynamic Pax6 expression during the neurogenic cell cycle influences proliferation and cell fate choices of retinal progenitors’, Neural Development, 4(1), p. 32. Available at: https://doi.org/10.1186/1749-8104-4-32.

      Li, Q. et al. (2023) ‘Translation of paired box 6 (PAX6) mRNA is IRES-mediated and inhibited by cymarin in breast cancer cells’, Genes & Genetic Systems, 98(4), pp. 161–169. Available at: https://doi.org/10.1266/ggs.23-00039.

    1. eLife Assessment

      The authors collected valuable time-course RNA-seq data from four tree species in natural environments and analyzed seasonal patterns of gene expression. The genome assemblies and gene expression data across multiple species and tissues are convincing, but the overarching conclusions are inadequately supported due to weaknesses in the study design, which encompasses three different environments and two distinct time periods. This makes it impossible to disentangle genetic effects - which are critical for evolutionary inferences - from environmental influences on gene expression.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed genome assemblies for two Fagaceae species and collected transcriptome data from four natural tree species every month over two years. They identified seasonal gene expression patterns and further analyzed species-specific differences.

      Strengths:

      The study of gene expression patterns in natural environments, as opposed to controlled chambers, is gaining increasing attention. The authors collected RNA-seq data monthly for two years from four tree species and analyzed seasonal expression patterns. The data are novel. The authors could revise the manuscript to emphasize seasonal expression patterns in three species (with one additional species having more limited data). Furthermore, the chromosome-scale genome assemblies for the two Fagaceae species represent valuable resources, although the authors did not cite existing assemblies from closely related species.

      Weaknesses:

      The study design has a fundamental flaw regarding the evaluation of genetic or evolutionary effects. As a basic principle in biology, phenotypes, including gene expression levels, are influenced by genetics, environmental factors, and their interaction. This principle is well-established in quantitative genetics.

      In this study, the four species were sampled from three different sites (see Materials and Methods, lines 543-546), and additionally, two species were sampled from 2019-2021, while the other two were sampled from 2021-2023 (see Figure S2). This critical detail should be clearly described in the Results and Materials and Methods. Due to these variations in sampling sites and periods, environmental conditions are not uniform across species.

      Even in studies conducted in natural environments, there are ways to design experiments that allow genetic effects to be evaluated, for example by studying co-occurring species, conducting transplant experiments, or using common gardens. To illustrate the issue, imagine an experiment where clones of a single species were sampled from three sites and two time periods, similar to the current design. RNA-seq analysis would likely detect differences that could qualitatively resemble those reported in this manuscript.

      One example is in line 197, where genus-specific expression patterns are mentioned. While it may be true that the authors' conclusions (e.g., winter synchronization, phylogenetic constraints) reflect real biological trends, these conclusions are also predictable even without empirical data, and the current dataset does not provide quantitative support.

      If the authors can present a valid method to disentangle genetic and environmental effects from their dataset, that would significantly strengthen the manuscript. However, I do not believe the current study design is suitable for this purpose.

      Unless these issues are addressed, the use of the term "evolution" is inappropriate in this context. The title should be revised, and the result sections starting from "Peak months distribution..." should be either removed or fundamentally revised. The entire Discussion section, which is based on evolutionary interpretation, should be deleted in its current form.

      If the authors still wish to explore genetic or evolutionary analyses, the pair of L. edulis and L. glaber, which were sampled at the same site and over the same period, might be used to analyze "seasonal gene expression divergence in relation to sequence divergence." Nevertheless, the manuscript would benefit from focusing on seasonal expression patterns without framing the study in evolutionary terms.

      To better support the seasonal expression analysis, the early RNA-seq analysis sections should be strengthened. There is little discussion of biological replicate variation or variation among branches of the same individual. These could be important factors to analyze. In line 137, the mapping rate for two species is mentioned, but the rates for each species should be clearly reported. One RNA-seq dataset is based on a species different from the reference genome, so a lower mapping rate is expected. While this likely does not hinder downstream analysis, quantification is important.

      In Figures 2A and 2B, clustering is used to support several points discussed in the Results section (e.g., lines 175-177). However, clustering is primarily a visualization method or a hypothesis-generating tool; it cannot serve as a statistical test. Stronger conclusions would require further statistical testing.

      The quality of the genome assemblies appears adequate, but related assemblies should be cited and discussed. Several assemblies of Fagaceae species already exist, including Quercus mongolica (Ai et al., Mol Ecol Res, 2022), Q. gilva (Front Plant Sci, 2022), and Fagus sylvatica (GigaScience, 2018), among others. Is there any novelty here? Can you compare your results with these existing assemblies?

      Most importantly, Figure 1B-D shows synteny between the two genera but also indicates homology between different chromosomes. Does this suggest paleopolyploidy or another novel feature? These chromosome connections should be interpreted in the main text, even if they could be methodological artifacts.

      In both the Results and Materials and Methods sections, descriptions of genome and RNA-seq data are unclear. In line 128, a paragraph on genome assembly suddenly introduces expression levels. RNA-seq data should be described before this. Similarly, in line 238, the sentence "we assembled high-quality reference genomes" seems disconnected from the surrounding discussion of expression studies. In line 632, Illumina short-read DNA sequencing is mentioned, but it's unclear how these data were used.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores how gene expression evolves in response to seasonal environments, using four evergreen Fagaceae species growing in similar habitats in Japan. By combining chromosome-scale genome assemblies with a two-year RNA-seq time series in leaves and buds, the authors identify seasonal rhythms in gene expression and examine both conserved and divergent patterns. A central result is that winter bud expression is highly conserved across species, likely due to shared physiological demands under cold conditions. One of the intriguing implications of this study is that seasonal cycles might play a role similar to ontogenetic stages in animals. The authors touch on this by comparing their findings to the developmental hourglass model, and indeed, the recurrence of phenological states such as winter dormancy may act as a cyclic form of developmental canalization, shaping expression evolution in a way analogous to embryogenesis in animals.

      Strengths:

      (1) The evolutionary effects of seasonal environments on gene expression are rarely studied at this scale. This paper fills that gap.

      (2) The dataset is extensive, covering two years, two tissues, and four tree species, and is well suited to the questions being asked.

      (3) Transcriptome clustering across species (Figure 2) shows strong grouping by season and tissue rather than species, suggesting that the authors effectively controlled for technical confounders such as batch effects and mapping bias.

      (4) The idea that winter imposes a shared constraint on gene expression, especially in buds, is well argued and supported by the data.

      (5) The discussion links the findings to known concepts like phenological synchrony and the developmental hourglass model, which helps frame the results.

      Weaknesses:

      (1) While the hierarchical clustering shown in Figure 2A largely supports separation by tissue type and season, one issue worth noting is that some leaf samples appear to cluster closely with bud samples. The authors do not comment on this pattern, which raises questions about possible biological overlap between tissues during certain seasonal transitions or technical artifacts such as sample contamination. Clarifying this point would improve confidence in the interpretation of tissue-specific seasonal expression patterns.

      (2) While the study provides compelling evidence of conserved and divergent seasonal gene expression, it does not directly examine the role of cis-regulatory elements or chromatin-level regulatory architecture. Including regulatory genomic or epigenomic data would considerably strengthen the mechanistic understanding of expression divergence.

      (3) The manuscript includes a thoughtful analysis of flowering-related genes and seasonal GO enrichment (e.g., Figure 3C-D), providing an initial link between gene expression timing and phenological functions. However, the analysis remains largely gene-centric, and the study does not incorporate direct measurements of phenological traits (e.g., flowering or bud break dates). As a result, the connection between molecular divergence and phenotypic variation, while suggestive, remains indirect.

      (4) Although species were sampled from similar habitats, one species (Q. acuta) was collected at a higher elevation, and factors such as microclimate or local photoperiod conditions could influence expression patterns. These potential confounding variables are not fully accounted for, and their effects should be more thoroughly discussed or controlled in future analyses.

      (5) Statistical and Interpretive Concerns Regarding Δφ and dN/dS Correlation (Figures 5E and 5F):

      (a) Statistical Inappropriateness: Δφ is a discrete ordinal variable (likely 1-11), making it unsuitable for Pearson correlation, which assumes continuous, normally distributed variables. This undermines the statistical validity of the analysis.

      (b) Biological Interpretability: Even with the substantial statistical power afforded by genome-wide analysis, the observed correlations are extremely weak. This suggests that the relationship, if any, between temporal divergence in expression and protein-coding evolution is negligible.

      Taken together, these issues weaken the case for any biologically meaningful association between Δφ and dN/dS. I recommend either omitting these panels or clearly reframing them as exploratory and statistically limited observations.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors performed genome assemblies for two Fagaceae species and collected transcriptome data from four natural tree species every month over two years. They identified seasonal gene expression patterns and further analyzed species-specific differences.

      Strengths:

      The study of gene expression patterns in natural environments, as opposed to controlled chambers, is gaining increasing attention. The authors collected RNA-seq data monthly for two years from four tree species and analyzed seasonal expression patterns. The data are novel. The authors could revise the manuscript to emphasize seasonal expression patterns in three species (with one additional species having more limited data). Furthermore, the chromosome-scale genome assemblies for the two Fagaceae species represent valuable resources, although the authors did not cite existing assemblies from closely related species.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      The study design has a fundamental flaw regarding the evaluation of genetic or evolutionary effects. As a basic principle in biology, phenotypes, including gene expression levels, are influenced by genetics, environmental factors, and their interaction. This principle is well-established in quantitative genetics.

      In this study, the four species were sampled from three different sites (see Materials and Methods, lines 543-546), and additionally, two species were sampled from 2019-2021, while the other two were sampled from 2021-2023 (see Figure S2). This critical detail should be clearly described in the Results and Materials and Methods. Due to these variations in sampling sites and periods, environmental conditions are not uniform across species.

      Even in studies conducted in natural environments, there are ways to design experiments that allow genetic effects to be evaluated, for example by studying co-occurring species, conducting transplant experiments, or using common gardens. To illustrate the issue, imagine an experiment where clones of a single species were sampled from three sites and two time periods, similar to the current design. RNA-seq analysis would likely detect differences that could qualitatively resemble those reported in this manuscript.

      One example is in line 197, where genus-specific expression patterns are mentioned. While it may be true that the authors' conclusions (e.g., winter synchronization, phylogenetic constraints) reflect real biological trends, these conclusions are also predictable even without empirical data, and the current dataset does not provide quantitative support.

      If the authors can present a valid method to disentangle genetic and environmental effects from their dataset, that would significantly strengthen the manuscript. However, I do not believe the current study design is suitable for this purpose.

      Unless these issues are addressed, the use of the term "evolution" is inappropriate in this context. The title should be revised, and the result sections starting from "Peak months distribution..." should be either removed or fundamentally revised. The entire Discussion section, which is based on evolutionary interpretation, should be deleted in its current form.

      If the authors still wish to explore genetic or evolutionary analyses, the pair of L. edulis and L. glaber, which were sampled at the same site and over the same period, might be used to analyze "seasonal gene expression divergence in relation to sequence divergence." Nevertheless, the manuscript would benefit from focusing on seasonal expression patterns without framing the study in evolutionary terms.

      We sincerely thank the reviewer for the detailed and thoughtful comments. We fully recognize the importance of carefully distinguishing genetic and environmental contributions in transcriptomic studies, particularly when addressing evolutionary questions. The reviewer identified two major concerns regarding our study design: (1) the use of different monitoring periods across species, and (2) the use of samples collected from different study sites. We addressed both concerns with additional analyses using 112 new samples and now present new evidence that supports the robustness of our conclusions.

      (1) Monitoring period variation does not bias our conclusions

      To address concerns about the differing monitoring periods, we added new RNA-seq data (42 bud and 42 leaf samples for L. glaber, and 14 bud and 14 leaf samples for L. edulis) collected from November 2021 to November 2022, enabling direct comparison across species within a consistent timeframe. Hierarchical clustering of this expanded dataset (Fig. S6) yielded results consistent with our original findings: winter-collected samples cluster together regardless of species identity. This strongly supports our conclusion that the seasonal synchrony observed in winter is not an artifact of the monitoring period and demonstrates the robustness of our conclusions across datasets.

      (2) Site variation is limited and does not confound our findings

      Although the study included three sites, two of them (Imajuku and Ito Campus) are only 7.3 km apart, share nearly identical temperature profiles (see Fig. S2), and are located at the edge of similar evergreen broadleaf forests. Only Q. acuta was sampled from a higher-altitude, cooler site. To assess whether the higher elevation site of Q. acuta introduced confounding environmental effects, we reanalyzed the data after excluding this species. Hierarchical clustering still revealed that winter bud samples formed a distinct cluster regardless of species identity (Fig. S7), consistent with our original finding.

      Furthermore, we recalculated the molecular phenology divergence index D (Fig. 4C) and the interspecific Pearson’s correlation coefficients (Fig. 5A) without including Q. acuta. These analyses produced results that were similar to those obtained from the full dataset (Fig. S12; Fig. S14), indicating that the observed patterns are not driven by environmental differences associated with elevation.

      (3) Justification for our approach in natural systems

      We agree with the reviewer that experimental approaches such as common gardens, reciprocal transplants, and the use of co-occurring species are valuable for disentangling genetic and environmental effects. In fact, we have previously implemented such designs in studies using the perennial herb Arabidopsis halleri (Komoto et al., 2022, https://doi.org/10.1111/pce.14716) and clonal Someiyoshino cherry trees (Miyawaki-Kuwakado et al., 2024, https://doi.org/10.1002/ppp3.10548) to examine environmental effects on gene expression. However, extending these approaches to long-lived tree species in diverse natural ecosystems poses significant logistical and biological challenges. In this study, we addressed this limitation by including three co-occurring species at the same site, which allowed us to evaluate interspecific differences under comparable environmental conditions. Importantly, even when we limited our analyses to these co-occurring species, the results remained consistent, indicating that the observed variation in transcriptomic profiles cannot be attributed to environmental factors alone and likely reflects underlying genetic influences.

      Accordingly, we added four new figures (Fig. S6, Fig. S7, Fig. S12 and Fig. S14) and revised the manuscript to clarify the limitations and strengths of our design, to tone down the evolutionary claims where appropriate, and to more explicitly define the scope of our conclusions in light of the data. We hope that these efforts sufficiently address the reviewer’s concerns and strengthen the manuscript.

      To better support the seasonal expression analysis, the early RNA-seq analysis sections should be strengthened. There is little discussion of biological replicate variation or variation among branches of the same individual. These could be important factors to analyze. In line 137, the mapping rate for two species is mentioned, but the rates for each species should be clearly reported. One RNA-seq dataset is based on a species different from the reference genome, so a lower mapping rate is expected. While this likely does not hinder downstream analysis, quantification is important.

      We thank Reviewer 1 for the helpful comment. To evaluate the variation among biological replicates, we compared the expression level of each gene across different individuals. We observed high correlations between each pair of individuals (Q. glauca, n = 3: mean correlation coefficient r = 0.947; Q. acuta, n = 3: r = 0.948; L. glaber, n = 3: r = 0.948). This result suggests that the seasonal gene expression pattern is highly synchronized across individuals within the same species. We mentioned this point in the Results section of the revised manuscript. We also calculated the mean mapping rate for each species. As the reviewer expected, the mapping rate was slightly lower in Q. acuta (88.6 ± 2.3%) and L. glaber (84.3 ± 5.4%), whose RNA-seq data were mapped to reference genomes of related but different species, than in Q. glauca (92.6 ± 2.2%) and L. edulis (89.3 ± 2.7%). However, we minimized the impact of these differences on downstream analysis. These details have been included in the revised main text.

      In Figures 2A and 2B, clustering is used to support several points discussed in the Results section (e.g., lines 175-177). However, clustering is primarily a visualization method or a hypothesis-generating tool; it cannot serve as a statistical test. Stronger conclusions would require further statistical testing.

      We thank the reviewer for the helpful comment. As noted, we acknowledge that hierarchical clustering (Fig. 2A) is primarily a visualization and hypothesis-generating method. To assess the biological relevance of the clusters identified, we conducted a Mann-Whitney U test or a Steel-Dwass test to evaluate whether the environmental temperatures at the time of sample collection differed significantly among the clusters. This analysis (Fig. 2B) revealed statistically significant differences in temperature for cluster B3 (p < 0.01), indicating that the gene expression clusters are associated with seasonal thermal variation. These results support the interpretation that the clusters reflect coordinated transcriptional responses to environmental temperature. We revised the Results section to clarify this point.

      The quality of the genome assemblies appears adequate, but related assemblies should be cited and discussed. Several assemblies of Fagaceae species already exist, including Quercus mongolica (Ai et al., Mol Ecol Res, 2022), Q. gilva (Front Plant Sci, 2022), and Fagus sylvatica (GigaScience, 2018), among others. Is there any novelty here? Can you compare your results with these existing assemblies?

      We agree that genome assemblies of Fagaceae species are becoming increasingly available. However, our study does not aim to emphasize the novelty of the genome assemblies per se. Rather, with the increasing availability of chromosome-level genomes, we regard genome assembly as a necessary foundation for more advanced analyses. The main objective of our study is to investigate how each gene is expressed in response to seasonal environmental changes, and to link genome information with seasonal transcriptomic dynamics. To address the reviewer’s comment in line with this objective, we added a discussion of the syntenic structure of eight genome assemblies spanning four genera within the Fagaceae, including a species from the genus Fagus (Ikezaki et al. 2025, https://doi.org/10.1101/2025.07.31.667835). This addition helps to position our work more clearly within the context of existing genomic resources.

      Most importantly, Figure 1B-D shows synteny between the two genera but also indicates homology between different chromosomes. Does this suggest paleopolyploidy or another novel feature? These chromosome connections should be interpreted in the main text, even if they could be methodological artifacts.

      A previous study on genome size variation in Fagaceae suggested that, given the consistent ploidy level across the family, genome expansion likely occurred through relatively small segmental duplications rather than whole-genome duplications. Because Figure 1B-D supports this view, we cited the following reference in the revised version of the manuscript.

      Chen et al. (2014) https://doi.org/10.1007/s11295-014-0736-y

      In both the Results and Materials and Methods sections, descriptions of genome and RNA-seq data are unclear. In line 128, a paragraph on genome assembly suddenly introduces expression levels. RNA-seq data should be described before this. Similarly, in line 238, the sentence "we assembled high-quality reference genomes" seems disconnected from the surrounding discussion of expression studies. In line 632, Illumina short-read DNA sequencing is mentioned, but it's unclear how these data were used.

      We relocated the explanation regarding the expression levels of single-copy and multi-copy genes to the section titled “Seasonal gene expression dynamics.” Additionally, we clarified in the Materials and Methods section that short-read sequencing data were used for both genome size estimation and phylogenetic reconstruction.

      Reviewer #2 (Public review):

      Summary:

      This study explores how gene expression evolves in response to seasonal environments, using four evergreen Fagaceae species growing in similar habitats in Japan. By combining chromosome-scale genome assemblies with a two-year RNA-seq time series in leaves and buds, the authors identify seasonal rhythms in gene expression and examine both conserved and divergent patterns. A central result is that winter bud expression is highly conserved across species, likely due to shared physiological demands under cold conditions. One of the intriguing implications of this study is that seasonal cycles might play a role similar to ontogenetic stages in animals. The authors touch on this by comparing their findings to the developmental hourglass model, and indeed, the recurrence of phenological states such as winter dormancy may act as a cyclic form of developmental canalization, shaping expression evolution in a way analogous to embryogenesis in animals.

      Strengths:

      (1) The evolutionary effects of seasonal environments on gene expression are rarely studied at this scale. This paper fills that gap.

      (2) The dataset is extensive, covering two years, two tissues, and four tree species, and is well suited to the questions being asked.

      (3) Transcriptome clustering across species (Figure 2) shows strong grouping by season and tissue rather than species, suggesting that the authors effectively controlled for technical confounders such as batch effects and mapping bias.

      (4) The idea that winter imposes a shared constraint on gene expression, especially in buds, is well argued and supported by the data.

      (5) The discussion links the findings to known concepts like phenological synchrony and the developmental hourglass model, which helps frame the results.

      We are grateful to the reviewer for the detailed and thoughtful review of our manuscript.

      Weaknesses:

      (1) While the hierarchical clustering shown in Figure 2A largely supports separation by tissue type and season, one issue worth noting is that some leaf samples appear to cluster closely with bud samples. The authors do not comment on this pattern, which raises questions about possible biological overlap between tissues during certain seasonal transitions or technical artifacts such as sample contamination. Clarifying this point would improve confidence in the interpretation of tissue-specific seasonal expression patterns.

      The leaf samples that clustered with the bud samples are newly flushed leaves, collected in April for Q. glauca, May for Q. acuta, May and June for L. edulis, and August and September for L. glaber. To clarify this point, we marked these newly flushed leaf samples with asterisks in the revised figure (Fig. 2A).

      (2) While the study provides compelling evidence of conserved and divergent seasonal gene expression, it does not directly examine the role of cis-regulatory elements or chromatin-level regulatory architecture. Including regulatory genomic or epigenomic data would considerably strengthen the mechanistic understanding of expression divergence.

      We thank the reviewer for this insightful comment. As noted in the Discussion section, we hypothesize that such genome-wide seasonal expression patterns, and their divergence across species, are likely mediated by cis-regulatory elements and chromatin-level mechanisms. While a direct investigation of regulatory architecture was beyond the scope of the present study, we fully agree that incorporating regulatory genomic and epigenomic data would significantly deepen the mechanistic understanding of expression divergence. In this regard, we are currently working to identify putative cis-regulatory elements in non-coding regions and are collecting epigenetic data from the same tree species using ChIP-seq. We believe the current study provides a foundation for these future investigations into the regulatory basis of seasonal transcriptome variation. We made a minor revision to the Discussion to note that an important future direction is to investigate the evolution of non-coding sequences that regulate gene expression in response to seasonal environmental changes.

      (3) The manuscript includes a thoughtful analysis of flowering-related genes and seasonal GO enrichment (e.g., Figure 3C-D), providing an initial link between gene expression timing and phenological functions. However, the analysis remains largely gene-centric, and the study does not incorporate direct measurements of phenological traits (e.g., flowering or bud break dates). As a result, the connection between molecular divergence and phenotypic variation, while suggestive, remains indirect.

      We would like to note that phenological traits were observed in the field on a monthly basis throughout the sampling period, and the phenological data were plotted together with molecular phenology (e.g., Fig. 2A, C; Fig. 3C, D). Although the temporal resolution is limited, these observations captured species-specific differences in key phenological events such as leaf flushing and flowering times. We revised the manuscript to clarify this point.

      (4) Although species were sampled from similar habitats, one species (Q. acuta) was collected at a higher elevation, and factors such as microclimate or local photoperiod conditions could influence expression patterns. These potential confounding variables are not fully accounted for, and their effects should be more thoroughly discussed or controlled in future analyses.

      We fully agree with the reviewer that local environmental conditions, including microclimate and photoperiod differences, could potentially influence gene expression patterns. To assess whether the higher elevation site of Q. acuta introduced confounding environmental effects, we reanalyzed the data after excluding this species. Hierarchical clustering still revealed that winter bud samples formed a distinct cluster regardless of species identity (Fig. S7), consistent with our original finding.

      Furthermore, we recalculated the molecular phenology divergence index D (Fig. 4C) and the interspecific Pearson’s correlation coefficients (Fig. 5A) without including Q. acuta. These analyses produced results that were qualitatively similar to those obtained from the full dataset (Fig. S12; Fig. S14), indicating that the observed patterns are not driven by environmental differences associated with elevation.

      We believe these additional analyses help to decouple the effects of environment and genetics, and support our conclusion that both seasonal synchrony and phylogenetic constraints play key roles in shaping transcriptome dynamics. We added four new figures (Fig. S6, Fig. S7, Fig. S12 and Fig. S14) and revised the text accordingly to clarify this point and to acknowledge the potential impact of site-specific environmental variation.

      (5) Statistical and Interpretive Concerns Regarding Δφ and dN/dS Correlation (Figures 5E and 5F):

      (a) Statistical Inappropriateness: Δφ is a discrete ordinal variable (likely 1-11), making it unsuitable for Pearson correlation, which assumes continuous, normally distributed variables. This undermines the statistical validity of the analysis.

      We thank the reviewer for the insightful comment. We would like to clarify that the analysis presented in Figures 5E and 5F was based on linear regression, not Pearson’s correlation. Although Δφ is a discrete variable, it takes values from 0 to 6 in 0.5 increments, resulting in 13 levels. We treated it as a quasi-continuous variable for the purposes of linear regression analysis. This approach is commonly adopted in practice when a discrete variable has sufficient resolution and ordering to approximate continuity. To enhance clarity, we revised the manuscript to explicitly state that linear regression was used, and we now report the regression coefficient and associated p-value to support the interpretation of the observed trend.

      (b) Biological Interpretability: Even with the substantial statistical power afforded by genome-wide analysis, the observed correlations are extremely weak. This suggests that the relationship, if any, between temporal divergence in expression and protein-coding evolution is negligible.

      Taken together, these issues weaken the case for any biologically meaningful association between Δφ and dN/dS. I recommend either omitting these panels or clearly reframing them as exploratory and statistically limited observations.

      We agree with the reviewer’s comment. While we retained the original panels, we reframed our interpretation to emphasize that, despite statistical significance, the observed correlation is very weak, suggesting that coding region variation is unlikely to be the primary driver of seasonal gene expression patterns. Accordingly, we revised the “Relating seasonal gene expression divergence to sequence divergence” section in the Results, as well as the relevant part of the Discussion.

    1. eLife Assessment

      This important study introduces an advance in multi-animal tracking by reframing identity assignment as a self-supervised contrastive representation learning problem. It eliminates the need for segments of video where all animals are simultaneously visible and individually identifiable, and significantly improves tracking speed, accuracy, and robustness with respect to occlusion. This innovation has implications beyond animal tracking, potentially connecting with advances in behavioral analysis and computer vision. While the strength of support for these advances is solid overall, the presentation could be greatly improved for clarity and broader accessibility; in addition, incorporating more standard metrics in the multi-animal tracking literature would better benchmark the approach against other methods.

    2. Reviewer #1 (Public review):

      Summary:

      This is a strong paper that presents a clear advance in multi-animal tracking. The authors introduce an updated version of idtracker.ai that reframes identity assignment as a contrastive learning problem rather than a classification task requiring global fragments. This change leads to gains in speed and accuracy. The method eliminates a known bottleneck in the original system, and the benchmarking across species is comprehensive and well executed. I think the results are convincing and the work is significant.

      Strengths:

      The main strengths are the conceptual shift from classification to representation learning, the clear performance gains, and the fact that the new version is more robust. Removing the need for global fragments makes the software more flexible in practice, and the accuracy and speed improvements are well demonstrated. The software appears thoughtfully implemented, with GUI updates and integration with pose estimators.

      Weaknesses:

      I don't have any major criticisms, but I have identified a few points that should be addressed to improve the clarity and accuracy of the claims made in the paper.

      (1) The title begins with "New idtracker.ai," which may not age well and sounds more promotional than scientific. The strength of the work is the conceptual shift to contrastive representation learning, and it might be more helpful to emphasize that in the title rather than branding it as "new."

      (2) Several technical points regarding the comparison between TRex (a system evaluated in the paper) and idtracker.ai should be addressed to ensure the evaluation is fair and readers are fully informed.

      (2.1) Lines 158-160: The description of TRex as based on "Protocol 2 of idtracker.ai" overlooks several key additions in TRex, such as posture image normalization, tracklet subsampling, and the use of uniqueness feedback during training. These features are not acknowledged, and it's unclear whether TRex was properly configured - particularly regarding posture estimation, which appears to have been omitted but isn't discussed. Without knowing the actual parameters used to make comparisons, it's difficult to assess how the method was evaluated.

      (2.2) Lines 162-163: The paper implies that TRex gains speed by avoiding Protocol 3, but in practice, idtracker.ai also typically avoids using Protocol 3 due to its extremely long runtime. This part of the framing feels more like a rhetorical contrast than an informative one.

      (2.3) Lines 277-280: The contrastive loss function is written using the label l, but since it refers to a pair of images, it would be clearer and more precise to write it as l_{I,J}. This would help readers unfamiliar with contrastive learning understand the formulation more easily.

      (2.4) Lines 333-334: The manuscript states that TRex can fail to track certain videos, but this may be inaccurate depending on how the authors classify failures. TRex may return low uniqueness scores if training does not converge well, but this isn't equivalent to tracking failure. Moreover, the metric reported by TRex is uniqueness, not accuracy. Equating the two could mislead readers. If the authors did compare outputs to human-validated data, that should be stated more explicitly.

      (2.5) Lines 339-341: The evaluation approach defines a "successful run" and then sums the runtime across all attempts up to that point. If success is defined as simply producing any output, this may not reflect how experienced users actually interact with the software, where parameters are iteratively refined to improve quality.

      (2.6) Lines 344-346: The simulation process involves sampling tracking parameters 10,000 times and selecting the first "successful" run. If parameter tuning is randomized rather than informed by expert knowledge, this could skew the results in favor of tools that require fewer or simpler adjustments. TRex relies on more tunable behavior, such as longer fragments improving training time, which this approach may not capture.

      (2.7) Line 354 onward: TRex was evaluated using two varying parameters (threshold and track_max_speed), while idtracker.ai used only one (intensity_threshold). With a fixed number of samples, this asymmetry could bias results against TRex. In addition, users typically set these parameters based on domain knowledge rather than random exploration.

      (2.8) Figure 2-figure supplement 3: The memory usage comparison lacks detail. It's unclear whether RAM or VRAM was measured, whether shared or compressed memory was included, or how memory was sampled. Since both tools dynamically adjust to system resources, the relevance of this comparison is questionable without more technical detail.
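
      The evaluation procedure questioned in points (2.5)-(2.7) can be sketched as follows, assuming each sampled parameter setting either succeeds or fails with a known runtime. The function name, the input format and uniform sampling with replacement are illustrative assumptions, not the benchmark code used in the paper. The sketch shows why the outcome depends on how much of the parameter space counts as a success: with success probability p per draw, the expected number of failed attempts before the first success is (1 − p)/p.

```python
import random


def mean_time_to_first_success(settings, n_trials, seed=0):
    """Average cumulative runtime until the first successful parameter draw.

    settings: list of (succeeded, runtime_s) pairs, one per parameter setting.
    Each trial draws settings uniformly at random (with replacement) and sums
    their runtimes, stopping after the first successful draw.
    """
    if not any(ok for ok, _ in settings):
        raise ValueError("no parameter setting succeeds")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        while True:
            ok, runtime_s = rng.choice(settings)
            total += runtime_s
            if ok:
                break
    return total / n_trials


# Hypothetical tool: half of its settings fail after 10 s, half succeed in 1 s.
# On average one failure (10 s) precedes the successful 1 s run.
print(mean_time_to_first_success([(False, 10.0), (True, 1.0)], n_trials=10_000))
```

      If success instead required two independently sampled parameters to land in good ranges with probabilities p1 and p2, the per-draw success probability would be p1·p2, which is one way the asymmetry noted in (2.7) could penalize a tool with more swept parameters.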

      (3) While the authors cite several key papers on contrastive learning, they do not use the introduction or discussion to effectively situate their approach within related fields where similar strategies have been widely adopted. For example, contrastive embedding methods form the backbone of modern facial recognition and other image similarity systems, where the goal is to map images into a latent space that separates identities or classes through clustering. This connection would help emphasize the conceptual strength of the approach and align the work with well-established applications. Similarly, there is a growing literature on animal re-identification (ReID), which often involves learning identity-preserving representations across time or appearance changes. Referencing these bodies of work would help readers connect the proposed method with adjacent areas using similar ideas, and show that the authors are aware of and building on this wider context.

      (4) Some sections of the Results text (e.g., lines 48-74) read more like extended figure captions than part of the main narrative. They include detailed explanations of figure elements, sorting procedures, and video naming conventions that may be better placed in the actual figure captions or moved to supplementary notes. Streamlining this section in the main text would improve readability and help the central ideas stand out more clearly.

      Overall, though, this is a high-quality paper. The improvements to idtracker.ai are well justified and practically significant. Addressing the above comments will strengthen the work, particularly by clarifying the evaluation and comparisons.

    3. Reviewer #2 (Public review):

      This work introduces a new version of the state-of-the-art idtracker.ai software for tracking multiple unmarked animals. The authors aimed to solve a critical limitation of their previous software, which relied on the existence of "global fragments" (video segments where all animals are simultaneously visible) to train an identification classifier network, in addition to addressing concerns with runtime speed. To do this, the authors have both re-implemented the backend of their software in PyTorch (in addition to numerous other performance optimizations) and moved from a supervised classification framework to a self-supervised, contrastive representation learning approach that no longer requires global fragments to function. By defining positive training pairs as different images from the same fragment and negative pairs as images from any two co-existing fragments, the system cleverly takes advantage of partial (but high-confidence) tracklets to learn a powerful representation of animal identity without direct human supervision. Their formulation of contrastive learning is carefully thought out and comprises a series of empirically validated design choices that are both creative and technically sound. This methodological advance is significant and directly leads to the software's major strengths, including exceptional performance improvements in speed and accuracy and a newfound robustness to occlusion (even in severe cases where no global fragments can be detected). Benchmark comparisons show the new software is, on average, 44 times faster (up to 440 times faster on difficult videos) while also achieving higher accuracy across a range of species and group sizes.

      This new version of idtracker.ai is shown to consistently outperform the closely related TRex software (Walter & Couzin, 2021), which, together with the engineering innovations and usability enhancements (e.g., outputs convenient for downstream pose estimation), positions this tool as an advance in the state of the art for multi-animal tracking, especially for collective behavior studies.
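
      The pair definition summarized above can be illustrated with a toy sampler. In this sketch, fragments are reduced to inclusive frame intervals, a positive pair is two distinct frames of one fragment, and a negative pair is one frame from each of two fragments whose intervals overlap (two fragments visible in the same frame must belong to different animals). The data layout, the 50/50 positive/negative ratio and the function names are assumptions for illustration; the actual software weights fragments as discussed in point (3).

```python
import random


def coexist(a, b):
    """True if two inclusive frame intervals (start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def negative_candidates(fragments):
    """All unordered pairs of fragment ids whose frame intervals overlap."""
    ids = sorted(fragments)
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if coexist(fragments[a], fragments[b])]


def sample_pair(fragments, negatives, rng, p_positive=0.5):
    """Return ((frag, frame), (frag, frame), label); label 1 = same identity.

    negatives must be non-empty whenever a negative pair can be drawn.
    """
    if rng.random() < p_positive:
        # Positive: two distinct frames of a fragment spanning >= 2 frames.
        frag = rng.choice([f for f, (s, e) in fragments.items() if e > s])
        s, e = fragments[frag]
        f1, f2 = rng.sample(range(s, e + 1), 2)
        return (frag, f1), (frag, f2), 1
    # Negative: one frame from each of two co-existing fragments.
    a, b = rng.choice(negatives)
    return (a, rng.randint(*fragments[a])), (b, rng.randint(*fragments[b])), 0


fragments = {"A": (0, 10), "B": (5, 15), "C": (20, 30)}
negatives = negative_candidates(fragments)  # only ("A", "B") overlap
print(sample_pair(fragments, negatives, random.Random(1)))
```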

      Despite these advances, we note a number of weaknesses and limitations that are not well addressed in the present version of this paper:

      (1) The contrastive representation learning formulation

      Contrastive representation learning using deep neural networks has long been used for problems in the multi-object tracking domain, popularized through ReID approaches like DML (Yi et al., 2014) and DeepReID (Li et al., 2014). More recently, contrastive learning has become more popular as an approach for scalable self-supervised representation learning for open-ended vision tasks, as exemplified by approaches like SimCLR (Chen et al., 2020), SimSiam (Chen et al., 2020), and MAE (He et al., 2021) and instantiated in foundation models for image embedding like DINOv2 (Oquab et al., 2023). Given their prevalence, it is useful to contrast the formulation of contrastive learning described here relative to these widely adopted approaches (and why this reviewer feels it is appropriate):

      (1.1) No rotations or other image augmentations are performed to generate positive examples. These are not necessary with this approach since the pairs are sampled from heuristically tracked fragments (which produces sufficient training data, though see weaknesses discussed below) and the crops are pre-aligned egocentrically (mitigating the need for rotational invariance).

      (1.2) There is no projection head in the architecture, like in SimCLR. Since classification/clustering is the only task that the system is intended to solve, the more general "nuisance" image features that this architectural detail normally affords are not necessary here.

      (1.3) There is no stop gradient operator like in BYOL (Grill et al., 2020) or SimSiam. Since the heuristic tracking implicitly produces plenty of negative pairs from the fragments, there is no need to prevent representational collapse due to class asymmetry. Some care is still needed, but the authors address this well through a pair sampling strategy (discussed below).

      (1.4) Euclidean distance is used as the distance metric in the loss rather than cosine similarity as in most contrastive learning works. While cosine similarity coupled with L2-normalized unit hypersphere embeddings has proven to be a successful recipe to deal with the curse of dimensionality (with the added benefit of bounded distance limits), the authors address this through a cleverly constructed loss function that essentially allows direct control over the intra- and inter-cluster distance (D_pos and D_neg). This is a clever formulation that aligns well with the use of K-means for the downstream assignment step.

      No concerns here, just clarifications for readers who dig into the review. Referencing the above literature would enhance the presentation of the paper to align with the broader computer vision literature.
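
      For readers less familiar with this family of losses, the sketch below shows a generic double-margin contrastive loss with Euclidean distance, written with the pair label l_{I,J} suggested in point (2.3) of Reviewer #1: positive pairs are penalized only when farther apart than D_pos, and negative pairs only when closer than D_neg. The squared-hinge form, the function names and the argument names are assumptions; the loss in the manuscript may differ in its exact form.

```python
import math


def pair_loss(z_i, z_j, l_ij, d_pos, d_neg):
    """Double-margin contrastive loss for one pair of embedding vectors.

    l_ij = 1 for a positive pair (same identity), 0 for a negative pair.
    Positive pairs cost nothing within d_pos of each other; negative pairs
    cost nothing once they are at least d_neg apart (requires d_pos < d_neg).
    """
    d = math.dist(z_i, z_j)  # Euclidean distance, no L2 normalization
    if l_ij == 1:
        return max(0.0, d - d_pos) ** 2
    return max(0.0, d_neg - d) ** 2


def batch_loss(pairs, d_pos, d_neg):
    """Mean loss over (z_i, z_j, l_ij) triples."""
    return sum(pair_loss(a, b, l, d_pos, d_neg) for a, b, l in pairs) / len(pairs)


print(pair_loss([0.0, 0.0], [3.0, 4.0], 1, d_pos=1.0, d_neg=10.0))  # 16.0
print(pair_loss([0.0, 0.0], [3.0, 4.0], 0, d_pos=1.0, d_neg=10.0))  # 25.0
```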

      (2) Network architecture for image feature extraction backbone

      As most of the computations that drive up processing time happen in the network backbone, the authors explored a variety of architectures to assess speed, accuracy, and memory requirements. They land on ResNet18 due to its empirically determined performance. While the experiments that support this choice are solid, the rationale behind the architecture selection is somewhat weak. The authors state that:

      "[W]e tested 23 networks from 8 different families of state-of-the-art convolutional neural network architectures, selected for their compatibility with consumer-grade GPUs and ability to handle small input images (20 × 20 to 100 × 100 pixels) typical in collective animal behavior videos."

      (2.1) Most modern architectures have variants that are compatible with consumer-grade GPUs. This is true of, for example, HRNet (Wang et al., 2019), ViT (Dosovitskiy et al., 2020), SwinT (Liu et al., 2021), or ConvNeXt (Liu et al., 2022), all of which report single GPU training and fast runtime speeds through lightweight configuration or subsequent variants, e.g., MobileViT (Mehta et al., 2021). The authors may consider revising that statement or providing additional support for that claim (e.g., empirical experiments) given that these have been reported to outperform ResNet18 across tasks.

      (2.2) The compatibility of different architectures with small image sizes is configurable. Most convolutional architectures can be readily adapted to work with smaller image sizes, including 20x20 crops. With their default configuration, they lose feature map resolution through repeated pooling and downsampling steps, but this can be readily mitigated by swapping out standard convolutions with dilated convolutions and/or by setting the stride of pooling layers to 1, preserving feature map resolution across blocks. While these are fairly straightforward modifications (and are even compatible with using pretrained weights), an even more trivial approach is to pad and/or resize the crops to the default image size, which is likely to improve accuracy at a possibly minimal memory and runtime cost. These techniques may even improve the performance with the architectures that the authors did test out.
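
      The loss of feature-map resolution described in (2.2) can be checked with simple arithmetic. The sketch below assumes the standard torchvision ResNet18 layout (7×7 stride-2 stem convolution with padding 3, 3×3 max pool with padding 1, then three stages that each open with a stride-2 3×3 convolution); it is not the authors' network, whose exact configuration point (2.4) notes is unreported. Output sizes follow the floor convention used by PyTorch convolution and pooling layers.

```python
def out_size(n, kernel, stride, padding, dilation=1):
    """Spatial output size of a conv or pooling layer (floor convention)."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def resnet18_final_size(n, maxpool_stride=2):
    """Side length of the last feature map of a torchvision-style ResNet18."""
    n = out_size(n, kernel=7, stride=2, padding=3)               # conv1
    n = out_size(n, kernel=3, stride=maxpool_stride, padding=1)  # maxpool
    # layer1 keeps the size; layer2, layer3 and layer4 each roughly halve it
    for _ in range(3):
        n = out_size(n, kernel=3, stride=2, padding=1)
    return n


for side in (20, 100, 224):
    print(side, resnet18_final_size(side), resnet18_final_size(side, maxpool_stride=1))
```

      Under these assumptions, a 20 × 20 crop reaches the final stage as a single 1 × 1 position, setting the max-pool stride to 1 yields 2 × 2, and resizing the crop to 224 × 224 yields the usual 7 × 7.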

      (2.3) The authors do not report whether the architecture experiments were done with pretrained or randomly initialized weights.

      (2.4) The authors do not report some details about their ResNet18 design, specifically whether a global pooling layer is used and whether the output fully connected layer has any activation function. Additionally, they do not report the version of ResNet18 employed here, namely, whether the BatchNorm and ReLU are applied after (v1) or before (v2) the conv layers in the residual path.

      (3) Pair sampling strategy

      The authors devised a clever approach for sampling positive and negative pairs that is tailored to the nature of the formulation. First, since the positive and negative labels are derived from the co-existence of pretracked fragments, selection has to be done at the level of fragments rather than individual images. This would not be the case if one of the newer approaches for contrastive learning were employed, but it serves as a strength here (assuming that fragment generation/first pass heuristic tracking is achievable and reliable in the dataset). Second, a clever weighted sampling scheme assigns sampling weights to the fragments that are designed to balance "exploration and exploitation". They weigh samples both by fragment length and by the loss associated with that fragment to bias towards different and more difficult examples.

      (3.1) The formulation described here resembles and uses elements of online hard example mining (Shrivastava et al., 2016), hard negative sampling (Robinson et al., 2020), and curriculum learning more broadly. The authors may consider referencing this literature (particularly Robinson et al., 2020) for inspiration and to inform the interpretation of the current empirical results on positive/negative balancing.
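
      One way to realize the length/loss weighting described in point (3) is a convex mixture of two sampling distributions, one proportional to fragment length and one proportional to each fragment's most recent loss, with a parameter alpha trading them off. The mixture rule, alpha and the function name are hypothetical; the paper's weighting may be defined differently.

```python
import random


def fragment_weights(lengths, losses, alpha=0.5):
    """Sampling probability per fragment as a mix of length and loss terms.

    alpha = 1 samples purely in proportion to fragment length;
    alpha = 0 samples purely in proportion to the fragment's latest loss.
    If all losses are zero, the loss term falls back to the length term.
    """
    total_len = sum(lengths)
    total_loss = sum(losses)
    weights = []
    for n, loss in zip(lengths, losses):
        p_len = n / total_len
        p_loss = loss / total_loss if total_loss > 0 else p_len
        weights.append(alpha * p_len + (1 - alpha) * p_loss)
    return weights


ids = ["f1", "f2", "f3"]
w = fragment_weights(lengths=[100, 20, 20], losses=[0.1, 0.1, 0.8], alpha=0.5)
batch = random.Random(0).choices(ids, weights=w, k=8)  # fragments to draw pairs from
print([round(x, 3) for x in w], batch)
```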

      (4) Speed and accuracy improvements

      The authors report considerable improvements in speed and accuracy of the new idTracker (v6) over the original idTracker (v4?) and TRex. It's a bit unclear, however, which of these are attributable to the engineering optimizations (v5?) versus the representation learning formulation.

      (4.1) Why is there an improvement in accuracy in idTracker v5 (L77-81)? This is described as a port to PyTorch and improvements largely related to the memory and data loading efficiency. This is particularly notable given that the progression went from 97.52% (v4; original) to 99.58% (v5; engineering enhancements) to 99.92% (v6; representation learning), i.e., most of the new improvement in accuracy owes to the "optimizations" which are not the central emphasis of the systematic evaluations reported in this paper.

      (4.2) What about the speed improvements? Relative to the original (v4), the authors report average speed-ups of 13.6x in v5 and 44x in v6. Presumably, the drastic speed-up in v6 comes from a lower Protocol 2 failure rate, but v6 is not evaluated in Figure 2 - figure supplement 2.
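
      The arithmetic behind (4.1) and (4.2), using only the figures quoted there (pp = percentage points), makes the attribution question explicit: about 86% of the accuracy gain from v4 to v6 is already present in v5, and, if the reported average speed-ups are treated as composing multiplicatively (an assumption, since ratios of averages need not compose), v6 adds roughly a further 3.2-fold speed-up over v5.

```latex
\begin{aligned}
\Delta_{4\to5} &= 99.58\% - 97.52\% = 2.06~\text{pp}, \\
\Delta_{4\to6} &= 99.92\% - 97.52\% = 2.40~\text{pp}, \\
\frac{\Delta_{4\to5}}{\Delta_{4\to6}} &= \frac{2.06}{2.40} \approx 0.86, \\
\frac{44}{13.6} &\approx 3.2 .
\end{aligned}
```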

      (5) Robustness to occlusion

      A major innovation enabled by the contrastive representation learning approach is the ability to tolerate the absence of a global fragment (contiguous frames where all animals are visible) by requiring only co-existing pairs of fragments owing to the paired sampling formulation. While this removes a major limitation of the previous versions of idtracker.ai, its evaluation could be strengthened. The authors describe an ablation experiment where an arc of the arena is masked out to assess the accuracy under artificially difficult conditions. They find that the v6 works robustly up to significant proportions of occlusions, even when doing so eliminates global fragments.

      (5.1) The experiment setup needs to be more carefully described. What does the masking procedure entail? Are the pixels masked out in the original video, or are detections removed after segmentation and first-pass tracking is done? What happens at the boundary of the mask? (Partial segmentation masks would throw off the centroids, and doing it after original segmentation does not realistically model the conditions of entering an occlusion area.) Are fragments still linked for animals that enter and then exit the mask area? How is the evaluation done? Is it computed with or without the masked region detections?

      (5.2) The circular masking is perhaps not the most appropriate for the mouse data, which is collected in a rectangular arena.

      (5.3) The number of co-existing fragments, which seems to be the main determinant of performance that the authors derive from this experiment, should be reported for these experiments. In particular, a "number of co-existing fragments" vs. accuracy plot would support the use of the 0.25(N-1) heuristic and would be especially informative for users seeking to optimize experimental and cage design. Additionally, the number of co-existing fragments can be artificially reduced in ways other than a fixed occlusion, including random dropout, which would disambiguate it from potential allocentric positional confounds (particularly relevant in arenas where egocentric pose is correlated with allocentric position).
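
      The random-dropout control suggested in (5.3) could be simulated along these lines: drop each detection independently, re-split each animal's trajectory into fragments (maximal runs of consecutive detected frames), and summarize how many other fragments each fragment co-exists with. The simplified fragment model (it ignores crossings and other sources of fragmentation) and all names are assumptions for illustration; accuracy would still have to come from running the tracker on the correspondingly degraded data.

```python
import random
from statistics import mean


def fragments_after_dropout(n_frames, n_animals, p_drop, rng):
    """Split each animal's track into runs of consecutive kept detections.

    Each detection is dropped independently with probability p_drop.
    Returns a list of inclusive (start_frame, end_frame) intervals.
    """
    intervals = []
    for _ in range(n_animals):
        start = None
        for t in range(n_frames):
            kept = rng.random() >= p_drop
            if kept and start is None:
                start = t
            elif not kept and start is not None:
                intervals.append((start, t - 1))
                start = None
        if start is not None:
            intervals.append((start, n_frames - 1))
    return intervals


def coexistence_counts(intervals):
    """For each fragment, the number of other fragments overlapping it in time."""
    return [sum(1 for j, (s2, e2) in enumerate(intervals)
                if j != i and s1 <= e2 and s2 <= e1)
            for i, (s1, e1) in enumerate(intervals)]


rng = random.Random(0)
for p in (0.0, 0.05, 0.2):
    frags = fragments_after_dropout(n_frames=500, n_animals=8, p_drop=p, rng=rng)
    print(p, len(frags), mean(coexistence_counts(frags)))
```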

      (6) Robustness to imaging conditions

      The authors state that "the new idtracker.ai can work well with lower resolutions, blur and video compression, and with inhomogeneous light (Figure 2 - figure supplement 4)." (L156).

      Despite this claim, there are no speed or accuracy results reported for the artificially corrupted data, only examples of these image manipulations in the supplementary figure.

      (7) Robustness across longitudinal or multi-session experiments

      The authors reference idmatcher.ai as a compatible tool for this use case (matching identities across sessions or long-term monitoring across chunked videos); however, no performance data are presented to support its usage.

      This is relevant as the innovations described here may interact with this setting. While deep metric learning and contrastive learning for ReID were originally motivated by these types of problems (especially individuals leaving and entering the FOV), it is not clear that the current formulation is ideally suited for this use case. Namely, the design decisions described in point 1 of this review are at times at odds with the idea of learning generalizable representations owing to the feature extractor backbone (less scalable), low-dimensional embedding size (less representational capacity), and Euclidean distance metric without hypersphere embedding (possible sensitivity to drift).

      It's possible that data to support point 6 can mitigate these concerns through empirical results on variations in illumination, but a stronger experiment would be to artificially split up a longer video into shorter segments and evaluate how generalizable and stable the representations learned in one segment are across contiguous ("longitudinal") or discontiguous ("multi-session") segments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors propose a new version of idTracker.ai for animal tracking. Specifically, they apply contrastive learning to embed cropped images of animals into a feature space where clusters correspond to individual animal identities.

      Strengths:

      By doing this, the new software alleviates the requirement for so-called global fragments - segments of the video, in which all entities are visible/detected at the same time - which was necessary in the previous version of the method. In general, the new method reduces the tracking time compared to the previous versions, while also increasing the average accuracy of assigning the identity labels.

      Weaknesses:

      The general impression of the paper is that, in its current form, it is difficult to disentangle the old from the new method and understand the method in detail. The manuscript would benefit from a major reorganization and rewriting of its parts. There are also certain concerns about the accuracy metric and reducing the computational time.

    5. Author response:

      We thank the editor and reviewers for their positive and detailed review of the preprint. We will use these comments to improve the manuscript's revised version, which we plan to submit in the coming weeks, including: a) tests of variants of ResNet, other network architectures and the use of pre-trained weights, b) clarification and justification of the accuracy metrics used in the benchmark, c) an expanded study about the fragment connectivity in Figure 3, and d) a study of the performance of idmatcher.ai with the new idtracker.ai.

    1. eLife Assessment

      This useful study presents interesting observations on the potential importance of extracellular transport of human papillomaviruses along actin protrusions by retrograde flow. The focus on the events of HPV infection between ECM binding and keratinocyte-specific receptor binding is unique and interesting. However, the evidence supporting the conclusions is incomplete, and additional experimental support is needed. Because conclusions drawn regarding HS interactions are largely based on experiments using a single HS mAb, the specificity of this mAb needs to be described in more detail, either based on the literature or further experimentation.

    2. Reviewer #1 (Public review):

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released, and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary. The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage have been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data need to be provided. The model should be fitted into established entry events, or at minimum, these conflicting data, a subset of which is noted below, need to be acknowledged.

      (1) The Sapp lab (Richards et al., 2013) found that HSPG-mediated conformational changes in L1 and L2 allowed the release of the virus from primary binding and allowed secondary receptor engagement in the absence of HS shedding.

      (2) Becker et al. found that furin-precleaved capsids could infect cells independently of HSPG interaction, but this infection was still inhibited with cytochalasin D.

      (3) Other work from the Schelhaas lab showed that cytochalasin D inhibition of infection resulted in the accumulation of capsids in deep invaginations from the cell surface, not on the ECM.

      (4) Selinka et al., 2007, showed that preventing HSPG-induced conformational changes in the capsid surface resulted in noninfectious uptake that was not prevented with cytochalasin D.

      (5) The well-described capsid processing events by KLK8 and furin need to be mechanistically linked to the proposed model. Does inhibition of either of these cleavages prevent engagement with CD151?

      The authors need to consider an explanation for these discrepancies.

      Other issues:

      (1) Lines 110-111. The statement about PsVs in the ECM being too far away from the cell surface to make physical contact with the cell surface entry receptors is confusing. ECM binding has not been shown to be an obligatory step for in vitro infection. This idea is referred to again on lines 158-159 and 199. The claim (line 158) that PsV does not interact with the cell within an hour needs to be demonstrated experimentally and seems at odds with multiple laboratories' data. PsV has been shown to directly interact with HSPG on the cell surface in addition to the ECM. Why are these PsVs not detected?

      (2) The experiments shown in Figure 5 need to be better controlled. Why is there no HS staining of the cell surface at the early timepoints? This antibody has been shown to recognize N-sulfated glucosamine residues on HS and, therefore, detects HSPG on the ECM and cell surface. Therefore, the conclusion that this confirms HS coating of PsV during release from the ECM (line 430-431) is unfounded. How do the authors distinguish between "HS-coated virions" and HSPG-associated virions?

      It is difficult to comprehend how the addition of 50 vge/cell of PsV could cause such a global change in HS levels. The claim that the HS levels are decreased in the non-cytochalasin-treated cells due to PsV-induced shedding needs to be demonstrated. If HS is actually shed, staining of the cell periphery could increase with the antibody 3G10, which detects the HS neoepitope created following heparinase cleavage.

    3. Reviewer #2 (Public review):

      Summary:

      Massenberg and colleagues aimed to understand how human papillomavirus particles that bind to the extracellular matrix (ECM) transfer to the cell body for later uptake, entry, and infection. In a mouse vaginal challenge model, binding to the ECM after wounding is key for the virus to get close to its host cells (basal keratinocytes) for later infection, which makes this an important question in the field.

      Strengths:

      The authors take on a conceptually interesting and potentially very important question to understand how initial infection occurs in vivo. The authors confirm previous work that actin-based processes contribute to virus transport to the cell body. The superresolution microscopy methods and data collection are state-of-the-art and provide an interesting new way of analysing the interaction with host cell proteins on the cell surface in certain infection scenarios. The proposed hypothesis is interesting and, if substantiated, could significantly advance the field.

      Weaknesses:

      As a study design, the authors use infection of HaCaT keratinocytes, and follow virus localisation with and without inhibition of actin polymerisation by cytochalasin D (cytoD) to analyse transfer of virions from the ECM to the cell by filopodial structures using important cellular proteins for cell entry as markers.

      First, the data is mostly descriptive besides the use of cytoD, and does not test the main claim of their model, in which virions that are still bound to heparan sulfate proteoglycans are transferred by binding to tetraspanins along filopodia to the cell body.

      Second, using cytoD is a rather broad treatment that not only affects actin retrograde flow, but also virus endocytosis and further vesicular transport in cells, including exocytosis. Inhibition of myosin II, e.g., by blebbistatin, would have been a better choice as it, for instance, does not interfere with endocytosis of the virus.

      Third, the authors aim to study transfer from the ECM to the cell body and its effects. However, a substantial fraction, if not the majority, of virions bind directly to the cell body rather than to the ECM in close vicinity to the cells. This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. As a consequence, the data obtained from the time-point experiments are skewed and remain for the most part unconvincing, because the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could originate from cell-bound and ECM-transferred virions alike.

      Fourth, the use of fixed images in a time-course series also does not allow the authors to assess a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin, or of cell spreading upon cytoD washout. The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions, which may introduce a bias and skew the analysis.

      Fifth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established. For instance, if one flips an image with half of the image showing the cell body, and half of the image ECM, it is clear that association with cell membrane structures will only be significant in the original. I am rather convinced that using randomisation only on the plasma membrane ROIs will not establish any clear significance of the correlating signals. Also, there should be a higher n for the measurements.
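
      The concern about flipping mixed regions can be illustrated with a toy example. In the invented binary images below, the left half is "cell body" (marker everywhere, virions present) and the right half is "ECM" (no marker, no virions). Flipping the whole field moves the marker away from every virion and makes the observed overlap look highly non-random, whereas flipping only within the cell-body region leaves the overlap unchanged. The arrays and helper names are hypothetical and are not data or code from the study.

```python
def overlap_fraction(virus, marker):
    """Fraction of virus-positive pixels that are also marker-positive."""
    hits = total = 0
    for v_row, m_row in zip(virus, marker):
        for v, m in zip(v_row, m_row):
            if v:
                total += 1
                hits += 1 if m else 0
    return hits / total if total else float("nan")


def flip_lr(img):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in img]


def crop_cols(img, c0, c1):
    """Keep columns c0..c1-1 of every row."""
    return [row[c0:c1] for row in img]


# Invented 2 x 4 field: columns 0-1 are cell body, columns 2-3 are ECM.
marker = [[1, 1, 0, 0],
          [1, 1, 0, 0]]
virus = [[1, 0, 0, 0],
         [0, 1, 0, 0]]

print(overlap_fraction(virus, marker), overlap_fraction(virus, flip_lr(marker)))  # 1.0 0.0
cell_v, cell_m = crop_cols(virus, 0, 2), crop_cols(marker, 0, 2)
print(overlap_fraction(cell_v, cell_m), overlap_fraction(cell_v, flip_lr(cell_m)))  # 1.0 1.0
```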

    4. Author response:

      Reviewer #1 (Public review):

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released, and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary.

      Please note that in the introduction (lines 65/66) we state “Two release mechanisms are discussed, that mutually are not exclusive”. This implies that we do not consider the shedding model to be the one accepted model. HS may associate with PsVs despite a decreased affinity and, only after priming (see the ‘priming model’ below), may translocate to the cell body.

      Furthermore, the discussion does not state that the shedding model is the preferred one either. It is correct that we refer to the shedding model more extensively, simply because we find HS associated with transferred PsVs, which is in line with this model and requires its citation.

      The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage have been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data need to be provided.

      As outlined above, our finding is compatible with both models, and we do not aim to verify the shedding model or disprove the priming model.

      It appears that the referee wishes to give the priming model more visibility. Inhibition of KLK8 and furin should reduce translocation to the cell body regardless of whether PsVs carry HS on their surface. For the revision, we plan an experiment as in Figure 3 (CytD), testing whether inhibition of either KLK8 or furin blocks the transfer to the cell body. Our data can then also be discussed in the context of the priming model, increasing its visibility.

      The model should be fitted into established entry events, or at minimum, these conflicting data, a subset of which is noted below, need to be acknowledged.

      (1) The Sapp lab (Richards et al., 2013) found that HSPG-mediated conformational changes in L1 and L2 allowed the release of the virus from primary binding and allowing secondary receptor engagements in the absence of HS shedding.

      (2) Becker et al. found that furin-precleaved capsids could infect cells independently of HSPG interaction, but this infection was still inhibited with cytochalasin D.

      (3) Other work from the Schelhaas lab showed that cytochalasin D inhibition of infection resulted in the accumulation of capsids in deep invaginations of the cell surface, not on the ECM.

      (4) Selinka et al., 2007, showed that preventing HSPG-induced conformational changes in the capsid surface resulted in noninfectious uptake that was not prevented with cytochalasin D.

      (5) The well-described capsid processing events by KLK8 and furin need to be mechanistically linked to the proposed model. Does inhibition of either of these cleavages prevent engagement with CD151?

      The authors need to consider an explanation for these discrepancies.

      That PsVs carry HS cleavage products does not imply that HS cleavage is sufficient or required for infection. Therefore, we do not view our data as being in conflict with the priming model. In fact, our observations are compatible with aspects of both the shedding and the priming model.

      Yet, we acknowledge that the study would gain importance by directly testing the priming model within our experimental system. As requested by the referee, we will discuss the above papers, and further plan to test KLK8 and furin inhibitors.

      Other issues:

      (1) Line 110-111. The statement about PsVs in the ECM being too far away from the cell surface to make physical contact with the cell surface entry receptors is confusing. ECM binding has not been shown to be an obligatory step for in vitro infection.

      Not obligatory, but strongly supportive (Bienkowska-Haba et al., Plos Path., 2018; Surviladze et al., J. Gen. Virol., 2015). As recently published by the Sapp lab (Bienkowska-Haba et al., Plos Path., 2018), “Direct binding of HPV16 to primary keratinocytes yields very inefficient infection rates for unknown reasons.” Moreover, that paper shows that binding of PsVs to HaCaT cell ECM increases infection of NHEK by 10-fold and of HFK by almost 50-fold.

      This idea is referred to again on lines 158-159 and 199. The claim (line 158) that PsV does not interact with the cell within an hour needs to be demonstrated experimentally and seems at odds with multiple laboratories' data. PsV has been shown to directly interact with HSPG on the cell surface in addition to the ECM. Why are these PsVs not detected?

      We do not question that in many cellular systems PsVs interact with heparan sulfate proteoglycans (HSPGs) present on the cell surface, or both on the cell surface and the ECM. We stated in the manuscript on line 59: “While in cell culture virions bind to HS of the cell surface and the ECM, it has been suggested that in vivo they bind predominantly to HS of the extracellular basement membrane (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010).”

      Moreover, we do detect these PsVs ourselves: in Figure 5A (CytD, 0 min time point), for example, a handful of PsVs localize to the cell body area. However, the large majority overlap with the strong HS staining at the cell periphery, most likely the ECM. An accurate quantification of the fractions of PsVs bound to the ECM and to the cell body is very difficult, for the following reasons. First, the ECM PsVs are very dense and are therefore not, or not completely, resolved microscopically into single PsVs (see Figure 1C; the high-intensity spots are non-resolved PsVs; please see our discussion on lines 148-152). For this reason, simply counting spots strongly underestimates the ECM PsVs relative to the cell body PsVs. Second, with the available immunostainings we cannot exactly delineate the ECM from the cell body. In particular, we often observe PsV accumulations at the cell border region (for example, see Figure 4B). If these “cell border region PsVs” are assigned entirely to the cell body fraction, a preliminary analysis (correcting for the limitation of non-resolved ECM PsVs) suggests that about a quarter of the PsVs bind to the cell body. If, on the other hand, they are assigned to the ECM, the cell body fraction falls well below 10%. Third, we observe that in regions devoid of ECM and cells, PsVs apparently adhere nonspecifically to the glass coverslip. This suggests that some of the cell body PsVs are simply nonspecific background. Subtracting a background PsV density from the ECM and cell body PsV densities reduces the cell body PsVs proportionally more, and consequently decreases the fraction of cell body PsVs even further.
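      The effect of background subtraction on the cell body fraction can be illustrated with a toy calculation. This is a hypothetical sketch, not the analysis used in the manuscript: the function name, the assumption of a uniform background density (for example, PsVs per unit area counted on bare glass), and all numbers are made up for demonstration. Because the cell body carries far fewer PsVs per unit area than the ECM, removing the same background density from both compartments shrinks the cell body count proportionally more.

```python
def cell_body_fraction(n_ecm, area_ecm, n_cell, area_cell, bg_density=0.0):
    """Fraction of PsVs assigned to the cell body (hypothetical helper).

    A uniform background density (particles per unit area, e.g. estimated on
    bare glass) is subtracted from both compartments before the fraction is
    computed; negative corrected counts are clipped to zero.
    """
    ecm = max(n_ecm - bg_density * area_ecm, 0.0)
    cell = max(n_cell - bg_density * area_cell, 0.0)
    total = ecm + cell
    return cell / total if total > 0 else 0.0


# Hypothetical counts: equal areas, ECM ten times denser than the cell body.
raw = cell_body_fraction(900, 100.0, 100, 100.0)
corrected = cell_body_fraction(900, 100.0, 100, 100.0, bg_density=0.5)
print(f"raw: {raw:.3f}, background-corrected: {corrected:.3f}")
# prints "raw: 0.100, background-corrected: 0.056"
```

      This toy calculation does not model the undercounting of non-resolved ECM PsVs; correcting for it would increase the ECM count and lower the cell body fraction further.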

      Moreover, in the course of the project we wondered whether the basolateral membrane has many binding sites at all. To address this question, in an unpublished experiment, we detached HaCaT cells with trypsin, incubated them with PsVs in suspension, and then allowed them to reattach to assess binding. We detected minimal to no binding, which, however, could also result from apical membrane adherence to the coverslip or from trypsin-mediated cleavage of HSPGs. As suggested by the reviewing editor, we agree that repeating this experiment using EDTA for detachment, thus preserving HSPGs, would offer more definitive insight into binding efficiency in the absence of accessibility constraints. In summary, the reason why most PsVs do not bind to the cell surface in our cellular system could be a combination of several factors:

      (1) The primary binding partners are more abundant in the ECM, and polarized HaCaT cells secrete more ECM than other cultured cells used to study HPV infection. This promotes ECM binding.

      (2) In polarized HaCaT cells, the apical membrane is largely devoid of syndecan-1, CD151 and Itga6, so PsVs infect the cell via the basolateral membrane. However, access to the basolateral membrane is restricted: PsVs must diffuse through a narrow slit between the glass coverslip and the attached cell to reach HS on the cell surface. This limits cell surface binding.

      (3) If HaCaT cells secrete large amounts of ECM, they may become depleted of cell surface HS. As outlined above, we will try to determine how many PsVs bind to the basolateral membrane in the absence of restricted accessibility. If it turns out that HaCaT cells have few binding sites in the first place, this would additionally promote binding to the ECM.

      The outcome of the above issues, and how we will address them in the revised version of the manuscript, is open. In any case, we would like to point out that PsVs bound to the cell body do not weaken our main conclusion. Still, we recognize that this point merits attention and plan several modifications of the manuscript. We already mention it, but we will now state more explicitly that PsVs have been shown to interact directly with HSPGs on the cell surface in addition to the ECM, but also that the ECM has been shown to strongly support infection of NHEK and HFK (Bienkowska-Haba et al., Plos Path., 2018). The following is a draft of a paragraph we plan to incorporate, explaining the above issue and why we used HaCaT cells in our experiments:

      “In vitro, PsVs bind to both the cell surface and the ECM, as has been widely documented. In vivo, however, it has been proposed that initial binding occurs predominantly to the basement membrane ECM, rather than directly to the cell surface (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010). This distinction reinforces the physiological relevance of ECM-bound particles in the early steps of HPV infection. Support for a functional role of ECM-mediated entry comes from a study showing that PsV binding to ECM derived from HaCaT cells significantly enhances infection of primary keratinocytes (Bienkowska-Haba et al., 2018). For these reasons, we specifically chose polarized HaCaT cells as a model system. These cells secrete abundant ECM, from which the cells readily collect bound PsVs. On the other hand, polarization limits the access of PsVs to basolateral receptors such as CD151 and Itgα6, and also to cell body-resident syndecan-1, the most abundant HSPG in keratinocytes (Rapraeger et al., 1986; Hayashi et al., 1987; Kim et al., 1994). Because polarization limits direct cell surface accessibility, it biases binding toward the ECM, which is abundant in this culture system. Hence, in the HaCaT cell culture system, as probably in vivo, PsVs cannot circumvent binding to the ECM, as they can in unpolarized cell cultures that may not even secrete significant amounts of ECM. Altogether, this experimental situation closely mimics the in vivo situation, where PsVs bind preferentially to the ECM (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010).”

      We appreciate the reviewer’s input and believe these additions will strengthen the manuscript with regard to the relevance of the used cellular model system.

      (2) The experiments shown in Figure 5 need to be better controlled. Why is there no HS staining of the cell surface at the early timepoints? This antibody has been shown to recognize N-sulfated glucosamine residues on HS and, therefore, detects HSPG on the ECM and cell surface.

      We have shown all images with the same brightness and contrast settings. As the staining at the periphery is stronger, this gives the impression that the cell surface is not stained, although there is some staining. Specific staining is documented in Figure 5D, showing the PCC between PsVs and HS only of the cell body. If there was no HS staining, the PCC would be zero, which is not the case. Yet, it is lower than the PCC measured at the cell border region, where HS is more strongly stained.

      We will provide images at different contrast and brightness adjustments enabling the reader to see the staining on the cell surface. We will provide also more overview images to illustrate the strong variability of the HS staining between cells.

      Therefore, the conclusion that this confirms HS coating of PsV during release from the ECM (line 430-431) is unfounded. How do the authors distinguish between "HS-coated virions" and HSPG-associated virions?

      The HS intensity on the cell body increases transiently (Fig. 5D) only after release of a cohort of PsVs, which can only be explained by PsVs that carry HS from the ECM to the cell body. However, the effect is not significant. Using the antibody 3G10, which detects the HS neoepitope (see the referee’s suggestion below), we will reanalyze this point. This should help clarify the issue.

      It is difficult to comprehend how the addition of 50 vge/cell of PsV could cause such a global change in HS levels.

      The distribution of bound PsVs largely varies between cells. Some areas are covered with essentially confluent cells, to which hardly any PsVs are bound, because accessing the basolateral membrane of confluent cells is nearly impossible, and PsVs do not bind to the exposed apical membrane. This is different in cultures of unpolarized cells where we expect that PsVs distribute more equally over cells.

      This means that in our experiments the vge/cell is not a suitable parameter for relating the magnitude of an effect to a defined number of PsVs. In the ECM, the PsV density is very high, enabling one cell to collect several hundred PsVs, much more than expected from the 50 vge/cell. We will point this out in the revised version.

      The claim that the HS levels are decreased in the non-cytochalasin-treated cells due to PsV-induced shedding needs to be demonstrated.

      We did not claim that PsVs induce shedding; rather, we believe they simply take shed HS with them. Without PsVs, the shed HS likely remains in the ECM or is washed out very slowly.

      If HS is actually shed, staining of the cell periphery could increase with the antibody 3G10, which detects the HS neoepitope created following heparinase cleavage.

      As outlined above, we plan to test the suggested antibody 3G10. We also plan to repeat the 0 min time point (with and without PsVs, with and without CytD) to find out whether, in the absence of PsVs, the HS intensity at 0 min is unchanged between control and CytD.

      Reviewer #2 (Public review):

      Summary:

      Massenberg and colleagues aimed to understand how human papillomavirus particles that bind to the extracellular matrix (ECM) transfer to the cell body for subsequent uptake, entry, and infection. Binding to the ECM is key for the virus to get close to its host cells (basal keratinocytes) after wounding, for later infection in a mouse vaginal challenge model, indicating that this is an important question in the field.

      Strengths:

      The authors take on a conceptually interesting and potentially very important question to understand how initial infection occurs in vivo. The authors confirm previous work that actin-based processes contribute to virus transport to the cell body. The superresolution microscopy methods and data collection are state-of-the-art and provide an interesting new way of analysing the interaction with host cell proteins on the cell surface in certain infection scenarios. The proposed hypothesis is interesting and, if substantiated, could significantly advance the field.

      Weaknesses:

      As a study design, the authors use infection of HaCaT keratinocytes, and follow virus localisation with and without inhibition of actin polymerisation by cytochalasin D (cytoD) to analyse transfer of virions from the ECM to the cell by filopodial structures using important cellular proteins for cell entry as markers.

      First, the data are mostly descriptive apart from the use of cytoD, and do not test the main claim of their model, in which virions that are still bound to heparan sulfate proteoglycans are transferred along filopodia to the cell body by binding to tetraspanins.

      The study identifies a rapid translocation step from the ECM to the cell body. We have no data that demonstrates a physical interaction between PsVs and CD151. In the model figure, we draw CD151 as part of the secondary receptor complex. We are sorry for having raised the impression that PsVs would bind directly to CD151 and will rephrase the respective section.

      Second, using cytoD is a rather broad treatment that not only affects actin retrograde flow, but also virus endocytosis and further vesicular transport in cells, including exocytosis. Inhibition of myosin II, e.g., by blebbistatin, would have been a better choice as it, for instance, does not interfere with endocytosis of the virus.

      We agree, and plan to test whether blebbistatin is equally efficient in blocking the transfer.

      Third, the authors aim to study transfer from the ECM to the cell body and its consequences. However, a substantial number, if not the majority, of viruses bind to the cell body rather than to the ECM in close vicinity to the cells.

      We agree that in multiple cell culture systems viruses bind preferentially directly to the cell. However, we respectfully disagree with the assertion that the majority of PsVs bind to the cell body of HaCaT keratinocytes. As noted above (e.g., Figure 5A, CytD, 0 min), only a small fraction of PsVs localize to the cell body, whereas the vast majority overlap with intense HS staining at the cell periphery, consistent with ECM association, because access to basolaterally expressed HSPGs is limited (see above). Based on quantitative estimation from multiple images, ECM-bound PsVs greatly outnumber cell-bound particles (see above). These features make HaCaT cells a suitable in vitro model for mimicking in vivo conditions, where HPV has been proposed to bind predominantly to the basement membrane ECM rather than to the cell surface (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010); binding to ECM also strongly enhances infection of primary keratinocytes in vitro (Bienkowska-Haba et al., 2018).

      Thus, we believe our system appropriately models the physiologically relevant scenario of ECM-to-cell transfer, and the observed predominance of ECM binding supports the validity of our experimental focus.

      This is in part obscured by the small subcellular regions of interest imaged by STED microscopy, or by the use of plasma membrane sheets. As a consequence, the data from the time-point experiments are skewed and remain largely unconvincing, because the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could originate from cell-bound and ECM-transferred virions alike.

      As stated above, we observe massive binding of PsVs to the ECM, in contrast to very few PsVs that diffuse beneath the basolateral membrane of the polarized HaCaT cells and bind directly to the cell surface (or are perhaps simply trapped between the glass and the basolateral membrane). PsVs are not expected to bind to the apical membrane, which is depleted of CD151 and Itga6. In other cellular systems, cells may secrete hardly any ECM, may not be polarized, and may not adhere so tightly to the substrate. In such cultures, where virions can easily circumvent ECM binding, the large majority of PsVs likely bind directly to the cell surface.

      As outlined above, in order to quantify PsVs that can bind without restricted accessibility, we plan to detach HaCaT cells by EDTA from the substrate, incubate them with PsVs, and let them adhere again (please see above).

      Whatever the outcome, the fraction of PsVs that bind directly to the cell surface does not weaken our conclusion that we have identified a very fast and efficient transfer step from the ECM to the cell body.

      Fourth, the use of fixed images in a time-course series also does not allow one to assess a potential contribution of cell membrane retraction upon cytoD treatment, due to destabilisation of cortical actin, or of cell spreading upon cytoD washout.

      If blebbistatin works as expected, we can safely conclude that we observe the same process as described in Schelhaas et al., PLoS Pathogens, 2008, showing that PsVs migrate by retrograde transport to the cell surface, rather than the cell spreading out and thereby reaching the PsVs.

      The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions, which may introduce a bias and skew the analysis.

      Our plasma membrane stain does not stain the ECM. Please see Figure 1. The stain is actually used to distinguish the cell body from the ECM area.

      Fifth, while randomisation (flipping) during image analysis is highly recommended to establish significance, it should be done only on ROIs that have a similar density of the objects being correlated.

      We agree that the way randomization is done is very important. Regarding the association of PsVs with CD151 and HS, we used flipped images to generate a calibration curve for the correction of random background. For details, please see Supplementary Figures 3 and 5.

      For instance, if one flips an image in which half shows the cell body and half the ECM, association with cell membrane structures will obviously be significant only in the original. I am fairly convinced that randomisation restricted to plasma membrane ROIs will not establish any clear significance of the correlating signals.

      Figure 5D shows the PCC specifically for the cell body. In flipped images (not shown in the manuscript for clarity, but these can be added), we obtain a PCC of around zero. For CytD, the flipped images always have a significantly lower PCC than the original images. In the control, the PCC of the flipped images is significantly lower only at the 30 min and 60 min time points. The non-significance at the 0 min and 180 min time points is due to low PCCs in the original images as well.
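      An ROI-restricted flipping control of the kind discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes two registered single-channel images stored as NumPy arrays and a boolean cell-body mask, and the function names `masked_pcc` and `flipped_control_pcc` are hypothetical. The control mirrors the PsV channel left-right and uses only pixels that lie inside the mask both before and after mirroring, so that both channels are sampled from cell-body pixels and ECM pixels do not enter the randomized comparison.

```python
import numpy as np


def masked_pcc(a, b, mask):
    """Pearson correlation coefficient of two images over pixels where mask is True."""
    x = a[mask].astype(float)
    y = b[mask].astype(float)
    x -= x.mean()
    y -= y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0


def flipped_control_pcc(psv, marker, mask):
    """PCC after mirroring the PsV channel left-right (randomized control).

    Only pixels inside the ROI both before and after the flip are used, so
    the mirrored PsV signal is still drawn from cell-body pixels.
    """
    psv_flipped = np.fliplr(psv)
    both = mask & np.fliplr(mask)
    return masked_pcc(psv_flipped, marker, both)


# Synthetic demo: the PsV channel is co-localized with the marker by construction.
rng = np.random.default_rng(0)
marker = rng.random((64, 64))
psv = marker + 0.1 * rng.random((64, 64))
roi = np.zeros((64, 64), dtype=bool)
roi[:, 16:48] = True  # left-right symmetric ROI standing in for a cell-body mask
original = masked_pcc(psv, marker, roi)          # near 1 for this synthetic data
control = flipped_control_pcc(psv, marker, roi)  # near 0 for this synthetic data
print(f"original PCC = {original:.2f}, flipped PCC = {control:.2f}")
```

      The symmetric ROI is used in the demo only so that the mask and its mirror image coincide; for a real, irregular cell-body mask the overlap `roi & np.fliplr(roi)` may be much smaller, and other randomization schemes (for example, rotations or block scrambling within the ROI) could be substituted.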

      Also, the number of measurements (n) should be higher.

      Each n is the average of 15 cells. We realize that with n = 3, we find significant effects only if the effect is very strong, or moderate with very low variance.

    1. eLife Assessment

      This valuable study outlines the mechanism by which repeated vaccination increases the breadth of antibody responses against epitope-unmatched virus strains. The authors' mathematical model is solid and incorporates various parameters that regulate B cell activation and antibody response.

    2. Reviewer #1 (Public Review):

      In this study, Deng et al. investigate the antibody response against HA antigen following repeated vaccination with the H1N1 2009 pandemic influenza vaccine strain, using in silico modeling. The proposed model provides valuable mechanistic insights into how the broadening of the antibody response takes place upon repeated vaccination.

      Overall, the authors' model effectively explains the mechanistic principles underlying antibody responses against the viral antigens harboring epitope immunodominancy.

    3. Reviewer #2 (Public Review):

      The authors have been studying the mechanism of breadth expansion in antibody responses with repeated vaccinations using their own mathematical model. In this study, they applied this model to cohort data on anti-HA antibody responses after multiple influenza virus vaccinations and investigated the mechanism of antibody breadth expansion to diversified target viral strains.

      The manuscript is well written, and the mathematical model is well built, incorporating various parameters related to B cell activation in GC and EGC based on experimental data.

      Strengths:

      By carefully reanalyzing the published cohort data (Nunez IA et al 2017 PLoS One), they have clearly demonstrated that the repeated influenza virus vaccinations result in an expansion of the breadth to unmatched viral strains.

      Using their mathematical model, they have determined the major factors for the breadth expansion following multiple immunizations.

      Weaknesses:

      The overall concept of their model has already been published (Yang L et al 2023 Cell Reports) with a SARS-CoV-2 vaccine model, and they have applied it to influenza virus vaccine in this study, with the conclusions being largely the same.

      It is unclear how the re-evaluation of public data in the first half relates to the validation of their model in the second half.

      Other points:

      In the original data by Nunez IA et al., HAI (the inhibitory effect of anti-HA antibodies on the binding of HA to sialic acid on erythrocytes) was used as the readout. The authors conclude that the breadth expansion with repeated vaccinations is primarily due to the activation of B cells with BCRs that recognize minor common epitopes, induced by covering up of strain-specific major epitopes by pre-existing antibodies. However, as they themselves show in Fig 1, once the sialic acid-binding region is covered, it seems difficult for another BCR to bind to this region. When the target epitope is limited like this, the effect of increasing antigen supply to DCs by pre-existing antibodies and the effect of increasing the presentation of minor epitopes appear to compete with each other. Could the authors please explain this point? In relation to this point, please explain the rationale for analyzing the entire ectodomain when the readout in the original data is HAI.

      Minor point:

      The statement “The purpose of this model is ....” starting at line 171 and the statement “we obtain results in harmony with the clinical findings ....” starting at line 478 seem contradictory. As the authors themselves state at line 171, if the purpose of this model is not to fit the data but to demonstrate the principle, then the careful sampling and reanalysis of the data themselves seem less meaningful.

    4. Author response:

      Reviewer #1 (Public Review):

      In this study, Deng et al. investigate the antibody response against HA antigen following repeated vaccination with the H1N1 2009 pandemic influenza vaccine strain, using in silico modeling. The proposed model provides valuable mechanistic insights into how the broadening of the antibody response takes place upon repeated vaccination.

      Overall, the authors' model effectively explains the mechanistic principles underlying antibody responses against the viral antigens harboring epitope immunodominancy.

      We thank the Reviewer for their positive and thoughtful assessment of the work. We address issues raised in the revised manuscript and in the point-by-point responses below.

      Reviewer #2 (Public Review):

      The authors have been studying the mechanism of breadth expansion in antibody responses with repeated vaccinations using their own mathematical model. In this study, they applied this model to cohort data on anti-HA antibody responses after multiple influenza virus vaccinations and investigated the mechanism of antibody breadth expansion to diversified target viral strains.

      The manuscript is well written, and the mathematical model is well built, incorporating various parameters related to B cell activation in GC and EGC based on experimental data.

      We thank the reviewer for their positive and thoughtful review and address issues raised in a revised version of the manuscript and in the point-by-point below.

      Strengths:

      By carefully reanalyzing the published cohort data (Nunez IA et al 2017 PLoS One), they have clearly demonstrated that the repeated influenza virus vaccinations result in an expansion of the breadth to unmatched viral strains.

      Using their mathematical model, they have determined the major factors for the breadth expansion following multiple immunizations.

      We thank the reviewer for pointing out the strengths of our study.

      Weaknesses:

      The overall concept of their model has already been published (Yang L et al 2023 Cell Reports) with a SARS-CoV-2 vaccine model, and they have applied it to influenza virus vaccine in this study, with the conclusions being largely the same.

      It is unclear how the re-evaluation of public data in the first half relates to the validation of their model in the second half.

      The reviewer is correct that we build directly on our previously published model, which we used to study related phenomena for SARS-CoV-2. However, a critical advance of this work is to ask whether antibody broadening following repeated homologous antigen exposure is a general feature of human humoral immunity. As we point out in the introduction of our manuscript, repeated exposure to the same antigen has long been assumed to predominantly boost strain-limited humoral immunity, necessitating rational design of vaccines that re-orient antibody responses toward otherwise immune-subdominant targets. Hence, antibody broadening in response to homologous SARS-CoV-2 antigen calls for reconsideration of that basic premise in immunology; and if we are now to define this as a general feature of human antibody responses, the principle must be evaluated using a different vaccine protocol and antigen. Accordingly, we took advantage of the influenza vaccine setting where, in the years immediately following the 2009 H1N1 pandemic, the 2009 H1N1 strain was repeatedly used as the seasonal vaccine strain. This HA (pHA) was also novel, as it came from a pandemic virus, meaning that traditional back-boosting to historical strains would be limited. We then re-evaluated the longitudinal HAI data of Nunez et al. to determine whether broadening to increasingly divergent vaccine-unmatched strains is observed upon repeated exposure to pHA. This had not been done before and was enabled by incorporating our amino acid relatedness parameter and our structure-based definition of the RBS patch. To then probe the mechanistic origins of the broadening effect, we adapted and extended our previous computational model to: (1) better reflect HA epitope diversity and overlap within the RBS patch; and (2) better reflect the influenza immunization regimens used clinically. The differences between the modeling in this paper and that in Yang et al. 2023 are described separately in the Methods section. Taken together, our analyses of the data in Nunez et al. and our simulations strengthen the emerging view that repeated boosting with the same antigen enables the humoral immune system to diversify immune responses because of feedback regulation, which leads to enhanced antigen on FDCs, persistent GCs, and epitope masking. This, in turn, enables the immune system to generalize, recognizing and responding to unseen variant antigens that harbor mutations in the immunodominant epitopes. Our results point to a new and emerging paradigm regarding booster immunizations and fundamental features of the humoral immune system.

      Other points:

      In the original data by Nunez IA et al., HAI (the inhibitory effect of anti-HA antibodies on the binding of HA to sialic acid on erythrocytes) was used as the readout. The authors conclude that the breadth expansion with repeated vaccinations is primarily due to the activation of B cells with BCRs that recognize minor common epitopes, induced by covering up of strain-specific major epitopes by pre-existing antibodies. However, as they themselves show in Fig 1, once the sialic acid-binding region is covered, it seems difficult for another BCR to bind to this region. When the target epitope is limited like this, the effect of increasing antigen supply to DCs by pre-existing antibodies and the effect of increasing the presentation of minor epitopes appear to compete with each other. Could the authors please explain this point?

      We agree that accounting for epitope overlap is important when the target is limited, as the reviewer indicates. In Figure 6C vs 6D, we assess the steric effects of possible spatial overlap between dominant and subdominant epitopes. Under overlapping conditions, we find evidence for a steric constraint on broadening, as predicted by the reviewer. Depending upon the degree of overlap between the epitopes and differences in germline characteristics of the B cells targeting dominant and subdominant epitopes, this effect could be compensated during subsequent shots, as our results show (see lines 392-406).

      We have also added the following sentences to our discussion (lines 448-453):

      “Epitope masking will also be constrained by the dimensions of the RBS and our simulations do report attenuation of titers against historical influenza strains when we introduce epitope overlap. Depending upon the degree of overlap between the epitopes and differences in germline characteristics in the B cells targeting dominant and subdominant epitopes, this effect could be compensated during subsequent shots.”

      In relation to this point, please explain the rationale for analyzing the entire ectodomain when the read-out of the original data is HAI.

      We include side-by-side relatedness values for the full-length ectodomain and for the RBS patch (sialic acid-binding residues plus the antibody epitope ring) to show how relatedness to the strains in the read-out data differs between the two definitions. However, precisely because of the point raised by the reviewer, we use the RBS patch relatedness values to assess antibody broadening as defined by HAI activity (see Figures 3 and S2).
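      To make the ectodomain-versus-patch comparison concrete, here is a minimal sketch that assumes relatedness is measured as percent identity between pre-aligned HA sequences, optionally restricted to a set of patch positions. This is an assumption for illustration only: the response does not define the authors' amino acid relatedness parameter, and the function name, the toy sequences, and the patch indices below are all hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, positions=None) -> float:
    """Percent of identical residues between two pre-aligned sequences.

    If `positions` (0-based alignment indices) is given, only those sites are
    compared, e.g. a hypothetical RBS-patch subset; otherwise all sites are used.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    sites = list(range(len(seq_a))) if positions is None else list(positions)
    if not sites:
        raise ValueError("no sites to compare")
    matches = sum(seq_a[i] == seq_b[i] for i in sites)
    return 100.0 * matches / len(sites)


# Hypothetical toy sequences (not real HA): they differ only at indices 7 and 9.
VACCINE = "ACDEFGHIKL"
VARIANT = "ACDEFGHWKY"
PATCH = [6, 7, 8, 9]  # hypothetical "patch" positions

print(percent_identity(VACCINE, VARIANT))         # 80.0 (whole sequence)
print(percent_identity(VACCINE, VARIANT, PATCH))  # 50.0 (patch only)
```

      Because both substitutions fall inside the toy patch, whole-sequence identity stays high while patch identity drops sharply. This is the kind of divergence that makes a patch-restricted measure more relevant when the read-out is HAI.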

      Minor point:

      The description "The purpose of this model is ...." starting at line 171 and the description "we obtain results in harmony with the clinical findings ...." starting at line 478 seem contradictory. As the authors themselves state at line 171, if the purpose of this model is not to fit the data but to demonstrate the principle, then the careful sampling and reanalysis of the data seem less meaningful.

      We respectfully disagree. As explained in our response above, our use of the clinical data goes beyond “reanalysis”: we first discovered a previously unreported broadening effect across highly divergent strains following sequential immunization with homologous antigen in the influenza vaccine space, and we then extended and adapted our computational model to the influenza vaccination paradigm to gain mechanistic insight into how such antibody broadening may occur. The word “harmony” was not meant to imply quantitative agreement, and we apologize if it caused confusion.

    1. eLife Assessment

      This important study by Wu et al presents convincing data on bacterial cell organization, demonstrating that the two structures that account for bacterial motility - the chemotaxis complex and the flagella - colocalize to the same pole in Pseudomonas aeruginosa cells, and expose the regulation underlying their spatial organization and functioning. This manuscript will be of interest to cell biologists, primarily those studying bacteria.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Wu et al presents interesting data on bacterial cell organization, a field that is currently progressing, mainly due to advances in microscopy. Based mainly on fluorescence microscopy images, the authors aim to demonstrate that the two structures that account for bacterial motility, the chemotaxis complex and the flagella, colocalize to the same pole in Pseudomonas aeruginosa cells and to expose the regulation underlying their spatial organization and functioning.

      Comments on revisions:

      The authors have satisfactorily addressed all the major and minor points that I raised during the revision process. The work can now be regarded as complete: the assumptions were clarified, the results are convincing, the conclusions are justified, and the novelty has been made clear. This manuscript will be of interest to cell biologists, mainly, but not only, those studying bacteria.

    3. Reviewer #2 (Public review):

      Summary:

      Here, the authors studied the molecular mechanisms by which the chemoreceptor cluster and flagellar motor of Pseudomonas aeruginosa (PA) are spatially organized in the cell. They argue that FlhF is involved in localizing the receptors and motor to the cell pole, but a separate mechanism colocalizes them. Finally, the authors argue that the functional reason for this colocalization is to insulate chemotactic signaling from other signaling pathways, such as cyclic-di-GMP signaling.

      Strengths:

      The experiments and data are high quality. It is clear that the motor and receptors co-localize, and that elevated CheY levels lead to elevated c-di-GMP. The signaling crosstalk argument is plausible.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigated the assembly and polar localization of the chemosensory cluster in P. aeruginosa. They discovered that a certain protein (FlhF) is required for the polar localization of the chemosensory cluster, while core motor structures are necessary for the assembly of the cluster. They found that flagella and chemosensory clusters always co-localize in the cell, either at the cell pole in wild-type cells or at random locations in FlhF mutant cells. They hypothesize that this co-localization is required to keep the level of another protein (CheY-P), which controls motor switching, low, as high levels of this protein (if the flagella and chemosensory clusters were not co-localized) are associated with high levels of c-di-GMP and cell aggregation.

      Strengths:

      The manuscript is clearly written and straightforward. The authors applied multiple techniques to study the bacterial motility system, including fluorescence light microscopy and gene editing. In general, the work enhances our understanding of the subtle interactions between the chemosensory cluster and the flagellar motor that regulate cell motility. This work will be of interest to bacteriologists and cell biologists in general.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      The study by Wu et al presents interesting data on bacterial cell organization, a field that is currently progressing, mainly due to advances in microscopy. Based mainly on fluorescence microscopy images, the authors aim to demonstrate that the two structures that account for bacterial motility, the chemotaxis complex and the flagella, colocalize to the same pole in Pseudomonas aeruginosa cells and to expose the regulation underlying their spatial organization and functioning.

      Comments on revisions:

      The authors have satisfactorily addressed all the major and minor points that I raised during the revision process. The work can now be regarded as complete: the assumptions were clarified, the results are convincing, the conclusions are justified, and the novelty has been made clear.

      This manuscript will be of interest to cell biologists, mainly, but not only, those studying bacteria.

      Reviewer #2 (Public review):

      Summary:

      Here, the authors studied the molecular mechanisms by which the chemoreceptor cluster and flagellar motor of Pseudomonas aeruginosa (PA) are spatially organized in the cell. They argue that FlhF is involved in localizing the receptor-motor complex to the cell pole, and that even without FlhF, the two are colocalized. Finally, the authors argue that the functional reason for this colocalization is to insulate chemotactic signaling from other signaling pathways, such as cyclic-di-GMP signaling.

      Strength:

      The experiments and data are high quality. It is clear that the motor and receptors co-localize, and that elevated CheY levels lead to elevated c-di-GMP.

      Weakness:

      The explanation for the functional importance of receptor-motor colocalization is plausible but is still not conclusively demonstrated. Colocalization might reduce CheY levels throughout the cell in order to reduce cross-talk with c-di-GMP. This would mean that if physiologically-relevant levels of CheYp near the pole were present throughout the cell, c-di-GMP levels would be elevated to a point that is problematic for the cell. Clearly demonstrating this seems challenging.

      We acknowledge that directly proving the necessity of colocalization to prevent problematic c-di-GMP elevation is experimentally challenging, as it would require creating a system where CheY-P is artificially distributed throughout the cell at physiologically relevant concentrations while maintaining normal chemotaxis function.

      However, our data provide several lines of evidence supporting this model. First, we show that CheY overexpression leads to substantial c-di-GMP elevation (71.8% increase) and cell aggregation, demonstrating that elevated CheY levels can indeed cause problematic cross-pathway interference. Second, previous work has shown that CheY-P levels near the pole are an order of magnitude higher than in the rest of the cell (ref. 46). If this elevated CheY-P concentration near the pole were present throughout the cell, our data suggest that c-di-GMP levels would be elevated sufficiently to cause cell aggregation (Fig. 4A), thereby disabling normal motility and chemotaxis. Third, the dose-dependent relationship between CheY concentration and aggregation phenotype supports the idea that precise spatial regulation of CheY levels is functionally important for avoiding cross-pathway interference.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the assembly and polar localization of the chemosensory cluster in P. aeruginosa. They discovered that a certain protein (FlhF) is required for the polar localization of the chemosensory cluster, while a fully assembled motor is necessary for the assembly of the cluster. They found that flagella and chemosensory clusters always co-localize in the cell, either at the cell pole in wild-type cells or at random locations in FlhF mutant cells. They hypothesize that this co-localization is required to keep the level of another protein (CheY-P), which controls motor switching, low, as high levels of this protein (if the flagella and chemosensory clusters were not co-localized) are associated with high levels of c-di-GMP and cell aggregation.

      Strengths:

      The manuscript is clearly written and straightforward. The authors applied multiple techniques to study the bacterial motility system, including fluorescence light microscopy and gene editing. In general, the work enhances our understanding of the subtle interactions between the chemosensory cluster and the flagellar motor that regulate cell motility.

      Weaknesses:

      The major weakness of this paper, for me, is that the authors never discuss how flagellar gene expression is controlled in P. aeruginosa. For example, in E. coli there is a transcriptional hierarchy for the flagellar genes (early, middle, and late genes; see Chilcott and Hughes, 2000). Campylobacter and Helicobacter, in turn, have a different regulatory cascade for their flagellar genes (see Lertsethtakarn, Ottemann, and Hendrixson, 2011). How does the expression of flagellar genes in P. aeruginosa compare to other species? How many classes are there for these genes? Is there a hierarchy in their expression, and how does this affect the results for the FliF and FliG mutants? In other words, if FliF and FliG belong to an earlier class (as in E. coli, where they are class II genes), then their absence might affect the expression of later flagellar genes in subsequent classes (i.e., chemosensory genes). Also, in both FliF and FliG mutants, no assembly intermediates of the flagellar motor are present in the cell, as FliG is required for the assembly of FliF (see Hiroyuki Terashima et al. 2020, Kaplan et al. 2019, Kaplan et al. 2022). It could be argued that when the motor is not assembled, the expression of other genes (e.g., those of the chemosensory cluster) is affected, which might contribute to the decreased level of chemosensory clusters the authors find in these mutants.

      We thank the reviewer for the valuable suggestions. In the revised manuscript, we have further elaborated on the regulatory control of flagellar gene expression in P. aeruginosa (see our response to comment #4).

      Comments on revisions:

      I believe the authors have performed additional experiments that improved their manuscript and they have answered many of my comments and those of the other reviewers. I am supportive of publishing this manuscript, but I still find the following points that are not clear to me (probably I am misunderstanding some points; the authors can clarify).

      (1) In response to reviewer 1, the authors say that they "analyzed and categorized the distribution of the chemotaxis complex in both wild-type and flhF mutant strains into three patterns: precise-polar, near-polar, and mid-cell localization." I can see what they mean by polar and mid-cell, but "near-polar" is less clear. Can they provide examples of this category and state how accurately they can identify it? Also, do the pie charts in Figure S4 really show "significant alterations"? There is a difference between 98% and 85%, as they mention in their response to reviewer 1, but I am not sure that this is significant. Perhaps they can explain this or change the language in the text? Also, why is the number of cells counted for the FlhF mutant more than double that for the other strains (WT and FlhF FliF mutant)?

      We thank the reviewer for the valuable suggestions. To clarify, we divided the intracellular area along the cell's long axis into three domains: the two ends each representing 10% of the length as the precise-polar domain, the central 50% as the mid-cell domain, and the remaining regions between these as the near-polar domain. The localization pattern of the chemotaxis complex was assigned based on the position of the fluorescence intensity centroid within these domains.
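      The partitioning rule described above can be sketched as a small classifier. Assumptions: the centroid is supplied as a distance from one pole along the long axis, a centroid lying exactly on a domain boundary is assigned to the more central domain, and the function names are hypothetical rather than taken from the authors' analysis pipeline.

```python
from collections import Counter

PATTERNS = ("precise-polar", "near-polar", "mid-cell")


def classify_localization(centroid: float, cell_length: float) -> str:
    """Assign a localization pattern from the intensity-centroid position.

    centroid: distance of the fluorescence-intensity centroid from one pole,
    measured along the cell's long axis (same units as cell_length).
    """
    if cell_length <= 0:
        raise ValueError("cell_length must be positive")
    x = centroid / cell_length  # fractional position along the long axis
    if not 0.0 <= x <= 1.0:
        raise ValueError("centroid must lie within the cell")
    d = min(x, 1.0 - x)  # fractional distance to the nearest pole
    if d < 0.10:         # outermost 10% at each end
        return "precise-polar"
    if d >= 0.25:        # central 50% of the cell (x in [0.25, 0.75])
        return "mid-cell"
    return "near-polar"  # the two bands between 10% and 25% from a pole


def pattern_fractions(centroids, lengths) -> dict:
    """Fraction of cells assigned to each pattern."""
    counts = Counter(classify_localization(c, L) for c, L in zip(centroids, lengths))
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no cells supplied")
    return {p: counts[p] / n for p in PATTERNS}
```

      Working with the fractional distance to the nearest pole treats both poles symmetrically, so only two thresholds (0.10 and 0.25) are needed to reproduce the three domains.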

      Regarding the significance of the changes, you are correct to question our language. When flhF was knocked out, the proportion of chemotaxis complexes with precise-polar distribution decreased from 98% to 85%, a reduction of 13 percentage points. While this represents a measurable shift in localization pattern, describing this as "significant alterations" was probably imprecise. We have revised this language to more accurately reflect the magnitude of the change (lines 169-177).

      For the cell counting, we increased the sample size for the flhF mutant because this strain exhibited the appearance of mid-cell localization (approximately 5% of cells), which was not observed in wild-type or flhF fliF double mutant strains. To accurately quantify this rare phenotype and ensure statistical reliability, we analyzed more cells for this particular strain. This explains why the flhF mutant dataset contains approximately double the number of cells compared to the other strains.
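      To illustrate why a larger sample helps when a category is rare, one standard option (assumed here; the response does not say which interval, if any, the authors computed) is the Wilson score interval for a binomial proportion. The counts in the usage example are hypothetical and chosen only so that about 5% of cells fall into the rare category.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion k/n.

    z = 1.96 gives an approximate 95% interval.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical counts: about 5% of cells in the rare (mid-cell) category.
    for k, n in [(10, 200), (20, 400)]:
        lo, hi = wilson_interval(k, n)
        print(f"n={n}: 95% CI {lo:.3f}-{hi:.3f}, width {hi - lo:.3f}")
```

      Both hypothetical samples give the same 5% point estimate, but the interval for the doubled sample is narrower, shrinking roughly in proportion to 1/√n.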

      We have redrawn Figure S4 to include a clear schematic diagram of the cell partitioning method and provided representative examples of each localization pattern (precise-polar, near-polar, and mid-cell) to better illustrate how we distinguished between these categories.

      (2) Another point that confused me is the following. The authors stress that FlhF localizes both the flagellum and the chemoreceptors to the pole. However, in Figure 2B, the flagellum and the chemoreceptors still co-localize (although not at the pole). If FlhF were responsible for localizing both of them to the pole, wouldn't one expect them to be randomly localized in this mutant, meaning that they would not co-localize but would each be found at a different random location in the cell? The fact that they still co-localize in this mutant could also be interpreted, for example, as FlhF localizing the flagellum to the pole while another mechanism localizes the chemoreceptors to the flagellum; hence, they still co-localize in this mutant because the chemoreceptors follow the flagellum, by another mechanism, wherever it goes?

      Thank you for this insightful observation. You are correct that our current experimental results do not definitively establish that FlhF directly localizes both the flagellum and chemoreceptors to the pole independently. The persistent colocalization of flagella and chemoreceptors in the ΔflhF mutant, even when both are mislocalized away from the pole, actually suggests a more complex regulatory mechanism than we initially proposed.

      This observation highlights an important distinction between polar targeting and colocalization maintenance. Our data suggest that FlhF influences the polar targeting of the flagellum-chemoreceptor assembly, but the colocalization itself appears to be governed by a different mechanism that operates independently of FlhF. This could involve direct protein-protein interactions between flagellar and chemotaxis components, or shared assembly machinery that we have yet to identify.

      To better reflect this interpretation, we have revised the subsection title (line 150). We have also modified the relevant discussion (line 180) to more accurately describe FlhF’s role in polar targeting rather than claiming it directly controls chemoreceptor localization.

      (3) In the response to reviewers, the authors mention "suggesting that the assembly of the receptor complex is likely influenced mainly by the C-ring and MS-ring structures rather than by the P ring". However, in the article, they still write "The complete assembly of the motor serves as a partial prerequisite for the assembly of the chemotaxis complex, and its assembly site is also regulated by the polar anchor protein FlhF", despite their FlgI results, which are not in accordance with this statement. Also, as I mentioned in my previous report, in FliG and FliF mutants the motor does not assemble (see Hiroyuki Terashima et al. 2020, and Kaplan et al., 2022).

      We thank the reviewer for the suggestions and acknowledge the contradictions in our original text. You are correct that in ΔfliF and ΔfliG mutants, the flagellar motor does not assemble, while the P ring (FlgI) functions as a bushing for the peptidoglycan layer and its absence does not prevent motor assembly.

      Our ΔflgI results, which showed normal chemotaxis complex assembly similar to wild-type, clearly demonstrate that the P ring is not required for chemoreceptor complex formation. This contradicts our original statement that "complete assembly of the motor serves as a partial prerequisite for the assembly of the chemotaxis complex."

      We have corrected this inconsistency by: 1) revising the subsection title (line 186) to more accurately reflect that core motor structures, rather than complete motor assembly, influence chemoreceptor complex formation; and 2) modifying sentences in the introduction (lines 97-98) to better align with our experimental findings.

      (4) In their response to my point, the authors said "and currently, there is no evidence that FliA activity is influenced by proteins like FliG". I want to clarify what I meant in my previous report: in E. coli, FliA binds to FlgM, and when the hook is assembled, FlgM is secreted outside the cell, allowing FliA to trigger the transcription of class III genes, which include the chemosensory genes (see Figure 5 in Beeby et al., 2020 in FEMS Microbiology, and Chilcott and Hughes, 2000). This implies that if the hook is not built, then late gene products (including the chemoreceptors) should not be present. However, in Kaplan et al., 2019, the authors imaged a FliF mutant in Shewanella oneidensis (Figure S3) and still saw that chemoreceptors are present (I believe the authors should highlight this). This suggests that species such as Shewanella and Pseudomonas have a different assembly process from that of E. coli, and although the authors say this in the text, I believe they can still refine this part along the lines of what I wrote here.

      We thank the reviewer for the important clarification regarding the differences in transcriptional regulation among bacterial species. We agree that the observation of chemoreceptors in Shewanella oneidensis ΔfliF mutants (Kaplan et al., 2019) represents a significant deviation from the well-characterized E. coli model and merits stronger emphasis. In response, we have expanded the discussion to more clearly highlight the critical distinctions in the transcriptional regulatory circuits governing flagellar and chemoreceptor biogenesis between E. coli and species such as Shewanella oneidensis and Pseudomonas aeruginosa (lines 351-363).

      I do not like to ask for additional experiments in the second round of review, so if the authors modify the text to tackle these points, allowing for plausible alternative explanations, highlighting gaps, and modifying the language used for some claims, that is fine with me.

      Reviewer #2 (Recommendations for the authors):

      It is plausible that colocalization reduces CheY levels throughout the cell in order to reduce cross-talk with c-di-GMP. This would mean that if physiologically-relevant levels of CheYp near the pole were present throughout the cell, c-di-GMP levels would be elevated to a point that is problematic for the cell. Clearly demonstrating this seems challenging.

      We acknowledge that directly proving the necessity of colocalization to prevent problematic c-di-GMP elevation is experimentally challenging, as it would require creating a system where CheY-P is artificially distributed throughout the cell at physiologically relevant concentrations while maintaining normal chemotaxis function.

      However, our data provide several lines of evidence supporting this model. First, we show that CheY overexpression leads to substantial c-di-GMP elevation (71.8% increase) and cell aggregation, demonstrating that elevated CheY levels can indeed cause problematic cross-pathway interference. Second, previous work has shown that CheY-P levels near the pole are an order of magnitude higher than in the rest of the cell (ref. 46). If this elevated CheY-P concentration near the pole were present throughout the cell, our data suggest that c-di-GMP levels would be elevated sufficiently to cause cell aggregation (Fig. 4A), thereby disabling normal motility and chemotaxis. Third, the dose-dependent relationship between CheY concentration and aggregation phenotype supports the idea that precise spatial regulation of CheY levels is functionally important for avoiding cross-pathway interference.

    1. eLife Assessment

      This important computational study investigates homeostatic plasticity mechanisms that neurons may employ to achieve and maintain stable target activity patterns. The work extends previous analyses of calcium-dependent homeostatic mechanisms based on ion channel density by considering activity-dependent shifts in channel activation and inactivation properties that operate on faster and potentially variable timescales. The model simulations provide solid evidence for the potential functional importance of these mechanisms.

    2. Reviewer #1 (Public review):

      This revision of the computational study by Mondal et al addresses several issues that I raised in the previous round of reviews and, as such, is greatly improved. The manuscript is more readable, its findings are more clearly described, and both the introduction and the discussion section are tighter and more to the point. And thank you for addressing the three timescales of half activation/inactivation parameters. It makes the mechanism clearer.

      Some issues remain that I bring up below.

      Comment:

      I still have a bone to pick with the claim that "activity-dependent changes in channel voltage-dependence alone are insufficient to attain bursting". As I mentioned in my previous comment, the same is true for the gmax values (channel density). If you choose the gmax's to be in a reasonable range, then the statement above simply cannot be true. And if, in contrast, you choose the activation/inactivation parameters to be unreasonable, then no set of gmax's can produce proper activity. So I remain baffled as to what exactly the authors are trying to show.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Mondal and co-authors present the development of a computational model of homeostatic plasticity incorporating activity-dependent regulation of gating properties (activation, inactivation) of ion channels. The authors show that, similar to what has been observed for activity-dependent regulation of ion channel conductances, implementing activity-dependent regulation of voltage sensitivity participates in the achievement of a target phenotype (bursting or spiking). The results however suggest that activity-dependent regulation of voltage sensitivity is not sufficient to allow this and needs to be associated with the regulation of ion channel conductances in order to reliably reach target phenotype. Although the implementation of this biologically relevant phenomenon is undeniably relevant, a few important questions are left unanswered.

      Strengths:

      (1) Implementing activity-dependent regulation of gating properties of ion channels is biologically relevant.

      (2) The modeling work appears to be well performed and provides results that are consistent with previous work performed by the same group.

      Weaknesses:

      (1) The main question not addressed in the paper is the relative efficiency and/or contribution of voltage-dependence regulation, compared to channel conductance regulation, in achieving the expected pattern of activity. Does voltage-dependence regulation contribute 50% or 10%? Although this is a difficult question to answer (and it might even be difficult to provide a number), it is important to determine whether channel conductance regulation remains the main parameter allowing the achievement of a precise pattern of activity (or its recovery after perturbation).

      (2) Another related question is whether the speed of recovery is significantly modified by implementing voltage-dependence regulation (this seems to be the case in Figure 3). More generally, I believe it would be important to give insight into the overall benefit of implementing voltage-dependence regulation, beyond its rather obvious biological relevance.

      (3) Along the same line, the conclusion about how voltage-dependence regulation and channel conductance regulation interact to provide the neuron with the expected activity pattern (summarized and illustrated in Figure 6) is rather qualitative. Consistent with my previous comments, one would expect some quantitative answers to this question, rather than an illustration that approximately places a solution in parameter space.

    4. Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement changes in ion channel conductance to support homeostatic plasticity. While it is well established that the voltage-dependent properties of ion channels influence neuronal excitability, their potential role in homeostatic regulation, alongside conductance changes, has remained largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage dependence can interact with conductance plasticity to enable neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. Notably, the timescale of these voltage-dependent shifts influences the final steady-state configuration of the model, shaping both channel parameters and activity features such as burst period and duration. A major conclusion of the study is that altering this timescale can seamlessly modulate a neuron's intrinsic properties, which the authors suggest may be a mechanism for adaptation to perturbations.

      While this conclusion is largely well-supported, additional analyses could help clarify its scope. For instance, the effects of timescale alterations are clearly demonstrated when the model transitions from an initial state that does not meet the target activity pattern to a new stable state. However, Fig. 6 and the accompanying discussion appear to suggest that changing the timescale alone is sufficient to shift neuronal activity more generally. It would be helpful to clarify that this effect primarily applies during periods of adaptation, such as neurodevelopment or in response to perturbations, and not necessarily once the system has reached a stable, steady state. As currently presented, the simulations do not test whether modifying the timescale can influence activity after the model has stabilized. In such conditions, changes in timescale are unlikely to affect network dynamics unless they somehow alter the stability of the solution, which is not shown here. That said, it seems plausible that real neurons experience ongoing small perturbations which, in conjunction with changes in timescale, could allow gradual shifts toward new solutions. This possibility is not discussed but could be a fruitful direction for future work.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Major comments:

      (1) The main issue that I have with this study is the lack of exploration of "why" the model produces the results it does. Considering this is a model, it should be possible to find out why the three timescales of half-act/inact parameter modifications lead to different sets of results. Without this, it is simply an exploratory exercise. (The model does this, but we do not know the mechanism.) Perhaps this is enough as an interesting finding, but it remains unconvincing and (clearly) does not have the impact of describing a potential mechanism that could be potentially explored experimentally.

      This is now addressed in a new section in Results (“Potential Mechanism”):

      “To explore why the properties of the resulting bursters depend on the timescale of half-(in)activation adjustments, we examined what happens when SP1 is assembled under different half-(in)activation timescales: (1) fast, (2) intermediate (matching the timescale of ion channel density changes), and (3) infinitely slow (i.e., effectively turned off). The effects of these timescales can be seen by comparing the zoomed-in views of the SP1 activity profiles under each condition (Figure 4).

      When half-(in)activations are fast, the time evolution of (which tracks how far the activity pattern is from its targets; see Methods) shows abrupt jumps as the system searches for a voltage-dependence configuration that meets the calcium targets (Figure 4A). Meanwhile, the channel densities are slightly altered, and the process repeats. Slowing the half-(in)activation alterations reduces these abrupt fluctuations (Figure 4B). Making the alterations infinitely slow effectively removes half-(in)activation changes altogether, leaving the system reliant solely on the slower alterations in maximal conductances (Figure 4C). Because each timescale of half-(in)activation produces a different channel repertoire at each time step, different timescales of half-(in)activation alteration led the model along a different path through the space of activity profiles and intrinsic properties. Ultimately, this resulted in distinct final activity patterns, all of which were consistent with the Ca²⁺ targets [22].”
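      The path-dependence argument in this passage can be illustrated with a deliberately simplified toy, which is not the authors' model: one scalar error drives two parameters, a conductance-like g with time constant tau_g and a half-activation-like v with time constant tau_v. Because many (g, v) pairs satisfy the same target, the point where the loop stops depends on the ratio of the two timescales. The linear activity function, the parameter names, and all values below are hypothetical assumptions chosen for clarity.

```python
def simulate(tau_g: float, tau_v: float, g0: float = 0.0, v0: float = 0.0,
             target: float = 1.0, a: float = 1.0, b: float = 1.0,
             dt: float = 0.01, max_steps: int = 200_000, tol: float = 1e-9):
    """Euler-integrate a toy homeostatic loop driven by one shared error.

    activity = a*g + b*v  (hypothetical linear stand-in for a calcium sensor)
    error    = target - activity
    dg/dt = error / tau_g  (conductance-like parameter)
    dv/dt = error / tau_v  (half-activation-like parameter)
    Pass tau_v = float("inf") to switch the second mechanism off.
    Returns the (g, v) pair at which the error falls below `tol`.
    """
    g, v = g0, v0
    for _ in range(max_steps):
        err = target - (a * g + b * v)
        if abs(err) < tol:
            break
        g += dt * err / tau_g
        v += dt * err / tau_v
    return g, v


# Every run meets the same target (g + v = 1), but the end point depends on
# how fast v moves relative to g.
for label, tau_v in [("fast", 0.1), ("matched", 1.0), ("off", float("inf"))]:
    g, v = simulate(tau_g=1.0, tau_v=tau_v)
    print(f"{label:>7}: g={g:.3f}, v={v:.3f}")
```

      In this toy, the trajectory is a straight line whose slope is set by the ratio tau_g/tau_v, so the outcome is fully determined by that ratio. In a conductance-based neuron model the paths are nonlinear, so the sketch only captures the qualitative idea that different timescales can lead to different solutions that all satisfy the same target.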

      (2) A related issue is the use of bootstrapping to do statistics for a family of models, especially when the question is in fact the width of the distribution of output attributes. I don't buy this. One can run enough models to find, say, N models within a tight range (say, 2% of cycle period) and the same number N within a loose range (say, 20%), and compare the statistics of the two groups with the same N.

      We appreciate the reviewer’s skepticism regarding our statistical approach with the “Group of 5” and “Group of 20.” These groups arose from historical aspects of our analysis, and this analysis does not directly advance the main point: that changes in the timescale of channel voltage-dependence alterations impact the properties of bursters to which the homeostatic mechanism converges. Therefore, we removed the references to the Group of 5 and now focus on how the Group of 20 responds to variations in the timescale of voltage-dependent alterations.
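      The equal-N comparison proposed in comment (2) could be sketched as follows, assuming each model in a randomly generated population is summarized by a single burst period. The function names, the ±2% and ±20% bands, the group size, the 400 ms target, and the synthetic periods in the usage example are illustrative placeholders, not values from the study. Because models are generated in random order, taking the first n matches from each band amounts to a random sample of that band.

```python
import random
import statistics

def equal_n_groups(periods, target, tight=0.02, loose=0.20, n=5):
    """Return the first n periods within +/- tight*target of `target` and the
    first n within +/- loose*target, so both groups have the same size."""
    tight_group = [p for p in periods if abs(p - target) <= tight * target][:n]
    loose_group = [p for p in periods if abs(p - target) <= loose * target][:n]
    if len(tight_group) < n or len(loose_group) < n:
        raise ValueError("not enough models in a band; generate more models")
    return tight_group, loose_group

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Usage with synthetic, hypothetical periods (ms); these are not data from the study.
rng = random.Random(0)
periods = [rng.uniform(300.0, 550.0) for _ in range(2000)]
tight_group, loose_group = equal_n_groups(periods, target=400.0, n=20)
print(f"tight CV = {coefficient_of_variation(tight_group):.4f}")
print(f"loose CV = {coefficient_of_variation(loose_group):.4f}")
```

      Any statistic (mean intrinsic properties, spread of conductances) can then be compared between two groups of identical size, which avoids resampling a small group to match a larger one.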

      (3) The third issue is that many of the results that are presented (but not the main one) are completely expected. If one starts with gmax values that would never work (say all of them 0), then it doesn't matter how much one moves the act/inact curves one probably won't get the desired activity. Alternately, if one starts with gmax values that are known to work and randomizes the act/inact midpoints, then the expectation would be that it converges to something that works. This is Figure 1 B and C, no surprise. But it should work the other way around too. If one starts with random act/inact curves that would never work and fixes those, then why would one expect any set of gmax values would produce the desired response? I can easily imagine setting the half-act/inact values to values that never produce any activity with any gmax.

      We appreciate this observation and agree that it highlights a limitation of our initial condition sampling. Our claim that the half-(in)activation mechanism is subordinate to the maximal conductance mechanism is not intended as a general statement. Rather, we make this observation only within the specific range of initial conditions we explored. Within this restricted set, we found that the conductance mechanism was sufficient for successful assembly, while the half-(in)activation mechanism alone was not. We have revised the manuscript to limit the claim.

      “The results shown in Figure 1A require activity-dependent regulation of the maximal conductances. When activity-dependent regulation of the maximal conductances was turned off, the model failed to assemble SP1 into a burster (Figure 1B). This was also seen in the other 19 Starting Parameters (SP2-SP20) [22].”

      (4) A potential response to my previous criticism would be that you put reasonable constraints on gmax's or half-act/inact values or tie the half-act to half-inact. But those are simply arbitrary, ad hoc decisions made to make the model work, much like the L8 norm used to amplify some errors. There is absolutely no reason to believe this is tied to the biology of the system.

      Here the reviewer highlights that model choices (e.g., constraints on maximal conductance and half-(in)activation, use of the L8 norm) are not necessarily justified by biology. A discussion of the constraints on maximal conductance and half-(in)activation is in the Model Assumptions section at the end of Methods. The Methods also contains a longer discussion of the use of the L8 norm:

      “To compute this match score, we adapted a formulation from Alonso et al (2023), who originally used a root-mean-square (RMS), or L2, norm to combine the sensor mismatches. In that approach, each of the three sensor errors is divided by its allowable tolerance to produce a normalized error. These normalized errors are then squared, summed, and square-rooted to produce a single scalar score that reflects how well the model matches the target activity pattern.

      In our version, we instead used an L8 norm, which raises each normalized error to the 8th power before summing and taking the 1/8th root. This formulation emphasizes large deviations in any one sensor, making it easier to pinpoint which feature of the activity is limiting convergence. By amplifying outlier mismatches, this approach provided a clearer view of which sensor was driving model mismatch, helping us both interpret failure modes and tune the model’s sensitivity by adjusting the tolerances for individual sensor errors.

      Although the L8 norm emphasizes large deviations more strongly than the L2 norm, the choice of norm does not fundamentally alter which models can converge: a model that performs well under one norm can also be made to perform well under another by adjusting the allowable tolerances. The biophysical mechanisms by which neurons detect deviations from target activity and convert them into changes in ion channel properties are still not well understood. Given this uncertainty, and the fact that using different norms ultimately should not affect the convergence of a given model, the use of different norms to combine sensor errors is consistent with the broader basic premise of the model: that intrinsic homeostatic regulation is calcium mediated [22].”
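      The L2/L8 contrast and the tolerance argument above can be made concrete with a minimal sketch, assuming three sensors and a model that counts as converged when its combined score is at most 1. The names match_score, errors, and tolerances are placeholders, since the original symbols for the sensor errors and tolerances are not reproduced here, and all numbers are hypothetical. The tolerance rescaling relies on the standard norm inequality ||x||_8 ≤ ||x||_2 ≤ n^(1/2 - 1/8) ||x||_8 for vectors in R^n.

```python
def match_score(errors, tolerances, p):
    """Combine sensor errors into one scalar: (sum_i |e_i / tol_i| ** p) ** (1 / p).
    p = 2 is the L2 form, p = 8 the L8 form."""
    return sum(abs(e / t) ** p for e, t in zip(errors, tolerances)) ** (1.0 / p)

ones = [1.0, 1.0, 1.0]      # hypothetical tolerances, so errors are already normalized
spread = [0.9, 0.9, 0.9]    # every sensor moderately off target
outlier = [1.5, 0.3, 0.3]   # one sensor far off, the other two nearly on target

for name, errs in [("spread", spread), ("outlier", outlier)]:
    l2 = match_score(errs, ones, 2)
    l8 = match_score(errs, ones, 8)
    print(f"{name:>7}: L2 = {l2:.3f}, L8 = {l8:.3f}")

# Norm equivalence for n sensors: L8 <= L2 <= n ** (1/2 - 1/8) * L8.
# Widening every tolerance by that factor therefore lets any model that
# scores <= 1 under L8 also score <= 1 under L2.
n = len(ones)
scale = n ** (0.5 - 0.125)
widened = [t * scale for t in ones]
near_target = [0.85, 0.85, 0.85]
print(match_score(near_target, ones, 8) <= 1.0)     # passes under L8
print(match_score(near_target, ones, 2) <= 1.0)     # fails under L2, same tolerances
print(match_score(near_target, widened, 2) <= 1.0)  # passes under L2, widened tolerances
```

      Both hypothetical error vectors have the same L2 score (about 1.56), but the L8 score is about 1.03 when the error is spread evenly and about 1.50 when it is concentrated in one sensor, which is the outlier emphasis described above. With n = 3, widening every tolerance by 3^(3/8) ≈ 1.51 is enough for any model that passes under L8 to pass under L2; the reverse direction needs no widening, because the L8 score never exceeds the L2 score.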

      (5) The discussion of this manuscript is at once too long and not adequate. It goes into excruciating detail about things that are simply not explored in this study, such as phosphorylation mechanisms, justification of model assumptions of how these alterations occur, or even the biological relevance. (The whole model is an oversimplification - lack of anatomical structure, three calcium sensors, arbitrary assumptions, and how parameter bounds are implemented.) Lengthy justifications for why channel density & half-act/inact of all currents are obeying the same time constant are answering a question that no one asked. It is a simplified model to make an important point. The authors should make these parts concise and to the point. More importantly, the authors should discuss the mechanism through which these differences may arise. Even if it is not clear, they should speculate.

      We agree that the long discussions of Model Assumptions and of potential biological mechanisms that implement alterations in channel voltage-dependence obscured the main point. The former has been relocated to the Methods section, and the latter has been shortened. A discussion of a potential mechanism is now included in the Results (Figure 4).

      (6) There should be some justification or discussion of the arbitrary assumptions made in the model/methods. I understand some of this is to resolve issues that had come up in previous iterations of this approach and in fact the Alonso et al, 2023 paper was mainly to deal with these issues. However, some level of explanation is needed, especially when assumptions are made simply because of the intuition of the modeler rather than the existence of a biological constraint or any other objective measure.

      A discussion of Model Assumptions is included in the Methods.

      Reviewer #2 (Public review):

      Summary:

      In this study, Mondal and co-authors present the development of a computational model of homeostatic plasticity incorporating activity-dependent regulation of gating properties (activation, inactivation) of ion channels. The authors show that, similar to what has been observed for activity-dependent regulation of ion channel conductances, implementing activity-dependent regulation of voltage sensitivity participates in the achievement of a target phenotype (bursting or spiking). The results however suggest that activity-dependent regulation of voltage sensitivity is not sufficient to allow this and needs to be associated with the regulation of ion channel conductances in order to reliably reach the target phenotype. Although the implementation of this biologically relevant phenomenon is undeniably relevant, the main conclusions of the paper and the insights brought by this computational work are difficult to grasp.

      Strengths:

      (1) Implementing activity-dependent regulation of gating properties of ion channels is biologically relevant.

      (2) The modeling work appears to be well performed and provides results that are consistent with previous work performed by the same group.

      Weaknesses:

      (1) The writing is rather confusing, and the state of the art explaining the need for the study is unclear.

      We reorganized the manuscript to make its focus clearer.

      Introduction: We clarified our explanation of the state of the art. Briefly, prior work on activity-dependent homeostasis has focused on regulating ion channel density. Neurons have also been documented to homeostatically regulate channel voltage-dependence. However, the consequences of channel voltage-dependence alterations for homeostatic regulation remain underexplored. To study this, we extend a computational model of activity-dependent homeostasis, originally developed to alter only channel density, so that it also alters channel voltage-dependence.

      Results: We reorganized this section to underscore the main point: that the timescale of half-(in)activation alterations influences the intrinsic properties and activity patterns targeted by a homeostatic mechanism. Figures 1A and 1B were retained to provide context—Figure 1A illustrates how activity can emerge from random initial conditions, while Figure 1B suggests that in these simulations, modulation of half-(in)activation played a specific limited role. Figure 2 builds on Figure 1A by summarizing how intrinsic properties and activity characteristics vary across a population of 20 bursters. Figure 3 then demonstrates that despite playing this specific limited role, altering the timescale of half-(in)activation in these simulations significantly impacted the intrinsic properties and activity characteristics of the bursters targeted by the homeostatic mechanism. Figure 4 supports this by offering a possible mechanistic explanation. Finally, Figure 5 reinforces the central message by showing how the same population responds to perturbation when the timescale of half-(in)activation alterations is varied—essentially extending the analysis of Figure 3 to a perturbed regime.

      Discussion: The Discussion concentrates more specifically on how the timescale of half-(in)activation alterations shapes the bursters targeted by the homeostatic mechanism. Extended content on model assumptions is moved to Methods. The discussion of biological pathways that implement channel voltage-dependence is shortened to avoid distracting from the main message.

      Methods: Aside from moving model assumptions here, we removed discussion of the “Group of 5” and explained in more detail why we chose the L8 norm.

      (2) The main outcomes and conclusions of the study are difficult to grasp. What is predicted or explained by this new version of homeostatic regulation of neuronal activity?

      Our message is general: the timescale of half-(in)activation alterations influences the intrinsic properties and activity characteristics of bursters targeted by a homeostatic mechanism. As such, the implications are general. Their value lies in circumscribing a conceptual framework from which experimentalists may devise and test new hypotheses. We do not aim to predict or explain any specific phenomenon in this work. To address this concern, the Discussion highlights two potential implications of our findings, one for neuronal development and another for pathologies that may arise from disruptions to homeostatic processes:

      “One application for the simulations involving the self-assembly of activity may be to model the initial phases of neural development, when a neuron transitions from having little or no electrical activity to possessing it (Baccaglini & Spitzer 1977). As shown in Figure 6, the timescale of (in)activation curve alterations defines a neuron's activity characteristics and intrinsic properties. As such, neurons may actively adjust these timescales to achieve a specific electrical activity aligned with a developmental phase’s activity targets. Indeed, developmental phases are marked by changes in ion channel density and voltage-dependence, leading to distinct electrical activity at each stage (Baccaglini & Spitzer 1977, Gao & Ziskind-Conhaim 1998, Goldberg et al 2011, Hunsberger & Mynlieff 2020, McCormick & Prince 1987, Moody & Bosma 2005, O'Leary et al 2014, Picken Bahrey & Moody 2003).

      Additionally, our results show that activity-dependent regulation of channel voltage-dependence can play a critical role in restoring neuronal activity during perturbations (Figure 5). Specifically, the presence and timing of half-(in)activation modulation influenced whether the model neuron could successfully return to its target activity pattern. Many model neurons only achieved recovery when a half-(in)activation mechanism was present. Moreover, the speed of this modulation shaped recovery outcomes in nuanced ways: some model neurons reached their targets only when voltage-dependence was adjusted rapidly, while others did so only when these changes occurred slowly. These observations all suggest that impairments in a neuron’s ability to modulate the voltage-dependence of its channels may lead to disruptions in activity-dependent homeostasis. This may have implications for conditions such as addiction (Kourrich et al 2015) and Alzheimer’s disease (Styr & Slutsky 2018), where disruptions in homeostatic processes are thought to contribute to pathogenesis.”

      Reviewer #3 (Public review):

      Mondal et al. use computational modeling to investigate how activity-dependent shifts in voltage-dependent (in)activation curves can complement activity-dependent changes in ion channel conductance to support homeostatic plasticity. While changes in the voltage-dependent properties of ion channels are known to modulate neuronal excitability, their role as a homeostatic plasticity mechanism interacting with channel conductance has been largely unexplored. The results presented here demonstrate that activity-dependent regulation of voltage-dependent properties can interact with plasticity in channel conductance to allow neurons to attain and maintain target activity patterns, in this case, intrinsic bursting. These results also show that the rate of channel voltage-dependent shifts can influence steady-state parameters reached as the model stabilizes into a stable intrinsic bursting state. That is, the rate of these modifications shapes the range of channel conductances and half-(in)activation parameters as well as activity characteristics such as burst period and duration. A major conclusion of the study is that altering the timescale of channel voltage dependence can seamlessly shift a neuron's activity characteristics, a mechanism that the authors argue may be employed by neurons to adapt to perturbations. While the study's conclusions are mostly well-supported, additional analyses and simulations are needed.

      (1) A main conclusion of this study is that the speed at which (in)activation dynamics change determines the range of possible electrical patterns. The authors propose that neurons may dynamically regulate the timescale of these changes (a) to achieve alterations in electrical activity patterns, for example, to preserve the relative phase of neuronal firing in a rhythmic network, and (b) to adapt to perturbations. The results presented in Figure 4 clearly demonstrate that the timescale of (in)activation modifications impacts the range of activity patterns generated by the model as it transitions from an initial state of no activity to a final steady-state intrinsic burster. This may have important implications for neuronal development, as discussed by the authors.

      However, the authors also argue that the model neuron's dynamics, such as period and burst duration, could be dynamically modified by altering the timescale of (in)activation changes (Figure 6 and related text). The simulations presented here, however, do not test whether modifications in this timescale can shift the model's activity features once it reaches steady state. In fact, it is unlikely that this would be the case since, at steady state, calcium targets are already satisfied. It is likely, however, as the authors suggest, that the rate at which (in)activation dynamics change may be important for neuronal adaptation to perturbations, such as changes in temperature or extracellular potassium. Yet, the results presented here do not examine how modifying this timescale influences the model's response to perturbations. Adding simulations to characterize how alterations in the rate of (in)activation dynamics affect the model's response to perturbations, such as transiently elevated extracellular potassium (Figure 5), would strengthen this conclusion.

      The reviewer suggests that our core message — namely, that the timescale of half-(in)activation alterations influences the intrinsic properties and activity patterns targeted by a homeostatic mechanism — should also hold during perturbations. We agree that this extension strengthens the central message and have incorporated it into the subsection of the Results (“Half-(in)activation Alterations Contribute to Activity Homeostasis”) and Figure 5.

      (2) Another key argument in this study is that small, coordinated changes in channel (in)activation contribute to shaping neuronal activity patterns, but that these subtle effects may be obscured when averaging across a population of neurons. This may be the case; however, the results presented don't clearly demonstrate this point. This point would be strengthened by identifying correlations, if they exist, between (in)activation curves, conductances, and the resulting bursting patterns of the models for the simulations presented in Figure 2 and Figure 4, for example. Alternatively, or additionally, relationships between (in)activation curves could be probed by perturbing individual (in)activation curves and quantifying how the other model parameters compensate, which could clearly illustrate this point.

      In part of the Discussion, we noted that small, coordinated shifts in half-(in)activation curves could be obscured when averaging across a population of neurons. Our intention was not to present this as a primary result, but to highlight an emergent consequence of the model: that distinct initial maximal conductances may converge to activity targets via different small shifts in half-(in)activation, making such changes difficult to detect at the population level. However, we did not systematically examine correlations between (in)activation parameters, conductances, and activity features, nor how these correlations might vary with the timescale of (in)activation modulation. While this observation is consistent with model behavior, it does not directly advance the study’s main point — that the timescale of half-(in)activation modulation influences the types of bursting patterns that satisfy the activity target. To keep the focus clear, we have removed this remark from the Discussion, though we agree that a more detailed analysis of these correlations may offer a fruitful direction for future work.
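      The correlation analysis suggested in comment (2) could start from something like the following minimal sketch, assuming each converged model is summarized by a handful of scalar values (final maximal conductances, half-(in)activations, and burst features). The parameter names and values in the usage example are hypothetical placeholders, not outputs of the study, and Pearson correlation is only one choice; a rank correlation may be more robust for skewed conductance distributions.

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(population):
    """`population` maps each parameter/feature name to its values across
    models (one value per model, same model order in every list)."""
    return {
        (a, b): pearson(population[a], population[b])
        for a, b in itertools.combinations(population, 2)
    }

# Hypothetical population of five models; names and numbers are placeholders.
population = {
    "g_CaT": [2.1, 2.8, 3.0, 3.6, 4.2],
    "V_half_CaT": [-40.0, -41.5, -41.0, -43.0, -44.5],
    "burst_period_ms": [520.0, 480.0, 470.0, 430.0, 400.0],
}
for (a, b), r in pairwise_correlations(population).items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

      Repeating this for populations assembled under each half-(in)activation timescale would show whether the correlation structure itself depends on that timescale.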

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) Page 5: remove "an" from "achieve a given an activity..."

      The sentence containing this error has been removed.

      (2) Page 7, bottom of page. Explain what prespecifying means here. This requires a conceptual explanation, even if the equations are given in the methods. Was one working ad hoc model built from which the three sensor values were chosen? What was this model and how was it benchmarked? The sensors are never shown in any figure, but presumably they have different kinetics. What is meant by "average value"? What was the window of averaging and why?

      The intention of this passage was to provide a broad overview of the homeostatic mechanism, with the rationale for using sensor “averages” as homeostatic targets explained in detail in the Methods. We have replaced the word “average” with “target” to maintain this focus.

      (3) Page 9: add "the" in "electrical activity of the neuron as [the] model seeks...".

      Done

      (4) Page 9: say briefly what alpha is before using it. Also, please be consistent in either using the symbol for alpha or spelling it out across the manuscript and the figures.

      Done

      (5) Page 10: the paragraph "In general, ..." is confusing although it becomes clear later on what this is all about. Please rewrite and expand this to clarify some points. For instance, the word "degenerate" is first used here and it is unclear in what sense these models are degenerate. Then it is unclear why the first 5 models were chosen and then 15 more added. What was the point of doing this? What is the intent? Set this up properly before saying that you just did it. This also would clarify the weird terminology used later on of Group of 20 vs. Group of 5. The 20 and 5 are arbitrary. Say what the purpose is. Finally, is the "mean" at the very end the same 416 ms? If not, what do you mean by "the mean"? In fact, I find these 2% and 20% to be imprecise substitutes of (say) two distinct values of CV which are an order of magnitude different. Is that the intent?

      This comment refers to a passage that was removed during revision.

      (6) Page 10: this may be clear to you, but it took me a while to understand that in Figure 1C, you took the working model at the end of 1A, fixed the gmax values and randomized just the half-act/inact values to run it. Perhaps rewrite this to clarify?

      This comment refers to a figure that was removed during revision.

      (7) Page 13: why do channel densities not change much after the perturbation?

      This comment refers to a figure that was reworked during revision; the revised figure examines only what happens during the perturbation. This question is interesting and is the subject of ongoing work.

      Reviewer #2 (Recommendations for the authors):

      The article should be carefully corrected, because the current quality of writing might obscure the interest of the study. Particular attention should be paid to the state-of-the-art section and to the discussion, but even the writing of the results should be carefully reworked. The current state of the article makes it very difficult to understand the motivation behind the study but also what the main result provided by this work is.

      The Introduction, Results, and Discussion have been reworked to build on the central premise of the work: the timescale of half-(in)activation alterations influences the intrinsic properties and activity patterns targeted by the neuron’s homeostatic mechanism. These changes are detailed in Public Comment #1.

      Reviewer #3 (Recommendations for the authors):

      The manuscript presents an interesting computational study exploring how activity-dependent regulation of (in)activation dynamics interacts with conductance plasticity to shape neuronal activity patterns. While the study provides valuable insights, some aspects would benefit from clarification, further analyses, and/or additional simulations to strengthen the conclusions. Below, I outline concerns and comments related to specific details of the model and results presentation that were not included in the public review.

      (1) The results presented in Figure 5 show that adaptation occurs in both channel conductances and (in)activation dynamics; however, the changes in conductance remain relatively permanent after the model recovers from the transient elevation in extracellular potassium. It therefore seems likely that the model would recover bursting more quickly in response to a subsequent exposure to simulated elevated extracellular potassium since large modifications in the slowly changing conductances would not be required. If this is the case, it could provide a plausible mechanism for adaptation to repeated high-potassium exposure, as demonstrated experimentally in Cancer borealis by this group (PMID: 36060056).

      This is an astute observation and the subject of our present follow-up investigation.

      (2) In the text relating to Figure 5, it is argued that the resulting shifts in (in)activation curves may be conceptualized as alterations in window currents. It would be helpful to illustrate this by plotting and comparing changes in window currents of these channels alongside the changes in their (in)activation curves.

      This comment refers to a passage that was removed during revision.
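      The window-current framing raised in comment (2) can be illustrated with a minimal sketch, assuming Boltzmann steady-state curves and a current whose steady-state open fraction is m_inf(V)^p · h_inf(V). All half-(in)activation voltages, slopes, the exponent p, and the voltage range below are hypothetical values chosen for illustration, not parameters of the study's model. Because h_inf rises at every voltage when its half-inactivation is shifted in the depolarized direction, such a shift can only enlarge the window.

```python
import math

def act_inf(v, v_half, k):
    """Boltzmann steady-state activation: rises from 0 to 1 as v passes v_half."""
    return 1.0 / (1.0 + math.exp((v_half - v) / k))

def inact_inf(v, v_half, k):
    """Boltzmann steady-state inactivation: falls from 1 to 0 as v passes v_half."""
    return 1.0 / (1.0 + math.exp((v - v_half) / k))

def window(v, act_half, inact_half, k_act=6.0, k_inact=6.0, p=3):
    """Steady-state open fraction m_inf(v)**p * h_inf(v); largest where the curves overlap."""
    return act_inf(v, act_half, k_act) ** p * inact_inf(v, inact_half, k_inact)

def window_area(act_half, inact_half, v_min=-100.0, v_max=20.0, dv=0.1):
    """Riemann-sum area (mV) under the window curve, used as a scalar size measure."""
    steps = int(round((v_max - v_min) / dv))
    return sum(window(v_min + i * dv, act_half, inact_half) for i in range(steps + 1)) * dv

# Hypothetical curves (mV): activation at -30, inactivation at -60, then shifted +5 mV.
base = window_area(act_half=-30.0, inact_half=-60.0)
shifted = window_area(act_half=-30.0, inact_half=-55.0)
print(f"window area: baseline {base:.3f} mV, inactivation shifted +5 mV {shifted:.3f} mV")
```

      Plotting window(v, ...) before and after a shift, alongside the shifted (in)activation curves, is one way to show the comparison the reviewer requested.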

      (3) Some discussion of the role these homeostatic mechanisms may play when the neuron is synaptically integrated into a rhythmically active network could be informative. Surely, phasic and tonic inputs to the neuron would alter its conductance and voltage-dependent properties. Therefore, the model's parameters in an intact network could be very different from those in the synaptically isolated case.

      This is an excellent point. We agree that synaptic context—particularly tonic and phasic inputs—would likely influence a neuron’s conductances and voltage-dependent properties, potentially leading to different homeostatic outcomes than in the isolated case. While our current study focuses on synaptically isolated neurons, the Marder lab has considered how homeostatically stabilized neurons might interact in network settings. For example, O'Leary et al (2014) presents an example network of three such neurons operating under homeostatic regulation. However, systematically exploring this question remains a challenge. We are currently developing ideas to study this in the context of a simplified half-center oscillator model, where network-level dynamics can be more tractably analyzed.

      (4) Why are the transitions of alpha typically so abrupt, essentially either 1 or 0? Similarly, what happens in the model when there are transient transitions from what appears to be a steady-state alpha that abruptly shifts from 0 to 1 or 1 to 0? For example, what is occurring in Figure 1A at ~150s and ~180s when alpha jumps between 1 and 0, or in Figure 1B when the model transiently jumps up from 0 to 1 at ~400s and ~830s? In Figure 1A, does the bursting pattern change at all after ~250s, or is it identical to the pattern at c?

      This is addressed in the revision (Lines 141 – 150).

      (5) Are the final steady-state parameters of the 25 (sic) models consistent with experimental observations?

      It is difficult to assess — it is hard to design an experiment to do what the reviewer is suggesting.

      (6) Why isn't gL allowed to change dynamically? This seems like the most straightforward way to allow a neuron to adjust its excitability (aside from tonic synaptic inputs).

      Passive currents could, in principle, be subject to homeostatic regulation. However, our study focused on active intrinsic currents. This focus stems from earlier investigations showing that active currents are dynamically regulated during homeostasis, for instance, Turrigiano et al (1995) and Desai et al (1999).

      Alonso LM, Rue MCP, Marder E. 2023. Gating of homeostatic regulation of intrinsic excitability produces cryptic long-term storage of prior perturbations. Proc Natl Acad Sci U S A 120: e2222016120

      Baccaglini PI, Spitzer NC. 1977. Developmental changes in the inward current of the action potential of Rohon-Beard neurones. J Physiol 271: 93-117

      Desai NS, Rutherford LC, Turrigiano GG. 1999. Plasticity in the intrinsic excitability of cortical pyramidal neurons. Nature Neuroscience 2: 515-20

      Gao BX, Ziskind-Conhaim L. 1998. Development of ionic currents underlying changes in action potential waveforms in rat spinal motoneurons. J Neurophysiol 80: 3047-61

      Goldberg EM, Jeong HY, Kruglikov I, Tremblay R, Lazarenko RM, Rudy B. 2011. Rapid developmental maturation of neocortical FS cell intrinsic excitability. Cereb Cortex 21: 666-82

      Hunsberger MS, Mynlieff M. 2020. BK potassium currents contribute differently to action potential waveform and firing rate as rat hippocampal neurons mature in the first postnatal week. J Neurophysiol 124: 703-14

      Kourrich S, Calu DJ, Bonci A. 2015. Intrinsic plasticity: an emerging player in addiction. Nature Reviews Neuroscience 16: 173-84

      McCormick DA, Prince DA. 1987. Post-natal development of electrophysiological properties of rat cerebral cortical pyramidal neurones. J Physiol 393: 743-62

      Moody WJ, Bosma MM. 2005. Ion channel development, spontaneous activity, and activity-dependent development in nerve and muscle cells. Physiol Rev 85: 883-941

      O'Leary T, Williams AH, Franci A, Marder E. 2014. Cell types, network homeostasis, and pathological compensation from a biologically plausible ion channel expression model. Neuron 82: 809-21

      Picken Bahrey HL, Moody WJ. 2003. Early development of voltage-gated ion currents and firing properties in neurons of the mouse cerebral cortex. J Neurophysiol 89: 1761-73

      Styr B, Slutsky I. 2018. Imbalance between firing homeostasis and synaptic plasticity drives early-phase Alzheimer’s disease. Nature Neuroscience 21: 463-73

      Turrigiano G, LeMasson G, Marder E. 1995. Selective regulation of current densities underlies spontaneous changes in the activity of cultured neurons. J Neurosci 15: 3640-52

    1. eLife Assessment

      This valuable study demonstrates that D1- and D2-striatal neurons receive distinct cortical inputs, offering key insights into corticostriatal function. For instance, in the context of striatal-dependent learning, this distinction is highly informative for interpreting synaptic physiology data, particularly when inputs to one neuron subtype may change independently of the other. The strength of the evidence is solid, with anatomical and electrophysiological findings aligning well with results from optogenetic and behavioral studies.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Klug et al. investigated the pathway specificity of corticostriatal projections, focusing on two cortical regions. Using a G-deleted rabies system in D1-Cre and A2a-Cre mice to retrogradely deliver channelrhodopsin to cortical inputs, the authors found that M1 and MCC inputs to direct and indirect pathway spiny projection neurons (SPNs) are both partially segregated and asymmetrically overlapping. In general, corticostriatal inputs that target indirect pathway SPNs are likely to also target direct pathway SPNs, while inputs targeting direct pathway SPNs are less likely to also target indirect pathway SPNs. Such asymmetric overlap of corticostriatal inputs has important implications for how the cortex itself may determine striatal output. Indeed, the authors provide behavioral evidence that optogenetic activation of M1 or MCC cortical neurons that send axons to either direct or indirect pathway SPNs can have opposite effects on locomotion and different effects on action sequence execution. The conclusions of this study add to our understanding of how cortical activity may influence striatal output and offer important new clues about basal ganglia function.

      The conceptual conclusions of the manuscript are supported by the data, but the details of the magnitude of afferent overlap and causal role of asymmetric corticostriatal inputs on some behavioral outcomes may be a bit overstated given technical limitations of the experiments.

      For example, after virally labeling either direct pathway (D1) or indirect pathway (D2) SPNs to optogenetically tag pathway-specific cortical inputs, the authors report that a much larger number of "non-starter" D2-SPNs from D2-SPN labeled mice responded to optogenetic stimulation in slices than "non-starter" D1 SPNs from D1-SPN labeled mice did. Without knowing the relative number of D1 or D2 SPN starters used to label cortical inputs, it is difficult to interpret the exact meaning of the lower number of responsive D2-SPNs in D1 labeled mice (where only ~63% of D1-SPNs themselves respond) compared to the relatively higher number of responsive D1-SPNs (and D2-SPNs) in D2 labeled mice. While relative differences in connectivity certainly suggest that some amount of asymmetric overlap of inputs exists, differences in infection efficiency and ensuing differences in detection sensitivity in slice experiments make determining the degree of asymmetry problematic.

      It is also unclear if retrograde labeling of D1-SPN- vs D2-SPN- targeting afferents labels the same densities of cortical neurons. This gets to the point of specificity in some of the behavioral experiments. If the target-based labeling strategies used to introduce channelrhodopsin into specific SPN afferents label significantly different numbers of cortical neurons, might the difference in the relative numbers of optogenetically activated cortical neurons itself lead to behavioral differences?

    3. Reviewer #2 (Public review):

      Summary:

      Klug et al. use monosynaptic rabies tracing of inputs to D1- vs D2-SPNs in the striatum to study how separate populations of cortical neurons project to D1- and D2-SPNs. They use rabies to express ChR2, then patch D1- or D2-SPNs to measure synaptic input. They report that cortical neurons labeled as D1-SPN-projecting preferentially project to D1-SPNs over D2-SPNs. In contrast, cortical neurons labeled as D2-SPN-projecting project equally to D1- and D2-SPNs. They go on to conduct pathway-specific behavioral stimulation experiments. They compare direct optogenetic stimulation of D1- or D2-SPNs to stimulation of MCC inputs to DMS and M1 inputs to DLS. In three different behavioral assays (open field, intra-cranial self-stimulation, and a fixed ratio 8 task), they show that stimulating MCC or M1 cortical inputs to D1-SPNs is similar to D1-SPN stimulation, but that stimulating MCC or M1 cortical inputs to D2-SPNs does not recapitulate the effects of D2-SPN stimulation (presumably because both D1- and D2-SPNs are being activated by these cortical inputs).

      Strengths:

      Showing these same effects in three distinct behaviors is strong. Overall, the functional verification of the consequences of the anatomy is very nice to see. It is a good choice to patch only from mCherry-negative non-starter cells in the striatum. This study adds to our understanding of the logic of corticostriatal connections, suggesting a previously unappreciated structure.

      Weaknesses:

      One limitation is that all inputs to SPNs express ChR2, so the authors cannot distinguish between different cortical subregions during patching experiments. Their results could arise because the same innervation patterns are repeated in many cortical subregions or because some subregions have preferential D1-SPN input while others do not. There are also some caveats with respect to the efficacy of rabies tracing. Although they only patch non-starter cells in the striatum, only 63% of D1-SPNs receive input from D1-SPN-projecting cortical neurons. It's hard to say whether this is "high" or "low," but one question is how far from the starter cell region they are patching. Without this spatial indication of where the patched cells are relative to the starter population, it is difficult to interpret whether the cells being patched receive cortical inputs from the same neurons that project to the starter population. The authors indicate they are patching from mCherry-negative neurons within the region of the mCherry-positive neurons, but since the mCherry population will include both true starter cells and monosynaptically connected cells, this is not perfectly precise. Convergence of cortical inputs onto SPNs may vary quite dramatically with distance from the starter cell region, as other mapping studies of corticostriatal inputs have shown that specialized local input regions can be defined based on cortical input patterns (Hintiryan et al., Nat Neurosci, 2016; Hunnicutt et al., eLife, 2016; Peters et al., Nature, 2021). A caveat for the optogenetic behavioral experiments is that they did not include fluorophore-only controls, although a different control (with light delivered in M1) is provided in Supplementary Figure 3. Another point of confusion is that other studies (Cui et al., J Neurosci, 2021) have reported that stimulation of D1-SPNs in DLS inhibits rather than promotes movement. This study may have given different results due to subtly different experimental parameters, including fiber optic placement and NA.

    4. Reviewer #3 (Public review):

      Review of resubmission: The authors provided a response to the reviews from myself and other reviewers. While some points were addressed satisfactorily, particularly the clarification of cortical innervation of the striatum and the effects of input stimulation, many of my points remain unaddressed. In several cases, the authors chose to explain their rationale rather than address the issues at hand. A number of these issues (in fact, the majority) could be addressed simply by toning down the confidence in conclusions, so it was disappointing to see that the authors by and large did not do this. I repeat my concerns below and note whether I find them to have been satisfactorily addressed or not.

      In the manuscript by Klug and colleagues, the investigators use a rabies virus-based methodology to explore potential differences in connectivity from cortical inputs to the dorsal striatum. They report that the connectivity from cortical inputs onto D1 and D2 MSNs differs in terms of their projections onto the opposing cell type, and use these data to infer that there are differences in cross-talk between cortical cells that project to D1 vs. D2 MSNs. Overall, this manuscript adds to the body of work indicating that different striatal pathways have differential functions, which likely arise at least in part from differences in connectivity that have been hard to resolve because pathways are difficult to isolate within striatal circuitry. Several interesting and provocative observations are reported. Several different methodologies are used, with partially convergent results, to support their main points.

      However, I have significant technical concerns about the manuscript as presented that make it difficult for me to interpret the results of the experiments. My comments are below.

      Major:

      There is generally a large caveat to the rabies studies performed here, which is that both TVA and the ChR2-expressing rabies virus have the same fluorophore. It is thus essentially impossible to determine how many starter cells there are, what the efficiency of tracing is, and which part of the striatum is being sampled in any given experiment. This is a major caveat given the spatial topography of the corticostriatal projections. Furthermore, the authors make a point in the introduction about previous studies not having explored absolute numbers of inputs, yet this is not at all controlled in this study. It could be that their rabies virus simply replicates better in D1-MSNs than D2-MSNs. No quantifications are done, and these possibilities do not appear to have been considered. Without a greater standardization of the rabies experiments across conditions, it is difficult to interpret the results.

      This is still an issue. The authors point out why they chose various vectors. I can understand why the authors chose the fluorophores etc. that they did, yet the issues I raised previously are still valid. The discussion should mention that this is a potential issue. It does not necessarily invalidate results, but it is an issue. Furthermore, it is possible (in all systems) that rabies replicates better/more efficiently in some cells than others. This is one possible interpretation that has not really been explored in any study. I don't suggest the authors attempt to do that, but it should be raised as a potential interpretation. If the rabies results could mean several different things, the authors owe it to the readership to state all possible interpretations of data.

      The authors claim, based on a few current-clamp optical stimulation experiments, that the cortical cells are healthy, but this result was far from comprehensive. For example, membrane resistance, capacitance, general excitability curves, etc. are not reported. In Figure S2, some of the conditions look quite different (e.g., in S2B, input D2-record D2, the method yields quite different results, which the authors write off as not different). Furthermore, these experiments do not consider the likely sickness and death that occurs in starter cells, as has been reported elsewhere. Health of cells in the circuit is overall a substantial concern that alone could invalidate a large portion, if not all, of the behavioral results. This is a major confound given those neurons are thought to play critical roles in the behaviors being studied. This is a major reason why first-generation rabies viruses have not been used in combination with behavior, but this significant caveat does not appear to have been considered, and controls (e.g., uninfected animals, animals infected with AAV helpers, etc.) were not included.

      This issue remains unaddressed. I did not request clarity about experimental design, but rather, raised issues about the potential effects of toxicity. I believe this to be a valid concern that needs to be discussed in the manuscript, especially given what look visually like potential differences in S2.

      The overall purity (e.g., EnvA pseudotyping efficiency) of the RABV prep is not shown. If there was a virus that was not well EnvA-pseudotyped and thus could directly infect cortical (or other) inputs, it would degrade specificity.

      This issue has not been addressed. Viral strain is irrelevant. The quality of the specific preparations used is what matters.

      While most of the study focuses on the cortical inputs, in slice recordings, inputs from the thalamus are not considered, yet likely contribute to the observed results. Related to this, in in vivo optogenetic experiments, technically, if the thalamic or other inputs to the dorsal striatum also project to the cortex, their method will target not only cortical neurons but also terminals of other excitatory inputs. If this cannot be ruled out, statements that the authors are able to selectively activate the cortical inputs to one or the other population should be toned down.

      The authors added text to the discussion to address this point. While it largely does what is intended, based on the one study cited, I disagree with the authors' conclusions that it is "clear" that potential contamination from other sites does not play a role. The simplest interpretation is the one the authors state, and there is some supporting evidence to back up that assertion, but to me that falls short of making the point "clear" that there are no other interpretations.

      The statements about specificity of connectivity are not well founded. It may be that in the specific case where they are assessing cells outside the area of injection, their conclusions hold (e.g., excitatory inputs onto D2s have more inputs onto D1s than vice versa). However, how this relates to the actual site of injection is not clear. At face value, if such connectivity exists, it would suggest that D1-MSNs receive substantially more overall excitatory input than D2s. It is thus possible that this observation would not hold over other spatial intervals. This was not explored, and thus the conclusions are over-generalized. For example, the distance from the area of red cells in the striatum to the recordings was not quantified, what constituted a high level of cortical labeling was not quantified, etc. Without more rigorous quantification of what was being done, it is difficult to interpret the results.

      Again, the goal here would be to make a statement about this in the discussion to clarify limitations of the study. I don't expect the authors to re-do all of these experiments, but since they are discussing the corticostriatal circuits, which have multiple subdomains, this remains a relevant point. It has not been addressed.

      The results in Figure 3 are not well controlled. The authors show contrasting effects of optogenetic stimulation of D1-MSNs and D2-MSNs in the DMS and DLS, results which are largely consistent with the canon of basal ganglia function. However, among the cortical input stimulations, stimulating the inputs to D1-MSNs gives the expected result (increased locomotion), while stimulating putative inputs to D2-MSNs has no effect. This is not the same as showing a decrease in locomotion, and a null effect here cannot be interpreted.

      I think the caveat that stimulating inputs to D2-MSNs produced no clear effect should be pointed out. Yes, I understand that the viruses appeared to express, etc., but it remains possible that the results are driven by, e.g., insufficient ChR2 expression. Short of a full quantification of the number of cells expressing ChR2 and of the overlap between fiber placement and ChR2 expression (which I don't suggest), this remains a possibility and should be pointed out.

      In light of their circuit model, the result showing that inputs to D2-MSNs drive ICSS is confusing. How can the authors account for the fact that these cells are not locomotor-activating and stimulation of their putative downstream cells (D2-MSNs) does not drive ICSS, yet the cortical inputs drive ICSS? Is the idea that these inputs somehow also drive D1s? If this is the case, how do D2s get activated, if all of the cortical inputs tested produce net activation of D1s and not D2s? The same applies to the results in Figure 4: the inputs and putative downstream cells do not have the same effects. Given potential caveats of differences in viral efficiency, spatial location of injections, and cellular toxicity, I cannot interpret these experiments.

      The explanation the authors provide in their rebuttal makes sense, however this should be included in the discussion of the manuscript, as it is interesting and relevant.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of the molecular basis by which early symmetry-breaking events connect to subsequent cell fate specification in preimplantation mammalian embryos. The evidence supporting the conclusions is compelling, with advanced image-based assays and microinjection-based functional tests. The work will be of broad interest to cell and developmental biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This work starts with the observation that embryo polarization is asynchronous starting at the early 8-cell stage, with early polarizing cells being biased towards producing the trophectoderm (TE) lineage. They further found that reduced CARM1 activity and upregulation of its substrate BAF155 promote early polarization and TE specification. This evidence connects to the previous finding that Carm1 heterogeneity at the 4-cell stage guides later cell lineages, with higher Carm1-expressing blastomeres biased towards the ICM lineage. Thus, this work provides a link between asymmetries at the 4-cell stage and polarization at the 8-cell stage, providing a cohesive explanation of the first lineage allocation in mouse embryos.

      Strengths:

      In addition to what has been described in the summary, the advanced 3D image-based analysis found that early polarization is associated with a change in blastomere geometry, namely the ratio of the long axis to the short axis. This is a new observation that had not previously been reported.

      Weaknesses:

      The microinjection-based method for overexpression or deletion of proteins, although shown to be effective in early embryos and widely used, may not fully represent the in vivo situation in some cases, compared with other strategies such as the use of knock-in mice.

      This is a minor weakness and has been discussed by the authors in the revised manuscript.

    1. eLife Assessment

      This manuscript applies state-of-the-art techniques to define the cellular composition of the dorsal vagal complex in two rodent species (mice and rats). The result is a fundamental resource that substantially advances our understanding of the dorsal vagal complex's role in the regulation of feeding and metabolism while also highlighting key differences between species. The analyses of single-cell profiling experiments in the manuscript provide compelling insight into the cellular architecture of the dorsal vagal complex, with potential implications for obesity therapeutics.

    2. Reviewer #1 (Public review):

      Summary:

      This paper is using state-of-the-art techniques to define the cellular composition and its complexity in two rodent species (mice and rats). The study is built on available datasets but extends those in a way that future research will be facilitated. The study will be of high impact for the study of metabolic control.

      Strengths:

      After revision, the paper is much improved. I have no further comments.

    3. Reviewer #2 (Public review):

      In this manuscript, Hes et al. present a comprehensive multi-species atlas of the dorsal vagal complex (DVC) using single-nucleus RNA sequencing, identifying over 180,000 cells and 123 cell types across five levels of granularity in mice and rats. Intriguingly, the analysis uncovered previously uncharacterized cell populations, including Kcnj3-expressing astrocytes, neurons co-expressing Th and Cck, and a population of leptin receptor-expressing neurons in the rat area postrema, which also express the progenitor marker Pdgfra. These findings suggest species-specific differences in appetite regulation. This study provides a valuable resource for investigating the intricate cellular landscape of the DVC and its role in metabolic control, with potential implications for refining obesity treatments targeting this hindbrain region.

      In line with previous work published by the PI, the topic is of clear scientific relevance, and the data presented in this manuscript are both novel and compelling. Additionally, the manuscript is well-structured, and the conclusions are robust and supported by the data. Overall, this study significantly enhances our understanding of the DVC and sheds light on key differences between rats and mice.

      I have reviewed the revised manuscript and am pleased to confirm that the authors have addressed my previous comments and concerns.

    1. eLife Assessment

      Cryptovaranoides, an end-Triassic animal (just over 200 Ma old), was originally described as a possible anguimorph squamate, i.e., more closely related to snakes and some extant lizards than to other extant lizards, making Squamata much older than previously thought and providing a new calibration date inside it. Following a rebuttal and a defense, this fourth important contribution to the debate makes a meticulous and solid argument that Cryptovaranoides is not a squamate. However, further comparisons to potentially closely related animals would greatly benefit this study, and parts of the text require clarification.

    2. Reviewer #1 (Public review):

      In the Late Triassic and Early Jurassic (around 230 to 180 Ma ago), southern Wales and adjacent parts of England were a karst landscape. The caves and crevices accumulated remains of small vertebrates. These fossil-rich fissure fills are being exposed in limestone quarrying. In 2022 (reference 13 of the article), a partial articulated skeleton and numerous isolated bones from one fissure fill of end-Triassic age (just over 200 Ma) were named Cryptovaranoides microlanius and described as the oldest known squamate - the oldest known animal, by some 20 to 30 Ma, that is more closely related to snakes and some extant lizards than to other extant lizards. This would have considerable consequences for our understanding of the evolution of squamates and their closest relatives, especially for their speed and absolute timing, and was supported in the same paper by phylogenetic analyses based on different datasets.

      In 2023, the present authors published a rebuttal (reference 18) to the 2022 paper, challenging anatomical interpretations and the irreproducible referral of some of the isolated bones to Cryptovaranoides. Modifying the datasets accordingly, they found Cryptovaranoides outside Squamata and presented evidence that it is far outside. In 2024 (reference 19), the original authors defended most of their original interpretation and presented some new data, some of it from newly referred isolated bones. The present article discusses anatomical features and the referral of isolated bones in more detail, documents some clear misinterpretations, argues against the widespread but not justifiable practice of referring isolated bones to the same species as long as there is merely no known evidence to the contrary, further argues against comparing newly recognized fossils to lists of diagnostic characters from the literature as opposed to performing phylogenetic analyses and interpreting the results, and finds Cryptovaranoides outside Squamata again.

      Although a few of the character discussions and the discussion of at least one of the isolated bones can probably still be improved (and two characters are addressed twice), I see no sign that the discussion is going in circles or otherwise becoming unproductive. I can even imagine that the present contribution will end it.

    3. Reviewer #2 (Public review):

      Congratulations on this thorough manuscript on the phylogenetic affinities of Cryptovaranoides. Recent interpretations of this taxon, and perhaps some others, have greatly changed the field's understanding of reptile origins- for better and (likely) for worse.

      This manuscript offers a careful review of the features used to place Cryptovaranoides within Squamata and adequately demonstrates that this interpretation is misguided, and therefore reconciles morphological and molecular data, which is an important contribution to the field of paleontology. The presence of any crown squamate in the Permian or Triassic should be met with skepticism, the same sort of skepticism provided in this manuscript.

      I have outlined some comments addressing weaknesses that I believe will further elevate the scientific quality of the work. A brief, fresh read-through to refine a few phrases, particularly where the discussion references Whiteside et al., could also give the paper an even more collegial tone.

      This manuscript can be greatly improved by additional discussion and figures, where applicable. When I first read this manuscript, I was a bit surprised at how little discussion there was concerning both non-lepidosaur lepidosauromorphs and stem-reptiles more broadly. This paper makes it extremely clear that Cryptovaranoides is not a squamate, but it would greatly benefit from explaining why many of the characters that were either suggested by former studies to be squamate in nature or optimized as such in phylogenetic analyses are in fact widespread plesiomorphies present in crownward sauropsids such as millerettids, younginids, or tangasaurids. I suggest citing this work where applicable and building some of the discussion around it for a greatly improved manuscript. In sum:

      (1) The discussion of stem-reptiles should be improved. Nearly all of the supposed squamate features in Cryptovaranoides are present in various stem-reptile groups. I've noted a few, but this would be a fairly quick addition to this work. If this manuscript incorporates this advice, I believe arguments regarding the affinities of Cryptovaranoides (at least within Squamata) will be finished, and this manuscript will be better off for it.

      (2) I was also surprised at how little discussion there was here of putative stem-squamates or lepidosauromorphs more broadly. A few targeted comparisons could really benefit the manuscript. It is currently unclear as to why Cryptovaranoides could not be a stem-lepidosaur, although I know that the lepidosaur total-group in these manuscripts lacks character sampling due to their scarcity.

      (3) This manuscript can be improved by additional figures, such as the slice data of the humerus. The poor quality of the scan data for Cryptovaranoides is stated during this paper several times, yet the scan data is often used as evidence for the presence or absence of often minute features without discussion, leaving doubts as to what condition is true. Otherwise, several sections can be rephrased to acknowledge uncertainty, and probably change some character scorings to '?' in other studies.

    4. Reviewer #3 (Public review):

      Summary:

      The study provides an interesting contribution to our understanding of Cryptovaranoides relationships, which is a matter of intensive debate among researchers. My main concerns are in regard to the wording of some statements, but generally, the discussion and data are well prepared. I would recommend moderate revisions.

      Strengths:

      (1) Detailed analysis of the discussed characters.

      (2) Illustrations of some comparative materials.

      Weaknesses:

      Some parts of the manuscript require clarification and rewording.

      One of the main points of criticism of Whiteside et al. is using characters for phylogenetic considerations that are not included in the phylogenetic analyses therein. The authors call it a "non-trivial substantive methodological flaw" (page 19, line 531). I would step down from such a statement for the reasons listed below:

      (1) Comparative anatomy is not about making phylogenetic analyses. Comparative anatomy is about comparing different taxa in search of characters that are unique and characters that are shared between taxa. This creates an opportunity to assess the level of similarity between the taxa and create preliminary hypotheses about homology. Therefore, comparative anatomy can provide some phylogenetic inferences. That does not mean that tests of congruence are not needed. Such comparisons are the first step that allows creating phylogenetic matrices for analysis, which is the next step of phylogenetic inference. That does not mean that all the papers with new morphological comparisons should end with a new or expanded phylogenetic matrix. Instead, such papers serve as a rationale for future papers that focus on building phylogenetic matrices.

      (2) Phylogenetic matrices are never complete, both in terms of morphological disparity and taxonomic diversity. I don't know if it is even possible to have a complete one, but at least we can say that we are far from that. Criticising a work that did not include all the possibly relevant characters in the phylogenetic analysis is simply unfair. The authors should know that creating/expanding a phylogenetic matrix is a never-ending work, beyond the scope of any paper presenting a new fossil.

      (3) Each additional taxon has the possibility of inducing a rethinking of characters. That includes new characters, new character states, character state reordering, etc. As I said above, it is usually beyond the scope of a paper with a new fossil to accommodate that into the phylogenetic matrix, as it requires not only scoring the newly described taxon but also many that are already scored. Since the digitization of fossils is still rare, it requires many collection visits that are costly in terms of time.

      (4) If I were to search for a true flaw in the Whiteside et al. paper, I would check if there is a confirmation bias. The mentioned paper should not only search for characters that support Cryptovaranoides affinities with Anguimorpha but also characters that deny that. I am not sure if Whiteside et al. did such an exercise. Anyway, the test of congruence would not solve this issue because by adding only characters that support one hypothesis, we are biasing the results of such a test.

      To sum up, there is nothing wrong with proposing some hypotheses about character homology between different taxa that can be tested in future papers that will include a test of congruence. Lack of such a test makes the whole argumentation weaker in Whiteside et al., but not unacceptable, as the manuscript might suggest. My advice is to step down from such strong statements like "methodological flaw" and "empirical problems" and replace them with "limitations", which I think better describes the situation.

    1. eLife Assessment

      This revised manuscript provides fundamental findings on how the mouse barrel cortex connects to the dorsolateral striatum, uncovering that inputs from discrete whisker cortical columns are convergent and SPN-specific, but topographically organized at the population level. The evidence supporting this claim is compelling, demonstrating that SPNs uniquely integrate sparse input from variable stretches across the barrel cortex. The study would be of interest to basal ganglia and sensory-motor integration researchers.

    2. Reviewer #1 (Public review):

      Summary:

      By applying the laser scanning photostimulation (LSPS) approach to a novel slice preparation, the authors aimed to study the degree of convergence and divergence of cortical inputs to individual striatal projection neurons (SPNs).

      Strengths:

      The experiments were well-designed and conducted, and data analysis was thorough. The manuscript was well written and related work in the literature was properly discussed. This work has the potential to advance our understanding of how sensory inputs are integrated into the striatal circuits.

    3. Reviewer #2 (Public review):

      Summary:

      How corticostriatal synaptic connectivity gives rise to SPN encoding of sensory information is an important and currently unanswered question. The authors utilize a clever slice preparation in combination with electrophysiology and glutamate uncaging to dissect the synaptic connectivity between barrel cortex and individual striatal SPNs. In addition to mapping connectivity across major anatomical axes and cortical layers, the authors provide data showing that SPNs uniquely integrate sparse input from variable stretches across barrel cortex.

      Strengths:

      The methodology shows impressive rigor and the data robustly support the authors' conclusions. Overall, the manuscript addresses its core question, provides valuable insights into corticostriatal architecture, and is a welcome addition to the field.

    4. Reviewer #3 (Public review):

      Summary:

      The authors explored how individual dorsolateral striatum (DLS) spiny projection neurons (SPNs) receive functional input from whisker-related cortical columns. The authors developed and validated a novel slice preparation and method to which they applied rigorous functional mapping and thorough analysis. They found that individual SPNs were driven by sparse, scattered cortical clusters. Interestingly, while the cortical input fields of nearby SPNs had some degree of overlap, connectivity per SPN was largely distinct. Despite sparse, heterogeneous connectivity, topographical organization was identified. The authors lastly compared direct (D1) vs. indirect (D2) pathway cells, concluding that overall connectivity patterns were the same, but D1 cells received stronger input from L6 and D2 cells from L2/3. The paper thoughtfully addresses the question of whether barrel cortex broadly or selectively innervates SPNs. Their results indicate selective input that is loosely topographic. Their work deepens the understanding of how whisker-related somatosensory signals can drive striatal neurons.

      Strengths:

      Overall this is a carefully conducted study, and the major claims are well-supported. The use of a novel ex vivo slice prep that keeps relevant corticostriatal projections intact allows for careful mapping of the barrel cortex to dorsolateral striatum SPNs. Careful reporting of both columnar and layer position, as well as postsynaptic SPN type (D1 or D2) allows the authors to uncover novel details about how the dorsolateral striatum represents whisker-related sensory information.

      Weaknesses:

      Most technical weaknesses have now been addressed in the text.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work focuses on the connection strength of the corticostriatal projections, without considering the involvement of synaptic plasticity in sensory integration.

      Thank you for raising this point. Indeed, sensory integration is a complex process with a multitude of factors beyond connectivity patterns and synaptic strength. In addition, it is true that both connectivity levels and synaptic strength can be modified by plasticity. 

      We modified our conclusion as follows, line 354: 

      “Since the inputs to a single SPN represent only a limited subset of whisker columns, a complete representation of whiskers could emerge at the population level, with each SPN’s representation complementing those of its neighbors (Fig. 7). These observations raise the hypothesis of a selective or competitive process underlying the formation of corticostriatal synapses. The degree of input convergence onto SPNs could be modulated by plasticity, potentially enabling experience-driven reconfiguration of S1 corticostriatal coupling.”

      Reviewer #2 (Public review):

      A few minor changes to the figures and text could be made to improve clarity.

      We thank you for having taken the time to indicate where changes could benefit the paper. We followed your recommendations. 

      Reviewer #3 (Public review):

      (1) Several factors may contribute to an underestimation of barrel cortex inputs to SPNs (and thus an overestimate of the input heterogeneity among SPNs). First, by virtue of the experiments being performed in an acute slice prep, it is probable that portions of recorded SPN dendritic trees have been dissected (in an operationally consistent anatomical orientation). If afferents happen to systematically target the rostral/caudal projections of SPN dendritic fields, these inputs could be missed. Similarly, the dendritic locations of presynaptic cortical inputs remain unknown (e.g., do some inputs preferentially target distal vs proximal dendritic positions?). As synaptic connectivity was inferred from somatic recordings, it's likely that inputs targeting the proximal dendritic arbor are the ones most efficiently detected. Mapping the dendritic organization of synapses is beyond the scope of this work, but these points could be broached in the text.

      Thank you for this analysis. The positions of S1 spines on the SPN dendritic arbor have been mapped by the Margolis group (B.D. Sanabria et al., eNeuro 2024, 10.1523/ENEURO.0503-23.2023). They observed that about 80 % of S1 spines were located on dendrites, with a specific distribution that was, on average, rather close to the soma. In that study, S1 spines did not exhibit a distribution that would systematically hinder their detection in a slice. Nevertheless, the position in the dendritic arbor at which an S1 input is received does affect its detection in somatic recordings. We modified the discussion as follows, line 275:

      “The LSPS combined with glutamate uncaging mapped projections contained in the slice, intact from the presynaptic cell bodies to the SPN dendrites. Some cortical inputs targeting distal SPN dendrites may have gone undetected, either due to attenuation of synaptic events recorded at the soma or because distal dendritic branches were lost during slice preparation. Indeed, about 80 % of S1 synaptic contacts are distributed along dendrites (Sanabria et al., 2024). However, synapses located distally are proportionally rare (Sanabria et al., 2024), and our estimates suggest that the loss of S1 input was minimal (see Methods). More significantly, our mapping only included projections from neuronal somata located within the S1 barrel field in the slice: projections from cortical columns outside the slice were not stimulated. For this reason, our study characterized connectivity patterns rather than the full extent of connectivity with the barrel cortex.”

      We explain our estimation of truncated S1 contacts in the Methods, line 434:

      “To estimate the loss of S1 synaptic contacts caused by slice preparation, we modeled the SPN dendritic field as a sphere centered on the soma. S1 synapses were at 80 % distributed radially along dendrites, according to the specific distribution described by Sanabria et al. (2024). The simulation also incorporated the known distribution of SPN dendritic length as a function of distance from the soma (Gertler et al., 2008). Finally, it assumed that synapse placement was isotropic, with equal probability in all directions from the soma. Truncation was simulated by removing a spherical cap at one pole of the sphere, reflecting the depth of our recordings (beyond 80 μm). Based on this simulation, the loss of S1 inputs was < 10 %.”
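      The truncation estimate described above can be sketched as a Monte Carlo simulation. This is a minimal sketch under stated assumptions, not the authors' code: the radial synapse distribution is passed in as a placeholder sampler (the Sanabria et al. and Gertler et al. distributions are not reproduced here), the roughly 20 % of S1 contacts not on dendrites are assumed to be spared by the cut, and `depth_um` is the assumed soma-to-cut distance. For an isotropic direction, the cosine of the polar angle is uniform on [-1, 1], so a synapse at radius r beyond the cut distance d is lost with probability (r - d)/(2r), the area fraction of the removed spherical cap.

```python
import math
import random

def cap_loss_probability(r_um: float, depth_um: float) -> float:
    """Probability that a synapse at radius r_um, placed in an isotropic random
    direction, lies beyond a planar cut located depth_um from the soma."""
    if r_um <= depth_um:
        return 0.0
    # Spherical-cap area fraction: (r - d) / (2 r)
    return (r_um - depth_um) / (2.0 * r_um)

def simulate_dendritic_loss(radial_sampler, depth_um: float,
                            n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo fraction of dendritic synapses removed by the cut.

    radial_sampler(rng) returns a soma-to-synapse distance in µm; it stands in
    for an empirical distribution and is hypothetical here.
    """
    rng = random.Random(seed)
    lost = 0
    for _ in range(n):
        r = radial_sampler(rng)
        cos_theta = rng.uniform(-1.0, 1.0)  # isotropic placement
        if r * cos_theta > depth_um:        # synapse falls in the removed cap
            lost += 1
    return lost / n

def total_s1_loss(radial_sampler, depth_um: float,
                  dendritic_fraction: float = 0.8) -> float:
    """Assumes only the dendritic fraction of S1 contacts can be truncated."""
    return dendritic_fraction * simulate_dendritic_loss(radial_sampler, depth_um)

def uniform_200(rng):
    """Hypothetical placeholder: distances uniform on [0, 200] µm."""
    return rng.uniform(0.0, 200.0)

# The printed value depends entirely on the placeholder distribution;
# it is not the authors' estimate.
print(total_s1_loss(uniform_200, depth_um=80.0))
```

      Replacing `uniform_200` with a sampler drawn from measured synapse and dendritic-length distributions is what would turn this toy into an estimate comparable to the one reported in the Methods.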

      (2) In general, how specific (or generalizable) is the observed SPN-specific convergence of cortical barrel cortex projections in the dorsolateral striatum? In other words, does a similar cortical stimulation protocol targeted to a non-barrel sensory (or motor) cortex region produce similar SPN-specific innervation patterns in the dorsolateral striatum?

      This is an interesting question that could be addressed using the LSPS approach in areas for which ex vivo preparations have been designed to maintain the integrity of the corticostriatal projections, such as A1, M1 and S2.  

      We included this point in the discussion, line 299: 

      “The speckled connectivity pattern of individual SPNs, arising from the abundant and diffuse cortical innervation in the DLS, suggests that somatosensory corticostriatal synapses are established through a selective and/or competitive process. It is important to determine whether this sparse innervation of SPNs by S1 is a characteristic shared with other projections. In particular, it will be interesting to test this hypothesis on the auditory projections targeting the posterior striatum, where neurons exhibit clear tone frequency selectivity (Guo et al., 2018).”

      (3) In general, some of the figure legends are extremely brief, making many details difficult to infer. Similarly, some statistical analyses were either not carried out or not consistently reported.

      We thank you for having taken the time to indicate where changes could benefit the paper. We have followed your recommendations. 

      Reviewer #1 (Recommendations for the authors):

      A few limitations should be discussed in the manuscript:

      (1) The manuscript should mention that most corticostriatal synapses are formed at the dendritic spines of the SPNs, not their cell bodies. This is particularly important regarding the analysis and interpretation of the data in Figure 4.

      Thank you for this comment. This characteristic is important with regards to a limitation of electrophysiological recordings. This is now discussed:

      Line 275:

      “The LSPS combined with glutamate uncaging mapped projections contained in the slice, intact from the presynaptic cell bodies to the SPN dendrites. Some cortical inputs targeting distal SPN dendrites may have gone undetected, either due to attenuation of synaptic events recorded at the soma or because distal dendritic branches were lost during slice preparation. Indeed, about 80 % of S1 synaptic contacts are distributed along dendrites (Sanabria et al., 2024). However, synapses located distally are proportionally rare (Sanabria et al., 2024), and our estimates suggest that the loss of S1 input was minimal (see Methods).”

      Line 313:

      “[...] we found that overlaps between the connectivity maps of SPNs were rare and, when present, involved only a small fraction of the connected sites. This indicates that neighboring SPNs predominantly integrated distinct inputs from the barrel cortex, although it is possible that overlapping inputs received in distal dendrites were not all detected.”

      (2) SPNs show up- and down-states in vivo, which were not mimicked in the present study, since all cells were held at -80 mV (Line 364) and recorded at room temperature (Line 368). It should be discussed how the conclusions of the present work may be affected by the up/down states of SPNs in vivo.

      Thank you for raising this point. Indeed, our experimental conditions were not designed to capture the effects of network oscillatory activity. Instead, LSPS conditions were optimized to reveal monosynaptic connectivity between neurons in S1 and their postsynaptic targets. These optimizations include the use of a high concentration of extracellular divalent cations (4 mM Ca²⁺ and Mg²⁺) to generate robust yet moderate and spatially restricted stimulation of cortical cells and reliable neurotransmitter release (Shepherd, Pologruto and Svoboda, Neuron 2003; 10.1016/s0896-6273(03)00152-1; in our study, see Fig. 1D and Suppl. Fig. 2). Investigating the pre- and postsynaptic modulation of corticostriatal coupling by up- and down-states would require dedicated experimental conditions.

      The conclusion now acknowledges that functional connectivity is subject to plasticity in general, line 358:

      “The degree of input convergence onto SPNs could be modulated by plasticity, potentially enabling experience-driven reconfiguration of S1 corticostriatal coupling.”

      (3) In addition to population-level integration (Line 337), sensory integration is likely to involve synaptic plasticity (e.g., via NMDARs), which was not studied in the present work.

      Thank you for raising this point. Indeed, we agree that sensory integration is a complex process with a multitude of factors beyond connectivity patterns and synaptic strength. We also agree that both connectivity levels and synaptic strength can be modified by plasticity.

      We modified our conclusion as follows, line 354:

      “Since the inputs to a single SPN represent only a limited subset of whisker columns, a complete representation of whiskers could emerge at the population level, with each SPN’s representation complementing those of its neighbors (Fig. 7). These observations raise the hypothesis of a selective or competitive process underlying the formation of corticostriatal synapses. The degree of input convergence onto SPNs could be modulated by plasticity, potentially enabling experience-driven reconfiguration of S1 corticostriatal coupling.”

      (4) Corticostriatal connectivity may be underestimated due to the loss of axonal branches during slicing, and this might contribute to the conclusion of "sparse connectivity". Have the authors considered performing LSPS studies within the striatum (i.e., stimulating ChR2-expressing cortical axon terminals), and could such an experiment consolidate the conclusion of the present work?

      We appreciate the suggestion to employ subcellular Channelrhodopsin-2-assisted circuit mapping (sCRACM) to study the density of S1 spines on the SPN dendritic arbor. If ChR2 is broadly expressed in S1, this approach would likely increase spine detection, as spines contacted by presynaptic neurons located both inside and outside the slice would then be activated. If ChR2 expression could be restricted to the whisker columns present in our preparation, enhanced detection could still occur, but in this case it would reflect the activation of spines contacted by specific ChR2⁺ axonal branches that exit and re-enter the slice to form synapses on the recorded SPN. The anatomy of corticostriatal axonal arbors suggests that such convoluted axonal trajectories are likely to be relatively rare (T. Zheng and C.J. Wilson, J Neurophysiol. 2001; 10.1152/jn.00519.2001; M. Lévesque et al., Brain Res. 1996; 10.1016/0006-8993(95)01333-4).

      Moreover, it is important to remember that sCRACM does not generate connectivity maps between two structures, but maps of spines on dendritic arbors (Petreanu L.T. et al., Nature 2009; 10.1038/nature07709). Precise localization of presynaptic cell bodies was key for the present study, as it enabled us to distinguish between different connectivity patterns and between different degrees of convergence of inputs from adjacent S1 cortical columns present in the slice (schematized in Fig. 1). Distinguishing these inputs by stimulating axon terminals would require expressing a distinct opsin in each whisker column (or each cortical layer, depending on the axis of investigation). This is an exciting prospect, but to our knowledge the technology is not yet available.

      To emphasize our reasons for using LSPS, we revised the final paragraph of the Introduction, line 69: 

      “LSPS enabled precise mapping of corticostriatal functional connectivity by identifying cortical sites where stimulation evoked synaptic currents in the recorded SPNs, thereby localizing the cell bodies of their presynaptic neurons. This approach allowed us to determine both the cortical column and layer of origin within the barrel field in the slice for each SPN input.”

      Reviewer #2 (Recommendations for the authors):

      (1)  Figure 2F: SPN and cortical regions - both are shown in green. The distinction between the two would be clearer if SPNs were made a different color.

      Done

      (2) Figure 2H: Based on their data, the authors conclude that, since EPSCs in SPNs had small amplitudes (~40 pA), only one or a few presynaptic cortical neurons (< 5) were activated by uncaging. It is not clear how this number was estimated. Either this statement should be qualified, or data or citations should be provided to support it.

      Thank you for noticing this. We modified this part as follows, line 105:

      “Based on known amplitudes of spontaneous and miniature EPSCs in SPNs (10-20 pA on average; Kreitzer and Malenka, 2007; Cepeda et al., 2008; Dehorter et al., 2011; Peixoto et al., 2016), this finding is consistent with the presence of only one or a few presynaptic cells (≤ 5) at each connected site of the map.”
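      The "≤ 5" estimate can be made explicit as a rough ratio. This assumes, as a simplification not stated in the response, that evoked currents sum approximately linearly and that each connected presynaptic neuron contributes at least one quantal event of roughly miniature-EPSC amplitude; a neuron making several contacts would lower the count further, so under these assumptions the ratio acts as an upper bound on the number of presynaptic cells.

```latex
N_{\mathrm{pre}} \lesssim \frac{\bar{I}_{\mathrm{evoked}}}{\bar{I}_{\mathrm{mEPSC}}},
\qquad
\frac{40\ \mathrm{pA}}{20\ \mathrm{pA}} = 2
\quad\text{to}\quad
\frac{40\ \mathrm{pA}}{10\ \mathrm{pA}} = 4 \;\le\; 5
```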

      (3) Figure 2I: The top graph is difficult to understand without already seeing the lower plot. Moving it below or to the side would help the reader follow the data more easily.

      Done

      (4) Figure 3D: In Line 162, the authors state, "Furthermore, SPNs receiving input from a single column were often located near others receiving input from multiple ones (Figure 3D), reinforcing that the low functional connectivity with barrel columns in the slice was genuine in these cases." However, Figure 3D does not show spatial information about SPNs relative to each other. These data should be added or the statement adjusted to reflect what is shown in the panel.

      Corrected as follows, line 167:

      “Furthermore, SPNs receiving input from a single column were often located in slices where other cells received input from multiple ones (Fig. 3D), reinforcing that the low functional connectivity with barrel columns in the slice was genuine in these cases.”

      (5) Figure 3F: Are the authors attempting to show how cluster number, cluster width, and connectivity gaps contribute to input field width? If so, this could be clarified by flipping the x- and y-axes so that the input field width is the y-axis in each case. Additionally, the difference between black and white points should be stated (or, if there is no difference, made to be the same). The significance of the dotted red line vs. the solid red lines should also be stated in the figure legend.

      These plots illustrate how cluster number, cluster width, and ratio of connectivity gaps over total length vary as a function of input field width. As expected, wider input fields contain more clusters (top). However, the overall density of connected sites does not increase with input field width, as indicated by a higher ratio of connectivity gaps over total length (bottom).

      This suggests the presence of a mechanism that regulates the connectivity level of individual SPNs (mentioned in the discussion). We prefer this orientation because flipping the axes would clutter the panel with different x-axis labels. Symbols and lines were corrected. The correlation coefficients and statistics are now indicated in the panels and in the legend.

      (6) Figure 3H: The schematic is very useful for highlighting the core conclusions and is greatly appreciated. The pie charts are a bit hard to see and could be replaced with the percentages stated simply as text within the figure. It would also help to label the panel as "Summary," so readers can quickly identify its purpose.

      Done

      (7) Figures 4B-D: To clarify the overall percentage, the maximum for the y-axis should be set to 100% in each panel.

      Done

      Reviewer #3 (Recommendations for the authors):

      (1) Though mostly minor, several sentences/statements in the manuscript are confusing or overstated. For example:

      a. Lines 62-63: "Studies have found that inputs received by D1 SPNs were stronger than those received by D2 SPNs" is a broad statement that should be qualified.

      We changed this sentence for: 

      “Electrophysiological studies have found that inputs received by D1 SPNs were stronger than those received by D2 SPNs, both in vivo and ex vivo (Reig and Silberberg, 2014; Filipović et al., 2019; Kress et al., 2013; Parker et al., 2016).”

      b. Lines 118-119: "EPSCs evoked with stimulations in L2/3 to L5b had similar amplitudes (Figure 2H), suggesting that L5a dominated these other layers thanks to a greater connectivity with SPNs principally." Here, the word "connectivity" is vague and could easily be misunderstood. Connectivity could refer to the amplitude of corticostriatal EPSCs, which the authors stated are not different between L2/3-L5b. Presumably, connectivity here refers to % of connected SPNs, but for the sake of clarity, the authors should be more explicit, e.g., "...L5a dominated the other layers because a larger fraction of SPNs received connections from L5a, rather than because L5a synapses were stronger."

      We changed the sentence for (line 122): 

      “EPSCs evoked with stimulations in L2/3 to L5b had similar amplitudes (Fig. 2H), suggesting that L5a dominance over these other layers is primarily due to a higher likelihood of SPNs being connected to it, rather than to stronger synaptic inputs.”

      c. In the Figure 4 legend, (A) says "Four example slices with 2 to 4 recordings. Same as in Figure 2A." Did the authors mean Figure 3A?

      Done

      d. Line 184: Should Figure 4B, C actually be Figure 4D?

      Done

      (2) Line 32: typo in Sippy et al. reference.

      Done

      (3) In Figure 2I, the label "dSPN" is confusing, as in the literature, dSPN often refers to the direct pathway SPN.

      Done

      (4) The y-axes in Figure 3C should be better labeled/explained.

      Fig. 3C. Median (red) and 25-75th percentiles (box) of cluster width and spacing, expressed in µm (left Y axis) and number of cortical columns (right Y axis). Labels have been changed in the figure.

      (5)  Lines 150-152: "...45 % of the input fields with several clusters produced no synaptic response upon stimulation." This wording is confusing. It can be inferred that the authors mean "no synaptic response in the gaps between clusters." However, their phrasing omits this crucial detail and reads as though those input fields produce no response at all.

      We changed this sentence for (line 154):

      “Strikingly, regions lacking evoked synaptic responses (i.e., connectivity gaps) made up an average of 45 % of the length of input fields with multiple clusters (maps collapsed along the vertical axis; Fig. 3F, bottom).”

      (6)  Lines 184-186: "DLS SPNs could receive inputs from the same domain in the barrel cortex and yet have patterns of cortical innervation without or little redundancy." This should be rephrased to "with little to no redundancy."

      Done

      (7)  Lines 186-187: "They support a connectivity model in which synaptic connections on each SPNs..." should be revised to "connections to each SPN...".

      Done

    1. eLife Assessment

      In this manuscript, the authors describe a software package for automatically distinguishing action potentials generated by excitatory and inhibitory neurons, acquired using high-density microelectrode arrays. The work is valuable as it offers a tool with the potential to automatically identify these neuron types in vitro. It is solid, as it provides a tool to identify putative excitatory and inhibitory neurons on high-density electrode arrays, which can be used in conjunction with other existing spike sorting pipelines.

    2. Reviewer #1 (Public review):

      Summary:

      The authors note that while many software packages exist for spike sorting, these do not automatically differentiate with known accuracy between excitatory and inhibitory neurons. Moreover, most existing spike sorting packages are for in vivo use, where the majority of electrodes are separated from each other by several hundred microns or more. There is a need for spike sorting packages that can take advantage of high-density electrode arrays where all electrodes are within a few tens of microns from other electrodes. Here, the authors offer such a software package with SpikeMAP, and they validate its performance in identifying parvalbumin interneurons that were optogenetically stimulated.

      Strengths:

      The main strength of this work is that the authors use ground truth measures to show that SpikeMAP can take features of spike shapes to correctly identify known parvalbumin interneurons against a background of other neuron types. They use spike width and peak to peak distance as the key features for distinguishing between neuron types, a method that has been around for many years (Barthó, Peter, et al. "Characterization of neocortical principal cells and interneurons by network interactions and extracellular features." Journal of neurophysiology 92.1 (2004): 600-608.), but whose performance has not been validated in the context of high-density electrode arrays.

      Another strength of this approach is that it is automated - a necessity if your electrode array has 4096 electrodes. Hand-sorting or even checking such a large number of channels is something even the cruellest advisor would not wish upon a graduate student. With such large channel counts, it is essential to have automated methods that are known to work accurately. Hence, the combination of validation and automation is an important advance.

      A nice feature of this work is that with high-density electrode arrays, the spike waveforms appear on multiple nearby electrodes simultaneously. And since spike amplitudes fall off with distance, this allows triangulation of neuron locations within the regular electrode array. Thus, spike correlations between neuron types, or within neuron types, can be plotted as a function of distance. While SpikeMAP is not the first to do this (Peyrache, Adrien, et al. "Spatiotemporal dynamics of neocortical excitation and inhibition during human sleep." Proceedings of the national academy of sciences 109.5 (2012): 1731-1736.), it is a welcome capability of this package.

      It is also good that the code for this package is open-source, allowing a community of people (I expect in vitro labs will especially want to use this) to use the code and further improve it.

      Weaknesses:

      As this code was developed for use with a 4096-electrode array, it is important to be aware of double counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas: First, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code. Second, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      Appraisal:

      This work addresses the need for an automated spike sorting software package for high density electrode arrays. Although no spike sorting software is flawless, the package presented here, SpikeMAP, has been validated on PV interneurons, inspiring a degree of confidence. This is a good start, and further validation on other neuron types could increase that confidence. Groups doing in vitro experiments, where 4096 electrode arrays are more common, could find this system particularly helpful.

      Comments on revised version:

      I appreciate the dialogue that has occurred over this submission. I have seen how the authors have taken into account the issues that I have raised, as well as those brought up by reviewer 2. I am satisfied that the paper has improved and is now a novel and useful contribution in the area of spike sorting.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, entitled "SpikeMAP: An unsupervised spike sorting pipeline for cortical excitatory and inhibitory neurons in high-density multielectrode arrays with ground-truth validation", the authors present spikeMAP, a pipeline for the analysis of large-scale recordings of in vitro cortical activity. According to the authors, spikeMAP not only allows for the detection of spikes produced by single neurons (spike sorting), but also allows for the reliable distinction between genetically determined cell types, using viral and optogenetic strategies as ground-truth validation. While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings used to assess whether a spike sorting tool can properly distinguish between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, or, at least, it would deserve to be properly benchmarked. I would therefore suggest that the authors perform a more extensive comparison with existing spike sorters.

      Strengths:

      The GT recordings with optogenetic activation of the cells, based on the opsins, are interesting and might provide useful data to quantify how well spike sorting pipelines can discriminate between excitatory and inhibitory neurons in vitro. Such an approach can be quite complementary to artificially generated ground truth.

      Weaknesses:

      The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of [Hilgen et al, 2017, 10.1016/j.celrep.2017.02.038]. Therefore, the first question is what the rationale is for reinventing the wheel, rather than using tools that do something very similar (as mentioned by the authors themselves). In general, I have a hard time believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters. This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess, first, how good spikeMAP is with respect to spike sorting in general, before deciding whether it is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to fully validate the paper. The details are too scarce, or would deserve to be better explained (see other comments below).

      Regarding the putative location of the spikes, it has been shown that center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias toward positions within the area spanned by the electrodes, while other methods such as monopolar triangulation or grid-based convolution might perform better. Can the authors comment on the choice of center of mass as the only way to localize the sources?
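      The bias described above follows from the definition: a center of mass is an amplitude-weighted average of electrode coordinates, so it cannot fall outside the region spanned by the electrodes used. The sketch below illustrates this with a hypothetical 3 × 3 electrode patch at an assumed 40 µm pitch and a simple 1/distance amplitude model; neither is taken from the manuscript or from the 3Brain hardware specification.

```python
import math

def center_of_mass(positions, amplitudes):
    """Amplitude-weighted mean of electrode (x, y) positions."""
    total = sum(amplitudes)
    x = sum(a * px for (px, _), a in zip(positions, amplitudes)) / total
    y = sum(a * py for (_, py), a in zip(positions, amplitudes)) / total
    return x, y

def point_source_amplitudes(source, positions, height_um=20.0):
    """Hypothetical monopole model: amplitude proportional to 1 / distance,
    with the source height_um above the electrode plane."""
    sx, sy = source
    return [1.0 / math.sqrt((px - sx) ** 2 + (py - sy) ** 2 + height_um ** 2)
            for px, py in positions]

PITCH_UM = 40.0  # assumed pitch, for illustration only
patch = [(i * PITCH_UM, j * PITCH_UM) for i in range(3) for j in range(3)]

# A source 40 µm beyond the right edge of the patch (patch spans x = 0..80 µm)
true_source = (120.0, 40.0)
estimate = center_of_mass(patch, point_source_amplitudes(true_source, patch))
print(true_source, estimate)  # the x estimate is pulled back inside the patch
```

      Monopolar triangulation, as usually described, instead fits the source position (including its height) to an amplitude model, so it is not confined to the electrode patch in this way.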

      Still in Figure 1, I do not really see the point of spline interpolation. I understand the point of such smoothing, but the authors should demonstrate that it has a key impact on the distinction of excitatory vs. inhibitory cells. What is special about the value of 90 kHz for a signal recorded at 18 kHz? What is gained with spline enhancement compared to without it? Does this value depend on the sampling rate, or is it a global optimum found by the authors?
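      One way to frame the question about 90 kHz: for a signal sampled at 18 kHz, 90 kHz is a 5× upsampling, which shrinks the time step available for measuring features such as trough-to-peak width from about 55.6 µs (1/18 kHz) to about 11.1 µs (1/90 kHz). The sketch below is a hypothetical illustration using a natural cubic spline on a synthetic waveform; the manuscript's spline type, boundary conditions, and waveforms are not specified here, so it shows only what the upsampling factor changes, not whether it improves excitatory/inhibitory separation.

```python
import math

def natural_cubic_spline_upsample(y, factor):
    """Upsample uniformly spaced samples by an integer factor using a natural
    cubic spline (second derivative zero at both ends)."""
    n = len(y) - 1
    m = [0.0] * (n + 1)  # second derivatives at the knots, in sample units
    if n >= 2:
        # Interior equations: m[i-1] + 4 m[i] + m[i+1] = 6 (y[i+1] - 2 y[i] + y[i-1])
        rhs = [6.0 * (y[i + 1] - 2.0 * y[i] + y[i - 1]) for i in range(1, n)]
        size = n - 1
        c = [0.0] * size
        d = [0.0] * size
        c[0], d[0] = 0.25, rhs[0] / 4.0
        for i in range(1, size):  # Thomas algorithm, forward sweep
            denom = 4.0 - c[i - 1]
            c[i] = 1.0 / denom
            d[i] = (rhs[i] - d[i - 1]) / denom
        sol = [0.0] * size
        sol[-1] = d[-1]
        for i in range(size - 2, -1, -1):  # back substitution
            sol[i] = d[i] - c[i] * sol[i + 1]
        m[1:n] = sol
    out = []
    for i in range(n):
        for k in range(factor):
            t = k / factor  # fractional position inside [i, i + 1]
            a = 1.0 - t
            out.append(m[i] * a ** 3 / 6.0 + m[i + 1] * t ** 3 / 6.0
                       + (y[i] - m[i] / 6.0) * a + (y[i + 1] - m[i + 1] / 6.0) * t)
    out.append(y[n])
    return out

def trough_to_peak_us(samples, fs_hz):
    """Time from the waveform minimum to the following maximum, in µs."""
    trough = min(range(len(samples)), key=samples.__getitem__)
    peak = max(range(trough, len(samples)), key=samples.__getitem__)
    return (peak - trough) * 1e6 / fs_hz

FS = 18_000   # recording rate quoted in the review
FACTOR = 5    # 18 kHz x 5 = 90 kHz
times = [i / FS for i in range(54)]  # 3 ms synthetic snippet (hypothetical shape)
wave = [-math.exp(-((t - 1.0e-3) / 1.5e-4) ** 2)
        + 0.4 * math.exp(-((t - 1.4e-3) / 2.5e-4) ** 2) for t in times]

print(trough_to_peak_us(wave, FS))  # a multiple of ~55.6 µs
print(trough_to_peak_us(natural_cubic_spline_upsample(wave, FACTOR), FS * FACTOR))  # a multiple of ~11.1 µs
```

      Under this framing, the factor is tied to the sampling rate rather than being a global constant: the same 5× factor applied to a different recording rate would give a different time step.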

      Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii. In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and not really matching state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. This means that the authors are explicitly looking for 2 clusters per electrodes. If so, this is a really strong assumption that should not be held in the context of spike sorting, because since it is a blind source separation technique, one cannot pre-determine in advance how many sources are present in the vicinity of a given electrode. While the illustration on Figure 2E is ok, there is no guarantee that one cannot find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines are not relying on k-means, to avoid any hard coded number of clusters. Can the authors comment on that?
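      The concern about a fixed k can be illustrated with a toy example. The sketch below is hypothetical: it uses synthetic two-dimensional "feature" points rather than spike data, and a plain Lloyd's k-means with deterministic farthest-first initialization rather than SpikeMAP's implementation. When three well-separated groups are present near an electrode, forcing k = 2 necessarily merges two of them into one cluster.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all current centers
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def units_per_cluster(labels, truth, k):
    """Number of distinct true units found inside each cluster."""
    return [len({t for lab, t in zip(labels, truth) if lab == j}) for j in range(k)]

# Three hypothetical, well-separated units in a 2-D feature space
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
true_means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points, truth = [], []
for unit, (mx, my) in enumerate(true_means):
    for dx, dy in offsets:
        points.append((mx + dx, my + dy))
        truth.append(unit)

for k in (2, 3):
    labels, _ = kmeans(points, k)
    print(k, units_per_cluster(labels, truth, k))
```

      Pipelines that avoid a hard-coded number of clusters replace this step with some model-selection or density-based criterion; the toy example does not indicate which of those would suit SpikeMAP's data.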

      I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is this really what should be expected? Based on the properties of the extracellular medium, shouldn't we expect a power law for the decay of the amplitude? It is strange that up to 100 µm away from the soma, the max amplitude only dropped from 260 to 240 µV. Can the authors comment on that? It would be interesting to plot this for all neurons recorded, in a normalized manner (V/max(V) as a function of distance), to see what the curve looks like.

      In Figure 3A, the total number of cells seems rather low for such a large number of electrodes. Did the authors exclude some cells from the analysis, and if yes, what quality criteria were used to keep cells? If no criteria were used (none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      Still in Figure 3A, there seems to be a bias toward finding inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would explain such behaviour? It would be interesting to see some macroscopic quantities for excitatory/inhibitory cells, such as mean firing rates, averaged SNRs, ... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than those of excitatory ones, while they should be in theory.

      For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518]

      Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data. I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      While there is no doubt that GT data such as those recorded here by the authors are the most interesting from a validation point of view, the rather low yield of such experiments should not discourage the use of artificially generated recordings such as those made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or, more recently, in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, putative waveforms/firing-rate patterns are available for excitatory and inhibitory cells, and thus the authors could test how well spikeMAP discriminates the two subtypes.

      Comments on revised version:

      While I must thank the authors for their answers, I still think that they missed an important point, and they only partially answered some of my concerns.

      I truly think that SpikeMAP would benefit from a comparison with a state-of-the-art spike sorting pipeline, for example Kilosort. The authors said that they made the sorter modular enough that only the E/I classification step can be compared. I think this would be worth doing, just to be sure that SpikeMAP spike sorting, which might be simpler than other recent solutions (with template matching), is not missing some cells and thus degrading E/I classification performance. I know that such a comparison is not straightforward, because there is no clear ground truth, but I would still need to be convinced that the sorting pipeline brings something on its own. While there is no doubt that the E/I classification layer can be interesting, especially given the recordings shared by the authors, I'm still a bit puzzled by the sorting step. The comparison could take the form of a table or a figure, even a supplementary one. Alternatively, the authors could try to generate fake GT data, with MEArec for example, with putative E/I cells (discriminated via waveforms and firing rates) and show on such (oversimplified) data that SpikeMAP performs similarly to modern spike sorters. Otherwise, this is a bit hard to judge...

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      As this code was developed for use with a 4096 electrode array, it is important to be aware of double-counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas. Firstly, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code.

      Thank you for this comment. We have added a routine to SpikeMAP to remove highly correlated spikes detected within a given spatial radius of each other. The following was added to the main text (line 149):

      “As an additional verification step, SpikeMAP allows the computation of spike-count correlations between putative neurons located within a user-defined radius. Signals that exceed a defined threshold of correlation can be rejected as they likely reflect the same underlying cell.”
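      A minimal sketch of the kind of duplicate check described above, assuming spike times in seconds, unit positions in µm, 5 ms count bins, and a greedy rule that keeps the unit with more spikes. The function names, bin size, radius and correlation threshold are hypothetical and are not taken from SpikeMAP.

```python
import math


def bin_counts(spike_times, duration_s, bin_s):
    """Spike counts per time bin (all times in seconds)."""
    n_bins = int(math.ceil(duration_s / bin_s))
    counts = [0] * n_bins
    for t in spike_times:
        counts[min(int(t / bin_s), n_bins - 1)] += 1
    return counts


def pearson(a, b):
    """Pearson correlation of two equal-length count vectors (0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def reject_duplicates(units, duration_s, radius_um, max_corr, bin_s=0.005):
    """units: dict name -> (x_um, y_um, spike_times_s). Returns the names kept.

    Units are visited in order of decreasing spike count; a unit is dropped if a
    unit already kept lies within radius_um and their binned spike counts
    correlate above max_corr.
    """
    order = sorted(units, key=lambda u: -len(units[u][2]))
    counts = {u: bin_counts(units[u][2], duration_s, bin_s) for u in order}
    kept = []
    for u in order:
        xu, yu, _ = units[u]
        is_dup = any(
            math.hypot(xu - units[k][0], yu - units[k][1]) <= radius_um
            and pearson(counts[u], counts[k]) > max_corr
            for k in kept
        )
        if not is_dup:
            kept.append(u)
    return kept


times = [0.0125, 0.1025, 0.2025, 0.3525, 0.5025, 0.7725]
units = {
    "A": (0.0, 0.0, times),
    "B": (20.0, 0.0, [t + 0.001 for t in times]),   # same cell seen on a neighbour
    "C": (500.0, 0.0, times),                        # far away: never compared
    "D": (10.0, 0.0, [0.0525, 0.1525, 0.6025]),      # close but uncorrelated
}
print(reject_duplicates(units, duration_s=1.0, radius_um=50.0, max_corr=0.9))
```

      Binning at a few milliseconds largely tolerates the small delay between a somatic and an axonal copy of the same spike, which a zero-lag correlation on much finer bins could miss.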

      Secondly, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      We have added a routine to SpikeMAP that computes population spike rates to verify stationarity over time. We have also added a routine to identify putative bursting neurons through a Hartigan statistical dip test applied to the inter-spike interval distribution of individual cells.

      We added the following (line 204):

      “Further, SpikeMAP contains a routine to perform a Hartigan statistical dip test on the inter-spike interval distribution of individual cells to detect putative bursting neurons.”

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      We have added the following (line 326):

      “future work could include different inhibitory interneurons such as somatostatin (SOM) and vasoactive intestinal polypeptide (VIP) neurons to improve the classification of inhibitory cell types. Another avenue could involve applying SpikeMAP on artificially generated spike data (Buccino & Einevoll 2021; Laquitaine et al., 2024).”

      Reviewer #2 (Public review)

      Summary:

      While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      Thank you for your insightful comment. A full comparison between SpikeMAP and related methods is provided in Table 1. As can be seen, SpikeMAP is the only method listed that performs E/I sorting on large-scale multielectrodes. Nonetheless, several aspects of SpikeMAP included in the spike sorting pipeline do overlap with existing methods, as these constitute necessary steps prior to performing E/I identification. These steps are not novel to the current work, nor do they constitute rigid options that cannot be substituted by the user. Rather, we aim to offer SpikeMAP users the option to combine E/I identification with preliminary steps performed either through our software or through another package of their choosing. For instance, preliminary spike sorting could be done through Kilosort before importing the spike data into SpikeMAP for E/I identification. To allow greater flexibility, we have now modularized our suite so that E/I identification can be performed as a stand-alone module. We have clarified the text accordingly (line 317):

      “While SpikeMAP is the only known method to enable the identification of putative excitatory and inhibitory neurons on high-density multielectrode arrays (Table 1), several aspects of SpikeMAP included in the spike sorting pipeline (Figure 1) overlap with existing methods, as these constitute required steps prior to performing E/I identification. To enable users to integrate SpikeMAP with existing toolboxes, we provide a modularized suite of protocols so that E/I identification can be performed separately from preliminary spike sorting steps. In this way, a user could carry out spike sorting through Kilosort or another package before importing their data into SpikeMAP for E/I identification.”

      Weaknesses:

      (1) The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of Hilgen et al. 2017 (10.1016/j.celrep.2017.02.038). Therefore, the first question is: what is the rationale for reinventing the wheel rather than using tools that do something very similar (as mentioned by the authors themselves)? I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters.

      The paper by Hilgen et al. is reported in Table 1. As seen, while this paper employs optogenetics, it does not target inhibitory (e.g., PV) cells. We have added the following clarification (line 82):

      “Despite evidence showing differences in action potential kinetics for distinct cell-types as well as the use of optogenetics (Hilgen et al., 2017), there exist no large-scale validation efforts, to our knowledge, showing that extracellular waveforms can be used to reliably distinguish cell-types.”

      This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, with reference to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me or deserve to be better explained (see other comments below).

      We thank the reviewer for this comment, and have amended the title as follows:

      “SpikeMAP: An unsupervised pipeline for the identification of cortical excitatory and inhibitory neurons in high-density multielectrode arrays with ground-truth validation”

      (2) Regarding the putative location of the spikes, it has been shown that the center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias toward finding positions within the boundaries of the electrodes, while some other methods, such as monopolar triangulation or grid-based convolution, might have better performance. Can the authors comment on the choice of the center of mass as the only way to triangulate the sources?

      We agree with the reviewer that the center-of-mass algorithm carries limitations that are addressed by other methods. To address this issue, we have included two additional protocols in SpikeMAP to perform monopolar triangulation and grid-based convolution, offering additional options for users of the package. The text has been clarified as follows (line 429):

      “In addition to center-of-mass triangulation, SpikeMAP includes protocols to perform monopolar triangulation and grid-based convolution, offering additional options to estimate putative soma locations based on waveform amplitudes.”
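      For readers comparing these options, the sketch below implements only the amplitude-weighted center of mass and reproduces the boundary bias raised by the reviewer under a toy 1/r point-source model. The 3×3 grid, the 42 µm pitch, the 10 µm source depth and the amplitude model are illustrative assumptions, not properties of SpikeMAP or of a specific 3Brain chip. Monopolar triangulation would instead fit the source position, including its depth, to the amplitudes by least squares.

```python
import math


def center_of_mass(positions, amplitudes):
    """Amplitude-weighted centroid of electrode positions (x, y) in µm."""
    w = [abs(a) for a in amplitudes]
    total = sum(w)
    x = sum(wi * px for wi, (px, _) in zip(w, positions)) / total
    y = sum(wi * py for wi, (_, py) in zip(w, positions)) / total
    return x, y


def point_source_amplitudes(positions, source_xy, depth_um=10.0, scale=1000.0):
    """Toy model: peak amplitude falls off as 1/r from a point source (assumption)."""
    sx, sy = source_xy
    return [scale / math.sqrt((px - sx) ** 2 + (py - sy) ** 2 + depth_um ** 2)
            for px, py in positions]


PITCH_UM = 42.0   # assumed electrode pitch, for illustration only
grid = [(c * PITCH_UM, r * PITCH_UM) for r in range(3) for c in range(3)]

centred = center_of_mass(grid, point_source_amplitudes(grid, (42.0, 42.0)))
outside = center_of_mass(grid, point_source_amplitudes(grid, (-30.0, 42.0)))
print("source at (42, 42) ->", centred)
print("source at (-30, 42) ->", outside)   # x estimate stays >= 0, inside the grid
```

      Because the weights are non-negative, the estimate is a convex combination of electrode positions and can never fall outside the array, which is the bias the reviewer describes.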

      (3) Still in Figure 1, I am not sure I really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What is special about the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      We clarified the text as follows (line 183):

      “While we found that a resolution of 90 kHz provided a reasonable estimate of spike waveforms, this value can be adjusted as a parameter in SpikeMAP.”
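      As an illustration of what such an upsampling step does, here is a natural cubic spline evaluated at five points per original sample (90/18 = 5), assuming unit sample spacing and an arbitrary toy trough. This is a generic sketch, not SpikeMAP's code; in practice `scipy.interpolate.CubicSpline` with `bc_type='natural'` gives the same interpolant.

```python
def natural_spline_second_derivs(y):
    """Second derivatives M_i of a natural cubic spline through unit-spaced samples y.

    Solves M[i-1] + 4*M[i] + M[i+1] = 6*(y[i+1] - 2*y[i] + y[i-1]) for the
    interior knots with M[0] = M[-1] = 0 (Thomas algorithm).
    """
    n = len(y)
    m = [0.0] * n
    if n < 3:
        return m
    size = n - 2
    diag = [4.0] * size
    rhs = [6.0 * (y[i + 1] - 2.0 * y[i] + y[i - 1]) for i in range(1, n - 1)]
    # Forward elimination (sub- and super-diagonals are all 1).
    for i in range(1, size):
        w = 1.0 / diag[i - 1]
        diag[i] -= w
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    x = [0.0] * size
    x[-1] = rhs[-1] / diag[-1]
    for i in range(size - 2, -1, -1):
        x[i] = (rhs[i] - x[i + 1]) / diag[i]
    m[1:n - 1] = x
    return m


def spline_upsample(y, factor):
    """Evaluate the natural cubic spline through y at `factor` points per sample."""
    m = natural_spline_second_derivs(y)
    out = []
    for k in range((len(y) - 1) * factor + 1):
        t = k / factor
        i = min(int(t), len(y) - 2)   # left knot of the interval containing t
        u = t - i                     # fractional distance from the left knot
        v = 1.0 - u                   # fractional distance from the right knot
        out.append(m[i] * v ** 3 / 6.0 + m[i + 1] * u ** 3 / 6.0
                   + (y[i] - m[i] / 6.0) * v + (y[i + 1] - m[i + 1] / 6.0) * u)
    return out


FS_IN_HZ, FS_OUT_HZ = 18_000, 90_000      # rates quoted in the review (approximate)
factor = FS_OUT_HZ // FS_IN_HZ            # 5 interpolated points per original sample
trough = [0.0, -0.5, -3.0, -5.0, -2.5, 0.5, 1.2, 0.6, 0.0]   # toy samples (arbitrary units)
fine = spline_upsample(trough, factor)
print(len(trough), "->", len(fine), "samples; min", round(min(fine), 3))
```

      Interpolation cannot add spectral content above the original Nyquist frequency (about 9 kHz for an 18 kHz recording); what it changes is how finely features such as trough time, trough depth and half-width can be read off the waveform, so its benefit for E/I separation remains an empirical question.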

      (4) Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii.

      We apologize for the rendering issues in the Figures that occurred during conversion into PDF format. We have now ensured that all figures are properly displayed.

      In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e., the ones obtained where the amplitudes peak the most? This is not really clear from the Methods, but if this is the case, then this approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms quite often, especially with such high-density arrays, cover multiple channels at once, and thus the extracellular patterns triggered by single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept and projected, via PCA, onto a lower-dimensional space before clustering. Information from a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors perform the SVD should be clarified in the Methods section. Is it performed on a single channel, and/or on multiple channels in a local neighbourhood?

      We agree with the reviewer that it would be useful to have the option of performing PCA on several channels at once, since spikes can occur at several channels at the same time. We have now added a routine to SpikeMAP that allows users to define a radius around individual channels prior to performing PCA. The text was clarified as follows (line 131):

      “The SpikeMAP suite also offers a routine to select a radius around individual channels in order to include groups of adjacent channels in the PCA.”
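      A sketch of the neighbourhood approach discussed above: the waveforms of all channels within a radius of the peak channel are concatenated into one feature vector before PCA, here computed by power iteration. The grid, the 42 µm pitch, the radius and the two toy units (identical on the peak channel, different on one neighbour) are assumptions chosen to show a case that single-channel features cannot resolve; none of the names come from SpikeMAP.

```python
import math
import random


def neighbourhood(peak, positions, radius_um):
    """Indices of channels within radius_um of the peak channel (peak included)."""
    px, py = positions[peak]
    return [i for i, (x, y) in enumerate(positions) if math.hypot(x - px, y - py) <= radius_um]


def features(snippet, chans):
    """Concatenate the waveforms of the selected channels into one feature vector."""
    return [v for c in chans for v in snippet[c]]


def leading_pc(vectors, iters=200, seed=0):
    """Mean and first principal axis by power iteration on the sample covariance."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    X = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    w = [random.Random(seed).gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        s = [sum(row[j] * w[j] for j in range(d)) for row in X]            # X w
        w = [sum(s[k] * X[k][j] for k in range(n)) / n for j in range(d)]   # X^T X w / n
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
    return mean, w


def project(v, mean, w):
    return sum((a - m) * b for a, m, b in zip(v, mean, w))


PITCH_UM = 42.0                                       # assumed pitch, illustration only
grid = [(c * PITCH_UM, r * PITCH_UM) for r in range(3) for c in range(3)]
chans = neighbourhood(4, grid, radius_um=PITCH_UM)    # centre + 4 nearest neighbours

rng = random.Random(1)
peak_wave, side_wave = [0.0, -5.0, 2.0, 0.0], [0.0, -3.0, 1.0, 0.0]


def toy_spike(side_channel):
    """Toy units share the peak-channel shape but differ on one neighbour."""
    snippet = [[0.0] * 4 for _ in grid]
    snippet[4], snippet[side_channel] = list(peak_wave), list(side_wave)
    return [[s + rng.gauss(0.0, 0.1) for s in ch] for ch in snippet]


spikes = [toy_spike(1) for _ in range(10)] + [toy_spike(7) for _ in range(10)]
mean, w = leading_pc([features(s, chans) for s in spikes])
scores = [project(features(s, chans), mean, w) for s in spikes]
print([round(s, 2) for s in scores])
```

      Projected onto the leading component of the multichannel features, the two toy units separate cleanly, whereas their peak-channel waveforms are identical up to noise.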

      (5) About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors say that they are using a k-means cluster analysis with k=2. Does this mean that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting: since spike sorting is a blind source separation technique, one cannot determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is OK, there is no guarantee that one cannot find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      We clarified the text as follows (line 135):

      “In SpikeMAP, the optimal number of k-means clusters can be chosen by a Calinski-Harabasz criterion (Calinski and Harabasz, 1974) or pre-selected by the user.”
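      A minimal sketch of choosing k with the Calinski-Harabasz index, CH = [B/(k-1)] / [W/(n-k)], where B and W are the between- and within-cluster sums of squares over n points. It uses plain Lloyd's k-means with deterministic farthest-point initialisation on toy two-dimensional PC scores; both the initialisation and the data are assumptions, and scikit-learn's `calinski_harabasz_score` computes the same index.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new = [mean_point([p for p, lab in zip(points, labels) if lab == j])
               if j in labels else centers[j] for j in range(k)]
        if new == centers:
            break
        centers = new
    return labels, centers


def calinski_harabasz(points, labels, centers):
    n, k = len(points), len(centers)
    g = mean_point(points)
    between = sum(labels.count(j) * sq_dist(centers[j], g) for j in range(k))
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points, k_max=6):
    """Return the k in 2..k_max with the highest CH index, plus all scores."""
    scores = {}
    for k in range(2, k_max + 1):
        labels, centers = kmeans(points, k)
        scores[k] = calinski_harabasz(points, labels, centers)
    return max(scores, key=scores.get), scores


rng = random.Random(1)
blob_centres = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]   # three toy clusters in PC space
points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
          for cx, cy in blob_centres for _ in range(20)]
best_k, ch_scores = choose_k(points, k_max=6)
print(best_k, {k: round(v, 1) for k, v in ch_scores.items()})
```

      The index rewards compact, well-separated clusters, so splitting a genuine cluster or merging two of them both lower it; it still assumes roughly convex clusters, which is worth keeping in mind for drifting or bursting units.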

      (6) I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is this really what should be expected? Based on the properties of the extracellular medium, shouldn't we expect a power law for the decay of the amplitude? It is strange that up to 100 µm away from the soma, the max amplitude only dropped from 260 to 240 µV. Can the authors comment on that? It would be interesting to plot this for all neurons recorded, in a normalized manner (V/max(V) as a function of distance), to see what the curve looks like.

      We added Supplemental Figure 1 showing the drop in voltage over all putative somas (N=1,950) of one recording, after excluding somas with an increase in voltage away from the electrode peak and computing normed values V/max(V). We see a distribution of slopes as well as intercepts across somas, showing some variability across recording sites. As the reviewer suggests, it is possible that a power law describes these data better than a linear function; this would need to be investigated further by quantitatively comparing the fit of these functions.
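      One way to carry out the comparison mentioned above, sketched under the assumption of normalized amplitudes V/max(V) measured at distances d > 0: fit a line and a power law a·d^b (the latter by least squares on log-log axes) and compare residuals in the original units. For reference, an ideal point source in a homogeneous medium gives b = -1, and a dipole far field gives b = -2. The function names and data are synthetic.

```python
import math


def linfit(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def compare_decay_fits(dist_um, v_norm):
    """Residual sum of squares (in V/max(V) units) of a linear and a power-law fit.

    The power law a * d**b is fitted by least squares in log-log space, so it
    needs d > 0 and V > 0; both models have two free parameters.
    """
    m, c = linfit(dist_um, v_norm)
    b, log_a = linfit([math.log(d) for d in dist_um], [math.log(v) for v in v_norm])
    sse_lin = sum((v - (m * d + c)) ** 2 for d, v in zip(dist_um, v_norm))
    sse_pow = sum((v - math.exp(log_a) * d ** b) ** 2 for d, v in zip(dist_um, v_norm))
    return sse_lin, sse_pow, b


d = [10.0, 20.0, 40.0, 60.0, 80.0, 100.0]
print(compare_decay_fits(d, [10.0 / x for x in d]))            # synthetic 1/d decay
print(compare_decay_fits(d, [1.02 - 0.002 * x for x in d]))    # synthetic linear decay
```

      Both models have two free parameters, so comparing residual sums of squares is reasonably fair here; with real data, a held-out or information-criterion comparison would be more robust.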

      (7) In Figure 3A, the total number of cells seems rather low for such a large number of electrodes. Did the authors exclude some cells from the analysis, and if yes, what quality criteria were used to keep cells? If no criteria were used (none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      The reviewer is correct to point out that a number of stringent criteria were employed to exclude some putative cells. We now outline these criteria directly in the text (line 161):

      “At different steps in the process, conditions for rejecting spikes can be tailored by applying: (1) a stringent threshold to filtered voltages; (2) a minimal cut-off on the signal-to-noise ratio of voltages (see Supplemental Figure 2); (3) an LDA for cluster separability; (4) a minimal spike rate to putative neurons; (5) a Hartigan statistical dip test to detect spike bursting; (6) a decrease in voltage away from putative somas; and (7) a maximum spike-count correlation for nearby channels. Together, these criteria allow SpikeMAP users to precisely control parameters relevant to automated spike sorting.”

      Further, we provide SNRs of individual channels (Supplemental Figure 2), and added to the SpikeMAP software the ability to apply a minimal criterion based on SNR.
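      The two unit-quality metrics the reviewer asked about can be computed per unit as sketched below. The SNR convention (peak-to-peak amplitude of the mean waveform over the noise s.d.), the 1.5 ms refractory period and the acceptance thresholds are assumptions for illustration; SpikeMAP's own definitions may differ.

```python
def snr(mean_waveform, noise_sd):
    """Peak-to-peak amplitude of the mean waveform divided by the noise s.d."""
    return (max(mean_waveform) - min(mean_waveform)) / noise_sd


def rpv_fraction(spike_times_s, refractory_s=0.0015):
    """Fraction of inter-spike intervals shorter than the (assumed) refractory period."""
    t = sorted(spike_times_s)
    isis = [b - a for a, b in zip(t, t[1:])]
    return sum(d < refractory_s for d in isis) / len(isis) if isis else 0.0


def passes_quality(mean_waveform, noise_sd, spike_times_s, min_snr=5.0, max_rpv=0.01):
    """Hypothetical acceptance rule combining both metrics; thresholds are placeholders."""
    return snr(mean_waveform, noise_sd) >= min_snr and rpv_fraction(spike_times_s) <= max_rpv


wave = [0.0, -40.0, 20.0, 0.0]   # toy mean waveform in µV
print(snr(wave, 10.0), rpv_fraction([0.0, 0.001, 0.1, 0.2, 0.3]))
```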

      (8) Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than Excitatory ones, whilst they should be in theory.

      We have added figures showing the distribution of E and I firing rates across a population of N=1,950 putative cells (Supplemental Figure 3). Firing rates of inhibitory neurons are marginally higher than excitatory neurons, and both E and I follow an approximately exponential distribution of rates.

      The reviewer may be right that there are more I neurons at the borders in Fig. 3B. Because injections were done in medial prefrontal cortex, this may reflect an experimental artefact related to a high probability of activating I neurons in locations where the opsin was activated. We added a sentence to the text to clarify this point (line 201):

      “It is possible that the spatial location of putative I cells reflects the site of injection of the opsin in medial prefrontal cortex.”

      (9) For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].

      The reviewer is correct to point out that the spike-sorting portion of our pipeline shares similarities with related approaches. Other aspects, however, are unique to SpikeMAP. We have clarified the text accordingly:

      “In sum, SpikeMAP provides an end-to-end pipeline to perform spike-sorting on high-density multielectrode arrays. Some elements of this pipeline are similar to related approaches (Table 1), including the use of voltage filtering, PCA, and k-means clustering. Other elements are novel, including the use of spline interpolation, LDA, and the ability to identify putative excitatory and inhibitory cells.”

      (10) Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      Again, we apologize for the rendering issues in the Figures that occurred during conversion into PDF format. We have now ensured that all figures are properly displayed.

      (11) I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data. I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      Details of the open access data are now provided in Supplemental Table 1. We also clarified Figure 5B:

      “Quantification of change in firing rate following optogenetic stimulation. Average firing rates are taken over four recordings obtained from 3 mice.”

      (12) While there is no doubt that GT data as the ones recorded here by the authors are the most interesting data from a validation point of view, the pretty low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rate patterns for excitatory and inhibitory cells, and thus, the authors could test how good they are in discriminating the two subtypes.

      We agree with the reviewer that it would be worthwhile for future work to apply SpikeMAP to artificially generated spike trains, and have added the following (line 328):

      “Another avenue could involve applying SpikeMAP on artificially generated spike data (Buccino & Einevoll 2021; Laquitaine et al., 2024).”

      Reviewer #1 (Recommendations for the authors):

      (1) Line 154 seems to include a parenthetical expression left over from editing: "sensitive to noise (contamination? Better than noise?) generated by the signal of proximal units." See also line 186: "use (reliance?) of light-sensitive" and line 245: "In the absence of synaptic blockers (right?)," and line 270: "the size of the data prevents manual intervention (curation?)." Check carefully for all parentheses like that, which should be removed.

      Thank you for pointing this out. We have revised the text and removed parenthetical expressions left over from editing.

      (2) In lines 285-286, you state that: "k-mean clustering of spike waveform properties best differentiated the two principal classes of cells..." But I could not find where you compared k-means clustering to other methods. I think you only argued that k-means seemed to work well, not that it worked better than another method. If that is so, then you should probably rephrase those lines.

      The reviewer is correct that direct comparisons are not performed here, hence we removed this sentence.

      (3) Methods section, E/I classification, lines 396-405: You give us figures on what fraction was E and I (PV subtype) (94.75% and 5.25%), but there is more that you could have said. First of all, what is the expected fraction of parvalbumin-sensitive interneurons in the cortex - is it near 5%?

      We clarified the text as follows (line 444): “This number is close to the expected percentage of PV interneurons in cortex (4-6%) (Markram et al. 2004).”

      Second, how would these percentages change if you altered the threshold from 3 s.d. to something lower, like 2 s.d.? Giving us some idea of how the threshold affects the fraction of PV interneurons could give us an idea of whether this method agrees with our expectations or not.

      While SpikeMAP offers the flexibility to set the voltage threshold manually, we opted for a stringent threshold to demonstrate the capabilities of the software. As seen in Figure 2D, at 2 and 3 s.d., the signal is largely accounted for by Gaussian noise, while deviation from noise arises around 4 s.d. We clarified the text as follows (line 120):

      “At a threshold of -3 s.d., the signal could be largely accounted for by Gaussian noise, while a separation between signal and noise began around a threshold of -4 s.d.”
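      The Gaussian-noise argument above can be made quantitative: for pure Gaussian noise, the expected number of samples per second below -k s.d. is fs·P(Z < -k). The sketch assumes an 18 kHz sampling rate, as quoted by the reviewer; filtering correlates neighbouring samples but does not change this expected count.

```python
import math


def gaussian_tail(k):
    """P(Z < -k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def noise_samples_per_second(k, fs_hz=18_000):
    """Expected noise-only samples per second below -k s.d. (fs_hz is an assumption)."""
    return fs_hz * gaussian_tail(k)


for k in (2, 3, 4, 5):
    print(f"-{k} s.d.: {noise_samples_per_second(k):10.3f} noise samples/s per channel")
```

      At -3 s.d. this gives roughly 24 noise samples per second on every channel (on the order of 10^5 per second across 4096 channels), versus about 0.6 per channel per second at -4 s.d.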

      Third, did the inhibitory neurons identified by this optogenetic method also have narrow spike widths at half amplitude? Could you do a scatterplot of all the spike widths and inter-peak distances that had color-coded dots for E and I based on your optogenetic method?

      We have added a scatterplot (Supplemental Figure 5).
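      The two features requested for such a scatterplot can be extracted from each mean waveform as sketched below: the width of the trough at half its amplitude (with linear interpolation between samples) and the delay from the trough to the following peak. The sketch assumes a baseline near zero and a negative trough; narrow values of both features are typical of fast-spiking PV interneurons. Function names and the toy waveform are hypothetical.

```python
def half_width(wave, dt_ms):
    """Width (ms) of the trough at half its amplitude, using linear interpolation.

    Assumes a baseline near 0 and a negative trough.
    """
    i_min = min(range(len(wave)), key=lambda i: wave[i])
    half = wave[i_min] / 2.0
    i = i_min                                   # walk left while still below half
    while i > 0 and wave[i - 1] <= half:
        i -= 1
    left = (i - 1) + (half - wave[i - 1]) / (wave[i] - wave[i - 1]) if i > 0 else 0.0
    j = i_min                                   # walk right while still below half
    while j < len(wave) - 1 and wave[j + 1] <= half:
        j += 1
    right = (j + (half - wave[j]) / (wave[j + 1] - wave[j])
             if j < len(wave) - 1 else float(len(wave) - 1))
    return (right - left) * dt_ms


def trough_to_peak(wave, dt_ms):
    """Delay (ms) from the trough to the largest sample after it."""
    i_min = min(range(len(wave)), key=lambda i: wave[i])
    i_peak = max(range(i_min, len(wave)), key=lambda i: wave[i])
    return (i_peak - i_min) * dt_ms


DT_MS = 0.1   # toy sample spacing; 1/90 ms would match 90 kHz upsampled waveforms
wave = [0.0, 0.0, -2.0, -4.0, -2.0, 0.0, 1.0, 2.0, 1.0, 0.0]
print(half_width(wave, DT_MS), trough_to_peak(wave, DT_MS))
```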

      (4) Can you compare your methods with others now widely in use, for example SpyKING CIRCUS or Kilosort? You do that in Table 1 in terms of features, but not in terms of performance. For example, you could have applied Kilosort4 to your data from the 4096 electrode array and seen how often it sorted the same neurons that SpikeMAP did. I realize this could not give you a comparison of how many were E/I, but it could tell you how closely your numbers of neurons agreed with theirs. Were your numbers within 5% of each other? This would be helpful for groups who are already using Kilosort4.

      As mentioned earlier, packages listed in Table 1 do not provide an identification of putative E/I neurons on high-density electrode arrays. To facilitate the integration of SpikeMAP with other spike sorting packages, our suite now provides a stand-alone module to perform E/I identification. This is now mentioned in the text (see earlier comment).

      Reviewer #2 (Recommendations for the authors):

      I would encourage the authors to decide what the paper is about: is it about a new sorting method (and if yes, more tests/benchmarks are needed to explain the pros and the cons of the pipelines, and the Methods need to be expanded). Or is it about the new data for Ground Truth validation, and again, if yes, then maybe explain more what they are, how many slices/mice/cells, ... Maybe also consider making the data available online as an open dataset.

      We agree with the reviewer that the paper is best framed as a ground-truth validation of E/I identification. We now specify how many slices/mice/cells were used (see Supplemental Table 1) and make the data available online as an open dataset.

    1. eLife Assessment

      This is a valuable computational study of odor responses in the early olfactory system of insects and vertebrates. The study addresses the question of how information about odor concentration is encoded by second-order neurons in the invertebrate and vertebrate olfactory system; it offers insights into the transformation of neural signals from receptors to second-order neurons. While reanalysis of published data presents solid evidence supporting compression of concentration information, incomplete analysis is provided to resolve how this observation could be reconciled with the need to preserve information about changes in stimulus intensity. This work will be of interest to neuroscientists studying sensory processing broadly and olfaction specifically.

    2. Reviewer #1 (Public review):

      Summary

      This article is about the neural representation of odors in the early olfactory system of insects, fish, and rodents. Specifically, it regards the transformation that occurs between the olfactory sensory cells and the second-order neurons (projection neurons in insects, mitral/tufted cells in vertebrates). The central question is how the nervous system can encode both the identity of an odor and its concentration over many log units. The authors reanalyze data from experimental studies of odor responses in primary and secondary neurons, and test a range of computational models as to whether they match the observed transformation. They focus on two aspects of the second-order neuron response to odor concentration: the average activity across all neurons varies only a little with odor concentration, and different neurons have concentration-response curves with different shapes. They conclude that a model of divisive normalization can account for these effects, whereas two alternative models fail the test. A second observation is that tufted cells in the rodent system seem to undergo less normalization than mitral cells, and some reasons for this difference are proposed.
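      To make the summarized mechanism concrete, here is the canonical divisive-normalization form (Carandini & Heeger), r_i = x_i^n / (σ^n + Σ_j x_j^n), applied to hypothetical receptor inputs with log-spaced affinities. It is a generic illustration, not the model fitted in the study; σ = 0.5, n = 2 and the affinities are arbitrary choices.

```python
def receptor_response(c, k_half):
    """Hypothetical receptor activation: Hill curve with coefficient 1."""
    return c / (c + k_half)


def normalize(x, sigma=0.5, n=2):
    """Divisive normalization: r_i = x_i**n / (sigma**n + sum_j x_j**n)."""
    pool = sigma ** n + sum(v ** n for v in x)
    return [v ** n / pool for v in x]


K_HALF = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]   # hypothetical, log-spaced affinities

for c in [10.0 ** e for e in range(-6, 1)]:
    x = [receptor_response(c, k) for k in K_HALF]
    r = normalize(x)
    print(f"c={c:.0e}  mean input={sum(x) / len(x):.3f}  "
          f"mean output={sum(r) / len(r):.3f}  highest-affinity output={r[0]:.3f}")
```

      Across the four log units from 10^-4 to 1, the mean receptor input in this toy rises about threefold while the mean normalized response changes by less than 20%, and the highest-affinity unit's curve becomes non-monotonic: the two properties of second-order responses highlighted in the summary.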

      Strengths:

      (1) The work compares different models for normalization, rather than simply reporting success with one.

      (2) The analysis is applied to very diverse species, potentially revealing a common principle of olfactory processing.

      Weaknesses:

      (1) It is unclear that animals actually have a need to represent odor concentration over many log units in support of olfactory behaviors.

      (2) The stimuli used in the chosen experiments, and the measure of neural response, are only weakly related to any ecological need, e.g., during odor tracking.

      (3) Some of the comparisons between receptors and second-order neurons also compare across evolutionarily distant insect species that may not use the same coding principles.

      (4) The analysis ignores the dynamics of odor responses, which figure prominently in previous answers to the question of identity/intensity coding.

      (5) There is considerable prior consensus in the literature on the importance of normalization from primary to secondary neurons.

      Elaboration of my comments:

      (1) Motivation

      The article starts from the premise that animals need to know the absolute concentration of an odor over many log units, but the need for this isn't obvious. The introduction cites an analogy to vision and audition. These are cases where we know for a fact that the absolute intensity of the stimulus is not relevant. Instead, sensory perception relies on processing small differences in intensity across space or time. And to maintain that sensitivity to small differences, the system discards the stimulus baseline. Humans are notoriously bad at judging the absolute light level. That information gets discarded even before light reaches the retina, namely through contraction of the pupil. Similarly, it seems plausible that a behavior like olfactory tracking relies on sensing small gradients across time (when weaving back and forth across the track) or space (across nostrils). It is important that the system function over many log units of concentration (e.g., far and close to a source) but not that it accurately represents what that current concentration is [see e.g., Wachowiak et al, 2025 Recalibrating Olfactory Neuroscience..].

      Still, many experiments in olfactory research have delivered square pulses of odor at concentrations spanning many log units, rather than the sorts of stimuli an animal might encounter during tracking. Even within that framework, though, it doesn't seem mysterious anymore how odor identity and odor concentration are represented differently. For example, Stopfer et al 2003 showed that the population response of locust PNs traces a dynamic trajectory. Trajectories for a given odor form a manifold, within which trajectories for different concentrations are distinct by their excursions on the manifold. To see this, one must recognize that the PN responds to an odor pulse with a time-varying firing rate, that different PNs have different dynamics, and that the dynamics can change with concentration. This is also well recognized in the mammalian systems. Much has been written about the topic of dynamic coding of identity and intensity - see the reviews of Laurent (2002) and Uchida (2014).

      (2) Conceptual

      Given the above comments on the dynamics of odor responses in first- and second-order neurons, it seems insufficient to capture the response of a neuron with a single number. Even if one somehow had to use a single number, the mean firing rate during the odor pulse may not be the best choice. For example, the rodent mitral cells fire in rhythm with the animal's sniffing cycle, and certain odors will just shift the phase of the rhythm without changing the total number of spikes (see e.g., Fantana et al, 2008). During olfactory search or tracking, the sub-second movements of the animal in the odor landscape get superposed on the sniffing cycle. Given all this, it seems unlikely that the total number of spikes from a neuron in a 4-second period is going to be a relevant variable for neural processing downstream.

      Much of the analysis focuses on the mean activity of the entire population. Why is this an interesting quantity? Apparently, the mean stays similar because some neurons increase and others decrease their firing rate. It would be more revealing, perhaps, to show the distribution of firing rates at different concentrations and see how that distribution is predicted by different models of normalization. This could provide a stronger test than just the mean.

      The question "if concentration information is discarded in second-order neurons, which exclusively transmit odor information to the rest of the brain, how does the brain support olfactory behaviors, such as tracking and navigation?" is really not an open question anymore. For example, reference 23 reports in the abstract that "Odorant concentration had no systematic effect on spike counts, indicating that rate cannot encode intensity. Instead, odor intensity can be encoded by temporal features of the population response. We found a subpopulation of rapid, largely concentration-invariant responses was followed by another population of responses whose latencies systematically decreased at higher concentrations."

      (3) Methods

      It would be useful to state early in the manuscript what kinds of stimuli are being considered and how the response of a neuron is summarized by one number. There are many alternative ways to treat both stimuli and responses.

      "The change in response across consecutive concentration levels may not be robust due to experimental noise and the somewhat limited range of concentrations sampled": Yes, a number of the curves just look like "no response". It would help the reader to show some examples of raw data, e.g. the time course of one neuron's firing rate to 4 concentrations, and for the authors to illustrate how they compress those responses into single numbers.

      "We then calculated the angle between these two slopes for each neuron and plotted a polar histogram of these angles." The methods suggest that this angle is the arctan of the ratio of the two slopes in the response curve. A ratio of 2 would result from a slope change from 0.0001 to 0.0002 (i.e., virtually no change in slope) or from 1 to 2 (a huge change). Those are completely different response curves. Is it reasonable to lump them into the same bin of the polar plot? This seems an unusual way to illustrate the diversity of response curve shapes.

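      To make the scale ambiguity concrete, here is a minimal sketch. It assumes the plotted angle is the arctangent of the two slopes, with the quadrant resolved from their signs (`math.atan2`); the helper names are illustrative, and the manuscript's exact formula is not reproduced here. Two curves whose slopes differ by four orders of magnitude map to the same angle. Only the radius of the point (`math.hypot`) keeps the difference.

```python
import math


def slope_angle_deg(slope_1: float, slope_2: float) -> float:
    """Polar angle (degrees) of the point (slope_1, slope_2).

    Assumption: the polar-histogram angle depends only on the ratio
    slope_2 / slope_1, with atan2 resolving the quadrant from the signs.
    """
    return math.degrees(math.atan2(slope_2, slope_1))


def slope_magnitude(slope_1: float, slope_2: float) -> float:
    """Radius of the same point: the information the angle alone discards."""
    return math.hypot(slope_1, slope_2)


# Slope change 0.0001 -> 0.0002 (virtually flat) vs. 1 -> 2 (steep).
flat_angle = slope_angle_deg(0.0001, 0.0002)
steep_angle = slope_angle_deg(1.0, 2.0)

print(f"angles: {flat_angle:.2f} vs {steep_angle:.2f} degrees")
print(f"radii:  {slope_magnitude(0.0001, 0.0002):.5f} vs {slope_magnitude(1.0, 2.0):.3f}")
```

      Both curves fall into the same angular bin, even though their radii differ by a factor of 10,000.
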
      The Drosophila OSN data are passed through normalization models and then compared to locust PN data. This seems dangerous, as flies and locusts are separated by about 300 million years of evolution, and we don't know that fly PNs act like locust PNs. Their antennal lobe anatomy differs in many ways, as does the olfactory physiology. To draw any conclusions about a change in neural representation, it would be preferable to have OSN and PN data from the same species.

      (4) Models of normalization

      One conclusion is that divisive normalization could account for some of the change in responses from receptors to 2nd order neurons. This seems to be well appreciated already [e.g., Olsen 2010, Papadopoulou 2011, minireview in Hong & Wilson 2013].

      Another claim is that subtractive normalization cannot perform that function. What model was used for subtractive normalization is unclear (there is an error in the Methods). It would be interesting if there were a categorical difference between divisive and subtractive normalization.

      Looking closer at the divisive normalization model, it really has two components: (a) the "lateral inhibition" by which a neuron gets suppressed if other neurons fire (here scaled by the parameter k), and (b) a nonlinear sigmoid transformation (determined by the parameters n and sigma). Both lateral inhibition and nonlinearity are known to contribute to decorrelation in a neural population (e.g., Pitkow 2012). The "intraglomerular gain control" contains only the nonlinearity. What the "subtractive normalization" model contains, we don't know. But to put divisive and subtractive inhibition on the same footing, one should add a sigmoid nonlinearity in both cases.
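
      A minimal sketch of what "the same footing" could look like. It assumes a Hill-type saturating nonlinearity with exponent `n` and half-saturation `sigma`, and a lateral pool equal to the population mean, scaled by `k`. These functional forms are illustrative assumptions and are not the manuscript's equations, which are unclear for the subtractive case. With `k = 0`, both normalization models collapse to the purely intraglomerular nonlinearity, so any difference between them comes only from how the lateral term enters.

```python
def hill(u, n, sigma):
    """Saturating nonlinearity u**n / (sigma**n + u**n); negative inputs are clipped to 0."""
    u = max(u, 0.0)
    return u**n / (sigma**n + u**n)


def intraglomerular(x, n, sigma):
    """Each channel passes through the nonlinearity on its own (no lateral term)."""
    return [hill(xi, n, sigma) for xi in x]


def divisive(x, n, sigma, k):
    """Lateral pool enters the denominator, scaled by k (illustrative form; inputs >= 0)."""
    pool = sum(xi**n for xi in x) / len(x)
    return [xi**n / (sigma**n + xi**n + k * pool) for xi in x]


def subtractive(x, n, sigma, k):
    """Lateral pool is subtracted from the input, then the same nonlinearity applies."""
    pool = sum(x) / len(x)
    return [hill(xi - k * pool, n, sigma) for xi in x]


x = [1.0, 2.0, 4.0, 8.0]  # hypothetical first-order inputs (all non-negative)
params = dict(n=1.5, sigma=2.0)

base = intraglomerular(x, **params)
div_k0, sub_k0 = divisive(x, k=0.0, **params), subtractive(x, k=0.0, **params)
div_k1, sub_k1 = divisive(x, k=1.0, **params), subtractive(x, k=1.0, **params)

print("intraglomerular: ", [round(v, 3) for v in base])
print("divisive, k=1:   ", [round(v, 3) for v in div_k1])
print("subtractive, k=1:", [round(v, 3) for v in sub_k1])
```

      With k = 0, the three functions return identical values. With k > 0, each normalized output is at most the intraglomerular output, so the two models can then be compared on equal terms, for example on the population mean across concentrations.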

      The response models could be made more realistic in other ways. For example, in both locusts and fish, the 2nd order neurons get inputs from multiple receptor types; presumably, that will affect their response functions. Also, lateral inhibition can take quite different forms. In locusts, the inhibitory neurons seem to collect from many glomeruli. But in rats, the inhibition by short axon cells may originate from just a few sparse glomeruli, and those might be different for every mitral cell (Fantana 2008).

      (5) Tufted cells

      The following statements raise questions: "traded-off energy for faster and finer concentration discrimination" and "an additional type of second-order neuron (tufted cells) that has evolved in land vertebrates and that outperforms mitral cells in concentration encoding" and later "These results suggest a trade-off between concentration decoding and normalization processes, which prevent saturation and reduce energy consumption." Are the tufted cells inferior to the mitral cells in any respect? Do they suffer from saturation at high concentration? And do they then fail in their postulated role for odor tracking? If not, then what was the evolutionary driver for normalization in the mitral cell pathway? Certainly not lower energy consumption (50,000 mitral cells = 1% of rod photoreceptors, each of which consumes far more energy than a mitral cell).

    3. Reviewer #2 (Public review):

      Summary:

      The main goal of this study is to examine how information about odor concentration is encoded by second-order neurons in the invertebrate and vertebrate olfactory systems. In many animal models, the overall mean firing rates across the second-order neurons appear to be relatively flat or near constant with increasing odor intensity. While such compression of concentration information could aid in achieving concentration-invariant recognition of odor identity, how this observation could be reconciled with the need to preserve information about changes in stimulus intensity is a major focus of the study. The authors show that second-order neurons have 'diverse' dose-response curves and that the combinations of neurons activated (particularly the rank order) differ with concentration. Further, they argue that a single circuit-level computation, termed 'divisive normalization,' where the individual neural response is normalized by the total activity across all neurons, could help explain the coding properties of neurons at this stage of processing in all model organisms examined. They present rate-based and timing-based approaches to reading out concentration information. Finally, the authors reveal that tufted cells in the mouse olfactory bulb provide an exception to this coding approach and encode concentration information with a monotonic increase in firing rates.

      Strengths:

      (1) Comparative analysis of odor intensity coding across four different species, revealing features of stimulus encoding that are common across them, is highly valuable.

      (2) Showing how mitral and tufted cells differ in encoding odor intensity is potentially very important to the field.

      (3) How to preserve concentration information while compressing it with divisive normalization is also a novel and important problem in the field of sensory coding.

      Weaknesses:

      (1) The encoding problem:

      The main premise that divisive normalization generates this diversity of dose-response curves in the second-order neurons is a little problematic. The authors acknowledge this as part of their analysis in Figure 3.

      "Therefore, divisive normalization mostly does not alter the relative contribution (rank order) of each neuron in the ensemble." (Page 4, last paragraph, lines 6-8).

      The analysis in this figure indicates that divisive normalization does what it is supposed to do, i.e., it compresses concentration information and does not alter the rank order of neurons or the combinatorial patterns. Changes in the combinations of neurons activated with intensity arise directly from the fact that the first-order neurons did not have monotonic responses with odor intensity (i.e., crossovers). These crossovers, and not divisive normalization, were the necessary condition for changes in the combinatorial code.

      There seems to be a confusion/urge to attribute all coding properties found in the second-order neurons to 'divisive normalization.' If the input from sensory neurons is monotonic (i.e., no crossovers), then divisive normalization did not change the rank order, and the same combinations of neurons are activated in a similar fashion (same vector direction or combinatorial profile) to encode for different odor intensities. Concentration invariance is achieved, and concentration information is lost. However, when the first-order neurons are non-monotonic (i.e., with crossovers), that causes the second-order neurons to have different rank orders with different concentrations. Divisive normalization compresses information about concentrations, and rank-order differences preserve information about the odor concentration. Does this not mean that the non-monotonicity of sensory neuron response is vital for robustly maintaining information about odor concentration?
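
      The argument can be checked with a toy example. The sketch below assumes saturating first-order responses `rmax * c / (c + ec50)` and a divisive normalization step whose pool is shared by all neurons at a given concentration. All parameter values and helper names are hypothetical. Because the shared pool makes normalization a monotonic transform of each input, the rank order at every concentration is exactly that of the input. Rank changes across concentrations appear only when the inputs cross over.

```python
def first_order(c, ec50s, rmaxes):
    """Hypothetical saturating dose-response for each first-order neuron."""
    return [rmax * c / (c + ec50) for ec50, rmax in zip(ec50s, rmaxes)]


def divisive_normalize(x, n=1.5, sigma=1.0, k=1.0):
    """Divisive normalization with a pool shared by all neurons (illustrative form)."""
    pool = sum(v**n for v in x) / len(x)
    return [v**n / (sigma**n + v**n + k * pool) for v in x]


def rank_order(x):
    """Neuron indices sorted from strongest to weakest response."""
    return tuple(sorted(range(len(x)), key=lambda i: -x[i]))


concs = [0.1, 1.0, 10.0, 100.0]
ec50s = [1.0, 10.0, 100.0]  # neuron 0 has the highest affinity
inputs = {
    "no crossover": [1.0, 1.0, 1.0],  # equal maxima: the affinity order never flips
    "crossover": [1.0, 5.0, 25.0],    # the high-affinity neuron saturates lowest
}

for label, rmaxes in inputs.items():
    orders_in, orders_out = [], []
    for c in concs:
        x = first_order(c, ec50s, rmaxes)
        orders_in.append(rank_order(x))
        orders_out.append(rank_order(divisive_normalize(x)))
    # Normalization never changes the rank order at a given concentration...
    assert orders_in == orders_out
    # ...so distinct rank orders across concentrations come from the input alone.
    print(label, "-> distinct rank orders across concentrations:", len(set(orders_out)))
```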

      Naturally, the question arises whether many of the important features of the second-order neurons' responses simply follow the input. Or is my understanding of the figures and the write-up flawed, and are there more ways in which divisive normalization contributes to reshaping the second-order neural response? This must be clarified.

      Lastly, the tufted cells in the mouse OB are also driven by this sensory input with crossovers. How does the OB circuit convert the input with crossovers into one that is monotonic with concentration? I think that is an important question that this computational effort could clarify.

      (2) The decoding problem.

      The way the decoding results and analysis are presented does not add much information beyond what has already been presented. For example, based on the differences in rank order with concentration, I would expect the combinatorial code to be different. Hence, a very simple classifier based on cosine or correlation distance would work well. However, since divisive normalization (DN) is applied, I would expect a simple classification scheme that uses the Euclidean distance metric to work equally well after DN. Is this the case?

      Leave-one-trial/sample-out seems too conservative. How robust are the combinatorial patterns across trials? Would just one or two training trials suffice for creating templates for robust classification? Based on my prior experience (https://elifesciences.org/reviewed-preprints/89330), I do expect that the combinatorial patterns would be more robust to adaptation and hence also allow robust recognition of odor intensity across repeated encounters.
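
      A minimal sketch of this expectation. It uses hypothetical three-neuron population patterns and a simplified stand-in for divisive normalization (dividing each pattern by its population sum); the helper names are illustrative. When two concentrations evoke patterns that differ only in overall scale, cosine distance and Euclidean distance after normalization both treat them as identical, whereas raw Euclidean distance separates them. When the patterns differ in direction, as rank-order changes imply, all three metrics separate them, and a nearest-template classifier returns the correct label.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - dot / (math.hypot(*a) * math.hypot(*b)))


def euclidean(a, b):
    return math.dist(a, b)


def sum_normalize(x):
    """Simplified stand-in for divisive normalization: divide by the population sum."""
    total = sum(x)
    return [v / total for v in x]


def euclidean_after_dn(a, b):
    return euclidean(sum_normalize(a), sum_normalize(b))


def classify(sample, templates, metric):
    """Nearest-template classification under the given distance metric."""
    return min(templates, key=lambda label: metric(sample, templates[label]))


metrics = {"cosine": cosine_distance, "euclidean": euclidean, "euclidean after DN": euclidean_after_dn}

# Case 1: the high-concentration pattern is a scaled copy of the low one.
scaled = {"low": [1.0, 2.0, 4.0], "high": [2.0, 4.0, 8.0]}
# Case 2: the pattern direction changes (the rank order reverses).
rotated = {"low": [1.0, 2.0, 4.0], "high": [4.0, 2.0, 1.0]}

for name, metric in metrics.items():
    sep_scaled = metric(scaled["low"], scaled["high"])
    sep_rotated = metric(rotated["low"], rotated["high"])
    print(f"{name:>20}: scaled {sep_scaled:.3f}, rotated {sep_rotated:.3f}")

noisy_high = [3.8, 2.1, 1.2]  # hypothetical noisy trial at high concentration
predictions = {name: classify(noisy_high, rotated, m) for name, m in metrics.items()}
print(predictions)
```

      This only illustrates the logic on near-noise-free templates. Whether it holds on the recorded data, and with only one or two training trials, remains the empirical question.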

      Lastly, in the simulated data, since the affinity of the first-order sensory neurons to odorants is expected to be constant across concentration, and "Jaccard similarity between the sets of highest-affinity neurons for each pair of concentration levels was > 0.96," why would the rank-order change across concentration? DN should not alter the rank order.
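
      One point worth keeping in mind when reading that statistic: the Jaccard index compares unordered sets. A value near 1 says that the same neurons are among the highest-affinity ones, not that they keep their order. In the toy example below (hypothetical responses; `top_k` and `jaccard` are illustrative helpers), the top-4 sets are identical between concentrations, while the order within them fully reverses. This does not explain why the simulated ranks change. It only shows that the two measures capture different things.

```python
def top_k(responses, k):
    """Indices of the k most strongly responding neurons, strongest first."""
    return sorted(range(len(responses)), key=lambda i: -responses[i])[:k]


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


low_conc = [9.0, 8.0, 7.0, 6.0, 1.0, 0.5]   # hypothetical responses
high_conc = [6.0, 7.0, 8.0, 9.0, 1.0, 0.5]  # same top-4 neurons, reversed order

top_low, top_high = top_k(low_conc, 4), top_k(high_conc, 4)
print("Jaccard:", jaccard(top_low, top_high))
print("order at low:", top_low, "order at high:", top_high)
```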

      If the set of early responders does change, how will the decoder need to change, and what precise predictions can be made that can be tested experimentally? The lack of exploration of this aspect of the results seems like a missed opportunity.

      (3) Analysis of existing data.

      I had a couple of issues related to the presentation and analysis of prior results.

      i) Based on the methods, for Figures 1 and 2, it appears the responses across time, trials, and odorants were averaged to get a single data point per neuron for each concentration. Would this averaging not severely dilute trends in the data? The one that particularly concerns me is the averaging across different odorants. If you do odor-by-odor analysis, is the flattening of second-order neural responses still observable? Because some odorants activate more globally and some locally, I would expect a wide variety of dose-response relationships that vary with odor identity (more compressed in second-order neurons, of course). It would be good to show some representative neural responses and show how the extracted values for each neuron are a faithful/good representation of its response variation across intensities.
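
      A toy illustration of the dilution concern, using hypothetical numbers. Take one neuron whose response rises with concentration for one odorant and falls for another. Once the two are averaged, the result is a perfectly flat curve, even though neither odor-specific curve is flat.

```python
def mean_slope(curve):
    """Average change in response per concentration step."""
    steps = [b - a for a, b in zip(curve, curve[1:])]
    return sum(steps) / len(steps)


# Hypothetical responses of one neuron at four increasing concentrations.
odor_a = [1.0, 2.0, 3.0, 4.0]  # rises with concentration
odor_b = [4.0, 3.0, 2.0, 1.0]  # falls with concentration
averaged = [(a + b) / 2 for a, b in zip(odor_a, odor_b)]

print("per-odor slopes:", mean_slope(odor_a), mean_slope(odor_b))
print("averaged curve:", averaged, "slope:", mean_slope(averaged))
```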

      ii) Many neurons seem to have responses that flatline close to zero (both firing rate and dF/F in Figure 1). Are these responsive neurons? The mean dF/F also does not seem to rise significantly above zero. Hence, I was wondering whether including these neurons is significantly diluting the trend in the data.

      iii) I did not fully understand the need to show the increase in the odor response across concentrations as a polar plot, and I see potential issues with it. For example, consider the following dose-response trend at four intensities (C4 being the highest concentration and C1 the lowest): response at C3 > response at C1 and response at C4 > response at C2, but response at C3 < response at C2. This curve will fall in the top-right segment of the polar plot, even though the responses are not monotonic with concentration. So, I am not convinced that the polar plot is the right way to characterize the dose-response curves. Just my 2 cents.
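
      The example can be checked numerically. The sketch assumes, as the example implies, that the two plotted slopes are r(C3) - r(C1) and r(C4) - r(C2); the manuscript's exact definition may differ, and the helper names are illustrative. The hypothetical responses [1, 5, 3, 6] land in the same quadrant as a strictly monotonic curve. The three successive differences, which the author response proposes to show, expose the dip.

```python
import math


def two_slope_angle(r):
    """Polar angle (degrees), assuming s1 = r[C3] - r[C1] and s2 = r[C4] - r[C2]."""
    s1 = r[2] - r[0]
    s2 = r[3] - r[1]
    return math.degrees(math.atan2(s2, s1))


def successive_slopes(r):
    """Changes between adjacent concentrations (C1->C2, C2->C3, C3->C4)."""
    return [b - a for a, b in zip(r, r[1:])]


monotonic = [1.0, 2.0, 3.0, 4.0]
dipping = [1.0, 5.0, 3.0, 6.0]  # C3 > C1 and C4 > C2, but C3 < C2

for name, r in [("monotonic", monotonic), ("dipping", dipping)]:
    print(f"{name}: angle {two_slope_angle(r):.1f} deg, successive slopes {successive_slopes(r)}")
```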

      (4) Simulated vs. Actual data.

      In many analyses, simulated data were used (Figures 3 and 4). However, there is no comparison of how well the simulated data fit the experimental data. For example, the simulated first-order neuron in Figure 3D does not show a change in rank order. In Figure 3E, temporal response patterns in second-order neurons look unrealistic. Some objective comparison of simulated and experimental data would help bolster confidence in these results.

    4. Reviewer #3 (Public review):

      Summary:

      In their study, Shen et al. examine how first- and second-order neurons of early olfactory circuits among invertebrates and vertebrates alike respond to and encode odor identity and concentration. Previously published electrophysiological and imaging data are re-analyzed and complemented with computational simulations. The authors explore multiple potential circuit computations by which odor concentration-dependent increases in first-order neuron responses transform into concentration-invariant responses on average across the second-order neuron population, and report that divisive normalization exceeds subtractive normalization and intraglomerular gain control in accounting for this transformation. The authors then explore how either rate- or timing-based schemes in third-order neurons may decode odor identity and concentration information from such concentration-invariant mean responses across the second-order neuron population. Finally, the results of their study of second-order neurons (invertebrate projection neurons and vertebrate mitral cells) are contrasted with the concentration-variant responses of second-order projection tufted cells in mammals. Overall, through a combination of neural data re-analysis, computational simulation, and conceptual theory, this study provides important new understanding of how aspects of sensory information are encoded through the actions of distinct components of early olfactory circuits.

      Strengths:

      Consideration of multiple evolutionarily disparate olfactory circuits, as well as re-analysis of previously published neural data sets combined with novel simulations guided by those sets, lends considerable robustness to some key findings of this study. In particular, the finding that divisive normalization - with direct inspiration from established circuit components in the form of glomerular layer short-axon cells - accounts more thoroughly for the average concentration invariance of second-order olfactory neurons at a population level than other forms of normalization is compelling. Likewise, demonstration of the required 'crossover' of first-order neuron concentration sensitivity for divisive normalization to achieve such flattening of concentration variance across the second-order population is notable, with simulations providing important insight into experimentally observed patterns of first-order neuron responses. Limited clarity in other aspects of the study, in particular related to the consideration of neural response latencies and enumerated below, tempers the overall strength of the study.

      Weaknesses:

      (1) While the authors focus on concentration-dependent increases in first-order neuron activity, reflecting the majority of observed responses, recent work from the Imai group shows that odorants can also lead to direct first-order neuron inhibition (i.e., reduction in spontaneous activity), and within this subset, increasing odorant concentration tends to increase the degree of inhibition. Some discussion of these findings and how they may complement divisive normalization to contribute to the diverse second-order neuron concentration-dependence would be of interest and help expand the context of the current results.

      (2) Related to the above point, odorant-evoked inhibition of second-order neurons is widespread in mammalian mitral cells and significantly contributes to the flattened concentration-dependence of mitral cells at the population level. Such responses are clearly seen in Figure 1D. Some discussion of how odorant-evoked mitral cell inhibition may complement divisive normalization, and likewise relate to comparatively lower levels of odorant-evoked inhibition among tufted cells, would further expand the context of the current results. Toward this end, replication of analyses in Figures 1D and E following exclusion of mitral cell inhibitory responses would provide insight into the contribution of such inhibition to the flattening of the mitral cell population concentration dependence.

      (3) The idea of concentration-dependent crossover responses across the first-order population being required for divisive normalization to generate individually diverse concentration response functions across the second-order population is notable. The intuition of the crossover responses is that first-order neurons that respond most sensitively to any particular odorant (i.e., at the lowest concentration) respond with overall lower activity at higher concentrations than other first-order neurons less sensitively tuned to the odorant. Whether this is a consistent, generalizable property of odorant binding and first-order neuron responsiveness is not addressed by the authors, however. Biologically, one mechanism that may support such crossover events is intraglomerular presynaptic/feedback inhibition, which would be expected to increase with increasing first-order neuron activation such that the most-sensitively responding first-order neurons would also recruit the strongest inhibition as concentration increases, enabling other first-order neurons to begin to respond more strongly. Discussion of this and/or other biological mechanisms (e.g., first-order neuron depolarization block) supporting such crossover responses would strengthen these results.

      (4) It is unclear to what degree the latency analysis considered in Figures 4D-H works with the overall framework of divisive normalization, which, as Figure 3 shows, depends on first-order neuron crossover in concentration response functions. Figure 4D suggests that all first-order neurons respond with the same response amplitude (R in eq. 3), even though this is supposed to be pulled from a distribution. It is possible that Figure 4D is plotting normalized response functions to highlight the difference in latency, but this is not clear from the plot or caption. If response amplitudes are all the same, and the response curves are, as plotted in Figure 4D, identical except for their time to half-max, then it seems somewhat trivial that the resulting second-order neuron activation will follow the same latency ranking, regardless of whether divisive normalization exists or not. However, there is some small jitter in these rankings across concentrations (Figure 4G), suggesting there is some randomness to the simulations. It would be helpful if this were clarified (e.g., by showing a non-normalized Figure 4D, with different response amplitudes). More broadly, to evaluate latency coding within the proposed framework, it would be extremely helpful if the authors clarified whether the simulated first-order neuron response timecourses, once potentially different amplitudes (R) are factored in and responses are averaged across the entire response window, reproduce the concentration response crossovers observed experimentally. In summary, in the present manuscript, it remains unclear whether concentration crossovers are captured in the latency simulations, and if not, the authors do not clearly address what impact such variation in response amplitudes across concentrations may have on the latency results. It is further unclear to what degree divisive normalization is necessary for the second-order neurons to establish and maintain their latency ranks across concentrations, or to exhibit concentration-dependent changes in latency.

      (5) How the authors get from Figure 4G to 4H is not clear. Figure 4G shows second-order neuron response latencies across concentrations, with ordering based on their sorted latency to low concentration. This shows that very few neurons appear to change latency ranks going from low to high concentration, with a change in rank appearing as any deviation in a monotonically increasing trend. Focusing on the high-concentration points, there appear to be 2 latency ranks switched in the first 10 responding neurons (reflecting the 1 downward dip in the points around neuron 8), rather than the 7 stated in the text. Across the first 50 responding neurons, I see only ~14 potential switches (reflecting the ~7 downward dips in the points around neurons 8, 20, 32, 33, 41, 44, 50), rather than the 32 stated in the text. It is possible that the unaccounted rank changes reflect fairly minute differences in latencies that are not visible in the plot in Figure 4G. This may be clarified by plotting each neuron's latency at low concentration vs. high concentration (i.e., similar to Figure 4H, but plotting absolute latency, not latency rank) to allow assessment of the absolute changes. If such minute differences are not driving latency rank changes in Fig. 4G, then a trend much closer to the unity line would be expected in Figure 4H. Instead, however, there are many massive deviations from unity, even within the first 50 responding neurons plotted in Figure 4G. These deviations include a jump in latency rank from 2 at low concentration to ~48 at high concentration. Such a jump is simply not seen in Figure 4G.
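
      One way to make the comparison concrete is to count rank changes explicitly. In the hypothetical ten-neuron example below, a single neuron moves from latency rank 2 to rank 9. This produces just one downward dip in a plot sorted by low-concentration latency, yet eight neurons change rank, because every neuron it is overtaken by shifts by one. The same neuron also appears as a conspicuous outlier at position 2 of such a plot. The helper names are illustrative, and nothing here reproduces the authors' actual analysis.

```python
def latency_ranks(latencies):
    """Rank of each neuron by latency (1 = earliest responder)."""
    order = sorted(range(len(latencies)), key=lambda i: latencies[i])
    ranks = [0] * len(latencies)
    for rank, neuron in enumerate(order, start=1):
        ranks[neuron] = rank
    return ranks


def count_rank_changes(low, high):
    """Number of neurons whose latency rank differs between the two concentrations."""
    return sum(a != b for a, b in zip(latency_ranks(low), latency_ranks(high)))


def count_dips(low, high):
    """Downward steps in high-concentration latency, neurons sorted by low-concentration latency."""
    order = sorted(range(len(low)), key=lambda i: low[i])
    seq = [high[i] for i in order]
    return sum(b < a for a, b in zip(seq, seq[1:]))


# Hypothetical latencies (ms); only neuron 1 changes, from 2nd earliest to 9th.
low = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
high = [10, 95, 30, 40, 50, 60, 70, 80, 90, 100]

print("dips:", count_dips(low, high))
print("neurons with changed rank:", count_rank_changes(low, high))
print("neuron 1 rank:", latency_ranks(low)[1], "->", latency_ranks(high)[1])
```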

      (6) In the text, the authors state that "Odor identity can be encoded by the set of highest-affinity neurons (which remains invariant across concentrations)." Presumably, this is a restatement of the primacy model and refers to invariance in latency rank (since the authors have not shown that the highest-affinity neurons have invariant response amplitudes across concentration). To what degree this statement holds given the results in Figure 4H, however, which appear to show that some neurons with the earliest latency rank at low concentration jump to much later latency ranks at high concentration, remains unclear. Such changes in latency rank for only a few of the first responding neurons may be negligible for classifying odor identity among a small handful of odorants, but not among 1-2 orders of magnitude more odors, which may feasibly occur in a natural setting. Collectively, these issues with the execution and presentation of the latency analysis make it unclear how robust the latency results are.

      (7) Analysis in Figures 4A-C shows that concentration can be decoded from first-order neurons, second-order neurons, or first-order neurons with divisive normalization imposed (i.e., simulating second-order responses). This does not say that divisive normalization is necessary to encode concentration, however. Therefore, for the authors to say that divisive normalization is "a potential mechanism for generating odor-specific subsets of second-order neurons whose combinatorial activity or whose response latencies represent concentration information" seems too strong a conclusion. Divisive normalization is not generating the concentration information, since that can be decoded just as well from the first-order neurons. Rather, divisive normalization can account for the different population patterns in concentration response functions between first- and second-order neurons without discarding concentration-dependent information.

      (8) Performing the same polar histogram analysis of tufted vs. mitral cell concentration response functions (Figure 5B) provides a compelling new visualization of how these two cell types differ in their concentration variance. The projected importance of tufted cells to navigation, emerging directly through the inverse relationship between average concentration and distance (Figure 5C), is not surprising, and is largely a conceptual analysis rather than new quantitative analysis per se, but nevertheless, this is an important point to make. Another important consideration absent from this section, however, is whether and how divisive normalization may impact tufted cell activity. Previous work from the authors, as well as from the Schoppa, Shipley, and Westbrook labs, has compellingly demonstrated that a major circuit mediating divisive normalization of mitral cells (GABA/DAergic short-axon cells) directly targets external tufted cells, and is thus very likely to also influence projection tufted cells. Such analysis would additionally provide substantially more justification for the Discussion statement "we analyzed an additional type of second-order neuron (tufted cells)", which at present instead reflects fairly minimal analysis.

    5. Author response:

      (1) Explore the temporal component of neural responses (instead of collapsing responses to a single number, i.e., the average response over 4s), and determine which of the three models can recapitulate the observed dynamics.

      (2) Expand the polar plot visualization to show all three slopes (changes in responses across all three successive concentrations) instead of only two slopes.

      (3) Attempt to collect and analyze, from published papers, data of: (a) first-order neuron responses to odors to determine the role of first-order inhibition towards generating non-monotonic responses, and (b) PN responses in Drosophila to properly compare with corresponding first-order neuron responses.

      (4) Further discuss: (a) why the brain may need to encode absolute concentration, (b) the distinction between non-monotonic responses and cross-over responses, and (c) potential limitations of the primacy model.

      (5) Expand the divisive normalization model by evaluating different values of k and R, and study the effects of divisive normalization on tufted cells.

      (6) Add discussion of other potential inhibitory mechanisms that could contribute towards the observed effects.

      Reviewer #1:

      The article starts from the premise that animals need to know the absolute concentration of an odor over many log units, but the need for this isn't obvious. The introduction cites an analogy to vision and audition. These are cases where we know for a fact that the absolute intensity of the stimulus is not relevant. Instead, sensory perception relies on processing small differences in intensity across space or time. And to maintain that sensitivity to small differences, the system discards the stimulus baseline. Humans are notoriously bad at judging the absolute light level. That information gets discarded even before light reaches the retina, namely through contraction of the pupil. Similarly, it seems plausible that a behavior like olfactory tracking relies on sensing small gradients across time (when weaving back and forth across the track) or space (across nostrils). It is important that the system function over many log units of concentration (e.g., far and close to a source) but not that it accurately represents what that current concentration is [see e.g., Wachowiak et al, 2025 Recalibrating Olfactory Neuroscience..].

      We thank the Reviewer for the insightful input and agree that gradients across time and space are important for various olfactory behaviors, such as tracking. At the same time, we think that absolute concentration is also needed for two reasons. First, in order to extract changes in concentration, the absolute concentration needs to be normalized out; i.e., change needs to be encoded with respect to some baseline, which is what divisive normalization computes. Second, while it is true that representing the exact number of odor molecules present is not important, this number directly relates to distance from the odor source, which does provide ethological value (e.g., is the tiger 100 m or 1000 m away?). Indeed, our decoding experiments focused on discriminating relative, not absolute, concentrations by classifying between each pair of concentrations (i.e., relative distances), which is effectively an assessment of the gradient. In our revision, we will make all of these points clearer.

      Still, many experiments in olfactory research have delivered square pulses of odor at concentrations spanning many log units, rather than the sorts of stimuli an animal might encounter during tracking. Even within that framework, though, it doesn't seem mysterious anymore how odor identity and odor concentration are represented differently. For example, Stopfer et al 2003 showed that the population response of locust PNs traces a dynamic trajectory. Trajectories for a given odor form a manifold, within which trajectories for different concentrations are distinct by their excursions on the manifold. To see this, one must recognize that the PN responds to an odor pulse with a time-varying firing rate, that different PNs have different dynamics, and that the dynamics can change with concentration. This is also well recognized in the mammalian systems. Much has been written about the topic of dynamic coding of identity and intensity - see the reviews of Laurent (2002) and Uchida (2014).

      Given the above comments on the dynamics of odor responses in first- and second-order neurons, it seems insufficient to capture the response of a neuron with a single number. Even if one somehow had to use a single number, the mean firing rate during the odor pulse may not be the best choice. For example, the rodent mitral cells fire in rhythm with the animal's sniffing cycle, and certain odors will just shift the phase of the rhythm without changing the total number of spikes (see e.g., Fantana et al, 2008). During olfactory search or tracking, the sub-second movements of the animal in the odor landscape get superposed on the sniffing cycle. Given all this, it seems unlikely that the total number of spikes from a neuron in a 4-second period is going to be a relevant variable for neural processing downstream.

      To our knowledge, it is not well understood how downstream brain regions read out mitral cell responses to guide olfactory behavior. The olfactory bulb projects to more than a dozen brain regions, and different regions could decode signals in different ways. We focused on the mean response because it is a simple, natural construct.

      The datasets we analyzed may not include all relevant timing information; for example, the mouse data is from calcium imaging studies that did not track sniff timing. Nonetheless, we plan to address this comment within our framework by binning time into smaller-sized windows (e.g., 0-0.2s, 0.2-0.4s, etc.) and repeating our analysis for each of these windows. Specifically, we will determine how each normalization method fares in recapitulating statistics of the population responses of each window, beyond simply assessing the population mean.
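
      A minimal sketch of the windowed analysis, assuming each trial is stored as a list of (time in seconds, response) samples for one neuron; the function name `window_means` and the 4 s default span are illustrative assumptions, not properties of the analyzed datasets. Each window's mean can then be passed to the same normalization comparison used for the full-window mean.

```python
def window_means(samples, t_start=0.0, t_stop=4.0, width=0.2):
    """Average one neuron's response within consecutive, non-overlapping time windows.

    samples: iterable of (time_s, value) pairs from a single trial.
    Returns one mean per window ([0, 0.2), [0.2, 0.4), ... by default);
    a window containing no samples yields None. Samples lying exactly on a
    window boundary may fall on either side because of floating-point rounding.
    """
    n_windows = int(round((t_stop - t_start) / width))
    sums = [0.0] * n_windows
    counts = [0] * n_windows
    for t, value in samples:
        if t_start <= t < t_stop:
            # Clamp the index so rounding just below t_stop stays in range.
            idx = min(int((t - t_start) / width), n_windows - 1)
            sums[idx] += value
            counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

      For example, `window_means([(0.05, 1.0), (0.15, 3.0), (0.25, 5.0)], t_stop=0.6)` returns one mean per 0.2 s window, with None for the empty third window.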

      Much of the analysis focuses on the mean activity of the entire population. Why is this an interesting quantity? Apparently, the mean stays similar because some neurons increase and others decrease their firing rate. It would be more revealing, perhaps, to show the distribution of firing rates at different concentrations and see how that distribution is predicted by different models of normalization. This could provide a stronger test than just the mean.

      We agree that mean activity is only one measure to summarize a rich data set and will perform the suggested analysis.

      The question "if concentration information is discarded in second-order neurons, which exclusively transmit odor information to the rest of the brain, how does the brain support olfactory behaviors, such as tracking and navigation?" is really not an open question anymore. For example, reference 23 reports in the abstract that "Odorant concentration had no systematic effect on spike counts, indicating that rate cannot encode intensity. Instead, odor intensity can be encoded by temporal features of the population response. We found a subpopulation of rapid, largely concentration-invariant responses was followed by another population of responses whose latencies systematically decreased at higher concentrations."

      Primacy coding does provide one plausible mechanism to decode concentration. Our manuscript demonstrated how such a code could emerge in second-order neurons with the help of divisive normalization, though it does require maintaining at least partial rank invariance across concentrations, which may not be robust. We also showed how concentration could be decoded via spike rates, even if average rates are constant, which provides an alternative hypothesis to that of ref 23.

      Further, ref 23 only considers the piriform cortex, which, as mentioned above, is one of many targets of the olfactory bulb, and the decoding mechanisms of each of these targets remain unclear. In addition, work from the authors of ref 23 found multiple potential decoding strategies in the piriform cortex itself, including changes in firing rate (see Fig. 2E of ref. 23, Bolding & Franks, 2017; as well as Fig. 4 in Roland et al., 2017).

      It would be useful to state early in the manuscript what kinds of stimuli are being considered and how the response of a neuron is summarized by one number. There are many alternative ways to treat both stimuli and responses.

      We will add this explanation to the manuscript.

      "The change in response across consecutive concentration levels may not be robust due to experimental noise and the somewhat limited range of concentrations sampled": Yes, a number of the curves just look like "no response". It would help the reader to show some examples of raw data, e.g. the time course of one neuron's firing rate to 4 concentrations, and for the authors to illustrate how they compress those responses into single numbers.

      We agree and will add this information to the manuscript.

      "We then calculated the angle between these two slopes for each neuron and plotted a polar histogram of these angles." The methods suggest that this angle is the arctan of the ratio of the two slopes in the response curve. A ratio of 2 would result from a slope change from 0.0001 to 0.0002 (i.e., virtually no change in slope) or from 1 to 2 (a huge change). Those are completely different response curves. Is it reasonable to lump them into the same bin of the polar plot? This seems an unusual way to illustrate the diversity of response curve shapes.

      We agree that the two changes in the reviewer’s example will be categorized in the same quadrant in our analysis. We did not focus on the absolute changes because our analysis covers many log ratios of concentrations. Instead, we focused on the relative shapes of the concentration response curves, and more specifically, the direction of the change (i.e., the sign of the slope). We will better motivate this style of analysis in the revision. Moreover, in response to comments by Reviewer 2, we will compare response shapes between all three successive levels of concentration changes, as opposed to only two levels.
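
      A minimal sketch of why the Reviewer's two examples share a bin, assuming the angle is computed as the two-argument arctangent of the (first slope, second slope) pair; the function name `slope_angle_deg` is hypothetical. Rescaling both slopes by the same positive factor leaves the angle unchanged, while the signs of the two slopes select the quadrant, which is the information this analysis is meant to retain.

```python
import math

def slope_angle_deg(slope_1, slope_2):
    """Angle in degrees, in [0, 360), of the point (slope_1, slope_2).

    Only the ratio and the signs of the slopes matter, not their magnitude:
    both positive -> first quadrant, both negative -> third quadrant.
    """
    return math.degrees(math.atan2(slope_2, slope_1)) % 360.0
```

      Both `slope_angle_deg(0.0001, 0.0002)` and `slope_angle_deg(1, 2)` give about 63.4 degrees, so a small and a large change in slope are binned together.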

      The Drosophila OSN data are passed through normalization models and then compared to locust PN data. This seems dangerous, as flies and locusts are separated by about 300 million years of evolution, and we don't know that fly PNs act like locust PNs. Their antennal lobe anatomy differs in many ways, as does the olfactory physiology. To draw any conclusions about a change in neural representation, it would be preferable to have OSN and PN data from the same species.

      We are in the process of requesting PN response data in Drosophila from groups that have collected such data and will repeat the analysis once we get access to the data.

      One conclusion is that divisive normalization could account for some of the change in responses from receptors to 2nd order neurons. This seems to be well appreciated already [e.g., Olsen 2010, Papadopoulou 2011, minireview in Hong & Wilson 2013].

      While we agree that these manuscripts do study the effects of divisive normalization in insects and fish, here we show that this computation also generalizes to rodents. In addition, these previous studies do not focus on divisive normalization’s role in concentration encoding/decoding, which is our focus. We will clarify this difference in the revision.

      Another claim is that subtractive normalization cannot perform that function. What model was used for subtractive normalization is unclear (there is an error in the Methods). It would be interesting if there were a categorical difference between divisive and subtractive normalization.

      We apologize for the mistake in the subtractive normalization equation and will correct it. Thank you for catching it.

      Looking closer at the divisive normalization model, it really has two components: (a) the "lateral inhibition" by which a neuron gets suppressed if other neurons fire (here scaled by the parameter k), and (b) a nonlinear sigmoid transformation (determined by the parameters n and sigma). Both lateral inhibition and nonlinearity are known to contribute to decorrelation in a neural population (e.g., Pitkow 2012). The "intraglomerular gain control" contains only the nonlinearity. The "subtractive normalization" we don't know. But if one wanted to put divisive and subtractive inhibition on the same footing, one should add a sigmoid nonlinearity in both cases.

      Our intent was not to place all the methods on the “same footing” but rather to isolate the two primary components of normalization methods – non-linearity and lateral inhibition – and determine which of these, and in which combination, could generate the desired effects. Divisive normalization incorporates both components, whereas intraglomerular gain control and subtractive normalization only incorporate one of these components. We will clarify this reasoning in the revision.
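
      To make the two components concrete, here is a minimal sketch assuming a Hill-type form similar in spirit to Olsen et al. (2010); the function name, the default parameter values, and the use of the population mean as the inhibitory pool are illustrative assumptions and may differ from the equation in the Methods. The sigmoid in each neuron's own input supplies the nonlinearity, and the k-scaled pool term supplies lateral inhibition; setting k = 0 leaves only the nonlinearity.

```python
def divisive_normalization(rates, k=1.0, n=1.5, sigma=1.0, r_max=1.0):
    """Hill-type divisive normalization of non-negative first-order rates (assumed form).

    output_i = r_max * r_i**n / (sigma**n + r_i**n + (k * pool)**n),
    where pool is the mean input across the whole population.
    """
    pool = sum(rates) / len(rates)
    lateral = (k * pool) ** n  # lateral-inhibition term shared by all neurons
    return [r_max * r ** n / (sigma ** n + r ** n + lateral) for r in rates]
```

      Because the lateral term is shared by all neurons at a given concentration and the expression increases with each neuron's own input, this form preserves the rank order at each concentration. A tenfold increase in every input, e.g., from [1, 2, 4] to [10, 20, 40], raises the population mean output only from about 0.40 to about 0.45 with the default parameters.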

      The response models could be made more realistic in other ways. For example, in both locusts and fish, the 2nd order neurons get inputs from multiple receptor types; presumably, that will affect their response functions. Also, lateral inhibition can take quite different forms. In locusts, the inhibitory neurons seem to collect from many glomeruli. But in rats, the inhibition by short axon cells may originate from just a few sparse glomeruli, and those might be different for every mitral cell (Fantana 2008).

      We thank the Reviewer for the input. Instead of fixing k for all second-order neurons, we will apply different k values for different neurons. We will also systematically vary the percentage of neurons used for the divisive normalization calculation in the denominator, and determine the regime under which the effects experimentally observed are reproducible. This approach takes into account the scenario that inter-glomerular inhibitory interactions are sparse.
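
      A minimal sketch of this variant, assuming a Hill-type form r_i^n / (sigma^n + r_i^n + (k_i * pool_i)^n) in which each neuron has its own k_i and its pool is the mean rate of a random subset of the other neurons; the function name, the random-subset scheme, and the fixed seed are illustrative assumptions rather than the final model. Sweeping `pool_fraction` from small values toward 1.0 moves from sparse inhibition toward broad, population-wide inhibition.

```python
import random

def sparse_divisive_normalization(rates, k_values, pool_fraction, n=1.5, sigma=1.0, seed=0):
    """Divisive normalization with per-neuron k and a sparse inhibitory pool (assumed form).

    For neuron i, the pool is the mean rate of a random subset containing
    pool_fraction (in (0, 1]) of the other neurons, at least one, scaled by
    k_values[i]. Rates are assumed non-negative; at least two neurons are needed.
    """
    # Fixed seed: repeated calls with the same population size (e.g., at different
    # concentrations) draw the same subsets, mimicking fixed connectivity.
    rng = random.Random(seed)
    out = []
    for i, r in enumerate(rates):
        others = [j for j in range(len(rates)) if j != i]
        size = max(1, round(pool_fraction * len(others)))
        subset = rng.sample(others, size)
        pool = sum(rates[j] for j in subset) / size
        out.append(r ** n / (sigma ** n + r ** n + (k_values[i] * pool) ** n))
    return out
```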

      There are questions raised by the following statements: "traded-off energy for faster and finer concentration discrimination" and "an additional type of second-order neuron (tufted cells) that has evolved in land vertebrates and that outperforms mitral cells in concentration encoding" and later "These results suggest a trade-off between concentration decoding and normalization processes, which prevent saturation and reduce energy consumption.". Are the tufted cells inferior to the mitral cells in any respect? Do they suffer from saturation at high concentration? And do they then fail in their postulated role for odor tracking? If not, then what was the evolutionary driver for normalization in the mitral cell pathway? Certainly not lower energy consumption (50,000 mitral cells = 1% of rod photoreceptors, each of which consumes way more energy than a mitral cell).

      The question of what mitral cells are “good for”, compared to tufted cells, remains unclear in our view. We speculate that mitral cells provide superior context-dependent processing and are better for determining stimuli-reward contingencies, but this remains far from settled experimentally.

      We believe the mitral cell pathway evolved earlier than tufted cells, since the former appear akin to projection neurons in insects. Nonetheless, we agree that differences in energy consumption are unlikely to be the primary distinguishing factor, and in the revision, we will drop this argument.

      Reviewer #2:

      The main premise that divisive normalization generates this diversity of dose-response curves in the second-order neurons is a little problematic. … The analysis in [Figure 3] indicates that divisive normalization does what it is supposed to do, i.e., it compresses concentration information and does not alter the rank order of neurons or the combinatorial patterns. Changes in the combinations of neurons activated with intensity arise directly from the fact that the first-order neurons did not have monotonic responses with odor intensity (i.e., crossovers). This, and not divisive normalization, was the necessary condition for changes in the combinatorial code. There seems to be a confusion/urge to attribute all coding properties found in the second-order neurons to 'divisive normalization.' If the input from sensory neurons is monotonic (i.e., no crossovers), then divisive normalization does not change the rank order, and the same combinations of neurons are activated in a similar fashion (same vector direction or combinatorial profile) to encode different odor intensities. Concentration invariance is achieved, and concentration information is lost. However, when the first-order neurons are non-monotonic (i.e., with crossovers), the second-order neurons have different rank orders at different concentrations. Divisive normalization compresses information about concentration, and rank-order differences preserve information about odor concentration. Does this not mean that the non-monotonicity of sensory neuron responses is vital for robustly maintaining information about odor concentration? Naturally, the question that arises is whether many of the important features of the second-order neurons' responses simply follow the input. Or is my understanding of the figures and the write-up flawed, and are there more ways in which divisive normalization contributes to reshaping the second-order neural response? This must be clarified.

      Lastly, the tufted cells in the mouse OB are also driven by this sensory input with crossovers. How does the OB circuit convert the input with crossovers into one that is monotonic with concentration? I think that is an important question that this computational effort could clarify.

      It appears that there is confusion about the definitions of “non-monotonicity” and “crossovers”. These are two independent concepts – one does not necessarily lead to the other. Non-monotonicity concerns the response of a single neuron to different concentration levels. A neuron’s response is considered non-monotonic if its response goes up then down, or down then up, across increasing concentrations. A “cross-over” is defined based on the responses of multiple neurons. A cross-over occurs when the response of one neuron is lower than that of another neuron at one concentration, but higher at a different concentration. For example, the responses of both neurons could increase monotonically with increasing concentration, but one neuron might start lower and grow faster, hence creating a cross-over. We will clarify this in the manuscript, which we believe will resolve the questions raised above.
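
      The two definitions can be made operational in a short sketch; the function names are hypothetical, responses are assumed to be listed in order of increasing concentration, and direction is judged from raw differences with no noise threshold. In the usage example, both neurons respond monotonically, yet their order flips, which is a cross-over without any non-monotonicity.

```python
from itertools import combinations

def is_monotonic(responses):
    """True if one neuron's responses never change direction as concentration increases."""
    diffs = [b - a for a, b in zip(responses, responses[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

def crossover_pairs(population):
    """Pairs (i, j) of neurons whose response order flips between two concentrations.

    population: list of per-neuron response lists over the same concentrations.
    """
    pairs = []
    for i, j in combinations(range(len(population)), 2):
        # Sign of (response_i - response_j) at each concentration: +1, 0, or -1.
        signs = {(a > b) - (a < b) for a, b in zip(population[i], population[j])}
        if 1 in signs and -1 in signs:
            pairs.append((i, j))
    return pairs
```

      For example, neurons with responses [1, 2, 3, 4] and [0, 2, 4, 6] are both monotonic, and `crossover_pairs([[1, 2, 3, 4], [0, 2, 4, 6]])` returns [(0, 1)]; a single neuron with responses [1, 3, 2, 0] is non-monotonic even though no second neuron is involved.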

      The way the decoding results and analysis are presented does not add a lot of information to what has already been presented. For example, based on the differences in rank-order with concentration, I would expect the combinatorial code to be different. Hence, a very simple classifier based on cosine or correlation distance would work well. However, since divisive normalization (DN) is applied, I would expect a simple classification scheme that uses the Euclidean distance metric to work equally as well after DN. Is this the case?

      Yes, we used a simple classification scheme, logistic regression with a linear kernel, which is essentially a Euclidean distance-based classification. This scheme works better for tufted cells because they are more monotonic; i.e., if neuron A and B both increase their responsiveness with concentration, then Euclidean distance would be fine. But if neuron A’s response amplitude goes up and neuron B’s response goes down – as often happens for mitral cells – then Euclidean distance does not work as well. We will add intuition about this in the manuscript.

      Leave-one-trial/sample-out seems too conservative. How robust are the combinatorial patterns across trials? Would just one or two training trials suffice for creating templates for robust classification? Based on my prior experience (https://elifesciences.org/reviewed-preprints/89330), I do expect that the combinatorial patterns would be more robust to adaptation and hence also allow robust recognition of odor intensity across repeated encounters.

      As suggested, we will compute, for each odor, the correlation coefficient between neural responses across trials, and we will repeat this analysis for both mitral and tufted cells. To determine the effect of adaptation, we will compare the correlation between responses on the 1st and 2nd trials with that between responses on the 1st and final trials.
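
      A minimal sketch of this comparison, assuming each trial is summarized as a population vector (one response per neuron) for a given odor and concentration, listed in presentation order; `pearson` and `adaptation_check` are hypothetical names. A clear drop from the first returned value to the second would point to adaptation across repeated presentations.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length population vectors (non-constant inputs)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adaptation_check(trials):
    """Return (r between trials 1 and 2, r between trial 1 and the final trial).

    trials: population vectors in presentation order; at least 3 trials assumed.
    """
    return pearson(trials[0], trials[1]), pearson(trials[0], trials[-1])
```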

      Lastly, in the simulated data, since the affinity of the first-order sensory neurons to odorants is expected to be constant across concentration, and "Jaccard similarity between the sets of highest-affinity neurons for each pair of concentration levels was > 0.96," why would the rank-order change across concentration? DN should not alter the rank order.

      We agree that divisive normalization should not alter the rank order, but the rank order may change in first-order neurons, which carries through to second-order neurons. This confusion may be related to the one mentioned above re: cross-overs vs non-monotonicity. Moreover, in the simulated data (Fig. 4D-H), the Jaccard similarity was calculated based on only the 50 neurons with the highest affinity, not the entire population of neurons. As shown in Fig. 4H, most of the rank-order change happens in the remaining 150 neurons.
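
      A toy example, with hypothetical function names and made-up values for six neurons, showing why a high Jaccard similarity between top-K sets is compatible with rank changes: the top three neurons form the same set at both concentrations, so the similarity is 1.0, even though the order within that set and among the remaining neurons changes.

```python
def top_k(responses, k):
    """Indices of the k neurons with the largest values (ties broken by lower index)."""
    order = sorted(range(len(responses)), key=lambda i: (-responses[i], i))
    return set(order[:k])

def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

low = [9, 8, 7, 3, 2, 1]   # toy responses at a low concentration
high = [7, 9, 8, 1, 3, 2]  # toy responses at a high concentration
print(jaccard(top_k(low, 3), top_k(high, 3)))  # prints 1.0
```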

      Note that in response to a comment by Reviewer 3, we will change the presentation of Fig. 4H in the revision.

      If the set of early responders does change, how will the decoder need to change, and what precise predictions can be made that can be tested experimentally? The lack of exploration of this aspect of the results seems like a missed opportunity.

      In the Discussion, we wrote about how downstream circuits will need to learn which set of neurons are to be associated with each distinct concentration level. We will expand upon this point and include experimentally testable predictions.

      Based on the methods, for Figures 1 and 2, it appears the responses across time, trials, and odorants were averaged to get a single data point per neuron for each concentration. Would this averaging not severely dilute trends in the data? The one that particularly concerns me is the averaging across different odorants. If you do odor-by-odor analysis, is the flattening of second-order neural responses still observable? Because some odorants activate more globally and some locally, I would expect a wide variety of dose-response relationships that vary with odor identity (more compressed in second-order neurons, of course). It would be good to show some representative neural responses and show how the extracted values for each neuron are a faithful/good representation of its response variation across intensities.

      It appears there is some confusion here; we will clarify in the text and figure captions that we did not average across different odors in our analysis. We will also add figure panels showing some representative neural responses as suggested by the Reviewer.

      A lot of neurons seem to have responses that flat line closer to zero (both firing rate and dF/F in Figure 1). Are these responsive neurons? The mean dF/F also seems to hover not significantly above zero. Hence, I was wondering if the number of neurons is reducing the trend in the data significantly.

      Yes. A neuron is considered responsive if it responds to at least one concentration level in at least 50% of the trials, so some neurons may respond to one concentration level and otherwise flatline near zero. We will highlight a few example neurons to visualize this scenario.

      I did not fully understand the need to show the increase in the odor response across concentrations as a polar plot. I see potential issues with the same. For example, the following dose-response trend at four intensities (C4 being the highest concentration and C1 the lowest): response at C3 > response at C1 and response at C4 > response at C2. But response at C3 < response at C2. Hence, it will be in the top right segment of the polar plot. However, the responses are not monotonic with concentrations. So, I am not convinced that the polar plot is the right way to characterize the dose-response curves. Just my 2 cents.

      Your 2 cents are valuable! Thank you for raising this point. Instead of computing two slopes (C1-C3 and C2-C4), we will expand our analysis to include all three slopes (C1-C2, C2-C3, C3-C4). Consequently, there are 2^3 = 8 different response shapes, and we will list them and quantify the fraction of the responses that fall into each shape category.
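
      A minimal sketch of the three-slope classification, assuming four concentration levels and treating a zero change as a decrease (tie handling is an assumption here, not something specified above); the function names are hypothetical. A response like the Reviewer's example, e.g., [1, 5, 3, 6], falls into the "+-+" category rather than being grouped with monotonically increasing responses.

```python
from itertools import product

# All 2**3 = 8 sign patterns for the changes C1->C2, C2->C3, C3->C4.
ALL_SHAPES = ["".join(signs) for signs in product("+-", repeat=3)]

def response_shape(responses):
    """Sign pattern of successive changes, e.g. "++-" ("+" increase, "-" otherwise)."""
    if len(responses) != 4:
        raise ValueError("expected responses at exactly four concentrations")
    return "".join("+" if b > a else "-" for a, b in zip(responses, responses[1:]))

def shape_fractions(population):
    """Fraction of neurons in each of the 8 shape categories."""
    counts = {shape: 0 for shape in ALL_SHAPES}
    for responses in population:
        counts[response_shape(responses)] += 1
    return {shape: c / len(population) for shape, c in counts.items()}
```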

      In many analyses, simulated data were used (Figures 3 and 4). However, there is no comparison of how well the simulated data fit the experimental data. For example, the Simulated 1st order neuron in Figure 3D does not show a change in rank-order for the first-order neuron. In Figure 3E, temporal response patterns in second-order neurons look unrealistic. Some objective comparison of simulated and experimental data would help bolster confidence in these results.

      We believe the Reviewer is referring to Figs. 4D and 4E, since Fig. 3D does not show a first-order neuron simulation, and there is no Fig 3E. In Fig. 4D there is no change of rank order because the simulation is for a single odor and single concentration level, and the change of rank-order (i.e., cross-overs) as we define occurs between concentration levels. We will clarify this in the manuscript.

      Reviewer #3:

      While the authors focus on concentration-dependent increases in first-order neuron activity, reflecting the majority of observed responses, recent work from the Imai group shows that odorants can also lead to direct first-order neuron inhibition (i.e., reduction in spontaneous activity), and within this subset, increasing odorant concentration tends to increase the degree of inhibition. Some discussion of these findings and how they may complement divisive normalization to contribute to the diverse second-order neuron concentration-dependence would be of interest and help expand the context of the current results.

      We thank the Reviewer for the suggestion. We will request datasets of first-order neuron responses from the groups who acquired them. We will analyze this data to determine the role of inhibition or antagonistic binding and quantify what percentage of first-order neurons respond less strongly with larger concentrations.

      Related to the above point, odorant-evoked inhibition of second-order neurons is widespread in mammalian mitral cells and significantly contributes to the flattened concentration-dependence of mitral cells at the population level. Such responses are clearly seen in Figure 1D. Some discussion of how odorant-evoked mitral cell inhibition may complement divisive normalization, and likewise relate to comparatively lower levels of odorant-evoked inhibition among tufted cells, would further expand the context of the current results. Toward this end, replication of analyses in Figures 1D and E following exclusion of mitral cell inhibitory responses would provide insight into the contribution of such inhibition to the flattening of the mitral cell population concentration dependence.

      We will perform the suggested analysis; specifically, we will set the negative mitral cell responses to 0 and assess whether the population mean remains flat.
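
      A minimal sketch of this control, assuming responses are stored per neuron as one value (e.g., mean dF/F) per concentration; the function name and the toy values are illustrative only and are not taken from the data. In the toy population, the rising response of one neuron is offset by the falling, inhibitory response of the other, so the mean stays flat until negative responses are set to 0.

```python
def population_mean_curve(population, drop_inhibition=False):
    """Mean response across neurons at each concentration.

    population: list of per-neuron response lists, one value per concentration.
    If drop_inhibition is True, negative (inhibitory) responses are set to 0 first.
    """
    means = []
    for c in range(len(population[0])):
        values = [neuron[c] for neuron in population]
        if drop_inhibition:
            values = [max(v, 0.0) for v in values]
        means.append(sum(values) / len(values))
    return means

toy = [[0.1, 0.3, 0.5], [0.0, -0.2, -0.4]]  # one excitatory, one inhibitory neuron
```

      Here `population_mean_curve(toy)` is flat at about 0.05, whereas `population_mean_curve(toy, drop_inhibition=True)` rises to about [0.05, 0.15, 0.25].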

      The idea of concentration-dependent crossover responses across the first-order population being required for divisive normalization to generate individually diverse concentration response functions across the second-order population is notable. The intuition of the crossover responses is that first-order neurons that respond most sensitively to any particular odorant (i.e., at the lowest concentration) respond with overall lower activity at higher concentrations than other first-order neurons less sensitively tuned to the odorant. Whether this is a consistent, generalizable property of odorant binding and first-order neuron responsiveness is not addressed by the authors, however. Biologically, one mechanism that may support such crossover events is intraglomerular presynaptic/feedback inhibition, which would be expected to increase with increasing first-order neuron activation such that the most-sensitively responding first-order neurons would also recruit the strongest inhibition as concentration increases, enabling other first-order neurons to begin to respond more strongly. Discussion of this and/or other biological mechanisms (e.g., first-order neuron depolarization block) supporting such crossover responses would strengthen these results.

      We thank the reviewer for providing additional mechanisms to consider. As suggested, we will add discussion of these alternatives to divisive normalization.

      It is unclear to what degree the latency analysis considered in Figures 4D-H works with the overall framework of divisive normalization, which in Figure 3 we see depends on first-order neuron crossover in concentration response functions. Figure 4D suggests that all first-order neurons respond with the same response amplitude (R in eq. 3), even though this is supposed to be pulled from a distribution. It's possible that Figure 4D is plotting normalized response functions to highlight the difference in latency, but this is not clear from the plot or caption. If response amplitudes are all the same, and the response curves are, as plotted in Figure 4D, identical except for their time to half-max, then it seems somewhat trivial that the resulting second-order neuron activation will follow the same latency ranking, regardless of whether divisive normalization exists or not. However, there is some small jitter in these rankings across concentrations (Figure 4G), suggesting there is some randomness to the simulations. It would be helpful if this were clarified (e.g., by showing a non-normalized Figure 4D, with different response amplitudes), and more broadly, it would be extremely helpful in evaluating the latency coding within the broader framework proposed if the authors clarified whether the simulated first-order neuron response timecourses, when factoring in potentially different amplitudes (R) and averaging across the entire response window, reproduces the concentration response crossovers observed experimentally. In summary, in the present manuscript, it remains unclear if concentration crossovers are captured in the latency simulations, and if not, the authors do not clearly address what impact such variation in response amplitudes across concentrations may have on the latency results. 

      It is further unclear to what degree divisive normalization is necessary for the second-order neurons to establish and maintain their latency ranks across concentrations, or to exhibit concentration-dependent changes in latency.

      As suggested by the Reviewer, we will add another simulation scenario where the response amplitudes (R) are different for different neurons. For each concentration, we will then average each neuron’s response across the entire response window and determine if the simulation reproduces the cross-overs as observed experimentally.

      How the authors get from Figure 4G to 4H is not clear. Figure 4G shows second-order neuron response latencies across all latencies, with ordering based on their sorted latency to low concentration. This shows that very few neurons appear to change latency ranks going from low to high concentration, with a change in rank appearing as any deviation in a monotonically increasing trend. Focusing on the high concentration points, there appear to be 2 latency ranks switched in the first 10 responding neurons (reflecting the 1 downward dip in the points around neuron 8), rather than the 7 stated in the text. Across the first 50 responding neurons, I see only ~14 potential switches (reflecting the ~7 downward dips in the points around neurons 8, 20, 32, 33, 41, 44, 50), rather than the 32 stated in the text. It is possible that the unaccounted rank changes reflect fairly minute differences in latencies that are not visible in the plot in Figure 4G. This may be clarified by plotting each neuron's latency at low concentration vs. high concentration (i.e., similar to Figure 4H, but plotting absolute latency, not latency rank) to allow assessment of the absolute changes. If such minute differences are not driving latency rank changes in Fig. 4G, then a trend much closer to the unity line would be expected in Figure 4H. Instead, however, there are many massive deviations from unity, even within the first 50 responding neurons plotted in Figure 4G. These deviations include a jump in latency rank from 2 at low concentration to ~48 at high concentration. Such a jump is simply not seen in Figure 4G.

      We apologize that Fig. 4H was a poor choice for visualization. Fig. 4H plots the sorted identities of neurons under low and high concentrations, and points on the y=x line indicate that the two corresponding neurons have the same rank under the two concentrations. We will replace this panel with a more intuitive visualization in which the x and y axes are each neuron's ranks under the two concentrations, so that deviation from the y=x line indicates how much a neuron's rank differs between them.
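
      A minimal sketch of the difference between the two visualizations, with hypothetical function names and toy latencies: the original panel paired the neuron identities found at each sorted position, whereas the planned panel pairs each neuron's own rank under the two concentrations, so a point's vertical distance from y = x equals that neuron's rank shift.

```python
def latency_ranks(latencies):
    """Rank of each neuron by response latency (0 = earliest; ties broken by index)."""
    order = sorted(range(len(latencies)), key=lambda i: (latencies[i], i))
    ranks = [0] * len(latencies)
    for position, neuron in enumerate(order):
        ranks[neuron] = position
    return ranks

def rank_shifts(latencies_low, latencies_high):
    """Per-neuron change in latency rank from low to high concentration."""
    low, high = latency_ranks(latencies_low), latency_ranks(latencies_high)
    return [h - l for l, h in zip(low, high)]

# Toy latencies (s) for four neurons; neuron i would be plotted at (rank_low[i], rank_high[i]).
low_latency = [0.10, 0.20, 0.30, 0.40]
high_latency = [0.05, 0.15, 0.12, 0.30]
print(rank_shifts(low_latency, high_latency))  # prints [0, 1, -1, 0]
```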

      In the text, the authors state that "Odor identity can be encoded by the set of highest-affinity neurons (which remains invariant across concentrations)." Presumably, this is a restatement of the primacy model and refers to invariance in latency rank (since the authors have not shown that the highest-affinity neurons have invariant response amplitudes across concentration). To what degree this statement holds given the results in Figure 4H, however, which appear to show that some neurons with the earliest latency rank at low concentration jump to much later latency ranks at high concentration, remains unclear. Such changes in latency rank for only a few of the first responding neurons may be negligible for classifying odor identity among a small handful of odorants, but not among 1-2 orders of magnitude more odors, which may feasibly occur in a natural setting. Collectively, these issues with the execution and presentation of the latency analysis make it unclear how robust the latency results are.

      The original primacy model states that the latency of a neuron decreases with increasing concentration, while the ranks of neurons remain unaltered. Our results, on the other hand, suggest that the ranks do at least partially change across concentrations. This leads to two possible decoding mechanisms. First, if the top K responding neurons remain invariant across concentrations (even if their individual ranks change within the top K), then the brain could learn to associate a population of K neurons with a response latency; lower response latency means higher concentration. Second, if the top K responding neurons do not remain invariant across concentrations, then the brain would need to learn to associate a different set of neurons with each concentration level. The latter imposes additional constraints on the robustness of the primacy model and the corresponding read-out mechanism. We will include more discussion of these possibilities in the revision.
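      The distinction between the two decoding scenarios above (an invariant top-K set with internal reordering versus a top-K set whose membership changes) can be quantified as the overlap of the K earliest responders at two concentrations. The sketch below is a hypothetical illustration with toy latencies, not an analysis from the manuscript.

```python
def top_k(latencies, k):
    """Indices of the k earliest-responding neurons."""
    order = sorted(range(len(latencies)), key=lambda i: latencies[i])
    return set(order[:k])


def top_k_overlap(lat_low, lat_high, k):
    """Fraction of the low-concentration top-k set also present in the high-concentration top-k set."""
    return len(top_k(lat_low, k) & top_k(lat_high, k)) / k


# Hypothetical latencies (ms) for five neurons.
lat_low = [10, 20, 30, 40, 50]
lat_high = [8, 6, 45, 20, 12]
print(top_k_overlap(lat_low, lat_high, 2))  # 1.0: same set, internal order swapped
print(top_k_overlap(lat_low, lat_high, 3))  # ~0.667: neuron 2 replaced by neuron 4
```

      An overlap of 1.0 with changed internal ranks corresponds to the first mechanism; an overlap below 1.0 corresponds to the second, where a different set of neurons would have to be associated with each concentration.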

      Analysis in Figures 4A-C shows that concentration can be decoded from first-order neurons, second-order neurons, or first-order neurons with divisive normalization imposed (i.e., simulating second-order responses). This does not say that divisive normalization is necessary to encode concentration, however. Therefore, for the authors to say that divisive normalization is "a potential mechanism for generating odor-specific subsets of second-order neurons whose combinatorial activity or whose response latencies represent concentration information" seems too strong a conclusion. Divisive normalization is not generating the concentration information, since that can be decoded just as well from the first-order neurons. Rather, divisive normalization can account for the different population patterns in concentration response functions between first- and second-order neurons without discarding concentration-dependent information.

      We agree that the word “generating” is faulty. We thank the reviewer for their more precise wording, which we will adopt.
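      To make the reviewer's distinction concrete, the sketch below applies the canonical divisive normalization form r_i = x_i^n / (σ^n + Σ_j x_j^n) to hypothetical Hill-type first-order responses. Both the equation and the parameter values are assumptions for illustration; the model used in the manuscript may differ. Because every neuron is divided by the same pooled term, normalization reshapes the population pattern, while the normalized vectors still differ across concentrations, so concentration remains decodable without normalization having generated it.

```python
def hill(c, ec50, n=1.0):
    """Hypothetical first-order response to concentration c (Hill curve)."""
    return c**n / (c**n + ec50**n)


def divisive_normalization(x, sigma=0.1, n=1.0):
    """Canonical divisive normalization: r_i = x_i^n / (sigma^n + sum_j x_j^n)."""
    pooled = sigma**n + sum(xi**n for xi in x)
    return [xi**n / pooled for xi in x]


ec50s = [0.01, 0.1, 1.0, 10.0]  # hypothetical affinities for four glomeruli

for c in (0.1, 1.0, 10.0):
    first = [hill(c, e) for e in ec50s]
    second = divisive_normalization(first)
    print(c, [round(v, 3) for v in first], [round(v, 3) for v in second])
```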

      Performing the same polar histogram analysis of tufted vs. mitral cell concentration response functions (Figure 5B) provides a compelling new visualization of how these two cell types differ in their concentration variance. The projected importance of tufted cells to navigation, emerging directly through the inverse relationship between average concentration and distance (Figure 5C), is not surprising, and is largely a conceptual analysis rather than new quantitative analysis per se, but nevertheless, this is an important point to make. Another important consideration absent from this section, however, is whether and how divisive normalization may impact tufted cell activity. Previous work from the authors, as well as from Schoppa, Shipley, and Westbrook labs, has compellingly demonstrated that a major circuit mediating divisive normalization of mitral cells (GABA/DAergic short-axon cells) directly targets external tufted cells, and is thus very likely to also influence projection tufted cells. Such analysis would additionally provide substantially more justification for the Discussion statement "we analyzed an additional type of second-order neuron (tufted cells)", which at present instead reflects fairly minimal analysis.

      We agree that tufted cells are subject to divisive normalization as well, albeit probably to a lesser degree than mitral cells. To determine its effect, we will vary the strength of divisive normalization (and the sparseness of interglomerular interactions) and determine whether there is a regime in which the response features of tufted cells match those observed experimentally.

    1. eLife Assessment

      This study reports important negative results by showing that genetic removal of the RNA-binding protein PTBP1 in astrocytes is not sufficient to induce their conversion into neurons, challenging prior claims in the field. It also provides a systematic and insightful analysis of the role of PTBP1 in regulating astrocyte-specific splicing. The evidence is convincing, as the experiments are technically robust, rigorously controlled, and supported by both imaging and transcriptomic analyses.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. used a conditional knockout mouse model to re-examine the role of the RNA-binding protein PTBP1 in the transdifferentiation of astroglial cells into neurons. Several earlier studies reported that PTBP1 knockdown can efficiently induce the transdifferentiation of rodent glial cells into neurons, suggesting potential therapeutic applications for neurodegenerative diseases. However, these findings have been contested by subsequent studies, which in turn have been challenged by more recent publications. In their current work, Zhang et al. deleted exon 2 of the Ptbp1 gene using an astrocyte-specific, tamoxifen-inducible Cre line and investigated, using fluorescence imaging and bulk and single-cell RNA-sequencing, whether this manipulation promotes the transdifferentiation of astrocytes into neurons across various brain regions. The data strongly indicate that genetic ablation of PTBP1 is not sufficient to drive efficient conversion of astrocytes into neurons. Interestingly, while PTBP1 loss alters splicing patterns in numerous genes, these changes do not shift the astroglial transcriptome toward a neuronal profile.

      Strengths:

      Although this is not the first report of PTBP1 ablation in mouse astrocytes in vivo, this study utilizes a distinct knockout strategy and provides novel insights into PTBP1-regulated splicing events in astrocytes. The manuscript is well written, and the experiments are technically sound and properly controlled. I believe this study will be of considerable interest to a broad readership.

      Weaknesses:

      (1) The primary point that needs to be addressed is a better understanding of the effect of exon 2 deletion on PTBP1 expression. Figure 4D shows successful deletion of exon 2 in knockout astrocytes. However, assuming that the coverage plots are CPM-normalized, the overall PTBP1 mRNA expression level appears unchanged. Figure 6A further supports this observation. This is surprising, as one would expect that the loss of exon 2 would shift the open reading frame and trigger nonsense-mediated decay of the PTBP1 transcript. Given this uncertainty, the authors should confirm the successful elimination of PTBP1 protein in cKO astrocytes using an orthogonal approach, such as Western blotting, in addition to immunofluorescence. They should also discuss possible reasons why PTBP1 mRNA abundance is not detectably affected by the frameshift.

      (2) The authors should analyze PTBP1 expression in WT and cKO substantia nigra samples shown in Figure 3 or justify why this analysis is not necessary.

      (3) Lines 236-238 and Figure 4E: The authors report an enrichment of CU-rich sequences near PTBP1-regulated exons. To better compare this with previous studies on position-specific splicing regulation by PTBP1, it would be helpful to assess whether the position of such motifs differs between PTBP1-activated and PTBP1-repressed exons.

      (4) The analyses in Figure 5 and its supplement strongly suggest that the splicing changes in PTBP1-depleted astrocytes are distinct from those occurring during neuronal differentiation. However, the authors should ensure that these comparisons are not confounded by transcriptome-wide differences in gene expression levels between astrocytes and developing neurons. One way to address this concern would be to compare the new PTBP1 cKO data with publicly available RNA-seq datasets of astrocytes induced to transdifferentiate into neurons using proneural transcription factors (e.g., PMID: 38956165).

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhang and colleagues describes a study that investigated whether the deletion of PTBP1 in adult astrocytes in mice led to an astrocyte-to-neuron conversion. The study revisited the hypothesis that reduced PTBP1 expression reprogrammed astrocytes to neurons. More than 10 studies have been published on this subject, with contradicting results. Half of the studies supported the hypothesis while the other half did not. The question being addressed is an important one because if the hypothesis is correct, it can lead to exciting therapeutic applications for treating neurodegenerative diseases such as Parkinson's disease.

      In this study, Zhang and colleagues conducted a conditional mouse knockout study to address the question. They used the Cre-LoxP system to specifically delete PTBP1 in adult astrocytes. Through a series of carefully controlled experiments, including cell lineage tracing, the authors found no evidence for the astrocyte-to-neuron conversion.

      The authors then carried out a key experiment that none of the previous studies on the subject did: investigating alternative splicing pattern changes in PTBP1-depleted cells using RNA-seq analysis. The idea is to compare the splicing pattern change caused by PTBP1 deletion in astrocytes to what occurs during neurodevelopment. This is an important experiment that will help illuminate whether the astrocyte-to-neuron transition occurred in the system. The result was consistent with that of the cell staining experiments: no significant transition was detected.

      These experiments demonstrate that, in this experimental setting, PTBP1 deletion in adult astrocytes did not convert the cells to neurons.

      Strengths:

      This is a well-designed, elegantly conducted, and clearly described study that addresses an important question. The conclusions provide important information to the field.

      To this reviewer, this study provided convincing and solid experimental evidence to support the authors' conclusions.

      Weaknesses:

      The Discussion in this manuscript is short and can be expanded. Can the authors speculate on what led to the contradictory results in the published studies? The current study, in combination with the study published in Cell in 2021 by Wang and colleagues, suggests that the observed difference is not caused by knockdown vs. knockout. Is it possible that other glial cell types are responsible for the transition? If so, what cells? Oligodendrocytes?

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zhang et al. used a conditional knockout mouse model to re-examine the role of the RNA-binding protein PTBP1 in the transdifferentiation of astroglial cells into neurons. Several earlier studies reported that PTBP1 knockdown can efficiently induce the transdifferentiation of rodent glial cells into neurons, suggesting potential therapeutic applications for neurodegenerative diseases. However, these findings have been contested by subsequent studies, which in turn have been challenged by more recent publications. In their current work, Zhang et al. deleted exon 2 of the Ptbp1 gene using an astrocyte-specific, tamoxifen-inducible Cre line and investigated, using fluorescence imaging and bulk and single-cell RNA-sequencing, whether this manipulation promotes the transdifferentiation of astrocytes into neurons across various brain regions. The data strongly indicate that genetic ablation of PTBP1 is not sufficient to drive efficient conversion of astrocytes into neurons. Interestingly, while PTBP1 loss alters splicing patterns in numerous genes, these changes do not shift the astroglial transcriptome toward a neuronal profile.

      Strengths:

      Although this is not the first report of PTBP1 ablation in mouse astrocytes in vivo, this study utilizes a distinct knockout strategy and provides novel insights into PTBP1-regulated splicing events in astrocytes. The manuscript is well written, and the experiments are technically sound and properly controlled. I believe this study will be of considerable interest to a broad readership.

      Weaknesses:

      (1) The primary point that needs to be addressed is a better understanding of the effect of exon 2 deletion on PTBP1 expression. Figure 4D shows successful deletion of exon 2 in knockout astrocytes. However, assuming that the coverage plots are CPM-normalized, the overall PTBP1 mRNA expression level appears unchanged. Figure 6A further supports this observation. This is surprising, as one would expect that the loss of exon 2 would shift the open reading frame and trigger nonsense-mediated decay of the PTBP1 transcript. Given this uncertainty, the authors should confirm the successful elimination of PTBP1 protein in cKO astrocytes using an orthogonal approach, such as Western blotting, in addition to immunofluorescence. They should also discuss possible reasons why PTBP1 mRNA abundance is not detectably affected by the frameshift.

      We thank the reviewer for raising this important point. Indeed, the deletion of exon 2 introduces a frameshift that is predicted to disrupt the PTBP1 open reading frame and trigger nonsense-mediated decay (NMD). While our CPM-normalized coverage plots (Figure 4D) and gene-level expression analysis (Figure 6A) suggest that PTBP1 mRNA levels remain largely unchanged in cKO astrocytes, we acknowledge that this observation is counterintuitive and merits further clarification.

      We suspect that brain tissue dissociation and FACS sorting for bulk or single-cell RNA-seq may enrich for nuclear RNA and thus dilute the NMD signal, since NMD occurs in the cytoplasm. Alternatively, the transcripts (like those of other genes) may escape NMD through unknown mechanisms. Although a frameshift is a strong indicator for triggering NMD, it does not guarantee that NMD will occur in every case. We will include this discussion in the revised manuscript to provide additional context for the apparent discrepancy between mRNA abundance and protein loss.

      Regarding the validation of PTBP1 protein depletion in cKO astrocytes by Western blotting, we acknowledge that orthogonal approaches to confirm PTBP1 elimination would address uncertainty around the effect of exon 2 deletion on PTBP1 expression. However, the low yield of cKO astrocytes makes it difficult to obtain sufficient material for immunoblotting detection of PTBP1 depletion: on average, 3-5 adult animals per genotype are needed for each biological replicate. Our characterization of this Ptbp1 deletion allele in other contexts shows the loss of full-length PTBP1 protein in ESCs and NPCs by Western blotting. Furthermore, germline homozygous mutant mice do not survive beyond embryonic day 6, supporting that it is a loss-of-function allele.

      (2) The authors should analyze PTBP1 expression in WT and cKO substantia nigra samples shown in Figure 3 or justify why this analysis is not necessary.

      We thank the reviewer for raising this important question. We used Aldh1l1-CreERT2, which is designed to be active in all astrocytes throughout the mouse brain. Although we have systematically verified PTBP1 elimination in different mouse brain regions (cortex and striatum) at multiple time points (from 4 to 12 weeks after tamoxifen administration), we agree that it remains necessary and important to demonstrate whether the observed lack of astrocyte-to-neuron conversion is indeed associated with sufficient PTBP1 depletion. We will analyze PTBP1 expression in the substantia nigra, as we did in the cortex and striatum.

      (3) Lines 236-238 and Figure 4E: The authors report an enrichment of CU-rich sequences near PTBP1-regulated exons. To better compare this with previous studies on position-specific splicing regulation by PTBP1, it would be helpful to assess whether the position of such motifs differs between PTBP1-activated and PTBP1-repressed exons.

      We thank the reviewer for this insightful comment. We agree that assessing the positional distribution of CU-rich motifs between PTBP1-activated and PTBP1-repressed exons would provide valuable insight into the position-specific regulatory mechanisms of PTBP1. In response, we will perform separate motif enrichment analyses for PTBP1-activated and PTBP1-repressed exons and examine whether their positional patterns differ. This will help clarify whether these exons are differentially regulated by PTBP1 through distinct motif positioning in mature astrocytes.
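      One simple way to set up the proposed comparison is to count CU-rich motif hits separately in the upstream and downstream intronic flanks of each exon class. The sketch below is hypothetical: the motif (TCTT, i.e. UCUU written in the DNA alphabet), the 200-nt window, and the input format are illustrative assumptions, and a real analysis would compare against a background set of unregulated exons and resolve positions more finely.

```python
import re


def motif_positions(seq, motif="TCTT"):
    """0-based start positions of overlapping motif matches in a DNA string."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def flank_counts(upstream_intron, downstream_intron, motif="TCTT", window=200):
    """Motif hits in the last `window` nt upstream and the first `window` nt downstream of an exon."""
    up = upstream_intron[-window:]
    down = downstream_intron[:window]
    return len(motif_positions(up, motif)), len(motif_positions(down, motif))


def group_profile(exons, motif="TCTT", window=200):
    """Sum upstream/downstream motif counts over (upstream, downstream) intron pairs."""
    up_total = down_total = 0
    for upstream_intron, downstream_intron in exons:
        up, down = flank_counts(upstream_intron, downstream_intron, motif, window)
        up_total += up
        down_total += down
    return up_total, down_total


# Toy inputs only; real input would be flanking intron sequences per exon class.
class_a = [("AAATCTTCTT", "GGGG")]
class_b = [("GGGG", "TCTTAA")]
print(group_profile(class_a), group_profile(class_b))
```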

      (4) The analyses in Figure 5 and its supplement strongly suggest that the splicing changes in PTBP1-depleted astrocytes are distinct from those occurring during neuronal differentiation. However, the authors should ensure that these comparisons are not confounded by transcriptome-wide differences in gene expression levels between astrocytes and developing neurons. One way to address this concern would be to compare the new PTBP1 cKO data with publicly available RNA-seq datasets of astrocytes induced to transdifferentiate into neurons using proneural transcription factors (e.g., PMID: 38956165).

      We would like to express our gratitude for the thoughtful feedback. We agree that transcriptome-wide differences in gene expression between astrocytes and developing neurons could confound the interpretation of splicing differences. To address this concern, we will incorporate publicly available RNA-seq datasets from studies in which astrocytes are reprogrammed into neurons using proneural transcription factors (PMID: 38956165).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhang and colleagues describes a study that investigated whether the deletion of PTBP1 in adult astrocytes in mice led to an astrocyte-to-neuron conversion. The study revisited the hypothesis that reduced PTBP1 expression reprogrammed astrocytes to neurons. More than 10 studies have been published on this subject, with contradicting results. Half of the studies supported the hypothesis while the other half did not. The question being addressed is an important one because if the hypothesis is correct, it can lead to exciting therapeutic applications for treating neurodegenerative diseases such as Parkinson's disease.

      In this study, Zhang and colleagues conducted a conditional mouse knockout study to address the question. They used the Cre-LoxP system to specifically delete PTBP1 in adult astrocytes. Through a series of carefully controlled experiments, including cell lineage tracing, the authors found no evidence for the astrocyte-to-neuron conversion.

      The authors then carried out a key experiment that none of the previous studies on the subject did: investigating alternative splicing pattern changes in PTBP1-depleted cells using RNA-seq analysis. The idea is to compare the splicing pattern change caused by PTBP1 deletion in astrocytes to what occurs during neurodevelopment. This is an important experiment that will help illuminate whether the astrocyte-to-neuron transition occurred in the system. The result was consistent with that of the cell staining experiments: no significant transition was detected.

      These experiments demonstrate that, in this experimental setting, PTBP1 deletion in adult astrocytes did not convert the cells to neurons.

      Strengths:

      This is a well-designed, elegantly conducted, and clearly described study that addresses an important question. The conclusions provide important information to the field.

      To this reviewer, this study provided convincing and solid experimental evidence to support the authors' conclusions.

      Weaknesses:

      The Discussion in this manuscript is short and can be expanded. Can the authors speculate on what led to the contradictory results in the published studies? The current study, in combination with the study published in Cell in 2021 by Wang and colleagues, suggests that the observed difference is not caused by knockdown vs. knockout. Is it possible that other glial cell types are responsible for the transition? If so, what cells? Oligodendrocytes?

      We are grateful for the reviewer’s careful reading and valuable suggestions. These will help us improve the manuscript. We will expand the Discussion. The contradictory results in the previously published studies may be due to the limited stringency and neuronal leakage of the astrocyte-specific GFAP promoter that some investigators chose. Other possibilities include alternative cell origin, increased neuronal resilience, or combinations of as yet unidentified factors.

    1. Reviewer #1 (Public review):

      Summary:

      This paper presents maRQup, a Python pipeline for automating the quantitative analysis of preclinical cancer immunotherapy experiments using bioluminescent imaging in mice. maRQup processes images to quantify tumor burden over time and across anatomical regions, enabling large-scale analysis of over 1,000 mice. The study uses this tool to compare different CAR-T cell constructs and doses, identifying differences in initial tumor control and relapse rates, particularly noting that CD19.CD28 CAR-T cells show faster initial killing but higher relapse compared to CD19.4-1BB CAR-T cells. Furthermore, maRQup facilitates the spatiotemporal analysis of tumor dynamics, revealing differences in growth patterns based on anatomical location, such as the snout exhibiting more resistance to treatment than bone marrow.

      Strengths:

      (1) The maRQup pipeline enables the automatic processing of a large dataset of over 1,000 mice, providing investigators with a rapid and efficient method for analyzing extensive bioluminescent tumor image data.

      (2) Through image processing steps like tail removal and vertical scaling, maRQup normalizes mouse dimensions to facilitate the alignment of anatomical regions across images. This process enables the reliable demarcation of nine distinct anatomical regions within each mouse image, serving as a basis for spatiotemporal analysis of tumor burden within these consistent regions by quantifying average radiance per pixel.
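      The normalization-and-binning idea in point (2) can be sketched as follows. This is not maRQup's code: it assumes the tail has already been cropped, rescales rows by nearest-neighbour sampling to a fixed reference height, and uses nine equal-height horizontal bands as a hypothetical stand-in for the pipeline's anatomical regions, whose actual boundaries are not given here.

```python
def rescale_rows(image, target_height):
    """Nearest-neighbour vertical rescaling of a 2D image given as a list of rows."""
    h = len(image)
    return [image[(r * h) // target_height] for r in range(target_height)]


def regional_mean_radiance(image, boundaries, target_height=90):
    """Average radiance per pixel in horizontal bands of a height-normalized image.

    boundaries: fractional cut points in [0, 1]; 10 values define 9 regions.
    """
    scaled = rescale_rows(image, target_height)
    means = []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        rows = scaled[round(lo * target_height):round(hi * target_height)]
        pixels = [p for row in rows for p in row]
        means.append(sum(pixels) / len(pixels) if pixels else 0.0)
    return means


nine_bands = [i / 9 for i in range(10)]
# Toy image: 18 rows x 2 columns, radiance increasing from head to base.
image = [[float(i // 2)] * 2 for i in range(18)]
print(regional_mean_radiance(image, nine_bands))
```

      Scaling only along the vertical axis, as here, is exactly the simplification that Weakness (1) below questions: mice that differ in width or pose map onto the same bands without any horizontal or non-linear correction.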

      Weaknesses:

      (1) While the pipeline aims to standardize images for regional assessment, the reliance on scaling primarily along the vertical axis after tail removal may introduce limitations to the quantitative robustness of the anatomically defined regions. This approach does not account for potential non-linear growth across dimensions in animals of different ages or sizes, which could result in relative stretching or shrinking of subjects compared to an average reference.

      (2) Furthermore, despite excluding severely slanted images, the pipeline does not fully normalize for variations in animal pose during image acquisition (e.g., tucked body, leaning). This pose variability not only impacts the precise relative positioning of internal anatomical regions, potentially making their definition based on relative image coordinates more qualitative than truly quantitative for precise regional analysis, but it also means that the bioluminescent light signal from the tumor will not propagate equally to the camera, as photons will travel differentially through the tissue. This differing light path through tissues due to variable positioning can introduce large variability in the measured radiance that was not accounted for in the analysis algorithm. Achieving more robust anatomical and quantitative normalization might require methods that control animal posture using a rigid structure during imaging.

    2. Reviewer #2 (Public review):

      Summary:

      The authors developed a method that automatically processes bioluminescent tumor images for quantitative analysis and used it to describe the spatiotemporal distribution of tumor cells in response to CD19-targeting CAR-T cells comprising CD28 or 4-1BB costimulatory domains. The conclusions highlight the dependence of tumor decay and relapse on the number of injected cells, the type of cells, and the initial growth rate of tumors (where "initial" refers to the first day of therapy). The authors also performed a spatiotemporal analysis of tumor response to CAR-T therapy in different regions of the mouse body in a model of acute lymphoblastic leukemia (ALL).
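      As an illustration of what an "initial growth rate" can mean operationally, the sketch below estimates an exponential rate as the least-squares slope of ln(total flux) against time. The function name and toy data are hypothetical, and this is not necessarily how the authors parametrize tumor growth.

```python
import math


def log_linear_rate(days, flux):
    """Least-squares slope of ln(flux) vs. time (per day): an exponential growth rate."""
    ys = [math.log(f) for f in flux]
    n = len(days)
    mx = sum(days) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in days)
    sxy = sum((x - mx) * (y - my) for x, y in zip(days, ys))
    return sxy / sxx


# Toy total-flux measurements (photons/s) growing as exp(0.5 * day).
days = [0, 1, 2, 3]
flux = [1e5 * math.exp(0.5 * d) for d in days]
rate = log_linear_rate(days, flux)
doubling_time = math.log(2) / rate  # days
print(rate, doubling_time)
```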

      Strengths:

      The analysis is based on a large number of images and accounts for many variables. The results of the analysis largely support their claims that the kinetics of tumor decay and relapse are dependent on the CAR T co-stimulatory domain and number of cells injected and tumor growth rates.

      Weaknesses:

      The study does not specify how (a) differences in mouse positioning (and whether they excluded misaligned mice) and (b) tumor spread at the start of therapy influenced their data. The study also does not take into account the potential heterogeneity of CAR-T cells in terms of CAR expression or T cell immunophenotype (differentiation, exhaustion, fitness, etc.).

    3. Reviewer #3 (Public review):

      Summary:

      The paper "The 1000+ mouse project: large-scale spatiotemporal parametrization and modeling of preclinical cancer immunotherapies" is focused on developing a novel methodology for automatic processing of bioluminescence imaging data. It provides quantitative and statistically robust insights into preclinical experiments that will contribute to optimizing cell-based therapies. There is an enormous demand for such methods and approaches that enable the spatiotemporal evaluation of cell monitoring in large cohorts of experimental animals.

      Strengths:

      The manuscript is generally well written, and the experiments are scientifically sound. The conclusions reflect the soundness of experimental data. This approach seems to be quite innovative and promising to improve the statistical accuracy of BLI data quantification.

      This methodology can be used as a universal quantification tool for BLI data for in vivo assessment of adoptively transferred cells due to the versatility of the technology.

      Weaknesses:

      No weaknesses were identified by this Reviewer.

    4. eLife Assessment

      The authors developed a fundamental computational method, which is intended to automatically process bioluminescence imaging-derived tumour images across anatomical regions and over time. This allows quantitative analysis of such data, and the authors applied it to describe the spatiotemporal distribution of tumour cells in response to CD19-targeted CAR-T cells that contained either CD28 or 4-1BB costimulatory domains. Some operational limitations were identified, which relate to the pipeline's reliance on predefined regions of interest instead of aligning signal sites with anatomical information, scaling, and not taking animal pose into account. Overall, the authors provide compelling evidence for the functionality of their computational approach towards automated analysis of bioluminescence imaging data, while applying it to a current topic of wide interest in cell therapy research.

    1. eLife Assessment

      This fundamental work provides solid evidence that advances our understanding of the physical mechanisms underlying bacterial cell division by examining the role of membrane tension and FtsZ condensation in sequential stages of division. The effect of accDA overexpression on membrane tension was carefully characterized. To further enhance rigor, the authors could consider examining orthogonal perturbations to membrane tension, addressing membrane tension vs. fluidity, and addressing the ability of FtsZ to bend membranes in cells.

    2. Reviewer #1 (Public review):

      In this study, Ramirez-Diaz and coworkers address an important and lingering question in the bacterial cell division field, i.e., whether FtsZ polymers bend the cell membrane inwards, using an elegant and innovative approach. The key cell division protein FtsZ is a homolog of tubulin and forms curved polymers in the presence of GTP. It has long been hypothesized that this curvature provides the force to bend the cell membrane inwards, thereby triggering septal synthesis. Several in vitro studies have shown that purified FtsZ, when attached to the membrane, can indeed deform artificial membranes. However, other studies favor the view that only septal peptidoglycan synthesis drives cell division. Ramirez-Diaz and coworkers have tried to address the membrane deformation theory in vivo by developing a mutant that synthesizes extra lipids. In this way, the membrane tension is lowered, which would facilitate cell division if deformation of the cell membrane by curved FtsZ polymers is a crucial step in cell division. Surprisingly, they showed that this mutant overcomes the cell division block in a sepF ezrA double mutant. In addition, they carefully characterize the membrane characteristics of the mutant and the effect on FtsZ ring formation. With this work, they have set up a very useful model system to study the role of the cell membrane in cell division, and also a new tool to better study the function of the cell division proteins EzrA and SepF. Overall, this is a very important study for the bacterial cell division field with interesting findings and ideas.

      Nevertheless, the authors jump to a conclusion that I cannot yet share. The main issue I have is that they focus on membrane tension, yet what they seem to modulate is membrane fluidity. Both are clearly related but not the same. I think that it is important to address this issue extensively in the manuscript. They (also) use Laurdan generalized polarization as an indication of membrane tension (Figure 1F), but this method is primarily used in the literature to measure membrane fluidity. In addition, they explain the occurrence of strong local fluorescent membrane signals as the occurrence of double membranes (Figure S1D), whereas others have shown that such fluorescent hot spots can, in theory, also be formed by local accumulation of fluid lipids (PMID: 24603761). The reason why it is so important to distinguish fluidity from tension is that for the attachment of FtsZ polymers, the cell makes use of anchor proteins like FtsA that contain an amphipathic alpha helix, which inserts into the inner leaflet of the lipid bilayer. Importantly, this insertion only works when the fatty acids can be "pushed apart", and this is stimulated by unsaturated and short-chain fatty acids that make the membrane more fluid (PMID: 12676941). If a membrane is "more fluid", then it can more easily accommodate an amphipathic helix. Thus, the production of extra membrane material may increase the fluidity of the cell membrane, as the Laurdan GP measurements indicated, which can then facilitate the attachment of FtsA, including the attached FtsZ polymers, to the membrane. In other words, what the authors have observed may not be a stimulation of Z-ring formation due to lowered membrane tension, but rather a consequence of enhanced binding of FtsZ polymers to the cell membrane. It might also be that the recruitment of late cell division proteins to the Z-ring, which are all transmembrane proteins, is facilitated in a more fluid lipid environment. The authors have not excluded the latter possibility (for example, by using a mutant depleted for one of the late cell division proteins).
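      For reference, Laurdan generalized polarization is conventionally computed from emission intensities near 440 nm and 490 nm as GP = (I440 - I490) / (I440 + I490). It reports lipid packing and water penetration in the bilayer, with lower GP usually read as a more fluid membrane, which is why it is more directly a fluidity readout than a tension readout. The minimal function below only implements that ratio; the intensity values are toy numbers.

```python
def laurdan_gp(i_440, i_490):
    """Laurdan generalized polarization from emission intensities at ~440 nm and ~490 nm."""
    total = i_440 + i_490
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return (i_440 - i_490) / total


# Toy intensities: a more ordered (less fluid) membrane emits relatively more at 440 nm.
print(laurdan_gp(3.0, 1.0), laurdan_gp(1.0, 1.0), laurdan_gp(1.0, 3.0))
```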

      Finally, the authors performed EM studies to measure septa thickness, and surprisingly, they did not seem to observe deformed septa in a sepF-ezrA double mutant, when overexpressing accDA, while it has been shown before that the absence of SepF leads to strongly deformed septa. Since this finding nuances the mode of action of SepF polymers, it should be discussed.

      In conclusion, this is an important and interesting study, but it seems crucial for the interpretation of the findings to include a clear discussion on membrane fluidity and its consequences.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Ramirez-Diaz and colleagues set out to examine key physical mechanisms of bacterial cell division, using the Gram-positive model Bacillus subtilis. Specifically, they investigate the hypothesis that condensation of polymers of the master regulator of division FtsZ can deform membranes to initiate division, but that this is limited by membrane tension. They test this by modulating both membrane tension and FtsZ condensation genetically. To modulate membrane tension, they overexpress accDA to increase the rate of phospholipid synthesis and increase the "hidden membrane reservoir", thereby decreasing membrane tension. To modulate FtsZ condensation, they deplete the bundling protein EzrA in a background lacking a second bundling protein, SepF. They confirm the effects of accDA overexpression on membrane tension using two different sensors before assessing the relationship between membrane tension, FtsZ condensation, and division. They demonstrate that cells with excess membrane (reduced membrane tension) can divide with reduced bundling protein abundance, suggesting that FtsZ condensation driven by ZBPs normally serves to overcome membrane tension to initiate division. In addition, they find an inverse relationship between membrane tension and FtsZ ring constriction rate, but no effect of membrane tension on FtsZ treadmilling. Estimation of physical parameters leads them to conclude that very small membrane fluctuations are sufficient to initiate division in unperturbed cells and that the membrane contributes only ~0.1% of the total surface tension strength, maintaining cell shape.

      Strengths:

      The highly quantitative approach of this work is a strength, as is the rigorous assessment of membrane tension with multiple sensors. The model proposed is largely consistent with existing data and provides a mechanism for further study and validation. The study tackles a major outstanding question in bacterial cell biology, and provides a potential mechanism for a key step in replication with broad implications in other organisms.

      Weaknesses:

      The authors only use one method (overexpression of accDA) to perturb membrane tension, which could influence division in unanticipated ways (e.g., metabolic adaptations and/or activation of signaling pathways). The proposed model for initiation of division posits that FtsZ condensation bends membranes, which is supported by in vitro evidence, but there is no in vivo evidence that FtsZ condensation can bend membranes in cells. It remains possible that the function of FtsZ condensation is to localize sufficient cell wall synthetic activity to build peptidoglycan that rectifies membrane fluctuations.

    1. eLife Assessment

      This important study presents the rational redesign and engineering of interleukin-7. The data from the integrated approach of using computational, biophysical, and cellular experiments are convincing, but this study can further benefit from more quantitative analyses and structural data. This paper is broadly relevant to those studying immunomodulation using biologics.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript describes the use of computational tools to design a mimetic of the interleukin-7 (IL-7) cytokine with superior stability and receptor binding activity compared to the naturally occurring molecule. The authors focused their engineering efforts on the loop regions to preserve receptor interfaces while remediating structural irregularities that destabilize the protein. They demonstrated the enhanced thermostability, production yield, and bioactivity of the resulting molecule through biophysical and functional studies. Overall, the manuscript is well written, novel, and of high interest to the fields of molecular engineering, immunology, biophysics, and protein therapeutic design. The experimental methodologies used are convincing; however, the article would benefit from more quantitative comparisons of bioactivity through titrations.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents the computational design and experimental validation of Neo-7, an engineered variant of interleukin-7 (IL-7) with improved folding efficiency, expression yield, and therapeutic activity. The authors employed a rational protein design approach using Rosetta loop remodeling to reconnect IL-7's functional helices through shorter, more efficient loops, resulting in a protein with superior stability and binding affinity compared to wild-type IL-7. The work demonstrates promising translational potential for cancer immunotherapy applications.

      Strengths:

      (1) The integration of Rosetta loop remodeling with AlphaFold validation represents an established computational pipeline for rational protein design. The iterative refinement process, using both single-sequence and multimer AlphaFold predictions, is methodologically sound.

      (2) The authors provide thorough characterization across multiple platforms (yeast display, bacterial expression, mammalian cell expression) and assays (binding kinetics, thermostability, bioactivity), strengthening the robustness of their findings.

      (3) The identification of the critical helix 1 kink stabilized by disulfide bonding and its recreation through G4C/L96C mutations demonstrates deep structural understanding and successful problem-solving.

      (4) The MC38 tumor model results show clear therapeutic advantages of Neo-7 variants, with compelling immune profiling data supporting CD8+ T cell-mediated anti-tumor mechanisms.

      (5) The transcriptomic profiling provides valuable mechanistic insights into T cell activation states and suggests reduced exhaustion markers, which are clinically relevant.

      Weaknesses:

      (1) While computational predictions are extensive, the manuscript lacks experimental structural validation of the designed Neo-7 variants. The term "Structural Validation" should not be used in the header.

      (2) The authors observe slower on/off-rates for Neo-7 variants compared to wild-type IL-7. Could the authors speculate about the potential biological impacts of the slow off-rate, especially focusing on downstream signaling pathways that might be differentially affected by the altered binding kinetics of Neo-7 variants?

      (3) While computational immunogenicity prediction is provided, these methods are very limited.

    1. eLife Assessment

      This fundamental study explores a novel cellular mechanism underlying the degeneration of locus coeruleus neurons during chronic restraint stress. The evidence supporting the overexcitation of LC neurons after chronic stress is compelling. However, to fully support the broad implications for LC degeneration and Alzheimer's disease, the study would benefit from stronger causal integration and validation in age-relevant models.

    2. Reviewer #1 (Public review):

      This study aims to elucidate the mechanisms by which stress-induced α2A-adrenergic receptor (α2A-AR) internalization leads to cytosolic noradrenaline (NA) accumulation and subsequent neuronal dysfunction in the locus coeruleus (LC). While the manuscript presents an interesting but ambitious model involving calcium dynamics, GIRK channel rundown, and autocrine NA signaling, several key limitations undermine the strength of the conclusions.

      First, the revision does not include new experiments requested by reviewers to validate core aspects of the mechanism. Specifically, there is no direct measurement of cytosolic NA levels or MAO-A enzymatic activity to support the link between receptor internalization and neurochemical changes. The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence.

      Second, the behavioral analysis remains insufficient to support claims of cognitive impairment. A single working memory test performed after an anxiety test is inadequate to establish memory dysfunction. Additional cognitive assays, such as the Morris water maze or novel object recognition, were recommended but not performed.

      Third, concerns regarding the rigor of the differential MAO-A expression shown by fluorescence imaging were not addressed experimentally. Instead of clarifying the issue, the authors moved the figure to the supplementary data without providing further evidence (e.g., an enzymatic assay, a quantitative reanalysis of the Western blot, or re-staining for MAO-A by IF) to support their interpretation.

      Fourth, concerns regarding TH staining remain unresolved. In Figure S7, the α2A-AR signal appears to resemble TH staining, and vice versa, raising the possibility of labeling errors. It is recommended that the authors re-examine this issue by either double-checking the raw data or repeating the immunostaining to validate the staining.

      Overall, the manuscript offers a potentially interesting framework but falls short in providing the experimental rigor necessary to establish causality. The reliance on indirect reasoning and the reorganization of existing data, rather than the generation of new evidence, limits the overall impact and interpretability of the study.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the mechanism by which chronic stress induces degeneration of locus coeruleus (LC) neurons. The authors demonstrate that chronic stress leads to the internalization of α2A-adrenergic receptors (α2A-ARs) on LC neurons, causing increased cytosolic noradrenaline (NA) accumulation and subsequent production of the neurotoxic metabolite DOPEGAL via monoamine oxidase A (MAO-A). The study suggests a mechanistic link between stress-induced α2A-AR internalization, disrupted autoinhibition, elevated NA metabolism, activation of asparagine endopeptidase (AEP), and Tau pathology relevant to Alzheimer's disease (AD). The conclusions of this paper are largely well-supported by the data, but some aspects of image acquisition require further examination.

      Strengths:

      This study clearly demonstrates the effects of chronic stimulation on the excitability of LC neurons using electrophysiological techniques. It also elucidates the role of α2-adrenergic receptor (α2-AR) internalization and the associated upstream and downstream signaling pathways of GIRK-1, using a range of pharmacological agents, highlighting the innovative nature of the work. Additionally, the study identifies the involvement of the MAO-A-DOPEGAL-AEP pathway in this process. The topic is timely, the proposed mechanistic pathway is compelling, and the findings have translational relevance, particularly regarding therapeutic strategies targeting α2A-AR internalization in neurodegenerative diseases.

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Comments on revisions:

      The authors have addressed all of the reviewers' comments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a technically impressive data set showing that repeated excitation or restraint stress internalises somatodendritic α2A adrenergic autoreceptors (α2A ARs) in locus coeruleus (LC) neurons. Loss of these receptors weakens GIRK-dependent autoinhibition, raises neuronal excitability, and is accompanied by higher MAO A, DOPEGAL, AEP, and tau N368 levels. The work combines rigorous whole-cell electrophysiology with barbadin-based trafficking assays, qPCR, Western blotting and immunohistochemistry. The final schematic is appealing and in principle, could explain early LC hyperactivity followed by degeneration in ageing and Alzheimer's disease.

      Strengths:

      Multi-level approach - The study integrates electrophysiology, pharmacology, mRNA quantification, and protein-level analysis.

      Use of barbadin to block β-arrestin/AP-2-dependent internalisation is both technically precise and mechanistically informative

      Well-executed electrophysiology

      Translational relevance

      Converges on a model that peers can discuss (scientists can only discuss models, not data!)

      Weaknesses:

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The manuscript's logical flow is challenging and hard to follow, and key arguments could be more clearly structured, particularly in transitions between mechanistic components.

      We have revised the manuscript so that readers can more easily follow the logical flow in the transitions between mechanistic components, by adding descriptions of Figure S1E-J, Figure S2F-K, Figure S3A-H, Figure S4A-F, Figure S5, and Figure S6 in the revised manuscript.

      (2) The causality between stress-induced α2A-AR internalization and the enhanced MAO-A remains unclear. Direct experimental evidence is needed to determine whether α2A-AR internalization itself or Ca2+ drives MAO-A activation, and how they activate MAO-A should be considered.

      We believe that the causality between stress-induced α2A-AR internalization and the enhancement of MAO-A is clearly demonstrated by our current experiments, although our explanations could be made easier to understand, especially for readers who are not experts in electrophysiology.

      Firstly, it is well established that autoinhibition in LC neurons is mediated by α2A-AR-coupled GIRK (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience). We found that spike-frequency adaptation in LC neurons was also mediated by α2A-AR-coupled GIRK-I (Figure 1A-I), and that α2A-AR-coupled GIRK-I underwent [Ca<sup>2+</sup>]<sub>i</sub>-dependent rundown (Figures 2, S1, S2), leading to the abolition of spike-frequency adaptation (Figure S4). The [Ca<sup>2+</sup>]<sub>i</sub>-dependent rundown of α2A-AR-coupled GIRK-I was prevented by barbadin (Figure 2G-J), which prevents the internalization of G-protein-coupled receptors (GPCRs).

      Abolition of spike-frequency adaptation itself, i.e., increased spike activity, can increase [Ca<sup>2+</sup>]<sub>i</sub>, because [Ca<sup>2+</sup>]<sub>i</sub> is entirely dependent on spike activity, as shown by [Ca<sup>2+</sup>]<sub>i</sub> imaging in Figure S3.

      Thus, α2A-AR internalization can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolition of autoinhibition or spike-frequency adaptation, and an increase in [Ca<sup>2+</sup>]<sub>i</sub> drives MAO-A activation, as reported previously (Cao et al., 2007, BMC Neurosci). The mechanism by which Ca<sup>2+</sup> activates MAO-A is beyond the scope of the current study.

      Our study focused on the mechanism by which chronic or severe stress can cause persistent overexcitation and how this results in LC degeneration.

      (3) The connection between α2A-AR internalization and increased cytosolic NA levels lacks direct quantification, which is necessary to validate the proposed mechanism.

      Direct quantification of the relationship between α2A-AR internalization and increased cytosolic NA levels may not be possible, and may not be strictly necessary, as explained below.

      The internalization of α2A-AR can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolition of autoinhibition or spike-frequency adaptation, and increases in [Ca<sup>2+</sup>]<sub>i</sub> can facilitate autocrine NA release (Huang et al., 2007), similar to transmitter release from nerve terminals (Kaeser & Regehr, 2014, Annu Rev Physiol).

      Autocrine-released NA must be taken back up by the NA transporter (NAT), as is firmly established (Torres et al., 2003, Nat Rev Neurosci). Reuptake of NA by NAT is the only source of intracellular NA, and NA reuptake by NAT should increase as internalization of the NA binding site (α2A-AR) progresses in association with [Ca<sup>2+</sup>]<sub>i</sub> increases (see page 11, lines 334-336).

      Thus, the connection between α2A-AR internalization and increased cytosolic NA levels is logically compelling, whereas quantifying this connection may not be possible at present (see our response to Reviewer #1, Recommendations for the authors (2)) and is beyond the scope of our current study.

      (4) The chronic stress model needs further validation, including measurements of stress-induced physiological changes (e.g., corticosterone levels) to rule out systemic effects that may influence LC activity. Additional behavioral assays for spatial memory impairment should also be included, as a single behavioral test is insufficient to confirm memory dysfunction.

      It is well established that restraint stress (RS) increases corticosterone levels depending on the duration of RS (García-Iglesias et al., 2014, Neuropharmacology), although we are willing to measure corticosterone levels. In addition, numerous reports have shown increased activity of LC neurons in response to various stresses (Valentino et al., 1983; Valentino and Foote, 1988; Valentino et al., 2001; McCall et al., 2015), as described in the text (page 4, lines 96-98). Measurement of corticosterone levels may not be able to rule out systemic effects of CRS on the whole brain.

      We had already performed an additional behavioral test, the elevated plus maze (EPM) test. By combining the two tests, it may be possible to evaluate the results of the Y-maze test more accurately by differentiating memory impairment from anxiety. However, the results of these behavioral tests are merely supplementary to our main aim of elucidating the cellular mechanisms for the accumulation of cytosolic free NA. Therefore, we have softened the implications regarding anxiety and memory impairment (page 13, lines 397-400 in the revised manuscript).

      (5) Beyond β-arrestin binding, the role of alternative internalization pathways (e.g., phosphorylation, ubiquitination) in α2A-AR desensitization should be considered, as current evidence is insufficient to establish a purely Ca<sup>2+</sup>-dependent mechanism.

      We respectfully disagree with this comment.

      It was clearly demonstrated that repeated application of NA itself did not cause desensitization of α2A-AR (Figure S1A-D), and that blockade of β-arrestin binding by barbadin completely suppressed the Ca<sup>2+</sup>-dependent downregulation of GIRK (Figure 2G-K). These observations clearly rule out the possible involvement of phosphorylation or ubiquitination in the desensitization.

      In addition to the barbadin experiment, immunohistochemistry and Western blotting clearly demonstrated the decrease in α2A-AR expression on the cell membrane (Figure 3).

      The Ca<sup>2+</sup>-dependent mechanism of GIRK rundown was convincingly demonstrated by a set of voltage-clamp protocols in which Ca<sup>2+</sup> influx was differentially increased. The rundown of GIRK-I was progressively potentiated or accelerated by increasing the number of positive command pulses, each of which induces Ca<sup>2+</sup> influx (compare Figure S1E-J, Figure S2A-E and Figure S2F-K with Figure 2A-F). The presence or absence of Ca<sup>2+</sup> currents, and their amplitude, determined the trend of GIRK-I rundown (Figures 2, S1 and S2). Because the same voltage protocol hardly caused rundown when it did not induce Ca<sup>2+</sup> currents in the absence of TEA (Figure S1F; compare with Figure 2B), blockade of Ca<sup>2+</sup> currents by nifedipine would add little.

      We believe the series of voltage-clamp protocols convincingly demonstrated the orderly involvement of [Ca<sup>2+</sup>]<sub>i</sub> in accelerating the rundown of GIRK-I.

      (6) NA leakage for free NA accumulation is also influenced by NAT or VMAT2. Please discuss the potential role of VMAT2 in NA accumulation within the LC in AD. 

      It has been demonstrated that reduced VMAT2 levels increase susceptibility to neuronal damage: VMAT2 heterozygous mice displayed increased vulnerability to MPTP, as evidenced by reductions in nigral dopamine cell counts (Takahashi et al., 1997, PNAS). Thus, if the activity of VMAT2 in LC neurons were impaired by chronic restraint stress, cytosolic NA levels in LC neurons would increase. We have added this discussion to the revised manuscript (page 12, lines 381-384).

      (7) Since the LC is a small brain region, proper staining is required to differentiate it from surrounding areas. Please provide a detailed explanation of the methodology used to define LC regions and how LC neurons were selected among different cell types in brain slices for whole-cell recordings.

      LC neurons were identified immunohistochemically and electrophysiologically as we previously reported (see Fig. 2 in Front. Cell. Neurosci. 16:841239. doi: 10.3389/fncel.2022.841239). We have added this explanation in the method section of the revised manuscript (page 15, lines 474-475). A delayed spiking pattern in response to depolarizing pulses (Figure S10 in the revised manuscript) applied at a hyperpolarized membrane potential was commonly observed in LC neurons in many studies (Masuko et al., 1986; van den Pol et al., 2002; Wagner-Altendorf et al., 2019).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      In our study, the normalized relative value of AEP-mediated tau cleavage (Tau N368) was much higher in CRS mice than in non-stressed wild-type mice. It is not possible to compare AEP-mediated tau cleavage between our non-stressed wild-type mice and those in a previous study (Zhang et al., 2014, Nat Med), because band intensity depends largely on exposure time and our numerical values are normalized relative values. In view of such differences, the apparent band intensity in our blots may have been enhanced to detect small changes.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      GIRK rundown was almost saturated after 3-day RS and remained the same in 5-day RS mice (Fig. 4A-G), which is consistent with the downregulation of α2A-AR and GIRK1 expression by 3-day RS (Fig. 3C, F and G; Fig. 4J and K). However, we examined the protein levels of MAO-A, pro/active-AEP and Tau N368 only in 5-day RS mice, not in 3-day RS mice. This is because we considered that a high [Ca<sup>2+</sup>]<sub>i</sub> condition might have to be sustained for some time to induce changes in MAO-A, AEP and Tau N368, so that 3-day RS may be insufficient to induce such changes. We have added this explanation to the revised manuscript (page 17, lines 521-525).

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      Please see our response to the comment (2).

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Thank you for your suggestion. We have revised accordingly.

      Reviewer #3 (Public review):

      Weaknesses:

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain. Below, I outline the key points that should be addressed to make the model convincing.

      Please see the responses to the recommendation for the authors made by reviewer #3.

      Reviewer #1 (Recommendations for the authors):

      (1) Improve the clarity and organization of the manuscript, ensuring smoother transitions between concepts and mechanisms.

      Please see our response to the comment raised by Reviewer #1 as Weakness (1).

      (2) Apply a method for quantifying cytosolic NA levels under different conditions to support the link between receptor internalization and NA accumulation.

      If a fluorescent indicator of cytosolic free NA were available, it would be possible to measure changes in cytosolic NA levels. However, at present there appears to be no fluorescent probe for cytosolic NA. For example, NS521 labels both dopamine and norepinephrine inside neurosecretory vesicles (Hettie & Glass et al., 2014, Chemistry), and the BPS3 fluorescent sensor detects NA near the cell membrane, to which it is anchored (Mao et al., 2023, Nat Comm). Furthermore, the method reported in “A Genetically Encoded Fluorescent Sensor for Rapid and Specific In Vivo Detection of Norepinephrine” can detect NA only when α2AR is expressed, whereas in the present study increases in cytosolic NA levels are caused by internalization of α2AR. Cytosolic NA measurements with GRAB NE photometry may therefore not be applicable in the present study. However, we have discussed the limited availability of fluorescent methods for directly demonstrating the increase in cytosolic NA as a limitation of our study (page 14, lines 429-436 in the revised manuscript).

      (3) Include validation of the chronic stress model with physiological and behavioral measures (e.g., corticosterone levels and another behavioral test).

      Please see the response to the comment raised by Reviewer #1 as Weakness (4).

      (4) All supplemental figures should be explicitly explained in the Results section. Specifically, clarify and describe the details of Figure S1G-K, Figure S2F-K, Figure S3A-H, Figure S4A-F, Figure S5, and Figure S6 to ensure all supplementary data are fully integrated into the main text.

      We have more explicitly and clearly described the details of Figure S1E-J, Figure S2F-K, Figure S3A-H, Figure S4A-F, Figure S5, and Figure S6 and fully integrated those explanations into the main text in the revised manuscript.

      (5) In Figure 3, the morphology of TH-positive cells differs between panels D and E. Additionally, TH is typically expressed in the cytosol, but in the provided images, it appears to be localized only to the membrane. Please clarify this discrepancy and provide a lower-magnification image to display a larger area, not one cell.

      In a confocal image, TH is not necessarily distributed homogeneously in the cytosol, but is expressed in a ring-shaped pattern inside the plasma membrane, avoiding the cell nucleus and the surrounding Golgi apparatus and endoplasmic reticulum (ER) (Henrich et al., 2018, Acta Neuropathol Commun; see Fig. 4a and 6e), especially when only a few confocal z-sections are stacked. This is presumably because LC neurons are especially rich in Golgi apparatus and ER (Groves & Wilson, 1980, J Comp Neurol).

      In Figure S7, we showed a lower-magnification image of LC and its adjacent area (mesencephalic trigeminal nucleus). In the LC area, there are a variety of LC neurons, which include oval shaped neurons (open arrowhead; similar to Figure 3D) and also rhombus-like shaped neurons (open double arrowheads, similar to Figure 3E). A much lower-magnification image of LC neurons constituting LC nucleus was shown in Figure 5A.

      (6) In Figure 5, the difference in MAO-A expression is not clearly visible in the fluorescence images. Enzymatic assays for AEP and MAO-A should be included to demonstrate the increased activity better.

      In the current study, we did not attempt to quantify the changes in TH, MAO-A and AEP by immunohistochemistry. Instead, we quantified such changes by Western blotting. The main conclusions of the current study were drawn primarily from electrophysiological experiments, on which we expended much effort. Because relative quantification of active AEP and Tau N368 proteins by Western blotting may accurately reflect changes in these enzyme activities, an enzymatic assay is not strictly required, although it would help to better demonstrate AEP and MAO-A activity. We have noted the value of enzymatic assays for better demonstrating AEP and MAO-A activities (page 10, lines 314-315).

      Reviewer #3 (Recommendations for the authors):

      (1) Causality across the pathway

      Each step (α2A internalisation, GIRK rundown, Ca<sup>2+</sup> rise, MAO-A/AEP upregulation) is demonstrated separately, but no experiment links them in a single preparation. Consider in vivo Ca<sup>2+</sup> or GRAB NE photometry during restraint stress while probing α2A levels with i.p. clonidine injection, or optogenetic overexcitation coupled to biochemical readouts. Such integrated evidence would help to move the manuscript from a correlational to a more mechanistic study.

      It is not possible to measure free cytosolic NA levels with GRAB NE photometry when α2A AR is internalized as described above (see the response to the comment made by reviewer #1 as the recommendation for the authors).

      (2) Pharmacology and NE concentration

      The use of 100 µM noradrenaline saturates α and β adrenergic receptors alike. Please provide ramp measurements of GIRK current in dose-response at 1-10 µM NE (blocked by atipamezole) to confirm that the rundown really reflects α2A activity rather than mixed receptor effects.

      It is true that 100 µM noradrenaline activates both α and β adrenergic receptors. However, we clearly showed that the enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole, and that the Ca<sup>2+</sup>-dependent rundown of NA-induced GIRK-I was prevented by 10 µM atipamezole. Considering the Ki values of atipamezole for α2A AR (1-3 nM) (Vacher et al., 2010, J Med Chem) and β AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), these results reflect α2A AR activity rather than β AR activity (Figure S5). Furthermore, because it is already well established that NA-induced GIRK-I is mediated by α2A AR in LC neurons (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience), it is not necessary to re-examine the effect of 1-10 µM NA on GIRK-I.

      (3) Calcium dependence is not yet definitive

      The rundown is induced with a TEA-enhanced pulse protocol. Blocking L-type channels with nifedipine (or using Cd²⁺) during this protocol should show whether Ca<sup>2+</sup> entry is necessary. Without such a control, the Ca<sup>2+</sup> link remains inferential.

      The Ca<sup>2+</sup> link was precisely demonstrated by a series of voltage-clamp experiments in which Ca<sup>2+</sup> influx was progressively potentiated by increasing the number of positive voltage pulses (Figures S1 and S2). As the number of positive voltage pulses increased, the rundown of GIRK-I was further accelerated or enhanced. The relationship between the number of spikes and the Ca<sup>2+</sup> influx, detected as Ca<sup>2+</sup> transients, was well documented in Ca<sup>2+</sup> imaging experiments using fura-2 (Figure S3).

      The presence or absence of Ca<sup>2+</sup> currents and the amount of Ca<sup>2+</sup> currents determined the trend of the rundown of GIRK-I (Figs. 2, S1 and S2). The same voltage protocol hardly caused the rundown when it did not induce Ca<sup>2+</sup> currents in the absence of TEA (Fig. S1F; compare with Fig. 2B), and the series of voltage-clamp protocols convincingly demonstrated the orderly involvement of [Ca<sup>2+</sup>]<sub>i</sub> in accelerating the rundown of GIRK-I. Therefore, blockade of Ca<sup>2+</sup> currents by nifedipine may not be so beneficial.

      (4) Age mismatch and disease claims

      All electrophysiology and biochemical data come from juvenile (< P30) mice, yet the conclusions stress Alzheimer-related degeneration. Key endpoints need to be replicated in adult or aged mice, or the manuscript should soften its neurodegenerative scope.

      As stated in the Conclusion, we do not emphasize Alzheimer-related degeneration, although the manuscript may have given that impression. To avoid such a misunderstanding, we have added the sentence “However, the present mechanism must be proven to be valid in adult or old mice, to validate its involvement in the pathogenesis of AD.” (page 14, lines 448-450).

      (5) Direct evidence for extracellular/cytosolic NE

      The proposed rise in cytosolic NA via reuptake is inferred from electrophysiology. Modern fluorescent sensors (GRAB NE, nLight) or fast-scan voltammetry could quantify NE overflow and clearance during stress, directly testing the model.

      Please see our response above to Reviewer #1's Recommendations for the authors, point (2).

      (6) Quantitative histology

      Figure 5 presents attractive images but no numerical analysis. Please provide ROI-based fluorescence quantification (with n values) or move the images to the supplement and rely on the Western blots.

      We have moved the immunohistochemical results in Fig. 5 to the supplement, because we believe that quantification of immunohistochemical staining is not necessarily accurate.

    1. eLife Assessment

      This study examines a valuable question regarding the developmental trajectory of neural mechanisms supporting facial expression processing. Leveraging a rare intracranial EEG (iEEG) dataset including both children and adults, the authors reported that facial expression recognition mainly engaged the posterior superior temporal cortex (pSTC) in children, whereas both the pSTC and the prefrontal cortex were engaged in adults. However, the sample size is relatively small, and the analyses appear too incomplete to fully support the primary claims.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates how the brain processes facial expressions across development by analyzing intracranial EEG (iEEG) data from children (ages 5-10) and post-childhood individuals (ages 13-55). The researchers used a short film containing emotional facial expressions and applied AI-based models to decode brain responses to facial emotions. They found that in children, facial emotion information is represented primarily in the posterior superior temporal cortex (pSTC), a sensory processing area, but not in the dorsolateral prefrontal cortex (DLPFC), which is involved in higher-level social cognition. In contrast, post-childhood individuals showed emotion encoding in both regions. Importantly, the complexity of emotions encoded in the pSTC increased with age, particularly for socially nuanced emotions such as embarrassment, guilt, and pride. The authors claim that these findings suggest that emotion recognition matures through increasing involvement of the prefrontal cortex, supporting a developmental trajectory in which top-down modulation enhances understanding of complex emotions as children grow older.

      Strengths:

      (1) The inclusion of pediatric iEEG makes this study uniquely positioned to offer high-resolution temporal and spatial insights into neural development compared with non-invasive approaches (e.g., fMRI, scalp EEG).

      (2) Using a naturalistic film paradigm enhances ecological validity compared to static image tasks often used in emotion studies.

      (3) The idea of using state-of-the-art AI models to extract facial emotion features allows for high-dimensional and dynamic emotion labeling in real time.

      Weaknesses:

      The study has notable limitations that constrain the generalizability and depth of its conclusions. The sample size was very small, with only nine children included and just two having sufficient electrode coverage in the posterior superior temporal cortex (pSTC), which weakens the reliability and statistical power of the findings, especially for analyses involving age. Electrode coverage was also uneven across brain regions: not all participants had electrodes in both the dorsolateral prefrontal cortex (DLPFC) and the pSTC, and most coverage was limited to the left hemisphere, hindering within-subject comparisons and limiting insights into lateralization. The developmental differences observed were based on cross-sectional comparisons rather than longitudinal data, reducing the ability to draw causal conclusions about developmental trajectories. Moreover, the analysis focused narrowly on the DLPFC, neglecting other relevant prefrontal areas such as the orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC), which play key roles in emotion and social processing. Although the naturalistic film stimulus enhances ecological validity, it comes at the cost of experimental control, with no behavioral confirmation of the emotions perceived by participants and uncertain model validity for complex emotional expressions in children. A non-facial music block that could have served as a control was available but not analyzed. Generalizability is further limited by the fact that all participants were neurosurgical patients, potentially with neurological conditions such as epilepsy that may influence brain responses. Additionally, the high temporal resolution of intracranial EEG was not fully exploited, as data were downsampled and averaged in 500-ms windows. Finally, the absence of behavioral measures or eye-tracking data makes it difficult to link neural activity directly to emotional understanding or to determine which facial features participants attended to.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Fan et al. aim to characterize how neural representations of facial emotions evolve from childhood to adulthood. Using intracranial EEG recordings from participants aged 5 to 55, the authors assess the encoding of emotional content in high-level cortical regions. They report that while both the posterior superior temporal cortex (pSTC) and dorsolateral prefrontal cortex (DLPFC) are involved in representing facial emotions in older individuals, only the pSTC shows significant encoding in children. Moreover, the encoding of complex emotions in the pSTC appears to strengthen with age. These findings lead the authors to propose a developmental shift from reliance on lower-level sensory areas in early childhood to increased top-down modulation by the prefrontal cortex as individuals mature.

      Strengths:

      (1) Rare and valuable dataset: The use of intracranial EEG recordings in a developmental sample is highly unusual and provides a unique opportunity to investigate neural dynamics with both high spatial and temporal resolution.

      (2) Developmentally relevant design: The broad age range and cross-sectional design are well-suited to explore age-related changes in neural representations.

      (3) Ecological validity: The use of naturalistic stimuli (movie clips) increases the ecological relevance of the findings.

      (4) Feature-based analysis: The authors employ AI-based tools to extract emotion-related features from naturalistic stimuli, which enables a data-driven approach to decoding neural representations of emotional content. This method allows for a more fine-grained analysis of emotion processing beyond traditional categorical labels.

      Weaknesses:

      (1) The emotional stimuli included facial expressions embedded in speech or music, making it difficult to isolate neural responses to facial emotion per se from those related to speech content or music-induced emotion.

      (2) While the authors leveraged Hume AI to extract facial expression features from the video stimuli, they did not provide any validation of the tool's accuracy or reliability in the context of their dataset. It remains unclear how well the AI-derived emotion ratings align with human perception, particularly given the complexity and variability of naturalistic stimuli. Without such validation, it is difficult to assess the interpretability and robustness of the decoding results based on these features.

      (3) Only two children had relevant pSTC coverage, severely limiting the reliability and generalizability of results.

      (4) The rationale for focusing exclusively on high-frequency activity for decoding emotion representations is not provided, nor are results from other frequency bands explored.

      (5) The hypothesis of developmental emergence of top-down prefrontal modulation is not directly tested. No connectivity or co-activation analyses are reported, and the number of participants with simultaneous coverage of pSTC and DLPFC is not specified.

      (6) The "post-childhood" group spans ages 13-55, conflating adolescence, young adulthood, and middle age. Developmental conclusions would benefit from finer age stratification.

      (7) The so-called "complex emotions" (e.g., embarrassment, pride, guilt, interest) used in the study often require contextual information, such as speech or narrative cues, for accurate interpretation, and are not typically discernible from facial expressions alone. As such, the observed age-related increase in neural encoding of these emotions may reflect not solely the maturation of facial emotion perception, but rather the development of integrative processing that combines facial, linguistic, and contextual cues. This raises the possibility that the reported effects are driven in part by language comprehension or broader social-cognitive integration, rather than by changes in facial expression processing per se.

    1. eLife Assessment

      This work presents a useful investigation of functional and structural brain changes following navigation and verbal memory training. The analyses of whole-brain structural changes are incomplete and would benefit from a more comprehensive approach to support the study's main conclusion regarding the lack of a structural whole-brain plasticity effect. However, some analyses are exhaustive and compelling in demonstrating the presence of longitudinal behavioural effects, the presence of functional activation changes, and the lack of hippocampal volume changes.

    2. Joint Public Review:

      Summary:

      This study investigates plasticity effects in brain function and structure from training in navigation and verbal memory.

      The authors used a longitudinal design with a total of 75 participants across two sites. Participants were randomised to one of three conditions: verbal memory training, navigation training, or a video control condition. The results show behavioural effects in relevant tasks following the training interventions. The central claim of the paper is that network-based measures of task-based activation are affected by the training interventions, but structural brain metrics (T2w-derived volume and diffusion-weighted imaging microstructure) are not impacted by any of the training protocols tested.

      Strengths:

      (1) This is a well-designed study which uses two training conditions, an active control, and randomisation, as appropriate. It is also notable that the authors combined data acquisition across two sites to reach the needed sample size and accounted for this quite thoroughly in their statistical analyses. In addition, I commend the authors for pre-registering the analysis to enhance the reproducibility of their work.

      (2) Some analyses in the paper are exhaustive and compelling in showcasing the presence of longitudinal behavioural effects, functional activation changes, and lack of hippocampal volume changes. The breadth of analysis on hippocampal volume (including hippocampal subfields) is convincing in supporting the claim regarding a lack of volumetric effect in the hippocampus.

      Weaknesses:

      (1) The rationale for the study and its relationship with previous literature are not fully clear from the paper. In particular, a very large literature has already explored the longitudinal effects of different types of training on functional and structural neuroimaging. However, this literature is barely acknowledged in the Introduction, which focuses on cross-sectional studies. Studies such as Draganski et al. 2004 are cited but not discussed, and are lumped together with cross-sectional studies, which is confusing. As a reader, it is difficult to understand whether the study was meant to be confirmatory based on previous literature, or whether it fills a specific gap in the literature on longitudinal neuroimaging effects of training interventions.

      (2) The main claim regarding the lack of changes in brain structure seems only partially supported by the analyses provided. The limited whole-brain evidence from structural neuroimaging makes it difficult to confirm whether there is indeed no effect of training. Beyond hippocampal analyses, many whole-brain analyses of both volumetric and diffusion-weighted imaging metrics are only based on coarse ROIs (for example, 34 cortical parcellations for grey matter analyses). Although vertex-wise analyses in FreeSurfer are reported, it is unclear what metrics were examined (cortical thickness? area? volume?). Diffusion-weighted imaging seems to focus on whole-tract atlas ROIs, which can be less accurate/sensitive than tractography-defined ROIs or voxel-wise approaches.

      (3) Quality control of images is only mentioned for FA images in subject space. Given that most analyses are based on atlas ROIs, visual checks following registration are fundamental and should be described in further detail.

    1. eLife Assessment

      This important study fills a gap in our knowledge of the evolution of GPCRs in holozoans, as well as the phylogeny of associated signaling pathway components such as G proteins, GRKs, and RIC8 proteins. The evidence supporting the conclusions is compelling, with the analysis of extensive new genomic data from choanoflagellates and other non-animal holozoans. Overall, the study is thorough and well-executed. It will be a resource for researchers interested in both the comparative genomics of multicellularity and GPCR biology more broadly, especially given the importance of GPCRs as highly druggable targets.

    2. Reviewer #1 (Public review):

      Summary:

      The authors sought to compile an inventory of GPCRs and GPCR pathway component genes within the genomes of 23 choanoflagellates and other close relatives of metazoans.

      Strengths:

      The authors generated a solid phylogenetic overview of the GPCR superfamily in these species. Intriguingly, they discover novel GPCR families and novel combinations of domains, and gain new insights into the evolution of these groups within the Opisthokonta clade. A particular focus is placed on adhesion GPCRs (aGPCRs), for which the authors discover many hitherto unknown subfamilies based on Hidden Markov Models of the 7TM domain sequences, which are also reflected in the combinations of extracellular domains of the homologs. In addition, the authors provide bioinformatic evidence that choanoflagellate aGPCRs also contain a GAIN domain, whose self-cleavage is the most remarkable biochemical feature of aGPCRs.

      Weaknesses:

      The chosen classification scheme for aGPCRs may require reassessment and amendment by the authors in order to prevent confusion with previously issued classification attempts of this family.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to characterise the GPCR family in choanoflagellates (and other unicellular holozoans). GPCRs are the most abundant gene family in many animal genomes, playing crucial roles in a wide range of physiological processes. Although they are known to evolve rapidly, GPCRs are an ancient feature of eukaryotic biology. Identifying conserved elements across the animal-protist boundary is therefore a valuable goal, and the increasing availability of genomes from non-animal holozoans provides new opportunities to explore evolutionary patterns that were previously obscured by limited taxon sampling. This study presents a comprehensive re-examination of GPCRs in choanoflagellates, uncovering examples of differential gene retention and revealing the dynamic nature of the GPCR repertoire in this group. As GPCRs are typically involved in environmental sensing, understanding how these systems evolved may shed light on how our unicellular ancestors adapted their signalling networks in the transition to complex multicellularity.

      Strengths:

      The paper combines a broad taxonomic scope with the use of both established and recently developed tools (e.g. Foldseek, AlphaFold), enabling a deep and systematic exploration of GPCR diversity. Each family is carefully described, and the manuscript also functions as an up-to-date review of GPCR classification and evolution. Although similar attempts to understand GPCR evolution have been made over the last decade, the authors build on this foundation by identifying new families and applying improved computational methods to better predict structure and function. Notably, the presence of Rhodopsin-like GPCRs in some choanoflagellates and ichthyosporeans is intriguing, even though they do not fall within known animal subfamilies. The computational framework presented here is broadly applicable, offering a blueprint for surveying GPCR diversity in other non-model eukaryotes (and even in animal lineages), potentially revealing novel families relevant to drug discovery or helping revise our understanding of GPCR evolution beyond model systems.

      Weaknesses:

      While the study contributes several interesting observations, it does not radically revise the evolutionary history of the GPCR family. However, in an era increasingly concerned with the reproducibility of scientific findings, this is arguably a strength rather than a weakness. It is encouraging to see that previously established patterns largely hold, and that with expanded sampling and improved methods, new insights can be gained, especially at the level of specific GPCR subfamilies. Finally, no functional follow-up experiments are provided in the model system Salpingoeca rosetta, but I am sure functional work on GPCRs in choanoflagellates is set to reveal very interesting molecular adaptations in the future.

      Comments on the latest version:

      The authors have done a good job answering my questions and suggestions.

    1. eLife Assessment

      This valuable study tested the impact of DNA methylation on CTCF binding in two cancer cell lines. Sites of increased CTCF binding are enriched in gene bodies and associate with nuclear speckles, indicating a role in increased transcription. In the revised work, the inferred association with nuclear speckles has been supported with more solid data. These results will be of interest to the epigenetics field.

    2. Reviewer #2 (Public review):

      Summary:

      CTCF is one of the most well-characterized regulators of chromatin architecture in mammals. Given that CTCF is an essential protein, understanding how its binding is regulated is a very active area of research. It has been known for decades that CTCF is sensitive to 5-methylcytosine (5meC) DNA methylation in certain contexts. Moreover, at genomic imprints and in certain oncogenes, 5meC-mediated CTCF antagonism has very important gene regulatory implications. A number of labs (eg, Schubeler and Stamatoyannopoulos) have assessed the impact of DNA methylation on CTCF binding, but it is important also to interrogate the effect on chromatin organization (ie, looping). Here, Roseman and colleagues used a DNMT1 inhibitor in two established human cancer lines (HCT116 [colon] and K562 [leukemia]), and performed CTCF ChIPseq and HiChIP. They showed that "reactivated" CTCF sites, that is, sites bound in the absence of 5meC, are enriched in gene bodies, participate in many looping events, and, intriguingly, appear associated with nuclear speckles. This last aspect suggests that these reactivated loops might play an important role in increased gene transcription. They also showed that a number of genes upregulated in the DNA-hypomethylated state require CTCF binding, which is an important result.

      Strengths:

      Overall, I found the paper to be succinctly written and the data presented clearly. The relationship between CTCF binding in gene bodies and association with nuclear speckles is an interesting result. Another strong point of the paper was combining DNMT1 inhibition with CTCF degradation.

      Weaknesses:

      The most problematic aspect of the original version was the insufficient evidence for the association of "reactivated" CTCF binding sites with nuclear speckles. This has been more diligently assessed in the revised version.

      Comments on revisions:

      The authors have adequately addressed my points in this revised version.

    1. eLife Assessment

      This important study investigates changes in oscillatory activity across cortical and subcortical areas during stroke recovery in a nonhuman primate model. The authors distinguish between global and local oscillatory bursts, providing solid evidence that these two types of bursts correlate with distinct aspects of movement; additionally, they show that the likelihood of these bursts occurring follows opposing trends during recovery. The study could be further improved by accounting for inter-individual differences and by some technical improvements, such as employing more robust burst detection methods and more stringent analyses.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates beta burst dynamics in the primate motor cortex during movement and recovery from stroke. The authors differentiate between "global" beta bursts, which are synchronous across cortical and often subcortical regions, and more spatially confined "local" bursts. Global bursts are associated with reduced spiking variability, slower movements, and are more frequent after stroke, while local bursts increase during recovery and grasp execution. The study provides compelling evidence that beta bursts with different spatial and temporal characteristics may play distinct roles in motor control and recovery.

      Strengths:

      The major strength of this paper lies in its conceptual advance: the identification and characterization of distinct global and local beta bursts in the primate motor cortex. This distinction builds upon and considerably extends previous work on the heterogeneity of beta bursts. The paper is methodologically rigorous, using simultaneous cortical and subcortical recordings, detailed behavioral tracking, and thorough analyses of spike-LFP interactions. The use of stroke models and neurotypical animals provides converging evidence for the functional dissociation between burst types. The observation that local bursts increase with motor recovery and occur during grasping is particularly novel and may prove valuable for developing biomarkers of motor function.

      Weaknesses:

      There are several conceptual and methodological limitations that should be addressed. First, the burst detection method relies on an amplitude threshold (median + 1 SD), which is susceptible to false positives and variability (Langford & Wilson, 2025). The classification into global or local bursts then depends on the number of co-bursting channels, compounding the arbitrariness. Second, the imposition of a minimum of three co-bursting cortical channels may bias against the detection of truly local bursts. Third, the classification is entirely cortical; subcortical activity is considered post hoc rather than integrated into the classification, despite the key role of subcortical-cortical synchrony in motor control. Fourth, the apparent dissociation between global and local bursts raises important questions about their spatial distribution across areas like M1 and PMv, which are not thoroughly analyzed. Finally, while the authors interpret local bursts during grasping as novel, similar findings have been reported (e.g., Szul et al., 2023; Rayson et al., 2023), and a deeper discussion of these precedents would strengthen the argument.
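      To make the threshold and channel-count dependencies discussed above concrete, here is a minimal sketch of an amplitude-threshold burst detector followed by a co-bursting-channel classifier. It assumes the beta-band amplitude envelope has already been computed (channels × samples). It labels individual samples rather than grouping contiguous events or enforcing a minimum duration. `min_channels=3` mirrors the minimum of three co-bursting channels mentioned in the review, and `global_fraction` is a hypothetical cutoff. This is not the authors' pipeline, whose exact rules may differ.

```python
import numpy as np


def detect_bursts(envelope: np.ndarray, n_sd: float = 1.0) -> np.ndarray:
    """Return a boolean mask (channels x samples) that is True where a channel's
    beta amplitude exceeds that channel's own median + n_sd * SD.

    envelope: beta-band amplitude, shape (n_channels, n_samples), assumed to be
    computed upstream (e.g. band-pass filter plus Hilbert transform).
    """
    median = np.median(envelope, axis=1, keepdims=True)
    sd = np.std(envelope, axis=1, keepdims=True)
    return envelope > median + n_sd * sd


def classify_samples(mask: np.ndarray, min_channels: int = 3,
                     global_fraction: float = 0.5) -> np.ndarray:
    """Label each sample 'none', 'local' or 'global' from the number of co-bursting channels.

    min_channels: co-bursting channels required before any event is counted.
    global_fraction: HYPOTHETICAL cutoff; events spanning at least this fraction
    of channels are labelled 'global', smaller counted events 'local'.
    """
    n_co = mask.sum(axis=0)
    labels = np.full(mask.shape[1], "none", dtype=object)
    counted = n_co >= min_channels
    labels[counted] = "local"
    labels[counted & (n_co >= global_fraction * mask.shape[0])] = "global"
    return labels


# Toy example: 8 channels with flat amplitude and three injected events.
env = np.ones((8, 100))
env[:, 10] = 10.0   # all 8 channels burst together
env[:3, 50] = 10.0  # only 3 channels burst
env[0, 80] = 10.0   # a single channel bursts
labels = classify_samples(detect_bursts(env))
print(labels[10], labels[50], labels[80])
```

      Changing `n_sd`, `min_channels` or `global_fraction` moves events between the "none", "local" and "global" categories, which is the sensitivity the reviewer raises. With `min_channels=3`, the single-channel event at sample 80 is never counted as a local burst.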

      Impact:

      This work is likely to have a substantial impact on the field of motor systems neuroscience. The distinction between global and local beta bursts offers a promising framework for understanding the dual roles of beta in motor inhibition and sensorimotor computation. The findings are relevant not only for basic research but also for translational efforts in stroke rehabilitation and neuromodulation, particularly given the emerging interest in beta burst-based biomarkers and stimulation targets. The dataset and analytical framework will be useful to researchers investigating beta dynamics, spike-field relationships, and recovery from neural injury.

      Langford, Z.D., Wilson, C.R.E., 2025. Simulations reveal that beta burst detection may inappropriately characterize the beta band. https://doi.org/10.1101/2023.12.15.571838.

      Rayson, H., Szul, M.J., El-Khoueiry, P., Debnath, R., Gautier-Martins, M., Ferrari, P.F., Fox, N., Bonaiuto, J.J., 2023. Bursting with potential: How sensorimotor beta bursts develop from infancy to adulthood. J. Neurosci. https://doi.org/10.1523/JNEUROSCI.0886-23.2023.

      Szul, M.J., Papadopoulos, S., Alavizadeh, S., Daligaut, S., Schwartz, D., Mattout, J., Bonaiuto, J.J., 2023. Diverse beta burst waveform motifs characterize movement-related cortical dynamics. Prog. Neurobiol. 228, 102490.

    3. Reviewer #2 (Public review):

      Summary:

      The paper by Khanna et al. describes global vs local beta synchrony between a cortical premotor area (PMv) and subcortical structures during motor tasks in the non-human primate, specifically investigating its progression following M1 injury. They found increases in global beta synchrony between PMv and subcortical structures during the sub-acute phase of injury, and global synchrony was associated with relatively slower motor movements. As recovery progressed, they report a shift from global to local synchrony and a subsequent reduction in movement time. The authors suggest that global changes in subcortical and cortical beta synchrony may generally underpin a variety of movement disorders, including Parkinson's disease, and that shifting from global to local synchrony (or reducing global synchrony) might improve functional outcomes.

      Strengths:

      Ischemic insults and other acquired brain injuries have a significant public health impact. While there is a large body of clinical and basic science studies describing the behavioral, neurophysiological, and mechanistic outcomes of such injury, there is a significant lack of studies examining longitudinal, behaviorally related neurophysiological measures following cortical injury. Any such information therefore makes an outsized contribution to understanding how brain injury disrupts underlying neural activity and how this may contribute to injury presentation and recovery.

      A significant percentage of pre-clinical stroke studies tend to focus on peri-infarct or other cortical structures and their role in recovery. The addition of subcortical recordings, which allows investigation of thalamo-basal ganglia-cortical loops that may contribute to the degree of impairment or to the recovery process, is important for the field. Here, there are longitudinal (up to 3 months post-injury) recordings in the ventral premotor area (PMv) and either the internal capsule or sensorimotor thalamus that can be synchronized with phases of behavioral recovery.

      The methods are well described and can act as a framework for assessing synchrony across other data sets with similar recording locations. Limitations in methodology, recordings, and behavior were noted.

      Weaknesses:

      A major limitation of this paper is that it is a set of case studies rather than a well-designed, well-controlled study of beta synchrony following motor cortex injury. While non-human primate neurophysiological studies are almost always limited by extremely low animal numbers, this is usually offset by the fact that they can record large numbers of units or channels and, in the case of normal behavior, can obtain many behavioral trials over months of individual sessions. Here, two NHPs were used, but they had different subcortical implant locations (thalamus vs internal capsule). They also had different injury outcomes: one showed a typical recovery curve following injury, while the other had complications and worsening behavior before ultimately recovering. Further, there were significant differences in the ability to record at different times, with one NHP having poor recordings early in the recovery process and the other having poor recordings late in the process. Due to the injury, the authors report sessions in which they were not able to record many trials (~10). Assuming that recovery after a cortical injury is an evolving process, breaking the analysis into "Early" and "Late" phases limits interpretation of where these shifts occur relative to recovery on the task, especially given that different thresholds for recovery were used for the two animals. Because of this, despite a careful analysis of the data and an extensive discussion, the conclusions derived are not particularly compelling. To overcome this, the authors present data from neurotypical NHPs, but with electrodes in M1 rather than PMv and performing a completely different task with no grasping component, again making accurate conclusions difficult. Even with low numbers, the study would have been much stronger with within-animal longitudinal data from before and after the injury on the same task, so that the impact of M1 injury could be better assessed.

      It is unclear to what extent the subpial aspiration used is a stroke model. While it is much more difficult to perform a pure ischemic motor injury using electrocoagulatory methods in animal models that do not have a lissencephalic cortex, the suction ablation method that the authors use leads to different outcomes than an ischemic injury alone. For instance, in rat models, ischemic vs suction ablation leads to very different electrophysiological profiles and differences in underlying anatomical reorganization (see Carmichael and Chesselet, 2002), even if the behavioral outcomes were similar. There is a concern that the effects shown may be an artifact of the lesion model rather than informing underlying mechanisms of recovery.

      The injury model leads to seemingly mild impairments in grasp (but not reach), with rapid and complete recovery occurring within 2-3 weeks of injury. Because of the rapid recovery, relating the physiological processes of recovery to beta synchronization is challenging. Are the global bursts the result of the loss of M1 input to subcortical structures? Are they due to the lack of M1 targets, resulting in a more distributed response? Are they due to other post-injury sub-acute mechanisms? How specific is this response: is it limited to peri-infarct areas (and to what extent is the PMv electrode truly in peri-infarct cortex), or would this synchrony be seen anywhere in the sensorimotor networks? Are the local bursts present because global synchrony wanes over time as a function of post-injury homeostatic mechanisms, or is local beta synchrony increasing as new motor plans are refined and reinforced during task re-acquisition? How tightly are they coupled to recovery? If the shift reflects motor plan refinement, should it not lag the recovery? While the study has significant design limitations that reduce the impact of the results, it should serve as a useful baseline/pilot data set on which to build a more complete picture of the role of subcortical-cortical beta synchrony following cortical injury.

    4. Reviewer #3 (Public review):

      Summary:

      Khanna et al. use a well-conceived and well-executed set of experiments and analyses primarily to document the interaction between neural oscillations in the beta range (here, 13-30 Hz) and recovery of function in an animal model of stroke. Specifically, they show that cortical "beta bursts", or short-term increases in beta power, correlate strikingly with the timeline of behavioral recovery as quantified with a reach-to-grasp task. A key distinction is made between global beta bursts (here, those that synchronize between cortical and subcortical areas) and local bursts (which appear on only a few electrodes). This distinction of global vs. local is shown to be relevant to task performance and movement speed, among other quantities of interest.

      A secondary results section explores the relationship between beta bursts and neuronal firing during the grasp portion of the behavioral task. These results are valuable to include, though mostly unsurprising, with global beta in particular associated with lower mean and variance in spike rates.

      Last, a partial recapitulation of the primary results is offered with a neurologically intact (uninjured) animal. No major contradictions are found with the primary results.

      Highlights of the Discussion section include a thoughtful review of atypical movements executed by individuals with Parkinson's disease or stroke survivors, placing the current results in an appropriate clinical context. Potential physiological mechanisms that could account for the observed results are also discussed effectively.

      Strengths:

      Overall, this is a very interesting paper. The ultimate impact will be enhanced by the authors' choice to analyze beta bursts, which remain a relatively under-explored aspect of neural coding.

      The reach-and-grasp task was also a well-considered choice; the combination of a relatively simple movement (reaching towards a target in the same location each time) and a more complex movement (a skilled object-manipulation grasp) provides an internal control of sorts for data analysis. In addition, the task's two sub-movements provide a differential in terms of their likelihood to be affected by the stroke-like injury: proximal muscles (controlling reach) are likely to be less affected by stroke, while distal muscles (controlling grasp) are highly likely to be affected. Lastly, the requirement of the task to execute an object lift maximizes its difficulty and also the potential translational impact of the results on human injury.

      The above comments about the task exemplify a strength that is more generally evident: a welcome awareness of clinical relevance, which is in evidence several times throughout the Results and Discussion.

      Weaknesses:

      The study's weaknesses are mostly minor and, for the most part, correctable.

      One concern that may not be correctable in this study: the results about the spatial extent of beta activity seem constrained by relatively poor-quality data. It seems half or more of the electrodes are marked as too noisy to provide useful data in Figure 3. If this reflects the wider reality for all analyses, as mentioned, it may not be correctable for the present study. In that case, perhaps some of the experiments or analyses can be revisited or expanded for a future study, when better electrode yields are available.

      Other concerns:

      In some places, there is a lack of clarity in the presentation of the results. This is not serious but should be addressed to aid readers' comprehension.

      Lastly, given the central role of beta oscillations within the study, it would be better for completeness to include even a brief exploration of sustained beta power (rather than bursts), and the modulation of sustained beta (or lack thereof) in the study's areas of concern: behavioral recovery, task performance, etc.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates beta burst dynamics in the primate motor cortex during movement and recovery from stroke. The authors differentiate between "global" beta bursts, which are synchronous across cortical and often subcortical regions, and more spatially confined "local" bursts. Global bursts are associated with reduced spiking variability, slower movements, and are more frequent after stroke, while local bursts increase during recovery and grasp execution. The study provides compelling evidence that beta bursts with different spatial and temporal characteristics may play distinct roles in motor control and recovery.

      We thank the reviewer for their assessment that the manuscript provides compelling evidence for distinct roles of local and global beta bursts in motor control and recovery.

      Strengths:

      The major strength of this paper lies in its conceptual advance: the identification and characterization of distinct global and local beta bursts in the primate motor cortex. This distinction builds upon and considerably extends previous work on the heterogeneity of beta bursts. The paper is methodologically rigorous, using simultaneous cortical and subcortical recordings, detailed behavioral tracking, and thorough analyses of spike-LFP interactions. The use of stroke models and neurotypical animals provides converging evidence for the functional dissociation between burst types. The observation that local bursts increase with motor recovery and occur during grasping is particularly novel and may prove valuable for developing biomarkers of motor function.

      We thank the reviewer for recognizing the strengths of this manuscript. 

      Weaknesses:

      There are several conceptual and methodological limitations that should be addressed. First, the burst detection method relies on an amplitude threshold (median + 1 SD), which is susceptible to false positives and variability (Langford & Wilson, 2025). The classification into global or local bursts then depends on the number of co-bursting channels, compounding the arbitrariness. Second, the imposition of a minimum of three co-bursting cortical channels may bias against the detection of truly local bursts. 

      We thank the reviewer for bringing up these methodological details. We plan to conduct a follow-up analysis using alternative burst detection methods to verify that the paper’s main results hold when using different burst detection methodologies. We anticipate this will improve confidence in our results. 

      Third, the classification is entirely cortical; subcortical activity is considered post hoc rather than integrated into the classification, despite the key role of subcortical-cortical synchrony in motor control. 

      We thank the reviewer for this comment. First, because the different animals had subcortical recording sites in different locations, we hesitate to use subcortical activity in the classification of bursts, since we were not sure we would be identifying the same burst phenomenon (e.g. thalamo-cortical bursts vs. capsule-cortical bursts may differ). Second, we believe that having a cortical-only criterion allows the designation of local vs. global bursts to be more widely applied in preparations that only have access to cortical data (e.g. surface ECoG recordings, EEG, Utah array recordings). Thus, in this study we chose to analyze the subcortical data post hoc (after burst detection and classification) to support our “global” vs. “local” designation of burst types.

      Fourth, the apparent dissociation between global and local bursts raises important questions about their spatial distribution across areas like M1 and PMv, which are not thoroughly analyzed. 

      We thank the reviewer for this comment. In our study’s stroke animals, we chose to study PMv due to its role in compensating for damage to M1; thus, we hesitate to make any comparisons between PMv (which was recorded in stroke animals) and M1 (recorded in healthy unimpaired animals). Furthermore, the animals performed different tasks (reaching vs. reaching and grasping), which may also influence the spatial distribution. We agree that future work should investigate the spatial distribution of global vs. local beta bursts across areas of sensorimotor cortex and subcortex, and that this comparison would be best done in healthy animals with both reaching and grasping behaviors.

      Finally, while the authors interpret local bursts during grasping as novel, similar findings have been reported (e.g., Szul et al., 2023; Rayson et al., 2023), and a deeper discussion of these precedents would strengthen the argument.

      Thank you for these references! We will review them and incorporate them into our discussion of our results. 

      Impact:

      This work is likely to have a substantial impact on the field of motor systems neuroscience. The distinction between global and local beta bursts offers a promising framework for understanding the dual roles of beta in motor inhibition and sensorimotor computation. The findings are relevant not only for basic research but also for translational efforts in stroke rehabilitation and neuromodulation, particularly given the emerging interest in beta burst-based biomarkers and stimulation targets. The dataset and analytical framework will be useful to researchers investigating beta dynamics, spike-field relationships, and recovery from neural injury.

      We thank the reviewer for their assessment that our work will likely have a substantial impact on the field of motor systems neuroscience.

      Reviewer #2 (Public review):

      Summary:

      The paper by Khanna et al. describes global vs local beta synchrony between a cortical premotor area (PMv) and subcortical structures during motor tasks in the non-human primate, specifically investigating the progression following M1 injury. They found that global beta synchrony between PMv and subcortical structures increased during the sub-acute phase of injury, and that global synchrony was associated with relatively slower motor movements. As recovery progressed, they report a shift from global synchrony to local synchrony and a subsequent reduction in the movement time. The authors suggest that global changes in subcortical and cortical beta synchrony may generally underpin a variety of movement disorders, including Parkinson's disease, and that shifting from global to local (or reducing global synchrony) might improve functional outcomes.

      Strengths:

      Ischemic insults and other acquired brain injuries have a significant public health impact. While there is a large body of clinical and basic science studies describing the behavioral, neurophysiological, and mechanistic outcomes of such injury, there is a significant lack of studies looking at longitudinal, behaviorally related neurophysiological measures following cortical injury, so any such information makes an outsized contribution to understanding how brain injury disrupts underlying neural activity and how this may contribute to injury presentation and recovery.

      A significant percentage of pre-clinical stroke studies tend to focus on peri-infarct or other cortical structures and their role in recovery. The addition of subcortical recordings, which allows investigation of thalamo-basal ganglia-cortical loops that may contribute to the degree of impairment or to the recovery process, is important for the field. Here, there are longitudinal (up to 3 months post-injury) recordings in the ventral premotor area (PMv) and either the internal capsule or sensorimotor thalamus that can be synchronized with phases of behavioral recovery.

      The methods are well described and can act as a framework for assessing synchrony across other data sets with similar recording locations. Limitations in methodology, recordings, and behavior were noted.

      We thank the reviewer for their comments on the strengths of this paper.  

      Weaknesses:

      A major limitation of this paper is that it is a set of case studies rather than a well-designed, well-controlled study of beta synchrony following motor cortex injury. While non-human primate neurophysiological studies are almost always limited by extremely low animal numbers, this is offset by the fact that they can acquire significant numbers of units or channels, and in the case of normal behavior, can obtain many behavioral trials over months of individual sessions. Here, two NHPs were used, but they had different subcortical implant locations (thalamus vs internal capsule). They had different injury outcomes, with one showing a typical recovery curve following injury while the other had complications and worsening behavior before ultimately recovering. Further, there were significant differences in the ability to record at different times, with one NHP having poor recordings early in the recovery process while the other had poor recordings late in the process. Due to the injury, the authors report sessions in which they were not able to record many trials (~10). Assuming that recovery after a cortical injury is an evolving process, breaking the analysis into "Early" and "Late" phases limits interpretation of where these shifts occur relative to recovery on the task, especially given that different thresholds for recovery were used between animals. Because of this, despite a careful analysis of the data and an extensive discussion, the conclusions derived are not particularly compelling. To overcome this, the authors present data from neurotypical NHPs, but with electrodes in M1 rather than PMv, doing a completely different task with no grasping component, again making accurate conclusions about the results difficult. Even with low numbers, the study would have been much stronger if there were within-animal longitudinal data prior to and after the injury on the same task, so the impact of M1 injury could be better assessed.

      We thank the reviewer for these comments. Below we address some of these in more detail: 

      Different subcortical implant locations: We would like to clarify that the subcortical recordings were only used to confirm that global beta bursts (as characterized by cortical recordings alone) did indeed occur on subcortical sites coincidentally with cortical sites more frequently than local beta bursts did. Neither the beta burst categories nor the beta bursts themselves were influenced by the subcortical recordings.

      Different injury outcomes: There is difficulty in creating strokes that result in identical deficits across animals, as we and others have noted in previous work [1-3]. As a field, we are still understanding what factors give rise to variability in recovery curves. For example, one recent study noted that biological sex is a factor in predicting differences in recovery rates [4], and another noted that baseline white matter hyperintensities are also predictive of post-stroke recovery [5]. Overall, our methodology that creates structurally consistent lesions can still result in very different functional outcomes depending on a variety of factors. Given this state of the field, we have done our best to match the recovery curves between our two animals, especially the initial recovery curves before Monkey H’s secondary decline.

      Differences in ability to record at different times: We note this as a strength. One concern with studies that induce stroke at the same time as implanting electrode arrays is that single-unit yield is known to be low right after array implantation and to improve over the following weeks [6]. There is always a concern that having more units later in recovery may drive results, but in this case, since one animal showed the opposite trend, we are more confident that results are not driven by increases in unit yield. We also note that we broadly see similar unit quality metrics in the early and late stages in both animals (Fig. S7).

      Breaking the continuous recovery curve into early and late phases: We note that this division was only made for one main analysis in the paper (Fig. 5CD): assessment of the mean and variance of single-unit firing rates. Without this split, our analyses would be underpowered and inconclusive, and we would not be able to comment on how firing rates change, even coarsely, with recovery.

      Presentation of data from M1 of healthy animals doing a different task: We agree that the strongest data would be longitudinally recorded from the same animals/brain areas pre-stroke and then post-stroke. However, we also view our inclusion of separate healthy animals doing a different task as evidence that our global vs. local segregation of beta bursts generalizes beyond the reach-to-grasp task to reaching-only tasks.  

      Overall, we appreciate the reviewer pointing out these notes about our data. In some cases we do not think these notes are concerning; in others, we acknowledge that we have done the best we can given the state of the neurophysiology stroke recovery field.

      It is unclear to what extent the subpial aspiration used is a stroke model. While it is much more difficult to perform a pure ischemic motor injury using electrocoagulatory methods in animal models that do not have a lissencephalic cortex, the suction ablation method that the authors use leads to different outcomes than an ischemic injury alone. For instance, in rat models, ischemic vs suction ablation leads to very different electrophysiological profiles and differences in underlying anatomical reorganization (see Carmichael and Chesselet, 2002), even if the behavioral outcomes were similar. There is a concern that the effects shown may be an artifact of the lesion model rather than informing underlying mechanisms of recovery.

      We thank the reviewer for bringing this up. 

      Clarification of our stroke model methodology: We wish to highlight that when we create a stroke, we first perform surface vessel occlusion. This is designed to match true ischemic injury. After a waiting period, the injured tissue is then aspirated to reduce the effects of edema and secondary mass effect in the model.

      Carmichael and Chesselet 2002: The rodent work cited did show differential effects of a suction ablation method (without any surface vessel occlusion first) versus an ischemic method. The effects observed in this work were in the first 5 days following stroke. In our case, we started recording on day 7 and examined recovery over extended periods (weeks to months). 

      Effects of acute insult on rehabilitation: From a rehabilitation perspective, it remains unclear how the acute insult affects outcomes weeks and months later. One line of evidence suggesting that the manner in which the acute insult occurs may not matter for rehabilitation is the observation that one therapeutic approach (vagus nerve stimulation) has been found to successfully improve rehabilitation outcomes in a range of injury models (intracranial hemorrhage, stroke, spinal cord injury). We agree that additional work is required in this area.

      Human stroke data show similar results: Lastly, we note that neurophysiology performed in humans with clinical strokes supports the results we see here (e.g. [7]; see the Discussion section for full elaboration), suggesting that our stroke model methodology is similar enough to clinical stroke to produce similar results.

      The injury model leads to seemingly mild impairments in grasp (but not reach), with rapid and complete recovery occurring within 2-3 weeks from the time of injury. Because of the rapid recovery, the relationship between the physiological processes of recovery and beta synchronization is challenging to interpret. Are the global bursts the result of the loss of M1 input to subcortical structures? Are they due to the lack of M1 targets, so there is a more distributed response? Is this due to other post-injury sub-acute mechanisms? How specific is this response: is it limited to peri-infarct areas (and to what extent is the PMv electrode truly in peri-infarct cortex), or would this synchrony be seen anywhere in the sensorimotor networks? Are the local bursts present because global synchrony wanes over time as a function of post-injury homeostatic mechanisms, or is local beta synchrony increasing as new motor plans are refined and reinforced during task re-acquisition? How closely are they coupled to recovery? If the shift from global to local reflects motor plan refinement, should it not lag behind recovery?

      We think these are all wonderful questions that could be addressed in follow-up studies! 

      While the study has significant limitations in design that reduce the impact of the results, it should act as a useful baseline/pilot data set on which to build a more complete picture of the role of subcortical-cortical beta synchrony following cortical injury.

      We agree that this is a study that should be treated as a starting point for further investigation. 

      Reviewer #3 (Public review):

      Summary:

      Khanna et al. use a well-conceived and well-executed set of experiments and analyses primarily to document the interaction between neural oscillations in the beta range (here, 13-30 Hz) and recovery of function in an animal model of stroke. Specifically, they show that cortical "beta bursts", or short-term increases in beta power, correlate strikingly with the timeline of behavioral recovery as quantified with a reach-to-grasp task. A key distinction is made between global beta bursts (here, those that synchronize between cortical and subcortical areas) and local bursts (which appear on only a few electrodes). This distinction of global vs. local is shown to be relevant to task performance and movement speed, among other quantities of interest.

      A secondary results section explores the relationship between beta bursts and neuronal firing during the grasp portion of the behavioral task. These results are valuable to include, though mostly unsurprising, with global beta in particular associated with lower mean and variance in spike rates.

      Last, a partial recapitulation of the primary results is offered with a neurologically intact (uninjured) animal. No major contradictions are found with the primary results.

      Highlights of the Discussion section include a thoughtful review of atypical movements executed by individuals with Parkinson's disease or stroke survivors, placing the current results in an appropriate clinical context. Potential physiological mechanisms that could account for the observed results are also discussed effectively.

      Strengths:

      Overall, this is a very interesting paper. The ultimate impact will be enhanced by the authors' choice to analyze beta bursts, which remain a relatively under-explored aspect of neural coding.

      The reach-and-grasp task was also a well-considered choice; the combination of a relatively simple movement (reaching towards a target in the same location each time) and a more complex movement (a skilled object-manipulation grasp) provides an internal control of sorts for data analysis. In addition, the task's two sub-movements provide a differential in terms of their likelihood to be affected by the stroke-like injury: proximal muscles (controlling reach) are likely to be less affected by stroke, while distal muscles (controlling grasp) are highly likely to be affected. Lastly, the requirement of the task to execute an object lift maximizes its difficulty and also the potential translational impact of the results on human injury.

      The above comments about the task exemplify a strength that is more generally evident: a welcome awareness of clinical relevance, which is in evidence several times throughout the Results and Discussion.

      Weaknesses:

      The study's weaknesses are mostly minor and, for the most part, correctable.

      One concern that may not be correctable in this study: the results about the spatial extent of beta activity seem constrained by relatively poor-quality data. It seems half or more of the electrodes are marked as too noisy to provide useful data in Figure 3. If this reflects the wider reality for all analyses, as mentioned, it may not be correctable for the present study. In that case, perhaps some of the experiments or analyses can be revisited or expanded for a future study, when better electrode yields are available.

      We thank the reviewer for their comments. We note that we have chosen to be particularly conservative with which channels we considered noise-free and acceptable for analysis as our animals were not head-posted (see methods: “On each day, trials were manually inspected alongside camera data for any movement or chewing artifacts (note that animals were not head-posted) and were discarded from neural data analysis if there were any artifacts”). After re-visiting our analysis, we note that the data shown in Fig. 3 (spatial distribution of local bursts) is not representative from a data quality perspective – this data was from a session that had a particularly large number of channels discarded due to artifacts. We plan to correct this to show a more representative figure. 

      Other concerns:

      In some places, there is a lack of clarity in the presentation of the results. This is not serious but should be addressed to aid readers' comprehension.

      We thank the reviewer for this comment and for their numerous suggestions in the notes to the authors. We plan to address as many of these as we can to improve clarity and comprehension.  

      Lastly, given the central role of beta oscillations within the study, it would be better for completeness to include even a brief exploration of sustained beta power (rather than bursts), and the modulation of sustained beta (or lack thereof) in the study's areas of concern: behavioral recovery, task performance, etc.

      We thank the reviewer for this suggestion – we plan to include this in our revisions.  

      References cited in response to public reviewer comments: 

      (1) Ganguly, K., Khanna, P., Morecraft, R. J. & Lin, D. J. Modulation of neural co-firing to enhance network transmission and improve motor function after stroke. Neuron 110, 2363–2385 (2022).

      (2) Khanna, P. et al. Low-frequency stimulation enhances ensemble co-firing and dexterity after stroke. Cell 184, 912–930.e20 (2021).

      (3) Darling, W. G. et al. Sensorimotor Cortex Injury Effects on Recovery of Contralesional Dexterous Movements in Macaca mulatta. Exp Neurol 281, 37–52 (2016).

      (4) Bottenfield, K. R. et al. Sex differences in recovery of motor function in a rhesus monkey model of cortical injury. Biology of Sex Differences 12, 54 (2021).

      (5) Schwarz, A. et al. Association that Neuroimaging and Clinical Measures Have with Change in Arm Impairment in a Phase 3 Stroke Recovery Trial. Ann Neurol 97, 709–719 (2025).

      (6) Gulati, T. et al. Robust Neuroprosthetic Control from the Stroke Perilesional Cortex. J. Neurosci. 35, 8653–8661 (2015).

      (7) Silberstein, P. et al. Cortico-cortical coupling in Parkinson’s disease and its modulation by therapy. Brain 128, 1277–1291 (2005).

    1. eLife Assessment

      This manuscript describes solid and very interesting findings that substantially advance our understanding of a major research question on the role of Cx32 hemichannels in the Schwann cell paranode. It provides an interdisciplinary integration of imaging, in silico approaches, and functional data. This important study proposes a new mechanism with profound physiological relevance and provides new insights into glial modulation of electrical conduction in sensory/motor myelinated nerves.

    2. Reviewer #1 (Public review):

      The manuscript by Butler et al. explores a novel physiological role for connexin 32 (Cx32) hemichannels in Schwann cells at peripheral nerves. Building on the authors' prior work on CO₂-sensitive gating of connexins, this study proposes that mitochondrial CO₂ production dependent on neuronal activity promotes the opening of Cx32 hemichannels in the paranode, which in turn modulates neuronal activity by reducing conduction velocity. This hypothesis is addressed using a multifaceted approach that includes immunofluorescence microscopy, dye uptake assays, calcium imaging, computational modeling, and extracellular recordings in isolated sciatic nerves.

      Among the strengths of the study are the interdisciplinary integration of imaging, in silico approaches, and functional data. Also, this study proposes a new mechanism with profound physiological relevance. Specifically, Butler et al. provide new insights into glial modulation of electrical conduction in sensory/motor myelinated nerves.

      In the current state, the study has some limitations. The evidence linking Cx32 to the observed dye uptake and conduction velocity changes relies primarily on pharmacological inhibition with carbenoxolone, which lacks specificity. The imaging data show overlapping marker signals that preclude the anatomical distinction between nodes and paranodes. FITC uptake, while convincing to test Cx32 hemichannel gating, lacks spatial-temporal information and validation of distribution and localization to viable intracellular compartments. Moreover, while the findings are intriguing, functional proof that Cx32 regulates conduction velocity through ATP release or other downstream effects remains incomplete. Further work using targeted genetic tools, live-tissue imaging, and additional controls would strengthen the mechanistic conclusions.

      Overall, the manuscript offers compelling preliminary evidence that supports a new role for Cx32 in peripheral nerve physiology and raises important questions for future investigation.

    3. Reviewer #2 (Public review):

      Summary:

      This article aims to demonstrate that local production of CO₂ at the axonal node opens Cx32 hemichannels in the Schwann cell paranode, and that CO₂ diffuses through the AQP1 channel to reach Cx32 and trigger its opening. The authors also present evidence supporting a physiological role for this regulatory mechanism. They propose that CO₂-dependent Cx32 activation mediates activity-dependent Ca²⁺ influx into the paranode, and by increasing the leak current across the myelin sheath, it contributes to a slowing of action potential conduction velocity.

      The study presents a very interesting and novel mechanism for the physiological regulation of Cx32 hemichannels. The findings are relevant to the field, and the methods and results are of good quality, with some improvements in interpretation and explanation required, and some minor experimental suggestions.

      Strengths:

      The article is solid in terms of the novelty of the findings and relevance for the physiology of myelinated axons. In addition, it is of major interest for the Connexin field because it explores a physiological way to open Cx32 hemichannels. The experiments are well elaborated, and most of them are sufficient for the main points described by the authors. The finding that nervous activity will trigger the mechanism of hemichannel opening by CO2 is probably the most relevant biological mechanism derived from this article.

      Weaknesses:

      Throughout the manuscript, the authors interpret their findings as if the described mechanism specifically occurs in the node and paranode regions. However, there is no direct evidence identifying the precise site of CO₂ production or the activation site of Cx32 hemichannels. Therefore, statements such as the one in the title ("activity-dependent CO₂ production in the axonal node opens Cx32 in the Schwann cell paranode") should be reconsidered or removed, as they may be misleading and are not essential to the interpretation of the data. In addition, the participation of aquaporin AQP1 as the main conduit for CO2 diffusion through the plasma membrane could have another interpretation.

    4. Author response:

      Reviewer #1 (Public review): 

      The manuscript by Butler et al. explores a novel physiological role for connexin 32 (Cx32) hemichannels in Schwann cells at peripheral nerves. Building on the authors' prior work on CO₂-sensitive gating of connexins, this study proposes that mitochondrial CO₂ production dependent on neuronal activity promotes the opening of Cx32 hemichannels in the paranode, which in turn modulates neuronal activity by reducing conduction velocity. This hypothesis is addressed using a multifaceted approach that includes immunofluorescence microscopy, dye uptake assays, calcium imaging, computational modeling, and extracellular recordings in isolated sciatic nerves. 

      Among the strengths of the study are the interdisciplinary integration of imaging, in silico approaches, and functional data. Also, this study proposes a new mechanism with profound physiological relevance. Specifically, Butler et al. provide new insights into glial modulation of electrical conduction in sensory/motor myelinated nerves. 

      In its current state, the study has some limitations. The evidence linking Cx32 to the observed dye uptake and conduction velocity changes relies primarily on pharmacological inhibition with carbenoxolone, which lacks specificity. The imaging data show overlapping marker signals that preclude anatomical distinction between nodes and paranodes. FITC uptake, while convincing as a test of Cx32 hemichannel gating, lacks spatiotemporal information and validation of its distribution and localization to viable intracellular compartments. Moreover, while the findings are intriguing, functional proof that Cx32 regulates conduction velocity through ATP release or other downstream effects remains incomplete. Further work using targeted genetic tools, live-tissue imaging, and additional controls would strengthen the mechanistic conclusions.

      Overall, the manuscript offers compelling preliminary evidence that supports a new role for Cx32 in peripheral nerve physiology and raises important questions for future investigation. 

      We thank the reviewer for their comments and agree that the evidence for involvement of Cx32 is indirect. We are planning to perform genetic manipulations to strengthen this link. We shall review our presentation of the morphology in terms of the node/paranode/juxtaparanode distribution and adjust accordingly. We have in the interim generated new data using GCaMP transduced into Schwann cells that provides the live-tissue imaging that the reviewer requests. This strengthens our conclusions, and we will add these data into the paper.

      Reviewer #2 (Public review): 

      Summary: 

      This article aims to demonstrate that local production of CO₂ at the axonal node opens Cx32 hemichannels in the Schwann cell paranode, and that CO₂ diffuses through the AQP1 channel to reach Cx32 and trigger its opening. The authors also present evidence supporting a physiological role for this regulatory mechanism. They propose that CO₂-dependent Cx32 activation mediates activity-dependent Ca²⁺ influx into the paranode, and by increasing the leak current across the myelin sheath, it contributes to a slowing of action potential conduction velocity. 

      The study presents a very interesting and novel mechanism for the physiological regulation of Cx32 hemichannels. The findings are relevant to the field, and the methods and results are of good quality, with some improvements in interpretation and explanation required, and some minor experimental suggestions. 

      Strengths: 

      The article is solid in terms of the novelty of the findings and their relevance for the physiology of myelinated axons. In addition, it is of major interest for the connexin field because it explores a physiological way to open Cx32 hemichannels. The experiments are well designed, and most of them are sufficient to support the main points described by the authors. The finding that nervous activity triggers hemichannel opening by CO₂ is probably the most relevant biological mechanism derived from this article.

      Weaknesses: 

      Throughout the manuscript, the authors interpret their findings as if the described mechanism specifically occurs in the node and paranode regions. However, there is no direct evidence identifying the precise site of CO₂ production or the activation site of Cx32 hemichannels. Therefore, statements such as the one in the title ("activity-dependent CO₂ production in the axonal node opens Cx32 in the Schwann cell paranode") should be reconsidered or removed, as they may be misleading and are not essential to the interpretation of the data. In addition, the role of aquaporin AQP1 as the main conduit for CO₂ diffusion across the plasma membrane is open to alternative interpretation.

      We thank the reviewer for their comments and agree that we do not have direct evidence for the site of CO₂ production or the site of activation of Cx32 hemichannels. This direct evidence is extremely difficult to obtain, and we therefore depend on indirect arguments. Mitochondria represent the major source of CO₂, and their distribution will therefore indicate where CO₂ is likely to be produced. We agree that this is not essential to the interpretation of the data and will adjust the text as recommended. We will add a section to the Discussion to consider this point in more detail.

    1. eLife Assessment

      The article presents important findings of a dissociation between phasic and tonic pain functions in adaptive behavior, combining immersive VR, computational modeling, skin conductance, and EEG data. The methodology used is solid. Its ecological design and sophisticated computational modeling are major strengths. The article would benefit from adding details on hypotheses, VR implementation, sample size determination, modeling, analysis, and pain specificity.

    2. Reviewer #1 (Public review):

      Summary:

      This article presents a study consisting of two experiments, which aim to dissociate and quantify the distinct motivational functions of phasic and tonic pain within a naturalistic and immersive VR setting. Specifically, the authors test two hypotheses: (i) that phasic pain acts as a punishment signal that drives avoidance learning; (ii) that tonic pain reduces motivational vigor, promoting energy conservation and recuperation. In both experiments, participants performed a free-operant foraging task, where they collected virtual pineapples to earn points.

      In Experiment 1, phasic pain was delivered as a brief electric shock to the grasping hand when picking up green pineapples. As phasic pain intensity increased, participants were less likely to choose painful fruits. A reinforcement learning model that incorporated reward, pain cost, and effort cost was able to successfully capture behavior.

      Experiment 2 combined the effects of phasic and tonic pain. Tonic pain was induced by a pressure cuff on the non-dominant arm, simulating sustained discomfort. Interestingly, tonic pain did not affect the perceived intensity or avoidance of phasic pain. However, it significantly reduced movement velocity and pineapple collection rate, interpreted as a reduction of motivational vigor. A temporal decision model incorporating vigor cost successfully captured these effects.

      Concomitant EEG recordings showed that tonic pain was associated with reduced alpha and beta power in parietal and temporal areas. Phasic pain ratings and decision values distinctively correlated with skin conductance responses.

      Overall, these findings indicate that phasic and tonic pain have distinct and dissociable motivational effects.

      Strengths:

      This is an ambitious study that provides a quantitative dissociation of the roles of phasic and tonic pain in adaptive behavior by integrating ecological neuroscience, motivational theory, and computational modeling. The use of immersive VR combined with a free-operant foraging task offers a more ecologically valid context for studying pain-related behavior than traditional paradigms. Furthermore, the study employs a multimodal approach combining behavioral data, computational frameworks, physiological signals, and EEG. In particular, one of the main strengths of the study is the use of sophisticated computational modeling to capture phasic and tonic pain effects. The experiment code is available on GitHub, increasing reproducibility.

      Weaknesses:

      The main limitation of this article is that it provides insufficient detail on the VR implementation. The design of the VR environment is, at this stage, under-described. Crucial information is missing, such as the number of pineapples per block, timing precision, and how physical motion is mapped to virtual movement. This strongly limits the reproducibility of the experiments. A second limitation is the lack of clarity regarding the study hypotheses. Although two overarching hypotheses can be inferred, they are not explicitly formulated. As a result, it is unclear which analyses were merely exploratory, especially for the physiological and EEG outcomes.

      In Experiment 2, the reduction in vigor during tonic pain could plausibly reflect attentional load rather than pain per se. As recognized by the authors, there is no control condition involving an innocuous salient stimulus to rule out non-specific effects of distraction. Perhaps a tonic non-painful but salient somatosensory stimulus (e.g., a strong vibrotactile stimulus applied on the same arm) could have been used as a control stimulus.

    3. Reviewer #2 (Public review):

      Summary:

      The study investigated the distinct roles of phasic and tonic pain in adaptive behavior. Phasic pain was proposed to function as a teaching signal, promoting avoidance of further injury, while tonic pain was hypothesized to support recuperative behavior by reducing motivational vigor. This hypothesis was tested using an immersive virtual reality (VR) EEG foraging task, in which participants harvested fruit in a forest environment. Some fruits triggered brief phasic pain to the grasping hand, which in turn reduced the likelihood of choosing those fruits. Concurrently, tonic pressure pain applied to the contralateral upper arm was associated with reduced action velocities. The authors employed a free-operant computational framework to quantify how phasic and tonic pain modulate motivational vigor and decision value. Importantly, model parameters were found to correlate with EEG responses, providing neurophysiological support for the hypothesized functional distinctions.

      Strengths:

      Overall, this study aims to address an important topic and is generally well written.

      Weaknesses:

      Two critical issues require clarification or justification.

      First, phasic pain was induced using electrical stimulation, which typically elicits somatosensory evoked potentials (SEPs). These responses may not reflect pain-specific processes and thus complicate interpretation. This issue bears directly on the study's conclusions, especially when discussing interactions between phasic and tonic pain. For example, tonic pain is known to reduce perceived intensity or cortical responses to phasic pain stimuli delivered elsewhere on the body - an effect not expected for SEPs elicited by electrical stimuli.

      Second, additional control experiments are necessary to rule out alternative explanations. For instance, the authors are suggested to deliver phasic pain to the contralateral arm (e.g., at 1-2 Hz), which might also reduce action velocity. Similarly, tonic pain applied to the grasping hand should be tested to disentangle hand-specific effects.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigates how phasic and tonic pain modulate behaviour in a free-operant foraging paradigm. The authors apply a computational modeling approach to the behavioural data to quantify the decision value of phasic pain, as well as the degree to which tonic pain reduces motivational vigour. EEG assessments showed, e.g., reduced signal power at alpha and beta frequencies in tonic pain conditions compared to no-tonic-pain conditions, but no association between these neural measures and motivational vigour. The authors conclude that tonic and phasic pain serve different motivational functions, with phasic pain acting as a punishment signal promoting avoidance and tonic pain reducing motivational vigour.

      Strengths:

      The experimental paradigm is highly innovative. Assessing human behaviour in a naturalistic yet highly controlled setting represents a promising approach to pain research. Notably, assessing pain magnitude implicitly, via its motivational value, offers insights about the overall pain experience that are not usually accessible via common pain ratings.

      Weaknesses:

      Despite these strengths, the manuscript would benefit significantly from more precise definitions of key concepts and an overall clearer, more coherent presentation of its main arguments. The writing, in its current form, often presents claims that are too vague or insufficiently connected with the experimental findings. Moreover, certain aspects of the computational modeling and statistical analysis appear flawed or inadequately justified.

    5. Author response:

      Reviewer #1 (Public review):

      The main limitation of this article is that it provides insufficient detail on the VR implementation. The design of the VR environment is, at this stage, under-described. Crucial information is missing, such as the number of pineapples per block, timing precision, and how physical motion is mapped to virtual movement. This strongly limits the reproducibility of the experiments. A second limitation is the lack of clarity regarding the study hypotheses. Although two overarching hypotheses can be inferred, they are not explicitly formulated. As a result, it is unclear which analyses were merely exploratory, especially for the physiological and EEG outcomes.

      In Experiment 2, the reduction in vigor during tonic pain could plausibly reflect attentional load rather than pain per se. As recognized by the authors, there is no control condition involving an innocuous salient stimulus to rule out non-specific effects of distraction. Perhaps a tonic non-painful but salient somatosensory stimulus (e.g., a strong vibrotactile stimulus applied on the same arm) could have been used as a control stimulus.

      We appreciate the reviewer's comments regarding the insufficient implementation details. We hope the newly uploaded software for reproducing the experiment can improve the reader's understanding of the task. In addition to making the software available, we will expand the Methods section in the revised manuscript to include greater detail on the task description.

      The hypothesised functions of phasic and tonic pain, and their collaborative interaction, are both broad and deep topics. In the revised manuscript, we will more explicitly formulate our hypotheses and clarify the distinction between a priori predictions and exploratory analyses, particularly concerning the extent to which our evidence supports these hypotheses.

      We agree that examining the potential role of attentional load in the interaction between tonic and phasic pain is an important area for future investigation. Adding control conditions matched for attentional salience would require further experiments and would introduce other confounds related to the control stimuli's different qualities (e.g., a salient vibrotactile stimulus might invigorate behaviour). More fundamentally, however, attentional processes are a core part of pain function and should not necessarily be viewed as a confound: pain may exert some of its core functional effects directly through its salient, attention-capturing nature. This view is formalised in Wall and Melzack's classical tripartite model of pain, and it distinguishes pain from purely sensory systems such as somatosensation and vision.

      Reviewer #2 (Public review):

      Two critical issues require clarification or justification. First, phasic pain was induced using electrical stimulation, which typically elicits somatosensory evoked potentials (SEPs). These responses may not reflect pain-specific processes and thus complicate interpretation. This issue bears directly on the study's conclusions, especially when discussing interactions between phasic and tonic pain. For example, tonic pain is known to reduce perceived intensity or cortical responses to phasic pain stimuli delivered elsewhere on the body - an effect not expected for SEPs elicited by electrical stimuli.

      We acknowledge the reviewer’s concern regarding the specificity of evoked potentials elicited by electrical stimulation. We agree that traditional SEPs—particularly those evoked by large surface electrodes—primarily reflect activation of non-nociceptive A-beta fibres and thus may not reliably index pain-specific processes or be modulated by tonic pain via descending nociceptive control. However, we would like to clarify that phasic pain was administered in the present study using small-diameter concentric ‘Wasp’ electrodes. These are comparable to intraepidermal electrodes shown to preferentially activate nociceptive A-delta fibres, thereby eliciting ERPs more closely associated with nociceptive processing rather than mixed somatosensory input [1, 2]. Accordingly, our ERP results demonstrated a reliable increase in N1-P2 amplitude with higher phasic pain intensity, suggesting that the evoked responses captured stimulus-evoked nociceptive processing.

      We acknowledge that these ERPs may still reflect mixed sensory processing and thus may not be fully modulated by tonic pain. Previous studies have shown that ERPs elicited by nociceptive electrical stimulation can be attenuated during tonic pain induced by cold-water immersion in conditioned pain modulation (CPM) paradigms [3, 4]. However, these studies typically employ passive tasks, whereas our paradigm involved continuous voluntary behaviour during sustained tonic pressure pain. This difference in task context may engage distinct modulatory systems, possibly prioritising behavioural adaptation over sensory gating.

      We will revise the manuscript to acknowledge these factors and to encourage a more nuanced interpretation of the ERP findings in light of this literature.

      Second, additional control experiments are necessary to rule out alternative explanations. For instance, the authors are suggested to deliver phasic pain to the contralateral arm (e.g., at 1-2 Hz), which might also reduce action velocity. Similarly, tonic pain applied to the grasping hand should be tested to disentangle hand-specific effects.

      We are grateful to the reviewer for this suggestion. In the current study, phasic pain was delivered to the grasping hand to create a coherent, spatially congruent link between the virtual stimuli (painful fruit) and their behavioural consequences (pain upon grasp). Delivering phasic pain to the contralateral hand would be incongruent with the task design and could alter the interpretation of the learning signal, which was central to our computational modelling framework. Similarly, tonic pain was not applied to the grasping hand to avoid interfering with motor control: it would make it extremely difficult for participants to grasp the hand controller effectively, complicating the interpretation of behavioural and neural measures. Therefore, while we agree that such manipulations could be informative in future studies, they were not the focus of the current investigation. We will discuss these issues in the revision.

      Reviewer #3 (Public review):

      Despite these strengths, the manuscript would benefit significantly from more precise definitions of key concepts and an overall clearer, more coherent presentation of its main arguments. The writing, in its current form, often presents claims that are too vague or insufficiently connected with the experimental findings. Moreover, certain aspects of the computational modeling and statistical analysis appear flawed or inadequately justified.

      We thank the reviewer for highlighting the need for clearer definitions and a more coherent presentation. In the revised manuscript, we will refine our definitions of key concepts and improve the presentation of hypothesised functions of phasic and tonic pain. As stated previously, we will clarify the extent to which our evidence supports these hypotheses. We also appreciate the feedback on our statistical analysis and computational modelling. We will address these points and provide the necessary clarifications and justifications in the revised manuscript.

    1. eLife Assessment

      This valuable study presents a mouse gastruloid system to generate successive waves of hematopoietic progenitors that in vivo would emerge during embryonic development. Although this newly revised manuscript has addressed some of the concerns raised during the first round of review, the study is still considered incomplete, as the claims are only partially supported. In particular, the claim of definitive wave hematopoietic progenitors being produced in the gastruloids, and their engraftment after transplantation, would benefit from further validation.